From: Masami Hiramatsu <mhiramat@redhat.com>
To: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Steven Rostedt <rostedt@goodmis.org>,
Andi Kleen <andi@firstfloor.org>,
linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
Andrew Morton <akpm@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>,
Frederic Weisbecker <fweisbec@gmail.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Arjan van de Ven <arjan@infradead.org>,
Rusty Russell <rusty@rustcorp.com.au>,
"H. Peter Anvin" <hpa@zytor.com>,
Steven Rostedt <srostedt@redhat.com>
Subject: Re: [PATCH 4/6] ftrace, x86: make kernel text writable only for conversions
Date: Fri, 27 Feb 2009 13:34:31 -0500
Message-ID: <49A83237.40604@redhat.com>
In-Reply-To: <20090227180724.GA17947@Krystal>
Mathieu Desnoyers wrote:
> * Masami Hiramatsu (mhiramat@redhat.com) wrote:
>> Steven Rostedt wrote:
>>> On Mon, 23 Feb 2009, Mathieu Desnoyers wrote:
>>>>> Hmm, lets see. I simply set a bit in the PTE mappings. There's not many,
>>>>> since a lot are 2M pages, for x86_64. Call stop_machine, and now I can
>>>>> modify 1 or 20,000 locations. Set the PTE bit back. Note, the changing of
>>>>> the bits are only done when CONFIG_DEBUG_RODATA is set.
>>>>>
>>>>> text_poke requires allocating a page. Map the page into memory. Set up a
>>>>> break point.
>>>> text_poke does not _require_ a break point. text_poke can work with
>>>> stop_machine.
>>> It can? Doesn't text_poke require allocating pages? The code called by
>>> stop_machine is all atomic. vmap does not give an option to allocate with
>>> GFP_ATOMIC.
>> Hi,
>>
>> With my patch, text_poke() never allocate pages any more :)
>>
>> BTW, IMHO, both of your methods are useful and have trade-off.
>>
>> ftrace wants to change massive amount of code at once. If we do
>> that with text_poke(), we have to map/unmap pages each time and
>> it will take a long time -- might be longer than one stop_machine_run().
>>
>> On the other hand, text_poke() user like as kprobes and tracepoints,
>> just want to change a few amount of code at once, and it will be
>> added/removed incrementally. If we do that with stop_machine_run(),
>> we'll be annoyed by frequent machine stops.(Moreover, kprobes uses
>> breakpoint, so it doesn't need stop_machine_run())
>>
>
> Hi Masami,
>
> Is this text_poke version executable in atomic context ? If yes, then
> that would be good to add a comment saying it. Please see below for
> comments.
Thank you for the comments!
I think it could be. Ah, the spin_lock() might need to be changed to
spin_lock_irqsave()...
>> Thank you,
>>
> [...]
>> Use map_vm_area() instead of vmap() in text_poke() for avoiding page allocation
>> and delayed unmapping.
>>
>> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
>> ---
>> arch/x86/include/asm/alternative.h | 1 +
>> arch/x86/kernel/alternative.c | 25 ++++++++++++++++++++-----
>> init/main.c | 3 +++
>> 3 files changed, 24 insertions(+), 5 deletions(-)
>>
>> Index: linux-2.6/arch/x86/include/asm/alternative.h
>> ===================================================================
>> --- linux-2.6.orig/arch/x86/include/asm/alternative.h
>> +++ linux-2.6/arch/x86/include/asm/alternative.h
>> @@ -177,6 +177,7 @@ extern void add_nops(void *insns, unsign
>> * The _early version expects the memory to already be RW.
>> */
>>
>> +extern void text_poke_init(void);
>> extern void *text_poke(void *addr, const void *opcode, size_t len);
>> extern void *text_poke_early(void *addr, const void *opcode, size_t len);
>>
>> Index: linux-2.6/arch/x86/kernel/alternative.c
>> ===================================================================
>> --- linux-2.6.orig/arch/x86/kernel/alternative.c
>> +++ linux-2.6/arch/x86/kernel/alternative.c
>> @@ -485,6 +485,16 @@ void *text_poke_early(void *addr, const
>> return addr;
>> }
>>
>> +static struct vm_struct *text_poke_area[2];
>> +static DEFINE_SPINLOCK(text_poke_lock);
>> +
>> +void __init text_poke_init(void)
>> +{
>> + text_poke_area[0] = get_vm_area(PAGE_SIZE, VM_ALLOC);
>> + text_poke_area[1] = get_vm_area(2 * PAGE_SIZE, VM_ALLOC);
>
> Why is this text_poke_area[1] 2 * PAGE_SIZE in size ? I would have
> thought that text_poke_area[0] would be PAGE_SIZE, text_poke_area[1]
> also be PAGE_SIZE, and that the sum of both would be 2 * PAGE_SIZE..
Unfortunately, the current map_vm_area() always tries to map the whole size
of the vm_area. This means you can't use a two-page vm_area to map just one
page... (or maybe we can set pages[1] = pages[0] when the 2nd page doesn't
exist)
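To illustrate, here is a minimal userspace sketch of the two options, not kernel code. Every demo_* name is made up for the example, and the model ignores details of the real vm_area such as the guard page. It assumes only what is described above: the mapper consumes one page pointer per page of the area, whatever the caller's nr_pages is.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL

/* Hypothetical stand-in for struct vm_struct; only the size matters here. */
struct demo_area {
	size_t size;
};

/*
 * Model of the behaviour described above: one page pointer is consumed
 * per page of the *area*, however many pages the caller really has.
 * Returns the number of pages mapped, or -1 if an entry is missing.
 */
static int demo_map_area(const struct demo_area *area, void *const *pages)
{
	size_t n = area->size / DEMO_PAGE_SIZE;
	size_t i;

	for (i = 0; i < n; i++)
		if (!pages[i])
			return -1;
	return (int)n;
}

/* Two preallocated areas, indexed by nr_pages - 1 as in the patch. */
static const struct demo_area demo_poke_area[2] = {
	{ 1 * DEMO_PAGE_SIZE },
	{ 2 * DEMO_PAGE_SIZE },
};

/* Approach taken in the patch: pick the area whose size matches. */
static int demo_map_matching(void *const *pages, int nr_pages)
{
	return demo_map_area(&demo_poke_area[nr_pages - 1], pages);
}

/*
 * Alternative floated above: always use the two-page area and alias
 * the missing second page to the first one.
 */
static int demo_map_aliased(void **pages)
{
	if (!pages[1])
		pages[1] = pages[0];
	return demo_map_area(&demo_poke_area[1], pages);
}
```

With pages = { page0, NULL }, mapping through the two-page area directly fails in this model, while demo_map_matching(pages, 1) and demo_map_aliased(pages) both succeed. The aliased variant would let a single area serve both cases, at the cost of mapping the first page twice.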
>> + BUG_ON(!text_poke_area[0] || !text_poke_area[1]);
>> +}
>> +
>> /**
>> * text_poke - Update instructions on a live kernel
>> * @addr: address to modify
>> @@ -501,8 +511,9 @@ void *__kprobes text_poke(void *addr, co
>> unsigned long flags;
>> char *vaddr;
>> int nr_pages = 2;
>> - struct page *pages[2];
>> - int i;
>> + struct page *pages[2], **pgp = pages;
>
> Hrm, why do you need **pgp ? Could you simply pass &pages to map_vm_area ?
As you know, pages is an array, so it just evaluates to the address of its
first element; there is no pointer variable whose address you could take.
(pages and &pages have the same value, only the types differ.)
	int array[2];
	printf("%p, %p\n", (void *)array, (void *)&array);
Please try it :)
And actually, map_vm_area() requires the address of a pointer.
---
int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page ***pages)
{
	unsigned long addr = (unsigned long)area->addr;
	unsigned long end = addr + area->size - PAGE_SIZE;
	int err;

	err = vmap_page_range(addr, end, prot, *pages);
	if (err > 0) {
		*pages += err;
		^^^^^^^^^^^^^^ Here, it tries to add err (= number of mapped pages)
		               to the pages pointer!
		err = 0;
	}

	return err;
}
---
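The same calling convention can be tried in userspace. This is only a sketch: struct demo_page and the demo_* functions are hypothetical stand-ins, not kernel APIs. It shows why a separate cursor such as pgp is needed when the callee advances the caller's pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct demo_page {
	int id;
};

/*
 * Same convention as the excerpt above: take the address of the
 * caller's cursor, consume 'count' entries, and advance the cursor.
 */
static int demo_consume(struct demo_page ***pages, int count)
{
	*pages += count;	/* needs a real pointer object to write to */
	return 0;
}

/*
 * Caller side, as in text_poke(): keep the array and pass the address
 * of a separate cursor that initially points at pages[0].
 * Returns how many entries the callee advanced the cursor by.
 */
static ptrdiff_t demo_call(int count)
{
	struct demo_page *pages[2] = { NULL, NULL };
	struct demo_page **pgp = pages;	/* array decays to &pages[0] */

	/*
	 * Passing &pages instead would be a type mismatch: it has type
	 * struct demo_page *(*)[2], not struct demo_page ***.
	 */
	demo_consume(&pgp, count);
	return pgp - pages;
}

/* 'array' and '&array' hold the same address; only their types differ. */
static int demo_same_address(void)
{
	int array[2];

	return (void *)array == (void *)&array;
}
```

Here demo_call(2) leaves the cursor two entries past the start of the array, while the array itself is untouched. That is the effect the extra pgp variable is meant to allow.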
>
> Thanks,
>
> Mathieu
>
>> + int i, ret;
>> + struct vm_struct *vma;
>>
>> if (!core_kernel_text((unsigned long)addr)) {
>> pages[0] = vmalloc_to_page(addr);
>> @@ -515,12 +526,16 @@ void *__kprobes text_poke(void *addr, co
>> BUG_ON(!pages[0]);
>> if (!pages[1])
>> nr_pages = 1;
>> - vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
>> - BUG_ON(!vaddr);
>> + spin_lock(&text_poke_lock);
>> + vma = text_poke_area[nr_pages-1];
>> + ret = map_vm_area(vma, PAGE_KERNEL, &pgp);
>> + BUG_ON(ret);
>> + vaddr = vma->addr;
>> local_irq_save(flags);
>> memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
>> local_irq_restore(flags);
>> - vunmap(vaddr);
>> + unmap_kernel_range((unsigned long)vma->addr, (unsigned long)vma->size);
>> + spin_unlock(&text_poke_lock);
>> sync_core();
>> /* Could also do a CLFLUSH here to speed up CPU recovery; but
>> that causes hangs on some VIA CPUs. */
>> @@ -528,3 +543,4 @@ void *__kprobes text_poke(void *addr, co
>> BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);
>> return addr;
>> }
>> +EXPORT_SYMBOL_GPL(text_poke);
>> Index: linux-2.6/init/main.c
>> ===================================================================
>> --- linux-2.6.orig/init/main.c
>> +++ linux-2.6/init/main.c
>> @@ -676,6 +676,9 @@ asmlinkage void __init start_kernel(void
>> taskstats_init_early();
>> delayacct_init();
>>
>> +#ifdef CONFIG_X86
>> + text_poke_init();
>> +#endif
>> check_bugs();
>>
>> acpi_early_init(); /* before LAPIC and SMP init */
>
>
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division
e-mail: mhiramat@redhat.com