All of lore.kernel.org
 help / color / mirror / Atom feed
From: james.morse@arm.com (James Morse)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH] arm64: kernel: Use a separate stack for irq interrupts.
Date: Wed, 09 Sep 2015 19:13:42 +0100	[thread overview]
Message-ID: <55F076D6.60906@arm.com> (raw)
In-Reply-To: <9FF0599A-395F-43EC-A2C2-0206CE0B30C2@gmail.com>

On 09/09/15 14:22, Jungseok Lee wrote:
> On Sep 9, 2015, at 1:47 AM, James Morse wrote:
>> On 08/09/15 15:54, Jungseok Lee wrote:
>>> On Sep 7, 2015, at 11:36 PM, James Morse wrote:
>>>> diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
>>>> index 463fa2e7e34c..10b57a006da8 100644
>>>> --- a/arch/arm64/kernel/irq.c
>>>> +++ b/arch/arm64/kernel/irq.c
>>>> @@ -26,11 +26,14 @@
>>>> #include <linux/smp.h>
>>>> #include <linux/init.h>
>>>> #include <linux/irqchip.h>
>>>> +#include <linux/percpu.h>
>>>> #include <linux/seq_file.h>
>>>> #include <linux/ratelimit.h>
>>>>
>>>> unsigned long irq_err_count;
>>>>
>>>> +DEFINE_PER_CPU(unsigned long, irq_sp) = 0;
>>>> +
>>>> int arch_show_interrupts(struct seq_file *p, int prec)
>>>> {
>>>> #ifdef CONFIG_SMP
>>>> @@ -55,6 +58,10 @@ void __init init_IRQ(void)
>>>> 	irqchip_init();
>>>> 	if (!handle_arch_irq)
>>>> 		panic("No interrupt controller found.");
>>>> +
>>>> +	/* Allocate an irq stack for the boot cpu */
>>>> +	if (alloc_irq_stack(smp_processor_id()))
>>>> +		panic("Failed to allocate irq stack for boot cpu.");
>>>> }
>>>>
>>>> #ifdef CONFIG_HOTPLUG_CPU
>>>> @@ -117,3 +124,48 @@ void migrate_irqs(void)
>>>> 	local_irq_restore(flags);
>>>> }
>>>> #endif /* CONFIG_HOTPLUG_CPU */
>>>> +
>>>> +/* Allocate an irq_stack for a cpu that is about to be brought up. */
>>>> +int alloc_irq_stack(unsigned int cpu)
>>>> +{
>>>> +	struct page *irq_stack_page;
>>>> +	union thread_union *irq_stack;
>>>> +
>>>> +	/* reuse stack allocated previously */
>>>> +	if (per_cpu(irq_sp, cpu))
>>>> +		return 0;
>>>
>>> I'd like to avoid even this simple check since CPU hotplug could be heavily
>>> used for power management.
>>
>> I don't think its a problem:
>> __cpu_up() contains a call to wait_for_completion_timeout() (which could
>> eventually end up in the scheduler), so I don't think it could ever be on a
>> 'really hot' path.
>>
>> For really frequent hotplug-like power management, cpu_suspend() makes use
>> of firmware support to power-off cores - from what I can see it doesn't use
>> __cpu_up().
> 
> In case of some platforms, CPU hotplug is triggered via sysfs for power management
> based on user data. What is advantage of putting stack allocation into this path?

It will only happen for CPUs that are brought up.


> IRQ stack allocation is an critical one-shot operation. So, there would be no issue
> to give this work to a booting core. 

I agree, but:

>From include/linux/cpumask.h:
> *  If HOTPLUG is enabled, then cpu_possible_mask is forced to have
> *  all NR_CPUS bits set, otherwise it is just the set of CPUs that
> *  ACPI reports present at boot.

(This doesn't seem to happen with DT - but might with ACPI.)

NR_CPUs could be much bigger than the number of cpus the system ever has.
Allocating a stack for every possible cpu would waste memory. It is better
to do it just-in-time, when we know the memory will be used.

This already happens for the per-cpu idle task, (please check I traced
these through correctly!)
_cpu_up()
  idle_thread_get()
    init_idle()
      fork_idle()
        copy_process()
          dup_task_struct()
            alloc_task_struct_node()
            alloc_thread_info_node()
            arch_dup_task_struct()

So plenty of memory-allocation occurs during _cpu_up(), idle_init() checks
whether the idle task has already been created.


>>>> +
>>>> +	irq_stack_page = alloc_kmem_pages(THREADINFO_GFP, THREAD_SIZE_ORDER);
>>>> +	if (!irq_stack_page) {
>>>> +		pr_crit("CPU%u: failed to allocate irq stack for cpu %u\n",
>>>> +			smp_processor_id(), cpu);
>>>> +		return -ENOMEM;
>>>> +	}
>>>> +	irq_stack = page_address(irq_stack_page);
> 
> I forgot leaving a comment here. How about using __get_free_pages? It returns an
> address instead of page. 

Good idea, I was paranoid about the stack not being correctly aligned (as
current_thread_info() relies on this). I lifted that alloc_kmem_pages line
from kernel/fork.c.


>>>> +
>>>> +	per_cpu(irq_sp, cpu) = (unsigned long)irq_stack->stack
>>>> +			       + THREAD_START_SP;
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Copy struct thread_info between two stacks, and update current->stack.
>>>> + * This is used when moving to the irq exception stack.
>>>> + * Changing current->stack is necessary so that non-arch thread_info writers
>>>> + * don't use the new thread_info->task->stack to find the old thread_info when
>>>> + * setting flags like TIF_NEED_RESCHED...
>>>> + */
>>>> +asmlinkage int irq_copy_thread_info(unsigned long dst_sp, unsigned long src_sp)
>>>> +{
>>>> +	struct thread_info *src = get_thread_info(src_sp);
>>>> +	struct thread_info *dst = get_thread_info(dst_sp);
>>>> +
>>>> +	if (dst == src)
>>>> +		return 0;
>>>> +
>>>> +	*dst = *src;
>>>> +	current->stack = dst;
>>>> +
>>>> +	return 1;
>>>> +}
>>>
>>> Although this is 32-byte memcpy, sizeof(struct thread_info), this function is called
>>> twice to handle a single interrupt. Isn's it too expensive?
>>>
>>> This is a major difference between my approach and this patch. I think we should get
>>> some feedbacks on struct thread_info information management for v2 preparation.
>>
>> Agreed.
>>
>> The other difference, as Akashi Takahiro pointed out, is the behaviour of
>> object_is_on_stack(). What should this return for an object on an irq stack?
> 
> 0 should be returned in that case to align with x86 behavior according to check_stack()
> context and the commit, 81520a1b0649d0.

Unless we can update ftrace on x86 too! :)


Thanks for your comments,


James

WARNING: multiple messages have this Message-ID (diff)
From: James Morse <james.morse@arm.com>
To: Jungseok Lee <jungseoklee85@gmail.com>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>,
	Will Deacon <Will.Deacon@arm.com>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] arm64: kernel: Use a separate stack for irq interrupts.
Date: Wed, 09 Sep 2015 19:13:42 +0100	[thread overview]
Message-ID: <55F076D6.60906@arm.com> (raw)
In-Reply-To: <9FF0599A-395F-43EC-A2C2-0206CE0B30C2@gmail.com>

On 09/09/15 14:22, Jungseok Lee wrote:
> On Sep 9, 2015, at 1:47 AM, James Morse wrote:
>> On 08/09/15 15:54, Jungseok Lee wrote:
>>> On Sep 7, 2015, at 11:36 PM, James Morse wrote:
>>>> diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
>>>> index 463fa2e7e34c..10b57a006da8 100644
>>>> --- a/arch/arm64/kernel/irq.c
>>>> +++ b/arch/arm64/kernel/irq.c
>>>> @@ -26,11 +26,14 @@
>>>> #include <linux/smp.h>
>>>> #include <linux/init.h>
>>>> #include <linux/irqchip.h>
>>>> +#include <linux/percpu.h>
>>>> #include <linux/seq_file.h>
>>>> #include <linux/ratelimit.h>
>>>>
>>>> unsigned long irq_err_count;
>>>>
>>>> +DEFINE_PER_CPU(unsigned long, irq_sp) = 0;
>>>> +
>>>> int arch_show_interrupts(struct seq_file *p, int prec)
>>>> {
>>>> #ifdef CONFIG_SMP
>>>> @@ -55,6 +58,10 @@ void __init init_IRQ(void)
>>>> 	irqchip_init();
>>>> 	if (!handle_arch_irq)
>>>> 		panic("No interrupt controller found.");
>>>> +
>>>> +	/* Allocate an irq stack for the boot cpu */
>>>> +	if (alloc_irq_stack(smp_processor_id()))
>>>> +		panic("Failed to allocate irq stack for boot cpu.");
>>>> }
>>>>
>>>> #ifdef CONFIG_HOTPLUG_CPU
>>>> @@ -117,3 +124,48 @@ void migrate_irqs(void)
>>>> 	local_irq_restore(flags);
>>>> }
>>>> #endif /* CONFIG_HOTPLUG_CPU */
>>>> +
>>>> +/* Allocate an irq_stack for a cpu that is about to be brought up. */
>>>> +int alloc_irq_stack(unsigned int cpu)
>>>> +{
>>>> +	struct page *irq_stack_page;
>>>> +	union thread_union *irq_stack;
>>>> +
>>>> +	/* reuse stack allocated previously */
>>>> +	if (per_cpu(irq_sp, cpu))
>>>> +		return 0;
>>>
>>> I'd like to avoid even this simple check since CPU hotplug could be heavily
>>> used for power management.
>>
>> I don't think its a problem:
>> __cpu_up() contains a call to wait_for_completion_timeout() (which could
>> eventually end up in the scheduler), so I don't think it could ever be on a
>> 'really hot' path.
>>
>> For really frequent hotplug-like power management, cpu_suspend() makes use
>> of firmware support to power-off cores - from what I can see it doesn't use
>> __cpu_up().
> 
> In case of some platforms, CPU hotplug is triggered via sysfs for power management
> based on user data. What is advantage of putting stack allocation into this path?

It will only happen for CPUs that are brought up.


> IRQ stack allocation is an critical one-shot operation. So, there would be no issue
> to give this work to a booting core. 

I agree, but:

>From include/linux/cpumask.h:
> *  If HOTPLUG is enabled, then cpu_possible_mask is forced to have
> *  all NR_CPUS bits set, otherwise it is just the set of CPUs that
> *  ACPI reports present at boot.

(This doesn't seem to happen with DT - but might with ACPI.)

NR_CPUs could be much bigger than the number of cpus the system ever has.
Allocating a stack for every possible cpu would waste memory. It is better
to do it just-in-time, when we know the memory will be used.

This already happens for the per-cpu idle task, (please check I traced
these through correctly!)
_cpu_up()
  idle_thread_get()
    init_idle()
      fork_idle()
        copy_process()
          dup_task_struct()
            alloc_task_struct_node()
            alloc_thread_info_node()
            arch_dup_task_struct()

So plenty of memory-allocation occurs during _cpu_up(), idle_init() checks
whether the idle task has already been created.


>>>> +
>>>> +	irq_stack_page = alloc_kmem_pages(THREADINFO_GFP, THREAD_SIZE_ORDER);
>>>> +	if (!irq_stack_page) {
>>>> +		pr_crit("CPU%u: failed to allocate irq stack for cpu %u\n",
>>>> +			smp_processor_id(), cpu);
>>>> +		return -ENOMEM;
>>>> +	}
>>>> +	irq_stack = page_address(irq_stack_page);
> 
> I forgot leaving a comment here. How about using __get_free_pages? It returns an
> address instead of page. 

Good idea, I was paranoid about the stack not being correctly aligned (as
current_thread_info() relies on this). I lifted that alloc_kmem_pages line
from kernel/fork.c.


>>>> +
>>>> +	per_cpu(irq_sp, cpu) = (unsigned long)irq_stack->stack
>>>> +			       + THREAD_START_SP;
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Copy struct thread_info between two stacks, and update current->stack.
>>>> + * This is used when moving to the irq exception stack.
>>>> + * Changing current->stack is necessary so that non-arch thread_info writers
>>>> + * don't use the new thread_info->task->stack to find the old thread_info when
>>>> + * setting flags like TIF_NEED_RESCHED...
>>>> + */
>>>> +asmlinkage int irq_copy_thread_info(unsigned long dst_sp, unsigned long src_sp)
>>>> +{
>>>> +	struct thread_info *src = get_thread_info(src_sp);
>>>> +	struct thread_info *dst = get_thread_info(dst_sp);
>>>> +
>>>> +	if (dst == src)
>>>> +		return 0;
>>>> +
>>>> +	*dst = *src;
>>>> +	current->stack = dst;
>>>> +
>>>> +	return 1;
>>>> +}
>>>
>>> Although this is 32-byte memcpy, sizeof(struct thread_info), this function is called
>>> twice to handle a single interrupt. Isn's it too expensive?
>>>
>>> This is a major difference between my approach and this patch. I think we should get
>>> some feedbacks on struct thread_info information management for v2 preparation.
>>
>> Agreed.
>>
>> The other difference, as Akashi Takahiro pointed out, is the behaviour of
>> object_is_on_stack(). What should this return for an object on an irq stack?
> 
> 0 should be returned in that case to align with x86 behavior according to check_stack()
> context and the commit, 81520a1b0649d0.

Unless we can update ftrace on x86 too! :)


Thanks for your comments,


James



  reply	other threads:[~2015-09-09 18:13 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-04 14:23 [RFC PATCH 0/3] Implement IRQ stack on ARM64 Jungseok Lee
2015-09-04 14:23 ` Jungseok Lee
2015-09-04 14:23 ` [RFC PATCH 1/3] arm64: entry: Remove unnecessary calculation for S_SP in EL1h Jungseok Lee
2015-09-04 14:23   ` Jungseok Lee
2015-09-07 14:48   ` James Morse
2015-09-07 14:48     ` James Morse
2015-09-07 14:56   ` Mark Rutland
2015-09-07 14:56     ` Mark Rutland
2015-09-07 15:51     ` Jungseok Lee
2015-09-07 15:51       ` Jungseok Lee
2015-09-04 14:23 ` [RFC PATCH 2/3] arm64: Introduce IRQ stack Jungseok Lee
2015-09-04 14:23   ` Jungseok Lee
2015-09-04 17:12   ` Alexnader Kuleshov
2015-09-04 17:12     ` Alexnader Kuleshov
2015-09-07 14:08     ` Jungseok Lee
2015-09-07 14:08       ` Jungseok Lee
2015-09-07 14:48   ` James Morse
2015-09-07 14:48     ` James Morse
2015-09-08 14:28     ` Jungseok Lee
2015-09-08 14:28       ` Jungseok Lee
2015-09-04 14:23 ` [RFC PATCH 3/3] arm64: Reduce kernel stack size when using " Jungseok Lee
2015-09-04 14:23   ` Jungseok Lee
2015-09-07 14:33 ` [RFC PATCH 0/3] Implement IRQ stack on ARM64 James Morse
2015-09-07 14:33   ` James Morse
2015-09-07 14:36   ` [PATCH] arm64: kernel: Use a separate stack for irq interrupts James Morse
2015-09-07 14:36     ` James Morse
2015-09-07 15:48     ` Jungseok Lee
2015-09-07 15:48       ` Jungseok Lee
2015-09-07 16:06       ` James Morse
2015-09-07 16:06         ` James Morse
2015-09-07 16:34         ` Jungseok Lee
2015-09-07 16:34           ` Jungseok Lee
2015-09-08  1:45           ` AKASHI Takahiro
2015-09-08  1:45             ` AKASHI Takahiro
2015-09-08  6:44             ` AKASHI Takahiro
2015-09-08  6:44               ` AKASHI Takahiro
2015-09-08 14:59             ` Jungseok Lee
2015-09-08 14:59               ` Jungseok Lee
2015-09-08  7:51     ` AKASHI Takahiro
2015-09-08  7:51       ` AKASHI Takahiro
2015-09-08 14:54     ` Jungseok Lee
2015-09-08 14:54       ` Jungseok Lee
2015-09-08 16:47       ` James Morse
2015-09-08 16:47         ` James Morse
2015-09-09 13:22         ` Jungseok Lee
2015-09-09 13:22           ` Jungseok Lee
2015-09-09 18:13           ` James Morse [this message]
2015-09-09 18:13             ` James Morse
2015-09-10 23:30             ` Jungseok Lee
2015-09-10 23:30               ` Jungseok Lee
2015-09-07 15:42   ` [RFC PATCH 0/3] Implement IRQ stack on ARM64 Jungseok Lee
2015-09-07 15:42     ` Jungseok Lee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55F076D6.60906@arm.com \
    --to=james.morse@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.