From: Krzysztof Kozlowski <k.kozlowski@samsung.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>,
Kukjin Kim <kgene.kim@samsung.com>,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-samsung-soc@vger.kernel.org,
Kyungmin Park <kyungmin.park@samsung.com>,
Marek Szyprowski <m.szyprowski@samsung.com>,
Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>,
Tomasz Figa <t.figa@samsung.com>,
stable@vger.kernel.org
Subject: Re: [PATCH 1/3] clocksource: exynos_mct: Fix stall after CPU hotplugging
Date: Wed, 16 Apr 2014 10:51:03 +0200
Message-ID: <1397638263.6147.7.camel@AMDC1943>
In-Reply-To: <alpine.DEB.2.02.1404151753090.22697@ionos.tec.linutronix.de>

On Tue, 2014-04-15 at 18:20 +0200, Thomas Gleixner wrote:
> On Tue, 15 Apr 2014, Krzysztof Kozlowski wrote:
> > On Tue, 2014-04-15 at 17:20 +0200, Thomas Gleixner wrote:
> > > On Tue, 15 Apr 2014, Krzysztof Kozlowski wrote:
> > >
> > > > On Tue, 2014-04-15 at 14:28 +0200, Daniel Lezcano wrote:
> > > > > On 04/15/2014 11:34 AM, Krzysztof Kozlowski wrote:
> > > > > > On Fri, 2014-03-28 at 14:06 +0100, Krzysztof Kozlowski wrote:
> > > > > >> Fix stall after hotplugging CPU1. Affected are SoCs where Multi Core Timer
> > > > > >> interrupts are shared (SPI), e.g. Exynos 4210. The stall was a result of
> > > > > >> starting the CPU1 local timer not in L1 timer but in L0 (which is used
> > > > > >> by CPU0).
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Do you have any comments on these 3 patches? They fix the CPU stall on
> > > > > > Exynos4210 and also on Exynos3250 (Chanwoo Choi sent patches for it
> > > > > > recently).
> > > > >
> > > > > You describe this issue as impacting different SoC not only the exynos,
> > > > > right ?
> > > > >
> > > > > Do you know what other SoCs are impacted by this ?
> > > >
> > > > No, affected are only Exynos SoC-s. It was confirmed on Exynos4210
> > > > (Trats board) and Exynos3250 (new SoC, patches for it were recently
> > > > posted by Chanwoo).
> > > >
> > > > Other Exynos SoC-s where MCT local timers use shared interrupts (SPI)
> > > > can also be affected. Candidates are Exynos 5250 and 5420 but I haven't
> > > > tested them.
> > > >
> > > > > I guess this issue is not reproducible just with the line below, we need
> > > > > a timer to expire right at the moment CPU1 is hotplugged, right ?
> > > >
> > > > Right. The timer must fire in short time between enabling local timer
> > > > for CPU1 and setting the affinity for IRQ.
> > >
> > > Why do you set the affinity in the CPU_ONLINE hotplug callback and not
> > > right away when the interrupt is requested?
> >
> > Hi,
> >
> > I think the problem in such code is in GIC. The gic_set_affinity() uses
> > cpu_online_mask:
> > unsigned int cpu = cpumask_any_and(mask_val, cpu_online_mask);
> > In that time this CPU is not present in that mask so -EINVAL would be
> > returned.
>
> Hmm, indeed. Stupid me.
>
> Here is a complete solution to the problem. We really want the drivers
> to be fast and clean and not work around such issues.
>
> I'm quite happy that I kept the 'force' argument of set_affinity
> callbacks. I knew that I'd need it at some point.
>
> So with the flag set we can disable the online mask check and force
> the interrupt to the proper cpu.

Thanks for the solution.

I tested your patch on Exynos 3250 and it is still not sufficient. After
hotplugging CPU1 about 10 times the stall still happens (set_next_event is
called on the wrong CPU).

Patch 3/3 ("clocksource: exynos_mct: Fix too early ISR fire up on
wrong CPU") is still needed, because clockevents_config_and_register()
should be called a little later.

Do you have the rest of the patches (2/3 and 3/3), or should I resend them?

One minor nit in your patch: the 'cpu' local variable in
exynos4_mct_cpu_notify() is no longer used, so it can be removed.

Best regards,
Krzysztof

> Thanks,
>
> tglx
>
> Index: linux-2.6/drivers/clocksource/exynos_mct.c
> ===================================================================
> --- linux-2.6.orig/drivers/clocksource/exynos_mct.c
> +++ linux-2.6/drivers/clocksource/exynos_mct.c
> @@ -430,6 +430,7 @@ static int exynos4_local_timer_setup(str
> evt->irq);
> return -EIO;
> }
> + irq_force_affinity(mct_irqs[MCT_L0_IRQ + cpu], cpumask_of(cpu));
> } else {
> enable_percpu_irq(mct_irqs[MCT_L0_IRQ], 0);
> }
> @@ -461,12 +462,6 @@ static int exynos4_mct_cpu_notify(struct
> mevt = this_cpu_ptr(&percpu_mct_tick);
> exynos4_local_timer_setup(&mevt->evt);
> break;
> - case CPU_ONLINE:
> - cpu = (unsigned long)hcpu;
> - if (mct_int_type == MCT_INT_SPI)
> - irq_set_affinity(mct_irqs[MCT_L0_IRQ + cpu],
> - cpumask_of(cpu));
> - break;
> case CPU_DYING:
> mevt = this_cpu_ptr(&percpu_mct_tick);
> exynos4_local_timer_stop(&mevt->evt);
> Index: linux-2.6/drivers/irqchip/irq-gic.c
> ===================================================================
> --- linux-2.6.orig/drivers/irqchip/irq-gic.c
> +++ linux-2.6/drivers/irqchip/irq-gic.c
> @@ -246,10 +246,14 @@ static int gic_set_affinity(struct irq_d
> bool force)
> {
> void __iomem *reg = gic_dist_base(d) + GIC_DIST_TARGET + (gic_irq(d) & ~3);
> - unsigned int shift = (gic_irq(d) % 4) * 8;
> - unsigned int cpu = cpumask_any_and(mask_val, cpu_online_mask);
> + unsigned int cpu, shift = (gic_irq(d) % 4) * 8;
> u32 val, mask, bit;
>
> + if (!force)
> + cpu = cpumask_any_and(mask_val, cpu_online_mask);
> + else
> + cpu = cpumask_first(mask_val);
> +
> if (cpu >= NR_GIC_CPU_IF || cpu >= nr_cpu_ids)
> return -EINVAL;
>
> Index: linux-2.6/include/linux/interrupt.h
> ===================================================================
> --- linux-2.6.orig/include/linux/interrupt.h
> +++ linux-2.6/include/linux/interrupt.h
> @@ -204,6 +204,7 @@ static inline int check_wakeup_irqs(void
> extern cpumask_var_t irq_default_affinity;
>
> extern int irq_set_affinity(unsigned int irq, const struct cpumask *cpumask);
> +extern int irq_force_affinity(unsigned int irq, const struct cpumask *cpumask);
> extern int irq_can_set_affinity(unsigned int irq);
> extern int irq_select_affinity(unsigned int irq);
>
> Index: linux-2.6/kernel/irq/manage.c
> ===================================================================
> --- linux-2.6.orig/kernel/irq/manage.c
> +++ linux-2.6/kernel/irq/manage.c
> @@ -180,7 +180,7 @@ int irq_do_set_affinity(struct irq_data
> struct irq_chip *chip = irq_data_get_irq_chip(data);
> int ret;
>
> - ret = chip->irq_set_affinity(data, mask, false);
> + ret = chip->irq_set_affinity(data, mask, force);
> switch (ret) {
> case IRQ_SET_MASK_OK:
> cpumask_copy(data->affinity, mask);
> @@ -192,7 +192,8 @@ int irq_do_set_affinity(struct irq_data
> return ret;
> }
>
> -int __irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask)
> +static int irq_do_set_affinity_locked(struct irq_data *data,
> + const struct cpumask *mask, bool force)
> {
> struct irq_chip *chip = irq_data_get_irq_chip(data);
> struct irq_desc *desc = irq_data_to_desc(data);
> @@ -202,7 +203,7 @@ int __irq_set_affinity_locked(struct irq
> return -EINVAL;
>
> if (irq_can_move_pcntxt(data)) {
> - ret = irq_do_set_affinity(data, mask, false);
> + ret = irq_do_set_affinity(data, mask, force);
> } else {
> irqd_set_move_pending(data);
> irq_copy_pending(desc, mask);
> @@ -217,13 +218,13 @@ int __irq_set_affinity_locked(struct irq
> return ret;
> }
>
> -/**
> - * irq_set_affinity - Set the irq affinity of a given irq
> - * @irq: Interrupt to set affinity
> - * @mask: cpumask
> - *
> - */
> -int irq_set_affinity(unsigned int irq, const struct cpumask *mask)
> +int __irq_set_affinity_locked(struct irq_data *data,const struct cpumask *mask)
> +{
> + return irq_do_set_affinity_locked(data, mask, false);
> +}
> +
> +static int __irq_set_affinity(unsigned int irq, const struct cpumask *mask,
> + bool force)
> {
> struct irq_desc *desc = irq_to_desc(irq);
> unsigned long flags;
> @@ -233,11 +234,33 @@ int irq_set_affinity(unsigned int irq, c
> return -EINVAL;
>
> raw_spin_lock_irqsave(&desc->lock, flags);
> - ret = __irq_set_affinity_locked(irq_desc_get_irq_data(desc), mask);
> + ret = irq_do_set_affinity_locked(irq_desc_get_irq_data(desc), mask,
> + force);
> raw_spin_unlock_irqrestore(&desc->lock, flags);
> return ret;
> }
>
> +/**
> + * irq_force_affinity - Force the irq affinity of a given irq
> + * @irq: Interrupt to set affinity
> + * @mask: cpumask
> + */
> +int irq_force_affinity(unsigned int irq, const struct cpumask *mask)
> +{
> + return __irq_set_affinity(irq, mask, true);
> +}
> +
> +/**
> + * irq_set_affinity - Set the irq affinity of a given irq
> + * @irq: Interrupt to set affinity
> + * @mask: cpumask
> + *
> + */
> +int irq_set_affinity(unsigned int irq, const struct cpumask *mask)
> +{
> + return __irq_set_affinity(irq, mask, false);
> +}
> +
> int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
> {
> unsigned long flags;