From: Ingo Molnar <mingo@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chris J Arges <chris.j.arges@canonical.com>,
Rafael David Tinoco <inaddy@ubuntu.com>,
Peter Anvin <hpa@zytor.com>,
Jiang Liu <jiang.liu@linux.intel.com>,
Peter Zijlstra <peterz@infradead.org>,
LKML <linux-kernel@vger.kernel.org>, Jens Axboe <axboe@kernel.dk>,
Frederic Weisbecker <fweisbec@gmail.com>,
Gema Gomez <gema.gomez-solano@canonical.com>,
the arch/x86 maintainers <x86@kernel.org>
Subject: [debug PATCHes] Re: smp_call_function_single lockups
Date: Tue, 31 Mar 2015 12:56:56 +0200 [thread overview]
Message-ID: <20150331105656.GA25180@gmail.com> (raw)
In-Reply-To: <CA+55aFyk9nnStb9sNm0xt=j7K+eGES=m_dDm9E49QSnW7jFgkQ@mail.gmail.com>
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Ok, interesting. So the whole "we try to do an APIC ACK with the ISR
> bit clear" seems to be a real issue.
It's interesting in particular when it happens with an edge-triggered
interrupt source: it's much harder to miss level triggered IRQs, which
stay around until actively handled. Edge triggered irqs are more
fragile to loss of event processing.
> > Anyway, maybe this sheds some more light on this issue. I can
> > reproduce this at will, so let me know of other experiments to do.
Btw., could you please describe (again) what your current best method
for reproduction is? It's been a long discussion ...
> Somebody else who knows the apic needs to also take a look, but I'd
> love to hear what the actual hardware interrupt is (from that
> "vector_irq[vector]" thing above.
>
> I'm not recognizing 0xe1 as any of the hardcoded SMP vectors (they
> are 0xf0-0xff), so it sounds like an external one. But that then
> requires the whole mapping table thing.
>
> Ingo/Peter/Jiang - is there anything else useful we could print out?
> I worry about the irq movement code. Can we add printk's to when an
> irq is chasing from one CPU to another and doing that
> "move_in_progress" thing? I've always been scared of that code.
1)
So the irq_cfg::move_in_progress field originates from:
610142927b5b ("[PATCH] x86_64 irq: Safely cleanup an irq after moving it.")
and I agree that it looks fragile, so it would be nice to see how much
(if at all?) it gets used, by sticking a few pr_info()s in it.
Debug patch for the vector movement state machine attached below.
Untested.
2)
But ... having taken a quick look at the vector movement handling, I
am scared more than ever,and I cannot convince myself that it's race
free. It's possibly harmless, but still, famous last words ...
For example, most ->move_in_progress transitions happen with the
vector_lock held - with the exception of send_cleanup_vector(): there
we are only holding the desc->lock, but that's not enough! For
example, this codepath:
static void __irq_complete_move(struct irq_cfg *cfg, unsigned vector)
{
unsigned me;
if (likely(!cfg->move_in_progress))
return;
me = smp_processor_id();
if (vector == cfg->vector && cpumask_test_cpu(me, cfg->domain))
send_cleanup_vector(cfg);
...
void send_cleanup_vector(struct irq_cfg *cfg)
{
...
cfg->move_in_progress = 0;
}
is blatantly racy, unless I missed something obvious?
2b)
Initially I thought this only affects slowpaths like driver unregister
or CPU hotplug, but I think this would be relevant for the fastpath of
edge triggered irqs as well:
void apic_ack_edge(struct irq_data *data)
{
irq_complete_move(irqd_cfg(data));
irq_move_irq(data);
ack_APIC_irq();
}
and irq_complete_move() checks and clears cfg->move_in_progress as
listed above.
So in most scenarios this is probably harmless, because it can in the
worst case result in the missing of a ->move_in_progress flag setting.
But it does not look harmless in the apic_set_affinity() code path:
there we call into assign_irq_vector() without the desc->lock held,
due to this gem in the IRQ core:
int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
{
unsigned long flags;
struct irq_desc *desc = irq_get_desc_lock(irq, &flags, IRQ_GET_DESC_CHECK_GLOBAL);
if (!desc)
return -EINVAL;
desc->affinity_hint = m;
irq_put_desc_unlock(desc, flags);
/* set the initial affinity to prevent every interrupt being on CPU0 */
if (m)
__irq_set_affinity(irq, m, false);
return 0;
}
argh!
Now if this ever crosses path with an assign_irq_vector() call on
another CPU, then I really cannot see what would save us from deep
trouble: we use cfg->vector while that is possibly being modified,
right?
So I'd suggest to put a printk into irq_set_affinity_hint() as well,
to make sure it's not used during this test. In particular:
drivers/virtio/virtio_pci_common.c: irq_set_affinity_hint(irq, NULL);
drivers/virtio/virtio_pci_common.c: irq_set_affinity_hint(irq, mask);
might be triggering it in the virtualization code...
The second patch below adds the relevant pr_info(). (untested)
Now this too might be unrelated, because the affinity hint was added
ages ago, in:
e7a297b0d7d6 ("genirq: Add CPU mask affinity hint")
and the potentially racy nature of calling into set_affinity without
the desc lock held was probably not realized back then.
VirtIO's use of the affinity-hint was added a while ago as well, four
years ago, in:
75a0a52be3c2 ("virtio: introduce an API to set affinity for a virtqueue")
2c)
The other thing that worries me here is that we apparently send an
IRQ-move IPI while having an un-acked local APIC (!). I have vague
memories that this was problematic and fragile before.
To ease my worries in this area I've attached a third patch below that
changes the order around.
This too is ancient behavior - but might be more prominent on certain
hardware (and virtualization) variants, so it has a chance to be
relevant.
3)
Furthermore, I also noticed that the whole vector_lock()/unlock()
business is actively misleading IMHO, the vector_lock looks local to
vector.c, while we have these two calls that essentially export it ...
Some of that damage was inflicted in this ancient commit:
d388e5fdc461 ("x86: Restore proper vector locking during cpu hotplug")
the proper approach would be to protect against hotplug events in
assign_irq_vector(), not the other way around!
Probably not a functional problem, yet somewhat disgusting.
-------------------------------
So ... all in one, these various problems might be related to the
regression or not, but they sure need addressing.
Thanks,
Ingo
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 6cedd7914581..1dd1de9dd690 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -143,6 +143,8 @@ __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
cpumask_andnot(cfg->old_domain, cfg->domain, tmp_mask);
cfg->move_in_progress =
cpumask_intersects(cfg->old_domain, cpu_online_mask);
+ if (cfg->move_in_progress)
+ pr_info("apic: vector %02x, same-domain move in progress\n", cfg->vector);
cpumask_and(cfg->domain, cfg->domain, tmp_mask);
break;
}
@@ -178,6 +180,8 @@ __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
cpumask_copy(cfg->old_domain, cfg->domain);
cfg->move_in_progress =
cpumask_intersects(cfg->old_domain, cpu_online_mask);
+ if (cfg->move_in_progress)
+ pr_info("apic: vector %02x, new-domain move in progress\n", cfg->vector);
}
for_each_cpu_and(new_cpu, tmp_mask, cpu_online_mask)
per_cpu(vector_irq, new_cpu)[vector] = irq;
@@ -232,6 +236,7 @@ void clear_irq_vector(int irq, struct irq_cfg *cfg)
}
}
cfg->move_in_progress = 0;
+ pr_info("apic: vector %02x, vector cleared, move completed\n", cfg->vector);
raw_spin_unlock_irqrestore(&vector_lock, flags);
}
@@ -391,6 +396,7 @@ void send_cleanup_vector(struct irq_cfg *cfg)
free_cpumask_var(cleanup_mask);
}
cfg->move_in_progress = 0;
+ pr_info("apic: vector %02x, sent cleanup vector, move completed\n", cfg->vector);
}
asmlinkage __visible void smp_irq_move_cleanup_interrupt(void)
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index c0a1100d911f..18300a7b8ed2 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -250,8 +250,10 @@ int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
desc->affinity_hint = m;
irq_put_desc_unlock(desc, flags);
/* set the initial affinity to prevent every interrupt being on CPU0 */
- if (m)
+ if (m) {
+ pr_info("irq/core: Called irq_set_affinity_hint(%d)\n", irq);
__irq_set_affinity(irq, m, false);
+ }
return 0;
}
EXPORT_SYMBOL_GPL(irq_set_affinity_hint);
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 6cedd7914581..833a981c5420 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -335,9 +340,11 @@ int apic_retrigger_irq(struct irq_data *data)
void apic_ack_edge(struct irq_data *data)
{
+ ack_APIC_irq();
+
+ /* Might generate IPIs, so do this after having ACKed the APIC: */
irq_complete_move(irqd_cfg(data));
irq_move_irq(data);
- ack_APIC_irq();
}
/*
next prev parent reply other threads:[~2015-03-31 10:57 UTC|newest]
Thread overview: 76+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-02-11 13:19 smp_call_function_single lockups Rafael David Tinoco
2015-02-11 18:18 ` Linus Torvalds
2015-02-11 19:59 ` Linus Torvalds
2015-02-11 20:42 ` Linus Torvalds
2015-02-12 16:38 ` Rafael David Tinoco
2015-02-18 22:25 ` Peter Zijlstra
2015-02-19 15:42 ` Rafael David Tinoco
2015-02-19 16:14 ` Linus Torvalds
2015-02-23 14:01 ` Rafael David Tinoco
2015-02-23 19:32 ` Linus Torvalds
2015-02-23 20:50 ` Peter Zijlstra
2015-02-23 21:02 ` Rafael David Tinoco
2015-02-19 16:16 ` Peter Zijlstra
2015-02-19 16:26 ` Linus Torvalds
2015-02-19 16:32 ` Rafael David Tinoco
2015-02-19 16:59 ` Linus Torvalds
2015-02-19 17:30 ` Rafael David Tinoco
2015-02-19 17:39 ` Linus Torvalds
2015-02-19 20:29 ` Linus Torvalds
2015-02-19 21:59 ` Linus Torvalds
2015-02-19 22:45 ` Linus Torvalds
2015-03-31 3:15 ` Chris J Arges
2015-03-31 4:28 ` Linus Torvalds
2015-03-31 10:56 ` Ingo Molnar [this message]
2015-03-31 22:38 ` [debug PATCHes] " Chris J Arges
2015-04-01 12:39 ` Ingo Molnar
2015-04-01 14:10 ` Chris J Arges
2015-04-01 14:55 ` Ingo Molnar
2015-03-31 4:46 ` Linus Torvalds
2015-03-31 15:08 ` Linus Torvalds
2015-03-31 22:23 ` Chris J Arges
2015-03-31 23:07 ` Linus Torvalds
2015-04-01 14:32 ` Chris J Arges
2015-04-01 15:36 ` Linus Torvalds
2015-04-02 9:55 ` Ingo Molnar
2015-04-02 17:35 ` Linus Torvalds
2015-04-01 12:43 ` Ingo Molnar
2015-04-01 16:10 ` Chris J Arges
2015-04-01 16:14 ` Linus Torvalds
2015-04-01 21:59 ` Chris J Arges
2015-04-02 17:31 ` Linus Torvalds
2015-04-02 18:26 ` Ingo Molnar
2015-04-02 18:51 ` Chris J Arges
2015-04-02 19:07 ` Ingo Molnar
2015-04-02 20:57 ` Linus Torvalds
2015-04-02 21:13 ` Chris J Arges
2015-04-03 5:43 ` [PATCH] smp/call: Detect stuck CSD locks Ingo Molnar
2015-04-03 5:47 ` Ingo Molnar
2015-04-06 16:58 ` Chris J Arges
2015-04-06 17:32 ` Linus Torvalds
2015-04-07 9:21 ` Ingo Molnar
2015-04-07 20:59 ` Chris J Arges
2015-04-07 21:15 ` Linus Torvalds
2015-04-08 6:47 ` Ingo Molnar
2015-04-13 3:56 ` Chris J Arges
2015-04-13 6:14 ` Ingo Molnar
2015-04-15 19:54 ` Chris J Arges
2015-04-16 11:04 ` Ingo Molnar
2015-04-16 15:58 ` Chris J Arges
2015-04-16 16:31 ` Ingo Molnar
2015-04-29 21:08 ` Chris J Arges
2015-05-11 14:00 ` Ingo Molnar
2015-05-20 18:19 ` Chris J Arges
2015-04-03 5:45 ` smp_call_function_single lockups Ingo Molnar
2015-04-06 17:23 ` Chris J Arges
2015-02-20 9:30 ` Ingo Molnar
2015-02-20 16:49 ` Linus Torvalds
2015-02-20 19:41 ` Ingo Molnar
2015-02-20 20:03 ` Linus Torvalds
2015-02-20 20:11 ` Ingo Molnar
2015-03-20 10:15 ` Peter Zijlstra
2015-03-20 16:26 ` Linus Torvalds
2015-03-20 17:14 ` Mike Galbraith
2015-04-01 14:22 ` Frederic Weisbecker
2015-04-18 10:13 ` [tip:locking/urgent] smp: Fix smp_call_function_single_async() locking tip-bot for Linus Torvalds
-- strict thread matches above, loose matches on Subject: below --
2015-04-01 2:00 [debug PATCHes] Re: smp_call_function_single lockups Daniel J Blueman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150331105656.GA25180@gmail.com \
--to=mingo@kernel.org \
--cc=axboe@kernel.dk \
--cc=chris.j.arges@canonical.com \
--cc=fweisbec@gmail.com \
--cc=gema.gomez-solano@canonical.com \
--cc=hpa@zytor.com \
--cc=inaddy@ubuntu.com \
--cc=jiang.liu@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=torvalds@linux-foundation.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).