From: Yanko Kaneti <yaneti@declera.com>
To: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: LKML <linux-kernel@vger.kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Chuck Ebbert <cebbert.lkml@gmail.com>,
Marc Zyngier <marc.zyngier@arm.com>
Subject: Re: [regression 4.14rc] 74def747bcd0 (genirq: Restrict effective affinity to interrupts actually using it)
Date: Sun, 01 Oct 2017 16:06:30 +0300
Message-ID: <1506863190.18340.20.camel@declera.com>
In-Reply-To: <4cd63212-8f77-891d-67a0-9a305ddba549@leemhuis.info>
On Sun, 2017-10-01 at 14:46 +0200, Thorsten Leemhuis wrote:
> Hi, the regression tracker here. What's the status of this issue? Was
> the problem fixed? It seems nothing happened for more than 10 days -- or
> did the discussion move somewhere else? Ciao, Thorsten
The commit was reverted last week, before rc2:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0551968add53777fddd18f4ffb4e3bbc1f646d79
Thanks for tracking it.
-Yanko
>
> On 20.09.2017 02:30, Chuck Ebbert wrote:
> > On Tue, 19 Sep 2017 16:51:06 +0100
> > Marc Zyngier <marc.zyngier@arm.com> wrote:
> >
> > > On 19/09/17 16:40, Yanko Kaneti wrote:
> > > > On Tue, 2017-09-19 at 16:33 +0100, Marc Zyngier wrote:
> > > > > On 19/09/17 16:12, Yanko Kaneti wrote:
> > > > > > Hello,
> > > > > >
> > > > > > Fedora rawhide config here.
> > > > > > AMD FX-8370E
> > > > > >
> > > > > > Bisected a problem to:
> > > > > > 74def747bcd0 (genirq: Restrict effective affinity to interrupts
> > > > > > actually using it)
> > > > > >
> > > > > > It seems to be causing stalls, short lived or long lived lockups
> > > > > > very shortly after boot. Everything becomes jerky.
> > > > > >
> > > > > > > The only visible indication in the log is something like:
> > > > > > ....
> > > > > > [ 59.802129] clocksource: timekeeping watchdog on CPU3: Marking
> > > > > > clocksource 'tsc' as unstable because the skew is too large:
> > > > > > [ 59.802134] clocksource: 'hpet' wd_now:
> > > > > > 3326e7aa wd_last: 329956f8 mask: ffffffff [ 59.802137]
> > > > > > clocksource: 'tsc' cs_now: 423662bc6f
> > > > > > cs_last: 41dfc91650 mask: ffffffffffffffff [ 59.802140] tsc:
> > > > > > Marking TSC unstable due to clocksource watchdog [ 59.802158]
> > > > > > TSC found unstable after boot, most likely due to broken BIOS.
> > > > > > Use 'tsc=unstable'. [ 59.802161] sched_clock: Marking unstable
> > > > > > (59802142067, 15510)<-(59920871789, -118714277) [ 60.015604]
> > > > > > clocksource: Switched to clocksource hpet [ 89.015994] INFO:
> > > > > > NMI handler (perf_event_nmi_handler) took too long to run:
> > > > > > 209.660 msecs [ 89.016003] perf: interrupt took too long
> > > > > > (1638003 > 2500), lowering kernel.perf_event_max_sample_rate to
> > > > > > 1000 ....
> > > > > >
> > > > > > Just reverting that commit on top of linus mainline cures all the
> > > > > > symptoms
> > > > >
> > > > > Interesting. Do you still get HPET interrupts?
> > > >
> > > > Sorry, I might need some basic help here (i.e. where do I count
> > > > them...)
> > >
> > > /proc/interrupts should display them.
> > >
> > > > After the watchdog switches the clocksource to hpet the system is
> > > > still somewhat alive, so I'll guess some clock is still
> > > > ticking....
> > >
> > > Probably, but I suspect they're not hitting the right CPU, hence the
> > > lockups.
> > >
> > > Unfortunately, my x86-foo is pretty minimal, and I'm about to drop off
> > > the net for a few days.
> > >
> > > Thomas, any insight?
> >
> > Looking at flat_cpu_mask_to_apicid(), I don't see how 74def747bcd0
> > can be correct:
> >
> > 	struct cpumask *effmsk = irq_data_get_effective_affinity_mask(irqdata);
> > 	unsigned long cpu_mask = cpumask_bits(mask)[0] & APIC_ALL_CPUS;
> >
> > 	if (!cpu_mask)
> > 		return -EINVAL;
> > 	*apicid = (unsigned int)cpu_mask;
> > 	cpumask_bits(effmsk)[0] = cpu_mask;
> >
> > Before that patch, this function wrote to the effective mask
> > unconditionally. After, it only writes to effective_mask if it is
> > already non-zero.
> >
> >
> > http://news.gmane.org/find-root.php?message_id=20170919203044.560cb9f1%40gmail.com
> > http://mid.gmane.org/20170919203044.560cb9f1%40gmail.com
> >
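
Chuck Ebbert's point about flat_cpu_mask_to_apicid() can be illustrated with
a small userspace model. This is only a sketch, not kernel code: every toy_*
name and the single-word masks are hypothetical, and the "after" getter
assumes that an empty effective mask makes the getter hand back the requested
affinity mask instead. The thread itself only states that the effective mask
is no longer written when it is zero, so where the write lands is an
assumption.

```c
#include <assert.h>

/* Hypothetical stand-in for struct irq_data: one word per mask. */
struct toy_irq_data {
	unsigned long affinity;           /* mask requested for the interrupt */
	unsigned long effective_affinity; /* mask actually programmed */
};

/* Assumed 8-bit limit, standing in for APIC_ALL_CPUS in the quoted code. */
#define TOY_APIC_ALL_CPUS 0xFFUL

/* "Before": the getter always returns the effective mask. */
static unsigned long *toy_get_effective_always(struct toy_irq_data *d)
{
	assert(d);
	return &d->effective_affinity;
}

/*
 * "After" (assumption): the getter returns the effective mask only when it
 * is non-empty and otherwise falls back to the requested affinity mask.
 */
static unsigned long *toy_get_effective_if_set(struct toy_irq_data *d)
{
	assert(d);
	return d->effective_affinity ? &d->effective_affinity : &d->affinity;
}

typedef unsigned long *(*toy_getter)(struct toy_irq_data *);

/* Same shape as the quoted function body, with the getter made pluggable. */
static int toy_flat_mask_to_apicid(unsigned long mask, struct toy_irq_data *d,
				   toy_getter get, unsigned int *apicid)
{
	unsigned long *effmsk = get(d);
	unsigned long cpu_mask = mask & TOY_APIC_ALL_CPUS;

	if (!cpu_mask)
		return -1; /* the kernel code returns -EINVAL here */
	*apicid = (unsigned int)cpu_mask;
	*effmsk = cpu_mask; /* lands wherever the getter pointed */
	return 0;
}
```

Calling toy_flat_mask_to_apicid() on a structure whose effective_affinity
starts at zero shows the difference: with toy_get_effective_always the
effective mask is filled in, while with toy_get_effective_if_set it stays
zero and, in this model, the requested affinity is overwritten instead.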
Thread overview: 8+ messages
2017-09-19 15:12 [regression 4.14rc] 74def747bcd0 (genirq: Restrict effective affinity to interrupts actually using it) Yanko Kaneti
2017-09-19 15:33 ` Marc Zyngier
2017-09-19 15:40 ` Yanko Kaneti
2017-09-19 15:51 ` Marc Zyngier
2017-09-20 0:30 ` Chuck Ebbert
2017-10-01 12:46 ` Thorsten Leemhuis
2017-10-01 13:06 ` Yanko Kaneti [this message]
2017-10-01 13:17 ` Thorsten Leemhuis