From: David Gibson <david@gibson.dropbear.id.au>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>,
lkml <linux-kernel@vger.kernel.org>,
Liav Rehana <liavr@mellanox.com>,
Chris Metcalf <cmetcalf@mellanox.com>,
Richard Cochran <richardcochran@gmail.com>,
Ingo Molnar <mingo@kernel.org>,
Prarit Bhargava <prarit@redhat.com>,
Laurent Vivier <lvivier@redhat.com>,
"Christopher S . Hall" <christopher.s.hall@intel.com>,
"4.6+" <stable@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH] timekeeping: Change type of nsec variable to unsigned in its calculation.
Date: Sat, 3 Dec 2016 11:33:09 +1100
Message-ID: <20161203003309.GL10089@umbus.fritz.box>
In-Reply-To: <alpine.DEB.2.20.1612020921500.4295@nanos>
On Fri, Dec 02, 2016 at 09:36:42AM +0100, Thomas Gleixner wrote:
> On Fri, 2 Dec 2016, David Gibson wrote:
> > On Thu, Dec 01, 2016 at 12:59:51PM +0100, Thomas Gleixner wrote:
> > > So I assume that you are talking about a VM which was not scheduled by the
> > > host due to overcommitment (whoever thought that this is a good idea) or
> > > whatever other reason (yes, people were complaining about wreckage caused
> > > by stopping kernels with debuggers) for a long enough time to trigger that
> > > overflow situation. If that's the case then the unsigned conversion will
> > > just make it more unlikely but it still will happen.
> >
> > It was essentially the stopped by debugger case. I forget exactly
> > why, but the guest was being explicitly stopped from outside, it
> > wasn't just scheduling lag. I think it was something in the vicinity
> > of 10 minutes stopped.
>
> Ok. Debuggers stopping stuff is one issue, but if I understood Liav
> correctly, then he is seeing the issue on a heavily loaded machine.

Right. I can't speak to other situations which might trigger this.

> Liav, can you please describe the scenario in detail? Are you observing
> this on bare metal or in a VM which gets scheduled out long enough or was
> there debugging/hypervisor intervention involved?
>
> > It's long enough ago that I can't be sure, but I thought we'd tried
> > various different stoppage periods, which should have also triggered
> > the unsigned overflow you're describing, and didn't observe the crash
> > once the change was applied. Note that there have been other changes
> > to the timekeeping code since then, which might have made a
> > difference.
> >
> > I agree that it's not reasonable for the guest to be entirely
> > unaffected by such a large stoppage: I'd have no complaints if the
> > guest time was messed up, and/or it spewed warnings. But complete
> > guest death seems a rather more fragile response to the situation than
> > we'd like.
>
> Guest death? Is it really dead/crashed or just stuck in that endless loop
> trying to add that huge negative value piecewise?

Well, I don't know. But the point was it was unusable from the
console, and didn't come back any time soon.

> That's at least what Liav was describing as he mentioned
> __iter_div_u64_rem() explicitly.
>
> While I'm less worried about debuggers, I worry about the real thing.
>
> I agree that we should not starve after resume from a debug stop, but in
> that case the least of my worries is time going backwards.
>
> Though if the signed mult overrun is observable in a live system, then we
> need to worry about time going backwards even with the unsigned
> conversion. Simply because once we fixed the starvation issue people with
> insane enough setups will trigger the unsigned overrun and complain about
> time going backwards.
>
> Thanks,
>
> tglx
>
>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson