All of lore.kernel.org
 help / color / mirror / Atom feed
From: Steven Haigh <netwiz@crc.id.au>
To: Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	xen-devel <xen-devel@lists.xenproject.org>,
	linux-kernel@vger.kernel.org
Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
Subject: Re: 4.4: INFO: rcu_sched self-detected stall on CPU
Date: Wed, 30 Mar 2016 04:44:55 +1100	[thread overview]
Message-ID: <56FABF17.7090608@crc.id.au> (raw)
In-Reply-To: <56FA8DDD.7070406@oracle.com>


[-- Attachment #1.1: Type: text/plain, Size: 5229 bytes --]

On 30/03/2016 1:14 AM, Boris Ostrovsky wrote:
> On 03/29/2016 04:56 AM, Steven Haigh wrote:
>>
>> Interestingly enough, this just happened again - but on a different
>> virtual machine. I'm starting to wonder if this may have something to do
>> with the uptime of the machine - as the system that this seems to happen
>> to is always different.
>>
>> Destroying it and monitoring it again has so far come up blank.
>>
>> I've thrown the latest lot of kernel messages here:
>>      http://paste.fedoraproject.org/346802/59241532
> 
> Would be good to see full console log. The one that you posted starts
> with an error so I wonder what was before that.

Agreed. It started off with me observing this on one VM - but since
trying to get details on that VM - others have started showing issues as
well. It frustrating as it seems I've been playing whack-a-mole to get
more debug on what is going on.

So, I've changed the kernel command line to the following on ALL VMs on
this system:
enforcemodulesig=1 selinux=0 fsck.repair=yes loglevel=7 console=tty0
console=ttyS0,38400n8

In the Dom0 (which runs the same kernel package), I've started a screen
sessions with a screen for each of the DomUs running attached to the
console via 'xl console blah' - so hopefully the next one that goes down
(whichever one that is) will get caught in the console.

> Have you tried this on bare metal, BTW? And you said this is only
> observed on 4.4, not 4.5, right?

I use the same kernel package as the Dom0 kernel - and so far haven't
seen any issues running this as the Dom0. I haven't used it on baremetal
as a non-xen kernel as yet.

The kernel package I'm currently running is for CentOS / Scientific
Linux / RHEL at:
    http://au1.mirror.crc.id.au/repo/el7-testing/x86_64/

I'm using 4.4.6-3 at the moment - which has CONFIG_PREEMPT_VOLUNTARY set
- which *MAY* have increased the time between this happening - or may
have no effect at all. I'm not convinced either way as yet.

With respect to 4.5, I have had reports from another user of my packages
that they haven't seen the same crash using the same Xen packages but
with kernel 4.5. I have not verified this myself as yet as I haven't
gone down the path of making 4.5 packages for testing. As such, I
wouldn't treat this as a conclusive test case as yet.

I'm hoping that the steps I've taken above may give some more
information in which we can drill down into exactly what is going on -
or at least give more pointers into the root cause.

>>
>> Interestingly, around the same time, /var/log/messages on the remote
>> syslog server shows:
>> Mar 29 17:00:01 zeus systemd: Created slice user-0.slice.
>> Mar 29 17:00:01 zeus systemd: Starting user-0.slice.
>> Mar 29 17:00:01 zeus systemd: Started Session 1567 of user root.
>> Mar 29 17:00:01 zeus systemd: Starting Session 1567 of user root.
>> Mar 29 17:00:01 zeus systemd: Removed slice user-0.slice.
>> Mar 29 17:00:01 zeus systemd: Stopping user-0.slice.
>> Mar 29 17:01:01 zeus systemd: Created slice user-0.slice.
>> Mar 29 17:01:01 zeus systemd: Starting user-0.slice.
>> Mar 29 17:01:01 zeus systemd: Started Session 1568 of user root.
>> Mar 29 17:01:01 zeus systemd: Starting Session 1568 of user root.
>> Mar 29 17:08:34 zeus ntpdate[18569]: adjust time server 203.56.246.94
>> offset -0.002247 sec
>> Mar 29 17:08:34 zeus systemd: Removed slice user-0.slice.
>> Mar 29 17:08:34 zeus systemd: Stopping user-0.slice.
>> Mar 29 17:10:01 zeus systemd: Created slice user-0.slice.
>> Mar 29 17:10:01 zeus systemd: Starting user-0.slice.
>> Mar 29 17:10:01 zeus systemd: Started Session 1569 of user root.
>> Mar 29 17:10:01 zeus systemd: Starting Session 1569 of user root.
>> Mar 29 17:10:01 zeus systemd: Removed slice user-0.slice.
>> Mar 29 17:10:01 zeus systemd: Stopping user-0.slice.
>> Mar 29 17:20:01 zeus systemd: Created slice user-0.slice.
>> Mar 29 17:20:01 zeus systemd: Starting user-0.slice.
>> Mar 29 17:20:01 zeus systemd: Started Session 1570 of user root.
>> Mar 29 17:20:01 zeus systemd: Starting Session 1570 of user root.
>> Mar 29 17:20:01 zeus systemd: Removed slice user-0.slice.
>> Mar 29 17:20:01 zeus systemd: Stopping user-0.slice.
>> Mar 29 17:30:55 zeus systemd: systemd-logind.service watchdog timeout
>> (limit 1min)!
>> Mar 29 17:32:25 zeus systemd: systemd-logind.service stop-sigabrt timed
>> out. Terminating.
>> Mar 29 17:33:56 zeus systemd: systemd-logind.service stop-sigterm timed
>> out. Killing.
>> Mar 29 17:35:26 zeus systemd: systemd-logind.service still around after
>> SIGKILL. Ignoring.
>> Mar 29 17:36:56 zeus systemd: systemd-logind.service stop-final-sigterm
>> timed out. Killing.
>> Mar 29 17:38:26 zeus systemd: systemd-logind.service still around after
>> final SIGKILL. Entering failed mode.
>> Mar 29 17:38:26 zeus systemd: Unit systemd-logind.service entered failed
>> state.
>> Mar 29 17:38:26 zeus systemd: systemd-logind.service failed.
> 
> 
> These may be result of your system not feeling well, which is not
> surprising.
> 
> -boris

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

  parent reply	other threads:[~2016-03-29 17:45 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-25  2:53 4.4: INFO: rcu_sched self-detected stall on CPU Steven Haigh
2016-03-25  2:53 ` Steven Haigh
2016-03-25 12:23 ` Boris Ostrovsky
2016-03-25 12:23 ` [Xen-devel] " Boris Ostrovsky
2016-03-25 14:05   ` Steven Haigh
2016-03-25 14:05   ` [Xen-devel] " Steven Haigh
2016-03-25 14:44     ` Boris Ostrovsky
2016-03-25 16:04       ` Steven Haigh
2016-03-25 16:04       ` [Xen-devel] " Steven Haigh
2016-03-25 16:20         ` Boris Ostrovsky
2016-03-25 16:20         ` [Xen-devel] " Boris Ostrovsky
2016-03-25 21:07           ` Steven Haigh
2016-03-25 21:07           ` Steven Haigh
2016-03-29  8:56             ` Steven Haigh
2016-03-29 14:14               ` Boris Ostrovsky
2016-03-29 17:44                 ` Steven Haigh
2016-03-29 17:44                 ` Steven Haigh [this message]
2016-03-29 18:04                   ` Steven Haigh
2016-03-29 18:04                   ` Steven Haigh
2016-03-29 18:32                     ` Steven Haigh
2016-03-30 13:44                     ` Boris Ostrovsky
2016-03-30 13:44                     ` Boris Ostrovsky
2016-05-02 20:54                     ` gregkh
2016-05-02 20:54                     ` gregkh
2016-05-03 15:11                       ` Steven Haigh
2016-05-03 15:46                         ` gregkh
2016-04-02  1:50                 ` Steven Haigh
2016-04-02  1:50                 ` Steven Haigh
2016-03-29 14:14               ` Boris Ostrovsky
2016-03-29  8:56             ` Steven Haigh
2016-03-25 14:44     ` Boris Ostrovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56FABF17.7090608@crc.id.au \
    --to=netwiz@crc.id.au \
    --cc=boris.ostrovsky@oracle.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.