From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Fengguang Wu <fengguang.wu@intel.com>
Cc: Jet Chen <jet.chen@intel.com>, Su Tao <tao.su@intel.com>,
	Yuanhan Liu <yuanhan.liu@intel.com>, LKP <lkp@01.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [torture] BUG: unable to handle kernel NULL pointer dereference at (null)
Date: Fri, 26 Sep 2014 00:42:23 -0700
Message-ID: <20140926074223.GN4723@linux.vnet.ibm.com>
In-Reply-To: <20140918131751.GA8797@localhost>

On Thu, Sep 18, 2014 at 09:17:51PM +0800, Fengguang Wu wrote:
> Hi Paul,
> 
> > > > > plymouth-upstart-bridge: ply-event-loop.c:497: ply_event_loop_new: Assertion `loop->epoll_fd >= 0' failed.
> > > > > /etc/lsb-base-logging.sh: line 5:  2580 Aborted                 plymouth --ping > /dev/null 2>&1
> > > > > /etc/lsb-base-logging.sh: line 5:  2585 Aborted                 plymouth --ping > /dev/null 2>&1
> > > > > mount: proc has wrong device number or fs type proc not supported
> > > > > /etc/lsb-base-logging.sh: line 5:  2601 Aborted                 plymouth --ping > /dev/null 2>&1
> > > > > /etc/rc6.d/S40umountfs: line 20: /proc/mounts: No such file or directory
> > > > > cat: /proc/1/maps: No such file or directory
> > > > > cat: /proc/1/maps: No such file or directory
> > > > > cat: /proc/1/maps: No such file or directory
> > > > > cat: /proc/1/maps: No such file or directory
> > > > > cat: /proc/1/maps: No such file or directory
> > > > > cat: /proc/1/maps: No such file or directory
> > > > > umount: /var/run: not mounted
> > > > > umount: /var/lock: not mounted
> > > > > umount: /dev/shm: not mounted
> > > > > mount: / is busy
> > > > >  * Will now restart
> > 
> > Are these expected behavior?
> 
> Yes, because it's randconfig boot tests, the user space may well
> complain about random stuff and I'll ignore them all as long as it
> will eventually call the shutdown command to finish the test in time.  :)
> 
> > So again, I can revert this commit without losing much (sendkey
> > alt-sysrq-z is after all my friend), but it is not clear to me that we
> > have gotten to the root of this problem.
> 
> Sorry about that! If you see any debug tricks that I can try, or
> information I can collect, please let me know.

Hmmm...

Looks like rcutorture might be starting too soon.  With all the selftests,
it is taking 3-4 minutes to boot.  One approach would be to set
rcutorture.stat_interval=200 or whatever the duration of boot is.
Another would be to set rcutorture.rcutorture_runnable=0, and to change:

	int rcutorture_runnable = RCUTORTURE_RUNNABLE_INIT;
	module_param(rcutorture_runnable, int, 0444);
	MODULE_PARM_DESC(rcutorture_runnable, "Start rcutorture at boot");

To:

	int rcutorture_runnable = RCUTORTURE_RUNNABLE_INIT;
	module_param(rcutorture_runnable, int, 0644);
	MODULE_PARM_DESC(rcutorture_runnable, "Start rcutorture at boot");

In kernel/rcu/rcutorture.c.

Then have your scripts set rcutorture_runnable=1 from sysfs once boot
completes.
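
For concreteness, here is a rough userspace sketch of that sysfs write.
It assumes the parameter shows up at the usual module-parameter
location, /sys/module/rcutorture/parameters/rcutorture_runnable, and
that the 0644 change above has been applied so that root can write it.
The helper name write_int_param() is made up for illustration and is
not part of any kernel or library API.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Assumed location: module parameters with nonzero permissions appear
 * under /sys/module/<module>/parameters/<parameter>.
 */
#define RCUTORTURE_RUNNABLE_PATH \
	"/sys/module/rcutorture/parameters/rcutorture_runnable"

/*
 * Hypothetical helper: write a decimal integer to a parameter file.
 * Returns 0 on success or a negative errno value on failure.
 */
static int write_int_param(const char *path, int value)
{
	FILE *f;
	int err = 0;

	assert(path != NULL);

	f = fopen(path, "w");
	if (!f)
		return -errno;

	if (fprintf(f, "%d\n", value) < 0)
		err = -EIO;

	/*
	 * stdio buffers the data, so a rejected write (for example to a
	 * read-only 0444 parameter) is normally reported at fclose().
	 */
	if (fclose(f) != 0 && !err)
		err = -errno;

	return err;
}
```

A test script would call write_int_param(RCUTORTURE_RUNNABLE_PATH, 1)
as root once boot has completed, and treat a negative return value as
"could not start rcutorture".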

Alternatively, if poking sysfs is not reasonable (and it
would not be in my test scripts), put a delay just after the
rcutorture_record_test_transition() in rcu_torture_init().  For example,
schedule_timeout_interruptible(200 * HZ) to delay 200 seconds.

Another approach would be for me to figure out some way for rcutorture
to figure out that boot was not far enough along for it to safely
do much, probably enabled by a third value of rcutorture_runnable.

One more approach would be to replace DUMP_ALL with DUMP_NONE in
kernel/rcu/rcutorture.c's rcutorture_trace_dump() function.  Or
to remove the ftrace_dump() statement entirely.  (The question that
this might help answer is which part of rcutorture_trace_dump() is
causing the problem.)

Any of these approaches seem reasonable?

							Thanx, Paul


