linux-ext4.vger.kernel.org archive mirror
* [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
@ 2011-02-18 21:47 bugzilla-daemon
From: bugzilla-daemon @ 2011-02-18 21:47 UTC
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=29402

           Summary: kernel panics while running ffsb scalability workloads
                    on 2.6.38-rc1 through -rc5
           Product: File System
           Version: 2.5
    Kernel Version: 2.6.38-rc5
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: ext4
        AssignedTo: fs_ext4@kernel-bugs.osdl.org
        ReportedBy: eric.whitney@hp.com
        Regression: Yes


Created an attachment (id=48352)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=48352)
captured console output - spinlock bad magic: ext4lazyinit

The 2.6.38-rc5 kernel can panic while running any one of the ffsb profiles in
http://free.linux.hp.com/~enw/ext4/profiles on an ext4 filesystem on a 48 core
x86 system. These panics occur most frequently using the 48 or 192 thread
versions of those profiles.  The problem has been reproduced on a 16 core x86
system using identical storage, but occurs there at lower frequency.  On
average, it takes only two runs of "large_file_creates_threads-192.ffsb" to
produce a panic on the 48 core system.

The panics occur with roughly equal frequency on a vanilla ext4 filesystem,
an ext4 filesystem without a journal, and an ext4 filesystem with a journal
but mounted with mblk_io_submit.
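
For reference, the three configurations above correspond roughly to the
commands printed by the sketch below.  This is an illustration only: the device
and mount point are placeholders, the exact commands used in the tests are not
recorded in this report, and the script only echoes the commands rather than
running them.

```shell
#!/bin/sh
# Placeholder device and mount point (assumptions, not from the report).
DEV=/dev/sdX
MNT=/mnt/test

# Print the mkfs and mount commands for one named configuration.
# Nothing is formatted or mounted; the commands are only echoed.
setup_cmds() {
    case "$1" in
        vanilla)
            echo "mkfs.ext4 $DEV"
            echo "mount -t ext4 $DEV $MNT" ;;
        nojournal)
            # -O ^has_journal creates the filesystem without a journal
            echo "mkfs.ext4 -O ^has_journal $DEV"
            echo "mount -t ext4 $DEV $MNT" ;;
        mblk)
            # journaled filesystem, mounted with mblk_io_submit
            echo "mkfs.ext4 $DEV"
            echo "mount -t ext4 -o mblk_io_submit $DEV $MNT" ;;
        *)
            echo "unknown configuration: $1" >&2
            return 1 ;;
    esac
}

for cfg in vanilla nojournal mblk; do
    echo "# $cfg"
    setup_cmds "$cfg"
done
```

Each configuration would then be exercised by running ffsb against one of the
profiles on the mounted filesystem.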

With various debugging options enabled including spinlock debugging, panics or
oopses or BUGS occur in four varieties: protection violation, invalid opcode,
NULL pointer, and spinlock bad magic.  Typically, the first fault triggers a
cascade of subsequent oopses, etc.

These panics can be suppressed by using -E lazy_itable_init at mkfs time.  The
test system survived two series of 10 ffsb tests beginning with a single mkfs
each.  Subsequently, the system survived a run of about 16 hours in which a
complete scalability measurement pass was made.

Repeated ffsb runs on ext3 and xfs filesystems on 2.6.38-rc* have not produced
panics.

Numerous previous ffsb scalability runs on ext4 and 2.6.37 did not produce
panics.

The panics can be produced using either HP SmartArray (backplane RAID) or
FibreChannel storage with no material difference in the panic backtraces.

Attempted bisection of the bug in 2.6.38-rc1 was inconclusive.  The panics
became less repeatable the further back into -rc1 I went.  The last clear
indication was in the midst of perf changes very early in the release (SHA id
beginning with 006b20fe4c, "Merge branch 'perf/urgent' into perf/core").
Preceding that are RCU and GFS
patches, plus a small number of x86 patches.

Relatively little useful spinlock debugging information was reported in
repeated tests in early 38 rc's - with later rc's, more information gradually
became visible (or maybe I was just getting progressively more lucky).

The first attachment contains the partial backtrace that most clearly suggests
lazy_itable_init involvement.  The softirq portion of this backtrace tends to
look the same across the panics I've seen.

