linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
@ 2011-02-18 21:47 bugzilla-daemon
  2011-02-18 21:54 ` [Bug 29402] " bugzilla-daemon
                   ` (22 more replies)
  0 siblings, 23 replies; 25+ messages in thread
From: bugzilla-daemon @ 2011-02-18 21:47 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=29402

           Summary: kernel panics while running ffsb scalability workloads
                    on 2.6.38-rc1 through -rc5
           Product: File System
           Version: 2.5
    Kernel Version: 2.6.38-rc5
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: ext4
        AssignedTo: fs_ext4@kernel-bugs.osdl.org
        ReportedBy: eric.whitney@hp.com
        Regression: Yes


Created an attachment (id=48352)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=48352)
captured console output - spinlock bad magic: ext4lazyinit

The 2.6.38-rc5 kernel can panic while running any one of the ffsb profiles in
http://free.linux.hp.com/~enw/ext4/profiles on an ext4 filesystem on a 48 core
x86 system. These panics occur most frequently using the 48 or 192 thread
versions of those profiles.  The problem has been reproduced on a 16 core x86
system using identical storage, but occurs there at lower frequency.  On
average, it takes only two runs of "large_file_creates_threads-192.ffsb" to
produce a panic on the 48 core system.

The panics occur more or less equally frequently on a vanilla ext4 filesystem,
ext4 filesystem without a journal, and an ext4 filesystem with a journal but
mounted with mblk_io_submit.

With various debugging options enabled including spinlock debugging, panics or
oopses or BUGS occur in four varieties: protection violation, invalid opcode,
NULL pointer, and spinlock bad magic.  Typically, the first fault triggers a
cascade of subsequent oopses, etc.

These panics can be suppressed by using -E lazy_itable_init at mkfs time.  The
test system survived two series of 10 ffsb tests beginning with a single mkfs
each.  Subsequently, the system survived a run of about 16 hours in which a
complete scalability measurement pass was made

Repeated ffsb runs on ext3 and xfs filesystems on 2.6.38-rc* have not produced
panics.

Numerous previous ffsb scalability runs on ext4 and 2.6.37 did not produce
panics.

The panics can be produced using either HP SmartArray (backplane RAID) or
FibreChannel storage with no material difference in the panic backtraces.

Attempted bisection of the bug in 38-rc1 was inconclusive.  Repeatability was
lost the earlier in -rc1 I got.  The last clear indication was in the midst of
perf changes very early in the release (SHA id beginning with 006b20fe4c,
"Merge branch 'perf/urgent' into perf/core").  Preceding that are RCU and GFS
patches, plus a small number of x86 patches.

Relatively little useful spinlock debugging information was reported in
repeated tests in early 38 rc's - with later rc's, more information gradually
became visible (or maybe I was just getting progressively more lucky).

The first attachment contains the partial backtrace that most clearly suggests
lazy_itable_init involvement.  The softirq portion of this backtrace tends to
look the same across the panics I've seen.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug 29402] kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
  2011-02-18 21:47 [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5 bugzilla-daemon
@ 2011-02-18 21:54 ` bugzilla-daemon
  2011-02-18 22:07 ` bugzilla-daemon
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 25+ messages in thread
From: bugzilla-daemon @ 2011-02-18 21:54 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=29402





--- Comment #1 from Eric Whitney <eric.whitney@hp.com>  2011-02-18 21:54:16 ---
Created an attachment (id=48362)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=48362)
captured console output - another back trace of a different sort

This backtrace is also from -rc5, and shows both the typical trace in softirq
as well as the cascade of faults mentioned in the description.  In this case,
we've got a failure other than spinlock bad magic reported.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug 29402] kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
  2011-02-18 21:47 [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5 bugzilla-daemon
  2011-02-18 21:54 ` [Bug 29402] " bugzilla-daemon
@ 2011-02-18 22:07 ` bugzilla-daemon
  2011-02-18 22:20 ` bugzilla-daemon
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 25+ messages in thread
From: bugzilla-daemon @ 2011-02-18 22:07 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=29402


Rafael J. Wysocki <rjw@sisk.pl> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |florian@mickler.org,
                   |                            |maciej.rutecki@gmail.com,
                   |                            |rjw@sisk.pl
             Blocks|                            |27352




-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug 29402] kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
  2011-02-18 21:47 [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5 bugzilla-daemon
  2011-02-18 21:54 ` [Bug 29402] " bugzilla-daemon
  2011-02-18 22:07 ` bugzilla-daemon
@ 2011-02-18 22:20 ` bugzilla-daemon
  2011-02-18 22:20 ` bugzilla-daemon
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 25+ messages in thread
From: bugzilla-daemon @ 2011-02-18 22:20 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=29402


Eric Whitney <eric.whitney@hp.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #48352|application/octet-stream    |text/plain
          mime type|                            |




-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug 29402] kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
  2011-02-18 21:47 [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5 bugzilla-daemon
                   ` (2 preceding siblings ...)
  2011-02-18 22:20 ` bugzilla-daemon
@ 2011-02-18 22:20 ` bugzilla-daemon
  2011-02-21 15:52 ` [Bug 29402] New: " Lukas Czerner
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 25+ messages in thread
From: bugzilla-daemon @ 2011-02-18 22:20 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=29402


Eric Whitney <eric.whitney@hp.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #48362|application/octet-stream    |text/plain
          mime type|                            |




-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
  2011-02-18 21:47 [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5 bugzilla-daemon
                   ` (3 preceding siblings ...)
  2011-02-18 22:20 ` bugzilla-daemon
@ 2011-02-21 15:52 ` Lukas Czerner
  2011-02-21 16:52 ` [Bug 29402] " bugzilla-daemon
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 25+ messages in thread
From: Lukas Czerner @ 2011-02-21 15:52 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: linux-ext4

On Fri, 18 Feb 2011, bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> These panics can be suppressed by using -E lazy_itable_init at mkfs time.  The
> test system survived two series of 10 ffsb tests beginning with a single mkfs
> each.  Subsequently, the system survived a run of about 16 hours in which a
> complete scalability measurement pass was made
> 

Hi Eric,

Are you saying that it works fine (without panic) with lazy_itable_init
feature enabled ? I am confused now, because in the first trace
"spinlock bad magic: ext4lazyinit" the ext4lazyinit thread is obviously
running. When the ext4lazyinit thread is running, it means that inode
table was not zeroed, hence kernel thread (ext4lazyinit) is trying to do
the job. Am I missing something ?

Note that on the kernel which supports lazy itable initialization
(/sys/fs/ext4/features/lazy_itable_init exists) the mke2fs (since
v1.41.12-50-g210fd2c) will skip inode table initialization
automatically (it is the same as you would specify -E lazy_itable_init).

Thanks!
-Lukas

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug 29402] kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
  2011-02-18 21:47 [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5 bugzilla-daemon
                   ` (4 preceding siblings ...)
  2011-02-21 15:52 ` [Bug 29402] New: " Lukas Czerner
@ 2011-02-21 16:52 ` bugzilla-daemon
  2011-02-22  4:10 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 25+ messages in thread
From: bugzilla-daemon @ 2011-02-21 16:52 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=29402





--- Comment #2 from Lukas Czerner <lczerner@redhat.com>  2011-02-21 16:52:14 ---
On Fri, 18 Feb 2011, bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> These panics can be suppressed by using -E lazy_itable_init at mkfs time.  The
> test system survived two series of 10 ffsb tests beginning with a single mkfs
> each.  Subsequently, the system survived a run of about 16 hours in which a
> complete scalability measurement pass was made
> 

Hi Eric,

Are you saying that it works fine (without panic) with lazy_itable_init
feature enabled ? I am confused now, because in the first trace
"spinlock bad magic: ext4lazyinit" the ext4lazyinit thread is obviously
running. When the ext4lazyinit thread is running, it means that inode
table was not zeroed, hence kernel thread (ext4lazyinit) is trying to do
the job. Am I missing something ?

Note that on the kernel which supports lazy itable initialization
(/sys/fs/ext4/features/lazy_itable_init exists) the mke2fs (since
v1.41.12-50-g210fd2c) will skip inode table initialization
automatically (it is the same as you would specify -E lazy_itable_init).

Thanks!
-Lukas

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug 29402] kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
  2011-02-18 21:47 [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5 bugzilla-daemon
                   ` (5 preceding siblings ...)
  2011-02-21 16:52 ` [Bug 29402] " bugzilla-daemon
@ 2011-02-22  4:10 ` bugzilla-daemon
  2011-02-22 11:30   ` Lukas Czerner
  2011-02-22 11:30 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  22 siblings, 1 reply; 25+ messages in thread
From: bugzilla-daemon @ 2011-02-22  4:10 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=29402





--- Comment #3 from Eric Whitney <eric.whitney@hp.com>  2011-02-22 04:10:33 ---
Hi, Lukas:

Unfortunately, I omitted the argument, and that's what's confusing you.   I'm
saying that it works fine - no panic - with the feature disabled:  -E
lazy_itable_init=0.  (I'm sorry about that.  I was filing just a little too
quickly after informing Ted via IRC.  It's been a long hunt.)  Again, it panics
easily with lazy_itable_init set to its default, which means without using the
switch.

In addition to what I initially reported, I've made more than 80 additional
ffsb runs on freshly made ext4, ext4 nojournal, and ext4 mblk_io_submit
filesystems over the last three days with lazy_itable_init=0, and there have
been no further panics.

For the record, I'm using e2fsprogs 1.41.14 straight from Ted's git tree.

Thanks,
Eric

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Bug 29402] kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
  2011-02-22  4:10 ` bugzilla-daemon
@ 2011-02-22 11:30   ` Lukas Czerner
  0 siblings, 0 replies; 25+ messages in thread
From: Lukas Czerner @ 2011-02-22 11:30 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: linux-ext4

On Tue, 22 Feb 2011, bugzilla-daemon@bugzilla.kernel.org wrote:

> Unfortunately, I omitted the argument, and that's what's confusing you.

That's what I was afraid of. Fortunately we know where to look now. It
seems that the problem is really in the lazy init code in ext4. It is
possible that due some locking problem we might be zeroing existing
inode. Did you tried to run fsck on that filesystem after the crash ?

Thanks!
-Lukas

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug 29402] kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
  2011-02-18 21:47 [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5 bugzilla-daemon
                   ` (6 preceding siblings ...)
  2011-02-22  4:10 ` bugzilla-daemon
@ 2011-02-22 11:30 ` bugzilla-daemon
  2011-02-22 13:43 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 25+ messages in thread
From: bugzilla-daemon @ 2011-02-22 11:30 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=29402





--- Comment #4 from Lukas Czerner <lczerner@redhat.com>  2011-02-22 11:30:34 ---
On Tue, 22 Feb 2011, bugzilla-daemon@bugzilla.kernel.org wrote:

> Unfortunately, I omitted the argument, and that's what's confusing you.

That's what I was afraid of. Fortunately we know where to look now. It
seems that the problem is really in the lazy init code in ext4. It is
possible that due some locking problem we might be zeroing existing
inode. Did you tried to run fsck on that filesystem after the crash ?

Thanks!
-Lukas

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug 29402] kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
  2011-02-18 21:47 [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5 bugzilla-daemon
                   ` (7 preceding siblings ...)
  2011-02-22 11:30 ` bugzilla-daemon
@ 2011-02-22 13:43 ` bugzilla-daemon
  2011-02-22 13:43 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 25+ messages in thread
From: bugzilla-daemon @ 2011-02-22 13:43 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=29402





--- Comment #5 from Lukas Czerner <lczerner@redhat.com>  2011-02-22 13:43:10 ---
Created an attachment (id=48672)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=48672)
Fix ordering patch

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug 29402] kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
  2011-02-18 21:47 [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5 bugzilla-daemon
                   ` (8 preceding siblings ...)
  2011-02-22 13:43 ` bugzilla-daemon
@ 2011-02-22 13:43 ` bugzilla-daemon
  2011-02-25  3:08 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 25+ messages in thread
From: bugzilla-daemon @ 2011-02-22 13:43 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=29402


Lukas Czerner <lczerner@redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lczerner@redhat.com




--- Comment #6 from Lukas Czerner <lczerner@redhat.com>  2011-02-22 13:43:24 ---
Hi Eric,

it might be just a shot in the dark, but would you be willing to try out this
patch ? Unfortunately I do not currently have a machine to reproduce the
problem.

I *think* that the problem is in the block layer (bio_batch_end_io() and
blkdev_issue_zeroout() to be specific) and since lazy init is using that to do
the zeroing we can not hit it with lazy init turned off.

Now, the problem I see is that when we are going to wait_for_completion() in
blkdev_issue_discard() we check if the bb.done equals issued (number of issued
bios). If it equals, we can skip the wait_for_completion() and jump out of the
function since there is nothing to wait for. However, there is a ordering
problem because bio_batch_end_io() is calling atomic_inc(&bb->done) before
complete(), hence it might seem to blkdev_issue_zeroout() that all bios has
been completed and exit. At this point when bio_batch_end_io() is going to call
complete(bb->wait) bb does not longer exist since it was declared locally in
blkdev_issue_zeroout() ==> panic while trying to acquire wait.lock!

(thread 1)                         (thread 2)
bio_batch_end_io()                 blkdev_issue_zeroout()
  if(bb) {                         ...
    if (bb->end_io)                ...
      bb->end_io(bio, err);        ...
    atomic_inc(&bb->done);         ...
    ...                            while (issued != atomic_read(&bb.done))
    ...                            (let issued == bb.done)
    ...                            (do the rest of the function)
    ...                            return ret;
    complete(bb->wait);
    ^^^^^^^^
Panic in complete() while trying to acquire spinlock.

I hope it is not complete nonsense :).Please let me know whether the patch
helps.

Thanks!
-Lukas

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug 29402] kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
  2011-02-18 21:47 [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5 bugzilla-daemon
                   ` (9 preceding siblings ...)
  2011-02-22 13:43 ` bugzilla-daemon
@ 2011-02-25  3:08 ` bugzilla-daemon
  2011-02-25  3:25 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 25+ messages in thread
From: bugzilla-daemon @ 2011-02-25  3:08 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=29402





--- Comment #7 from Eric Whitney <eric.whitney@hp.com>  2011-02-25 03:08:48 ---
Hi Lukas:

When running the large_file_creates_threads=192 workload with this patch on an
-rc5 kernel, I consistently saw a new failure. The ext4lazyinit thread stalled
(tripping hung task detector) along with the flusher and journal threads, and
ffsb then ceased to make forward progress.

I think we're just seeing a new variant of the original race, where
bio_batch_end_io() sometimes never gets the value of &bb->done it's waiting
for:

bio_batch_end_io()
  ...
complete(bb->wait)

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug 29402] kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
  2011-02-18 21:47 [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5 bugzilla-daemon
                   ` (10 preceding siblings ...)
  2011-02-25  3:08 ` bugzilla-daemon
@ 2011-02-25  3:25 ` bugzilla-daemon
  2011-02-25  3:27 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 25+ messages in thread
From: bugzilla-daemon @ 2011-02-25  3:25 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=29402





--- Comment #8 from Eric Whitney <eric.whitney@hp.com>  2011-02-25 03:25:46 ---
(sorry - network hiccup - starting from the top again)

bio_batch_end_io()                  blkdev_issue_zeroout()
  ...                                 wait_for_completion(&wait)
  complete(bb->wait)                  ...
  ...                                 issued != atomic_read(&bb.done)
  ...                                 wait_for_completion(&wait)
  atomic_inc(&bb->done)               ...
                                      <wedge while waiting for the completion
                                       that won't be coming>

At any rate, that's what I think is happening - we're missing the last "inc"
and then we're not getting the completion notification we need to finish in
blkdev_issue_zeroout().

I'll attach the first hung task trace from the console for reference.

Thanks!
Eric

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug 29402] kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
  2011-02-18 21:47 [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5 bugzilla-daemon
                   ` (11 preceding siblings ...)
  2011-02-25  3:25 ` bugzilla-daemon
@ 2011-02-25  3:27 ` bugzilla-daemon
  2011-02-26  3:50 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 25+ messages in thread
From: bugzilla-daemon @ 2011-02-25  3:27 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=29402





--- Comment #9 from Eric Whitney <eric.whitney@hp.com>  2011-02-25 03:27:13 ---
Created an attachment (id=49042)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=49042)
hung task trace for ext4lazyinit resulting from first patch

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug 29402] kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
  2011-02-18 21:47 [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5 bugzilla-daemon
                   ` (12 preceding siblings ...)
  2011-02-25  3:27 ` bugzilla-daemon
@ 2011-02-26  3:50 ` bugzilla-daemon
  2011-02-26 10:48 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 25+ messages in thread
From: bugzilla-daemon @ 2011-02-26  3:50 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=29402





--- Comment #10 from Eric Whitney <eric.whitney@hp.com>  2011-02-26 03:50:09 ---
Created an attachment (id=49172)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=49172)
Possible fix - patch simplifies completion counting and avoids races

Hi Lukas:

This patch has successfully passed an initial round of tests which 2.6.38-rc*
kernels have never passed.  It seemed appropriate to simplify the way
outstanding completions were being accounted for to avoid the races, and the
code ought to be just a little more efficient on a SMP/NUMA box as well.  At
any rate, it's a stab at it - I'd appreciate any comments you might have.  I'll
put it through a weekend's worth of test runs next.

Thanks!
Eric

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug 29402] kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
  2011-02-18 21:47 [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5 bugzilla-daemon
                   ` (13 preceding siblings ...)
  2011-02-26  3:50 ` bugzilla-daemon
@ 2011-02-26 10:48 ` bugzilla-daemon
  2011-02-28 18:07 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 25+ messages in thread
From: bugzilla-daemon @ 2011-02-26 10:48 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=29402





--- Comment #11 from Lukas Czerner <lczerner@redhat.com>  2011-02-26 10:48:42 ---
Hi Eric,

thanks for the patch. I have almost the same one, except you do not
really need "done" variable, but instead just do:

    while (issued--)

Also, there is no need for bio_end_io_t in struct bio_batch, but it does
not affect the results, obviously. Finally, I was able to reproduce the
problem so I'll give it a try as well.

Thanks!
-Lukas

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug 29402] kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
  2011-02-18 21:47 [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5 bugzilla-daemon
                   ` (14 preceding siblings ...)
  2011-02-26 10:48 ` bugzilla-daemon
@ 2011-02-28 18:07 ` bugzilla-daemon
  2011-02-28 18:22 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 25+ messages in thread
From: bugzilla-daemon @ 2011-02-28 18:07 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=29402





--- Comment #12 from Eric Whitney <eric.whitney@hp.com>  2011-02-28 18:07:38 ---
Hi Lukas:

The quick and dirty patch I attached on 26 Feb successfully passed 36 hours of
testing on ext4, ext4 nojournal, and ext4 mblk_io_submit filesystems using the
ffsb scalability workloads.  That was a total of 135 ffsb runs.  The original
panic appeared on average after two runs.

At any rate, hopefully this is a useful data point.  I'd be happy to test your
more polished patch in the same manner.

Thanks,
Eric

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug 29402] kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
  2011-02-18 21:47 [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5 bugzilla-daemon
                   ` (15 preceding siblings ...)
  2011-02-28 18:07 ` bugzilla-daemon
@ 2011-02-28 18:22 ` bugzilla-daemon
  2011-03-03 18:58 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 25+ messages in thread
From: bugzilla-daemon @ 2011-02-28 18:22 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=29402


Lukas Czerner <lczerner@redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #48672|0                           |1
        is obsolete|                            |




--- Comment #13 from Lukas Czerner <lczerner@redhat.com>  2011-02-28 18:22:03 ---
Created an attachment (id=49642)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=49642)
Possible fix no.2 - simplification to avoid bad ordering

Hi Eric,

here is a patch for testing. The behaviour should be the same as with your
patch, to I hope it would work as expected. Thanks for testing.

-Lukas

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug 29402] kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
  2011-02-18 21:47 [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5 bugzilla-daemon
                   ` (16 preceding siblings ...)
  2011-02-28 18:22 ` bugzilla-daemon
@ 2011-03-03 18:58 ` bugzilla-daemon
  2011-03-04 20:18 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 25+ messages in thread
From: bugzilla-daemon @ 2011-03-03 18:58 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=29402





--- Comment #14 from Eric Whitney <eric.whitney@hp.com>  2011-03-03 18:58:15 ---
Hi Lukas:

Your "Possible fix no. 2" passed 36 hours of extended testing using the ffsb
scalability test suite on a modified 2.6.35-rc5 kernel with ext4, ext4
nojournal, and ext4 mblk filesystems.  The patch survived 135 ffsb runs where
an average of two runs were sufficient to produce the bug originally. 

I think your fix is good and ought to be submitted.  Please feel free to add my
Reported-by/Tested-by if you wish.

I'll follow up and retest once this patch makes it to mainline.

Thanks very much,
Eric

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug 29402] kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
  2011-02-18 21:47 [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5 bugzilla-daemon
                   ` (17 preceding siblings ...)
  2011-03-03 18:58 ` bugzilla-daemon
@ 2011-03-04 20:18 ` bugzilla-daemon
  2011-03-04 20:19 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 25+ messages in thread
From: bugzilla-daemon @ 2011-03-04 20:18 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=29402


Florian Mickler <florian@mickler.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |PATCH_ALREADY_AVAILABLE




--- Comment #15 from Florian Mickler <florian@mickler.org>  2011-03-04 20:18:56 ---
https://bugzilla.kernel.org/show_bug.cgi?id=29402#c13

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug 29402] kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
  2011-02-18 21:47 [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5 bugzilla-daemon
                   ` (18 preceding siblings ...)
  2011-03-04 20:18 ` bugzilla-daemon
@ 2011-03-04 20:19 ` bugzilla-daemon
  2011-03-06 12:03 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 25+ messages in thread
From: bugzilla-daemon @ 2011-03-04 20:19 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=29402





--- Comment #16 from Florian Mickler <florian@mickler.org>  2011-03-04 20:19:10 ---
Patch: https://bugzilla.kernel.org/show_bug.cgi?id=29402#c13

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug 29402] kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
  2011-02-18 21:47 [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5 bugzilla-daemon
                   ` (19 preceding siblings ...)
  2011-03-04 20:19 ` bugzilla-daemon
@ 2011-03-06 12:03 ` bugzilla-daemon
  2011-03-11  8:13 ` bugzilla-daemon
  2011-03-14 17:25 ` bugzilla-daemon
  22 siblings, 0 replies; 25+ messages in thread
From: bugzilla-daemon @ 2011-03-06 12:03 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=29402





--- Comment #17 from Rafael J. Wysocki <rjw@sisk.pl>  2011-03-06 12:03:24 ---
Ignore-Patch : https://bugzilla.kernel.org/show_bug.cgi?id=29402#c13
Patch : https://bugzilla.kernel.org/attachment.cgi?id=49642

Handled-By : Lukas Czerner <lczerner@redhat.com>

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug 29402] kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
  2011-02-18 21:47 [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5 bugzilla-daemon
                   ` (20 preceding siblings ...)
  2011-03-06 12:03 ` bugzilla-daemon
@ 2011-03-11  8:13 ` bugzilla-daemon
  2011-03-14 17:25 ` bugzilla-daemon
  22 siblings, 0 replies; 25+ messages in thread
From: bugzilla-daemon @ 2011-03-11  8:13 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=29402


Lukas Czerner <lczerner@redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #49642|0                           |1
        is obsolete|                            |




--- Comment #18 from Lukas Czerner <lczerner@redhat.com>  2011-03-11 08:13:05 ---
Created an attachment (id=50622)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=50622)
Simplification to avoid bad ordering

This is new version of the bug fix, which has been tested by Eric Whitney and
posted (https://patchwork.kernel.org/patch/627671/).

Quoting Eric:
"No panics, another full pass through the ffsb workloads on the three ext4
variants without trouble, and the system logs are clean as well."

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug 29402] kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5
  2011-02-18 21:47 [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5 bugzilla-daemon
                   ` (21 preceding siblings ...)
  2011-03-11  8:13 ` bugzilla-daemon
@ 2011-03-14 17:25 ` bugzilla-daemon
  22 siblings, 0 replies; 25+ messages in thread
From: bugzilla-daemon @ 2011-03-14 17:25 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=29402


Florian Mickler <florian@mickler.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED
         Resolution|PATCH_ALREADY_AVAILABLE     |CODE_FIX




--- Comment #19 from Florian Mickler <florian@mickler.org>  2011-03-14 17:25:37 ---
merged after .38-rc8:
commit 0aeea18964173715a1037034ef6838198f319319
Author: Lukas Czerner <lczerner@redhat.com>
Date:   Fri Mar 11 10:23:53 2011 +0100

    block: fix mis-synchronisation in blkdev_issue_zeroout()

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2011-03-14 17:25 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-02-18 21:47 [Bug 29402] New: kernel panics while running ffsb scalability workloads on 2.6.38-rc1 through -rc5 bugzilla-daemon
2011-02-18 21:54 ` [Bug 29402] " bugzilla-daemon
2011-02-18 22:07 ` bugzilla-daemon
2011-02-18 22:20 ` bugzilla-daemon
2011-02-18 22:20 ` bugzilla-daemon
2011-02-21 15:52 ` [Bug 29402] New: " Lukas Czerner
2011-02-21 16:52 ` [Bug 29402] " bugzilla-daemon
2011-02-22  4:10 ` bugzilla-daemon
2011-02-22 11:30   ` Lukas Czerner
2011-02-22 11:30 ` bugzilla-daemon
2011-02-22 13:43 ` bugzilla-daemon
2011-02-22 13:43 ` bugzilla-daemon
2011-02-25  3:08 ` bugzilla-daemon
2011-02-25  3:25 ` bugzilla-daemon
2011-02-25  3:27 ` bugzilla-daemon
2011-02-26  3:50 ` bugzilla-daemon
2011-02-26 10:48 ` bugzilla-daemon
2011-02-28 18:07 ` bugzilla-daemon
2011-02-28 18:22 ` bugzilla-daemon
2011-03-03 18:58 ` bugzilla-daemon
2011-03-04 20:18 ` bugzilla-daemon
2011-03-04 20:19 ` bugzilla-daemon
2011-03-06 12:03 ` bugzilla-daemon
2011-03-11  8:13 ` bugzilla-daemon
2011-03-14 17:25 ` bugzilla-daemon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).