* [Bug 56821] New: an ext4 commit ee0906f causes weird disk hangs
From: bugzilla-daemon @ 2013-04-19 12:25 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=56821
Summary: an ext4 commit ee0906f causes weird disk hangs
Product: File System
Version: 2.5
Platform: All
OS/Version: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: ext4
AssignedTo: fs_ext4@kernel-bugs.osdl.org
ReportedBy: kynde@ts.ray.fi
Regression: Yes
Created an attachment (id=99301)
--> (https://bugzilla.kernel.org/attachment.cgi?id=99301)
A console msg often seen during the hang
The commit (ee0906fc8da3447d168a73570754a160ecbe399b ext4: use
s_extent_max_zeroout_kb value as number of kb) causes a strange disk/raid/fs
hang for me.
Steps to reproduce:
1) login
2) startx (I've tried with nv and nvidia)
3) launch thunderbird and wait 3..10 secs
Expected results:
- just another day in the office
Actual results:
- A hang. First I see some refreshes not happening, and shortly after I can't do
anything besides jump from X to the consoles and back. I can type on those
terminals that are still live, but any disk access hangs them, too. The
attached console_msg.txt sometimes shows up if I wait long enough. Magic SysRq
sync, remount read-only, reboot is what I do next.
I've used practically every stable release on this box since some time before
3.0 without problems. Since 3.8.5, though, I've been stuck on 3.8.4: I've
tried every stable release up to 3.8.8 and none of them work.
The ee0906f commit seems to cause it. I double-checked the surrounding
commits, but not more than that. It takes 10 minutes to resync my raid-1 after a
failure, and that kinda limits my enthusiasm to work on it further on my own. No
damage seems to be caused by such an event, though. The raid sync succeeds every
time; it only takes a while.
The setup is an updated Fedora 18 on an AMD 4184, 16 GB RAM, LSI SAS controller
with two 300 GB disks. Three partitions each: the first on both is a 50 GB raid1
ext4 as root and the second on both is a 100 GB raid1 ext4 as /home. The third
partitions are non-raid old ext3 or ext4 filesystems that aren't mounted or used.
I haven't managed to cause the hang outside of X. I've tried some kernel
compiling and catting files to /dev/null, but no. In X, on the other hand (nv or
nvidia, doesn't matter), launching thunderbird seems to trigger it. It launches
fully, but within a few to ten seconds things start to fail. Another interesting
tidbit is that the disk LEDs in the array both turn off, which is anomalous.
Usually they only blink during access.
I'm willing to provide information and try out things, just let me know what
you need.
--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Bug 56821] an ext4 commit ee0906f causes weird disk hangs
From: bugzilla-daemon @ 2013-04-19 12:26 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=56821
--- Comment #1 from Tommi Kyntola <kynde@ts.ray.fi> 2013-04-19 12:26:22 ---
Created an attachment (id=99311)
--> (https://bugzilla.kernel.org/attachment.cgi?id=99311)
the config used during the bisection
* [Bug 56821] an ext4 commit ee0906f causes weird disk hangs
From: bugzilla-daemon @ 2013-04-19 12:40 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=56821
Tommi Kyntola <kynde@ts.ray.fi> changed:
What            |Removed         |Added
----------------------------------------------------------------------------
CC              |                |kynde@ts.ray.fi
Platform        |All             |x86-64
Kernel Version  |                |3.8.5
* [Bug 56821] an ext4 commit ee0906f causes weird disk hangs
From: bugzilla-daemon @ 2013-04-19 17:32 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=56821
Theodore Tso <tytso@mit.edu> changed:
What            |Removed         |Added
----------------------------------------------------------------------------
CC              |                |tytso@mit.edu
--- Comment #2 from Theodore Tso <tytso@mit.edu> 2013-04-19 17:32:54 ---
This should allow your system not to crash.
echo 0 > /sys/fs/ext4/<dev>/extent_max_zeroout_kb
The failure which you are showing seems to be one where your SCSI controller
and/or your SCSI disks are freaking out when ext4 tries to zero out a block
range by calling sb_issue_zeroout(). The block layer will translate this into
a TRIM command or a SCSI WRITE SAME command for those devices which support
this, so that blocks can be efficiently zeroed out.
It looks like the block device layer translated this to a standard SCSI
WRITE(10) command which is getting issued to both disks at the same time (I
assume you are using a software raid via an md device?). I suspect this is a
case where ext4 is enabling a new block device optimization interface, and this
is interacting badly with your hardware or your block device driver.
So we need to figure out what is actually causing the failure, so we can
somehow automatically blacklist whatever is failing. In the meantime, you can
force off the optimization at the ext4 layer by setting extent_max_zeroout_kb
to zero. Hopefully we can figure out a better way of disabling the
optimization at a lower level (so you can get the benefits of minimizing extent
tree fragmentation without causing your raid array to hang), and some way of
automatically disabling some level of the optimization or applying a hardware
breakage workaround.
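For anyone who would rather script the workaround than type the echo by hand,
here is a minimal sketch in Python. It assumes the sysfs layout from the command
above; the function name, the sysfs_root parameter, and the "md0" device name in
the usage note are hypothetical, chosen only for illustration. It returns the
old value so it can be put back once the underlying problem is fixed.

```python
from pathlib import Path


def set_extent_max_zeroout_kb(dev: str, kb: int = 0,
                              sysfs_root: str = "/sys/fs/ext4") -> int:
    """Write `kb` to extent_max_zeroout_kb for one mounted ext4 device.

    `dev` is the directory name under /sys/fs/ext4 (hypothetical example:
    "md0"). Returns the value that was set before the write.
    """
    knob = Path(sysfs_root) / dev / "extent_max_zeroout_kb"
    previous = int(knob.read_text().strip())  # remember the old setting
    knob.write_text(f"{kb}\n")                # 0 turns the zeroout path off
    return previous
```

Run as root, set_extent_max_zeroout_kb("md0") would be the equivalent of the
echo above. As far as I know the value is not persistent, so it would need to be
applied again after each mount.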
mptscsih: ioc0: attempting task abort! (sc=ffff8803ec450f00)
sd 6:0:1:0: [sdb] CDB:
Write(10): 2a 00 12 60 a0 a8 00 00 40 00
mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed},
SubCode(0x0000) cb_idx mptscsih_io_done
mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff8803ec450f00)
mptscsih: ioc0: attempting task abort! (sc=ffff8803ec450900)
sd 6:0:0:0: [sda] CDB:
Write(10): 2a 00 12 60 a0 a8 00 00 40 00
mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed},
SubCode(0x0000) cb_idx mptscsih_io_done
mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff8803ec450900)
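The CDB bytes in the aborted commands can be decoded to see what was being
written. A minimal sketch, assuming the standard WRITE(10) layout (opcode 0x2a
in byte 0, a big-endian 4-byte LBA in bytes 2-5, a big-endian 2-byte transfer
length in bytes 7-8). The function name is hypothetical, and the 512-byte
logical block size is an assumption that the log does not confirm.

```python
def decode_write10(cdb_hex: str, block_size: int = 512) -> dict:
    """Decode a SCSI WRITE(10) CDB given as a hex string.

    block_size is an assumed logical block size; it is not part of the CDB.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # starting logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # transfer length in blocks
    return {"lba": lba, "blocks": blocks, "bytes": blocks * block_size}


# The CDB logged for both sda and sdb above.
info = decode_write10("2a 00 12 60 a0 a8 00 00 40 00")
print(info)
```

Both aborted commands carry the same CDB: a 64-block write (0x0040) starting at
LBA 0x1260a0a8, which would be 32 KiB if the disks use 512-byte blocks.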
* [Bug 56821] an ext4 commit ee0906f causes weird disk hangs
From: bugzilla-daemon @ 2013-04-20 17:59 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=56821
--- Comment #3 from Tommi Kyntola <kynde@ts.ray.fi> 2013-04-20 17:59:25 ---
Yes to software raid. I thought I had mentioned it already, but evidently I
skipped that bit, sorry.
I can strace the thunderbird launch, which is still the only way I've managed
to trigger it (thankfully it triggers every time), but that will have to wait
until Monday as it's my office box.
You wouldn't happen to have ideas on what I should/could try to cause that in a
cleaner way? (i.e. should I be generating load with large files or lots of
small ones, lots of concurrent action or will a single thread do, etc.)
And what else should I be trying? Stripping down the kernel config? Would ext4
debugs help?