From: Andre Noll <maan@systemlinux.org>
To: Eric Sandeen <sandeen@redhat.com>
Cc: Bernd Schubert <bernd.schubert@fastmail.fm>,
	Andrew Vasquez <andrew.vasquez@qlogic.com>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	Linux Driver <Linux-Driver@qlogic.com>,
	Thomas Helle <Helle@tuebingen.mpg.de>
Subject: Re: ext4: (2.6.34-rc4): This should not happen!!  Data will be lost
Date: Tue, 20 Apr 2010 17:37:23 +0200
Message-ID: <20100420153723.GE25507@skl-net.de>
In-Reply-To: <20100417223854.GD25507@skl-net.de>

On 00:38, Andre Noll wrote:
> > I still don't think it's likely a filesystem problem but maybe you can
> > pinpoint the fs behavior that triggers it.
> 
> I'll try to reproduce the problem using different timeout values and the
> ext4 options you suggest. If I can find a reliable reproducer, I'll run
> blktrace and post the results.

Here are some results. Prior to running the tests I wrote a bunch of
10G files and then filled the fs completely with 2T files containing
zeros.  Each of the tests below consisted of three runs of

	- remove 5 of the above 10G files to make 50G space available
	- run stress -d 5 --hdd-bytes 10G --hdd-noclean until it dies
	- run fsck if any fs errors occurred
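
For anyone who wants to script this, one run could be sketched roughly
as below. Only the stress flags are taken from the steps above. The
mount point, the file names, the plan_run() helper and the bare e2fsck
call are placeholders made up for illustration, not what was actually
typed during the tests.

```python
# Flags copied from the message; everything else below is illustrative.
STRESS_CMD = ["stress", "-d", "5", "--hdd-bytes", "10G", "--hdd-noclean"]


def plan_run(mount_point, files_to_remove, device="/dev/sda", fs_errors=False):
    """Return the commands for one run as argument lists, in order."""
    # Step 1: remove five of the 10G files to free 50G.
    steps = [["rm", f"{mount_point}/{name}"] for name in files_to_remove]
    # Step 2: run stress inside the file system until it dies.
    steps.append(list(STRESS_CMD))
    # Step 3: fsck only if errors showed up (the fs must be unmounted first).
    if fs_errors:
        steps.append(["e2fsck", device])
    return steps


if __name__ == "__main__":
    # Hypothetical mount point and file names.
    names = [f"file-{i}" for i in range(5)]
    for step in plan_run("/mnt/test", names, fs_errors=True):
        print(" ".join(step))
```

The helper only builds and prints the command lines, so nothing is
executed by accident. The commands could be passed to subprocess.run()
with cwd set to the mount point.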

Summary: Increasing the device timeout to 60s _or_ disabling barriers
makes the problem go away. Deactivating delayed allocation makes the
problem worse.

- device timeout 60s, default ext4 parameters
	No problems at all, all three runs OK

- device timeout 30s, default ext4 parameters
	1. OK
	2. dmesg:
		qla2xxx 0000:06:09.0: scsi(0:0:0): Abort command issued -- 1 2ea270b 2002.
		end_request: I/O error, dev sda, sector 7812889640
		Aborting journal on device sda-8.
		EXT4-fs error (device sda): ext4_journal_start_sb: Detected aborted journal
		EXT4-fs (sda): Remounting filesystem read-only

		fsck:
			Inode 287, i_blocks is 4294918568, should be 416.  Fix? yes
			Inode 288, i_size is 2198897426432, should be 2199023251456.  Fix? yes
			Inode 288, i_blocks is 4294721960, should be 416.  Fix? yes

	3.
		qla2xxx 0000:06:09.0: scsi(0:0:0): Abort command issued -- 1 2ece6a8 2002.
		qla2xxx 0000:06:09.0: scsi(0:0:0): Abort command issued -- 1 2ece6dc 2002.
		end_request: I/O error, dev sda, sector 7812690136
		Aborting journal on device sda-8.
		EXT4-fs error (device sda) in ext4_free_blocks: Journal has aborted
		EXT4-fs error (device sda) in ext4_ext_remove_space: Journal has aborted
		EXT4-fs error (device sda) in ext4_reserve_inode_write: Journal has aborted
		EXT4-fs error (device sda) in ext4_ext_truncate: Journal has aborted
		EXT4-fs error (device sda) in ext4_reserve_inode_write: Journal has aborted
		EXT4-fs error (device sda) in ext4_orphan_del: Journal has aborted
		EXT4-fs error (device sda) in ext4_reserve_inode_write: Journal has aborted
		EXT4-fs error (device sda) in ext4_delete_inode: Journal has aborted
		EXT4-fs error (device sda): ext4_journal_start_sb: Detected aborted journal
		EXT4-fs (sda): Remounting filesystem read-only

		fsck:
			e2fsck 1.41.10 (10-Feb-2009)
			/dev/sda: recovering journal
			Clearing orphaned inode 179 (uid=0, gid=0, mode=0100600, size=0)
			/dev/sda: clean, 301/244158464 files, 1935004235/1953247232 blocks
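
As a sanity check on the numbers in the two failing runs, here is some
plain arithmetic. It assumes a 4096-byte fs block size, which is not
stated anywhere above, and that the fs sits directly on sda without a
partition offset. The observation about 2^32 is only a remark on the
values themselves, not a diagnosis.

```python
SECTOR_SIZE = 512      # unit of the sector numbers in "end_request" messages
FS_BLOCK_SIZE = 4096   # assumption: block size is not stated in the message

FAILED_SECTORS = {2: 7812889640, 3: 7812690136}         # run number -> sector
TOTAL_FS_BLOCKS = 1953247232                            # from the e2fsck summary
REPORTED_I_BLOCKS = {287: 4294918568, 288: 4294721960}  # inode -> i_blocks


def sector_to_fs_block(sector):
    """Map a 512-byte sector number to a file system block number."""
    return sector * SECTOR_SIZE // FS_BLOCK_SIZE


def distance_below_2_32(value):
    """How far a value lies below 2**32."""
    return 2**32 - value


if __name__ == "__main__":
    for run, sector in FAILED_SECTORS.items():
        block = sector_to_fs_block(sector)
        print(f"run {run}: sector {sector} -> fs block {block} "
              f"(inside fs: {block < TOTAL_FS_BLOCKS})")
    for inode, i_blocks in REPORTED_I_BLOCKS.items():
        print(f"inode {inode}: i_blocks is {distance_below_2_32(i_blocks)} "
              f"below 2**32")
```

Under these assumptions both failing sectors fall roughly in the middle
of the fs, and both bogus i_blocks values lie just below 2^32.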


- device timeout 30s, nodelalloc

	This seems to trigger the problem more reliably:

	1.
		dmesg: same qla, ext4 errors as above
		fsck: orphaned inodes as above
	2. and 3.
		errors already while removing files:
			rm: cannot remove `stress.98q1gG': Read-only file system
		dmesg: same qla/ext4 errors, but also
			JBD2: Detected IO errors while flushing file data on sda-8
		fsck: clean

- device timeout 30s, nobarrier
	No problem at all, all three runs OK.
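
To keep the four configurations apart, they can be written down as data
together with the settings each one needs. This is a rough sketch. The
sysfs timeout path is the usual location for SCSI disks, but how the
timeout was actually changed is not described above. The mount point
and the helper names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    timeout_s: int      # SCSI device timeout in seconds
    mount_opts: str     # "" means default ext4 parameters
    failed_runs: int    # runs (out of three) that showed errors


# Results as reported in the list above.
CONFIGS = [
    Config(60, "", 0),
    Config(30, "", 2),
    Config(30, "nodelalloc", 3),
    Config(30, "nobarrier", 0),
]


def timeout_path(dev="sda"):
    """Assumed sysfs file holding the command timeout of a SCSI disk."""
    return f"/sys/block/{dev}/device/timeout"


def mount_command(cfg, device="/dev/sda", mount_point="/mnt/test"):
    """Build the mount command line for one configuration (not executed)."""
    cmd = ["mount", "-t", "ext4"]
    if cfg.mount_opts:
        cmd += ["-o", cfg.mount_opts]
    return cmd + [device, mount_point]


if __name__ == "__main__":
    for cfg in CONFIGS:
        print(f"echo {cfg.timeout_s} > {timeout_path()};",
              " ".join(mount_command(cfg)),
              f"# {cfg.failed_runs}/3 runs failed")
```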

Eric, are you still interested in seeing the blktrace output? I suppose
I should use a 30s timeout, nodelalloc and barrier=1, as this triggers
the problem within minutes.


Regards
Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe

