From: Andriy Rysin <arysin@bcsii.net>
To: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Cc: linux-kernel@vger.kernel.org, sct@redhat.com,
Andrew Morton <akpm@digeo.com>,
Alan Cox <alan@lxorguk.ukuu.org.uk>
Subject: Re: ext3 crash with 2.4.22: Assertion failure in journal_forget_R10d91946()
Date: Tue, 07 Oct 2003 15:18:02 -0700
Message-ID: <3F833B9A.3080800@bcsii.net>
In-Reply-To: <Pine.LNX.4.44.0310061946290.2403-100000@logos.cnet>
Marcelo Tosatti wrote:
>Andriy,
>
>On Thu, 2 Oct 2003, Andriy Rysin wrote:
>
>
>
>>I am having crashes on ext3 with the 2.4.22 kernel. The system was up for 8
>>days. I am not sure I can reproduce it quickly, but we've seen it
>>occasionally on 2.4.20 for several months, and after we updated to
>>2.4.22 it's here again.
>>
>>please CC me if you answer or need more information.
>>
>>
>>the log looks like this:
>>
>>Sep 29 20:15:08 dunne-demo kernel: EXT3-fs error (device ide0(3,2)):
>>ext3_free_blocks: Freeing blocks not in datazone - block = 2907885836, count = 1
>>Sep 29 20:15:08 dunne-demo kernel: EXT3-fs error (device ide0(3,2)):
>>ext3_free_blocks: Freeing blocks not in datazone - block = 1660415916, count = 1
>>Sep 29 20:15:08 dunne-demo kernel: EXT3-fs error (device ide0(3,2)):
>>ext3_free_blocks: Freeing blocks not in datazone - block = 1438298218, count = 1
>>Sep 29 20:15:08 dunne-demo kernel: EXT3-fs error (device ide0(3,2)):
>>ext3_free_blocks: Freeing blocks not in datazone - block = 4209573569, count = 1
>>Sep 29 20:15:08 dunne-demo kernel: EXT3-fs error (device ide0(3,2)):
>>ext3_free_blocks: Freeing blocks not in datazone - block = 2918065562, count = 1
>>......
>>Sep 29 21:05:18 dunne-demo kernel: EXT3-fs error (device ide0(3,2)):
>>ext3_free_blocks: bit already cleared for block 5970190
>>......
>>Oct 2 00:43:53 dunne-demo kernel: hda: dma_timer_expiry: dma status == 0x20
>>Oct 2 00:43:53 dunne-demo kernel: hda: timeout waiting for DMA
>>Oct 2 00:43:53 dunne-demo kernel: hda: timeout waiting for DMA
>>Oct 2 00:43:53 dunne-demo kernel: hda: (__ide_dma_test_irq) called
>>
>>
>
>You are getting DMA timeouts and such. Try turning off the DMA.
>
>But anyway, the ext3 fs errors shouldn't happen, I guess. Andrew, Stephen?
>
>
If I turn DMA off, the system won't be able to handle the load we need. This
problem happens under heavy load on different machines. It seems that
IBM and WDC drives give DMA errors while Maxtor drives don't. But the DMA
problems do not seem to be related to the filesystem problems: on several
systems we had the same ext3 problem without observing any DMA errors
(the Maxtor case in particular). I doubt all those drives are faulty.
I may add that the nature of our application is writing media data files in
a cyclic manner. When the filesystem gets close to full, a script deletes
the oldest files. The usual filesystem size is about 80G; file sizes range
from several kilobytes to 1.5G, with an average of several megabytes. The
system can simultaneously write up to 32 files, and deletion happens in
parallel. The usual high load for these systems is about 4-5MB/s for writing,
as reported by sar -b (reading is much lower, ~300KB/s).
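For illustration only, the delete-oldest cleanup described above could be sketched in Python roughly as follows. This is not the actual script: the function names, the 95% high-water mark, and the use of modification time to decide which files are "oldest" are all assumptions made for the example.

```python
import os
import shutil


def usage_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def oldest_first(directory):
    """Return paths of regular files in directory, oldest mtime first."""
    with os.scandir(directory) as it:
        entries = [e for e in it if e.is_file(follow_symlinks=False)]
    entries.sort(key=lambda e: e.stat().st_mtime)
    return [e.path for e in entries]


def prune_oldest(directory, high_water=0.95, usage=usage_fraction):
    """Delete oldest files until usage falls below high_water.

    high_water is a hypothetical threshold; usage is injectable so the
    policy can be exercised without actually filling a disk.
    Returns the list of deleted paths.
    """
    deleted = []
    for path in oldest_first(directory):
        if usage(directory) < high_water:
            break
        os.remove(path)
        deleted.append(path)
    return deleted
```

Run periodically alongside the writers, a loop like this produces the pattern described: many files being extended while old ones are freed at the same time.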
What is also interesting is that the DMA errors happen mostly at a pretty
low disk load, ~400KB/s.
A week ago I replaced ext3 with jfs on 3 systems and so far have not seen
any DMA or filesystem errors. I even loaded those systems with
scripts that constantly copied a directory with ~4G of files and removed it,
for about 4-5 hours (avg load by sar was 13MB/s for reading and 15MB/s
for writing), and still did not get any problems.
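A copy-and-remove load of the kind described above could be sketched in Python like this. It is a hypothetical stand-in for the scripts actually used; the function name, the "stress-copy" directory name, and the time-based stop condition are assumptions.

```python
import shutil
import time
from pathlib import Path


def copy_remove_cycles(source, scratch, duration_s):
    """Repeatedly copy source into scratch and delete the copy.

    Runs until duration_s seconds have elapsed (checked between cycles)
    and returns the number of completed copy/remove cycles.
    """
    target = Path(scratch) / "stress-copy"  # hypothetical scratch name
    deadline = time.monotonic() + duration_s
    cycles = 0
    while time.monotonic() < deadline:
        shutil.copytree(Path(source), target)  # read + write load
        shutil.rmtree(target)                  # delete load
        cycles += 1
    return cycles
```

For a run comparable to the one described, duration_s would be on the order of 4 * 3600 or more, with source pointing at a directory of a few gigabytes on the filesystem under test.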
So it seems it's not the fault of the drives. I am still curious whether
jfs somehow puts less load on the drive and so avoids the DMA errors, but I'd
like to spend a couple more weeks testing before claiming that.
Andriy
Thread overview: 6+ messages
2003-10-02 20:14 ext3 crash with 2.4.22: Assertion failure in journal_forget_R10d91946() Andriy Rysin
2003-10-06 22:56 ` Marcelo Tosatti
2003-10-07 9:11 ` Stephen C. Tweedie
2003-10-07 22:18 ` Andriy Rysin [this message]
2003-10-15 22:26 ` 2.4.20, 2.4.22, 2.4.6-test7: system locks up completely when writing to floppy (2.2.20 is ok) Andriy Rysin
2003-10-17 18:28 ` 2.4.20, 2.4.22, 2.4.6-test7: system locks up completely when writing to floppy (2.2.20 is ok) - solution Andriy Rysin