public inbox for linux-kernel@vger.kernel.org
From: Michael Clark <michael@metaparadigm.com>
To: Andrea Arcangeli <andrea@suse.de>
Cc: Andrew Morton <akpm@digeo.com>, linux-kernel@vger.kernel.org
Subject: Re: 2.4.19pre10aa4 OOPS in ext3 (get_hash_table, unmap_underlying_metadata)
Date: Thu, 26 Sep 2002 19:43:21 +0800
Message-ID: <3D92F2D9.3050308@metaparadigm.com>
In-Reply-To: 20020926111234.GC11333@dualathlon.random

On 09/26/02 19:12, Andrea Arcangeli wrote:
> On Thu, Sep 26, 2002 at 01:02:20AM -0700, Andrew Morton wrote:
> 
>>Michael Clark wrote:
>>
>>>On 09/26/02 15:27, Andrew Morton wrote:
>>>
>>>>Michael Clark wrote:
>>>>
>>>>
>>>>>Hiya,
>>>>>
>>>>>Been having frequent (every 4-8 days) oopses with 2.4.19pre10aa4 on
>>>>>a moderately loaded server (100 users - 0.4 load avg).
>>>>>
>>>>>The server is an Intel STL2 with dual P3, 1GB RAM, Intel Pro1000T
>>>>>and Qlogic 2300 Fibre channel HBA.
>>>>>
>>>>>We are running qla2300, e1000 and lvm modules unmodified as present in
>>>>>2.4.19pre10aa4. We also have quotas enabled on 1 of the ext3 fs.
>>>>>
>>>>
>>>>
>>>>It's not familiar, sorry.
>>>
>>>Maybe I should try XFS? I've heard of people running this for
>>>80+ days and no downtime. I really would like to get past 8 days.
>>
>>Well that would be one way of eliminating variables, and that's
>>the only way to narrow this down.   Looks like something somewhere
>>(software or hardware) has corrupted some memory.
> 
> 
> yep. This is the hardest kind of bug to fix, since the oops (or a lkcd
> crash dump) would be almost totally useless in these cases. I'm quite
> relaxed about the core, I would look into either scsi driver or e1000
> first.

I'm pretty sure it's not e1000 as we just switched to it from
GA621s using the ns83820 driver - still had the same lockups
with a completely different eth driver (although at that stage
we were losing the oopses).

I'm still dubious about the qlogic driver. v4.45 had stood up
to a 2 week cerberus and bonnie run with the 2.4.18pre2aa2
kernel, yet that kernel would also panic every 8-10 days
under fileserver load.

Possibly an LVM interaction with the qlogic driver. Maybe I
should stick with fixed partition sizes and drop the LVM.

Given the corrupted bufferheads in the oops, is it most likely some
interaction between ext3, LVM and qlogic?

afpd (Netatalk)
   |
ext3 with quotas
   |
LVM
   |
qlogic2300 6.0.1b1

> While I got good reports about qla drivers, I'm not sure how many people
> are testing the e1000 in pre10aa4; that's an old driver, 4.2.x, and the new
> one in mainline 2.4.20 is 4.3.15. So I would suggest first of all to
> upgrade the e1000 driver, just in case (I will shortly upload a new -aa
> with all pending stuff included, but the latest one against 20pre5
> is just in sync with mainline in terms of e1000).

Okay. Hmmm, I'm still not sure what my best next step for elimination
is; my personal hunch is still LVM, ext3 or this mix. I'm sorta thinking
XFS with no LVM at the moment (or maybe I should just remove the LVM
as a first step). Trouble is it takes a week or so for the problem to
show up.

Would you suggest I try 2.4.20pre5aa2 with its qlogic 6.1b1 (I notice
b5 is the latest) and no LVM?

>>The problem is that even if you _do_ fix the problem by switching
>>something out, the cause could lie elsewhere.

Cheers,
~mc


