From: Joe Landman <landman@scalableinformatics.com>
To: Jon Marshall <jon@campbell-lange.net>
Cc: xfs@oss.sgi.com
Subject: Re: XFS on CoRAID errors with SMB
Date: Mon, 28 Nov 2011 10:36:48 -0500
Message-ID: <4ED3AA90.8090202@scalableinformatics.com>
In-Reply-To: <20111128152652.GD1795@campbell-lange.net>
On 11/28/2011 10:26 AM, Jon Marshall wrote:
> Hi Joe,
>
> Thanks for the rapid response.
>
> Is this something that has been reported often in relation to AoE? Is
We've experienced it in the past, when we supported customers running
Coraid gear. Most of that gear is gone now, so we haven't seen much AoE
as of late (the last 2 years or so).

That said, the AoE stack depends critically on the network stack, and
between AoE and the network stack (or possibly something else), you ran
out of memory available to the kernel. In our experience the cause is
usually a leaky network driver: the e1000 and similar Intel drivers
shipped with stock RHEL5/CentOS5 are highly problematic. AoE itself
could also be leaking (early versions were pretty bad in this regard; I
haven't looked at the driver in the last few years, so hopefully it has
improved).
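
One rough way to test the leaky-driver theory is to sample kernel slab usage over time and check whether it keeps climbing under load. The Python sketch below is an illustration only, not something from this thread: it assumes a Linux /proc/meminfo that has a `Slab:` line (newer kernels also report `SUnreclaim:`, which is a tighter signal), and the helper names and growth threshold are hypothetical values that would need tuning on a real system.

```python
import re
import time
from typing import Dict, List, Sequence, Tuple

def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        # Lines look like "Slab:  120000 kB" or "Active(anon):  1000 kB".
        m = re.match(r"^([\w()]+):\s+(\d+)", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields

def slope(ts: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope of ys over ts (units of y per unit of t)."""
    n = len(ts)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    den = sum((t - mean_t) ** 2 for t in ts)
    if den == 0:
        raise ValueError("sample times must not all be equal")
    return num / den

def looks_leaky(ts: Sequence[float], ys: Sequence[float],
                threshold_kb_per_s: float) -> bool:
    """Flag steady growth faster than a hypothetical threshold (kB/s)."""
    return slope(ts, ys) > threshold_kb_per_s

def sample_field(field: str = "Slab", count: int = 10,
                 interval_s: float = 60.0) -> Tuple[List[float], List[float]]:
    """Read one /proc/meminfo field `count` times, `interval_s` apart (Linux only)."""
    ts: List[float] = []
    ys: List[float] = []
    for i in range(count):
        with open("/proc/meminfo") as f:
            ys.append(float(parse_meminfo(f.read())[field]))
        ts.append(time.monotonic())
        if i + 1 < count:
            time.sleep(interval_s)
    return ts, ys
```

Usage might be `ts, ys = sample_field("Slab")` followed by `looks_leaky(ts, ys, threshold_kb_per_s=50)`. A least-squares slope is used rather than first-minus-last so that a single noisy reading does not dominate.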
The xfs connection to this (to stay relevant to this group) is that xfs
is fine atop this stack, as long as the layers beneath it don't go away.
If you can detect problems like this in advance, you might be able to
issue an xfs_freeze and preserve the integrity of the underlying
filesystem (obviating the need for an xfs_repair). The hard part is
predicting it accurately, but if your drivers are grabbing memory and
not releasing it, or you have a runaway memory-consuming process, then
yes, you could potentially predict the onset.
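
The "predict, then freeze" idea can be sketched as a linear extrapolation of free memory, issuing `xfs_freeze -f` once projected exhaustion falls inside a safety horizon. This is a hypothetical illustration: the mount point, the 10-minute default horizon and the function names are assumptions, and the decline rate would have to come from sampling `MemFree` over several readings. It defaults to a dry run that only prints the command.

```python
import subprocess
from typing import List, Optional

def seconds_until_exhausted(free_kb: float, decline_kb_per_s: float) -> Optional[float]:
    """Linearly extrapolate when free memory hits zero; None if it isn't declining."""
    if decline_kb_per_s <= 0:
        return None
    return free_kb / decline_kb_per_s

def freeze_command(mountpoint: str) -> List[str]:
    # `xfs_freeze -f` halts new writes and flushes the filesystem to a
    # consistent on-disk state; `xfs_freeze -u` thaws it again.
    return ["xfs_freeze", "-f", mountpoint]

def maybe_freeze(mountpoint: str, free_kb: float, decline_kb_per_s: float,
                 horizon_s: float = 600.0, dry_run: bool = True) -> bool:
    """Freeze `mountpoint` if memory is projected to run out within `horizon_s`.

    Returns True if a freeze was issued (or would have been, in dry-run mode).
    """
    eta = seconds_until_exhausted(free_kb, decline_kb_per_s)
    if eta is None or eta > horizon_s:
        return False
    cmd = freeze_command(mountpoint)
    if dry_run:
        print("would run:", " ".join(cmd))
    else:
        # Needs root; raises CalledProcessError if xfs_freeze fails.
        subprocess.run(cmd, check=True)
    return True
```

Bear in mind that a frozen filesystem blocks every writer (smbd included) until `xfs_freeze -u` is run. If the AoE transport has already failed, the flush performed by the freeze may itself hang, so this could only help if the trigger fires well before the crash.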
> there any chance you could point us in the direction of some more
> background on the issue? I am checking the AoE mailing list, but if you know
> of something specific that would be very helpful.
Not really; we aren't doing much with AoE anymore. This may or may not
be an AoE issue per se. AoE likely crashed, and the reason for that
crash is very probably the same reason xfs crashed: the kernel ran out
of memory. If AoE is the culprit, you might find some imprint of this in
the logs, though in our experience the cause has usually been a runaway
network driver. Since AoE carries its block device traffic in raw
Ethernet frames, it doesn't take long for a leaky driver to crash such a
system under load.
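
A simple first pass at finding that imprint is to count allocation-failure, OOM-killer and driver messages in the kernel log, and to note which tasks were allocating when the failures hit. The sketch below makes several assumptions: the regular expressions approximate message wording that varies across kernel versions, and the `aoe:` and `e1000:`/`e1000e:` prefixes are assumed driver log prefixes, not a confirmed format.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumed message patterns; exact wording differs between kernel versions.
PATTERNS = {
    "alloc_failure": re.compile(r"page allocation failure"),
    "oom_killer": re.compile(r"Out of memory"),
    "aoe": re.compile(r"\baoe:"),
    "e1000": re.compile(r"\be1000e?:"),
}
# Captures the task name in lines like "smbd: page allocation failure. order:1, ..."
ALLOC_TASK = re.compile(r"(\S+): page allocation failure")

def scan_kernel_log(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Return (hits per pattern category, allocation failures per task name)."""
    hits: Counter = Counter()
    tasks: Counter = Counter()
    for line in lines:
        for name, pat in PATTERNS.items():
            if pat.search(line):
                hits[name] += 1
        m = ALLOC_TASK.search(line)
        if m:
            tasks[m.group(1)] += 1
    return hits, tasks
```

On a RHEL5/CentOS5 box this could be fed `/var/log/messages` (for example inside `with open("/var/log/messages") as f: hits, tasks = scan_kernel_log(f)`). Many allocation failures attributed to `swapper` or other interrupt-context work, rather than to smbd or xfs threads, might hint at the network receive path.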
>
> I am also looking into the ethernet drivers we have in place on the
> system in question.
>
> Again, thanks for the quick and informative response.
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman@scalableinformatics.com
web : http://scalableinformatics.com
http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
Thread overview: 4+ messages
2011-11-28 13:55 XFS on CoRAID errors with SMB Jon Marshall
2011-11-28 14:46 ` Joe Landman
2011-11-28 15:26 ` Jon Marshall
2011-11-28 15:36 ` Joe Landman [this message]