All of lore.kernel.org
 help / color / mirror / Atom feed
From: Simon Kirby <sim@hostway.ca>
To: drbd-dev@lists.linbit.com, linux-kernel@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: [Drbd-dev] [3.1-rc4] XFS+DRBD hangs
Date: Fri, 9 Sep 2011 13:13:24 -0700	[thread overview]
Message-ID: <20110909201324.GD6195@hostway.ca> (raw)
In-Reply-To: <20110908174324.GA8043@hostway.ca>

On Thu, Sep 08, 2011 at 10:43:24AM -0700, Simon Kirby wrote:

> On Thu, Sep 08, 2011 at 05:13:05PM +0200, Lars Ellenberg wrote:
> 
> > Sorry for double posting on drbd-dev, I managed to strip the other lists from Cc.
> > 
> > > We upgraded from 2.6.36 which seemed to have a page leak (file pages left
> > > on the LRU) and so would eventually perform very poorly. 2.6.37 and
> > > 2.6.38 seemed to have some unix socket issue that caused heartbeat to
> > > wedge. Shall we enable lock debugging or something here?
> > 
> > That could help us understand that stack trace.
> > 
> > It looks like cpu 1 blocks in
> > 
> > > [ 1532.427149]  [<ffffffff8103d512>] ? try_to_wake_up+0xc2/0x270
> > > [ 1532.427149]  <<EOE>>  <IRQ>  [<ffffffff8103d6cd>] default_wake_function+0xd/0x10
> > 
> > Which does not make sense to me at all.
> 
> Well, good news, I think.. I believe this may be related to
> "PCI: Set PCI-E Max Payload Size on fabric", added by b03e7495a862b02829.
> 3.1-rc5 is running now with a patch to basically disable those changes,
> and has been stable for 12 hours. It usually hung in a few minutes
> before.
> 
> The XFS peoples say it was very likely not 58d84c4ee0389ddeb86238d5 which
> is the only other thing that changed between these versions that seems to
> be at all in the hang path.
> 
> Also, when the thing hangs, it stops pinging immediately, and with the
> PCI-E max payload thing active, the device that raises a bus error is
> actually the PCI-E to PCI-X bridge chip used to support the BCM5708 NICs,
> so that all seems related.

Except that I accidentally git reset out the patch, and so it's been
running unmodified 79016f648872549392d232cd648bd02298c2d2bb (past -rc5),
and still hasn't crashed, so I guess it _was_ the XFS changes, or
something else. Boggle. In any event, it's still running well. :)

Simon-

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

WARNING: multiple messages have this Message-ID (diff)
From: Simon Kirby <sim@hostway.ca>
To: drbd-dev@lists.linbit.com, linux-kernel@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: [Drbd-dev] [3.1-rc4] XFS+DRBD hangs
Date: Fri, 9 Sep 2011 13:13:24 -0700	[thread overview]
Message-ID: <20110909201324.GD6195@hostway.ca> (raw)
In-Reply-To: <20110908174324.GA8043@hostway.ca>

On Thu, Sep 08, 2011 at 10:43:24AM -0700, Simon Kirby wrote:

> On Thu, Sep 08, 2011 at 05:13:05PM +0200, Lars Ellenberg wrote:
> 
> > Sorry for double posting on drbd-dev, I managed to strip the other lists from Cc.
> > 
> > > We upgraded from 2.6.36 which seemed to have a page leak (file pages left
> > > on the LRU) and so would eventually perform very poorly. 2.6.37 and
> > > 2.6.38 seemed to have some unix socket issue that caused heartbeat to
> > > wedge. Shall we enable lock debugging or something here?
> > 
> > That could help us understand that stack trace.
> > 
> > It looks like cpu 1 blocks in
> > 
> > > [ 1532.427149]  [<ffffffff8103d512>] ? try_to_wake_up+0xc2/0x270
> > > [ 1532.427149]  <<EOE>>  <IRQ>  [<ffffffff8103d6cd>] default_wake_function+0xd/0x10
> > 
> > Which does not make sense to me at all.
> 
> Well, good news, I think.. I believe this may be related to
> "PCI: Set PCI-E Max Payload Size on fabric", added by b03e7495a862b02829.
> 3.1-rc5 is running now with a patch to basically disable those changes,
> and has been stable for 12 hours. It usually hung in a few minutes
> before.
> 
> The XFS peoples say it was very likely not 58d84c4ee0389ddeb86238d5 which
> is the only other thing that changed between these versions that seems to
> be at all in the hang path.
> 
> Also, when the thing hangs, it stops pinging immediately, and with the
> PCI-E max payload thing active, the device that raises a bus error is
> actually the PCI-E to PCI-X bridge chip used to support the BCM5708 NICs,
> so that all seems related.

Except that I accidentally git reset out the patch, and so it's been
running unmodified 79016f648872549392d232cd648bd02298c2d2bb (past -rc5),
and still hasn't crashed, so I guess it _was_ the XFS changes, or
something else. Boggle. In any event, it's still running well. :)

Simon-

  reply	other threads:[~2011-09-09 20:13 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-09-07 22:15 [3.1-rc4] XFS+DRBD hangs Simon Kirby
2011-09-07 22:15 ` Simon Kirby
2011-09-07 22:15 ` [Drbd-dev] " Simon Kirby
2011-09-08 15:10 ` Lars Ellenberg
2011-09-08 15:13 ` Lars Ellenberg
2011-09-08 15:13   ` Lars Ellenberg
2011-09-08 17:43   ` Simon Kirby
2011-09-08 17:43     ` Simon Kirby
2011-09-09 20:13     ` Simon Kirby [this message]
2011-09-09 20:13       ` Simon Kirby

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110909201324.GD6195@hostway.ca \
    --to=sim@hostway.ca \
    --cc=drbd-dev@lists.linbit.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.