From: James Chapman <jchapman@katalix.com>
To: Jarek Poplawski <jarkao2@gmail.com>
Cc: David Miller <davem@davemloft.net>,
Paul Mackerras <paulus@samba.org>,
netdev@vger.kernel.org
Subject: Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
Date: Thu, 21 Feb 2008 09:53:56 +0000
Message-ID: <47BD4A34.7070606@katalix.com>
In-Reply-To: <20080221085959.GA12944@ff.dom.local>

Jarek Poplawski wrote:
> On Wed, Feb 20, 2008 at 10:37:57PM +0000, James Chapman wrote:
>> Jarek Poplawski wrote:
>>
>>>>> (testing patch #1)
>>> But I hope you tested with the fixed (take 2) version of this patch...
>> Yes I did. :)
>>
>> But I just got another lockdep error (attached).
>>
>>> Since it's quite experimental (testing) this patch could be wrong
>>> as it is, but I hope it should show the proper way to solve this
>>> problem. Probably you did some of these, but here are a few of my
>>> suggestions for testing this:
>>>
>>> 1) try my patch with your full bh locking changing patch;
>>> 2) add while loops to these trylocks on failure, with e.g. __delay(1);
>>> this should work like full locks again, but there should be no (this
>>> kind of) lockdep reports;
>> Hmm, isn't this just bypassing the lockdep checks?
>
> Yes! But it's only for debugging: to find if this change in locking
> is to be blamed for these new lockups. It should effectively work just
> like without this patch, but without this lockdep warning. So, if
> after such change lockups still happen, then it would seem you didn't
> test this enough before. Otherwise the new patch is to blame and needs
> reworking.

The lockups still happen, but I think they are now due to a different
problem, as you say.
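
For anyone following the thread, the debugging idea in suggestion 2)
can be modelled in user space. The sketch below is Python, not kernel
code, and every name in it is hypothetical: lock_by_retrying_trylock()
stands in for a trylock wrapped in a while loop, and time.sleep() stands
in for __delay(1). It only illustrates that spinning on a trylock gives
the same mutual exclusion as taking the lock outright; it does not model
lockdep in any way.

```python
import threading
import time


def lock_by_retrying_trylock(lock, pause=1e-6):
    """Acquire `lock` by spinning on a non-blocking attempt.

    Returns the number of failed attempts before the lock was taken.
    """
    failed = 0
    while not lock.acquire(blocking=False):  # the "trylock"
        failed += 1
        time.sleep(pause)  # stand-in for the kernel's __delay(1)
    return failed


def run_workers(n_threads=4, n_iters=2000):
    """Increment a shared counter from several threads under the retry lock."""
    lock = threading.Lock()
    state = {"counter": 0}

    def worker():
        for _ in range(n_iters):
            lock_by_retrying_trylock(lock)
            try:
                state["counter"] += 1  # critical section
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]
```

Calling run_workers() should return n_threads * n_iters, since no
increment is lost. The caller never gives up, so the behaviour matches a
full lock, which is the point of the suggestion.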

Some background on this issue might help get feedback from others on
the list. It was first reported by an ISP who saw random lockups when
an L2TP tunnel carrying hundreds or thousands of L2TP sessions went
down due to a network outage and then recovered. On recovery, all of
the tunnel's PPP sessions are re-created in rapid succession. Sometimes
the tunnel would recover just fine, but other times not. The ISP put
some effort into reproducing the problem and found that repeatedly
creating and deleting a tunnel with lots of L2TP sessions would cause
the failure after a random time, anywhere between a few minutes and
several hours. The original lockdep trace came from the ISP. I
initially couldn't reproduce the problem, but I borrowed two equivalent
quad-core systems and can now reproduce it. Subsequent lockdep traces
have been from my testing.

The _bh locking fixes in pppol2tp combined with your ppp_generic change
solved that problem. I then added data traffic into the mix (since this
will happen in a real network) and found that lockups still happen.
But the lockdep trace in this case is different, as you noted.
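
As a reminder of why the _bh lock variants matter in general, here is a
toy single-CPU model in Python. It is only a sketch under stated
assumptions: ToyCpu, softirq_xmit() and process_context() are invented
names, the "softirq" is an ordinary function call, and nothing here
reconstructs the actual pppol2tp or ppp_generic code paths. It shows one
thing: if a softirq handler takes a lock that process context already
holds on the same CPU, the CPU deadlocks, unless bottom halves were
disabled while the lock was held.

```python
class ToyCpu:
    """One CPU with a single non-recursive lock and a softirq queue."""

    def __init__(self):
        self.lock_held = False
        self.bh_disabled = 0
        self.pending = []
        self.log = []

    def raise_softirq(self, handler):
        self.pending.append(handler)

    def maybe_run_softirqs(self):
        """Run pending softirqs, but only while bottom halves are enabled."""
        if self.bh_disabled:
            return
        while self.pending:
            self.pending.pop(0)(self)

    def spin_lock(self):
        if self.lock_held:
            # A real CPU would spin forever on a lock it already holds.
            raise RuntimeError("deadlock: lock already held on this CPU")
        self.lock_held = True

    def spin_unlock(self):
        self.lock_held = False

    def spin_lock_bh(self):
        self.bh_disabled += 1
        self.spin_lock()

    def spin_unlock_bh(self):
        self.spin_unlock()
        self.bh_disabled -= 1
        self.maybe_run_softirqs()  # deferred softirqs run once bh is re-enabled


def softirq_xmit(cpu):
    """Hypothetical softirq handler that needs the same lock."""
    cpu.spin_lock()
    cpu.log.append("softirq xmit")
    cpu.spin_unlock()


def process_context(cpu, use_bh):
    """Hold the lock while a softirq becomes pending on the same CPU."""
    if use_bh:
        lock, unlock = cpu.spin_lock_bh, cpu.spin_unlock_bh
    else:
        lock, unlock = cpu.spin_lock, cpu.spin_unlock
    lock()
    cpu.raise_softirq(softirq_xmit)
    cpu.maybe_run_softirqs()  # models an interrupt exit inside the section
    cpu.log.append("process section")
    unlock()
    cpu.maybe_run_softirqs()
```

With use_bh=True the handler is deferred until the lock is dropped; with
use_bh=False the model raises RuntimeError where a real CPU would hang.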

Does PPPoE stress the PPP setup code as much as this scenario? I guess
in theory it could if lots of PPPoE clients connected at the same time,
but there is no aggregate tunnel like there is with L2TP to cause all
sessions to connect simultaneously. Perhaps PPTP also suffers from these
issues? Perhaps not, because it tends to be used only in VPN setups
where there is only one session per tunnel.

>>> 3) I send here another testing patch with this second way to do this:
>>> on the write side, but it's even more "experimental" and only a
>>> proof of concept (should be applied on vanilla ppp_generic).
>> I'll look over it. I think I need to take a step back and look at what's
>> happening in more detail though.
>
> This is something completely new and changes all the picture: the xmit
> path wasn't expected (at least by me) to be called in softirq context
> at all, and there were no traces of this on previous reports. But,
> since lockdep always stops after the first warning, there could be
> even more surprises like this in the future. I'll check this report.

Doesn't the TX softirq do transmits if they've been queued up?

--
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development