From: Guillaume Nault <gnault@redhat.com>
To: Martin Zaharinov <micron10@gmail.com>
Cc: "linux-kernel@vger kernel. org" <linux-kernel@vger.kernel.org>,
Eric Dumazet <eric.dumazet@gmail.com>,
netdev@vger.kernel.org
Subject: Re: Urgent: BUG: PPP ioctl Transport endpoint is not connected
Date: Mon, 14 Dec 2020 17:44:11 +0100 [thread overview]
Message-ID: <20201214164411.GA8350@linux.home> (raw)
In-Reply-To: <FDF5FB97-DB82-4DFD-AC05-28F60C6D166F@gmail.com>
On Thu, Dec 10, 2020 at 09:16:24AM +0200, Martin Zaharinov wrote:
> And one other
> From other mailing I see you send patch to Denys Fedoryshchenko this patch is :
>
> diff --git a/drivers/net/ppp/ppp_generic.c
> b/drivers/net/ppp/ppp_generic.c
>
> index 255a5def56e9..2acf4b0eabd1 100644
> --- a/drivers/net/ppp/ppp_generic.c
> +++ b/drivers/net/ppp/ppp_generic.c
> @@ -3161,6 +3161,15 @@ ppp_connect_channel(struct channel *pch, int
> unit)
>
> goto outl;
>
> ppp_lock(ppp);
> + spin_lock_bh(>downl);
> + if (!pch->chan) {
> + /* Don't connect unregistered channels */
> + ppp_unlock(ppp);
> + spin_unlock_bh(>downl);
> + ret = -ENOTCONN;
> + goto outl;
> + }
> + spin_unlock_bh(>downl);
> if (pch->file.hdrlen > ppp->file.hdrlen)
> ppp->file.hdrlen = pch->file.hdrlen;
> hdrlen = pch->file.hdrlen + 2; /* for protocol bytes */
This was a quick untested patch that I sent to help debugging Denys'
problem. It has a lock inversion problem that I fixed before I formally
submitted it upstream. I even warned about it in the original thread:
https://lore.kernel.org/netdev/20180302174328.GD1413@alphalink.fr/
> But in official stable kernel three In ppp_generic.c is this :
>
> spin_lock_bh(&pch->downl);
> if (!pch->chan) {
> /* Don't connect unregistered channels */
> spin_unlock_bh(&pch->downl);
> ppp_unlock(ppp);
> ret = -ENOTCONN;
> goto outl; }
> spin_unlock_bh(&pch->downl);
This one is correct.
> It is normal to unlock ppp after spin_unlock ?
> shouldn't it be as you wrote it?
> In your patch first :
>
> + ppp_unlock(ppp);
> + spin_unlock_bh(>downl);
No, nested locks have to be released in the reverse order they were
acquired.
> But in stable kernel is :
>
> spin_unlock_bh(&pch->downl);
> ppp_unlock(ppp);
This is correct, and has been correctly backported to 4.14-stable.
> > On 9 Dec 2020, at 20:10, Guillaume Nault <gnault@redhat.com> wrote:
> >
> > On Wed, Dec 09, 2020 at 06:57:44PM +0200, Martin Zaharinov wrote:
> >>> On 9 Dec 2020, at 18:40, Guillaume Nault <gnault@redhat.com> wrote:
> >>> On Wed, Dec 09, 2020 at 04:47:52PM +0200, Martin Zaharinov wrote:
> >>>> Hi All
> >>>>
> >>>> I have problem with latest kernel release
> >>>> And the problem is base on this late problem :
> >>>>
> >>>>
> >>>> https://www.mail-archive.com/search?l=netdev@vger.kernel.org&q=subject:%22Re%5C%3A+ppp%5C%2Fpppoe%2C+still+panic+4.15.3+in+ppp_push%22&o=newest&f=1
> >>>>
> >>>> I have same problem in kernel 5.6 > now I use kernel 5.9.13 and have same problem.
> >>>>
> >>>>
> >>>> In kernel 5.9.13 now don’t have any crashes in dimes but in one moment accel service stop with defunct and in log have many of this line :
> >>>>
> >>>>
> >>>> error: vlan608: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
> >>>> error: vlan617: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
> >>>> error: vlan679: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
> >>>>
> >>>> In one moment connected user bump double or triple and after that service defunct and need wait to drop all session to start .
> >>>>
> >>>> I talk with accel-ppp team and they said this is kernel related problem and to back to kernel 4.14 there is not this problem.
> >>>>
> >>>> Problem is come after kernel 4.15 > and not have solution to this moment.
> >>>
> >>> I'm sorry, I don't understand.
> >>> Do you mean that v4.14 worked fine (no crash, no ioctl() error)?
> >>> Did the problem start appearing in v4.15? Or did v4.15 work and the
> >>> problem appeared in v4.16?
> >>
> >> In Telegram group I talk with Sergey and Dimka and told my the problem is come after changes from 4.14 to 4.15
> >> Sergey write this : "as I know, there was a similar issue in kernel 4.15 so maybe it is still not fixed"
> >
> > Ok, but what is your experience? Do you have a kernel version where
> > accel-ppp reports no ioctl() error and doesn't crash the kernel?
> >
> > There wasn't a lot of changes between 4.14 and 4.15 for PPP.
> > The only PPP patch I can see that might have been risky is commit
> > 0171c4183559 ("ppp: unlock all_ppp_mutex before registering device").
> >
> >> I don’t have options to test with this old kernel 4.14.xxx i don’t have support for them.
> >>
> >>
> >>>
> >>>> Please help to find the problem.
> >>>>
> >>>> Last time in link I see is make changes in ppp_generic.c
> >>>>
> >>>> ppp_lock(ppp);
> >>>> spin_lock_bh(&pch->downl);
> >>>> if (!pch->chan) {
> >>>> /* Don't connect unregistered channels */
> >>>> spin_unlock_bh(&pch->downl);
> >>>> ppp_unlock(ppp);
> >>>> ret = -ENOTCONN;
> >>>> goto outl;
> >>>> }
> >>>> spin_unlock_bh(&pch->downl);
> >>>>
> >>>>
> >>>> But this fix only to don’t display error and freeze system
> >>>> The problem is stay and is to big.
> >>>
> >>> Do you use accel-ppp's unit-cache option? Does the problem go away if
> >>> you stop using it?
> >>>
> >>
> >> No I don’t use unit-cache , if I set unit-cache accel-ppp defunct same but user Is connect and disconnet more fast.
> >>
> >> The problem is same with unit and without .
> >> Only after this patch I don’t see error in dimes but this is not solution.
> >
> > Soryy, what's "in dimes"?
> > Do you mean that reverting commit 77f840e3e5f0 ("ppp: prevent
> > unregistered channels from connecting to PPP units") fixes your problem?
> >
> >> In network have customer what have power cut problem, when drop 600 user and back Is normal but in this moment kernel is locking and start to make this :
> >> sessions:
> >> starting: 4235
> >> active: 3882
> >> finishing: 378
> >> The problem is starting session is not real user normal user in this server is ~4k customers .
> >
> > What type of session is it? L2TP, PPPoE, PPTP?
> >
> >> I use pppd_compat .
> >>
> >> Any idea ?
> >>
> >>>>
> >>>> Please help to fix.
> >> Martin
>
prev parent reply other threads:[~2020-12-14 16:46 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-12-09 14:47 Urgent: BUG: PPP ioctl Transport endpoint is not connected Martin Zaharinov
2020-12-09 16:40 ` Guillaume Nault
2020-12-09 16:57 ` Martin Zaharinov
2020-12-09 17:29 ` Martin Zaharinov
2020-12-09 18:10 ` Guillaume Nault
2020-12-09 19:12 ` Martin Zaharinov
2020-12-14 17:09 ` Guillaume Nault
2020-12-10 7:06 ` Martin Zaharinov
2020-12-10 7:16 ` Martin Zaharinov
2020-12-14 16:44 ` Guillaume Nault [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201214164411.GA8350@linux.home \
--to=gnault@redhat.com \
--cc=eric.dumazet@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=micron10@gmail.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).