From: Guillaume Nault <gnault@redhat.com>
To: Martin Zaharinov <micron10@gmail.com>
Cc: "Pali Rohár" <pali@kernel.org>,
"Greg KH" <gregkh@linuxfoundation.org>,
netdev <netdev@vger.kernel.org>,
"Eric Dumazet" <eric.dumazet@gmail.com>
Subject: Re: Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected
Date: Tue, 14 Sep 2021 10:02:06 +0200
Message-ID: <20210914080206.GA20454@pc-4.home>
In-Reply-To: <E95FDB1D-488B-4780-96A1-A2D5C9616A7A@gmail.com>

On Tue, Sep 14, 2021 at 09:16:55AM +0300, Martin Zaharinov wrote:
> Hi Nault
>
> See these stats:
>
> Linux 5.14.2 (testb) 09/14/21 _x86_64_ (12 CPU)
>
> 11:33:44 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
> 11:33:45 all 1.75 0.00 18.85 0.00 0.00 5.00 0.00 0.00 0.00 74.40
> 11:33:46 all 1.74 0.00 17.88 0.00 0.00 4.72 0.00 0.00 0.00 75.66
> 11:33:47 all 2.23 0.00 17.62 0.00 0.00 5.05 0.00 0.00 0.00 75.10
> 11:33:48 all 1.82 0.00 13.64 0.00 0.00 5.70 0.00 0.00 0.00 78.84
> 11:33:49 all 1.50 0.00 13.46 0.00 0.00 5.15 0.00 0.00 0.00 79.90
> 11:33:50 all 3.06 0.00 13.96 0.00 0.00 4.79 0.00 0.00 0.00 78.20
> 11:33:51 all 1.40 0.00 16.53 0.00 0.00 5.21 0.00 0.00 0.00 76.86
> 11:33:52 all 4.43 0.00 19.44 0.00 0.00 6.56 0.00 0.00 0.00 69.57
> 11:33:53 all 1.51 0.00 16.40 0.00 0.00 4.77 0.00 0.00 0.00 77.32
> 11:33:54 all 1.51 0.00 16.55 0.00 0.00 4.71 0.00 0.00 0.00 77.23
> 11:33:55 all 1.00 0.00 13.21 0.00 0.00 5.90 0.00 0.00 0.00 79.90
> Average: all 2.00 0.00 16.14 0.00 0.00 5.23 0.00 0.00 0.00 76.63
>
>
> PerfTop: 28046 irqs/sec kernel:96.3% exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 12 CPUs)
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> 23.37% [nf_conntrack] [k] nf_ct_iterate_cleanup
> 17.76% [kernel] [k] mutex_spin_on_owner
> 9.47% [pppoe] [k] pppoe_rcv
> 7.71% [kernel] [k] osq_lock
> 2.77% [nf_nat] [k] inet_cmp
> 2.59% [nf_nat] [k] device_cmp
> 2.55% [kernel] [k] __local_bh_enable_ip
> 2.04% [kernel] [k] _raw_spin_lock
> 1.23% [kernel] [k] __cond_resched
> 1.16% [kernel] [k] rcu_all_qs
> 1.13% libfrr.so.0.0.0 [.] 0x00000000000ce970
> 0.79% [nf_conntrack] [k] nf_conntrack_lock
> 0.75% libfrr.so.0.0.0 [.] 0x00000000000ce94e
> 0.53% [kernel] [k] __netif_receive_skb_core.constprop.0
> 0.46% [kernel] [k] fib_table_lookup
> 0.46% [ip_tables] [k] ipt_do_table
> 0.45% [ixgbe] [k] ixgbe_clean_rx_irq
> 0.37% [kernel] [k] __dev_queue_xmit
> 0.34% [nf_conntrack] [k] __nf_conntrack_find_get.isra.0
> 0.33% [ixgbe] [k] ixgbe_clean_tx_irq
> 0.30% [kernel] [k] menu_select
> 0.25% [kernel] [k] vlan_do_receive
> 0.21% [kernel] [k] ip_finish_output2
> 0.21% [ixgbe] [k] ixgbe_poll
> 0.20% [kernel] [k] _raw_spin_lock_irqsave
> 0.19% [kernel] [k] get_rps_cpu
> 0.19% libc.so.6 [.] 0x0000000000186afa
> 0.19% [kernel] [k] queued_read_lock_slowpath
> 0.19% [kernel] [k] do_poll.constprop.0
> 0.19% [kernel] [k] cpuidle_enter_state
> 0.18% [kernel] [k] dev_hard_start_xmit
> 0.18% [kernel] [k] ___slab_alloc.constprop.0
> 0.17% zebra [.] 0x00000000000b9271
> 0.16% [kernel] [k] csum_partial_copy_generic
> 0.16% zebra [.] 0x00000000000b91f1
> 0.16% [kernel] [k] page_frag_free
> 0.16% [kernel] [k] kmem_cache_alloc
> 0.15% [kernel] [k] __skb_flow_dissect
> 0.15% [kernel] [k] sched_clock
> 0.15% libc.so.6 [.] 0x00000000000965a2
> 0.15% [kernel] [k] kmem_cache_free_bulk.part.0
> 0.15% [pppoe] [k] pppoe_flush_dev
> 0.15% [ixgbe] [k] ixgbe_tx_map
> 0.14% [kernel] [k] _raw_spin_lock_bh
> 0.14% [kernel] [k] fib_table_flush
> 0.14% [kernel] [k] native_irq_return_iret
> 0.14% [kernel] [k] __dev_xmit_skb
> 0.13% [kernel] [k] nf_hook_slow
> 0.13% [kernel] [k] fib_lookup_good_nhc
> 0.12% [kernel] [k] __fget_files
> 0.12% [kernel] [k] process_backlog
> 0.12% [xt_dtvqos] [k] 0x00000000000008d1
> 0.12% [kernel] [k] __list_del_entry_valid
> 0.12% [kernel] [k] skb_release_data
> 0.12% [kernel] [k] ip_route_input_slow
> 0.11% [kernel] [k] netif_skb_features
> 0.11% [kernel] [k] sock_poll
> 0.11% [kernel] [k] __schedule
> 0.11% [kernel] [k] __softirqentry_text_start
>
>
> While the problem is happening, running "ip a" to list the interfaces
> takes 15-20 seconds. I finally have a way to simulate it, but users get
> angry when their internet goes down.

Probably some contention on the rtnl lock.

> In any case, I need to know why the system is overloaded when ppp
> interfaces are deconfigured.

Does it help if you disable conntrack?
>
> Best regards,
> Martin
>
>
>
>
> > On 11 Sep 2021, at 9:26, Martin Zaharinov <micron10@gmail.com> wrote:
> >
> > Hi Guillaume
> >
> > The main problem is that the service gets overloaded when many ppp (customer) sessions terminate. In the last two days, anywhere from 40-50 to 100-200 users went down at a time, and when that happens, typing "ip a" waits 10-20 seconds before it starts listing interfaces.
> > How can I find where the problem is, whether it is locking or something else?
> > And is there an option to remove ppp interfaces from the kernel faster, to reduce this load?
> >
> >
> > Martin
> >
> >> On 7 Sep 2021, at 9:42, Martin Zaharinov <micron10@gmail.com> wrote:
> >>
> >> Perf top output as text:
> >>
> >>
> >> PerfTop: 28391 irqs/sec kernel:98.0% exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 12 CPUs)
> >> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >>
> >> 17.01% [nf_conntrack] [k] nf_ct_iterate_cleanup
> >> 9.73% [kernel] [k] mutex_spin_on_owner
> >> 9.07% [pppoe] [k] pppoe_rcv
> >> 2.77% [nf_nat] [k] device_cmp
> >> 1.66% [kernel] [k] osq_lock
> >> 1.65% [kernel] [k] _raw_spin_lock
> >> 1.61% [kernel] [k] __local_bh_enable_ip
> >> 1.35% [nf_nat] [k] inet_cmp
> >> 1.30% [kernel] [k] __netif_receive_skb_core.constprop.0
> >> 1.16% [kernel] [k] menu_select
> >> 0.99% [kernel] [k] cpuidle_enter_state
> >> 0.96% [ixgbe] [k] ixgbe_clean_rx_irq
> >> 0.86% [kernel] [k] __dev_queue_xmit
> >> 0.70% [kernel] [k] __cond_resched
> >> 0.69% [sch_cake] [k] cake_dequeue
> >> 0.67% [nf_tables] [k] nft_do_chain
> >> 0.63% [kernel] [k] rcu_all_qs
> >> 0.61% [kernel] [k] fib_table_lookup
> >> 0.57% [kernel] [k] __schedule
> >> 0.57% [kernel] [k] skb_release_data
> >> 0.54% [kernel] [k] sched_clock
> >> 0.54% [kernel] [k] __copy_skb_header
> >> 0.53% [kernel] [k] dev_queue_xmit_nit
> >> 0.53% [kernel] [k] _raw_spin_lock_irqsave
> >> 0.50% [kernel] [k] kmem_cache_free
> >> 0.48% libfrr.so.0.0.0 [.] 0x00000000000ce970
> >> 0.47% [ixgbe] [k] ixgbe_clean_tx_irq
> >> 0.45% [kernel] [k] timerqueue_add
> >> 0.45% [kernel] [k] lapic_next_deadline
> >> 0.45% [kernel] [k] csum_partial_copy_generic
> >> 0.44% [nf_flow_table] [k] nf_flow_offload_ip_hook
> >> 0.44% [kernel] [k] kmem_cache_alloc
> >> 0.44% [nf_conntrack] [k] nf_conntrack_lock
> >>
> >>> On 7 Sep 2021, at 9:16, Martin Zaharinov <micron10@gmail.com> wrote:
> >>>
> >>> Hi
> >>> Sorry for the delay, but it is not easy to catch the moment.
> >>>
> >>>
> >>> This is the output of mpstat 1:
> >>>
> >>> Linux 5.14.1 (demobng) 09/07/21 _x86_64_ (12 CPU)
> >>>
> >>> 11:12:16 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
> >>> 11:12:17 all 0.17 0.00 6.66 0.00 0.00 4.13 0.00 0.00 0.00 89.05
> >>> 11:12:18 all 0.25 0.00 8.36 0.00 0.00 4.88 0.00 0.00 0.00 86.51
> >>> 11:12:19 all 0.26 0.00 9.62 0.00 0.00 3.91 0.00 0.00 0.00 86.21
> >>> 11:12:20 all 0.85 0.00 6.00 0.00 0.00 4.31 0.00 0.00 0.00 88.84
> >>> 11:12:21 all 0.08 0.00 4.45 0.00 0.00 4.79 0.00 0.00 0.00 90.67
> >>> 11:12:22 all 0.17 0.00 9.50 0.00 0.00 4.58 0.00 0.00 0.00 85.75
> >>> 11:12:23 all 0.00 0.00 6.92 0.00 0.00 2.48 0.00 0.00 0.00 90.61
> >>> 11:12:24 all 0.17 0.00 5.45 0.00 0.00 4.27 0.00 0.00 0.00 90.11
> >>> 11:12:25 all 0.25 0.00 5.38 0.00 0.00 4.79 0.00 0.00 0.00 89.58
> >>> 11:12:26 all 0.60 0.00 1.45 0.00 0.00 2.65 0.00 0.00 0.00 95.30
> >>> 11:12:27 all 0.42 0.00 6.91 0.00 0.00 4.47 0.00 0.00 0.00 88.20
> >>> 11:12:28 all 0.00 0.00 6.75 0.00 0.00 4.18 0.00 0.00 0.00 89.07
> >>> 11:12:29 all 0.17 0.00 3.52 0.00 0.00 5.11 0.00 0.00 0.00 91.20
> >>> 11:12:30 all 1.45 0.00 10.14 0.00 0.00 3.49 0.00 0.00 0.00 84.92
> >>> 11:12:31 all 0.09 0.00 5.11 0.00 0.00 4.77 0.00 0.00 0.00 90.03
> >>> 11:12:32 all 0.25 0.00 3.11 0.00 0.00 4.46 0.00 0.00 0.00 92.17
> >>> Average: all 0.32 0.00 6.21 0.00 0.00 4.21 0.00 0.00 0.00 89.26
> >>>
> >>>
> >>> I also attached a screenshot from perf top (the screenshot was sent in the previous mail).
> >>>
> >>> And this is what I see in lsmod:
> >>>
> >>> pppoe 20480 8198
> >>> pppox 16384 1 pppoe
> >>> ppp_generic 45056 16364 pppox,pppoe
> >>> slhc 16384 1 ppp_generic
> >>>
> >>> Removing pppoe sessions is too slow.
> >>>
> >>> And from the log:
> >>>
> >>> [2021-09-07 11:01:11.129] vlan3020: ebdd1c5d8b5900f6: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
> >>> [2021-09-07 11:01:53.621] vlan643: ebdd1c5d8b59014e: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
> >>> [2021-09-07 11:02:00.359] vlan1616: ebdd1c5d8b590195: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
> >>> [2021-09-07 11:02:05.859] vlan3020: ebdd1c5d8b5900d8: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
> >>> [2021-09-07 11:02:08.258] vlan3005: ebdd1c5d8b590190: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
> >>> [2021-09-07 11:02:13.820] vlan643: ebdd1c5d8b590152: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
> >>> [2021-09-07 11:02:15.839] vlan727: ebdd1c5d8b590144: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
> >>> [2021-09-07 11:02:20.139] vlan1693: ebdd1c5d8b59019f: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
> >>>
> >>>> On 11 Aug 2021, at 19:48, Guillaume Nault <gnault@redhat.com> wrote:
> >>>>
> >>>> On Wed, Aug 11, 2021 at 02:10:32PM +0300, Martin Zaharinov wrote:
> >>>>> And one more thing that I see.
> >>>>>
> >>>>> The problem comes when accel starts terminating sessions.
> >>>>> The server now has 2k users; restarting 3 OLTs with 400 users on one of the vlans affects the other vlans.
> >>>>> The problem starts when the dead sessions from the vlan with the 3 OLTs begin to be destroyed, and this affects all other vlans.
> >>>>> Maybe the kernel destroys old sessions slowly and drags other users along by locking other sessions.
> >>>>> Is there a way to speed up the closing of stopped/dead sessions?
> >>>>
> >>>> What are the CPU stats when that happens? Is it user space or kernel
> >>>> space that keeps it busy?
> >>>>
> >>>> One easy way to check is to run "mpstat 1" for a few seconds when the
> >>>> problem occurs.
> >>>>
> >>>
> >>
> >
>
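For readers who want to summarise the perf top listings quoted in this message, here is a minimal Python sketch that adds up the sampled CPU share per object (kernel, module or library). It is not part of the original exchange. The helper name `share_by_object` and the regular expression are assumptions made for illustration, and the sample only copies the first six rows of the 5.14.2 listing.

```python
import re
from collections import defaultdict

# Matches perf top rows such as "23.37% [nf_conntrack] [k] nf_ct_iterate_cleanup",
# with or without leading "> " quote markers from the email.
ROW = re.compile(r"^[>\s]*(\d+(?:\.\d+)?)%\s+(\S+)\s+\[[k.]\]\s+(\S+)")

def share_by_object(lines):
    """Sum the sampled CPU share per object (hypothetical helper)."""
    totals = defaultdict(float)
    for line in lines:
        m = ROW.match(line)
        if m:
            pct, obj, _symbol = m.groups()
            totals[obj.strip("[]")] += float(pct)
    return dict(totals)

# First six rows of the perf top output quoted above (Linux 5.14.2).
sample = """\
> 23.37% [nf_conntrack] [k] nf_ct_iterate_cleanup
> 17.76% [kernel] [k] mutex_spin_on_owner
> 9.47% [pppoe] [k] pppoe_rcv
> 7.71% [kernel] [k] osq_lock
> 2.77% [nf_nat] [k] inet_cmp
> 2.59% [nf_nat] [k] device_cmp
""".splitlines()

totals = share_by_object(sample)
for obj, pct in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{obj:15s} {pct:6.2f}%")
```

On these six rows the two kernel symbols, mutex_spin_on_owner and osq_lock, add up to 25.47%, while nf_conntrack accounts for 23.37% and nf_nat for 5.36%. Both kernel symbols belong to the mutex slow path, which would be consistent with contention on a mutex such as the rtnl lock, although the listing alone does not show which mutex is contended.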