From: Eric Dumazet <eric.dumazet@gmail.com>
To: "Li,Rongqing" <lirongqing@baidu.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Cc: "edumazet@google.com" <edumazet@google.com>
Subject: Re: Reply: [PATCH] net: clean the sk_frag.page of new cloned socket
Date: Thu, 25 Jan 2018 19:03:22 -0800
Message-ID: <1516935802.3715.54.camel@gmail.com>
In-Reply-To: <2AD939572F25A448A3AE3CAEA61328C23694645C@BC-MAIL-MBX12.internal.baidu.com>

On Fri, 2018-01-26 at 02:09 +0000, Li,Rongqing wrote:
> > >  		if (newsk->sk_prot->sockets_allocated)
> > >  			sk_sockets_allocated_inc(newsk);
> > 
> > Good catch.
> > 
> > I suspect this was discovered by some syzkaller/syzbot run ?
> > 
> 
> 
> No.
> 
> I am seeing a panic: a page is in both task.task_frag.page and the buddy free list.
> That should not happen. page->lru.next and page->lru.prev hold the values
> 0xdead000000100100 and 0xdead000000200200, so when a page is allocated from
> buddy, the system panics at __list_del in __rmqueue.
> 
>  #0 [ffff881fff0c3850] machine_kexec at ffffffff8103cca8
>  #1 [ffff881fff0c38a0] crash_kexec at ffffffff810c2443
>  #2 [ffff881fff0c3968] oops_end at ffffffff816cae70
>  #3 [ffff881fff0c3990] die at ffffffff810063eb
>  #4 [ffff881fff0c39c0] do_general_protection at ffffffff816ca7ce
>  #5 [ffff881fff0c39f0] general_protection at ffffffff816ca0d8
>     [exception RIP: __rmqueue+120]
>     RIP: ffffffff8113a918  RSP: ffff881fff0c3aa0  RFLAGS: 00010046
>     RAX: ffff88207ffd8018  RBX: 0000000000000003  RCX: 0000000000000003
>     RDX: 0000000000000001  RSI: ffffea006f4cf620  RDI: dead000000200200
>     RBP: ffff881fff0c3b00   R8: ffff88207ffd8018   R9: 0000000000000000
>     R10: dead000000100100  R11: ffffea007ecc6480  R12: ffffea006f4cf600
>     R13: 0000000000000000  R14: 0000000000000003  R15: ffff88207ffd7e80
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
>  #6 [ffff881fff0c3b08] get_page_from_freelist at ffffffff8113ce71
>  #7 [ffff881fff0c3be0] __alloc_pages_nodemask at ffffffff8113d15f
>  #8 [ffff881fff0c3d10] __alloc_page_frag at ffffffff815e2362
>  #9 [ffff881fff0c3d40] __netdev_alloc_frag at ffffffff815e241b
> #10 [ffff881fff0c3d58] __alloc_rx_skb at ffffffff815e2f91
> #11 [ffff881fff0c3d78] __netdev_alloc_skb at ffffffff815e300b
> #12 [ffff881fff0c3d90] ixgbe_clean_rx_irq at ffffffffa003a98f [ixgbe]
> #13 [ffff881fff0c3df8] ixgbe_poll at ffffffffa003c233 [ixgbe]
> #14 [ffff881fff0c3e70] net_rx_action at ffffffff815f2f09
> #15 [ffff881fff0c3ec8] __do_softirq at ffffffff81064867
> #16 [ffff881fff0c3f38] call_softirq at ffffffff816d3a9c
> #17 [ffff881fff0c3f50] do_softirq at ffffffff81004e65
> #18 [ffff881fff0c3f68] irq_exit at ffffffff81064b7d
> #19 [ffff881fff0c3f78] do_IRQ at ffffffff816d4428
> 
> The page info is shown below; some fields are omitted:
> 
> crash> struct page ffffea006f4cf600 -x
> struct page {
>   flags = 0x2fffff00004000, 
>   mapping = 0x0, 
>   {
>     {
>       counters = 0x2ffffffff, 
>       {
>         {
>           _mapcount = {
>             counter = 0xffffffff
>           }, 
>           {
>             inuse = 0xffff, 
>             objects = 0x7fff, 
>             frozen = 0x1
>           }, 
>           units = 0xffffffff
>         }, 
>         _count = {
>           counter = 0x2
>         }
>       }
>     }
>   }, 
>   {
>     lru = {
>       next = 0xdead000000100100, 
>       prev = 0xdead000000200200
>     }, 
>   }, 
> …..
>   }
> }
> crash>
> 
> 
> The page ffffea006f4cf600 is also the task_frag.page of another task;
> that task's backtrace is shown below.
> 
> crash> task 8683|grep ffffea006f4cf600 -A3  
>     page = 0xffffea006f4cf600, 
>     offset = 32768, 
>     size = 32768
>   }, 
> crash>
> 
> crash> bt 8683
> PID: 8683   TASK: ffff881faa088000  CPU: 10  COMMAND: "mynode"
>  #0 [ffff881fff145e78] crash_nmi_callback at ffffffff81031712
>  #1 [ffff881fff145e88] nmi_handle at ffffffff816cafe9
>  #2 [ffff881fff145ec8] do_nmi at ffffffff816cb0f0
>  #3 [ffff881fff145ef0] end_repeat_nmi at ffffffff816ca4a1
>     [exception RIP: _raw_spin_lock_irqsave+62]
>     RIP: ffffffff816c9a9e  RSP: ffff881fa992b990  RFLAGS: 00000002
>     RAX: 0000000000004358  RBX: ffff88207ffd7e80  RCX: 0000000000004358
>     RDX: 0000000000004356  RSI: 0000000000000246  RDI: ffff88207ffd7ee8
>     RBP: ffff881fa992b990   R8: 0000000000000000   R9: 00000000019a16e6
>     R10: 0000000000004d24  R11: 0000000000004000  R12: 0000000000000242
>     R13: 0000000000004d24  R14: 0000000000000001  R15: 0000000000000000
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> --- <NMI exception stack> ---
>  #4 [ffff881fa992b990] _raw_spin_lock_irqsave at ffffffff816c9a9e
>  #5 [ffff881fa992b998] get_page_from_freelist at ffffffff8113ce5f
>  #6 [ffff881fa992ba70] __alloc_pages_nodemask at ffffffff8113d15f
>  #7 [ffff881fa992bba0] alloc_pages_current at ffffffff8117ab29
>  #8 [ffff881fa992bbe8] sk_page_frag_refill at ffffffff815dd310
>  #9 [ffff881fa992bc18] tcp_sendmsg at ffffffff8163e4f3
> #10 [ffff881fa992bcd8] inet_sendmsg at ffffffff81668434
> #11 [ffff881fa992bd08] sock_sendmsg at ffffffff815d9719
> #12 [ffff881fa992be58] SYSC_sendto at ffffffff815d9c81
> #13 [ffff881fa992bf70] sys_sendto at ffffffff815da6ae
> #14 [ffff881fa992bf80] system_call_fastpath at ffffffff816d2189
>     RIP: 00007f5bfe1d804b  RSP: 00007f5bfa63b3b0  RFLAGS: 00000206
>     RAX: 000000000000002c  RBX: ffffffff816d2189  RCX: 00007f5bfa63b420
>     RDX: 0000000000002000  RSI: 000000000c096000  RDI: 0000000000000040
>     RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
>     R10: 0000000000000000  R11: 0000000000000246  R12: ffffffff815da6ae
>     R13: ffff881fa992bf78  R14: 000000000000a552  R15: 0000000000000016
>     ORIG_RAX: 000000000000002c  CS: 0033  SS: 002b
> crash>
> 
> 
> My kernel is 3.10. I have not found the root cause; I am guessing at all kinds of possibilities.
> 

Have you backported 22a0e18eac7a9e986fec76c60fa4a2926d1291e2 ?
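
For readers following the dump: the two 0xdead... values in page->lru look
like list poison, which is what a list_del() leaves behind in an entry it has
already unlinked. The userspace sketch below is an illustration only, not
kernel code. The struct, the helper names and the POISON_* macros are
stand-ins, and the constants are simply copied from the dump above. It shows
why such an entry must not be unlinked a second time: the unlink would write
through both pointers, and on x86_64 these non-canonical addresses raise a
general protection fault, which would be consistent with the first backtrace.

```c
#include <assert.h>
#include <stdint.h>

struct list_head { struct list_head *next, *prev; };

/* Values copied from the crash dump; assumed to be this build's list poison. */
#define POISON_NEXT ((struct list_head *)(uintptr_t)0xdead000000100100ULL)
#define POISON_PREV ((struct list_head *)(uintptr_t)0xdead000000200200ULL)

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert n right after head. */
static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink e, then poison it so a repeated unlink is detectable. */
static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = POISON_NEXT;
	e->prev = POISON_PREV;
}

static int list_entry_poisoned(const struct list_head *e)
{
	return e->next == POISON_NEXT || e->prev == POISON_PREV;
}

/*
 * Returns 1 when, after a single list_del(), the entry is poisoned and the
 * list is empty again. A second list_del() on that entry would dereference
 * the poison pointers, so this sketch checks for it instead of doing it.
 */
static int second_del_would_fault(void)
{
	struct list_head free_list, lru;

	list_init(&free_list);
	list_add(&lru, &free_list);
	list_del(&lru);

	return list_entry_poisoned(&lru) && free_list.next == &free_list;
}
```

In this model second_del_would_fault() reports the state that the dump seems
to show for page ffffea006f4cf600: an lru entry that was already unlinked once
and is being unlinked again.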


> > I would rather move that into tcp_disconnect(), which only fuzzers use, instead of
> > doing this on every clone and slowing down normal users.
> > 
> 
> 
> Do you mean we should fix it as below?
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index f08eebe60446..44f8320610ab 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2431,6 +2431,12 @@ int tcp_disconnect(struct sock *sk, int flags)
>  
>         WARN_ON(inet->inet_num && !icsk->icsk_bind_hash);
>  
> +
> +       if (sk->sk_frag.page) {
> +               put_page(sk->sk_frag.page);
> +               sk->sk_frag.page = NULL;
> +       }
> +
>         sk->sk_error_report(sk);
>         return err;
>  }

Yes, something like that.
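
To make the reasoning concrete, here is a minimal userspace model of how a
socket that still holds sk_frag.page could pass the same page to a clone
without an extra reference. It is a sketch under stated assumptions, not
kernel code: struct page, put_page(), clone_sock() and parent_left_dangling()
are hypothetical stand-ins, and the real clone path and page refcounting are
more involved. release_frag() mirrors the hunk in the diff above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-ins for illustration only; not the kernel definitions. */
struct page { int refcount; };
struct page_frag { struct page *page; };
struct sock { struct page_frag sk_frag; };

/* Drop one reference; at zero the real kernel would free the page. */
static void put_page(struct page *p)
{
	p->refcount--;
}

/* Same shape as the hunk proposed for tcp_disconnect() in the quoted diff. */
static void release_frag(struct sock *sk)
{
	if (sk->sk_frag.page) {
		put_page(sk->sk_frag.page);
		sk->sk_frag.page = NULL;
	}
}

/* Assumed clone behaviour: a plain copy that takes no extra page reference. */
static void clone_sock(struct sock *child, const struct sock *parent)
{
	memcpy(child, parent, sizeof(*child));
}

/*
 * Returns 1 if the parent ends up pointing at a page whose refcount has
 * already reached zero (freed but still referenced), 0 otherwise.
 */
static int parent_left_dangling(int fix_in_disconnect)
{
	struct page pg = { .refcount = 1 };   /* reference taken by a send */
	struct sock parent = { .sk_frag = { .page = &pg } };
	struct sock child;

	if (fix_in_disconnect)
		release_frag(&parent);        /* disconnect drops the page */

	clone_sock(&child, &parent);          /* socket reused, then cloned */
	release_frag(&child);                 /* child socket is destroyed */

	return parent.sk_frag.page != NULL &&
	       parent.sk_frag.page->refcount <= 0;
}
```

In this model parent_left_dangling(0) returns 1 and parent_left_dangling(1)
returns 0. Releasing the page once in the disconnect path avoids touching
every clone, which is the trade-off discussed above.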

Thread overview: 8+ messages
2018-01-25 12:08 [PATCH] net: clean the sk_frag.page of new cloned socket Li RongQing
2018-01-25 12:44 ` Eric Dumazet
2018-01-26  2:09   ` Reply: " Li,Rongqing
2018-01-26  3:03     ` Eric Dumazet [this message]
2018-01-26  3:21       ` Reply: " Li,Rongqing
2018-01-26  3:14     ` Eric Dumazet
2018-01-26  3:23       ` Reply: " Li,Rongqing
2018-01-26  5:16       ` Cong Wang
