Date: Tue, 26 Feb 2019 01:23:05 -0800
From: Andrei Vagin
To: Eric Dumazet
Cc: "David S . Miller", netdev, Eric Dumazet, Soheil Hassas Yeganeh, Neal Cardwell, Yuchung Cheng, syzbot, Andrey Vagin
Subject: Re: [PATCH net] tcp: repaired skbs must init their tso_segs
Message-ID: <20190226092302.GA25925@gmail.com>
In-Reply-To: <20190223235151.168283-1-edumazet@google.com>
X-Mailing-List: netdev@vger.kernel.org

On Sat, Feb 23, 2019 at 03:51:51PM -0800, Eric Dumazet wrote:
> syzbot reported a WARN_ON(!tcp_skb_pcount(skb))
> in tcp_send_loss_probe() [1]
>
> This was caused by TCP_REPAIR sent skbs that inadvertenly
> were missing a call to tcp_init_tso_segs()
>
> [1]
> WARNING: CPU: 1 PID: 0 at net/ipv4/tcp_output.c:2534 tcp_send_loss_probe+0x771/0x8a0 net/ipv4/tcp_output.c:2534
> Kernel panic - not syncing: panic_on_warn set ...
> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-rc7+ #77
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>
> __dump_stack lib/dump_stack.c:77 [inline]
> dump_stack+0x172/0x1f0 lib/dump_stack.c:113
> panic+0x2cb/0x65c kernel/panic.c:214
> __warn.cold+0x20/0x45 kernel/panic.c:571
> report_bug+0x263/0x2b0 lib/bug.c:186
> fixup_bug arch/x86/kernel/traps.c:178 [inline]
> fixup_bug arch/x86/kernel/traps.c:173 [inline]
> do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
> do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290
> invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
> RIP: 0010:tcp_send_loss_probe+0x771/0x8a0 net/ipv4/tcp_output.c:2534
> Code: 88 fc ff ff 4c 89 ef e8 ed 75 c8 fb e9 c8 fc ff ff e8 43 76 c8 fb e9 63 fd ff ff e8 d9 75 c8 fb e9 94 f9 ff ff e8 bf 03 91 fb <0f> 0b e9 7d fa ff ff e8 b3 03 91 fb 0f b6 1d 37 43 7a 03 31 ff 89
> RSP: 0018:ffff8880ae907c60 EFLAGS: 00010206
> RAX: ffff8880a989c340 RBX: 0000000000000000 RCX: ffffffff85dedbdb
> RDX: 0000000000000100 RSI: ffffffff85dee0b1 RDI: 0000000000000005
> RBP: ffff8880ae907c90 R08: ffff8880a989c340 R09: ffffed10147d1ae1
> R10: ffffed10147d1ae0 R11: ffff8880a3e8d703 R12: ffff888091b90040
> R13: ffff8880a3e8d540 R14: 0000000000008000 R15: ffff888091b90860
> tcp_write_timer_handler+0x5c0/0x8a0 net/ipv4/tcp_timer.c:583
> tcp_write_timer+0x10e/0x1d0 net/ipv4/tcp_timer.c:607
> call_timer_fn+0x190/0x720 kernel/time/timer.c:1325
> expire_timers kernel/time/timer.c:1362 [inline]
> __run_timers kernel/time/timer.c:1681 [inline]
> __run_timers kernel/time/timer.c:1649 [inline]
> run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694
> __do_softirq+0x266/0x95a kernel/softirq.c:292
> invoke_softirq kernel/softirq.c:373 [inline]
> irq_exit+0x180/0x1d0 kernel/softirq.c:413
> exiting_irq arch/x86/include/asm/apic.h:536 [inline]
> smp_apic_timer_interrupt+0x14a/0x570 arch/x86/kernel/apic/apic.c:1062
> apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807
>
> RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58
> Code: ff ff ff 48 89 c7 48 89 45 d8 e8 59 0c a1 fa 48 8b 45 d8 e9 ce fe ff ff 48 89 df e8 48 0c a1 fa eb 82 90 90 90 90 90 90 fb f4 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
> RSP: 0018:ffff8880a98afd78 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
> RAX: 1ffffffff1125061 RBX: ffff8880a989c340 RCX: 0000000000000000
> RDX: dffffc0000000000 RSI: 0000000000000001 RDI: ffff8880a989cbbc
> RBP: ffff8880a98afda8 R08: ffff8880a989c340 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
> R13: ffffffff889282f8 R14: 0000000000000001 R15: 0000000000000000
> arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:555
> default_idle_call+0x36/0x90 kernel/sched/idle.c:93
> cpuidle_idle_call kernel/sched/idle.c:153 [inline]
> do_idle+0x386/0x570 kernel/sched/idle.c:262
> cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:353
> start_secondary+0x404/0x5c0 arch/x86/kernel/smpboot.c:271
> secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
>

Thank you, Eric. I saw a few test failures where tcp_peek_sndq()
returned more data than we expected. I have run the test in a loop
with this fix and it passes without any problems; without the fix,
it fails after a few iterations.

https://github.com/checkpoint-restore/criu/issues/622

> Fixes: 79861919b889 ("tcp: fix TCP_REPAIR xmit queue setup")
> Signed-off-by: Eric Dumazet
> Reported-by: syzbot
> Cc: Andrey Vagin
> Cc: Soheil Hassas Yeganeh
> Cc: Neal Cardwell
> ---
>  net/ipv4/tcp_output.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 730bc44dbad9363814705b28c2f91a2253d91207..ccc78f3a4b60d3012430488bdfbcfc5122ff8627 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2347,6 +2347,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
>  			/* "skb_mstamp_ns" is used as a start point for the retransmit timer */
>  			skb->skb_mstamp_ns = tp->tcp_wstamp_ns = tp->tcp_clock_cache;
>  			list_move_tail(&skb->tcp_tsorted_anchor, &tp->tsorted_sent_queue);
> +			tcp_init_tso_segs(skb, mss_now);
>  			goto repair; /* Skip network transmission */
>  		}
>
> --
> 2.21.0.rc0.258.g878e2cd30e-goog
>
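For context on why the one-line fix is enough: tcp_skb_pcount() reports how many MSS-sized segments an skb will go out as, and the WARN_ON(!tcp_skb_pcount(skb)) in tcp_send_loss_probe() fires when that count is still 0. Per the commit message, skbs sent through the TCP_REPAIR path skipped tcp_init_tso_segs(), so nothing ever computed the count. Below is a userspace sketch, not kernel code, of the counting that tcp_init_tso_segs() and its helper tcp_set_skb_tso_segs() perform around v5.0. `struct fake_skb`, `set_tso_segs()` and `init_tso_segs()` are hypothetical stand-ins; the real helpers keep these counters in the skb's TCP control block, and details may differ between kernel versions.

```c
#include <assert.h>

/* Hypothetical stand-in for the two per-skb TCP counters involved.
 * In the kernel (around v5.0) they live in TCP_SKB_CB(skb) as
 * tcp_gso_segs and tcp_gso_size; this is a userspace model only. */
struct fake_skb {
	unsigned int len;      /* payload bytes in the skb */
	unsigned int gso_segs; /* what tcp_skb_pcount() would report */
	unsigned int gso_size; /* MSS used for the last split, 0 if one segment */
};

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Modeled on tcp_set_skb_tso_segs(): a payload that fits in one MSS is
 * one segment; anything larger becomes ceil(len / mss_now) segments. */
static void set_tso_segs(struct fake_skb *skb, unsigned int mss_now)
{
	assert(mss_now > 0); /* guard the division below */

	if (skb->len <= mss_now) {
		skb->gso_segs = 1;
		skb->gso_size = 0;
	} else {
		skb->gso_segs = DIV_ROUND_UP(skb->len, mss_now);
		skb->gso_size = mss_now;
	}
}

/* Modeled on tcp_init_tso_segs(): recompute only when the count was never
 * set (0) or the skb was split for a different MSS than mss_now. */
static unsigned int init_tso_segs(struct fake_skb *skb, unsigned int mss_now)
{
	unsigned int tso_segs = skb->gso_segs;

	if (!tso_segs || (tso_segs > 1 && skb->gso_size != mss_now)) {
		set_tso_segs(skb, mss_now);
		tso_segs = skb->gso_segs;
	}
	return tso_segs;
}
```

With an MSS of 1448, a 4000-byte skb comes out as 3 segments (ceil(4000 / 1448)), while an skb that never passes through init_tso_segs() keeps gso_segs at 0, which is exactly the value the WARN_ON rejects.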