From: Oleksandr Natalenko <oleksandr@natalenko.name>
To: Yuchung Cheng <ycheng@google.com>
Cc: netdev <netdev@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Patrick McHardy <kaber@trash.net>,
Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
James Morris <jmorris@namei.org>,
Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
"David S. Miller" <davem@davemloft.net>
Subject: Re: [REGRESSION] tcp/ipv4: kernel panic because of (possible) division by zero
Date: Sun, 10 Jan 2016 12:23:43 +0200
Message-ID: <8830013.bJO8vnIv9N@spock>
In-Reply-To: <CAK6E8=chikkT02ygDHzHN9xP7q7kzWUqHEVgj7g0oW_RUPcuoA@mail.gmail.com>
With the patch queued for upstream and ECN enabled I get WARN_ON_ONCE()
triggered. Here is the stacktrace:
https://gist.github.com/89203e77bfcb051f269a
It seems that tp->prior_cwnd is zero.
Ideas?
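
For reference, here is a minimal user-space sketch of why a zero
tp->prior_cwnd matters, assuming the proportional branch of PRR divides
by prior_cwnd as discussed in this thread. The names prr_state and
prr_sndcnt are hypothetical and heavily simplified; this is not the
kernel code, only an illustration of the arithmetic and of a guard.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the tcp_sock fields involved. */
struct prr_state {
	uint32_t snd_ssthresh;   /* slow start threshold, in packets */
	uint32_t prior_cwnd;     /* cwnd saved when the reduction started */
	uint32_t prr_delivered;  /* packets delivered during the reduction */
	uint32_t prr_out;        /* packets sent during the reduction */
};

/*
 * Proportional part of the reduction (assumed form):
 *   sndcnt = ceil(snd_ssthresh * prr_delivered / prior_cwnd) - prr_out
 *
 * Returns 0 and stores the result in *sndcnt, or returns -1 without
 * touching *sndcnt when prior_cwnd is zero, which is the case that
 * would otherwise divide by zero.
 */
static int prr_sndcnt(const struct prr_state *st, int *sndcnt)
{
	uint64_t dividend;

	if (st->prior_cwnd == 0)
		return -1;

	/* Adding prior_cwnd - 1 before dividing rounds the quotient up. */
	dividend = (uint64_t)st->snd_ssthresh * st->prr_delivered +
		   st->prior_cwnd - 1;
	*sndcnt = (int)(dividend / st->prior_cwnd) - (int)st->prr_out;
	return 0;
}
```

With snd_ssthresh = 5, prr_delivered = 3, prior_cwnd = 10 and
prr_out = 1, the rounded-up quotient is 2 and sndcnt comes out as 1.
With prior_cwnd = 0 the helper refuses to compute anything, which is
roughly the condition the warning above reports.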
On Wednesday, 6 January 2016 10:43:45 EET Yuchung Cheng wrote:
> On Wed, Jan 6, 2016 at 10:19 AM, Yuchung Cheng <ycheng@google.com> wrote:
> > On Wed, Jan 6, 2016 at 8:50 AM, Oleksandr Natalenko <oleksandr@natalenko.name> wrote:
> >> Unfortunately, the patch didn't help -- I've got the same stacktrace with
> >> a slightly different offset (+3) within the function.
> >>
> >> Now trying to get full stacktrace via netconsole. Need more time.
> >>
> >> Meanwhile, any other ideas on what went wrong?
> >
> > That's odd b/c the patch already checks and avoids div0. Can you post me
> > the stacktrace and kernel warnings if any ...
> >
> > One possibility is that tcp_cwnd_reduction() may set a cwnd of 0,
> > which then gets used to start another recovery phase. This may or may
> > not be the culprit of this div0 issue because I wasn't able to
> > reproduce exactly your issue on our servers. But I will post the fix
> > today and CC you.
>
> Could you turn off ecn (sysctl net.ipv4.tcp_ecn=0) to see if this still
> happens?
> >> On December 22, 2015 4:10:32 AM EET, Yuchung Cheng <ycheng@google.com> wrote:
> >> >On Mon, Dec 21, 2015 at 12:25 PM, Oleksandr Natalenko <oleksandr@natalenko.name> wrote:
> >> >> Commit 3759824da87b30ce7a35b4873b62b0ba38905ef5 (tcp: PRR uses CRB mode by
> >> >> default and SS mode conditionally) introduced changes to net/ipv4/tcp_input.c
> >> >> tcp_cwnd_reduction() that, possibly, cause division by zero, and therefore,
> >> >> kernel panic in interrupt handler [1].
> >> >>
> >> >> Reverting 3759824da87b30ce7a35b4873b62b0ba38905ef5 seems to fix the issue.
> >> >>
> >> >> I'm able to reproduce the issue on 4.3.0–4.3.3 once every several days
> >> >> (occasionally).
> >> >>
> >> >> What could be done to help in debugging this issue?
> >> >
> >> >Do you have ECN enabled (i.e. sysctl net.ipv4.tcp_ecn > 0)?
> >> >
> >> >If so I suspect an ACK carrying ECE during CA_Loss causes entering CWR
> >> >state w/o calling tcp_init_cwnd_reduction() to set tp->prior_cwnd. Can
> >> >you try this debug / quick-fix patch and send me the error message if
> >> >any?
> >> >
> >> >> Regards,
> >> >>
> >> >> Oleksandr.
> >> >>
> >> >> [1] http://i.piccy.info/i9/6f5cb187c4ff282d189f78c63f95af43/1450729403/283985/951663/panic.jpg
> >>
> >> --
> >> Sent from my Android device with K-9 Mail. Please excuse my brevity.
Thread overview: 19+ messages
2015-12-21 20:25 [REGRESSION] tcp/ipv4: kernel panic because of (possible) division by zero Oleksandr Natalenko
2015-12-22 2:10 ` Yuchung Cheng
2015-12-22 20:13 ` Oleksandr Natalenko
2016-01-06 16:50 ` Oleksandr Natalenko
2016-01-06 18:19 ` Yuchung Cheng
2016-01-06 18:43 ` Yuchung Cheng
2016-01-06 18:49 ` Oleksandr Natalenko
2016-01-09 17:34 ` Oleksandr Natalenko
2016-01-10 10:23 ` Oleksandr Natalenko [this message]
2016-01-10 14:48 ` Neal Cardwell
2016-01-10 14:54 ` Neal Cardwell
2016-01-10 14:57 ` Oleksandr Natalenko
2016-01-10 14:57 ` Oleksandr Natalenko
2016-01-10 17:29 ` Neal Cardwell
2016-01-10 17:50 ` Oleksandr Natalenko
2016-01-10 18:00 ` Neal Cardwell
2016-01-10 21:56 ` Oleksandr Natalenko
2016-01-11 18:47 ` Neal Cardwell
2016-01-11 23:26 ` Oleksandr Natalenko