From: Patrick McHardy <kaber@trash.net>
To: David Miller <davem@davemloft.net>
Cc: joy@entuzijast.net, mchan@broadcom.com, billfink@mindspring.com,
bhutchings@solarflare.com, netdev@vger.kernel.org,
mirrors@debian.org, devik@cdi.cz
Subject: Re: bnx2_poll panicking kernel
Date: Fri, 11 Jul 2008 14:19:59 +0200 [thread overview]
Message-ID: <48774FEF.8050700@trash.net> (raw)
In-Reply-To: <20080711.025656.261409874.davem@davemloft.net>
David Miller wrote:
> From: Josip Rodin <joy@entuzijast.net>
> Date: Fri, 11 Jul 2008 11:24:16 +0200
>
> [ Patrick/Martin, you can simply skip to the final paragraph. ]
>
>> Here we go, it triggered, here are the first few, tell me if you need more:
>
> Thanks for the trace:
>
>> Jul 11 02:15:10 arrakis kernel: Splitting cloned skb
>> Jul 11 02:15:10 arrakis kernel: Pid: 0, comm: swapper Not tainted 2.6.25.6 #2
>> Jul 11 02:15:10 arrakis kernel:
>> Jul 11 02:15:10 arrakis kernel: Call Trace:
>> Jul 11 02:15:10 arrakis kernel: <IRQ> [<ffffffff803e5e75>] __alloc_skb+0x85/0x150
>> Jul 11 02:15:10 arrakis kernel: [<ffffffff803e66aa>] skb_split+0x4a/0x300
>> Jul 11 02:15:10 arrakis kernel: [<ffffffff8042700b>] tso_fragment+0xfb/0x180
>> Jul 11 02:15:10 arrakis kernel: [<ffffffff8042716e>] __tcp_push_pending_frames+0xde/0x860
>> Jul 11 02:15:10 arrakis kernel: [<ffffffff80424596>] tcp_rcv_established+0x596/0x9d0
>
> So it's splitting a frame up which should be new data, but for
> some reason made it to the device previously.
>
> The comment above tso_fragment() reads:
>
> /* Trim TSO SKB to LEN bytes, put the remaining data into a new packet
> * which is put after SKB on the list. It is very much like
> * tcp_fragment() except that it may make several kinds of assumptions
> * in order to speed up the splitting operation. In particular, we
> * know that all the data is in scatter-gather pages, and that the
> * packet has never been sent out before (and thus is not cloned).
> */
>
> Note in particular the final phrase inside the parens. :-)))
>
> There is only one way this situation seen in the trace can develop.
> That is if the queueing discipline gave the packet to the device, yet
> returned a value that made TCP believe the packet was not.
>
> When TCP sees such a return value, it does not advance the head of the
> write queue. It will retry to send that head packet again later. And
> that's what we seem to be seeing here.
>
> TCP treats any non-zero return value other than NET_XMIT_CN
> in this way (see tcp_transmit_skb and how it uses net_xmit_eval).
>
> I notice that HTB does a lot of very queer things wrt. return
> values.
>
> For example, it seems that if the class's leaf queue ->enqueue()
> returns any non-success value, it gives NET_XMIT_DROP back down to the
> call chain.
>
> But what if that leaf ->enqueue() is something that passes back
> NET_XMIT_CN? NET_XMIT_CN can be signalled for things like RED, in
> cases where some "other" packet in the same class got dropped but not
> necessarily the one you enqueued.
>
> NET_XMIT_CN means backoff, but it does not indicate that the specific
> packet being enqueued was dropped. It just means "some" packet from
> the same flow was dropped, and therefore there is congestion on this
> flow.
>
> Even more simpler qdiscs such as SFQ use the NET_XMIT_CN return value
> when it does a drop.
>
> So this return value munging being done by HTB creates the illegal
> situation.
>
> I'm not sure how to fix this, because I'm not sure how these
> NET_XMIT_CN situations should be handled wrt. maintaining a proper
> parent queue length value.
>
> Patrick/Martin, in HTB's ->enqueue() and ->requeue() we need to
> propagate NET_XMIT_CN to the caller if that's what the leaf qdisc
> signals to us. But the question is, should sch->q.qlen be
> incremented in that case? NET_XMIT_CN means that some packet got
> dropped, but not necessarily this one. If, for example, RED
> drops another packet already in the queue does it somehow adjust
> the parent sch->q.qlen back down? If not, it's pretty clear how
> this bug got created in the first place :)
Usually we only increment q.qlen on NET_XMIT_SUCCESS, in
all other cases it stays untouched.
> Below is my idiotic
> attempt to cure this, but this whole situation needs an audit:
Yes, this also reminded me of another related bug, when actions
steel a packet, qdiscs return NET_XMIT_SUCCESS, which causes upper
qdiscs to perform incorrect qlen adjustments.
I'll see if I can audit all these paths sometime this weekend.
> diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
> index 3fb58f4..aa20b47 100644
> --- a/net/sched/sch_htb.c
> +++ b/net/sched/sch_htb.c
> + ret = cl->un.leaf.q->enqueue(skb, cl->un.leaf.q);
> + if (ret == NET_XMIT_DROP) {
> + sch->qstats.drops++;
> + cl->qstats.drops++;
> + } else {
> + cl->bstats.packets +=
> + skb_is_gso(skb)?skb_shinfo(skb)->gso_segs:1;
> + cl->bstats.bytes += skb->len;
> + htb_activate(q, cl);
> + }
> }
The propagation of the leaf qdiscs return value is definitely
correct. The patch looks fine to me.
next prev parent reply other threads:[~2008-07-11 12:20 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-06-16 12:57 bnx2_poll panicking kernel Josip Rodin
2008-06-16 14:04 ` Ben Hutchings
2008-06-16 15:52 ` Michael Chan
2008-06-16 19:13 ` Josip Rodin
2008-06-16 21:38 ` Josip Rodin
2008-06-16 21:48 ` Josip Rodin
2008-06-16 23:45 ` Michael Chan
2008-06-17 22:37 ` Josip Rodin
2008-06-17 22:47 ` Michael Chan
2008-06-21 11:18 ` Josip Rodin
2008-06-21 15:34 ` Bill Fink
2008-06-21 16:11 ` Michael Chan
2008-06-23 18:04 ` Josip Rodin
2008-06-23 21:36 ` Josip Rodin
2008-06-23 22:48 ` Michael Chan
2008-06-24 22:58 ` Michael Chan
2008-06-25 0:04 ` David Miller
2008-06-26 11:01 ` Josip Rodin
2008-06-26 18:04 ` Michael Chan
2008-07-09 16:46 ` Josip Rodin
2008-07-09 16:57 ` Michael Chan
2008-07-09 23:46 ` David Miller
2008-07-10 9:45 ` Aviv Greenberg
2008-07-10 10:09 ` David Miller
2008-07-10 21:00 ` Michael Chan
2008-07-10 21:00 ` David Miller
2008-07-10 21:23 ` Josip Rodin
2008-07-10 21:38 ` Michael Chan
2008-07-10 22:00 ` Josip Rodin
2008-07-10 22:26 ` Michael Chan
2008-07-10 22:31 ` Josip Rodin
2008-07-10 23:20 ` David Miller
2008-07-11 9:24 ` Josip Rodin
2008-07-11 9:56 ` David Miller
2008-07-11 12:19 ` Patrick McHardy [this message]
2008-07-12 9:49 ` Jarek Poplawski
2008-07-12 13:21 ` Jarek Poplawski
2008-07-14 15:27 ` Patrick McHardy
2008-07-14 17:20 ` Jarek Poplawski
2008-07-14 17:25 ` Jarek Poplawski
2008-07-14 20:21 ` Josip Rodin
2008-07-14 21:22 ` Jarek Poplawski
2008-07-14 21:26 ` Josip Rodin
2008-07-14 21:48 ` Jarek Poplawski
2008-07-17 21:30 ` Josip Rodin
2008-07-17 21:44 ` David Miller
2008-07-18 5:12 ` Jarek Poplawski
2008-08-02 12:28 ` bad htb_{en,re}queue return codes causing corrupt data in drivers [was Re: bnx2_poll panicking kernel] Josip Rodin
2008-08-03 7:06 ` bad htb_{en,re}queue return codes causing corrupt data in drivers David Miller
2008-07-14 22:05 ` bnx2_poll panicking kernel Jarek Poplawski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=48774FEF.8050700@trash.net \
--to=kaber@trash.net \
--cc=bhutchings@solarflare.com \
--cc=billfink@mindspring.com \
--cc=davem@davemloft.net \
--cc=devik@cdi.cz \
--cc=joy@entuzijast.net \
--cc=mchan@broadcom.com \
--cc=mirrors@debian.org \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).