From: Patrick McHardy <kaber@trash.net>
To: David Miller <davem@davemloft.net>
Cc: joy@entuzijast.net, mchan@broadcom.com, billfink@mindspring.com,
bhutchings@solarflare.com, netdev@vger.kernel.org,
mirrors@debian.org, devik@cdi.cz
Subject: Re: bnx2_poll panicking kernel
Date: Fri, 11 Jul 2008 14:19:59 +0200 [thread overview]
Message-ID: <48774FEF.8050700@trash.net> (raw)
In-Reply-To: <20080711.025656.261409874.davem@davemloft.net>
David Miller wrote:
> From: Josip Rodin <joy@entuzijast.net>
> Date: Fri, 11 Jul 2008 11:24:16 +0200
>
> [ Patrick/Martin, you can simply skip to the final paragraph. ]
>
>> Here we go, it triggered, here are the first few, tell me if you need more:
>
> Thanks for the trace:
>
>> Jul 11 02:15:10 arrakis kernel: Splitting cloned skb
>> Jul 11 02:15:10 arrakis kernel: Pid: 0, comm: swapper Not tainted 2.6.25.6 #2
>> Jul 11 02:15:10 arrakis kernel:
>> Jul 11 02:15:10 arrakis kernel: Call Trace:
>> Jul 11 02:15:10 arrakis kernel: <IRQ> [<ffffffff803e5e75>] __alloc_skb+0x85/0x150
>> Jul 11 02:15:10 arrakis kernel: [<ffffffff803e66aa>] skb_split+0x4a/0x300
>> Jul 11 02:15:10 arrakis kernel: [<ffffffff8042700b>] tso_fragment+0xfb/0x180
>> Jul 11 02:15:10 arrakis kernel: [<ffffffff8042716e>] __tcp_push_pending_frames+0xde/0x860
>> Jul 11 02:15:10 arrakis kernel: [<ffffffff80424596>] tcp_rcv_established+0x596/0x9d0
>
> So it's splitting a frame up which should be new data, but for
> some reason made it to the device previously.
>
> The comment above tso_fragment() reads:
>
> /* Trim TSO SKB to LEN bytes, put the remaining data into a new packet
> * which is put after SKB on the list. It is very much like
> * tcp_fragment() except that it may make several kinds of assumptions
> * in order to speed up the splitting operation. In particular, we
> * know that all the data is in scatter-gather pages, and that the
> * packet has never been sent out before (and thus is not cloned).
> */
>
> Note in particular the final phrase inside the parens. :-)))
>
> There is only one way this situation seen in the trace can develop.
> That is if the queueing discipline gave the packet to the device, yet
> returned a value that made TCP believe the packet was not.
>
> When TCP sees such a return value, it does not advance the head of the
> write queue. It will retry to send that head packet again later. And
> that's what we seem to be seeing here.
>
> TCP treats any non-zero return value other than NET_XMIT_CN
> in this way (see tcp_transmit_skb and how it uses net_xmit_eval).
>
> I notice that HTB does a lot of very queer things wrt. return
> values.
>
> For example, it seems that if the class's leaf queue ->enqueue()
> returns any non-success value, it gives NET_XMIT_DROP back down to the
> call chain.
>
> But what if that leaf ->enqueue() is something that passes back
> NET_XMIT_CN? NET_XMIT_CN can be signalled for things like RED, in
> cases where some "other" packet in the same class got dropped but not
> necessarily the one you enqueued.
>
> NET_XMIT_CN means backoff, but it does not indicate that the specific
> packet being enqueued was dropped. It just means "some" packet from
> the same flow was dropped, and therefore there is congestion on this
> flow.
>
> Even more simpler qdiscs such as SFQ use the NET_XMIT_CN return value
> when it does a drop.
>
> So this return value munging being done by HTB creates the illegal
> situation.
>
> I'm not sure how to fix this, because I'm not sure how these
> NET_XMIT_CN situations should be handled wrt. maintaining a proper
> parent queue length value.
>
> Patrick/Martin, in HTB's ->enqueue() and ->requeue() we need to
> propagate NET_XMIT_CN to the caller if that's what the leaf qdisc
> signals to us. But the question is, should sch->q.qlen be
> incremented in that case? NET_XMIT_CN means that some packet got
> dropped, but not necessarily this one. If, for example, RED
> drops another packet already in the queue does it somehow adjust
> the parent sch->q.qlen back down? If not, it's pretty clear how
> this bug got created in the first place :)
Usually we only increment q.qlen on NET_XMIT_SUCCESS, in
all other cases it stays untouched.
> Below is my idiotic
> attempt to cure this, but this whole situation needs an audit:
Yes, this also reminded me of another related bug, when actions
steel a packet, qdiscs return NET_XMIT_SUCCESS, which causes upper
qdiscs to perform incorrect qlen adjustments.
I'll see if I can audit all these paths sometime this weekend.
> diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
> index 3fb58f4..aa20b47 100644
> --- a/net/sched/sch_htb.c
> +++ b/net/sched/sch_htb.c
> + ret = cl->un.leaf.q->enqueue(skb, cl->un.leaf.q);
> + if (ret == NET_XMIT_DROP) {
> + sch->qstats.drops++;
> + cl->qstats.drops++;
> + } else {
> + cl->bstats.packets +=
> + skb_is_gso(skb)?skb_shinfo(skb)->gso_segs:1;
> + cl->bstats.bytes += skb->len;
> + htb_activate(q, cl);
> + }
> }
The propagation of the leaf qdiscs return value is definitely
correct. The patch looks fine to me.
next prev parent reply other threads:[~2008-07-11 12:20 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-06-16 12:57 bnx2_poll panicking kernel Josip Rodin
2008-06-16 14:04 ` Ben Hutchings
2008-06-16 15:52 ` Michael Chan
2008-06-16 19:13 ` Josip Rodin
2008-06-16 21:38 ` Josip Rodin
2008-06-16 21:48 ` Josip Rodin
2008-06-16 23:45 ` Michael Chan
2008-06-17 22:37 ` Josip Rodin
2008-06-17 22:47 ` Michael Chan
2008-06-21 11:18 ` Josip Rodin
2008-06-21 15:34 ` Bill Fink
2008-06-21 16:11 ` Michael Chan
2008-06-23 18:04 ` Josip Rodin
2008-06-23 21:36 ` Josip Rodin
2008-06-23 22:48 ` Michael Chan
2008-06-24 22:58 ` Michael Chan
2008-06-25 0:04 ` David Miller
2008-06-26 11:01 ` Josip Rodin
2008-06-26 18:04 ` Michael Chan
2008-07-09 16:46 ` Josip Rodin
2008-07-09 16:57 ` Michael Chan
2008-07-09 23:46 ` David Miller
2008-07-10 9:45 ` Aviv Greenberg
2008-07-10 10:09 ` David Miller
2008-07-10 21:00 ` Michael Chan
2008-07-10 21:00 ` David Miller
2008-07-10 21:23 ` Josip Rodin
2008-07-10 21:38 ` Michael Chan
2008-07-10 22:00 ` Josip Rodin
2008-07-10 22:26 ` Michael Chan
2008-07-10 22:31 ` Josip Rodin
2008-07-10 23:20 ` David Miller
2008-07-11 9:24 ` Josip Rodin
2008-07-11 9:56 ` David Miller
2008-07-11 12:19 ` Patrick McHardy [this message]
2008-07-12 9:49 ` Jarek Poplawski
2008-07-12 13:21 ` Jarek Poplawski
2008-07-14 15:27 ` Patrick McHardy
2008-07-14 17:20 ` Jarek Poplawski
2008-07-14 17:25 ` Jarek Poplawski
2008-07-14 20:21 ` Josip Rodin
2008-07-14 21:22 ` Jarek Poplawski
2008-07-14 21:26 ` Josip Rodin
2008-07-14 21:48 ` Jarek Poplawski
2008-07-17 21:30 ` Josip Rodin
2008-07-17 21:44 ` David Miller
2008-07-18 5:12 ` Jarek Poplawski
2008-08-02 12:28 ` bad htb_{en,re}queue return codes causing corrupt data in drivers [was Re: bnx2_poll panicking kernel] Josip Rodin
2008-08-03 7:06 ` bad htb_{en,re}queue return codes causing corrupt data in drivers David Miller
2008-07-14 22:05 ` bnx2_poll panicking kernel Jarek Poplawski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=48774FEF.8050700@trash.net \
--to=kaber@trash.net \
--cc=bhutchings@solarflare.com \
--cc=billfink@mindspring.com \
--cc=davem@davemloft.net \
--cc=devik@cdi.cz \
--cc=joy@entuzijast.net \
--cc=mchan@broadcom.com \
--cc=mirrors@debian.org \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.