netdev.vger.kernel.org archive mirror
From: Patrick McHardy <kaber@trash.net>
To: David Miller <davem@davemloft.net>
Cc: joy@entuzijast.net, mchan@broadcom.com, billfink@mindspring.com,
	bhutchings@solarflare.com, netdev@vger.kernel.org,
	mirrors@debian.org, devik@cdi.cz
Subject: Re: bnx2_poll panicking kernel
Date: Fri, 11 Jul 2008 14:19:59 +0200
Message-ID: <48774FEF.8050700@trash.net>
In-Reply-To: <20080711.025656.261409874.davem@davemloft.net>

David Miller wrote:
> From: Josip Rodin <joy@entuzijast.net>
> Date: Fri, 11 Jul 2008 11:24:16 +0200
> 
> [ Patrick/Martin, you can simply skip to the final paragraph. ]
> 
>> Here we go, it triggered, here are the first few, tell me if you need more:
> 
> Thanks for the trace:
> 
>> Jul 11 02:15:10 arrakis kernel: Splitting cloned skb
>> Jul 11 02:15:10 arrakis kernel: Pid: 0, comm: swapper Not tainted 2.6.25.6 #2
>> Jul 11 02:15:10 arrakis kernel: 
>> Jul 11 02:15:10 arrakis kernel: Call Trace:
>> Jul 11 02:15:10 arrakis kernel:  <IRQ>  [<ffffffff803e5e75>] __alloc_skb+0x85/0x150
>> Jul 11 02:15:10 arrakis kernel:  [<ffffffff803e66aa>] skb_split+0x4a/0x300
>> Jul 11 02:15:10 arrakis kernel:  [<ffffffff8042700b>] tso_fragment+0xfb/0x180
>> Jul 11 02:15:10 arrakis kernel:  [<ffffffff8042716e>] __tcp_push_pending_frames+0xde/0x860
>> Jul 11 02:15:10 arrakis kernel:  [<ffffffff80424596>] tcp_rcv_established+0x596/0x9d0
> 
> So it's splitting a frame up which should be new data, but for
> some reason made it to the device previously.
> 
> The comment above tso_fragment() reads:
> 
> /* Trim TSO SKB to LEN bytes, put the remaining data into a new packet
>  * which is put after SKB on the list.  It is very much like
>  * tcp_fragment() except that it may make several kinds of assumptions
>  * in order to speed up the splitting operation.  In particular, we
>  * know that all the data is in scatter-gather pages, and that the
>  * packet has never been sent out before (and thus is not cloned).
>  */
> 
> Note in particular the final phrase inside the parens. :-)))
> 
> There is only one way this situation seen in the trace can develop.
> That is if the queueing discipline gave the packet to the device, yet
> returned a value that made TCP believe the packet was not.
> 
> When TCP sees such a return value, it does not advance the head of the
> write queue.  It will retry to send that head packet again later.  And
> that's what we seem to be seeing here.
> 
> TCP treats any non-zero return value other than NET_XMIT_CN
> in this way (see tcp_transmit_skb and how it uses net_xmit_eval).
> 
> I notice that HTB does a lot of very queer things wrt. return
> values.
> 
> For example, it seems that if the class's leaf queue ->enqueue()
> returns any non-success value, it gives NET_XMIT_DROP back down to the
> call chain.
> 
> But what if that leaf ->enqueue() is something that passes back
> NET_XMIT_CN?  NET_XMIT_CN can be signalled for things like RED, in
> cases where some "other" packet in the same class got dropped but not
> necessarily the one you enqueued.
> 
> NET_XMIT_CN means backoff, but it does not indicate that the specific
> packet being enqueued was dropped.  It just means "some" packet from
> the same flow was dropped, and therefore there is congestion on this
> flow.
> 
> Even more simpler qdiscs such as SFQ use the NET_XMIT_CN return value
> when it does a drop.
> 
> So this return value munging being done by HTB creates the illegal
> situation.
> 
> I'm not sure how to fix this, because I'm not sure how these
> NET_XMIT_CN situations should be handled wrt. maintaining a proper
> parent queue length value.
> 
> Patrick/Martin, in HTB's ->enqueue() and ->requeue() we need to
> propagate NET_XMIT_CN to the caller if that's what the leaf qdisc
> signals to us.  But the question is, should sch->q.qlen be
> incremented in that case?  NET_XMIT_CN means that some packet got
> dropped, but not necessarily this one.  If, for example, RED
> drops another packet already in the queue does it somehow adjust
> the parent sch->q.qlen back down?  If not, it's pretty clear how
> this bug got created in the first place :)

Usually we only increment q.qlen on NET_XMIT_SUCCESS; in
all other cases it stays untouched.
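
A rough user-space sketch of that convention follows. It is not kernel
code: struct toy_qdisc and toy_parent_enqueue() are hypothetical names,
and the numeric return codes are assumed to match the kernel's
definitions.

```c
/* Return codes, restated for the sketch; the values are assumed to
 * match the kernel's include/linux/netdevice.h. */
enum {
	NET_XMIT_SUCCESS = 0,
	NET_XMIT_DROP    = 1,
	NET_XMIT_CN      = 2,
};

/* Hypothetical stand-in for the one field of a qdisc that matters here. */
struct toy_qdisc {
	unsigned int qlen;	/* packets this qdisc believes it holds */
};

/*
 * Model of a parent qdisc's enqueue path, given the value its leaf
 * returned: count the packet only on NET_XMIT_SUCCESS and hand the
 * leaf's return value back to the caller unchanged.
 */
int toy_parent_enqueue(struct toy_qdisc *sch, int leaf_ret)
{
	if (leaf_ret == NET_XMIT_SUCCESS)
		sch->qlen++;
	return leaf_ret;
}
```

With this model, a leaf that answers NET_XMIT_CN or NET_XMIT_DROP leaves
the parent's qlen where it was.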

> Below is my idiotic
> attempt to cure this, but this whole situation needs an audit:

Yes. This also reminded me of another related bug: when actions
steal a packet, qdiscs return NET_XMIT_SUCCESS, which causes upper
qdiscs to perform incorrect qlen adjustments.

I'll see if I can audit all these paths sometime this weekend.

> diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
> index 3fb58f4..aa20b47 100644
> --- a/net/sched/sch_htb.c
> +++ b/net/sched/sch_htb.c
> +		ret = cl->un.leaf.q->enqueue(skb, cl->un.leaf.q);
> +		if (ret == NET_XMIT_DROP) {
> +			sch->qstats.drops++;
> +			cl->qstats.drops++;
> +		} else {
> +			cl->bstats.packets +=
> +				skb_is_gso(skb)?skb_shinfo(skb)->gso_segs:1;
> +			cl->bstats.bytes += skb->len;
> +			htb_activate(q, cl);
> +		}
>  	}

The propagation of the leaf qdisc's return value is definitely
correct. The patch looks fine to me.
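
To make the difference concrete, here is a small user-space model of the
two behaviours discussed in this thread. All toy_* names are
hypothetical. toy_net_xmit_eval() is modelled on the kernel's
net_xmit_eval() as I understand it (NET_XMIT_CN is treated as success),
and the return code values are assumptions restated for the sketch.

```c
/* Return codes, restated for the sketch; values assumed to match
 * the kernel's include/linux/netdevice.h. */
enum {
	NET_XMIT_SUCCESS = 0,
	NET_XMIT_DROP    = 1,
	NET_XMIT_CN      = 2,
};

/* Modelled on net_xmit_eval(): congestion notification is not an error. */
int toy_net_xmit_eval(int ret)
{
	return ret == NET_XMIT_CN ? 0 : ret;
}

/* Behaviour described in the thread: any non-success value from the
 * leaf is reported to the caller as NET_XMIT_DROP. */
int toy_htb_enqueue_munged(int leaf_ret)
{
	return leaf_ret != NET_XMIT_SUCCESS ? NET_XMIT_DROP : NET_XMIT_SUCCESS;
}

/* Behaviour after propagating the leaf's value unchanged. */
int toy_htb_enqueue_propagated(int leaf_ret)
{
	return leaf_ret;
}

/* Returns 1 if a TCP-like sender would keep the packet at the head of
 * its write queue and try to send it again later, 0 otherwise. */
int toy_tcp_would_retry(int qdisc_ret)
{
	return toy_net_xmit_eval(qdisc_ret) != 0;
}
```

In this model, a leaf returning NET_XMIT_CN makes the sender retry a
packet the leaf has in fact queued when the value is munged, and does
not when the value is propagated.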
