Netdev List
* Re: [Bugme-new] [Bug 38102] New: BUG kmalloc-2048: Poison overwritten
From: Eric Dumazet @ 2011-07-04 15:12 UTC (permalink / raw)
  To: Michael Büsch
  Cc: Alexey Zaytsev, Andrew Morton, netdev, Gary Zambrano,
	bugme-daemon, David S. Miller, Pekka Pietikainen,
	Florian Schirmer, Felix Fietkau, Michael Buesch
In-Reply-To: <20110704164351.338dc12e@maggie>

On Monday, 4 July 2011 at 16:43 +0200, Michael Büsch wrote:
> On Mon, 4 Jul 2011 16:27:26 +0200
> Michael Büsch <m@bues.ch> wrote:
> > We do this in b43, which has exactly the same DMA engine.
> 
> (Ok, it turns out we don't do this in b43 (We only do it on the TX side).
>  But that's a bug. We should do a wmb() on the RX side before advancing the
>  descriptor ring pointer.)

I am wondering what happens if the RX ring is set to 64 and we receive
exactly 64 buffers in one round: won't B44_DMARX_PTR stay unchanged?
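
To make the wrap-around concern concrete, here is a minimal user-space
sketch. It assumes a hypothetical ring of 64 descriptors and an 8-byte
descriptor; demo_desc, DEMO_RING_SIZE, demo_advance() and demo_ptr_value()
are illustrative names, not code from b44.c. It only shows that consuming
exactly one full ring of buffers brings the index, and therefore the byte
offset that would be written to the pointer register, back to where it
started.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-byte descriptor; stands in for the driver's struct dma_desc. */
struct demo_desc {
	uint32_t ctrl;
	uint32_t addr;
};

#define DEMO_RING_SIZE 64u	/* assumed ring size, must be a power of two */

/* Advance a consumer index by n entries, wrapping at the ring size. */
static uint32_t demo_advance(uint32_t cons, uint32_t n)
{
	assert(cons < DEMO_RING_SIZE);
	return (cons + n) & (DEMO_RING_SIZE - 1);
}

/* Byte offset that would be written to the ring pointer register. */
static uint32_t demo_ptr_value(uint32_t cons)
{
	return cons * (uint32_t)sizeof(struct demo_desc);
}
```

With these assumptions, demo_advance(5, 64) returns 5 again, so
demo_ptr_value() yields the same offset before and after the round.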

Alexey, could you try this patch, please?

diff --git a/drivers/net/b44.c b/drivers/net/b44.c
index a69331e..51072a3 100644
--- a/drivers/net/b44.c
+++ b/drivers/net/b44.c
@@ -829,6 +829,7 @@ static int b44_rx(struct b44 *bp, int budget)
 	}
 
 	bp->rx_cons = cons;
+	wmb();
 	bw32(bp, B44_DMARX_PTR, cons * sizeof(struct dma_desc));
 
 	return received;




* [PATCH] iproute2: Fix building xt module against xtables version 6
From: Andreas Henriksson @ 2011-07-04 15:17 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, Jan Engelhardt

iptables/xtables apparently changed its API again. You now need to pass
an extra parameter (orig_opts) that was not needed before.

Sprinkle some lovely preprocessor magic to stay compatible with both older
and newer versions. In the beginning, XTABLES_VERSION_CODE didn't
exist. When it was first introduced (according to git) it was
(0x10000 * major + 0x100 * minor + patch), but now it's at 6.
I don't know what the official iptables releases have defined it to over time.
Let's just hope that none of the older versions that define it
higher than 6 are still around, so only the "current" versioning
scheme is supported. Let's see how long this lasts.

For the API change in xtables, see:
http://git.netfilter.org/cgi-bin/gitweb.cgi?p=iptables.git;a=commitdiff;h=600f38db82548a683775fd89b6e136673e924097

Signed-off-by: Andreas Henriksson <andreas@fatal.se>

---

It would be great if Jan could ack this change first.
I have only tested that it at least builds (against the new xtables version),
nothing more than that.


diff --git a/tc/m_xt.c b/tc/m_xt.c
index 651a59e..13bf19f 100644
--- a/tc/m_xt.c
+++ b/tc/m_xt.c
@@ -160,9 +160,13 @@ static int parse_ipt(struct action_util *a,int *argc_p,
 					return -1;
 				}
 				tcipt_globals.opts =
-				    xtables_merge_options(tcipt_globals.opts,
-				                          m->extra_opts,
-				                          &m->option_offset);
+				    xtables_merge_options(
+#if (XTABLES_VERSION_CODE >= 6)
+				        tcipt_globals.orig_opts,
+#endif
+				        tcipt_globals.opts,
+				        m->extra_opts,
+				        &m->option_offset);
 			} else {
 				fprintf(stderr," failed to find target %s\n\n", optarg);
 				return -1;
@@ -305,7 +309,11 @@ print_ipt(struct action_util *au,FILE * f, struct rtattr *arg)
 			}
 
 			tcipt_globals.opts =
-			    xtables_merge_options(tcipt_globals.opts,
+			    xtables_merge_options(
+#if (XTABLES_VERSION_CODE >= 6)
+				                  tcipt_globals.orig_opts,
+#endif
+				                  tcipt_globals.opts,
 			                          m->extra_opts,
 			                          &m->option_offset);
 		} else {


* [PATCH net-next-2.6] b44: use dev_kfree_skb() in b44_tx()
From: Eric Dumazet @ 2011-07-04 15:41 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

b44_tx() runs from the softirq handler, so it can use dev_kfree_skb()
instead of dev_kfree_skb_irq().

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 drivers/net/b44.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/b44.c b/drivers/net/b44.c
index a69331e..4345e28 100644
--- a/drivers/net/b44.c
+++ b/drivers/net/b44.c
@@ -609,7 +609,7 @@ static void b44_tx(struct b44 *bp)
 				 skb->len,
 				 DMA_TO_DEVICE);
 		rp->skb = NULL;
-		dev_kfree_skb_irq(skb);
+		dev_kfree_skb(skb);
 	}
 
 	bp->tx_cons = cons;




* Re: [PATCH linux-firmware] bnx2x: Adding FW 7.0.23.0
From: Ariel Elior @ 2011-07-04 16:07 UTC (permalink / raw)
  To: dwmw2; +Cc: eilong, netdev
In-Reply-To: <1309795166.16026.17.camel@lb-tlvb-ariel.il.broadcom.com>


On Mon, 2011-07-04 at 18:59 +0300, Ariel Elior wrote:
> This FW supports multiple concurrent classes of service in network traffic.
> 
> Signed-off-by: Ariel Elior <ariele@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
> ---
In case you need it, the patch is also available at
http://linux.broadcom.com/eilong/FW-7.0.23.0/0001-bnx2x-Adding-FW-7.0.23.0.patch
Thanks,
Ariel




* Re: [PATCH] iproute2: Fix building xt module against xtables version 6
From: Jan Engelhardt @ 2011-07-04 16:11 UTC (permalink / raw)
  To: Andreas Henriksson; +Cc: shemminger, netdev
In-Reply-To: <20110704151742.GA6819@amd64.fatal.se>

On Monday 2011-07-04 17:17, Andreas Henriksson wrote:

>iptables/xtables apparently changed API again.... Now you need to pass
>and extra parameter (orig_opts) which was not needed before.
>
>Sprinkle some lovely pre-processor magic to be compatible with both older
>and new versions. In the beginning of times XTABLES_VERSION_CODE didn't
>exist. Then it was (0x10000 * major + 0x100 * minor + patch) when it was
>first introduced (according to git), but now it's at 6...
>Don't know what official iptables releases has defined it to over time.

It reflects the soversion, IOW, the ABI (if the developer feels like it, 
also API). This increases only when needed rather than at every release. 
It also won't reach versions >=0x10000 anytime soon because of that. 
More importantly though, the API drafting was just finalized for 1.4.3 
(XTABLES_VERSION_CODE=1), so anything before that is a very blurry 
construct anyway.
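
As a small worked example of why the two numbering schemes are hard to
tell apart with a plain ">= 6" test, consider the sketch below. The release
number 1.4.2 is picked purely for illustration (no claim is made that any
release shipped with that define), and the DEMO_/demo_ names are
hypothetical helpers, not part of xtables.

```c
#include <assert.h>

/* Old-style encoding quoted in the patch description (per the git history). */
#define DEMO_OLD_VERSION_CODE(major, minor, patch) \
	(0x10000 * (major) + 0x100 * (minor) + (patch))

/* Hypothetical helper: does a version code look like a small soversion? */
static int demo_is_soversion_style(long code)
{
	return code > 0 && code < 0x10000;
}

/* Mirrors the patch's "#if (XTABLES_VERSION_CODE >= 6)" test as a function. */
static int demo_needs_orig_opts(long code)
{
	assert(code >= 0);
	return code >= 6;
}
```

An old-style code such as DEMO_OLD_VERSION_CODE(1, 4, 2) evaluates to
0x10402, which passes the ">= 6" test even though it is not a soversion.
Because soversion-style codes stay far below 0x10000, an extra upper bound
like the one in demo_is_soversion_style() would be one way to keep the two
schemes apart, if that ever turns out to matter.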


>Lets just hope none of the older versions with is has the define
>higher then 6 is still around.... so only the "current" versioning
>scheme is supported.... lets see how long this lasts now.


* Re: [PATCH 1/2] Update description of net.sctp.sctp_rmem and net.sctp.sctp_wmem tunables
From: Ben Hutchings @ 2011-07-04 16:11 UTC (permalink / raw)
  To: Max Matveev; +Cc: linux-sctp, netdev
In-Reply-To: <20110704083605.AF9C28156C57@regina.usersys.redhat.com>

On Mon, 2011-06-20 at 18:08 +1000, Max Matveev wrote:
> sctp does not use second and third ("default" and "max") values
> of sctp_(r|w)mem tunables. The format is the same and tcp_(r|w)mem
> but the meaning is different so make the documentation explicit to
> avoid confusion.
> 
> Signed-off-by: Max Matveev <makc@redhat.com>
> ---
>  Documentation/networking/ip-sysctl.txt |   11 +++++++++--
>  1 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index d3d653a..f37d374 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -1465,10 +1465,17 @@ sctp_mem - vector of 3 INTEGERs: min, pressure, max
>  	Default is calculated at boot time from amount of available memory.
>  
>  sctp_rmem - vector of 3 INTEGERs: min, default, max
> -	See tcp_rmem for a description.
> +	Only the first value ("min") is used, "default" and "max" are
> +	ignored and may be removed in the future versions.
> +
> +	min: Minimal size of receive buffer used by SCTP socket.
> +	It is guaranteed to each STCP socket (but not association) even 

Typo: 'STCP' should be 'SCTP'.

Ben.

> +	under moderate memory pressure.
> +
> +	Default: 1 page
>  
>  sctp_wmem  - vector of 3 INTEGERs: min, default, max
> -	See tcp_wmem for a description.
> +	Currently this tunable has no effect.
>  
>  addr_scope_policy - INTEGER
>  	Control IPv4 address scoping - draft-stewart-tsvwg-sctp-ipv4-00

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* Re: [net-next PATCH 1/2] dcbnl: Aggregated CEE GET operation
From: Shmulik Ravid @ 2011-07-04 18:22 UTC (permalink / raw)
  To: John Fastabend; +Cc: davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <4E10B03C.1070803@intel.com>


On Sun, 2011-07-03 at 11:09 -0700, John Fastabend wrote:
> >  
> > +static int dcbnl_cee_pg_fill(struct sk_buff *skb, struct net_device *dev,
> > +			     int dir)
> 
> Why the 'dir'? Could you just pack both tx and rx together? It looks like all
> cases pack TX then pack RX. Did I miss something?
> 
This is just an auxiliary function meant to avoid duplicating the loops
that build the DCB_ATTR_CEE_TX_PG and DCB_ATTR_CEE_RX_PG nested
attributes. The only differences are the parent attribute and the driver
callbacks. It's similar to the way dcbnl_pgtx_getcfg() and
dcbnl_pgrx_getcfg() are already implemented.

I'll incorporate the other comments into v2 of the patch.

Thanks,
Shmulik





* Re: bnx2: FTQ dump on heavy workload(bnx2-2.0.23b + kernel 2.6.32.36)
From: Michael Chan @ 2011-07-04 17:04 UTC (permalink / raw)
  To: 'MaoXiaoyun', 'netdev@vger.kernel.org'; +Cc: David Christensen
In-Reply-To: <blu157-w30B8D4FBD571396FC81A04DA5C0@phx.gbl>

MaoXiaoyun wrote:

> Could it be caused by the similar timeout as
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-
> 2.6.git;a=commit;h=c441b8d2cb2194b05550a558d6d95d8944e56a84.

Based on the register dump below, it is not caused by the MSI-X issue.

> 
> Maybe timeout still happens in my test scenerino.
> 
> Well, from the patch, BNX2_MISC_ECO_HW_CTL is defined 0x000008cc. But I
> cannot find
> the defines in programmer reference Guide.(NetXtremeII-PG203-R.pdf).
> Could some help
> to point out for me or is the doc is out of date.

I will request that the document be updated to describe that register.  We
are increasing the register read and write timeout value to work around
the problem of the MSI-X table being updated while there is a pending
MSI-X.  Without the patch, the write to unmask the MSI-X table entry can
be dropped by the chip.

> 
> Also, is there a way to comfirm whether the timeout really happen?
> (which regisiter
> shall I read?) Or is there a bigger timeout I can set?

Again, the register dump shows that it is not caused by this issue.  I'll
send you an additional debug patch to help debug the problem.

Thanks.
> 
> thanks.
> 
> ----------------------------------------
> > From: tinnycloud@hotmail.com
> > To: netdev@vger.kernel.org
> > Subject: bnx2: FTQ dump on heavy workload(bnx2-2.0.23b + kernel
> 2.6.32.36)
> > Date: Mon, 4 Jul 2011 15:40:01 +0800
> >
> >
> > Hi:
> >
> > I met bnx2 FTQ dump over and over again during my testing on Xen live
> migration which generate
> > heavy network workload.
> >
> > I have two physcial machine, both have xen 4.0.1 installed, and
> kernel 2.6.32.36, bnx2 2.0.23b.
> > I start 15 Virtual Machines totoally, and doing migration between the
> host over and over again,
> > about 16hours, the network will not work, and sometimes, it can reset
> successfully, sometimes, it
> > cause kernel crash.
> >
> > I've tried debug some, add code in the driver. below is the code when
> FTQ happened.
> > It looks like the NIC is stop transmit the packets, and cause
> timeout.
> >
> > BTW, cpu max_cstate=1 in my grub.
> >
> > Thanks.
> >
> > --------------
> > static void
> > bnx2_tx_timeout(struct net_device *dev)
> > {
> > struct bnx2 *bp = netdev_priv(dev);
> > struct bnx2_napi *bnapi = &bp->bnx2_napi[0];
> > struct bnx2_tx_ring_info *txr = &bnapi->tx_ring;
> > struct bnx2_rx_ring_info *rxr = &bnapi->rx_ring;
> > int i ;
> > bnx2_dump_ftq(bp);
> > bnx2_dump_state(bp);
> > if (stop_on_tx_timeout) {
> > printk(KERN_WARNING PFX
> > "%s: prevent chip reset during tx timeout\n",
> > bp->dev->name);
> > smp_rmb();
> > printk("last status idx %d \n", bnapi->last_status_idx);
> > printk("hw_tx_cons %d, txr->hw_tx_conds %d txr->tx_prod %d txr-
> >tx_cons %d\n",
> > bnx2_get_hw_tx_cons(bnapi), txr->hw_tx_cons, txr->tx_prod, txr-
> >tx_cons);
> > printk("hw_rx_cons %d, txr->hw_rx_conds %d\n",
> bnx2_get_hw_rx_cons(bnapi), rxr->rx_cons);
> > printk("sblk->status_attn_bits %d\n",bnapi->status_blk.msi-
> >status_attn_bits);
> > printk("sblk->status_attn_bits_ack %d\n",bnapi->status_blk.msi-
> >status_attn_bits_ack);
> > printk("bnx2_tx_avail %d \n",(bnx2_tx_avail(bp, txr)));
> > printk("sblk->status_tx_quick_consumer_index0 %d\n",bnapi-
> >status_blk.msi->status_tx_quick_consumer_index0);
> > printk("sblk->status_tx_quick_consumer_index1 %d\n",bnapi-
> >status_blk.msi->status_tx_quick_consumer_index1);
> > printk("sblk->status_tx_quick_consumer_index2 %d\n",bnapi-
> >status_blk.msi->status_tx_quick_consumer_index2);
> > printk("sblk->status_tx_quick_consumer_index3 %d\n",bnapi-
> >status_blk.msi->status_tx_quick_consumer_index3);
> > printk("sblk->status_rx_quick_consumer_index0 %d\n",bnapi-
> >status_blk.msi->status_rx_quick_consumer_index0);
> > printk("sblk->status_rx_quick_consumer_index1 %d\n",bnapi-
> >status_blk.msi->status_rx_quick_consumer_index1);
> > printk("sblk->status_rx_quick_consumer_index2 %d\n",bnapi-
> >status_blk.msi->status_rx_quick_consumer_index2);
> > printk("sblk->status_rx_quick_consumer_index3 %d\n",bnapi-
> >status_blk.msi->status_rx_quick_consumer_index3);
> > printk("sblk->status_rx_quick_consumer_index4 %d\n",bnapi-
> >status_blk.msi->status_rx_quick_consumer_index4);
> > printk("sblk->status_rx_quick_consumer_index5 %d\n",bnapi-
> >status_blk.msi->status_rx_quick_consumer_index5);
> > printk("sblk->status_rx_quick_consumer_index6 %d\n",bnapi-
> >status_blk.msi->status_rx_quick_consumer_index6);
> > printk("sblk->status_rx_quick_consumer_index7 %d\n",bnapi-
> >status_blk.msi->status_rx_quick_consumer_index7);
> > printk("sblk->status_rx_quick_consumer_index8 %d\n",bnapi-
> >status_blk.msi->status_rx_quick_consumer_index8);
> > printk("sblk->status_rx_quick_consumer_index9 %d\n",bnapi-
> >status_blk.msi->status_rx_quick_consumer_index9);
> > printk("sblk->status_rx_quick_consumer_index10 %d\n",bnapi-
> >status_blk.msi->status_rx_quick_consumer_index10);
> > printk("sblk->status_rx_quick_consumer_index11 %d\n",bnapi-
> >status_blk.msi->status_rx_quick_consumer_index11);
> > printk("sblk->status_rx_quick_consumer_index12 %d\n",bnapi-
> >status_blk.msi->status_rx_quick_consumer_index12);
> > printk("sblk->status_rx_quick_consumer_index13 %d\n",bnapi-
> >status_blk.msi->status_rx_quick_consumer_index13);
> > printk("sblk->status_rx_quick_consumer_index14 %d\n",bnapi-
> >status_blk.msi->status_rx_quick_consumer_index14);
> > printk("sblk->status_rx_quick_consumer_index15 %d\n",bnapi-
> >status_blk.msi->status_rx_quick_consumer_index15);
> > printk("sblk->status_completion_producer_index %d\n",bnapi-
> >status_blk.msi->status_completion_producer_index);
> > printk("sblk->status_cmd_consumer_index %d\n",bnapi->status_blk.msi-
> >status_cmd_consumer_index);
> > printk("sblk->status_idx %d\n",bnapi->status_blk.msi->status_idx);
> > printk("sblk->status_unused %d\n",bnapi->status_blk.msi-
> >status_unused);
> > printk("sblk->status_blk_num %d\n",bnapi->status_blk.msi-
> >status_blk_num);
> > is_timedout = 1;
> > for (i = 0; i < bp->irq_nvecs; i++) {
> > bnapi = &bp->bnx2_napi[i];
> > bnx2_tx_int(bp, bnapi, 0);
> > }
> > return;
> > }
> > -----------------
> >
> > -------------FTQ log in /var/log/message
> > ------------[ cut here ]------------
> > WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x105/0x16a()
> > Hardware name: Tecal RH2285
> > Modules linked in: iptable_filter ip_tables nfs fscache nfs_acl
> auth_rpcgss bridge stp llc autofs4 ipmi_devintf ipmi_si ipmi_msghandler
> lockd sunrpc ipv6 xenfs dm_multipath fuse xen_netback xen_blkback
> blktap blkback_pagemap loop nbd video output sbs sbshc parport_pc lp
> parport snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
> snd_seq_device snd_pcm_oss snd_mixer_oss bnx2 serio_raw snd_pcm
> snd_timer snd soundcore snd_page_alloc i2c_i801 iTCO_wdt
> iTCO_vendor_support i2c_core pata_acpi ata_generic pcspkr ata_piix
> shpchp mptsas mptscsih mptbase [last unloaded: freq_table]
> > Pid: 0, comm: swapper Not tainted 2.6.32.36xen #1
> > Call Trace:
> > <IRQ> [<ffffffff813ba154>] ? dev_watchdog+0x105/0x16a
> > [<ffffffff81056666>] warn_slowpath_common+0x7c/0x94
> > [<ffffffff81056738>] warn_slowpath_fmt+0xa4/0xa6
> > [<ffffffff81080bfa>] ? clockevents_program_event+0x78/0x81
> > [<ffffffff81081fce>] ? tick_program_event+0x2a/0x2c
> > [<ffffffff813b951d>] ? __netif_tx_lock+0x1b/0x24
> > [<ffffffff813b95a8>] ? netif_tx_lock+0x46/0x6e
> > [<ffffffff813a3ed1>] ? netdev_drivername+0x48/0x4f
> > [<ffffffff813ba154>] dev_watchdog+0x105/0x16a
> > [<ffffffff81063d98>] run_timer_softirq+0x156/0x1f8
> > [<ffffffff813ba04f>] ? dev_watchdog+0x0/0x16a
> > [<ffffffff8105d6f0>] __do_softirq+0xd7/0x19e
> > [<ffffffff81013eac>] call_softirq+0x1c/0x30
> > [<ffffffff8101564b>] do_softirq+0x46/0x87
> > [<ffffffff8105d575>] irq_exit+0x3b/0x7a
> > [<ffffffff8128dcfe>] xen_evtchn_do_upcall+0x38/0x46
> > [<ffffffff81013efe>] xen_do_hypervisor_callback+0x1e/0x30
> > <EOI> [<ffffffff8103f642>] ? pick_next_task_idle+0x18/0x22
> > [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1000
> > [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1000
> > [<ffffffff8100f1bb>] ? xen_safe_halt+0x10/0x1a
> > [<ffffffff81019e14>] ? default_idle+0x39/0x56
> > [<ffffffff81011cd0>] ? cpu_idle+0x5d/0x8c
> > [<ffffffff8143375d>] ? cpu_bringup_and_idle+0x13/0x15
> > ---[ end trace 867bb8f6cd959b03 ]---
> > bnx2: <--- start FTQ dump on peth0 --->
> > bnx2: peth0: BNX2_RV2P_PFTQ_CTL 10000
> > bnx2: peth0: BNX2_RV2P_TFTQ_CTL 20000
> > bnx2: peth0: BNX2_RV2P_MFTQ_CTL 4000
> > bnx2: peth0: BNX2_TBDR_FTQ_CTL 1004002
> > bnx2: peth0: BNX2_TDMA_FTQ_CTL 4010002
> > bnx2: peth0: BNX2_TXP_FTQ_CTL 2410002
> > bnx2: peth0: BNX2_TPAT_FTQ_CTL 10002
> > bnx2: peth0: BNX2_RXP_CFTQ_CTL 8000
> > bnx2: peth0: BNX2_RXP_FTQ_CTL 100000
> > bnx2: peth0: BNX2_COM_COMXQ_FTQ_CTL 10000
> > bnx2: peth0: BNX2_COM_COMTQ_FTQ_CTL 20000
> > bnx2: peth0: BNX2_COM_COMQ_FTQ_CTL 10000
> > bnx2: peth0: BNX2_CP_CPQ_FTQ_CTL 4000
> > bnx2: peth0: TXP mode b84c state 80005000 evt_mask 500 pc 8000d60 pc
> 8000d60 instr 8f860000
> > bnx2: peth0: TPAT mode b84c state 80009000 evt_mask 500 pc 8000a5c pc
> 8000a5c instr 10400016
> > bnx2: peth0: RXP mode b84c state 80001000 evt_mask 500 pc 8004c14 pc
> 8004c14 instr 10e00088
> > bnx2: peth0: COM mode b8cc state 80000000 evt_mask 500 pc 8000b28 pc
> 8000a9c instr 8c530000
> > bnx2: peth0: CP mode b8cc state 80000000 evt_mask 500 pc 8000c50 pc
> 8000c58 instr 8ca50020
> > bnx2: <--- end FTQ dump on peth0 --->
> > bnx2: peth0 DEBUG: intr_sem[0]
> > bnx2: peth0 DEBUG: intr_sem[0] PCI_CMD[20100406]
> > bnx2: peth0 DEBUG: PCI_PM[19002008] PCI_MISC_CFG[92000088]
> > bnx2: peth0 DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000]
> > bnx2: peth0 RPM_MGMT_PKT_CTRL[40000088]
> > bnx2: peth0 DEBUG: MCP_STATE_P0[0007e10e] MCP_STATE_P1[0003e00e]
> > bnx2: peth0 DEBUG: HC_STATS_INTERRUPT_STATUS[01ff0000]
> > bnx2: peth0 DEBUG: PBA[00000000]
> > BNX2_PCICFG_INT_ACK_CMD[00013ce1]
> > bnx2: peth0: prevent chip reset during tx timeout
> > last status idx 2426
> > hw_tx_cons 32474, txr->hw_tx_conds 32474 txr->tx_prod 32641 txr-
> >tx_cons 32474
> > hw_rx_cons 19665, txr->hw_rx_conds 19665
> > sblk->status_attn_bits 1
> > sblk->status_attn_bits_ack 1
> > bnx2_tx_avail 88
> > sblk->status_tx_quick_consumer_index0 32474
> > sblk->status_tx_quick_consumer_index1 0
> > sblk->status_tx_quick_consumer_index2 0
> > sblk->status_tx_quick_consumer_index3 0
> > sblk->status_rx_quick_consumer_index0 19665
> > sblk->status_rx_quick_consumer_index1 0
> > sblk->status_rx_quick_consumer_index2 0
> > sblk->status_rx_quick_consumer_index3 0
> > sblk->status_rx_quick_consumer_index4 0
> > sblk->status_rx_quick_consumer_index5 0
> > sblk->status_rx_quick_consumer_index6 0
> > sblk->status_rx_quick_consumer_index7 0
> > sblk->status_rx_quick_consumer_index8 0
> > sblk->status_rx_quick_consumer_index9 0
> > sblk->status_rx_quick_consumer_index10 0
> > sblk->status_rx_quick_consumer_index11 0
> > sblk->status_rx_quick_consumer_index12 0
> > sblk->status_rx_quick_consumer_index13 0
> > sblk->status_rx_quick_consumer_index14 0
> > sblk->status_rx_quick_consumer_index15 0
> > sblk->status_completion_producer_index 0
> > sblk->status_cmd_consumer_index 0
> > sblk->status_idx 2426
> > sblk->status_unused 0
> > sblk->status_blk_num 0
> > hw_cons 32474 sw_cons 32474 ffff8801d27f85c0 bnapi
> > return hw_cons 32474 sw_cons 32474 ffff8801d27f85c0 bnapi
> > hw_cons 3628 sw_cons 3625 ffff8801d27f8bc0 bnapi
> > return hw_cons 3628 sw_cons 3625 ffff8801d27f8bc0 bnapi
> > hw_cons 62094 sw_cons 62090 ffff8801d27f91c0 bnapi
> > return hw_cons 62094 sw_cons 62090 ffff8801d27f91c0 bnapi
> > hw_cons 3184 sw_cons 3173 ffff8801d27f97c0 bnapi
> > return hw_cons 3184 sw_cons 3173 ffff8801d27f97c0 bnapi
> > hw_cons 0 sw_cons 0 ffff8801d27f9dc0 bnapi
> > return hw_cons 0 sw_cons 0 ffff8801d27f9dc0 bnapi
> 




* Re: divide error: 0000, in bictcp_cong_avoid, kernel 2.6.39
From: Stephen Hemminger @ 2011-07-04 17:36 UTC (permalink / raw)
  To: TB; +Cc: netdev
In-Reply-To: <4E11CCC4.2010404@techboom.com>

Any data about the type of connection, kernel configuration or other
information that might be useful in reproducing the problem?

Also please try 2.6.39.2


* Re: divide error: 0000, in bictcp_cong_avoid, kernel 2.6.39
From: TB @ 2011-07-04 18:10 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20110704103622.6294f9ed@nehalam.ftrdhcpuser.net>

On 11-07-04 01:36 PM, Stephen Hemminger wrote:
> Any data about the type of connection, kernel configuration or other
> information that might be useful in reproducing the problem?
> 
> Also please try 2.6.39.2

We haven't found a sure way of reproducing it.
It happened on 1.2% of our servers over the weekend and seems random.
Both are connected with 2 gigabit ports using bonding. Traffic tends to
be heavy, but doesn't seem to be a factor.

Would a .config help?

Only the very basic filter module for iptables is compiled in.

We will try 2.6.39.2 soon.


* [PATCH 4/6] dccp ccid-2: move rfc 3390 function into header file
From: Gerrit Renker @ 2011-07-04 19:01 UTC (permalink / raw)
  To: davem; +Cc: dccp, netdev, Gerrit Renker
In-Reply-To: <test_tree_patch_set_update_2011-07-04>

This moves CCID-2's initial window function into the header file, since several
parts throughout the CCID-2 code need to call it (CCID-2 still uses RFC 3390).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Leandro Melo de Sales <leandro@ic.ufal.br>
---
 net/dccp/ccids/ccid2.c |    9 ---------
 net/dccp/ccids/ccid2.h |    9 +++++++++
 2 files changed, 9 insertions(+), 9 deletions(-)

--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -583,15 +583,6 @@ done:
 	dccp_ackvec_parsed_cleanup(&hc->tx_av_chunks);
 }
 
-/*
- * Convert RFC 3390 larger initial window into an equivalent number of packets.
- * This is based on the numbers specified in RFC 5681, 3.1.
- */
-static inline u32 rfc3390_bytes_to_packets(const u32 smss)
-{
-	return smss <= 1095 ? 4 : (smss > 2190 ? 2 : 3);
-}
-
 static int ccid2_hc_tx_init(struct ccid *ccid, struct sock *sk)
 {
 	struct ccid2_hc_tx_sock *hc = ccid_priv(ccid);
--- a/net/dccp/ccids/ccid2.h
+++ b/net/dccp/ccids/ccid2.h
@@ -88,6 +88,15 @@ static inline bool ccid2_cwnd_network_limited(struct ccid2_hc_tx_sock *hc)
 	return hc->tx_pipe >= hc->tx_cwnd;
 }
 
+/*
+ * Convert RFC 3390 larger initial window into an equivalent number of packets.
+ * This is based on the numbers specified in RFC 5681, 3.1.
+ */
+static inline u32 rfc3390_bytes_to_packets(const u32 smss)
+{
+	return smss <= 1095 ? 4 : (smss > 2190 ? 2 : 3);
+}
+
 struct ccid2_hc_rx_sock {
 	int	rx_data;
 };
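
For readers who want to check the thresholds, the following user-space
sketch reproduces the helper from the patch next to the RFC 3390 formula
IW = min(4*SMSS, max(2*SMSS, 4380 bytes)). The u32 typedef and
demo_rfc3390_iw_bytes() are assumptions added for illustration only; the
packet-count helper itself is copied from the patch.

```c
#include <stdint.h>

typedef uint32_t u32;	/* user-space stand-in for the kernel type */

/* Copied from the patch: RFC 3390 initial window, expressed in packets. */
static inline u32 rfc3390_bytes_to_packets(const u32 smss)
{
	return smss <= 1095 ? 4 : (smss > 2190 ? 2 : 3);
}

/* Illustrative only: the RFC 3390 initial window in bytes. */
static inline u32 demo_rfc3390_iw_bytes(const u32 smss)
{
	u32 lower = 2 * smss > 4380 ? 2 * smss : 4380;	/* max(2*SMSS, 4380) */

	return 4 * smss < lower ? 4 * smss : lower;	/* min(4*SMSS, lower) */
}
```

For an SMSS of 1460 bytes the byte formula gives 4380, i.e. exactly three
full-sized packets, which matches the value 3 returned by
rfc3390_bytes_to_packets(1460).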


* [PATCH 5/6] dccp ccid-2: Use existing function to test for data packets
From: Gerrit Renker @ 2011-07-04 19:01 UTC (permalink / raw)
  To: davem; +Cc: dccp, netdev, Gerrit Renker
In-Reply-To: <test_tree_patch_set_update_2011-07-04>

This replaces a switch statement with a test, using the equivalent
function dccp_data_packet(skb).  It also doubles the range of the field
`rx_num_data_pkts' by changing the type from `int' to `u32', avoiding
signed/unsigned comparison with the u16 field `dccps_r_ack_ratio'.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
 net/dccp/ccids/ccid2.c |   16 ++++++----------
 net/dccp/ccids/ccid2.h |    6 +++++-
 2 files changed, 11 insertions(+), 11 deletions(-)

--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -627,18 +627,14 @@ static void ccid2_hc_tx_exit(struct sock *sk)
 
 static void ccid2_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
 {
-	const struct dccp_sock *dp = dccp_sk(sk);
 	struct ccid2_hc_rx_sock *hc = ccid2_hc_rx_sk(sk);
 
-	switch (DCCP_SKB_CB(skb)->dccpd_type) {
-	case DCCP_PKT_DATA:
-	case DCCP_PKT_DATAACK:
-		hc->rx_data++;
-		if (hc->rx_data >= dp->dccps_r_ack_ratio) {
-			dccp_send_ack(sk);
-			hc->rx_data = 0;
-		}
-		break;
+	if (!dccp_data_packet(skb))
+		return;
+
+	if (++hc->rx_num_data_pkts >= dccp_sk(sk)->dccps_r_ack_ratio) {
+		dccp_send_ack(sk);
+		hc->rx_num_data_pkts = 0;
 	}
 }
 
--- a/net/dccp/ccids/ccid2.h
+++ b/net/dccp/ccids/ccid2.h
@@ -97,8 +97,12 @@ static inline u32 rfc3390_bytes_to_packets(const u32 smss)
 	return smss <= 1095 ? 4 : (smss > 2190 ? 2 : 3);
 }
 
+/**
+ * struct ccid2_hc_rx_sock  -  Receiving end of CCID-2 half-connection
+ * @rx_num_data_pkts: number of data packets received since last feedback
+ */
 struct ccid2_hc_rx_sock {
-	int	rx_data;
+	u32	rx_num_data_pkts;
 };
 
 static inline struct ccid2_hc_tx_sock *ccid2_hc_tx_sk(const struct sock *sk)


* [PATCH 6/6] dccp ccid-2: Perform congestion-window validation
From: Gerrit Renker @ 2011-07-04 19:01 UTC (permalink / raw)
  To: davem; +Cc: dccp, netdev, Gerrit Renker
In-Reply-To: <test_tree_patch_set_update_2011-07-04>

CCID-2's cwnd increases like TCP during slow-start, which has implications for
 * the local Sequence Window value (should be > cwnd),
 * the Ack Ratio value.
Hence an exponential growth, if it does not reflect the actual network
conditions, can quickly lead to instability.

This patch adds congestion-window validation (RFC 2861) to CCID-2:
 * cwnd is constrained if the sender is application limited;
 * cwnd is reduced after a long idle period, as suggested in Van Jacobson's
   '90 paper and in RFC 2581 (sec. 4.1);
 * cwnd is never reduced below the RFC 3390 initial window.

As marked in the comments, the code is actually almost a direct copy of the
TCP congestion-window-validation algorithms. By continuing this work, it may
in future be possible to use the TCP code (not possible at the moment).

The mechanism can be turned off using a module parameter. Sampling of the
currently-used window (moving-maximum) is however done constantly; this is
used to determine the expected window, which can be exploited to regulate
DCCP's Sequence Window value.

This patch also sets slow-start-after-idle (RFC 4341, 5.1), i.e. it behaves like
TCP when net.ipv4.tcp_slow_start_after_idle = 1.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
 net/dccp/ccids/ccid2.h |   10 ++++++
 net/dccp/ccids/ccid2.c |   84 ++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 91 insertions(+), 3 deletions(-)

--- a/net/dccp/ccids/ccid2.h
+++ b/net/dccp/ccids/ccid2.h
@@ -53,6 +53,10 @@ struct ccid2_seq {
  * @tx_rttvar:		     moving average/maximum of @mdev_max
  * @tx_rto:		     RTO value deriving from SRTT and RTTVAR (RFC 2988)
  * @tx_rtt_seq:		     to decay RTTVAR at most once per flight
+ * @tx_cwnd_used:	     actually used cwnd, W_used of RFC 2861
+ * @tx_expected_wnd:	     moving average of @tx_cwnd_used
+ * @tx_cwnd_stamp:	     to track idle periods in CWV
+ * @tx_lsndtime:	     last time (in jiffies) a data packet was sent
  * @tx_rpseq:		     last consecutive seqno
  * @tx_rpdupack:	     dupacks since rpseq
  * @tx_av_chunks:	     list of Ack Vectors received on current skb
@@ -76,6 +80,12 @@ struct ccid2_hc_tx_sock {
 	u64			tx_rtt_seq:48;
 	struct timer_list	tx_rtotimer;
 
+	/* Congestion Window validation (optional, RFC 2861) */
+	u32			tx_cwnd_used,
+				tx_expected_wnd,
+				tx_cwnd_stamp,
+				tx_lsndtime;
+
 	u64			tx_rpseq;
 	int			tx_rpdupack;
 	u32			tx_last_cong;
--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -153,17 +153,93 @@ out:
 	sock_put(sk);
 }
 
+/*
+ *	Congestion window validation (RFC 2861).
+ */
+static int ccid2_do_cwv = 1;
+module_param(ccid2_do_cwv, bool, 0644);
+MODULE_PARM_DESC(ccid2_do_cwv, "Perform RFC2861 Congestion Window Validation");
+
+/**
+ * ccid2_update_used_window  -  Track how much of cwnd is actually used
+ * This is done in addition to CWV. The sender needs to have an idea of how many
+ * packets may be in flight, to set the local Sequence Window value accordingly
+ * (RFC 4340, 7.5.2). The CWV mechanism is exploited to keep track of the
+ * maximum-used window. We use an EWMA low-pass filter to filter out noise.
+ */
+static void ccid2_update_used_window(struct ccid2_hc_tx_sock *hc, u32 new_wnd)
+{
+	hc->tx_expected_wnd = (3 * hc->tx_expected_wnd + new_wnd) / 4;
+}
+
+/* This borrows the code of tcp_cwnd_application_limited() */
+static void ccid2_cwnd_application_limited(struct sock *sk, const u32 now)
+{
+	struct ccid2_hc_tx_sock *hc = ccid2_hc_tx_sk(sk);
+	/* don't reduce cwnd below the initial window (IW) */
+	u32 init_win = rfc3390_bytes_to_packets(dccp_sk(sk)->dccps_mss_cache),
+	    win_used = max(hc->tx_cwnd_used, init_win);
+
+	if (win_used < hc->tx_cwnd) {
+		hc->tx_ssthresh = max(hc->tx_ssthresh,
+				     (hc->tx_cwnd >> 1) + (hc->tx_cwnd >> 2));
+		hc->tx_cwnd = (hc->tx_cwnd + win_used) >> 1;
+	}
+	hc->tx_cwnd_used  = 0;
+	hc->tx_cwnd_stamp = now;
+}
+
+/* This borrows the code of tcp_cwnd_restart() */
+static void ccid2_cwnd_restart(struct sock *sk, const u32 now)
+{
+	struct ccid2_hc_tx_sock *hc = ccid2_hc_tx_sk(sk);
+	u32 cwnd = hc->tx_cwnd, restart_cwnd,
+	    iwnd = rfc3390_bytes_to_packets(dccp_sk(sk)->dccps_mss_cache);
+
+	hc->tx_ssthresh = max(hc->tx_ssthresh, (cwnd >> 1) + (cwnd >> 2));
+
+	/* don't reduce cwnd below the initial window (IW) */
+	restart_cwnd = min(cwnd, iwnd);
+	cwnd >>= (now - hc->tx_lsndtime) / hc->tx_rto;
+	hc->tx_cwnd = max(cwnd, restart_cwnd);
+
+	hc->tx_cwnd_stamp = now;
+	hc->tx_cwnd_used  = 0;
+}
+
 static void ccid2_hc_tx_packet_sent(struct sock *sk, unsigned int len)
 {
 	struct dccp_sock *dp = dccp_sk(sk);
 	struct ccid2_hc_tx_sock *hc = ccid2_hc_tx_sk(sk);
+	const u32 now = ccid2_time_stamp;
 	struct ccid2_seq *next;
 
-	hc->tx_pipe++;
+	/* slow-start after idle periods (RFC 2581, RFC 2861) */
+	if (ccid2_do_cwv && !hc->tx_pipe &&
+	    (s32)(now - hc->tx_lsndtime) >= hc->tx_rto)
+		ccid2_cwnd_restart(sk, now);
+
+	hc->tx_lsndtime = now;
+	hc->tx_pipe    += 1;
+
+	/* see whether cwnd was fully used (RFC 2861), update expected window */
+	if (ccid2_cwnd_network_limited(hc)) {
+		ccid2_update_used_window(hc, hc->tx_cwnd);
+		hc->tx_cwnd_used  = 0;
+		hc->tx_cwnd_stamp = now;
+	} else {
+		if (hc->tx_pipe > hc->tx_cwnd_used)
+			hc->tx_cwnd_used = hc->tx_pipe;
+
+		ccid2_update_used_window(hc, hc->tx_cwnd_used);
+
+		if (ccid2_do_cwv && (s32)(now - hc->tx_cwnd_stamp) >= hc->tx_rto)
+			ccid2_cwnd_application_limited(sk, now);
+	}
 
 	hc->tx_seqh->ccid2s_seq   = dp->dccps_gss;
 	hc->tx_seqh->ccid2s_acked = 0;
-	hc->tx_seqh->ccid2s_sent  = ccid2_time_stamp;
+	hc->tx_seqh->ccid2s_sent  = now;
 
 	next = hc->tx_seqh->ccid2s_next;
 	/* check if we need to alloc more space */
@@ -594,6 +670,7 @@ static int ccid2_hc_tx_init(struct ccid *ccid, struct sock *sk)
 
 	/* Use larger initial windows (RFC 4341, section 5). */
 	hc->tx_cwnd = rfc3390_bytes_to_packets(dp->dccps_mss_cache);
+	hc->tx_expected_wnd = hc->tx_cwnd;
 
 	/* Make sure that Ack Ratio is enabled and within bounds. */
 	max_ratio = DIV_ROUND_UP(hc->tx_cwnd, 2);
@@ -606,7 +683,8 @@ static int ccid2_hc_tx_init(struct ccid *ccid, struct sock *sk)
 
 	hc->tx_rto	 = DCCP_TIMEOUT_INIT;
 	hc->tx_rpdupack  = -1;
-	hc->tx_last_cong = ccid2_time_stamp;
+	hc->tx_last_cong = hc->tx_lsndtime = hc->tx_cwnd_stamp = ccid2_time_stamp;
+	hc->tx_cwnd_used = 0;
 	setup_timer(&hc->tx_rtotimer, ccid2_hc_tx_rto_expire,
 			(unsigned long)sk);
 	INIT_LIST_HEAD(&hc->tx_av_chunks);
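
For readers who want to try the window arithmetic outside the kernel, the
two adjustments above can be sketched in userspace C. This is only a
minimal model: struct cwv_state and the cwv_*() names are hypothetical
stand-ins for the ccid2_hc_tx_sock fields, the initial window is passed in
directly instead of being derived from the MSS, and the sketch clamps the
shift count (shifting a 32-bit value by 32 or more is undefined in C),
which the patch itself does not do.

```c
#include <stdint.h>

/* Hypothetical stand-in for the ccid2_hc_tx_sock fields used by CWV. */
struct cwv_state {
	uint32_t cwnd;		/* congestion window, in packets */
	uint32_t ssthresh;	/* slow-start threshold, in packets */
	uint32_t cwnd_used;	/* largest pipe seen since the last stamp */
	uint32_t lsndtime;	/* time of the last send */
	uint32_t rto;		/* retransmission timeout, must be non-zero */
};

static uint32_t u32_max(uint32_t a, uint32_t b) { return a > b ? a : b; }
static uint32_t u32_min(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Application-limited case: move cwnd halfway towards the used window. */
static void cwv_application_limited(struct cwv_state *s, uint32_t iwnd)
{
	/* never count less than the initial window as "used" */
	uint32_t win_used = u32_max(s->cwnd_used, iwnd);

	if (win_used < s->cwnd) {
		/* remember 3/4 of the old cwnd in ssthresh */
		s->ssthresh = u32_max(s->ssthresh,
				      (s->cwnd >> 1) + (s->cwnd >> 2));
		s->cwnd = (s->cwnd + win_used) >> 1;
	}
	s->cwnd_used = 0;
}

/* Idle restart: halve cwnd once per elapsed RTO, floor at min(cwnd, iwnd). */
static void cwv_restart(struct cwv_state *s, uint32_t iwnd, uint32_t now)
{
	uint32_t cwnd = s->cwnd;
	uint32_t restart_cwnd = u32_min(cwnd, iwnd);
	uint32_t shift = (now - s->lsndtime) / s->rto;

	s->ssthresh = u32_max(s->ssthresh, (cwnd >> 1) + (cwnd >> 2));
	/* clamp added in this sketch only, see the note above */
	cwnd = shift < 32 ? cwnd >> shift : 0;
	s->cwnd = u32_max(cwnd, restart_cwnd);
	s->cwnd_used = 0;
}
```

With cwnd = 40, cwnd_used = 8 and an initial window of 4, the
application-limited step yields cwnd = 24 and ssthresh = 30; an idle period
of three RTOs takes cwnd = 40 down to 5.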


* net-next-2.6 [PATCH 0/6] dccp: several long-tested updates from dccp test tree
From: Gerrit Renker @ 2011-07-04 19:01 UTC (permalink / raw)
  To: davem; +Cc: dccp, netdev
In-Reply-To: <test_tree_patch_set_update_2011-07-04>

Dear Dave,

in the usual slow steps (i.e. only a few patches per week) I would like to
continue submitting the DCCP test tree patches. Most of these have been
tested for several years; several fix bugs or open issues, and some are needed
by subsequent test tree patches.

This set contains several minor fixes and CWV for CCID-2.

  Patch #1: fixes several issues in slow-path input processing.
  Patch #2: refactors common pattern (queueing + cloning) into the same function.
  Patch #3: straightens cosmetics of info message appearing in logs.
  Patch #4: puts frequently used CCID-2 inline function into header file. 
  Patch #5: reuses existing function to test for data packets in CCID-2.
  Patch #6: implements congestion window validation (RFC 2861) in CCID-2.


This set is also available for today's net-next-2.6 at

	git://eden-feed.erg.abdn.ac.uk/net-next-2.6   ===>   subtree "dccp"

---
 net/dccp/ccid.c        |    4 +-
 net/dccp/ccids/ccid2.c |  109 ++++++++++++++++++++++++++++++++++++++----------
 net/dccp/ccids/ccid2.h |   25 +++++++++++-
 net/dccp/input.c       |   61 +++++++++++++--------------
 net/dccp/output.c      |   14 +++---
 5 files changed, 150 insertions(+), 63 deletions(-)


* [PATCH 3/6] dccp: cosmetics of info message
From: Gerrit Renker @ 2011-07-04 19:01 UTC (permalink / raw)
  To: davem; +Cc: dccp, netdev, Gerrit Renker
In-Reply-To: <test_tree_patch_set_update_2011-07-04>

Change the CCID (de)activation message to start with the
protocol name, as 'CCID' is already in there.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
 net/dccp/ccid.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

--- a/net/dccp/ccid.c
+++ b/net/dccp/ccid.c
@@ -118,7 +118,7 @@ static int ccid_activate(struct ccid_operations *ccid_ops)
 	if (ccid_ops->ccid_hc_tx_slab == NULL)
 		goto out_free_rx_slab;
 
-	pr_info("CCID: Activated CCID %d (%s)\n",
+	pr_info("DCCP: Activated CCID %d (%s)\n",
 		ccid_ops->ccid_id, ccid_ops->ccid_name);
 	err = 0;
 out:
@@ -136,7 +136,7 @@ static void ccid_deactivate(struct ccid_operations *ccid_ops)
 	ccid_kmem_cache_destroy(ccid_ops->ccid_hc_rx_slab);
 	ccid_ops->ccid_hc_rx_slab = NULL;
 
-	pr_info("CCID: Deactivated CCID %d (%s)\n",
+	pr_info("DCCP: Deactivated CCID %d (%s)\n",
 		ccid_ops->ccid_id, ccid_ops->ccid_name);
 }
 


* [PATCH 2/6] dccp: combine the functionality of enqueueing and cloning
From: Gerrit Renker @ 2011-07-04 19:01 UTC (permalink / raw)
  To: davem; +Cc: dccp, netdev, Gerrit Renker
In-Reply-To: <test_tree_patch_set_update_2011-07-04>

Realising the following call pattern,
 * first dccp_entail() is called to enqueue a new skb and
 * then skb_clone() is called to transmit a clone of that skb,
this patch integrates both into the same function.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
 net/dccp/output.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -27,11 +27,13 @@ static inline void dccp_event_ack_sent(struct sock *sk)
 	inet_csk_clear_xmit_timer(sk, ICSK_TIME_DACK);
 }
 
-static void dccp_skb_entail(struct sock *sk, struct sk_buff *skb)
+/* enqueue @skb on sk_send_head for retransmission, return clone to send now */
+static struct sk_buff *dccp_skb_entail(struct sock *sk, struct sk_buff *skb)
 {
 	skb_set_owner_w(skb, sk);
 	WARN_ON(sk->sk_send_head);
 	sk->sk_send_head = skb;
+	return skb_clone(sk->sk_send_head, gfp_any());
 }
 
 /*
@@ -552,8 +554,7 @@ int dccp_connect(struct sock *sk)
 
 	DCCP_SKB_CB(skb)->dccpd_type = DCCP_PKT_REQUEST;
 
-	dccp_skb_entail(sk, skb);
-	dccp_transmit_skb(sk, skb_clone(skb, GFP_KERNEL));
+	dccp_transmit_skb(sk, dccp_skb_entail(sk, skb));
 	DCCP_INC_STATS(DCCP_MIB_ACTIVEOPENS);
 
 	/* Timer for repeating the REQUEST until an answer. */
@@ -678,8 +679,7 @@ void dccp_send_close(struct sock *sk, const int active)
 		DCCP_SKB_CB(skb)->dccpd_type = DCCP_PKT_CLOSE;
 
 	if (active) {
-		dccp_skb_entail(sk, skb);
-		dccp_transmit_skb(sk, skb_clone(skb, prio));
+		skb = dccp_skb_entail(sk, skb);
 		/*
 		 * Retransmission timer for active-close: RFC 4340, 8.3 requires
 		 * to retransmit the Close/CloseReq until the CLOSING/CLOSEREQ
@@ -692,6 +692,6 @@ void dccp_send_close(struct sock *sk, const int active)
 		 */
 		inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
 					  DCCP_TIMEOUT_INIT, DCCP_RTO_MAX);
-	} else
-		dccp_transmit_skb(sk, skb);
+	}
+	dccp_transmit_skb(sk, skb);
 }
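
The "enqueue the original, transmit a clone" pattern that this patch folds
into one function can be modelled in plain userspace C. This is a rough
sketch only: struct pkt, struct conn and conn_entail() are hypothetical
stand-ins for sk_buff, sock and dccp_skb_entail(), and the clone here is a
full copy, whereas skb_clone() shares the packet data.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-ins for sk_buff and sock. */
struct pkt {
	size_t len;
	unsigned char data[64];
};

struct conn {
	struct pkt *send_head;	/* packet kept for retransmission */
};

/*
 * Keep @p on the connection for retransmission and return a copy to send
 * now. Returns NULL if the copy cannot be allocated; @p stays queued in
 * either case, so a later retransmission can still pick it up.
 */
static struct pkt *conn_entail(struct conn *c, struct pkt *p)
{
	struct pkt *clone;

	c->send_head = p;
	clone = malloc(sizeof(*clone));
	if (clone)
		memcpy(clone, p, sizeof(*clone));
	return clone;
}
```

A caller would then pass the return value straight to its transmit routine,
as in transmit(c, conn_entail(c, p)), where transmit() is again a
hypothetical name and has to cope with a NULL argument.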


* [PATCH 1/6] dccp: Clean up slow-path input processing
From: Gerrit Renker @ 2011-07-04 19:01 UTC (permalink / raw)
  To: davem; +Cc: dccp, netdev, Gerrit Renker
In-Reply-To: <test_tree_patch_set_update_2011-07-04>

This patch rearranges the order of statements of the slow-path input processing
(i.e. any other state than OPEN), to resolve the following issues.

 1. Dependencies: the order of statements now better matches RFC 4340, 8.5, i.e.
    step 7 is before step 9 (previously 9 was before 7), and parsing options in
    step 8 (which may consume resources) now comes after step 7.
 2. Sequence number checks are omitted if in state LISTEN/REQUEST, due to the
    note underneath the table in RFC 4340, 7.5.3.
    As a result, CCID processing is now indeed confined to OPEN/PARTOPEN states,
    i.e. congestion control is performed only on the flow of data packets. This
    avoids pathological cases of doing congestion control on those messages
    which set up and terminate the connection.
 3. Packets are now passed on to Ack Vector / CCID processing only after
    - step 7  (receive unexpected packets),
    - step 9  (receive Reset),
    - step 13 (receive CloseReq),
    - step 14 (receive Close)
    and only if the state is PARTOPEN. This simplifies CCID processing:
    - in LISTEN/CLOSED the CCIDs are non-existent;
    - in RESPOND/REQUEST the CCIDs have not yet been negotiated;
    - in CLOSEREQ and active-CLOSING the node has already closed this socket;
    - in passive-CLOSING the client is waiting for its Reset.
    In the last case, RFC 4340, 8.3 leaves it open to ignore further incoming
    data, which is the approach taken here.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
 net/dccp/input.c |   61 ++++++++++++++++++++++++++---------------------------
 1 files changed, 30 insertions(+), 31 deletions(-)

--- a/net/dccp/input.c
+++ b/net/dccp/input.c
@@ -619,20 +619,31 @@ int dccp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
 		return 1;
 	}
 
-	if (sk->sk_state != DCCP_REQUESTING && sk->sk_state != DCCP_RESPOND) {
-		if (dccp_check_seqno(sk, skb))
-			goto discard;
-
-		/*
-		 * Step 8: Process options and mark acknowledgeable
-		 */
-		if (dccp_parse_options(sk, NULL, skb))
-			return 1;
+	/* Step 6: Check sequence numbers (omitted in LISTEN/REQUEST state) */
+	if (sk->sk_state != DCCP_REQUESTING && dccp_check_seqno(sk, skb))
+		goto discard;
 
-		dccp_handle_ackvec_processing(sk, skb);
-		dccp_deliver_input_to_ccids(sk, skb);
+	/*
+	 *   Step 7: Check for unexpected packet types
+	 *      If (S.is_server and P.type == Response)
+	 *	    or (S.is_client and P.type == Request)
+	 *	    or (S.state == RESPOND and P.type == Data),
+	 *	  Send Sync packet acknowledging P.seqno
+	 *	  Drop packet and return
+	 */
+	if ((dp->dccps_role != DCCP_ROLE_CLIENT &&
+	     dh->dccph_type == DCCP_PKT_RESPONSE) ||
+	    (dp->dccps_role == DCCP_ROLE_CLIENT &&
+	     dh->dccph_type == DCCP_PKT_REQUEST) ||
+	    (sk->sk_state == DCCP_RESPOND && dh->dccph_type == DCCP_PKT_DATA)) {
+		dccp_send_sync(sk, dcb->dccpd_seq, DCCP_PKT_SYNC);
+		goto discard;
 	}
 
+	/*  Step 8: Process options */
+	if (dccp_parse_options(sk, NULL, skb))
+		return 1;
+
 	/*
 	 *  Step 9: Process Reset
 	 *	If P.type == Reset,
@@ -640,31 +651,15 @@ int dccp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
 	 *		S.state := TIMEWAIT
 	 *		Set TIMEWAIT timer
 	 *		Drop packet and return
-	*/
+	 */
 	if (dh->dccph_type == DCCP_PKT_RESET) {
 		dccp_rcv_reset(sk, skb);
 		return 0;
-		/*
-		 *   Step 7: Check for unexpected packet types
-		 *      If (S.is_server and P.type == Response)
-		 *	    or (S.is_client and P.type == Request)
-		 *	    or (S.state == RESPOND and P.type == Data),
-		 *	  Send Sync packet acknowledging P.seqno
-		 *	  Drop packet and return
-		 */
-	} else if ((dp->dccps_role != DCCP_ROLE_CLIENT &&
-		    dh->dccph_type == DCCP_PKT_RESPONSE) ||
-		    (dp->dccps_role == DCCP_ROLE_CLIENT &&
-		     dh->dccph_type == DCCP_PKT_REQUEST) ||
-		    (sk->sk_state == DCCP_RESPOND &&
-		     dh->dccph_type == DCCP_PKT_DATA)) {
-		dccp_send_sync(sk, dcb->dccpd_seq, DCCP_PKT_SYNC);
-		goto discard;
-	} else if (dh->dccph_type == DCCP_PKT_CLOSEREQ) {
+	} else if (dh->dccph_type == DCCP_PKT_CLOSEREQ) {	/* Step 13 */
 		if (dccp_rcv_closereq(sk, skb))
 			return 0;
 		goto discard;
-	} else if (dh->dccph_type == DCCP_PKT_CLOSE) {
+	} else if (dh->dccph_type == DCCP_PKT_CLOSE) {		/* Step 14 */
 		if (dccp_rcv_close(sk, skb))
 			return 0;
 		goto discard;
@@ -679,8 +674,12 @@ int dccp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
 		__kfree_skb(skb);
 		return 0;
 
-	case DCCP_RESPOND:
 	case DCCP_PARTOPEN:
+		/* Step 8: if using Ack Vectors, mark packet acknowledgeable */
+		dccp_handle_ackvec_processing(sk, skb);
+		dccp_deliver_input_to_ccids(sk, skb);
+		/* fall through */
+	case DCCP_RESPOND:
 		queued = dccp_rcv_respond_partopen_state_process(sk, skb,
 								 dh, len);
 		break;
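
The resulting order of checks can be summarised as a small decision
function. This is a reduced userspace sketch under stated assumptions, not
the kernel code: the enums, struct in_pkt and slow_path() are hypothetical
names, only a handful of states and packet types are modelled, and the
LISTEN/CLOSED handling that precedes these steps is left out.

```c
#include <stdbool.h>

/* Hypothetical, reduced model: only the states and types needed here. */
enum state   { ST_REQUESTING, ST_RESPOND, ST_PARTOPEN, ST_CLOSING };
enum ptype   { PT_REQUEST, PT_RESPONSE, PT_DATA, PT_ACK,
	       PT_RESET, PT_CLOSEREQ, PT_CLOSE };
enum verdict { V_DISCARD, V_SEND_SYNC, V_OPTION_ERROR,
	       V_RESET, V_CLOSEREQ, V_CLOSE, V_STATE_PROCESS };

struct in_pkt {
	enum ptype type;
	bool seqno_ok;		/* would pass the Step 6 sequence check */
	bool options_ok;	/* would pass Step 8 option parsing */
};

/* Returns what happens to @p; *to_ccid says whether the CCIDs see it. */
static enum verdict slow_path(enum state st, bool is_client,
			      const struct in_pkt *p, bool *to_ccid)
{
	*to_ccid = false;

	/* Step 6: sequence numbers, not checked while REQUESTING */
	if (st != ST_REQUESTING && !p->seqno_ok)
		return V_DISCARD;

	/* Step 7: unexpected packet types, now before option parsing */
	if ((!is_client && p->type == PT_RESPONSE) ||
	    (is_client && p->type == PT_REQUEST) ||
	    (st == ST_RESPOND && p->type == PT_DATA))
		return V_SEND_SYNC;

	/* Step 8: options are parsed only after Step 7 has passed */
	if (!p->options_ok)
		return V_OPTION_ERROR;

	if (p->type == PT_RESET)	/* Step 9 */
		return V_RESET;
	if (p->type == PT_CLOSEREQ)	/* Step 13 */
		return V_CLOSEREQ;
	if (p->type == PT_CLOSE)	/* Step 14 */
		return V_CLOSE;

	/* Ack Vector / CCID processing is confined to PARTOPEN */
	*to_ccid = (st == ST_PARTOPEN);
	return V_STATE_PROCESS;
}
```

In this model a Data packet with broken options arriving in RESPOND is
answered with a Sync (Step 7) before its options are ever parsed.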


* Re: [Bugme-new] [Bug 38102] New: BUG kmalloc-2048: Poison overwritten
From: Alexey Zaytsev @ 2011-07-04 20:25 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Michael Büsch, Andrew Morton, netdev, Gary Zambrano,
	bugme-daemon, David S. Miller, Pekka Pietikainen,
	Florian Schirmer, Felix Fietkau, Michael Buesch
In-Reply-To: <1309792323.2247.33.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Mon, Jul 4, 2011 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le lundi 04 juillet 2011 à 16:43 +0200, Michael Büsch a écrit :
>> On Mon, 4 Jul 2011 16:27:26 +0200
>> Michael Büsch <m@bues.ch> wrote:
>> > We do this in b43, which has exactly the same DMA engine.
>>
>> (Ok, it turns out we don't do this in b43 (We only do it on the TX side).
>>  But that's a bug. We should do a wmb() on the RX side before advancing the
>>  descriptor ring pointer.)
>
> I am wondering what happens if RX ring is set to 64, and we receive
> exactly 64 buffers in one round, B44_DMARX_PTR wont change at all ?
>
> Alexey, could you try this patch please ?

Sorry, did not help.
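
For reference, the wrap-around case raised in the quoted question is plain
modular arithmetic on the consumer index. The sketch below only shows that
arithmetic; ring_advance() is a hypothetical helper, not a b44 function,
and how the hardware reacts to the same value being written again is
exactly the open question, which this does not answer.

```c
#include <stdint.h>

#define RX_RING_SIZE 64u	/* ring size from the quoted question */

/* Advance a consumer index by @received slots, wrapping at the ring size. */
static uint32_t ring_advance(uint32_t cons, uint32_t received)
{
	return (cons + received) % RX_RING_SIZE;
}
```

Receiving exactly 64 buffers in one round gives
ring_advance(cons, 64) == cons, so the value written to B44_DMARX_PTR
would be the same as before.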


* Re: [Bugme-new] [Bug 38102] New: BUG kmalloc-2048: Poison overwritten
From: Alexey Zaytsev @ 2011-07-04 22:29 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Michael Büsch, Andrew Morton, netdev, Gary Zambrano,
	bugme-daemon, David S. Miller, Pekka Pietikainen,
	Florian Schirmer, Felix Fietkau, Michael Buesch
In-Reply-To: <CAB9v_DHHmeFtQ4jrtVRSGWBvB=W_=QBHCvMi-u25e0nPBQ1jAA@mail.gmail.com>

On Tue, Jul 5, 2011 at 00:25, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
> On Mon, Jul 4, 2011 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> Le lundi 04 juillet 2011 à 16:43 +0200, Michael Büsch a écrit :
>>> On Mon, 4 Jul 2011 16:27:26 +0200
>>> Michael Büsch <m@bues.ch> wrote:
>>> > We do this in b43, which has exactly the same DMA engine.
>>>
>>> (Ok, it turns out we don't do this in b43 (We only do it on the TX side).
>>>  But that's a bug. We should do a wmb() on the RX side before advancing the
>>>  descriptor ring pointer.)
>>
>> I am wondering what happens if RX ring is set to 64, and we receive
>> exactly 64 buffers in one round, B44_DMARX_PTR wont change at all ?
>>
>> Alexey, could you try this patch please ?
>
> Sorry, did not help.
>

Ran a few rounds of tcpdump. Seeing a significant number of duplicate
ACKs from the problematic machine. Not seeing them when testing
between this machine and another Linux box, or between the illumos
machine and the other Linux box.

Dumps are available here:

http://zaytsev.su/tmp/caps/

dump1-3 - between the problematic machine and the illumos box,
collected on the illumos side. All show dups.
dump5 - between another Linux box and the illumos machine, no dups.
Collected on the illumos side.
dump-linux - between 2 Linux machines, collected on the
non-problematic side. No dups, no corruptions.

192.168.0.33 - the problematic machine.
192.168.0.72 - the illumos machine.
192.168.0.122 - another Linux machine.


* Re: [PATCH] net/core: Make urgent data inline by default
From: David Miller @ 2011-07-04 23:38 UTC (permalink / raw)
  To: esa-pekka.pyokkimies; +Cc: netdev
In-Reply-To: <op.vx3hzgtw6ywr33@esapekka-pc.rad1>

From: Esa-Pekka Pyokkimies <esa-pekka.pyokkimies@stonesoft.com>
Date: Mon, 4 Jul 2011 15:51:06 +0300

> Make urgent data inline by default. As explained in RFC 6093, urgent
> data should never be handled out-of-band.
> 
> "The TCP urgent mechanism is NOT a mechanism for sending "out-of-band"
>  data: the so-called "urgent data" should be delivered "in-line" to
>  the TCP user."
> 
> Signed-off-by: Esa-Pekka Pyokkimies
> <esa-pekka.pyokkimies@stonesoft.com>

There is no way we can make this change, we've had the default
we currently have for 18+ years.  Breaking applications is a
very real possibility.

It doesn't matter what some RFC says.
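
With the default left unchanged, an application that wants urgent data
delivered in-line can still ask for it per socket through the existing
SO_OOBINLINE socket option. A minimal sketch, where enable_oob_inline() is
just an illustrative wrapper name:

```c
#include <sys/socket.h>

/*
 * Ask the kernel to deliver urgent data in the normal receive stream.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int enable_oob_inline(int fd)
{
	int on = 1;

	return setsockopt(fd, SOL_SOCKET, SO_OOBINLINE, &on, sizeof(on));
}
```

The call is made once on the TCP socket, before reading from it; after
that, urgent bytes arrive through the ordinary recv() path.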



* IPv6 destination cache in Linux kernel
From: Mikhail Paremski @ 2011-07-05  0:35 UTC (permalink / raw)
  To: netdev

Hello,

I'm trying to understand how the Linux kernel routes IPv6 datagrams. It
looks like it does this quite differently from IPv4. While IPv4 uses a
destination cache to find the destination IP, MAC and interface index for
sending a packet out, IPv6 uses some kind of destination tree to do that.
It also does not use that tree if the destination address is not on a
local network. Is that correct? I have a few questions:

1. What are the reasons to route datagrams differently from IPv4?
2. Where can I get details on how the IPv6 stack routes datagrams?
3. Are there any books like "IPv6 Core Protocols Implementation" but with Linux-specific details?
4. Any other sources of information about this topic?

Thank you very much,
Mikhail. 


* Re: IPv6 destination cache in Linux kernel
From: David Miller @ 2011-07-05  0:46 UTC (permalink / raw)
  To: mparemm; +Cc: netdev
In-Reply-To: <CF5C0F47-9A82-44CF-8441-AC23C26E476B@gmail.com>

From: Mikhail Paremski <mparemm@gmail.com>
Date: Mon, 4 Jul 2011 17:35:04 -0700

> 1. What are the reasons to route datagrams differently from IPv4?

It was just a design decision made when the ipv6 routing code
was implemented.

> 2. Where can I get details on how the IPv6 stack routes datagrams?

By reading the source code.


* Re: net-next-2.6 [PATCH 0/6] dccp: several long-tested updates from dccp test tree
From: David Miller @ 2011-07-05  0:55 UTC (permalink / raw)
  To: gerrit; +Cc: dccp, netdev
In-Reply-To: <1309806079-21604-1-git-send-email-gerrit@erg.abdn.ac.uk>

From: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Date: Mon,  4 Jul 2011 13:01:13 -0600

> This set contains several minor fixes and CWV for CCID-2.
> 
>   Patch #1: fixes several issues in slow-path input processing.
>   Patch #2: refactors common pattern (queueing + cloning) into the same function.
>   Patch #3: straightens cosmetics of info message appearing in logs.
>   Patch #4: puts frequently used CCID-2 inline function into header file. 
>   Patch #5: reuses existing function to test for data packets in CCID-2.
>   Patch #6: implements congestion window validation (RFC 2861) in CCID-2.
> 
> This set is also available for today's net-next-2.6 at
> 
> 	git://eden-feed.erg.abdn.ac.uk/net-next-2.6   ===>   subtree "dccp"

Pulled, thanks Gerrit.


* Re: [PATCH net-next-2.6] b44: use dev_kfree_skb() in b44_tx()
From: David Miller @ 2011-07-05  0:57 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1309794113.2247.36.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 04 Jul 2011 17:41:53 +0200

> b44_tx() is run from softirq handler, it can use dev_kfree_skb() instead
> of dev_kfree_skb_irq()
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks Eric.


* Re: [PATCH 1/2] Update description of net.sctp.sctp_rmem and net.sctp.sctp_wmem tunables
From: Shan Wei @ 2011-07-05  1:34 UTC (permalink / raw)
  To: Max Matveev; +Cc: linux-sctp, netdev
In-Reply-To: <20110704083605.AF9C28156C57@regina.usersys.redhat.com>

Max Matveev wrote, at 06/20/2011 04:08 PM:
> sctp does not use second and third ("default" and "max") values
> of sctp_(r|w)mem tunables. 

This is meant to avoid confusion, but it introduces new confusion.
I hope you can also correct your changelog in the next version.


-- 
Best Regards
-----
Shan Wei


