netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCHES 0/7]: Reorganization of RX history patches
@ 2007-12-02 21:36 Arnaldo Carvalho de Melo
  2007-12-02 21:36 ` [PATCH 1/7] [TFRC]: Provide central source file and debug facility Arnaldo Carvalho de Melo
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Arnaldo Carvalho de Melo @ 2007-12-02 21:36 UTC (permalink / raw)
  To: Gerrit Renker; +Cc: netdev, dccp, Ingo Molnar

WARNING: After reading some messages from Ingo Molnar on lkml I think we should really
         trim the number of lists we use for kernel development. And since I moved
	 back to using mutt for reading e-mails, something I should have never, ever
	 stopped doing, I guess we should move the DCCP discussions to netdev,
	 where we hopefully can get more people interested and reviewing the work we
	 do, so please consider moving DCCP discussion to netdev@vger.kernel.org,
	 where lots of smart networking folks are present and can help our efforts
	 on turning RFCs to code.

Back to business...:

Hi Gerrit,

	Please take a look at this patch series where I reorganized your work on the new
TFRC rx history handling code. I'll wait for your considerations and then do as many
interactions as reasonable to get your work merged.

	It should be completely equivalent, plus some fixes and optimizations, such as:

. The code that allocates the RX ring deals with failures when one of the entries in
  the ring buffer is not successfully allocated, the original code was leaking the
  successfully allocated entries.

. We do just one allocation for the ring buffer, as the number of entries is fixed we
  should just do one allocation and not TFRC_NDUPACK times.

. I haven't checked if all the code was commited, as I tried to introduce just what was
  immediatelly used, probably we'll need to do some changes when working on the merge
  of your loss intervals code.

. I changed the ccid3_hc_rx_packet_recv code to set hcrx->ccid3hcrx_s for the first
  non-data packet instead of calling ccid3_hc_rx_set_state, that would use 0 as the
  initial value in the EWMA calculation.

. I also moved some patch parts (hunks) around trying to improve the readability of
  the patches, trying to get things that logically replaced what was there before
  closer together.

. Separation of parts of your patches and combination of others is also another thing
  you'll see in this patch set. I understand that it is difficult to find the right
  compromise and I hope you don't feel too bad with the decisions I made, eventually
  we'll find a common ground.

. Another change was related to namespacing, I added tfrc_rx_hist_ to a number of
  functions and in some cases just normalised the naming to be consistent.

. I'm not that happy with deferring changes to the loss intervals code that uses
  rx handling data structures, but I'm OK with leaving some code commented out till
  we get to merging the new loss intervals code.

	For what is worth I leave her my deep appreciation of your work and also my
(repeated) apologies for not being able to do these kinds of review sessions months ago,
but I also I'm willing and able to cure these shortcomings by continuing the work I've
been doing recently on finally reviewing your hard work, keep it up!

	It is available at:

master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.25

Best Regards,

- Arnaldo

 b/net/dccp/ccids/Kconfig                       |   13
 b/net/dccp/ccids/ccid3.c                       |   35 -
 b/net/dccp/ccids/ccid3.h                       |   14
 b/net/dccp/ccids/lib/Makefile                  |    2
 b/net/dccp/ccids/lib/loss_interval.c           |   14
 b/net/dccp/ccids/lib/packet_history.c          |   27 -
 b/net/dccp/ccids/lib/packet_history.h          |    3
 b/net/dccp/ccids/lib/packet_history_internal.h |   68 +++
 b/net/dccp/ccids/lib/tfrc.c                    |   48 ++
 b/net/dccp/ccids/lib/tfrc.h                    |   18
 b/net/dccp/dccp.h                              |   13
 net/dccp/ccids/ccid3.c                         |  289 ++++----------
 net/dccp/ccids/lib/loss_interval.c             |   14
 net/dccp/ccids/lib/packet_history.c            |  483 +++++++++++++------------
 net/dccp/ccids/lib/packet_history.h            |  175 +++------
 15 files changed, 602 insertions(+), 614 deletions(-)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/7] [TFRC]: Provide central source file and debug facility
  2007-12-02 21:36 [RFC][PATCHES 0/7]: Reorganization of RX history patches Arnaldo Carvalho de Melo
@ 2007-12-02 21:36 ` Arnaldo Carvalho de Melo
  2007-12-02 21:36   ` [PATCH 2/7] [DCCP]: Introduce generic function to test for `data packets' Arnaldo Carvalho de Melo
  2007-12-03  8:23 ` [RFC][PATCHES 0/7]: Reorganization of RX history patches Ian McDonald
  2007-12-03  8:35 ` Gerrit Renker
  2 siblings, 1 reply; 27+ messages in thread
From: Arnaldo Carvalho de Melo @ 2007-12-02 21:36 UTC (permalink / raw)
  To: Gerrit Renker
  Cc: netdev, dccp, Ingo Molnar, Gerrit Renker, Ian McDonald,
	Arnaldo Carvalho de Melo

From: Gerrit Renker <gerrit@erg.abdn.ac.uk>

This patch changes the tfrc_lib module in the following manner:

 (1) a dedicated tfrc source file to call the packet history &
     loss interval init/exit functions.
 (2) a dedicated tfrc_pr_debug macro with toggle switch `tfrc_debug'.

Commiter note: renamed tfrc_module.c to tfrc.c, and made CONFIG_IP_DCCP_CCID3
select IP_DCCP_TFRC_LIB.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 net/dccp/ccids/Kconfig              |   13 ++++++---
 net/dccp/ccids/lib/Makefile         |    2 +-
 net/dccp/ccids/lib/packet_history.c |   27 ++-----------------
 net/dccp/ccids/lib/packet_history.h |    3 +-
 net/dccp/ccids/lib/tfrc.c           |   48 +++++++++++++++++++++++++++++++++++
 net/dccp/ccids/lib/tfrc.h           |   17 +++++++++---
 6 files changed, 75 insertions(+), 35 deletions(-)
 create mode 100644 net/dccp/ccids/lib/tfrc.c

diff --git a/net/dccp/ccids/Kconfig b/net/dccp/ccids/Kconfig
index 3d7d867..1227594 100644
--- a/net/dccp/ccids/Kconfig
+++ b/net/dccp/ccids/Kconfig
@@ -38,6 +38,7 @@ config IP_DCCP_CCID2_DEBUG
 config IP_DCCP_CCID3
 	tristate "CCID3 (TCP-Friendly) (EXPERIMENTAL)"
 	def_tristate IP_DCCP
+	select IP_DCCP_TFRC_LIB
 	---help---
 	  CCID 3 denotes TCP-Friendly Rate Control (TFRC), an equation-based
 	  rate-controlled congestion control mechanism.  TFRC is designed to
@@ -63,10 +64,6 @@ config IP_DCCP_CCID3
 
 	  If in doubt, say M.
 
-config IP_DCCP_TFRC_LIB
-	depends on IP_DCCP_CCID3
-	def_tristate IP_DCCP_CCID3
-
 config IP_DCCP_CCID3_DEBUG
 	  bool "CCID3 debugging messages"
 	  depends on IP_DCCP_CCID3
@@ -110,5 +107,13 @@ config IP_DCCP_CCID3_RTO
 	    is serious network congestion: experimenting with larger values should
 	    therefore not be performed on WANs.
 
+config IP_DCCP_TFRC_LIB
+	tristate
+	default n
+
+config IP_DCCP_TFRC_DEBUG
+	bool
+	depends on IP_DCCP_TFRC_LIB
+	default y if IP_DCCP_CCID3_DEBUG
 
 endmenu
diff --git a/net/dccp/ccids/lib/Makefile b/net/dccp/ccids/lib/Makefile
index 5f940a6..68c93e3 100644
--- a/net/dccp/ccids/lib/Makefile
+++ b/net/dccp/ccids/lib/Makefile
@@ -1,3 +1,3 @@
 obj-$(CONFIG_IP_DCCP_TFRC_LIB) += dccp_tfrc_lib.o
 
-dccp_tfrc_lib-y := loss_interval.o packet_history.o tfrc_equation.o
+dccp_tfrc_lib-y := tfrc.o tfrc_equation.o packet_history.o loss_interval.o
diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c
index 4805de9..1d4d6ee 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -35,7 +35,6 @@
  *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
  */
 
-#include <linux/module.h>
 #include <linux/string.h>
 #include "packet_history.h"
 
@@ -277,39 +276,19 @@ void dccp_rx_hist_purge(struct dccp_rx_hist *hist, struct list_head *list)
 
 EXPORT_SYMBOL_GPL(dccp_rx_hist_purge);
 
-extern int __init dccp_li_init(void);
-extern void dccp_li_exit(void);
-
-static __init int packet_history_init(void)
+__init int packet_history_init(void)
 {
-	if (dccp_li_init() != 0)
-		goto out;
-
 	tfrc_tx_hist = kmem_cache_create("tfrc_tx_hist",
 					 sizeof(struct tfrc_tx_hist_entry), 0,
 					 SLAB_HWCACHE_ALIGN, NULL);
-	if (tfrc_tx_hist == NULL)
-		goto out_li_exit;
 
-	return 0;
-out_li_exit:
-	dccp_li_exit();
-out:
-	return -ENOBUFS;
+	return tfrc_tx_hist == NULL ? -ENOBUFS : 0;
 }
-module_init(packet_history_init);
 
-static __exit void packet_history_exit(void)
+void packet_history_exit(void)
 {
 	if (tfrc_tx_hist != NULL) {
 		kmem_cache_destroy(tfrc_tx_hist);
 		tfrc_tx_hist = NULL;
 	}
-	dccp_li_exit();
 }
-module_exit(packet_history_exit);
-
-MODULE_AUTHOR("Ian McDonald <ian.mcdonald@jandi.co.nz>, "
-	      "Arnaldo Carvalho de Melo <acme@ghostprotocols.net>");
-MODULE_DESCRIPTION("DCCP TFRC library");
-MODULE_LICENSE("GPL");
diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h
index 0670f46..9a2642e 100644
--- a/net/dccp/ccids/lib/packet_history.h
+++ b/net/dccp/ccids/lib/packet_history.h
@@ -39,8 +39,7 @@
 #include <linux/ktime.h>
 #include <linux/list.h>
 #include <linux/slab.h>
-
-#include "../../dccp.h"
+#include "tfrc.h"
 
 /* Number of later packets received before one is considered lost */
 #define TFRC_RECV_NUM_LATE_LOSS	 3
diff --git a/net/dccp/ccids/lib/tfrc.c b/net/dccp/ccids/lib/tfrc.c
new file mode 100644
index 0000000..3a7a183
--- /dev/null
+++ b/net/dccp/ccids/lib/tfrc.c
@@ -0,0 +1,48 @@
+/*
+ * TFRC: main module holding the pieces of the TFRC library together
+ *
+ * Copyright (c) 2007 The University of Aberdeen, Scotland, UK
+ * Copyright (c) 2007 Arnaldo Carvalho de Melo <acme@redhat.com>
+ */
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include "tfrc.h"
+
+#ifdef CONFIG_IP_DCCP_TFRC_DEBUG
+int tfrc_debug;
+module_param(tfrc_debug, bool, 0444);
+MODULE_PARM_DESC(tfrc_debug, "Enable debug messages");
+#endif
+
+extern int  dccp_li_init(void);
+extern void dccp_li_exit(void);
+extern int packet_history_init(void);
+extern void packet_history_exit(void);
+
+static int __init tfrc_module_init(void)
+{
+	int rc = dccp_li_init();
+
+	if (rc == 0) {
+		rc = packet_history_init();
+		if (rc != 0)
+			dccp_li_exit();
+	}
+
+	return rc;
+}
+
+static void __exit tfrc_module_exit(void)
+{
+	packet_history_exit();
+	dccp_li_exit();
+}
+
+module_init(tfrc_module_init);
+module_exit(tfrc_module_exit);
+
+MODULE_AUTHOR("Gerrit Renker <gerrit@erg.abdn.ac.uk>, "
+	      "Ian McDonald <ian.mcdonald@jandi.co.nz>, "
+	      "Arnaldo Carvalho de Melo <acme@redhat.com>");
+MODULE_DESCRIPTION("DCCP TFRC library");
+MODULE_LICENSE("GPL");
diff --git a/net/dccp/ccids/lib/tfrc.h b/net/dccp/ccids/lib/tfrc.h
index 5a0ba86..ab8848c 100644
--- a/net/dccp/ccids/lib/tfrc.h
+++ b/net/dccp/ccids/lib/tfrc.h
@@ -3,10 +3,11 @@
 /*
  *  net/dccp/ccids/lib/tfrc.h
  *
- *  Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand.
- *  Copyright (c) 2005 Ian McDonald <ian.mcdonald@jandi.co.nz>
- *  Copyright (c) 2005 Arnaldo Carvalho de Melo <acme@conectiva.com.br>
- *  Copyright (c) 2003 Nils-Erik Mattsson, Joacim Haggmark, Magnus Erixzon
+ *  Copyright (c) 2007   The University of Aberdeen, Scotland, UK
+ *  Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand.
+ *  Copyright (c) 2005-6 Ian McDonald <ian.mcdonald@jandi.co.nz>
+ *  Copyright (c) 2005   Arnaldo Carvalho de Melo <acme@conectiva.com.br>
+ *  Copyright (c) 2003   Nils-Erik Mattsson, Joacim Haggmark, Magnus Erixzon
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -15,6 +16,14 @@
  */
 #include <linux/types.h>
 #include <asm/div64.h>
+#include "../../dccp.h"
+
+#ifdef CONFIG_IP_DCCP_TFRC_DEBUG
+extern int tfrc_debug;
+#define tfrc_pr_debug(format, a...)	DCCP_PR_DEBUG(tfrc_debug, format, ##a)
+#else
+#define tfrc_pr_debug(format, a...)
+#endif
 
 /* integer-arithmetic divisions of type (a * 1000000)/b */
 static inline u64 scaled_div(u64 a, u32 b)
-- 
1.5.3.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/7] [DCCP]: Introduce generic function to test for `data packets'
  2007-12-02 21:36 ` [PATCH 1/7] [TFRC]: Provide central source file and debug facility Arnaldo Carvalho de Melo
@ 2007-12-02 21:36   ` Arnaldo Carvalho de Melo
  2007-12-02 21:36     ` [PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 27+ messages in thread
From: Arnaldo Carvalho de Melo @ 2007-12-02 21:36 UTC (permalink / raw)
  To: Gerrit Renker
  Cc: netdev, dccp, Ingo Molnar, Gerrit Renker, Ian McDonald,
	Arnaldo Carvalho de Melo

From: Gerrit Renker <gerrit@erg.abdn.ac.uk>

as per  RFC 4340, sec. 7.7.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 net/dccp/dccp.h |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h
index ee97950..f4a5ea1 100644
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -334,6 +334,7 @@ struct dccp_skb_cb {
 
 #define DCCP_SKB_CB(__skb) ((struct dccp_skb_cb *)&((__skb)->cb[0]))
 
+/* RFC 4340, sec. 7.7 */
 static inline int dccp_non_data_packet(const struct sk_buff *skb)
 {
 	const __u8 type = DCCP_SKB_CB(skb)->dccpd_type;
@@ -346,6 +347,17 @@ static inline int dccp_non_data_packet(const struct sk_buff *skb)
 	       type == DCCP_PKT_SYNCACK;
 }
 
+/* RFC 4340, sec. 7.7 */
+static inline int dccp_data_packet(const struct sk_buff *skb)
+{
+	const __u8 type = DCCP_SKB_CB(skb)->dccpd_type;
+
+	return type == DCCP_PKT_DATA	 ||
+	       type == DCCP_PKT_DATAACK  ||
+	       type == DCCP_PKT_REQUEST  ||
+	       type == DCCP_PKT_RESPONSE;
+}
+
 static inline int dccp_packet_without_ack(const struct sk_buff *skb)
 {
 	const __u8 type = DCCP_SKB_CB(skb)->dccpd_type;
-- 
1.5.3.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency
  2007-12-02 21:36   ` [PATCH 2/7] [DCCP]: Introduce generic function to test for `data packets' Arnaldo Carvalho de Melo
@ 2007-12-02 21:36     ` Arnaldo Carvalho de Melo
  2007-12-02 21:36       ` [PATCH 4/7] [TFRC]: Make the rx history slab be global Arnaldo Carvalho de Melo
  2007-12-06 13:57       ` [PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency Gerrit Renker
  0 siblings, 2 replies; 27+ messages in thread
From: Arnaldo Carvalho de Melo @ 2007-12-02 21:36 UTC (permalink / raw)
  To: Gerrit Renker; +Cc: netdev, dccp, Ingo Molnar, Arnaldo Carvalho de Melo

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 net/dccp/ccids/lib/packet_history.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c
index 1d4d6ee..b628714 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -53,7 +53,7 @@ struct tfrc_tx_hist_entry {
 /*
  * Transmitter History Routines
  */
-static struct kmem_cache *tfrc_tx_hist;
+static struct kmem_cache *tfrc_tx_hist_slab;
 
 static struct tfrc_tx_hist_entry *
 	tfrc_tx_hist_find_entry(struct tfrc_tx_hist_entry *head, u64 seqno)
@@ -66,7 +66,7 @@ static struct tfrc_tx_hist_entry *
 
 int tfrc_tx_hist_add(struct tfrc_tx_hist_entry **headp, u64 seqno)
 {
-	struct tfrc_tx_hist_entry *entry = kmem_cache_alloc(tfrc_tx_hist, gfp_any());
+	struct tfrc_tx_hist_entry *entry = kmem_cache_alloc(tfrc_tx_hist_slab, gfp_any());
 
 	if (entry == NULL)
 		return -ENOBUFS;
@@ -85,7 +85,7 @@ void tfrc_tx_hist_purge(struct tfrc_tx_hist_entry **headp)
 	while (head != NULL) {
 		struct tfrc_tx_hist_entry *next = head->next;
 
-		kmem_cache_free(tfrc_tx_hist, head);
+		kmem_cache_free(tfrc_tx_hist_slab, head);
 		head = next;
 	}
 
@@ -278,17 +278,17 @@ EXPORT_SYMBOL_GPL(dccp_rx_hist_purge);
 
 __init int packet_history_init(void)
 {
-	tfrc_tx_hist = kmem_cache_create("tfrc_tx_hist",
-					 sizeof(struct tfrc_tx_hist_entry), 0,
-					 SLAB_HWCACHE_ALIGN, NULL);
+	tfrc_tx_hist_slab = kmem_cache_create("tfrc_tx_hist",
+					      sizeof(struct tfrc_tx_hist_entry), 0,
+					      SLAB_HWCACHE_ALIGN, NULL);
 
-	return tfrc_tx_hist == NULL ? -ENOBUFS : 0;
+	return tfrc_tx_hist_slab == NULL ? -ENOBUFS : 0;
 }
 
 void packet_history_exit(void)
 {
-	if (tfrc_tx_hist != NULL) {
-		kmem_cache_destroy(tfrc_tx_hist);
-		tfrc_tx_hist = NULL;
+	if (tfrc_tx_hist_slab != NULL) {
+		kmem_cache_destroy(tfrc_tx_hist_slab);
+		tfrc_tx_hist_slab = NULL;
 	}
 }
-- 
1.5.3.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 4/7] [TFRC]: Make the rx history slab be global
  2007-12-02 21:36     ` [PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency Arnaldo Carvalho de Melo
@ 2007-12-02 21:36       ` Arnaldo Carvalho de Melo
  2007-12-02 21:36         ` [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_ Arnaldo Carvalho de Melo
  2007-12-06 13:59         ` [PATCH 4/7] [TFRC]: Make the rx history slab be global Gerrit Renker
  2007-12-06 13:57       ` [PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency Gerrit Renker
  1 sibling, 2 replies; 27+ messages in thread
From: Arnaldo Carvalho de Melo @ 2007-12-02 21:36 UTC (permalink / raw)
  To: Gerrit Renker; +Cc: netdev, dccp, Ingo Molnar, Arnaldo Carvalho de Melo

This is in preparation for merging the new rx history code written by Gerrit Renker.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 net/dccp/ccids/ccid3.c              |   35 ++-----------
 net/dccp/ccids/lib/packet_history.c |   95 ++++++++++++++++++-----------------
 net/dccp/ccids/lib/packet_history.h |   43 ++--------------
 3 files changed, 60 insertions(+), 113 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 5dea690..07920bb 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -49,8 +49,6 @@ static int ccid3_debug;
 #define ccid3_pr_debug(format, a...)
 #endif
 
-static struct dccp_rx_hist *ccid3_rx_hist;
-
 /*
  *	Transmitter Half-Connection Routines
  */
@@ -807,9 +805,9 @@ static int ccid3_hc_rx_detect_loss(struct sock *sk,
 	}
 
 detect_out:
-	dccp_rx_hist_add_packet(ccid3_rx_hist, &hcrx->ccid3hcrx_hist,
-		   &hcrx->ccid3hcrx_li_hist, packet,
-		   hcrx->ccid3hcrx_seqno_nonloss);
+	dccp_rx_hist_add_packet(&hcrx->ccid3hcrx_hist,
+				&hcrx->ccid3hcrx_li_hist, packet,
+				hcrx->ccid3hcrx_seqno_nonloss);
 	return loss;
 }
 
@@ -852,8 +850,7 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
 		return;
 	}
 
-	packet = dccp_rx_hist_entry_new(ccid3_rx_hist, opt_recv->dccpor_ndp,
-					skb, GFP_ATOMIC);
+	packet = dccp_rx_hist_entry_new(opt_recv->dccpor_ndp, skb, GFP_ATOMIC);
 	if (unlikely(packet == NULL)) {
 		DCCP_WARN("%s(%p), Not enough mem to add rx packet "
 			  "to history, consider it lost!\n", dccp_role(sk), sk);
@@ -936,7 +933,7 @@ static void ccid3_hc_rx_exit(struct sock *sk)
 	ccid3_hc_rx_set_state(sk, TFRC_RSTATE_TERM);
 
 	/* Empty packet history */
-	dccp_rx_hist_purge(ccid3_rx_hist, &hcrx->ccid3hcrx_hist);
+	dccp_rx_hist_purge(&hcrx->ccid3hcrx_hist);
 
 	/* Empty loss interval history */
 	dccp_li_hist_purge(&hcrx->ccid3hcrx_li_hist);
@@ -1013,33 +1010,13 @@ MODULE_PARM_DESC(ccid3_debug, "Enable debug messages");
 
 static __init int ccid3_module_init(void)
 {
-	int rc = -ENOBUFS;
-
-	ccid3_rx_hist = dccp_rx_hist_new("ccid3");
-	if (ccid3_rx_hist == NULL)
-		goto out;
-
-	rc = ccid_register(&ccid3);
-	if (rc != 0)
-		goto out_free_rx;
-out:
-	return rc;
-
-out_free_rx:
-	dccp_rx_hist_delete(ccid3_rx_hist);
-	ccid3_rx_hist = NULL;
-	goto out;
+	return ccid_register(&ccid3);
 }
 module_init(ccid3_module_init);
 
 static __exit void ccid3_module_exit(void)
 {
 	ccid_unregister(&ccid3);
-
-	if (ccid3_rx_hist != NULL) {
-		dccp_rx_hist_delete(ccid3_rx_hist);
-		ccid3_rx_hist = NULL;
-	}
 }
 module_exit(ccid3_module_exit);
 
diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c
index b628714..e1ab853 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -114,48 +114,33 @@ EXPORT_SYMBOL_GPL(tfrc_tx_hist_rtt);
 /*
  * 	Receiver History Routines
  */
-struct dccp_rx_hist *dccp_rx_hist_new(const char *name)
+static struct kmem_cache *tfrc_rx_hist_slab;
+
+struct dccp_rx_hist_entry *dccp_rx_hist_entry_new(const u32 ndp,
+						  const struct sk_buff *skb,
+						  const gfp_t prio)
 {
-	struct dccp_rx_hist *hist = kmalloc(sizeof(*hist), GFP_ATOMIC);
-	static const char dccp_rx_hist_mask[] = "rx_hist_%s";
-	char *slab_name;
-
-	if (hist == NULL)
-		goto out;
-
-	slab_name = kmalloc(strlen(name) + sizeof(dccp_rx_hist_mask) - 1,
-			    GFP_ATOMIC);
-	if (slab_name == NULL)
-		goto out_free_hist;
-
-	sprintf(slab_name, dccp_rx_hist_mask, name);
-	hist->dccprxh_slab = kmem_cache_create(slab_name,
-					     sizeof(struct dccp_rx_hist_entry),
-					     0, SLAB_HWCACHE_ALIGN, NULL);
-	if (hist->dccprxh_slab == NULL)
-		goto out_free_slab_name;
-out:
-	return hist;
-out_free_slab_name:
-	kfree(slab_name);
-out_free_hist:
-	kfree(hist);
-	hist = NULL;
-	goto out;
-}
+	struct dccp_rx_hist_entry *entry = kmem_cache_alloc(tfrc_rx_hist_slab,
+							    prio);
 
-EXPORT_SYMBOL_GPL(dccp_rx_hist_new);
+	if (entry != NULL) {
+		const struct dccp_hdr *dh = dccp_hdr(skb);
 
-void dccp_rx_hist_delete(struct dccp_rx_hist *hist)
-{
-	const char* name = kmem_cache_name(hist->dccprxh_slab);
+		entry->dccphrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq;
+		entry->dccphrx_ccval = dh->dccph_ccval;
+		entry->dccphrx_type  = dh->dccph_type;
+		entry->dccphrx_ndp   = ndp;
+		entry->dccphrx_tstamp = ktime_get_real();
+	}
 
-	kmem_cache_destroy(hist->dccprxh_slab);
-	kfree(name);
-	kfree(hist);
+	return entry;
 }
+EXPORT_SYMBOL_GPL(dccp_rx_hist_entry_new);
 
-EXPORT_SYMBOL_GPL(dccp_rx_hist_delete);
+static inline void dccp_rx_hist_entry_delete(struct dccp_rx_hist_entry *entry)
+{
+	kmem_cache_free(tfrc_rx_hist_slab, entry);
+}
 
 int dccp_rx_hist_find_entry(const struct list_head *list, const u64 seq,
 			    u8 *ccval)
@@ -192,11 +177,10 @@ struct dccp_rx_hist_entry *
 
 EXPORT_SYMBOL_GPL(dccp_rx_hist_find_data_packet);
 
-void dccp_rx_hist_add_packet(struct dccp_rx_hist *hist,
-			    struct list_head *rx_list,
-			    struct list_head *li_list,
-			    struct dccp_rx_hist_entry *packet,
-			    u64 nonloss_seqno)
+void dccp_rx_hist_add_packet(struct list_head *rx_list,
+			     struct list_head *li_list,
+			     struct dccp_rx_hist_entry *packet,
+			     u64 nonloss_seqno)
 {
 	struct dccp_rx_hist_entry *entry, *next;
 	u8 num_later = 0;
@@ -211,7 +195,7 @@ void dccp_rx_hist_add_packet(struct dccp_rx_hist *hist,
 				if (after48(nonloss_seqno,
 				   entry->dccphrx_seqno)) {
 					list_del_init(&entry->dccphrx_node);
-					dccp_rx_hist_entry_delete(hist, entry);
+					dccp_rx_hist_entry_delete(entry);
 				}
 			} else if (dccp_rx_hist_entry_data_packet(entry))
 				--num_later;
@@ -253,7 +237,7 @@ void dccp_rx_hist_add_packet(struct dccp_rx_hist *hist,
 					break;
 				case 3:
 					list_del_init(&entry->dccphrx_node);
-					dccp_rx_hist_entry_delete(hist, entry);
+					dccp_rx_hist_entry_delete(entry);
 					break;
 				}
 			} else if (dccp_rx_hist_entry_data_packet(entry))
@@ -264,13 +248,13 @@ void dccp_rx_hist_add_packet(struct dccp_rx_hist *hist,
 
 EXPORT_SYMBOL_GPL(dccp_rx_hist_add_packet);
 
-void dccp_rx_hist_purge(struct dccp_rx_hist *hist, struct list_head *list)
+void dccp_rx_hist_purge(struct list_head *list)
 {
 	struct dccp_rx_hist_entry *entry, *next;
 
 	list_for_each_entry_safe(entry, next, list, dccphrx_node) {
 		list_del_init(&entry->dccphrx_node);
-		kmem_cache_free(hist->dccprxh_slab, entry);
+		dccp_rx_hist_entry_delete(entry);
 	}
 }
 
@@ -281,8 +265,22 @@ __init int packet_history_init(void)
 	tfrc_tx_hist_slab = kmem_cache_create("tfrc_tx_hist",
 					      sizeof(struct tfrc_tx_hist_entry), 0,
 					      SLAB_HWCACHE_ALIGN, NULL);
+	if (tfrc_tx_hist_slab == NULL)
+		goto out_err;
 
-	return tfrc_tx_hist_slab == NULL ? -ENOBUFS : 0;
+	tfrc_rx_hist_slab = kmem_cache_create("tfrc_rx_hist",
+					      sizeof(struct dccp_rx_hist_entry), 0,
+					      SLAB_HWCACHE_ALIGN, NULL);
+	if (tfrc_rx_hist_slab == NULL)
+		goto out_free_tx;
+
+	return 0;
+
+out_free_tx:
+	kmem_cache_destroy(tfrc_tx_hist_slab);
+	tfrc_tx_hist_slab = NULL;
+out_err:
+	return -ENOBUFS;
 }
 
 void packet_history_exit(void)
@@ -291,4 +289,9 @@ void packet_history_exit(void)
 		kmem_cache_destroy(tfrc_tx_hist_slab);
 		tfrc_tx_hist_slab = NULL;
 	}
+
+	if (tfrc_rx_hist_slab != NULL) {
+		kmem_cache_destroy(tfrc_rx_hist_slab);
+		tfrc_rx_hist_slab = NULL;
+	}
 }
diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h
index 9a2642e..34b180b 100644
--- a/net/dccp/ccids/lib/packet_history.h
+++ b/net/dccp/ccids/lib/packet_history.h
@@ -66,34 +66,10 @@ struct dccp_rx_hist_entry {
 	ktime_t		 dccphrx_tstamp;
 };
 
-struct dccp_rx_hist {
-	struct kmem_cache *dccprxh_slab;
-};
-
-extern struct dccp_rx_hist *dccp_rx_hist_new(const char *name);
-extern void 		dccp_rx_hist_delete(struct dccp_rx_hist *hist);
-
-static inline struct dccp_rx_hist_entry *
-			dccp_rx_hist_entry_new(struct dccp_rx_hist *hist,
-					       const u32 ndp,
+extern struct dccp_rx_hist_entry *
+			dccp_rx_hist_entry_new(const u32 ndp,
 					       const struct sk_buff *skb,
-					       const gfp_t prio)
-{
-	struct dccp_rx_hist_entry *entry = kmem_cache_alloc(hist->dccprxh_slab,
-							    prio);
-
-	if (entry != NULL) {
-		const struct dccp_hdr *dh = dccp_hdr(skb);
-
-		entry->dccphrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq;
-		entry->dccphrx_ccval = dh->dccph_ccval;
-		entry->dccphrx_type  = dh->dccph_type;
-		entry->dccphrx_ndp   = ndp;
-		entry->dccphrx_tstamp = ktime_get_real();
-	}
-
-	return entry;
-}
+					       const gfp_t prio);
 
 static inline struct dccp_rx_hist_entry *
 			dccp_rx_hist_head(struct list_head *list)
@@ -111,21 +87,12 @@ extern int dccp_rx_hist_find_entry(const struct list_head *list, const u64 seq,
 extern struct dccp_rx_hist_entry *
 		dccp_rx_hist_find_data_packet(const struct list_head *list);
 
-extern void dccp_rx_hist_add_packet(struct dccp_rx_hist *hist,
-				    struct list_head *rx_list,
+extern void dccp_rx_hist_add_packet(struct list_head *rx_list,
 				    struct list_head *li_list,
 				    struct dccp_rx_hist_entry *packet,
 				    u64 nonloss_seqno);
 
-static inline void dccp_rx_hist_entry_delete(struct dccp_rx_hist *hist,
-					     struct dccp_rx_hist_entry *entry)
-{
-	if (entry != NULL)
-		kmem_cache_free(hist->dccprxh_slab, entry);
-}
-
-extern void dccp_rx_hist_purge(struct dccp_rx_hist *hist,
-			       struct list_head *list);
+extern void dccp_rx_hist_purge(struct list_head *list);
 
 static inline int
 	dccp_rx_hist_entry_data_packet(const struct dccp_rx_hist_entry *entry)
-- 
1.5.3.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_
  2007-12-02 21:36       ` [PATCH 4/7] [TFRC]: Make the rx history slab be global Arnaldo Carvalho de Melo
@ 2007-12-02 21:36         ` Arnaldo Carvalho de Melo
  2007-12-02 21:36           ` [PATCH 6/7] [CCID3]: The receiver of a half-connection does not set window counter values Arnaldo Carvalho de Melo
  2007-12-06 13:59           ` [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_ Gerrit Renker
  2007-12-06 13:59         ` [PATCH 4/7] [TFRC]: Make the rx history slab be global Gerrit Renker
  1 sibling, 2 replies; 27+ messages in thread
From: Arnaldo Carvalho de Melo @ 2007-12-02 21:36 UTC (permalink / raw)
  To: Gerrit Renker; +Cc: netdev, dccp, Ingo Molnar, Arnaldo Carvalho de Melo

This is in preparation for merging the new rx history code written by Gerrit Renker.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 net/dccp/ccids/ccid3.c              |   32 ++++++------
 net/dccp/ccids/lib/loss_interval.c  |   14 +++---
 net/dccp/ccids/lib/packet_history.c |   90 +++++++++++++++++-----------------
 net/dccp/ccids/lib/packet_history.h |   48 +++++++++---------
 4 files changed, 92 insertions(+), 92 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 07920bb..c95dca8 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -677,7 +677,7 @@ static void ccid3_hc_rx_send_feedback(struct sock *sk)
 {
 	struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
 	struct dccp_sock *dp = dccp_sk(sk);
-	struct dccp_rx_hist_entry *packet;
+	struct tfrc_rx_hist_entry *packet;
 	ktime_t now;
 	suseconds_t delta;
 
@@ -701,7 +701,7 @@ static void ccid3_hc_rx_send_feedback(struct sock *sk)
 		return;
 	}
 
-	packet = dccp_rx_hist_find_data_packet(&hcrx->ccid3hcrx_hist);
+	packet = tfrc_rx_hist_find_data_packet(&hcrx->ccid3hcrx_hist);
 	if (unlikely(packet == NULL)) {
 		DCCP_WARN("%s(%p), no data packet in history!\n",
 			  dccp_role(sk), sk);
@@ -709,7 +709,7 @@ static void ccid3_hc_rx_send_feedback(struct sock *sk)
 	}
 
 	hcrx->ccid3hcrx_tstamp_last_feedback = now;
-	hcrx->ccid3hcrx_ccval_last_counter   = packet->dccphrx_ccval;
+	hcrx->ccid3hcrx_ccval_last_counter   = packet->tfrchrx_ccval;
 	hcrx->ccid3hcrx_bytes_recv	     = 0;
 
 	if (hcrx->ccid3hcrx_p == 0)
@@ -752,12 +752,12 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb)
 }
 
 static int ccid3_hc_rx_detect_loss(struct sock *sk,
-				    struct dccp_rx_hist_entry *packet)
+				    struct tfrc_rx_hist_entry *packet)
 {
 	struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
-	struct dccp_rx_hist_entry *rx_hist =
-				dccp_rx_hist_head(&hcrx->ccid3hcrx_hist);
-	u64 seqno = packet->dccphrx_seqno;
+	struct tfrc_rx_hist_entry *rx_hist =
+				tfrc_rx_hist_head(&hcrx->ccid3hcrx_hist);
+	u64 seqno = packet->tfrchrx_seqno;
 	u64 tmp_seqno;
 	int loss = 0;
 	u8 ccval;
@@ -766,9 +766,9 @@ static int ccid3_hc_rx_detect_loss(struct sock *sk,
 	tmp_seqno = hcrx->ccid3hcrx_seqno_nonloss;
 
 	if (!rx_hist ||
-	   follows48(packet->dccphrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) {
+	   follows48(packet->tfrchrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) {
 		hcrx->ccid3hcrx_seqno_nonloss = seqno;
-		hcrx->ccid3hcrx_ccval_nonloss = packet->dccphrx_ccval;
+		hcrx->ccid3hcrx_ccval_nonloss = packet->tfrchrx_ccval;
 		goto detect_out;
 	}
 
@@ -789,7 +789,7 @@ static int ccid3_hc_rx_detect_loss(struct sock *sk,
 		dccp_inc_seqno(&tmp_seqno);
 		hcrx->ccid3hcrx_seqno_nonloss = tmp_seqno;
 		dccp_inc_seqno(&tmp_seqno);
-		while (dccp_rx_hist_find_entry(&hcrx->ccid3hcrx_hist,
+		while (tfrc_rx_hist_find_entry(&hcrx->ccid3hcrx_hist,
 		   tmp_seqno, &ccval)) {
 			hcrx->ccid3hcrx_seqno_nonloss = tmp_seqno;
 			hcrx->ccid3hcrx_ccval_nonloss = ccval;
@@ -799,13 +799,13 @@ static int ccid3_hc_rx_detect_loss(struct sock *sk,
 
 	/* FIXME - this code could be simplified with above while */
 	/* but works at moment */
-	if (follows48(packet->dccphrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) {
+	if (follows48(packet->tfrchrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) {
 		hcrx->ccid3hcrx_seqno_nonloss = seqno;
-		hcrx->ccid3hcrx_ccval_nonloss = packet->dccphrx_ccval;
+		hcrx->ccid3hcrx_ccval_nonloss = packet->tfrchrx_ccval;
 	}
 
 detect_out:
-	dccp_rx_hist_add_packet(&hcrx->ccid3hcrx_hist,
+	tfrc_rx_hist_add_packet(&hcrx->ccid3hcrx_hist,
 				&hcrx->ccid3hcrx_li_hist, packet,
 				hcrx->ccid3hcrx_seqno_nonloss);
 	return loss;
@@ -815,7 +815,7 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
 {
 	struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
 	const struct dccp_options_received *opt_recv;
-	struct dccp_rx_hist_entry *packet;
+	struct tfrc_rx_hist_entry *packet;
 	u32 p_prev, r_sample, rtt_prev;
 	int loss, payload_size;
 	ktime_t now;
@@ -850,7 +850,7 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
 		return;
 	}
 
-	packet = dccp_rx_hist_entry_new(opt_recv->dccpor_ndp, skb, GFP_ATOMIC);
+	packet = tfrc_rx_hist_entry_new(opt_recv->dccpor_ndp, skb, GFP_ATOMIC);
 	if (unlikely(packet == NULL)) {
 		DCCP_WARN("%s(%p), Not enough mem to add rx packet "
 			  "to history, consider it lost!\n", dccp_role(sk), sk);
@@ -933,7 +933,7 @@ static void ccid3_hc_rx_exit(struct sock *sk)
 	ccid3_hc_rx_set_state(sk, TFRC_RSTATE_TERM);
 
 	/* Empty packet history */
-	dccp_rx_hist_purge(&hcrx->ccid3hcrx_hist);
+	tfrc_rx_hist_purge(&hcrx->ccid3hcrx_hist);
 
 	/* Empty loss interval history */
 	dccp_li_hist_purge(&hcrx->ccid3hcrx_li_hist);
diff --git a/net/dccp/ccids/lib/loss_interval.c b/net/dccp/ccids/lib/loss_interval.c
index f2ca4eb..a5f59af 100644
--- a/net/dccp/ccids/lib/loss_interval.c
+++ b/net/dccp/ccids/lib/loss_interval.c
@@ -129,7 +129,7 @@ static u32 dccp_li_calc_first_li(struct sock *sk,
 				 u16 s, u32 bytes_recv,
 				 u32 previous_x_recv)
 {
-	struct dccp_rx_hist_entry *entry, *next, *tail = NULL;
+	struct tfrc_rx_hist_entry *entry, *next, *tail = NULL;
 	u32 x_recv, p;
 	suseconds_t rtt, delta;
 	ktime_t tstamp = ktime_set(0, 0);
@@ -138,18 +138,18 @@ static u32 dccp_li_calc_first_li(struct sock *sk,
 	int step = 0;
 	u64 fval;
 
-	list_for_each_entry_safe(entry, next, hist_list, dccphrx_node) {
-		if (dccp_rx_hist_entry_data_packet(entry)) {
+	list_for_each_entry_safe(entry, next, hist_list, tfrchrx_node) {
+		if (tfrc_rx_hist_entry_data_packet(entry)) {
 			tail = entry;
 
 			switch (step) {
 			case 0:
-				tstamp	  = entry->dccphrx_tstamp;
-				win_count = entry->dccphrx_ccval;
+				tstamp	  = entry->tfrchrx_tstamp;
+				win_count = entry->tfrchrx_ccval;
 				step = 1;
 				break;
 			case 1:
-				interval = win_count - entry->dccphrx_ccval;
+				interval = win_count - entry->tfrchrx_ccval;
 				if (interval < 0)
 					interval += TFRC_WIN_COUNT_LIMIT;
 				if (interval > 4)
@@ -176,7 +176,7 @@ found:
 		return ~0;
 	}
 
-	delta = ktime_us_delta(tstamp, tail->dccphrx_tstamp);
+	delta = ktime_us_delta(tstamp, tail->tfrchrx_tstamp);
 	DCCP_BUG_ON(delta < 0);
 
 	rtt = delta * 4 / interval;
diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c
index e1ab853..255cca1 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -116,58 +116,58 @@ EXPORT_SYMBOL_GPL(tfrc_tx_hist_rtt);
  */
 static struct kmem_cache *tfrc_rx_hist_slab;
 
-struct dccp_rx_hist_entry *dccp_rx_hist_entry_new(const u32 ndp,
+struct tfrc_rx_hist_entry *tfrc_rx_hist_entry_new(const u32 ndp,
 						  const struct sk_buff *skb,
 						  const gfp_t prio)
 {
-	struct dccp_rx_hist_entry *entry = kmem_cache_alloc(tfrc_rx_hist_slab,
+	struct tfrc_rx_hist_entry *entry = kmem_cache_alloc(tfrc_rx_hist_slab,
 							    prio);
 
 	if (entry != NULL) {
 		const struct dccp_hdr *dh = dccp_hdr(skb);
 
-		entry->dccphrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq;
-		entry->dccphrx_ccval = dh->dccph_ccval;
-		entry->dccphrx_type  = dh->dccph_type;
-		entry->dccphrx_ndp   = ndp;
-		entry->dccphrx_tstamp = ktime_get_real();
+		entry->tfrchrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq;
+		entry->tfrchrx_ccval = dh->dccph_ccval;
+		entry->tfrchrx_type  = dh->dccph_type;
+		entry->tfrchrx_ndp   = ndp;
+		entry->tfrchrx_tstamp = ktime_get_real();
 	}
 
 	return entry;
 }
-EXPORT_SYMBOL_GPL(dccp_rx_hist_entry_new);
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_entry_new);
 
-static inline void dccp_rx_hist_entry_delete(struct dccp_rx_hist_entry *entry)
+static inline void tfrc_rx_hist_entry_delete(struct tfrc_rx_hist_entry *entry)
 {
 	kmem_cache_free(tfrc_rx_hist_slab, entry);
 }
 
-int dccp_rx_hist_find_entry(const struct list_head *list, const u64 seq,
+int tfrc_rx_hist_find_entry(const struct list_head *list, const u64 seq,
 			    u8 *ccval)
 {
-	struct dccp_rx_hist_entry *packet = NULL, *entry;
+	struct tfrc_rx_hist_entry *packet = NULL, *entry;
 
-	list_for_each_entry(entry, list, dccphrx_node)
-		if (entry->dccphrx_seqno == seq) {
+	list_for_each_entry(entry, list, tfrchrx_node)
+		if (entry->tfrchrx_seqno == seq) {
 			packet = entry;
 			break;
 		}
 
 	if (packet)
-		*ccval = packet->dccphrx_ccval;
+		*ccval = packet->tfrchrx_ccval;
 
 	return packet != NULL;
 }
 
-EXPORT_SYMBOL_GPL(dccp_rx_hist_find_entry);
-struct dccp_rx_hist_entry *
-		dccp_rx_hist_find_data_packet(const struct list_head *list)
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_find_entry);
+struct tfrc_rx_hist_entry *
+		tfrc_rx_hist_find_data_packet(const struct list_head *list)
 {
-	struct dccp_rx_hist_entry *entry, *packet = NULL;
+	struct tfrc_rx_hist_entry *entry, *packet = NULL;
 
-	list_for_each_entry(entry, list, dccphrx_node)
-		if (entry->dccphrx_type == DCCP_PKT_DATA ||
-		    entry->dccphrx_type == DCCP_PKT_DATAACK) {
+	list_for_each_entry(entry, list, tfrchrx_node)
+		if (entry->tfrchrx_type == DCCP_PKT_DATA ||
+		    entry->tfrchrx_type == DCCP_PKT_DATAACK) {
 			packet = entry;
 			break;
 		}
@@ -175,29 +175,29 @@ struct dccp_rx_hist_entry *
 	return packet;
 }
 
-EXPORT_SYMBOL_GPL(dccp_rx_hist_find_data_packet);
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_find_data_packet);
 
-void dccp_rx_hist_add_packet(struct list_head *rx_list,
+void tfrc_rx_hist_add_packet(struct list_head *rx_list,
 			     struct list_head *li_list,
-			     struct dccp_rx_hist_entry *packet,
+			     struct tfrc_rx_hist_entry *packet,
 			     u64 nonloss_seqno)
 {
-	struct dccp_rx_hist_entry *entry, *next;
+	struct tfrc_rx_hist_entry *entry, *next;
 	u8 num_later = 0;
 
-	list_add(&packet->dccphrx_node, rx_list);
+	list_add(&packet->tfrchrx_node, rx_list);
 
 	num_later = TFRC_RECV_NUM_LATE_LOSS + 1;
 
 	if (!list_empty(li_list)) {
-		list_for_each_entry_safe(entry, next, rx_list, dccphrx_node) {
+		list_for_each_entry_safe(entry, next, rx_list, tfrchrx_node) {
 			if (num_later == 0) {
 				if (after48(nonloss_seqno,
-				   entry->dccphrx_seqno)) {
-					list_del_init(&entry->dccphrx_node);
-					dccp_rx_hist_entry_delete(entry);
+				   entry->tfrchrx_seqno)) {
+					list_del_init(&entry->tfrchrx_node);
+					tfrc_rx_hist_entry_delete(entry);
 				}
-			} else if (dccp_rx_hist_entry_data_packet(entry))
+			} else if (tfrc_rx_hist_entry_data_packet(entry))
 				--num_later;
 		}
 	} else {
@@ -208,7 +208,7 @@ void dccp_rx_hist_add_packet(struct list_head *rx_list,
 		 * We have no loss interval history so we need at least one
 		 * rtt:s of data packets to approximate rtt.
 		 */
-		list_for_each_entry_safe(entry, next, rx_list, dccphrx_node) {
+		list_for_each_entry_safe(entry, next, rx_list, tfrchrx_node) {
 			if (num_later == 0) {
 				switch (step) {
 				case 0:
@@ -220,10 +220,10 @@ void dccp_rx_hist_add_packet(struct list_head *rx_list,
 					step = 2;
 					/* OK, find next data packet */
 					num_later = 1;
-					win_count = entry->dccphrx_ccval;
+					win_count = entry->tfrchrx_ccval;
 					break;
 				case 2:
-					tmp = win_count - entry->dccphrx_ccval;
+					tmp = win_count - entry->tfrchrx_ccval;
 					if (tmp < 0)
 						tmp += TFRC_WIN_COUNT_LIMIT;
 					if (tmp > TFRC_WIN_COUNT_PER_RTT + 1) {
@@ -236,29 +236,29 @@ void dccp_rx_hist_add_packet(struct list_head *rx_list,
 						num_later = 1;
 					break;
 				case 3:
-					list_del_init(&entry->dccphrx_node);
-					dccp_rx_hist_entry_delete(entry);
+					list_del_init(&entry->tfrchrx_node);
+					tfrc_rx_hist_entry_delete(entry);
 					break;
 				}
-			} else if (dccp_rx_hist_entry_data_packet(entry))
+			} else if (tfrc_rx_hist_entry_data_packet(entry))
 				--num_later;
 		}
 	}
 }
 
-EXPORT_SYMBOL_GPL(dccp_rx_hist_add_packet);
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_add_packet);
 
-void dccp_rx_hist_purge(struct list_head *list)
+void tfrc_rx_hist_purge(struct list_head *list)
 {
-	struct dccp_rx_hist_entry *entry, *next;
+	struct tfrc_rx_hist_entry *entry, *next;
 
-	list_for_each_entry_safe(entry, next, list, dccphrx_node) {
-		list_del_init(&entry->dccphrx_node);
-		dccp_rx_hist_entry_delete(entry);
+	list_for_each_entry_safe(entry, next, list, tfrchrx_node) {
+		list_del_init(&entry->tfrchrx_node);
+		tfrc_rx_hist_entry_delete(entry);
 	}
 }
 
-EXPORT_SYMBOL_GPL(dccp_rx_hist_purge);
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_purge);
 
 __init int packet_history_init(void)
 {
@@ -269,7 +269,7 @@ __init int packet_history_init(void)
 		goto out_err;
 
 	tfrc_rx_hist_slab = kmem_cache_create("tfrc_rx_hist",
-					      sizeof(struct dccp_rx_hist_entry), 0,
+					      sizeof(struct tfrc_rx_hist_entry), 0,
 					      SLAB_HWCACHE_ALIGN, NULL);
 	if (tfrc_rx_hist_slab == NULL)
 		goto out_free_tx;
diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h
index 34b180b..5b0b983 100644
--- a/net/dccp/ccids/lib/packet_history.h
+++ b/net/dccp/ccids/lib/packet_history.h
@@ -57,51 +57,51 @@ extern u32  tfrc_tx_hist_rtt(struct tfrc_tx_hist_entry *head,
 /*
  * 	Receiver History data structures and declarations
  */
-struct dccp_rx_hist_entry {
-	struct list_head dccphrx_node;
-	u64		 dccphrx_seqno:48,
-			 dccphrx_ccval:4,
-			 dccphrx_type:4;
-	u32		 dccphrx_ndp; /* In fact it is from 8 to 24 bits */
-	ktime_t		 dccphrx_tstamp;
+struct tfrc_rx_hist_entry {
+	struct list_head tfrchrx_node;
+	u64		 tfrchrx_seqno:48,
+			 tfrchrx_ccval:4,
+			 tfrchrx_type:4;
+	u32		 tfrchrx_ndp; /* In fact it is from 8 to 24 bits */
+	ktime_t		 tfrchrx_tstamp;
 };
 
-extern struct dccp_rx_hist_entry *
-			dccp_rx_hist_entry_new(const u32 ndp,
+extern struct tfrc_rx_hist_entry *
+			tfrc_rx_hist_entry_new(const u32 ndp,
 					       const struct sk_buff *skb,
 					       const gfp_t prio);
 
-static inline struct dccp_rx_hist_entry *
-			dccp_rx_hist_head(struct list_head *list)
+static inline struct tfrc_rx_hist_entry *
+			tfrc_rx_hist_head(struct list_head *list)
 {
-	struct dccp_rx_hist_entry *head = NULL;
+	struct tfrc_rx_hist_entry *head = NULL;
 
 	if (!list_empty(list))
-		head = list_entry(list->next, struct dccp_rx_hist_entry,
-				  dccphrx_node);
+		head = list_entry(list->next, struct tfrc_rx_hist_entry,
+				  tfrchrx_node);
 	return head;
 }
 
-extern int dccp_rx_hist_find_entry(const struct list_head *list, const u64 seq,
+extern int tfrc_rx_hist_find_entry(const struct list_head *list, const u64 seq,
 				   u8 *ccval);
-extern struct dccp_rx_hist_entry *
-		dccp_rx_hist_find_data_packet(const struct list_head *list);
+extern struct tfrc_rx_hist_entry *
+		tfrc_rx_hist_find_data_packet(const struct list_head *list);
 
-extern void dccp_rx_hist_add_packet(struct list_head *rx_list,
+extern void tfrc_rx_hist_add_packet(struct list_head *rx_list,
 				    struct list_head *li_list,
-				    struct dccp_rx_hist_entry *packet,
+				    struct tfrc_rx_hist_entry *packet,
 				    u64 nonloss_seqno);
 
-extern void dccp_rx_hist_purge(struct list_head *list);
+extern void tfrc_rx_hist_purge(struct list_head *list);
 
 static inline int
-	dccp_rx_hist_entry_data_packet(const struct dccp_rx_hist_entry *entry)
+	tfrc_rx_hist_entry_data_packet(const struct tfrc_rx_hist_entry *entry)
 {
-	return entry->dccphrx_type == DCCP_PKT_DATA ||
-	       entry->dccphrx_type == DCCP_PKT_DATAACK;
+	return entry->tfrchrx_type == DCCP_PKT_DATA ||
+	       entry->tfrchrx_type == DCCP_PKT_DATAACK;
 }
 
-extern u64 dccp_rx_hist_detect_loss(struct list_head *rx_list,
+extern u64 tfrc_rx_hist_detect_loss(struct list_head *rx_list,
 				    struct list_head *li_list, u8 *win_loss);
 
 #endif /* _DCCP_PKT_HIST_ */
-- 
1.5.3.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 6/7] [CCID3]: The receiver of a half-connection does not set window counter values
  2007-12-02 21:36         ` [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_ Arnaldo Carvalho de Melo
@ 2007-12-02 21:36           ` Arnaldo Carvalho de Melo
  2007-12-02 21:36             ` [PATCH 7/7] [TFRC] New rx history code Arnaldo Carvalho de Melo
  2007-12-06 13:59           ` [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_ Gerrit Renker
  1 sibling, 1 reply; 27+ messages in thread
From: Arnaldo Carvalho de Melo @ 2007-12-02 21:36 UTC (permalink / raw)
  To: Gerrit Renker
  Cc: netdev, dccp, Ingo Molnar, Gerrit Renker, Ian McDonald,
	Arnaldo Carvalho de Melo

From: Gerrit Renker <gerrit@erg.abdn.ac.uk>

Only the sender sets window counters [RFC 4342, sections 5 and 8.1].

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 net/dccp/ccids/ccid3.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index c95dca8..5ff5aab 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -733,7 +733,6 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb)
 		return 0;
 
 	hcrx = ccid3_hc_rx_sk(sk);
-	DCCP_SKB_CB(skb)->dccpd_ccval = hcrx->ccid3hcrx_ccval_last_counter;
 
 	if (dccp_packet_without_ack(skb))
 		return 0;
-- 
1.5.3.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 7/7] [TFRC] New rx history code
  2007-12-02 21:36           ` [PATCH 6/7] [CCID3]: The receiver of a half-connection does not set window counter values Arnaldo Carvalho de Melo
@ 2007-12-02 21:36             ` Arnaldo Carvalho de Melo
  2007-12-04  6:55               ` Gerrit Renker
  0 siblings, 1 reply; 27+ messages in thread
From: Arnaldo Carvalho de Melo @ 2007-12-02 21:36 UTC (permalink / raw)
  To: Gerrit Renker; +Cc: netdev, dccp, Ingo Molnar, Arnaldo Carvalho de Melo

Credit here goes to Gerrit Renker, that provided the initial implementation for
this new codebase.

I modified it just to try to make it closer to the existing API, hide details from
the CCIDs and fix a couple bugs found in the initial implementation.

Original changeset comment from Gerrit:
      -----------
This provides a new, self-contained and generic RX history service for TFRC
based protocols.

Details:
 * new data structure, initialisation and cleanup routines;
 * allocation of dccp_rx_hist entries local to packet_history.c,
   as a service exported by the dccp_tfrc_lib module.
 * interface to automatically track highest-received seqno;
 * receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1);
 * a generic function to test for `data packets' as per  RFC 4340, sec. 7.7.
      -----------

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 net/dccp/ccids/ccid3.c                       |  255 ++++++++----------------
 net/dccp/ccids/ccid3.h                       |   14 +-
 net/dccp/ccids/lib/loss_interval.c           |   14 +-
 net/dccp/ccids/lib/packet_history.c          |  277 +++++++++++++++-----------
 net/dccp/ccids/lib/packet_history.h          |   82 +++-----
 net/dccp/ccids/lib/packet_history_internal.h |   67 ++++++
 6 files changed, 353 insertions(+), 356 deletions(-)
 create mode 100644 net/dccp/ccids/lib/packet_history_internal.h

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 5ff5aab..af64c1d 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -641,6 +641,15 @@ static int ccid3_hc_tx_getsockopt(struct sock *sk, const int optname, int len,
 /*
  *	Receiver Half-Connection Routines
  */
+
+/* CCID3 feedback types */
+enum ccid3_fback_type {
+	CCID3_FBACK_NONE = 0,
+	CCID3_FBACK_INITIAL,
+	CCID3_FBACK_PERIODIC,
+	CCID3_FBACK_PARAM_CHANGE
+};
+
 #ifdef CONFIG_IP_DCCP_CCID3_DEBUG
 static const char *ccid3_rx_state_name(enum ccid3_hc_rx_states state)
 {
@@ -673,53 +682,60 @@ static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len)
 		hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9);
 }
 
-static void ccid3_hc_rx_send_feedback(struct sock *sk)
+static void ccid3_hc_rx_send_feedback(struct sock *sk,
+				      const struct sk_buff *skb,
+				      enum ccid3_fback_type fbtype)
 {
 	struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
 	struct dccp_sock *dp = dccp_sk(sk);
-	struct tfrc_rx_hist_entry *packet;
 	ktime_t now;
-	suseconds_t delta;
+	s64 delta = 0;
 
 	ccid3_pr_debug("%s(%p) - entry \n", dccp_role(sk), sk);
 
+	if (unlikely(hcrx->ccid3hcrx_state == TFRC_RSTATE_TERM))
+		return;
+
 	now = ktime_get_real();
 
-	switch (hcrx->ccid3hcrx_state) {
-	case TFRC_RSTATE_NO_DATA:
+	switch (fbtype) {
+	case CCID3_FBACK_INITIAL:
 		hcrx->ccid3hcrx_x_recv = 0;
+		hcrx->ccid3hcrx_pinv   = ~0U;   /* see RFC 4342, 8.5 */
 		break;
-	case TFRC_RSTATE_DATA:
-		delta = ktime_us_delta(now,
-				       hcrx->ccid3hcrx_tstamp_last_feedback);
-		DCCP_BUG_ON(delta < 0);
-		hcrx->ccid3hcrx_x_recv =
-			scaled_div32(hcrx->ccid3hcrx_bytes_recv, delta);
+	case CCID3_FBACK_PARAM_CHANGE:
+		/*
+		 * When parameters change (new loss or p > p_prev), we do not
+		 * have a reliable estimate for R_m of [RFC 3448, 6.2] and so
+		 * need to  reuse the previous value of X_recv. However, when
+		 * X_recv was 0 (due to early loss), this would kill X down to
+		 * s/t_mbi (i.e. one packet in 64 seconds).
+		 * To avoid such drastic reduction, we approximate X_recv as
+		 * the number of bytes since last feedback.
+		 * This is a safe fallback, since X is bounded above by X_calc.
+		 */
+		if (hcrx->ccid3hcrx_x_recv > 0)
+			break;
+		/* fall through */
+	case CCID3_FBACK_PERIODIC:
+		delta = ktime_us_delta(now, hcrx->ccid3hcrx_tstamp_last_feedback);
+		if (delta <= 0)
+			DCCP_BUG("delta (%ld) <= 0", (long)delta);
+		else
+			hcrx->ccid3hcrx_x_recv =
+				scaled_div32(hcrx->ccid3hcrx_bytes_recv, delta);
 		break;
-	case TFRC_RSTATE_TERM:
-		DCCP_BUG("%s(%p) - Illegal state TERM", dccp_role(sk), sk);
+	default:
 		return;
 	}
 
-	packet = tfrc_rx_hist_find_data_packet(&hcrx->ccid3hcrx_hist);
-	if (unlikely(packet == NULL)) {
-		DCCP_WARN("%s(%p), no data packet in history!\n",
-			  dccp_role(sk), sk);
-		return;
-	}
+	ccid3_pr_debug("Interval %ldusec, X_recv=%u, 1/p=%u\n", (long)delta,
+		       hcrx->ccid3hcrx_x_recv, hcrx->ccid3hcrx_pinv);
 
 	hcrx->ccid3hcrx_tstamp_last_feedback = now;
-	hcrx->ccid3hcrx_ccval_last_counter   = packet->tfrchrx_ccval;
+	hcrx->ccid3hcrx_last_counter	     = dccp_hdr(skb)->dccph_ccval;
 	hcrx->ccid3hcrx_bytes_recv	     = 0;
 
-	if (hcrx->ccid3hcrx_p == 0)
-		hcrx->ccid3hcrx_pinv = ~0U;	/* see RFC 4342, 8.5 */
-	else if (hcrx->ccid3hcrx_p > 1000000) {
-		DCCP_WARN("p (%u) > 100%%\n", hcrx->ccid3hcrx_p);
-		hcrx->ccid3hcrx_pinv = 1;	/* use 100% in this case */
-	} else
-		hcrx->ccid3hcrx_pinv = 1000000 / hcrx->ccid3hcrx_p;
-
 	dp->dccps_hc_rx_insert_options = 1;
 	dccp_send_ack(sk);
 }
@@ -753,162 +769,52 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb)
 static int ccid3_hc_rx_detect_loss(struct sock *sk,
 				    struct tfrc_rx_hist_entry *packet)
 {
-	struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
-	struct tfrc_rx_hist_entry *rx_hist =
-				tfrc_rx_hist_head(&hcrx->ccid3hcrx_hist);
-	u64 seqno = packet->tfrchrx_seqno;
-	u64 tmp_seqno;
-	int loss = 0;
-	u8 ccval;
-
-
-	tmp_seqno = hcrx->ccid3hcrx_seqno_nonloss;
-
-	if (!rx_hist ||
-	   follows48(packet->tfrchrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) {
-		hcrx->ccid3hcrx_seqno_nonloss = seqno;
-		hcrx->ccid3hcrx_ccval_nonloss = packet->tfrchrx_ccval;
-		goto detect_out;
-	}
-
-
-	while (dccp_delta_seqno(hcrx->ccid3hcrx_seqno_nonloss, seqno)
-	   > TFRC_RECV_NUM_LATE_LOSS) {
-		loss = 1;
-		dccp_li_update_li(sk,
-				  &hcrx->ccid3hcrx_li_hist,
-				  &hcrx->ccid3hcrx_hist,
-				  hcrx->ccid3hcrx_tstamp_last_feedback,
-				  hcrx->ccid3hcrx_s,
-				  hcrx->ccid3hcrx_bytes_recv,
-				  hcrx->ccid3hcrx_x_recv,
-				  hcrx->ccid3hcrx_seqno_nonloss,
-				  hcrx->ccid3hcrx_ccval_nonloss);
-		tmp_seqno = hcrx->ccid3hcrx_seqno_nonloss;
-		dccp_inc_seqno(&tmp_seqno);
-		hcrx->ccid3hcrx_seqno_nonloss = tmp_seqno;
-		dccp_inc_seqno(&tmp_seqno);
-		while (tfrc_rx_hist_find_entry(&hcrx->ccid3hcrx_hist,
-		   tmp_seqno, &ccval)) {
-			hcrx->ccid3hcrx_seqno_nonloss = tmp_seqno;
-			hcrx->ccid3hcrx_ccval_nonloss = ccval;
-			dccp_inc_seqno(&tmp_seqno);
-		}
-	}
-
-	/* FIXME - this code could be simplified with above while */
-	/* but works at moment */
-	if (follows48(packet->tfrchrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) {
-		hcrx->ccid3hcrx_seqno_nonloss = seqno;
-		hcrx->ccid3hcrx_ccval_nonloss = packet->tfrchrx_ccval;
-	}
-
-detect_out:
-	tfrc_rx_hist_add_packet(&hcrx->ccid3hcrx_hist,
-				&hcrx->ccid3hcrx_li_hist, packet,
-				hcrx->ccid3hcrx_seqno_nonloss);
-	return loss;
+	return 0;
 }
 
 static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
 {
 	struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
-	const struct dccp_options_received *opt_recv;
-	struct tfrc_rx_hist_entry *packet;
-	u32 p_prev, r_sample, rtt_prev;
-	int loss, payload_size;
-	ktime_t now;
-
-	opt_recv = &dccp_sk(sk)->dccps_options_received;
+	const u32 payload_size = skb->len - dccp_hdr(skb)->dccph_doff * 4;
+	const u32 ndp = dccp_sk(sk)->dccps_options_received.dccpor_ndp;
+	enum ccid3_fback_type do_feedback = CCID3_FBACK_NONE;
+	const bool is_data_packet = dccp_data_packet(skb);
 
-	switch (DCCP_SKB_CB(skb)->dccpd_type) {
-	case DCCP_PKT_ACK:
-		if (hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA)
+	if (hcrx->ccid3hcrx_state != TFRC_RSTATE_NO_DATA) {
+		if (tfrc_rx_hist_duplicate(&hcrx->ccid3hcrx_hist, skb))
 			return;
-	case DCCP_PKT_DATAACK:
-		if (opt_recv->dccpor_timestamp_echo == 0)
-			break;
-		r_sample = dccp_timestamp() - opt_recv->dccpor_timestamp_echo;
-		rtt_prev = hcrx->ccid3hcrx_rtt;
-		r_sample = dccp_sample_rtt(sk, 10 * r_sample);
-
-		if (hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA)
-			hcrx->ccid3hcrx_rtt = r_sample;
-		else
-			hcrx->ccid3hcrx_rtt = (hcrx->ccid3hcrx_rtt * 9) / 10 +
-					      r_sample / 10;
-
-		if (rtt_prev != hcrx->ccid3hcrx_rtt)
-			ccid3_pr_debug("%s(%p), New RTT=%uus, elapsed time=%u\n",
-				       dccp_role(sk), sk, hcrx->ccid3hcrx_rtt,
-				       opt_recv->dccpor_elapsed_time);
-		break;
-	case DCCP_PKT_DATA:
-		break;
-	default: /* We're not interested in other packet types, move along */
-		return;
-	}
-
-	packet = tfrc_rx_hist_entry_new(opt_recv->dccpor_ndp, skb, GFP_ATOMIC);
-	if (unlikely(packet == NULL)) {
-		DCCP_WARN("%s(%p), Not enough mem to add rx packet "
-			  "to history, consider it lost!\n", dccp_role(sk), sk);
-		return;
-	}
-
-	loss = ccid3_hc_rx_detect_loss(sk, packet);
-
-	if (DCCP_SKB_CB(skb)->dccpd_type == DCCP_PKT_ACK)
-		return;
-
-	payload_size = skb->len - dccp_hdr(skb)->dccph_doff * 4;
-	ccid3_hc_rx_update_s(hcrx, payload_size);
 
-	switch (hcrx->ccid3hcrx_state) {
-	case TFRC_RSTATE_NO_DATA:
-		ccid3_pr_debug("%s(%p, state=%s), skb=%p, sending initial "
-			       "feedback\n", dccp_role(sk), sk,
-			       dccp_state_name(sk->sk_state), skb);
-		ccid3_hc_rx_send_feedback(sk);
-		ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA);
-		return;
-	case TFRC_RSTATE_DATA:
-		hcrx->ccid3hcrx_bytes_recv += payload_size;
-		if (loss)
-			break;
-
-		now = ktime_get_real();
-		if ((ktime_us_delta(now, hcrx->ccid3hcrx_tstamp_last_ack) -
-		     (s64)hcrx->ccid3hcrx_rtt) >= 0) {
-			hcrx->ccid3hcrx_tstamp_last_ack = now;
-			ccid3_hc_rx_send_feedback(sk);
+		    /* Handle pending losses and otherwise check for new loss */
+		if (!tfrc_rx_hist_new_loss_indicated(&hcrx->ccid3hcrx_hist,
+						     skb, ndp) &&
+		    /* Handle data packets: RTT sampling and monitoring p */
+		    is_data_packet)  {
+
+			if (list_empty(&hcrx->ccid3hcrx_li_hist)) {  /* no loss so far: p = 0 */
+				const u32 sample = tfrc_rx_hist_sample_rtt(&hcrx->ccid3hcrx_hist, skb);
+				if (sample != 0)
+					hcrx->ccid3hcrx_rtt =
+						tfrc_ewma(hcrx->ccid3hcrx_rtt, sample, 9);
+			}
+
+			ccid3_hc_rx_update_s(hcrx, payload_size);
+			/* check if the periodic once-per-RTT feedback is due; RFC 4342, 10.3 */
+			if (SUB16(dccp_hdr(skb)->dccph_ccval,
+				  hcrx->ccid3hcrx_last_counter) > 3)
+				do_feedback = CCID3_FBACK_PERIODIC;
 		}
-		return;
-	case TFRC_RSTATE_TERM:
-		DCCP_BUG("%s(%p) - Illegal state TERM", dccp_role(sk), sk);
-		return;
-	}
-
-	/* Dealing with packet loss */
-	ccid3_pr_debug("%s(%p, state=%s), data loss! Reacting...\n",
-		       dccp_role(sk), sk, dccp_state_name(sk->sk_state));
-
-	p_prev = hcrx->ccid3hcrx_p;
+	} else if (is_data_packet) {
+		do_feedback = CCID3_FBACK_INITIAL;
+		ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA);
+		hcrx->ccid3hcrx_s = payload_size;
+ 	}
 
-	/* Calculate loss event rate */
-	if (!list_empty(&hcrx->ccid3hcrx_li_hist)) {
-		u32 i_mean = dccp_li_hist_calc_i_mean(&hcrx->ccid3hcrx_li_hist);
+	hcrx->ccid3hcrx_bytes_recv += payload_size;
 
-		/* Scaling up by 1000000 as fixed decimal */
-		if (i_mean != 0)
-			hcrx->ccid3hcrx_p = 1000000 / i_mean;
-	} else
-		DCCP_BUG("empty loss history");
+	tfrc_rx_hist_add_packet(&hcrx->ccid3hcrx_hist, skb, ndp);
 
-	if (hcrx->ccid3hcrx_p > p_prev) {
-		ccid3_hc_rx_send_feedback(sk);
-		return;
-	}
+	if (do_feedback)
+		ccid3_hc_rx_send_feedback(sk, skb, do_feedback);
 }
 
 static int ccid3_hc_rx_init(struct ccid *ccid, struct sock *sk)
@@ -918,11 +824,8 @@ static int ccid3_hc_rx_init(struct ccid *ccid, struct sock *sk)
 	ccid3_pr_debug("entry\n");
 
 	hcrx->ccid3hcrx_state = TFRC_RSTATE_NO_DATA;
-	INIT_LIST_HEAD(&hcrx->ccid3hcrx_hist);
 	INIT_LIST_HEAD(&hcrx->ccid3hcrx_li_hist);
-	hcrx->ccid3hcrx_tstamp_last_feedback =
-		hcrx->ccid3hcrx_tstamp_last_ack = ktime_get_real();
-	return 0;
+	return tfrc_rx_hist_alloc(&hcrx->ccid3hcrx_hist);
 }
 
 static void ccid3_hc_rx_exit(struct sock *sk)
diff --git a/net/dccp/ccids/ccid3.h b/net/dccp/ccids/ccid3.h
index b842a7d..3c33dc6 100644
--- a/net/dccp/ccids/ccid3.h
+++ b/net/dccp/ccids/ccid3.h
@@ -1,7 +1,8 @@
 /*
  *  net/dccp/ccids/ccid3.h
  *
- *  Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand.
+ *  Copyright (c) 2005-7 The University of Waikato, Hamilton, New Zealand.
+ *  Copyright (c) 2007   The University of Aberdeen, Scotland, UK
  *
  *  An implementation of the DCCP protocol
  *
@@ -135,9 +136,7 @@ enum ccid3_hc_rx_states {
  *  @ccid3hcrx_x_recv  -  Receiver estimate of send rate (RFC 3448 4.3)
  *  @ccid3hcrx_rtt  -  Receiver estimate of rtt (non-standard)
  *  @ccid3hcrx_p  -  current loss event rate (RFC 3448 5.4)
- *  @ccid3hcrx_seqno_nonloss  -  Last received non-loss sequence number
- *  @ccid3hcrx_ccval_nonloss  -  Last received non-loss Window CCVal
- *  @ccid3hcrx_ccval_last_counter  -  Tracks window counter (RFC 4342, 8.1)
+ *  @ccid3hcrx_last_counter  -  Tracks window counter (RFC 4342, 8.1)
  *  @ccid3hcrx_state  -  receiver state, one of %ccid3_hc_rx_states
  *  @ccid3hcrx_bytes_recv  -  Total sum of DCCP payload bytes
  *  @ccid3hcrx_tstamp_last_feedback  -  Time at which last feedback was sent
@@ -152,14 +151,11 @@ struct ccid3_hc_rx_sock {
 #define ccid3hcrx_x_recv		ccid3hcrx_tfrc.tfrcrx_x_recv
 #define ccid3hcrx_rtt			ccid3hcrx_tfrc.tfrcrx_rtt
 #define ccid3hcrx_p			ccid3hcrx_tfrc.tfrcrx_p
-	u64				ccid3hcrx_seqno_nonloss:48,
-					ccid3hcrx_ccval_nonloss:4,
-					ccid3hcrx_ccval_last_counter:4;
+	u8				ccid3hcrx_last_counter:4;
 	enum ccid3_hc_rx_states		ccid3hcrx_state:8;
 	u32				ccid3hcrx_bytes_recv;
 	ktime_t				ccid3hcrx_tstamp_last_feedback;
-	ktime_t				ccid3hcrx_tstamp_last_ack;
-	struct list_head		ccid3hcrx_hist;
+	struct tfrc_rx_hist		ccid3hcrx_hist;
 	struct list_head		ccid3hcrx_li_hist;
 	u16				ccid3hcrx_s;
 	u32				ccid3hcrx_pinv;
diff --git a/net/dccp/ccids/lib/loss_interval.c b/net/dccp/ccids/lib/loss_interval.c
index a5f59af..71080ca 100644
--- a/net/dccp/ccids/lib/loss_interval.c
+++ b/net/dccp/ccids/lib/loss_interval.c
@@ -16,6 +16,7 @@
 #include "../../dccp.h"
 #include "loss_interval.h"
 #include "packet_history.h"
+#include "packet_history_internal.h"
 #include "tfrc.h"
 
 #define DCCP_LI_HIST_IVAL_F_LENGTH  8
@@ -129,6 +130,13 @@ static u32 dccp_li_calc_first_li(struct sock *sk,
 				 u16 s, u32 bytes_recv,
 				 u32 previous_x_recv)
 {
+/*
+ * FIXME:
+ * Will be rewritten in the upcoming new loss intervals code. 
+ * Has to be commented ou because it relies on the old rx history
+ * data structures
+ */
+#if 0
 	struct tfrc_rx_hist_entry *entry, *next, *tail = NULL;
 	u32 x_recv, p;
 	suseconds_t rtt, delta;
@@ -216,10 +224,10 @@ found:
 	dccp_pr_debug("%s(%p), receive rate=%u bytes/s, implied "
 		      "loss rate=%u\n", dccp_role(sk), sk, x_recv, p);
 
-	if (p == 0)
-		return ~0;
-	else
+	if (p != 0)
 		return 1000000 / p;
+#endif
+	return ~0;
 }
 
 void dccp_li_update_li(struct sock *sk,
diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c
index 255cca1..80ad4cd 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -36,7 +36,10 @@
  */
 
 #include <linux/string.h>
+#include <linux/slab.h>
 #include "packet_history.h"
+#include "../../dccp.h"
+#include "packet_history_internal.h"
 
 /**
  *  tfrc_tx_hist_entry  -  Simple singly-linked TX history list
@@ -111,154 +114,195 @@ u32 tfrc_tx_hist_rtt(struct tfrc_tx_hist_entry *head, const u64 seqno,
 }
 EXPORT_SYMBOL_GPL(tfrc_tx_hist_rtt);
 
-/*
- * 	Receiver History Routines
+/**
+ * tfrc_rx_hist_index - index to reach n-th entry after loss_start
  */
-static struct kmem_cache *tfrc_rx_hist_slab;
+static inline u8 tfrc_rx_hist_index(const struct tfrc_rx_hist *h, const u8 n)
+{
+	return (h->loss_start + n) & TFRC_NDUPACK;
+}
 
-struct tfrc_rx_hist_entry *tfrc_rx_hist_entry_new(const u32 ndp,
-						  const struct sk_buff *skb,
-						  const gfp_t prio)
+/**
+ * tfrc_rx_hist_last_rcv - entry with highest-received-seqno so far
+ */
+static inline struct tfrc_rx_hist_entry *
+			tfrc_rx_hist_last_rcv(const struct tfrc_rx_hist *h)
 {
-	struct tfrc_rx_hist_entry *entry = kmem_cache_alloc(tfrc_rx_hist_slab,
-							    prio);
+	return h->ring + tfrc_rx_hist_index(h, h->loss_count);
+}
 
-	if (entry != NULL) {
-		const struct dccp_hdr *dh = dccp_hdr(skb);
+/**
+ * tfrc_rx_hist_entry - return the n-th history entry after loss_start
+ */
+static inline struct tfrc_rx_hist_entry *
+		tfrc_rx_hist_entry(const struct tfrc_rx_hist *h, const u8 n)
+{
+	return h->ring + tfrc_rx_hist_index(h, n);
+}
 
-		entry->tfrchrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq;
-		entry->tfrchrx_ccval = dh->dccph_ccval;
-		entry->tfrchrx_type  = dh->dccph_type;
-		entry->tfrchrx_ndp   = ndp;
-		entry->tfrchrx_tstamp = ktime_get_real();
-	}
+/**
+ * tfrc_rx_hist_loss_prev - entry with highest-received-seqno before loss was detected
+ */
+static inline struct tfrc_rx_hist_entry *
+			tfrc_rx_hist_loss_prev(const struct tfrc_rx_hist *h)
+{
+	return h->ring + h->loss_start;
+}
+
+/*
+ * 	Receiver History Routines
+ */
+static struct kmem_cache *tfrc_rx_hist_slab;
 
-	return entry;
+void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h,
+			     const struct sk_buff *skb,
+			     const u32 ndp)
+{
+	struct tfrc_rx_hist_entry *entry = tfrc_rx_hist_last_rcv(h);
+	const struct dccp_hdr *dh = dccp_hdr(skb);
+
+	entry->tfrchrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq;
+	entry->tfrchrx_ccval = dh->dccph_ccval;
+	entry->tfrchrx_type  = dh->dccph_type;
+	entry->tfrchrx_ndp   = ndp;
+	entry->tfrchrx_tstamp = ktime_get_real();
 }
-EXPORT_SYMBOL_GPL(tfrc_rx_hist_entry_new);
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_add_packet);
 
 static inline void tfrc_rx_hist_entry_delete(struct tfrc_rx_hist_entry *entry)
 {
 	kmem_cache_free(tfrc_rx_hist_slab, entry);
 }
 
-int tfrc_rx_hist_find_entry(const struct list_head *list, const u64 seq,
-			    u8 *ccval)
+/* has the packet contained in skb been seen before? */
+int tfrc_rx_hist_duplicate(struct tfrc_rx_hist *h, struct sk_buff *skb)
 {
-	struct tfrc_rx_hist_entry *packet = NULL, *entry;
+	const u64 seq = DCCP_SKB_CB(skb)->dccpd_seq;
+	int i;
 
-	list_for_each_entry(entry, list, tfrchrx_node)
-		if (entry->tfrchrx_seqno == seq) {
-			packet = entry;
-			break;
-		}
+	if (dccp_delta_seqno(tfrc_rx_hist_loss_prev(h)->tfrchrx_seqno, seq) <= 0)
+		return 1;
 
-	if (packet)
-		*ccval = packet->tfrchrx_ccval;
+	for (i = 1; i <= h->loss_count; i++)
+		if (tfrc_rx_hist_entry(h, i)->tfrchrx_seqno == seq)
+			return 1;
 
-	return packet != NULL;
+	return 0;
 }
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_duplicate);
 
-EXPORT_SYMBOL_GPL(tfrc_rx_hist_find_entry);
-struct tfrc_rx_hist_entry *
-		tfrc_rx_hist_find_data_packet(const struct list_head *list)
+/* initialise loss detection and disable RTT sampling */
+static inline void tfrc_rx_hist_loss_indicated(struct tfrc_rx_hist *h)
 {
-	struct tfrc_rx_hist_entry *entry, *packet = NULL;
+	h->loss_count = 1;
+}
 
-	list_for_each_entry(entry, list, tfrchrx_node)
-		if (entry->tfrchrx_type == DCCP_PKT_DATA ||
-		    entry->tfrchrx_type == DCCP_PKT_DATAACK) {
-			packet = entry;
-			break;
-		}
+/* indicate whether previously a packet was detected missing */
+static inline int tfrc_rx_hist_loss_pending(const struct tfrc_rx_hist *h)
+{
+	return h->loss_count;
+}
 
-	return packet;
+/* any data packets missing between last reception and skb ? */
+int tfrc_rx_hist_new_loss_indicated(struct tfrc_rx_hist *h,
+				    const struct sk_buff *skb, u32 ndp)
+{
+	int delta = dccp_delta_seqno(tfrc_rx_hist_last_rcv(h)->tfrchrx_seqno,
+				     DCCP_SKB_CB(skb)->dccpd_seq);
+
+	if (delta > 1 && ndp < delta)
+		tfrc_rx_hist_loss_indicated(h);
+
+	return tfrc_rx_hist_loss_pending(h);
+}
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_new_loss_indicated);
+
+void tfrc_rx_hist_purge(struct tfrc_rx_hist *h)
+{
+	kmem_cache_free(tfrc_rx_hist_slab, h->ring);
+	h->ring = NULL;
 }
 
-EXPORT_SYMBOL_GPL(tfrc_rx_hist_find_data_packet);
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_purge);
 
-void tfrc_rx_hist_add_packet(struct list_head *rx_list,
-			     struct list_head *li_list,
-			     struct tfrc_rx_hist_entry *packet,
-			     u64 nonloss_seqno)
+int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h)
 {
-	struct tfrc_rx_hist_entry *entry, *next;
-	u8 num_later = 0;
-
-	list_add(&packet->tfrchrx_node, rx_list);
-
-	num_later = TFRC_RECV_NUM_LATE_LOSS + 1;
-
-	if (!list_empty(li_list)) {
-		list_for_each_entry_safe(entry, next, rx_list, tfrchrx_node) {
-			if (num_later == 0) {
-				if (after48(nonloss_seqno,
-				   entry->tfrchrx_seqno)) {
-					list_del_init(&entry->tfrchrx_node);
-					tfrc_rx_hist_entry_delete(entry);
-				}
-			} else if (tfrc_rx_hist_entry_data_packet(entry))
-				--num_later;
-		}
-	} else {
-		int step = 0;
-		u8 win_count = 0; /* Not needed, but lets shut up gcc */
-		int tmp;
-		/*
-		 * We have no loss interval history so we need at least one
-		 * rtt:s of data packets to approximate rtt.
-		 */
-		list_for_each_entry_safe(entry, next, rx_list, tfrchrx_node) {
-			if (num_later == 0) {
-				switch (step) {
-				case 0:
-					step = 1;
-					/* OK, find next data packet */
-					num_later = 1;
-					break;
-				case 1:
-					step = 2;
-					/* OK, find next data packet */
-					num_later = 1;
-					win_count = entry->tfrchrx_ccval;
-					break;
-				case 2:
-					tmp = win_count - entry->tfrchrx_ccval;
-					if (tmp < 0)
-						tmp += TFRC_WIN_COUNT_LIMIT;
-					if (tmp > TFRC_WIN_COUNT_PER_RTT + 1) {
-						/*
-						 * We have found a packet older
-						 * than one rtt remove the rest
-						 */
-						step = 3;
-					} else /* OK, find next data packet */
-						num_later = 1;
-					break;
-				case 3:
-					list_del_init(&entry->tfrchrx_node);
-					tfrc_rx_hist_entry_delete(entry);
-					break;
-				}
-			} else if (tfrc_rx_hist_entry_data_packet(entry))
-				--num_later;
-		}
-	}
+	h->ring = kmem_cache_alloc(tfrc_rx_hist_slab, GFP_ATOMIC);
+	h->loss_count = h->loss_start = 0;
+	return h->ring == NULL;
 }
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_alloc);
 
-EXPORT_SYMBOL_GPL(tfrc_rx_hist_add_packet);
+/**
+ * tfrc_rx_hist_rtt_last_s - reference entry to compute RTT samples against
+ */
+static inline struct tfrc_rx_hist_entry *
+			tfrc_rx_hist_rtt_last_s(const struct tfrc_rx_hist *h)
+{
+	return h->ring;
+}
 
-void tfrc_rx_hist_purge(struct list_head *list)
+/**
+ * tfrc_rx_hist_rtt_prev_s: previously suitable (wrt rtt_last_s) RTT-sampling entry
+ */
+static inline struct tfrc_rx_hist_entry *
+			tfrc_rx_hist_rtt_prev_s(const struct tfrc_rx_hist *h)
 {
-	struct tfrc_rx_hist_entry *entry, *next;
+	return h->ring + h->rtt_sample_prev;
+}
 
-	list_for_each_entry_safe(entry, next, list, tfrchrx_node) {
-		list_del_init(&entry->tfrchrx_node);
-		tfrc_rx_hist_entry_delete(entry);
+/**
+ * tfrc_rx_hist_sample_rtt  -  Sample RTT from timestamp / CCVal
+ * Based on ideas presented in RFC 4342, 8.1. Returns 0 if it was not able
+ * to compute a sample with given data - calling function should check this.
+ */
+u32 tfrc_rx_hist_sample_rtt(struct tfrc_rx_hist *h, const struct sk_buff *skb)
+{
+	u32 sample = 0,
+	    delta_v = SUB16(dccp_hdr(skb)->dccph_ccval,
+	    		    tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval);
+
+	if (delta_v < 1 || delta_v > 4) {	/* unsuitable CCVal delta */
+		if (h->rtt_sample_prev == 2) {	/* previous candidate stored */
+			sample = SUB16(tfrc_rx_hist_rtt_prev_s(h)->tfrchrx_ccval,
+				       tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval);
+			if (sample)
+				sample = 4 / sample *
+				         ktime_us_delta(tfrc_rx_hist_rtt_prev_s(h)->tfrchrx_tstamp,
+							tfrc_rx_hist_rtt_last_s(h)->tfrchrx_tstamp);
+			else    /*
+				 * FIXME: This condition is in principle not
+				 * possible but occurs when CCID is used for
+				 * two-way data traffic. I have tried to trace
+				 * it, but the cause does not seem to be here.
+				 */
+				DCCP_BUG("please report to dccp@vger.kernel.org"
+					 " => prev = %u, last = %u",
+					 tfrc_rx_hist_rtt_prev_s(h)->tfrchrx_ccval,
+					 tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval);
+		} else if (delta_v < 1) {
+			h->rtt_sample_prev = 1;
+			goto keep_ref_for_next_time;
+		}
+
+	} else if (delta_v == 4) /* optimal match */
+		sample = ktime_to_us(net_timedelta(tfrc_rx_hist_rtt_last_s(h)->tfrchrx_tstamp));
+	else {			 /* suboptimal match */
+		h->rtt_sample_prev = 2;
+		goto keep_ref_for_next_time;
 	}
-}
 
-EXPORT_SYMBOL_GPL(tfrc_rx_hist_purge);
+	if (unlikely(sample > DCCP_SANE_RTT_MAX)) {
+		DCCP_WARN("RTT sample %u too large, using max\n", sample);
+		sample = DCCP_SANE_RTT_MAX;
+	}
+
+	h->rtt_sample_prev = 0;	       /* use current entry as next reference */
+keep_ref_for_next_time:
+
+	return sample;
+}
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_sample_rtt);
 
 __init int packet_history_init(void)
 {
@@ -269,7 +313,8 @@ __init int packet_history_init(void)
 		goto out_err;
 
 	tfrc_rx_hist_slab = kmem_cache_create("tfrc_rx_hist",
-					      sizeof(struct tfrc_rx_hist_entry), 0,
+					      (sizeof(struct tfrc_rx_hist_entry) *
+					       (TFRC_NDUPACK + 1)), 0,
 					      SLAB_HWCACHE_ALIGN, NULL);
 	if (tfrc_rx_hist_slab == NULL)
 		goto out_free_tx;
diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h
index 5b0b983..6871e51 100644
--- a/net/dccp/ccids/lib/packet_history.h
+++ b/net/dccp/ccids/lib/packet_history.h
@@ -37,15 +37,9 @@
 #define _DCCP_PKT_HIST_
 
 #include <linux/ktime.h>
-#include <linux/list.h>
-#include <linux/slab.h>
-#include "tfrc.h"
+#include <linux/types.h>
 
-/* Number of later packets received before one is considered lost */
-#define TFRC_RECV_NUM_LATE_LOSS	 3
-
-#define TFRC_WIN_COUNT_PER_RTT	 4
-#define TFRC_WIN_COUNT_LIMIT	16
+struct sk_buff;
 
 struct tfrc_tx_hist_entry;
 
@@ -54,54 +48,38 @@ extern void tfrc_tx_hist_purge(struct tfrc_tx_hist_entry **headp);
 extern u32  tfrc_tx_hist_rtt(struct tfrc_tx_hist_entry *head,
 			     const u64 seqno, const ktime_t now);
 
-/*
- * 	Receiver History data structures and declarations
- */
-struct tfrc_rx_hist_entry {
-	struct list_head tfrchrx_node;
-	u64		 tfrchrx_seqno:48,
-			 tfrchrx_ccval:4,
-			 tfrchrx_type:4;
-	u32		 tfrchrx_ndp; /* In fact it is from 8 to 24 bits */
-	ktime_t		 tfrchrx_tstamp;
-};
-
-extern struct tfrc_rx_hist_entry *
-			tfrc_rx_hist_entry_new(const u32 ndp,
-					       const struct sk_buff *skb,
-					       const gfp_t prio);
+struct tfrc_rx_hist_entry;
 
-static inline struct tfrc_rx_hist_entry *
-			tfrc_rx_hist_head(struct list_head *list)
-{
-	struct tfrc_rx_hist_entry *head = NULL;
+/* Subtraction a-b modulo-16, respects circular wrap-around */
+#define SUB16(a, b) (((a) + 16 - (b)) & 0xF)
 
-	if (!list_empty(list))
-		head = list_entry(list->next, struct tfrc_rx_hist_entry,
-				  tfrchrx_node);
-	return head;
-}
+/* Number of packets to wait after a missing packet (RFC 4342, 6.1) */
+#define TFRC_NDUPACK 3
 
-extern int tfrc_rx_hist_find_entry(const struct list_head *list, const u64 seq,
-				   u8 *ccval);
-extern struct tfrc_rx_hist_entry *
-		tfrc_rx_hist_find_data_packet(const struct list_head *list);
-
-extern void tfrc_rx_hist_add_packet(struct list_head *rx_list,
-				    struct list_head *li_list,
-				    struct tfrc_rx_hist_entry *packet,
-				    u64 nonloss_seqno);
-
-extern void tfrc_rx_hist_purge(struct list_head *list);
+/**
+ * tfrc_rx_hist  -  RX history structure for TFRC-based protocols
+ *
+ * @ring:		Packet history for RTT sampling and loss detection
+ * @loss_count:		Number of entries in circular history
+ * @loss_start:		Movable index (for loss detection)
+ * @rtt_sample_prev:	Used during RTT sampling, points to candidate entry
+ */
+struct tfrc_rx_hist {
+	struct tfrc_rx_hist_entry *ring;
+	u8			  loss_count:2,
+				  loss_start:2;
+#define rtt_sample_prev		  loss_start
+};
 
-static inline int
-	tfrc_rx_hist_entry_data_packet(const struct tfrc_rx_hist_entry *entry)
-{
-	return entry->tfrchrx_type == DCCP_PKT_DATA ||
-	       entry->tfrchrx_type == DCCP_PKT_DATAACK;
-}
+extern void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h,
+				    const struct sk_buff *skb, const u32 ndp);
 
-extern u64 tfrc_rx_hist_detect_loss(struct list_head *rx_list,
-				    struct list_head *li_list, u8 *win_loss);
+extern int tfrc_rx_hist_duplicate(struct tfrc_rx_hist *h, struct sk_buff *skb);
+extern int tfrc_rx_hist_new_loss_indicated(struct tfrc_rx_hist *h,
+					   const struct sk_buff *skb, u32 ndp);
+extern u32 tfrc_rx_hist_sample_rtt(struct tfrc_rx_hist *h,
+				   const struct sk_buff *skb);
+extern int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h);
+extern void tfrc_rx_hist_purge(struct tfrc_rx_hist *h);
 
 #endif /* _DCCP_PKT_HIST_ */
diff --git a/net/dccp/ccids/lib/packet_history_internal.h b/net/dccp/ccids/lib/packet_history_internal.h
new file mode 100644
index 0000000..70d5c31
--- /dev/null
+++ b/net/dccp/ccids/lib/packet_history_internal.h
@@ -0,0 +1,67 @@
+#ifndef _DCCP_PKT_HIST_INT_
+#define _DCCP_PKT_HIST_INT_
+/*
+ *  Packet RX/TX history data structures and routines for TFRC-based protocols.
+ *
+ *  Copyright (c) 2007   The University of Aberdeen, Scotland, UK
+ *  Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand.
+ *
+ *  This code has been developed by the University of Waikato WAND
+ *  research group. For further information please see http://www.wand.net.nz/
+ *  or e-mail Ian McDonald - ian.mcdonald@jandi.co.nz
+ *
+ *  This code also uses code from Lulea University, rereleased as GPL by its
+ *  authors:
+ *  Copyright (c) 2003 Nils-Erik Mattsson, Joacim Haggmark, Magnus Erixzon
+ *
+ *  Changes to meet Linux coding standards, to make it meet latest ccid3 draft
+ *  and to make it work as a loadable module in the DCCP stack written by
+ *  Arnaldo Carvalho de Melo <acme@conectiva.com.br>.
+ *
+ *  Copyright (c) 2005 Arnaldo Carvalho de Melo <acme@conectiva.com.br>
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, write to the Free Software
+ *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <linux/dccp.h>
+#include <linux/ktime.h>
+#include <linux/types.h>
+
+#define TFRC_WIN_COUNT_LIMIT 16
+
+/**
+ * tfrc_rx_hist_entry - Store information about a single received packet
+ * @tfrchrx_seqno:	DCCP packet sequence number
+ * @tfrchrx_ccval:	window counter value of packet (RFC 4342, 8.1)
+ * @tfrchrx_ndp:	the NDP count (if any) of the packet
+ * @tfrchrx_tstamp:	actual receive time of packet
+ */
+struct tfrc_rx_hist_entry {
+	u64	tfrchrx_seqno:48,
+		tfrchrx_ccval:4,
+		tfrchrx_type:4;
+	u32	tfrchrx_ndp; /* In fact it is from 8 to 24 bits */
+	ktime_t	tfrchrx_tstamp;
+};
+
+static inline bool tfrc_rx_hist_entry_data_packet(const struct tfrc_rx_hist_entry *h)
+{
+	return h->tfrchrx_type == DCCP_PKT_DATA	   ||
+	       h->tfrchrx_type == DCCP_PKT_DATAACK ||
+	       h->tfrchrx_type == DCCP_PKT_REQUEST ||
+	       h->tfrchrx_type == DCCP_PKT_RESPONSE;
+}
+
+#endif /* _DCCP_PKT_HIST_INT_ */
-- 
1.5.3.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches
  2007-12-02 21:36 [RFC][PATCHES 0/7]: Reorganization of RX history patches Arnaldo Carvalho de Melo
  2007-12-02 21:36 ` [PATCH 1/7] [TFRC]: Provide central source file and debug facility Arnaldo Carvalho de Melo
@ 2007-12-03  8:23 ` Ian McDonald
  2007-12-03  8:35 ` Gerrit Renker
  2 siblings, 0 replies; 27+ messages in thread
From: Ian McDonald @ 2007-12-03  8:23 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Gerrit Renker, netdev, dccp, Ingo Molnar

On 12/3/07, Arnaldo Carvalho de Melo <acme@redhat.com> wrote:
> WARNING: After reading some messages from Ingo Molnar on lkml I think we should really
>          trim the number of lists we use for kernel development. And since I moved
>          back to using mutt for reading e-mails, something I should have never, ever
>          stopped doing, I guess we should move the DCCP discussions to netdev,
>          where we hopefully can get more people interested and reviewing the work we
>          do, so please consider moving DCCP discussion to netdev@vger.kernel.org,
>          where lots of smart networking folks are present and can help our efforts
>          on turning RFCs to code.
>
I (and others too) don't necessarily have time to read netdev so would
vote on keeping dccp. I would totally agree to making sure that
cross-post to netdev as well as dccp.

Ian

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches
  2007-12-02 21:36 [RFC][PATCHES 0/7]: Reorganization of RX history patches Arnaldo Carvalho de Melo
  2007-12-02 21:36 ` [PATCH 1/7] [TFRC]: Provide central source file and debug facility Arnaldo Carvalho de Melo
  2007-12-03  8:23 ` [RFC][PATCHES 0/7]: Reorganization of RX history patches Ian McDonald
@ 2007-12-03  8:35 ` Gerrit Renker
  2007-12-03 12:44   ` Arnaldo Carvalho de Melo
  2 siblings, 1 reply; 27+ messages in thread
From: Gerrit Renker @ 2007-12-03  8:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: netdev, dccp, Ingo Molnar

Hi Arnaldo,

hank you for going through this. I have just backported your recent patches of 2.6.25
to the DCCP/CCID4/Faster Restart test tree at 
	git://eden-feed.erg.abdn.ac.uk/dccp_exp {dccp,ccid4,dccp_fr}
as per subsequent message.
| 	 do, so please consider moving DCCP discussion to netdev@vger.kernel.org,
| 	 where lots of smart networking folks are present and can help our efforts
| 	 on turning RFCs to code.
Are you suggesting using netdev exclusively or in addition to dccp@vger.kernel.org?
  

| 	Please take a look at this patch series where I reorganized your work on the new
| TFRC rx history handling code. I'll wait for your considerations and then do as many
| interactions as reasonable to get your work merged.
| 
| 	It should be completely equivalent, plus some fixes and optimizations, such as:
It will be necessary to address these points one-by-one. Before diving into making
fixes and `optimisations', have you tested your code? The patches you are referring to
have been posted and re-posted for a period of over 9 months on dccp@vger, and
there are regression tests which show that this code improves on the existing Linux
implementation. These are labelled as `test tree' on 
	http://www.linux-foundation.org/en/Net:DCCP_Testing#Regression_testing
So if you are making changes to the code, I would like to ask if you have run similar
regression tests, to avoid having to step back later.


 
| . The code that allocates the RX ring deals with failures when one of the entries in
|   the ring buffer is not successfully allocated, the original code was leaking the
|   successfully allocated entries.
| 
| . We do just one allocation for the ring buffer, as the number of entries is fixed we
|   should just do one allocation and not TFRC_NDUPACK times.
Will look at the first point in the patch; with regard to the second point I agree, this
will make the code simpler, which is good. 

| . I haven't checked if all the code was commited, as I tried to introduce just what was
|   immediatelly used, probably we'll need to do some changes when working on the merge
|   of your loss intervals code.
Sorry I don't understand this point.

| . I changed the ccid3_hc_rx_packet_recv code to set hcrx->ccid3hcrx_s for the first
|   non-data packet instead of calling ccid3_hc_rx_set_state, that would use 0 as the
|   initial value in the EWMA calculation.
This is a misunderstanding. Non-data packets are not considered in the moving average
for the data packet size `s'; and it would be an error to do (consider 40byte Acks vs.
1460byte data packets, also it is in RFC 4342). 
Where would the zero initial value come from? I think this is also a misunderstanding.
Please have a look below:
	static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
	{
		// ...
		u32 sample, payload_size = skb->len - dccp_hdr(skb)->dccph_doff * 4;

		if (unlikely(hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA)) {
			if (is_data_packet) {
				do_feedback = FBACK_INITIAL;
				ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA);
				ccid3_hc_rx_update_s(hcrx, payload_size);
			}
			goto update_records;
		}

==> Non-data packets are ignored for the purposes of computing s (this is in the RFC),
    consequently update_s() is only called for data packets; using the two following
    functions:


	static inline u32 tfrc_ewma(const u32 avg, const u32 newval, const u8 weight)
	{
		return avg ? (weight * avg + (10 - weight) * newval) / 10 : newval;
	}

	static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len)
	{
		if (likely(len > 0))	/* don't update on empty packets (e.g. ACKs) */
			hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9);
	}

==> Hence I can't see where a zero value should come from: ccid3hrx_s is initially 
    initialised with zero (memset(...,0,...)); when first called, update_s() will
    feed a non-zero payload size to tfrc_ewma(), which will return  `newval' = payload_size,
    hence the first data packet will contribute a non-zero payload_size.
    Zero-sized DCCP-Data packets are pathological and are ignored by the CCID calculations
    (not by the receiver); a corresponding counterpart for zero-sized

| 
| 	It is available at:
| 
| master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.25
| 
Need to do this separately. As said, the code has been developed and tested over a long time,
it took a long while until it acted predictably, so being careful is very important.

I would rather not have my patches merged and continue to run a test tree if the current
changes alter the behaviour to the worse.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches
  2007-12-03  8:35 ` Gerrit Renker
@ 2007-12-03 12:44   ` Arnaldo Carvalho de Melo
  2007-12-03 13:49     ` Gerrit Renker
  0 siblings, 1 reply; 27+ messages in thread
From: Arnaldo Carvalho de Melo @ 2007-12-03 12:44 UTC (permalink / raw)
  To: Gerrit Renker, Arnaldo Carvalho de Melo, netdev, dccp,
	Ingo Molnar

Em Mon, Dec 03, 2007 at 08:35:12AM +0000, Gerrit Renker escreveu:
> Hi Arnaldo,
> 
> hank you for going through this. I have just backported your recent patches of 2.6.25
> to the DCCP/CCID4/Faster Restart test tree at 
> 	git://eden-feed.erg.abdn.ac.uk/dccp_exp {dccp,ccid4,dccp_fr}
> as per subsequent message.
> | 	 do, so please consider moving DCCP discussion to netdev@vger.kernel.org,
> | 	 where lots of smart networking folks are present and can help our efforts
> | 	 on turning RFCs to code.
> Are you suggesting using netdev exclusively or in addition to dccp@vger.kernel.org?

Well, since at least one person that has contributed significantly in
the past has said he can't cope with traffic on netdev, we can CC
dccp@vger.kernel.org.

> 
> | 	Please take a look at this patch series where I reorganized your work on the new
> | TFRC rx history handling code. I'll wait for your considerations and then do as many
> | interactions as reasonable to get your work merged.
> | 
> | 	It should be completely equivalent, plus some fixes and optimizations, such as:
> It will be necessary to address these points one-by-one. Before diving into making
> fixes and `optimisations', have you tested your code? The patches you are referring to

I have performed basic tests, and when doing so noticed a bug in
inet_diag, that I commented with Herbert Xu and he has since provided a
fix.

> have been posted and re-posted for a period of over 9 months on dccp@vger, and

Being posted and re-posted does not guarantee that the patch is OK or
that is in a form that is acceptable to all tree maintainers. DaveM is
subscribed to dccp@vger and could have picked this if he had the time to
do the review. I didn't, but now I'm trying to spend my time on
reviewing your patches and in the process doing modifications I find
appropriate while trying hard not to introduce bugs in your hard work
to get it merged.

> there are regression tests which show that this code improves on the existing Linux
> implementation. These are labelled as `test tree' on 
> 	http://www.linux-foundation.org/en/Net:DCCP_Testing#Regression_testing
> So if you are making changes to the code, I would like to ask if you have run similar
> regression tests, to avoid having to step back later.

Fair enough, I will do that before asking Herbert or Dave to pull from
my tree.
 
> | . The code that allocates the RX ring deals with failures when one of the entries in
> |   the ring buffer is not successfully allocated, the original code was leaking the
> |   successfully allocated entries.

Sorry for not point out exactly this, here it goes:

Your original patch:

+int tfrc_rx_hist_init(struct tfrc_rx_hist *h)
+{
+	int i;
+
+	for (i = 0; i <= NDUPACK; i++) {
+		h->ring[i] = kmem_cache_alloc(tfrc_rxh_cache, GFP_ATOMIC);
+		if (h->ring[i] == NULL)
+			return 1;
+	}
+	h->loss_count = 0;
+	h->loss_start = 0;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_init);

Then, in ccid3_hc_rx_init you do:

static int ccid3_hc_rx_init(struct ccid *ccid, struct sock *sk)
{
        struct ccid3_hc_rx_sock *hcrx = ccid_priv(ccid);

        ccid3_pr_debug("entry\n");

        hcrx->ccid3hcrx_state = TFRC_RSTATE_NO_DATA;
        tfrc_lh_init(&hcrx->ccid3hcrx_li_hist);
        return tfrc_rx_hist_init(&hcrx->ccid3hcrx_hist);
}

So if tfrc_rx_hist_init fail to allocate, say, the last entry in the
ring, it will return 1, and from looking at how you initialize
h->loss_{count,start} to zero I assumed that the whole tfrc_rx_hist
is not zeroed when tfrc_rx_hist_init is called, so one can also assume
that the ring entries are not initialized to NULL and that if the
error recovery is to assume that later on tfrc_rx_hist_cleanup is called
we would not have a leak, but an OOPS when tfrc_rx_hist_cleanup tries
to call kfree_cache_free on the uninitialized ring entries.

But if you look at ccid_new(), that calls ccid3_hc_rx_init(), you'll see
that when the ccid rx or hx routine fails, it just frees the
struct ccid with the area where h->ring is, so, what was not cleaned up
by the ccid init routine is effectively forgot, leaked.

I first did the cleanup at tfrc_rx_hist_init (that I renamed to
tfrc_rx_hist_alloc, since it doesn't just initializes things, but
allocates entries from slab), but then I just made the rx history slab
have arrays of tfrc_rx_hist_entry objects, not individual ones as your
code always allocates them as arrays.

> | . We do just one allocation for the ring buffer, as the number of entries is fixed we
> |   should just do one allocation and not TFRC_NDUPACK times.
> Will look at the first point in the patch; with regard to the second point I agree, this
> will make the code simpler, which is good. 

good

> | . I haven't checked if all the code was commited, as I tried to introduce just what was
> |   immediatelly used, probably we'll need to do some changes when working on the merge
> |   of your loss intervals code.
> Sorry I don't understand this point.

Let me check now and tell you for sure:

tfrc_rx_hist_delta_seqno and tfrc_rx_hist_swap were not included, as
they were not used, we should introduce them later, when getting to the
working on the loss interval code.
 
> | . I changed the ccid3_hc_rx_packet_recv code to set hcrx->ccid3hcrx_s for the first
> |   non-data packet instead of calling ccid3_hc_rx_set_state, that would use 0 as the
> |   initial value in the EWMA calculation.

> This is a misunderstanding. Non-data packets are not considered in the moving average
> for the data packet size `s'; and it would be an error to do (consider 40byte Acks vs.
> 1460byte data packets, also it is in RFC 4342). 
> Where would the zero initial value come from? I think this is also a misunderstanding.
> Please have a look below:
> 	static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
> 	{
> 		// ...
> 		u32 sample, payload_size = skb->len - dccp_hdr(skb)->dccph_doff * 4;
> 
> 		if (unlikely(hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA)) {
> 			if (is_data_packet) {
> 				do_feedback = FBACK_INITIAL;
> 				ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA);
> 				ccid3_hc_rx_update_s(hcrx, payload_size);
> 			}
> 			goto update_records;
> 		}
> 
> ==> Non-data packets are ignored for the purposes of computing s (this is in the RFC),
>     consequently update_s() is only called for data packets; using the two following
>     functions:
> 
> 
> 	static inline u32 tfrc_ewma(const u32 avg, const u32 newval, const u8 weight)
> 	{
> 		return avg ? (weight * avg + (10 - weight) * newval) / 10 : newval;
> 	}

I hadn't considered that tfrc_ewma would for every packet check if the
avg was 0 and I find it suboptimal now that I look at it, we are just
feeding data packets, no? So checking just when we are at
TFRC_RSTATE_NO_DATA, as your code does seems to be better, but not by
calling ccid3_hc_rx_update_s, but just doing:

<SNIP>
	{
                do_feedback = CCID3_FBACK_INITIAL;
                ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA);
                hcrx->ccid3hcrx_s = payload_size;
	}
<SNIP>
 
> 	static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len)
> 	{
> 		if (likely(len > 0))	/* don't update on empty packets (e.g. ACKs) */
> 			hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9);
> 	}

	And we also just do test for len > 0 in update_s, that looks like
also excessive, no? As we should pass just data packets, perhaps we should
just do a BUG_ON there.
> 
> ==> Hence I can't see where a zero value should come from: ccid3hrx_s is initially 
>     initialised with zero (memset(...,0,...)); when first called, update_s() will
>     feed a non-zero payload size to tfrc_ewma(), which will return  `newval' = payload_size,
>     hence the first data packet will contribute a non-zero payload_size.
>     Zero-sized DCCP-Data packets are pathological and are ignored by the CCID calculations
>     (not by the receiver); a corresponding counterpart for zero-sized

Understood, consider this one then an optimization and not a bugfix. My
motivation to add this as an optimization had I realized that tfrc_ewma
checks for avg being zero would have been greatly diminished, but since
we are having all this discussion, I think the optimization is
OK to have merged.
 
> | 
> | 	It is available at:
> | 
> | master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.25
> | 
> Need to do this separately. As said, the code has been developed and tested over a long time,
> it took a long while until it acted predictably, so being careful is very important.

Gerrit, I do these kinds of code reorganizations on the core Linux
networking for ages, I'm not perfect (huh! :-P) but I also didn't
performed extensive testing doing coverage analisys of all the
imaginable possibilities, just did my best to improve the abstractions
and the code, ocasionally fixing bugs as I went thru the existing code.

So while I understand your concerns about being careful, rest asured I
_am_ careful and will be even _more_ careful by using the regression
tests you mentioned, as it would be a shame not to use them.

But I can't just take your patches as-is all the time, I have my
personal preferences and will try to use them when taking the time to
merge code from anyone.
 
> I would rather not have my patches merged and continue to run a test tree if the current
> changes alter the behaviour to the worse.

Its not like that, you released your code as GPL, its out there, it
would be a shame not to merge it, but not as-is all the time. I'm making
sure I stick my name when I do major code changes while giving you due
credit for your work.

And heck, this was an [RFC], I didn't pushed this upstream before
listening to your concerns. Its only fair that if you expect me to
review your work, I should expect for you to review my work, even more
when is to improve yours.

Best Regards,

- Arnaldo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches
  2007-12-03 12:44   ` Arnaldo Carvalho de Melo
@ 2007-12-03 13:49     ` Gerrit Renker
  2007-12-03 14:54       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 27+ messages in thread
From: Gerrit Renker @ 2007-12-03 13:49 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, netdev, dccp, Ingo Molnar

| > Are you suggesting using netdev exclusively or in addition to dccp@vger.kernel.org?
| 
| Well, since at least one person that has contributed significantly in
| the past has said he can't cope with traffic on netdev, we can CC
| dccp@vger.kernel.org
I have a similar problem with the traffic but agree and will copy as well.


| > have been posted and re-posted for a period of over 9 months on dccp@vger, and
| 
| Being posted and re-posted does not guarantee that the patch is OK or
| that is in a form that is acceptable to all tree maintainers.
With the first point I couldn't agree more, but this is not really what I meant - the point
was that despite posting and re-posting there was often silence. And now there is feedback,
in form of a patchset made by you; and all that I am asking for is just to be given the time to
look that through. Last time a RFC patch appeared at 3pm and was checked in the same evening
(only found out next morning).
With your experience and long background as a maintainer maybe you expect quicker turn-around
times, but I also had waited patiently until you had had a chance to review the stuff I sent.


| > | . The code that allocates the RX ring deals with failures when one of the entries in
| > |   the ring buffer is not successfully allocated, the original code was leaking the
| > |   successfully allocated entries.
| 
| Sorry for not point out exactly this, here it goes:
| 
| Your original patch:
| 
| +int tfrc_rx_hist_init(struct tfrc_rx_hist *h)
| +{
| +	int i;
| +
| +	for (i = 0; i <= NDUPACK; i++) {
| +		h->ring[i] = kmem_cache_alloc(tfrc_rxh_cache, GFP_ATOMIC);
| +		if (h->ring[i] == NULL)
| +			return 1;
| +	}
| +	h->loss_count = 0;
| +	h->loss_start = 0;
| +	return 0;
| +}
| +EXPORT_SYMBOL_GPL(tfrc_rx_hist_init);
| 
I expected this and it actually was very clear from your original message. I fully back up
your solution in this point, see below. But my question above was rather: are there any
other bugs rather than the above leakage, which is what the previous email seemed to indicate?

With regard to your solution - you are using

	int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h)
	{
		h->ring = kmem_cache_alloc(tfrc_rx_hist_slab, GFP_ATOMIC);
		h->loss_count = h->loss_start = 0;
		return h->ring == NULL;
	}

which is better not only for one but for two reasons. It solves the leakage and in addition makes
the entire code simpler. Fully agreed.
  

| 
| > | . I haven't checked if all the code was commited, as I tried to introduce just what was
| > |   immediatelly used, probably we'll need to do some changes when working on the merge
| > |   of your loss intervals code.
| > Sorry I don't understand this point.
| 
| Let me check now and tell you for sure:
| 
| tfrc_rx_hist_delta_seqno and tfrc_rx_hist_swap were not included, as
| they were not used, we should introduce them later, when getting to the
| working on the loss interval code.
Ah thanks, that was really not clear. Just beginning to work through the set.

| > 	static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
| > 	{
| > 		// ...
| > 		u32 sample, payload_size = skb->len - dccp_hdr(skb)->dccph_doff * 4;
| > 
| > 		if (unlikely(hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA)) {
| > 			if (is_data_packet) {
| > 				do_feedback = FBACK_INITIAL;
| > 				ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA);
| > 				ccid3_hc_rx_update_s(hcrx, payload_size);
| > 			}
| > 			goto update_records;
| > 		}
| > 
| > ==> Non-data packets are ignored for the purposes of computing s (this is in the RFC),
| >     consequently update_s() is only called for data packets; using the two following
| >     functions:
| > 
| > 
| > 	static inline u32 tfrc_ewma(const u32 avg, const u32 newval, const u8 weight)
| > 	{
| > 		return avg ? (weight * avg + (10 - weight) * newval) / 10 : newval;
| > 	}
| 
| I hadn't considered that tfrc_ewma would for every packet check if the
| avg was 0 and I find it suboptimal now that I look at it, we are just
| feeding data packets, no? 
Yes exactly, only data packets are used for s.

| > 	static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len)
| > 	{
| > 		if (likely(len > 0))	/* don't update on empty packets (e.g. ACKs) */
| > 			hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9);
| > 	}
| 
| 	And we also just do test for len > 0 in update_s, that looks like
| also excessive, no?
Hm, I think we need to make it robust against API bugs and/or zero-sized
data packets. The check `len > 0' may seem redundant but it catches such
a condition. For a moving average an accidental zero value will have
quite an impact on the overall value. In CCID3 it is

  	x_new = 0.9 * x_old + 0.1 * len

So if len is accidentally 0 where it should not be, then each time the
moving average is reduced to 90%.

|As we should pass just data packets, perhaps we should just do a BUG_ON there.
That is a good idea, can we make it a DCCP_WARN-like thing so that the computer does
not freeze? 

| Understood, consider this one then an optimization and not a bugfix. My
| motivation to add this as an optimization had I realized that tfrc_ewma
| checks for avg being zero would have been greatly diminished, 
| but since we are having all this discussion, I think the optimization is
| OK to have merged.
I see your point but as this is in patch 7/7 can you please allow me
some time to digest it.  

| > | 
| > | 	It is available at:
| > | 
| > | master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.25
| > | 
| > Need to do this separately. As said, the code has been developed and tested over a long time,
| > it took a long while until it acted predictably, so being careful is very important.
| 
| Gerrit, I do these kinds of code reorganizations on the core Linux
| networking for ages, I'm not perfect (huh! :-P) but I also didn't
| performed extensive testing doing coverage analisys of all the
| imaginable possibilities, just did my best to improve the abstractions
| and the code, ocasionally fixing bugs as I went thru the existing code.
| 
As above - more than likely you will have done a really good job. In the
past experience of collaborating with you I can not recall a single bad
thing introduced, and learned lots from that. It is just, I am not as
fast as you and need longer to understand what is going on. 
As a comparison - the entire patch set took about a full month to do.
But that meant I am reasonably sure the algorithm is sound and can cope
with problematic conditions.

| But I can't just take your patches as-is all the time, I have my
| personal preferences and will try to use them when taking the time to
| merge code from anyone.
|
... which is perfectly allright and actually I am grateful for that. It was just
last week where your feedback on the passive-close states lead to a reconsideration
which brought to light three previously unresolved problems:

 (a) an skb leakage which you pointed out
 (b) not checking for state transitions from other states
 (c) how to handle simultaneous-close

So it all started when you said you had concerns about (a), but you may
have had an inkling or so, and see there were two more problems. Had you
silently merged these patches, these three problems would probably still
exist.

In other words I am not interested in accepting-patches-for-instant-gratification,
let's discuss where the code may be weak and iron that out.

Thanks a lot for your clarifying email I now understand better.
Hopefully there will be feedback. If you can bear with me until tomorrow
(Tuesday) or better Wednesday, I will have a look and send feedback
about your patch set. I already have some ideas, but need to sit down
and look at it.

Thanks indeed
Gerrit

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches
  2007-12-03 13:49     ` Gerrit Renker
@ 2007-12-03 14:54       ` Arnaldo Carvalho de Melo
  2007-12-03 15:44         ` Gerrit Renker
  2007-12-05 10:27         ` Gerrit Renker
  0 siblings, 2 replies; 27+ messages in thread
From: Arnaldo Carvalho de Melo @ 2007-12-03 14:54 UTC (permalink / raw)
  To: Gerrit Renker, Arnaldo Carvalho de Melo, netdev, dccp,
	Ingo Molnar

Em Mon, Dec 03, 2007 at 01:49:47PM +0000, Gerrit Renker escreveu:
> | > Are you suggesting using netdev exclusively or in addition to dccp@vger.kernel.org?
> | 
> | Well, since at least one person that has contributed significantly in
> | the past has said he can't cope with traffic on netdev, we can CC
> | dccp@vger.kernel.org
> I have a similar problem with the traffic but agree and will copy as well.
> 
> 
> | > have been posted and re-posted for a period of over 9 months on dccp@vger, and
> | 
> | Being posted and re-posted does not guarantee that the patch is OK or
> | that is in a form that is acceptable to all tree maintainers.

> With the first point I couldn't agree more, but this is not really what I meant - the point
> was that despite posting and re-posting there was often silence. And now there is feedback,
> in form of a patchset made by you; and all that I am asking for is just to be given the time to
> look that through. Last time a RFC patch appeared at 3pm and was checked in the same evening
> (only found out next morning).

I was too optimistic about that one, feeling that it was safe, sorry
about that, will avoid doing that in the future.

> With your experience and long background as a maintainer maybe you expect quicker turn-around
> times, but I also had waited patiently until you had had a chance to review the stuff I sent.

Agreed, my bad, will be more patient with my side as you've been with
yours :-)

> | > | . The code that allocates the RX ring deals with failures when one of the entries in
> | > |   the ring buffer is not successfully allocated, the original code was leaking the
> | > |   successfully allocated entries.
> | 
> | Sorry for not point out exactly this, here it goes:
> | 
> | Your original patch:
> | 
> | +int tfrc_rx_hist_init(struct tfrc_rx_hist *h)
> | +{
> | +	int i;
> | +
> | +	for (i = 0; i <= NDUPACK; i++) {
> | +		h->ring[i] = kmem_cache_alloc(tfrc_rxh_cache, GFP_ATOMIC);
> | +		if (h->ring[i] == NULL)
> | +			return 1;
> | +	}
> | +	h->loss_count = 0;
> | +	h->loss_start = 0;
> | +	return 0;
> | +}
> | +EXPORT_SYMBOL_GPL(tfrc_rx_hist_init);
> | 
> I expected this and it actually was very clear from your original message. I fully back up
> your solution in this point, see below. But my question above was rather: are there any
> other bugs rather than the above leakage, which is what the previous email seemed to indicate?
> 
> With regard to your solution - you are using
> 
> 	int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h)
> 	{
> 		h->ring = kmem_cache_alloc(tfrc_rx_hist_slab, GFP_ATOMIC);
> 		h->loss_count = h->loss_start = 0;
> 		return h->ring == NULL;
> 	}
> 
> which is better not only for one but for two reasons. It solves the leakage and in addition makes
> the entire code simpler. Fully agreed.

good
   
> 
> | 
> | > | . I haven't checked if all the code was commited, as I tried to introduce just what was
> | > |   immediatelly used, probably we'll need to do some changes when working on the merge
> | > |   of your loss intervals code.
> | > Sorry I don't understand this point.
> | 
> | Let me check now and tell you for sure:
> | 
> | tfrc_rx_hist_delta_seqno and tfrc_rx_hist_swap were not included, as
> | they were not used, we should introduce them later, when getting to the
> | working on the loss interval code.
> Ah thanks, that was really not clear. Just beginning to work through the set.

great
 
> | > 	static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
> | > 	{
> | > 		// ...
> | > 		u32 sample, payload_size = skb->len - dccp_hdr(skb)->dccph_doff * 4;
> | > 
> | > 		if (unlikely(hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA)) {
> | > 			if (is_data_packet) {
> | > 				do_feedback = FBACK_INITIAL;
> | > 				ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA);
> | > 				ccid3_hc_rx_update_s(hcrx, payload_size);
> | > 			}
> | > 			goto update_records;
> | > 		}
> | > 
> | > ==> Non-data packets are ignored for the purposes of computing s (this is in the RFC),
> | >     consequently update_s() is only called for data packets; using the two following
> | >     functions:
> | > 
> | > 
> | > 	static inline u32 tfrc_ewma(const u32 avg, const u32 newval, const u8 weight)
> | > 	{
> | > 		return avg ? (weight * avg + (10 - weight) * newval) / 10 : newval;
> | > 	}
> | 
> | I hadn't considered that tfrc_ewma would for every packet check if the
> | avg was 0 and I find it suboptimal now that I look at it, we are just
> | feeding data packets, no? 
> Yes exactly, only data packets are used for s.
> 
> | > 	static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len)
> | > 	{
> | > 		if (likely(len > 0))	/* don't update on empty packets (e.g. ACKs) */
> | > 			hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9);
> | > 	}
> | 
> | 	And we also just do test for len > 0 in update_s, that looks like
> | also excessive, no?
> Hm, I think we need to make it robust against API bugs and/or zero-sized
> data packets. The check `len > 0' may seem redundant but it catches such
> a condition. For a moving average an accidental zero value will have
> quite an impact on the overall value. In CCID3 it is
> 
>   	x_new = 0.9 * x_old + 0.1 * len
> 
> So if len is accidentally 0 where it should not be, then each time the
> moving average is reduced to 90%.

So we must make it a BUG_ON, not something that is to be always present.
 
> |As we should pass just data packets, perhaps we should just do a BUG_ON there.

> That is a good idea, can we make it a DCCP_WARN-like thing so that the computer does
> not freeze? 

that is ok at this point where we're still going thru lots of patches,
as soon as we collectively agree that it should be a serious bug to pass
a 0 sized packet to this routine, we can turn it into a BUG_ON. But we
must make it from now clear that it is something completely unexpected.
 
> | Understood, consider this one then an optimization and not a bugfix. My
> | motivation to add this as an optimization had I realized that tfrc_ewma
> | checks for avg being zero would have been greatly diminished, 
> | but since we are having all this discussion, I think the optimization is
> | OK to have merged.

> I see your point but as this is in patch 7/7 can you please allow me
> some time to digest it.  

OK, its normal working day, so I'm supposed to be doing "Real Work", so
you'll have plenty of time to look at it before I get back at doing more
serious work on DCCP.
 
> | > | 
> | > | 	It is available at:
> | > | 
> | > | master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.25
> | > | 
> | > Need to do this separately. As said, the code has been developed and tested over a long time,
> | > it took a long while until it acted predictably, so being careful is very important.
> | 
> | Gerrit, I do these kinds of code reorganizations on the core Linux
> | networking for ages, I'm not perfect (huh! :-P) but I also didn't
> | performed extensive testing doing coverage analisys of all the
> | imaginable possibilities, just did my best to improve the abstractions
> | and the code, ocasionally fixing bugs as I went thru the existing code.
> | 

> As above - more than likely you will have done a really good job. In the
> past experience of collaborating with you I can not recall a single bad
> thing introduced, and learned lots from that. It is just, I am not as
> fast as you and need longer to understand what is going on. 
> As a comparison - the entire patch set took about a full month to do.
> But that meant I am reasonably sure the algorithm is sound and can cope
> with problematic conditions.

And from what I saw so far that is my impression too, if you look at
what I'm doing it is:

1. go thru the whole patch trying to understand hunk by hunk
2. do consistency changes (add namespace prefixes)
3. reorganize the code to look more like what is already there, we
   both have different backgrounds and tastes about how code should
   be written, so its only normal that if we want to keep the code
   consistent, and I want that, I jump into things I think should be
   "reworded", while trying to keep the algorithm expressed by you.
 
> | But I can't just take your patches as-is all the time, I have my
> | personal preferences and will try to use them when taking the time to
> | merge code from anyone.
> |
> ... which is perfectly allright and actually I am grateful for that. It was just
> last week where your feedback on the passive-close states lead to a reconsideration
> which brought to light three previously unresolved problems:
> 
>  (a) an skb leakage which you pointed out
>  (b) not checking for state transitions from other states
>  (c) how to handle simultaneous-close
> 
> So it all started when you said you had concerns about (a), but you may
> have had an inkling or so, and see there were two more problems. Had you
> silently merged these patches, these three problems would probably still
> exist.

Glad that I was of help
 
> In other words I am not interested in accepting-patches-for-instant-gratification,
> let's discuss where the code may be weak and iron that out.

Excellent
 
> Thanks a lot for your clarifying email I now understand better.
> Hopefully there will be feedback. If you can bear with me until tomorrow
> (Tuesday) or better Wednesday, I will have a look and send feedback
> about your patch set. I already have some ideas, but need to sit down
> and look at it.

That is perfectly fine and will give me time to use the regression
testing that you mentioned and also to let me think about further
automatization on regression testing.

Thank you,

- Arnaldo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches
  2007-12-03 14:54       ` Arnaldo Carvalho de Melo
@ 2007-12-03 15:44         ` Gerrit Renker
  2007-12-05 10:27         ` Gerrit Renker
  1 sibling, 0 replies; 27+ messages in thread
From: Gerrit Renker @ 2007-12-03 15:44 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, netdev, dccp, Ingo Molnar

| > | > 	static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len)
| > | > 	{
| > | > 		if (likely(len > 0))	/* don't update on empty packets (e.g. ACKs) */
| > | > 			hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9);
| > | > 	}
| > | 
| > | 	And we also just do test for len > 0 in update_s, that looks like
| > | also excessive, no?
| > Hm, I think we need to make it robust against API bugs and/or zero-sized
| > data packets. The check `len > 0' may seem redundant but it catches such
| > a condition. For a moving average an accidental zero value will have
| > quite an impact on the overall value. In CCID3 it is
| > 
| >   	x_new = 0.9 * x_old + 0.1 * len
| > 
| > So if len is accidentally 0 where it should not be, then each time the
| > moving average is reduced to 90%.
| 
| So we must make it a BUG_ON, not something that is to be always present.
|  
I think it should be a warning condition since it can be triggered when
the remote party sends zero-sized packets. It may be good to log this
into the syslog to warn about possibly misbehaving apps/peers/remote
stacks.


| > As a comparison - the entire patch set took about a full month to do.
| > But that meant I am reasonably sure the algorithm is sound and can cope
| > with problematic conditions.
| 
| And from what I saw so far that is my impression too, if you look at
| what I'm doing it is:
| 
| 1. go thru the whole patch trying to understand hunk by hunk
You are doing a great job - in particular as it really is a lot of material.

| 2. do consistency changes (add namespace prefixes)
| 3. reorganize the code to look more like what is already there, we
|    both have different backgrounds and tastes about how code should
|    be written, so its only normal that if we want to keep the code
|    consistent, and I want that, I jump into things I think should be
|    "reworded", while trying to keep the algorithm expressed by you.
|
Agree, that is not always easy to get right. I try to stick as close as
possible to existing conventions but of course that is my
interpretation, so I am already anticipating such changes/comments here.

| think about further automatization on regression testing.
| 
If it is of any use, some scripts and setups are at the bottom of the page at
http://www.erg.abdn.ac.uk/users/gerrit/dccp/testing_dccp/

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 7/7] [TFRC] New rx history code
  2007-12-02 21:36             ` [PATCH 7/7] [TFRC] New rx history code Arnaldo Carvalho de Melo
@ 2007-12-04  6:55               ` Gerrit Renker
  2007-12-04 11:59                 ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 27+ messages in thread
From: Gerrit Renker @ 2007-12-04  6:55 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: netdev, dccp, Ingo Molnar

NAK. You have changed the control flow of the algorithm and the underlying
data structure. Originally it had been an array of pointers, and this had
been a design decision right from the first submissions in March. From I
of http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/loss_detection/loss_detection_algorithm_notes.txt
 
 "1. 'ring' is an array of pointers rather than a contiguous area in
     memory. The reason is that, when having to swap adjacent entries
     to sort the history, it is easier to swap pointers than to copy
     (heavy-weight) history-entry data structs."

But in your algorithm it becomes a normal linear array.

As a consequence all subsequent code needs to be rewritten to use
copying instead of swapping pointers. And there is a lot of that, since
the loss detection code makes heavy use of swapping entries in order to
cope with reordered and/or delayed packets.

This is not what I would call "entirely equivalent" as you claim. Your
algorithm is fundamentally different, the control flow is not the same,
nor are the underlying data structures.

| Credit here goes to Gerrit Renker, that provided the initial implementation for
| this new codebase.
| 
| I modified it just to try to make it closer to the existing API, hide details from
| the CCIDs and fix a couple bugs found in the initial implementation.
What is "a couple bugs"? So far you have pointed out only one problem and that was
agreed, it can be fixed. But it remains a side issue, it is not a failure of the
algorithm per se. What is worse here is that you take this single occurrence as a
kind of carte blanche to mess around with the code as you please. Where please are
the other bugs you are referring to?

I am not buying into this and am not convinced that you are understanding
what you are doing. It is the third time that you take patches which
have been submitted on dccp@vger for over half a year, for which you
have offered no more than a sentence of feedback, release them under
your name, and introduce fundamental changes which alter the behaviour.

The first instance was the set of ktime patches which I had developed
for the test tree and which you extended to DCCP. In this case it were
in fact three bugs that you added in migrating my patches. I know this
because it messed up the way CCID3 behaved and since I spent several
hours chasing these. And, in contrast to the above, it is not a mere
claim: that is recorded in the netdev mail archives.

The second instance was when you released the TX history patches under
your name. Pro forma there was an RFC patch at 3pm, de facto it was
checked in a few hours later: input not welcome.

The third instance is now where you change in essence the floor
underneath this work. Since you are using a different basis, the
algorithm is (in addition to the changes in control flow) necessarily
different. 

I have provided documentation, written test modules, and am able to prove
that the algorithm as submitted works reasonably correct. In addition, the
behaviour has been verified using regression tests.

I am not prepared to take your claims and expressions of "deepest
respect" at face value since your actions - not your words - show that
you are, in fact, not dealing respectfully with work which has taken
months to complete and verify.

During 9 months on dccp@vger you did not provide so much as a paragraph
of feedback. Instead you mopped up code from the test tree, modified it
according to your own whims and now, in the circle of your
invitation-only buddies, start to talk about having discussions and 
iterations. The only iteration I can see is in fixing up the things you
introduced, so it is not you who has to pay the price.

Sorry, unless you can offer a more mature model of collaboration,
consider me out of this and the patches summarily withdrawn. I am not
prepared to throw away several months of work just because you feel
inspired to do as you please with the work of others. 

Gerrit

| Original changeset comment from Gerrit:
|       -----------
| This provides a new, self-contained and generic RX history service for TFRC
| based protocols.
| 
| Details:
|  * new data structure, initialisation and cleanup routines;
|  * allocation of dccp_rx_hist entries local to packet_history.c,
|    as a service exported by the dccp_tfrc_lib module.
|  * interface to automatically track highest-received seqno;
|  * receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1);
|  * a generic function to test for `data packets' as per  RFC 4340, sec. 7.7.
|       -----------
| 
| Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| ---
|  net/dccp/ccids/ccid3.c                       |  255 ++++++++----------------
|  net/dccp/ccids/ccid3.h                       |   14 +-
|  net/dccp/ccids/lib/loss_interval.c           |   14 +-
|  net/dccp/ccids/lib/packet_history.c          |  277 +++++++++++++++-----------
|  net/dccp/ccids/lib/packet_history.h          |   82 +++-----
|  net/dccp/ccids/lib/packet_history_internal.h |   67 ++++++
|  6 files changed, 353 insertions(+), 356 deletions(-)
|  create mode 100644 net/dccp/ccids/lib/packet_history_internal.h
| 
| diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
| index 5ff5aab..af64c1d 100644
| --- a/net/dccp/ccids/ccid3.c
| +++ b/net/dccp/ccids/ccid3.c
| @@ -641,6 +641,15 @@ static int ccid3_hc_tx_getsockopt(struct sock *sk, const int optname, int len,
|  /*
|   *	Receiver Half-Connection Routines
|   */
| +
| +/* CCID3 feedback types */
| +enum ccid3_fback_type {
| +	CCID3_FBACK_NONE = 0,
| +	CCID3_FBACK_INITIAL,
| +	CCID3_FBACK_PERIODIC,| +	CCID3_FBACK_PARAM_CHANGE
| +};
| +
|  #ifdef CONFIG_IP_DCCP_CCID3_DEBUG
|  static const char *ccid3_rx_state_name(enum ccid3_hc_rx_states state)
|  {
| @@ -673,53 +682,60 @@ static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len)
|  		hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9);
|  }
|  
| -static void ccid3_hc_rx_send_feedback(struct sock *sk)
| +static void ccid3_hc_rx_send_feedback(struct sock *sk,
| +				      const struct sk_buff *skb,
| +				      enum ccid3_fback_type fbtype)
|  {
|  	struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
|  	struct dccp_sock *dp = dccp_sk(sk);
| -	struct tfrc_rx_hist_entry *packet;
|  	ktime_t now;
| -	suseconds_t delta;
| +	s64 delta = 0;
|  
|  	ccid3_pr_debug("%s(%p) - entry \n", dccp_role(sk), sk);
|  
| +	if (unlikely(hcrx->ccid3hcrx_state == TFRC_RSTATE_TERM))
| +		return;
| +
|  	now = ktime_get_real();
|  
| -	switch (hcrx->ccid3hcrx_state) {
| -	case TFRC_RSTATE_NO_DATA:
| +	switch (fbtype) {
| +	case CCID3_FBACK_INITIAL:
|  		hcrx->ccid3hcrx_x_recv = 0;
| +		hcrx->ccid3hcrx_pinv   = ~0U;   /* see RFC 4342, 8.5 */
|  		break;
| -	case TFRC_RSTATE_DATA:
| -		delta = ktime_us_delta(now,
| -				       hcrx->ccid3hcrx_tstamp_last_feedback);
| -		DCCP_BUG_ON(delta < 0);
| -		hcrx->ccid3hcrx_x_recv =
| -			scaled_div32(hcrx->ccid3hcrx_bytes_recv, delta);
| +	case CCID3_FBACK_PARAM_CHANGE:
| +		/*
| +		 * When parameters change (new loss or p > p_prev), we do not
| +		 * have a reliable estimate for R_m of [RFC 3448, 6.2] and so
| +		 * need to  reuse the previous value of X_recv. However, when
| +		 * X_recv was 0 (due to early loss), this would kill X down to
| +		 * s/t_mbi (i.e. one packet in 64 seconds).
| +		 * To avoid such drastic reduction, we approximate X_recv as
| +		 * the number of bytes since last feedback.
| +		 * This is a safe fallback, since X is bounded above by X_calc.
| +		 */
| +		if (hcrx->ccid3hcrx_x_recv > 0)
| +			break;
| +		/* fall through */
| +	case CCID3_FBACK_PERIODIC:
| +		delta = ktime_us_delta(now, hcrx->ccid3hcrx_tstamp_last_feedback);
| +		if (delta <= 0)
| +			DCCP_BUG("delta (%ld) <= 0", (long)delta);
| +		else
| +			hcrx->ccid3hcrx_x_recv =
| +				scaled_div32(hcrx->ccid3hcrx_bytes_recv, delta);
|  		break;
| -	case TFRC_RSTATE_TERM:
| -		DCCP_BUG("%s(%p) - Illegal state TERM", dccp_role(sk), sk);
| +	default:
|  		return;
|  	}
|  
| -	packet = tfrc_rx_hist_find_data_packet(&hcrx->ccid3hcrx_hist);
| -	if (unlikely(packet == NULL)) {
| -		DCCP_WARN("%s(%p), no data packet in history!\n",
| -			  dccp_role(sk), sk);
| -		return;
| -	}
| +	ccid3_pr_debug("Interval %ldusec, X_recv=%u, 1/p=%u\n", (long)delta,
| +		       hcrx->ccid3hcrx_x_recv, hcrx->ccid3hcrx_pinv);
|  
|  	hcrx->ccid3hcrx_tstamp_last_feedback = now;
| -	hcrx->ccid3hcrx_ccval_last_counter   = packet->tfrchrx_ccval;
| +	hcrx->ccid3hcrx_last_counter	     = dccp_hdr(skb)->dccph_ccval;
|  	hcrx->ccid3hcrx_bytes_recv	     = 0;
|  
| -	if (hcrx->ccid3hcrx_p == 0)
| -		hcrx->ccid3hcrx_pinv = ~0U;	/* see RFC 4342, 8.5 */
| -	else if (hcrx->ccid3hcrx_p > 1000000) {
| -		DCCP_WARN("p (%u) > 100%%\n", hcrx->ccid3hcrx_p);
| -		hcrx->ccid3hcrx_pinv = 1;	/* use 100% in this case */
| -	} else
| -		hcrx->ccid3hcrx_pinv = 1000000 / hcrx->ccid3hcrx_p;
| -
|  	dp->dccps_hc_rx_insert_options = 1;
|  	dccp_send_ack(sk);
|  }
| @@ -753,162 +769,52 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb)
|  static int ccid3_hc_rx_detect_loss(struct sock *sk,
|  				    struct tfrc_rx_hist_entry *packet)
|  {
| -	struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
| -	struct tfrc_rx_hist_entry *rx_hist =
| -				tfrc_rx_hist_head(&hcrx->ccid3hcrx_hist);
| -	u64 seqno = packet->tfrchrx_seqno;
| -	u64 tmp_seqno;
| -	int loss = 0;
| -	u8 ccval;
| -
| -
| -	tmp_seqno = hcrx->ccid3hcrx_seqno_nonloss;
| -
| -	if (!rx_hist ||
| -	   follows48(packet->tfrchrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) {
| -		hcrx->ccid3hcrx_seqno_nonloss = seqno;
| -		hcrx->ccid3hcrx_ccval_nonloss = packet->tfrchrx_ccval;
| -		goto detect_out;
| -	}
| -
| -
| -	while (dccp_delta_seqno(hcrx->ccid3hcrx_seqno_nonloss, seqno)
| -	   > TFRC_RECV_NUM_LATE_LOSS) {
| -		loss = 1;
| -		dccp_li_update_li(sk,
| -				  &hcrx->ccid3hcrx_li_hist,
| -				  &hcrx->ccid3hcrx_hist,
| -				  hcrx->ccid3hcrx_tstamp_last_feedback,
| -				  hcrx->ccid3hcrx_s,
| -				  hcrx->ccid3hcrx_bytes_recv,
| -				  hcrx->ccid3hcrx_x_recv,
| -				  hcrx->ccid3hcrx_seqno_nonloss,
| -				  hcrx->ccid3hcrx_ccval_nonloss);
| -		tmp_seqno = hcrx->ccid3hcrx_seqno_nonloss;
| -		dccp_inc_seqno(&tmp_seqno);
| -		hcrx->ccid3hcrx_seqno_nonloss = tmp_seqno;
| -		dccp_inc_seqno(&tmp_seqno);
| -		while (tfrc_rx_hist_find_entry(&hcrx->ccid3hcrx_hist,
| -		   tmp_seqno, &ccval)) {
| -			hcrx->ccid3hcrx_seqno_nonloss = tmp_seqno;
| -			hcrx->ccid3hcrx_ccval_nonloss = ccval;
| -			dccp_inc_seqno(&tmp_seqno);
| -		}
| -	}
| -
| -	/* FIXME - this code could be simplified with above while */
| -	/* but works at moment */
| -	if (follows48(packet->tfrchrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) {
| -		hcrx->ccid3hcrx_seqno_nonloss = seqno;
| -		hcrx->ccid3hcrx_ccval_nonloss = packet->tfrchrx_ccval;
| -	}
| -
| -detect_out:
| -	tfrc_rx_hist_add_packet(&hcrx->ccid3hcrx_hist,
| -				&hcrx->ccid3hcrx_li_hist, packet,
| -				hcrx->ccid3hcrx_seqno_nonloss);
| -	return loss;
| +	return 0;
|  }
|  
|  static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
|  {
|  	struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
| -	const struct dccp_options_received *opt_recv;
| -	struct tfrc_rx_hist_entry *packet;
| -	u32 p_prev, r_sample, rtt_prev;
| -	int loss, payload_size;
| -	ktime_t now;
| -
| -	opt_recv = &dccp_sk(sk)->dccps_options_received;
| +	const u32 payload_size = skb->len - dccp_hdr(skb)->dccph_doff * 4;
| +	const u32 ndp = dccp_sk(sk)->dccps_options_received.dccpor_ndp;
| +	enum ccid3_fback_type do_feedback = CCID3_FBACK_NONE;
| +	const bool is_data_packet = dccp_data_packet(skb);
|  
| -	switch (DCCP_SKB_CB(skb)->dccpd_type) {
| -	case DCCP_PKT_ACK:
| -		if (hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA)
| +	if (hcrx->ccid3hcrx_state != TFRC_RSTATE_NO_DATA) {
| +		if (tfrc_rx_hist_duplicate(&hcrx->ccid3hcrx_hist, skb))
|  			return;
| -	case DCCP_PKT_DATAACK:
| -		if (opt_recv->dccpor_timestamp_echo == 0)
| -			break;
| -		r_sample = dccp_timestamp() - opt_recv->dccpor_timestamp_echo;
| -		rtt_prev = hcrx->ccid3hcrx_rtt;
| -		r_sample = dccp_sample_rtt(sk, 10 * r_sample);
| -
| -		if (hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA)
| -			hcrx->ccid3hcrx_rtt = r_sample;
| -		else
| -			hcrx->ccid3hcrx_rtt = (hcrx->ccid3hcrx_rtt * 9) / 10 +
| -					      r_sample / 10;
| -
| -		if (rtt_prev != hcrx->ccid3hcrx_rtt)
| -			ccid3_pr_debug("%s(%p), New RTT=%uus, elapsed time=%u\n",
| -				       dccp_role(sk), sk, hcrx->ccid3hcrx_rtt,
| -				       opt_recv->dccpor_elapsed_time);
| -		break;
| -	case DCCP_PKT_DATA:
| -		break;
| -	default: /* We're not interested in other packet types, move along */
| -		return;
| -	}
| -
| -	packet = tfrc_rx_hist_entry_new(opt_recv->dccpor_ndp, skb, GFP_ATOMIC);
| -	if (unlikely(packet == NULL)) {
| -		DCCP_WARN("%s(%p), Not enough mem to add rx packet "
| -			  "to history, consider it lost!\n", dccp_role(sk), sk);
| -		return;
| -	}
| -
| -	loss = ccid3_hc_rx_detect_loss(sk, packet);
| -
| -	if (DCCP_SKB_CB(skb)->dccpd_type == DCCP_PKT_ACK)
| -		return;
| -
| -	payload_size = skb->len - dccp_hdr(skb)->dccph_doff * 4;
| -	ccid3_hc_rx_update_s(hcrx, payload_size);
|  
| -	switch (hcrx->ccid3hcrx_state) {
| -	case TFRC_RSTATE_NO_DATA:
| -		ccid3_pr_debug("%s(%p, state=%s), skb=%p, sending initial "
| -			       "feedback\n", dccp_role(sk), sk,
| -			       dccp_state_name(sk->sk_state), skb);
| -		ccid3_hc_rx_send_feedback(sk);
| -		ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA);
| -		return;
| -	case TFRC_RSTATE_DATA:
| -		hcrx->ccid3hcrx_bytes_recv += payload_size;
| -		if (loss)
| -			break;
| -
| -		now = ktime_get_real();
| -		if ((ktime_us_delta(now, hcrx->ccid3hcrx_tstamp_last_ack) -
| -		     (s64)hcrx->ccid3hcrx_rtt) >= 0) {
| -			hcrx->ccid3hcrx_tstamp_last_ack = now;
| -			ccid3_hc_rx_send_feedback(sk);
| +		    /* Handle pending losses and otherwise check for new loss */
| +		if (!tfrc_rx_hist_new_loss_indicated(&hcrx->ccid3hcrx_hist,
| +						     skb, ndp) &&
| +		    /* Handle data packets: RTT sampling and monitoring p */
| +		    is_data_packet)  {
| +
| +			if (list_empty(&hcrx->ccid3hcrx_li_hist)) {  /* no loss so far: p = 0 */
| +				const u32 sample = tfrc_rx_hist_sample_rtt(&hcrx->ccid3hcrx_hist, skb);
| +				if (sample != 0)
| +					hcrx->ccid3hcrx_rtt =
| +						tfrc_ewma(hcrx->ccid3hcrx_rtt, sample, 9);
| +			}
| +
| +			ccid3_hc_rx_update_s(hcrx, payload_size);
| +			/* check if the periodic once-per-RTT feedback is due; RFC 4342, 10.3 */
| +			if (SUB16(dccp_hdr(skb)->dccph_ccval,
| +				  hcrx->ccid3hcrx_last_counter) > 3)
| +				do_feedback = CCID3_FBACK_PERIODIC;
|  		}
| -		return;
| -	case TFRC_RSTATE_TERM:
| -		DCCP_BUG("%s(%p) - Illegal state TERM", dccp_role(sk), sk);
| -		return;
| -	}
| -
| -	/* Dealing with packet loss */
| -	ccid3_pr_debug("%s(%p, state=%s), data loss! Reacting...\n",
| -		       dccp_role(sk), sk, dccp_state_name(sk->sk_state));
| -
| -	p_prev = hcrx->ccid3hcrx_p;
| +	} else if (is_data_packet) {
| +		do_feedback = CCID3_FBACK_INITIAL;
| +		ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA);
| +		hcrx->ccid3hcrx_s = payload_size;
| + 	}
|  
| -	/* Calculate loss event rate */
| -	if (!list_empty(&hcrx->ccid3hcrx_li_hist)) {
| -		u32 i_mean = dccp_li_hist_calc_i_mean(&hcrx->ccid3hcrx_li_hist);
| +	hcrx->ccid3hcrx_bytes_recv += payload_size;
|  
| -		/* Scaling up by 1000000 as fixed decimal */
| -		if (i_mean != 0)
| -			hcrx->ccid3hcrx_p = 1000000 / i_mean;
| -	} else
| -		DCCP_BUG("empty loss history");
| +	tfrc_rx_hist_add_packet(&hcrx->ccid3hcrx_hist, skb, ndp);
|  
| -	if (hcrx->ccid3hcrx_p > p_prev) {
| -		ccid3_hc_rx_send_feedback(sk);
| -		return;
| -	}
| +	if (do_feedback)
| +		ccid3_hc_rx_send_feedback(sk, skb, do_feedback);
|  }
|  
|  static int ccid3_hc_rx_init(struct ccid *ccid, struct sock *sk)
| @@ -918,11 +824,8 @@ static int ccid3_hc_rx_init(struct ccid *ccid, struct sock *sk)
|  	ccid3_pr_debug("entry\n");
|  
|  	hcrx->ccid3hcrx_state = TFRC_RSTATE_NO_DATA;
| -	INIT_LIST_HEAD(&hcrx->ccid3hcrx_hist);
|  	INIT_LIST_HEAD(&hcrx->ccid3hcrx_li_hist);
| -	hcrx->ccid3hcrx_tstamp_last_feedback =
| -		hcrx->ccid3hcrx_tstamp_last_ack = ktime_get_real();
| -	return 0;
| +	return tfrc_rx_hist_alloc(&hcrx->ccid3hcrx_hist);
|  }
|  
|  static void ccid3_hc_rx_exit(struct sock *sk)
| diff --git a/net/dccp/ccids/ccid3.h b/net/dccp/ccids/ccid3.h
| index b842a7d..3c33dc6 100644
| --- a/net/dccp/ccids/ccid3.h
| +++ b/net/dccp/ccids/ccid3.h
| @@ -1,7 +1,8 @@
|  /*
|   *  net/dccp/ccids/ccid3.h
|   *
| - *  Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand.
| + *  Copyright (c) 2005-7 The University of Waikato, Hamilton, New Zealand.
| + *  Copyright (c) 2007   The University of Aberdeen, Scotland, UK
|   *
|   *  An implementation of the DCCP protocol
|   *
| @@ -135,9 +136,7 @@ enum ccid3_hc_rx_states {
|   *  @ccid3hcrx_x_recv  -  Receiver estimate of send rate (RFC 3448 4.3)
|   *  @ccid3hcrx_rtt  -  Receiver estimate of rtt (non-standard)
|   *  @ccid3hcrx_p  -  current loss event rate (RFC 3448 5.4)
| - *  @ccid3hcrx_seqno_nonloss  -  Last received non-loss sequence number
| - *  @ccid3hcrx_ccval_nonloss  -  Last received non-loss Window CCVal
| - *  @ccid3hcrx_ccval_last_counter  -  Tracks window counter (RFC 4342, 8.1)
| + *  @ccid3hcrx_last_counter  -  Tracks window counter (RFC 4342, 8.1)
|   *  @ccid3hcrx_state  -  receiver state, one of %ccid3_hc_rx_states
|   *  @ccid3hcrx_bytes_recv  -  Total sum of DCCP payload bytes
|   *  @ccid3hcrx_tstamp_last_feedback  -  Time at which last feedback was sent
| @@ -152,14 +151,11 @@ struct ccid3_hc_rx_sock {
|  #define ccid3hcrx_x_recv		ccid3hcrx_tfrc.tfrcrx_x_recv
|  #define ccid3hcrx_rtt			ccid3hcrx_tfrc.tfrcrx_rtt
|  #define ccid3hcrx_p			ccid3hcrx_tfrc.tfrcrx_p
| -	u64				ccid3hcrx_seqno_nonloss:48,
| -					ccid3hcrx_ccval_nonloss:4,
| -					ccid3hcrx_ccval_last_counter:4;
| +	u8				ccid3hcrx_last_counter:4;
|  	enum ccid3_hc_rx_states		ccid3hcrx_state:8;
|  	u32				ccid3hcrx_bytes_recv;
|  	ktime_t				ccid3hcrx_tstamp_last_feedback;
| -	ktime_t				ccid3hcrx_tstamp_last_ack;
| -	struct list_head		ccid3hcrx_hist;
| +	struct tfrc_rx_hist		ccid3hcrx_hist;
|  	struct list_head		ccid3hcrx_li_hist;
|  	u16				ccid3hcrx_s;
|  	u32				ccid3hcrx_pinv;
| diff --git a/net/dccp/ccids/lib/loss_interval.c b/net/dccp/ccids/lib/loss_interval.c
| index a5f59af..71080ca 100644
| --- a/net/dccp/ccids/lib/loss_interval.c
| +++ b/net/dccp/ccids/lib/loss_interval.c
| @@ -16,6 +16,7 @@
|  #include "../../dccp.h"
|  #include "loss_interval.h"
|  #include "packet_history.h"
| +#include "packet_history_internal.h"
|  #include "tfrc.h"
|  
|  #define DCCP_LI_HIST_IVAL_F_LENGTH  8
| @@ -129,6 +130,13 @@ static u32 dccp_li_calc_first_li(struct sock *sk,
|  				 u16 s, u32 bytes_recv,
|  				 u32 previous_x_recv)
|  {
| +/*
| + * FIXME:
| + * Will be rewritten in the upcoming new loss intervals code. 
| + * Has to be commented ou because it relies on the old rx history
| + * data structures
| + */
| +#if 0
|  	struct tfrc_rx_hist_entry *entry, *next, *tail = NULL;
|  	u32 x_recv, p;
|  	suseconds_t rtt, delta;
| @@ -216,10 +224,10 @@ found:
|  	dccp_pr_debug("%s(%p), receive rate=%u bytes/s, implied "
|  		      "loss rate=%u\n", dccp_role(sk), sk, x_recv, p);
|  
| -	if (p == 0)
| -		return ~0;
| -	else
| +	if (p != 0)
|  		return 1000000 / p;
| +#endif
| +	return ~0;
|  }
|  
|  void dccp_li_update_li(struct sock *sk,
| diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c
| index 255cca1..80ad4cd 100644
| --- a/net/dccp/ccids/lib/packet_history.c
| +++ b/net/dccp/ccids/lib/packet_history.c
| @@ -36,7 +36,10 @@
|   */
|  
|  #include <linux/string.h>
| +#include <linux/slab.h>
|  #include "packet_history.h"
| +#include "../../dccp.h"
| +#include "packet_history_internal.h"
|  
|  /**
|   *  tfrc_tx_hist_entry  -  Simple singly-linked TX history list
| @@ -111,154 +114,195 @@ u32 tfrc_tx_hist_rtt(struct tfrc_tx_hist_entry *head, const u64 seqno,
|  }
|  EXPORT_SYMBOL_GPL(tfrc_tx_hist_rtt);
|  
| -/*
| - * 	Receiver History Routines
| +/**
| + * tfrc_rx_hist_index - index to reach n-th entry after loss_start
|   */
| -static struct kmem_cache *tfrc_rx_hist_slab;
| +static inline u8 tfrc_rx_hist_index(const struct tfrc_rx_hist *h, const u8 n)
| +{
| +	return (h->loss_start + n) & TFRC_NDUPACK;
| +}
|  
| -struct tfrc_rx_hist_entry *tfrc_rx_hist_entry_new(const u32 ndp,
| -						  const struct sk_buff *skb,
| -						  const gfp_t prio)
| +/**
| + * tfrc_rx_hist_last_rcv - entry with highest-received-seqno so far
| + */
| +static inline struct tfrc_rx_hist_entry *
| +			tfrc_rx_hist_last_rcv(const struct tfrc_rx_hist *h)
|  {
| -	struct tfrc_rx_hist_entry *entry = kmem_cache_alloc(tfrc_rx_hist_slab,
| -							    prio);
| +	return h->ring + tfrc_rx_hist_index(h, h->loss_count);
| +}
|  
| -	if (entry != NULL) {
| -		const struct dccp_hdr *dh = dccp_hdr(skb);
| +/**
| + * tfrc_rx_hist_entry - return the n-th history entry after loss_start
| + */
| +static inline struct tfrc_rx_hist_entry *
| +		tfrc_rx_hist_entry(const struct tfrc_rx_hist *h, const u8 n)
| +{
| +	return h->ring + tfrc_rx_hist_index(h, n);
| +}
|  
| -		entry->tfrchrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq;
| -		entry->tfrchrx_ccval = dh->dccph_ccval;
| -		entry->tfrchrx_type  = dh->dccph_type;
| -		entry->tfrchrx_ndp   = ndp;
| -		entry->tfrchrx_tstamp = ktime_get_real();
| -	}
| +/**
| + * tfrc_rx_hist_loss_prev - entry with highest-received-seqno before loss was detected
| + */
| +static inline struct tfrc_rx_hist_entry *
| +			tfrc_rx_hist_loss_prev(const struct tfrc_rx_hist *h)
| +{
| +	return h->ring + h->loss_start;
| +}
| +
| +/*
| + * 	Receiver History Routines
| + */
| +static struct kmem_cache *tfrc_rx_hist_slab;
|  
| -	return entry;
| +void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h,
| +			     const struct sk_buff *skb,
| +			     const u32 ndp)
| +{
| +	struct tfrc_rx_hist_entry *entry = tfrc_rx_hist_last_rcv(h);
| +	const struct dccp_hdr *dh = dccp_hdr(skb);
| +
| +	entry->tfrchrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq;
| +	entry->tfrchrx_ccval = dh->dccph_ccval;
| +	entry->tfrchrx_type  = dh->dccph_type;
| +	entry->tfrchrx_ndp   = ndp;
| +	entry->tfrchrx_tstamp = ktime_get_real();
|  }
| -EXPORT_SYMBOL_GPL(tfrc_rx_hist_entry_new);
| +EXPORT_SYMBOL_GPL(tfrc_rx_hist_add_packet);
|  
|  static inline void tfrc_rx_hist_entry_delete(struct tfrc_rx_hist_entry *entry)
|  {
|  	kmem_cache_free(tfrc_rx_hist_slab, entry);
|  }
|  
| -int tfrc_rx_hist_find_entry(const struct list_head *list, const u64 seq,
| -			    u8 *ccval)
| +/* has the packet contained in skb been seen before? */
| +int tfrc_rx_hist_duplicate(struct tfrc_rx_hist *h, struct sk_buff *skb)
|  {
| -	struct tfrc_rx_hist_entry *packet = NULL, *entry;
| +	const u64 seq = DCCP_SKB_CB(skb)->dccpd_seq;
| +	int i;
|  
| -	list_for_each_entry(entry, list, tfrchrx_node)
| -		if (entry->tfrchrx_seqno == seq) {
| -			packet = entry;
| -			break;
| -		}
| +	if (dccp_delta_seqno(tfrc_rx_hist_loss_prev(h)->tfrchrx_seqno, seq) <= 0)
| +		return 1;
|  
| -	if (packet)
| -		*ccval = packet->tfrchrx_ccval;
| +	for (i = 1; i <= h->loss_count; i++)
| +		if (tfrc_rx_hist_entry(h, i)->tfrchrx_seqno == seq)
| +			return 1;
|  
| -	return packet != NULL;
| +	return 0;
|  }
| +EXPORT_SYMBOL_GPL(tfrc_rx_hist_duplicate);
|  
| -EXPORT_SYMBOL_GPL(tfrc_rx_hist_find_entry);
| -struct tfrc_rx_hist_entry *
| -		tfrc_rx_hist_find_data_packet(const struct list_head *list)
| +/* initialise loss detection and disable RTT sampling */
| +static inline void tfrc_rx_hist_loss_indicated(struct tfrc_rx_hist *h)
|  {
| -	struct tfrc_rx_hist_entry *entry, *packet = NULL;
| +	h->loss_count = 1;
| +}
|  
| -	list_for_each_entry(entry, list, tfrchrx_node)
| -		if (entry->tfrchrx_type == DCCP_PKT_DATA ||
| -		    entry->tfrchrx_type == DCCP_PKT_DATAACK) {
| -			packet = entry;
| -			break;
| -		}
| +/* indicate whether previously a packet was detected missing */
| +static inline int tfrc_rx_hist_loss_pending(const struct tfrc_rx_hist *h)
| +{
| +	return h->loss_count;
| +}
|  
| -	return packet;
| +/* any data packets missing between last reception and skb ? */
| +int tfrc_rx_hist_new_loss_indicated(struct tfrc_rx_hist *h,
| +				    const struct sk_buff *skb, u32 ndp)
| +{
| +	int delta = dccp_delta_seqno(tfrc_rx_hist_last_rcv(h)->tfrchrx_seqno,
| +				     DCCP_SKB_CB(skb)->dccpd_seq);
| +
| +	if (delta > 1 && ndp < delta)
| +		tfrc_rx_hist_loss_indicated(h);
| +
| +	return tfrc_rx_hist_loss_pending(h);
| +}
| +EXPORT_SYMBOL_GPL(tfrc_rx_hist_new_loss_indicated);
| +
| +void tfrc_rx_hist_purge(struct tfrc_rx_hist *h)
| +{
| +	kmem_cache_free(tfrc_rx_hist_slab, h->ring);
| +	h->ring = NULL;
|  }
|  
| -EXPORT_SYMBOL_GPL(tfrc_rx_hist_find_data_packet);
| +EXPORT_SYMBOL_GPL(tfrc_rx_hist_purge);
|  
| -void tfrc_rx_hist_add_packet(struct list_head *rx_list,
| -			     struct list_head *li_list,
| -			     struct tfrc_rx_hist_entry *packet,
| -			     u64 nonloss_seqno)
| +int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h)
|  {
| -	struct tfrc_rx_hist_entry *entry, *next;
| -	u8 num_later = 0;
| -
| -	list_add(&packet->tfrchrx_node, rx_list);
| -
| -	num_later = TFRC_RECV_NUM_LATE_LOSS + 1;
| -
| -	if (!list_empty(li_list)) {
| -		list_for_each_entry_safe(entry, next, rx_list, tfrchrx_node) {
| -			if (num_later == 0) {
| -				if (after48(nonloss_seqno,
| -				   entry->tfrchrx_seqno)) {
| -					list_del_init(&entry->tfrchrx_node);
| -					tfrc_rx_hist_entry_delete(entry);
| -				}
| -			} else if (tfrc_rx_hist_entry_data_packet(entry))
| -				--num_later;
| -		}
| -	} else {
| -		int step = 0;
| -		u8 win_count = 0; /* Not needed, but lets shut up gcc */
| -		int tmp;
| -		/*
| -		 * We have no loss interval history so we need at least one
| -		 * rtt:s of data packets to approximate rtt.
| -		 */
| -		list_for_each_entry_safe(entry, next, rx_list, tfrchrx_node) {
| -			if (num_later == 0) {
| -				switch (step) {
| -				case 0:
| -					step = 1;
| -					/* OK, find next data packet */
| -					num_later = 1;
| -					break;
| -				case 1:
| -					step = 2;
| -					/* OK, find next data packet */
| -					num_later = 1;
| -					win_count = entry->tfrchrx_ccval;
| -					break;
| -				case 2:
| -					tmp = win_count - entry->tfrchrx_ccval;
| -					if (tmp < 0)
| -						tmp += TFRC_WIN_COUNT_LIMIT;
| -					if (tmp > TFRC_WIN_COUNT_PER_RTT + 1) {
| -						/*
| -						 * We have found a packet older
| -						 * than one rtt remove the rest
| -						 */
| -						step = 3;
| -					} else /* OK, find next data packet */
| -						num_later = 1;
| -					break;
| -				case 3:
| -					list_del_init(&entry->tfrchrx_node);
| -					tfrc_rx_hist_entry_delete(entry);
| -					break;
| -				}
| -			} else if (tfrc_rx_hist_entry_data_packet(entry))
| -				--num_later;
| -		}
| -	}
| +	h->ring = kmem_cache_alloc(tfrc_rx_hist_slab, GFP_ATOMIC);
| +	h->loss_count = h->loss_start = 0;
| +	return h->ring == NULL;
|  }
| +EXPORT_SYMBOL_GPL(tfrc_rx_hist_alloc);
|  
| -EXPORT_SYMBOL_GPL(tfrc_rx_hist_add_packet);
| +/**
| + * tfrc_rx_hist_rtt_last_s - reference entry to compute RTT samples against
| + */
| +static inline struct tfrc_rx_hist_entry *
| +			tfrc_rx_hist_rtt_last_s(const struct tfrc_rx_hist *h)
| +{
| +	return h->ring;
| +}
|  
| -void tfrc_rx_hist_purge(struct list_head *list)
| +/**
| + * tfrc_rx_hist_rtt_prev_s: previously suitable (wrt rtt_last_s) RTT-sampling entry
| + */
| +static inline struct tfrc_rx_hist_entry *
| +			tfrc_rx_hist_rtt_prev_s(const struct tfrc_rx_hist *h)
|  {
| -	struct tfrc_rx_hist_entry *entry, *next;
| +	return h->ring + h->rtt_sample_prev;
| +}
|  
| -	list_for_each_entry_safe(entry, next, list, tfrchrx_node) {
| -		list_del_init(&entry->tfrchrx_node);
| -		tfrc_rx_hist_entry_delete(entry);
| +/**
| + * tfrc_rx_hist_sample_rtt  -  Sample RTT from timestamp / CCVal
| + * Based on ideas presented in RFC 4342, 8.1. Returns 0 if it was not able
| + * to compute a sample with given data - calling function should check this.
| + */
| +u32 tfrc_rx_hist_sample_rtt(struct tfrc_rx_hist *h, const struct sk_buff *skb)
| +{
| +	u32 sample = 0,
| +	    delta_v = SUB16(dccp_hdr(skb)->dccph_ccval,
| +	    		    tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval);
| +
| +	if (delta_v < 1 || delta_v > 4) {	/* unsuitable CCVal delta */
| +		if (h->rtt_sample_prev == 2) {	/* previous candidate stored */
| +			sample = SUB16(tfrc_rx_hist_rtt_prev_s(h)->tfrchrx_ccval,
| +				       tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval);
| +			if (sample)
| +				sample = 4 / sample *
| +				         ktime_us_delta(tfrc_rx_hist_rtt_prev_s(h)->tfrchrx_tstamp,
| +							tfrc_rx_hist_rtt_last_s(h)->tfrchrx_tstamp);
| +			else    /*
| +				 * FIXME: This condition is in principle not
| +				 * possible but occurs when CCID is used for
| +				 * two-way data traffic. I have tried to trace
| +				 * it, but the cause does not seem to be here.
| +				 */
| +				DCCP_BUG("please report to dccp@vger.kernel.org"
| +					 " => prev = %u, last = %u",
| +					 tfrc_rx_hist_rtt_prev_s(h)->tfrchrx_ccval,
| +					 tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval);
| +		} else if (delta_v < 1) {
| +			h->rtt_sample_prev = 1;
| +			goto keep_ref_for_next_time;
| +		}
| +
| +	} else if (delta_v == 4) /* optimal match */
| +		sample = ktime_to_us(net_timedelta(tfrc_rx_hist_rtt_last_s(h)->tfrchrx_tstamp));
| +	else {			 /* suboptimal match */
| +		h->rtt_sample_prev = 2;
| +		goto keep_ref_for_next_time;
|  	}
| -}
|  
| -EXPORT_SYMBOL_GPL(tfrc_rx_hist_purge);
| +	if (unlikely(sample > DCCP_SANE_RTT_MAX)) {
| +		DCCP_WARN("RTT sample %u too large, using max\n", sample);
| +		sample = DCCP_SANE_RTT_MAX;
| +	}
| +
| +	h->rtt_sample_prev = 0;	       /* use current entry as next reference */
| +keep_ref_for_next_time:
| +
| +	return sample;
| +}
| +EXPORT_SYMBOL_GPL(tfrc_rx_hist_sample_rtt);
|  
|  __init int packet_history_init(void)
|  {
| @@ -269,7 +313,8 @@ __init int packet_history_init(void)
|  		goto out_err;
|  
|  	tfrc_rx_hist_slab = kmem_cache_create("tfrc_rx_hist",
| -					      sizeof(struct tfrc_rx_hist_entry), 0,
| +					      (sizeof(struct tfrc_rx_hist_entry) *
| +					       (TFRC_NDUPACK + 1)), 0,
|  					      SLAB_HWCACHE_ALIGN, NULL);
|  	if (tfrc_rx_hist_slab == NULL)
|  		goto out_free_tx;
| diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h
| index 5b0b983..6871e51 100644
| --- a/net/dccp/ccids/lib/packet_history.h
| +++ b/net/dccp/ccids/lib/packet_history.h
| @@ -37,15 +37,9 @@
|  #define _DCCP_PKT_HIST_
|  
|  #include <linux/ktime.h>
| -#include <linux/list.h>
| -#include <linux/slab.h>
| -#include "tfrc.h"
| +#include <linux/types.h>
|  
| -/* Number of later packets received before one is considered lost */
| -#define TFRC_RECV_NUM_LATE_LOSS	 3
| -
| -#define TFRC_WIN_COUNT_PER_RTT	 4
| -#define TFRC_WIN_COUNT_LIMIT	16
| +struct sk_buff;
|  
|  struct tfrc_tx_hist_entry;
|  
| @@ -54,54 +48,38 @@ extern void tfrc_tx_hist_purge(struct tfrc_tx_hist_entry **headp);
|  extern u32  tfrc_tx_hist_rtt(struct tfrc_tx_hist_entry *head,
|  			     const u64 seqno, const ktime_t now);
|  
| -/*
| - * 	Receiver History data structures and declarations
| - */
| -struct tfrc_rx_hist_entry {
| -	struct list_head tfrchrx_node;
| -	u64		 tfrchrx_seqno:48,
| -			 tfrchrx_ccval:4,
| -			 tfrchrx_type:4;
| -	u32		 tfrchrx_ndp; /* In fact it is from 8 to 24 bits */
| -	ktime_t		 tfrchrx_tstamp;
| -};
| -
| -extern struct tfrc_rx_hist_entry *
| -			tfrc_rx_hist_entry_new(const u32 ndp,
| -					       const struct sk_buff *skb,
| -					       const gfp_t prio);
| +struct tfrc_rx_hist_entry;
|  
| -static inline struct tfrc_rx_hist_entry *
| -			tfrc_rx_hist_head(struct list_head *list)
| -{
| -	struct tfrc_rx_hist_entry *head = NULL;
| +/* Subtraction a-b modulo-16, respects circular wrap-around */
| +#define SUB16(a, b) (((a) + 16 - (b)) & 0xF)
|  
| -	if (!list_empty(list))
| -		head = list_entry(list->next, struct tfrc_rx_hist_entry,
| -				  tfrchrx_node);
| -	return head;
| -}
| +/* Number of packets to wait after a missing packet (RFC 4342, 6.1) */
| +#define TFRC_NDUPACK 3
|  
| -extern int tfrc_rx_hist_find_entry(const struct list_head *list, const u64 seq,
| -				   u8 *ccval);
| -extern struct tfrc_rx_hist_entry *
| -		tfrc_rx_hist_find_data_packet(const struct list_head *list);
| -
| -extern void tfrc_rx_hist_add_packet(struct list_head *rx_list,
| -				    struct list_head *li_list,
| -				    struct tfrc_rx_hist_entry *packet,
| -				    u64 nonloss_seqno);
| -
| -extern void tfrc_rx_hist_purge(struct list_head *list);
| +/**
| + * tfrc_rx_hist  -  RX history structure for TFRC-based protocols
| + *
| + * @ring:		Packet history for RTT sampling and loss detection
| + * @loss_count:		Number of entries in circular history
| + * @loss_start:		Movable index (for loss detection)
| + * @rtt_sample_prev:	Used during RTT sampling, points to candidate entry
| + */
| +struct tfrc_rx_hist {
| +	struct tfrc_rx_hist_entry *ring;
| +	u8			  loss_count:2,
| +				  loss_start:2;
| +#define rtt_sample_prev		  loss_start
| +};
|  
| -static inline int
| -	tfrc_rx_hist_entry_data_packet(const struct tfrc_rx_hist_entry *entry)
| -{
| -	return entry->tfrchrx_type == DCCP_PKT_DATA ||
| -	       entry->tfrchrx_type == DCCP_PKT_DATAACK;
| -}
| +extern void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h,
| +				    const struct sk_buff *skb, const u32 ndp);
|  
| -extern u64 tfrc_rx_hist_detect_loss(struct list_head *rx_list,
| -				    struct list_head *li_list, u8 *win_loss);
| +extern int tfrc_rx_hist_duplicate(struct tfrc_rx_hist *h, struct sk_buff *skb);
| +extern int tfrc_rx_hist_new_loss_indicated(struct tfrc_rx_hist *h,
| +					   const struct sk_buff *skb, u32 ndp);
| +extern u32 tfrc_rx_hist_sample_rtt(struct tfrc_rx_hist *h,
| +				   const struct sk_buff *skb);
| +extern int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h);
| +extern void tfrc_rx_hist_purge(struct tfrc_rx_hist *h);
|  
|  #endif /* _DCCP_PKT_HIST_ */
| diff --git a/net/dccp/ccids/lib/packet_history_internal.h b/net/dccp/ccids/lib/packet_history_internal.h
| new file mode 100644
| index 0000000..70d5c31
| --- /dev/null
| +++ b/net/dccp/ccids/lib/packet_history_internal.h
| @@ -0,0 +1,67 @@
| +#ifndef _DCCP_PKT_HIST_INT_
| +#define _DCCP_PKT_HIST_INT_
| +/*
| + *  Packet RX/TX history data structures and routines for TFRC-based protocols.
| + *
| + *  Copyright (c) 2007   The University of Aberdeen, Scotland, UK
| + *  Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand.
| + *
| + *  This code has been developed by the University of Waikato WAND
| + *  research group. For further information please see http://www.wand.net.nz/
| + *  or e-mail Ian McDonald - ian.mcdonald@jandi.co.nz
| + *
| + *  This code also uses code from Lulea University, rereleased as GPL by its
| + *  authors:
| + *  Copyright (c) 2003 Nils-Erik Mattsson, Joacim Haggmark, Magnus Erixzon
| + *
| + *  Changes to meet Linux coding standards, to make it meet latest ccid3 draft
| + *  and to make it work as a loadable module in the DCCP stack written by
| + *  Arnaldo Carvalho de Melo <acme@conectiva.com.br>.
| + *
| + *  Copyright (c) 2005 Arnaldo Carvalho de Melo <acme@conectiva.com.br>
| + *
| + *  This program is free software; you can redistribute it and/or modify
| + *  it under the terms of the GNU General Public License as published by
| + *  the Free Software Foundation; either version 2 of the License, or
| + *  (at your option) any later version.
| + *
| + *  This program is distributed in the hope that it will be useful,
| + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
| + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
| + *  GNU General Public License for more details.
| + *
| + *  You should have received a copy of the GNU General Public License
| + *  along with this program; if not, write to the Free Software
| + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
| + */
| +
| +#include <linux/dccp.h>
| +#include <linux/ktime.h>
| +#include <linux/types.h>
| +
| +#define TFRC_WIN_COUNT_LIMIT 16
| +
| +/**
| + * tfrc_rx_hist_entry - Store information about a single received packet
| + * @tfrchrx_seqno:	DCCP packet sequence number
| + * @tfrchrx_ccval:	window counter value of packet (RFC 4342, 8.1)
| + * @tfrchrx_ndp:	the NDP count (if any) of the packet
| + * @tfrchrx_tstamp:	actual receive time of packet
| + */
| +struct tfrc_rx_hist_entry {
| +	u64	tfrchrx_seqno:48,
| +		tfrchrx_ccval:4,
| +		tfrchrx_type:4;
| +	u32	tfrchrx_ndp; /* In fact it is from 8 to 24 bits */
| +	ktime_t	tfrchrx_tstamp;
| +};
| +
| +static inline bool tfrc_rx_hist_entry_data_packet(const struct tfrc_rx_hist_entry *h)
| +{
| +	return h->tfrchrx_type == DCCP_PKT_DATA	   ||
| +	       h->tfrchrx_type == DCCP_PKT_DATAACK ||
| +	       h->tfrchrx_type == DCCP_PKT_REQUEST ||
| +	       h->tfrchrx_type == DCCP_PKT_RESPONSE;
| +}
| +
| +#endif /* _DCCP_PKT_HIST_INT_ */
| -- 
| 1.5.3.4
| 
| -
| To unsubscribe from this list: send the line "unsubscribe dccp" in
| the body of a message to majordomo@vger.kernel.org
| More majordomo info at  http://vger.kernel.org/majordomo-info.html
| 

-- 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 7/7] [TFRC] New rx history code
  2007-12-04  6:55               ` Gerrit Renker
@ 2007-12-04 11:59                 ` Arnaldo Carvalho de Melo
  2007-12-04 13:48                   ` [PATCH 7/7][TAKE 2][TFRC] " Arnaldo Carvalho de Melo
  2007-12-05  9:35                   ` [PATCH 7/7] [TFRC] " Gerrit Renker
  0 siblings, 2 replies; 27+ messages in thread
From: Arnaldo Carvalho de Melo @ 2007-12-04 11:59 UTC (permalink / raw)
  To: Gerrit Renker, netdev, dccp

Em Tue, Dec 04, 2007 at 06:55:17AM +0000, Gerrit Renker escreveu:
> NAK. You have changed the control flow of the algorithm and the underlying
> data structure. Originally it had been an array of pointers, and this had
> been a design decision right from the first submissions in March. From I
> of http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/loss_detection/loss_detection_algorithm_notes.txt
>  
>  "1. 'ring' is an array of pointers rather than a contiguous area in
>      memory. The reason is that, when having to swap adjacent entries
>      to sort the history, it is easier to swap pointers than to copy
>      (heavy-weight) history-entry data structs."
> 
> But in your algorithm it becomes a normal linear array.
> 
> As a consequence all subsequent code needs to be rewritten to use
> copying instead of swapping pointers. And there is a lot of that, since
> the loss detection code makes heavy use of swapping entries in order to
> cope with reordered and/or delayed packets.

So lets not do that, the decision was made based on the RX history patch
alone, where, as pointed out, no swapping occurs.
 
> This is not what I would call "entirely equivalent" as you claim. Your
> algorithm is fundamentally different, the control flow is not the same,
> nor are the underlying data structures.

Its is equivalent up to what is in the tree, thank you for point out
that in subsequent patches it will be used.

Had this design decision been made explicit in the changeset comment
that introduces the data structure I wouldn't have tried to reduce the
number of allocations.

And you even said that it was a good decision when first reacting to
this change, which I saw as positive feedback for the RFC I sent.

> | Credit here goes to Gerrit Renker, that provided the initial implementation for
> | this new codebase.
> | 
> | I modified it just to try to make it closer to the existing API, hide details from
> | the CCIDs and fix a couple bugs found in the initial implementation.
> What is "a couple bugs"? So far you have pointed out only one problem and that was
> agreed, it can be fixed. But it remains a side issue, it is not a failure of the

I pointed two problems one ended up being just the fact that tfrc_ewma
checks if the average is zero for every calculation.

> algorithm per se. What is worse here is that you take this single occurrence as a
> kind of carte blanche to mess around with the code as you please. Where please are
> the other bugs you are referring to?

I mentioned in the previous messages, one was a problem, the other you
clarified that was not a problem, but could be optimized.
 
> I am not buying into this and am not convinced that you are understanding
> what you are doing. It is the third time that you take patches which
> have been submitted on dccp@vger for over half a year, for which you
> have offered no more than a sentence of feedback, release them under
> your name, and introduce fundamental changes which alter the behaviour.

I commited it under my name because I changed it, while giving you
credit.
 
> The first instance was the set of ktime patches which I had developed
> for the test tree and which you extended to DCCP. In this case it were
> in fact three bugs that you added in migrating my patches. I know this
> because it messed up the way CCID3 behaved and since I spent several
> hours chasing these. And, in contrast to the above, it is not a mere
> claim: that is recorded in the netdev mail archives.

I'm sorry you feel so strongly when back and forth you express gratitude
and then you show contempt for my behaviour.

> The second instance was when you released the TX history patches under
> your name. Pro forma there was an RFC patch at 3pm, de facto it was
> checked in a few hours later: input not welcome.

Have you found any problems in it? Look again at the changeset to see if
I "stole" your code as you seem to imply:

commit 02271b56fd4028820e68e85e9d468628f42fb6ab
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date:   Wed Nov 28 11:15:40 2007 -0200

    [TFRC]: Migrate TX history to singly-linked lis

    This patch was based on another made by Gerrit Renker, his changelog was:

        ------------------------------------------------------
    The patch set migrates TFRC TX history to a singly-linked list.

    The details are:
     * use of a consistent naming scheme (all TFRC functions now begin
     * with `tfrc_');
     * allocation and cleanup are taken care of internally;
     * provision of a lookup function, which is used by the CCID TX
     * infrastructure
       to determine the time a packet was sent (in turn used for RTT sampling);
     * integration of the new interface with the present use in CCID3.
        ------------------------------------------------------

    Simplifications I did:

    . removing the tfrc_tx_hist_head that had a pointer to the list head and
      another for the slabcache.
    . No need for creating a slabcache for each CCID that wants to use the TFRC
      tx history routines, create a single slabcache when the dccp_tfrc_lib module
      init routine is called.

    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Is this enough credit? I can make the process slower and simply post
these patches and wait till you review it and incorporate the changes.
 
> The third instance is now where you change in essence the floor
> underneath this work. Since you are using a different basis, the
> algorithm is (in addition to the changes in control flow) necessarily
> different. 

You are picking something I did on a RFC patch and exaggerating on your
reaction.

> I have provided documentation, written test modules, and am able to prove
> that the algorithm as submitted works reasonably correct. In addition, the
> behaviour has been verified using regression tests.
> 
> I am not prepared to take your claims and expressions of "deepest
> respect" at face value since your actions - not your words - show that
> you are, in fact, not dealing respectfully with work which has taken
> months to complete and verify.
> 
> During 9 months on dccp@vger you did not provide so much as a paragraph
> of feedback. Instead you mopped up code from the test tree, modified it
> according to your own whims and now, in the circle of your
> invitation-only buddies, start to talk about having discussions and 
> iterations. The only iteration I can see is in fixing up the things you
> introduced, so it is not you who has to pay the price.
> 
> Sorry, unless you can offer a more mature model of collaboration,
> consider me out of this and the patches summarily withdrawn. I am not
> prepared to throw away several months of work just because you feel
> inspired to do as you please with the work of others. 

Again you want to have your patches accepted as-is. Pointed to one case
where I gave you credit while improving on your work (TX history) and
another where the changes were up for review. I don't consider this a
warm welcome for me to finally dedicate time to this effort. Would you
prefer to continue working without help? I think we should encourage
more people to work on DCCP, you seem to think that changes to your work
are not acceptable.

Perhaps others can comment on what is happening and help us find, as you
say, to find a more mature model of collaboration.

- Arnaldo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 7/7][TAKE 2][TFRC] New rx history code
  2007-12-04 11:59                 ` Arnaldo Carvalho de Melo
@ 2007-12-04 13:48                   ` Arnaldo Carvalho de Melo
  2007-12-05  9:42                     ` Gerrit Renker
  2007-12-05  9:35                   ` [PATCH 7/7] [TFRC] " Gerrit Renker
  1 sibling, 1 reply; 27+ messages in thread
From: Arnaldo Carvalho de Melo @ 2007-12-04 13:48 UTC (permalink / raw)
  To: Gerrit Renker, netdev, dccp

Em Tue, Dec 04, 2007 at 09:59:39AM -0200, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Dec 04, 2007 at 06:55:17AM +0000, Gerrit Renker escreveu:
> > NAK. You have changed the control flow of the algorithm and the underlying
> > data structure. Originally it had been an array of pointers, and this had
> > been a design decision right from the first submissions in March. From I
> > of http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/loss_detection/loss_detection_algorithm_notes.txt
> >  
> >  "1. 'ring' is an array of pointers rather than a contiguous area in
> >      memory. The reason is that, when having to swap adjacent entries
> >      to sort the history, it is easier to swap pointers than to copy
> >      (heavy-weight) history-entry data structs."
> > 
> > But in your algorithm it becomes a normal linear array.
> > 
> > As a consequence all subsequent code needs to be rewritten to use
> > copying instead of swapping pointers. And there is a lot of that, since
> > the loss detection code makes heavy use of swapping entries in order to
> > cope with reordered and/or delayed packets.
> 
> So lets not do that, the decision was made based on the RX history patch
> alone, where, as pointed out, no swapping occurs.
>  
> > This is not what I would call "entirely equivalent" as you claim. Your
> > algorithm is fundamentally different, the control flow is not the same,
> > nor are the underlying data structures.
> 
> Its is equivalent up to what is in the tree, thank you for point out
> that in subsequent patches it will be used.

I found a problem that I'm still investigating if was introduced by this
patch or if was already present. When sending 1 million 256 byte packets
with ttcp over loopback, using ccid3 it is crashing, the test machine
I'm using doesn't have a serial port, its a notebook, will switch to
another that has and provide the backtrace. It doesn't happens every
time.

Here is tfrc_rx_hist_alloc back using ring of pointers with the fixed
error path.

+int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h)
 {
+	int i;
+
+	for (i = 0; i <= TFRC_NDUPACK; i++) {
+		h->ring[i] = kmem_cache_alloc(tfrc_rx_hist_slab, GFP_ATOMIC);
+		if (h->ring[i] == NULL)
+			goto out_free;
+	}
+
+	h->loss_count = h->loss_start = 0;
+	return 0;
+
+out_free:
+	while (i-- != 0) {
+		kmem_cache_free(tfrc_rx_hist_slab, h->ring[i]);
+		h->ring[i] = NULL;
 	}
+	return -ENOBUFS;
 }

- Arnaldo

>From 317c1ce69711fe7b0f89e23e84390a14c98c0f7e Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo <acme@redhat.com>
Date: Tue, 4 Dec 2007 11:46:19 -0200
Subject: [PATCH 7/7] [TFRC]: New rx history code

Credit here goes to Gerrit Renker, that provided the initial implementation for
this new codebase.

I modified it just to try to make it closer to the existing API, hide details
from the CCIDs and fix one bug where the tfrc_rx_hist_alloc was not freeing the
allocated ring entries on the error path.

Original changeset comment from Gerrit:
      -----------
This provides a new, self-contained and generic RX history service for TFRC
based protocols.

Details:
 * new data structure, initialisation and cleanup routines;
 * allocation of dccp_rx_hist entries local to packet_history.c,
   as a service exported by the dccp_tfrc_lib module.
 * interface to automatically track highest-received seqno;
 * receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1);
 * a generic function to test for `data packets' as per  RFC 4340, sec. 7.7.
     -------------

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 net/dccp/ccids/ccid3.c                       |  255 +++++++----------------
 net/dccp/ccids/ccid3.h                       |   14 +-
 net/dccp/ccids/lib/loss_interval.c           |   14 +-
 net/dccp/ccids/lib/packet_history.c          |  290 ++++++++++++++++----------
 net/dccp/ccids/lib/packet_history.h          |   82 +++-----
 net/dccp/ccids/lib/packet_history_internal.h |   67 ++++++
 6 files changed, 368 insertions(+), 354 deletions(-)
 create mode 100644 net/dccp/ccids/lib/packet_history_internal.h

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 5ff5aab..af64c1d 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -641,6 +641,15 @@ static int ccid3_hc_tx_getsockopt(struct sock *sk, const int optname, int len,
 /*
  *	Receiver Half-Connection Routines
  */
+
+/* CCID3 feedback types */
+enum ccid3_fback_type {
+	CCID3_FBACK_NONE = 0,
+	CCID3_FBACK_INITIAL,
+	CCID3_FBACK_PERIODIC,
+	CCID3_FBACK_PARAM_CHANGE
+};
+
 #ifdef CONFIG_IP_DCCP_CCID3_DEBUG
 static const char *ccid3_rx_state_name(enum ccid3_hc_rx_states state)
 {
@@ -673,53 +682,60 @@ static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len)
 		hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9);
 }
 
-static void ccid3_hc_rx_send_feedback(struct sock *sk)
+static void ccid3_hc_rx_send_feedback(struct sock *sk,
+				      const struct sk_buff *skb,
+				      enum ccid3_fback_type fbtype)
 {
 	struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
 	struct dccp_sock *dp = dccp_sk(sk);
-	struct tfrc_rx_hist_entry *packet;
 	ktime_t now;
-	suseconds_t delta;
+	s64 delta = 0;
 
 	ccid3_pr_debug("%s(%p) - entry \n", dccp_role(sk), sk);
 
+	if (unlikely(hcrx->ccid3hcrx_state == TFRC_RSTATE_TERM))
+		return;
+
 	now = ktime_get_real();
 
-	switch (hcrx->ccid3hcrx_state) {
-	case TFRC_RSTATE_NO_DATA:
+	switch (fbtype) {
+	case CCID3_FBACK_INITIAL:
 		hcrx->ccid3hcrx_x_recv = 0;
+		hcrx->ccid3hcrx_pinv   = ~0U;   /* see RFC 4342, 8.5 */
 		break;
-	case TFRC_RSTATE_DATA:
-		delta = ktime_us_delta(now,
-				       hcrx->ccid3hcrx_tstamp_last_feedback);
-		DCCP_BUG_ON(delta < 0);
-		hcrx->ccid3hcrx_x_recv =
-			scaled_div32(hcrx->ccid3hcrx_bytes_recv, delta);
+	case CCID3_FBACK_PARAM_CHANGE:
+		/*
+		 * When parameters change (new loss or p > p_prev), we do not
+		 * have a reliable estimate for R_m of [RFC 3448, 6.2] and so
+		 * need to  reuse the previous value of X_recv. However, when
+		 * X_recv was 0 (due to early loss), this would kill X down to
+		 * s/t_mbi (i.e. one packet in 64 seconds).
+		 * To avoid such drastic reduction, we approximate X_recv as
+		 * the number of bytes since last feedback.
+		 * This is a safe fallback, since X is bounded above by X_calc.
+		 */
+		if (hcrx->ccid3hcrx_x_recv > 0)
+			break;
+		/* fall through */
+	case CCID3_FBACK_PERIODIC:
+		delta = ktime_us_delta(now, hcrx->ccid3hcrx_tstamp_last_feedback);
+		if (delta <= 0)
+			DCCP_BUG("delta (%ld) <= 0", (long)delta);
+		else
+			hcrx->ccid3hcrx_x_recv =
+				scaled_div32(hcrx->ccid3hcrx_bytes_recv, delta);
 		break;
-	case TFRC_RSTATE_TERM:
-		DCCP_BUG("%s(%p) - Illegal state TERM", dccp_role(sk), sk);
+	default:
 		return;
 	}
 
-	packet = tfrc_rx_hist_find_data_packet(&hcrx->ccid3hcrx_hist);
-	if (unlikely(packet == NULL)) {
-		DCCP_WARN("%s(%p), no data packet in history!\n",
-			  dccp_role(sk), sk);
-		return;
-	}
+	ccid3_pr_debug("Interval %ldusec, X_recv=%u, 1/p=%u\n", (long)delta,
+		       hcrx->ccid3hcrx_x_recv, hcrx->ccid3hcrx_pinv);
 
 	hcrx->ccid3hcrx_tstamp_last_feedback = now;
-	hcrx->ccid3hcrx_ccval_last_counter   = packet->tfrchrx_ccval;
+	hcrx->ccid3hcrx_last_counter	     = dccp_hdr(skb)->dccph_ccval;
 	hcrx->ccid3hcrx_bytes_recv	     = 0;
 
-	if (hcrx->ccid3hcrx_p == 0)
-		hcrx->ccid3hcrx_pinv = ~0U;	/* see RFC 4342, 8.5 */
-	else if (hcrx->ccid3hcrx_p > 1000000) {
-		DCCP_WARN("p (%u) > 100%%\n", hcrx->ccid3hcrx_p);
-		hcrx->ccid3hcrx_pinv = 1;	/* use 100% in this case */
-	} else
-		hcrx->ccid3hcrx_pinv = 1000000 / hcrx->ccid3hcrx_p;
-
 	dp->dccps_hc_rx_insert_options = 1;
 	dccp_send_ack(sk);
 }
@@ -753,162 +769,52 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb)
 static int ccid3_hc_rx_detect_loss(struct sock *sk,
 				    struct tfrc_rx_hist_entry *packet)
 {
-	struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
-	struct tfrc_rx_hist_entry *rx_hist =
-				tfrc_rx_hist_head(&hcrx->ccid3hcrx_hist);
-	u64 seqno = packet->tfrchrx_seqno;
-	u64 tmp_seqno;
-	int loss = 0;
-	u8 ccval;
-
-
-	tmp_seqno = hcrx->ccid3hcrx_seqno_nonloss;
-
-	if (!rx_hist ||
-	   follows48(packet->tfrchrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) {
-		hcrx->ccid3hcrx_seqno_nonloss = seqno;
-		hcrx->ccid3hcrx_ccval_nonloss = packet->tfrchrx_ccval;
-		goto detect_out;
-	}
-
-
-	while (dccp_delta_seqno(hcrx->ccid3hcrx_seqno_nonloss, seqno)
-	   > TFRC_RECV_NUM_LATE_LOSS) {
-		loss = 1;
-		dccp_li_update_li(sk,
-				  &hcrx->ccid3hcrx_li_hist,
-				  &hcrx->ccid3hcrx_hist,
-				  hcrx->ccid3hcrx_tstamp_last_feedback,
-				  hcrx->ccid3hcrx_s,
-				  hcrx->ccid3hcrx_bytes_recv,
-				  hcrx->ccid3hcrx_x_recv,
-				  hcrx->ccid3hcrx_seqno_nonloss,
-				  hcrx->ccid3hcrx_ccval_nonloss);
-		tmp_seqno = hcrx->ccid3hcrx_seqno_nonloss;
-		dccp_inc_seqno(&tmp_seqno);
-		hcrx->ccid3hcrx_seqno_nonloss = tmp_seqno;
-		dccp_inc_seqno(&tmp_seqno);
-		while (tfrc_rx_hist_find_entry(&hcrx->ccid3hcrx_hist,
-		   tmp_seqno, &ccval)) {
-			hcrx->ccid3hcrx_seqno_nonloss = tmp_seqno;
-			hcrx->ccid3hcrx_ccval_nonloss = ccval;
-			dccp_inc_seqno(&tmp_seqno);
-		}
-	}
-
-	/* FIXME - this code could be simplified with above while */
-	/* but works at moment */
-	if (follows48(packet->tfrchrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) {
-		hcrx->ccid3hcrx_seqno_nonloss = seqno;
-		hcrx->ccid3hcrx_ccval_nonloss = packet->tfrchrx_ccval;
-	}
-
-detect_out:
-	tfrc_rx_hist_add_packet(&hcrx->ccid3hcrx_hist,
-				&hcrx->ccid3hcrx_li_hist, packet,
-				hcrx->ccid3hcrx_seqno_nonloss);
-	return loss;
+	return 0;
 }
 
 static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
 {
 	struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk);
-	const struct dccp_options_received *opt_recv;
-	struct tfrc_rx_hist_entry *packet;
-	u32 p_prev, r_sample, rtt_prev;
-	int loss, payload_size;
-	ktime_t now;
-
-	opt_recv = &dccp_sk(sk)->dccps_options_received;
+	const u32 payload_size = skb->len - dccp_hdr(skb)->dccph_doff * 4;
+	const u32 ndp = dccp_sk(sk)->dccps_options_received.dccpor_ndp;
+	enum ccid3_fback_type do_feedback = CCID3_FBACK_NONE;
+	const bool is_data_packet = dccp_data_packet(skb);
 
-	switch (DCCP_SKB_CB(skb)->dccpd_type) {
-	case DCCP_PKT_ACK:
-		if (hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA)
+	if (hcrx->ccid3hcrx_state != TFRC_RSTATE_NO_DATA) {
+		if (tfrc_rx_hist_duplicate(&hcrx->ccid3hcrx_hist, skb))
 			return;
-	case DCCP_PKT_DATAACK:
-		if (opt_recv->dccpor_timestamp_echo == 0)
-			break;
-		r_sample = dccp_timestamp() - opt_recv->dccpor_timestamp_echo;
-		rtt_prev = hcrx->ccid3hcrx_rtt;
-		r_sample = dccp_sample_rtt(sk, 10 * r_sample);
-
-		if (hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA)
-			hcrx->ccid3hcrx_rtt = r_sample;
-		else
-			hcrx->ccid3hcrx_rtt = (hcrx->ccid3hcrx_rtt * 9) / 10 +
-					      r_sample / 10;
-
-		if (rtt_prev != hcrx->ccid3hcrx_rtt)
-			ccid3_pr_debug("%s(%p), New RTT=%uus, elapsed time=%u\n",
-				       dccp_role(sk), sk, hcrx->ccid3hcrx_rtt,
-				       opt_recv->dccpor_elapsed_time);
-		break;
-	case DCCP_PKT_DATA:
-		break;
-	default: /* We're not interested in other packet types, move along */
-		return;
-	}
-
-	packet = tfrc_rx_hist_entry_new(opt_recv->dccpor_ndp, skb, GFP_ATOMIC);
-	if (unlikely(packet == NULL)) {
-		DCCP_WARN("%s(%p), Not enough mem to add rx packet "
-			  "to history, consider it lost!\n", dccp_role(sk), sk);
-		return;
-	}
-
-	loss = ccid3_hc_rx_detect_loss(sk, packet);
-
-	if (DCCP_SKB_CB(skb)->dccpd_type == DCCP_PKT_ACK)
-		return;
-
-	payload_size = skb->len - dccp_hdr(skb)->dccph_doff * 4;
-	ccid3_hc_rx_update_s(hcrx, payload_size);
 
-	switch (hcrx->ccid3hcrx_state) {
-	case TFRC_RSTATE_NO_DATA:
-		ccid3_pr_debug("%s(%p, state=%s), skb=%p, sending initial "
-			       "feedback\n", dccp_role(sk), sk,
-			       dccp_state_name(sk->sk_state), skb);
-		ccid3_hc_rx_send_feedback(sk);
-		ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA);
-		return;
-	case TFRC_RSTATE_DATA:
-		hcrx->ccid3hcrx_bytes_recv += payload_size;
-		if (loss)
-			break;
-
-		now = ktime_get_real();
-		if ((ktime_us_delta(now, hcrx->ccid3hcrx_tstamp_last_ack) -
-		     (s64)hcrx->ccid3hcrx_rtt) >= 0) {
-			hcrx->ccid3hcrx_tstamp_last_ack = now;
-			ccid3_hc_rx_send_feedback(sk);
+		    /* Handle pending losses and otherwise check for new loss */
+		if (!tfrc_rx_hist_new_loss_indicated(&hcrx->ccid3hcrx_hist,
+						     skb, ndp) &&
+		    /* Handle data packets: RTT sampling and monitoring p */
+		    is_data_packet)  {
+
+			if (list_empty(&hcrx->ccid3hcrx_li_hist)) {  /* no loss so far: p = 0 */
+				const u32 sample = tfrc_rx_hist_sample_rtt(&hcrx->ccid3hcrx_hist, skb);
+				if (sample != 0)
+					hcrx->ccid3hcrx_rtt =
+						tfrc_ewma(hcrx->ccid3hcrx_rtt, sample, 9);
+			}
+
+			ccid3_hc_rx_update_s(hcrx, payload_size);
+			/* check if the periodic once-per-RTT feedback is due; RFC 4342, 10.3 */
+			if (SUB16(dccp_hdr(skb)->dccph_ccval,
+				  hcrx->ccid3hcrx_last_counter) > 3)
+				do_feedback = CCID3_FBACK_PERIODIC;
 		}
-		return;
-	case TFRC_RSTATE_TERM:
-		DCCP_BUG("%s(%p) - Illegal state TERM", dccp_role(sk), sk);
-		return;
-	}
-
-	/* Dealing with packet loss */
-	ccid3_pr_debug("%s(%p, state=%s), data loss! Reacting...\n",
-		       dccp_role(sk), sk, dccp_state_name(sk->sk_state));
-
-	p_prev = hcrx->ccid3hcrx_p;
+	} else if (is_data_packet) {
+		do_feedback = CCID3_FBACK_INITIAL;
+		ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA);
+		hcrx->ccid3hcrx_s = payload_size;
+ 	}
 
-	/* Calculate loss event rate */
-	if (!list_empty(&hcrx->ccid3hcrx_li_hist)) {
-		u32 i_mean = dccp_li_hist_calc_i_mean(&hcrx->ccid3hcrx_li_hist);
+	hcrx->ccid3hcrx_bytes_recv += payload_size;
 
-		/* Scaling up by 1000000 as fixed decimal */
-		if (i_mean != 0)
-			hcrx->ccid3hcrx_p = 1000000 / i_mean;
-	} else
-		DCCP_BUG("empty loss history");
+	tfrc_rx_hist_add_packet(&hcrx->ccid3hcrx_hist, skb, ndp);
 
-	if (hcrx->ccid3hcrx_p > p_prev) {
-		ccid3_hc_rx_send_feedback(sk);
-		return;
-	}
+	if (do_feedback)
+		ccid3_hc_rx_send_feedback(sk, skb, do_feedback);
 }
 
 static int ccid3_hc_rx_init(struct ccid *ccid, struct sock *sk)
@@ -918,11 +824,8 @@ static int ccid3_hc_rx_init(struct ccid *ccid, struct sock *sk)
 	ccid3_pr_debug("entry\n");
 
 	hcrx->ccid3hcrx_state = TFRC_RSTATE_NO_DATA;
-	INIT_LIST_HEAD(&hcrx->ccid3hcrx_hist);
 	INIT_LIST_HEAD(&hcrx->ccid3hcrx_li_hist);
-	hcrx->ccid3hcrx_tstamp_last_feedback =
-		hcrx->ccid3hcrx_tstamp_last_ack = ktime_get_real();
-	return 0;
+	return tfrc_rx_hist_alloc(&hcrx->ccid3hcrx_hist);
 }
 
 static void ccid3_hc_rx_exit(struct sock *sk)
diff --git a/net/dccp/ccids/ccid3.h b/net/dccp/ccids/ccid3.h
index b842a7d..3c33dc6 100644
--- a/net/dccp/ccids/ccid3.h
+++ b/net/dccp/ccids/ccid3.h
@@ -1,7 +1,8 @@
 /*
  *  net/dccp/ccids/ccid3.h
  *
- *  Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand.
+ *  Copyright (c) 2005-7 The University of Waikato, Hamilton, New Zealand.
+ *  Copyright (c) 2007   The University of Aberdeen, Scotland, UK
  *
  *  An implementation of the DCCP protocol
  *
@@ -135,9 +136,7 @@ enum ccid3_hc_rx_states {
  *  @ccid3hcrx_x_recv  -  Receiver estimate of send rate (RFC 3448 4.3)
  *  @ccid3hcrx_rtt  -  Receiver estimate of rtt (non-standard)
  *  @ccid3hcrx_p  -  current loss event rate (RFC 3448 5.4)
- *  @ccid3hcrx_seqno_nonloss  -  Last received non-loss sequence number
- *  @ccid3hcrx_ccval_nonloss  -  Last received non-loss Window CCVal
- *  @ccid3hcrx_ccval_last_counter  -  Tracks window counter (RFC 4342, 8.1)
+ *  @ccid3hcrx_last_counter  -  Tracks window counter (RFC 4342, 8.1)
  *  @ccid3hcrx_state  -  receiver state, one of %ccid3_hc_rx_states
  *  @ccid3hcrx_bytes_recv  -  Total sum of DCCP payload bytes
  *  @ccid3hcrx_tstamp_last_feedback  -  Time at which last feedback was sent
@@ -152,14 +151,11 @@ struct ccid3_hc_rx_sock {
 #define ccid3hcrx_x_recv		ccid3hcrx_tfrc.tfrcrx_x_recv
 #define ccid3hcrx_rtt			ccid3hcrx_tfrc.tfrcrx_rtt
 #define ccid3hcrx_p			ccid3hcrx_tfrc.tfrcrx_p
-	u64				ccid3hcrx_seqno_nonloss:48,
-					ccid3hcrx_ccval_nonloss:4,
-					ccid3hcrx_ccval_last_counter:4;
+	u8				ccid3hcrx_last_counter:4;
 	enum ccid3_hc_rx_states		ccid3hcrx_state:8;
 	u32				ccid3hcrx_bytes_recv;
 	ktime_t				ccid3hcrx_tstamp_last_feedback;
-	ktime_t				ccid3hcrx_tstamp_last_ack;
-	struct list_head		ccid3hcrx_hist;
+	struct tfrc_rx_hist		ccid3hcrx_hist;
 	struct list_head		ccid3hcrx_li_hist;
 	u16				ccid3hcrx_s;
 	u32				ccid3hcrx_pinv;
diff --git a/net/dccp/ccids/lib/loss_interval.c b/net/dccp/ccids/lib/loss_interval.c
index a5f59af..71080ca 100644
--- a/net/dccp/ccids/lib/loss_interval.c
+++ b/net/dccp/ccids/lib/loss_interval.c
@@ -16,6 +16,7 @@
 #include "../../dccp.h"
 #include "loss_interval.h"
 #include "packet_history.h"
+#include "packet_history_internal.h"
 #include "tfrc.h"
 
 #define DCCP_LI_HIST_IVAL_F_LENGTH  8
@@ -129,6 +130,13 @@ static u32 dccp_li_calc_first_li(struct sock *sk,
 				 u16 s, u32 bytes_recv,
 				 u32 previous_x_recv)
 {
+/*
+ * FIXME:
+ * Will be rewritten in the upcoming new loss intervals code. 
+ * Has to be commented ou because it relies on the old rx history
+ * data structures
+ */
+#if 0
 	struct tfrc_rx_hist_entry *entry, *next, *tail = NULL;
 	u32 x_recv, p;
 	suseconds_t rtt, delta;
@@ -216,10 +224,10 @@ found:
 	dccp_pr_debug("%s(%p), receive rate=%u bytes/s, implied "
 		      "loss rate=%u\n", dccp_role(sk), sk, x_recv, p);
 
-	if (p == 0)
-		return ~0;
-	else
+	if (p != 0)
 		return 1000000 / p;
+#endif
+	return ~0;
 }
 
 void dccp_li_update_li(struct sock *sk,
diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c
index 255cca1..56a0254 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -36,7 +36,10 @@
  */
 
 #include <linux/string.h>
+#include <linux/slab.h>
 #include "packet_history.h"
+#include "../../dccp.h"
+#include "packet_history_internal.h"
 
 /**
  *  tfrc_tx_hist_entry  -  Simple singly-linked TX history list
@@ -111,154 +114,213 @@ u32 tfrc_tx_hist_rtt(struct tfrc_tx_hist_entry *head, const u64 seqno,
 }
 EXPORT_SYMBOL_GPL(tfrc_tx_hist_rtt);
 
-/*
- * 	Receiver History Routines
+/**
+ * tfrc_rx_hist_index - index to reach n-th entry after loss_start
  */
-static struct kmem_cache *tfrc_rx_hist_slab;
+static inline u8 tfrc_rx_hist_index(const struct tfrc_rx_hist *h, const u8 n)
+{
+	return (h->loss_start + n) & TFRC_NDUPACK;
+}
 
-struct tfrc_rx_hist_entry *tfrc_rx_hist_entry_new(const u32 ndp,
-						  const struct sk_buff *skb,
-						  const gfp_t prio)
+/**
+ * tfrc_rx_hist_last_rcv - entry with highest-received-seqno so far
+ */
+static inline struct tfrc_rx_hist_entry *
+			tfrc_rx_hist_last_rcv(const struct tfrc_rx_hist *h)
 {
-	struct tfrc_rx_hist_entry *entry = kmem_cache_alloc(tfrc_rx_hist_slab,
-							    prio);
+	return h->ring[tfrc_rx_hist_index(h, h->loss_count)];
+}
 
-	if (entry != NULL) {
-		const struct dccp_hdr *dh = dccp_hdr(skb);
+/**
+ * tfrc_rx_hist_entry - return the n-th history entry after loss_start
+ */
+static inline struct tfrc_rx_hist_entry *
+		tfrc_rx_hist_entry(const struct tfrc_rx_hist *h, const u8 n)
+{
+	return h->ring[tfrc_rx_hist_index(h, n)];
+}
 
-		entry->tfrchrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq;
-		entry->tfrchrx_ccval = dh->dccph_ccval;
-		entry->tfrchrx_type  = dh->dccph_type;
-		entry->tfrchrx_ndp   = ndp;
-		entry->tfrchrx_tstamp = ktime_get_real();
-	}
+/**
+ * tfrc_rx_hist_loss_prev - entry with highest-received-seqno before loss was detected
+ */
+static inline struct tfrc_rx_hist_entry *
+			tfrc_rx_hist_loss_prev(const struct tfrc_rx_hist *h)
+{
+	return h->ring[h->loss_start];
+}
 
-	return entry;
+/*
+ * 	Receiver History Routines
+ */
+static struct kmem_cache *tfrc_rx_hist_slab;
+
+void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h,
+			     const struct sk_buff *skb,
+			     const u32 ndp)
+{
+	struct tfrc_rx_hist_entry *entry = tfrc_rx_hist_last_rcv(h);
+	const struct dccp_hdr *dh = dccp_hdr(skb);
+
+	entry->tfrchrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq;
+	entry->tfrchrx_ccval = dh->dccph_ccval;
+	entry->tfrchrx_type  = dh->dccph_type;
+	entry->tfrchrx_ndp   = ndp;
+	entry->tfrchrx_tstamp = ktime_get_real();
 }
-EXPORT_SYMBOL_GPL(tfrc_rx_hist_entry_new);
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_add_packet);
 
 static inline void tfrc_rx_hist_entry_delete(struct tfrc_rx_hist_entry *entry)
 {
 	kmem_cache_free(tfrc_rx_hist_slab, entry);
 }
 
-int tfrc_rx_hist_find_entry(const struct list_head *list, const u64 seq,
-			    u8 *ccval)
+/* has the packet contained in skb been seen before? */
+int tfrc_rx_hist_duplicate(struct tfrc_rx_hist *h, struct sk_buff *skb)
 {
-	struct tfrc_rx_hist_entry *packet = NULL, *entry;
+	const u64 seq = DCCP_SKB_CB(skb)->dccpd_seq;
+	int i;
 
-	list_for_each_entry(entry, list, tfrchrx_node)
-		if (entry->tfrchrx_seqno == seq) {
-			packet = entry;
-			break;
-		}
+	if (dccp_delta_seqno(tfrc_rx_hist_loss_prev(h)->tfrchrx_seqno, seq) <= 0)
+		return 1;
 
-	if (packet)
-		*ccval = packet->tfrchrx_ccval;
+	for (i = 1; i <= h->loss_count; i++)
+		if (tfrc_rx_hist_entry(h, i)->tfrchrx_seqno == seq)
+			return 1;
 
-	return packet != NULL;
+	return 0;
 }
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_duplicate);
 
-EXPORT_SYMBOL_GPL(tfrc_rx_hist_find_entry);
-struct tfrc_rx_hist_entry *
-		tfrc_rx_hist_find_data_packet(const struct list_head *list)
+/* initialise loss detection and disable RTT sampling */
+static inline void tfrc_rx_hist_loss_indicated(struct tfrc_rx_hist *h)
 {
-	struct tfrc_rx_hist_entry *entry, *packet = NULL;
-
-	list_for_each_entry(entry, list, tfrchrx_node)
-		if (entry->tfrchrx_type == DCCP_PKT_DATA ||
-		    entry->tfrchrx_type == DCCP_PKT_DATAACK) {
-			packet = entry;
-			break;
-		}
+	h->loss_count = 1;
+}
 
-	return packet;
+/* indicate whether previously a packet was detected missing */
+static inline int tfrc_rx_hist_loss_pending(const struct tfrc_rx_hist *h)
+{
+	return h->loss_count;
 }
 
-EXPORT_SYMBOL_GPL(tfrc_rx_hist_find_data_packet);
+/* any data packets missing between last reception and skb ? */
+int tfrc_rx_hist_new_loss_indicated(struct tfrc_rx_hist *h,
+				    const struct sk_buff *skb, u32 ndp)
+{
+	int delta = dccp_delta_seqno(tfrc_rx_hist_last_rcv(h)->tfrchrx_seqno,
+				     DCCP_SKB_CB(skb)->dccpd_seq);
+
+	if (delta > 1 && ndp < delta)
+		tfrc_rx_hist_loss_indicated(h);
+
+	return tfrc_rx_hist_loss_pending(h);
+}
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_new_loss_indicated);
 
-void tfrc_rx_hist_add_packet(struct list_head *rx_list,
-			     struct list_head *li_list,
-			     struct tfrc_rx_hist_entry *packet,
-			     u64 nonloss_seqno)
+void tfrc_rx_hist_purge(struct tfrc_rx_hist *h)
 {
-	struct tfrc_rx_hist_entry *entry, *next;
-	u8 num_later = 0;
-
-	list_add(&packet->tfrchrx_node, rx_list);
-
-	num_later = TFRC_RECV_NUM_LATE_LOSS + 1;
-
-	if (!list_empty(li_list)) {
-		list_for_each_entry_safe(entry, next, rx_list, tfrchrx_node) {
-			if (num_later == 0) {
-				if (after48(nonloss_seqno,
-				   entry->tfrchrx_seqno)) {
-					list_del_init(&entry->tfrchrx_node);
-					tfrc_rx_hist_entry_delete(entry);
-				}
-			} else if (tfrc_rx_hist_entry_data_packet(entry))
-				--num_later;
-		}
-	} else {
-		int step = 0;
-		u8 win_count = 0; /* Not needed, but lets shut up gcc */
-		int tmp;
-		/*
-		 * We have no loss interval history so we need at least one
-		 * rtt:s of data packets to approximate rtt.
-		 */
-		list_for_each_entry_safe(entry, next, rx_list, tfrchrx_node) {
-			if (num_later == 0) {
-				switch (step) {
-				case 0:
-					step = 1;
-					/* OK, find next data packet */
-					num_later = 1;
-					break;
-				case 1:
-					step = 2;
-					/* OK, find next data packet */
-					num_later = 1;
-					win_count = entry->tfrchrx_ccval;
-					break;
-				case 2:
-					tmp = win_count - entry->tfrchrx_ccval;
-					if (tmp < 0)
-						tmp += TFRC_WIN_COUNT_LIMIT;
-					if (tmp > TFRC_WIN_COUNT_PER_RTT + 1) {
-						/*
-						 * We have found a packet older
-						 * than one rtt remove the rest
-						 */
-						step = 3;
-					} else /* OK, find next data packet */
-						num_later = 1;
-					break;
-				case 3:
-					list_del_init(&entry->tfrchrx_node);
-					tfrc_rx_hist_entry_delete(entry);
-					break;
-				}
-			} else if (tfrc_rx_hist_entry_data_packet(entry))
-				--num_later;
+	int i;
+
+	for (i = 0; i <= TFRC_NDUPACK; ++i)
+		if (h->ring[i] != NULL) {
+			kmem_cache_free(tfrc_rx_hist_slab, h->ring[i]);
+			h->ring[i] = NULL;
 		}
+}
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_purge);
+
+int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h)
+{
+	int i;
+
+	for (i = 0; i <= TFRC_NDUPACK; i++) {
+		h->ring[i] = kmem_cache_alloc(tfrc_rx_hist_slab, GFP_ATOMIC);
+		if (h->ring[i] == NULL)
+			goto out_free;
+	}
+
+	h->loss_count = h->loss_start = 0;
+	return 0;
+
+out_free:
+	while (i-- != 0) {
+		kmem_cache_free(tfrc_rx_hist_slab, h->ring[i]);
+		h->ring[i] = NULL;
 	}
+	return -ENOBUFS;
 }
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_alloc);
 
-EXPORT_SYMBOL_GPL(tfrc_rx_hist_add_packet);
+/**
+ * tfrc_rx_hist_rtt_last_s - reference entry to compute RTT samples against
+ */
+static inline struct tfrc_rx_hist_entry *
+			tfrc_rx_hist_rtt_last_s(const struct tfrc_rx_hist *h)
+{
+	return h->ring[0];
+}
 
-void tfrc_rx_hist_purge(struct list_head *list)
+/**
+ * tfrc_rx_hist_rtt_prev_s: previously suitable (wrt rtt_last_s) RTT-sampling entry
+ */
+static inline struct tfrc_rx_hist_entry *
+			tfrc_rx_hist_rtt_prev_s(const struct tfrc_rx_hist *h)
 {
-	struct tfrc_rx_hist_entry *entry, *next;
+	return h->ring[h->rtt_sample_prev];
+}
 
-	list_for_each_entry_safe(entry, next, list, tfrchrx_node) {
-		list_del_init(&entry->tfrchrx_node);
-		tfrc_rx_hist_entry_delete(entry);
+/**
+ * tfrc_rx_hist_sample_rtt  -  Sample RTT from timestamp / CCVal
+ * Based on ideas presented in RFC 4342, 8.1. Returns 0 if it was not able
+ * to compute a sample with given data - calling function should check this.
+ */
+u32 tfrc_rx_hist_sample_rtt(struct tfrc_rx_hist *h, const struct sk_buff *skb)
+{
+	u32 sample = 0,
+	    delta_v = SUB16(dccp_hdr(skb)->dccph_ccval,
+	    		    tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval);
+
+	if (delta_v < 1 || delta_v > 4) {	/* unsuitable CCVal delta */
+		if (h->rtt_sample_prev == 2) {	/* previous candidate stored */
+			sample = SUB16(tfrc_rx_hist_rtt_prev_s(h)->tfrchrx_ccval,
+				       tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval);
+			if (sample)
+				sample = 4 / sample *
+				         ktime_us_delta(tfrc_rx_hist_rtt_prev_s(h)->tfrchrx_tstamp,
+							tfrc_rx_hist_rtt_last_s(h)->tfrchrx_tstamp);
+			else    /*
+				 * FIXME: This condition is in principle not
+				 * possible but occurs when CCID is used for
+				 * two-way data traffic. I have tried to trace
+				 * it, but the cause does not seem to be here.
+				 */
+				DCCP_BUG("please report to dccp@vger.kernel.org"
+					 " => prev = %u, last = %u",
+					 tfrc_rx_hist_rtt_prev_s(h)->tfrchrx_ccval,
+					 tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval);
+		} else if (delta_v < 1) {
+			h->rtt_sample_prev = 1;
+			goto keep_ref_for_next_time;
+		}
+
+	} else if (delta_v == 4) /* optimal match */
+		sample = ktime_to_us(net_timedelta(tfrc_rx_hist_rtt_last_s(h)->tfrchrx_tstamp));
+	else {			 /* suboptimal match */
+		h->rtt_sample_prev = 2;
+		goto keep_ref_for_next_time;
 	}
-}
 
-EXPORT_SYMBOL_GPL(tfrc_rx_hist_purge);
+	if (unlikely(sample > DCCP_SANE_RTT_MAX)) {
+		DCCP_WARN("RTT sample %u too large, using max\n", sample);
+		sample = DCCP_SANE_RTT_MAX;
+	}
+
+	h->rtt_sample_prev = 0;	       /* use current entry as next reference */
+keep_ref_for_next_time:
+
+	return sample;
+}
+EXPORT_SYMBOL_GPL(tfrc_rx_hist_sample_rtt);
 
 __init int packet_history_init(void)
 {
diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h
index 5b0b983..dac275f 100644
--- a/net/dccp/ccids/lib/packet_history.h
+++ b/net/dccp/ccids/lib/packet_history.h
@@ -37,15 +37,9 @@
 #define _DCCP_PKT_HIST_
 
 #include <linux/ktime.h>
-#include <linux/list.h>
-#include <linux/slab.h>
-#include "tfrc.h"
+#include <linux/types.h>
 
-/* Number of later packets received before one is considered lost */
-#define TFRC_RECV_NUM_LATE_LOSS	 3
-
-#define TFRC_WIN_COUNT_PER_RTT	 4
-#define TFRC_WIN_COUNT_LIMIT	16
+struct sk_buff;
 
 struct tfrc_tx_hist_entry;
 
@@ -54,54 +48,38 @@ extern void tfrc_tx_hist_purge(struct tfrc_tx_hist_entry **headp);
 extern u32  tfrc_tx_hist_rtt(struct tfrc_tx_hist_entry *head,
 			     const u64 seqno, const ktime_t now);
 
-/*
- * 	Receiver History data structures and declarations
- */
-struct tfrc_rx_hist_entry {
-	struct list_head tfrchrx_node;
-	u64		 tfrchrx_seqno:48,
-			 tfrchrx_ccval:4,
-			 tfrchrx_type:4;
-	u32		 tfrchrx_ndp; /* In fact it is from 8 to 24 bits */
-	ktime_t		 tfrchrx_tstamp;
-};
-
-extern struct tfrc_rx_hist_entry *
-			tfrc_rx_hist_entry_new(const u32 ndp,
-					       const struct sk_buff *skb,
-					       const gfp_t prio);
+struct tfrc_rx_hist_entry;
 
-static inline struct tfrc_rx_hist_entry *
-			tfrc_rx_hist_head(struct list_head *list)
-{
-	struct tfrc_rx_hist_entry *head = NULL;
+/* Subtraction a-b modulo-16, respects circular wrap-around */
+#define SUB16(a, b) (((a) + 16 - (b)) & 0xF)
 
-	if (!list_empty(list))
-		head = list_entry(list->next, struct tfrc_rx_hist_entry,
-				  tfrchrx_node);
-	return head;
-}
+/* Number of packets to wait after a missing packet (RFC 4342, 6.1) */
+#define TFRC_NDUPACK 3
 
-extern int tfrc_rx_hist_find_entry(const struct list_head *list, const u64 seq,
-				   u8 *ccval);
-extern struct tfrc_rx_hist_entry *
-		tfrc_rx_hist_find_data_packet(const struct list_head *list);
-
-extern void tfrc_rx_hist_add_packet(struct list_head *rx_list,
-				    struct list_head *li_list,
-				    struct tfrc_rx_hist_entry *packet,
-				    u64 nonloss_seqno);
-
-extern void tfrc_rx_hist_purge(struct list_head *list);
+/**
+ * tfrc_rx_hist  -  RX history structure for TFRC-based protocols
+ *
+ * @ring:		Packet history for RTT sampling and loss detection
+ * @loss_count:		Number of entries in circular history
+ * @loss_start:		Movable index (for loss detection)
+ * @rtt_sample_prev:	Used during RTT sampling, points to candidate entry
+ */
+struct tfrc_rx_hist {
+	struct tfrc_rx_hist_entry *ring[TFRC_NDUPACK + 1];
+	u8			  loss_count:2,
+				  loss_start:2;
+#define rtt_sample_prev		  loss_start
+};
 
-static inline int
-	tfrc_rx_hist_entry_data_packet(const struct tfrc_rx_hist_entry *entry)
-{
-	return entry->tfrchrx_type == DCCP_PKT_DATA ||
-	       entry->tfrchrx_type == DCCP_PKT_DATAACK;
-}
+extern void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h,
+				    const struct sk_buff *skb, const u32 ndp);
 
-extern u64 tfrc_rx_hist_detect_loss(struct list_head *rx_list,
-				    struct list_head *li_list, u8 *win_loss);
+extern int tfrc_rx_hist_duplicate(struct tfrc_rx_hist *h, struct sk_buff *skb);
+extern int tfrc_rx_hist_new_loss_indicated(struct tfrc_rx_hist *h,
+					   const struct sk_buff *skb, u32 ndp);
+extern u32 tfrc_rx_hist_sample_rtt(struct tfrc_rx_hist *h,
+				   const struct sk_buff *skb);
+extern int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h);
+extern void tfrc_rx_hist_purge(struct tfrc_rx_hist *h);
 
 #endif /* _DCCP_PKT_HIST_ */
diff --git a/net/dccp/ccids/lib/packet_history_internal.h b/net/dccp/ccids/lib/packet_history_internal.h
new file mode 100644
index 0000000..70d5c31
--- /dev/null
+++ b/net/dccp/ccids/lib/packet_history_internal.h
@@ -0,0 +1,67 @@
+#ifndef _DCCP_PKT_HIST_INT_
+#define _DCCP_PKT_HIST_INT_
+/*
+ *  Packet RX/TX history data structures and routines for TFRC-based protocols.
+ *
+ *  Copyright (c) 2007   The University of Aberdeen, Scotland, UK
+ *  Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand.
+ *
+ *  This code has been developed by the University of Waikato WAND
+ *  research group. For further information please see http://www.wand.net.nz/
+ *  or e-mail Ian McDonald - ian.mcdonald@jandi.co.nz
+ *
+ *  This code also uses code from Lulea University, rereleased as GPL by its
+ *  authors:
+ *  Copyright (c) 2003 Nils-Erik Mattsson, Joacim Haggmark, Magnus Erixzon
+ *
+ *  Changes to meet Linux coding standards, to make it meet latest ccid3 draft
+ *  and to make it work as a loadable module in the DCCP stack written by
+ *  Arnaldo Carvalho de Melo <acme@conectiva.com.br>.
+ *
+ *  Copyright (c) 2005 Arnaldo Carvalho de Melo <acme@conectiva.com.br>
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, write to the Free Software
+ *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <linux/dccp.h>
+#include <linux/ktime.h>
+#include <linux/types.h>
+
+#define TFRC_WIN_COUNT_LIMIT 16
+
+/**
+ * tfrc_rx_hist_entry - Store information about a single received packet
+ * @tfrchrx_seqno:	DCCP packet sequence number
+ * @tfrchrx_ccval:	window counter value of packet (RFC 4342, 8.1)
+ * @tfrchrx_ndp:	the NDP count (if any) of the packet
+ * @tfrchrx_tstamp:	actual receive time of packet
+ */
+struct tfrc_rx_hist_entry {
+	u64	tfrchrx_seqno:48,
+		tfrchrx_ccval:4,
+		tfrchrx_type:4;
+	u32	tfrchrx_ndp; /* In fact it is from 8 to 24 bits */
+	ktime_t	tfrchrx_tstamp;
+};
+
+static inline bool tfrc_rx_hist_entry_data_packet(const struct tfrc_rx_hist_entry *h)
+{
+	return h->tfrchrx_type == DCCP_PKT_DATA	   ||
+	       h->tfrchrx_type == DCCP_PKT_DATAACK ||
+	       h->tfrchrx_type == DCCP_PKT_REQUEST ||
+	       h->tfrchrx_type == DCCP_PKT_RESPONSE;
+}
+
+#endif /* _DCCP_PKT_HIST_INT_ */
-- 
1.5.3.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 7/7] [TFRC] New rx history code
  2007-12-04 11:59                 ` Arnaldo Carvalho de Melo
  2007-12-04 13:48                   ` [PATCH 7/7][TAKE 2][TFRC] " Arnaldo Carvalho de Melo
@ 2007-12-05  9:35                   ` Gerrit Renker
  2007-12-05 12:08                     ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 27+ messages in thread
From: Gerrit Renker @ 2007-12-05  9:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, netdev, dccp

Quoting Arnaldo:
| Em Tue, Dec 04, 2007 at 06:55:17AM +0000, Gerrit Renker escreveu:
| > NAK. You have changed the control flow of the algorithm and the underlying
| > data structure. Originally it had been an array of pointers, and this had
| > been a design decision right from the first submissions in March. From I
| > of http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/loss_detection/loss_detection_algorithm_notes.txt
| >  
| >  "1. 'ring' is an array of pointers rather than a contiguous area in
| >      memory. The reason is that, when having to swap adjacent entries
| >      to sort the history, it is easier to swap pointers than to copy
| >      (heavy-weight) history-entry data structs."
| > 
| > But in your algorithm it becomes a normal linear array.
| > 
| > As a consequence all subsequent code needs to be rewritten to use
| > copying instead of swapping pointers. And there is a lot of that, since
| > the loss detection code makes heavy use of swapping entries in order to
| > cope with reordered and/or delayed packets.
| 
| So lets not do that, the decision was made based on the RX history patch
| alone, where, as pointed out, no swapping occurs.
|  
| > This is not what I would call "entirely equivalent" as you claim. Your
| > algorithm is fundamentally different, the control flow is not the same,
| > nor are the underlying data structures.
| 
| Its is equivalent up to what is in the tree, thank you for point out
| that in subsequent patches it will be used.
| 
| Had this design decision been made explicit in the changeset comment
| that introduces the data structure I wouldn't have tried to reduce the
| number of allocations.
| 
| And you even said that it was a good decision when first reacting to
| this change, which I saw as positive feedback for the RFC I sent.
| 
That was my fault. Your solution "looked" much better to me but even I had forgotten
that the algorithm is based on an array of pointers. When I finally sat down 
and looked through the patch set I found the original notes from March and
saw that using a linear array instead of an array of pointers will require quite
a few changes and means changing the algorithm. I then thought through whether there
could be any advantage in using a linear array as in this patch, but could find none.
In the end this went back to the same questions and issues that were tackled between
February and March this year. I wish we could have had this dicussion then; it would
have made this email unnecessary.

I thank you for your replay and I am sorry if you took offense at my perhaps a little strong
reaction, here is the essence what I tried to convey but what maybe did not succeed in conveying:

 * I am absolutely ok with you making changes to variable names, namespaces, file organisation,
   overall arrangement etc (as per previous email). You have experience and this will often be a 
   benefit. For instance, you combined tfrc_entry_from_skb() with tfrc_rx_hist_update() to give
   a new function, tfrc_rx_hist_add_packet(). This is ok since entry_from_skb() is not otherwise
   used (and I only found this out when reading your patch). (On the other hand, since this is a
   4-element ring buffer, it is no real adding, the same entry will frequently be overwritten,
   this is why it was initially called hist_update().)

 * I am also not so much concerned about credit. As long as the (changed) end result is as
   least as good as or hopefully better than the submitted input, the author name is a side
   issue; so I do think that your approach is sound and fair. The only place where credits
   are needed is for the SatSix project (www.ist-satsix.org) which funded this work. This is
   why some of the copyrights now included "University of Aberdeen".  What in any case I'd like
   to add is at least the Signed-off-by: if it turns out to be a mess I want to contribute my
   part of responsibility.

 * But where I need to interact is when changes are made to the algorithm, even if these may 
   seem small. The underlying reasons that lead to the code may not be immediately clear,
   but since this is a solution for a specific problem, there are also specific reasons; which
   is why for algorithm changes (and underlying data structure) I'd ask you to discuss this first
   before committing the patch. 

 * In part you are already doing this (message subject); it may be necessary to adapt the way
   of discussion/presentation. The subject is complex (spent a week with the flow chart only)
   and it is a lot - initially this had been 40 small patches.	
   
| > | I modified it just to try to make it closer to the existing API, hide details from
| > | the CCIDs and fix a couple bugs found in the initial implementation.
| > What is "a couple bugs"? So far you have pointed out only one problem and that was
| > agreed, it can be fixed. But it remains a side issue, it is not a failure of the
| 
| I pointed two problems one ended up being just the fact that tfrc_ewma
| checks if the average is zero for every calculation.
| 
With regard to the tfrc_ewma we need to consider the discussion as I think we are having a
misunderstanding. It is necessary to 
 (a) deal with the case that a peer may be sending zero-sized packets:
     - one solution is to simply omit the check "if (len > 0)" and declare 0-sized data
     - I think you have a point in that the len check is indeed redundant when we are
       already checking that this is a data packet
       packets as pathological (which is probably what they are);
     - a BUG_ON would be not such a good solution since 0-sized packets are legal;
     - my suggestion is to remove the "len > 0" test - then tfrc_ewma() can also be
       used directly, without the surrounding inline function wrapper.

 (b) reconsider the control flow: in your patch set you introduced a different control flow
     in the rx_packet_recv() function. I tried to figure out why this was introduced, there
     are two problems, one not immediately obvious at this moment:
     - it does not consider where and how loss detection is done (which is part of a later
       patch). This is a key point why the flow may seem strange: loss detection must be
       done on both non-data and data packets; there are several cases depending on whether
       loss previously occurred; the accounting must also be done in parallel for data packets
       (for instance s and bytes_recvd) and non-data packets (highest-received seqno is always
     - it is not equivalent with the initial one, when comparing this to the flowchart:
       http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/reception-main-flowchart.png

| > I have provided documentation, written test modules, and am able to prove
| > that the algorithm as submitted works reasonably correct. In addition, the
| > behaviour has been verified using regression tests.
| > 
| > I am not prepared to take your claims and expressions of "deepest
| > respect" at face value since your actions - not your words - show that
| > you are, in fact, not dealing respectfully with work which has taken
| > months to complete and verify.
| > 
| > During 9 months on dccp@vger you did not provide so much as a paragraph
| > of feedback. Instead you mopped up code from the test tree, modified it
| > according to your own whims and now, in the circle of your
| > invitation-only buddies, start to talk about having discussions and 
| > iterations. The only iteration I can see is in fixing up the things you
| > introduced, so it is not you who has to pay the price.
| > 
| > Sorry, unless you can offer a more mature model of collaboration,
| > consider me out of this and the patches summarily withdrawn. I am not
| > prepared to throw away several months of work just because you feel
| > inspired to do as you please with the work of others. 
| 
| Again you want to have your patches accepted as-is. Pointed to one case
| where I gave you credit while improving on your work (TX history) and
| another where the changes were up for review. I don't consider this a
| warm welcome for me to finally dedicate time to this effort. Would you
| prefer to continue working without help? I think we should encourage
| more people to work on DCCP, you seem to think that changes to your work
| are not acceptable.
| 
Yes and no. The internal algorithm I would rather have accepted as-is, unless
there are good reasons to change it. But this by no means means I want this 
waved through like a diplomat's car at the border: there are reasons why
it is as it is, and I am happy to discuss these reasons; and if someone
can point out a better solution, will accept that. It is just that this
is a solution to a specific problem which so far has proven to work, and
I therefore think that we can spent the time more efficiently by not
fixing things that have already been fixed once: there are many more
problems in DCCP (just recently you mentioned robustness of memory usage,
then there is the ongoing discussion of how to integrate CCID4 as well);
I think we could divide up work time with more benefit.


| Perhaps others can comment on what is happening and help us find, as you
| say, to find a more mature model of collaboration.
| 
No need to be cynical, you have replied gracefully and with substance.
For that I thank you. The topic is complicated, adapting how it is
treated may improve the discussion - for instance, if it helps, we could
switch back to smaller patch format as original, which you have already
been introducing.


Best regards
Gerrit

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 7/7][TAKE 2][TFRC] New rx history code
  2007-12-04 13:48                   ` [PATCH 7/7][TAKE 2][TFRC] " Arnaldo Carvalho de Melo
@ 2007-12-05  9:42                     ` Gerrit Renker
  0 siblings, 0 replies; 27+ messages in thread
From: Gerrit Renker @ 2007-12-05  9:42 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, netdev, dccp

| I found a problem that I'm still investigating if was introduced by this
| patch or if was already present. When sending 1 million 256 byte packets
| with ttcp over loopback, using ccid3 it is crashing, the test machine
| I'm using doesn't have a serial port, its a notebook, will switch to
| another that has and provide the backtrace. It doesn't happens every
| time.
| 
CCID3 is difficult due to the TX queue. Small packets are faster on the
wire, fill up the TX queue faster. Since there is little loss on LANs, 
the slow-start algorithm will soon reach link capacity; but CCID3 can
not deal effectively with high speeds.
What is known not to work well at the moment is bidirectional data
transfer (e.g. an echo server/client). This lead to the comment in
tfrc_rx_sample_rtt(); the support for bidirectional data transfer
needs some more work, which however requires to make one-directional
transfer work well first.

| Here is tfrc_rx_hist_alloc back using ring of pointers with the fixed
| error path.
| 
Thank you - I was just about to send a similar patch as update since
you clearly identified this bug. I will resubmit with your version
and upload it to the test tree.


| +int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h)
|  {
| +	int i;
| +
| +	for (i = 0; i <= TFRC_NDUPACK; i++) {
| +		h->ring[i] = kmem_cache_alloc(tfrc_rx_hist_slab, GFP_ATOMIC);
| +		if (h->ring[i] == NULL)
| +			goto out_free;
| +	}
| +
| +	h->loss_count = h->loss_start = 0;
| +	return 0;
| +
| +out_free:
| +	while (i-- != 0) {
| +		kmem_cache_free(tfrc_rx_hist_slab, h->ring[i]);
| +		h->ring[i] = NULL;
|  	}
| +	return -ENOBUFS;
|  }
| 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches
  2007-12-03 14:54       ` Arnaldo Carvalho de Melo
  2007-12-03 15:44         ` Gerrit Renker
@ 2007-12-05 10:27         ` Gerrit Renker
  2007-12-05 11:52           ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 27+ messages in thread
From: Gerrit Renker @ 2007-12-05 10:27 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, netdev, dccp, Ingo Molnar

Today being Wednesday, below is some feedback after working through the patch set.
Hope this is helpful. 

Patch #1:
---------
  Several new good points are introduced:
  - IP_DCCP_TFRC_DEBUG is made dependent on IP_DCCP_TFRC_DEBUG which is useful
  - the select from CONFIG_IP_DCCP_CCID3 => CONFIG_DCCP_TFRC_LIB
  - the cleanup action in tfrc_module_init() (when packet_history_init() fails)
    was previously missing, this is a good catch.
  Also a note: tfrc_pr_debug() is not currently used (but may be later should the 
               code common to both CCID3 and CCID4 be shared).
  		  
Patches #2/#6:
--------------
  Separated from main patch, no problems (were initially submitted in this format).
  I wonder whether switching back to smaller patch sizes is better?

Patch #3: 
---------
  Renames s/tfrc_tx_hist/tfrc_tx_hist_slab/g; no problems.

Patch #4:
---------
  packet_history_init() initialises both RX and TX history and is later called by the module_init()
  function in net/dccp/ccids/lib/tfrc.c. Just a suggestion, it is possible to simplify this further,
  by doing all the initialisation / cleanup in tfrc.c:
    int __init {rx,tx}_packet_history_init()
    {
	    tfrc_{rx,tx}_hist_slab = kmem_cache_create("tfrc_{rx,tx}_hist", ...);
	    return tfrc_{rx,tx}_hist_slab == NULL ? - ENOBUFS : 0;
    }
  and then call these successively in tfrc_module_init().

Patch #5:
---------
  The naming scheme introduced here is 
	  s/dccp_rx/tfrc_rx/g; 
	  s/dccphrx/tfrchrx/g;
  I wonder, while at it, would it be possible to extend this and extend this to other parts
  of the code? Basically this is the same discussion as earlier on dccp@vger with Leandro,
  who first came up with this naming scheme. There the same question came up and the result
  of the discussion was that a prefix of  `tfrchrx' is cryptic; if something simpler is 
  possible then it would be great.

Patch #7:
---------
 * ccid3_hc_rx_detect_loss() can be fully removed since no other code references it any
   further.
 * bytes_recv also counts the payload_size's of non-data packets, which is wrong (it should only
   sum up the sum of bytes transferred as data packets)	 

 * loss handling is not correctly taken care of: unlike in the other part, both data and non-data
   packets are used to detect loss (this is not correctly handled in the current Linux implementation).


 * tfrc_rx_hist_entry_data_packet() is not needed:
   - a similar function, called dccp_data_packet(), was introduced in patch 2/7
   - code compiles cleanly without tfrc_rx_hist_entry_data_packet()
   - all references to this function are deleted by patch 7/7
 * is another header file (net/dccp/ccids/lib/packet_history_internal.h) really necessary?
   - net/dccp/ccids/lib has already 3 x header file, 4 x source file
   - with the removal of tfrc_rx_hist_entry_data_packet(), only struct tfrc_rx_hist_entry
     remains as the sole occupant of that file
   - how about hiding the internals of struct tfrc_rx_hist_entry by putting it into packet_history.c,
     as it was done previously in a similar way for the TX packet history?

 * in ccid3_hc_rx_insert_options(), due to hunk-shifting or something else, there is still the
   call to dccp_insert_option_timestamp(sk, skb)
   --> this was meant to be removed by an earlier patch (which also removed the Elapsed Time option);
   --> in the original submission of this patch the call to dccp_insert_option_timestamp() did no
       longer appear (as can be found in the dccp@vger mailing list archives), and the test tree
       likewise does not use it;
   --> it can be removed with full confidence since no part of the code uses timestamps sent by the
       HC-receiver (only the HC-sender timestamps are used); and it is further not required by the
       spec to send HC-receiver timestamps (RFC 4342, section 6)

 * one of the two variables ccid3hcrx_tstamp_last_feedback and ccid3hcrx_tstamp_last_ack is
   redundant and can be removed (this may be part of a later patch); the variable ccid3hcrx_tstamp_last_feedback
   is very long (function is clear due to type of ktime_t).


 * the inlines are a good idea regarding type safety, but I take it that we can now throw overboard the old rule
   of 80 characters per line? Due to the longer names of the inlines, some expressions end at column 98 (cf. 
   tfrc_rx_hist_sample_rtt(); but to be honest I'd rather get rid of that function since the receiver-RTT
   sampling is notoriously inaccurate (wrong place to measure) and then there is little left to argue with the inlines).

 * with regard to RX history initialisation, would you be amicable to the idea of performing the RX/TX history, and
   loss intervals history in tfrc.c, as suggested for patch 1/7 (shortens the routines)?


 * tfrc_rx_hist_entry is lacking documentation (my fault, had been forgotten in the initial submission):
	/** 
	 * tfrc_rx_hist_entry - Store information about a single received packet
	 * @ptype:   the type (5.1) of the packet
	 ...
	 */
  
 * is it really necessary to give the field members of known structures long names such as
	   tfrc_rx_hist_loss_prev(h)->tfrchrx_seqno?
   This is the same comment as per patch 5/7 and there has been an earlier discussion on dccp@vger where
   other developers (not just me) agreed that such long names are a burden to write; but we could leave that also for later.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches
  2007-12-05 10:27         ` Gerrit Renker
@ 2007-12-05 11:52           ` Arnaldo Carvalho de Melo
  2007-12-05 13:45             ` Gerrit Renker
  0 siblings, 1 reply; 27+ messages in thread
From: Arnaldo Carvalho de Melo @ 2007-12-05 11:52 UTC (permalink / raw)
  To: Gerrit Renker, Arnaldo Carvalho de Melo, netdev, dccp

Note: removed Ingo from the CC list, I had added it first just because
he advocated reducing the number of mailing lists, so I wanted him to
know that we're trying to do that.

Em Wed, Dec 05, 2007 at 10:27:36AM +0000, Gerrit Renker escreveu:
> Today being Wednesday, below is some feedback after working through the patch set.
> Hope this is helpful. 
> 
> Patch #1:
> ---------
>   Several new good points are introduced:
>   - IP_DCCP_TFRC_DEBUG is made dependent on IP_DCCP_TFRC_DEBUG which is useful
>   - the select from CONFIG_IP_DCCP_CCID3 => CONFIG_DCCP_TFRC_LIB
>   - the cleanup action in tfrc_module_init() (when packet_history_init() fails)
>     was previously missing, this is a good catch.
>   Also a note: tfrc_pr_debug() is not currently used (but may be later should the 
>                code common to both CCID3 and CCID4 be shared).

OK
   		  
> Patches #2/#6:
> --------------
>   Separated from main patch, no problems (were initially submitted in this format).
>   I wonder whether switching back to smaller patch sizes is better?

Patches that do one thing that is logically separated from others are
preferred, as its analisys is made faster.

Patches that rewrite an existing functionality are done best when the
functions being replaced keep as much as possible the same name. For
instance, in the new RX handling code we need to alloc the RX hist
internal structures, in the previous implementation we also needed to do
that, it also needs to purge entries, as the old one needed. Previous
one was a linked list, the new one is a ring. As we go we can and should
add new APIs when needed, but trying to keep an API so that if in the
future we decide to rework the internal structures, this can be done in
a plugin fashion, without changing too much its users (CCIDs in this
case), so that we can get back to the old one with minor disruption to
the users if needed. Sometimes we manage to do that, some times we need
to improve upon the current API, but the goal remains the same.
 
> Patch #3: 
> ---------
>   Renames s/tfrc_tx_hist/tfrc_tx_hist_slab/g; no problems.

This was for consistency with the other slabs in DCCP:

[acme@doppio net-2.6.25]$ find net/dccp/ -name "*.c" | xargs grep 'struct kmem_cache'
net/dccp/ccids/lib/packet_history.c:static struct kmem_cache *tfrc_tx_hist_slab;
net/dccp/ccids/lib/packet_history.c:static struct kmem_cache *tfrc_rx_hist_slab;
net/dccp/ccids/lib/loss_interval.c:static struct kmem_cache *dccp_li_cachep __read_mostly;
net/dccp/ackvec.c:static struct kmem_cache *dccp_ackvec_slab;
net/dccp/ackvec.c:static struct kmem_cache *dccp_ackvec_record_slab;
net/dccp/ccid.c:        struct kmem_cache *slab;
net/dccp/ccid.c:static void ccid_kmem_cache_destroy(struct kmem_cache *slab)
[acme@doppio net-2.6.25]$

> Patch #4:
> ---------
>   packet_history_init() initialises both RX and TX history and is later called by the module_init()
>   function in net/dccp/ccids/lib/tfrc.c. Just a suggestion, it is possible to simplify this further,
>   by doing all the initialisation / cleanup in tfrc.c:
>     int __init {rx,tx}_packet_history_init()
>     {
> 	    tfrc_{rx,tx}_hist_slab = kmem_cache_create("tfrc_{rx,tx}_hist", ...);
> 	    return tfrc_{rx,tx}_hist_slab == NULL ? - ENOBUFS : 0;
>     }
>   and then call these successively in tfrc_module_init().

Idea here was to have each C source file to have a init module. Perhaps
we should try to break packet_history.c into tx_packet_history and
rx_packet_history.c. We can do that later to try to meet the goal of
being able to see what is being replaced.
 
> Patch #5:
> ---------
>   The naming scheme introduced here is 
> 	  s/dccp_rx/tfrc_rx/g; 
> 	  s/dccphrx/tfrchrx/g;
>   I wonder, while at it, would it be possible to extend this and extend this to other parts
>   of the code? Basically this is the same discussion as earlier on dccp@vger with Leandro,
>   who first came up with this naming scheme. There the same question came up and the result
>   of the discussion was that a prefix of  `tfrchrx' is cryptic; if something simpler is 
>   possible then it would be great.

As we did with the ackvecs, will do that later.
 
> Patch #7:
> ---------
>  * ccid3_hc_rx_detect_loss() can be fully removed since no other code references it any
>    further.

I left it to try have the diff not mixing up removal of its
implementation with the implementation of the function just after it,
ccid3_hx_rx_packet_recv. Will remove it later.

>  * bytes_recv also counts the payload_size's of non-data packets, which is wrong (it should only
>    sum up the sum of bytes transferred as data packets)	 

As said to you in private message I'll get back to the way you wrote, as
you mentioned that upcoming patches will make good use of that.

I'll try to restrain me to not do too many changes.
 
>  * loss handling is not correctly taken care of: unlike in the other part, both data and non-data
>    packets are used to detect loss (this is not correctly handled in the current Linux implementation).
 
See previous comment
 
>  * tfrc_rx_hist_entry_data_packet() is not needed:
>    - a similar function, called dccp_data_packet(), was introduced in patch 2/7

I thought about that, but dccp_data_packet is for skbs,
tfrc_rx_hist_entry_data_packet is for tfrc_rx_hist_entries, I guess we
should just make dccp_data_packet receive the packet type and not an
object that has a packet type field.

>    - code compiles cleanly without tfrc_rx_hist_entry_data_packet()
>    - all references to this function are deleted by patch 7/7
>  * is another header file (net/dccp/ccids/lib/packet_history_internal.h) really necessary?
>    - net/dccp/ccids/lib has already 3 x header file, 4 x source file

Idea here is to not expose data structures and functions that are used
only by the rx handling and loss history code. 

>    - with the removal of tfrc_rx_hist_entry_data_packet(), only struct tfrc_rx_hist_entry
>      remains as the sole occupant of that file
>    - how about hiding the internals of struct tfrc_rx_hist_entry by putting it into packet_history.c,
>      as it was done previously in a similar way for the TX packet history?

See previous comment.
 
>  * in ccid3_hc_rx_insert_options(), due to hunk-shifting or something else, there is still the
>    call to dccp_insert_option_timestamp(sk, skb)
>    --> this was meant to be removed by an earlier patch (which also removed the Elapsed Time option);
>    --> in the original submission of this patch the call to dccp_insert_option_timestamp() did no
>        longer appear (as can be found in the dccp@vger mailing list archives), and the test tree
>        likewise does not use it;
>    --> it can be removed with full confidence since no part of the code uses timestamps sent by the
>        HC-receiver (only the HC-sender timestamps are used); and it is further not required by the
>        spec to send HC-receiver timestamps (RFC 4342, section 6)

I removed that on purpose, it didn't look related to the main goal of
the patch nor was mentioned in the changeset, I'll add it as a separate
patch with the explanation you provided above (and in the past).
 
>  * one of the two variables ccid3hcrx_tstamp_last_feedback and ccid3hcrx_tstamp_last_ack is
>    redundant and can be removed (this may be part of a later patch); the variable ccid3hcrx_tstamp_last_feedback
>    is very long (function is clear due to type of ktime_t).

will change that later. I.e. these things can be changed later, avoiding
such changes at this point reduces the size of the patch, which is
always good.
 
>  * the inlines are a good idea regarding type safety, but I take it that we can now throw overboard the old rule
>    of 80 characters per line? Due to the longer names of the inlines, some expressions end at column 98 (cf. 
>    tfrc_rx_hist_sample_rtt(); but to be honest I'd rather get rid of that function since the receiver-RTT
>    sampling is notoriously inaccurate (wrong place to measure) and then there is little left to argue with the inlines).

We must try as much as possible to not have more than 80 characters per
line, but at the same time the kernel is ever growing, namespacing is
required to be able to use tools such as ctags, and even to help in
decoding oopses I'd say, as we can get more information on a backtrace.

>  * with regard to RX history initialisation, would you be amicable to the idea of performing the RX/TX history, and
>    loss intervals history in tfrc.c, as suggested for patch 1/7 (shortens the routines)?

See my previous comment.
 
>  * tfrc_rx_hist_entry is lacking documentation (my fault, had been forgotten in the initial submission):
> 	/** 
> 	 * tfrc_rx_hist_entry - Store information about a single received packet
> 	 * @ptype:   the type (5.1) of the packet
> 	 ...
> 	 */
>   
>  * is it really necessary to give the field members of known structures long names such as
> 	   tfrc_rx_hist_loss_prev(h)->tfrchrx_seqno?
>    This is the same comment as per patch 5/7 and there has been an earlier discussion on dccp@vger where
>    other developers (not just me) agreed that such long names are a burden to write; but we could leave that also for later.

See previous comment, we can do that as we did for the ackvec, in a
separate patch, but lets leave this for later now, trying to use
namespacing but with shorter names, that even looking cryptic can help
in searching for RX stamps, for instance, as we will iterate thru them
using the editor search routine and don't get distracted with the TX
tstamps, for instance, or search for them with grep.

- Arnaldo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 7/7] [TFRC] New rx history code
  2007-12-05  9:35                   ` [PATCH 7/7] [TFRC] " Gerrit Renker
@ 2007-12-05 12:08                     ` Arnaldo Carvalho de Melo
  2007-12-05 13:34                       ` Gerrit Renker
  0 siblings, 1 reply; 27+ messages in thread
From: Arnaldo Carvalho de Melo @ 2007-12-05 12:08 UTC (permalink / raw)
  To: Gerrit Renker, Arnaldo Carvalho de Melo, netdev, dccp

Em Wed, Dec 05, 2007 at 09:35:30AM +0000, Gerrit Renker escreveu:
> Quoting Arnaldo:
> | Em Tue, Dec 04, 2007 at 06:55:17AM +0000, Gerrit Renker escreveu:
> | > NAK. You have changed the control flow of the algorithm and the underlying
> | > data structure. Originally it had been an array of pointers, and this had
> | > been a design decision right from the first submissions in March. From I
> | > of http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/loss_detection/loss_detection_algorithm_notes.txt
> | >  
> | >  "1. 'ring' is an array of pointers rather than a contiguous area in
> | >      memory. The reason is that, when having to swap adjacent entries
> | >      to sort the history, it is easier to swap pointers than to copy
> | >      (heavy-weight) history-entry data structs."
> | > 
> | > But in your algorithm it becomes a normal linear array.
> | > 
> | > As a consequence all subsequent code needs to be rewritten to use
> | > copying instead of swapping pointers. And there is a lot of that, since
> | > the loss detection code makes heavy use of swapping entries in order to
> | > cope with reordered and/or delayed packets.
> | 
> | So lets not do that, the decision was made based on the RX history patch
> | alone, where, as pointed out, no swapping occurs.
> |  
> | > This is not what I would call "entirely equivalent" as you claim. Your
> | > algorithm is fundamentally different, the control flow is not the same,
> | > nor are the underlying data structures.
> | 
> | Its is equivalent up to what is in the tree, thank you for point out
> | that in subsequent patches it will be used.
> | 
> | Had this design decision been made explicit in the changeset comment
> | that introduces the data structure I wouldn't have tried to reduce the
> | number of allocations.
> | 
> | And you even said that it was a good decision when first reacting to
> | this change, which I saw as positive feedback for the RFC I sent.
> | 
> That was my fault. Your solution "looked" much better to me but even I had forgotten
> that the algorithm is based on an array of pointers. When I finally sat down 

And you wrote that code, I acted on looking the code together with
reading the changeset comment, to me it also looked much better to do
that, but as there were many changes, I sent an [RFC] message. It served
its purpose, as you after two iterations realised it was in fact not a
good idea as later an array of pointers is required. I should have taken
that swap routine as the definitive hint that indeed it was needed.

> and looked through the patch set I found the original notes from March and
> saw that using a linear array instead of an array of pointers will require quite
> a few changes and means changing the algorithm. I then thought through whether there
> could be any advantage in using a linear array as in this patch, but could find none.
> In the end this went back to the same questions and issues that were tackled between
> February and March this year. I wish we could have had this dicussion then; it would
> have made this email unnecessary.

That is why it is important that the changeset comment collects these
scattered notes and discussions. Think about in some years from now we
will be left with a situation where we would have to look at many places
to understando some aspects of what is in the code. While this happens
and we have google for that I think that since you keep such detailed
notes it is not a problem to get them in the changesets.
 
> I thank you for your replay and I am sorry if you took offense at my perhaps a little strong
> reaction, here is the essence what I tried to convey but what maybe did not succeed in conveying:
> 
>  * I am absolutely ok with you making changes to variable names, namespaces, file organisation,
>    overall arrangement etc (as per previous email). You have experience and this will often be a 
>    benefit. For instance, you combined tfrc_entry_from_skb() with tfrc_rx_hist_update() to give
>    a new function, tfrc_rx_hist_add_packet(). This is ok since entry_from_skb() is not otherwise
>    used (and I only found this out when reading your patch). (On the other hand, since this is a
>    4-element ring buffer, it is no real adding, the same entry will frequently be overwritten,
>    this is why it was initially called hist_update().)

>From the RX history API user perspective (CCID3 right now, others may
come) it is adding a packet to the history. In the past it went to a
list, now we have a ring and don't keep more than TFRC_NDUPACK + 1
entries, but this is an internal detail that users don't need to know.
 
>  * I am also not so much concerned about credit. As long as the (changed) end result is as
>    least as good as or hopefully better than the submitted input, the author name is a side
>    issue; so I do think that your approach is sound and fair. The only place where credits
>    are needed is for the SatSix project (www.ist-satsix.org) which funded this work. This is
>    why some of the copyrights now included "University of Aberdeen".  What in any case I'd like
>    to add is at least the Signed-off-by: if it turns out to be a mess I want to contribute my
>    part of responsibility.

And that, along with credit to you is being provided on the few patches
I change.
 
>  * But where I need to interact is when changes are made to the algorithm, even if these may 
>    seem small. The underlying reasons that lead to the code may not be immediately clear,
>    but since this is a solution for a specific problem, there are also specific reasons; which
>    is why for algorithm changes (and underlying data structure) I'd ask you to discuss this first
>    before committing the patch. 

I did this for the RX handling, where changes were made. Did that for
the TX but ended up commiting before comments from you, but I think its
fair to say that the changes for the TX history were more organizational
than in the essence of the algorithm.
 
>  * In part you are already doing this (message subject); it may be necessary to adapt the way
>    of discussion/presentation. The subject is complex (spent a week with the flow chart only)
>    and it is a lot - initially this had been 40 small patches.	
>    
> | > | I modified it just to try to make it closer to the existing API, hide details from
> | > | the CCIDs and fix a couple bugs found in the initial implementation.
> | > What is "a couple bugs"? So far you have pointed out only one problem and that was
> | > agreed, it can be fixed. But it remains a side issue, it is not a failure of the
> | 
> | I pointed two problems one ended up being just the fact that tfrc_ewma
> | checks if the average is zero for every calculation.
> | 
> With regard to the tfrc_ewma we need to consider the discussion as I think we are having a
> misunderstanding. It is necessary to 
>  (a) deal with the case that a peer may be sending zero-sized packets:
>      - one solution is to simply omit the check "if (len > 0)" and declare 0-sized data
>      - I think you have a point in that the len check is indeed redundant when we are
>        already checking that this is a data packet
>        packets as pathological (which is probably what they are);
>      - a BUG_ON would be not such a good solution since 0-sized packets are legal;
>      - my suggestion is to remove the "len > 0" test - then tfrc_ewma() can also be
>        used directly, without the surrounding inline function wrapper.
> 
>  (b) reconsider the control flow: in your patch set you introduced a different control flow
>      in the rx_packet_recv() function. I tried to figure out why this was introduced, there
>      are two problems, one not immediately obvious at this moment:
>      - it does not consider where and how loss detection is done (which is part of a later
>        patch). This is a key point why the flow may seem strange: loss detection must be
>        done on both non-data and data packets; there are several cases depending on whether
>        loss previously occurred; the accounting must also be done in parallel for data packets
>        (for instance s and bytes_recvd) and non-data packets (highest-received seqno is always
>      - it is not equivalent with the initial one, when comparing this to the flowchart:
>        http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/reception-main-flowchart.png

I will revert to your original flow.
 
> | > I have provided documentation, written test modules, and am able to prove
> | > that the algorithm as submitted works reasonably correct. In addition, the
> | > behaviour has been verified using regression tests.
> | > 
> | > I am not prepared to take your claims and expressions of "deepest
> | > respect" at face value since your actions - not your words - show that
> | > you are, in fact, not dealing respectfully with work which has taken
> | > months to complete and verify.
> | > 
> | > During 9 months on dccp@vger you did not provide so much as a paragraph
> | > of feedback. Instead you mopped up code from the test tree, modified it
> | > according to your own whims and now, in the circle of your
> | > invitation-only buddies, start to talk about having discussions and 
> | > iterations. The only iteration I can see is in fixing up the things you
> | > introduced, so it is not you who has to pay the price.
> | > 
> | > Sorry, unless you can offer a more mature model of collaboration,
> | > consider me out of this and the patches summarily withdrawn. I am not
> | > prepared to throw away several months of work just because you feel
> | > inspired to do as you please with the work of others. 
> | 
> | Again you want to have your patches accepted as-is. Pointed to one case
> | where I gave you credit while improving on your work (TX history) and
> | another where the changes were up for review. I don't consider this a
> | warm welcome for me to finally dedicate time to this effort. Would you
> | prefer to continue working without help? I think we should encourage
> | more people to work on DCCP, you seem to think that changes to your work
> | are not acceptable.
> | 
> Yes and no. The internal algorithm I would rather have accepted as-is, unless
> there are good reasons to change it. But this by no means means I want this 
> waved through like a diplomat's car at the border: there are reasons why
> it is as it is, and I am happy to discuss these reasons; and if someone
> can point out a better solution, will accept that. It is just that this
> is a solution to a specific problem which so far has proven to work, and
> I therefore think that we can spent the time more efficiently by not
> fixing things that have already been fixed once: there are many more
> problems in DCCP (just recently you mentioned robustness of memory usage,
> then there is the ongoing discussion of how to integrate CCID4 as well);
> I think we could divide up work time with more benefit.

OK, I think I exaggerated on the changes and that you exaggerated on the
reaction, lets try to tone down both behaviours and we'll move on.
 
> | Perhaps others can comment on what is happening and help us find, as you
> | say, to find a more mature model of collaboration.
> | 
> No need to be cynical, you have replied gracefully and with substance.
> For that I thank you. The topic is complicated, adapting how it is
> treated may improve the discussion - for instance, if it helps, we could
> switch back to smaller patch format as original, which you have already
> been introducing.

No intention, I was fearing your reaction would escalate and asking for
other people that are more experienced in maintainership, telling about
my and your mistakes in interacting could help. But I think we're going
back on track by ourselves, thank you.

- Arnaldo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 7/7] [TFRC] New rx history code
  2007-12-05 12:08                     ` Arnaldo Carvalho de Melo
@ 2007-12-05 13:34                       ` Gerrit Renker
  0 siblings, 0 replies; 27+ messages in thread
From: Gerrit Renker @ 2007-12-05 13:34 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, netdev, dccp

| > In the end this went back to the same questions and issues that were tackled between
| > February and March this year. I wish we could have had this dicussion then; it would
| > have made this email unnecessary.
| 
| That is why it is important that the changeset comment collects these
| scattered notes and discussions. Think about in some years from now we
| will be left with a situation where we would have to look at many places
| to understando some aspects of what is in the code. While this happens
| and we have google for that I think that since you keep such detailed
| notes it is not a problem to get them in the changesets.
|  
I agree, the changelogs are all a bit short. When condensing from 40 to 16 patches the changelogs
were also cut down, for fear of being too verbose. Will go through the patches in the next days and
see if/where this can be improved, that would probably have saved some trouble.

| >  * But where I need to interact is when changes are made to the algorithm, even if these may 
| >    seem small. The underlying reasons that lead to the code may not be immediately clear,
| >    but since this is a solution for a specific problem, there are also specific reasons; which
| >    is why for algorithm changes (and underlying data structure) I'd ask you to discuss this first
| >    before committing the patch. 
| 
| I did this for the RX handling, where changes were made. Did that for
| the TX but ended up commiting before comments from you, but I think its
| fair to say that the changes for the TX history were more organizational
| than in the essence of the algorithm.
|  
It was a similar situation: the RFC patch came out on Thursday afternoon; I replied a bit hurriedly
on Friday that it seemed ok, but found the full confirmation only on Sunday after putting all
recent changes into the test tree. And yes, you made a good job of it.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches
  2007-12-05 11:52           ` Arnaldo Carvalho de Melo
@ 2007-12-05 13:45             ` Gerrit Renker
  0 siblings, 0 replies; 27+ messages in thread
From: Gerrit Renker @ 2007-12-05 13:45 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, netdev, dccp

| > Patch #3: 
| > ---------
| >   Renames s/tfrc_tx_hist/tfrc_tx_hist_slab/g; no problems.
| 
| This was for consistency with the other slabs in DCCP:
| 
| [acme@doppio net-2.6.25]$ find net/dccp/ -name "*.c" | xargs grep 'struct kmem_cache'
Thanks, I will put the same naming scheme also in the test tree (tomorrow).

 
| > Patch #4:
| > ---------
| > Just a suggestion, it is possible to simplify this further,
| >   by doing all the initialisation / cleanup in tfrc.c:
| >     int __init {rx,tx}_packet_history_init()
| >     {
| > 	    tfrc_{rx,tx}_hist_slab = kmem_cache_create("tfrc_{rx,tx}_hist", ...);
| > 	    return tfrc_{rx,tx}_hist_slab == NULL ? - ENOBUFS : 0;
| >     }
| >   and then call these successively in tfrc_module_init().
| 
| Idea here was to have each C source file to have a init module. Perhaps
| we should try to break packet_history.c into tx_packet_history and
| rx_packet_history.c. We can do that later to try to meet the goal of
| being able to see what is being replaced.
|
I think this is a great idea, since then rx_packet_history.c could also
take up all the internals of the RX packet history list, as it is
currently done for the TX history, and it could also possibly
incorporate. packet_history_internal.h.


|  
| > Patch #7:
| > ---------
|  
| >  * tfrc_rx_hist_entry_data_packet() is not needed:
| >    - a similar function, called dccp_data_packet(), was introduced in patch 2/7
| 
| I thought about that, but dccp_data_packet is for skbs,
| tfrc_rx_hist_entry_data_packet is for tfrc_rx_hist_entries, I guess we
| should just make dccp_data_packet receive the packet type and not an
| object that has a packet type field.
| 
The question which of the two interfaces is generally better to use is
best left to you. Two functions doing almost the same thing can probably
be replaced by just one.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency
  2007-12-02 21:36     ` [PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency Arnaldo Carvalho de Melo
  2007-12-02 21:36       ` [PATCH 4/7] [TFRC]: Make the rx history slab be global Arnaldo Carvalho de Melo
@ 2007-12-06 13:57       ` Gerrit Renker
  1 sibling, 0 replies; 27+ messages in thread
From: Gerrit Renker @ 2007-12-06 13:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: netdev, dccp

| Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/7] [TFRC]: Make the rx history slab be global
  2007-12-02 21:36       ` [PATCH 4/7] [TFRC]: Make the rx history slab be global Arnaldo Carvalho de Melo
  2007-12-02 21:36         ` [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_ Arnaldo Carvalho de Melo
@ 2007-12-06 13:59         ` Gerrit Renker
  1 sibling, 0 replies; 27+ messages in thread
From: Gerrit Renker @ 2007-12-06 13:59 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: netdev, dccp

| Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_
  2007-12-02 21:36         ` [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_ Arnaldo Carvalho de Melo
  2007-12-02 21:36           ` [PATCH 6/7] [CCID3]: The receiver of a half-connection does not set window counter values Arnaldo Carvalho de Melo
@ 2007-12-06 13:59           ` Gerrit Renker
  1 sibling, 0 replies; 27+ messages in thread
From: Gerrit Renker @ 2007-12-06 13:59 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: netdev, dccp

| Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2007-12-06 14:00 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-12-02 21:36 [RFC][PATCHES 0/7]: Reorganization of RX history patches Arnaldo Carvalho de Melo
2007-12-02 21:36 ` [PATCH 1/7] [TFRC]: Provide central source file and debug facility Arnaldo Carvalho de Melo
2007-12-02 21:36   ` [PATCH 2/7] [DCCP]: Introduce generic function to test for `data packets' Arnaldo Carvalho de Melo
2007-12-02 21:36     ` [PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency Arnaldo Carvalho de Melo
2007-12-02 21:36       ` [PATCH 4/7] [TFRC]: Make the rx history slab be global Arnaldo Carvalho de Melo
2007-12-02 21:36         ` [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_ Arnaldo Carvalho de Melo
2007-12-02 21:36           ` [PATCH 6/7] [CCID3]: The receiver of a half-connection does not set window counter values Arnaldo Carvalho de Melo
2007-12-02 21:36             ` [PATCH 7/7] [TFRC] New rx history code Arnaldo Carvalho de Melo
2007-12-04  6:55               ` Gerrit Renker
2007-12-04 11:59                 ` Arnaldo Carvalho de Melo
2007-12-04 13:48                   ` [PATCH 7/7][TAKE 2][TFRC] " Arnaldo Carvalho de Melo
2007-12-05  9:42                     ` Gerrit Renker
2007-12-05  9:35                   ` [PATCH 7/7] [TFRC] " Gerrit Renker
2007-12-05 12:08                     ` Arnaldo Carvalho de Melo
2007-12-05 13:34                       ` Gerrit Renker
2007-12-06 13:59           ` [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_ Gerrit Renker
2007-12-06 13:59         ` [PATCH 4/7] [TFRC]: Make the rx history slab be global Gerrit Renker
2007-12-06 13:57       ` [PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency Gerrit Renker
2007-12-03  8:23 ` [RFC][PATCHES 0/7]: Reorganization of RX history patches Ian McDonald
2007-12-03  8:35 ` Gerrit Renker
2007-12-03 12:44   ` Arnaldo Carvalho de Melo
2007-12-03 13:49     ` Gerrit Renker
2007-12-03 14:54       ` Arnaldo Carvalho de Melo
2007-12-03 15:44         ` Gerrit Renker
2007-12-05 10:27         ` Gerrit Renker
2007-12-05 11:52           ` Arnaldo Carvalho de Melo
2007-12-05 13:45             ` Gerrit Renker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).