* [RFC][PATCHES 0/7]: Reorganization of RX history patches
@ 2007-12-02 21:36 Arnaldo Carvalho de Melo
2007-12-02 21:36 ` [PATCH 1/7] [TFRC]: Provide central source file and debug facility Arnaldo Carvalho de Melo
` (2 more replies)
0 siblings, 3 replies; 28+ messages in thread
From: Arnaldo Carvalho de Melo @ 2007-12-02 21:36 UTC (permalink / raw)
To: Gerrit Renker; +Cc: netdev, dccp, Ingo Molnar
WARNING: After reading some messages from Ingo Molnar on lkml I think we should really
trim the number of lists we use for kernel development. And since I moved
back to using mutt for reading e-mails, something I should have never, ever
stopped doing, I guess we should move the DCCP discussions to netdev,
where we hopefully can get more people interested and reviewing the work we
do, so please consider moving DCCP discussion to netdev@vger.kernel.org,
where lots of smart networking folks are present and can help our efforts
on turning RFCs to code.
Back to business...:
Hi Gerrit,
Please take a look at this patch series where I reorganized your work on the new
TFRC rx history handling code. I'll wait for your considerations and then do as many
interactions as reasonable to get your work merged.
It should be completely equivalent, plus some fixes and optimizations, such as:
. The code that allocates the RX ring deals with failures when one of the entries in
the ring buffer is not successfully allocated, the original code was leaking the
successfully allocated entries.
. We do just one allocation for the ring buffer, as the number of entries is fixed we
should just do one allocation and not TFRC_NDUPACK times.
. I haven't checked if all the code was commited, as I tried to introduce just what was
immediatelly used, probably we'll need to do some changes when working on the merge
of your loss intervals code.
. I changed the ccid3_hc_rx_packet_recv code to set hcrx->ccid3hcrx_s for the first
non-data packet instead of calling ccid3_hc_rx_set_state, that would use 0 as the
initial value in the EWMA calculation.
. I also moved some patch parts (hunks) around trying to improve the readability of
the patches, trying to get things that logically replaced what was there before
closer together.
. Separation of parts of your patches and combination of others is also another thing
you'll see in this patch set. I understand that it is difficult to find the right
compromise and I hope you don't feel too bad with the decisions I made, eventually
we'll find a common ground.
. Another change was related to namespacing, I added tfrc_rx_hist_ to a number of
functions and in some cases just normalised the naming to be consistent.
. I'm not that happy with deferring changes to the loss intervals code that uses
rx handling data structures, but I'm OK with leaving some code commented out till
we get to merging the new loss intervals code.
For what is worth I leave her my deep appreciation of your work and also my
(repeated) apologies for not being able to do these kinds of review sessions months ago,
but I also I'm willing and able to cure these shortcomings by continuing the work I've
been doing recently on finally reviewing your hard work, keep it up!
It is available at:
master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.25
Best Regards,
- Arnaldo
b/net/dccp/ccids/Kconfig | 13
b/net/dccp/ccids/ccid3.c | 35 -
b/net/dccp/ccids/ccid3.h | 14
b/net/dccp/ccids/lib/Makefile | 2
b/net/dccp/ccids/lib/loss_interval.c | 14
b/net/dccp/ccids/lib/packet_history.c | 27 -
b/net/dccp/ccids/lib/packet_history.h | 3
b/net/dccp/ccids/lib/packet_history_internal.h | 68 +++
b/net/dccp/ccids/lib/tfrc.c | 48 ++
b/net/dccp/ccids/lib/tfrc.h | 18
b/net/dccp/dccp.h | 13
net/dccp/ccids/ccid3.c | 289 ++++----------
net/dccp/ccids/lib/loss_interval.c | 14
net/dccp/ccids/lib/packet_history.c | 483 +++++++++++++------------
net/dccp/ccids/lib/packet_history.h | 175 +++------
15 files changed, 602 insertions(+), 614 deletions(-)
^ permalink raw reply [flat|nested] 28+ messages in thread* [PATCH 1/7] [TFRC]: Provide central source file and debug facility 2007-12-02 21:36 [RFC][PATCHES 0/7]: Reorganization of RX history patches Arnaldo Carvalho de Melo @ 2007-12-02 21:36 ` Arnaldo Carvalho de Melo 2007-12-02 21:36 ` [PATCH 2/7] [DCCP]: Introduce generic function to test for `data packets' Arnaldo Carvalho de Melo 2007-12-03 8:23 ` [RFC][PATCHES 0/7]: Reorganization of RX history patches Ian McDonald 2007-12-03 8:35 ` Gerrit Renker 2 siblings, 1 reply; 28+ messages in thread From: Arnaldo Carvalho de Melo @ 2007-12-02 21:36 UTC (permalink / raw) To: Gerrit Renker Cc: netdev, dccp, Ingo Molnar, Gerrit Renker, Ian McDonald, Arnaldo Carvalho de Melo From: Gerrit Renker <gerrit@erg.abdn.ac.uk> This patch changes the tfrc_lib module in the following manner: (1) a dedicated tfrc source file to call the packet history & loss interval init/exit functions. (2) a dedicated tfrc_pr_debug macro with toggle switch `tfrc_debug'. Commiter note: renamed tfrc_module.c to tfrc.c, and made CONFIG_IP_DCCP_CCID3 select IP_DCCP_TFRC_LIB. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- net/dccp/ccids/Kconfig | 13 ++++++--- net/dccp/ccids/lib/Makefile | 2 +- net/dccp/ccids/lib/packet_history.c | 27 ++----------------- net/dccp/ccids/lib/packet_history.h | 3 +- net/dccp/ccids/lib/tfrc.c | 48 +++++++++++++++++++++++++++++++++++ net/dccp/ccids/lib/tfrc.h | 17 +++++++++--- 6 files changed, 75 insertions(+), 35 deletions(-) create mode 100644 net/dccp/ccids/lib/tfrc.c diff --git a/net/dccp/ccids/Kconfig b/net/dccp/ccids/Kconfig index 3d7d867..1227594 100644 --- a/net/dccp/ccids/Kconfig +++ b/net/dccp/ccids/Kconfig @@ -38,6 +38,7 @@ config IP_DCCP_CCID2_DEBUG config IP_DCCP_CCID3 tristate "CCID3 (TCP-Friendly) (EXPERIMENTAL)" def_tristate IP_DCCP + select IP_DCCP_TFRC_LIB ---help--- CCID 3 denotes TCP-Friendly Rate Control (TFRC), an equation-based rate-controlled congestion control mechanism. TFRC is designed to @@ -63,10 +64,6 @@ config IP_DCCP_CCID3 If in doubt, say M. -config IP_DCCP_TFRC_LIB - depends on IP_DCCP_CCID3 - def_tristate IP_DCCP_CCID3 - config IP_DCCP_CCID3_DEBUG bool "CCID3 debugging messages" depends on IP_DCCP_CCID3 @@ -110,5 +107,13 @@ config IP_DCCP_CCID3_RTO is serious network congestion: experimenting with larger values should therefore not be performed on WANs. +config IP_DCCP_TFRC_LIB + tristate + default n + +config IP_DCCP_TFRC_DEBUG + bool + depends on IP_DCCP_TFRC_LIB + default y if IP_DCCP_CCID3_DEBUG endmenu diff --git a/net/dccp/ccids/lib/Makefile b/net/dccp/ccids/lib/Makefile index 5f940a6..68c93e3 100644 --- a/net/dccp/ccids/lib/Makefile +++ b/net/dccp/ccids/lib/Makefile @@ -1,3 +1,3 @@ obj-$(CONFIG_IP_DCCP_TFRC_LIB) += dccp_tfrc_lib.o -dccp_tfrc_lib-y := loss_interval.o packet_history.o tfrc_equation.o +dccp_tfrc_lib-y := tfrc.o tfrc_equation.o packet_history.o loss_interval.o diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c index 4805de9..1d4d6ee 100644 --- a/net/dccp/ccids/lib/packet_history.c +++ b/net/dccp/ccids/lib/packet_history.c @@ -35,7 +35,6 @@ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */ -#include <linux/module.h> #include <linux/string.h> #include "packet_history.h" @@ -277,39 +276,19 @@ void dccp_rx_hist_purge(struct dccp_rx_hist *hist, struct list_head *list) EXPORT_SYMBOL_GPL(dccp_rx_hist_purge); -extern int __init dccp_li_init(void); -extern void dccp_li_exit(void); - -static __init int packet_history_init(void) +__init int packet_history_init(void) { - if (dccp_li_init() != 0) - goto out; - tfrc_tx_hist = kmem_cache_create("tfrc_tx_hist", sizeof(struct tfrc_tx_hist_entry), 0, SLAB_HWCACHE_ALIGN, NULL); - if (tfrc_tx_hist == NULL) - goto out_li_exit; - return 0; -out_li_exit: - dccp_li_exit(); -out: - return -ENOBUFS; + return tfrc_tx_hist == NULL ? -ENOBUFS : 0; } -module_init(packet_history_init); -static __exit void packet_history_exit(void) +void packet_history_exit(void) { if (tfrc_tx_hist != NULL) { kmem_cache_destroy(tfrc_tx_hist); tfrc_tx_hist = NULL; } - dccp_li_exit(); } -module_exit(packet_history_exit); - -MODULE_AUTHOR("Ian McDonald <ian.mcdonald@jandi.co.nz>, " - "Arnaldo Carvalho de Melo <acme@ghostprotocols.net>"); -MODULE_DESCRIPTION("DCCP TFRC library"); -MODULE_LICENSE("GPL"); diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h index 0670f46..9a2642e 100644 --- a/net/dccp/ccids/lib/packet_history.h +++ b/net/dccp/ccids/lib/packet_history.h @@ -39,8 +39,7 @@ #include <linux/ktime.h> #include <linux/list.h> #include <linux/slab.h> - -#include "../../dccp.h" +#include "tfrc.h" /* Number of later packets received before one is considered lost */ #define TFRC_RECV_NUM_LATE_LOSS 3 diff --git a/net/dccp/ccids/lib/tfrc.c b/net/dccp/ccids/lib/tfrc.c new file mode 100644 index 0000000..3a7a183 --- /dev/null +++ b/net/dccp/ccids/lib/tfrc.c @@ -0,0 +1,48 @@ +/* + * TFRC: main module holding the pieces of the TFRC library together + * + * Copyright (c) 2007 The University of Aberdeen, Scotland, UK + * Copyright (c) 2007 Arnaldo Carvalho de Melo <acme@redhat.com> + */ +#include <linux/module.h> +#include <linux/moduleparam.h> +#include "tfrc.h" + +#ifdef CONFIG_IP_DCCP_TFRC_DEBUG +int tfrc_debug; +module_param(tfrc_debug, bool, 0444); +MODULE_PARM_DESC(tfrc_debug, "Enable debug messages"); +#endif + +extern int dccp_li_init(void); +extern void dccp_li_exit(void); +extern int packet_history_init(void); +extern void packet_history_exit(void); + +static int __init tfrc_module_init(void) +{ + int rc = dccp_li_init(); + + if (rc == 0) { + rc = packet_history_init(); + if (rc != 0) + dccp_li_exit(); + } + + return rc; +} + +static void __exit tfrc_module_exit(void) +{ + packet_history_exit(); + dccp_li_exit(); +} + +module_init(tfrc_module_init); +module_exit(tfrc_module_exit); + +MODULE_AUTHOR("Gerrit Renker <gerrit@erg.abdn.ac.uk>, " + "Ian McDonald <ian.mcdonald@jandi.co.nz>, " + "Arnaldo Carvalho de Melo <acme@redhat.com>"); +MODULE_DESCRIPTION("DCCP TFRC library"); +MODULE_LICENSE("GPL"); diff --git a/net/dccp/ccids/lib/tfrc.h b/net/dccp/ccids/lib/tfrc.h index 5a0ba86..ab8848c 100644 --- a/net/dccp/ccids/lib/tfrc.h +++ b/net/dccp/ccids/lib/tfrc.h @@ -3,10 +3,11 @@ /* * net/dccp/ccids/lib/tfrc.h * - * Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand. - * Copyright (c) 2005 Ian McDonald <ian.mcdonald@jandi.co.nz> - * Copyright (c) 2005 Arnaldo Carvalho de Melo <acme@conectiva.com.br> - * Copyright (c) 2003 Nils-Erik Mattsson, Joacim Haggmark, Magnus Erixzon + * Copyright (c) 2007 The University of Aberdeen, Scotland, UK + * Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand. + * Copyright (c) 2005-6 Ian McDonald <ian.mcdonald@jandi.co.nz> + * Copyright (c) 2005 Arnaldo Carvalho de Melo <acme@conectiva.com.br> + * Copyright (c) 2003 Nils-Erik Mattsson, Joacim Haggmark, Magnus Erixzon * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -15,6 +16,14 @@ */ #include <linux/types.h> #include <asm/div64.h> +#include "../../dccp.h" + +#ifdef CONFIG_IP_DCCP_TFRC_DEBUG +extern int tfrc_debug; +#define tfrc_pr_debug(format, a...) DCCP_PR_DEBUG(tfrc_debug, format, ##a) +#else +#define tfrc_pr_debug(format, a...) +#endif /* integer-arithmetic divisions of type (a * 1000000)/b */ static inline u64 scaled_div(u64 a, u32 b) -- 1.5.3.4 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 2/7] [DCCP]: Introduce generic function to test for `data packets' 2007-12-02 21:36 ` [PATCH 1/7] [TFRC]: Provide central source file and debug facility Arnaldo Carvalho de Melo @ 2007-12-02 21:36 ` Arnaldo Carvalho de Melo 2007-12-02 21:36 ` [PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency Arnaldo Carvalho de Melo 0 siblings, 1 reply; 28+ messages in thread From: Arnaldo Carvalho de Melo @ 2007-12-02 21:36 UTC (permalink / raw) To: Gerrit Renker Cc: netdev, dccp, Ingo Molnar, Gerrit Renker, Ian McDonald, Arnaldo Carvalho de Melo From: Gerrit Renker <gerrit@erg.abdn.ac.uk> as per RFC 4340, sec. 7.7. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- net/dccp/dccp.h | 12 ++++++++++++ 1 files changed, 12 insertions(+), 0 deletions(-) diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h index ee97950..f4a5ea1 100644 --- a/net/dccp/dccp.h +++ b/net/dccp/dccp.h @@ -334,6 +334,7 @@ struct dccp_skb_cb { #define DCCP_SKB_CB(__skb) ((struct dccp_skb_cb *)&((__skb)->cb[0])) +/* RFC 4340, sec. 7.7 */ static inline int dccp_non_data_packet(const struct sk_buff *skb) { const __u8 type = DCCP_SKB_CB(skb)->dccpd_type; @@ -346,6 +347,17 @@ static inline int dccp_non_data_packet(const struct sk_buff *skb) type == DCCP_PKT_SYNCACK; } +/* RFC 4340, sec. 7.7 */ +static inline int dccp_data_packet(const struct sk_buff *skb) +{ + const __u8 type = DCCP_SKB_CB(skb)->dccpd_type; + + return type == DCCP_PKT_DATA || + type == DCCP_PKT_DATAACK || + type == DCCP_PKT_REQUEST || + type == DCCP_PKT_RESPONSE; +} + static inline int dccp_packet_without_ack(const struct sk_buff *skb) { const __u8 type = DCCP_SKB_CB(skb)->dccpd_type; -- 1.5.3.4 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency 2007-12-02 21:36 ` [PATCH 2/7] [DCCP]: Introduce generic function to test for `data packets' Arnaldo Carvalho de Melo @ 2007-12-02 21:36 ` Arnaldo Carvalho de Melo 2007-12-02 21:36 ` [PATCH 4/7] [TFRC]: Make the rx history slab be global Arnaldo Carvalho de Melo 2007-12-06 13:57 ` [PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency Gerrit Renker 0 siblings, 2 replies; 28+ messages in thread From: Arnaldo Carvalho de Melo @ 2007-12-02 21:36 UTC (permalink / raw) To: Gerrit Renker; +Cc: netdev, dccp, Ingo Molnar, Arnaldo Carvalho de Melo Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- net/dccp/ccids/lib/packet_history.c | 20 ++++++++++---------- 1 files changed, 10 insertions(+), 10 deletions(-) diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c index 1d4d6ee..b628714 100644 --- a/net/dccp/ccids/lib/packet_history.c +++ b/net/dccp/ccids/lib/packet_history.c @@ -53,7 +53,7 @@ struct tfrc_tx_hist_entry { /* * Transmitter History Routines */ -static struct kmem_cache *tfrc_tx_hist; +static struct kmem_cache *tfrc_tx_hist_slab; static struct tfrc_tx_hist_entry * tfrc_tx_hist_find_entry(struct tfrc_tx_hist_entry *head, u64 seqno) @@ -66,7 +66,7 @@ static struct tfrc_tx_hist_entry * int tfrc_tx_hist_add(struct tfrc_tx_hist_entry **headp, u64 seqno) { - struct tfrc_tx_hist_entry *entry = kmem_cache_alloc(tfrc_tx_hist, gfp_any()); + struct tfrc_tx_hist_entry *entry = kmem_cache_alloc(tfrc_tx_hist_slab, gfp_any()); if (entry == NULL) return -ENOBUFS; @@ -85,7 +85,7 @@ void tfrc_tx_hist_purge(struct tfrc_tx_hist_entry **headp) while (head != NULL) { struct tfrc_tx_hist_entry *next = head->next; - kmem_cache_free(tfrc_tx_hist, head); + kmem_cache_free(tfrc_tx_hist_slab, head); head = next; } @@ -278,17 +278,17 @@ EXPORT_SYMBOL_GPL(dccp_rx_hist_purge); __init int packet_history_init(void) { - tfrc_tx_hist = kmem_cache_create("tfrc_tx_hist", - sizeof(struct tfrc_tx_hist_entry), 0, - SLAB_HWCACHE_ALIGN, NULL); + tfrc_tx_hist_slab = kmem_cache_create("tfrc_tx_hist", + sizeof(struct tfrc_tx_hist_entry), 0, + SLAB_HWCACHE_ALIGN, NULL); - return tfrc_tx_hist == NULL ? -ENOBUFS : 0; + return tfrc_tx_hist_slab == NULL ? -ENOBUFS : 0; } void packet_history_exit(void) { - if (tfrc_tx_hist != NULL) { - kmem_cache_destroy(tfrc_tx_hist); - tfrc_tx_hist = NULL; + if (tfrc_tx_hist_slab != NULL) { + kmem_cache_destroy(tfrc_tx_hist_slab); + tfrc_tx_hist_slab = NULL; } } -- 1.5.3.4 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 4/7] [TFRC]: Make the rx history slab be global 2007-12-02 21:36 ` [PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency Arnaldo Carvalho de Melo @ 2007-12-02 21:36 ` Arnaldo Carvalho de Melo 2007-12-02 21:36 ` [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_ Arnaldo Carvalho de Melo 2007-12-06 13:59 ` [PATCH 4/7] [TFRC]: Make the rx history slab be global Gerrit Renker 2007-12-06 13:57 ` [PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency Gerrit Renker 1 sibling, 2 replies; 28+ messages in thread From: Arnaldo Carvalho de Melo @ 2007-12-02 21:36 UTC (permalink / raw) To: Gerrit Renker; +Cc: netdev, dccp, Ingo Molnar, Arnaldo Carvalho de Melo This is in preparation for merging the new rx history code written by Gerrit Renker. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- net/dccp/ccids/ccid3.c | 35 ++----------- net/dccp/ccids/lib/packet_history.c | 95 ++++++++++++++++++----------------- net/dccp/ccids/lib/packet_history.h | 43 ++-------------- 3 files changed, 60 insertions(+), 113 deletions(-) diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c index 5dea690..07920bb 100644 --- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -49,8 +49,6 @@ static int ccid3_debug; #define ccid3_pr_debug(format, a...) #endif -static struct dccp_rx_hist *ccid3_rx_hist; - /* * Transmitter Half-Connection Routines */ @@ -807,9 +805,9 @@ static int ccid3_hc_rx_detect_loss(struct sock *sk, } detect_out: - dccp_rx_hist_add_packet(ccid3_rx_hist, &hcrx->ccid3hcrx_hist, - &hcrx->ccid3hcrx_li_hist, packet, - hcrx->ccid3hcrx_seqno_nonloss); + dccp_rx_hist_add_packet(&hcrx->ccid3hcrx_hist, + &hcrx->ccid3hcrx_li_hist, packet, + hcrx->ccid3hcrx_seqno_nonloss); return loss; } @@ -852,8 +850,7 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) return; } - packet = dccp_rx_hist_entry_new(ccid3_rx_hist, opt_recv->dccpor_ndp, - skb, GFP_ATOMIC); + packet = dccp_rx_hist_entry_new(opt_recv->dccpor_ndp, skb, GFP_ATOMIC); if (unlikely(packet == NULL)) { DCCP_WARN("%s(%p), Not enough mem to add rx packet " "to history, consider it lost!\n", dccp_role(sk), sk); @@ -936,7 +933,7 @@ static void ccid3_hc_rx_exit(struct sock *sk) ccid3_hc_rx_set_state(sk, TFRC_RSTATE_TERM); /* Empty packet history */ - dccp_rx_hist_purge(ccid3_rx_hist, &hcrx->ccid3hcrx_hist); + dccp_rx_hist_purge(&hcrx->ccid3hcrx_hist); /* Empty loss interval history */ dccp_li_hist_purge(&hcrx->ccid3hcrx_li_hist); @@ -1013,33 +1010,13 @@ MODULE_PARM_DESC(ccid3_debug, "Enable debug messages"); static __init int ccid3_module_init(void) { - int rc = -ENOBUFS; - - ccid3_rx_hist = dccp_rx_hist_new("ccid3"); - if (ccid3_rx_hist == NULL) - goto out; - - rc = ccid_register(&ccid3); - if (rc != 0) - goto out_free_rx; -out: - return rc; - -out_free_rx: - dccp_rx_hist_delete(ccid3_rx_hist); - ccid3_rx_hist = NULL; - goto out; + return ccid_register(&ccid3); } module_init(ccid3_module_init); static __exit void ccid3_module_exit(void) { ccid_unregister(&ccid3); - - if (ccid3_rx_hist != NULL) { - dccp_rx_hist_delete(ccid3_rx_hist); - ccid3_rx_hist = NULL; - } } module_exit(ccid3_module_exit); diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c index b628714..e1ab853 100644 --- a/net/dccp/ccids/lib/packet_history.c +++ b/net/dccp/ccids/lib/packet_history.c @@ -114,48 +114,33 @@ EXPORT_SYMBOL_GPL(tfrc_tx_hist_rtt); /* * Receiver History Routines */ -struct dccp_rx_hist *dccp_rx_hist_new(const char *name) +static struct kmem_cache *tfrc_rx_hist_slab; + +struct dccp_rx_hist_entry *dccp_rx_hist_entry_new(const u32 ndp, + const struct sk_buff *skb, + const gfp_t prio) { - struct dccp_rx_hist *hist = kmalloc(sizeof(*hist), GFP_ATOMIC); - static const char dccp_rx_hist_mask[] = "rx_hist_%s"; - char *slab_name; - - if (hist == NULL) - goto out; - - slab_name = kmalloc(strlen(name) + sizeof(dccp_rx_hist_mask) - 1, - GFP_ATOMIC); - if (slab_name == NULL) - goto out_free_hist; - - sprintf(slab_name, dccp_rx_hist_mask, name); - hist->dccprxh_slab = kmem_cache_create(slab_name, - sizeof(struct dccp_rx_hist_entry), - 0, SLAB_HWCACHE_ALIGN, NULL); - if (hist->dccprxh_slab == NULL) - goto out_free_slab_name; -out: - return hist; -out_free_slab_name: - kfree(slab_name); -out_free_hist: - kfree(hist); - hist = NULL; - goto out; -} + struct dccp_rx_hist_entry *entry = kmem_cache_alloc(tfrc_rx_hist_slab, + prio); -EXPORT_SYMBOL_GPL(dccp_rx_hist_new); + if (entry != NULL) { + const struct dccp_hdr *dh = dccp_hdr(skb); -void dccp_rx_hist_delete(struct dccp_rx_hist *hist) -{ - const char* name = kmem_cache_name(hist->dccprxh_slab); + entry->dccphrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq; + entry->dccphrx_ccval = dh->dccph_ccval; + entry->dccphrx_type = dh->dccph_type; + entry->dccphrx_ndp = ndp; + entry->dccphrx_tstamp = ktime_get_real(); + } - kmem_cache_destroy(hist->dccprxh_slab); - kfree(name); - kfree(hist); + return entry; } +EXPORT_SYMBOL_GPL(dccp_rx_hist_entry_new); -EXPORT_SYMBOL_GPL(dccp_rx_hist_delete); +static inline void dccp_rx_hist_entry_delete(struct dccp_rx_hist_entry *entry) +{ + kmem_cache_free(tfrc_rx_hist_slab, entry); +} int dccp_rx_hist_find_entry(const struct list_head *list, const u64 seq, u8 *ccval) @@ -192,11 +177,10 @@ struct dccp_rx_hist_entry * EXPORT_SYMBOL_GPL(dccp_rx_hist_find_data_packet); -void dccp_rx_hist_add_packet(struct dccp_rx_hist *hist, - struct list_head *rx_list, - struct list_head *li_list, - struct dccp_rx_hist_entry *packet, - u64 nonloss_seqno) +void dccp_rx_hist_add_packet(struct list_head *rx_list, + struct list_head *li_list, + struct dccp_rx_hist_entry *packet, + u64 nonloss_seqno) { struct dccp_rx_hist_entry *entry, *next; u8 num_later = 0; @@ -211,7 +195,7 @@ void dccp_rx_hist_add_packet(struct dccp_rx_hist *hist, if (after48(nonloss_seqno, entry->dccphrx_seqno)) { list_del_init(&entry->dccphrx_node); - dccp_rx_hist_entry_delete(hist, entry); + dccp_rx_hist_entry_delete(entry); } } else if (dccp_rx_hist_entry_data_packet(entry)) --num_later; @@ -253,7 +237,7 @@ void dccp_rx_hist_add_packet(struct dccp_rx_hist *hist, break; case 3: list_del_init(&entry->dccphrx_node); - dccp_rx_hist_entry_delete(hist, entry); + dccp_rx_hist_entry_delete(entry); break; } } else if (dccp_rx_hist_entry_data_packet(entry)) @@ -264,13 +248,13 @@ void dccp_rx_hist_add_packet(struct dccp_rx_hist *hist, EXPORT_SYMBOL_GPL(dccp_rx_hist_add_packet); -void dccp_rx_hist_purge(struct dccp_rx_hist *hist, struct list_head *list) +void dccp_rx_hist_purge(struct list_head *list) { struct dccp_rx_hist_entry *entry, *next; list_for_each_entry_safe(entry, next, list, dccphrx_node) { list_del_init(&entry->dccphrx_node); - kmem_cache_free(hist->dccprxh_slab, entry); + dccp_rx_hist_entry_delete(entry); } } @@ -281,8 +265,22 @@ __init int packet_history_init(void) tfrc_tx_hist_slab = kmem_cache_create("tfrc_tx_hist", sizeof(struct tfrc_tx_hist_entry), 0, SLAB_HWCACHE_ALIGN, NULL); + if (tfrc_tx_hist_slab == NULL) + goto out_err; - return tfrc_tx_hist_slab == NULL ? -ENOBUFS : 0; + tfrc_rx_hist_slab = kmem_cache_create("tfrc_rx_hist", + sizeof(struct dccp_rx_hist_entry), 0, + SLAB_HWCACHE_ALIGN, NULL); + if (tfrc_rx_hist_slab == NULL) + goto out_free_tx; + + return 0; + +out_free_tx: + kmem_cache_destroy(tfrc_tx_hist_slab); + tfrc_tx_hist_slab = NULL; +out_err: + return -ENOBUFS; } void packet_history_exit(void) @@ -291,4 +289,9 @@ void packet_history_exit(void) kmem_cache_destroy(tfrc_tx_hist_slab); tfrc_tx_hist_slab = NULL; } + + if (tfrc_rx_hist_slab != NULL) { + kmem_cache_destroy(tfrc_rx_hist_slab); + tfrc_rx_hist_slab = NULL; + } } diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h index 9a2642e..34b180b 100644 --- a/net/dccp/ccids/lib/packet_history.h +++ b/net/dccp/ccids/lib/packet_history.h @@ -66,34 +66,10 @@ struct dccp_rx_hist_entry { ktime_t dccphrx_tstamp; }; -struct dccp_rx_hist { - struct kmem_cache *dccprxh_slab; -}; - -extern struct dccp_rx_hist *dccp_rx_hist_new(const char *name); -extern void dccp_rx_hist_delete(struct dccp_rx_hist *hist); - -static inline struct dccp_rx_hist_entry * - dccp_rx_hist_entry_new(struct dccp_rx_hist *hist, - const u32 ndp, +extern struct dccp_rx_hist_entry * + dccp_rx_hist_entry_new(const u32 ndp, const struct sk_buff *skb, - const gfp_t prio) -{ - struct dccp_rx_hist_entry *entry = kmem_cache_alloc(hist->dccprxh_slab, - prio); - - if (entry != NULL) { - const struct dccp_hdr *dh = dccp_hdr(skb); - - entry->dccphrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq; - entry->dccphrx_ccval = dh->dccph_ccval; - entry->dccphrx_type = dh->dccph_type; - entry->dccphrx_ndp = ndp; - entry->dccphrx_tstamp = ktime_get_real(); - } - - return entry; -} + const gfp_t prio); static inline struct dccp_rx_hist_entry * dccp_rx_hist_head(struct list_head *list) @@ -111,21 +87,12 @@ extern int dccp_rx_hist_find_entry(const struct list_head *list, const u64 seq, extern struct dccp_rx_hist_entry * dccp_rx_hist_find_data_packet(const struct list_head *list); -extern void dccp_rx_hist_add_packet(struct dccp_rx_hist *hist, - struct list_head *rx_list, +extern void dccp_rx_hist_add_packet(struct list_head *rx_list, struct list_head *li_list, struct dccp_rx_hist_entry *packet, u64 nonloss_seqno); -static inline void dccp_rx_hist_entry_delete(struct dccp_rx_hist *hist, - struct dccp_rx_hist_entry *entry) -{ - if (entry != NULL) - kmem_cache_free(hist->dccprxh_slab, entry); -} - -extern void dccp_rx_hist_purge(struct dccp_rx_hist *hist, - struct list_head *list); +extern void dccp_rx_hist_purge(struct list_head *list); static inline int dccp_rx_hist_entry_data_packet(const struct dccp_rx_hist_entry *entry) -- 1.5.3.4 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_ 2007-12-02 21:36 ` [PATCH 4/7] [TFRC]: Make the rx history slab be global Arnaldo Carvalho de Melo @ 2007-12-02 21:36 ` Arnaldo Carvalho de Melo 2007-12-02 21:36 ` [PATCH 6/7] [CCID3]: The receiver of a half-connection does not set window counter values Arnaldo Carvalho de Melo 2007-12-06 13:59 ` [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_ Gerrit Renker 2007-12-06 13:59 ` [PATCH 4/7] [TFRC]: Make the rx history slab be global Gerrit Renker 1 sibling, 2 replies; 28+ messages in thread From: Arnaldo Carvalho de Melo @ 2007-12-02 21:36 UTC (permalink / raw) To: Gerrit Renker; +Cc: netdev, dccp, Ingo Molnar, Arnaldo Carvalho de Melo This is in preparation for merging the new rx history code written by Gerrit Renker. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- net/dccp/ccids/ccid3.c | 32 ++++++------ net/dccp/ccids/lib/loss_interval.c | 14 +++--- net/dccp/ccids/lib/packet_history.c | 90 +++++++++++++++++----------------- net/dccp/ccids/lib/packet_history.h | 48 +++++++++--------- 4 files changed, 92 insertions(+), 92 deletions(-) diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c index 07920bb..c95dca8 100644 --- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -677,7 +677,7 @@ static void ccid3_hc_rx_send_feedback(struct sock *sk) { struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); struct dccp_sock *dp = dccp_sk(sk); - struct dccp_rx_hist_entry *packet; + struct tfrc_rx_hist_entry *packet; ktime_t now; suseconds_t delta; @@ -701,7 +701,7 @@ static void ccid3_hc_rx_send_feedback(struct sock *sk) return; } - packet = dccp_rx_hist_find_data_packet(&hcrx->ccid3hcrx_hist); + packet = tfrc_rx_hist_find_data_packet(&hcrx->ccid3hcrx_hist); if (unlikely(packet == NULL)) { DCCP_WARN("%s(%p), no data packet in history!\n", dccp_role(sk), sk); @@ -709,7 +709,7 @@ static void ccid3_hc_rx_send_feedback(struct sock *sk) } hcrx->ccid3hcrx_tstamp_last_feedback = now; - hcrx->ccid3hcrx_ccval_last_counter = packet->dccphrx_ccval; + hcrx->ccid3hcrx_ccval_last_counter = packet->tfrchrx_ccval; hcrx->ccid3hcrx_bytes_recv = 0; if (hcrx->ccid3hcrx_p == 0) @@ -752,12 +752,12 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb) } static int ccid3_hc_rx_detect_loss(struct sock *sk, - struct dccp_rx_hist_entry *packet) + struct tfrc_rx_hist_entry *packet) { struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); - struct dccp_rx_hist_entry *rx_hist = - dccp_rx_hist_head(&hcrx->ccid3hcrx_hist); - u64 seqno = packet->dccphrx_seqno; + struct tfrc_rx_hist_entry *rx_hist = + tfrc_rx_hist_head(&hcrx->ccid3hcrx_hist); + u64 seqno = packet->tfrchrx_seqno; u64 tmp_seqno; int loss = 0; u8 ccval; @@ -766,9 +766,9 @@ static int ccid3_hc_rx_detect_loss(struct sock *sk, tmp_seqno = hcrx->ccid3hcrx_seqno_nonloss; if (!rx_hist || - follows48(packet->dccphrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) { + follows48(packet->tfrchrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) { hcrx->ccid3hcrx_seqno_nonloss = seqno; - hcrx->ccid3hcrx_ccval_nonloss = packet->dccphrx_ccval; + hcrx->ccid3hcrx_ccval_nonloss = packet->tfrchrx_ccval; goto detect_out; } @@ -789,7 +789,7 @@ static int ccid3_hc_rx_detect_loss(struct sock *sk, dccp_inc_seqno(&tmp_seqno); hcrx->ccid3hcrx_seqno_nonloss = tmp_seqno; dccp_inc_seqno(&tmp_seqno); - while (dccp_rx_hist_find_entry(&hcrx->ccid3hcrx_hist, + while (tfrc_rx_hist_find_entry(&hcrx->ccid3hcrx_hist, tmp_seqno, &ccval)) { hcrx->ccid3hcrx_seqno_nonloss = tmp_seqno; hcrx->ccid3hcrx_ccval_nonloss = ccval; @@ -799,13 +799,13 @@ static int ccid3_hc_rx_detect_loss(struct sock *sk, /* FIXME - this code could be simplified with above while */ /* but works at moment */ - if (follows48(packet->dccphrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) { + if (follows48(packet->tfrchrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) { hcrx->ccid3hcrx_seqno_nonloss = seqno; - hcrx->ccid3hcrx_ccval_nonloss = packet->dccphrx_ccval; + hcrx->ccid3hcrx_ccval_nonloss = packet->tfrchrx_ccval; } detect_out: - dccp_rx_hist_add_packet(&hcrx->ccid3hcrx_hist, + tfrc_rx_hist_add_packet(&hcrx->ccid3hcrx_hist, &hcrx->ccid3hcrx_li_hist, packet, hcrx->ccid3hcrx_seqno_nonloss); return loss; @@ -815,7 +815,7 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) { struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); const struct dccp_options_received *opt_recv; - struct dccp_rx_hist_entry *packet; + struct tfrc_rx_hist_entry *packet; u32 p_prev, r_sample, rtt_prev; int loss, payload_size; ktime_t now; @@ -850,7 +850,7 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) return; } - packet = dccp_rx_hist_entry_new(opt_recv->dccpor_ndp, skb, GFP_ATOMIC); + packet = tfrc_rx_hist_entry_new(opt_recv->dccpor_ndp, skb, GFP_ATOMIC); if (unlikely(packet == NULL)) { DCCP_WARN("%s(%p), Not enough mem to add rx packet " "to history, consider it lost!\n", dccp_role(sk), sk); @@ -933,7 +933,7 @@ static void ccid3_hc_rx_exit(struct sock *sk) ccid3_hc_rx_set_state(sk, TFRC_RSTATE_TERM); /* Empty packet history */ - dccp_rx_hist_purge(&hcrx->ccid3hcrx_hist); + tfrc_rx_hist_purge(&hcrx->ccid3hcrx_hist); /* Empty loss interval history */ dccp_li_hist_purge(&hcrx->ccid3hcrx_li_hist); diff --git a/net/dccp/ccids/lib/loss_interval.c b/net/dccp/ccids/lib/loss_interval.c index f2ca4eb..a5f59af 100644 --- a/net/dccp/ccids/lib/loss_interval.c +++ b/net/dccp/ccids/lib/loss_interval.c @@ -129,7 +129,7 @@ static u32 dccp_li_calc_first_li(struct sock *sk, u16 s, u32 bytes_recv, u32 previous_x_recv) { - struct dccp_rx_hist_entry *entry, *next, *tail = NULL; + struct tfrc_rx_hist_entry *entry, *next, *tail = NULL; u32 x_recv, p; suseconds_t rtt, delta; ktime_t tstamp = ktime_set(0, 0); @@ -138,18 +138,18 @@ static u32 dccp_li_calc_first_li(struct sock *sk, int step = 0; u64 fval; - list_for_each_entry_safe(entry, next, hist_list, dccphrx_node) { - if (dccp_rx_hist_entry_data_packet(entry)) { + list_for_each_entry_safe(entry, next, hist_list, tfrchrx_node) { + if (tfrc_rx_hist_entry_data_packet(entry)) { tail = entry; switch (step) { case 0: - tstamp = entry->dccphrx_tstamp; - win_count = entry->dccphrx_ccval; + tstamp = entry->tfrchrx_tstamp; + win_count = entry->tfrchrx_ccval; step = 1; break; case 1: - interval = win_count - entry->dccphrx_ccval; + interval = win_count - entry->tfrchrx_ccval; if (interval < 0) interval += TFRC_WIN_COUNT_LIMIT; if (interval > 4) @@ -176,7 +176,7 @@ found: return ~0; } - delta = ktime_us_delta(tstamp, tail->dccphrx_tstamp); + delta = ktime_us_delta(tstamp, tail->tfrchrx_tstamp); DCCP_BUG_ON(delta < 0); rtt = delta * 4 / interval; diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c index e1ab853..255cca1 100644 --- a/net/dccp/ccids/lib/packet_history.c +++ b/net/dccp/ccids/lib/packet_history.c @@ -116,58 +116,58 @@ EXPORT_SYMBOL_GPL(tfrc_tx_hist_rtt); */ static struct kmem_cache *tfrc_rx_hist_slab; -struct dccp_rx_hist_entry *dccp_rx_hist_entry_new(const u32 ndp, +struct tfrc_rx_hist_entry *tfrc_rx_hist_entry_new(const u32 ndp, const struct sk_buff *skb, const gfp_t prio) { - struct dccp_rx_hist_entry *entry = kmem_cache_alloc(tfrc_rx_hist_slab, + struct tfrc_rx_hist_entry *entry = kmem_cache_alloc(tfrc_rx_hist_slab, prio); if (entry != NULL) { const struct dccp_hdr *dh = dccp_hdr(skb); - entry->dccphrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq; - entry->dccphrx_ccval = dh->dccph_ccval; - entry->dccphrx_type = dh->dccph_type; - entry->dccphrx_ndp = ndp; - entry->dccphrx_tstamp = ktime_get_real(); + entry->tfrchrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq; + entry->tfrchrx_ccval = dh->dccph_ccval; + entry->tfrchrx_type = dh->dccph_type; + entry->tfrchrx_ndp = ndp; + entry->tfrchrx_tstamp = ktime_get_real(); } return entry; } -EXPORT_SYMBOL_GPL(dccp_rx_hist_entry_new); +EXPORT_SYMBOL_GPL(tfrc_rx_hist_entry_new); -static inline void dccp_rx_hist_entry_delete(struct dccp_rx_hist_entry *entry) +static inline void tfrc_rx_hist_entry_delete(struct tfrc_rx_hist_entry *entry) { kmem_cache_free(tfrc_rx_hist_slab, entry); } -int dccp_rx_hist_find_entry(const struct list_head *list, const u64 seq, +int tfrc_rx_hist_find_entry(const struct list_head *list, const u64 seq, u8 *ccval) { - struct dccp_rx_hist_entry *packet = NULL, *entry; + struct tfrc_rx_hist_entry *packet = NULL, *entry; - list_for_each_entry(entry, list, dccphrx_node) - if (entry->dccphrx_seqno == seq) { + list_for_each_entry(entry, list, tfrchrx_node) + if (entry->tfrchrx_seqno == seq) { packet = entry; break; } if (packet) - *ccval = packet->dccphrx_ccval; + *ccval = packet->tfrchrx_ccval; return packet != NULL; } -EXPORT_SYMBOL_GPL(dccp_rx_hist_find_entry); -struct dccp_rx_hist_entry * - dccp_rx_hist_find_data_packet(const struct list_head *list) +EXPORT_SYMBOL_GPL(tfrc_rx_hist_find_entry); +struct tfrc_rx_hist_entry * + tfrc_rx_hist_find_data_packet(const struct list_head *list) { - struct dccp_rx_hist_entry *entry, *packet = NULL; + struct tfrc_rx_hist_entry *entry, *packet = NULL; - list_for_each_entry(entry, list, dccphrx_node) - if (entry->dccphrx_type == DCCP_PKT_DATA || - entry->dccphrx_type == DCCP_PKT_DATAACK) { + list_for_each_entry(entry, list, tfrchrx_node) + if (entry->tfrchrx_type == DCCP_PKT_DATA || + entry->tfrchrx_type == DCCP_PKT_DATAACK) { packet = entry; break; } @@ -175,29 +175,29 @@ struct dccp_rx_hist_entry * return packet; } -EXPORT_SYMBOL_GPL(dccp_rx_hist_find_data_packet); +EXPORT_SYMBOL_GPL(tfrc_rx_hist_find_data_packet); -void dccp_rx_hist_add_packet(struct list_head *rx_list, +void tfrc_rx_hist_add_packet(struct list_head *rx_list, struct list_head *li_list, - struct dccp_rx_hist_entry *packet, + struct tfrc_rx_hist_entry *packet, u64 nonloss_seqno) { - struct dccp_rx_hist_entry *entry, *next; + struct tfrc_rx_hist_entry *entry, *next; u8 num_later = 0; - list_add(&packet->dccphrx_node, rx_list); + list_add(&packet->tfrchrx_node, rx_list); num_later = TFRC_RECV_NUM_LATE_LOSS + 1; if (!list_empty(li_list)) { - list_for_each_entry_safe(entry, next, rx_list, dccphrx_node) { + list_for_each_entry_safe(entry, next, rx_list, tfrchrx_node) { if (num_later == 0) { if (after48(nonloss_seqno, - entry->dccphrx_seqno)) { - list_del_init(&entry->dccphrx_node); - dccp_rx_hist_entry_delete(entry); + entry->tfrchrx_seqno)) { + list_del_init(&entry->tfrchrx_node); + tfrc_rx_hist_entry_delete(entry); } - } else if (dccp_rx_hist_entry_data_packet(entry)) + } else if (tfrc_rx_hist_entry_data_packet(entry)) --num_later; } } else { @@ -208,7 +208,7 @@ void dccp_rx_hist_add_packet(struct list_head *rx_list, * We have no loss interval history so we need at least one * rtt:s of data packets to approximate rtt. */ - list_for_each_entry_safe(entry, next, rx_list, dccphrx_node) { + list_for_each_entry_safe(entry, next, rx_list, tfrchrx_node) { if (num_later == 0) { switch (step) { case 0: @@ -220,10 +220,10 @@ void dccp_rx_hist_add_packet(struct list_head *rx_list, step = 2; /* OK, find next data packet */ num_later = 1; - win_count = entry->dccphrx_ccval; + win_count = entry->tfrchrx_ccval; break; case 2: - tmp = win_count - entry->dccphrx_ccval; + tmp = win_count - entry->tfrchrx_ccval; if (tmp < 0) tmp += TFRC_WIN_COUNT_LIMIT; if (tmp > TFRC_WIN_COUNT_PER_RTT + 1) { @@ -236,29 +236,29 @@ void dccp_rx_hist_add_packet(struct list_head *rx_list, num_later = 1; break; case 3: - list_del_init(&entry->dccphrx_node); - dccp_rx_hist_entry_delete(entry); + list_del_init(&entry->tfrchrx_node); + tfrc_rx_hist_entry_delete(entry); break; } - } else if (dccp_rx_hist_entry_data_packet(entry)) + } else if (tfrc_rx_hist_entry_data_packet(entry)) --num_later; } } } -EXPORT_SYMBOL_GPL(dccp_rx_hist_add_packet); +EXPORT_SYMBOL_GPL(tfrc_rx_hist_add_packet); -void dccp_rx_hist_purge(struct list_head *list) +void tfrc_rx_hist_purge(struct list_head *list) { - struct dccp_rx_hist_entry *entry, *next; + struct tfrc_rx_hist_entry *entry, *next; - list_for_each_entry_safe(entry, next, list, dccphrx_node) { - list_del_init(&entry->dccphrx_node); - dccp_rx_hist_entry_delete(entry); + list_for_each_entry_safe(entry, next, list, tfrchrx_node) { + list_del_init(&entry->tfrchrx_node); + tfrc_rx_hist_entry_delete(entry); } } -EXPORT_SYMBOL_GPL(dccp_rx_hist_purge); +EXPORT_SYMBOL_GPL(tfrc_rx_hist_purge); __init int packet_history_init(void) { @@ -269,7 +269,7 @@ __init int packet_history_init(void) goto out_err; tfrc_rx_hist_slab = kmem_cache_create("tfrc_rx_hist", - sizeof(struct dccp_rx_hist_entry), 0, + sizeof(struct tfrc_rx_hist_entry), 0, SLAB_HWCACHE_ALIGN, NULL); if (tfrc_rx_hist_slab == NULL) goto out_free_tx; diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h index 34b180b..5b0b983 100644 --- a/net/dccp/ccids/lib/packet_history.h +++ b/net/dccp/ccids/lib/packet_history.h @@ -57,51 +57,51 @@ extern u32 tfrc_tx_hist_rtt(struct tfrc_tx_hist_entry *head, /* * Receiver History data structures and declarations */ -struct dccp_rx_hist_entry { - struct list_head dccphrx_node; - u64 dccphrx_seqno:48, - dccphrx_ccval:4, - dccphrx_type:4; - u32 dccphrx_ndp; /* In fact it is from 8 to 24 bits */ - ktime_t dccphrx_tstamp; +struct tfrc_rx_hist_entry { + struct list_head tfrchrx_node; + u64 tfrchrx_seqno:48, + tfrchrx_ccval:4, + tfrchrx_type:4; + u32 tfrchrx_ndp; /* In fact it is from 8 to 24 bits */ + ktime_t tfrchrx_tstamp; }; -extern struct dccp_rx_hist_entry * - dccp_rx_hist_entry_new(const u32 ndp, +extern struct tfrc_rx_hist_entry * + tfrc_rx_hist_entry_new(const u32 ndp, const struct sk_buff *skb, const gfp_t prio); -static inline struct dccp_rx_hist_entry * - dccp_rx_hist_head(struct list_head *list) +static inline struct tfrc_rx_hist_entry * + tfrc_rx_hist_head(struct list_head *list) { - struct dccp_rx_hist_entry *head = NULL; + struct tfrc_rx_hist_entry *head = NULL; if (!list_empty(list)) - head = list_entry(list->next, struct dccp_rx_hist_entry, - dccphrx_node); + head = list_entry(list->next, struct tfrc_rx_hist_entry, + tfrchrx_node); return head; } -extern int dccp_rx_hist_find_entry(const struct list_head *list, const u64 seq, +extern int tfrc_rx_hist_find_entry(const struct list_head *list, const u64 seq, u8 *ccval); -extern struct dccp_rx_hist_entry * - dccp_rx_hist_find_data_packet(const struct list_head *list); +extern struct tfrc_rx_hist_entry * + tfrc_rx_hist_find_data_packet(const struct list_head *list); -extern void dccp_rx_hist_add_packet(struct list_head *rx_list, +extern void tfrc_rx_hist_add_packet(struct list_head *rx_list, struct list_head *li_list, - struct dccp_rx_hist_entry *packet, + struct tfrc_rx_hist_entry *packet, u64 nonloss_seqno); -extern void dccp_rx_hist_purge(struct list_head *list); +extern void tfrc_rx_hist_purge(struct list_head *list); static inline int - dccp_rx_hist_entry_data_packet(const struct dccp_rx_hist_entry *entry) + tfrc_rx_hist_entry_data_packet(const struct tfrc_rx_hist_entry *entry) { - return entry->dccphrx_type == DCCP_PKT_DATA || - entry->dccphrx_type == DCCP_PKT_DATAACK; + return entry->tfrchrx_type == DCCP_PKT_DATA || + entry->tfrchrx_type == DCCP_PKT_DATAACK; } -extern u64 dccp_rx_hist_detect_loss(struct list_head *rx_list, +extern u64 tfrc_rx_hist_detect_loss(struct list_head *rx_list, struct list_head *li_list, u8 *win_loss); #endif /* _DCCP_PKT_HIST_ */ -- 1.5.3.4 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 6/7] [CCID3]: The receiver of a half-connection does not set window counter values 2007-12-02 21:36 ` [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_ Arnaldo Carvalho de Melo @ 2007-12-02 21:36 ` Arnaldo Carvalho de Melo 2007-12-02 21:36 ` [PATCH 7/7] [TFRC] New rx history code Arnaldo Carvalho de Melo 2007-12-06 13:59 ` [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_ Gerrit Renker 1 sibling, 1 reply; 28+ messages in thread From: Arnaldo Carvalho de Melo @ 2007-12-02 21:36 UTC (permalink / raw) To: Gerrit Renker Cc: netdev, dccp, Ingo Molnar, Gerrit Renker, Ian McDonald, Arnaldo Carvalho de Melo From: Gerrit Renker <gerrit@erg.abdn.ac.uk> Only the sender sets window counters [RFC 4342, sections 5 and 8.1]. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- net/dccp/ccids/ccid3.c | 1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c index c95dca8..5ff5aab 100644 --- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -733,7 +733,6 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb) return 0; hcrx = ccid3_hc_rx_sk(sk); - DCCP_SKB_CB(skb)->dccpd_ccval = hcrx->ccid3hcrx_ccval_last_counter; if (dccp_packet_without_ack(skb)) return 0; -- 1.5.3.4 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 7/7] [TFRC] New rx history code 2007-12-02 21:36 ` [PATCH 6/7] [CCID3]: The receiver of a half-connection does not set window counter values Arnaldo Carvalho de Melo @ 2007-12-02 21:36 ` Arnaldo Carvalho de Melo 2007-12-04 6:55 ` Gerrit Renker 0 siblings, 1 reply; 28+ messages in thread From: Arnaldo Carvalho de Melo @ 2007-12-02 21:36 UTC (permalink / raw) To: Gerrit Renker; +Cc: netdev, dccp, Ingo Molnar, Arnaldo Carvalho de Melo Credit here goes to Gerrit Renker, that provided the initial implementation for this new codebase. I modified it just to try to make it closer to the existing API, hide details from the CCIDs and fix a couple bugs found in the initial implementation. Original changeset comment from Gerrit: ----------- This provides a new, self-contained and generic RX history service for TFRC based protocols. Details: * new data structure, initialisation and cleanup routines; * allocation of dccp_rx_hist entries local to packet_history.c, as a service exported by the dccp_tfrc_lib module. * interface to automatically track highest-received seqno; * receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1); * a generic function to test for `data packets' as per RFC 4340, sec. 7.7. ----------- Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- net/dccp/ccids/ccid3.c | 255 ++++++++---------------- net/dccp/ccids/ccid3.h | 14 +- net/dccp/ccids/lib/loss_interval.c | 14 +- net/dccp/ccids/lib/packet_history.c | 277 +++++++++++++++----------- net/dccp/ccids/lib/packet_history.h | 82 +++----- net/dccp/ccids/lib/packet_history_internal.h | 67 ++++++ 6 files changed, 353 insertions(+), 356 deletions(-) create mode 100644 net/dccp/ccids/lib/packet_history_internal.h diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c index 5ff5aab..af64c1d 100644 --- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -641,6 +641,15 @@ static int ccid3_hc_tx_getsockopt(struct sock *sk, const int optname, int len, /* * Receiver Half-Connection Routines */ + +/* CCID3 feedback types */ +enum ccid3_fback_type { + CCID3_FBACK_NONE = 0, + CCID3_FBACK_INITIAL, + CCID3_FBACK_PERIODIC, + CCID3_FBACK_PARAM_CHANGE +}; + #ifdef CONFIG_IP_DCCP_CCID3_DEBUG static const char *ccid3_rx_state_name(enum ccid3_hc_rx_states state) { @@ -673,53 +682,60 @@ static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len) hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9); } -static void ccid3_hc_rx_send_feedback(struct sock *sk) +static void ccid3_hc_rx_send_feedback(struct sock *sk, + const struct sk_buff *skb, + enum ccid3_fback_type fbtype) { struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); struct dccp_sock *dp = dccp_sk(sk); - struct tfrc_rx_hist_entry *packet; ktime_t now; - suseconds_t delta; + s64 delta = 0; ccid3_pr_debug("%s(%p) - entry \n", dccp_role(sk), sk); + if (unlikely(hcrx->ccid3hcrx_state == TFRC_RSTATE_TERM)) + return; + now = ktime_get_real(); - switch (hcrx->ccid3hcrx_state) { - case TFRC_RSTATE_NO_DATA: + switch (fbtype) { + case CCID3_FBACK_INITIAL: hcrx->ccid3hcrx_x_recv = 0; + hcrx->ccid3hcrx_pinv = ~0U; /* see RFC 4342, 8.5 */ break; - case TFRC_RSTATE_DATA: - delta = ktime_us_delta(now, - hcrx->ccid3hcrx_tstamp_last_feedback); - DCCP_BUG_ON(delta < 0); - hcrx->ccid3hcrx_x_recv = - scaled_div32(hcrx->ccid3hcrx_bytes_recv, delta); + case CCID3_FBACK_PARAM_CHANGE: + /* + * When parameters change (new loss or p > p_prev), we do not + * have a reliable estimate for R_m of [RFC 3448, 6.2] and so + * need to reuse the previous value of X_recv. However, when + * X_recv was 0 (due to early loss), this would kill X down to + * s/t_mbi (i.e. one packet in 64 seconds). + * To avoid such drastic reduction, we approximate X_recv as + * the number of bytes since last feedback. + * This is a safe fallback, since X is bounded above by X_calc. + */ + if (hcrx->ccid3hcrx_x_recv > 0) + break; + /* fall through */ + case CCID3_FBACK_PERIODIC: + delta = ktime_us_delta(now, hcrx->ccid3hcrx_tstamp_last_feedback); + if (delta <= 0) + DCCP_BUG("delta (%ld) <= 0", (long)delta); + else + hcrx->ccid3hcrx_x_recv = + scaled_div32(hcrx->ccid3hcrx_bytes_recv, delta); break; - case TFRC_RSTATE_TERM: - DCCP_BUG("%s(%p) - Illegal state TERM", dccp_role(sk), sk); + default: return; } - packet = tfrc_rx_hist_find_data_packet(&hcrx->ccid3hcrx_hist); - if (unlikely(packet == NULL)) { - DCCP_WARN("%s(%p), no data packet in history!\n", - dccp_role(sk), sk); - return; - } + ccid3_pr_debug("Interval %ldusec, X_recv=%u, 1/p=%u\n", (long)delta, + hcrx->ccid3hcrx_x_recv, hcrx->ccid3hcrx_pinv); hcrx->ccid3hcrx_tstamp_last_feedback = now; - hcrx->ccid3hcrx_ccval_last_counter = packet->tfrchrx_ccval; + hcrx->ccid3hcrx_last_counter = dccp_hdr(skb)->dccph_ccval; hcrx->ccid3hcrx_bytes_recv = 0; - if (hcrx->ccid3hcrx_p == 0) - hcrx->ccid3hcrx_pinv = ~0U; /* see RFC 4342, 8.5 */ - else if (hcrx->ccid3hcrx_p > 1000000) { - DCCP_WARN("p (%u) > 100%%\n", hcrx->ccid3hcrx_p); - hcrx->ccid3hcrx_pinv = 1; /* use 100% in this case */ - } else - hcrx->ccid3hcrx_pinv = 1000000 / hcrx->ccid3hcrx_p; - dp->dccps_hc_rx_insert_options = 1; dccp_send_ack(sk); } @@ -753,162 +769,52 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb) static int ccid3_hc_rx_detect_loss(struct sock *sk, struct tfrc_rx_hist_entry *packet) { - struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); - struct tfrc_rx_hist_entry *rx_hist = - tfrc_rx_hist_head(&hcrx->ccid3hcrx_hist); - u64 seqno = packet->tfrchrx_seqno; - u64 tmp_seqno; - int loss = 0; - u8 ccval; - - - tmp_seqno = hcrx->ccid3hcrx_seqno_nonloss; - - if (!rx_hist || - follows48(packet->tfrchrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) { - hcrx->ccid3hcrx_seqno_nonloss = seqno; - hcrx->ccid3hcrx_ccval_nonloss = packet->tfrchrx_ccval; - goto detect_out; - } - - - while (dccp_delta_seqno(hcrx->ccid3hcrx_seqno_nonloss, seqno) - > TFRC_RECV_NUM_LATE_LOSS) { - loss = 1; - dccp_li_update_li(sk, - &hcrx->ccid3hcrx_li_hist, - &hcrx->ccid3hcrx_hist, - hcrx->ccid3hcrx_tstamp_last_feedback, - hcrx->ccid3hcrx_s, - hcrx->ccid3hcrx_bytes_recv, - hcrx->ccid3hcrx_x_recv, - hcrx->ccid3hcrx_seqno_nonloss, - hcrx->ccid3hcrx_ccval_nonloss); - tmp_seqno = hcrx->ccid3hcrx_seqno_nonloss; - dccp_inc_seqno(&tmp_seqno); - hcrx->ccid3hcrx_seqno_nonloss = tmp_seqno; - dccp_inc_seqno(&tmp_seqno); - while (tfrc_rx_hist_find_entry(&hcrx->ccid3hcrx_hist, - tmp_seqno, &ccval)) { - hcrx->ccid3hcrx_seqno_nonloss = tmp_seqno; - hcrx->ccid3hcrx_ccval_nonloss = ccval; - dccp_inc_seqno(&tmp_seqno); - } - } - - /* FIXME - this code could be simplified with above while */ - /* but works at moment */ - if (follows48(packet->tfrchrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) { - hcrx->ccid3hcrx_seqno_nonloss = seqno; - hcrx->ccid3hcrx_ccval_nonloss = packet->tfrchrx_ccval; - } - -detect_out: - tfrc_rx_hist_add_packet(&hcrx->ccid3hcrx_hist, - &hcrx->ccid3hcrx_li_hist, packet, - hcrx->ccid3hcrx_seqno_nonloss); - return loss; + return 0; } static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) { struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); - const struct dccp_options_received *opt_recv; - struct tfrc_rx_hist_entry *packet; - u32 p_prev, r_sample, rtt_prev; - int loss, payload_size; - ktime_t now; - - opt_recv = &dccp_sk(sk)->dccps_options_received; + const u32 payload_size = skb->len - dccp_hdr(skb)->dccph_doff * 4; + const u32 ndp = dccp_sk(sk)->dccps_options_received.dccpor_ndp; + enum ccid3_fback_type do_feedback = CCID3_FBACK_NONE; + const bool is_data_packet = dccp_data_packet(skb); - switch (DCCP_SKB_CB(skb)->dccpd_type) { - case DCCP_PKT_ACK: - if (hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA) + if (hcrx->ccid3hcrx_state != TFRC_RSTATE_NO_DATA) { + if (tfrc_rx_hist_duplicate(&hcrx->ccid3hcrx_hist, skb)) return; - case DCCP_PKT_DATAACK: - if (opt_recv->dccpor_timestamp_echo == 0) - break; - r_sample = dccp_timestamp() - opt_recv->dccpor_timestamp_echo; - rtt_prev = hcrx->ccid3hcrx_rtt; - r_sample = dccp_sample_rtt(sk, 10 * r_sample); - - if (hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA) - hcrx->ccid3hcrx_rtt = r_sample; - else - hcrx->ccid3hcrx_rtt = (hcrx->ccid3hcrx_rtt * 9) / 10 + - r_sample / 10; - - if (rtt_prev != hcrx->ccid3hcrx_rtt) - ccid3_pr_debug("%s(%p), New RTT=%uus, elapsed time=%u\n", - dccp_role(sk), sk, hcrx->ccid3hcrx_rtt, - opt_recv->dccpor_elapsed_time); - break; - case DCCP_PKT_DATA: - break; - default: /* We're not interested in other packet types, move along */ - return; - } - - packet = tfrc_rx_hist_entry_new(opt_recv->dccpor_ndp, skb, GFP_ATOMIC); - if (unlikely(packet == NULL)) { - DCCP_WARN("%s(%p), Not enough mem to add rx packet " - "to history, consider it lost!\n", dccp_role(sk), sk); - return; - } - - loss = ccid3_hc_rx_detect_loss(sk, packet); - - if (DCCP_SKB_CB(skb)->dccpd_type == DCCP_PKT_ACK) - return; - - payload_size = skb->len - dccp_hdr(skb)->dccph_doff * 4; - ccid3_hc_rx_update_s(hcrx, payload_size); - switch (hcrx->ccid3hcrx_state) { - case TFRC_RSTATE_NO_DATA: - ccid3_pr_debug("%s(%p, state=%s), skb=%p, sending initial " - "feedback\n", dccp_role(sk), sk, - dccp_state_name(sk->sk_state), skb); - ccid3_hc_rx_send_feedback(sk); - ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA); - return; - case TFRC_RSTATE_DATA: - hcrx->ccid3hcrx_bytes_recv += payload_size; - if (loss) - break; - - now = ktime_get_real(); - if ((ktime_us_delta(now, hcrx->ccid3hcrx_tstamp_last_ack) - - (s64)hcrx->ccid3hcrx_rtt) >= 0) { - hcrx->ccid3hcrx_tstamp_last_ack = now; - ccid3_hc_rx_send_feedback(sk); + /* Handle pending losses and otherwise check for new loss */ + if (!tfrc_rx_hist_new_loss_indicated(&hcrx->ccid3hcrx_hist, + skb, ndp) && + /* Handle data packets: RTT sampling and monitoring p */ + is_data_packet) { + + if (list_empty(&hcrx->ccid3hcrx_li_hist)) { /* no loss so far: p = 0 */ + const u32 sample = tfrc_rx_hist_sample_rtt(&hcrx->ccid3hcrx_hist, skb); + if (sample != 0) + hcrx->ccid3hcrx_rtt = + tfrc_ewma(hcrx->ccid3hcrx_rtt, sample, 9); + } + + ccid3_hc_rx_update_s(hcrx, payload_size); + /* check if the periodic once-per-RTT feedback is due; RFC 4342, 10.3 */ + if (SUB16(dccp_hdr(skb)->dccph_ccval, + hcrx->ccid3hcrx_last_counter) > 3) + do_feedback = CCID3_FBACK_PERIODIC; } - return; - case TFRC_RSTATE_TERM: - DCCP_BUG("%s(%p) - Illegal state TERM", dccp_role(sk), sk); - return; - } - - /* Dealing with packet loss */ - ccid3_pr_debug("%s(%p, state=%s), data loss! Reacting...\n", - dccp_role(sk), sk, dccp_state_name(sk->sk_state)); - - p_prev = hcrx->ccid3hcrx_p; + } else if (is_data_packet) { + do_feedback = CCID3_FBACK_INITIAL; + ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA); + hcrx->ccid3hcrx_s = payload_size; + } - /* Calculate loss event rate */ - if (!list_empty(&hcrx->ccid3hcrx_li_hist)) { - u32 i_mean = dccp_li_hist_calc_i_mean(&hcrx->ccid3hcrx_li_hist); + hcrx->ccid3hcrx_bytes_recv += payload_size; - /* Scaling up by 1000000 as fixed decimal */ - if (i_mean != 0) - hcrx->ccid3hcrx_p = 1000000 / i_mean; - } else - DCCP_BUG("empty loss history"); + tfrc_rx_hist_add_packet(&hcrx->ccid3hcrx_hist, skb, ndp); - if (hcrx->ccid3hcrx_p > p_prev) { - ccid3_hc_rx_send_feedback(sk); - return; - } + if (do_feedback) + ccid3_hc_rx_send_feedback(sk, skb, do_feedback); } static int ccid3_hc_rx_init(struct ccid *ccid, struct sock *sk) @@ -918,11 +824,8 @@ static int ccid3_hc_rx_init(struct ccid *ccid, struct sock *sk) ccid3_pr_debug("entry\n"); hcrx->ccid3hcrx_state = TFRC_RSTATE_NO_DATA; - INIT_LIST_HEAD(&hcrx->ccid3hcrx_hist); INIT_LIST_HEAD(&hcrx->ccid3hcrx_li_hist); - hcrx->ccid3hcrx_tstamp_last_feedback = - hcrx->ccid3hcrx_tstamp_last_ack = ktime_get_real(); - return 0; + return tfrc_rx_hist_alloc(&hcrx->ccid3hcrx_hist); } static void ccid3_hc_rx_exit(struct sock *sk) diff --git a/net/dccp/ccids/ccid3.h b/net/dccp/ccids/ccid3.h index b842a7d..3c33dc6 100644 --- a/net/dccp/ccids/ccid3.h +++ b/net/dccp/ccids/ccid3.h @@ -1,7 +1,8 @@ /* * net/dccp/ccids/ccid3.h * - * Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand. + * Copyright (c) 2005-7 The University of Waikato, Hamilton, New Zealand. + * Copyright (c) 2007 The University of Aberdeen, Scotland, UK * * An implementation of the DCCP protocol * @@ -135,9 +136,7 @@ enum ccid3_hc_rx_states { * @ccid3hcrx_x_recv - Receiver estimate of send rate (RFC 3448 4.3) * @ccid3hcrx_rtt - Receiver estimate of rtt (non-standard) * @ccid3hcrx_p - current loss event rate (RFC 3448 5.4) - * @ccid3hcrx_seqno_nonloss - Last received non-loss sequence number - * @ccid3hcrx_ccval_nonloss - Last received non-loss Window CCVal - * @ccid3hcrx_ccval_last_counter - Tracks window counter (RFC 4342, 8.1) + * @ccid3hcrx_last_counter - Tracks window counter (RFC 4342, 8.1) * @ccid3hcrx_state - receiver state, one of %ccid3_hc_rx_states * @ccid3hcrx_bytes_recv - Total sum of DCCP payload bytes * @ccid3hcrx_tstamp_last_feedback - Time at which last feedback was sent @@ -152,14 +151,11 @@ struct ccid3_hc_rx_sock { #define ccid3hcrx_x_recv ccid3hcrx_tfrc.tfrcrx_x_recv #define ccid3hcrx_rtt ccid3hcrx_tfrc.tfrcrx_rtt #define ccid3hcrx_p ccid3hcrx_tfrc.tfrcrx_p - u64 ccid3hcrx_seqno_nonloss:48, - ccid3hcrx_ccval_nonloss:4, - ccid3hcrx_ccval_last_counter:4; + u8 ccid3hcrx_last_counter:4; enum ccid3_hc_rx_states ccid3hcrx_state:8; u32 ccid3hcrx_bytes_recv; ktime_t ccid3hcrx_tstamp_last_feedback; - ktime_t ccid3hcrx_tstamp_last_ack; - struct list_head ccid3hcrx_hist; + struct tfrc_rx_hist ccid3hcrx_hist; struct list_head ccid3hcrx_li_hist; u16 ccid3hcrx_s; u32 ccid3hcrx_pinv; diff --git a/net/dccp/ccids/lib/loss_interval.c b/net/dccp/ccids/lib/loss_interval.c index a5f59af..71080ca 100644 --- a/net/dccp/ccids/lib/loss_interval.c +++ b/net/dccp/ccids/lib/loss_interval.c @@ -16,6 +16,7 @@ #include "../../dccp.h" #include "loss_interval.h" #include "packet_history.h" +#include "packet_history_internal.h" #include "tfrc.h" #define DCCP_LI_HIST_IVAL_F_LENGTH 8 @@ -129,6 +130,13 @@ static u32 dccp_li_calc_first_li(struct sock *sk, u16 s, u32 bytes_recv, u32 previous_x_recv) { +/* + * FIXME: + * Will be rewritten in the upcoming new loss intervals code. + * Has to be commented ou because it relies on the old rx history + * data structures + */ +#if 0 struct tfrc_rx_hist_entry *entry, *next, *tail = NULL; u32 x_recv, p; suseconds_t rtt, delta; @@ -216,10 +224,10 @@ found: dccp_pr_debug("%s(%p), receive rate=%u bytes/s, implied " "loss rate=%u\n", dccp_role(sk), sk, x_recv, p); - if (p == 0) - return ~0; - else + if (p != 0) return 1000000 / p; +#endif + return ~0; } void dccp_li_update_li(struct sock *sk, diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c index 255cca1..80ad4cd 100644 --- a/net/dccp/ccids/lib/packet_history.c +++ b/net/dccp/ccids/lib/packet_history.c @@ -36,7 +36,10 @@ */ #include <linux/string.h> +#include <linux/slab.h> #include "packet_history.h" +#include "../../dccp.h" +#include "packet_history_internal.h" /** * tfrc_tx_hist_entry - Simple singly-linked TX history list @@ -111,154 +114,195 @@ u32 tfrc_tx_hist_rtt(struct tfrc_tx_hist_entry *head, const u64 seqno, } EXPORT_SYMBOL_GPL(tfrc_tx_hist_rtt); -/* - * Receiver History Routines +/** + * tfrc_rx_hist_index - index to reach n-th entry after loss_start */ -static struct kmem_cache *tfrc_rx_hist_slab; +static inline u8 tfrc_rx_hist_index(const struct tfrc_rx_hist *h, const u8 n) +{ + return (h->loss_start + n) & TFRC_NDUPACK; +} -struct tfrc_rx_hist_entry *tfrc_rx_hist_entry_new(const u32 ndp, - const struct sk_buff *skb, - const gfp_t prio) +/** + * tfrc_rx_hist_last_rcv - entry with highest-received-seqno so far + */ +static inline struct tfrc_rx_hist_entry * + tfrc_rx_hist_last_rcv(const struct tfrc_rx_hist *h) { - struct tfrc_rx_hist_entry *entry = kmem_cache_alloc(tfrc_rx_hist_slab, - prio); + return h->ring + tfrc_rx_hist_index(h, h->loss_count); +} - if (entry != NULL) { - const struct dccp_hdr *dh = dccp_hdr(skb); +/** + * tfrc_rx_hist_entry - return the n-th history entry after loss_start + */ +static inline struct tfrc_rx_hist_entry * + tfrc_rx_hist_entry(const struct tfrc_rx_hist *h, const u8 n) +{ + return h->ring + tfrc_rx_hist_index(h, n); +} - entry->tfrchrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq; - entry->tfrchrx_ccval = dh->dccph_ccval; - entry->tfrchrx_type = dh->dccph_type; - entry->tfrchrx_ndp = ndp; - entry->tfrchrx_tstamp = ktime_get_real(); - } +/** + * tfrc_rx_hist_loss_prev - entry with highest-received-seqno before loss was detected + */ +static inline struct tfrc_rx_hist_entry * + tfrc_rx_hist_loss_prev(const struct tfrc_rx_hist *h) +{ + return h->ring + h->loss_start; +} + +/* + * Receiver History Routines + */ +static struct kmem_cache *tfrc_rx_hist_slab; - return entry; +void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h, + const struct sk_buff *skb, + const u32 ndp) +{ + struct tfrc_rx_hist_entry *entry = tfrc_rx_hist_last_rcv(h); + const struct dccp_hdr *dh = dccp_hdr(skb); + + entry->tfrchrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq; + entry->tfrchrx_ccval = dh->dccph_ccval; + entry->tfrchrx_type = dh->dccph_type; + entry->tfrchrx_ndp = ndp; + entry->tfrchrx_tstamp = ktime_get_real(); } -EXPORT_SYMBOL_GPL(tfrc_rx_hist_entry_new); +EXPORT_SYMBOL_GPL(tfrc_rx_hist_add_packet); static inline void tfrc_rx_hist_entry_delete(struct tfrc_rx_hist_entry *entry) { kmem_cache_free(tfrc_rx_hist_slab, entry); } -int tfrc_rx_hist_find_entry(const struct list_head *list, const u64 seq, - u8 *ccval) +/* has the packet contained in skb been seen before? */ +int tfrc_rx_hist_duplicate(struct tfrc_rx_hist *h, struct sk_buff *skb) { - struct tfrc_rx_hist_entry *packet = NULL, *entry; + const u64 seq = DCCP_SKB_CB(skb)->dccpd_seq; + int i; - list_for_each_entry(entry, list, tfrchrx_node) - if (entry->tfrchrx_seqno == seq) { - packet = entry; - break; - } + if (dccp_delta_seqno(tfrc_rx_hist_loss_prev(h)->tfrchrx_seqno, seq) <= 0) + return 1; - if (packet) - *ccval = packet->tfrchrx_ccval; + for (i = 1; i <= h->loss_count; i++) + if (tfrc_rx_hist_entry(h, i)->tfrchrx_seqno == seq) + return 1; - return packet != NULL; + return 0; } +EXPORT_SYMBOL_GPL(tfrc_rx_hist_duplicate); -EXPORT_SYMBOL_GPL(tfrc_rx_hist_find_entry); -struct tfrc_rx_hist_entry * - tfrc_rx_hist_find_data_packet(const struct list_head *list) +/* initialise loss detection and disable RTT sampling */ +static inline void tfrc_rx_hist_loss_indicated(struct tfrc_rx_hist *h) { - struct tfrc_rx_hist_entry *entry, *packet = NULL; + h->loss_count = 1; +} - list_for_each_entry(entry, list, tfrchrx_node) - if (entry->tfrchrx_type == DCCP_PKT_DATA || - entry->tfrchrx_type == DCCP_PKT_DATAACK) { - packet = entry; - break; - } +/* indicate whether previously a packet was detected missing */ +static inline int tfrc_rx_hist_loss_pending(const struct tfrc_rx_hist *h) +{ + return h->loss_count; +} - return packet; +/* any data packets missing between last reception and skb ? */ +int tfrc_rx_hist_new_loss_indicated(struct tfrc_rx_hist *h, + const struct sk_buff *skb, u32 ndp) +{ + int delta = dccp_delta_seqno(tfrc_rx_hist_last_rcv(h)->tfrchrx_seqno, + DCCP_SKB_CB(skb)->dccpd_seq); + + if (delta > 1 && ndp < delta) + tfrc_rx_hist_loss_indicated(h); + + return tfrc_rx_hist_loss_pending(h); +} +EXPORT_SYMBOL_GPL(tfrc_rx_hist_new_loss_indicated); + +void tfrc_rx_hist_purge(struct tfrc_rx_hist *h) +{ + kmem_cache_free(tfrc_rx_hist_slab, h->ring); + h->ring = NULL; } -EXPORT_SYMBOL_GPL(tfrc_rx_hist_find_data_packet); +EXPORT_SYMBOL_GPL(tfrc_rx_hist_purge); -void tfrc_rx_hist_add_packet(struct list_head *rx_list, - struct list_head *li_list, - struct tfrc_rx_hist_entry *packet, - u64 nonloss_seqno) +int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h) { - struct tfrc_rx_hist_entry *entry, *next; - u8 num_later = 0; - - list_add(&packet->tfrchrx_node, rx_list); - - num_later = TFRC_RECV_NUM_LATE_LOSS + 1; - - if (!list_empty(li_list)) { - list_for_each_entry_safe(entry, next, rx_list, tfrchrx_node) { - if (num_later == 0) { - if (after48(nonloss_seqno, - entry->tfrchrx_seqno)) { - list_del_init(&entry->tfrchrx_node); - tfrc_rx_hist_entry_delete(entry); - } - } else if (tfrc_rx_hist_entry_data_packet(entry)) - --num_later; - } - } else { - int step = 0; - u8 win_count = 0; /* Not needed, but lets shut up gcc */ - int tmp; - /* - * We have no loss interval history so we need at least one - * rtt:s of data packets to approximate rtt. - */ - list_for_each_entry_safe(entry, next, rx_list, tfrchrx_node) { - if (num_later == 0) { - switch (step) { - case 0: - step = 1; - /* OK, find next data packet */ - num_later = 1; - break; - case 1: - step = 2; - /* OK, find next data packet */ - num_later = 1; - win_count = entry->tfrchrx_ccval; - break; - case 2: - tmp = win_count - entry->tfrchrx_ccval; - if (tmp < 0) - tmp += TFRC_WIN_COUNT_LIMIT; - if (tmp > TFRC_WIN_COUNT_PER_RTT + 1) { - /* - * We have found a packet older - * than one rtt remove the rest - */ - step = 3; - } else /* OK, find next data packet */ - num_later = 1; - break; - case 3: - list_del_init(&entry->tfrchrx_node); - tfrc_rx_hist_entry_delete(entry); - break; - } - } else if (tfrc_rx_hist_entry_data_packet(entry)) - --num_later; - } - } + h->ring = kmem_cache_alloc(tfrc_rx_hist_slab, GFP_ATOMIC); + h->loss_count = h->loss_start = 0; + return h->ring == NULL; } +EXPORT_SYMBOL_GPL(tfrc_rx_hist_alloc); -EXPORT_SYMBOL_GPL(tfrc_rx_hist_add_packet); +/** + * tfrc_rx_hist_rtt_last_s - reference entry to compute RTT samples against + */ +static inline struct tfrc_rx_hist_entry * + tfrc_rx_hist_rtt_last_s(const struct tfrc_rx_hist *h) +{ + return h->ring; +} -void tfrc_rx_hist_purge(struct list_head *list) +/** + * tfrc_rx_hist_rtt_prev_s: previously suitable (wrt rtt_last_s) RTT-sampling entry + */ +static inline struct tfrc_rx_hist_entry * + tfrc_rx_hist_rtt_prev_s(const struct tfrc_rx_hist *h) { - struct tfrc_rx_hist_entry *entry, *next; + return h->ring + h->rtt_sample_prev; +} - list_for_each_entry_safe(entry, next, list, tfrchrx_node) { - list_del_init(&entry->tfrchrx_node); - tfrc_rx_hist_entry_delete(entry); +/** + * tfrc_rx_hist_sample_rtt - Sample RTT from timestamp / CCVal + * Based on ideas presented in RFC 4342, 8.1. Returns 0 if it was not able + * to compute a sample with given data - calling function should check this. + */ +u32 tfrc_rx_hist_sample_rtt(struct tfrc_rx_hist *h, const struct sk_buff *skb) +{ + u32 sample = 0, + delta_v = SUB16(dccp_hdr(skb)->dccph_ccval, + tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval); + + if (delta_v < 1 || delta_v > 4) { /* unsuitable CCVal delta */ + if (h->rtt_sample_prev == 2) { /* previous candidate stored */ + sample = SUB16(tfrc_rx_hist_rtt_prev_s(h)->tfrchrx_ccval, + tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval); + if (sample) + sample = 4 / sample * + ktime_us_delta(tfrc_rx_hist_rtt_prev_s(h)->tfrchrx_tstamp, + tfrc_rx_hist_rtt_last_s(h)->tfrchrx_tstamp); + else /* + * FIXME: This condition is in principle not + * possible but occurs when CCID is used for + * two-way data traffic. I have tried to trace + * it, but the cause does not seem to be here. + */ + DCCP_BUG("please report to dccp@vger.kernel.org" + " => prev = %u, last = %u", + tfrc_rx_hist_rtt_prev_s(h)->tfrchrx_ccval, + tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval); + } else if (delta_v < 1) { + h->rtt_sample_prev = 1; + goto keep_ref_for_next_time; + } + + } else if (delta_v == 4) /* optimal match */ + sample = ktime_to_us(net_timedelta(tfrc_rx_hist_rtt_last_s(h)->tfrchrx_tstamp)); + else { /* suboptimal match */ + h->rtt_sample_prev = 2; + goto keep_ref_for_next_time; } -} -EXPORT_SYMBOL_GPL(tfrc_rx_hist_purge); + if (unlikely(sample > DCCP_SANE_RTT_MAX)) { + DCCP_WARN("RTT sample %u too large, using max\n", sample); + sample = DCCP_SANE_RTT_MAX; + } + + h->rtt_sample_prev = 0; /* use current entry as next reference */ +keep_ref_for_next_time: + + return sample; +} +EXPORT_SYMBOL_GPL(tfrc_rx_hist_sample_rtt); __init int packet_history_init(void) { @@ -269,7 +313,8 @@ __init int packet_history_init(void) goto out_err; tfrc_rx_hist_slab = kmem_cache_create("tfrc_rx_hist", - sizeof(struct tfrc_rx_hist_entry), 0, + (sizeof(struct tfrc_rx_hist_entry) * + (TFRC_NDUPACK + 1)), 0, SLAB_HWCACHE_ALIGN, NULL); if (tfrc_rx_hist_slab == NULL) goto out_free_tx; diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h index 5b0b983..6871e51 100644 --- a/net/dccp/ccids/lib/packet_history.h +++ b/net/dccp/ccids/lib/packet_history.h @@ -37,15 +37,9 @@ #define _DCCP_PKT_HIST_ #include <linux/ktime.h> -#include <linux/list.h> -#include <linux/slab.h> -#include "tfrc.h" +#include <linux/types.h> -/* Number of later packets received before one is considered lost */ -#define TFRC_RECV_NUM_LATE_LOSS 3 - -#define TFRC_WIN_COUNT_PER_RTT 4 -#define TFRC_WIN_COUNT_LIMIT 16 +struct sk_buff; struct tfrc_tx_hist_entry; @@ -54,54 +48,38 @@ extern void tfrc_tx_hist_purge(struct tfrc_tx_hist_entry **headp); extern u32 tfrc_tx_hist_rtt(struct tfrc_tx_hist_entry *head, const u64 seqno, const ktime_t now); -/* - * Receiver History data structures and declarations - */ -struct tfrc_rx_hist_entry { - struct list_head tfrchrx_node; - u64 tfrchrx_seqno:48, - tfrchrx_ccval:4, - tfrchrx_type:4; - u32 tfrchrx_ndp; /* In fact it is from 8 to 24 bits */ - ktime_t tfrchrx_tstamp; -}; - -extern struct tfrc_rx_hist_entry * - tfrc_rx_hist_entry_new(const u32 ndp, - const struct sk_buff *skb, - const gfp_t prio); +struct tfrc_rx_hist_entry; -static inline struct tfrc_rx_hist_entry * - tfrc_rx_hist_head(struct list_head *list) -{ - struct tfrc_rx_hist_entry *head = NULL; +/* Subtraction a-b modulo-16, respects circular wrap-around */ +#define SUB16(a, b) (((a) + 16 - (b)) & 0xF) - if (!list_empty(list)) - head = list_entry(list->next, struct tfrc_rx_hist_entry, - tfrchrx_node); - return head; -} +/* Number of packets to wait after a missing packet (RFC 4342, 6.1) */ +#define TFRC_NDUPACK 3 -extern int tfrc_rx_hist_find_entry(const struct list_head *list, const u64 seq, - u8 *ccval); -extern struct tfrc_rx_hist_entry * - tfrc_rx_hist_find_data_packet(const struct list_head *list); - -extern void tfrc_rx_hist_add_packet(struct list_head *rx_list, - struct list_head *li_list, - struct tfrc_rx_hist_entry *packet, - u64 nonloss_seqno); - -extern void tfrc_rx_hist_purge(struct list_head *list); +/** + * tfrc_rx_hist - RX history structure for TFRC-based protocols + * + * @ring: Packet history for RTT sampling and loss detection + * @loss_count: Number of entries in circular history + * @loss_start: Movable index (for loss detection) + * @rtt_sample_prev: Used during RTT sampling, points to candidate entry + */ +struct tfrc_rx_hist { + struct tfrc_rx_hist_entry *ring; + u8 loss_count:2, + loss_start:2; +#define rtt_sample_prev loss_start +}; -static inline int - tfrc_rx_hist_entry_data_packet(const struct tfrc_rx_hist_entry *entry) -{ - return entry->tfrchrx_type == DCCP_PKT_DATA || - entry->tfrchrx_type == DCCP_PKT_DATAACK; -} +extern void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h, + const struct sk_buff *skb, const u32 ndp); -extern u64 tfrc_rx_hist_detect_loss(struct list_head *rx_list, - struct list_head *li_list, u8 *win_loss); +extern int tfrc_rx_hist_duplicate(struct tfrc_rx_hist *h, struct sk_buff *skb); +extern int tfrc_rx_hist_new_loss_indicated(struct tfrc_rx_hist *h, + const struct sk_buff *skb, u32 ndp); +extern u32 tfrc_rx_hist_sample_rtt(struct tfrc_rx_hist *h, + const struct sk_buff *skb); +extern int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h); +extern void tfrc_rx_hist_purge(struct tfrc_rx_hist *h); #endif /* _DCCP_PKT_HIST_ */ diff --git a/net/dccp/ccids/lib/packet_history_internal.h b/net/dccp/ccids/lib/packet_history_internal.h new file mode 100644 index 0000000..70d5c31 --- /dev/null +++ b/net/dccp/ccids/lib/packet_history_internal.h @@ -0,0 +1,67 @@ +#ifndef _DCCP_PKT_HIST_INT_ +#define _DCCP_PKT_HIST_INT_ +/* + * Packet RX/TX history data structures and routines for TFRC-based protocols. + * + * Copyright (c) 2007 The University of Aberdeen, Scotland, UK + * Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand. + * + * This code has been developed by the University of Waikato WAND + * research group. For further information please see http://www.wand.net.nz/ + * or e-mail Ian McDonald - ian.mcdonald@jandi.co.nz + * + * This code also uses code from Lulea University, rereleased as GPL by its + * authors: + * Copyright (c) 2003 Nils-Erik Mattsson, Joacim Haggmark, Magnus Erixzon + * + * Changes to meet Linux coding standards, to make it meet latest ccid3 draft + * and to make it work as a loadable module in the DCCP stack written by + * Arnaldo Carvalho de Melo <acme@conectiva.com.br>. + * + * Copyright (c) 2005 Arnaldo Carvalho de Melo <acme@conectiva.com.br> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include <linux/dccp.h> +#include <linux/ktime.h> +#include <linux/types.h> + +#define TFRC_WIN_COUNT_LIMIT 16 + +/** + * tfrc_rx_hist_entry - Store information about a single received packet + * @tfrchrx_seqno: DCCP packet sequence number + * @tfrchrx_ccval: window counter value of packet (RFC 4342, 8.1) + * @tfrchrx_ndp: the NDP count (if any) of the packet + * @tfrchrx_tstamp: actual receive time of packet + */ +struct tfrc_rx_hist_entry { + u64 tfrchrx_seqno:48, + tfrchrx_ccval:4, + tfrchrx_type:4; + u32 tfrchrx_ndp; /* In fact it is from 8 to 24 bits */ + ktime_t tfrchrx_tstamp; +}; + +static inline bool tfrc_rx_hist_entry_data_packet(const struct tfrc_rx_hist_entry *h) +{ + return h->tfrchrx_type == DCCP_PKT_DATA || + h->tfrchrx_type == DCCP_PKT_DATAACK || + h->tfrchrx_type == DCCP_PKT_REQUEST || + h->tfrchrx_type == DCCP_PKT_RESPONSE; +} + +#endif /* _DCCP_PKT_HIST_INT_ */ -- 1.5.3.4 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 7/7] [TFRC] New rx history code 2007-12-02 21:36 ` [PATCH 7/7] [TFRC] New rx history code Arnaldo Carvalho de Melo @ 2007-12-04 6:55 ` Gerrit Renker 2007-12-04 11:59 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 28+ messages in thread From: Gerrit Renker @ 2007-12-04 6:55 UTC (permalink / raw) To: Arnaldo Carvalho de Melo; +Cc: netdev, dccp, Ingo Molnar NAK. You have changed the control flow of the algorithm and the underlying data structure. Originally it had been an array of pointers, and this had been a design decision right from the first submissions in March. From I of http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/loss_detection/loss_detection_algorithm_notes.txt "1. 'ring' is an array of pointers rather than a contiguous area in memory. The reason is that, when having to swap adjacent entries to sort the history, it is easier to swap pointers than to copy (heavy-weight) history-entry data structs." But in your algorithm it becomes a normal linear array. As a consequence all subsequent code needs to be rewritten to use copying instead of swapping pointers. And there is a lot of that, since the loss detection code makes heavy use of swapping entries in order to cope with reordered and/or delayed packets. This is not what I would call "entirely equivalent" as you claim. Your algorithm is fundamentally different, the control flow is not the same, nor are the underlying data structures. | Credit here goes to Gerrit Renker, that provided the initial implementation for | this new codebase. | | I modified it just to try to make it closer to the existing API, hide details from | the CCIDs and fix a couple bugs found in the initial implementation. What is "a couple bugs"? So far you have pointed out only one problem and that was agreed, it can be fixed. But it remains a side issue, it is not a failure of the algorithm per se. What is worse here is that you take this single occurrence as a kind of carte blanche to mess around with the code as you please. Where please are the other bugs you are referring to? I am not buying into this and am not convinced that you are understanding what you are doing. It is the third time that you take patches which have been submitted on dccp@vger for over half a year, for which you have offered no more than a sentence of feedback, release them under your name, and introduce fundamental changes which alter the behaviour. The first instance was the set of ktime patches which I had developed for the test tree and which you extended to DCCP. In this case it were in fact three bugs that you added in migrating my patches. I know this because it messed up the way CCID3 behaved and since I spent several hours chasing these. And, in contrast to the above, it is not a mere claim: that is recorded in the netdev mail archives. The second instance was when you released the TX history patches under your name. Pro forma there was an RFC patch at 3pm, de facto it was checked in a few hours later: input not welcome. The third instance is now where you change in essence the floor underneath this work. Since you are using a different basis, the algorithm is (in addition to the changes in control flow) necessarily different. I have provided documentation, written test modules, and am able to prove that the algorithm as submitted works reasonably correct. In addition, the behaviour has been verified using regression tests. I am not prepared to take your claims and expressions of "deepest respect" at face value since your actions - not your words - show that you are, in fact, not dealing respectfully with work which has taken months to complete and verify. During 9 months on dccp@vger you did not provide so much as a paragraph of feedback. Instead you mopped up code from the test tree, modified it according to your own whims and now, in the circle of your invitation-only buddies, start to talk about having discussions and iterations. The only iteration I can see is in fixing up the things you introduced, so it is not you who has to pay the price. Sorry, unless you can offer a more mature model of collaboration, consider me out of this and the patches summarily withdrawn. I am not prepared to throw away several months of work just because you feel inspired to do as you please with the work of others. Gerrit | Original changeset comment from Gerrit: | ----------- | This provides a new, self-contained and generic RX history service for TFRC | based protocols. | | Details: | * new data structure, initialisation and cleanup routines; | * allocation of dccp_rx_hist entries local to packet_history.c, | as a service exported by the dccp_tfrc_lib module. | * interface to automatically track highest-received seqno; | * receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1); | * a generic function to test for `data packets' as per RFC 4340, sec. 7.7. | ----------- | | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> | --- | net/dccp/ccids/ccid3.c | 255 ++++++++---------------- | net/dccp/ccids/ccid3.h | 14 +- | net/dccp/ccids/lib/loss_interval.c | 14 +- | net/dccp/ccids/lib/packet_history.c | 277 +++++++++++++++----------- | net/dccp/ccids/lib/packet_history.h | 82 +++----- | net/dccp/ccids/lib/packet_history_internal.h | 67 ++++++ | 6 files changed, 353 insertions(+), 356 deletions(-) | create mode 100644 net/dccp/ccids/lib/packet_history_internal.h | | diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c | index 5ff5aab..af64c1d 100644 | --- a/net/dccp/ccids/ccid3.c | +++ b/net/dccp/ccids/ccid3.c | @@ -641,6 +641,15 @@ static int ccid3_hc_tx_getsockopt(struct sock *sk, const int optname, int len, | /* | * Receiver Half-Connection Routines | */ | + | +/* CCID3 feedback types */ | +enum ccid3_fback_type { | + CCID3_FBACK_NONE = 0, | + CCID3_FBACK_INITIAL, | + CCID3_FBACK_PERIODIC,| + CCID3_FBACK_PARAM_CHANGE | +}; | + | #ifdef CONFIG_IP_DCCP_CCID3_DEBUG | static const char *ccid3_rx_state_name(enum ccid3_hc_rx_states state) | { | @@ -673,53 +682,60 @@ static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len) | hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9); | } | | -static void ccid3_hc_rx_send_feedback(struct sock *sk) | +static void ccid3_hc_rx_send_feedback(struct sock *sk, | + const struct sk_buff *skb, | + enum ccid3_fback_type fbtype) | { | struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); | struct dccp_sock *dp = dccp_sk(sk); | - struct tfrc_rx_hist_entry *packet; | ktime_t now; | - suseconds_t delta; | + s64 delta = 0; | | ccid3_pr_debug("%s(%p) - entry \n", dccp_role(sk), sk); | | + if (unlikely(hcrx->ccid3hcrx_state == TFRC_RSTATE_TERM)) | + return; | + | now = ktime_get_real(); | | - switch (hcrx->ccid3hcrx_state) { | - case TFRC_RSTATE_NO_DATA: | + switch (fbtype) { | + case CCID3_FBACK_INITIAL: | hcrx->ccid3hcrx_x_recv = 0; | + hcrx->ccid3hcrx_pinv = ~0U; /* see RFC 4342, 8.5 */ | break; | - case TFRC_RSTATE_DATA: | - delta = ktime_us_delta(now, | - hcrx->ccid3hcrx_tstamp_last_feedback); | - DCCP_BUG_ON(delta < 0); | - hcrx->ccid3hcrx_x_recv = | - scaled_div32(hcrx->ccid3hcrx_bytes_recv, delta); | + case CCID3_FBACK_PARAM_CHANGE: | + /* | + * When parameters change (new loss or p > p_prev), we do not | + * have a reliable estimate for R_m of [RFC 3448, 6.2] and so | + * need to reuse the previous value of X_recv. However, when | + * X_recv was 0 (due to early loss), this would kill X down to | + * s/t_mbi (i.e. one packet in 64 seconds). | + * To avoid such drastic reduction, we approximate X_recv as | + * the number of bytes since last feedback. | + * This is a safe fallback, since X is bounded above by X_calc. | + */ | + if (hcrx->ccid3hcrx_x_recv > 0) | + break; | + /* fall through */ | + case CCID3_FBACK_PERIODIC: | + delta = ktime_us_delta(now, hcrx->ccid3hcrx_tstamp_last_feedback); | + if (delta <= 0) | + DCCP_BUG("delta (%ld) <= 0", (long)delta); | + else | + hcrx->ccid3hcrx_x_recv = | + scaled_div32(hcrx->ccid3hcrx_bytes_recv, delta); | break; | - case TFRC_RSTATE_TERM: | - DCCP_BUG("%s(%p) - Illegal state TERM", dccp_role(sk), sk); | + default: | return; | } | | - packet = tfrc_rx_hist_find_data_packet(&hcrx->ccid3hcrx_hist); | - if (unlikely(packet == NULL)) { | - DCCP_WARN("%s(%p), no data packet in history!\n", | - dccp_role(sk), sk); | - return; | - } | + ccid3_pr_debug("Interval %ldusec, X_recv=%u, 1/p=%u\n", (long)delta, | + hcrx->ccid3hcrx_x_recv, hcrx->ccid3hcrx_pinv); | | hcrx->ccid3hcrx_tstamp_last_feedback = now; | - hcrx->ccid3hcrx_ccval_last_counter = packet->tfrchrx_ccval; | + hcrx->ccid3hcrx_last_counter = dccp_hdr(skb)->dccph_ccval; | hcrx->ccid3hcrx_bytes_recv = 0; | | - if (hcrx->ccid3hcrx_p == 0) | - hcrx->ccid3hcrx_pinv = ~0U; /* see RFC 4342, 8.5 */ | - else if (hcrx->ccid3hcrx_p > 1000000) { | - DCCP_WARN("p (%u) > 100%%\n", hcrx->ccid3hcrx_p); | - hcrx->ccid3hcrx_pinv = 1; /* use 100% in this case */ | - } else | - hcrx->ccid3hcrx_pinv = 1000000 / hcrx->ccid3hcrx_p; | - | dp->dccps_hc_rx_insert_options = 1; | dccp_send_ack(sk); | } | @@ -753,162 +769,52 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb) | static int ccid3_hc_rx_detect_loss(struct sock *sk, | struct tfrc_rx_hist_entry *packet) | { | - struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); | - struct tfrc_rx_hist_entry *rx_hist = | - tfrc_rx_hist_head(&hcrx->ccid3hcrx_hist); | - u64 seqno = packet->tfrchrx_seqno; | - u64 tmp_seqno; | - int loss = 0; | - u8 ccval; | - | - | - tmp_seqno = hcrx->ccid3hcrx_seqno_nonloss; | - | - if (!rx_hist || | - follows48(packet->tfrchrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) { | - hcrx->ccid3hcrx_seqno_nonloss = seqno; | - hcrx->ccid3hcrx_ccval_nonloss = packet->tfrchrx_ccval; | - goto detect_out; | - } | - | - | - while (dccp_delta_seqno(hcrx->ccid3hcrx_seqno_nonloss, seqno) | - > TFRC_RECV_NUM_LATE_LOSS) { | - loss = 1; | - dccp_li_update_li(sk, | - &hcrx->ccid3hcrx_li_hist, | - &hcrx->ccid3hcrx_hist, | - hcrx->ccid3hcrx_tstamp_last_feedback, | - hcrx->ccid3hcrx_s, | - hcrx->ccid3hcrx_bytes_recv, | - hcrx->ccid3hcrx_x_recv, | - hcrx->ccid3hcrx_seqno_nonloss, | - hcrx->ccid3hcrx_ccval_nonloss); | - tmp_seqno = hcrx->ccid3hcrx_seqno_nonloss; | - dccp_inc_seqno(&tmp_seqno); | - hcrx->ccid3hcrx_seqno_nonloss = tmp_seqno; | - dccp_inc_seqno(&tmp_seqno); | - while (tfrc_rx_hist_find_entry(&hcrx->ccid3hcrx_hist, | - tmp_seqno, &ccval)) { | - hcrx->ccid3hcrx_seqno_nonloss = tmp_seqno; | - hcrx->ccid3hcrx_ccval_nonloss = ccval; | - dccp_inc_seqno(&tmp_seqno); | - } | - } | - | - /* FIXME - this code could be simplified with above while */ | - /* but works at moment */ | - if (follows48(packet->tfrchrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) { | - hcrx->ccid3hcrx_seqno_nonloss = seqno; | - hcrx->ccid3hcrx_ccval_nonloss = packet->tfrchrx_ccval; | - } | - | -detect_out: | - tfrc_rx_hist_add_packet(&hcrx->ccid3hcrx_hist, | - &hcrx->ccid3hcrx_li_hist, packet, | - hcrx->ccid3hcrx_seqno_nonloss); | - return loss; | + return 0; | } | | static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) | { | struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); | - const struct dccp_options_received *opt_recv; | - struct tfrc_rx_hist_entry *packet; | - u32 p_prev, r_sample, rtt_prev; | - int loss, payload_size; | - ktime_t now; | - | - opt_recv = &dccp_sk(sk)->dccps_options_received; | + const u32 payload_size = skb->len - dccp_hdr(skb)->dccph_doff * 4; | + const u32 ndp = dccp_sk(sk)->dccps_options_received.dccpor_ndp; | + enum ccid3_fback_type do_feedback = CCID3_FBACK_NONE; | + const bool is_data_packet = dccp_data_packet(skb); | | - switch (DCCP_SKB_CB(skb)->dccpd_type) { | - case DCCP_PKT_ACK: | - if (hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA) | + if (hcrx->ccid3hcrx_state != TFRC_RSTATE_NO_DATA) { | + if (tfrc_rx_hist_duplicate(&hcrx->ccid3hcrx_hist, skb)) | return; | - case DCCP_PKT_DATAACK: | - if (opt_recv->dccpor_timestamp_echo == 0) | - break; | - r_sample = dccp_timestamp() - opt_recv->dccpor_timestamp_echo; | - rtt_prev = hcrx->ccid3hcrx_rtt; | - r_sample = dccp_sample_rtt(sk, 10 * r_sample); | - | - if (hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA) | - hcrx->ccid3hcrx_rtt = r_sample; | - else | - hcrx->ccid3hcrx_rtt = (hcrx->ccid3hcrx_rtt * 9) / 10 + | - r_sample / 10; | - | - if (rtt_prev != hcrx->ccid3hcrx_rtt) | - ccid3_pr_debug("%s(%p), New RTT=%uus, elapsed time=%u\n", | - dccp_role(sk), sk, hcrx->ccid3hcrx_rtt, | - opt_recv->dccpor_elapsed_time); | - break; | - case DCCP_PKT_DATA: | - break; | - default: /* We're not interested in other packet types, move along */ | - return; | - } | - | - packet = tfrc_rx_hist_entry_new(opt_recv->dccpor_ndp, skb, GFP_ATOMIC); | - if (unlikely(packet == NULL)) { | - DCCP_WARN("%s(%p), Not enough mem to add rx packet " | - "to history, consider it lost!\n", dccp_role(sk), sk); | - return; | - } | - | - loss = ccid3_hc_rx_detect_loss(sk, packet); | - | - if (DCCP_SKB_CB(skb)->dccpd_type == DCCP_PKT_ACK) | - return; | - | - payload_size = skb->len - dccp_hdr(skb)->dccph_doff * 4; | - ccid3_hc_rx_update_s(hcrx, payload_size); | | - switch (hcrx->ccid3hcrx_state) { | - case TFRC_RSTATE_NO_DATA: | - ccid3_pr_debug("%s(%p, state=%s), skb=%p, sending initial " | - "feedback\n", dccp_role(sk), sk, | - dccp_state_name(sk->sk_state), skb); | - ccid3_hc_rx_send_feedback(sk); | - ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA); | - return; | - case TFRC_RSTATE_DATA: | - hcrx->ccid3hcrx_bytes_recv += payload_size; | - if (loss) | - break; | - | - now = ktime_get_real(); | - if ((ktime_us_delta(now, hcrx->ccid3hcrx_tstamp_last_ack) - | - (s64)hcrx->ccid3hcrx_rtt) >= 0) { | - hcrx->ccid3hcrx_tstamp_last_ack = now; | - ccid3_hc_rx_send_feedback(sk); | + /* Handle pending losses and otherwise check for new loss */ | + if (!tfrc_rx_hist_new_loss_indicated(&hcrx->ccid3hcrx_hist, | + skb, ndp) && | + /* Handle data packets: RTT sampling and monitoring p */ | + is_data_packet) { | + | + if (list_empty(&hcrx->ccid3hcrx_li_hist)) { /* no loss so far: p = 0 */ | + const u32 sample = tfrc_rx_hist_sample_rtt(&hcrx->ccid3hcrx_hist, skb); | + if (sample != 0) | + hcrx->ccid3hcrx_rtt = | + tfrc_ewma(hcrx->ccid3hcrx_rtt, sample, 9); | + } | + | + ccid3_hc_rx_update_s(hcrx, payload_size); | + /* check if the periodic once-per-RTT feedback is due; RFC 4342, 10.3 */ | + if (SUB16(dccp_hdr(skb)->dccph_ccval, | + hcrx->ccid3hcrx_last_counter) > 3) | + do_feedback = CCID3_FBACK_PERIODIC; | } | - return; | - case TFRC_RSTATE_TERM: | - DCCP_BUG("%s(%p) - Illegal state TERM", dccp_role(sk), sk); | - return; | - } | - | - /* Dealing with packet loss */ | - ccid3_pr_debug("%s(%p, state=%s), data loss! Reacting...\n", | - dccp_role(sk), sk, dccp_state_name(sk->sk_state)); | - | - p_prev = hcrx->ccid3hcrx_p; | + } else if (is_data_packet) { | + do_feedback = CCID3_FBACK_INITIAL; | + ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA); | + hcrx->ccid3hcrx_s = payload_size; | + } | | - /* Calculate loss event rate */ | - if (!list_empty(&hcrx->ccid3hcrx_li_hist)) { | - u32 i_mean = dccp_li_hist_calc_i_mean(&hcrx->ccid3hcrx_li_hist); | + hcrx->ccid3hcrx_bytes_recv += payload_size; | | - /* Scaling up by 1000000 as fixed decimal */ | - if (i_mean != 0) | - hcrx->ccid3hcrx_p = 1000000 / i_mean; | - } else | - DCCP_BUG("empty loss history"); | + tfrc_rx_hist_add_packet(&hcrx->ccid3hcrx_hist, skb, ndp); | | - if (hcrx->ccid3hcrx_p > p_prev) { | - ccid3_hc_rx_send_feedback(sk); | - return; | - } | + if (do_feedback) | + ccid3_hc_rx_send_feedback(sk, skb, do_feedback); | } | | static int ccid3_hc_rx_init(struct ccid *ccid, struct sock *sk) | @@ -918,11 +824,8 @@ static int ccid3_hc_rx_init(struct ccid *ccid, struct sock *sk) | ccid3_pr_debug("entry\n"); | | hcrx->ccid3hcrx_state = TFRC_RSTATE_NO_DATA; | - INIT_LIST_HEAD(&hcrx->ccid3hcrx_hist); | INIT_LIST_HEAD(&hcrx->ccid3hcrx_li_hist); | - hcrx->ccid3hcrx_tstamp_last_feedback = | - hcrx->ccid3hcrx_tstamp_last_ack = ktime_get_real(); | - return 0; | + return tfrc_rx_hist_alloc(&hcrx->ccid3hcrx_hist); | } | | static void ccid3_hc_rx_exit(struct sock *sk) | diff --git a/net/dccp/ccids/ccid3.h b/net/dccp/ccids/ccid3.h | index b842a7d..3c33dc6 100644 | --- a/net/dccp/ccids/ccid3.h | +++ b/net/dccp/ccids/ccid3.h | @@ -1,7 +1,8 @@ | /* | * net/dccp/ccids/ccid3.h | * | - * Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand. | + * Copyright (c) 2005-7 The University of Waikato, Hamilton, New Zealand. | + * Copyright (c) 2007 The University of Aberdeen, Scotland, UK | * | * An implementation of the DCCP protocol | * | @@ -135,9 +136,7 @@ enum ccid3_hc_rx_states { | * @ccid3hcrx_x_recv - Receiver estimate of send rate (RFC 3448 4.3) | * @ccid3hcrx_rtt - Receiver estimate of rtt (non-standard) | * @ccid3hcrx_p - current loss event rate (RFC 3448 5.4) | - * @ccid3hcrx_seqno_nonloss - Last received non-loss sequence number | - * @ccid3hcrx_ccval_nonloss - Last received non-loss Window CCVal | - * @ccid3hcrx_ccval_last_counter - Tracks window counter (RFC 4342, 8.1) | + * @ccid3hcrx_last_counter - Tracks window counter (RFC 4342, 8.1) | * @ccid3hcrx_state - receiver state, one of %ccid3_hc_rx_states | * @ccid3hcrx_bytes_recv - Total sum of DCCP payload bytes | * @ccid3hcrx_tstamp_last_feedback - Time at which last feedback was sent | @@ -152,14 +151,11 @@ struct ccid3_hc_rx_sock { | #define ccid3hcrx_x_recv ccid3hcrx_tfrc.tfrcrx_x_recv | #define ccid3hcrx_rtt ccid3hcrx_tfrc.tfrcrx_rtt | #define ccid3hcrx_p ccid3hcrx_tfrc.tfrcrx_p | - u64 ccid3hcrx_seqno_nonloss:48, | - ccid3hcrx_ccval_nonloss:4, | - ccid3hcrx_ccval_last_counter:4; | + u8 ccid3hcrx_last_counter:4; | enum ccid3_hc_rx_states ccid3hcrx_state:8; | u32 ccid3hcrx_bytes_recv; | ktime_t ccid3hcrx_tstamp_last_feedback; | - ktime_t ccid3hcrx_tstamp_last_ack; | - struct list_head ccid3hcrx_hist; | + struct tfrc_rx_hist ccid3hcrx_hist; | struct list_head ccid3hcrx_li_hist; | u16 ccid3hcrx_s; | u32 ccid3hcrx_pinv; | diff --git a/net/dccp/ccids/lib/loss_interval.c b/net/dccp/ccids/lib/loss_interval.c | index a5f59af..71080ca 100644 | --- a/net/dccp/ccids/lib/loss_interval.c | +++ b/net/dccp/ccids/lib/loss_interval.c | @@ -16,6 +16,7 @@ | #include "../../dccp.h" | #include "loss_interval.h" | #include "packet_history.h" | +#include "packet_history_internal.h" | #include "tfrc.h" | | #define DCCP_LI_HIST_IVAL_F_LENGTH 8 | @@ -129,6 +130,13 @@ static u32 dccp_li_calc_first_li(struct sock *sk, | u16 s, u32 bytes_recv, | u32 previous_x_recv) | { | +/* | + * FIXME: | + * Will be rewritten in the upcoming new loss intervals code. | + * Has to be commented ou because it relies on the old rx history | + * data structures | + */ | +#if 0 | struct tfrc_rx_hist_entry *entry, *next, *tail = NULL; | u32 x_recv, p; | suseconds_t rtt, delta; | @@ -216,10 +224,10 @@ found: | dccp_pr_debug("%s(%p), receive rate=%u bytes/s, implied " | "loss rate=%u\n", dccp_role(sk), sk, x_recv, p); | | - if (p == 0) | - return ~0; | - else | + if (p != 0) | return 1000000 / p; | +#endif | + return ~0; | } | | void dccp_li_update_li(struct sock *sk, | diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c | index 255cca1..80ad4cd 100644 | --- a/net/dccp/ccids/lib/packet_history.c | +++ b/net/dccp/ccids/lib/packet_history.c | @@ -36,7 +36,10 @@ | */ | | #include <linux/string.h> | +#include <linux/slab.h> | #include "packet_history.h" | +#include "../../dccp.h" | +#include "packet_history_internal.h" | | /** | * tfrc_tx_hist_entry - Simple singly-linked TX history list | @@ -111,154 +114,195 @@ u32 tfrc_tx_hist_rtt(struct tfrc_tx_hist_entry *head, const u64 seqno, | } | EXPORT_SYMBOL_GPL(tfrc_tx_hist_rtt); | | -/* | - * Receiver History Routines | +/** | + * tfrc_rx_hist_index - index to reach n-th entry after loss_start | */ | -static struct kmem_cache *tfrc_rx_hist_slab; | +static inline u8 tfrc_rx_hist_index(const struct tfrc_rx_hist *h, const u8 n) | +{ | + return (h->loss_start + n) & TFRC_NDUPACK; | +} | | -struct tfrc_rx_hist_entry *tfrc_rx_hist_entry_new(const u32 ndp, | - const struct sk_buff *skb, | - const gfp_t prio) | +/** | + * tfrc_rx_hist_last_rcv - entry with highest-received-seqno so far | + */ | +static inline struct tfrc_rx_hist_entry * | + tfrc_rx_hist_last_rcv(const struct tfrc_rx_hist *h) | { | - struct tfrc_rx_hist_entry *entry = kmem_cache_alloc(tfrc_rx_hist_slab, | - prio); | + return h->ring + tfrc_rx_hist_index(h, h->loss_count); | +} | | - if (entry != NULL) { | - const struct dccp_hdr *dh = dccp_hdr(skb); | +/** | + * tfrc_rx_hist_entry - return the n-th history entry after loss_start | + */ | +static inline struct tfrc_rx_hist_entry * | + tfrc_rx_hist_entry(const struct tfrc_rx_hist *h, const u8 n) | +{ | + return h->ring + tfrc_rx_hist_index(h, n); | +} | | - entry->tfrchrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq; | - entry->tfrchrx_ccval = dh->dccph_ccval; | - entry->tfrchrx_type = dh->dccph_type; | - entry->tfrchrx_ndp = ndp; | - entry->tfrchrx_tstamp = ktime_get_real(); | - } | +/** | + * tfrc_rx_hist_loss_prev - entry with highest-received-seqno before loss was detected | + */ | +static inline struct tfrc_rx_hist_entry * | + tfrc_rx_hist_loss_prev(const struct tfrc_rx_hist *h) | +{ | + return h->ring + h->loss_start; | +} | + | +/* | + * Receiver History Routines | + */ | +static struct kmem_cache *tfrc_rx_hist_slab; | | - return entry; | +void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h, | + const struct sk_buff *skb, | + const u32 ndp) | +{ | + struct tfrc_rx_hist_entry *entry = tfrc_rx_hist_last_rcv(h); | + const struct dccp_hdr *dh = dccp_hdr(skb); | + | + entry->tfrchrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq; | + entry->tfrchrx_ccval = dh->dccph_ccval; | + entry->tfrchrx_type = dh->dccph_type; | + entry->tfrchrx_ndp = ndp; | + entry->tfrchrx_tstamp = ktime_get_real(); | } | -EXPORT_SYMBOL_GPL(tfrc_rx_hist_entry_new); | +EXPORT_SYMBOL_GPL(tfrc_rx_hist_add_packet); | | static inline void tfrc_rx_hist_entry_delete(struct tfrc_rx_hist_entry *entry) | { | kmem_cache_free(tfrc_rx_hist_slab, entry); | } | | -int tfrc_rx_hist_find_entry(const struct list_head *list, const u64 seq, | - u8 *ccval) | +/* has the packet contained in skb been seen before? */ | +int tfrc_rx_hist_duplicate(struct tfrc_rx_hist *h, struct sk_buff *skb) | { | - struct tfrc_rx_hist_entry *packet = NULL, *entry; | + const u64 seq = DCCP_SKB_CB(skb)->dccpd_seq; | + int i; | | - list_for_each_entry(entry, list, tfrchrx_node) | - if (entry->tfrchrx_seqno == seq) { | - packet = entry; | - break; | - } | + if (dccp_delta_seqno(tfrc_rx_hist_loss_prev(h)->tfrchrx_seqno, seq) <= 0) | + return 1; | | - if (packet) | - *ccval = packet->tfrchrx_ccval; | + for (i = 1; i <= h->loss_count; i++) | + if (tfrc_rx_hist_entry(h, i)->tfrchrx_seqno == seq) | + return 1; | | - return packet != NULL; | + return 0; | } | +EXPORT_SYMBOL_GPL(tfrc_rx_hist_duplicate); | | -EXPORT_SYMBOL_GPL(tfrc_rx_hist_find_entry); | -struct tfrc_rx_hist_entry * | - tfrc_rx_hist_find_data_packet(const struct list_head *list) | +/* initialise loss detection and disable RTT sampling */ | +static inline void tfrc_rx_hist_loss_indicated(struct tfrc_rx_hist *h) | { | - struct tfrc_rx_hist_entry *entry, *packet = NULL; | + h->loss_count = 1; | +} | | - list_for_each_entry(entry, list, tfrchrx_node) | - if (entry->tfrchrx_type == DCCP_PKT_DATA || | - entry->tfrchrx_type == DCCP_PKT_DATAACK) { | - packet = entry; | - break; | - } | +/* indicate whether previously a packet was detected missing */ | +static inline int tfrc_rx_hist_loss_pending(const struct tfrc_rx_hist *h) | +{ | + return h->loss_count; | +} | | - return packet; | +/* any data packets missing between last reception and skb ? */ | +int tfrc_rx_hist_new_loss_indicated(struct tfrc_rx_hist *h, | + const struct sk_buff *skb, u32 ndp) | +{ | + int delta = dccp_delta_seqno(tfrc_rx_hist_last_rcv(h)->tfrchrx_seqno, | + DCCP_SKB_CB(skb)->dccpd_seq); | + | + if (delta > 1 && ndp < delta) | + tfrc_rx_hist_loss_indicated(h); | + | + return tfrc_rx_hist_loss_pending(h); | +} | +EXPORT_SYMBOL_GPL(tfrc_rx_hist_new_loss_indicated); | + | +void tfrc_rx_hist_purge(struct tfrc_rx_hist *h) | +{ | + kmem_cache_free(tfrc_rx_hist_slab, h->ring); | + h->ring = NULL; | } | | -EXPORT_SYMBOL_GPL(tfrc_rx_hist_find_data_packet); | +EXPORT_SYMBOL_GPL(tfrc_rx_hist_purge); | | -void tfrc_rx_hist_add_packet(struct list_head *rx_list, | - struct list_head *li_list, | - struct tfrc_rx_hist_entry *packet, | - u64 nonloss_seqno) | +int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h) | { | - struct tfrc_rx_hist_entry *entry, *next; | - u8 num_later = 0; | - | - list_add(&packet->tfrchrx_node, rx_list); | - | - num_later = TFRC_RECV_NUM_LATE_LOSS + 1; | - | - if (!list_empty(li_list)) { | - list_for_each_entry_safe(entry, next, rx_list, tfrchrx_node) { | - if (num_later == 0) { | - if (after48(nonloss_seqno, | - entry->tfrchrx_seqno)) { | - list_del_init(&entry->tfrchrx_node); | - tfrc_rx_hist_entry_delete(entry); | - } | - } else if (tfrc_rx_hist_entry_data_packet(entry)) | - --num_later; | - } | - } else { | - int step = 0; | - u8 win_count = 0; /* Not needed, but lets shut up gcc */ | - int tmp; | - /* | - * We have no loss interval history so we need at least one | - * rtt:s of data packets to approximate rtt. | - */ | - list_for_each_entry_safe(entry, next, rx_list, tfrchrx_node) { | - if (num_later == 0) { | - switch (step) { | - case 0: | - step = 1; | - /* OK, find next data packet */ | - num_later = 1; | - break; | - case 1: | - step = 2; | - /* OK, find next data packet */ | - num_later = 1; | - win_count = entry->tfrchrx_ccval; | - break; | - case 2: | - tmp = win_count - entry->tfrchrx_ccval; | - if (tmp < 0) | - tmp += TFRC_WIN_COUNT_LIMIT; | - if (tmp > TFRC_WIN_COUNT_PER_RTT + 1) { | - /* | - * We have found a packet older | - * than one rtt remove the rest | - */ | - step = 3; | - } else /* OK, find next data packet */ | - num_later = 1; | - break; | - case 3: | - list_del_init(&entry->tfrchrx_node); | - tfrc_rx_hist_entry_delete(entry); | - break; | - } | - } else if (tfrc_rx_hist_entry_data_packet(entry)) | - --num_later; | - } | - } | + h->ring = kmem_cache_alloc(tfrc_rx_hist_slab, GFP_ATOMIC); | + h->loss_count = h->loss_start = 0; | + return h->ring == NULL; | } | +EXPORT_SYMBOL_GPL(tfrc_rx_hist_alloc); | | -EXPORT_SYMBOL_GPL(tfrc_rx_hist_add_packet); | +/** | + * tfrc_rx_hist_rtt_last_s - reference entry to compute RTT samples against | + */ | +static inline struct tfrc_rx_hist_entry * | + tfrc_rx_hist_rtt_last_s(const struct tfrc_rx_hist *h) | +{ | + return h->ring; | +} | | -void tfrc_rx_hist_purge(struct list_head *list) | +/** | + * tfrc_rx_hist_rtt_prev_s: previously suitable (wrt rtt_last_s) RTT-sampling entry | + */ | +static inline struct tfrc_rx_hist_entry * | + tfrc_rx_hist_rtt_prev_s(const struct tfrc_rx_hist *h) | { | - struct tfrc_rx_hist_entry *entry, *next; | + return h->ring + h->rtt_sample_prev; | +} | | - list_for_each_entry_safe(entry, next, list, tfrchrx_node) { | - list_del_init(&entry->tfrchrx_node); | - tfrc_rx_hist_entry_delete(entry); | +/** | + * tfrc_rx_hist_sample_rtt - Sample RTT from timestamp / CCVal | + * Based on ideas presented in RFC 4342, 8.1. Returns 0 if it was not able | + * to compute a sample with given data - calling function should check this. | + */ | +u32 tfrc_rx_hist_sample_rtt(struct tfrc_rx_hist *h, const struct sk_buff *skb) | +{ | + u32 sample = 0, | + delta_v = SUB16(dccp_hdr(skb)->dccph_ccval, | + tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval); | + | + if (delta_v < 1 || delta_v > 4) { /* unsuitable CCVal delta */ | + if (h->rtt_sample_prev == 2) { /* previous candidate stored */ | + sample = SUB16(tfrc_rx_hist_rtt_prev_s(h)->tfrchrx_ccval, | + tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval); | + if (sample) | + sample = 4 / sample * | + ktime_us_delta(tfrc_rx_hist_rtt_prev_s(h)->tfrchrx_tstamp, | + tfrc_rx_hist_rtt_last_s(h)->tfrchrx_tstamp); | + else /* | + * FIXME: This condition is in principle not | + * possible but occurs when CCID is used for | + * two-way data traffic. I have tried to trace | + * it, but the cause does not seem to be here. | + */ | + DCCP_BUG("please report to dccp@vger.kernel.org" | + " => prev = %u, last = %u", | + tfrc_rx_hist_rtt_prev_s(h)->tfrchrx_ccval, | + tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval); | + } else if (delta_v < 1) { | + h->rtt_sample_prev = 1; | + goto keep_ref_for_next_time; | + } | + | + } else if (delta_v == 4) /* optimal match */ | + sample = ktime_to_us(net_timedelta(tfrc_rx_hist_rtt_last_s(h)->tfrchrx_tstamp)); | + else { /* suboptimal match */ | + h->rtt_sample_prev = 2; | + goto keep_ref_for_next_time; | } | -} | | -EXPORT_SYMBOL_GPL(tfrc_rx_hist_purge); | + if (unlikely(sample > DCCP_SANE_RTT_MAX)) { | + DCCP_WARN("RTT sample %u too large, using max\n", sample); | + sample = DCCP_SANE_RTT_MAX; | + } | + | + h->rtt_sample_prev = 0; /* use current entry as next reference */ | +keep_ref_for_next_time: | + | + return sample; | +} | +EXPORT_SYMBOL_GPL(tfrc_rx_hist_sample_rtt); | | __init int packet_history_init(void) | { | @@ -269,7 +313,8 @@ __init int packet_history_init(void) | goto out_err; | | tfrc_rx_hist_slab = kmem_cache_create("tfrc_rx_hist", | - sizeof(struct tfrc_rx_hist_entry), 0, | + (sizeof(struct tfrc_rx_hist_entry) * | + (TFRC_NDUPACK + 1)), 0, | SLAB_HWCACHE_ALIGN, NULL); | if (tfrc_rx_hist_slab == NULL) | goto out_free_tx; | diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h | index 5b0b983..6871e51 100644 | --- a/net/dccp/ccids/lib/packet_history.h | +++ b/net/dccp/ccids/lib/packet_history.h | @@ -37,15 +37,9 @@ | #define _DCCP_PKT_HIST_ | | #include <linux/ktime.h> | -#include <linux/list.h> | -#include <linux/slab.h> | -#include "tfrc.h" | +#include <linux/types.h> | | -/* Number of later packets received before one is considered lost */ | -#define TFRC_RECV_NUM_LATE_LOSS 3 | - | -#define TFRC_WIN_COUNT_PER_RTT 4 | -#define TFRC_WIN_COUNT_LIMIT 16 | +struct sk_buff; | | struct tfrc_tx_hist_entry; | | @@ -54,54 +48,38 @@ extern void tfrc_tx_hist_purge(struct tfrc_tx_hist_entry **headp); | extern u32 tfrc_tx_hist_rtt(struct tfrc_tx_hist_entry *head, | const u64 seqno, const ktime_t now); | | -/* | - * Receiver History data structures and declarations | - */ | -struct tfrc_rx_hist_entry { | - struct list_head tfrchrx_node; | - u64 tfrchrx_seqno:48, | - tfrchrx_ccval:4, | - tfrchrx_type:4; | - u32 tfrchrx_ndp; /* In fact it is from 8 to 24 bits */ | - ktime_t tfrchrx_tstamp; | -}; | - | -extern struct tfrc_rx_hist_entry * | - tfrc_rx_hist_entry_new(const u32 ndp, | - const struct sk_buff *skb, | - const gfp_t prio); | +struct tfrc_rx_hist_entry; | | -static inline struct tfrc_rx_hist_entry * | - tfrc_rx_hist_head(struct list_head *list) | -{ | - struct tfrc_rx_hist_entry *head = NULL; | +/* Subtraction a-b modulo-16, respects circular wrap-around */ | +#define SUB16(a, b) (((a) + 16 - (b)) & 0xF) | | - if (!list_empty(list)) | - head = list_entry(list->next, struct tfrc_rx_hist_entry, | - tfrchrx_node); | - return head; | -} | +/* Number of packets to wait after a missing packet (RFC 4342, 6.1) */ | +#define TFRC_NDUPACK 3 | | -extern int tfrc_rx_hist_find_entry(const struct list_head *list, const u64 seq, | - u8 *ccval); | -extern struct tfrc_rx_hist_entry * | - tfrc_rx_hist_find_data_packet(const struct list_head *list); | - | -extern void tfrc_rx_hist_add_packet(struct list_head *rx_list, | - struct list_head *li_list, | - struct tfrc_rx_hist_entry *packet, | - u64 nonloss_seqno); | - | -extern void tfrc_rx_hist_purge(struct list_head *list); | +/** | + * tfrc_rx_hist - RX history structure for TFRC-based protocols | + * | + * @ring: Packet history for RTT sampling and loss detection | + * @loss_count: Number of entries in circular history | + * @loss_start: Movable index (for loss detection) | + * @rtt_sample_prev: Used during RTT sampling, points to candidate entry | + */ | +struct tfrc_rx_hist { | + struct tfrc_rx_hist_entry *ring; | + u8 loss_count:2, | + loss_start:2; | +#define rtt_sample_prev loss_start | +}; | | -static inline int | - tfrc_rx_hist_entry_data_packet(const struct tfrc_rx_hist_entry *entry) | -{ | - return entry->tfrchrx_type == DCCP_PKT_DATA || | - entry->tfrchrx_type == DCCP_PKT_DATAACK; | -} | +extern void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h, | + const struct sk_buff *skb, const u32 ndp); | | -extern u64 tfrc_rx_hist_detect_loss(struct list_head *rx_list, | - struct list_head *li_list, u8 *win_loss); | +extern int tfrc_rx_hist_duplicate(struct tfrc_rx_hist *h, struct sk_buff *skb); | +extern int tfrc_rx_hist_new_loss_indicated(struct tfrc_rx_hist *h, | + const struct sk_buff *skb, u32 ndp); | +extern u32 tfrc_rx_hist_sample_rtt(struct tfrc_rx_hist *h, | + const struct sk_buff *skb); | +extern int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h); | +extern void tfrc_rx_hist_purge(struct tfrc_rx_hist *h); | | #endif /* _DCCP_PKT_HIST_ */ | diff --git a/net/dccp/ccids/lib/packet_history_internal.h b/net/dccp/ccids/lib/packet_history_internal.h | new file mode 100644 | index 0000000..70d5c31 | --- /dev/null | +++ b/net/dccp/ccids/lib/packet_history_internal.h | @@ -0,0 +1,67 @@ | +#ifndef _DCCP_PKT_HIST_INT_ | +#define _DCCP_PKT_HIST_INT_ | +/* | + * Packet RX/TX history data structures and routines for TFRC-based protocols. | + * | + * Copyright (c) 2007 The University of Aberdeen, Scotland, UK | + * Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand. | + * | + * This code has been developed by the University of Waikato WAND | + * research group. For further information please see http://www.wand.net.nz/ | + * or e-mail Ian McDonald - ian.mcdonald@jandi.co.nz | + * | + * This code also uses code from Lulea University, rereleased as GPL by its | + * authors: | + * Copyright (c) 2003 Nils-Erik Mattsson, Joacim Haggmark, Magnus Erixzon | + * | + * Changes to meet Linux coding standards, to make it meet latest ccid3 draft | + * and to make it work as a loadable module in the DCCP stack written by | + * Arnaldo Carvalho de Melo <acme@conectiva.com.br>. | + * | + * Copyright (c) 2005 Arnaldo Carvalho de Melo <acme@conectiva.com.br> | + * | + * This program is free software; you can redistribute it and/or modify | + * it under the terms of the GNU General Public License as published by | + * the Free Software Foundation; either version 2 of the License, or | + * (at your option) any later version. | + * | + * This program is distributed in the hope that it will be useful, | + * but WITHOUT ANY WARRANTY; without even the implied warranty of | + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | + * GNU General Public License for more details. | + * | + * You should have received a copy of the GNU General Public License | + * along with this program; if not, write to the Free Software | + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. | + */ | + | +#include <linux/dccp.h> | +#include <linux/ktime.h> | +#include <linux/types.h> | + | +#define TFRC_WIN_COUNT_LIMIT 16 | + | +/** | + * tfrc_rx_hist_entry - Store information about a single received packet | + * @tfrchrx_seqno: DCCP packet sequence number | + * @tfrchrx_ccval: window counter value of packet (RFC 4342, 8.1) | + * @tfrchrx_ndp: the NDP count (if any) of the packet | + * @tfrchrx_tstamp: actual receive time of packet | + */ | +struct tfrc_rx_hist_entry { | + u64 tfrchrx_seqno:48, | + tfrchrx_ccval:4, | + tfrchrx_type:4; | + u32 tfrchrx_ndp; /* In fact it is from 8 to 24 bits */ | + ktime_t tfrchrx_tstamp; | +}; | + | +static inline bool tfrc_rx_hist_entry_data_packet(const struct tfrc_rx_hist_entry *h) | +{ | + return h->tfrchrx_type == DCCP_PKT_DATA || | + h->tfrchrx_type == DCCP_PKT_DATAACK || | + h->tfrchrx_type == DCCP_PKT_REQUEST || | + h->tfrchrx_type == DCCP_PKT_RESPONSE; | +} | + | +#endif /* _DCCP_PKT_HIST_INT_ */ | -- | 1.5.3.4 | | - | To unsubscribe from this list: send the line "unsubscribe dccp" in | the body of a message to majordomo@vger.kernel.org | More majordomo info at http://vger.kernel.org/majordomo-info.html | -- ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 7/7] [TFRC] New rx history code 2007-12-04 6:55 ` Gerrit Renker @ 2007-12-04 11:59 ` Arnaldo Carvalho de Melo 2007-12-04 13:48 ` [PATCH 7/7][TAKE 2][TFRC] " Arnaldo Carvalho de Melo 2007-12-05 9:35 ` [PATCH 7/7] [TFRC] " Gerrit Renker 0 siblings, 2 replies; 28+ messages in thread From: Arnaldo Carvalho de Melo @ 2007-12-04 11:59 UTC (permalink / raw) To: Gerrit Renker, netdev, dccp Em Tue, Dec 04, 2007 at 06:55:17AM +0000, Gerrit Renker escreveu: > NAK. You have changed the control flow of the algorithm and the underlying > data structure. Originally it had been an array of pointers, and this had > been a design decision right from the first submissions in March. From I > of http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/loss_detection/loss_detection_algorithm_notes.txt > > "1. 'ring' is an array of pointers rather than a contiguous area in > memory. The reason is that, when having to swap adjacent entries > to sort the history, it is easier to swap pointers than to copy > (heavy-weight) history-entry data structs." > > But in your algorithm it becomes a normal linear array. > > As a consequence all subsequent code needs to be rewritten to use > copying instead of swapping pointers. And there is a lot of that, since > the loss detection code makes heavy use of swapping entries in order to > cope with reordered and/or delayed packets. So lets not do that, the decision was made based on the RX history patch alone, where, as pointed out, no swapping occurs. > This is not what I would call "entirely equivalent" as you claim. Your > algorithm is fundamentally different, the control flow is not the same, > nor are the underlying data structures. Its is equivalent up to what is in the tree, thank you for point out that in subsequent patches it will be used. Had this design decision been made explicit in the changeset comment that introduces the data structure I wouldn't have tried to reduce the number of allocations. And you even said that it was a good decision when first reacting to this change, which I saw as positive feedback for the RFC I sent. > | Credit here goes to Gerrit Renker, that provided the initial implementation for > | this new codebase. > | > | I modified it just to try to make it closer to the existing API, hide details from > | the CCIDs and fix a couple bugs found in the initial implementation. > What is "a couple bugs"? So far you have pointed out only one problem and that was > agreed, it can be fixed. But it remains a side issue, it is not a failure of the I pointed two problems one ended up being just the fact that tfrc_ewma checks if the average is zero for every calculation. > algorithm per se. What is worse here is that you take this single occurrence as a > kind of carte blanche to mess around with the code as you please. Where please are > the other bugs you are referring to? I mentioned in the previous messages, one was a problem, the other you clarified that was not a problem, but could be optimized. > I am not buying into this and am not convinced that you are understanding > what you are doing. It is the third time that you take patches which > have been submitted on dccp@vger for over half a year, for which you > have offered no more than a sentence of feedback, release them under > your name, and introduce fundamental changes which alter the behaviour. I commited it under my name because I changed it, while giving you credit. > The first instance was the set of ktime patches which I had developed > for the test tree and which you extended to DCCP. In this case it were > in fact three bugs that you added in migrating my patches. I know this > because it messed up the way CCID3 behaved and since I spent several > hours chasing these. And, in contrast to the above, it is not a mere > claim: that is recorded in the netdev mail archives. I'm sorry you feel so strongly when back and forth you express gratitude and then you show contempt for my behaviour. > The second instance was when you released the TX history patches under > your name. Pro forma there was an RFC patch at 3pm, de facto it was > checked in a few hours later: input not welcome. Have you found any problems in it? Look again at the changeset to see if I "stole" your code as you seem to imply: commit 02271b56fd4028820e68e85e9d468628f42fb6ab Author: Arnaldo Carvalho de Melo <acme@redhat.com> Date: Wed Nov 28 11:15:40 2007 -0200 [TFRC]: Migrate TX history to singly-linked lis This patch was based on another made by Gerrit Renker, his changelog was: ------------------------------------------------------ The patch set migrates TFRC TX history to a singly-linked list. The details are: * use of a consistent naming scheme (all TFRC functions now begin * with `tfrc_'); * allocation and cleanup are taken care of internally; * provision of a lookup function, which is used by the CCID TX * infrastructure to determine the time a packet was sent (in turn used for RTT sampling); * integration of the new interface with the present use in CCID3. ------------------------------------------------------ Simplifications I did: . removing the tfrc_tx_hist_head that had a pointer to the list head and another for the slabcache. . No need for creating a slabcache for each CCID that wants to use the TFRC tx history routines, create a single slabcache when the dccp_tfrc_lib module init routine is called. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Is this enough credit? I can make the process slower and simply post these patches and wait till you review it and incorporate the changes. > The third instance is now where you change in essence the floor > underneath this work. Since you are using a different basis, the > algorithm is (in addition to the changes in control flow) necessarily > different. You are picking something I did on a RFC patch and exaggerating on your reaction. > I have provided documentation, written test modules, and am able to prove > that the algorithm as submitted works reasonably correct. In addition, the > behaviour has been verified using regression tests. > > I am not prepared to take your claims and expressions of "deepest > respect" at face value since your actions - not your words - show that > you are, in fact, not dealing respectfully with work which has taken > months to complete and verify. > > During 9 months on dccp@vger you did not provide so much as a paragraph > of feedback. Instead you mopped up code from the test tree, modified it > according to your own whims and now, in the circle of your > invitation-only buddies, start to talk about having discussions and > iterations. The only iteration I can see is in fixing up the things you > introduced, so it is not you who has to pay the price. > > Sorry, unless you can offer a more mature model of collaboration, > consider me out of this and the patches summarily withdrawn. I am not > prepared to throw away several months of work just because you feel > inspired to do as you please with the work of others. Again you want to have your patches accepted as-is. Pointed to one case where I gave you credit while improving on your work (TX history) and another where the changes were up for review. I don't consider this a warm welcome for me to finally dedicate time to this effort. Would you prefer to continue working without help? I think we should encourage more people to work on DCCP, you seem to think that changes to your work are not acceptable. Perhaps others can comment on what is happening and help us find, as you say, to find a more mature model of collaboration. - Arnaldo ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH 7/7][TAKE 2][TFRC] New rx history code 2007-12-04 11:59 ` Arnaldo Carvalho de Melo @ 2007-12-04 13:48 ` Arnaldo Carvalho de Melo 2007-12-05 9:42 ` Gerrit Renker 2007-12-05 9:35 ` [PATCH 7/7] [TFRC] " Gerrit Renker 1 sibling, 1 reply; 28+ messages in thread From: Arnaldo Carvalho de Melo @ 2007-12-04 13:48 UTC (permalink / raw) To: Gerrit Renker, netdev, dccp Em Tue, Dec 04, 2007 at 09:59:39AM -0200, Arnaldo Carvalho de Melo escreveu: > Em Tue, Dec 04, 2007 at 06:55:17AM +0000, Gerrit Renker escreveu: > > NAK. You have changed the control flow of the algorithm and the underlying > > data structure. Originally it had been an array of pointers, and this had > > been a design decision right from the first submissions in March. From I > > of http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/loss_detection/loss_detection_algorithm_notes.txt > > > > "1. 'ring' is an array of pointers rather than a contiguous area in > > memory. The reason is that, when having to swap adjacent entries > > to sort the history, it is easier to swap pointers than to copy > > (heavy-weight) history-entry data structs." > > > > But in your algorithm it becomes a normal linear array. > > > > As a consequence all subsequent code needs to be rewritten to use > > copying instead of swapping pointers. And there is a lot of that, since > > the loss detection code makes heavy use of swapping entries in order to > > cope with reordered and/or delayed packets. > > So lets not do that, the decision was made based on the RX history patch > alone, where, as pointed out, no swapping occurs. > > > This is not what I would call "entirely equivalent" as you claim. Your > > algorithm is fundamentally different, the control flow is not the same, > > nor are the underlying data structures. > > Its is equivalent up to what is in the tree, thank you for point out > that in subsequent patches it will be used. I found a problem that I'm still investigating if was introduced by this patch or if was already present. When sending 1 million 256 byte packets with ttcp over loopback, using ccid3 it is crashing, the test machine I'm using doesn't have a serial port, its a notebook, will switch to another that has and provide the backtrace. It doesn't happens every time. Here is tfrc_rx_hist_alloc back using ring of pointers with the fixed error path. +int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h) { + int i; + + for (i = 0; i <= TFRC_NDUPACK; i++) { + h->ring[i] = kmem_cache_alloc(tfrc_rx_hist_slab, GFP_ATOMIC); + if (h->ring[i] == NULL) + goto out_free; + } + + h->loss_count = h->loss_start = 0; + return 0; + +out_free: + while (i-- != 0) { + kmem_cache_free(tfrc_rx_hist_slab, h->ring[i]); + h->ring[i] = NULL; } + return -ENOBUFS; } - Arnaldo >From 317c1ce69711fe7b0f89e23e84390a14c98c0f7e Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo <acme@redhat.com> Date: Tue, 4 Dec 2007 11:46:19 -0200 Subject: [PATCH 7/7] [TFRC]: New rx history code Credit here goes to Gerrit Renker, that provided the initial implementation for this new codebase. I modified it just to try to make it closer to the existing API, hide details from the CCIDs and fix one bug where the tfrc_rx_hist_alloc was not freeing the allocated ring entries on the error path. Original changeset comment from Gerrit: ----------- This provides a new, self-contained and generic RX history service for TFRC based protocols. Details: * new data structure, initialisation and cleanup routines; * allocation of dccp_rx_hist entries local to packet_history.c, as a service exported by the dccp_tfrc_lib module. * interface to automatically track highest-received seqno; * receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1); * a generic function to test for `data packets' as per RFC 4340, sec. 7.7. ------------- Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- net/dccp/ccids/ccid3.c | 255 +++++++---------------- net/dccp/ccids/ccid3.h | 14 +- net/dccp/ccids/lib/loss_interval.c | 14 +- net/dccp/ccids/lib/packet_history.c | 290 ++++++++++++++++---------- net/dccp/ccids/lib/packet_history.h | 82 +++----- net/dccp/ccids/lib/packet_history_internal.h | 67 ++++++ 6 files changed, 368 insertions(+), 354 deletions(-) create mode 100644 net/dccp/ccids/lib/packet_history_internal.h diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c index 5ff5aab..af64c1d 100644 --- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -641,6 +641,15 @@ static int ccid3_hc_tx_getsockopt(struct sock *sk, const int optname, int len, /* * Receiver Half-Connection Routines */ + +/* CCID3 feedback types */ +enum ccid3_fback_type { + CCID3_FBACK_NONE = 0, + CCID3_FBACK_INITIAL, + CCID3_FBACK_PERIODIC, + CCID3_FBACK_PARAM_CHANGE +}; + #ifdef CONFIG_IP_DCCP_CCID3_DEBUG static const char *ccid3_rx_state_name(enum ccid3_hc_rx_states state) { @@ -673,53 +682,60 @@ static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len) hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9); } -static void ccid3_hc_rx_send_feedback(struct sock *sk) +static void ccid3_hc_rx_send_feedback(struct sock *sk, + const struct sk_buff *skb, + enum ccid3_fback_type fbtype) { struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); struct dccp_sock *dp = dccp_sk(sk); - struct tfrc_rx_hist_entry *packet; ktime_t now; - suseconds_t delta; + s64 delta = 0; ccid3_pr_debug("%s(%p) - entry \n", dccp_role(sk), sk); + if (unlikely(hcrx->ccid3hcrx_state == TFRC_RSTATE_TERM)) + return; + now = ktime_get_real(); - switch (hcrx->ccid3hcrx_state) { - case TFRC_RSTATE_NO_DATA: + switch (fbtype) { + case CCID3_FBACK_INITIAL: hcrx->ccid3hcrx_x_recv = 0; + hcrx->ccid3hcrx_pinv = ~0U; /* see RFC 4342, 8.5 */ break; - case TFRC_RSTATE_DATA: - delta = ktime_us_delta(now, - hcrx->ccid3hcrx_tstamp_last_feedback); - DCCP_BUG_ON(delta < 0); - hcrx->ccid3hcrx_x_recv = - scaled_div32(hcrx->ccid3hcrx_bytes_recv, delta); + case CCID3_FBACK_PARAM_CHANGE: + /* + * When parameters change (new loss or p > p_prev), we do not + * have a reliable estimate for R_m of [RFC 3448, 6.2] and so + * need to reuse the previous value of X_recv. However, when + * X_recv was 0 (due to early loss), this would kill X down to + * s/t_mbi (i.e. one packet in 64 seconds). + * To avoid such drastic reduction, we approximate X_recv as + * the number of bytes since last feedback. + * This is a safe fallback, since X is bounded above by X_calc. + */ + if (hcrx->ccid3hcrx_x_recv > 0) + break; + /* fall through */ + case CCID3_FBACK_PERIODIC: + delta = ktime_us_delta(now, hcrx->ccid3hcrx_tstamp_last_feedback); + if (delta <= 0) + DCCP_BUG("delta (%ld) <= 0", (long)delta); + else + hcrx->ccid3hcrx_x_recv = + scaled_div32(hcrx->ccid3hcrx_bytes_recv, delta); break; - case TFRC_RSTATE_TERM: - DCCP_BUG("%s(%p) - Illegal state TERM", dccp_role(sk), sk); + default: return; } - packet = tfrc_rx_hist_find_data_packet(&hcrx->ccid3hcrx_hist); - if (unlikely(packet == NULL)) { - DCCP_WARN("%s(%p), no data packet in history!\n", - dccp_role(sk), sk); - return; - } + ccid3_pr_debug("Interval %ldusec, X_recv=%u, 1/p=%u\n", (long)delta, + hcrx->ccid3hcrx_x_recv, hcrx->ccid3hcrx_pinv); hcrx->ccid3hcrx_tstamp_last_feedback = now; - hcrx->ccid3hcrx_ccval_last_counter = packet->tfrchrx_ccval; + hcrx->ccid3hcrx_last_counter = dccp_hdr(skb)->dccph_ccval; hcrx->ccid3hcrx_bytes_recv = 0; - if (hcrx->ccid3hcrx_p == 0) - hcrx->ccid3hcrx_pinv = ~0U; /* see RFC 4342, 8.5 */ - else if (hcrx->ccid3hcrx_p > 1000000) { - DCCP_WARN("p (%u) > 100%%\n", hcrx->ccid3hcrx_p); - hcrx->ccid3hcrx_pinv = 1; /* use 100% in this case */ - } else - hcrx->ccid3hcrx_pinv = 1000000 / hcrx->ccid3hcrx_p; - dp->dccps_hc_rx_insert_options = 1; dccp_send_ack(sk); } @@ -753,162 +769,52 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb) static int ccid3_hc_rx_detect_loss(struct sock *sk, struct tfrc_rx_hist_entry *packet) { - struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); - struct tfrc_rx_hist_entry *rx_hist = - tfrc_rx_hist_head(&hcrx->ccid3hcrx_hist); - u64 seqno = packet->tfrchrx_seqno; - u64 tmp_seqno; - int loss = 0; - u8 ccval; - - - tmp_seqno = hcrx->ccid3hcrx_seqno_nonloss; - - if (!rx_hist || - follows48(packet->tfrchrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) { - hcrx->ccid3hcrx_seqno_nonloss = seqno; - hcrx->ccid3hcrx_ccval_nonloss = packet->tfrchrx_ccval; - goto detect_out; - } - - - while (dccp_delta_seqno(hcrx->ccid3hcrx_seqno_nonloss, seqno) - > TFRC_RECV_NUM_LATE_LOSS) { - loss = 1; - dccp_li_update_li(sk, - &hcrx->ccid3hcrx_li_hist, - &hcrx->ccid3hcrx_hist, - hcrx->ccid3hcrx_tstamp_last_feedback, - hcrx->ccid3hcrx_s, - hcrx->ccid3hcrx_bytes_recv, - hcrx->ccid3hcrx_x_recv, - hcrx->ccid3hcrx_seqno_nonloss, - hcrx->ccid3hcrx_ccval_nonloss); - tmp_seqno = hcrx->ccid3hcrx_seqno_nonloss; - dccp_inc_seqno(&tmp_seqno); - hcrx->ccid3hcrx_seqno_nonloss = tmp_seqno; - dccp_inc_seqno(&tmp_seqno); - while (tfrc_rx_hist_find_entry(&hcrx->ccid3hcrx_hist, - tmp_seqno, &ccval)) { - hcrx->ccid3hcrx_seqno_nonloss = tmp_seqno; - hcrx->ccid3hcrx_ccval_nonloss = ccval; - dccp_inc_seqno(&tmp_seqno); - } - } - - /* FIXME - this code could be simplified with above while */ - /* but works at moment */ - if (follows48(packet->tfrchrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) { - hcrx->ccid3hcrx_seqno_nonloss = seqno; - hcrx->ccid3hcrx_ccval_nonloss = packet->tfrchrx_ccval; - } - -detect_out: - tfrc_rx_hist_add_packet(&hcrx->ccid3hcrx_hist, - &hcrx->ccid3hcrx_li_hist, packet, - hcrx->ccid3hcrx_seqno_nonloss); - return loss; + return 0; } static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) { struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); - const struct dccp_options_received *opt_recv; - struct tfrc_rx_hist_entry *packet; - u32 p_prev, r_sample, rtt_prev; - int loss, payload_size; - ktime_t now; - - opt_recv = &dccp_sk(sk)->dccps_options_received; + const u32 payload_size = skb->len - dccp_hdr(skb)->dccph_doff * 4; + const u32 ndp = dccp_sk(sk)->dccps_options_received.dccpor_ndp; + enum ccid3_fback_type do_feedback = CCID3_FBACK_NONE; + const bool is_data_packet = dccp_data_packet(skb); - switch (DCCP_SKB_CB(skb)->dccpd_type) { - case DCCP_PKT_ACK: - if (hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA) + if (hcrx->ccid3hcrx_state != TFRC_RSTATE_NO_DATA) { + if (tfrc_rx_hist_duplicate(&hcrx->ccid3hcrx_hist, skb)) return; - case DCCP_PKT_DATAACK: - if (opt_recv->dccpor_timestamp_echo == 0) - break; - r_sample = dccp_timestamp() - opt_recv->dccpor_timestamp_echo; - rtt_prev = hcrx->ccid3hcrx_rtt; - r_sample = dccp_sample_rtt(sk, 10 * r_sample); - - if (hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA) - hcrx->ccid3hcrx_rtt = r_sample; - else - hcrx->ccid3hcrx_rtt = (hcrx->ccid3hcrx_rtt * 9) / 10 + - r_sample / 10; - - if (rtt_prev != hcrx->ccid3hcrx_rtt) - ccid3_pr_debug("%s(%p), New RTT=%uus, elapsed time=%u\n", - dccp_role(sk), sk, hcrx->ccid3hcrx_rtt, - opt_recv->dccpor_elapsed_time); - break; - case DCCP_PKT_DATA: - break; - default: /* We're not interested in other packet types, move along */ - return; - } - - packet = tfrc_rx_hist_entry_new(opt_recv->dccpor_ndp, skb, GFP_ATOMIC); - if (unlikely(packet == NULL)) { - DCCP_WARN("%s(%p), Not enough mem to add rx packet " - "to history, consider it lost!\n", dccp_role(sk), sk); - return; - } - - loss = ccid3_hc_rx_detect_loss(sk, packet); - - if (DCCP_SKB_CB(skb)->dccpd_type == DCCP_PKT_ACK) - return; - - payload_size = skb->len - dccp_hdr(skb)->dccph_doff * 4; - ccid3_hc_rx_update_s(hcrx, payload_size); - switch (hcrx->ccid3hcrx_state) { - case TFRC_RSTATE_NO_DATA: - ccid3_pr_debug("%s(%p, state=%s), skb=%p, sending initial " - "feedback\n", dccp_role(sk), sk, - dccp_state_name(sk->sk_state), skb); - ccid3_hc_rx_send_feedback(sk); - ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA); - return; - case TFRC_RSTATE_DATA: - hcrx->ccid3hcrx_bytes_recv += payload_size; - if (loss) - break; - - now = ktime_get_real(); - if ((ktime_us_delta(now, hcrx->ccid3hcrx_tstamp_last_ack) - - (s64)hcrx->ccid3hcrx_rtt) >= 0) { - hcrx->ccid3hcrx_tstamp_last_ack = now; - ccid3_hc_rx_send_feedback(sk); + /* Handle pending losses and otherwise check for new loss */ + if (!tfrc_rx_hist_new_loss_indicated(&hcrx->ccid3hcrx_hist, + skb, ndp) && + /* Handle data packets: RTT sampling and monitoring p */ + is_data_packet) { + + if (list_empty(&hcrx->ccid3hcrx_li_hist)) { /* no loss so far: p = 0 */ + const u32 sample = tfrc_rx_hist_sample_rtt(&hcrx->ccid3hcrx_hist, skb); + if (sample != 0) + hcrx->ccid3hcrx_rtt = + tfrc_ewma(hcrx->ccid3hcrx_rtt, sample, 9); + } + + ccid3_hc_rx_update_s(hcrx, payload_size); + /* check if the periodic once-per-RTT feedback is due; RFC 4342, 10.3 */ + if (SUB16(dccp_hdr(skb)->dccph_ccval, + hcrx->ccid3hcrx_last_counter) > 3) + do_feedback = CCID3_FBACK_PERIODIC; } - return; - case TFRC_RSTATE_TERM: - DCCP_BUG("%s(%p) - Illegal state TERM", dccp_role(sk), sk); - return; - } - - /* Dealing with packet loss */ - ccid3_pr_debug("%s(%p, state=%s), data loss! Reacting...\n", - dccp_role(sk), sk, dccp_state_name(sk->sk_state)); - - p_prev = hcrx->ccid3hcrx_p; + } else if (is_data_packet) { + do_feedback = CCID3_FBACK_INITIAL; + ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA); + hcrx->ccid3hcrx_s = payload_size; + } - /* Calculate loss event rate */ - if (!list_empty(&hcrx->ccid3hcrx_li_hist)) { - u32 i_mean = dccp_li_hist_calc_i_mean(&hcrx->ccid3hcrx_li_hist); + hcrx->ccid3hcrx_bytes_recv += payload_size; - /* Scaling up by 1000000 as fixed decimal */ - if (i_mean != 0) - hcrx->ccid3hcrx_p = 1000000 / i_mean; - } else - DCCP_BUG("empty loss history"); + tfrc_rx_hist_add_packet(&hcrx->ccid3hcrx_hist, skb, ndp); - if (hcrx->ccid3hcrx_p > p_prev) { - ccid3_hc_rx_send_feedback(sk); - return; - } + if (do_feedback) + ccid3_hc_rx_send_feedback(sk, skb, do_feedback); } static int ccid3_hc_rx_init(struct ccid *ccid, struct sock *sk) @@ -918,11 +824,8 @@ static int ccid3_hc_rx_init(struct ccid *ccid, struct sock *sk) ccid3_pr_debug("entry\n"); hcrx->ccid3hcrx_state = TFRC_RSTATE_NO_DATA; - INIT_LIST_HEAD(&hcrx->ccid3hcrx_hist); INIT_LIST_HEAD(&hcrx->ccid3hcrx_li_hist); - hcrx->ccid3hcrx_tstamp_last_feedback = - hcrx->ccid3hcrx_tstamp_last_ack = ktime_get_real(); - return 0; + return tfrc_rx_hist_alloc(&hcrx->ccid3hcrx_hist); } static void ccid3_hc_rx_exit(struct sock *sk) diff --git a/net/dccp/ccids/ccid3.h b/net/dccp/ccids/ccid3.h index b842a7d..3c33dc6 100644 --- a/net/dccp/ccids/ccid3.h +++ b/net/dccp/ccids/ccid3.h @@ -1,7 +1,8 @@ /* * net/dccp/ccids/ccid3.h * - * Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand. + * Copyright (c) 2005-7 The University of Waikato, Hamilton, New Zealand. + * Copyright (c) 2007 The University of Aberdeen, Scotland, UK * * An implementation of the DCCP protocol * @@ -135,9 +136,7 @@ enum ccid3_hc_rx_states { * @ccid3hcrx_x_recv - Receiver estimate of send rate (RFC 3448 4.3) * @ccid3hcrx_rtt - Receiver estimate of rtt (non-standard) * @ccid3hcrx_p - current loss event rate (RFC 3448 5.4) - * @ccid3hcrx_seqno_nonloss - Last received non-loss sequence number - * @ccid3hcrx_ccval_nonloss - Last received non-loss Window CCVal - * @ccid3hcrx_ccval_last_counter - Tracks window counter (RFC 4342, 8.1) + * @ccid3hcrx_last_counter - Tracks window counter (RFC 4342, 8.1) * @ccid3hcrx_state - receiver state, one of %ccid3_hc_rx_states * @ccid3hcrx_bytes_recv - Total sum of DCCP payload bytes * @ccid3hcrx_tstamp_last_feedback - Time at which last feedback was sent @@ -152,14 +151,11 @@ struct ccid3_hc_rx_sock { #define ccid3hcrx_x_recv ccid3hcrx_tfrc.tfrcrx_x_recv #define ccid3hcrx_rtt ccid3hcrx_tfrc.tfrcrx_rtt #define ccid3hcrx_p ccid3hcrx_tfrc.tfrcrx_p - u64 ccid3hcrx_seqno_nonloss:48, - ccid3hcrx_ccval_nonloss:4, - ccid3hcrx_ccval_last_counter:4; + u8 ccid3hcrx_last_counter:4; enum ccid3_hc_rx_states ccid3hcrx_state:8; u32 ccid3hcrx_bytes_recv; ktime_t ccid3hcrx_tstamp_last_feedback; - ktime_t ccid3hcrx_tstamp_last_ack; - struct list_head ccid3hcrx_hist; + struct tfrc_rx_hist ccid3hcrx_hist; struct list_head ccid3hcrx_li_hist; u16 ccid3hcrx_s; u32 ccid3hcrx_pinv; diff --git a/net/dccp/ccids/lib/loss_interval.c b/net/dccp/ccids/lib/loss_interval.c index a5f59af..71080ca 100644 --- a/net/dccp/ccids/lib/loss_interval.c +++ b/net/dccp/ccids/lib/loss_interval.c @@ -16,6 +16,7 @@ #include "../../dccp.h" #include "loss_interval.h" #include "packet_history.h" +#include "packet_history_internal.h" #include "tfrc.h" #define DCCP_LI_HIST_IVAL_F_LENGTH 8 @@ -129,6 +130,13 @@ static u32 dccp_li_calc_first_li(struct sock *sk, u16 s, u32 bytes_recv, u32 previous_x_recv) { +/* + * FIXME: + * Will be rewritten in the upcoming new loss intervals code. + * Has to be commented ou because it relies on the old rx history + * data structures + */ +#if 0 struct tfrc_rx_hist_entry *entry, *next, *tail = NULL; u32 x_recv, p; suseconds_t rtt, delta; @@ -216,10 +224,10 @@ found: dccp_pr_debug("%s(%p), receive rate=%u bytes/s, implied " "loss rate=%u\n", dccp_role(sk), sk, x_recv, p); - if (p == 0) - return ~0; - else + if (p != 0) return 1000000 / p; +#endif + return ~0; } void dccp_li_update_li(struct sock *sk, diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c index 255cca1..56a0254 100644 --- a/net/dccp/ccids/lib/packet_history.c +++ b/net/dccp/ccids/lib/packet_history.c @@ -36,7 +36,10 @@ */ #include <linux/string.h> +#include <linux/slab.h> #include "packet_history.h" +#include "../../dccp.h" +#include "packet_history_internal.h" /** * tfrc_tx_hist_entry - Simple singly-linked TX history list @@ -111,154 +114,213 @@ u32 tfrc_tx_hist_rtt(struct tfrc_tx_hist_entry *head, const u64 seqno, } EXPORT_SYMBOL_GPL(tfrc_tx_hist_rtt); -/* - * Receiver History Routines +/** + * tfrc_rx_hist_index - index to reach n-th entry after loss_start */ -static struct kmem_cache *tfrc_rx_hist_slab; +static inline u8 tfrc_rx_hist_index(const struct tfrc_rx_hist *h, const u8 n) +{ + return (h->loss_start + n) & TFRC_NDUPACK; +} -struct tfrc_rx_hist_entry *tfrc_rx_hist_entry_new(const u32 ndp, - const struct sk_buff *skb, - const gfp_t prio) +/** + * tfrc_rx_hist_last_rcv - entry with highest-received-seqno so far + */ +static inline struct tfrc_rx_hist_entry * + tfrc_rx_hist_last_rcv(const struct tfrc_rx_hist *h) { - struct tfrc_rx_hist_entry *entry = kmem_cache_alloc(tfrc_rx_hist_slab, - prio); + return h->ring[tfrc_rx_hist_index(h, h->loss_count)]; +} - if (entry != NULL) { - const struct dccp_hdr *dh = dccp_hdr(skb); +/** + * tfrc_rx_hist_entry - return the n-th history entry after loss_start + */ +static inline struct tfrc_rx_hist_entry * + tfrc_rx_hist_entry(const struct tfrc_rx_hist *h, const u8 n) +{ + return h->ring[tfrc_rx_hist_index(h, n)]; +} - entry->tfrchrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq; - entry->tfrchrx_ccval = dh->dccph_ccval; - entry->tfrchrx_type = dh->dccph_type; - entry->tfrchrx_ndp = ndp; - entry->tfrchrx_tstamp = ktime_get_real(); - } +/** + * tfrc_rx_hist_loss_prev - entry with highest-received-seqno before loss was detected + */ +static inline struct tfrc_rx_hist_entry * + tfrc_rx_hist_loss_prev(const struct tfrc_rx_hist *h) +{ + return h->ring[h->loss_start]; +} - return entry; +/* + * Receiver History Routines + */ +static struct kmem_cache *tfrc_rx_hist_slab; + +void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h, + const struct sk_buff *skb, + const u32 ndp) +{ + struct tfrc_rx_hist_entry *entry = tfrc_rx_hist_last_rcv(h); + const struct dccp_hdr *dh = dccp_hdr(skb); + + entry->tfrchrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq; + entry->tfrchrx_ccval = dh->dccph_ccval; + entry->tfrchrx_type = dh->dccph_type; + entry->tfrchrx_ndp = ndp; + entry->tfrchrx_tstamp = ktime_get_real(); } -EXPORT_SYMBOL_GPL(tfrc_rx_hist_entry_new); +EXPORT_SYMBOL_GPL(tfrc_rx_hist_add_packet); static inline void tfrc_rx_hist_entry_delete(struct tfrc_rx_hist_entry *entry) { kmem_cache_free(tfrc_rx_hist_slab, entry); } -int tfrc_rx_hist_find_entry(const struct list_head *list, const u64 seq, - u8 *ccval) +/* has the packet contained in skb been seen before? */ +int tfrc_rx_hist_duplicate(struct tfrc_rx_hist *h, struct sk_buff *skb) { - struct tfrc_rx_hist_entry *packet = NULL, *entry; + const u64 seq = DCCP_SKB_CB(skb)->dccpd_seq; + int i; - list_for_each_entry(entry, list, tfrchrx_node) - if (entry->tfrchrx_seqno == seq) { - packet = entry; - break; - } + if (dccp_delta_seqno(tfrc_rx_hist_loss_prev(h)->tfrchrx_seqno, seq) <= 0) + return 1; - if (packet) - *ccval = packet->tfrchrx_ccval; + for (i = 1; i <= h->loss_count; i++) + if (tfrc_rx_hist_entry(h, i)->tfrchrx_seqno == seq) + return 1; - return packet != NULL; + return 0; } +EXPORT_SYMBOL_GPL(tfrc_rx_hist_duplicate); -EXPORT_SYMBOL_GPL(tfrc_rx_hist_find_entry); -struct tfrc_rx_hist_entry * - tfrc_rx_hist_find_data_packet(const struct list_head *list) +/* initialise loss detection and disable RTT sampling */ +static inline void tfrc_rx_hist_loss_indicated(struct tfrc_rx_hist *h) { - struct tfrc_rx_hist_entry *entry, *packet = NULL; - - list_for_each_entry(entry, list, tfrchrx_node) - if (entry->tfrchrx_type == DCCP_PKT_DATA || - entry->tfrchrx_type == DCCP_PKT_DATAACK) { - packet = entry; - break; - } + h->loss_count = 1; +} - return packet; +/* indicate whether previously a packet was detected missing */ +static inline int tfrc_rx_hist_loss_pending(const struct tfrc_rx_hist *h) +{ + return h->loss_count; } -EXPORT_SYMBOL_GPL(tfrc_rx_hist_find_data_packet); +/* any data packets missing between last reception and skb ? */ +int tfrc_rx_hist_new_loss_indicated(struct tfrc_rx_hist *h, + const struct sk_buff *skb, u32 ndp) +{ + int delta = dccp_delta_seqno(tfrc_rx_hist_last_rcv(h)->tfrchrx_seqno, + DCCP_SKB_CB(skb)->dccpd_seq); + + if (delta > 1 && ndp < delta) + tfrc_rx_hist_loss_indicated(h); + + return tfrc_rx_hist_loss_pending(h); +} +EXPORT_SYMBOL_GPL(tfrc_rx_hist_new_loss_indicated); -void tfrc_rx_hist_add_packet(struct list_head *rx_list, - struct list_head *li_list, - struct tfrc_rx_hist_entry *packet, - u64 nonloss_seqno) +void tfrc_rx_hist_purge(struct tfrc_rx_hist *h) { - struct tfrc_rx_hist_entry *entry, *next; - u8 num_later = 0; - - list_add(&packet->tfrchrx_node, rx_list); - - num_later = TFRC_RECV_NUM_LATE_LOSS + 1; - - if (!list_empty(li_list)) { - list_for_each_entry_safe(entry, next, rx_list, tfrchrx_node) { - if (num_later == 0) { - if (after48(nonloss_seqno, - entry->tfrchrx_seqno)) { - list_del_init(&entry->tfrchrx_node); - tfrc_rx_hist_entry_delete(entry); - } - } else if (tfrc_rx_hist_entry_data_packet(entry)) - --num_later; - } - } else { - int step = 0; - u8 win_count = 0; /* Not needed, but lets shut up gcc */ - int tmp; - /* - * We have no loss interval history so we need at least one - * rtt:s of data packets to approximate rtt. - */ - list_for_each_entry_safe(entry, next, rx_list, tfrchrx_node) { - if (num_later == 0) { - switch (step) { - case 0: - step = 1; - /* OK, find next data packet */ - num_later = 1; - break; - case 1: - step = 2; - /* OK, find next data packet */ - num_later = 1; - win_count = entry->tfrchrx_ccval; - break; - case 2: - tmp = win_count - entry->tfrchrx_ccval; - if (tmp < 0) - tmp += TFRC_WIN_COUNT_LIMIT; - if (tmp > TFRC_WIN_COUNT_PER_RTT + 1) { - /* - * We have found a packet older - * than one rtt remove the rest - */ - step = 3; - } else /* OK, find next data packet */ - num_later = 1; - break; - case 3: - list_del_init(&entry->tfrchrx_node); - tfrc_rx_hist_entry_delete(entry); - break; - } - } else if (tfrc_rx_hist_entry_data_packet(entry)) - --num_later; + int i; + + for (i = 0; i <= TFRC_NDUPACK; ++i) + if (h->ring[i] != NULL) { + kmem_cache_free(tfrc_rx_hist_slab, h->ring[i]); + h->ring[i] = NULL; } +} +EXPORT_SYMBOL_GPL(tfrc_rx_hist_purge); + +int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h) +{ + int i; + + for (i = 0; i <= TFRC_NDUPACK; i++) { + h->ring[i] = kmem_cache_alloc(tfrc_rx_hist_slab, GFP_ATOMIC); + if (h->ring[i] == NULL) + goto out_free; + } + + h->loss_count = h->loss_start = 0; + return 0; + +out_free: + while (i-- != 0) { + kmem_cache_free(tfrc_rx_hist_slab, h->ring[i]); + h->ring[i] = NULL; } + return -ENOBUFS; } +EXPORT_SYMBOL_GPL(tfrc_rx_hist_alloc); -EXPORT_SYMBOL_GPL(tfrc_rx_hist_add_packet); +/** + * tfrc_rx_hist_rtt_last_s - reference entry to compute RTT samples against + */ +static inline struct tfrc_rx_hist_entry * + tfrc_rx_hist_rtt_last_s(const struct tfrc_rx_hist *h) +{ + return h->ring[0]; +} -void tfrc_rx_hist_purge(struct list_head *list) +/** + * tfrc_rx_hist_rtt_prev_s: previously suitable (wrt rtt_last_s) RTT-sampling entry + */ +static inline struct tfrc_rx_hist_entry * + tfrc_rx_hist_rtt_prev_s(const struct tfrc_rx_hist *h) { - struct tfrc_rx_hist_entry *entry, *next; + return h->ring[h->rtt_sample_prev]; +} - list_for_each_entry_safe(entry, next, list, tfrchrx_node) { - list_del_init(&entry->tfrchrx_node); - tfrc_rx_hist_entry_delete(entry); +/** + * tfrc_rx_hist_sample_rtt - Sample RTT from timestamp / CCVal + * Based on ideas presented in RFC 4342, 8.1. Returns 0 if it was not able + * to compute a sample with given data - calling function should check this. + */ +u32 tfrc_rx_hist_sample_rtt(struct tfrc_rx_hist *h, const struct sk_buff *skb) +{ + u32 sample = 0, + delta_v = SUB16(dccp_hdr(skb)->dccph_ccval, + tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval); + + if (delta_v < 1 || delta_v > 4) { /* unsuitable CCVal delta */ + if (h->rtt_sample_prev == 2) { /* previous candidate stored */ + sample = SUB16(tfrc_rx_hist_rtt_prev_s(h)->tfrchrx_ccval, + tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval); + if (sample) + sample = 4 / sample * + ktime_us_delta(tfrc_rx_hist_rtt_prev_s(h)->tfrchrx_tstamp, + tfrc_rx_hist_rtt_last_s(h)->tfrchrx_tstamp); + else /* + * FIXME: This condition is in principle not + * possible but occurs when CCID is used for + * two-way data traffic. I have tried to trace + * it, but the cause does not seem to be here. + */ + DCCP_BUG("please report to dccp@vger.kernel.org" + " => prev = %u, last = %u", + tfrc_rx_hist_rtt_prev_s(h)->tfrchrx_ccval, + tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval); + } else if (delta_v < 1) { + h->rtt_sample_prev = 1; + goto keep_ref_for_next_time; + } + + } else if (delta_v == 4) /* optimal match */ + sample = ktime_to_us(net_timedelta(tfrc_rx_hist_rtt_last_s(h)->tfrchrx_tstamp)); + else { /* suboptimal match */ + h->rtt_sample_prev = 2; + goto keep_ref_for_next_time; } -} -EXPORT_SYMBOL_GPL(tfrc_rx_hist_purge); + if (unlikely(sample > DCCP_SANE_RTT_MAX)) { + DCCP_WARN("RTT sample %u too large, using max\n", sample); + sample = DCCP_SANE_RTT_MAX; + } + + h->rtt_sample_prev = 0; /* use current entry as next reference */ +keep_ref_for_next_time: + + return sample; +} +EXPORT_SYMBOL_GPL(tfrc_rx_hist_sample_rtt); __init int packet_history_init(void) { diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h index 5b0b983..dac275f 100644 --- a/net/dccp/ccids/lib/packet_history.h +++ b/net/dccp/ccids/lib/packet_history.h @@ -37,15 +37,9 @@ #define _DCCP_PKT_HIST_ #include <linux/ktime.h> -#include <linux/list.h> -#include <linux/slab.h> -#include "tfrc.h" +#include <linux/types.h> -/* Number of later packets received before one is considered lost */ -#define TFRC_RECV_NUM_LATE_LOSS 3 - -#define TFRC_WIN_COUNT_PER_RTT 4 -#define TFRC_WIN_COUNT_LIMIT 16 +struct sk_buff; struct tfrc_tx_hist_entry; @@ -54,54 +48,38 @@ extern void tfrc_tx_hist_purge(struct tfrc_tx_hist_entry **headp); extern u32 tfrc_tx_hist_rtt(struct tfrc_tx_hist_entry *head, const u64 seqno, const ktime_t now); -/* - * Receiver History data structures and declarations - */ -struct tfrc_rx_hist_entry { - struct list_head tfrchrx_node; - u64 tfrchrx_seqno:48, - tfrchrx_ccval:4, - tfrchrx_type:4; - u32 tfrchrx_ndp; /* In fact it is from 8 to 24 bits */ - ktime_t tfrchrx_tstamp; -}; - -extern struct tfrc_rx_hist_entry * - tfrc_rx_hist_entry_new(const u32 ndp, - const struct sk_buff *skb, - const gfp_t prio); +struct tfrc_rx_hist_entry; -static inline struct tfrc_rx_hist_entry * - tfrc_rx_hist_head(struct list_head *list) -{ - struct tfrc_rx_hist_entry *head = NULL; +/* Subtraction a-b modulo-16, respects circular wrap-around */ +#define SUB16(a, b) (((a) + 16 - (b)) & 0xF) - if (!list_empty(list)) - head = list_entry(list->next, struct tfrc_rx_hist_entry, - tfrchrx_node); - return head; -} +/* Number of packets to wait after a missing packet (RFC 4342, 6.1) */ +#define TFRC_NDUPACK 3 -extern int tfrc_rx_hist_find_entry(const struct list_head *list, const u64 seq, - u8 *ccval); -extern struct tfrc_rx_hist_entry * - tfrc_rx_hist_find_data_packet(const struct list_head *list); - -extern void tfrc_rx_hist_add_packet(struct list_head *rx_list, - struct list_head *li_list, - struct tfrc_rx_hist_entry *packet, - u64 nonloss_seqno); - -extern void tfrc_rx_hist_purge(struct list_head *list); +/** + * tfrc_rx_hist - RX history structure for TFRC-based protocols + * + * @ring: Packet history for RTT sampling and loss detection + * @loss_count: Number of entries in circular history + * @loss_start: Movable index (for loss detection) + * @rtt_sample_prev: Used during RTT sampling, points to candidate entry + */ +struct tfrc_rx_hist { + struct tfrc_rx_hist_entry *ring[TFRC_NDUPACK + 1]; + u8 loss_count:2, + loss_start:2; +#define rtt_sample_prev loss_start +}; -static inline int - tfrc_rx_hist_entry_data_packet(const struct tfrc_rx_hist_entry *entry) -{ - return entry->tfrchrx_type == DCCP_PKT_DATA || - entry->tfrchrx_type == DCCP_PKT_DATAACK; -} +extern void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h, + const struct sk_buff *skb, const u32 ndp); -extern u64 tfrc_rx_hist_detect_loss(struct list_head *rx_list, - struct list_head *li_list, u8 *win_loss); +extern int tfrc_rx_hist_duplicate(struct tfrc_rx_hist *h, struct sk_buff *skb); +extern int tfrc_rx_hist_new_loss_indicated(struct tfrc_rx_hist *h, + const struct sk_buff *skb, u32 ndp); +extern u32 tfrc_rx_hist_sample_rtt(struct tfrc_rx_hist *h, + const struct sk_buff *skb); +extern int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h); +extern void tfrc_rx_hist_purge(struct tfrc_rx_hist *h); #endif /* _DCCP_PKT_HIST_ */ diff --git a/net/dccp/ccids/lib/packet_history_internal.h b/net/dccp/ccids/lib/packet_history_internal.h new file mode 100644 index 0000000..70d5c31 --- /dev/null +++ b/net/dccp/ccids/lib/packet_history_internal.h @@ -0,0 +1,67 @@ +#ifndef _DCCP_PKT_HIST_INT_ +#define _DCCP_PKT_HIST_INT_ +/* + * Packet RX/TX history data structures and routines for TFRC-based protocols. + * + * Copyright (c) 2007 The University of Aberdeen, Scotland, UK + * Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand. + * + * This code has been developed by the University of Waikato WAND + * research group. For further information please see http://www.wand.net.nz/ + * or e-mail Ian McDonald - ian.mcdonald@jandi.co.nz + * + * This code also uses code from Lulea University, rereleased as GPL by its + * authors: + * Copyright (c) 2003 Nils-Erik Mattsson, Joacim Haggmark, Magnus Erixzon + * + * Changes to meet Linux coding standards, to make it meet latest ccid3 draft + * and to make it work as a loadable module in the DCCP stack written by + * Arnaldo Carvalho de Melo <acme@conectiva.com.br>. + * + * Copyright (c) 2005 Arnaldo Carvalho de Melo <acme@conectiva.com.br> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#include <linux/dccp.h> +#include <linux/ktime.h> +#include <linux/types.h> + +#define TFRC_WIN_COUNT_LIMIT 16 + +/** + * tfrc_rx_hist_entry - Store information about a single received packet + * @tfrchrx_seqno: DCCP packet sequence number + * @tfrchrx_ccval: window counter value of packet (RFC 4342, 8.1) + * @tfrchrx_ndp: the NDP count (if any) of the packet + * @tfrchrx_tstamp: actual receive time of packet + */ +struct tfrc_rx_hist_entry { + u64 tfrchrx_seqno:48, + tfrchrx_ccval:4, + tfrchrx_type:4; + u32 tfrchrx_ndp; /* In fact it is from 8 to 24 bits */ + ktime_t tfrchrx_tstamp; +}; + +static inline bool tfrc_rx_hist_entry_data_packet(const struct tfrc_rx_hist_entry *h) +{ + return h->tfrchrx_type == DCCP_PKT_DATA || + h->tfrchrx_type == DCCP_PKT_DATAACK || + h->tfrchrx_type == DCCP_PKT_REQUEST || + h->tfrchrx_type == DCCP_PKT_RESPONSE; +} + +#endif /* _DCCP_PKT_HIST_INT_ */ -- 1.5.3.4 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 7/7][TAKE 2][TFRC] New rx history code 2007-12-04 13:48 ` [PATCH 7/7][TAKE 2][TFRC] " Arnaldo Carvalho de Melo @ 2007-12-05 9:42 ` Gerrit Renker 0 siblings, 0 replies; 28+ messages in thread From: Gerrit Renker @ 2007-12-05 9:42 UTC (permalink / raw) To: Arnaldo Carvalho de Melo, netdev, dccp | I found a problem that I'm still investigating if was introduced by this | patch or if was already present. When sending 1 million 256 byte packets | with ttcp over loopback, using ccid3 it is crashing, the test machine | I'm using doesn't have a serial port, its a notebook, will switch to | another that has and provide the backtrace. It doesn't happens every | time. | CCID3 is difficult due to the TX queue. Small packets are faster on the wire, fill up the TX queue faster. Since there is little loss on LANs, the slow-start algorithm will soon reach link capacity; but CCID3 can not deal effectively with high speeds. What is known not to work well at the moment is bidirectional data transfer (e.g. an echo server/client). This lead to the comment in tfrc_rx_sample_rtt(); the support for bidirectional data transfer needs some more work, which however requires to make one-directional transfer work well first. | Here is tfrc_rx_hist_alloc back using ring of pointers with the fixed | error path. | Thank you - I was just about to send a similar patch as update since you clearly identified this bug. I will resubmit with your version and upload it to the test tree. | +int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h) | { | + int i; | + | + for (i = 0; i <= TFRC_NDUPACK; i++) { | + h->ring[i] = kmem_cache_alloc(tfrc_rx_hist_slab, GFP_ATOMIC); | + if (h->ring[i] == NULL) | + goto out_free; | + } | + | + h->loss_count = h->loss_start = 0; | + return 0; | + | +out_free: | + while (i-- != 0) { | + kmem_cache_free(tfrc_rx_hist_slab, h->ring[i]); | + h->ring[i] = NULL; | } | + return -ENOBUFS; | } | ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 7/7] [TFRC] New rx history code 2007-12-04 11:59 ` Arnaldo Carvalho de Melo 2007-12-04 13:48 ` [PATCH 7/7][TAKE 2][TFRC] " Arnaldo Carvalho de Melo @ 2007-12-05 9:35 ` Gerrit Renker 2007-12-05 12:08 ` Arnaldo Carvalho de Melo 1 sibling, 1 reply; 28+ messages in thread From: Gerrit Renker @ 2007-12-05 9:35 UTC (permalink / raw) To: Arnaldo Carvalho de Melo, netdev, dccp Quoting Arnaldo: | Em Tue, Dec 04, 2007 at 06:55:17AM +0000, Gerrit Renker escreveu: | > NAK. You have changed the control flow of the algorithm and the underlying | > data structure. Originally it had been an array of pointers, and this had | > been a design decision right from the first submissions in March. From I | > of http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/loss_detection/loss_detection_algorithm_notes.txt | > | > "1. 'ring' is an array of pointers rather than a contiguous area in | > memory. The reason is that, when having to swap adjacent entries | > to sort the history, it is easier to swap pointers than to copy | > (heavy-weight) history-entry data structs." | > | > But in your algorithm it becomes a normal linear array. | > | > As a consequence all subsequent code needs to be rewritten to use | > copying instead of swapping pointers. And there is a lot of that, since | > the loss detection code makes heavy use of swapping entries in order to | > cope with reordered and/or delayed packets. | | So lets not do that, the decision was made based on the RX history patch | alone, where, as pointed out, no swapping occurs. | | > This is not what I would call "entirely equivalent" as you claim. Your | > algorithm is fundamentally different, the control flow is not the same, | > nor are the underlying data structures. | | Its is equivalent up to what is in the tree, thank you for point out | that in subsequent patches it will be used. | | Had this design decision been made explicit in the changeset comment | that introduces the data structure I wouldn't have tried to reduce the | number of allocations. | | And you even said that it was a good decision when first reacting to | this change, which I saw as positive feedback for the RFC I sent. | That was my fault. Your solution "looked" much better to me but even I had forgotten that the algorithm is based on an array of pointers. When I finally sat down and looked through the patch set I found the original notes from March and saw that using a linear array instead of an array of pointers will require quite a few changes and means changing the algorithm. I then thought through whether there could be any advantage in using a linear array as in this patch, but could find none. In the end this went back to the same questions and issues that were tackled between February and March this year. I wish we could have had this dicussion then; it would have made this email unnecessary. I thank you for your replay and I am sorry if you took offense at my perhaps a little strong reaction, here is the essence what I tried to convey but what maybe did not succeed in conveying: * I am absolutely ok with you making changes to variable names, namespaces, file organisation, overall arrangement etc (as per previous email). You have experience and this will often be a benefit. For instance, you combined tfrc_entry_from_skb() with tfrc_rx_hist_update() to give a new function, tfrc_rx_hist_add_packet(). This is ok since entry_from_skb() is not otherwise used (and I only found this out when reading your patch). (On the other hand, since this is a 4-element ring buffer, it is no real adding, the same entry will frequently be overwritten, this is why it was initially called hist_update().) * I am also not so much concerned about credit. As long as the (changed) end result is as least as good as or hopefully better than the submitted input, the author name is a side issue; so I do think that your approach is sound and fair. The only place where credits are needed is for the SatSix project (www.ist-satsix.org) which funded this work. This is why some of the copyrights now included "University of Aberdeen". What in any case I'd like to add is at least the Signed-off-by: if it turns out to be a mess I want to contribute my part of responsibility. * But where I need to interact is when changes are made to the algorithm, even if these may seem small. The underlying reasons that lead to the code may not be immediately clear, but since this is a solution for a specific problem, there are also specific reasons; which is why for algorithm changes (and underlying data structure) I'd ask you to discuss this first before committing the patch. * In part you are already doing this (message subject); it may be necessary to adapt the way of discussion/presentation. The subject is complex (spent a week with the flow chart only) and it is a lot - initially this had been 40 small patches. | > | I modified it just to try to make it closer to the existing API, hide details from | > | the CCIDs and fix a couple bugs found in the initial implementation. | > What is "a couple bugs"? So far you have pointed out only one problem and that was | > agreed, it can be fixed. But it remains a side issue, it is not a failure of the | | I pointed two problems one ended up being just the fact that tfrc_ewma | checks if the average is zero for every calculation. | With regard to the tfrc_ewma we need to consider the discussion as I think we are having a misunderstanding. It is necessary to (a) deal with the case that a peer may be sending zero-sized packets: - one solution is to simply omit the check "if (len > 0)" and declare 0-sized data - I think you have a point in that the len check is indeed redundant when we are already checking that this is a data packet packets as pathological (which is probably what they are); - a BUG_ON would be not such a good solution since 0-sized packets are legal; - my suggestion is to remove the "len > 0" test - then tfrc_ewma() can also be used directly, without the surrounding inline function wrapper. (b) reconsider the control flow: in your patch set you introduced a different control flow in the rx_packet_recv() function. I tried to figure out why this was introduced, there are two problems, one not immediately obvious at this moment: - it does not consider where and how loss detection is done (which is part of a later patch). This is a key point why the flow may seem strange: loss detection must be done on both non-data and data packets; there are several cases depending on whether loss previously occurred; the accounting must also be done in parallel for data packets (for instance s and bytes_recvd) and non-data packets (highest-received seqno is always - it is not equivalent with the initial one, when comparing this to the flowchart: http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/reception-main-flowchart.png | > I have provided documentation, written test modules, and am able to prove | > that the algorithm as submitted works reasonably correct. In addition, the | > behaviour has been verified using regression tests. | > | > I am not prepared to take your claims and expressions of "deepest | > respect" at face value since your actions - not your words - show that | > you are, in fact, not dealing respectfully with work which has taken | > months to complete and verify. | > | > During 9 months on dccp@vger you did not provide so much as a paragraph | > of feedback. Instead you mopped up code from the test tree, modified it | > according to your own whims and now, in the circle of your | > invitation-only buddies, start to talk about having discussions and | > iterations. The only iteration I can see is in fixing up the things you | > introduced, so it is not you who has to pay the price. | > | > Sorry, unless you can offer a more mature model of collaboration, | > consider me out of this and the patches summarily withdrawn. I am not | > prepared to throw away several months of work just because you feel | > inspired to do as you please with the work of others. | | Again you want to have your patches accepted as-is. Pointed to one case | where I gave you credit while improving on your work (TX history) and | another where the changes were up for review. I don't consider this a | warm welcome for me to finally dedicate time to this effort. Would you | prefer to continue working without help? I think we should encourage | more people to work on DCCP, you seem to think that changes to your work | are not acceptable. | Yes and no. The internal algorithm I would rather have accepted as-is, unless there are good reasons to change it. But this by no means means I want this waved through like a diplomat's car at the border: there are reasons why it is as it is, and I am happy to discuss these reasons; and if someone can point out a better solution, will accept that. It is just that this is a solution to a specific problem which so far has proven to work, and I therefore think that we can spent the time more efficiently by not fixing things that have already been fixed once: there are many more problems in DCCP (just recently you mentioned robustness of memory usage, then there is the ongoing discussion of how to integrate CCID4 as well); I think we could divide up work time with more benefit. | Perhaps others can comment on what is happening and help us find, as you | say, to find a more mature model of collaboration. | No need to be cynical, you have replied gracefully and with substance. For that I thank you. The topic is complicated, adapting how it is treated may improve the discussion - for instance, if it helps, we could switch back to smaller patch format as original, which you have already been introducing. Best regards Gerrit ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 7/7] [TFRC] New rx history code 2007-12-05 9:35 ` [PATCH 7/7] [TFRC] " Gerrit Renker @ 2007-12-05 12:08 ` Arnaldo Carvalho de Melo 2007-12-05 13:34 ` Gerrit Renker 0 siblings, 1 reply; 28+ messages in thread From: Arnaldo Carvalho de Melo @ 2007-12-05 12:08 UTC (permalink / raw) To: Gerrit Renker, Arnaldo Carvalho de Melo, netdev, dccp Em Wed, Dec 05, 2007 at 09:35:30AM +0000, Gerrit Renker escreveu: > Quoting Arnaldo: > | Em Tue, Dec 04, 2007 at 06:55:17AM +0000, Gerrit Renker escreveu: > | > NAK. You have changed the control flow of the algorithm and the underlying > | > data structure. Originally it had been an array of pointers, and this had > | > been a design decision right from the first submissions in March. From I > | > of http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/loss_detection/loss_detection_algorithm_notes.txt > | > > | > "1. 'ring' is an array of pointers rather than a contiguous area in > | > memory. The reason is that, when having to swap adjacent entries > | > to sort the history, it is easier to swap pointers than to copy > | > (heavy-weight) history-entry data structs." > | > > | > But in your algorithm it becomes a normal linear array. > | > > | > As a consequence all subsequent code needs to be rewritten to use > | > copying instead of swapping pointers. And there is a lot of that, since > | > the loss detection code makes heavy use of swapping entries in order to > | > cope with reordered and/or delayed packets. > | > | So lets not do that, the decision was made based on the RX history patch > | alone, where, as pointed out, no swapping occurs. > | > | > This is not what I would call "entirely equivalent" as you claim. Your > | > algorithm is fundamentally different, the control flow is not the same, > | > nor are the underlying data structures. > | > | Its is equivalent up to what is in the tree, thank you for point out > | that in subsequent patches it will be used. > | > | Had this design decision been made explicit in the changeset comment > | that introduces the data structure I wouldn't have tried to reduce the > | number of allocations. > | > | And you even said that it was a good decision when first reacting to > | this change, which I saw as positive feedback for the RFC I sent. > | > That was my fault. Your solution "looked" much better to me but even I had forgotten > that the algorithm is based on an array of pointers. When I finally sat down And you wrote that code, I acted on looking the code together with reading the changeset comment, to me it also looked much better to do that, but as there were many changes, I sent an [RFC] message. It served its purpose, as you after two iterations realised it was in fact not a good idea as later an array of pointers is required. I should have taken that swap routine as the definitive hint that indeed it was needed. > and looked through the patch set I found the original notes from March and > saw that using a linear array instead of an array of pointers will require quite > a few changes and means changing the algorithm. I then thought through whether there > could be any advantage in using a linear array as in this patch, but could find none. > In the end this went back to the same questions and issues that were tackled between > February and March this year. I wish we could have had this dicussion then; it would > have made this email unnecessary. That is why it is important that the changeset comment collects these scattered notes and discussions. Think about in some years from now we will be left with a situation where we would have to look at many places to understando some aspects of what is in the code. While this happens and we have google for that I think that since you keep such detailed notes it is not a problem to get them in the changesets. > I thank you for your replay and I am sorry if you took offense at my perhaps a little strong > reaction, here is the essence what I tried to convey but what maybe did not succeed in conveying: > > * I am absolutely ok with you making changes to variable names, namespaces, file organisation, > overall arrangement etc (as per previous email). You have experience and this will often be a > benefit. For instance, you combined tfrc_entry_from_skb() with tfrc_rx_hist_update() to give > a new function, tfrc_rx_hist_add_packet(). This is ok since entry_from_skb() is not otherwise > used (and I only found this out when reading your patch). (On the other hand, since this is a > 4-element ring buffer, it is no real adding, the same entry will frequently be overwritten, > this is why it was initially called hist_update().) >From the RX history API user perspective (CCID3 right now, others may come) it is adding a packet to the history. In the past it went to a list, now we have a ring and don't keep more than TFRC_NDUPACK + 1 entries, but this is an internal detail that users don't need to know. > * I am also not so much concerned about credit. As long as the (changed) end result is as > least as good as or hopefully better than the submitted input, the author name is a side > issue; so I do think that your approach is sound and fair. The only place where credits > are needed is for the SatSix project (www.ist-satsix.org) which funded this work. This is > why some of the copyrights now included "University of Aberdeen". What in any case I'd like > to add is at least the Signed-off-by: if it turns out to be a mess I want to contribute my > part of responsibility. And that, along with credit to you is being provided on the few patches I change. > * But where I need to interact is when changes are made to the algorithm, even if these may > seem small. The underlying reasons that lead to the code may not be immediately clear, > but since this is a solution for a specific problem, there are also specific reasons; which > is why for algorithm changes (and underlying data structure) I'd ask you to discuss this first > before committing the patch. I did this for the RX handling, where changes were made. Did that for the TX but ended up commiting before comments from you, but I think its fair to say that the changes for the TX history were more organizational than in the essence of the algorithm. > * In part you are already doing this (message subject); it may be necessary to adapt the way > of discussion/presentation. The subject is complex (spent a week with the flow chart only) > and it is a lot - initially this had been 40 small patches. > > | > | I modified it just to try to make it closer to the existing API, hide details from > | > | the CCIDs and fix a couple bugs found in the initial implementation. > | > What is "a couple bugs"? So far you have pointed out only one problem and that was > | > agreed, it can be fixed. But it remains a side issue, it is not a failure of the > | > | I pointed two problems one ended up being just the fact that tfrc_ewma > | checks if the average is zero for every calculation. > | > With regard to the tfrc_ewma we need to consider the discussion as I think we are having a > misunderstanding. It is necessary to > (a) deal with the case that a peer may be sending zero-sized packets: > - one solution is to simply omit the check "if (len > 0)" and declare 0-sized data > - I think you have a point in that the len check is indeed redundant when we are > already checking that this is a data packet > packets as pathological (which is probably what they are); > - a BUG_ON would be not such a good solution since 0-sized packets are legal; > - my suggestion is to remove the "len > 0" test - then tfrc_ewma() can also be > used directly, without the surrounding inline function wrapper. > > (b) reconsider the control flow: in your patch set you introduced a different control flow > in the rx_packet_recv() function. I tried to figure out why this was introduced, there > are two problems, one not immediately obvious at this moment: > - it does not consider where and how loss detection is done (which is part of a later > patch). This is a key point why the flow may seem strange: loss detection must be > done on both non-data and data packets; there are several cases depending on whether > loss previously occurred; the accounting must also be done in parallel for data packets > (for instance s and bytes_recvd) and non-data packets (highest-received seqno is always > - it is not equivalent with the initial one, when comparing this to the flowchart: > http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/reception-main-flowchart.png I will revert to your original flow. > | > I have provided documentation, written test modules, and am able to prove > | > that the algorithm as submitted works reasonably correct. In addition, the > | > behaviour has been verified using regression tests. > | > > | > I am not prepared to take your claims and expressions of "deepest > | > respect" at face value since your actions - not your words - show that > | > you are, in fact, not dealing respectfully with work which has taken > | > months to complete and verify. > | > > | > During 9 months on dccp@vger you did not provide so much as a paragraph > | > of feedback. Instead you mopped up code from the test tree, modified it > | > according to your own whims and now, in the circle of your > | > invitation-only buddies, start to talk about having discussions and > | > iterations. The only iteration I can see is in fixing up the things you > | > introduced, so it is not you who has to pay the price. > | > > | > Sorry, unless you can offer a more mature model of collaboration, > | > consider me out of this and the patches summarily withdrawn. I am not > | > prepared to throw away several months of work just because you feel > | > inspired to do as you please with the work of others. > | > | Again you want to have your patches accepted as-is. Pointed to one case > | where I gave you credit while improving on your work (TX history) and > | another where the changes were up for review. I don't consider this a > | warm welcome for me to finally dedicate time to this effort. Would you > | prefer to continue working without help? I think we should encourage > | more people to work on DCCP, you seem to think that changes to your work > | are not acceptable. > | > Yes and no. The internal algorithm I would rather have accepted as-is, unless > there are good reasons to change it. But this by no means means I want this > waved through like a diplomat's car at the border: there are reasons why > it is as it is, and I am happy to discuss these reasons; and if someone > can point out a better solution, will accept that. It is just that this > is a solution to a specific problem which so far has proven to work, and > I therefore think that we can spent the time more efficiently by not > fixing things that have already been fixed once: there are many more > problems in DCCP (just recently you mentioned robustness of memory usage, > then there is the ongoing discussion of how to integrate CCID4 as well); > I think we could divide up work time with more benefit. OK, I think I exaggerated on the changes and that you exaggerated on the reaction, lets try to tone down both behaviours and we'll move on. > | Perhaps others can comment on what is happening and help us find, as you > | say, to find a more mature model of collaboration. > | > No need to be cynical, you have replied gracefully and with substance. > For that I thank you. The topic is complicated, adapting how it is > treated may improve the discussion - for instance, if it helps, we could > switch back to smaller patch format as original, which you have already > been introducing. No intention, I was fearing your reaction would escalate and asking for other people that are more experienced in maintainership, telling about my and your mistakes in interacting could help. But I think we're going back on track by ourselves, thank you. - Arnaldo ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 7/7] [TFRC] New rx history code 2007-12-05 12:08 ` Arnaldo Carvalho de Melo @ 2007-12-05 13:34 ` Gerrit Renker 0 siblings, 0 replies; 28+ messages in thread From: Gerrit Renker @ 2007-12-05 13:34 UTC (permalink / raw) To: Arnaldo Carvalho de Melo, netdev, dccp | > In the end this went back to the same questions and issues that were tackled between | > February and March this year. I wish we could have had this dicussion then; it would | > have made this email unnecessary. | | That is why it is important that the changeset comment collects these | scattered notes and discussions. Think about in some years from now we | will be left with a situation where we would have to look at many places | to understando some aspects of what is in the code. While this happens | and we have google for that I think that since you keep such detailed | notes it is not a problem to get them in the changesets. | I agree, the changelogs are all a bit short. When condensing from 40 to 16 patches the changelogs were also cut down, for fear of being too verbose. Will go through the patches in the next days and see if/where this can be improved, that would probably have saved some trouble. | > * But where I need to interact is when changes are made to the algorithm, even if these may | > seem small. The underlying reasons that lead to the code may not be immediately clear, | > but since this is a solution for a specific problem, there are also specific reasons; which | > is why for algorithm changes (and underlying data structure) I'd ask you to discuss this first | > before committing the patch. | | I did this for the RX handling, where changes were made. Did that for | the TX but ended up commiting before comments from you, but I think its | fair to say that the changes for the TX history were more organizational | than in the essence of the algorithm. | It was a similar situation: the RFC patch came out on Thursday afternoon; I replied a bit hurriedly on Friday that it seemed ok, but found the full confirmation only on Sunday after putting all recent changes into the test tree. And yes, you made a good job of it. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_ 2007-12-02 21:36 ` [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_ Arnaldo Carvalho de Melo 2007-12-02 21:36 ` [PATCH 6/7] [CCID3]: The receiver of a half-connection does not set window counter values Arnaldo Carvalho de Melo @ 2007-12-06 13:59 ` Gerrit Renker 1 sibling, 0 replies; 28+ messages in thread From: Gerrit Renker @ 2007-12-06 13:59 UTC (permalink / raw) To: Arnaldo Carvalho de Melo; +Cc: netdev, dccp | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 4/7] [TFRC]: Make the rx history slab be global 2007-12-02 21:36 ` [PATCH 4/7] [TFRC]: Make the rx history slab be global Arnaldo Carvalho de Melo 2007-12-02 21:36 ` [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_ Arnaldo Carvalho de Melo @ 2007-12-06 13:59 ` Gerrit Renker 1 sibling, 0 replies; 28+ messages in thread From: Gerrit Renker @ 2007-12-06 13:59 UTC (permalink / raw) To: Arnaldo Carvalho de Melo; +Cc: netdev, dccp | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency 2007-12-02 21:36 ` [PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency Arnaldo Carvalho de Melo 2007-12-02 21:36 ` [PATCH 4/7] [TFRC]: Make the rx history slab be global Arnaldo Carvalho de Melo @ 2007-12-06 13:57 ` Gerrit Renker 1 sibling, 0 replies; 28+ messages in thread From: Gerrit Renker @ 2007-12-06 13:57 UTC (permalink / raw) To: Arnaldo Carvalho de Melo; +Cc: netdev, dccp | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches 2007-12-02 21:36 [RFC][PATCHES 0/7]: Reorganization of RX history patches Arnaldo Carvalho de Melo 2007-12-02 21:36 ` [PATCH 1/7] [TFRC]: Provide central source file and debug facility Arnaldo Carvalho de Melo @ 2007-12-03 8:23 ` Ian McDonald 2007-12-03 8:35 ` Gerrit Renker 2 siblings, 0 replies; 28+ messages in thread From: Ian McDonald @ 2007-12-03 8:23 UTC (permalink / raw) To: Arnaldo Carvalho de Melo; +Cc: Gerrit Renker, netdev, dccp, Ingo Molnar On 12/3/07, Arnaldo Carvalho de Melo <acme@redhat.com> wrote: > WARNING: After reading some messages from Ingo Molnar on lkml I think we should really > trim the number of lists we use for kernel development. And since I moved > back to using mutt for reading e-mails, something I should have never, ever > stopped doing, I guess we should move the DCCP discussions to netdev, > where we hopefully can get more people interested and reviewing the work we > do, so please consider moving DCCP discussion to netdev@vger.kernel.org, > where lots of smart networking folks are present and can help our efforts > on turning RFCs to code. > I (and others too) don't necessarily have time to read netdev so would vote on keeping dccp. I would totally agree to making sure that cross-post to netdev as well as dccp. Ian ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches 2007-12-02 21:36 [RFC][PATCHES 0/7]: Reorganization of RX history patches Arnaldo Carvalho de Melo 2007-12-02 21:36 ` [PATCH 1/7] [TFRC]: Provide central source file and debug facility Arnaldo Carvalho de Melo 2007-12-03 8:23 ` [RFC][PATCHES 0/7]: Reorganization of RX history patches Ian McDonald @ 2007-12-03 8:35 ` Gerrit Renker 2007-12-03 12:44 ` Arnaldo Carvalho de Melo 2 siblings, 1 reply; 28+ messages in thread From: Gerrit Renker @ 2007-12-03 8:35 UTC (permalink / raw) To: Arnaldo Carvalho de Melo; +Cc: netdev, dccp, Ingo Molnar Hi Arnaldo, hank you for going through this. I have just backported your recent patches of 2.6.25 to the DCCP/CCID4/Faster Restart test tree at git://eden-feed.erg.abdn.ac.uk/dccp_exp {dccp,ccid4,dccp_fr} as per subsequent message. | do, so please consider moving DCCP discussion to netdev@vger.kernel.org, | where lots of smart networking folks are present and can help our efforts | on turning RFCs to code. Are you suggesting using netdev exclusively or in addition to dccp@vger.kernel.org? | Please take a look at this patch series where I reorganized your work on the new | TFRC rx history handling code. I'll wait for your considerations and then do as many | interactions as reasonable to get your work merged. | | It should be completely equivalent, plus some fixes and optimizations, such as: It will be necessary to address these points one-by-one. Before diving into making fixes and `optimisations', have you tested your code? The patches you are referring to have been posted and re-posted for a period of over 9 months on dccp@vger, and there are regression tests which show that this code improves on the existing Linux implementation. These are labelled as `test tree' on http://www.linux-foundation.org/en/Net:DCCP_Testing#Regression_testing So if you are making changes to the code, I would like to ask if you have run similar regression tests, to avoid having to step back later. | . The code that allocates the RX ring deals with failures when one of the entries in | the ring buffer is not successfully allocated, the original code was leaking the | successfully allocated entries. | | . We do just one allocation for the ring buffer, as the number of entries is fixed we | should just do one allocation and not TFRC_NDUPACK times. Will look at the first point in the patch; with regard to the second point I agree, this will make the code simpler, which is good. | . I haven't checked if all the code was commited, as I tried to introduce just what was | immediatelly used, probably we'll need to do some changes when working on the merge | of your loss intervals code. Sorry I don't understand this point. | . I changed the ccid3_hc_rx_packet_recv code to set hcrx->ccid3hcrx_s for the first | non-data packet instead of calling ccid3_hc_rx_set_state, that would use 0 as the | initial value in the EWMA calculation. This is a misunderstanding. Non-data packets are not considered in the moving average for the data packet size `s'; and it would be an error to do (consider 40byte Acks vs. 1460byte data packets, also it is in RFC 4342). Where would the zero initial value come from? I think this is also a misunderstanding. Please have a look below: static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) { // ... u32 sample, payload_size = skb->len - dccp_hdr(skb)->dccph_doff * 4; if (unlikely(hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA)) { if (is_data_packet) { do_feedback = FBACK_INITIAL; ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA); ccid3_hc_rx_update_s(hcrx, payload_size); } goto update_records; } ==> Non-data packets are ignored for the purposes of computing s (this is in the RFC), consequently update_s() is only called for data packets; using the two following functions: static inline u32 tfrc_ewma(const u32 avg, const u32 newval, const u8 weight) { return avg ? (weight * avg + (10 - weight) * newval) / 10 : newval; } static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len) { if (likely(len > 0)) /* don't update on empty packets (e.g. ACKs) */ hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9); } ==> Hence I can't see where a zero value should come from: ccid3hrx_s is initially initialised with zero (memset(...,0,...)); when first called, update_s() will feed a non-zero payload size to tfrc_ewma(), which will return `newval' = payload_size, hence the first data packet will contribute a non-zero payload_size. Zero-sized DCCP-Data packets are pathological and are ignored by the CCID calculations (not by the receiver); a corresponding counterpart for zero-sized | | It is available at: | | master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.25 | Need to do this separately. As said, the code has been developed and tested over a long time, it took a long while until it acted predictably, so being careful is very important. I would rather not have my patches merged and continue to run a test tree if the current changes alter the behaviour to the worse. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches 2007-12-03 8:35 ` Gerrit Renker @ 2007-12-03 12:44 ` Arnaldo Carvalho de Melo 2007-12-03 13:49 ` Gerrit Renker 0 siblings, 1 reply; 28+ messages in thread From: Arnaldo Carvalho de Melo @ 2007-12-03 12:44 UTC (permalink / raw) To: Gerrit Renker, Arnaldo Carvalho de Melo, netdev, dccp, Ingo Molnar Em Mon, Dec 03, 2007 at 08:35:12AM +0000, Gerrit Renker escreveu: > Hi Arnaldo, > > hank you for going through this. I have just backported your recent patches of 2.6.25 > to the DCCP/CCID4/Faster Restart test tree at > git://eden-feed.erg.abdn.ac.uk/dccp_exp {dccp,ccid4,dccp_fr} > as per subsequent message. > | do, so please consider moving DCCP discussion to netdev@vger.kernel.org, > | where lots of smart networking folks are present and can help our efforts > | on turning RFCs to code. > Are you suggesting using netdev exclusively or in addition to dccp@vger.kernel.org? Well, since at least one person that has contributed significantly in the past has said he can't cope with traffic on netdev, we can CC dccp@vger.kernel.org. > > | Please take a look at this patch series where I reorganized your work on the new > | TFRC rx history handling code. I'll wait for your considerations and then do as many > | interactions as reasonable to get your work merged. > | > | It should be completely equivalent, plus some fixes and optimizations, such as: > It will be necessary to address these points one-by-one. Before diving into making > fixes and `optimisations', have you tested your code? The patches you are referring to I have performed basic tests, and when doing so noticed a bug in inet_diag, that I commented with Herbert Xu and he has since provided a fix. > have been posted and re-posted for a period of over 9 months on dccp@vger, and Being posted and re-posted does not guarantee that the patch is OK or that is in a form that is acceptable to all tree maintainers. DaveM is subscribed to dccp@vger and could have picked this if he had the time to do the review. I didn't, but now I'm trying to spend my time on reviewing your patches and in the process doing modifications I find appropriate while trying hard not to introduce bugs in your hard work to get it merged. > there are regression tests which show that this code improves on the existing Linux > implementation. These are labelled as `test tree' on > http://www.linux-foundation.org/en/Net:DCCP_Testing#Regression_testing > So if you are making changes to the code, I would like to ask if you have run similar > regression tests, to avoid having to step back later. Fair enough, I will do that before asking Herbert or Dave to pull from my tree. > | . The code that allocates the RX ring deals with failures when one of the entries in > | the ring buffer is not successfully allocated, the original code was leaking the > | successfully allocated entries. Sorry for not point out exactly this, here it goes: Your original patch: +int tfrc_rx_hist_init(struct tfrc_rx_hist *h) +{ + int i; + + for (i = 0; i <= NDUPACK; i++) { + h->ring[i] = kmem_cache_alloc(tfrc_rxh_cache, GFP_ATOMIC); + if (h->ring[i] == NULL) + return 1; + } + h->loss_count = 0; + h->loss_start = 0; + return 0; +} +EXPORT_SYMBOL_GPL(tfrc_rx_hist_init); Then, in ccid3_hc_rx_init you do: static int ccid3_hc_rx_init(struct ccid *ccid, struct sock *sk) { struct ccid3_hc_rx_sock *hcrx = ccid_priv(ccid); ccid3_pr_debug("entry\n"); hcrx->ccid3hcrx_state = TFRC_RSTATE_NO_DATA; tfrc_lh_init(&hcrx->ccid3hcrx_li_hist); return tfrc_rx_hist_init(&hcrx->ccid3hcrx_hist); } So if tfrc_rx_hist_init fail to allocate, say, the last entry in the ring, it will return 1, and from looking at how you initialize h->loss_{count,start} to zero I assumed that the whole tfrc_rx_hist is not zeroed when tfrc_rx_hist_init is called, so one can also assume that the ring entries are not initialized to NULL and that if the error recovery is to assume that later on tfrc_rx_hist_cleanup is called we would not have a leak, but an OOPS when tfrc_rx_hist_cleanup tries to call kfree_cache_free on the uninitialized ring entries. But if you look at ccid_new(), that calls ccid3_hc_rx_init(), you'll see that when the ccid rx or hx routine fails, it just frees the struct ccid with the area where h->ring is, so, what was not cleaned up by the ccid init routine is effectively forgot, leaked. I first did the cleanup at tfrc_rx_hist_init (that I renamed to tfrc_rx_hist_alloc, since it doesn't just initializes things, but allocates entries from slab), but then I just made the rx history slab have arrays of tfrc_rx_hist_entry objects, not individual ones as your code always allocates them as arrays. > | . We do just one allocation for the ring buffer, as the number of entries is fixed we > | should just do one allocation and not TFRC_NDUPACK times. > Will look at the first point in the patch; with regard to the second point I agree, this > will make the code simpler, which is good. good > | . I haven't checked if all the code was commited, as I tried to introduce just what was > | immediatelly used, probably we'll need to do some changes when working on the merge > | of your loss intervals code. > Sorry I don't understand this point. Let me check now and tell you for sure: tfrc_rx_hist_delta_seqno and tfrc_rx_hist_swap were not included, as they were not used, we should introduce them later, when getting to the working on the loss interval code. > | . I changed the ccid3_hc_rx_packet_recv code to set hcrx->ccid3hcrx_s for the first > | non-data packet instead of calling ccid3_hc_rx_set_state, that would use 0 as the > | initial value in the EWMA calculation. > This is a misunderstanding. Non-data packets are not considered in the moving average > for the data packet size `s'; and it would be an error to do (consider 40byte Acks vs. > 1460byte data packets, also it is in RFC 4342). > Where would the zero initial value come from? I think this is also a misunderstanding. > Please have a look below: > static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) > { > // ... > u32 sample, payload_size = skb->len - dccp_hdr(skb)->dccph_doff * 4; > > if (unlikely(hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA)) { > if (is_data_packet) { > do_feedback = FBACK_INITIAL; > ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA); > ccid3_hc_rx_update_s(hcrx, payload_size); > } > goto update_records; > } > > ==> Non-data packets are ignored for the purposes of computing s (this is in the RFC), > consequently update_s() is only called for data packets; using the two following > functions: > > > static inline u32 tfrc_ewma(const u32 avg, const u32 newval, const u8 weight) > { > return avg ? (weight * avg + (10 - weight) * newval) / 10 : newval; > } I hadn't considered that tfrc_ewma would for every packet check if the avg was 0 and I find it suboptimal now that I look at it, we are just feeding data packets, no? So checking just when we are at TFRC_RSTATE_NO_DATA, as your code does seems to be better, but not by calling ccid3_hc_rx_update_s, but just doing: <SNIP> { do_feedback = CCID3_FBACK_INITIAL; ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA); hcrx->ccid3hcrx_s = payload_size; } <SNIP> > static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len) > { > if (likely(len > 0)) /* don't update on empty packets (e.g. ACKs) */ > hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9); > } And we also just do test for len > 0 in update_s, that looks like also excessive, no? As we should pass just data packets, perhaps we should just do a BUG_ON there. > > ==> Hence I can't see where a zero value should come from: ccid3hrx_s is initially > initialised with zero (memset(...,0,...)); when first called, update_s() will > feed a non-zero payload size to tfrc_ewma(), which will return `newval' = payload_size, > hence the first data packet will contribute a non-zero payload_size. > Zero-sized DCCP-Data packets are pathological and are ignored by the CCID calculations > (not by the receiver); a corresponding counterpart for zero-sized Understood, consider this one then an optimization and not a bugfix. My motivation to add this as an optimization had I realized that tfrc_ewma checks for avg being zero would have been greatly diminished, but since we are having all this discussion, I think the optimization is OK to have merged. > | > | It is available at: > | > | master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.25 > | > Need to do this separately. As said, the code has been developed and tested over a long time, > it took a long while until it acted predictably, so being careful is very important. Gerrit, I do these kinds of code reorganizations on the core Linux networking for ages, I'm not perfect (huh! :-P) but I also didn't performed extensive testing doing coverage analisys of all the imaginable possibilities, just did my best to improve the abstractions and the code, ocasionally fixing bugs as I went thru the existing code. So while I understand your concerns about being careful, rest asured I _am_ careful and will be even _more_ careful by using the regression tests you mentioned, as it would be a shame not to use them. But I can't just take your patches as-is all the time, I have my personal preferences and will try to use them when taking the time to merge code from anyone. > I would rather not have my patches merged and continue to run a test tree if the current > changes alter the behaviour to the worse. Its not like that, you released your code as GPL, its out there, it would be a shame not to merge it, but not as-is all the time. I'm making sure I stick my name when I do major code changes while giving you due credit for your work. And heck, this was an [RFC], I didn't pushed this upstream before listening to your concerns. Its only fair that if you expect me to review your work, I should expect for you to review my work, even more when is to improve yours. Best Regards, - Arnaldo ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches 2007-12-03 12:44 ` Arnaldo Carvalho de Melo @ 2007-12-03 13:49 ` Gerrit Renker 2007-12-03 14:54 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 28+ messages in thread From: Gerrit Renker @ 2007-12-03 13:49 UTC (permalink / raw) To: Arnaldo Carvalho de Melo, netdev, dccp, Ingo Molnar | > Are you suggesting using netdev exclusively or in addition to dccp@vger.kernel.org? | | Well, since at least one person that has contributed significantly in | the past has said he can't cope with traffic on netdev, we can CC | dccp@vger.kernel.org I have a similar problem with the traffic but agree and will copy as well. | > have been posted and re-posted for a period of over 9 months on dccp@vger, and | | Being posted and re-posted does not guarantee that the patch is OK or | that is in a form that is acceptable to all tree maintainers. With the first point I couldn't agree more, but this is not really what I meant - the point was that despite posting and re-posting there was often silence. And now there is feedback, in form of a patchset made by you; and all that I am asking for is just to be given the time to look that through. Last time a RFC patch appeared at 3pm and was checked in the same evening (only found out next morning). With your experience and long background as a maintainer maybe you expect quicker turn-around times, but I also had waited patiently until you had had a chance to review the stuff I sent. | > | . The code that allocates the RX ring deals with failures when one of the entries in | > | the ring buffer is not successfully allocated, the original code was leaking the | > | successfully allocated entries. | | Sorry for not point out exactly this, here it goes: | | Your original patch: | | +int tfrc_rx_hist_init(struct tfrc_rx_hist *h) | +{ | + int i; | + | + for (i = 0; i <= NDUPACK; i++) { | + h->ring[i] = kmem_cache_alloc(tfrc_rxh_cache, GFP_ATOMIC); | + if (h->ring[i] == NULL) | + return 1; | + } | + h->loss_count = 0; | + h->loss_start = 0; | + return 0; | +} | +EXPORT_SYMBOL_GPL(tfrc_rx_hist_init); | I expected this and it actually was very clear from your original message. I fully back up your solution in this point, see below. But my question above was rather: are there any other bugs rather than the above leakage, which is what the previous email seemed to indicate? With regard to your solution - you are using int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h) { h->ring = kmem_cache_alloc(tfrc_rx_hist_slab, GFP_ATOMIC); h->loss_count = h->loss_start = 0; return h->ring == NULL; } which is better not only for one but for two reasons. It solves the leakage and in addition makes the entire code simpler. Fully agreed. | | > | . I haven't checked if all the code was commited, as I tried to introduce just what was | > | immediatelly used, probably we'll need to do some changes when working on the merge | > | of your loss intervals code. | > Sorry I don't understand this point. | | Let me check now and tell you for sure: | | tfrc_rx_hist_delta_seqno and tfrc_rx_hist_swap were not included, as | they were not used, we should introduce them later, when getting to the | working on the loss interval code. Ah thanks, that was really not clear. Just beginning to work through the set. | > static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) | > { | > // ... | > u32 sample, payload_size = skb->len - dccp_hdr(skb)->dccph_doff * 4; | > | > if (unlikely(hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA)) { | > if (is_data_packet) { | > do_feedback = FBACK_INITIAL; | > ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA); | > ccid3_hc_rx_update_s(hcrx, payload_size); | > } | > goto update_records; | > } | > | > ==> Non-data packets are ignored for the purposes of computing s (this is in the RFC), | > consequently update_s() is only called for data packets; using the two following | > functions: | > | > | > static inline u32 tfrc_ewma(const u32 avg, const u32 newval, const u8 weight) | > { | > return avg ? (weight * avg + (10 - weight) * newval) / 10 : newval; | > } | | I hadn't considered that tfrc_ewma would for every packet check if the | avg was 0 and I find it suboptimal now that I look at it, we are just | feeding data packets, no? Yes exactly, only data packets are used for s. | > static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len) | > { | > if (likely(len > 0)) /* don't update on empty packets (e.g. ACKs) */ | > hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9); | > } | | And we also just do test for len > 0 in update_s, that looks like | also excessive, no? Hm, I think we need to make it robust against API bugs and/or zero-sized data packets. The check `len > 0' may seem redundant but it catches such a condition. For a moving average an accidental zero value will have quite an impact on the overall value. In CCID3 it is x_new = 0.9 * x_old + 0.1 * len So if len is accidentally 0 where it should not be, then each time the moving average is reduced to 90%. |As we should pass just data packets, perhaps we should just do a BUG_ON there. That is a good idea, can we make it a DCCP_WARN-like thing so that the computer does not freeze? | Understood, consider this one then an optimization and not a bugfix. My | motivation to add this as an optimization had I realized that tfrc_ewma | checks for avg being zero would have been greatly diminished, | but since we are having all this discussion, I think the optimization is | OK to have merged. I see your point but as this is in patch 7/7 can you please allow me some time to digest it. | > | | > | It is available at: | > | | > | master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.25 | > | | > Need to do this separately. As said, the code has been developed and tested over a long time, | > it took a long while until it acted predictably, so being careful is very important. | | Gerrit, I do these kinds of code reorganizations on the core Linux | networking for ages, I'm not perfect (huh! :-P) but I also didn't | performed extensive testing doing coverage analisys of all the | imaginable possibilities, just did my best to improve the abstractions | and the code, ocasionally fixing bugs as I went thru the existing code. | As above - more than likely you will have done a really good job. In the past experience of collaborating with you I can not recall a single bad thing introduced, and learned lots from that. It is just, I am not as fast as you and need longer to understand what is going on. As a comparison - the entire patch set took about a full month to do. But that meant I am reasonably sure the algorithm is sound and can cope with problematic conditions. | But I can't just take your patches as-is all the time, I have my | personal preferences and will try to use them when taking the time to | merge code from anyone. | ... which is perfectly allright and actually I am grateful for that. It was just last week where your feedback on the passive-close states lead to a reconsideration which brought to light three previously unresolved problems: (a) an skb leakage which you pointed out (b) not checking for state transitions from other states (c) how to handle simultaneous-close So it all started when you said you had concerns about (a), but you may have had an inkling or so, and see there were two more problems. Had you silently merged these patches, these three problems would probably still exist. In other words I am not interested in accepting-patches-for-instant-gratification, let's discuss where the code may be weak and iron that out. Thanks a lot for your clarifying email I now understand better. Hopefully there will be feedback. If you can bear with me until tomorrow (Tuesday) or better Wednesday, I will have a look and send feedback about your patch set. I already have some ideas, but need to sit down and look at it. Thanks indeed Gerrit ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches 2007-12-03 13:49 ` Gerrit Renker @ 2007-12-03 14:54 ` Arnaldo Carvalho de Melo 2007-12-03 15:44 ` Gerrit Renker 2007-12-05 10:27 ` Gerrit Renker 0 siblings, 2 replies; 28+ messages in thread From: Arnaldo Carvalho de Melo @ 2007-12-03 14:54 UTC (permalink / raw) To: Gerrit Renker, Arnaldo Carvalho de Melo, netdev, dccp, Ingo Molnar Em Mon, Dec 03, 2007 at 01:49:47PM +0000, Gerrit Renker escreveu: > | > Are you suggesting using netdev exclusively or in addition to dccp@vger.kernel.org? > | > | Well, since at least one person that has contributed significantly in > | the past has said he can't cope with traffic on netdev, we can CC > | dccp@vger.kernel.org > I have a similar problem with the traffic but agree and will copy as well. > > > | > have been posted and re-posted for a period of over 9 months on dccp@vger, and > | > | Being posted and re-posted does not guarantee that the patch is OK or > | that is in a form that is acceptable to all tree maintainers. > With the first point I couldn't agree more, but this is not really what I meant - the point > was that despite posting and re-posting there was often silence. And now there is feedback, > in form of a patchset made by you; and all that I am asking for is just to be given the time to > look that through. Last time a RFC patch appeared at 3pm and was checked in the same evening > (only found out next morning). I was too optimistic about that one, feeling that it was safe, sorry about that, will avoid doing that in the future. > With your experience and long background as a maintainer maybe you expect quicker turn-around > times, but I also had waited patiently until you had had a chance to review the stuff I sent. Agreed, my bad, will be more patient with my side as you've been with yours :-) > | > | . The code that allocates the RX ring deals with failures when one of the entries in > | > | the ring buffer is not successfully allocated, the original code was leaking the > | > | successfully allocated entries. > | > | Sorry for not point out exactly this, here it goes: > | > | Your original patch: > | > | +int tfrc_rx_hist_init(struct tfrc_rx_hist *h) > | +{ > | + int i; > | + > | + for (i = 0; i <= NDUPACK; i++) { > | + h->ring[i] = kmem_cache_alloc(tfrc_rxh_cache, GFP_ATOMIC); > | + if (h->ring[i] == NULL) > | + return 1; > | + } > | + h->loss_count = 0; > | + h->loss_start = 0; > | + return 0; > | +} > | +EXPORT_SYMBOL_GPL(tfrc_rx_hist_init); > | > I expected this and it actually was very clear from your original message. I fully back up > your solution in this point, see below. But my question above was rather: are there any > other bugs rather than the above leakage, which is what the previous email seemed to indicate? > > With regard to your solution - you are using > > int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h) > { > h->ring = kmem_cache_alloc(tfrc_rx_hist_slab, GFP_ATOMIC); > h->loss_count = h->loss_start = 0; > return h->ring == NULL; > } > > which is better not only for one but for two reasons. It solves the leakage and in addition makes > the entire code simpler. Fully agreed. good > > | > | > | . I haven't checked if all the code was commited, as I tried to introduce just what was > | > | immediatelly used, probably we'll need to do some changes when working on the merge > | > | of your loss intervals code. > | > Sorry I don't understand this point. > | > | Let me check now and tell you for sure: > | > | tfrc_rx_hist_delta_seqno and tfrc_rx_hist_swap were not included, as > | they were not used, we should introduce them later, when getting to the > | working on the loss interval code. > Ah thanks, that was really not clear. Just beginning to work through the set. great > | > static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) > | > { > | > // ... > | > u32 sample, payload_size = skb->len - dccp_hdr(skb)->dccph_doff * 4; > | > > | > if (unlikely(hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA)) { > | > if (is_data_packet) { > | > do_feedback = FBACK_INITIAL; > | > ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA); > | > ccid3_hc_rx_update_s(hcrx, payload_size); > | > } > | > goto update_records; > | > } > | > > | > ==> Non-data packets are ignored for the purposes of computing s (this is in the RFC), > | > consequently update_s() is only called for data packets; using the two following > | > functions: > | > > | > > | > static inline u32 tfrc_ewma(const u32 avg, const u32 newval, const u8 weight) > | > { > | > return avg ? (weight * avg + (10 - weight) * newval) / 10 : newval; > | > } > | > | I hadn't considered that tfrc_ewma would for every packet check if the > | avg was 0 and I find it suboptimal now that I look at it, we are just > | feeding data packets, no? > Yes exactly, only data packets are used for s. > > | > static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len) > | > { > | > if (likely(len > 0)) /* don't update on empty packets (e.g. ACKs) */ > | > hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9); > | > } > | > | And we also just do test for len > 0 in update_s, that looks like > | also excessive, no? > Hm, I think we need to make it robust against API bugs and/or zero-sized > data packets. The check `len > 0' may seem redundant but it catches such > a condition. For a moving average an accidental zero value will have > quite an impact on the overall value. In CCID3 it is > > x_new = 0.9 * x_old + 0.1 * len > > So if len is accidentally 0 where it should not be, then each time the > moving average is reduced to 90%. So we must make it a BUG_ON, not something that is to be always present. > |As we should pass just data packets, perhaps we should just do a BUG_ON there. > That is a good idea, can we make it a DCCP_WARN-like thing so that the computer does > not freeze? that is ok at this point where we're still going thru lots of patches, as soon as we collectively agree that it should be a serious bug to pass a 0 sized packet to this routine, we can turn it into a BUG_ON. But we must make it from now clear that it is something completely unexpected. > | Understood, consider this one then an optimization and not a bugfix. My > | motivation to add this as an optimization had I realized that tfrc_ewma > | checks for avg being zero would have been greatly diminished, > | but since we are having all this discussion, I think the optimization is > | OK to have merged. > I see your point but as this is in patch 7/7 can you please allow me > some time to digest it. OK, its normal working day, so I'm supposed to be doing "Real Work", so you'll have plenty of time to look at it before I get back at doing more serious work on DCCP. > | > | > | > | It is available at: > | > | > | > | master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.25 > | > | > | > Need to do this separately. As said, the code has been developed and tested over a long time, > | > it took a long while until it acted predictably, so being careful is very important. > | > | Gerrit, I do these kinds of code reorganizations on the core Linux > | networking for ages, I'm not perfect (huh! :-P) but I also didn't > | performed extensive testing doing coverage analisys of all the > | imaginable possibilities, just did my best to improve the abstractions > | and the code, ocasionally fixing bugs as I went thru the existing code. > | > As above - more than likely you will have done a really good job. In the > past experience of collaborating with you I can not recall a single bad > thing introduced, and learned lots from that. It is just, I am not as > fast as you and need longer to understand what is going on. > As a comparison - the entire patch set took about a full month to do. > But that meant I am reasonably sure the algorithm is sound and can cope > with problematic conditions. And from what I saw so far that is my impression too, if you look at what I'm doing it is: 1. go thru the whole patch trying to understand hunk by hunk 2. do consistency changes (add namespace prefixes) 3. reorganize the code to look more like what is already there, we both have different backgrounds and tastes about how code should be written, so its only normal that if we want to keep the code consistent, and I want that, I jump into things I think should be "reworded", while trying to keep the algorithm expressed by you. > | But I can't just take your patches as-is all the time, I have my > | personal preferences and will try to use them when taking the time to > | merge code from anyone. > | > ... which is perfectly allright and actually I am grateful for that. It was just > last week where your feedback on the passive-close states lead to a reconsideration > which brought to light three previously unresolved problems: > > (a) an skb leakage which you pointed out > (b) not checking for state transitions from other states > (c) how to handle simultaneous-close > > So it all started when you said you had concerns about (a), but you may > have had an inkling or so, and see there were two more problems. Had you > silently merged these patches, these three problems would probably still > exist. Glad that I was of help > In other words I am not interested in accepting-patches-for-instant-gratification, > let's discuss where the code may be weak and iron that out. Excellent > Thanks a lot for your clarifying email I now understand better. > Hopefully there will be feedback. If you can bear with me until tomorrow > (Tuesday) or better Wednesday, I will have a look and send feedback > about your patch set. I already have some ideas, but need to sit down > and look at it. That is perfectly fine and will give me time to use the regression testing that you mentioned and also to let me think about further automatization on regression testing. Thank you, - Arnaldo ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches 2007-12-03 14:54 ` Arnaldo Carvalho de Melo @ 2007-12-03 15:44 ` Gerrit Renker 2007-12-05 10:27 ` Gerrit Renker 1 sibling, 0 replies; 28+ messages in thread From: Gerrit Renker @ 2007-12-03 15:44 UTC (permalink / raw) To: Arnaldo Carvalho de Melo, netdev, dccp, Ingo Molnar | > | > static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len) | > | > { | > | > if (likely(len > 0)) /* don't update on empty packets (e.g. ACKs) */ | > | > hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9); | > | > } | > | | > | And we also just do test for len > 0 in update_s, that looks like | > | also excessive, no? | > Hm, I think we need to make it robust against API bugs and/or zero-sized | > data packets. The check `len > 0' may seem redundant but it catches such | > a condition. For a moving average an accidental zero value will have | > quite an impact on the overall value. In CCID3 it is | > | > x_new = 0.9 * x_old + 0.1 * len | > | > So if len is accidentally 0 where it should not be, then each time the | > moving average is reduced to 90%. | | So we must make it a BUG_ON, not something that is to be always present. | I think it should be a warning condition since it can be triggered when the remote party sends zero-sized packets. It may be good to log this into the syslog to warn about possibly misbehaving apps/peers/remote stacks. | > As a comparison - the entire patch set took about a full month to do. | > But that meant I am reasonably sure the algorithm is sound and can cope | > with problematic conditions. | | And from what I saw so far that is my impression too, if you look at | what I'm doing it is: | | 1. go thru the whole patch trying to understand hunk by hunk You are doing a great job - in particular as it really is a lot of material. | 2. do consistency changes (add namespace prefixes) | 3. reorganize the code to look more like what is already there, we | both have different backgrounds and tastes about how code should | be written, so its only normal that if we want to keep the code | consistent, and I want that, I jump into things I think should be | "reworded", while trying to keep the algorithm expressed by you. | Agree, that is not always easy to get right. I try to stick as close as possible to existing conventions but of course that is my interpretation, so I am already anticipating such changes/comments here. | think about further automatization on regression testing. | If it is of any use, some scripts and setups are at the bottom of the page at http://www.erg.abdn.ac.uk/users/gerrit/dccp/testing_dccp/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches 2007-12-03 14:54 ` Arnaldo Carvalho de Melo 2007-12-03 15:44 ` Gerrit Renker @ 2007-12-05 10:27 ` Gerrit Renker 2007-12-05 11:52 ` Arnaldo Carvalho de Melo 1 sibling, 1 reply; 28+ messages in thread From: Gerrit Renker @ 2007-12-05 10:27 UTC (permalink / raw) To: Arnaldo Carvalho de Melo, netdev, dccp, Ingo Molnar Today being Wednesday, below is some feedback after working through the patch set. Hope this is helpful. Patch #1: --------- Several new good points are introduced: - IP_DCCP_TFRC_DEBUG is made dependent on IP_DCCP_TFRC_DEBUG which is useful - the select from CONFIG_IP_DCCP_CCID3 => CONFIG_DCCP_TFRC_LIB - the cleanup action in tfrc_module_init() (when packet_history_init() fails) was previously missing, this is a good catch. Also a note: tfrc_pr_debug() is not currently used (but may be later should the code common to both CCID3 and CCID4 be shared). Patches #2/#6: -------------- Separated from main patch, no problems (were initially submitted in this format). I wonder whether switching back to smaller patch sizes is better? Patch #3: --------- Renames s/tfrc_tx_hist/tfrc_tx_hist_slab/g; no problems. Patch #4: --------- packet_history_init() initialises both RX and TX history and is later called by the module_init() function in net/dccp/ccids/lib/tfrc.c. Just a suggestion, it is possible to simplify this further, by doing all the initialisation / cleanup in tfrc.c: int __init {rx,tx}_packet_history_init() { tfrc_{rx,tx}_hist_slab = kmem_cache_create("tfrc_{rx,tx}_hist", ...); return tfrc_{rx,tx}_hist_slab == NULL ? - ENOBUFS : 0; } and then call these successively in tfrc_module_init(). Patch #5: --------- The naming scheme introduced here is s/dccp_rx/tfrc_rx/g; s/dccphrx/tfrchrx/g; I wonder, while at it, would it be possible to extend this and extend this to other parts of the code? Basically this is the same discussion as earlier on dccp@vger with Leandro, who first came up with this naming scheme. There the same question came up and the result of the discussion was that a prefix of `tfrchrx' is cryptic; if something simpler is possible then it would be great. Patch #7: --------- * ccid3_hc_rx_detect_loss() can be fully removed since no other code references it any further. * bytes_recv also counts the payload_size's of non-data packets, which is wrong (it should only sum up the sum of bytes transferred as data packets) * loss handling is not correctly taken care of: unlike in the other part, both data and non-data packets are used to detect loss (this is not correctly handled in the current Linux implementation). * tfrc_rx_hist_entry_data_packet() is not needed: - a similar function, called dccp_data_packet(), was introduced in patch 2/7 - code compiles cleanly without tfrc_rx_hist_entry_data_packet() - all references to this function are deleted by patch 7/7 * is another header file (net/dccp/ccids/lib/packet_history_internal.h) really necessary? - net/dccp/ccids/lib has already 3 x header file, 4 x source file - with the removal of tfrc_rx_hist_entry_data_packet(), only struct tfrc_rx_hist_entry remains as the sole occupant of that file - how about hiding the internals of struct tfrc_rx_hist_entry by putting it into packet_history.c, as it was done previously in a similar way for the TX packet history? * in ccid3_hc_rx_insert_options(), due to hunk-shifting or something else, there is still the call to dccp_insert_option_timestamp(sk, skb) --> this was meant to be removed by an earlier patch (which also removed the Elapsed Time option); --> in the original submission of this patch the call to dccp_insert_option_timestamp() did no longer appear (as can be found in the dccp@vger mailing list archives), and the test tree likewise does not use it; --> it can be removed with full confidence since no part of the code uses timestamps sent by the HC-receiver (only the HC-sender timestamps are used); and it is further not required by the spec to send HC-receiver timestamps (RFC 4342, section 6) * one of the two variables ccid3hcrx_tstamp_last_feedback and ccid3hcrx_tstamp_last_ack is redundant and can be removed (this may be part of a later patch); the variable ccid3hcrx_tstamp_last_feedback is very long (function is clear due to type of ktime_t). * the inlines are a good idea regarding type safety, but I take it that we can now throw overboard the old rule of 80 characters per line? Due to the longer names of the inlines, some expressions end at column 98 (cf. tfrc_rx_hist_sample_rtt(); but to be honest I'd rather get rid of that function since the receiver-RTT sampling is notoriously inaccurate (wrong place to measure) and then there is little left to argue with the inlines). * with regard to RX history initialisation, would you be amicable to the idea of performing the RX/TX history, and loss intervals history in tfrc.c, as suggested for patch 1/7 (shortens the routines)? * tfrc_rx_hist_entry is lacking documentation (my fault, had been forgotten in the initial submission): /** * tfrc_rx_hist_entry - Store information about a single received packet * @ptype: the type (5.1) of the packet ... */ * is it really necessary to give the field members of known structures long names such as tfrc_rx_hist_loss_prev(h)->tfrchrx_seqno? This is the same comment as per patch 5/7 and there has been an earlier discussion on dccp@vger where other developers (not just me) agreed that such long names are a burden to write; but we could leave that also for later. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches 2007-12-05 10:27 ` Gerrit Renker @ 2007-12-05 11:52 ` Arnaldo Carvalho de Melo 2007-12-05 13:45 ` Gerrit Renker 0 siblings, 1 reply; 28+ messages in thread From: Arnaldo Carvalho de Melo @ 2007-12-05 11:52 UTC (permalink / raw) To: Gerrit Renker, Arnaldo Carvalho de Melo, netdev, dccp Note: removed Ingo from the CC list, I had added it first just because he advocated reducing the number of mailing lists, so I wanted him to know that we're trying to do that. Em Wed, Dec 05, 2007 at 10:27:36AM +0000, Gerrit Renker escreveu: > Today being Wednesday, below is some feedback after working through the patch set. > Hope this is helpful. > > Patch #1: > --------- > Several new good points are introduced: > - IP_DCCP_TFRC_DEBUG is made dependent on IP_DCCP_TFRC_DEBUG which is useful > - the select from CONFIG_IP_DCCP_CCID3 => CONFIG_DCCP_TFRC_LIB > - the cleanup action in tfrc_module_init() (when packet_history_init() fails) > was previously missing, this is a good catch. > Also a note: tfrc_pr_debug() is not currently used (but may be later should the > code common to both CCID3 and CCID4 be shared). OK > Patches #2/#6: > -------------- > Separated from main patch, no problems (were initially submitted in this format). > I wonder whether switching back to smaller patch sizes is better? Patches that do one thing that is logically separated from others are preferred, as its analisys is made faster. Patches that rewrite an existing functionality are done best when the functions being replaced keep as much as possible the same name. For instance, in the new RX handling code we need to alloc the RX hist internal structures, in the previous implementation we also needed to do that, it also needs to purge entries, as the old one needed. Previous one was a linked list, the new one is a ring. As we go we can and should add new APIs when needed, but trying to keep an API so that if in the future we decide to rework the internal structures, this can be done in a plugin fashion, without changing too much its users (CCIDs in this case), so that we can get back to the old one with minor disruption to the users if needed. Sometimes we manage to do that, some times we need to improve upon the current API, but the goal remains the same. > Patch #3: > --------- > Renames s/tfrc_tx_hist/tfrc_tx_hist_slab/g; no problems. This was for consistency with the other slabs in DCCP: [acme@doppio net-2.6.25]$ find net/dccp/ -name "*.c" | xargs grep 'struct kmem_cache' net/dccp/ccids/lib/packet_history.c:static struct kmem_cache *tfrc_tx_hist_slab; net/dccp/ccids/lib/packet_history.c:static struct kmem_cache *tfrc_rx_hist_slab; net/dccp/ccids/lib/loss_interval.c:static struct kmem_cache *dccp_li_cachep __read_mostly; net/dccp/ackvec.c:static struct kmem_cache *dccp_ackvec_slab; net/dccp/ackvec.c:static struct kmem_cache *dccp_ackvec_record_slab; net/dccp/ccid.c: struct kmem_cache *slab; net/dccp/ccid.c:static void ccid_kmem_cache_destroy(struct kmem_cache *slab) [acme@doppio net-2.6.25]$ > Patch #4: > --------- > packet_history_init() initialises both RX and TX history and is later called by the module_init() > function in net/dccp/ccids/lib/tfrc.c. Just a suggestion, it is possible to simplify this further, > by doing all the initialisation / cleanup in tfrc.c: > int __init {rx,tx}_packet_history_init() > { > tfrc_{rx,tx}_hist_slab = kmem_cache_create("tfrc_{rx,tx}_hist", ...); > return tfrc_{rx,tx}_hist_slab == NULL ? - ENOBUFS : 0; > } > and then call these successively in tfrc_module_init(). Idea here was to have each C source file to have a init module. Perhaps we should try to break packet_history.c into tx_packet_history and rx_packet_history.c. We can do that later to try to meet the goal of being able to see what is being replaced. > Patch #5: > --------- > The naming scheme introduced here is > s/dccp_rx/tfrc_rx/g; > s/dccphrx/tfrchrx/g; > I wonder, while at it, would it be possible to extend this and extend this to other parts > of the code? Basically this is the same discussion as earlier on dccp@vger with Leandro, > who first came up with this naming scheme. There the same question came up and the result > of the discussion was that a prefix of `tfrchrx' is cryptic; if something simpler is > possible then it would be great. As we did with the ackvecs, will do that later. > Patch #7: > --------- > * ccid3_hc_rx_detect_loss() can be fully removed since no other code references it any > further. I left it to try have the diff not mixing up removal of its implementation with the implementation of the function just after it, ccid3_hx_rx_packet_recv. Will remove it later. > * bytes_recv also counts the payload_size's of non-data packets, which is wrong (it should only > sum up the sum of bytes transferred as data packets) As said to you in private message I'll get back to the way you wrote, as you mentioned that upcoming patches will make good use of that. I'll try to restrain me to not do too many changes. > * loss handling is not correctly taken care of: unlike in the other part, both data and non-data > packets are used to detect loss (this is not correctly handled in the current Linux implementation). See previous comment > * tfrc_rx_hist_entry_data_packet() is not needed: > - a similar function, called dccp_data_packet(), was introduced in patch 2/7 I thought about that, but dccp_data_packet is for skbs, tfrc_rx_hist_entry_data_packet is for tfrc_rx_hist_entries, I guess we should just make dccp_data_packet receive the packet type and not an object that has a packet type field. > - code compiles cleanly without tfrc_rx_hist_entry_data_packet() > - all references to this function are deleted by patch 7/7 > * is another header file (net/dccp/ccids/lib/packet_history_internal.h) really necessary? > - net/dccp/ccids/lib has already 3 x header file, 4 x source file Idea here is to not expose data structures and functions that are used only by the rx handling and loss history code. > - with the removal of tfrc_rx_hist_entry_data_packet(), only struct tfrc_rx_hist_entry > remains as the sole occupant of that file > - how about hiding the internals of struct tfrc_rx_hist_entry by putting it into packet_history.c, > as it was done previously in a similar way for the TX packet history? See previous comment. > * in ccid3_hc_rx_insert_options(), due to hunk-shifting or something else, there is still the > call to dccp_insert_option_timestamp(sk, skb) > --> this was meant to be removed by an earlier patch (which also removed the Elapsed Time option); > --> in the original submission of this patch the call to dccp_insert_option_timestamp() did no > longer appear (as can be found in the dccp@vger mailing list archives), and the test tree > likewise does not use it; > --> it can be removed with full confidence since no part of the code uses timestamps sent by the > HC-receiver (only the HC-sender timestamps are used); and it is further not required by the > spec to send HC-receiver timestamps (RFC 4342, section 6) I removed that on purpose, it didn't look related to the main goal of the patch nor was mentioned in the changeset, I'll add it as a separate patch with the explanation you provided above (and in the past). > * one of the two variables ccid3hcrx_tstamp_last_feedback and ccid3hcrx_tstamp_last_ack is > redundant and can be removed (this may be part of a later patch); the variable ccid3hcrx_tstamp_last_feedback > is very long (function is clear due to type of ktime_t). will change that later. I.e. these things can be changed later, avoiding such changes at this point reduces the size of the patch, which is always good. > * the inlines are a good idea regarding type safety, but I take it that we can now throw overboard the old rule > of 80 characters per line? Due to the longer names of the inlines, some expressions end at column 98 (cf. > tfrc_rx_hist_sample_rtt(); but to be honest I'd rather get rid of that function since the receiver-RTT > sampling is notoriously inaccurate (wrong place to measure) and then there is little left to argue with the inlines). We must try as much as possible to not have more than 80 characters per line, but at the same time the kernel is ever growing, namespacing is required to be able to use tools such as ctags, and even to help in decoding oopses I'd say, as we can get more information on a backtrace. > * with regard to RX history initialisation, would you be amicable to the idea of performing the RX/TX history, and > loss intervals history in tfrc.c, as suggested for patch 1/7 (shortens the routines)? See my previous comment. > * tfrc_rx_hist_entry is lacking documentation (my fault, had been forgotten in the initial submission): > /** > * tfrc_rx_hist_entry - Store information about a single received packet > * @ptype: the type (5.1) of the packet > ... > */ > > * is it really necessary to give the field members of known structures long names such as > tfrc_rx_hist_loss_prev(h)->tfrchrx_seqno? > This is the same comment as per patch 5/7 and there has been an earlier discussion on dccp@vger where > other developers (not just me) agreed that such long names are a burden to write; but we could leave that also for later. See previous comment, we can do that as we did for the ackvec, in a separate patch, but lets leave this for later now, trying to use namespacing but with shorter names, that even looking cryptic can help in searching for RX stamps, for instance, as we will iterate thru them using the editor search routine and don't get distracted with the TX tstamps, for instance, or search for them with grep. - Arnaldo ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches 2007-12-05 11:52 ` Arnaldo Carvalho de Melo @ 2007-12-05 13:45 ` Gerrit Renker 0 siblings, 0 replies; 28+ messages in thread From: Gerrit Renker @ 2007-12-05 13:45 UTC (permalink / raw) To: Arnaldo Carvalho de Melo, netdev, dccp | > Patch #3: | > --------- | > Renames s/tfrc_tx_hist/tfrc_tx_hist_slab/g; no problems. | | This was for consistency with the other slabs in DCCP: | | [acme@doppio net-2.6.25]$ find net/dccp/ -name "*.c" | xargs grep 'struct kmem_cache' Thanks, I will put the same naming scheme also in the test tree (tomorrow). | > Patch #4: | > --------- | > Just a suggestion, it is possible to simplify this further, | > by doing all the initialisation / cleanup in tfrc.c: | > int __init {rx,tx}_packet_history_init() | > { | > tfrc_{rx,tx}_hist_slab = kmem_cache_create("tfrc_{rx,tx}_hist", ...); | > return tfrc_{rx,tx}_hist_slab == NULL ? - ENOBUFS : 0; | > } | > and then call these successively in tfrc_module_init(). | | Idea here was to have each C source file to have a init module. Perhaps | we should try to break packet_history.c into tx_packet_history and | rx_packet_history.c. We can do that later to try to meet the goal of | being able to see what is being replaced. | I think this is a great idea, since then rx_packet_history.c could also take up all the internals of the RX packet history list, as it is currently done for the TX history, and it could also possibly incorporate. packet_history_internal.h. | | > Patch #7: | > --------- | | > * tfrc_rx_hist_entry_data_packet() is not needed: | > - a similar function, called dccp_data_packet(), was introduced in patch 2/7 | | I thought about that, but dccp_data_packet is for skbs, | tfrc_rx_hist_entry_data_packet is for tfrc_rx_hist_entries, I guess we | should just make dccp_data_packet receive the packet type and not an | object that has a packet type field. | The question which of the two interfaces is generally better to use is best left to you. Two functions doing almost the same thing can probably be replaced by just one. ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCHES 0/7]: DCCP patches for 2.6.25 @ 2007-12-06 21:02 Arnaldo Carvalho de Melo 2007-12-06 21:02 ` [PATCH 1/7] [TFRC]: Provide central source file and debug facility Arnaldo Carvalho de Melo 0 siblings, 1 reply; 28+ messages in thread From: Arnaldo Carvalho de Melo @ 2007-12-06 21:02 UTC (permalink / raw) To: David S. Miller; +Cc: netdev, dccp Hi David, Please consider pulling from: master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.25 Best Regards, - Arnaldo b/net/dccp/ccids/Kconfig | 13 b/net/dccp/ccids/ccid3.c | 35 -- b/net/dccp/ccids/ccid3.h | 14 b/net/dccp/ccids/lib/Makefile | 2 b/net/dccp/ccids/lib/loss_interval.c | 14 b/net/dccp/ccids/lib/packet_history.c | 27 - b/net/dccp/ccids/lib/packet_history.h | 3 b/net/dccp/ccids/lib/tfrc.c | 48 +++ b/net/dccp/ccids/lib/tfrc.h | 18 - b/net/dccp/dccp.h | 13 net/dccp/ccids/ccid3.c | 322 ++++++++-------------- net/dccp/ccids/lib/loss_interval.c | 13 net/dccp/ccids/lib/packet_history.c | 496 +++++++++++++++++++--------------- net/dccp/ccids/lib/packet_history.h | 177 ++++-------- 14 files changed, 579 insertions(+), 616 deletions(-) ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH 1/7] [TFRC]: Provide central source file and debug facility 2007-12-06 21:02 [PATCHES 0/7]: DCCP patches for 2.6.25 Arnaldo Carvalho de Melo @ 2007-12-06 21:02 ` Arnaldo Carvalho de Melo 2007-12-06 21:02 ` [PATCH 2/7] [DCCP]: Introduce generic function to test for `data packets' Arnaldo Carvalho de Melo 0 siblings, 1 reply; 28+ messages in thread From: Arnaldo Carvalho de Melo @ 2007-12-06 21:02 UTC (permalink / raw) To: David S. Miller Cc: netdev, dccp, Gerrit Renker, Ian McDonald, Arnaldo Carvalho de Melo From: Gerrit Renker <gerrit@erg.abdn.ac.uk> This patch changes the tfrc_lib module in the following manner: (1) a dedicated tfrc source file to call the packet history & loss interval init/exit functions. (2) a dedicated tfrc_pr_debug macro with toggle switch `tfrc_debug'. Commiter note: renamed tfrc_module.c to tfrc.c, and made CONFIG_IP_DCCP_CCID3 select IP_DCCP_TFRC_LIB. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- net/dccp/ccids/Kconfig | 13 ++++++--- net/dccp/ccids/lib/Makefile | 2 +- net/dccp/ccids/lib/packet_history.c | 27 ++----------------- net/dccp/ccids/lib/packet_history.h | 3 +- net/dccp/ccids/lib/tfrc.c | 48 +++++++++++++++++++++++++++++++++++ net/dccp/ccids/lib/tfrc.h | 17 +++++++++--- 6 files changed, 75 insertions(+), 35 deletions(-) create mode 100644 net/dccp/ccids/lib/tfrc.c diff --git a/net/dccp/ccids/Kconfig b/net/dccp/ccids/Kconfig index 3d7d867..1227594 100644 --- a/net/dccp/ccids/Kconfig +++ b/net/dccp/ccids/Kconfig @@ -38,6 +38,7 @@ config IP_DCCP_CCID2_DEBUG config IP_DCCP_CCID3 tristate "CCID3 (TCP-Friendly) (EXPERIMENTAL)" def_tristate IP_DCCP + select IP_DCCP_TFRC_LIB ---help--- CCID 3 denotes TCP-Friendly Rate Control (TFRC), an equation-based rate-controlled congestion control mechanism. TFRC is designed to @@ -63,10 +64,6 @@ config IP_DCCP_CCID3 If in doubt, say M. -config IP_DCCP_TFRC_LIB - depends on IP_DCCP_CCID3 - def_tristate IP_DCCP_CCID3 - config IP_DCCP_CCID3_DEBUG bool "CCID3 debugging messages" depends on IP_DCCP_CCID3 @@ -110,5 +107,13 @@ config IP_DCCP_CCID3_RTO is serious network congestion: experimenting with larger values should therefore not be performed on WANs. +config IP_DCCP_TFRC_LIB + tristate + default n + +config IP_DCCP_TFRC_DEBUG + bool + depends on IP_DCCP_TFRC_LIB + default y if IP_DCCP_CCID3_DEBUG endmenu diff --git a/net/dccp/ccids/lib/Makefile b/net/dccp/ccids/lib/Makefile index 5f940a6..68c93e3 100644 --- a/net/dccp/ccids/lib/Makefile +++ b/net/dccp/ccids/lib/Makefile @@ -1,3 +1,3 @@ obj-$(CONFIG_IP_DCCP_TFRC_LIB) += dccp_tfrc_lib.o -dccp_tfrc_lib-y := loss_interval.o packet_history.o tfrc_equation.o +dccp_tfrc_lib-y := tfrc.o tfrc_equation.o packet_history.o loss_interval.o diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c index 4805de9..1d4d6ee 100644 --- a/net/dccp/ccids/lib/packet_history.c +++ b/net/dccp/ccids/lib/packet_history.c @@ -35,7 +35,6 @@ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */ -#include <linux/module.h> #include <linux/string.h> #include "packet_history.h" @@ -277,39 +276,19 @@ void dccp_rx_hist_purge(struct dccp_rx_hist *hist, struct list_head *list) EXPORT_SYMBOL_GPL(dccp_rx_hist_purge); -extern int __init dccp_li_init(void); -extern void dccp_li_exit(void); - -static __init int packet_history_init(void) +__init int packet_history_init(void) { - if (dccp_li_init() != 0) - goto out; - tfrc_tx_hist = kmem_cache_create("tfrc_tx_hist", sizeof(struct tfrc_tx_hist_entry), 0, SLAB_HWCACHE_ALIGN, NULL); - if (tfrc_tx_hist == NULL) - goto out_li_exit; - return 0; -out_li_exit: - dccp_li_exit(); -out: - return -ENOBUFS; + return tfrc_tx_hist == NULL ? -ENOBUFS : 0; } -module_init(packet_history_init); -static __exit void packet_history_exit(void) +void packet_history_exit(void) { if (tfrc_tx_hist != NULL) { kmem_cache_destroy(tfrc_tx_hist); tfrc_tx_hist = NULL; } - dccp_li_exit(); } -module_exit(packet_history_exit); - -MODULE_AUTHOR("Ian McDonald <ian.mcdonald@jandi.co.nz>, " - "Arnaldo Carvalho de Melo <acme@ghostprotocols.net>"); -MODULE_DESCRIPTION("DCCP TFRC library"); -MODULE_LICENSE("GPL"); diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h index 0670f46..9a2642e 100644 --- a/net/dccp/ccids/lib/packet_history.h +++ b/net/dccp/ccids/lib/packet_history.h @@ -39,8 +39,7 @@ #include <linux/ktime.h> #include <linux/list.h> #include <linux/slab.h> - -#include "../../dccp.h" +#include "tfrc.h" /* Number of later packets received before one is considered lost */ #define TFRC_RECV_NUM_LATE_LOSS 3 diff --git a/net/dccp/ccids/lib/tfrc.c b/net/dccp/ccids/lib/tfrc.c new file mode 100644 index 0000000..3a7a183 --- /dev/null +++ b/net/dccp/ccids/lib/tfrc.c @@ -0,0 +1,48 @@ +/* + * TFRC: main module holding the pieces of the TFRC library together + * + * Copyright (c) 2007 The University of Aberdeen, Scotland, UK + * Copyright (c) 2007 Arnaldo Carvalho de Melo <acme@redhat.com> + */ +#include <linux/module.h> +#include <linux/moduleparam.h> +#include "tfrc.h" + +#ifdef CONFIG_IP_DCCP_TFRC_DEBUG +int tfrc_debug; +module_param(tfrc_debug, bool, 0444); +MODULE_PARM_DESC(tfrc_debug, "Enable debug messages"); +#endif + +extern int dccp_li_init(void); +extern void dccp_li_exit(void); +extern int packet_history_init(void); +extern void packet_history_exit(void); + +static int __init tfrc_module_init(void) +{ + int rc = dccp_li_init(); + + if (rc == 0) { + rc = packet_history_init(); + if (rc != 0) + dccp_li_exit(); + } + + return rc; +} + +static void __exit tfrc_module_exit(void) +{ + packet_history_exit(); + dccp_li_exit(); +} + +module_init(tfrc_module_init); +module_exit(tfrc_module_exit); + +MODULE_AUTHOR("Gerrit Renker <gerrit@erg.abdn.ac.uk>, " + "Ian McDonald <ian.mcdonald@jandi.co.nz>, " + "Arnaldo Carvalho de Melo <acme@redhat.com>"); +MODULE_DESCRIPTION("DCCP TFRC library"); +MODULE_LICENSE("GPL"); diff --git a/net/dccp/ccids/lib/tfrc.h b/net/dccp/ccids/lib/tfrc.h index 5a0ba86..ab8848c 100644 --- a/net/dccp/ccids/lib/tfrc.h +++ b/net/dccp/ccids/lib/tfrc.h @@ -3,10 +3,11 @@ /* * net/dccp/ccids/lib/tfrc.h * - * Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand. - * Copyright (c) 2005 Ian McDonald <ian.mcdonald@jandi.co.nz> - * Copyright (c) 2005 Arnaldo Carvalho de Melo <acme@conectiva.com.br> - * Copyright (c) 2003 Nils-Erik Mattsson, Joacim Haggmark, Magnus Erixzon + * Copyright (c) 2007 The University of Aberdeen, Scotland, UK + * Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand. + * Copyright (c) 2005-6 Ian McDonald <ian.mcdonald@jandi.co.nz> + * Copyright (c) 2005 Arnaldo Carvalho de Melo <acme@conectiva.com.br> + * Copyright (c) 2003 Nils-Erik Mattsson, Joacim Haggmark, Magnus Erixzon * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -15,6 +16,14 @@ */ #include <linux/types.h> #include <asm/div64.h> +#include "../../dccp.h" + +#ifdef CONFIG_IP_DCCP_TFRC_DEBUG +extern int tfrc_debug; +#define tfrc_pr_debug(format, a...) DCCP_PR_DEBUG(tfrc_debug, format, ##a) +#else +#define tfrc_pr_debug(format, a...) +#endif /* integer-arithmetic divisions of type (a * 1000000)/b */ static inline u64 scaled_div(u64 a, u32 b) -- 1.5.3.4 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 2/7] [DCCP]: Introduce generic function to test for `data packets' 2007-12-06 21:02 ` [PATCH 1/7] [TFRC]: Provide central source file and debug facility Arnaldo Carvalho de Melo @ 2007-12-06 21:02 ` Arnaldo Carvalho de Melo 2007-12-06 21:02 ` [PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency Arnaldo Carvalho de Melo 0 siblings, 1 reply; 28+ messages in thread From: Arnaldo Carvalho de Melo @ 2007-12-06 21:02 UTC (permalink / raw) To: David S. Miller Cc: netdev, dccp, Gerrit Renker, Ian McDonald, Arnaldo Carvalho de Melo From: Gerrit Renker <gerrit@erg.abdn.ac.uk> as per RFC 4340, sec. 7.7. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- net/dccp/dccp.h | 12 ++++++++++++ 1 files changed, 12 insertions(+), 0 deletions(-) diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h index ee97950..f4a5ea1 100644 --- a/net/dccp/dccp.h +++ b/net/dccp/dccp.h @@ -334,6 +334,7 @@ struct dccp_skb_cb { #define DCCP_SKB_CB(__skb) ((struct dccp_skb_cb *)&((__skb)->cb[0])) +/* RFC 4340, sec. 7.7 */ static inline int dccp_non_data_packet(const struct sk_buff *skb) { const __u8 type = DCCP_SKB_CB(skb)->dccpd_type; @@ -346,6 +347,17 @@ static inline int dccp_non_data_packet(const struct sk_buff *skb) type == DCCP_PKT_SYNCACK; } +/* RFC 4340, sec. 7.7 */ +static inline int dccp_data_packet(const struct sk_buff *skb) +{ + const __u8 type = DCCP_SKB_CB(skb)->dccpd_type; + + return type == DCCP_PKT_DATA || + type == DCCP_PKT_DATAACK || + type == DCCP_PKT_REQUEST || + type == DCCP_PKT_RESPONSE; +} + static inline int dccp_packet_without_ack(const struct sk_buff *skb) { const __u8 type = DCCP_SKB_CB(skb)->dccpd_type; -- 1.5.3.4 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency 2007-12-06 21:02 ` [PATCH 2/7] [DCCP]: Introduce generic function to test for `data packets' Arnaldo Carvalho de Melo @ 2007-12-06 21:02 ` Arnaldo Carvalho de Melo 2007-12-06 21:02 ` [PATCH 4/7] [TFRC]: Make the rx history slab be global Arnaldo Carvalho de Melo 0 siblings, 1 reply; 28+ messages in thread From: Arnaldo Carvalho de Melo @ 2007-12-06 21:02 UTC (permalink / raw) To: David S. Miller; +Cc: netdev, dccp, Arnaldo Carvalho de Melo, Gerrit Renker Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- net/dccp/ccids/lib/packet_history.c | 20 ++++++++++---------- 1 files changed, 10 insertions(+), 10 deletions(-) diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c index 1d4d6ee..b628714 100644 --- a/net/dccp/ccids/lib/packet_history.c +++ b/net/dccp/ccids/lib/packet_history.c @@ -53,7 +53,7 @@ struct tfrc_tx_hist_entry { /* * Transmitter History Routines */ -static struct kmem_cache *tfrc_tx_hist; +static struct kmem_cache *tfrc_tx_hist_slab; static struct tfrc_tx_hist_entry * tfrc_tx_hist_find_entry(struct tfrc_tx_hist_entry *head, u64 seqno) @@ -66,7 +66,7 @@ static struct tfrc_tx_hist_entry * int tfrc_tx_hist_add(struct tfrc_tx_hist_entry **headp, u64 seqno) { - struct tfrc_tx_hist_entry *entry = kmem_cache_alloc(tfrc_tx_hist, gfp_any()); + struct tfrc_tx_hist_entry *entry = kmem_cache_alloc(tfrc_tx_hist_slab, gfp_any()); if (entry == NULL) return -ENOBUFS; @@ -85,7 +85,7 @@ void tfrc_tx_hist_purge(struct tfrc_tx_hist_entry **headp) while (head != NULL) { struct tfrc_tx_hist_entry *next = head->next; - kmem_cache_free(tfrc_tx_hist, head); + kmem_cache_free(tfrc_tx_hist_slab, head); head = next; } @@ -278,17 +278,17 @@ EXPORT_SYMBOL_GPL(dccp_rx_hist_purge); __init int packet_history_init(void) { - tfrc_tx_hist = kmem_cache_create("tfrc_tx_hist", - sizeof(struct tfrc_tx_hist_entry), 0, - SLAB_HWCACHE_ALIGN, NULL); + tfrc_tx_hist_slab = kmem_cache_create("tfrc_tx_hist", + sizeof(struct tfrc_tx_hist_entry), 0, + SLAB_HWCACHE_ALIGN, NULL); - return tfrc_tx_hist == NULL ? -ENOBUFS : 0; + return tfrc_tx_hist_slab == NULL ? -ENOBUFS : 0; } void packet_history_exit(void) { - if (tfrc_tx_hist != NULL) { - kmem_cache_destroy(tfrc_tx_hist); - tfrc_tx_hist = NULL; + if (tfrc_tx_hist_slab != NULL) { + kmem_cache_destroy(tfrc_tx_hist_slab); + tfrc_tx_hist_slab = NULL; } } -- 1.5.3.4 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 4/7] [TFRC]: Make the rx history slab be global 2007-12-06 21:02 ` [PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency Arnaldo Carvalho de Melo @ 2007-12-06 21:02 ` Arnaldo Carvalho de Melo 2007-12-06 21:02 ` [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_ Arnaldo Carvalho de Melo 0 siblings, 1 reply; 28+ messages in thread From: Arnaldo Carvalho de Melo @ 2007-12-06 21:02 UTC (permalink / raw) To: David S. Miller; +Cc: netdev, dccp, Arnaldo Carvalho de Melo, Gerrit Renker This is in preparation for merging the new rx history code written by Gerrit Renker. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- net/dccp/ccids/ccid3.c | 35 ++----------- net/dccp/ccids/lib/packet_history.c | 95 ++++++++++++++++++----------------- net/dccp/ccids/lib/packet_history.h | 43 ++-------------- 3 files changed, 60 insertions(+), 113 deletions(-) diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c index 5dea690..07920bb 100644 --- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -49,8 +49,6 @@ static int ccid3_debug; #define ccid3_pr_debug(format, a...) #endif -static struct dccp_rx_hist *ccid3_rx_hist; - /* * Transmitter Half-Connection Routines */ @@ -807,9 +805,9 @@ static int ccid3_hc_rx_detect_loss(struct sock *sk, } detect_out: - dccp_rx_hist_add_packet(ccid3_rx_hist, &hcrx->ccid3hcrx_hist, - &hcrx->ccid3hcrx_li_hist, packet, - hcrx->ccid3hcrx_seqno_nonloss); + dccp_rx_hist_add_packet(&hcrx->ccid3hcrx_hist, + &hcrx->ccid3hcrx_li_hist, packet, + hcrx->ccid3hcrx_seqno_nonloss); return loss; } @@ -852,8 +850,7 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) return; } - packet = dccp_rx_hist_entry_new(ccid3_rx_hist, opt_recv->dccpor_ndp, - skb, GFP_ATOMIC); + packet = dccp_rx_hist_entry_new(opt_recv->dccpor_ndp, skb, GFP_ATOMIC); if (unlikely(packet == NULL)) { DCCP_WARN("%s(%p), Not enough mem to add rx packet " "to history, consider it lost!\n", dccp_role(sk), sk); @@ -936,7 +933,7 @@ static void ccid3_hc_rx_exit(struct sock *sk) ccid3_hc_rx_set_state(sk, TFRC_RSTATE_TERM); /* Empty packet history */ - dccp_rx_hist_purge(ccid3_rx_hist, &hcrx->ccid3hcrx_hist); + dccp_rx_hist_purge(&hcrx->ccid3hcrx_hist); /* Empty loss interval history */ dccp_li_hist_purge(&hcrx->ccid3hcrx_li_hist); @@ -1013,33 +1010,13 @@ MODULE_PARM_DESC(ccid3_debug, "Enable debug messages"); static __init int ccid3_module_init(void) { - int rc = -ENOBUFS; - - ccid3_rx_hist = dccp_rx_hist_new("ccid3"); - if (ccid3_rx_hist == NULL) - goto out; - - rc = ccid_register(&ccid3); - if (rc != 0) - goto out_free_rx; -out: - return rc; - -out_free_rx: - dccp_rx_hist_delete(ccid3_rx_hist); - ccid3_rx_hist = NULL; - goto out; + return ccid_register(&ccid3); } module_init(ccid3_module_init); static __exit void ccid3_module_exit(void) { ccid_unregister(&ccid3); - - if (ccid3_rx_hist != NULL) { - dccp_rx_hist_delete(ccid3_rx_hist); - ccid3_rx_hist = NULL; - } } module_exit(ccid3_module_exit); diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c index b628714..e1ab853 100644 --- a/net/dccp/ccids/lib/packet_history.c +++ b/net/dccp/ccids/lib/packet_history.c @@ -114,48 +114,33 @@ EXPORT_SYMBOL_GPL(tfrc_tx_hist_rtt); /* * Receiver History Routines */ -struct dccp_rx_hist *dccp_rx_hist_new(const char *name) +static struct kmem_cache *tfrc_rx_hist_slab; + +struct dccp_rx_hist_entry *dccp_rx_hist_entry_new(const u32 ndp, + const struct sk_buff *skb, + const gfp_t prio) { - struct dccp_rx_hist *hist = kmalloc(sizeof(*hist), GFP_ATOMIC); - static const char dccp_rx_hist_mask[] = "rx_hist_%s"; - char *slab_name; - - if (hist == NULL) - goto out; - - slab_name = kmalloc(strlen(name) + sizeof(dccp_rx_hist_mask) - 1, - GFP_ATOMIC); - if (slab_name == NULL) - goto out_free_hist; - - sprintf(slab_name, dccp_rx_hist_mask, name); - hist->dccprxh_slab = kmem_cache_create(slab_name, - sizeof(struct dccp_rx_hist_entry), - 0, SLAB_HWCACHE_ALIGN, NULL); - if (hist->dccprxh_slab == NULL) - goto out_free_slab_name; -out: - return hist; -out_free_slab_name: - kfree(slab_name); -out_free_hist: - kfree(hist); - hist = NULL; - goto out; -} + struct dccp_rx_hist_entry *entry = kmem_cache_alloc(tfrc_rx_hist_slab, + prio); -EXPORT_SYMBOL_GPL(dccp_rx_hist_new); + if (entry != NULL) { + const struct dccp_hdr *dh = dccp_hdr(skb); -void dccp_rx_hist_delete(struct dccp_rx_hist *hist) -{ - const char* name = kmem_cache_name(hist->dccprxh_slab); + entry->dccphrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq; + entry->dccphrx_ccval = dh->dccph_ccval; + entry->dccphrx_type = dh->dccph_type; + entry->dccphrx_ndp = ndp; + entry->dccphrx_tstamp = ktime_get_real(); + } - kmem_cache_destroy(hist->dccprxh_slab); - kfree(name); - kfree(hist); + return entry; } +EXPORT_SYMBOL_GPL(dccp_rx_hist_entry_new); -EXPORT_SYMBOL_GPL(dccp_rx_hist_delete); +static inline void dccp_rx_hist_entry_delete(struct dccp_rx_hist_entry *entry) +{ + kmem_cache_free(tfrc_rx_hist_slab, entry); +} int dccp_rx_hist_find_entry(const struct list_head *list, const u64 seq, u8 *ccval) @@ -192,11 +177,10 @@ struct dccp_rx_hist_entry * EXPORT_SYMBOL_GPL(dccp_rx_hist_find_data_packet); -void dccp_rx_hist_add_packet(struct dccp_rx_hist *hist, - struct list_head *rx_list, - struct list_head *li_list, - struct dccp_rx_hist_entry *packet, - u64 nonloss_seqno) +void dccp_rx_hist_add_packet(struct list_head *rx_list, + struct list_head *li_list, + struct dccp_rx_hist_entry *packet, + u64 nonloss_seqno) { struct dccp_rx_hist_entry *entry, *next; u8 num_later = 0; @@ -211,7 +195,7 @@ void dccp_rx_hist_add_packet(struct dccp_rx_hist *hist, if (after48(nonloss_seqno, entry->dccphrx_seqno)) { list_del_init(&entry->dccphrx_node); - dccp_rx_hist_entry_delete(hist, entry); + dccp_rx_hist_entry_delete(entry); } } else if (dccp_rx_hist_entry_data_packet(entry)) --num_later; @@ -253,7 +237,7 @@ void dccp_rx_hist_add_packet(struct dccp_rx_hist *hist, break; case 3: list_del_init(&entry->dccphrx_node); - dccp_rx_hist_entry_delete(hist, entry); + dccp_rx_hist_entry_delete(entry); break; } } else if (dccp_rx_hist_entry_data_packet(entry)) @@ -264,13 +248,13 @@ void dccp_rx_hist_add_packet(struct dccp_rx_hist *hist, EXPORT_SYMBOL_GPL(dccp_rx_hist_add_packet); -void dccp_rx_hist_purge(struct dccp_rx_hist *hist, struct list_head *list) +void dccp_rx_hist_purge(struct list_head *list) { struct dccp_rx_hist_entry *entry, *next; list_for_each_entry_safe(entry, next, list, dccphrx_node) { list_del_init(&entry->dccphrx_node); - kmem_cache_free(hist->dccprxh_slab, entry); + dccp_rx_hist_entry_delete(entry); } } @@ -281,8 +265,22 @@ __init int packet_history_init(void) tfrc_tx_hist_slab = kmem_cache_create("tfrc_tx_hist", sizeof(struct tfrc_tx_hist_entry), 0, SLAB_HWCACHE_ALIGN, NULL); + if (tfrc_tx_hist_slab == NULL) + goto out_err; - return tfrc_tx_hist_slab == NULL ? -ENOBUFS : 0; + tfrc_rx_hist_slab = kmem_cache_create("tfrc_rx_hist", + sizeof(struct dccp_rx_hist_entry), 0, + SLAB_HWCACHE_ALIGN, NULL); + if (tfrc_rx_hist_slab == NULL) + goto out_free_tx; + + return 0; + +out_free_tx: + kmem_cache_destroy(tfrc_tx_hist_slab); + tfrc_tx_hist_slab = NULL; +out_err: + return -ENOBUFS; } void packet_history_exit(void) @@ -291,4 +289,9 @@ void packet_history_exit(void) kmem_cache_destroy(tfrc_tx_hist_slab); tfrc_tx_hist_slab = NULL; } + + if (tfrc_rx_hist_slab != NULL) { + kmem_cache_destroy(tfrc_rx_hist_slab); + tfrc_rx_hist_slab = NULL; + } } diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h index 9a2642e..34b180b 100644 --- a/net/dccp/ccids/lib/packet_history.h +++ b/net/dccp/ccids/lib/packet_history.h @@ -66,34 +66,10 @@ struct dccp_rx_hist_entry { ktime_t dccphrx_tstamp; }; -struct dccp_rx_hist { - struct kmem_cache *dccprxh_slab; -}; - -extern struct dccp_rx_hist *dccp_rx_hist_new(const char *name); -extern void dccp_rx_hist_delete(struct dccp_rx_hist *hist); - -static inline struct dccp_rx_hist_entry * - dccp_rx_hist_entry_new(struct dccp_rx_hist *hist, - const u32 ndp, +extern struct dccp_rx_hist_entry * + dccp_rx_hist_entry_new(const u32 ndp, const struct sk_buff *skb, - const gfp_t prio) -{ - struct dccp_rx_hist_entry *entry = kmem_cache_alloc(hist->dccprxh_slab, - prio); - - if (entry != NULL) { - const struct dccp_hdr *dh = dccp_hdr(skb); - - entry->dccphrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq; - entry->dccphrx_ccval = dh->dccph_ccval; - entry->dccphrx_type = dh->dccph_type; - entry->dccphrx_ndp = ndp; - entry->dccphrx_tstamp = ktime_get_real(); - } - - return entry; -} + const gfp_t prio); static inline struct dccp_rx_hist_entry * dccp_rx_hist_head(struct list_head *list) @@ -111,21 +87,12 @@ extern int dccp_rx_hist_find_entry(const struct list_head *list, const u64 seq, extern struct dccp_rx_hist_entry * dccp_rx_hist_find_data_packet(const struct list_head *list); -extern void dccp_rx_hist_add_packet(struct dccp_rx_hist *hist, - struct list_head *rx_list, +extern void dccp_rx_hist_add_packet(struct list_head *rx_list, struct list_head *li_list, struct dccp_rx_hist_entry *packet, u64 nonloss_seqno); -static inline void dccp_rx_hist_entry_delete(struct dccp_rx_hist *hist, - struct dccp_rx_hist_entry *entry) -{ - if (entry != NULL) - kmem_cache_free(hist->dccprxh_slab, entry); -} - -extern void dccp_rx_hist_purge(struct dccp_rx_hist *hist, - struct list_head *list); +extern void dccp_rx_hist_purge(struct list_head *list); static inline int dccp_rx_hist_entry_data_packet(const struct dccp_rx_hist_entry *entry) -- 1.5.3.4 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_ 2007-12-06 21:02 ` [PATCH 4/7] [TFRC]: Make the rx history slab be global Arnaldo Carvalho de Melo @ 2007-12-06 21:02 ` Arnaldo Carvalho de Melo 2007-12-06 21:02 ` [PATCH 6/7] [CCID3]: The receiver of a half-connection does not set window counter values Arnaldo Carvalho de Melo 0 siblings, 1 reply; 28+ messages in thread From: Arnaldo Carvalho de Melo @ 2007-12-06 21:02 UTC (permalink / raw) To: David S. Miller; +Cc: netdev, dccp, Arnaldo Carvalho de Melo, Gerrit Renker This is in preparation for merging the new rx history code written by Gerrit Renker. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- net/dccp/ccids/ccid3.c | 32 ++++++------ net/dccp/ccids/lib/loss_interval.c | 14 +++--- net/dccp/ccids/lib/packet_history.c | 90 +++++++++++++++++----------------- net/dccp/ccids/lib/packet_history.h | 48 +++++++++--------- 4 files changed, 92 insertions(+), 92 deletions(-) diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c index 07920bb..c95dca8 100644 --- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -677,7 +677,7 @@ static void ccid3_hc_rx_send_feedback(struct sock *sk) { struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); struct dccp_sock *dp = dccp_sk(sk); - struct dccp_rx_hist_entry *packet; + struct tfrc_rx_hist_entry *packet; ktime_t now; suseconds_t delta; @@ -701,7 +701,7 @@ static void ccid3_hc_rx_send_feedback(struct sock *sk) return; } - packet = dccp_rx_hist_find_data_packet(&hcrx->ccid3hcrx_hist); + packet = tfrc_rx_hist_find_data_packet(&hcrx->ccid3hcrx_hist); if (unlikely(packet == NULL)) { DCCP_WARN("%s(%p), no data packet in history!\n", dccp_role(sk), sk); @@ -709,7 +709,7 @@ static void ccid3_hc_rx_send_feedback(struct sock *sk) } hcrx->ccid3hcrx_tstamp_last_feedback = now; - hcrx->ccid3hcrx_ccval_last_counter = packet->dccphrx_ccval; + hcrx->ccid3hcrx_ccval_last_counter = packet->tfrchrx_ccval; hcrx->ccid3hcrx_bytes_recv = 0; if (hcrx->ccid3hcrx_p == 0) @@ -752,12 +752,12 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb) } static int ccid3_hc_rx_detect_loss(struct sock *sk, - struct dccp_rx_hist_entry *packet) + struct tfrc_rx_hist_entry *packet) { struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); - struct dccp_rx_hist_entry *rx_hist = - dccp_rx_hist_head(&hcrx->ccid3hcrx_hist); - u64 seqno = packet->dccphrx_seqno; + struct tfrc_rx_hist_entry *rx_hist = + tfrc_rx_hist_head(&hcrx->ccid3hcrx_hist); + u64 seqno = packet->tfrchrx_seqno; u64 tmp_seqno; int loss = 0; u8 ccval; @@ -766,9 +766,9 @@ static int ccid3_hc_rx_detect_loss(struct sock *sk, tmp_seqno = hcrx->ccid3hcrx_seqno_nonloss; if (!rx_hist || - follows48(packet->dccphrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) { + follows48(packet->tfrchrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) { hcrx->ccid3hcrx_seqno_nonloss = seqno; - hcrx->ccid3hcrx_ccval_nonloss = packet->dccphrx_ccval; + hcrx->ccid3hcrx_ccval_nonloss = packet->tfrchrx_ccval; goto detect_out; } @@ -789,7 +789,7 @@ static int ccid3_hc_rx_detect_loss(struct sock *sk, dccp_inc_seqno(&tmp_seqno); hcrx->ccid3hcrx_seqno_nonloss = tmp_seqno; dccp_inc_seqno(&tmp_seqno); - while (dccp_rx_hist_find_entry(&hcrx->ccid3hcrx_hist, + while (tfrc_rx_hist_find_entry(&hcrx->ccid3hcrx_hist, tmp_seqno, &ccval)) { hcrx->ccid3hcrx_seqno_nonloss = tmp_seqno; hcrx->ccid3hcrx_ccval_nonloss = ccval; @@ -799,13 +799,13 @@ static int ccid3_hc_rx_detect_loss(struct sock *sk, /* FIXME - this code could be simplified with above while */ /* but works at moment */ - if (follows48(packet->dccphrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) { + if (follows48(packet->tfrchrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) { hcrx->ccid3hcrx_seqno_nonloss = seqno; - hcrx->ccid3hcrx_ccval_nonloss = packet->dccphrx_ccval; + hcrx->ccid3hcrx_ccval_nonloss = packet->tfrchrx_ccval; } detect_out: - dccp_rx_hist_add_packet(&hcrx->ccid3hcrx_hist, + tfrc_rx_hist_add_packet(&hcrx->ccid3hcrx_hist, &hcrx->ccid3hcrx_li_hist, packet, hcrx->ccid3hcrx_seqno_nonloss); return loss; @@ -815,7 +815,7 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) { struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); const struct dccp_options_received *opt_recv; - struct dccp_rx_hist_entry *packet; + struct tfrc_rx_hist_entry *packet; u32 p_prev, r_sample, rtt_prev; int loss, payload_size; ktime_t now; @@ -850,7 +850,7 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) return; } - packet = dccp_rx_hist_entry_new(opt_recv->dccpor_ndp, skb, GFP_ATOMIC); + packet = tfrc_rx_hist_entry_new(opt_recv->dccpor_ndp, skb, GFP_ATOMIC); if (unlikely(packet == NULL)) { DCCP_WARN("%s(%p), Not enough mem to add rx packet " "to history, consider it lost!\n", dccp_role(sk), sk); @@ -933,7 +933,7 @@ static void ccid3_hc_rx_exit(struct sock *sk) ccid3_hc_rx_set_state(sk, TFRC_RSTATE_TERM); /* Empty packet history */ - dccp_rx_hist_purge(&hcrx->ccid3hcrx_hist); + tfrc_rx_hist_purge(&hcrx->ccid3hcrx_hist); /* Empty loss interval history */ dccp_li_hist_purge(&hcrx->ccid3hcrx_li_hist); diff --git a/net/dccp/ccids/lib/loss_interval.c b/net/dccp/ccids/lib/loss_interval.c index f2ca4eb..a5f59af 100644 --- a/net/dccp/ccids/lib/loss_interval.c +++ b/net/dccp/ccids/lib/loss_interval.c @@ -129,7 +129,7 @@ static u32 dccp_li_calc_first_li(struct sock *sk, u16 s, u32 bytes_recv, u32 previous_x_recv) { - struct dccp_rx_hist_entry *entry, *next, *tail = NULL; + struct tfrc_rx_hist_entry *entry, *next, *tail = NULL; u32 x_recv, p; suseconds_t rtt, delta; ktime_t tstamp = ktime_set(0, 0); @@ -138,18 +138,18 @@ static u32 dccp_li_calc_first_li(struct sock *sk, int step = 0; u64 fval; - list_for_each_entry_safe(entry, next, hist_list, dccphrx_node) { - if (dccp_rx_hist_entry_data_packet(entry)) { + list_for_each_entry_safe(entry, next, hist_list, tfrchrx_node) { + if (tfrc_rx_hist_entry_data_packet(entry)) { tail = entry; switch (step) { case 0: - tstamp = entry->dccphrx_tstamp; - win_count = entry->dccphrx_ccval; + tstamp = entry->tfrchrx_tstamp; + win_count = entry->tfrchrx_ccval; step = 1; break; case 1: - interval = win_count - entry->dccphrx_ccval; + interval = win_count - entry->tfrchrx_ccval; if (interval < 0) interval += TFRC_WIN_COUNT_LIMIT; if (interval > 4) @@ -176,7 +176,7 @@ found: return ~0; } - delta = ktime_us_delta(tstamp, tail->dccphrx_tstamp); + delta = ktime_us_delta(tstamp, tail->tfrchrx_tstamp); DCCP_BUG_ON(delta < 0); rtt = delta * 4 / interval; diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c index e1ab853..255cca1 100644 --- a/net/dccp/ccids/lib/packet_history.c +++ b/net/dccp/ccids/lib/packet_history.c @@ -116,58 +116,58 @@ EXPORT_SYMBOL_GPL(tfrc_tx_hist_rtt); */ static struct kmem_cache *tfrc_rx_hist_slab; -struct dccp_rx_hist_entry *dccp_rx_hist_entry_new(const u32 ndp, +struct tfrc_rx_hist_entry *tfrc_rx_hist_entry_new(const u32 ndp, const struct sk_buff *skb, const gfp_t prio) { - struct dccp_rx_hist_entry *entry = kmem_cache_alloc(tfrc_rx_hist_slab, + struct tfrc_rx_hist_entry *entry = kmem_cache_alloc(tfrc_rx_hist_slab, prio); if (entry != NULL) { const struct dccp_hdr *dh = dccp_hdr(skb); - entry->dccphrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq; - entry->dccphrx_ccval = dh->dccph_ccval; - entry->dccphrx_type = dh->dccph_type; - entry->dccphrx_ndp = ndp; - entry->dccphrx_tstamp = ktime_get_real(); + entry->tfrchrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq; + entry->tfrchrx_ccval = dh->dccph_ccval; + entry->tfrchrx_type = dh->dccph_type; + entry->tfrchrx_ndp = ndp; + entry->tfrchrx_tstamp = ktime_get_real(); } return entry; } -EXPORT_SYMBOL_GPL(dccp_rx_hist_entry_new); +EXPORT_SYMBOL_GPL(tfrc_rx_hist_entry_new); -static inline void dccp_rx_hist_entry_delete(struct dccp_rx_hist_entry *entry) +static inline void tfrc_rx_hist_entry_delete(struct tfrc_rx_hist_entry *entry) { kmem_cache_free(tfrc_rx_hist_slab, entry); } -int dccp_rx_hist_find_entry(const struct list_head *list, const u64 seq, +int tfrc_rx_hist_find_entry(const struct list_head *list, const u64 seq, u8 *ccval) { - struct dccp_rx_hist_entry *packet = NULL, *entry; + struct tfrc_rx_hist_entry *packet = NULL, *entry; - list_for_each_entry(entry, list, dccphrx_node) - if (entry->dccphrx_seqno == seq) { + list_for_each_entry(entry, list, tfrchrx_node) + if (entry->tfrchrx_seqno == seq) { packet = entry; break; } if (packet) - *ccval = packet->dccphrx_ccval; + *ccval = packet->tfrchrx_ccval; return packet != NULL; } -EXPORT_SYMBOL_GPL(dccp_rx_hist_find_entry); -struct dccp_rx_hist_entry * - dccp_rx_hist_find_data_packet(const struct list_head *list) +EXPORT_SYMBOL_GPL(tfrc_rx_hist_find_entry); +struct tfrc_rx_hist_entry * + tfrc_rx_hist_find_data_packet(const struct list_head *list) { - struct dccp_rx_hist_entry *entry, *packet = NULL; + struct tfrc_rx_hist_entry *entry, *packet = NULL; - list_for_each_entry(entry, list, dccphrx_node) - if (entry->dccphrx_type == DCCP_PKT_DATA || - entry->dccphrx_type == DCCP_PKT_DATAACK) { + list_for_each_entry(entry, list, tfrchrx_node) + if (entry->tfrchrx_type == DCCP_PKT_DATA || + entry->tfrchrx_type == DCCP_PKT_DATAACK) { packet = entry; break; } @@ -175,29 +175,29 @@ struct dccp_rx_hist_entry * return packet; } -EXPORT_SYMBOL_GPL(dccp_rx_hist_find_data_packet); +EXPORT_SYMBOL_GPL(tfrc_rx_hist_find_data_packet); -void dccp_rx_hist_add_packet(struct list_head *rx_list, +void tfrc_rx_hist_add_packet(struct list_head *rx_list, struct list_head *li_list, - struct dccp_rx_hist_entry *packet, + struct tfrc_rx_hist_entry *packet, u64 nonloss_seqno) { - struct dccp_rx_hist_entry *entry, *next; + struct tfrc_rx_hist_entry *entry, *next; u8 num_later = 0; - list_add(&packet->dccphrx_node, rx_list); + list_add(&packet->tfrchrx_node, rx_list); num_later = TFRC_RECV_NUM_LATE_LOSS + 1; if (!list_empty(li_list)) { - list_for_each_entry_safe(entry, next, rx_list, dccphrx_node) { + list_for_each_entry_safe(entry, next, rx_list, tfrchrx_node) { if (num_later == 0) { if (after48(nonloss_seqno, - entry->dccphrx_seqno)) { - list_del_init(&entry->dccphrx_node); - dccp_rx_hist_entry_delete(entry); + entry->tfrchrx_seqno)) { + list_del_init(&entry->tfrchrx_node); + tfrc_rx_hist_entry_delete(entry); } - } else if (dccp_rx_hist_entry_data_packet(entry)) + } else if (tfrc_rx_hist_entry_data_packet(entry)) --num_later; } } else { @@ -208,7 +208,7 @@ void dccp_rx_hist_add_packet(struct list_head *rx_list, * We have no loss interval history so we need at least one * rtt:s of data packets to approximate rtt. */ - list_for_each_entry_safe(entry, next, rx_list, dccphrx_node) { + list_for_each_entry_safe(entry, next, rx_list, tfrchrx_node) { if (num_later == 0) { switch (step) { case 0: @@ -220,10 +220,10 @@ void dccp_rx_hist_add_packet(struct list_head *rx_list, step = 2; /* OK, find next data packet */ num_later = 1; - win_count = entry->dccphrx_ccval; + win_count = entry->tfrchrx_ccval; break; case 2: - tmp = win_count - entry->dccphrx_ccval; + tmp = win_count - entry->tfrchrx_ccval; if (tmp < 0) tmp += TFRC_WIN_COUNT_LIMIT; if (tmp > TFRC_WIN_COUNT_PER_RTT + 1) { @@ -236,29 +236,29 @@ void dccp_rx_hist_add_packet(struct list_head *rx_list, num_later = 1; break; case 3: - list_del_init(&entry->dccphrx_node); - dccp_rx_hist_entry_delete(entry); + list_del_init(&entry->tfrchrx_node); + tfrc_rx_hist_entry_delete(entry); break; } - } else if (dccp_rx_hist_entry_data_packet(entry)) + } else if (tfrc_rx_hist_entry_data_packet(entry)) --num_later; } } } -EXPORT_SYMBOL_GPL(dccp_rx_hist_add_packet); +EXPORT_SYMBOL_GPL(tfrc_rx_hist_add_packet); -void dccp_rx_hist_purge(struct list_head *list) +void tfrc_rx_hist_purge(struct list_head *list) { - struct dccp_rx_hist_entry *entry, *next; + struct tfrc_rx_hist_entry *entry, *next; - list_for_each_entry_safe(entry, next, list, dccphrx_node) { - list_del_init(&entry->dccphrx_node); - dccp_rx_hist_entry_delete(entry); + list_for_each_entry_safe(entry, next, list, tfrchrx_node) { + list_del_init(&entry->tfrchrx_node); + tfrc_rx_hist_entry_delete(entry); } } -EXPORT_SYMBOL_GPL(dccp_rx_hist_purge); +EXPORT_SYMBOL_GPL(tfrc_rx_hist_purge); __init int packet_history_init(void) { @@ -269,7 +269,7 @@ __init int packet_history_init(void) goto out_err; tfrc_rx_hist_slab = kmem_cache_create("tfrc_rx_hist", - sizeof(struct dccp_rx_hist_entry), 0, + sizeof(struct tfrc_rx_hist_entry), 0, SLAB_HWCACHE_ALIGN, NULL); if (tfrc_rx_hist_slab == NULL) goto out_free_tx; diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h index 34b180b..5b0b983 100644 --- a/net/dccp/ccids/lib/packet_history.h +++ b/net/dccp/ccids/lib/packet_history.h @@ -57,51 +57,51 @@ extern u32 tfrc_tx_hist_rtt(struct tfrc_tx_hist_entry *head, /* * Receiver History data structures and declarations */ -struct dccp_rx_hist_entry { - struct list_head dccphrx_node; - u64 dccphrx_seqno:48, - dccphrx_ccval:4, - dccphrx_type:4; - u32 dccphrx_ndp; /* In fact it is from 8 to 24 bits */ - ktime_t dccphrx_tstamp; +struct tfrc_rx_hist_entry { + struct list_head tfrchrx_node; + u64 tfrchrx_seqno:48, + tfrchrx_ccval:4, + tfrchrx_type:4; + u32 tfrchrx_ndp; /* In fact it is from 8 to 24 bits */ + ktime_t tfrchrx_tstamp; }; -extern struct dccp_rx_hist_entry * - dccp_rx_hist_entry_new(const u32 ndp, +extern struct tfrc_rx_hist_entry * + tfrc_rx_hist_entry_new(const u32 ndp, const struct sk_buff *skb, const gfp_t prio); -static inline struct dccp_rx_hist_entry * - dccp_rx_hist_head(struct list_head *list) +static inline struct tfrc_rx_hist_entry * + tfrc_rx_hist_head(struct list_head *list) { - struct dccp_rx_hist_entry *head = NULL; + struct tfrc_rx_hist_entry *head = NULL; if (!list_empty(list)) - head = list_entry(list->next, struct dccp_rx_hist_entry, - dccphrx_node); + head = list_entry(list->next, struct tfrc_rx_hist_entry, + tfrchrx_node); return head; } -extern int dccp_rx_hist_find_entry(const struct list_head *list, const u64 seq, +extern int tfrc_rx_hist_find_entry(const struct list_head *list, const u64 seq, u8 *ccval); -extern struct dccp_rx_hist_entry * - dccp_rx_hist_find_data_packet(const struct list_head *list); +extern struct tfrc_rx_hist_entry * + tfrc_rx_hist_find_data_packet(const struct list_head *list); -extern void dccp_rx_hist_add_packet(struct list_head *rx_list, +extern void tfrc_rx_hist_add_packet(struct list_head *rx_list, struct list_head *li_list, - struct dccp_rx_hist_entry *packet, + struct tfrc_rx_hist_entry *packet, u64 nonloss_seqno); -extern void dccp_rx_hist_purge(struct list_head *list); +extern void tfrc_rx_hist_purge(struct list_head *list); static inline int - dccp_rx_hist_entry_data_packet(const struct dccp_rx_hist_entry *entry) + tfrc_rx_hist_entry_data_packet(const struct tfrc_rx_hist_entry *entry) { - return entry->dccphrx_type == DCCP_PKT_DATA || - entry->dccphrx_type == DCCP_PKT_DATAACK; + return entry->tfrchrx_type == DCCP_PKT_DATA || + entry->tfrchrx_type == DCCP_PKT_DATAACK; } -extern u64 dccp_rx_hist_detect_loss(struct list_head *rx_list, +extern u64 tfrc_rx_hist_detect_loss(struct list_head *rx_list, struct list_head *li_list, u8 *win_loss); #endif /* _DCCP_PKT_HIST_ */ -- 1.5.3.4 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 6/7] [CCID3]: The receiver of a half-connection does not set window counter values 2007-12-06 21:02 ` [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_ Arnaldo Carvalho de Melo @ 2007-12-06 21:02 ` Arnaldo Carvalho de Melo 2007-12-06 21:02 ` [PATCH 7/7] [TFRC]: New rx history code Arnaldo Carvalho de Melo 0 siblings, 1 reply; 28+ messages in thread From: Arnaldo Carvalho de Melo @ 2007-12-06 21:02 UTC (permalink / raw) To: David S. Miller Cc: netdev, dccp, Gerrit Renker, Ian McDonald, Arnaldo Carvalho de Melo From: Gerrit Renker <gerrit@erg.abdn.ac.uk> Only the sender sets window counters [RFC 4342, sections 5 and 8.1]. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- net/dccp/ccids/ccid3.c | 1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c index c95dca8..5ff5aab 100644 --- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -733,7 +733,6 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb) return 0; hcrx = ccid3_hc_rx_sk(sk); - DCCP_SKB_CB(skb)->dccpd_ccval = hcrx->ccid3hcrx_ccval_last_counter; if (dccp_packet_without_ack(skb)) return 0; -- 1.5.3.4 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 7/7] [TFRC]: New rx history code 2007-12-06 21:02 ` [PATCH 6/7] [CCID3]: The receiver of a half-connection does not set window counter values Arnaldo Carvalho de Melo @ 2007-12-06 21:02 ` Arnaldo Carvalho de Melo 0 siblings, 0 replies; 28+ messages in thread From: Arnaldo Carvalho de Melo @ 2007-12-06 21:02 UTC (permalink / raw) To: David S. Miller; +Cc: netdev, dccp, Arnaldo Carvalho de Melo, Gerrit Renker Credit here goes to Gerrit Renker, that provided the initial implementation for this new codebase. I modified it just to try to make it closer to the existing API, renaming some functions, add namespacing and fix one bug where the tfrc_rx_hist_alloc was not freeing the allocated ring entries on the error path. Original changeset comment from Gerrit: ----------- This provides a new, self-contained and generic RX history service for TFRC based protocols. Details: * new data structure, initialisation and cleanup routines; * allocation of dccp_rx_hist entries local to packet_history.c, as a service exported by the dccp_tfrc_lib module. * interface to automatically track highest-received seqno; * receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1); * a generic function to test for `data packets' as per RFC 4340, sec. 7.7. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- net/dccp/ccids/ccid3.c | 288 ++++++++++++---------------------- net/dccp/ccids/ccid3.h | 14 +- net/dccp/ccids/lib/loss_interval.c | 13 ++- net/dccp/ccids/lib/packet_history.c | 290 +++++++++++++++++++++-------------- net/dccp/ccids/lib/packet_history.h | 83 +++++------ 5 files changed, 330 insertions(+), 358 deletions(-) diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c index 5ff5aab..faacffa 100644 --- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -641,6 +641,15 @@ static int ccid3_hc_tx_getsockopt(struct sock *sk, const int optname, int len, /* * Receiver Half-Connection Routines */ + +/* CCID3 feedback types */ +enum ccid3_fback_type { + CCID3_FBACK_NONE = 0, + CCID3_FBACK_INITIAL, + CCID3_FBACK_PERIODIC, + CCID3_FBACK_PARAM_CHANGE +}; + #ifdef CONFIG_IP_DCCP_CCID3_DEBUG static const char *ccid3_rx_state_name(enum ccid3_hc_rx_states state) { @@ -667,59 +676,60 @@ static void ccid3_hc_rx_set_state(struct sock *sk, hcrx->ccid3hcrx_state = state; } -static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len) -{ - if (likely(len > 0)) /* don't update on empty packets (e.g. ACKs) */ - hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, len, 9); -} - -static void ccid3_hc_rx_send_feedback(struct sock *sk) +static void ccid3_hc_rx_send_feedback(struct sock *sk, + const struct sk_buff *skb, + enum ccid3_fback_type fbtype) { struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); struct dccp_sock *dp = dccp_sk(sk); - struct tfrc_rx_hist_entry *packet; ktime_t now; - suseconds_t delta; + s64 delta = 0; ccid3_pr_debug("%s(%p) - entry \n", dccp_role(sk), sk); + if (unlikely(hcrx->ccid3hcrx_state == TFRC_RSTATE_TERM)) + return; + now = ktime_get_real(); - switch (hcrx->ccid3hcrx_state) { - case TFRC_RSTATE_NO_DATA: + switch (fbtype) { + case CCID3_FBACK_INITIAL: hcrx->ccid3hcrx_x_recv = 0; + hcrx->ccid3hcrx_pinv = ~0U; /* see RFC 4342, 8.5 */ break; - case TFRC_RSTATE_DATA: - delta = ktime_us_delta(now, - hcrx->ccid3hcrx_tstamp_last_feedback); - DCCP_BUG_ON(delta < 0); - hcrx->ccid3hcrx_x_recv = - scaled_div32(hcrx->ccid3hcrx_bytes_recv, delta); + case CCID3_FBACK_PARAM_CHANGE: + /* + * When parameters change (new loss or p > p_prev), we do not + * have a reliable estimate for R_m of [RFC 3448, 6.2] and so + * need to reuse the previous value of X_recv. However, when + * X_recv was 0 (due to early loss), this would kill X down to + * s/t_mbi (i.e. one packet in 64 seconds). + * To avoid such drastic reduction, we approximate X_recv as + * the number of bytes since last feedback. + * This is a safe fallback, since X is bounded above by X_calc. + */ + if (hcrx->ccid3hcrx_x_recv > 0) + break; + /* fall through */ + case CCID3_FBACK_PERIODIC: + delta = ktime_us_delta(now, hcrx->ccid3hcrx_tstamp_last_feedback); + if (delta <= 0) + DCCP_BUG("delta (%ld) <= 0", (long)delta); + else + hcrx->ccid3hcrx_x_recv = + scaled_div32(hcrx->ccid3hcrx_bytes_recv, delta); break; - case TFRC_RSTATE_TERM: - DCCP_BUG("%s(%p) - Illegal state TERM", dccp_role(sk), sk); + default: return; } - packet = tfrc_rx_hist_find_data_packet(&hcrx->ccid3hcrx_hist); - if (unlikely(packet == NULL)) { - DCCP_WARN("%s(%p), no data packet in history!\n", - dccp_role(sk), sk); - return; - } + ccid3_pr_debug("Interval %ldusec, X_recv=%u, 1/p=%u\n", (long)delta, + hcrx->ccid3hcrx_x_recv, hcrx->ccid3hcrx_pinv); hcrx->ccid3hcrx_tstamp_last_feedback = now; - hcrx->ccid3hcrx_ccval_last_counter = packet->tfrchrx_ccval; + hcrx->ccid3hcrx_last_counter = dccp_hdr(skb)->dccph_ccval; hcrx->ccid3hcrx_bytes_recv = 0; - if (hcrx->ccid3hcrx_p == 0) - hcrx->ccid3hcrx_pinv = ~0U; /* see RFC 4342, 8.5 */ - else if (hcrx->ccid3hcrx_p > 1000000) { - DCCP_WARN("p (%u) > 100%%\n", hcrx->ccid3hcrx_p); - hcrx->ccid3hcrx_pinv = 1; /* use 100% in this case */ - } else - hcrx->ccid3hcrx_pinv = 1000000 / hcrx->ccid3hcrx_p; - dp->dccps_hc_rx_insert_options = 1; dccp_send_ack(sk); } @@ -750,165 +760,74 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb) return 0; } -static int ccid3_hc_rx_detect_loss(struct sock *sk, - struct tfrc_rx_hist_entry *packet) -{ - struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); - struct tfrc_rx_hist_entry *rx_hist = - tfrc_rx_hist_head(&hcrx->ccid3hcrx_hist); - u64 seqno = packet->tfrchrx_seqno; - u64 tmp_seqno; - int loss = 0; - u8 ccval; - - - tmp_seqno = hcrx->ccid3hcrx_seqno_nonloss; - - if (!rx_hist || - follows48(packet->tfrchrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) { - hcrx->ccid3hcrx_seqno_nonloss = seqno; - hcrx->ccid3hcrx_ccval_nonloss = packet->tfrchrx_ccval; - goto detect_out; - } - - - while (dccp_delta_seqno(hcrx->ccid3hcrx_seqno_nonloss, seqno) - > TFRC_RECV_NUM_LATE_LOSS) { - loss = 1; - dccp_li_update_li(sk, - &hcrx->ccid3hcrx_li_hist, - &hcrx->ccid3hcrx_hist, - hcrx->ccid3hcrx_tstamp_last_feedback, - hcrx->ccid3hcrx_s, - hcrx->ccid3hcrx_bytes_recv, - hcrx->ccid3hcrx_x_recv, - hcrx->ccid3hcrx_seqno_nonloss, - hcrx->ccid3hcrx_ccval_nonloss); - tmp_seqno = hcrx->ccid3hcrx_seqno_nonloss; - dccp_inc_seqno(&tmp_seqno); - hcrx->ccid3hcrx_seqno_nonloss = tmp_seqno; - dccp_inc_seqno(&tmp_seqno); - while (tfrc_rx_hist_find_entry(&hcrx->ccid3hcrx_hist, - tmp_seqno, &ccval)) { - hcrx->ccid3hcrx_seqno_nonloss = tmp_seqno; - hcrx->ccid3hcrx_ccval_nonloss = ccval; - dccp_inc_seqno(&tmp_seqno); - } - } - - /* FIXME - this code could be simplified with above while */ - /* but works at moment */ - if (follows48(packet->tfrchrx_seqno, hcrx->ccid3hcrx_seqno_nonloss)) { - hcrx->ccid3hcrx_seqno_nonloss = seqno; - hcrx->ccid3hcrx_ccval_nonloss = packet->tfrchrx_ccval; - } - -detect_out: - tfrc_rx_hist_add_packet(&hcrx->ccid3hcrx_hist, - &hcrx->ccid3hcrx_li_hist, packet, - hcrx->ccid3hcrx_seqno_nonloss); - return loss; -} - static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) { struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); - const struct dccp_options_received *opt_recv; - struct tfrc_rx_hist_entry *packet; - u32 p_prev, r_sample, rtt_prev; - int loss, payload_size; - ktime_t now; - - opt_recv = &dccp_sk(sk)->dccps_options_received; - - switch (DCCP_SKB_CB(skb)->dccpd_type) { - case DCCP_PKT_ACK: - if (hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA) - return; - case DCCP_PKT_DATAACK: - if (opt_recv->dccpor_timestamp_echo == 0) - break; - r_sample = dccp_timestamp() - opt_recv->dccpor_timestamp_echo; - rtt_prev = hcrx->ccid3hcrx_rtt; - r_sample = dccp_sample_rtt(sk, 10 * r_sample); - - if (hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA) - hcrx->ccid3hcrx_rtt = r_sample; - else - hcrx->ccid3hcrx_rtt = (hcrx->ccid3hcrx_rtt * 9) / 10 + - r_sample / 10; - - if (rtt_prev != hcrx->ccid3hcrx_rtt) - ccid3_pr_debug("%s(%p), New RTT=%uus, elapsed time=%u\n", - dccp_role(sk), sk, hcrx->ccid3hcrx_rtt, - opt_recv->dccpor_elapsed_time); - break; - case DCCP_PKT_DATA: - break; - default: /* We're not interested in other packet types, move along */ - return; - } - - packet = tfrc_rx_hist_entry_new(opt_recv->dccpor_ndp, skb, GFP_ATOMIC); - if (unlikely(packet == NULL)) { - DCCP_WARN("%s(%p), Not enough mem to add rx packet " - "to history, consider it lost!\n", dccp_role(sk), sk); - return; - } - - loss = ccid3_hc_rx_detect_loss(sk, packet); + enum ccid3_fback_type do_feedback = CCID3_FBACK_NONE; + const u32 ndp = dccp_sk(sk)->dccps_options_received.dccpor_ndp; + const bool is_data_packet = dccp_data_packet(skb); + + if (unlikely(hcrx->ccid3hcrx_state == TFRC_RSTATE_NO_DATA)) { + if (is_data_packet) { + const u32 payload = skb->len - dccp_hdr(skb)->dccph_doff * 4; + do_feedback = CCID3_FBACK_INITIAL; + ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA); + hcrx->ccid3hcrx_s = payload; + /* + * Not necessary to update ccid3hcrx_bytes_recv here, + * since X_recv = 0 for the first feedback packet (cf. + * RFC 3448, 6.3) -- gerrit + */ + } + goto update_records; + } - if (DCCP_SKB_CB(skb)->dccpd_type == DCCP_PKT_ACK) - return; + if (tfrc_rx_hist_duplicate(&hcrx->ccid3hcrx_hist, skb)) + return; /* done receiving */ - payload_size = skb->len - dccp_hdr(skb)->dccph_doff * 4; - ccid3_hc_rx_update_s(hcrx, payload_size); + if (is_data_packet) { + const u32 payload = skb->len - dccp_hdr(skb)->dccph_doff * 4; + /* + * Update moving-average of s and the sum of received payload bytes + */ + hcrx->ccid3hcrx_s = tfrc_ewma(hcrx->ccid3hcrx_s, payload, 9); + hcrx->ccid3hcrx_bytes_recv += payload; + } - switch (hcrx->ccid3hcrx_state) { - case TFRC_RSTATE_NO_DATA: - ccid3_pr_debug("%s(%p, state=%s), skb=%p, sending initial " - "feedback\n", dccp_role(sk), sk, - dccp_state_name(sk->sk_state), skb); - ccid3_hc_rx_send_feedback(sk); - ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA); - return; - case TFRC_RSTATE_DATA: - hcrx->ccid3hcrx_bytes_recv += payload_size; - if (loss) - break; + /* + * Handle pending losses and otherwise check for new loss + */ + if (tfrc_rx_hist_new_loss_indicated(&hcrx->ccid3hcrx_hist, skb, ndp)) + goto update_records; - now = ktime_get_real(); - if ((ktime_us_delta(now, hcrx->ccid3hcrx_tstamp_last_ack) - - (s64)hcrx->ccid3hcrx_rtt) >= 0) { - hcrx->ccid3hcrx_tstamp_last_ack = now; - ccid3_hc_rx_send_feedback(sk); - } - return; - case TFRC_RSTATE_TERM: - DCCP_BUG("%s(%p) - Illegal state TERM", dccp_role(sk), sk); - return; + /* + * Handle data packets: RTT sampling and monitoring p + */ + if (unlikely(!is_data_packet)) + goto update_records; + + if (list_empty(&hcrx->ccid3hcrx_li_hist)) { /* no loss so far: p = 0 */ + const u32 sample = tfrc_rx_hist_sample_rtt(&hcrx->ccid3hcrx_hist, skb); + /* + * Empty loss history: no loss so far, hence p stays 0. + * Sample RTT values, since an RTT estimate is required for the + * computation of p when the first loss occurs; RFC 3448, 6.3.1. + */ + if (sample != 0) + hcrx->ccid3hcrx_rtt = tfrc_ewma(hcrx->ccid3hcrx_rtt, sample, 9); } - /* Dealing with packet loss */ - ccid3_pr_debug("%s(%p, state=%s), data loss! Reacting...\n", - dccp_role(sk), sk, dccp_state_name(sk->sk_state)); - - p_prev = hcrx->ccid3hcrx_p; - - /* Calculate loss event rate */ - if (!list_empty(&hcrx->ccid3hcrx_li_hist)) { - u32 i_mean = dccp_li_hist_calc_i_mean(&hcrx->ccid3hcrx_li_hist); + /* + * Check if the periodic once-per-RTT feedback is due; RFC 4342, 10.3 + */ + if (SUB16(dccp_hdr(skb)->dccph_ccval, hcrx->ccid3hcrx_last_counter) > 3) + do_feedback = CCID3_FBACK_PERIODIC; - /* Scaling up by 1000000 as fixed decimal */ - if (i_mean != 0) - hcrx->ccid3hcrx_p = 1000000 / i_mean; - } else - DCCP_BUG("empty loss history"); +update_records: + tfrc_rx_hist_add_packet(&hcrx->ccid3hcrx_hist, skb, ndp); - if (hcrx->ccid3hcrx_p > p_prev) { - ccid3_hc_rx_send_feedback(sk); - return; - } + if (do_feedback) + ccid3_hc_rx_send_feedback(sk, skb, do_feedback); } static int ccid3_hc_rx_init(struct ccid *ccid, struct sock *sk) @@ -918,11 +837,8 @@ static int ccid3_hc_rx_init(struct ccid *ccid, struct sock *sk) ccid3_pr_debug("entry\n"); hcrx->ccid3hcrx_state = TFRC_RSTATE_NO_DATA; - INIT_LIST_HEAD(&hcrx->ccid3hcrx_hist); INIT_LIST_HEAD(&hcrx->ccid3hcrx_li_hist); - hcrx->ccid3hcrx_tstamp_last_feedback = - hcrx->ccid3hcrx_tstamp_last_ack = ktime_get_real(); - return 0; + return tfrc_rx_hist_alloc(&hcrx->ccid3hcrx_hist); } static void ccid3_hc_rx_exit(struct sock *sk) diff --git a/net/dccp/ccids/ccid3.h b/net/dccp/ccids/ccid3.h index b842a7d..3c33dc6 100644 --- a/net/dccp/ccids/ccid3.h +++ b/net/dccp/ccids/ccid3.h @@ -1,7 +1,8 @@ /* * net/dccp/ccids/ccid3.h * - * Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand. + * Copyright (c) 2005-7 The University of Waikato, Hamilton, New Zealand. + * Copyright (c) 2007 The University of Aberdeen, Scotland, UK * * An implementation of the DCCP protocol * @@ -135,9 +136,7 @@ enum ccid3_hc_rx_states { * @ccid3hcrx_x_recv - Receiver estimate of send rate (RFC 3448 4.3) * @ccid3hcrx_rtt - Receiver estimate of rtt (non-standard) * @ccid3hcrx_p - current loss event rate (RFC 3448 5.4) - * @ccid3hcrx_seqno_nonloss - Last received non-loss sequence number - * @ccid3hcrx_ccval_nonloss - Last received non-loss Window CCVal - * @ccid3hcrx_ccval_last_counter - Tracks window counter (RFC 4342, 8.1) + * @ccid3hcrx_last_counter - Tracks window counter (RFC 4342, 8.1) * @ccid3hcrx_state - receiver state, one of %ccid3_hc_rx_states * @ccid3hcrx_bytes_recv - Total sum of DCCP payload bytes * @ccid3hcrx_tstamp_last_feedback - Time at which last feedback was sent @@ -152,14 +151,11 @@ struct ccid3_hc_rx_sock { #define ccid3hcrx_x_recv ccid3hcrx_tfrc.tfrcrx_x_recv #define ccid3hcrx_rtt ccid3hcrx_tfrc.tfrcrx_rtt #define ccid3hcrx_p ccid3hcrx_tfrc.tfrcrx_p - u64 ccid3hcrx_seqno_nonloss:48, - ccid3hcrx_ccval_nonloss:4, - ccid3hcrx_ccval_last_counter:4; + u8 ccid3hcrx_last_counter:4; enum ccid3_hc_rx_states ccid3hcrx_state:8; u32 ccid3hcrx_bytes_recv; ktime_t ccid3hcrx_tstamp_last_feedback; - ktime_t ccid3hcrx_tstamp_last_ack; - struct list_head ccid3hcrx_hist; + struct tfrc_rx_hist ccid3hcrx_hist; struct list_head ccid3hcrx_li_hist; u16 ccid3hcrx_s; u32 ccid3hcrx_pinv; diff --git a/net/dccp/ccids/lib/loss_interval.c b/net/dccp/ccids/lib/loss_interval.c index a5f59af..7e0714a 100644 --- a/net/dccp/ccids/lib/loss_interval.c +++ b/net/dccp/ccids/lib/loss_interval.c @@ -129,6 +129,13 @@ static u32 dccp_li_calc_first_li(struct sock *sk, u16 s, u32 bytes_recv, u32 previous_x_recv) { +/* + * FIXME: + * Will be rewritten in the upcoming new loss intervals code. + * Has to be commented ou because it relies on the old rx history + * data structures + */ +#if 0 struct tfrc_rx_hist_entry *entry, *next, *tail = NULL; u32 x_recv, p; suseconds_t rtt, delta; @@ -216,10 +223,10 @@ found: dccp_pr_debug("%s(%p), receive rate=%u bytes/s, implied " "loss rate=%u\n", dccp_role(sk), sk, x_recv, p); - if (p == 0) - return ~0; - else + if (p != 0) return 1000000 / p; +#endif + return ~0; } void dccp_li_update_li(struct sock *sk, diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c index 255cca1..e197389 100644 --- a/net/dccp/ccids/lib/packet_history.c +++ b/net/dccp/ccids/lib/packet_history.c @@ -36,7 +36,9 @@ */ #include <linux/string.h> +#include <linux/slab.h> #include "packet_history.h" +#include "../../dccp.h" /** * tfrc_tx_hist_entry - Simple singly-linked TX history list @@ -111,154 +113,214 @@ u32 tfrc_tx_hist_rtt(struct tfrc_tx_hist_entry *head, const u64 seqno, } EXPORT_SYMBOL_GPL(tfrc_tx_hist_rtt); + +/** + * tfrc_rx_hist_index - index to reach n-th entry after loss_start + */ +static inline u8 tfrc_rx_hist_index(const struct tfrc_rx_hist *h, const u8 n) +{ + return (h->loss_start + n) & TFRC_NDUPACK; +} + +/** + * tfrc_rx_hist_last_rcv - entry with highest-received-seqno so far + */ +static inline struct tfrc_rx_hist_entry * + tfrc_rx_hist_last_rcv(const struct tfrc_rx_hist *h) +{ + return h->ring[tfrc_rx_hist_index(h, h->loss_count)]; +} + /* * Receiver History Routines */ static struct kmem_cache *tfrc_rx_hist_slab; -struct tfrc_rx_hist_entry *tfrc_rx_hist_entry_new(const u32 ndp, - const struct sk_buff *skb, - const gfp_t prio) +void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h, + const struct sk_buff *skb, + const u32 ndp) { - struct tfrc_rx_hist_entry *entry = kmem_cache_alloc(tfrc_rx_hist_slab, - prio); - - if (entry != NULL) { - const struct dccp_hdr *dh = dccp_hdr(skb); - - entry->tfrchrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq; - entry->tfrchrx_ccval = dh->dccph_ccval; - entry->tfrchrx_type = dh->dccph_type; - entry->tfrchrx_ndp = ndp; - entry->tfrchrx_tstamp = ktime_get_real(); - } - - return entry; + struct tfrc_rx_hist_entry *entry = tfrc_rx_hist_last_rcv(h); + const struct dccp_hdr *dh = dccp_hdr(skb); + + entry->tfrchrx_seqno = DCCP_SKB_CB(skb)->dccpd_seq; + entry->tfrchrx_ccval = dh->dccph_ccval; + entry->tfrchrx_type = dh->dccph_type; + entry->tfrchrx_ndp = ndp; + entry->tfrchrx_tstamp = ktime_get_real(); } -EXPORT_SYMBOL_GPL(tfrc_rx_hist_entry_new); +EXPORT_SYMBOL_GPL(tfrc_rx_hist_add_packet); static inline void tfrc_rx_hist_entry_delete(struct tfrc_rx_hist_entry *entry) { kmem_cache_free(tfrc_rx_hist_slab, entry); } -int tfrc_rx_hist_find_entry(const struct list_head *list, const u64 seq, - u8 *ccval) +/** + * tfrc_rx_hist_entry - return the n-th history entry after loss_start + */ +static inline struct tfrc_rx_hist_entry * + tfrc_rx_hist_entry(const struct tfrc_rx_hist *h, const u8 n) { - struct tfrc_rx_hist_entry *packet = NULL, *entry; + return h->ring[tfrc_rx_hist_index(h, n)]; +} - list_for_each_entry(entry, list, tfrchrx_node) - if (entry->tfrchrx_seqno == seq) { - packet = entry; - break; - } +/** + * tfrc_rx_hist_loss_prev - entry with highest-received-seqno before loss was detected + */ +static inline struct tfrc_rx_hist_entry * + tfrc_rx_hist_loss_prev(const struct tfrc_rx_hist *h) +{ + return h->ring[h->loss_start]; +} + +/* has the packet contained in skb been seen before? */ +int tfrc_rx_hist_duplicate(struct tfrc_rx_hist *h, struct sk_buff *skb) +{ + const u64 seq = DCCP_SKB_CB(skb)->dccpd_seq; + int i; + + if (dccp_delta_seqno(tfrc_rx_hist_loss_prev(h)->tfrchrx_seqno, seq) <= 0) + return 1; - if (packet) - *ccval = packet->tfrchrx_ccval; + for (i = 1; i <= h->loss_count; i++) + if (tfrc_rx_hist_entry(h, i)->tfrchrx_seqno == seq) + return 1; - return packet != NULL; + return 0; } +EXPORT_SYMBOL_GPL(tfrc_rx_hist_duplicate); -EXPORT_SYMBOL_GPL(tfrc_rx_hist_find_entry); -struct tfrc_rx_hist_entry * - tfrc_rx_hist_find_data_packet(const struct list_head *list) +/* initialise loss detection and disable RTT sampling */ +static inline void tfrc_rx_hist_loss_indicated(struct tfrc_rx_hist *h) { - struct tfrc_rx_hist_entry *entry, *packet = NULL; + h->loss_count = 1; +} - list_for_each_entry(entry, list, tfrchrx_node) - if (entry->tfrchrx_type == DCCP_PKT_DATA || - entry->tfrchrx_type == DCCP_PKT_DATAACK) { - packet = entry; - break; - } +/* indicate whether previously a packet was detected missing */ +static inline int tfrc_rx_hist_loss_pending(const struct tfrc_rx_hist *h) +{ + return h->loss_count; +} + +/* any data packets missing between last reception and skb ? */ +int tfrc_rx_hist_new_loss_indicated(struct tfrc_rx_hist *h, + const struct sk_buff *skb, u32 ndp) +{ + int delta = dccp_delta_seqno(tfrc_rx_hist_last_rcv(h)->tfrchrx_seqno, + DCCP_SKB_CB(skb)->dccpd_seq); + + if (delta > 1 && ndp < delta) + tfrc_rx_hist_loss_indicated(h); - return packet; + return tfrc_rx_hist_loss_pending(h); } +EXPORT_SYMBOL_GPL(tfrc_rx_hist_new_loss_indicated); -EXPORT_SYMBOL_GPL(tfrc_rx_hist_find_data_packet); +int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h) +{ + int i; + + for (i = 0; i <= TFRC_NDUPACK; i++) { + h->ring[i] = kmem_cache_alloc(tfrc_rx_hist_slab, GFP_ATOMIC); + if (h->ring[i] == NULL) + goto out_free; + } + + h->loss_count = h->loss_start = 0; + return 0; + +out_free: + while (i-- != 0) { + kmem_cache_free(tfrc_rx_hist_slab, h->ring[i]); + h->ring[i] = NULL; + } + return -ENOBUFS; +} +EXPORT_SYMBOL_GPL(tfrc_rx_hist_alloc); -void tfrc_rx_hist_add_packet(struct list_head *rx_list, - struct list_head *li_list, - struct tfrc_rx_hist_entry *packet, - u64 nonloss_seqno) +void tfrc_rx_hist_purge(struct tfrc_rx_hist *h) { - struct tfrc_rx_hist_entry *entry, *next; - u8 num_later = 0; - - list_add(&packet->tfrchrx_node, rx_list); - - num_later = TFRC_RECV_NUM_LATE_LOSS + 1; - - if (!list_empty(li_list)) { - list_for_each_entry_safe(entry, next, rx_list, tfrchrx_node) { - if (num_later == 0) { - if (after48(nonloss_seqno, - entry->tfrchrx_seqno)) { - list_del_init(&entry->tfrchrx_node); - tfrc_rx_hist_entry_delete(entry); - } - } else if (tfrc_rx_hist_entry_data_packet(entry)) - --num_later; - } - } else { - int step = 0; - u8 win_count = 0; /* Not needed, but lets shut up gcc */ - int tmp; - /* - * We have no loss interval history so we need at least one - * rtt:s of data packets to approximate rtt. - */ - list_for_each_entry_safe(entry, next, rx_list, tfrchrx_node) { - if (num_later == 0) { - switch (step) { - case 0: - step = 1; - /* OK, find next data packet */ - num_later = 1; - break; - case 1: - step = 2; - /* OK, find next data packet */ - num_later = 1; - win_count = entry->tfrchrx_ccval; - break; - case 2: - tmp = win_count - entry->tfrchrx_ccval; - if (tmp < 0) - tmp += TFRC_WIN_COUNT_LIMIT; - if (tmp > TFRC_WIN_COUNT_PER_RTT + 1) { - /* - * We have found a packet older - * than one rtt remove the rest - */ - step = 3; - } else /* OK, find next data packet */ - num_later = 1; - break; - case 3: - list_del_init(&entry->tfrchrx_node); - tfrc_rx_hist_entry_delete(entry); - break; - } - } else if (tfrc_rx_hist_entry_data_packet(entry)) - --num_later; + int i; + + for (i = 0; i <= TFRC_NDUPACK; ++i) + if (h->ring[i] != NULL) { + kmem_cache_free(tfrc_rx_hist_slab, h->ring[i]); + h->ring[i] = NULL; } - } } +EXPORT_SYMBOL_GPL(tfrc_rx_hist_purge); -EXPORT_SYMBOL_GPL(tfrc_rx_hist_add_packet); +/** + * tfrc_rx_hist_rtt_last_s - reference entry to compute RTT samples against + */ +static inline struct tfrc_rx_hist_entry * + tfrc_rx_hist_rtt_last_s(const struct tfrc_rx_hist *h) +{ + return h->ring[0]; +} -void tfrc_rx_hist_purge(struct list_head *list) +/** + * tfrc_rx_hist_rtt_prev_s: previously suitable (wrt rtt_last_s) RTT-sampling entry + */ +static inline struct tfrc_rx_hist_entry * + tfrc_rx_hist_rtt_prev_s(const struct tfrc_rx_hist *h) { - struct tfrc_rx_hist_entry *entry, *next; + return h->ring[h->rtt_sample_prev]; +} - list_for_each_entry_safe(entry, next, list, tfrchrx_node) { - list_del_init(&entry->tfrchrx_node); - tfrc_rx_hist_entry_delete(entry); +/** + * tfrc_rx_hist_sample_rtt - Sample RTT from timestamp / CCVal + * Based on ideas presented in RFC 4342, 8.1. Returns 0 if it was not able + * to compute a sample with given data - calling function should check this. + */ +u32 tfrc_rx_hist_sample_rtt(struct tfrc_rx_hist *h, const struct sk_buff *skb) +{ + u32 sample = 0, + delta_v = SUB16(dccp_hdr(skb)->dccph_ccval, + tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval); + + if (delta_v < 1 || delta_v > 4) { /* unsuitable CCVal delta */ + if (h->rtt_sample_prev == 2) { /* previous candidate stored */ + sample = SUB16(tfrc_rx_hist_rtt_prev_s(h)->tfrchrx_ccval, + tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval); + if (sample) + sample = 4 / sample * + ktime_us_delta(tfrc_rx_hist_rtt_prev_s(h)->tfrchrx_tstamp, + tfrc_rx_hist_rtt_last_s(h)->tfrchrx_tstamp); + else /* + * FIXME: This condition is in principle not + * possible but occurs when CCID is used for + * two-way data traffic. I have tried to trace + * it, but the cause does not seem to be here. + */ + DCCP_BUG("please report to dccp@vger.kernel.org" + " => prev = %u, last = %u", + tfrc_rx_hist_rtt_prev_s(h)->tfrchrx_ccval, + tfrc_rx_hist_rtt_last_s(h)->tfrchrx_ccval); + } else if (delta_v < 1) { + h->rtt_sample_prev = 1; + goto keep_ref_for_next_time; + } + + } else if (delta_v == 4) /* optimal match */ + sample = ktime_to_us(net_timedelta(tfrc_rx_hist_rtt_last_s(h)->tfrchrx_tstamp)); + else { /* suboptimal match */ + h->rtt_sample_prev = 2; + goto keep_ref_for_next_time; } -} -EXPORT_SYMBOL_GPL(tfrc_rx_hist_purge); + if (unlikely(sample > DCCP_SANE_RTT_MAX)) { + DCCP_WARN("RTT sample %u too large, using max\n", sample); + sample = DCCP_SANE_RTT_MAX; + } + + h->rtt_sample_prev = 0; /* use current entry as next reference */ +keep_ref_for_next_time: + + return sample; +} +EXPORT_SYMBOL_GPL(tfrc_rx_hist_sample_rtt); __init int packet_history_init(void) { diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h index 5b0b983..3dfd182 100644 --- a/net/dccp/ccids/lib/packet_history.h +++ b/net/dccp/ccids/lib/packet_history.h @@ -37,15 +37,9 @@ #define _DCCP_PKT_HIST_ #include <linux/ktime.h> -#include <linux/list.h> -#include <linux/slab.h> -#include "tfrc.h" +#include <linux/types.h> -/* Number of later packets received before one is considered lost */ -#define TFRC_RECV_NUM_LATE_LOSS 3 - -#define TFRC_WIN_COUNT_PER_RTT 4 -#define TFRC_WIN_COUNT_LIMIT 16 +struct sk_buff; struct tfrc_tx_hist_entry; @@ -54,11 +48,20 @@ extern void tfrc_tx_hist_purge(struct tfrc_tx_hist_entry **headp); extern u32 tfrc_tx_hist_rtt(struct tfrc_tx_hist_entry *head, const u64 seqno, const ktime_t now); -/* - * Receiver History data structures and declarations +/* Subtraction a-b modulo-16, respects circular wrap-around */ +#define SUB16(a, b) (((a) + 16 - (b)) & 0xF) + +/* Number of packets to wait after a missing packet (RFC 4342, 6.1) */ +#define TFRC_NDUPACK 3 + +/** + * tfrc_rx_hist_entry - Store information about a single received packet + * @tfrchrx_seqno: DCCP packet sequence number + * @tfrchrx_ccval: window counter value of packet (RFC 4342, 8.1) + * @tfrchrx_ndp: the NDP count (if any) of the packet + * @tfrchrx_tstamp: actual receive time of packet */ struct tfrc_rx_hist_entry { - struct list_head tfrchrx_node; u64 tfrchrx_seqno:48, tfrchrx_ccval:4, tfrchrx_type:4; @@ -66,42 +69,30 @@ struct tfrc_rx_hist_entry { ktime_t tfrchrx_tstamp; }; -extern struct tfrc_rx_hist_entry * - tfrc_rx_hist_entry_new(const u32 ndp, - const struct sk_buff *skb, - const gfp_t prio); - -static inline struct tfrc_rx_hist_entry * - tfrc_rx_hist_head(struct list_head *list) -{ - struct tfrc_rx_hist_entry *head = NULL; - - if (!list_empty(list)) - head = list_entry(list->next, struct tfrc_rx_hist_entry, - tfrchrx_node); - return head; -} - -extern int tfrc_rx_hist_find_entry(const struct list_head *list, const u64 seq, - u8 *ccval); -extern struct tfrc_rx_hist_entry * - tfrc_rx_hist_find_data_packet(const struct list_head *list); - -extern void tfrc_rx_hist_add_packet(struct list_head *rx_list, - struct list_head *li_list, - struct tfrc_rx_hist_entry *packet, - u64 nonloss_seqno); - -extern void tfrc_rx_hist_purge(struct list_head *list); +/** + * tfrc_rx_hist - RX history structure for TFRC-based protocols + * + * @ring: Packet history for RTT sampling and loss detection + * @loss_count: Number of entries in circular history + * @loss_start: Movable index (for loss detection) + * @rtt_sample_prev: Used during RTT sampling, points to candidate entry + */ +struct tfrc_rx_hist { + struct tfrc_rx_hist_entry *ring[TFRC_NDUPACK + 1]; + u8 loss_count:2, + loss_start:2; +#define rtt_sample_prev loss_start +}; -static inline int - tfrc_rx_hist_entry_data_packet(const struct tfrc_rx_hist_entry *entry) -{ - return entry->tfrchrx_type == DCCP_PKT_DATA || - entry->tfrchrx_type == DCCP_PKT_DATAACK; -} +extern void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h, + const struct sk_buff *skb, const u32 ndp); -extern u64 tfrc_rx_hist_detect_loss(struct list_head *rx_list, - struct list_head *li_list, u8 *win_loss); +extern int tfrc_rx_hist_duplicate(struct tfrc_rx_hist *h, struct sk_buff *skb); +extern int tfrc_rx_hist_new_loss_indicated(struct tfrc_rx_hist *h, + const struct sk_buff *skb, u32 ndp); +extern u32 tfrc_rx_hist_sample_rtt(struct tfrc_rx_hist *h, + const struct sk_buff *skb); +extern int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h); +extern void tfrc_rx_hist_purge(struct tfrc_rx_hist *h); #endif /* _DCCP_PKT_HIST_ */ -- 1.5.3.4 ^ permalink raw reply related [flat|nested] 28+ messages in thread
end of thread, other threads:[~2007-12-06 21:04 UTC | newest] Thread overview: 28+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-12-02 21:36 [RFC][PATCHES 0/7]: Reorganization of RX history patches Arnaldo Carvalho de Melo 2007-12-02 21:36 ` [PATCH 1/7] [TFRC]: Provide central source file and debug facility Arnaldo Carvalho de Melo 2007-12-02 21:36 ` [PATCH 2/7] [DCCP]: Introduce generic function to test for `data packets' Arnaldo Carvalho de Melo 2007-12-02 21:36 ` [PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency Arnaldo Carvalho de Melo 2007-12-02 21:36 ` [PATCH 4/7] [TFRC]: Make the rx history slab be global Arnaldo Carvalho de Melo 2007-12-02 21:36 ` [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_ Arnaldo Carvalho de Melo 2007-12-02 21:36 ` [PATCH 6/7] [CCID3]: The receiver of a half-connection does not set window counter values Arnaldo Carvalho de Melo 2007-12-02 21:36 ` [PATCH 7/7] [TFRC] New rx history code Arnaldo Carvalho de Melo 2007-12-04 6:55 ` Gerrit Renker 2007-12-04 11:59 ` Arnaldo Carvalho de Melo 2007-12-04 13:48 ` [PATCH 7/7][TAKE 2][TFRC] " Arnaldo Carvalho de Melo 2007-12-05 9:42 ` Gerrit Renker 2007-12-05 9:35 ` [PATCH 7/7] [TFRC] " Gerrit Renker 2007-12-05 12:08 ` Arnaldo Carvalho de Melo 2007-12-05 13:34 ` Gerrit Renker 2007-12-06 13:59 ` [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_ Gerrit Renker 2007-12-06 13:59 ` [PATCH 4/7] [TFRC]: Make the rx history slab be global Gerrit Renker 2007-12-06 13:57 ` [PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency Gerrit Renker 2007-12-03 8:23 ` [RFC][PATCHES 0/7]: Reorganization of RX history patches Ian McDonald 2007-12-03 8:35 ` Gerrit Renker 2007-12-03 12:44 ` Arnaldo Carvalho de Melo 2007-12-03 13:49 ` Gerrit Renker 2007-12-03 14:54 ` Arnaldo Carvalho de Melo 2007-12-03 15:44 ` Gerrit Renker 2007-12-05 10:27 ` Gerrit Renker 2007-12-05 11:52 ` Arnaldo Carvalho de Melo 2007-12-05 13:45 ` Gerrit Renker -- strict thread matches above, loose matches on Subject: below -- 2007-12-06 21:02 [PATCHES 0/7]: DCCP patches for 2.6.25 Arnaldo Carvalho de Melo 2007-12-06 21:02 ` [PATCH 1/7] [TFRC]: Provide central source file and debug facility Arnaldo Carvalho de Melo 2007-12-06 21:02 ` [PATCH 2/7] [DCCP]: Introduce generic function to test for `data packets' Arnaldo Carvalho de Melo 2007-12-06 21:02 ` [PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency Arnaldo Carvalho de Melo 2007-12-06 21:02 ` [PATCH 4/7] [TFRC]: Make the rx history slab be global Arnaldo Carvalho de Melo 2007-12-06 21:02 ` [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_ Arnaldo Carvalho de Melo 2007-12-06 21:02 ` [PATCH 6/7] [CCID3]: The receiver of a half-connection does not set window counter values Arnaldo Carvalho de Melo 2007-12-06 21:02 ` [PATCH 7/7] [TFRC]: New rx history code Arnaldo Carvalho de Melo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).