All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH] tfrc: Correct 2nd Loss Interval Handling
@ 2013-04-15 16:52 Samuel Jero
  2013-04-26  3:48 ` Gerrit Renker
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Samuel Jero @ 2013-04-15 16:52 UTC (permalink / raw)
  To: dccp

[-- Attachment #1: Type: text/plain, Size: 4860 bytes --]

I have identified a bug in CCID 3's loss interval code affecting the 2nd loss
interval. Essentially, the bug is that CCID 3 doesn't create the 2nd loss
interval until the loss that starts the 3rd interval. Therefore, the 2nd loss
interval can't be used to adjust the loss interval rate sent back to the sender
until another loss occurs.

I have observed this issue resulting in connections that maintain a very low
sending rate for >30 seconds because the computed loss event rate from the
first interval is 1 in 33 or so. When the 2nd loss finally occurs, the computed
loss event rate suddenly jumps to 1 in 2000 or so and the allowed sending rate
increases dramatically. See [1] for a time sequence graph of one of these
connections.

RFC4342 (which is consistent with RFC3448 and 5348) defines a loss interval
as at most one round-trip time's worth of packets that may be lost followed by
an arbitrarily long series of non-dropped packets (6.1). This, of course,
doesn't work for the first interval so 3448 and 5348 require the sender to
compute an artificial loss interval length for this first interval based on
the receive rate (6.3.1). To compute the loss event rate the receiver is
supposed to calculate the average loss interval length over the last 8
intervals. The current loss interval (i.e. the currently growing one) is only
included if it increases the average loss interval length (5.4).

The way this is implemented currently, the first loss interval structure is
created in tfrc_lh_interval_add() (loss_interval.c) when the first packet
loss occurs. It's length is artificially computed as specified in the RFCs.
The bug is that a second loss interval structure is not created at this point
for the new loss interval that starts with this first loss (I interpret 6.3.1
to imply that the first loss ends the first loss interval).

With every packet received, the receiver calls tfrc_update_i_mean() to update
the loss event rate by updating the current loss event length. However, if
there is only one loss event structure this function returns before recomputing
the loss event rate. That behavior is correct, because we don't want to tamper
with the artificially computed first loss event length. If we had that second
loss interval structure, we could update its length and include it in the loss
event rate if/when it became big enough to increase the average loss interval
length and thus decrease the loss event rate.

When the second loss finally occurs, tfrc_lh_interval_add() is called again.
This time the function overwrites the length of the first loss interval structure
with the number of packets since the first loss, in effect deleting the first loss
interval and replacing it by the second. The function then creates a new loss
interval structure for the loss interval starting with the second loss.

(This actually works correctly on all later loss intervals because then the
"current" loss interval structure is actually the current loss interval, so this
code just sets its final length and adds a new loss interval structure for the
new loss interval)

This new loss interval structure will be updated by tfrc_update_i_mean() and
can affect the average loss interval length if it becomes long enough. From
this point on the code will function as the RFCs expect.

My fix (patch below) is simply to add a second loss event structure to
represent the newly started loss event at the time the first loss is detected.


[1] http://www.sjero.net/software/dccp/images/dccp_ccid3_2nd_loss_interval_bug.png

-- 
Samuel Jero
Masters Student
Computer Science
Internetworking Research Group
Ohio University

---
Signed-off-by: Samuel Jero <sj323707@ohio.edu>
diff --git a/net/dccp/ccids/lib/loss_interval.c b/net/dccp/ccids/lib/loss_interval.c
index 47045e1..ad24819 100644
--- a/net/dccp/ccids/lib/loss_interval.c
+++ b/net/dccp/ccids/lib/loss_interval.c
@@ -198,9 +198,21 @@ bool tfrc_lh_interval_add(struct tfrc_loss_hist *lh, struct tfrc_rx_hist *rh,
 	cur->li_ccval	  = cong_evt->tfrchrx_ccval;
 	cur->li_is_closed = false;
 
-	if (++lh->counter == 1)
+	if (++lh->counter == 1) {
 		lh->i_mean = cur->li_length = (*calc_first_li)(sk);
-	else {
+		cur->li_is_closed = true;
+		cur = tfrc_lh_demand_next(lh);
+		if (unlikely(cur == NULL)) {
+			DCCP_CRIT("Cannot allocate/add loss record.");
+			return false;
+		}
+		++lh->counter;
+		cur->li_seqno	  = cong_evt_seqno;
+		cur->li_ccval	  = cong_evt->tfrchrx_ccval;
+		cur->li_is_closed = false;
+		cur->li_length    = dccp_delta_seqno(cur->li_seqno,
+				 tfrc_rx_hist_last_rcv(rh)->tfrchrx_seqno) + 1;
+	} else {
 		/* RFC 5348, 5.3: length of the open loss interval I_0 */
 		cur->li_length = dccp_delta_seqno(cur->li_seqno,
 				 tfrc_rx_hist_last_rcv(rh)->tfrchrx_seqno) + 1;


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 899 bytes --]

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [RFC][PATCH] tfrc: Correct 2nd Loss Interval Handling
  2013-04-15 16:52 [RFC][PATCH] tfrc: Correct 2nd Loss Interval Handling Samuel Jero
@ 2013-04-26  3:48 ` Gerrit Renker
  2013-04-26 18:58 ` Samuel Jero
  2013-04-29  3:00 ` Gerrit Renker
  2 siblings, 0 replies; 4+ messages in thread
From: Gerrit Renker @ 2013-04-26  3:48 UTC (permalink / raw)
  To: dccp

Hi Samuel, -
> 
> My fix (patch below) is simply to add a second loss event structure to
> represent the newly started loss event at the time the first loss is detected.
> 
> 
> [1] http://www.sjero.net/software/dccp/images/dccp_ccid3_2nd_loss_interval_bug.png
> 
Thank you for the patch and the detailed explanation. Are there before/after test results
relative to [1] which indicate that the sharp bend in the diagram is now gone, or any other
notable improvements with this fix?

It is at
  http://eden-feed.erg.abdn.ac.uk/cgi-bin/gitweb.cgi?p‹cp_exp.git;a=commitdiff;h‰06b9aebd7c073345f181120987cd83b7fa5af1

I have not yet updated CCID-4 with regard to this change.
--
To unsubscribe from this list: send the line "unsubscribe dccp" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC][PATCH] tfrc: Correct 2nd Loss Interval Handling
  2013-04-15 16:52 [RFC][PATCH] tfrc: Correct 2nd Loss Interval Handling Samuel Jero
  2013-04-26  3:48 ` Gerrit Renker
@ 2013-04-26 18:58 ` Samuel Jero
  2013-04-29  3:00 ` Gerrit Renker
  2 siblings, 0 replies; 4+ messages in thread
From: Samuel Jero @ 2013-04-26 18:58 UTC (permalink / raw)
  To: dccp

[-- Attachment #1: Type: text/plain, Size: 2520 bytes --]

Gerrit,

>>
>> [1] http://www.sjero.net/software/dccp/images/dccp_ccid3_2nd_loss_interval_bug.png
>>
> Thank you for the patch and the detailed explanation. Are there before/after test results
> relative to [1] which indicate that the sharp bend in the diagram is now gone, or any other
> notable improvements with this fix?

An example of this same test with this fix applied is at [2]. Of course for this bug to show
up as dramatically as in [1] we need to get a single loss while the connection is sending
very slowly and then not receive another loss until much later.

It's much more instructive to look at the Loss Event Rate Options the DCCP receiver sends. 
(DCCP loss event rate options are sent as 1/loss_rate so a loss rate option of 33 represents
a loss rate of 1/33---1 packet out of every 33 sent)

In the buggy case we see Feedback packets with these options (each line is a feedback packet)
        CCID3 Loss Event Rate: 0 (or max)
        CCID3 Loss Event Rate: 0 (or max)
        CCID3 Loss Event Rate: 33
        CCID3 Loss Event Rate: 33
 	CCID3 Loss Event Rate: 33
 	CCID3 Loss Event Rate: 33
 	CCID3 Loss Event Rate: 33
	....
	CCID3 Loss Event Rate: 33
        CCID3 Loss Event Rate: 33
        CCID3 Loss Event Rate: 18191
        CCID3 Loss Event Rate: 18191

While with my fix, we see this:
	CCID3 Loss Event Rate: 0 (or max)
        CCID3 Loss Event Rate: 0 (or max)
        CCID3 Loss Event Rate: 0 (or max)
        CCID3 Loss Event Rate: 434
        CCID3 Loss Event Rate: 434
	...
        CCID3 Loss Event Rate: 434
        CCID3 Loss Event Rate: 434
        CCID3 Loss Event Rate: 442
        CCID3 Loss Event Rate: 467
        CCID3 Loss Event Rate: 491
        CCID3 Loss Event Rate: 517
        CCID3 Loss Event Rate: 545
        CCID3 Loss Event Rate: 572
        CCID3 Loss Event Rate: 603
        CCID3 Loss Event Rate: 637

You'll notice that with this fix, the loss event rate option starts increasing (i.e. the loss event
RATE decreases) after a while. This means that the 2nd loss interval has become large enough to 
start effecting the loss event rate. In the buggy case, we have to wait until that 2nd loss interval
is closed by the 2nd loss before it can effect the rate, resulting in the large jump seen above.

[2] http://www.sjero.net/software/dccp/images/dccp_ccid3_2nd_loss_interval_fixed.png


Samuel Jero
Masters Student
Computer Science
Internetworking Research Group
Ohio University
sj323707@ohio.edu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 899 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC][PATCH] tfrc: Correct 2nd Loss Interval Handling
  2013-04-15 16:52 [RFC][PATCH] tfrc: Correct 2nd Loss Interval Handling Samuel Jero
  2013-04-26  3:48 ` Gerrit Renker
  2013-04-26 18:58 ` Samuel Jero
@ 2013-04-29  3:00 ` Gerrit Renker
  2 siblings, 0 replies; 4+ messages in thread
From: Gerrit Renker @ 2013-04-29  3:00 UTC (permalink / raw)
  To: dccp

Hi Samuel,

> You'll notice that with this fix, the loss event rate option starts increasing (i.e. the loss event
> RATE decreases) after a while. This means that the 2nd loss interval has become large enough to 
> start effecting the loss event rate. In the buggy case, we have to wait until that 2nd loss interval
> is closed by the 2nd loss before it can effect the rate, resulting in the large jump seen above.

many thanks indeed for this excellent work and for taking the time to explain the cause of the problem.
I have applied this to the test tree and will be posting a patch to update CCID-4 (based on your patch)
shortly.

Gerrit

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2013-04-29  3:00 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-15 16:52 [RFC][PATCH] tfrc: Correct 2nd Loss Interval Handling Samuel Jero
2013-04-26  3:48 ` Gerrit Renker
2013-04-26 18:58 ` Samuel Jero
2013-04-29  3:00 ` Gerrit Renker

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.