* Re: CCID2: Tell DCCP to quickly check whether cwnd is available
From: Gerrit Renker @ 2006-09-21 8:30 UTC (permalink / raw)
To: dccp
Hi Andrea,
I think what you are saying touches upon a very crucial issue and it is necessary
to get these architectural issues `right', i.e. generic and scalable enough, before
proceeding with other work (I'd be willing to put in effort).
The general problem is that the horizon does not end with CCID 2 and CCID 3, rather:
* CCID 4 is already in the pipeline (draft-ietf-dccp-tfrc-voip-05) to become an RFC soon
* there is a proposed faster-restart variant for CCID 3/4, so we will have CCID-3-`fr'
and CCID-4-`fr' variants besides the `plain' standardised CCID 3/4
* people at NEC have developed and implemented (Kame) yet another CCID-3 variant
more specifically designed for VoIP applications
* given the experimental state of using CCIDs in the Internet, it is more than likely
that further CCIDs will soon evolve
Therefore the architectural concerns you are raising are of critical importance for the
overall success of the Linux DCCP framework.
It is very much worth focusing on the architecture: a generic and well-designed, modular
interface between main DCCP module and CCID `plug-ins' will save *much* frustration and
having to revamp again in the future -- when more CCIDs are happily added to the landscape.
I think it is wasting time to use temporary and non-scalable fixes and therefore would
like to expand on the following comments you made:
| I don't think tx queueing immediately "helps" solving this problem. It's an
| architectural change that I'm proposing. In the current API, DCCP is the
| "master" and asks the CCID if it can send. The CCID either says yes, or says
| "ask me again in X time". I don't think this is great because some CCIDs don't
| really have a concept of time. In rate based protocols, perhaps there is a way
| to say "OK I can send after X time" but in window based protocols it's "upon the
| reception of the next ack". OK one can give an estimate of time [rtt/cwnd?] but
| it still isn't great.
|
| By reversing the architecture, I think this problem is solved quite neatly. In
| this case, it is the CCID which drives DCCP. The API would be: CCID tells DCCP
| "hey, you can send" and DCCP sends happily. Not a poll & push model as before,
| but rather a pull model from CCID's perspective.
<.....>
| To summarize, the API would be as follows. DCCP would implement:
| void pull(int x); /* Called by CCID, indicating that DCCP may send x packets */
|
| CCID would implement:
| void notify(int true); /* if true, CCID will pull from DCCP, else not */
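A minimal user-space sketch of the reversed API quoted above, for a window-based CCID. All names (`dccp_sim`, `ccid_win`, `dccp_pull`, `ccid_notify`, `ccid_ack_received`, `dccp_enqueue`) are hypothetical and are not taken from the Linux DCCP code. As a further assumption, `pull` here returns the number of packets actually sent so that the CCID can track what is in flight; the quoted proposal declares it `void`.

```c
#include <assert.h>

/* Hypothetical stand-in for the main DCCP module's TX state. */
struct dccp_sim {
    int queued;   /* packets waiting in the TX queue */
    int sent;     /* packets handed to the network so far */
};

/* Hypothetical window-based CCID (CCID 2 style). */
struct ccid_win {
    struct dccp_sim *dccp;
    int cwnd;     /* congestion window, in packets */
    int pipe;     /* packets currently in flight */
    int active;   /* set via ccid_notify(): should the CCID pull? */
};

/* DCCP side: the CCID says "you may send up to x packets".
 * Returns how many were actually sent (assumption, see lead-in). */
static int dccp_pull(struct dccp_sim *d, int x)
{
    int n = x < d->queued ? x : d->queued;

    d->queued -= n;
    d->sent += n;
    return n;
}

/* CCID side: pull as much as the window currently allows. */
static void ccid_try_pull(struct ccid_win *c)
{
    int room = c->cwnd - c->pipe;

    if (c->active && room > 0)
        c->pipe += dccp_pull(c->dccp, room);
}

/* CCID side: DCCP tells the CCID whether it has data to be pulled. */
static void ccid_notify(struct ccid_win *c, int on)
{
    c->active = on;
    ccid_try_pull(c);
}

/* CCID side: an ack opens the window, which triggers the next pull.
 * No timer or "ask me again in X time" is involved. */
static void ccid_ack_received(struct ccid_win *c, int acked)
{
    assert(acked >= 0 && acked <= c->pipe);
    c->pipe -= acked;
    ccid_try_pull(c);
}

/* DCCP side: the application queued n packets; wake the CCID. */
static void dccp_enqueue(struct dccp_sim *d, struct ccid_win *c, int n)
{
    d->queued += n;
    ccid_notify(c, 1);
}
```

With `cwnd = 2` and five packets enqueued, two packets leave immediately and the rest follow one per acknowledged packet, which is the "upon the reception of the next ack" behaviour the quoted text describes.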
I have the following further discussion items regarding the CCID <=> main module interface:
1/ TX Buffering: set size of TX ring buffer via socket option.
There are interesting simulations which confirm the merit of limiting the
queue length; the current code does not have a limit on the TX queue, and I am
referring to an effective circular-list implementation in [LK05].
Maybe sendmsg() could just block if actual_qlen == max_qlen? The above callbacks
seem a useful start here => suggestions?
2/ Keeping track of Maximum Packet Size (MPS, RFC 4340, sec. 14).
The main module needs to keep track of MPS which is influenced by CCMPS, the MPS
determined by the CCID in use => need a way to communicate CCMPS to main module
3/ Fragmentation.
MPS is also determined by path MTU (which appears to work). Currently, unlike UDP,
EMSGSIZE is returned if buflen > MPS. The spec allows optional fragmentation for
such cases where CCMPS <= buflen <= MPS. Again, the CCID needs to be set first
(sockopt), then the CCMPS be communicated to main module.
4/ Interpretation/setting of the CCVal header field.
5/ Feature negotiation: the feature negotiation code also depends on current CCID value.
API-wise, my understanding is that it all starts with setting the CCID socket option where
apparently CCID 2 acts as common denominator, since
* new connections start with CCID 2 as default
* DCCP implementations "SHOULD implement at least CCID 2" [RFC 4340, sec. 10]
Hence if the CCID socket option is not set, fallback is CCID 2. Later, when feature negotiation
is over, the actual CCID in place can be queried via getsockopt - and at that time the CCID-in-place
is already communicating with the main module.
Lastly, a related issue is the use of the DCCP_SOCKOPT_PACKET_SIZE socket option: this
is a strange option, again CCID-specific, and no current use for it can be seen
- Ian McDonald has a patch which removes it. I am still wondering whether it has any
use at all.
Comments, please.
Gerrit.
[LK05] Lai, Junwen and Eddie Kohler. Efficiency and late data choice in
a user-kernel interface for congestion-controlled datagrams. In
Surendar Chandra and Nalini Venkatasubramanian, editors, Twelfth
Annual Multimedia Computing and Networking (MMCN '05), San Jose,
California, volume 5680 of Proceedings of the SPIE, pages
136--142. 2005.
* Re: CCID2: Tell DCCP to quickly check whether cwnd is available
From: Andrea Bittau @ 2006-09-21 12:20 UTC (permalink / raw)
To: dccp
On Thu, Sep 21, 2006 at 09:30:21AM +0100, Gerrit Renker wrote:
> 1/ TX Buffering: set size of TX ring buffer via socket option.
The size of the TX buffer is interesting in applications which want to do their
own queue management. That is, real-time applications that would prefer
dropping certain packets and re-order other packets based on the state of the
session. We are used to the standard UNIX "push" model where you shove stuff in
the kernel via write. Perhaps a different architecture would be for the TX
buffer to be in user-land and the kernel to pull from it. There is a lot of
overhead [context-switch] added, but there might be a good way of coding this.
By doing so, the application chooses exactly what to send and when. Perhaps
this is equivalent to a 0 TX buffer size.
Alternatively, there could be an API for managing the TX buffer in the kernel,
or maybe tagging packets with an expiry time or something. The bottom line is,
that other than just regulating the TX buffer size, there might be smarter
things that we could do and may turn out to be useful. Giving the application
the power to control what is sent and when, in accordance with the CCID, will
allow the application to make use of all of DCCP's benefits.
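
The expiry-tagging idea can be sketched as a small ring buffer in which every packet carries a deadline and stale packets are skipped at dequeue time. This is only an illustration under stated assumptions: the names (`tx_pkt`, `txq_ring`, `txq_push`, `txq_pop_live`), the fixed capacity, the abstract `long` clock and the non-negative packet ids are all hypothetical and not part of any existing DCCP API.

```c
#include <assert.h>
#include <stddef.h>

#define TXQ_CAP 8   /* hypothetical fixed ring capacity */

struct tx_pkt {
    int  id;          /* non-negative packet identifier */
    long expires_at;  /* packet is useless at or after this time */
};

struct txq_ring {
    struct tx_pkt slot[TXQ_CAP];
    size_t head;      /* index of the oldest queued packet */
    size_t len;       /* number of queued packets */
    size_t dropped;   /* packets discarded because they expired */
};

/* Queue a packet with its deadline; returns -1 if the ring is full. */
static int txq_push(struct txq_ring *q, int id, long expires_at)
{
    struct tx_pkt p = { id, expires_at };

    if (q->len == TXQ_CAP)
        return -1;
    q->slot[(q->head + q->len) % TXQ_CAP] = p;
    q->len++;
    return 0;
}

/* Called when the CCID permits a send: return the id of the oldest
 * packet that has not expired yet, silently discarding stale ones.
 * Returns -1 when nothing sendable is left. */
static int txq_pop_live(struct txq_ring *q, long now)
{
    while (q->len > 0) {
        struct tx_pkt p = q->slot[q->head];

        q->head = (q->head + 1) % TXQ_CAP;
        q->len--;
        if (p.expires_at > now)
            return p.id;
        q->dropped++;
    }
    return -1;
}
```

Dropping at dequeue time means the decision is made as late as possible, when the congestion control actually grants a transmission slot, rather than when the application wrote the data.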
> 5/ Feature negotiation: the feature negotiation code also depends on current CCID value.
Feature negotiation is somewhat there. Currently, you can setsockopt features.
What needs to be hooked up is the getsockopt on features in order to poll the
state of the feature [i.e. what is the outcome of the negotiation]. There is
kernel code for this, it just needs to be hooked up, tested and used.
* Re: CCID2: Tell DCCP to quickly check whether cwnd is available
From: Ian McDonald @ 2006-09-22 0:15 UTC (permalink / raw)
To: dccp
> Lastly, a related issue is the use of the DCCP_SOCKOPT_PACKET_SIZE socket option: this
> is a strange option, again CCID-specific, and no current use for it can be seen
> - Ian McDonald has a patch which removes it. I am still wondering whether it has any
> use at all.
>
It has been used in a number of applications, but it doesn't work! It
is meant to implement section 5.3 of RFC 4342. My patch to remove it
comes in the context of other patches that introduce two options which
do work, one for each half-connection. I hope to post these to the list
later today.
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
* Re: CCID2: Tell DCCP to quickly check whether cwnd is available
From: Ian McDonald @ 2006-09-22 0:18 UTC (permalink / raw)
To: dccp
On 9/22/06, Andrea Bittau <a.bittau@cs.ucl.ac.uk> wrote:
> On Thu, Sep 21, 2006 at 09:30:21AM +0100, Gerrit Renker wrote:
> > 1/ TX Buffering: set size of TX ring buffer via socket option.
>
> The size of the TX buffer is interesting in applications which want to do their
> own queue management. That is, real-time applications that would prefer
> dropping certain packets and re-order other packets based on the state of the
> session. We are used to the standard UNIX "push" model where you shove stuff in
> the kernel via write. Perhaps a different architecture would be for the TX
> buffer to be in user-land and the kernel to pull from it. There is a lot of
> overhead [context-switch] added, but there might be a good way of coding this.
> By doing so, the application chooses exactly what to send and when. Perhaps
> this is equivalent to a 0 TX buffer size.
>
> Alternatively, there could be an API for managing the TX buffer in the kernel,
> or maybe tagging packets with an expiry time or something. The bottom line is,
> that other than just regulating the TX buffer size, there might be smarter
> things that we could do and may turn out to be useful. Giving the application
> the power to control what is sent and when, in accordance with the CCID, will
> allow the application to make use of all of DCCP's benefits.
>
I'm not sure whether Andrea is alluding to my code once again or not here.
In my patches I have online I do deliberate packet reordering and
expiry for experimental purposes. The difference is that mine is done
in kernel space rather than user space.
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
* Re: CCID2: Tell DCCP to quickly check whether cwnd is available
From: Eddie Kohler @ 2006-09-22 4:11 UTC (permalink / raw)
To: dccp
Hi, a short note:
Andrea Bittau wrote:
> On Thu, Sep 21, 2006 at 09:30:21AM +0100, Gerrit Renker wrote:
>> 1/ TX Buffering: set size of TX ring buffer via socket option.
>
> The size of the TX buffer is interesting in applications which want to do their
> own queue management. That is, real-time applications that would prefer
> dropping certain packets and re-order other packets based on the state of the
> session. We are used to the standard UNIX "push" model where you shove stuff in
> the kernel via write. Perhaps a different architecture would be for the TX
> buffer to be in user-land and the kernel to pull from it. There is a lot of
> overhead [context-switch] added, but there might be a good way of coding this.
> By doing so, the application chooses exactly what to send and when. Perhaps
> this is equivalent to a 0 TX buffer size.
Junwen Lai and I designed and built an API very much like this -- a transmit
ring in user space. Gerrit referred to the paper. It's actually similar to
the design Xen uses for its virtual network drivers.
Eddie
* Re: CCID2: Tell DCCP to quickly check whether cwnd is available
From: Gerrit Renker @ 2006-09-25 9:03 UTC (permalink / raw)
To: dccp
| > 1/ TX Buffering: set size of TX ring buffer via socket option.
|
| The size of the TX buffer is interesting in applications which want to do their
| own queue management. That is, real-time applications that would prefer
| dropping certain packets and re-order other packets based on the state of the
| session. We are used to the standard UNIX "push" model where you shove stuff in
| the kernel via write. Perhaps a different architecture would be for the TX
| buffer to be in user-land and the kernel to pull from it. There is a lot of
| overhead [context-switch] added, but there might be a good way of coding this.
| By doing so, the application chooses exactly what to send and when. Perhaps
| this is equivalent to a 0 TX buffer size.
|
| Alternatively, there could be an API for managing the TX buffer in the kernel,
| or maybe tagging packets with an expiry time or something. The bottom line is,
| that other than just regulating the TX buffer size, there might be smarter
| things that we could do and may turn out to be useful. Giving the application
| the power to control what is sent and when, in accordance with the CCID, will
| allow the application to make use of all of DCCP's benefits.
I went through Lai's paper again and read Ian's code: there is no need to go into
user-space; thanks to Ian's efforts, the same packet ring principle is already in the kernel.
Things that would be good to work on (imho) are:
* Notification Mechanism Between CCID <=> Main Module
The Lai implementation used a hybrid polling / asynchronous notification principle,
which involved setting a flag and using a syscall. The mechanism you suggested
earlier sounds very interesting - if you have any patches, could you make them
available via your website please?
* Setting Size of TX Buffer
This relates to the second paragraph above. There are simulations which show that
the size of the TX queue (currently infinite) influences the packet drop rate.
For further work, it would be good to set an upper limit on qlen.
As you say, with qlen = 0, one could disable in-kernel buffering if desired.
Wouldn't it then be that dccp_sendmsg() either blocks or returns -EAGAIN
if the write_queue's qlen is > maximal value (special case for 0)?
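
A rough sketch of the non-blocking variant of this check, written as a stand-alone user-space model rather than as kernel code. The structure `txq_limit`, the function `sketch_sendmsg()` and the `ccid_ready` flag are hypothetical. Two assumptions are made: the limit is enforced with `>=`, so the queue never grows past `max_qlen`, and the special case `max_qlen == 0` is read as "send directly when the CCID allows it, otherwise refuse".

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of a TX queue with a configurable upper limit. */
struct txq_limit {
    int qlen;         /* packets currently buffered */
    int max_qlen;     /* upper limit; 0 disables in-kernel buffering */
    int sent_direct;  /* packets sent without being buffered */
};

/* Non-blocking send path. ccid_ready says whether the congestion
 * control would let a packet out right now.
 * Returns 0 on success or -EAGAIN when the caller must retry later. */
static int sketch_sendmsg(struct txq_limit *q, int ccid_ready)
{
    /* Nothing is buffered and the CCID is willing: bypass the queue. */
    if (ccid_ready && q->qlen == 0) {
        q->sent_direct++;
        return 0;
    }

    /* Queue full (always true when max_qlen == 0): push back on the
     * application instead of buffering without bound. */
    if (q->qlen >= q->max_qlen)
        return -EAGAIN;

    q->qlen++;
    return 0;
}
```

A blocking socket would sleep at the point where this sketch returns -EAGAIN and be woken once the queue drains.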
Gerrit
* Re: CCID2: Tell DCCP to quickly check whether cwnd is available
From: Ian McDonald @ 2006-09-25 17:58 UTC (permalink / raw)
To: dccp
> * Notification Mechanism Between CCID <=> Main Module
> The Lai implementation used a hybrid polling / asynchronous notification principle,
> which involved setting a flag and using a syscall. The mechanism you suggested
> earlier sounds very interesting - if you have any patches, could you make them
> available via your website please?
I'm not sure if the above is addressed to me. In case it is, all my
patches are currently on the web.
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand