From: Gerrit Renker <gerrit@erg.abdn.ac.uk>
To: Andrew Morton <akpm@osdl.org>,
davem@davemloft.net, yoshfuji@linux-ipv6.org,
kuznet@ms2.inr.ac.ru, jmorris@namei.org, kaber@coreworks.de,
pekkas@netcore.fi
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCHv2 2.6.18-rc1-mm2 1/3] net: UDP-Lite generic support
Date: Fri, 14 Jul 2006 17:19:02 +0100
Message-ID: <200607141719.02766@strip-the-willow>
Generic support (header files, configuration, and documentation)
for the UDP-Lite protocol (RFC 3828).
This has been tested successfully on AMD, i386/i686, and SMP
architectures (compiles cleanly and works under different configurations).
With regard to the previously submitted patch set, no conceptual
changes (it works), but a lot of tidying up and re-organisation:
* the number of changes to other files has been reduced to the
bare minimum (would greatly value ideas for further reduction),
* ifdefs have been replaced by generic functions (whose implementation
depends on CONFIG_IP_UDPLITE),
* many of the declarations and concepts shared between v4 and v6 UDP-Lite
are now concentrated in the header files.
I hope that this makes the code a lot more accessible. Both ipv4/udplite.c
and ipv6/udplite.c derive from the respective udp.c variants.
There is documentation enclosed, as well as a few (v6-enabled) applications
to test the kernel implementation. These should be easy to build - please
give it a test run if you have a little time.
Previous comments have been a great help in improving the code: I would greatly
value any further ideas, suggestions, and comments -- in particular ACKs and NAKs,
or application of the patch.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: William Stanislaus <william@erg.abdn.ac.uk>
---
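A quick way to exercise the socket API from user space is sketched here.
This is only an illustration under assumptions: the fallback constants are
copied from the headers in this patch, and open_udplite_sender() is a
made-up helper name, not something the patch provides.

```c
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Fallback definitions; the values are the ones used in this patch. */
#ifndef IPPROTO_UDPLITE
#define IPPROTO_UDPLITE		136
#endif
#ifndef SOL_UDPLITE
#define SOL_UDPLITE		136
#endif
#ifndef UDPLITE_SEND_CSCOV
#define UDPLITE_SEND_CSCOV	10
#endif

/*
 * Hypothetical helper (not part of the patch): open a UDP-Lite socket
 * for `family' (PF_INET or PF_INET6) and request a sender checksum
 * coverage of `cscov' bytes. Returns the descriptor, or -1 if the
 * socket cannot be created (e.g. no UDP-Lite support in the running
 * kernel) or the option is rejected.
 */
int open_udplite_sender(int family, int cscov)
{
	int s = socket(family, SOCK_DGRAM, IPPROTO_UDPLITE);

	if (s < 0)
		return -1;

	if (setsockopt(s, SOL_UDPLITE, UDPLITE_SEND_CSCOV,
		       &cscov, sizeof(cscov)) < 0) {
		close(s);
		return -1;
	}
	return s;
}
```

Calling open_udplite_sender(PF_INET, 20) would request coverage of the
8-byte UDP-Lite header plus a 12-byte RTP base header; the returned
descriptor can then be used with sendto(2) just like a UDP socket.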
Documentation/networking/udplite.txt | 334 +++++++++++++++++++++++++++++++++++
include/linux/in.h | 1
include/linux/socket.h | 1
include/linux/udplite.h | 96 ++++++++++
include/net/udplite.h | 196 ++++++++++++++++++++
net/core/sock.c | 7
net/ipv4/Kconfig | 21 ++
7 files changed, 655 insertions(+), 1 deletion(-)
diff -Nurp a/net/ipv4/Kconfig b/net/ipv4/Kconfig
--- a/net/ipv4/Kconfig 2006-07-14 10:15:27.000000000 +0100
+++ b/net/ipv4/Kconfig 2006-07-14 10:17:50.000000000 +0100
@@ -581,3 +581,24 @@ config TCP_CONG_BIC
source "net/ipv4/ipvs/Kconfig"
+config IP_UDPLITE
+ bool "The UDP-Lite Protocol (EXPERIMENTAL)"
+ depends on INET && EXPERIMENTAL
+ default n
+ ---help---
+ The UDP-Lite Protocol <http://www.ietf.org/rfc/rfc3828.txt>
+
+ UDP-Lite is a Standards-Track IETF transport protocol (RFC 3828). It
+ features a variable-length checksum, which allows partially damaged
+ packets to be forwarded to media codecs, rather than being discarded
+ due to invalid (UDP) checksum values. This can have advantages for the
+ transport of multimedia (e.g. video/audio) over wireless networks.
+
+ The protocol runs on both IPv4 and IPv6. The socket API resembles that
+ of UDP. Applications must indicate their wish to utilise the partial
+ checksum coverage feature by setting a socket option; UDP-Lite will
+ otherwise run in (compatible) UDP mode.
+
+ Detailed documentation in <file:Documentation/networking/udplite.txt>.
+
+ If in doubt, say N.
diff -Nurp a/include/linux/udplite.h b/include/linux/udplite.h
--- a/include/linux/udplite.h 1970-01-01 01:00:00.000000000 +0100
+++ b/include/linux/udplite.h 2006-07-14 10:17:50.000000000 +0100
@@ -0,0 +1,96 @@
+/*
+ * Header file for UDP Lite (RFC 3828).
+ *
+ * Version: see net/ipv4/udplite.c
+ *
+ * Authors: Gerrit Renker, <gerrit@erg.abdn.ac.uk>
+ * William Stanislaus, <william@erg.abdn.ac.uk>
+ *
+ * Fixes:
+ * Changes:
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#ifndef _LINUX_UDPLITE_H
+#define _LINUX_UDPLITE_H
+#include <linux/types.h>
+
+/**
+ * struct udplitehdr - UDP-Lite header re-interpreting UDP (RFC 768) fields
+ *
+ * @source: source port number (as in UDP)
+ * @dest: destination port number (as in UDP)
+ * @checklen: checksum coverage length
+ * @check: checksum field (as in UDP)
+ *
+ * For the detailed semantics see RFC 3828.
+ */
+struct udplitehdr {
+ __u16 source;
+ __u16 dest;
+ __u16 checklen;
+ __u16 check;
+};
+
+/* UDP-Lite socket options */
+#define UDPLITE_SEND_CSCOV 10
+#define UDPLITE_RECV_CSCOV 11
+
+
+#ifdef __KERNEL__
+#include <linux/config.h>
+#include <net/sock.h>
+#include <linux/ip.h>
+
+/**
+ * struct udplite_sock - unreliable, connection-less UDP-Lite service
+ *
+ * @inet: has to be the first member
+ * @pending: any pending frames?
+ * @corkflag: when cork is required
+ * @encap_type: is this an encapsulation socket?
+ * @len: total length of pending frames
+ * @pcslen: partial checksum coverage length for sending socket
+ * @pcrlen: partial checksum coverage length for receiving socket
+ * @pcflag: partial checksum coverage flag
+ *
+ * NOTE: Checksum coverage length has different semantics for sending and
+ * receiving sockets.
+ *
+ */
+struct udplite_sock {
+ struct inet_sock inet;
+ int pending;
+ unsigned int corkflag;
+ __u16 encap_type;
+ /* The following members retain the information to create a
+ * UDP-Lite header when the socket is uncorked. */
+ __u16 len;
+ __u16 pcslen;
+ __u16 pcrlen;
+/* checksum coverage set indicators used by pcflag */
+#define UDPLITE_SEND_CC 0x1
+#define UDPLITE_RECV_CC 0x2
+ __u8 pcflag;
+};
+
+static inline struct udplite_sock *udplite_sk(const struct sock *sk)
+{
+ return (struct udplite_sock *)sk;
+}
+
+
+struct udplite6_sock {
+ struct udplite_sock udpl;
+ /*
+ * ipv6_pinfo has to be the last member of udplite6_sock,
+ * see inet6_sk_generic.
+ */
+ struct ipv6_pinfo inet6;
+};
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_UDPLITE_H */
diff -Nurp a/include/net/udplite.h b/include/net/udplite.h
--- a/include/net/udplite.h 1970-01-01 01:00:00.000000000 +0100
+++ b/include/net/udplite.h 2006-07-14 10:17:50.000000000 +0100
@@ -0,0 +1,196 @@
+/*
+ * Definitions for the UDP-Lite (RFC 3828) code.
+ *
+ * Version: see net/ipv4/udplite.c
+ *
+ * Authors: Gerrit Renker, <gerrit@erg.abdn.ac.uk>
+ * William Stanislaus, <william@erg.abdn.ac.uk>
+ *
+ * Fixes:
+ * Changes:
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * NOTE: In UDP-Lite the checksum MUST always be computed, hence there is
+ * no UDPLITE_CSUM_DEFAULT and no UDPLITE_CSUM_NOXMIT here.
+ */
+
+#ifndef _UDPLITE_H
+#define _UDPLITE_H
+#ifdef CONFIG_IP_UDPLITE
+#include <linux/udplite.h>
+#include <net/snmp.h>
+#include <net/udp.h> /* for UDP_HTABLE_SIZE and proc structures */
+/*
+ * Global variables
+ */
+extern struct proto udplite_prot;
+extern struct hlist_head udplite_hash[UDP_HTABLE_SIZE];
+extern rwlock_t udplite_hash_lock;
+extern int udplite_port_rover;
+
+/**
+ * struct udplite_skb_cb - UDP-Lite private variables
+ *
+ * @cscov: checksum coverage length
+ * @partial: flag, if set indicates partial csum coverage
+ * @header: private variables used by IPv4/v6 (thanks tcp.h!)
+ */
+struct udplite_skb_cb {
+ union {
+ struct inet_skb_parm h4;
+#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
+ struct inet6_skb_parm h6;
+#endif
+ } header;
+ __u16 cscov;
+ __u8 partial;
+};
+#define UDPLITE_SKB_CB(skb) ((struct udplite_skb_cb *)&((skb)->cb))
+
+
+/*
+ * Inlined functions shared between UDP-Litev4 and UDP-Litev6
+ */
+
+/* shared port space between ipv4/udplite.c and ipv6/udplite.c, cf. udp.c */
+static inline int udplite_lport_inuse(u16 num)
+{
+ struct sock *sk;
+ struct hlist_node *node;
+
+ sk_for_each(sk, node, &udplite_hash[num & (UDP_HTABLE_SIZE - 1)])
+ if (inet_sk(sk)->num == num)
+ return 1;
+ return 0;
+}
+
+/*
+ * Calculate / check variable-length UDP-Lite checksum
+ * skb->csum holds the sum of the IPv4 or IPv6 pseudo-header.
+ */
+static inline u16 __udplite_checksum_complete(struct sk_buff *skb)
+{
+ if (! UDPLITE_SKB_CB(skb)->partial)
+ return __skb_checksum_complete(skb);
+
+ return csum_fold(skb_checksum(skb, 0, UDPLITE_SKB_CB(skb)->cscov,
+ skb->csum));
+}
+
+static inline u16 udplite_checksum_complete(struct sk_buff *skb)
+{
+ return skb->ip_summed != CHECKSUM_UNNECESSARY &&
+ __udplite_checksum_complete(skb);
+}
+
+/*
+ * UDP-Lite checksum computation is all in software, hence simpler getfrag
+ */
+static inline int udplite_getfrag(void *from, char *to, int offset, int len,
+ int odd, struct sk_buff *skb)
+{
+ return memcpy_fromiovecend(to, (struct iovec *) from, offset, len);
+}
+
+
+/*
+ * net/ipv4/udplite.c
+ */
+extern void udplite4_register(void);
+extern int udplite4_mib_init(void);
+extern unsigned int udplite_poll(struct file *file, struct socket *sock,
+ poll_table * wait);
+extern int udplite_rcv(struct sk_buff *skb);
+extern void udplite_err(struct sk_buff *, u32);
+extern int udplite_disconnect(struct sock *sk, int flags);
+extern int udplite_ioctl(struct sock *sk, int cmd, unsigned long arg);
+extern int udplite_sendmsg(struct kiocb *iocb, struct sock *sk,
+ struct msghdr *msg, size_t len);
+#ifdef CONFIG_PROC_FS
+extern int udplite_proc_register(struct udp_seq_afinfo *afinfo);
+extern void udplite_proc_unregister(struct udp_seq_afinfo *afinfo);
+
+extern int udplite4_proc_init(void);
+extern void udplite4_proc_exit(void);
+#endif /* CONFIG_PROC_FS */
+
+
+/*
+ * net/ipv6/udplite.c
+ */
+#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
+extern struct proto udplitev6_prot;
+extern void udplitev6_init(void);
+extern void udplitev6_exit(void);
+#ifdef CONFIG_PROC_FS
+extern int udplite6_proc_init(void);
+extern void udplite6_proc_exit(void);
+#endif
+#endif /* CONFIG_IPV6 */
+
+
+/*
+ * MIB / runtime statistics for UDP-Litev4 and UDP-Litev6.
+ */
+enum {
+ UDPLITE_MIB_NUM = 0,
+ UDPLITE_MIB_INDATAGRAMS, /* total received datagrams */
+ UDPLITE_MIB_IN_PARTIALCOV, /* rcvd datagrams with partial coverage */
+ UDPLITE_MIB_NOPORTS, /* rcvd datagrams to wrong ports */
+ UDPLITE_MIB_INERRORS, /* total erroneous received datagrams */
+ UDPLITE_MIB_IN_BAD_COV, /* checksum coverage errors */
+ UDPLITE_MIB_IN_BAD_CSUM, /* checksum itself did not qualify */
+ UDPLITE_MIB_OUTDATAGRAMS, /* total sent datagrams */
+ UDPLITE_MIB_OUT_PARTIALCOV, /* sent datagrams with partial coverage */
+ __UDPLITE_MIB_MAX
+};
+
+struct udplite_mib {
+ unsigned long mibs[__UDPLITE_MIB_MAX];
+} __SNMP_MIB_ALIGN__;
+
+
+DECLARE_SNMP_STAT(struct udplite_mib, udplite_statistics);
+#define UDPLITE_INC_STATS(f) SNMP_INC_STATS(udplite_statistics, f)
+#define UDPLITE_INC_STATS_BH(f) SNMP_INC_STATS_BH(udplite_statistics, f)
+#define UDPLITE_INC_STATS_USER(f) SNMP_INC_STATS_USER(udplite_statistics, f)
+
+#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
+DECLARE_SNMP_STAT(struct udplite_mib, udplite_stats_in6);
+#define UDPLITE6_INC_STATS(f) SNMP_INC_STATS(udplite_stats_in6, f)
+#define UDPLITE6_INC_STATS_BH(f) SNMP_INC_STATS_BH(udplite_stats_in6, f)
+#define UDPLITE6_INC_STATS_USER(f) SNMP_INC_STATS_USER(udplite_stats_in6, f)
+#endif /* CONFIG_IPV6 */
+
+
+
+#else /* CONFIG_IP_UDPLITE is not enabled */
+
+
+/* empty v4 function definitions: */
+static inline void udplite4_register(void) { }
+#define udplite4_mib_init(void) (1)
+#ifdef CONFIG_PROC_FS
+#define udplite4_proc_init(void) (0)
+static inline void udplite4_proc_exit(void) { }
+#endif /* CONFIG_PROC_FS */
+
+
+/* empty v6 function definitions: */
+#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
+static inline void udplitev6_init(void) { }
+static inline void udplitev6_exit(void) { }
+#ifdef CONFIG_PROC_FS
+#define udplite6_proc_init(void) (0)
+static inline void udplite6_proc_exit(void) { }
+#endif /* CONFIG_PROC_FS */
+#endif /* CONFIG_IPV6 */
+
+#endif /* CONFIG_IP_UDPLITE */
+
+
+#endif /* _UDPLITE_H */
diff -Nurp a/include/linux/socket.h b/include/linux/socket.h
--- a/include/linux/socket.h 2006-07-06 09:08:15.000000000 +0100
+++ b/include/linux/socket.h 2006-07-14 10:17:50.000000000 +0100
@@ -264,6 +264,7 @@ struct ucred {
#define SOL_IPV6 41
#define SOL_ICMPV6 58
#define SOL_SCTP 132
+#define SOL_UDPLITE 136 /* UDP-Lite (RFC 3828) */
#define SOL_RAW 255
#define SOL_IPX 256
#define SOL_AX25 257
diff -Nurp a/include/linux/in.h b/include/linux/in.h
--- a/include/linux/in.h 2006-06-19 08:45:25.000000000 +0100
+++ b/include/linux/in.h 2006-07-14 10:17:50.000000000 +0100
@@ -44,6 +44,7 @@ enum {
IPPROTO_COMP = 108, /* Compression Header protocol */
IPPROTO_SCTP = 132, /* Stream Control Transport Protocol */
+ IPPROTO_UDPLITE = 136, /* UDP-Lite Protocol (RFC 3828) */
IPPROTO_RAW = 255, /* Raw IP packets */
IPPROTO_MAX
diff -Nurp a/net/core/sock.c b/net/core/sock.c
--- a/net/core/sock.c 2006-07-06 09:08:24.000000000 +0100
+++ b/net/core/sock.c 2006-07-14 10:17:50.000000000 +0100
@@ -479,7 +479,12 @@ set_rcvbuf:
break;
case SO_NO_CHECK:
- sk->sk_no_check = valbool;
+ /* UDP-Lite (RFC 3828) mandates checksumming,
+ * hence user must not enable this option. */
+ if (sk->sk_protocol == IPPROTO_UDPLITE)
+ ret = -EOPNOTSUPP;
+ else
+ sk->sk_no_check = valbool;
break;
case SO_PRIORITY:
diff -Nurp a/Documentation/networking/udplite.txt b/Documentation/networking/udplite.txt
--- a/Documentation/networking/udplite.txt 1970-01-01 01:00:00.000000000 +0100
+++ b/Documentation/networking/udplite.txt 2006-07-14 10:17:50.000000000 +0100
@@ -0,0 +1,334 @@
+ ===========================================================================
+ The UDP-Lite protocol (RFC 3828)
+ ===========================================================================
+ last modified: Fri 14th July 2006
+
+
+ UDP-Lite is a Standards-Track IETF transport protocol whose characteristic
+ is a variable-length checksum. This has advantages for transport of multimedia
+ (video, VoIP) over wireless networks, as partly damaged packets can still be
+ fed into the codec instead of being discarded due to a failed checksum test.
+
+ This file briefly describes the existing kernel support and the socket API.
+ For in-depth information, you can consult:
+
+ o The UDP-Lite Homepage: http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/
+ From here you can always also pull the latest patch for the stable
+ kernel tree and example application source code.
+
+ o The UDP-Lite HOWTO on
+ http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/UDP-Lite-HOWTO.txt
+
+ o The Ethereal UDP-Lite WiKi (with capture files):
+ http://wiki.ethereal.com/Lightweight_User_Datagram_Protocol
+
+ o The Protocol Spec, RFC 3828: http://www.ietf.org/rfc/rfc3828.txt
+
+
+ I) APPLICATIONS
+
+ Several applications have been ported successfully to UDP-Lite. Ethereal
+ (now called Wireshark) has UDP-Litev4/v6 support by default. The tarball on
+
+ http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/udplite_linux.tar.gz
+
+ has source code for several v4/v6 client-server and network testing examples.
+
+ Porting applications to UDP-Lite is straightforward: only socket level and
+ IPPROTO need to be changed; senders additionally set the checksum coverage
+ length (default = header length = 8). Details are in the next section.
+ UDP-Lite is not enabled by default; set CONFIG_IP_UDPLITE=y to support it.
+
+
+ II) PROGRAMMING API
+
+ UDP-Lite provides a connectionless, unreliable datagram service and hence
+ uses the same socket type as UDP. In fact, porting from UDP to UDP-Lite is
+ dead easy: simply add `IPPROTO_UDPLITE' as the last argument of the socket(2)
+ call so that the statement looks like:
+
+ s = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDPLITE);
+
+ or, respectively,
+
+ s = socket(PF_INET6, SOCK_DGRAM, IPPROTO_UDPLITE);
+
+ Since both UDP-Litev4 and UDP-Litev6 are supported, the porting process is the
+ same in both cases. With just this change you are able to run UDP-Lite
+ services or connect to UDP-Lite servers. The kernel will assume that you are
+ not interested in using partial checksum coverage and so emulate UDP mode.
+
+ To make use of the partial checksum coverage facilities requires setting just
+ one socket option which takes an integer specifying the coverage length:
+
+ * Sender checksum coverage: UDPLITE_SEND_CSCOV
+
+ For example,
+
+ int val = 20;
+ setsockopt(s, SOL_UDPLITE, UDPLITE_SEND_CSCOV, &val, sizeof(int));
+
+ sets the checksum coverage length to 20 bytes (12b data + 8b header).
+ Of each packet only the first 20 bytes (plus the pseudo-header) will be
+ checksummed. This is useful for RTP applications which have a 12-byte
+ base header.
+
+
+ * Receiver checksum coverage: UDPLITE_RECV_CSCOV
+
+ This option is the receiver-side analogue. It is truly optional, i.e. not
+ required to enable traffic with partial checksum coverage. Its function is
+ that of a traffic filter: when enabled, it instructs the kernel to drop
+ all packets which have a coverage _less_ than this value. For example, if
+ RTP and UDP headers are to be protected, a receiver can enforce that only
+ packets with a minimum coverage of 20 are admitted:
+
+ int min = 20;
+ setsockopt(s, SOL_UDPLITE, UDPLITE_RECV_CSCOV, &min, sizeof(int));
+
+ The calls to getsockopt(2) are analogous. Being an extension and not a stand-
+ alone protocol, all socket options known from UDP can be used in exactly the
+ same manner as before, e.g. UDP_CORK or UDP_ENCAP.
+
+ A detailed discussion of UDP-Lite checksum coverage options is in section IV.
+
+
+
+ III) HEADER FILES
+
+ The socket API requires support through header files in /usr/include:
+
+ * /usr/include/netinet/in.h
+ to define IPPROTO_UDPLITE
+
+ * /usr/include/netinet/udplite.h
+ for UDP-Lite header fields and protocol constants
+
+ For testing purposes, the following can serve as a `mini' header file:
+
+ #define IPPROTO_UDPLITE 136
+ #define SOL_UDPLITE 136
+ #define UDPLITE_SEND_CSCOV 10
+ #define UDPLITE_RECV_CSCOV 11
+
+ Ready-made header files for various distros are in the UDP-Lite tarball.
+
+
+
+ IV) KERNEL BEHAVIOUR WITH REGARD TO THE VARIOUS SOCKET OPTIONS
+
+ To enable debugging messages, the log level must be set to 8, as most
+ messages use the KERN_DEBUG level (7).
+
+
+ 1) Sender Socket Options
+
+ If the sender specifies a value of 0 as coverage length, the module
+ assumes full coverage and transmits a packet with a coverage length of 0
+ and the corresponding checksum. If the sender specifies a coverage < 8
+ that is different from 0, the kernel assumes 8 as default value. Finally,
+ if the specified coverage length exceeds the packet length, the packet
+ length is used instead as coverage length.
+
+
+ 2) Receiver Socket Options
+
+ The receiver specifies the minimum value of the coverage length it
+ is willing to accept. A value of 0 here indicates that the receiver
+ always wants the whole of the packet covered. In this case, all
+ partially covered packets are dropped and an error is logged.
+
+ It is not possible to specify illegal values (negative, or between 1
+ and 7); in these cases the default of 8 is assumed.
+
+ All packets arriving with a coverage value less than the specified
+ threshold are discarded; these events are also logged.
+
+
+ 3) Disabling the Checksum Computation
+
+ On both sender and receiver, trying to disable the UDP-Lite checksum
+ (option SO_NO_CHECK) in setsockopt(2) results in an error. Thus
+
+ setsockopt(sockfd, SOL_SOCKET, SO_NO_CHECK, ... );
+
+ will always result in an error, while
+
+ getsockopt(sockfd, SOL_SOCKET, SO_NO_CHECK, &value, ...);
+
+ will always return a value of 0 (meaning checksum enabled). Packets
+ with a zero checksum field are silently discarded by the receiver.
+
+
+ 4) Fragmentation
+
+ The checksum computation respects both buffersize and MTU. The size
+ of UDP-Lite packets is determined by the size of the send buffer. The
+ minimum size of the send buffer is 2048 (defined as SOCK_MIN_SNDBUF
+ in include/net/sock.h), the default value is configurable as
+ net.core.wmem_default or via setting the SO_SNDBUF socket(7)
+ option. The upper bound for the send buffer is determined
+ by net.core.wmem_max.
+
+ Given a payload size larger than the send buffer size, UDP-Lite will
+ split the payload into several individual packets, filling up the
+ send buffer size in each case.
+
+ The precise value also depends on the interface MTU. The interface MTU,
+ in turn, may trigger IP fragmentation. In this case, the generated
+ UDP-Lite packet is split into several IP packets, of which only the
+ first one contains the L4 header.
+
+ The send buffer size has implications on the checksum coverage length.
+ Consider the following example:
+
+ Payload: 1536 bytes Send Buffer: 1024 bytes
+ MTU: 1500 bytes Coverage Length: 856 bytes
+
+ UDP-Lite will ship the 1536 bytes in two separate packets:
+
+ Packet 1: 1024 payload + 8 byte header + 20 byte IP header = 1052 bytes
+ Packet 2: 512 payload + 8 byte header + 20 byte IP header = 540 bytes
+
+ The coverage length covers the UDP-Lite header and 848 bytes of the
+ payload in the first packet; the second packet is fully covered. Note
+ that for the second packet, the coverage length exceeds the packet
+ length. The kernel always re-adjusts the coverage length to the packet
+ length in such cases.
+
+ To see what happens when one UDP-Lite packet is split into
+ several tiny fragments, consider the following example.
+
+ Payload: 1024 bytes Send buffer size: 1024 bytes
+ MTU: 300 bytes Coverage length: 575 bytes
+
+    +-+-----------+--------------+--------------+--------------+
+    |8|    272    |     280      |     280      |     192      |
+    +-+-----------+--------------+--------------+--------------+
+                280            560            840           1032
+                                      ^
+    *****checksum coverage*************
+
+ The UDP-Lite module generates one 1032 byte packet (1024 + 8 byte
+ header). According to the interface MTU, this is split into 4 IP
+ packets (up to 280 bytes of IP payload + 20 byte IP header). The kernel
+ module sums the contents of the entire first two packets, plus 15 bytes
+ of the third packet, before releasing the fragments to the IP module.
+
+ To see the analogous case for IPv6 fragmentation, consider a link
+ MTU of 1280 bytes and a write buffer of 3356 bytes. If the checksum
+ coverage is less than 1232 bytes (MTU minus IPv6/fragment header
+ lengths), only the first fragment needs to be considered. When using
+ larger checksum coverage lengths, each eligible fragment needs to be
+ checksummed. Suppose we have a checksum coverage of 3062. The buffer
+ of 3356 bytes will be split into the following fragments:
+
+ Fragment 1: 1280 bytes carrying 1232 bytes of UDP-Lite data
+ Fragment 2: 1280 bytes carrying 1232 bytes of UDP-Lite data
+ Fragment 3: 948 bytes carrying 900 bytes of UDP-Lite data
+
+ The first two fragments have to be checksummed in full, of the last
+ fragment only 598 (= 3062 - 2*1232) bytes are checksummed.
+
+ While it is important that such cases are dealt with correctly, they
+ are (annoyingly) rare: UDP-Lite is designed for optimising multimedia
+ performance over wireless (or generally noisy) links and thus smaller
+ coverage lengths are likely to be expected.
+
+
+ V) UDP-LITE RUNTIME STATISTICS AND THEIR MEANING
+
+ Exceptional and error conditions are logged to syslog at the KERN_DEBUG
+ level. Live statistics about UDP-Lite are available in /proc/net/snmp
+ and can (with newer versions of netstat) be viewed using
+
+ netstat -svu
+
+ This displays UDP-Lite statistics variables, whose meaning is as follows.
+
+ InDatagrams: Total number of received datagrams (as in UDP).
+
+ InPartialCov: Number of received datagrams with csum coverage < length.
+
+ NoPorts: Number of packets received to an unknown port (as in UDP).
+ These cases are counted separately (not as InErrors).
+
+ InErrors: Number of erroneous UDP-Lite packets. Errors include:
+ * internal socket queue receive errors
+ * packet too short (less than 8 bytes or stated
+ coverage length exceeds received length)
+ * xfrm4_policy_check() returned with error
+ * application has specified larger min. coverage
+ length than that of incoming packet (cf. below)
+ * checksum coverage violated (InBadCoverage)
+ * bad checksum (InBadChecksum)
+
+ InBadCoverage: Datagrams with invalid checksum coverage (also InErrors):
+ * coverage length is less than the minimum 8
+ * coverage length exceeds actual datagram length
+
+ InBadChecksum: Datagrams with wrong checksum (also InErrors).
+
+ OutDatagrams: Total number of sent datagrams.
+
+ OutPartialCov: Number of sent datagrams with csum coverage < length.
+
+ If a receiving application has specified a minimum coverage length and
+ received a packet with a smaller coverage value than this, or if it has
+ specified full coverage (UDP mode) and received a partially covered packet,
+ this counts as error (under InErrors), and an error message is logged.
+
+ These statistics variables obey the following relations:
+
+ Total_received_datagrams = InDatagrams + InErrors + NoPorts
+
+ InErrors >= InBadCoverage + InBadChecksum
+
+ The `>' includes other errors such as socket queue errors (usually 0). For
+ IPv6, the same statistics variables are used, using the `UdpLite6' prefix,
+ and can be viewed using "grep ^UdpLite6 /proc/net/snmp6". Alternatively,
+ you can use the `nstat' utility found in the iproute2 package.
+
+
+
+ VI) OPEN ISSUES
+
+ 1) Sharing Code with UDP
+
+ On the mailing list there has been a suggestion to share code between
+ ipv?/udp.c and ipv?/udplite.c. There is indeed a potential for this, but the
+ challenge is not to mess up existing code. A line-by-line comparison between
+ ipv4/udp.c and ipv4/udplite.c revealed the following similarities:
+
+ * 45 functions appear in udp.c; of these, with regard to udplite.c:
+ * 26 are used with trivial modifications (sed/perl could do this)
+ * 10 are used with minor changes (structure / sockopt names)
+ * 8 require real modifications (in control flow and algorithm)
+ * 1 function is missing in udplite.c (no equivalent of udp_check())
+
+ A summary of this analysis can be found on
+ http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/udplite-comparison.html
+
+ Further similarities include structure identifiers, hence udp_seq_afinfo is
+ e.g. already reused in UDP-Lite; as are several UDP constants.
+
+ However, I doubt whether merging will make things better. In a lot of cases
+ the code is functionally identical but depends and operates on global data
+ structures and locks which are exported as kernel symbols:
+ * udp(lite)_hash
+ * udp(lite)_hash_lock
+ * udp(lite)_port_rover
+ * udp(lite)_statistics
+ Hence it would be necessary to rename these globals apart in both source code
+ files, which would lead to a lot of #ifdefs in udp.c and introduce a fragile
+ dependency between both. Any change made in udp.c would thus immediately
+ propagate to udplite.c - manual revision remains inevitable.
+
+ 2) MIB Standardisation
+
+ A MIB for UDP-Lite does not (yet) exist. For someone who is familiar with
+ SNMP/ASN.1 it would be an easy task to turn the above variable definitions
+ into a MIB - in the same manner as per e.g. RFC 2013. Anyone interested in
+ helping with this work should contact us at <gerrit@erg.abdn.ac.uk>.
+
+
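The coverage rules of section IV in the enclosed documentation can be
checked with a few lines of plain C. The sketch below assumes that the
datagram length counts the 8-byte UDP-Lite header plus payload, and that
every fragment except the last carries the same amount of UDP-Lite data.
Both function names are hypothetical and nothing here is kernel code; the
actual send path in the patch may differ in detail.

```c
#define UDPLITE_HDR_LEN	8	/* size of struct udplitehdr */

/*
 * Sender rule (section IV.1): 0 means full coverage, non-zero values
 * below the header length are raised to 8, and values beyond the
 * datagram length are cut back to the datagram length.
 */
unsigned int effective_send_cscov(unsigned int requested,
				  unsigned int dgram_len)
{
	if (requested == 0 || requested > dgram_len)
		return dgram_len;
	if (requested < UDPLITE_HDR_LEN)
		return UDPLITE_HDR_LEN;
	return requested;
}

/*
 * Number of bytes of fragment `idx' (0-based) that fall under a
 * coverage of `cscov' bytes, when a datagram of `dgram_len' bytes is
 * cut into pieces of `frag_payload' bytes of UDP-Lite data each.
 */
unsigned int covered_in_fragment(unsigned int cscov, unsigned int dgram_len,
				 unsigned int frag_payload, unsigned int idx)
{
	unsigned int start = idx * frag_payload;
	unsigned int end = start + frag_payload;

	if (start >= dgram_len)
		return 0;		/* no such fragment */
	if (end > dgram_len)
		end = dgram_len;	/* last fragment may be short */
	if (cscov <= start)
		return 0;		/* coverage ends before this one */

	return (cscov < end ? cscov : end) - start;
}
```

With the IPv6 figures from the documentation (coverage 3062, 3364 bytes of
UDP-Lite data including the header, 1232 bytes per fragment),
covered_in_fragment() gives 1232, 1232 and 598 for the three fragments.
For the IPv4 example (coverage 575, 1032 bytes, 280 bytes per fragment)
it gives 280, 280, 15 and 0.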