* [PATCHv2 2.6.18-rc1-mm2 1/3] net: UDP-Lite generic support
@ 2006-07-14 16:19 Gerrit Renker
2006-07-15 13:33 ` Herbert Xu
2006-07-28 5:30 ` David Miller
0 siblings, 2 replies; 19+ messages in thread
From: Gerrit Renker @ 2006-07-14 16:19 UTC (permalink / raw)
To: Andrew Morton, davem, yoshfuji, kuznet, jmorris, kaber, pekkas
Cc: netdev, linux-kernel
Generic support (header files, configuration, and documentation)
for the UDP-Lite protocol (RFC 3828).
This has been tested successfully on AMD, i386/i686, and SMP
architectures (compiles cleanly and works under different configurations).
With regard to the previously submitted patch set, no conceptual
changes (it works), but a lot of tidying up and re-organisation:
* the number of changes to other files has been reduced to the
bare minimum (would greatly value ideas for further reduction),
* ifdefs have been replaced by generic functions (whose implementation
depends on CONFIG_IP_UDPLITE),
* many of the declarations and concepts shared between v4 and v6
UDP-Lite are now concentrated in the header file.
I hope that this makes the code a lot more accessible. Both ipv4/udplite.c
and ipv6/udplite.c derive from the respective udp.c variants.
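The "generic functions instead of ifdefs" item in the list above follows a common header pattern, which can be sketched roughly as below. This is only an illustration and not code from the patch: DEMO_CONFIG_UDPLITE stands in for CONFIG_IP_UDPLITE, and all demo_* names are made up for the example.

```c
#include <assert.h>

/* Hypothetical build switch standing in for CONFIG_IP_UDPLITE. */
/* #define DEMO_CONFIG_UDPLITE 1 */

#ifdef DEMO_CONFIG_UDPLITE
/* Feature built in: the real implementations live in their own .c file. */
void demo_udplite_register(void);
int demo_udplite_proc_init(void);
#else
/* Feature compiled out: empty static inline stubs with the same names. */
static inline void demo_udplite_register(void) { }
static inline int demo_udplite_proc_init(void) { return 0; }
#endif

/* Caller: the same source in both configurations, no ifdef needed here. */
static int demo_inet_init(void)
{
	demo_udplite_register();
	return demo_udplite_proc_init();
}
```

With the switch left undefined, demo_inet_init() compiles against the stubs and returns 0. Marking the stubs static inline means each file that includes the header gets its own private copy, so no duplicate symbols reach the linker.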
There is documentation enclosed, as well as a few (v6-enabled) applications
to test the kernel implementation. These should be easy to build - please
give it a test run if you have a little time.
Previous comments have been a great help in improving the code: I would greatly
value any further ideas, suggestions, and comments -- in particular ACKS & NAKS
or application.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: William Stanislaus <william@erg.abdn.ac.uk>
---
Documentation/networking/udplite.txt | 334 +++++++++++++++++++++++++++++++++++
include/linux/in.h | 1
include/linux/socket.h | 1
include/linux/udplite.h | 96 ++++++++++
include/net/udplite.h | 196 ++++++++++++++++++++
net/core/sock.c | 7
net/ipv4/Kconfig | 21 ++
7 files changed, 655 insertions(+), 1 deletion(-)
diff -Nurp a/net/ipv4/Kconfig b/net/ipv4/Kconfig
--- a/net/ipv4/Kconfig 2006-07-14 10:15:27.000000000 +0100
+++ b/net/ipv4/Kconfig 2006-07-14 10:17:50.000000000 +0100
@@ -581,3 +581,24 @@ config TCP_CONG_BIC
source "net/ipv4/ipvs/Kconfig"
+config IP_UDPLITE
+ bool "The UDP-Lite Protocol (EXPERIMENTAL)"
+ depends on INET && EXPERIMENTAL
+ default n
+ ---help---
+ The UDP-Lite Protocol <http://www.ietf.org/rfc/rfc3828.txt>
+
+ UDP-Lite is a Standards-Track IETF transport protocol (RFC 3828). It
+ features a variable-length checksum, which allows partially damaged
+ packets to be forwarded to media codecs, rather than being discarded
+ due to invalid (UDP) checksum values. This can have advantages for the
+ transport of multimedia (e.g. video/audio) over wireless networks.
+
+ The protocol runs on both IPv4 and IPv6. The socket API resembles that
+ of UDP. Applications must indicate their wish to utilise the partial
+ checksum coverage feature by setting a socket option; UDP-Lite will
+ otherwise run in (compatible) UDP mode.
+
+ Detailed documentation in <file:Documentation/networking/udplite.txt>.
+
+ If in doubt, say N.
diff -Nurp a/include/linux/udplite.h b/include/linux/udplite.h
--- a/include/linux/udplite.h 1970-01-01 01:00:00.000000000 +0100
+++ b/include/linux/udplite.h 2006-07-14 10:17:50.000000000 +0100
@@ -0,0 +1,96 @@
+/*
+ * Header file for UDP Lite (RFC 3828).
+ *
+ * Version: see net/ipv4/udplite.c
+ *
+ * Authors: Gerrit Renker, <gerrit@erg.abdn.ac.uk>
+ * William Stanislaus, <william@erg.abdn.ac.uk>
+ *
+ * Fixes:
+ * Changes:
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#ifndef _LINUX_UDPLITE_H
+#define _LINUX_UDPLITE_H
+#include <linux/types.h>
+
+/**
+ * struct udplitehdr - UDP-Lite header re-interpreting UDP (RFC 768) fields
+ *
+ * @source: source port number (as in UDP)
+ * @dest: destination port number (as in UDP)
+ * @checklen: checksum coverage length
+ * @check: checksum field (as in UDP)
+ *
+ * For the detailed semantics see RFC 3828.
+ */
+struct udplitehdr {
+ __u16 source;
+ __u16 dest;
+ __u16 checklen;
+ __u16 check;
+};
+
+/* UDP-Lite socket options */
+#define UDPLITE_SEND_CSCOV 10
+#define UDPLITE_RECV_CSCOV 11
+
+
+#ifdef __KERNEL__
+#include <linux/config.h>
+#include <net/sock.h>
+#include <linux/ip.h>
+
+/**
+ * struct udplite_sock - unreliable, connection-less UDP-Lite service
+ *
+ * @inet: has to be the first member
+ * @pending: any pending frames?
+ * @corkflag: when cork is required
+ * @encap_type: is this an encapsulation socket?
+ * @len: total length of pending frames
+ * @pcslen: partial checksum coverage length for sending socket
+ * @pcrlen: partial checksum coverage length for receiving socket
+ * @pcflag: partial checksum coverage flag
+ *
+ * NOTE: Checksum coverage length has different semantics for sending and
+ * receiving sockets.
+ *
+ */
+struct udplite_sock {
+ struct inet_sock inet;
+ int pending;
+ unsigned int corkflag;
+ __u16 encap_type;
+ /* The following members retain the information to create a
+ * UDP-Lite header when the socket is uncorked. */
+ __u16 len;
+ __u16 pcslen;
+ __u16 pcrlen;
+/* checksum coverage set indicators used by pcflag */
+#define UDPLITE_SEND_CC 0x1
+#define UDPLITE_RECV_CC 0x2
+ __u8 pcflag;
+};
+
+static inline struct udplite_sock *udplite_sk(const struct sock *sk)
+{
+ return (struct udplite_sock *)sk;
+}
+
+
+struct udplite6_sock {
+ struct udplite_sock udpl;
+ /*
+ * ipv6_pinfo has to be the last member of udplite6_sock,
+ * see inet6_sk_generic.
+ */
+ struct ipv6_pinfo inet6;
+};
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_UDPLITE_H */
diff -Nurp a/include/net/udplite.h b/include/net/udplite.h
--- a/include/net/udplite.h 1970-01-01 01:00:00.000000000 +0100
+++ b/include/net/udplite.h 2006-07-14 10:17:50.000000000 +0100
@@ -0,0 +1,196 @@
+/*
+ * Definitions for the UDP-Lite (RFC 3828) code.
+ *
+ * Version: see net/ipv4/udplite.c
+ *
+ * Authors: Gerrit Renker, <gerrit@erg.abdn.ac.uk>
+ * William Stanislaus, <william@erg.abdn.ac.uk>
+ *
+ * Fixes:
+ * Changes:
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * NOTE: In UDP-Lite the checksum MUST always be computed, hence there is
+ * no UDPLITE_CSUM_DEFAULT and no UDPLITE_CSUM_NOXMIT here.
+ */
+
+#ifndef _UDPLITE_H
+#define _UDPLITE_H
+#ifdef CONFIG_IP_UDPLITE
+#include <linux/udplite.h>
+#include <net/snmp.h>
+#include <net/udp.h> /* for UDP_HTABLE_SIZE and proc structures */
+/*
+ * Global variables
+ */
+extern struct proto udplite_prot;
+extern struct hlist_head udplite_hash[UDP_HTABLE_SIZE];
+extern rwlock_t udplite_hash_lock;
+extern int udplite_port_rover;
+
+/**
+ * struct udplite_skb_cb - UDP-Lite private variables
+ *
+ * @cscov: checksum coverage length
+ * @partial: flag, if set indicates partial csum coverage
+ * @header: private variables used by IPv4/v6 (thanks tcp.h!)
+ */
+struct udplite_skb_cb {
+ union {
+ struct inet_skb_parm h4;
+#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
+ struct inet6_skb_parm h6;
+#endif
+ } header;
+ __u16 cscov;
+ __u8 partial;
+};
+#define UDPLITE_SKB_CB(skb) ((struct udplite_skb_cb *)&((skb)->cb))
+
+
+/*
+ * Inlined functions shared between UDP-Litev4 and UDP-Litev6
+ */
+
+/* shared port space between ipv4/udplite.c and ipv6/udplite.c, cf. udp.c */
+static inline int udplite_lport_inuse(u16 num)
+{
+ struct sock *sk;
+ struct hlist_node *node;
+
+ sk_for_each(sk, node, &udplite_hash[num & (UDP_HTABLE_SIZE - 1)])
+ if (inet_sk(sk)->num == num)
+ return 1;
+ return 0;
+}
+
+/*
+ * Calculate / check variable-length UDP-Lite checksum
+ * skb->csum holds the sum of the IPv4 or IPv6 pseudo-header.
+ */
+static inline u16 __udplite_checksum_complete(struct sk_buff *skb)
+{
+ if (! UDPLITE_SKB_CB(skb)->partial)
+ return __skb_checksum_complete(skb);
+
+ return csum_fold(skb_checksum(skb, 0, UDPLITE_SKB_CB(skb)->cscov,
+ skb->csum));
+}
+
+static inline u16 udplite_checksum_complete(struct sk_buff *skb)
+{
+ return skb->ip_summed != CHECKSUM_UNNECESSARY &&
+ __udplite_checksum_complete(skb);
+}
+
+/*
+ * UDP-Lite checksum computation is all in software, hence simpler getfrag
+ */
+static inline int udplite_getfrag(void *from, char *to, int offset, int len,
+ int odd, struct sk_buff *skb)
+{
+ return memcpy_fromiovecend(to, (struct iovec *) from, offset, len);
+}
+
+
+/*
+ * net/ipv4/udplite.c
+ */
+extern void udplite4_register(void);
+extern int udplite4_mib_init(void);
+extern unsigned int udplite_poll(struct file *file, struct socket *sock,
+ poll_table * wait);
+extern int udplite_rcv(struct sk_buff *skb);
+extern void udplite_err(struct sk_buff *, u32);
+extern int udplite_disconnect(struct sock *sk, int flags);
+extern int udplite_ioctl(struct sock *sk, int cmd, unsigned long arg);
+extern int udplite_sendmsg(struct kiocb *iocb, struct sock *sk,
+ struct msghdr *msg, size_t len);
+#ifdef CONFIG_PROC_FS
+extern int udplite_proc_register(struct udp_seq_afinfo *afinfo);
+extern void udplite_proc_unregister(struct udp_seq_afinfo *afinfo);
+
+extern int udplite4_proc_init(void);
+extern void udplite4_proc_exit(void);
+#endif /* CONFIG_PROC_FS */
+
+
+/*
+ * net/ipv6/udplite.c
+ */
+#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
+extern struct proto udplitev6_prot;
+extern void udplitev6_init(void);
+extern void udplitev6_exit(void);
+#ifdef CONFIG_PROC_FS
+extern int udplite6_proc_init(void);
+extern void udplite6_proc_exit(void);
+#endif
+#endif /* CONFIG_IPV6 */
+
+
+/*
+ * MIB / runtime statistics for UDP-Litev4 and UDP-Litev6.
+ */
+enum {
+ UDPLITE_MIB_NUM = 0,
+ UDPLITE_MIB_INDATAGRAMS, /* total received datagrams */
+ UDPLITE_MIB_IN_PARTIALCOV, /* rcvd datagrams with partial coverage */
+ UDPLITE_MIB_NOPORTS, /* rcvd datagrams to wrong ports */
+ UDPLITE_MIB_INERRORS, /* total erroneous received datagrams */
+ UDPLITE_MIB_IN_BAD_COV, /* checksum coverage errors */
+ UDPLITE_MIB_IN_BAD_CSUM, /* checksum itself did not qualify */
+ UDPLITE_MIB_OUTDATAGRAMS, /* total sent datagrams */
+ UDPLITE_MIB_OUT_PARTIALCOV, /* sent datagrams with partial coverage */
+ __UDPLITE_MIB_MAX
+};
+
+struct udplite_mib {
+ unsigned long mibs[__UDPLITE_MIB_MAX];
+} __SNMP_MIB_ALIGN__;
+
+
+DECLARE_SNMP_STAT(struct udplite_mib, udplite_statistics);
+#define UDPLITE_INC_STATS(f) SNMP_INC_STATS(udplite_statistics, f)
+#define UDPLITE_INC_STATS_BH(f) SNMP_INC_STATS_BH(udplite_statistics, f)
+#define UDPLITE_INC_STATS_USER(f) SNMP_INC_STATS_USER(udplite_statistics, f)
+
+#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
+DECLARE_SNMP_STAT(struct udplite_mib, udplite_stats_in6);
+#define UDPLITE6_INC_STATS(f) SNMP_INC_STATS(udplite_stats_in6, f)
+#define UDPLITE6_INC_STATS_BH(f) SNMP_INC_STATS_BH(udplite_stats_in6, f)
+#define UDPLITE6_INC_STATS_USER(f) SNMP_INC_STATS_USER(udplite_stats_in6, f)
+#endif /* CONFIG_IPV6 */
+
+
+
+#else /* CONFIG_IP_UDPLITE is not enabled */
+
+
+/* empty v4 function definitions: */
+static inline void udplite4_register(void) { }
+#define udplite4_mib_init(void) (1)
+#ifdef CONFIG_PROC_FS
+#define udplite4_proc_init(void) (0)
+static inline void udplite4_proc_exit(void) { }
+#endif /* CONFIG_PROC_FS */
+
+
+/* empty v6 function definitions: */
+#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
+static inline void udplitev6_init(void) { }
+static inline void udplitev6_exit(void) { }
+#ifdef CONFIG_PROC_FS
+#define udplite6_proc_init(void) (0)
+static inline void udplite6_proc_exit(void) { }
+#endif /* CONFIG_PROC_FS */
+#endif /* CONFIG_IPV6 */
+
+#endif /* CONFIG_IP_UDPLITE */
+
+
+#endif /* _UDPLITE_H */
diff -Nurp a/include/linux/socket.h b/include/linux/socket.h
--- a/include/linux/socket.h 2006-07-06 09:08:15.000000000 +0100
+++ b/include/linux/socket.h 2006-07-14 10:17:50.000000000 +0100
@@ -264,6 +264,7 @@ struct ucred {
#define SOL_IPV6 41
#define SOL_ICMPV6 58
#define SOL_SCTP 132
+#define SOL_UDPLITE 136 /* UDP-Lite (RFC 3828) */
#define SOL_RAW 255
#define SOL_IPX 256
#define SOL_AX25 257
diff -Nurp a/include/linux/in.h b/include/linux/in.h
--- a/include/linux/in.h 2006-06-19 08:45:25.000000000 +0100
+++ b/include/linux/in.h 2006-07-14 10:17:50.000000000 +0100
@@ -44,6 +44,7 @@ enum {
IPPROTO_COMP = 108, /* Compression Header protocol */
IPPROTO_SCTP = 132, /* Stream Control Transport Protocol */
+ IPPROTO_UDPLITE = 136, /* UDP-Lite Protocol (RFC 3828) */
IPPROTO_RAW = 255, /* Raw IP packets */
IPPROTO_MAX
diff -Nurp a/net/core/sock.c b/net/core/sock.c
--- a/net/core/sock.c 2006-07-06 09:08:24.000000000 +0100
+++ b/net/core/sock.c 2006-07-14 10:17:50.000000000 +0100
@@ -479,7 +479,12 @@ set_rcvbuf:
break;
case SO_NO_CHECK:
- sk->sk_no_check = valbool;
+ /* UDP-Lite (RFC 3828) mandates checksumming,
+ * hence user must not enable this option. */
+ if (sk->sk_protocol == IPPROTO_UDPLITE)
+ ret = -EOPNOTSUPP;
+ else
+ sk->sk_no_check = valbool;
break;
case SO_PRIORITY:
diff -Nurp a/Documentation/networking/udplite.txt b/Documentation/networking/udplite.txt
--- a/Documentation/networking/udplite.txt 1970-01-01 01:00:00.000000000 +0100
+++ b/Documentation/networking/udplite.txt 2006-07-14 10:17:50.000000000 +0100
@@ -0,0 +1,334 @@
+ ===========================================================================
+ The UDP-Lite protocol (RFC 3828)
+ ===========================================================================
+ last modified: Fri 14th July 2006
+
+
+ UDP-Lite is a Standards-Track IETF transport protocol whose characteristic
+ is a variable-length checksum. This has advantages for transport of multimedia
+ (video, VoIP) over wireless networks, as partly damaged packets can still be
+ fed into the codec instead of being discarded due to a failed checksum test.
+
+ This file briefly describes the existing kernel support and the socket API.
+ For in-depth information, you can consult:
+
+ o The UDP-Lite Homepage: http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/
+ From here you can always pull the latest patch for the stable
+ kernel tree and example application source code.
+
+ o The UDP-Lite HOWTO on
+ http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/UDP-Lite-HOWTO.txt
+
+ o The Ethereal UDP-Lite WiKi (with capture files):
+ http://wiki.ethereal.com/Lightweight_User_Datagram_Protocol
+
+ o The Protocol Spec, RFC 3828: http://www.ietf.org/rfc/rfc3828.txt
+
+
+ I) APPLICATIONS
+
+ Several applications have been ported successfully to UDP-Lite. Ethereal
+ (now called wireshark) has UDP-Litev4/v6 support by default. The tarball on
+
+ http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/udplite_linux.tar.gz
+
+ has source code for several v4/v6 client-server and network testing examples.
+
+ Porting applications to UDP-Lite is straightforward: only socket level and
+ IPPROTO need to be changed; senders additionally set the checksum coverage
+ length (default = header length = 8). Details are in the next section.
+ UDP-Lite is not enabled by default; set CONFIG_IP_UDPLITE=y to support it.
+
+
+ II) PROGRAMMING API
+
+ UDP-Lite provides a connectionless, unreliable datagram service and hence
+ uses the same socket type as UDP. In fact, porting from UDP to UDP-Lite is
+ dead easy: simply add `IPPROTO_UDPLITE' as the last argument of the socket(2)
+ call so that the statement looks like:
+
+ s = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDPLITE);
+
+ or, respectively,
+
+ s = socket(PF_INET6, SOCK_DGRAM, IPPROTO_UDPLITE);
+
+ Since both UDP-Litev4 and UDP-Litev6 are supported, the porting process is the
+ same in both cases. With just this change you are able to run UDP-Lite
+ services or connect to UDP-Lite servers. The kernel will assume that you are
+ not interested in using partial checksum coverage and so emulate UDP mode.
+
+ Making use of the partial checksum coverage facilities requires setting just
+ one socket option which takes an integer specifying the coverage length:
+
+ * Sender checksum coverage: UDPLITE_SEND_CSCOV
+
+ For example,
+
+ int val = 20;
+ setsockopt(s, SOL_UDPLITE, UDPLITE_SEND_CSCOV, &val, sizeof(int));
+
+ sets the checksum coverage length to 20 bytes (12b data + 8b header).
+ Of each packet only the first 20 bytes (plus the pseudo-header) will be
+ checksummed. This is useful for RTP applications which have a 12-byte
+ base header.
+
+
+ * Receiver checksum coverage: UDPLITE_RECV_CSCOV
+
+ This option is the receiver-side analogue. It is truly optional, i.e. not
+ required to enable traffic with partial checksum coverage. Its function is
+ that of a traffic filter: when enabled, it instructs the kernel to drop
+ all packets which have a coverage _less_ than this value. For example, if
+ RTP and UDP headers are to be protected, a receiver can enforce that only
+ packets with a minimum coverage of 20 are admitted:
+
+ int min = 20;
+ setsockopt(s, SOL_UDPLITE, UDPLITE_RECV_CSCOV, &min, sizeof(int));
+
+ The calls to getsockopt(2) are analogous. Being an extension and not a stand-
+ alone protocol, all socket options known from UDP can be used in exactly the
+ same manner as before, e.g. UDP_CORK or UDP_ENCAP.
+
+ A detailed discussion of UDP-Lite checksum coverage options is in section IV.
+
+
+
+ III) HEADER FILES
+
+ The socket API requires support through header files in /usr/include:
+
+ * /usr/include/netinet/in.h
+ to define IPPROTO_UDPLITE
+
+ * /usr/include/netinet/udplite.h
+ for UDP-Lite header fields and protocol constants
+
+ For testing purposes, the following can serve as a `mini' header file:
+
+ #define IPPROTO_UDPLITE 136
+ #define SOL_UDPLITE 136
+ #define UDPLITE_SEND_CSCOV 10
+ #define UDPLITE_RECV_CSCOV 11
+
+ Ready-made header files for various distros are in the UDP-Lite tarball.
+
+
+
+ IV) KERNEL BEHAVIOUR WITH REGARD TO THE VARIOUS SOCKET OPTIONS
+
+ To enable debugging messages, the log level must be set to 8, as most
+ messages use the KERN_DEBUG level (7).
+
+
+ 1) Sender Socket Options
+
+ If the sender specifies a value of 0 as coverage length, the module
+ assumes full coverage and transmits a packet with a coverage length
+ of 0 and the corresponding checksum. If the sender specifies a
+ non-zero coverage of less than 8, the kernel assumes 8 as default
+ value. Finally, if the specified coverage length exceeds the packet
+ length, the packet length is used instead as coverage length.
+
+
+ 2) Receiver Socket Options
+
+ The receiver specifies the minimum value of the coverage length it
+ is willing to accept. A value of 0 here indicates that the receiver
+ always wants the whole of the packet covered. In this case, all
+ partially covered packets are dropped and an error is logged.
+
+ It is not possible to specify illegal values (negative, or non-zero
+ and less than 8); in these cases the default of 8 is assumed.
+
+ All packets arriving with a coverage value less than the specified
+ threshold are discarded; these events are also logged.
+
+
+ 3) Disabling the Checksum Computation
+
+ On both sender and receiver, trying to disable the UDP-Lite checksum
+ (option SO_NO_CHECK) in setsockopt(2) results in an error. Thus
+
+ setsockopt(sockfd, SOL_SOCKET, SO_NO_CHECK, ... );
+
+ will always result in an error, while
+
+ getsockopt(sockfd, SOL_SOCKET, SO_NO_CHECK, &value, ...);
+
+ will always return a value of 0 (meaning checksum enabled). Packets
+ with a zero checksum field are silently discarded by the receiver.
+
+
+ 4) Fragmentation
+
+ The checksum computation respects both buffersize and MTU. The size
+ of UDP-Lite packets is determined by the size of the send buffer. The
+ minimum size of the send buffer is 2048 (defined as SOCK_MIN_SNDBUF
+ in include/net/sock.h), the default value is configurable as
+ net.core.wmem_default or via setting the SO_SNDBUF socket(7)
+ option. The maximum upper bound for the send buffer is determined
+ by net.core.wmem_max.
+
+ Given a payload size larger than the send buffer size, UDP-Lite will
+ split the payload into several individual packets, filling up the
+ send buffer size in each case.
+
+ The precise value also depends on the interface MTU. The interface MTU,
+ in turn, may trigger IP fragmentation. In this case, the generated
+ UDP-Lite packet is split into several IP packets, of which only the
+ first one contains the L4 header.
+
+ The send buffer size has implications on the checksum coverage length.
+ Consider the following example:
+
+ Payload: 1536 bytes Send Buffer: 1024 bytes
+ MTU: 1500 bytes Coverage Length: 856 bytes
+
+ UDP-Lite will ship the 1536 bytes in two separate packets:
+
+ Packet 1: 1024 payload + 8 byte header + 20 byte IP header = 1052 bytes
+ Packet 2: 512 payload + 8 byte header + 20 byte IP header = 540 bytes
+
+ The checksum covers the UDP-Lite header and 848 bytes of the
+ payload in the first packet; the second packet is fully covered. Note
+ that for the second packet, the coverage length exceeds the packet
+ length. The kernel always re-adjusts the coverage length to the packet
+ length in such cases.
+
+ To see what happens when one UDP-Lite packet is split into several
+ tiny fragments, consider the following example.
+
+ Payload: 1024 bytes Send buffer size: 1024 bytes
+ MTU: 300 bytes Coverage length: 575 bytes
+
+ +-+-----------+--------------+--------------+--------------+
+ |8| 272 | 280 | 280 | 192 |
+ +-+-----------+--------------+--------------+--------------+
+ 280 560 840 1032
+ ^
+ *****checksum coverage*************
+
+ The UDP-Lite module generates one 1032 byte packet (1024 + 8 byte
+ header). According to the interface MTU, this is split into 4 IP
+ packets (up to 280 bytes of IP payload + 20 byte IP header). The kernel
+ module sums the contents of the entire first two packets, plus 15 bytes
+ of the third packet, before releasing the fragments to the IP module.
+
+ To see the analogous case for IPv6 fragmentation, consider a link
+ MTU of 1280 bytes and a write buffer of 3356 bytes. If the checksum
+ coverage is less than 1232 bytes (MTU minus IPv6/fragment header
+ lengths), only the first fragment needs to be considered. When using
+ larger checksum coverage lengths, each eligible fragment needs to be
+ checksummed. Suppose we have a checksum coverage of 3062. The buffer
+ of 3356 bytes will be split into the following fragments:
+
+ Fragment 1: 1280 bytes carrying 1232 bytes of UDP-Lite data
+ Fragment 2: 1280 bytes carrying 1232 bytes of UDP-Lite data
+ Fragment 3: 948 bytes carrying 900 bytes of UDP-Lite data
+
+ The first two fragments have to be checksummed in full, of the last
+ fragment only 598 (= 3062 - 2*1232) bytes are checksummed.
+
+ While it is important that such cases are dealt with correctly, they
+ are (annoyingly) rare: UDP-Lite is designed for optimising multimedia
+ performance over wireless (or generally noisy) links and thus smaller
+ coverage lengths are to be expected.
+
+
+ V) UDP-LITE RUNTIME STATISTICS AND THEIR MEANING
+
+ Exceptional and error conditions are logged to syslog at the KERN_DEBUG
+ level. Live statistics about UDP-Lite are available in /proc/net/snmp
+ and can (with newer versions of netstat) be viewed using
+
+ netstat -svu
+
+ This displays UDP-Lite statistics variables, whose meaning is as follows.
+
+ InDatagrams: Total number of received datagrams (as in UDP).
+
+ InPartialCov: Number of received datagrams with csum coverage < length.
+
+ NoPorts: Number of packets received to an unknown port (as in UDP).
+ These cases are counted separately (not as InErrors).
+
+ InErrors: Number of erroneous UDP-Lite packets. Errors include:
+ * internal socket queue receive errors
+ * packet too short (less than 8 bytes or stated
+ coverage length exceeds received length)
+ * xfrm4_policy_check() returned with error
+ * application has specified larger min. coverage
+ length than that of incoming packet (cf. below)
+ * checksum coverage violated (InBadCoverage)
+ * bad checksum (InBadChecksum)
+
+ InBadCoverage: Datagrams with invalid checksum coverage (also InErrors):
+ * coverage length is less than the minimum 8
+ * coverage length exceeds actual datagram length
+
+ InBadChecksum: Datagrams with wrong checksum (also InErrors).
+
+ OutDatagrams: Total number of sent datagrams.
+
+ OutPartialCov: Number of sent datagrams with csum coverage < length.
+
+ If a receiving application has specified a minimum coverage length and
+ received a packet with a smaller coverage value than this, or if it has
+ specified full coverage (UDP mode) and received a partially covered packet,
+ this counts as error (under InErrors), and an error message is logged.
+
+ These statistics variables obey the following relations:
+
+ Total_received_datagrams = InDatagrams + InErrors + NoPorts
+
+ InErrors >= InBadCoverage + InBadChecksum
+
+ The `>' includes other errors such as socket queue errors (usually 0). For
+ IPv6, the same statistics variables are used, using the `UdpLite6' prefix,
+ and can be viewed using "grep ^UdpLite6 /proc/net/snmp6". Alternatively,
+ you can use the `nstat' utility found in the iproute2 package.
+
+
+
+ VI) OPEN ISSUES
+
+ 1) Sharing Code with UDP
+
+ On the mailing list there has been a suggestion to share code between
+ ipv?/udp.c and ipv?/udplite.c. There is indeed a potential for this, but the
+ challenge is not to mess up existing code. A line-by-line comparison between
+ ipv4/udp.c and ipv4/udplite.c revealed the following similarities:
+
+ * 45 functions appear in udp.c; their udplite.c counterparts break down as:
+ * 26 are used with trivial modifications (sed/perl could do this)
+ * 10 are used with minor changes (structure / sockopt names)
+ * 8 require real modifications (in control flow and algorithm)
+ * 1 function is missing in udplite.c (no equivalent of udp_check())
+
+ A summary of this analysis can be found on
+ http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/udplite-comparison.html
+
+ Further similarities include structure identifiers, hence udp_seq_afinfo is
+ e.g. already reused in UDP-Lite; as are several UDP constants.
+
+ However, I doubt whether merging will make things better. In a lot of cases
+ the code is functionally identical but depends and operates on global data
+ structures and locks which are exported as kernel symbols:
+ * udp(lite)_hash
+ * udp(lite)_hash_lock
+ * udp(lite)_port_rover
+ * udp(lite)_statistics
+ Hence it would be necessary to keep these globals apart in both source code
+ files, which would lead to a lot of #ifdefs in udp.c and introduce a fragile
+ dependency between both. Any change made in udp.c would thus immediately
+ propagate to udplite.c - manual revision remains inevitable.
+
+ 2) MIB Standardisation
+
+ A MIB for UDP-Lite does not (yet) exist. For someone who is familiar with
+ SNMP/ASN.1 it would be an easy task to turn the above variable definitions
+ into a MIB - in the same manner as per e.g. RFC 2013. Anyone interested in
+ helping with this work should contact us at <gerrit@erg.abdn.ac.uk>.
+
+
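The sender-side rules from section IV.1 of the enclosed udplite.txt can be sketched in user space as below. This is an illustration of the documented behaviour only, not the kernel code from the patch set; demo_send_cscov() and demo_bytes_covered() are hypothetical names.

```c
#include <assert.h>

#define DEMO_UDPLITE_HDRLEN 8u	/* the UDP-Lite header is 8 bytes */

/*
 * Hypothetical helper: map the value passed to UDPLITE_SEND_CSCOV and the
 * length of one UDP-Lite packet (header plus payload) to the value that
 * ends up in the coverage field of the header.
 */
static unsigned int demo_send_cscov(unsigned int requested, unsigned int pkt_len)
{
	if (requested == 0)
		return 0;			/* 0 requests full coverage */
	if (requested < DEMO_UDPLITE_HDRLEN)
		return DEMO_UDPLITE_HDRLEN;	/* 1..7: fall back to 8 */
	if (requested > pkt_len)
		return pkt_len;			/* clamp to the packet length */
	return requested;
}

/* Number of bytes of the packet that actually get checksummed. */
static unsigned int demo_bytes_covered(unsigned int cscov_field, unsigned int pkt_len)
{
	return cscov_field == 0 ? pkt_len : cscov_field;
}
```

For the two-packet example in section IV.4 (coverage 856, packets of 1032 and 520 bytes including the header), demo_send_cscov(856, 1032) gives 856 and demo_send_cscov(856, 520) gives 520, i.e. the second packet is fully covered.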
* Re: [PATCHv2 2.6.18-rc1-mm2 1/3] net: UDP-Lite generic support
2006-07-14 16:19 [PATCHv2 2.6.18-rc1-mm2 1/3] net: UDP-Lite generic support Gerrit Renker
@ 2006-07-15 13:33 ` Herbert Xu
2006-07-16 9:29 ` Gerrit Renker
2006-07-28 5:30 ` David Miller
1 sibling, 1 reply; 19+ messages in thread
From: Herbert Xu @ 2006-07-15 13:33 UTC (permalink / raw)
To: Gerrit Renker
Cc: akpm, davem, yoshfuji, kuznet, jmorris, kaber, pekkas, netdev,
linux-kernel
Gerrit Renker <gerrit@erg.abdn.ac.uk> wrote:
>
> diff -Nurp a/net/core/sock.c b/net/core/sock.c
> --- a/net/core/sock.c 2006-07-06 09:08:24.000000000 +0100
> +++ b/net/core/sock.c 2006-07-14 10:17:50.000000000 +0100
> @@ -479,7 +479,12 @@ set_rcvbuf:
> break;
>
> case SO_NO_CHECK:
> - sk->sk_no_check = valbool;
> + /* UDP-Lite (RFC 3828) mandates checksumming,
> + * hence user must not enable this option. */
> + if (sk->sk_protocol == IPPROTO_UDPLITE)
> + ret = -EOPNOTSUPP;
> + else
> + sk->sk_no_check = valbool;
Please don't add protocol-specific stuff to generic functions. In this
case why don't you just ignore sk_no_check for UDPLITE as we do for TCP?
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [PATCHv2 2.6.18-rc1-mm2 1/3] net: UDP-Lite generic support
2006-07-15 13:33 ` Herbert Xu
@ 2006-07-16 9:29 ` Gerrit Renker
0 siblings, 0 replies; 19+ messages in thread
From: Gerrit Renker @ 2006-07-16 9:29 UTC (permalink / raw)
To: Herbert Xu
Cc: akpm, davem, yoshfuji, kuznet, jmorris, kaber, pekkas, netdev,
linux-kernel
Quoting Herbert Xu:
| > case SO_NO_CHECK:
| > - sk->sk_no_check = valbool;
| > + /* UDP-Lite (RFC 3828) mandates checksumming,
| > + * hence user must not enable this option. */
| > + if (sk->sk_protocol == IPPROTO_UDPLITE)
| > + ret = -EOPNOTSUPP;
| > + else
| > + sk->sk_no_check = valbool;
|
| Please don't add protocol-specific stuff to generic functions. In this
| case why don't you just ignore sk_no_check for UDPLITE as we do for TCP?
Thank you for spotting this -- the UDP-Lite code indeed ignores sk_no_check
and will (if no socket options are set) emulate UDP with sk_no_check = 0. Setting
it to 1 will make no difference; so the above is not strictly necessary. Will
remove it in the next patch.
Any other comments or ideas, please do not hesitate to write.
-- Gerrit
* Re: [PATCHv2 2.6.18-rc1-mm2 1/3] net: UDP-Lite generic support
2006-07-14 16:19 [PATCHv2 2.6.18-rc1-mm2 1/3] net: UDP-Lite generic support Gerrit Renker
2006-07-15 13:33 ` Herbert Xu
@ 2006-07-28 5:30 ` David Miller
2006-07-28 8:19 ` Gerrit Renker
` (4 more replies)
1 sibling, 5 replies; 19+ messages in thread
From: David Miller @ 2006-07-28 5:30 UTC (permalink / raw)
To: gerrit; +Cc: akpm, yoshfuji, kuznet, jmorris, kaber, pekkas, netdev,
linux-kernel
From: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Date: Fri, 14 Jul 2006 17:19:02 +0100
> Generic support (header files, configuration, and documentation) for
> the UDP-Lite protocol (RFC 3828).
Gerrit, I tried to bring myself over the edge to accept this
work and push it into my net-2.6.19 tree, but I simply can't.
The amount of code duplication is absolutely enormous and
totally unnecessary.
With proper abstractions, you can easily add UDP-lite support to the
existing UDP code. And I would really like it to be in that format
before we put it into the tree.
Then all you need to do is something like add:
__u16 pcslen;
__u16 pcrlen;
/* checksum coverage set indicators used by pcflag */
#define UDPLITE_SEND_CC 0x1
#define UDPLITE_RECV_CC 0x2
__u8 pcflag;
to struct udp_sock.
Add the separate udp_port_rover and udplite_hash[] table to udp.c, make
the latter sized by UDP_HTABLE_SIZE, with suitable exports, and share
the udp_hash_lock for mutual exclusion. Finally, parameterize all the
UDP hash table routines to take a hash table base and a port rover
pointer so that they can operate on both UDP and UDP-Lite sockets
transparently.
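A minimal user-space sketch of this kind of parameterisation is given below, assuming a much simplified socket structure and plain linked-list chains. All demo_* names are hypothetical; the real routines would operate on struct sock, the hlist chains and the shared lock.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_HTABLE_SIZE 128	/* stands in for UDP_HTABLE_SIZE */

/* Hypothetical, much simplified stand-in for a bound socket. */
struct demo_sock {
	unsigned short num;		/* local port */
	struct demo_sock *next;		/* next socket in the same chain */
};

/* Two tables and two rovers, but only one set of routines. */
static struct demo_sock *demo_udp_hash[DEMO_HTABLE_SIZE];
static struct demo_sock *demo_udplite_hash[DEMO_HTABLE_SIZE];

/* Is local port num already bound in the given table? */
static int demo_lport_inuse(struct demo_sock *const table[], unsigned short num)
{
	const struct demo_sock *sk;

	for (sk = table[num & (DEMO_HTABLE_SIZE - 1)]; sk != NULL; sk = sk->next)
		if (sk->num == num)
			return 1;
	return 0;
}

/* Bind sk to port num in the given table; 0 on success, -1 if taken. */
static int demo_hash_port(struct demo_sock *table[], struct demo_sock *sk,
			  unsigned short num)
{
	struct demo_sock **head = &table[num & (DEMO_HTABLE_SIZE - 1)];

	if (demo_lport_inuse(table, num))
		return -1;
	sk->num = num;
	sk->next = *head;
	*head = sk;
	return 0;
}

/* Advance the caller's rover to the next free port in [lo, hi]; -1 if none. */
static int demo_pick_port(struct demo_sock *const table[], int *rover,
			  int lo, int hi)
{
	int tries = hi - lo + 1;
	int port = *rover;

	while (tries-- > 0) {
		if (++port > hi || port < lo)
			port = lo;
		if (!demo_lport_inuse(table, (unsigned short)port)) {
			*rover = port;
			return port;
		}
	}
	return -1;
}
```

Because the table and the rover are passed in, a port bound in demo_udp_hash does not block the same port number in demo_udplite_hash, while the lookup and allocation logic exists only once.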
Next make 2 top-level routines for things like udp_err() etc.
So that you can go:
/* Common code */
static void __udp_err(struct sock *sk, struct sk_buff *skb, int type, int code, u32 info)
{
	struct iphdr *iph = (struct iphdr*)skb->data;
	struct udphdr *uh = (struct udphdr*)(skb->data+(iph->ihl<<2));
	struct inet_sock *inet;
	int harderr, err;

	if (sk == NULL) {
		ICMP_INC_STATS_BH(ICMP_MIB_INERRORS);
		return;	/* No socket for error */
	}
	err = 0;
	harderr = 0;
	inet = inet_sk(sk);
	switch (type) {
	...
	}
	/*
	 * RFC1122: OK. Passes ICMP errors back to application, as per
	 * 4.1.3.3.
	 */
	if (!inet->recverr) {
		if (!harderr || sk->sk_state != TCP_ESTABLISHED)
			goto out;
	} else {
		ip_icmp_error(sk, skb, err, uh->dest, info, (u8*)(uh+1));
	}
	sk->sk_err = err;
	sk->sk_error_report(sk);
out:
	sock_put(sk);
}

void udp_err(struct sk_buff *skb, u32 info)
{
	struct iphdr *iph = (struct iphdr*)skb->data;
	struct udphdr *uh = (struct udphdr*)(skb->data+(iph->ihl<<2));
	int type = skb->h.icmph->type;
	int code = skb->h.icmph->code;
	struct sock *sk;

	sk = udp_v4_lookup(udp_hash,
			   iph->daddr, uh->dest, iph->saddr,
			   uh->source, skb->dev->ifindex);
	__udp_err(sk, skb, type, code, info);
}

void udplite_err(struct sk_buff *skb, u32 info)
{
	struct iphdr *iph = (struct iphdr*)skb->data;
	struct udphdr *uh = (struct udphdr*)(skb->data+(iph->ihl<<2));
	int type = skb->h.icmph->type;
	int code = skb->h.icmph->code;
	struct sock *sk;

	sk = udp_v4_lookup(udplite_hash,
			   iph->daddr, uh->dest, iph->saddr,
			   uh->source, skb->dev->ifindex);
	__udp_err(sk, skb, type, code, info);
}
Make similar abstractions for the send and receive path processing,
substituting in the specific UDP vs. UDP-Lite header and
checksum semantic handling along the way.
It's mostly clerical work, but it will mean that we will have one
copy of all this code and as a result we won't even need a config
option for UDP-Lite.
Thanks.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCHv2 2.6.18-rc1-mm2 1/3] net: UDP-Lite generic support
2006-07-28 5:30 ` David Miller
@ 2006-07-28 8:19 ` Gerrit Renker
2006-07-28 8:25 ` David Miller
2006-09-19 7:25 ` [PATCHv3 1/4][RFC] net/ipv4: consolidated UDP / UDP-Lite code Gerrit Renker
` (3 subsequent siblings)
4 siblings, 1 reply; 19+ messages in thread
From: Gerrit Renker @ 2006-07-28 8:19 UTC (permalink / raw)
To: David Miller
Cc: akpm, yoshfuji, kuznet, jmorris, kaber, pekkas, netdev,
linux-kernel
Hi David,
thank you very much for taking time to revise the code
and for the detailed comments.
| The amount of code duplication is absolutely enormous and
| totally unnecessary.
You are right.
So far I thought it better to keep UDP and UDP-Lite separate but an
intelligent code integration does seem the better way.
<snip>
| It's mostly clerical work, but it will mean that we will have one
| copy of all this code and as a result we won't even need a config
| option for UDP-Lite.
I will start with the v4-side and post a small RFC patch to see whether
I got the concepts right. (Due to vacation, this will not be before mid/end
of August.)
Best regards
Gerrit
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCHv2 2.6.18-rc1-mm2 1/3] net: UDP-Lite generic support
2006-07-28 8:19 ` Gerrit Renker
@ 2006-07-28 8:25 ` David Miller
0 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2006-07-28 8:25 UTC (permalink / raw)
To: gerrit; +Cc: akpm, yoshfuji, kuznet, jmorris, kaber, pekkas, netdev,
linux-kernel
From: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Date: Fri, 28 Jul 2006 09:19:55 +0100
> I will start with the v4-side and post a small RFC patch to see whether
> I got the concepts right. (Due to vacation, this will not be before mid/end
> of August.)
Ok, I look forward to reviewing it.
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCHv3 1/4][RFC] net/ipv4: consolidated UDP / UDP-Lite code
2006-07-28 5:30 ` David Miller
2006-07-28 8:19 ` Gerrit Renker
@ 2006-09-19 7:25 ` Gerrit Renker
2006-10-09 9:51 ` [PATCH-update][RFC] net: " Gerrit Renker
2006-09-19 7:25 ` [PATCHv3 2/4][RFC] net/ipv4: self-contained UDP-Lite module Gerrit Renker
` (2 subsequent siblings)
4 siblings, 1 reply; 19+ messages in thread
From: Gerrit Renker @ 2006-09-19 7:25 UTC (permalink / raw)
To: David Miller; +Cc: netdev
Hi David,
please find enclosed for review the proposed changes to integrate UDP-Lite
code with UDP.
Please disregard all earlier patches; I have put in more work to consolidate
the code further, also with regard to expanding to UDP(-Lite) v6.
(Reductions are drastic: udplite.c is slimmed down to 186 lines.)
This is the v4-side and with a minimum of formatting diffs. Any suggestions
for v4 will automatically be incorporated into the v6 side in the subsequent
revision.
Enclosed patch applies on your 2.6.19 tree and on 2.6.18-rc6-mm2 and has been
tested (UDPv4/v6 as well) for quite a while.
R e s o l v e d I s s u e s
a) Common Naming scheme:
--stuff that works for both UDPv4/6 and UDP-Litev4/6 is called udp_lib_xxx()
* restricted to v4: udp4_lib_xxx()
* restricted to v6: udp6_lib_xxx()
* when it should not be called directly: __udp_lib_xxx() [resp.__udp{4,6}_lib_xxx()]
--some old functions (e.g. udp_v4_mcast_next()) have retained their name
although they would need to be renamed (here udp4_lib_mcast_next());
this is left for later -- a `name-change-only' patch
b) Correct MIB-counting of de facto received datagrams (resolving
http://bugzilla.kernel.org/show_bug.cgi?id=6660 for this module; some follow-up
work may have to look at doing the same for sunrpc, nfs, and likewise)
c) udp_v{4,6}_get_port(): consolidated.
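The counting scheme in item b) can be illustrated with a small user-space
sketch. It assumes, as the item describes, that InDatagrams is incremented
when a datagram is queued and taken back again if the datagram is later
dropped without being delivered. All demo_* names are hypothetical; the
kernel uses the SNMP per-CPU counter macros instead of plain arrays.

```c
#include <assert.h>

enum demo_mib { DEMO_INDATAGRAMS, DEMO_INERRORS, DEMO_MIB_MAX };

/* Separate counter sets for UDP and UDP-Lite. */
static long demo_udp_stats[DEMO_MIB_MAX];
static long demo_udplite_stats[DEMO_MIB_MAX];

/* Select the counter set, like the is_udplite argument of the macros. */
static long *demo_stats(int is_udplite)
{
	return is_udplite ? demo_udplite_stats : demo_udp_stats;
}

/* Datagram accepted onto the receive queue: count it as received. */
static void demo_queue_rcv(int is_udplite)
{
	demo_stats(is_udplite)[DEMO_INDATAGRAMS]++;
}

/*
 * Datagram taken off the queue by the application. If the deferred
 * checksum fails, it was never delivered: count an error and undo the
 * earlier InDatagrams increment. Returns 0 on delivery, -1 on drop.
 */
static int demo_recvmsg(int is_udplite, int csum_ok)
{
	long *stats = demo_stats(is_udplite);

	if (csum_ok)
		return 0;
	assert(stats[DEMO_INDATAGRAMS] > 0);	/* must have been queued */
	stats[DEMO_INERRORS]++;
	stats[DEMO_INDATAGRAMS]--;
	return -1;
}
```

With this bookkeeping, InDatagrams only reflects datagrams that actually
reached the application, and the UDP and UDP-Lite counters stay independent.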
C h a n g e l o g
* inline functions which are shared between UDP and UDP-Lite and v4/v6 have
been put into header files rather than duplicated in source files
* made net/ipv4/udplite.c a self-contained unit (#included at the end of udp.c)
* consolidated v4/v6 checksumming code (generic now for UDP and UDP-Lite)
--new: udp_csum_outgoing()
--contrasts with udplite_csum_outgoing()
--removed support for disabling UDPv6 checksums (RFC 2460 says it's illegal)
* udp_v{4,6}_hash() and udp_v{4,6}_unhash() identical: inlined into header file
* basic xfrm/netfilter support (thanks to Patrick McHardy)
* updated the protocol registration in udplite4_register to not do
/proc registration on error (thanks to comments by James Morris)
* detailed per-function breakdown as per earlier mail
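To make the difference between full (UDP) and partial (UDP-Lite) checksum
coverage concrete, here is a minimal user-space sketch. It is not the code
from this patch: the demo_* names are hypothetical, the pseudo-header is left
out for brevity, and treating a requested coverage larger than the datagram
as full coverage is an assumption about the sender side. The rule that 0
means full coverage and that values below 8 are raised to 8 follows the
setsockopt handling in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones' complement Internet checksum over 'len' bytes. */
static uint16_t demo_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)			/* pad a trailing odd byte with zero */
		sum += (uint32_t)data[len - 1] << 8;
	while (sum >> 16)		/* fold carries back into 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Effective coverage for a datagram of 'len' bytes (header included).
 * 0 requests full coverage; 1..7 would not even cover the 8-byte header
 * and is raised to 8; anything beyond 'len' is treated as full coverage
 * (assumption).
 */
static size_t demo_cscov(size_t requested, size_t len)
{
	assert(len >= 8);		/* header is always present */
	if (requested == 0 || requested >= len)
		return len;
	return requested < 8 ? 8 : requested;
}

/* Checksum only the covered prefix of the datagram. */
static uint16_t demo_partial_csum(const uint8_t *pkt, size_t len,
				  size_t requested)
{
	return demo_csum(pkt, demo_cscov(requested, len));
}
```

With a coverage of 8, a bit error in the payload leaves the checksum
unchanged, which is the property UDP-Lite offers to error-tolerant
applications; with coverage 0 the same error is detected as in plain UDP.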
Regards,
Gerrit
--
include/linux/udp.h | 11 +
include/net/udp.h | 89 +++++++++
net/ipv4/udp.c | 485 ++++++++++++++++++++++++++++++++--------------------
net/ipv6/udp.c | 50 -----
4 files changed, 406 insertions(+), 229 deletions(-)
diff --git a/include/linux/udp.h b/include/linux/udp.h
index 90223f0..1b7cf10 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -50,12 +50,23 @@ struct udp_sock {
* when the socket is uncorked.
*/
__u16 len; /* total length of pending frames */
+ /*
+ * Fields specific to UDP-Lite.
+ */
+ __u16 pcslen;
+ __u16 pcrlen;
+/* indicator bits used by pcflag: */
+#define UDPLITE_BIT 0x1 /* set by udplite proto init function */
+#define UDPLITE_SEND_CC 0x2 /* set via udplite setsockopt */
+#define UDPLITE_RECV_CC 0x4 /* set via udplite setsockopt */
+ __u8 pcflag; /* marks socket as UDP-Lite if > 0 */
};
static inline struct udp_sock *udp_sk(const struct sock *sk)
{
return (struct udp_sock *)sk;
}
+#define IS_UDPLITE(__sk) (udp_sk(__sk)->pcflag)
#endif
diff --git a/include/net/udp.h b/include/net/udp.h
index db0c05f..b22abb3 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -26,9 +26,31 @@ #include <linux/list.h>
#include <net/inet_sock.h>
#include <net/sock.h>
#include <net/snmp.h>
+#include <net/ip.h>
+#include <linux/ipv6.h>
#include <linux/seq_file.h>
#define UDP_HTABLE_SIZE 128
+#include <net/udplite.h>
+
+/**
+ * struct udp_skb_cb - UDP(-Lite) private variables
+ *
+ * @header: private variables used by IPv4/IPv6
+ * @cscov: checksum coverage length (UDP-Lite only)
+ * @partial_cov: if set indicates partial csum coverage
+ */
+struct udp_skb_cb {
+ union {
+ struct inet_skb_parm h4;
+#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
+ struct inet6_skb_parm h6;
+#endif
+ } header;
+ __u16 cscov;
+ __u8 partial_cov;
+};
+#define UDP_SKB_CB(__skb) ((struct udp_skb_cb *)((__skb)->cb))
extern struct hlist_head udp_hash[UDP_HTABLE_SIZE];
extern rwlock_t udp_hash_lock;
@@ -47,6 +69,56 @@ extern struct proto udp_prot;
struct sk_buff;
+/*
+ * Generic checksumming routines for UDP(-Lite) v4 and v6
+ */
+static inline u16 __udp_lib_checksum_complete(struct sk_buff *skb)
+{
+ if (! UDP_SKB_CB(skb)->partial_cov)
+ return __skb_checksum_complete(skb);
+ return csum_fold(skb_checksum(skb, 0, UDP_SKB_CB(skb)->cscov,
+ skb->csum));
+}
+
+static __inline__ int udp_checksum_complete(struct sk_buff *skb)
+{
+ return skb->ip_summed != CHECKSUM_UNNECESSARY &&
+ __udp_lib_checksum_complete(skb);
+}
+
+/**
+ * udp_csum_outgoing - compute UDPv4/v6 checksum over fragments
+ * @sk: socket we are writing to
+ * @skb: sk_buff containing the filled-in UDP header
+ * (checksum field must be zeroed out)
+ */
+static inline u32 udp_csum_outgoing(struct sock *sk, struct sk_buff *skb)
+{
+ u32 csum = csum_partial(skb->h.raw, sizeof(struct udphdr), 0);
+
+ skb_queue_walk(&sk->sk_write_queue, skb) {
+ csum = csum_add(csum, skb->csum);
+ }
+ return csum;
+}
+
+/* hash routines shared between UDPv4/6 and UDP-Litev4/6 */
+static inline void udp_lib_hash(struct sock *sk)
+{
+ BUG();
+}
+
+static inline void udp_lib_unhash(struct sock *sk)
+{
+ write_lock_bh(&udp_hash_lock);
+ if (sk_del_node_init(sk)) {
+ inet_sk(sk)->num = 0;
+ sock_prot_dec_use(sk->sk_prot);
+ }
+ write_unlock_bh(&udp_hash_lock);
+}
+
+/* net/ipv4/udp.c */
extern int udp_get_port(struct sock *sk, unsigned short snum,
int (*saddr_cmp)(const struct sock *, const struct sock *));
extern void udp_err(struct sk_buff *, u32);
@@ -61,21 +133,32 @@ extern unsigned int udp_poll(struct file
poll_table *wait);
DECLARE_SNMP_STAT(struct udp_mib, udp_statistics);
-#define UDP_INC_STATS(field) SNMP_INC_STATS(udp_statistics, field)
-#define UDP_INC_STATS_BH(field) SNMP_INC_STATS_BH(udp_statistics, field)
-#define UDP_INC_STATS_USER(field) SNMP_INC_STATS_USER(udp_statistics, field)
+/*
+ * SNMP statistics for UDP and UDP-Lite
+ */
+#define UDP_INC_STATS_USER(field, is_udplite) do { \
+ if (is_udplite) SNMP_INC_STATS_USER(udplite_statistics, field); \
+ else SNMP_INC_STATS_USER(udp_statistics, field); } while(0)
+#define UDP_INC_STATS_BH(field, is_udplite) do { \
+ if (is_udplite) SNMP_INC_STATS_BH(udplite_statistics, field); \
+ else SNMP_INC_STATS_BH(udp_statistics, field); } while(0)
+#define UDP_DEC_STATS_BH(field, is_udplite) do { \
+ if (is_udplite) SNMP_DEC_STATS_BH(udplite_statistics, field); \
+ else SNMP_DEC_STATS_BH(udp_statistics, field); } while(0)
/* /proc */
struct udp_seq_afinfo {
struct module *owner;
char *name;
sa_family_t family;
+ struct hlist_head *hashtable;
int (*seq_show) (struct seq_file *m, void *v);
struct file_operations *seq_fops;
};
struct udp_iter_state {
sa_family_t family;
+ struct hlist_head *hashtable;
int bucket;
struct seq_operations seq_ops;
};
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 77e265d..9bbb2b2 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -92,10 +92,8 @@ #include <linux/errno.h>
#include <linux/timer.h>
#include <linux/mm.h>
#include <linux/inet.h>
-#include <linux/ipv6.h>
#include <linux/netdevice.h>
#include <net/snmp.h>
-#include <net/ip.h>
#include <net/tcp_states.h>
#include <net/protocol.h>
#include <linux/skbuff.h>
@@ -120,26 +118,29 @@ DEFINE_RWLOCK(udp_hash_lock);
static int udp_port_rover;
-static inline int udp_lport_inuse(u16 num)
+static inline int __udp_lib_lport_inuse(u16 num, struct hlist_head udptable[])
{
struct sock *sk;
struct hlist_node *node;
- sk_for_each(sk, node, &udp_hash[num & (UDP_HTABLE_SIZE - 1)])
+ sk_for_each(sk, node, &udptable[num & (UDP_HTABLE_SIZE - 1)])
if (inet_sk(sk)->num == num)
return 1;
return 0;
}
/**
- * udp_get_port - common port lookup for IPv4 and IPv6
+ * __udp_lib_get_port - UDP/-Lite port lookup for IPv4 and IPv6
*
* @sk: socket struct in question
* @snum: port number to look up
+ * @udptable: hash list table, must be of UDP_HTABLE_SIZE
+ * @port_rover: pointer to record of last unallocated port
* @saddr_comp: AF-dependent comparison of bound local IP addresses
*/
-int udp_get_port(struct sock *sk, unsigned short snum,
- int (*saddr_cmp)(const struct sock *sk1, const struct sock *sk2))
+static int __udp_lib_get_port(struct sock *sk, unsigned short snum,
+ struct hlist_head udptable[], int *port_rover,
+ int (*saddr_cmp)(const struct sock *, const struct sock *))
{
struct hlist_node *node;
struct hlist_head *head;
@@ -150,15 +151,15 @@ int udp_get_port(struct sock *sk, unsign
if (snum == 0) {
int best_size_so_far, best, result, i;
- if (udp_port_rover > sysctl_local_port_range[1] ||
- udp_port_rover < sysctl_local_port_range[0])
- udp_port_rover = sysctl_local_port_range[0];
+ if (*port_rover > sysctl_local_port_range[1] ||
+ *port_rover < sysctl_local_port_range[0])
+ *port_rover = sysctl_local_port_range[0];
best_size_so_far = 32767;
- best = result = udp_port_rover;
+ best = result = *port_rover;
for (i = 0; i < UDP_HTABLE_SIZE; i++, result++) {
int size;
- head = &udp_hash[result & (UDP_HTABLE_SIZE - 1)];
+ head = &udptable[result & (UDP_HTABLE_SIZE - 1)];
if (hlist_empty(head)) {
if (result > sysctl_local_port_range[1])
result = sysctl_local_port_range[0] +
@@ -179,15 +180,15 @@ int udp_get_port(struct sock *sk, unsign
result = sysctl_local_port_range[0]
+ ((result - sysctl_local_port_range[0]) &
(UDP_HTABLE_SIZE - 1));
- if (!udp_lport_inuse(result))
+ if (! __udp_lib_lport_inuse(result, udptable))
break;
}
if (i >= (1 << 16) / UDP_HTABLE_SIZE)
goto fail;
gotit:
- udp_port_rover = snum = result;
+ *port_rover = snum = result;
} else {
- head = &udp_hash[snum & (UDP_HTABLE_SIZE - 1)];
+ head = &udptable[snum & (UDP_HTABLE_SIZE - 1)];
sk_for_each(sk2, node, head)
if (inet_sk(sk2)->num == snum &&
@@ -200,7 +201,7 @@ gotit:
}
inet_sk(sk)->num = snum;
if (sk_unhashed(sk)) {
- head = &udp_hash[snum & (UDP_HTABLE_SIZE - 1)];
+ head = &udptable[snum & (UDP_HTABLE_SIZE - 1)];
sk_add_node(sk, head);
sock_prot_inc_use(sk->sk_prot);
}
@@ -210,6 +211,12 @@ fail:
return error;
}
+__inline__ int udp_get_port(struct sock *sk, unsigned short snum,
+ int (*scmp)(const struct sock *, const struct sock *))
+{
+ return __udp_lib_get_port(sk, snum, udp_hash, &udp_port_rover, scmp);
+}
+
static inline int ipv4_rcv_saddr_equal(const struct sock *sk1, const struct sock *sk2)
{
struct inet_sock *inet1 = inet_sk(sk1), *inet2 = inet_sk(sk2);
@@ -225,33 +232,20 @@ static inline int udp_v4_get_port(struct
}
-static void udp_v4_hash(struct sock *sk)
-{
- BUG();
-}
-
-static void udp_v4_unhash(struct sock *sk)
-{
- write_lock_bh(&udp_hash_lock);
- if (sk_del_node_init(sk)) {
- inet_sk(sk)->num = 0;
- sock_prot_dec_use(sk->sk_prot);
- }
- write_unlock_bh(&udp_hash_lock);
-}
-
/* UDP is nearly always wildcards out the wazoo, it makes no sense to try
* harder than this. -DaveM
*/
-static struct sock *udp_v4_lookup_longway(u32 saddr, u16 sport,
- u32 daddr, u16 dport, int dif)
+static struct sock *__udp4_lib_lookup(u32 saddr, u16 sport,
+ u32 daddr, u16 dport,
+ int dif, struct hlist_head udptable[])
{
struct sock *sk, *result = NULL;
struct hlist_node *node;
unsigned short hnum = ntohs(dport);
int badness = -1;
- sk_for_each(sk, node, &udp_hash[hnum & (UDP_HTABLE_SIZE - 1)]) {
+ read_lock(&udp_hash_lock);
+ sk_for_each(sk, node, &udptable[hnum & (UDP_HTABLE_SIZE - 1)]) {
struct inet_sock *inet = inet_sk(sk);
if (inet->num == hnum && !ipv6_only_sock(sk)) {
@@ -285,20 +279,17 @@ static struct sock *udp_v4_lookup_longwa
}
}
}
+ if (result)
+ sock_hold(result);
+ read_unlock(&udp_hash_lock);
+
return result;
}
static __inline__ struct sock *udp_v4_lookup(u32 saddr, u16 sport,
u32 daddr, u16 dport, int dif)
{
- struct sock *sk;
-
- read_lock(&udp_hash_lock);
- sk = udp_v4_lookup_longway(saddr, sport, daddr, dport, dif);
- if (sk)
- sock_hold(sk);
- read_unlock(&udp_hash_lock);
- return sk;
+ return __udp4_lib_lookup(saddr, sport, daddr, dport, dif, udp_hash);
}
static inline struct sock *udp_v4_mcast_next(struct sock *sk,
@@ -340,7 +331,8 @@ found:
* to find the appropriate port.
*/
-void udp_err(struct sk_buff *skb, u32 info)
+static void __udp4_lib_err(struct sk_buff *skb, u32 info,
+ struct hlist_head udptable[] )
{
struct inet_sock *inet;
struct iphdr *iph = (struct iphdr*)skb->data;
@@ -351,7 +343,8 @@ void udp_err(struct sk_buff *skb, u32 in
int harderr;
int err;
- sk = udp_v4_lookup(iph->daddr, uh->dest, iph->saddr, uh->source, skb->dev->ifindex);
+ sk = __udp4_lib_lookup(iph->daddr, uh->dest, iph->saddr, uh->source,
+ skb->dev->ifindex, udptable );
if (sk == NULL) {
ICMP_INC_STATS_BH(ICMP_MIB_INERRORS);
return; /* No socket for error */
@@ -405,6 +398,11 @@ out:
sock_put(sk);
}
+__inline__ void udp_err(struct sk_buff *skb, u32 info)
+{
+ return __udp4_lib_err(skb, info, udp_hash);
+}
+
/*
* Throw away all pending data and cancel the corking. Socket is locked.
*/
@@ -419,6 +417,45 @@ static void udp_flush_pending_frames(str
}
}
+/**
+ * udp4_hwcsum_outgoing - handle outgoing HW checksumming
+ * @sk: socket we are sending on
+ * @skb: sk_buff containing the filled-in UDP header
+ * (checksum field must be zeroed out)
+ */
+static void udp4_hwcsum_outgoing(struct sock *sk, struct sk_buff *skb,
+ u32 src, u32 dst, int len )
+{
+ unsigned int csum = 0, offset;
+ struct udphdr *uh = skb->h.uh;
+
+ if (skb_queue_len(&sk->sk_write_queue) == 1) {
+ /*
+ * Only one fragment on the socket.
+ */
+ skb->csum = offsetof(struct udphdr, check);
+ uh->check = ~csum_tcpudp_magic(src, dst, len, IPPROTO_UDP, 0);
+ } else {
+ /*
+ * HW-checksum won't work as there are two or more
+ * fragments on the socket so that all csums of sk_buffs
+ * should be together
+ */
+ offset = skb->h.raw - skb->data;
+ skb->csum = skb_checksum(skb, offset, skb->len - offset, 0);
+
+ skb->ip_summed = CHECKSUM_NONE;
+
+ skb_queue_walk(&sk->sk_write_queue, skb) {
+ csum = csum_add(csum, skb->csum);
+ }
+
+ uh->check = csum_tcpudp_magic(src, dst, len, IPPROTO_UDP, csum);
+ if (uh->check == 0)
+ uh->check = -1;
+ }
+}
+
/*
* Push out all pending data as one UDP datagram. Socket is locked.
*/
@@ -429,6 +466,7 @@ static int udp_push_pending_frames(struc
struct sk_buff *skb;
struct udphdr *uh;
int err = 0;
+ u32 csum = 0;
/* Grab the skbuff where UDP header space exists. */
if ((skb = skb_peek(&sk->sk_write_queue)) == NULL)
@@ -443,52 +481,31 @@ static int udp_push_pending_frames(struc
uh->len = htons(up->len);
uh->check = 0;
- if (sk->sk_no_check == UDP_CSUM_NOXMIT) {
+ if (up->pcflag) { /* UDP-Lite */
+ int cscov = udplite_sender_cscov(up, uh);
+
+ csum = udplite_csum_outgoing(sk, cscov);
+ skb->ip_summed = CHECKSUM_NONE;
+
+ } else if (sk->sk_no_check == UDP_CSUM_NOXMIT) { /* UDP csum disabled */
+
skb->ip_summed = CHECKSUM_NONE;
goto send;
- }
- if (skb_queue_len(&sk->sk_write_queue) == 1) {
- /*
- * Only one fragment on the socket.
- */
- if (skb->ip_summed == CHECKSUM_PARTIAL) {
- skb->csum = offsetof(struct udphdr, check);
- uh->check = ~csum_tcpudp_magic(fl->fl4_src, fl->fl4_dst,
- up->len, IPPROTO_UDP, 0);
- } else {
- skb->csum = csum_partial((char *)uh,
- sizeof(struct udphdr), skb->csum);
- uh->check = csum_tcpudp_magic(fl->fl4_src, fl->fl4_dst,
- up->len, IPPROTO_UDP, skb->csum);
- if (uh->check == 0)
- uh->check = -1;
- }
- } else {
- unsigned int csum = 0;
- /*
- * HW-checksum won't work as there are two or more
- * fragments on the socket so that all csums of sk_buffs
- * should be together.
- */
- if (skb->ip_summed == CHECKSUM_PARTIAL) {
- int offset = (unsigned char *)uh - skb->data;
- skb->csum = skb_checksum(skb, offset, skb->len - offset, 0);
+ } else if (skb->ip_summed == CHECKSUM_PARTIAL) { /* UDP hardware csum */
- skb->ip_summed = CHECKSUM_NONE;
- } else {
- skb->csum = csum_partial((char *)uh,
- sizeof(struct udphdr), skb->csum);
- }
+ udp4_hwcsum_outgoing(sk, skb, fl->fl4_src,fl->fl4_dst, up->len);
+ goto send;
+
+ } else /* `normal' UDP */
+ csum = udp_csum_outgoing(sk, skb);
+
+ /* add protocol-dependent pseudo-header */
+ uh->check = csum_tcpudp_magic(fl->fl4_src, fl->fl4_dst, up->len,
+ sk->sk_protocol, csum );
+ if (uh->check == 0)
+ uh->check = -1;
- skb_queue_walk(&sk->sk_write_queue, skb) {
- csum = csum_add(csum, skb->csum);
- }
- uh->check = csum_tcpudp_magic(fl->fl4_src, fl->fl4_dst,
- up->len, IPPROTO_UDP, csum);
- if (uh->check == 0)
- uh->check = -1;
- }
send:
err = ip_push_pending_frames(sk);
out:
@@ -497,12 +514,6 @@ out:
return err;
}
-
-static unsigned short udp_check(struct udphdr *uh, int len, unsigned long saddr, unsigned long daddr, unsigned long base)
-{
- return(csum_tcpudp_magic(saddr, daddr, len, IPPROTO_UDP, base));
-}
-
int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
size_t len)
{
@@ -516,8 +527,9 @@ int udp_sendmsg(struct kiocb *iocb, stru
u32 daddr, faddr, saddr;
u16 dport;
u8 tos;
- int err;
+ int err, is_udplite = up->pcflag;
int corkreq = up->corkflag || msg->msg_flags&MSG_MORE;
+ int (*getfrag)(void *, char *, int, int, int, struct sk_buff *);
if (len > 0xFFFF)
return -EMSGSIZE;
@@ -622,7 +634,7 @@ int udp_sendmsg(struct kiocb *iocb, stru
{ .daddr = faddr,
.saddr = saddr,
.tos = tos } },
- .proto = IPPROTO_UDP,
+ .proto = sk->sk_protocol,
.uli_u = { .ports =
{ .sport = inet->sport,
.dport = dport } } };
@@ -668,8 +680,9 @@ back_from_confirm:
do_append_data:
up->len += ulen;
- err = ip_append_data(sk, ip_generic_getfrag, msg->msg_iov, ulen,
- sizeof(struct udphdr), &ipc, rt,
+ getfrag = is_udplite ? udplite_getfrag : ip_generic_getfrag;
+ err = ip_append_data(sk, getfrag, msg->msg_iov, ulen,
+ sizeof(struct udphdr), &ipc, rt,
corkreq ? msg->msg_flags|MSG_MORE : msg->msg_flags);
if (err)
udp_flush_pending_frames(sk);
@@ -682,7 +695,7 @@ out:
if (free)
kfree(ipc.opt);
if (!err) {
- UDP_INC_STATS_USER(UDP_MIB_OUTDATAGRAMS);
+ UDP_INC_STATS_USER(UDP_MIB_OUTDATAGRAMS, is_udplite);
return len;
}
/*
@@ -693,7 +706,7 @@ out:
* seems like overkill.
*/
if (err == -ENOBUFS || test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
- UDP_INC_STATS_USER(UDP_MIB_SNDBUFERRORS);
+ UDP_INC_STATS_USER(UDP_MIB_SNDBUFERRORS, is_udplite);
}
return err;
@@ -793,17 +806,6 @@ int udp_ioctl(struct sock *sk, int cmd,
return(0);
}
-static __inline__ int __udp_checksum_complete(struct sk_buff *skb)
-{
- return __skb_checksum_complete(skb);
-}
-
-static __inline__ int udp_checksum_complete(struct sk_buff *skb)
-{
- return skb->ip_summed != CHECKSUM_UNNECESSARY &&
- __udp_checksum_complete(skb);
-}
-
/*
* This should be easy, if there is something there we
* return it, otherwise we block.
@@ -815,7 +817,7 @@ static int udp_recvmsg(struct kiocb *ioc
struct inet_sock *inet = inet_sk(sk);
struct sockaddr_in *sin = (struct sockaddr_in *)msg->msg_name;
struct sk_buff *skb;
- int copied, err;
+ int copied, err, copy_only, is_udplite = IS_UDPLITE(sk);
/*
* Check any passed addresses
@@ -837,15 +839,25 @@ try_again:
msg->msg_flags |= MSG_TRUNC;
}
- if (skb->ip_summed==CHECKSUM_UNNECESSARY) {
- err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
- copied);
- } else if (msg->msg_flags&MSG_TRUNC) {
- if (__udp_checksum_complete(skb))
+ /*
+ * Decide whether to checksum and/or copy data.
+ *
+ * UDP: checksum may have been computed in HW,
+ * (re-)compute it if message is truncated.
+ * UDP-Lite: always needs to checksum, no HW support.
+ */
+ copy_only = (skb->ip_summed==CHECKSUM_UNNECESSARY);
+
+ if (is_udplite || (!copy_only && msg->msg_flags&MSG_TRUNC)) {
+ if (__udp_lib_checksum_complete(skb))
goto csum_copy_err;
- err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
- copied);
- } else {
+ copy_only = 1;
+ }
+
+ if (copy_only)
+ err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr),
+ msg->msg_iov, copied );
+ else {
err = skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov);
if (err == -EINVAL)
@@ -878,7 +890,8 @@ out:
return err;
csum_copy_err:
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite);
+ UDP_DEC_STATS_BH(UDP_MIB_INDATAGRAMS, is_udplite);
skb_kill_datagram(sk, skb, flags);
@@ -1019,10 +1032,8 @@ static int udp_queue_rcv_skb(struct sock
/*
* Charge it to the socket, dropping if the queue is full.
*/
- if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb)) {
- kfree_skb(skb);
- return -1;
- }
+ if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb))
+ goto drop;
nf_reset(skb);
if (up->encap_type) {
@@ -1046,31 +1057,77 @@ static int udp_queue_rcv_skb(struct sock
if (ret < 0) {
/* process the ESP packet */
ret = xfrm4_rcv_encap(skb, up->encap_type);
- UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS);
+ UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS, up->pcflag);
return -ret;
}
/* FALLTHROUGH -- it's a UDP Packet */
}
- if (sk->sk_filter && skb->ip_summed != CHECKSUM_UNNECESSARY) {
- if (__udp_checksum_complete(skb)) {
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
- kfree_skb(skb);
- return -1;
+ /*
+ * UDP-Lite specific tests, ignored on UDP sockets
+ */
+ if ((up->pcflag & UDPLITE_RECV_CC) && UDP_SKB_CB(skb)->partial_cov) {
+
+ /*
+ * MIB statistics other than incrementing the error count are
+ * disabled for the following two types of errors: these depend
+ * on the application settings, not on the functioning of the
+ * protocol stack as such.
+ *
+ *
+ * RFC 3828 here recommends (sec 3.3): "There should also be a
+ * way ... to ... at least let the receiving application block
+ * delivery of packets with coverage values less than a value
+ * provided by the application."
+ */
+ if (up->pcrlen == 0) { /* full coverage was set */
+ LIMIT_NETDEBUG(KERN_WARNING "UDPLITE: partial coverage "
+ "%d while full coverage %d requested\n",
+ UDP_SKB_CB(skb)->cscov, skb->len);
+ goto drop;
}
+ /* The next case involves violating the min. coverage requested
+ * by the receiver. This is subtle: if receiver wants x and x is
+ * greater than the buffersize/MTU then receiver will complain
+ * that it wants x while sender emits packets of smaller size y.
+ * Therefore the above ...()->partial_cov statement is essential.
+ */
+ if (UDP_SKB_CB(skb)->cscov < up->pcrlen) {
+ LIMIT_NETDEBUG(KERN_WARNING
+ "UDPLITE: coverage %d too small, need min %d\n",
+ UDP_SKB_CB(skb)->cscov, up->pcrlen);
+ goto drop;
+ }
+ }
+
+ if (sk->sk_filter && skb->ip_summed != CHECKSUM_UNNECESSARY) {
+ if (__udp_lib_checksum_complete(skb))
+ goto drop;
skb->ip_summed = CHECKSUM_UNNECESSARY;
}
if ((rc = sock_queue_rcv_skb(sk,skb)) < 0) {
/* Note that an ENOMEM error is charged twice */
if (rc == -ENOMEM)
- UDP_INC_STATS_BH(UDP_MIB_RCVBUFERRORS);
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
- kfree_skb(skb);
- return -1;
+ UDP_INC_STATS_BH(UDP_MIB_RCVBUFERRORS, up->pcflag);
+ goto drop;
}
- UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS);
+
+ /*
+ * XXX Incrementing this counter when the datagram is later taken
+ * off the queue due to receive failure is problematic, cf.
+ * http://bugzilla.kernel.org/show_bug.cgi?id=6660
+ * This module counts correctly by decrementing InDatagrams
+ * whenever the datagram is popped off a queue without being
+ * actually delivered; see also udp_recvmsg() and udp_poll().
+ */
+ UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS, up->pcflag);
return 0;
+
+drop:
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS, up->pcflag);
+ kfree_skb(skb);
+ return -1;
}
/*
@@ -1079,14 +1136,14 @@ static int udp_queue_rcv_skb(struct sock
* Note: called only from the BH handler context,
* so we don't need to lock the hashes.
*/
-static int udp_v4_mcast_deliver(struct sk_buff *skb, struct udphdr *uh,
- u32 saddr, u32 daddr)
+static int __udp4_lib_mcast_deliver(struct sk_buff *skb, struct udphdr *uh,
+ u32 saddr, u32 daddr, struct hlist_head udptable[])
{
struct sock *sk;
int dif;
read_lock(&udp_hash_lock);
- sk = sk_head(&udp_hash[ntohs(uh->dest) & (UDP_HTABLE_SIZE - 1)]);
+ sk = sk_head(&udptable[ntohs(uh->dest) & (UDP_HTABLE_SIZE - 1)]);
dif = skb->dev->ifindex;
sk = udp_v4_mcast_next(sk, uh->dest, daddr, uh->source, saddr, dif);
if (sk) {
@@ -1115,6 +1172,12 @@ static int udp_v4_mcast_deliver(struct s
return 0;
}
+static __inline__ int udp_v4_mcast_deliver(struct sk_buff *skb,
+ struct udphdr *uh, u32 saddr, u32 daddr)
+{
+ return __udp4_lib_mcast_deliver(skb, uh, saddr, daddr, udp_hash);
+}
+
/* Initialize UDP checksum. If exited with zero value (success),
* CHECKSUM_UNNECESSARY means, that no more checks are required.
* Otherwise, csum completion requires chacksumming packet body,
@@ -1126,7 +1189,7 @@ static void udp_checksum_init(struct sk_
if (uh->check == 0) {
skb->ip_summed = CHECKSUM_UNNECESSARY;
} else if (skb->ip_summed == CHECKSUM_COMPLETE) {
- if (!udp_check(uh, ulen, saddr, daddr, skb->csum))
+ if (!csum_tcpudp_magic(saddr,daddr,ulen, IPPROTO_UDP, skb->csum))
skb->ip_summed = CHECKSUM_UNNECESSARY;
}
if (skb->ip_summed != CHECKSUM_UNNECESSARY)
@@ -1134,16 +1197,19 @@ static void udp_checksum_init(struct sk_
/* Probably, we should checksum udp header (it should be in cache
* in any case) and data in tiny packets (< rx copybreak).
*/
+
+ /* UDP = UDP-Lite with a non-partial checksum coverage */
+ UDP_SKB_CB(skb)->partial_cov = 0;
}
/*
- * All we need to do is get the socket, and then do a checksum.
+ * All we need to do is get the socket, and then do a checksum.
*/
-
-int udp_rcv(struct sk_buff *skb)
+static int __udp4_lib_rcv(struct sk_buff *skb,
+ struct hlist_head udptable[], int is_udplite)
{
struct sock *sk;
- struct udphdr *uh;
+ struct udphdr *uh = skb->h.uh;
unsigned short ulen;
struct rtable *rt = (struct rtable*)skb->dst;
u32 saddr = skb->nh.iph->saddr;
@@ -1151,34 +1217,39 @@ int udp_rcv(struct sk_buff *skb)
int len = skb->len;
/*
- * Validate the packet and the UDP length.
+ * Validate the packet.
*/
if (!pskb_may_pull(skb, sizeof(struct udphdr)))
- goto no_header;
-
- uh = skb->h.uh;
+ goto drop; /* No space for header. */
ulen = ntohs(uh->len);
-
- if (ulen > len || ulen < sizeof(*uh))
+ if (ulen > len)
goto short_packet;
- if (pskb_trim_rcsum(skb, ulen))
- goto short_packet;
+ if(! is_udplite ) { /* UDP validates ulen. */
+
+ if ( ulen < sizeof(*uh) || pskb_trim_rcsum(skb, ulen) )
+ goto short_packet;
+ /* note the difference: UDP uses ulen, UDP-Lite uses len */
+ udp_checksum_init(skb, uh, ulen, saddr, daddr);
- udp_checksum_init(skb, uh, ulen, saddr, daddr);
+ } else { /* UDP-Lite validates cscov. */
+ if (! udplite_checksum_init(skb, uh, len, saddr, daddr) )
+ goto csum_error;
+ }
if(rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
- return udp_v4_mcast_deliver(skb, uh, saddr, daddr);
+ return __udp4_lib_mcast_deliver(skb, uh, saddr, daddr, udptable);
- sk = udp_v4_lookup(saddr, uh->source, daddr, uh->dest, skb->dev->ifindex);
+ sk = __udp4_lib_lookup(saddr, uh->source, daddr, uh->dest,
+ skb->dev->ifindex, udptable );
if (sk != NULL) {
int ret = udp_queue_rcv_skb(sk, skb);
sock_put(sk);
/* a return value > 0 means to resubmit the input, but
- * it it wants the return to be -protocol, or 0
+ * it wants the return to be -protocol, or 0
*/
if (ret > 0)
return -ret;
@@ -1193,7 +1264,7 @@ int udp_rcv(struct sk_buff *skb)
if (udp_checksum_complete(skb))
goto csum_error;
- UDP_INC_STATS_BH(UDP_MIB_NOPORTS);
+ UDP_INC_STATS_BH(UDP_MIB_NOPORTS, is_udplite);
icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
/*
@@ -1204,35 +1275,39 @@ int udp_rcv(struct sk_buff *skb)
return(0);
short_packet:
- LIMIT_NETDEBUG(KERN_DEBUG "UDP: short packet: From %u.%u.%u.%u:%u %d/%d to %u.%u.%u.%u:%u\n",
+ LIMIT_NETDEBUG(KERN_DEBUG "UDP%s: short packet: From %u.%u.%u.%u:%u %d/%d to %u.%u.%u.%u:%u\n",
+ is_udplite? "-Lite" : "",
NIPQUAD(saddr),
ntohs(uh->source),
ulen,
len,
NIPQUAD(daddr),
ntohs(uh->dest));
-no_header:
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
- kfree_skb(skb);
- return(0);
+ goto drop;
csum_error:
- /*
- * RFC1122: OK. Discards the bad packet silently (as far as
- * the network is concerned, anyway) as per 4.1.3.4 (MUST).
+ /*
+ * RFC1122: OK. Discards the bad packet silently (as far as
+ * the network is concerned, anyway) as per 4.1.3.4 (MUST).
*/
- LIMIT_NETDEBUG(KERN_DEBUG "UDP: bad checksum. From %d.%d.%d.%d:%d to %d.%d.%d.%d:%d ulen %d\n",
+ LIMIT_NETDEBUG(KERN_DEBUG "UDP%s: bad checksum. From %d.%d.%d.%d:%d to %d.%d.%d.%d:%d ulen %d\n",
+ is_udplite? "-Lite" : "",
NIPQUAD(saddr),
ntohs(uh->source),
NIPQUAD(daddr),
ntohs(uh->dest),
ulen);
drop:
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite);
kfree_skb(skb);
return(0);
}
+__inline__ int udp_rcv(struct sk_buff *skb)
+{
+ return __udp4_lib_rcv(skb, udp_hash, 0);
+}
+
static int udp_destroy_sock(struct sock *sk)
{
lock_sock(sk);
@@ -1282,6 +1357,32 @@ static int do_udp_setsockopt(struct sock
}
break;
+ /*
+ * UDP-Lite's partial checksum coverage (RFC 3828).
+ */
+ /* The sender sets actual checksum coverage length via this option.
+ * The case coverage > packet length is handled by send module. */
+ case UDPLITE_SEND_CSCOV:
+ if (!up->pcflag) /* Disable the option on UDP sockets */
+ return -ENOPROTOOPT;
+ if (val != 0 && val < 8) /* Illegal coverage: use default (8) */
+ val = 8;
+ up->pcslen = val;
+ up->pcflag |= UDPLITE_SEND_CC;
+ break;
+
+ /* The receiver specifies a minimum checksum coverage value. To make
+ * sense, this should be set to at least 8 (as done below). If zero is
+ * used, this again means full checksum coverage. */
+ case UDPLITE_RECV_CSCOV:
+ if (!up->pcflag) /* Disable the option on UDP sockets */
+ return -ENOPROTOOPT;
+ if (val != 0 && val < 8) /* Avoid silly minimal values. */
+ val = 8;
+ up->pcrlen = val;
+ up->pcflag |= UDPLITE_RECV_CC;
+ break;
+
default:
err = -ENOPROTOOPT;
break;
@@ -1293,18 +1394,18 @@ static int do_udp_setsockopt(struct sock
static int udp_setsockopt(struct sock *sk, int level, int optname,
char __user *optval, int optlen)
{
- if (level != SOL_UDP)
- return ip_setsockopt(sk, level, optname, optval, optlen);
- return do_udp_setsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udp_setsockopt(sk, level, optname, optval, optlen);
+ return ip_setsockopt(sk, level, optname, optval, optlen);
}
#ifdef CONFIG_COMPAT
static int compat_udp_setsockopt(struct sock *sk, int level, int optname,
char __user *optval, int optlen)
{
- if (level != SOL_UDP)
- return compat_ip_setsockopt(sk, level, optname, optval, optlen);
- return do_udp_setsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udp_setsockopt(sk, level, optname, optval, optlen);
+ return compat_ip_setsockopt(sk, level, optname, optval, optlen);
}
#endif
@@ -1331,6 +1432,15 @@ static int do_udp_getsockopt(struct sock
val = up->encap_type;
break;
+ /* the following two always return 0 on UDP sockets */
+ case UDPLITE_SEND_CSCOV:
+ val = up->pcslen;
+ break;
+
+ case UDPLITE_RECV_CSCOV:
+ val = up->pcrlen;
+ break;
+
default:
return -ENOPROTOOPT;
};
@@ -1345,18 +1455,18 @@ static int do_udp_getsockopt(struct sock
static int udp_getsockopt(struct sock *sk, int level, int optname,
char __user *optval, int __user *optlen)
{
- if (level != SOL_UDP)
- return ip_getsockopt(sk, level, optname, optval, optlen);
- return do_udp_getsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udp_getsockopt(sk, level, optname, optval, optlen);
+ return ip_getsockopt(sk, level, optname, optval, optlen);
}
#ifdef CONFIG_COMPAT
static int compat_udp_getsockopt(struct sock *sk, int level, int optname,
char __user *optval, int __user *optlen)
{
- if (level != SOL_UDP)
- return compat_ip_getsockopt(sk, level, optname, optval, optlen);
- return do_udp_getsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udp_getsockopt(sk, level, optname, optval, optlen);
+ return compat_ip_getsockopt(sk, level, optname, optval, optlen);
}
#endif
/**
@@ -1376,7 +1486,8 @@ unsigned int udp_poll(struct file *file,
{
unsigned int mask = datagram_poll(file, sock, wait);
struct sock *sk = sock->sk;
-
+ int is_lite = IS_UDPLITE(sk);
+
/* Check for false positives due to checksum errors */
if ( (mask & POLLRDNORM) &&
!(file->f_flags & O_NONBLOCK) &&
@@ -1387,7 +1498,11 @@ unsigned int udp_poll(struct file *file,
spin_lock_bh(&rcvq->lock);
while ((skb = skb_peek(rcvq)) != NULL) {
if (udp_checksum_complete(skb)) {
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
+ /* The datagram has already been counted as
+ * InDatagram when it was enqueued earlier.
+ * Correct the count of datagrams actually received. */
+ UDP_DEC_STATS_BH(UDP_MIB_INDATAGRAMS, is_lite);
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_lite);
__skb_unlink(skb, rcvq);
kfree_skb(skb);
} else {
@@ -1420,8 +1535,8 @@ struct proto udp_prot = {
.recvmsg = udp_recvmsg,
.sendpage = udp_sendpage,
.backlog_rcv = udp_queue_rcv_skb,
- .hash = udp_v4_hash,
- .unhash = udp_v4_unhash,
+ .hash = udp_lib_hash,
+ .unhash = udp_lib_unhash,
.get_port = udp_v4_get_port,
.obj_size = sizeof(struct udp_sock),
#ifdef CONFIG_COMPAT
@@ -1440,7 +1555,7 @@ static struct sock *udp_get_first(struct
for (state->bucket = 0; state->bucket < UDP_HTABLE_SIZE; ++state->bucket) {
struct hlist_node *node;
- sk_for_each(sk, node, &udp_hash[state->bucket]) {
+ sk_for_each(sk, node, state->hashtable + state->bucket) {
if (sk->sk_family == state->family)
goto found;
}
@@ -1461,7 +1576,7 @@ try_again:
} while (sk && sk->sk_family != state->family);
if (!sk && ++state->bucket < UDP_HTABLE_SIZE) {
- sk = sk_head(&udp_hash[state->bucket]);
+ sk = sk_head(state->hashtable + state->bucket);
goto try_again;
}
return sk;
@@ -1511,6 +1626,7 @@ static int udp_seq_open(struct inode *in
if (!s)
goto out;
s->family = afinfo->family;
+ s->hashtable = afinfo->hashtable;
s->seq_ops.start = udp_seq_start;
s->seq_ops.next = udp_seq_next;
s->seq_ops.show = afinfo->seq_show;
@@ -1577,7 +1693,7 @@ static void udp4_format_sock(struct sock
atomic_read(&sp->sk_refcnt), sp);
}
-static int udp4_seq_show(struct seq_file *seq, void *v)
+int udp4_seq_show(struct seq_file *seq, void *v)
{
if (v == SEQ_START_TOKEN)
seq_printf(seq, "%-127s\n",
@@ -1600,6 +1716,7 @@ static struct udp_seq_afinfo udp4_seq_af
.owner = THIS_MODULE,
.name = "udp",
.family = AF_INET,
+ .hashtable = udp_hash,
.seq_show = udp4_seq_show,
.seq_fops = &udp4_seq_fops,
};
@@ -1628,3 +1745,5 @@ #ifdef CONFIG_PROC_FS
EXPORT_SYMBOL(udp_proc_register);
EXPORT_SYMBOL(udp_proc_unregister);
#endif
+/* the extensions for UDP-Lite (RFC 3828) */
+#include "udplite.c"
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 9662561..cf6cd7a 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -66,21 +66,6 @@ static inline int udp_v6_get_port(struct
return udp_get_port(sk, snum, ipv6_rcv_saddr_equal);
}
-static void udp_v6_hash(struct sock *sk)
-{
- BUG();
-}
-
-static void udp_v6_unhash(struct sock *sk)
-{
- write_lock_bh(&udp_hash_lock);
- if (sk_del_node_init(sk)) {
- inet_sk(sk)->num = 0;
- sock_prot_dec_use(sk->sk_prot);
- }
- write_unlock_bh(&udp_hash_lock);
-}
-
static struct sock *udp_v6_lookup(struct in6_addr *saddr, u16 sport,
struct in6_addr *daddr, u16 dport, int dif)
{
@@ -485,6 +470,7 @@ static int udp_v6_push_pending_frames(st
struct inet_sock *inet = inet_sk(sk);
struct flowi *fl = &inet->cork.fl;
int err = 0;
+ u32 csum;
/* Grab the skbuff where UDP header space exists. */
if ((skb = skb_peek(&sk->sk_write_queue)) == NULL)
@@ -499,35 +485,12 @@ static int udp_v6_push_pending_frames(st
uh->len = htons(up->len);
uh->check = 0;
- if (sk->sk_no_check == UDP_CSUM_NOXMIT) {
- skb->ip_summed = CHECKSUM_NONE;
- goto send;
- }
-
- if (skb_queue_len(&sk->sk_write_queue) == 1) {
- skb->csum = csum_partial((char *)uh,
- sizeof(struct udphdr), skb->csum);
- uh->check = csum_ipv6_magic(&fl->fl6_src,
- &fl->fl6_dst,
- up->len, fl->proto, skb->csum);
- } else {
- u32 tmp_csum = 0;
-
- skb_queue_walk(&sk->sk_write_queue, skb) {
- tmp_csum = csum_add(tmp_csum, skb->csum);
- }
- tmp_csum = csum_partial((char *)uh,
- sizeof(struct udphdr), tmp_csum);
- tmp_csum = csum_ipv6_magic(&fl->fl6_src,
- &fl->fl6_dst,
- up->len, fl->proto, tmp_csum);
- uh->check = tmp_csum;
-
- }
+ csum = udp_csum_outgoing(sk, skb);
+ uh->check = csum_ipv6_magic(&fl->fl6_src, &fl->fl6_dst,
+ up->len, fl->proto, csum );
if (uh->check == 0)
uh->check = -1;
-send:
err = ip6_push_pending_frames(sk);
out:
up->len = 0;
@@ -1001,6 +964,7 @@ static struct udp_seq_afinfo udp6_seq_af
.owner = THIS_MODULE,
.name = "udp6",
.family = AF_INET6,
+ .hashtable = udp_hash,
.seq_show = udp6_seq_show,
.seq_fops = &udp6_seq_fops,
};
@@ -1030,8 +994,8 @@ struct proto udpv6_prot = {
.sendmsg = udpv6_sendmsg,
.recvmsg = udpv6_recvmsg,
.backlog_rcv = udpv6_queue_rcv_skb,
- .hash = udp_v6_hash,
- .unhash = udp_v6_unhash,
+ .hash = udp_lib_hash,
+ .unhash = udp_lib_unhash,
.get_port = udp_v6_get_port,
.obj_size = sizeof(struct udp6_sock),
#ifdef CONFIG_COMPAT
* [PATCHv3 2/4][RFC] net/ipv4: self-contained UDP-Lite module
2006-07-28 5:30 ` David Miller
2006-07-28 8:19 ` Gerrit Renker
2006-09-19 7:25 ` [PATCHv3 1/4][RFC] net/ipv4: consolidated UDP / UDP-Lite code Gerrit Renker
@ 2006-09-19 7:25 ` Gerrit Renker
2006-09-19 7:25 ` [PATCHv3 3/4][RFC] net: basic xfrm/netfilter support for UDP-Lite Gerrit Renker
2006-09-19 7:25 ` [PATCHv3 4/4][RFC] net: misc. files to support UDP-Lite Gerrit Renker
4 siblings, 0 replies; 19+ messages in thread
From: Gerrit Renker @ 2006-09-19 7:25 UTC (permalink / raw)
To: David Miller; +Cc: netdev
The self-contained UDP-Litev4 module; logically completely separate from ipv4/udp.c.
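
For readers who want to see the receive-side coverage rule in isolation, the
following user-space sketch mirrors the checks made in udplite_checksum_init()
in the diff below. It is not part of the patch: the names classify_cscov and
enum cscov_result are hypothetical and chosen for illustration only.

```c
#include <stdint.h>

/* Hypothetical result codes, for illustration only. */
enum cscov_result {
	CSCOV_INVALID = 0,	/* coverage violates RFC 3828: drop the packet */
	CSCOV_FULL    = 1,	/* checksum covers the whole datagram */
	CSCOV_PARTIAL = 2,	/* checksum covers only the first bytes */
};

/*
 * wire_cscov: coverage field from the UDP-Lite header (host byte order)
 * len:        length of the datagram (header + payload)
 * effective:  set to the number of bytes the checksum must cover
 */
enum cscov_result classify_cscov(uint16_t wire_cscov, uint16_t len,
				 uint16_t *effective)
{
	uint16_t cscov = wire_cscov;

	if (cscov == 0)				/* 0 means full coverage */
		cscov = len;
	else if (cscov < 8 || cscov > len)	/* header must always be covered */
		return CSCOV_INVALID;

	*effective = cscov;
	return (cscov < len) ? CSCOV_PARTIAL : CSCOV_FULL;
}
```

A coverage of 20 on a 100-byte datagram is therefore partial, while 7 or 101
would cause the packet to be discarded.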
--
include/net/udplite.h | 86 +++++++++++++++++++++++
net/ipv4/udplite.c | 186 ++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 272 insertions(+)
diff --git a/include/net/udplite.h b/include/net/udplite.h
new file mode 100644
index 0000000..90d7aec
--- /dev/null
+++ b/include/net/udplite.h
@@ -0,0 +1,86 @@
+/*
+ * Definitions for the UDP-Lite (RFC 3828) code.
+ */
+#ifndef _UDPLITE_H
+#define _UDPLITE_H
+
+/* UDP-Lite socket options */
+#define UDPLITE_SEND_CSCOV 10 /* sender partial coverage (as sent) */
+#define UDPLITE_RECV_CSCOV 11 /* receiver partial coverage (threshold ) */
+
+extern struct proto udplite_prot;
+extern struct hlist_head udplite_hash[UDP_HTABLE_SIZE];
+
+/* UDP-Lite does not have a standardized MIB yet, so we inherit from UDP */
+DECLARE_SNMP_STAT(struct udp_mib, udplite_statistics);
+
+/*
+ * Checksum computation is all in software, hence simpler getfrag.
+ */
+static __inline__ int udplite_getfrag(void *from, char *to, int offset,
+ int len, int odd, struct sk_buff *skb)
+{
+ return memcpy_fromiovecend(to, (struct iovec *) from, offset, len);
+}
+
+/*
+ * Functions used by UDP-Litev4 and UDP-Litev6
+ */
+/* calculate checksum coverage set for outgoing packets */
+static inline int udplite_sender_cscov(struct udp_sock *up, struct udphdr *uh)
+{
+ int cscov = up->len;
+
+ /*
+ * Sender has set `partial coverage' option on UDP-Lite socket
+ */
+ if (up->pcflag & UDPLITE_SEND_CC) {
+ if (up->pcslen < up->len) {
+ /* up->pcslen == 0 means that full coverage is required,
+ * partial coverage only if 0 < up->pcslen < up->len */
+ if (0 < up->pcslen) {
+ cscov = up->pcslen;
+ }
+ uh->len = htons(up->pcslen);
+ }
+ /*
+ * NOTE: Causes for the error case `up->pcslen > up->len':
+ * (i) Application error (will not be penalized).
+ * (ii) Payload too big for send buffer: data is split
+ * into several packets, each with its own header.
+ * In this case (e.g. last segment), coverage may
+ * exceed packet length.
+ * Since packets with coverage length > packet length are
+ * illegal, we fall back to the defaults here.
+ */
+ }
+ return cscov;
+}
+
+static inline u32 udplite_csum_outgoing(struct sock *sk, int cscov)
+{
+ struct sk_buff *skb;
+ int off, len;
+ u32 csum = 0;
+
+ skb_queue_walk(&sk->sk_write_queue, skb) {
+ off = skb->h.raw - skb->data;
+ len = skb->len - off;
+
+ csum = skb_checksum(skb, off, (cscov > len)? len : cscov, csum);
+
+ if ((cscov -= len) <= 0)
+ break;
+ }
+ return csum;
+}
+
+/*
+ * net/ipv4/udplite.c
+ */
+extern void udplite4_register(void);
+extern int udplite_get_port(struct sock *sk, unsigned short snum,
+ int (*scmp)(const struct sock *, const struct sock *));
+extern int udplite_checksum_init(struct sk_buff *skb, struct udphdr *uh,
+ u16 len, u32 saddr, u32 daddr );
+#endif /* _UDPLITE_H */
diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c
new file mode 100644
index 0000000..7f6498d
--- /dev/null
+++ b/net/ipv4/udplite.c
@@ -0,0 +1,186 @@
+/*
+ * UDPLITE An implementation of the UDP-Lite protocol (RFC 3828).
+ *
+ * Version: $Id: udplite.c,v 1.24 2006/09/18 21:50:59 gerrit Exp gerrit $
+ *
+ * Authors: Gerrit Renker <gerrit@erg.abdn.ac.uk>
+ *
+ * Changes:
+ * Fixes:
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+struct hlist_head udplite_hash[UDP_HTABLE_SIZE];
+static int udplite_port_rover;
+DEFINE_SNMP_STAT(struct udp_mib, udplite_statistics) __read_mostly;
+
+/* Designate sk as UDP-Lite socket */
+static inline int udplite_sk_init(struct sock *sk)
+{
+ udp_sk(sk)->pcflag = UDPLITE_BIT;
+ return 0;
+}
+
+__inline__ int udplite_get_port(struct sock *sk, unsigned short p,
+ int (*c)(const struct sock *, const struct sock *))
+{
+ return __udp_lib_get_port(sk, p, udplite_hash, &udplite_port_rover, c);
+}
+
+static __inline__ int udplite_v4_get_port(struct sock *sk, unsigned short snum)
+{
+ return udplite_get_port(sk, snum, ipv4_rcv_saddr_equal);
+}
+
+static __inline__ struct sock *udplite_v4_lookup(u32 saddr, u16 sport,
+ u32 daddr, u16 dport, int dif)
+{
+ return __udp4_lib_lookup(saddr, sport, daddr, dport, dif, udplite_hash);
+}
+
+static __inline__ int udplite_v4_mcast_deliver(struct sk_buff *skb,
+ struct udphdr *uh, u32 saddr, u32 daddr)
+{
+ return __udp4_lib_mcast_deliver(skb, uh, saddr, daddr, udplite_hash);
+}
+
+__inline__ void udplite_err(struct sk_buff *skb, u32 info)
+{
+ return __udp4_lib_err(skb, info, udplite_hash);
+}
+
+int udplite_checksum_init(struct sk_buff *skb, struct udphdr *uh,
+ u16 len, u32 saddr, u32 daddr )
+{
+ u16 cscov;
+
+ /* In UDPv4 a zero checksum means that the transmitter generated no
+ * checksum. UDP-Lite (like IPv6) mandates checksums, hence packets
+ * with a zero checksum field are illegal. */
+ if (uh->check == 0) {
+ LIMIT_NETDEBUG(KERN_DEBUG "UDPLITE: zeroed csum field "
+ "(%d.%d.%d.%d:%d -> %d.%d.%d.%d:%d)\n", NIPQUAD(saddr),
+ ntohs(uh->source), NIPQUAD(daddr), ntohs(uh->dest) );
+ return 0;
+ }
+
+ UDP_SKB_CB(skb)->partial_cov = 0;
+ cscov = ntohs(uh->len);
+
+ if (cscov == 0) /* Indicates that full coverage is required. */
+ cscov = len;
+ else if (cscov < 8 || cscov > len) {
+ /*
+ * Coverage length violates RFC 3828: log and discard silently.
+ */
+ LIMIT_NETDEBUG(KERN_DEBUG "UDPLITE: bad csum coverage %d/%d "
+ "(%d.%d.%d.%d:%d -> %d.%d.%d.%d:%d)\n", cscov, len,
+ NIPQUAD(saddr), ntohs(uh->source),
+ NIPQUAD(daddr), ntohs(uh->dest) );
+ return 0;
+
+ } else if (cscov < len)
+ UDP_SKB_CB(skb)->partial_cov = 1;
+
+ UDP_SKB_CB(skb)->cscov = cscov;
+
+ /*
+ * Initialise pseudo-header for checksum computation.
+ *
+ * There is no known NIC manufacturer supporting UDP-Lite yet,
+ * hence ip_summed is always (re-)set to CHECKSUM_NONE.
+ */
+ skb->csum = csum_tcpudp_nofold(saddr, daddr, len, IPPROTO_UDPLITE, 0);
+ skb->ip_summed = CHECKSUM_NONE;
+
+ return 1;
+}
+
+__inline__ int udplite_rcv(struct sk_buff *skb)
+{
+ return __udp4_lib_rcv(skb, udplite_hash, 1);
+}
+
+struct proto udplite_prot = {
+ .name = "UDP-Lite",
+ .owner = THIS_MODULE,
+ .close = udp_close,
+ .connect = ip4_datagram_connect,
+ .disconnect = udp_disconnect,
+ .ioctl = udp_ioctl,
+ .init = udplite_sk_init,
+ .destroy = udp_destroy_sock,
+ .setsockopt = udp_setsockopt,
+ .getsockopt = udp_getsockopt,
+ .sendmsg = udp_sendmsg,
+ .recvmsg = udp_recvmsg,
+ .sendpage = udp_sendpage,
+ .backlog_rcv = udp_queue_rcv_skb,
+ .hash = udp_lib_hash,
+ .unhash = udp_lib_unhash,
+ .get_port = udplite_v4_get_port,
+ .obj_size = sizeof(struct udp_sock),
+#ifdef CONFIG_COMPAT
+ .compat_setsockopt = compat_udp_setsockopt,
+ .compat_getsockopt = compat_udp_getsockopt,
+#endif
+};
+
+static struct net_protocol udplite_protocol = {
+ .handler = udplite_rcv,
+ .err_handler = udplite_err,
+ .no_policy = 1,
+};
+
+static struct inet_protosw udplite4_protosw = {
+ .type = SOCK_DGRAM,
+ .protocol = IPPROTO_UDPLITE,
+ .prot = &udplite_prot,
+ .ops = &inet_dgram_ops,
+ .capability = -1,
+ .no_check = 0, /* must checksum (RFC 3828) */
+ .flags = INET_PROTOSW_PERMANENT,
+};
+
+#ifdef CONFIG_PROC_FS
+static struct file_operations udplite4_seq_fops;
+static struct udp_seq_afinfo udplite4_seq_afinfo = {
+ .owner = THIS_MODULE,
+ .name = "udplite",
+ .family = AF_INET,
+ .hashtable = udplite_hash,
+ .seq_show = udp4_seq_show,
+ .seq_fops = &udplite4_seq_fops,
+};
+#endif /* CONFIG_PROC_FS */
+
+
+void __init udplite4_register(void)
+{
+ if (proto_register(&udplite_prot, 1))
+ goto out_register_err;
+
+ if (inet_add_protocol(&udplite_protocol, IPPROTO_UDPLITE) < 0)
+ goto out_unregister_proto;
+
+ inet_register_protosw(&udplite4_protosw);
+
+#ifdef CONFIG_PROC_FS
+ if (udp_proc_register(&udplite4_seq_afinfo)) /* udplite4_proc_init() */
+ printk(KERN_ERR "udplite4: Cannot register /proc!\n");
+#endif /* CONFIG_PROC_FS */
+ return;
+
+out_unregister_proto:
+ proto_unregister(&udplite_prot);
+out_register_err:
+ printk(KERN_CRIT "udplite4_register: Cannot add UDP-Lite protocol.\n");
+}
+
+EXPORT_SYMBOL(udplite_hash);
+EXPORT_SYMBOL(udplite_prot);
+EXPORT_SYMBOL(udplite_get_port); /* for v6 */
* [PATCHv3 3/4][RFC] net: basic xfrm/netfilter support for UDP-Lite
2006-07-28 5:30 ` David Miller
` (2 preceding siblings ...)
2006-09-19 7:25 ` [PATCHv3 2/4][RFC] net/ipv4: self-contained UDP-Lite module Gerrit Renker
@ 2006-09-19 7:25 ` Gerrit Renker
2006-09-19 7:37 ` Patrick McHardy
2006-09-19 7:25 ` [PATCHv3 4/4][RFC] net: misc. files to support UDP-Lite Gerrit Renker
4 siblings, 1 reply; 19+ messages in thread
From: Gerrit Renker @ 2006-09-19 7:25 UTC (permalink / raw)
To: David Miller, Patrick McHardy; +Cc: netdev
Basic xfrm and netfilter support for UDP-Lite:
* matching of UDP-Lite packets
* LOG support
* header file support
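
The port matching can be shared because TCP, UDP, UDP-Lite, SCTP and DCCP all
carry the source port in the first two bytes of the transport header and the
destination port in the next two. A minimal user-space sketch of that idea
follows. It is not code from this patch: proto_has_ports and read_ports are
hypothetical names, and the constants are the IANA-assigned protocol numbers.

```c
#include <stdint.h>

/* IANA protocol numbers, named locally to keep the sketch self-contained. */
enum { P_TCP = 6, P_UDP = 17, P_DCCP = 33, P_SCTP = 132, P_UDPLITE = 136 };

/* Returns 1 if the protocol keeps its ports in the first four header bytes. */
int proto_has_ports(int proto)
{
	return proto == P_TCP ||
	       proto == P_UDP || proto == P_UDPLITE ||
	       proto == P_SCTP || proto == P_DCCP;
}

/* Reads both ports (network byte order on the wire) into host integers. */
void read_ports(const uint8_t *hdr, uint16_t *sport, uint16_t *dport)
{
	*sport = (uint16_t)((hdr[0] << 8) | hdr[1]);
	*dport = (uint16_t)((hdr[2] << 8) | hdr[3]);
}
```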
--
include/net/xfrm.h | 2 ++
net/ipv4/netfilter/ipt_LOG.c | 11 ++++++++---
net/ipv4/xfrm4_policy.c | 1 +
net/ipv6/netfilter/ip6t_LOG.c | 10 +++++++---
net/ipv6/xfrm6_policy.c | 1 +
net/netfilter/xt_multiport.c | 9 +++++----
net/netfilter/xt_tcpudp.c | 20 +++++++++++++++++++-
7 files changed, 43 insertions(+), 11 deletions(-)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index bf8e2df..e697862 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -468,6 +468,7 @@ u16 xfrm_flowi_sport(struct flowi *fl)
switch(fl->proto) {
case IPPROTO_TCP:
case IPPROTO_UDP:
+ case IPPROTO_UDPLITE:
case IPPROTO_SCTP:
port = fl->fl_ip_sport;
break;
@@ -493,6 +494,7 @@ u16 xfrm_flowi_dport(struct flowi *fl)
switch(fl->proto) {
case IPPROTO_TCP:
case IPPROTO_UDP:
+ case IPPROTO_UDPLITE:
case IPPROTO_SCTP:
port = fl->fl_ip_dport;
break;
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 4795985..22b53ea 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -180,6 +180,7 @@ _decode_session4(struct sk_buff *skb, st
if (!(iph->frag_off & htons(IP_MF | IP_OFFSET))) {
switch (iph->protocol) {
case IPPROTO_UDP:
+ case IPPROTO_UDPLITE:
case IPPROTO_TCP:
case IPPROTO_SCTP:
case IPPROTO_DCCP:
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 9391c4c..ea94bd1 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -252,6 +252,7 @@ _decode_session6(struct sk_buff *skb, st
break;
case IPPROTO_UDP:
+ case IPPROTO_UDPLITE:
case IPPROTO_TCP:
case IPPROTO_SCTP:
case IPPROTO_DCCP:
diff --git a/net/netfilter/xt_tcpudp.c b/net/netfilter/xt_tcpudp.c
index e76a68e..46414b5 100644
--- a/net/netfilter/xt_tcpudp.c
+++ b/net/netfilter/xt_tcpudp.c
@@ -10,7 +10,7 @@ #include <linux/netfilter/xt_tcpudp.h>
#include <linux/netfilter_ipv4/ip_tables.h>
#include <linux/netfilter_ipv6/ip6_tables.h>
-MODULE_DESCRIPTION("x_tables match for TCP and UDP, supports IPv4 and IPv6");
+MODULE_DESCRIPTION("x_tables match for TCP and UDP(-Lite), supports IPv4 and IPv6");
MODULE_LICENSE("GPL");
MODULE_ALIAS("xt_tcp");
MODULE_ALIAS("xt_udp");
@@ -234,6 +234,24 @@ static struct xt_match xt_tcpudp_match[]
.proto = IPPROTO_UDP,
.me = THIS_MODULE,
},
+ {
+ .name = "udplite",
+ .family = AF_INET,
+ .checkentry = udp_checkentry,
+ .match = udp_match,
+ .matchsize = sizeof(struct xt_udp),
+ .proto = IPPROTO_UDPLITE,
+ .me = THIS_MODULE,
+ },
+ {
+ .name = "udplite",
+ .family = AF_INET6,
+ .checkentry = udp_checkentry,
+ .match = udp_match,
+ .matchsize = sizeof(struct xt_udp),
+ .proto = IPPROTO_UDPLITE,
+ .me = THIS_MODULE,
+ },
};
static int __init xt_tcpudp_init(void)
diff --git a/net/netfilter/xt_multiport.c b/net/netfilter/xt_multiport.c
index d3aefd3..9127f85 100644
--- a/net/netfilter/xt_multiport.c
+++ b/net/netfilter/xt_multiport.c
@@ -1,5 +1,5 @@
-/* Kernel module to match one of a list of TCP/UDP/SCTP/DCCP ports: ports are in
- the same place so we can treat them as equal. */
+/* Kernel module to match one of a list of TCP/UDP(-Lite)/SCTP/DCCP ports:
+ * ports are in the same place so we can treat them as equal. */
/* (C) 1999-2001 Paul `Rusty' Russell
* (C) 2002-2004 Netfilter Core Team <coreteam@netfilter.org>
@@ -161,8 +161,9 @@ check(u_int16_t proto,
u_int8_t count)
{
/* Must specify supported protocol, no unknown flags or bad count */
- return (proto == IPPROTO_TCP || proto == IPPROTO_UDP
- || proto == IPPROTO_SCTP || proto == IPPROTO_DCCP)
+ return ( proto == IPPROTO_TCP ||
+ proto == IPPROTO_UDP || proto == IPPROTO_UDPLITE ||
+ proto == IPPROTO_SCTP || proto == IPPROTO_DCCP )
&& !(ip_invflags & XT_INV_PROTO)
&& (match_flags == XT_MULTIPORT_SOURCE
|| match_flags == XT_MULTIPORT_DESTINATION
diff --git a/net/ipv6/netfilter/ip6t_LOG.c b/net/ipv6/netfilter/ip6t_LOG.c
index 0cf537d..3cb6bb7 100644
--- a/net/ipv6/netfilter/ip6t_LOG.c
+++ b/net/ipv6/netfilter/ip6t_LOG.c
@@ -270,11 +270,15 @@ static void dump_packet(const struct nf_
}
break;
}
- case IPPROTO_UDP: {
+ case IPPROTO_UDP:
+ case IPPROTO_UDPLITE: {
struct udphdr _udph, *uh;
- /* Max length: 10 "PROTO=UDP " */
- printk("PROTO=UDP ");
+ if (currenthdr == IPPROTO_UDP)
+ /* Max length: 10 "PROTO=UDP " */
+ printk("PROTO=UDP " );
+ else /* Max length: 14 "PROTO=UDPLITE " */
+ printk("PROTO=UDPLITE ");
if (fragment)
break;
diff --git a/net/ipv4/netfilter/ipt_LOG.c b/net/ipv4/netfilter/ipt_LOG.c
index 7dc820d..46eee64 100644
--- a/net/ipv4/netfilter/ipt_LOG.c
+++ b/net/ipv4/netfilter/ipt_LOG.c
@@ -171,11 +171,15 @@ static void dump_packet(const struct nf_
}
break;
}
- case IPPROTO_UDP: {
+ case IPPROTO_UDP:
+ case IPPROTO_UDPLITE: {
struct udphdr _udph, *uh;
- /* Max length: 10 "PROTO=UDP " */
- printk("PROTO=UDP ");
+ if (ih->protocol == IPPROTO_UDP)
+ /* Max length: 10 "PROTO=UDP " */
+ printk("PROTO=UDP " );
+ else /* Max length: 14 "PROTO=UDPLITE " */
+ printk("PROTO=UDPLITE ");
if (ntohs(ih->frag_off) & IP_OFFSET)
break;
@@ -341,6 +345,7 @@ static void dump_packet(const struct nf_
/* IP: 40+46+6+11+127 = 230 */
/* TCP: 10+max(25,20+30+13+9+32+11+127) = 252 */
/* UDP: 10+max(25,20) = 35 */
+ /* UDPLITE: 14+max(25,20) = 39 */
/* ICMP: 11+max(25, 18+25+max(19,14,24+3+n+10,3+n+10)) = 91+n */
/* ESP: 10+max(25)+15 = 50 */
/* AH: 9+max(25)+15 = 49 */
* [PATCHv3 4/4][RFC] net: misc. files to support UDP-Lite
2006-07-28 5:30 ` David Miller
` (3 preceding siblings ...)
2006-09-19 7:25 ` [PATCHv3 3/4][RFC] net: basic xfrm/netfilter support for UDP-Lite Gerrit Renker
@ 2006-09-19 7:25 ` Gerrit Renker
4 siblings, 0 replies; 19+ messages in thread
From: Gerrit Renker @ 2006-09-19 7:25 UTC (permalink / raw)
To: David Miller; +Cc: netdev
Miscellaneous files which complete the support for UDP-Litev4.
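
As a quick orientation for the documentation added below, here is a hedged
user-space sketch of the sender side. It assumes the constant values from the
`mini' header file in udplite.txt; the helper names clamp_cscov and
udplite_sender_socket are hypothetical. clamp_cscov only restates the rule
applied by the UDPLITE_SEND_CSCOV case of do_udp_setsockopt(); the kernel
applies it itself, so applications do not need to call it.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fallbacks taken from the `mini' header in udplite.txt. */
#ifndef IPPROTO_UDPLITE
#define IPPROTO_UDPLITE		136
#endif
#ifndef SOL_UDPLITE
#define SOL_UDPLITE		136
#endif
#ifndef UDPLITE_SEND_CSCOV
#define UDPLITE_SEND_CSCOV	10
#endif

/* Illustration of the setsockopt rule: 0 = full coverage, otherwise >= 8. */
int clamp_cscov(int val)
{
	return (val != 0 && val < 8) ? 8 : val;
}

/*
 * Opens a UDP-Litev4 socket and requests a send checksum coverage of
 * `cscov' bytes. Returns the descriptor, or -1 on failure (for example
 * when the running kernel has no UDP-Lite support).
 */
int udplite_sender_socket(int cscov)
{
	int s = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDPLITE);

	if (s < 0)
		return -1;
	if (setsockopt(s, SOL_UDPLITE, UDPLITE_SEND_CSCOV,
		       &cscov, sizeof(cscov)) < 0) {
		close(s);
		return -1;
	}
	return s;
}
```

An RTP sender would call udplite_sender_socket(20) to protect the UDP-Lite
header and the 12-byte RTP base header.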
--
Documentation/networking/udplite.txt | 291 +++++++++++++++++++++++++++++++++++
include/linux/in.h | 1
include/linux/socket.h | 1
include/net/snmp.h | 2
net/ipv4/af_inet.c | 9 -
net/ipv4/proc.c | 12 +
6 files changed, 315 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index fdd89e3..8b2e66e 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1223,10 +1223,13 @@ static int __init init_ipv4_mibs(void)
tcp_statistics[1] = alloc_percpu(struct tcp_mib);
udp_statistics[0] = alloc_percpu(struct udp_mib);
udp_statistics[1] = alloc_percpu(struct udp_mib);
+ udplite_statistics[0] = alloc_percpu(struct udp_mib);
+ udplite_statistics[1] = alloc_percpu(struct udp_mib);
if (!
(net_statistics[0] && net_statistics[1] && ip_statistics[0]
&& ip_statistics[1] && tcp_statistics[0] && tcp_statistics[1]
- && udp_statistics[0] && udp_statistics[1]))
+ && udp_statistics[0] && udp_statistics[1]
+ && udplite_statistics[0] && udplite_statistics[1] ) )
return -ENOMEM;
(void) tcp_mib_init();
@@ -1313,6 +1316,10 @@ #endif
/* Setup TCP slab cache for open requests. */
tcp_init();
+ /*
+ * Add UDP-Lite (RFC 3828)
+ */
+ udplite4_register();
/*
* Set the ICMP layer up
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 9c6cbe3..9b72fe4 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -66,6 +66,7 @@ static int sockstat_seq_show(struct seq_
tcp_death_row.tw_count, atomic_read(&tcp_sockets_allocated),
atomic_read(&tcp_memory_allocated));
seq_printf(seq, "UDP: inuse %d\n", fold_prot_inuse(&udp_prot));
+ seq_printf(seq, "UDPLITE: inuse %d\n", fold_prot_inuse(&udplite_prot));
seq_printf(seq, "RAW: inuse %d\n", fold_prot_inuse(&raw_prot));
seq_printf(seq, "FRAG: inuse %d memory %d\n", ip_frag_nqueues,
atomic_read(&ip_frag_mem));
@@ -304,6 +305,17 @@ static int snmp_seq_show(struct seq_file
fold_field((void **) udp_statistics,
snmp4_udp_list[i].entry));
+ /* the UDP and UDP-Lite MIBs are the same */
+ seq_puts(seq, "\nUdpLite:");
+ for (i = 0; snmp4_udp_list[i].name != NULL; i++)
+ seq_printf(seq, " %s", snmp4_udp_list[i].name);
+
+ seq_puts(seq, "\nUdpLite:");
+ for (i = 0; snmp4_udp_list[i].name != NULL; i++)
+ seq_printf(seq, " %lu",
+ fold_field((void **) udplite_statistics,
+ snmp4_udp_list[i].entry) );
+
seq_putc(seq, '\n');
return 0;
}
diff --git a/include/net/snmp.h b/include/net/snmp.h
index 464970e..34183aa 100644
--- a/include/net/snmp.h
+++ b/include/net/snmp.h
@@ -131,6 +131,8 @@ #define SNMP_INC_STATS(mib, field) \
(per_cpu_ptr(mib[!in_softirq()], raw_smp_processor_id())->mibs[field]++)
#define SNMP_DEC_STATS(mib, field) \
(per_cpu_ptr(mib[!in_softirq()], raw_smp_processor_id())->mibs[field]--)
+#define SNMP_DEC_STATS_BH(mib, field) \
+ (per_cpu_ptr(mib[0], raw_smp_processor_id())->mibs[field]--)
#define SNMP_ADD_STATS_BH(mib, field, addend) \
(per_cpu_ptr(mib[0], raw_smp_processor_id())->mibs[field] += addend)
#define SNMP_ADD_STATS_USER(mib, field, addend) \
diff --git a/include/linux/in.h b/include/linux/in.h
index bcaca83..0903e5f 100644
--- a/include/linux/in.h
+++ b/include/linux/in.h
@@ -44,6 +44,7 @@ enum {
IPPROTO_COMP = 108, /* Compression Header protocol */
IPPROTO_SCTP = 132, /* Stream Control Transport Protocol */
+ IPPROTO_UDPLITE = 136, /* UDP-Lite (RFC 3828) */
IPPROTO_RAW = 255, /* Raw IP packets */
IPPROTO_MAX
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 3614090..592b666 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -264,6 +264,7 @@ #define SOL_UDP 17
#define SOL_IPV6 41
#define SOL_ICMPV6 58
#define SOL_SCTP 132
+#define SOL_UDPLITE 136
#define SOL_RAW 255
#define SOL_IPX 256
#define SOL_AX25 257
diff --git a/Documentation/networking/udplite.txt b/Documentation/networking/udplite.txt
new file mode 100644
index 0000000..a899fa1
--- /dev/null
+++ b/Documentation/networking/udplite.txt
@@ -0,0 +1,291 @@
+ ===========================================================================
+ The UDP-Lite protocol (RFC 3828)
+ ===========================================================================
+ last modified: Mon 18th September 2006
+
+
+ UDP-Lite is a Standards-Track IETF transport protocol whose characteristic
+ is a variable-length checksum. This has advantages for transport of multimedia
+ (video, VoIP) over wireless networks, as partly damaged packets can still be
+ fed into the codec instead of being discarded due to a failed checksum test.
+
+ This file briefly describes the existing kernel support and the socket API.
+ For in-depth information, you can consult:
+
+ o The UDP-Lite Homepage: http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/
+ From here you can also always download the latest patch for the stable
+ kernel tree and some example application source code.
+
+ o The UDP-Lite HOWTO on
+ http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/UDP-Lite-HOWTO.txt
+
+ o The Wireshark UDP-Lite Wiki (with capture files):
+ http://wiki.wireshark.org/Lightweight_User_Datagram_Protocol
+
+ o The Protocol Spec, RFC 3828, on http://www.ietf.org/rfc/rfc3828.txt
+
+
+ I) APPLICATIONS
+
+ Several applications have been ported successfully to UDP-Lite. Ethereal
+ (now called Wireshark) has UDP-Litev4/v6 support by default. The tarball on
+
+ http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/udplite_linux.tar.gz
+
+ has source code for several v4/v6 client-server and network testing examples.
+
+ Porting applications to UDP-Lite is straightforward: only socket level and
+ IPPROTO need to be changed; senders additionally set the checksum coverage
+ length (default = header length = 8). Details are in the next section.
+ UDP-Lite is not enabled by default: set CONFIG_IP_UDPLITE=y to support it.
+
+
+ II) PROGRAMMING API
+
+ UDP-Lite provides a connectionless, unreliable datagram service and hence
+ uses the same socket type as UDP. In fact, porting from UDP to UDP-Lite is
+ dead easy: simply add `IPPROTO_UDPLITE' as the last argument of the socket(2)
+ call so that the statement looks like:
+
+ s = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDPLITE);
+
+ or, respectively,
+
+ s = socket(PF_INET6, SOCK_DGRAM, IPPROTO_UDPLITE);
+
+ Since both UDP-Litev4 and UDP-Litev6 are supported, the porting process is the
+ same in both cases. With just this change you are able to run UDP-Lite
+ services or connect to UDP-Lite servers. The kernel will assume that you are
+ not interested in using partial checksum coverage and so emulate UDP mode.
+
+ Making use of the partial checksum coverage facilities requires setting just
+ one socket option which takes an integer specifying the coverage length:
+
+ * Sender checksum coverage: UDPLITE_SEND_CSCOV
+
+ For example,
+
+ int val = 20;
+ setsockopt(s, SOL_UDPLITE, UDPLITE_SEND_CSCOV, &val, sizeof(int));
+
+ sets the checksum coverage length to 20 bytes (12 bytes data + 8 bytes header).
+ Of each packet only the first 20 bytes (plus the pseudo-header) will be
+ checksummed. This is useful for RTP applications which have a 12-byte
+ base header.
+
+
+ * Receiver checksum coverage: UDPLITE_RECV_CSCOV
+
+ This option is the receiver-side analogue. It is truly optional, i.e. not
+ required to enable traffic with partial checksum coverage. Its function is
+ that of a traffic filter: when enabled, it instructs the kernel to drop
+ all packets which have a coverage _less_ than this value. For example, if
+ RTP and UDP headers are to be protected, a receiver can enforce that only
+ packets with a minimum coverage of 20 are admitted:
+
+ int min = 20;
+ setsockopt(s, SOL_UDPLITE, UDPLITE_RECV_CSCOV, &min, sizeof(int));
+
+ The calls to getsockopt(2) are analogous. Since UDP-Lite is an extension and
+ not a stand-alone protocol, all socket options known from UDP can be used in
+ exactly the same manner as before, e.g. UDP_CORK or UDP_ENCAP.
+
+ A detailed discussion of UDP-Lite checksum coverage options is in section IV.
+
+
+
+ III) HEADER FILES
+
+ The socket API requires support through header files in /usr/include:
+
+ * /usr/include/netinet/in.h
+ to define IPPROTO_UDPLITE
+
+ * /usr/include/netinet/udplite.h
+ for UDP-Lite header fields and protocol constants
+
+ For testing purposes, the following can serve as a `mini' header file:
+
+ #define IPPROTO_UDPLITE 136
+ #define SOL_UDPLITE 136
+ #define UDPLITE_SEND_CSCOV 10
+ #define UDPLITE_RECV_CSCOV 11
+
+ Ready-made header files for various distros are in the UDP-Lite tarball.
+
+
+
+ IV) KERNEL BEHAVIOUR WITH REGARD TO THE VARIOUS SOCKET OPTIONS
+
+ To enable debugging messages, the log level must be set to 8, as most
+ messages use the KERN_DEBUG level (7).
+
+
+ 1) Sender Socket Options
+
+ If the sender specifies a value of 0 as coverage length, the module
+ assumes full coverage and transmits a packet with a coverage length of 0
+ and the corresponding checksum. If the sender specifies a coverage < 8 and
+ different from 0, the kernel assumes 8 as default value. Finally,
+ if the specified coverage length exceeds the packet length, the packet
+ length is used instead as coverage length.
+
+
+ 2) Receiver Socket Options
+
+ The receiver specifies the minimum value of the coverage length it
+ is willing to accept. A value of 0 here indicates that the receiver
+ always wants the whole of the packet covered. In this case, all
+ partially covered packets are dropped and an error is logged.
+
+ It is not possible to specify illegal values (negative, or from 1 to 7); in these
+ cases the default of 8 is assumed.
+
+ All packets arriving with a coverage value less than the specified
+ threshold are discarded; these events are also logged.
+
+
+ 3) Disabling the Checksum Computation
+
+ On both sender and receiver, checksumming will always be performed
+ and can not be disabled using SO_NO_CHECK. Thus
+
+ setsockopt(sockfd, SOL_SOCKET, SO_NO_CHECK, ... );
+
+ will always be ignored, while the value of
+
+ getsockopt(sockfd, SOL_SOCKET, SO_NO_CHECK, &value, ...);
+
+ is meaningless (as in TCP). Packets with a zero checksum field are
+ illegal (cf. RFC 3828, sec. 3.1) and will be silently discarded.
+
+
+ 4) Fragmentation
+
+ The checksum computation respects both buffersize and MTU. The size
+ of UDP-Lite packets is determined by the size of the send buffer. The
+ minimum size of the send buffer is 2048 (defined as SOCK_MIN_SNDBUF
+ in include/net/sock.h), the default value is configurable as
+ net.core.wmem_default or via setting the SO_SNDBUF socket(7)
+ option. The maximum upper bound for the send buffer is determined
+ by net.core.wmem_max.
+
+ Given a payload size larger than the send buffer size, UDP-Lite will
+ split the payload into several individual packets, filling up the
+ send buffer size in each case.
+
+ The precise value also depends on the interface MTU. The interface MTU,
+ in turn, may trigger IP fragmentation. In this case, the generated
+ UDP-Lite packet is split into several IP packets, of which only the
+ first one contains the L4 header.
+
+ The send buffer size has implications on the checksum coverage length.
+ Consider the following example:
+
+ Payload: 1536 bytes Send Buffer: 1024 bytes
+ MTU: 1500 bytes Coverage Length: 856 bytes
+
+ UDP-Lite will ship the 1536 bytes in two separate packets:
+
+ Packet 1: 1024 payload + 8 byte header + 20 byte IP header = 1052 bytes
+ Packet 2: 512 payload + 8 byte header + 20 byte IP header = 540 bytes
+
+ The coverage length covers the UDP-Lite header and 848 bytes of the
+ payload in the first packet; the second packet is fully covered. Note
+ that for the second packet, the coverage length exceeds the packet
+ length. The kernel always re-adjusts the coverage length to the packet
+ length in such cases.
+
+ To see what happens when one UDP-Lite packet is split into
+ several tiny fragments, consider the following example.
+
+ Payload: 1024 bytes Send buffer size: 1024 bytes
+ MTU: 300 bytes Coverage length: 575 bytes
+
+ +-+-----------+--------------+--------------+--------------+
+ |8| 272 | 280 | 280 | 192 |
+ +-+-----------+--------------+--------------+--------------+
+ 280 560 840 1032
+ ^
+ *****checksum coverage*************
+
+ The UDP-Lite module generates one 1032 byte packet (1024 + 8 byte
+ header). According to the interface MTU, this is split into 4 IP
+ packets (up to 280 byte IP payload + 20 byte IP header each). The kernel
+ module sums the contents of the entire first two packets, plus 15 bytes
+ of the third packet, before releasing the fragments to the IP module.
+
+ To see the analogous case for IPv6 fragmentation, consider a link
+ MTU of 1280 bytes and a write buffer of 3356 bytes. If the checksum
+ coverage is less than 1232 bytes (MTU minus IPv6/fragment header
+ lengths), only the first fragment needs to be considered. When using
+ larger checksum coverage lengths, each eligible fragment needs to be
+ checksummed. Suppose we have a checksum coverage of 3062. The buffer
+ of 3356 bytes will be split into the following fragments:
+
+ Fragment 1: 1280 bytes carrying 1232 bytes of UDP-Lite data
+ Fragment 2: 1280 bytes carrying 1232 bytes of UDP-Lite data
+ Fragment 3: 948 bytes carrying 900 bytes of UDP-Lite data
+
+ The first two fragments have to be checksummed in full; of the last
+ fragment, only 598 (= 3062 - 2*1232) bytes are checksummed.
+
+ While it is important that such cases are dealt with correctly, they
+ are (annoyingly) rare: UDP-Lite is designed for optimising multimedia
+ performance over wireless (or generally noisy) links and thus smaller
+ coverage lengths are to be expected.
+
+
+ V) UDP-LITE RUNTIME STATISTICS AND THEIR MEANING
+
+ Exceptional and error conditions are logged to syslog at the KERN_DEBUG
+ level. Live statistics about UDP-Lite are available in /proc/net/snmp
+ and can (with newer versions of netstat) be viewed using
+
+ netstat -svu
+
+ This displays UDP-Lite statistics variables, whose meaning is as follows.
+
+ InDatagrams: Total number of received datagrams.
+
+ NoPorts: Number of packets received to an unknown port.
+ These cases are counted separately (not as InErrors).
+
+ InErrors: Number of erroneous UDP-Lite packets. Errors include:
+ * internal socket queue receive errors
+ * packet too short (less than 8 bytes or stated
+ coverage length exceeds received length)
+ * xfrm4_policy_check() returned with error
+ * application has specified larger min. coverage
+ length than that of incoming packet
+ * checksum coverage violated
+ * bad checksum
+
+ OutDatagrams: Total number of sent datagrams.
+
+ These statistics derive from the UDP MIB (RFC 2013).
+
+
+ VI) IPTABLES
+
+ There is packet match support for UDP-Lite as well as support for the LOG target.
+ If you copy and paste the following line into /etc/protocols,
+
+ udplite 136 UDP-Lite # UDP-Lite [RFC 3828]
+
+ then
+ iptables -A INPUT -p udplite -j LOG
+
+ will produce logging output to syslog. Dropping and rejecting packets also works.
+
+
+ VII) MAINTAINER ADDRESS
+
+ The UDP-Lite patch was developed at
+ University of Aberdeen
+ Electronics Research Group
+ Department of Engineering
+ Fraser Noble Building
+ Aberdeen AB24 3UE; UK
+ The current maintainer is Gerrit Renker, <gerrit@erg.abdn.ac.uk>. Initial
+ code had been developed by William Stanislaus, <william@erg.abdn.ac.uk>.
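A minimal user-space sketch of the socket API described in section II of the
documentation above, assuming a kernel built with CONFIG_IP_UDPLITE. The helper
name open_udplite_socket() is hypothetical (it is not part of the patch); the
fallback constants are the ones from the `mini' header in section III. It opens
a UDP-Litev4 socket and sets both coverage options, returning -1 if the
protocol or the options are not available on the running system.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fallback values taken from the `mini' header in section III, for
 * systems whose libc headers do not define them. */
#ifndef IPPROTO_UDPLITE
#define IPPROTO_UDPLITE    136
#endif
#ifndef SOL_UDPLITE
#define SOL_UDPLITE        136
#endif
#ifndef UDPLITE_SEND_CSCOV
#define UDPLITE_SEND_CSCOV 10
#endif
#ifndef UDPLITE_RECV_CSCOV
#define UDPLITE_RECV_CSCOV 11
#endif

/*
 * Hypothetical helper (illustration only): open a UDP-Litev4 socket,
 * set the sender coverage to send_cov bytes and the minimum accepted
 * receiver coverage to recv_cov bytes.
 * Returns the socket descriptor, or -1 on any failure.
 */
static int open_udplite_socket(int send_cov, int recv_cov)
{
	int s = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDPLITE);

	if (s < 0)
		return -1;	/* e.g. kernel without UDP-Lite support */

	if (setsockopt(s, SOL_UDPLITE, UDPLITE_SEND_CSCOV,
		       &send_cov, sizeof(send_cov)) < 0 ||
	    setsockopt(s, SOL_UDPLITE, UDPLITE_RECV_CSCOV,
		       &recv_cov, sizeof(recv_cov)) < 0) {
		close(s);
		return -1;
	}
	return s;
}
```

Calling open_udplite_socket(20, 20) corresponds to the RTP case from section II:
the 8-byte UDP-Lite header plus the 12-byte RTP base header are protected in
both directions. The returned descriptor is then used with sendto(2) and
recvfrom(2) exactly as a UDP socket would be.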
* Re: [PATCHv3 3/4][RFC] net: basic xfrm/netfilter support for UDP-Lite
2006-09-19 7:25 ` [PATCHv3 3/4][RFC] net: basic xfrm/netfilter support for UDP-Lite Gerrit Renker
@ 2006-09-19 7:37 ` Patrick McHardy
0 siblings, 0 replies; 19+ messages in thread
From: Patrick McHardy @ 2006-09-19 7:37 UTC (permalink / raw)
To: Gerrit Renker; +Cc: David Miller, netdev
Gerrit Renker wrote:
> Basic xfrm and netfilter support for UDP-Lite:
> * matching of UDP-Lite packets
> * LOG support
> * header file support
>
> --
> --- a/net/netfilter/xt_multiport.c
> +++ b/net/netfilter/xt_multiport.c
> @@ -161,8 +161,9 @@ check(u_int16_t proto,
> u_int8_t count)
> {
> /* Must specify supported protocol, no unknown flags or bad count */
> - return (proto == IPPROTO_TCP || proto == IPPROTO_UDP
> - || proto == IPPROTO_SCTP || proto == IPPROTO_DCCP)
> + return ( proto == IPPROTO_TCP ||
> + proto == IPPROTO_UDP || proto == IPPROTO_UDPLITE ||
> + proto == IPPROTO_SCTP || proto == IPPROTO_DCCP )
> && !(ip_invflags & XT_INV_PROTO)
> && (match_flags == XT_MULTIPORT_SOURCE
> || match_flags == XT_MULTIPORT_DESTINATION
The patch looks good besides the fancy formatting above. I'm not too
much of a fan of the existing formatting, but please keep it consistent.
* [PATCH-update][RFC] net: consolidated UDP / UDP-Lite code
2006-09-19 7:25 ` [PATCHv3 1/4][RFC] net/ipv4: consolidated UDP / UDP-Lite code Gerrit Renker
@ 2006-10-09 9:51 ` Gerrit Renker
2006-10-11 2:38 ` David Miller
2006-10-12 7:49 ` Gerrit Renker
0 siblings, 2 replies; 19+ messages in thread
From: Gerrit Renker @ 2006-10-09 9:51 UTC (permalink / raw)
To: davem; +Cc: netdev
Hi David,
this is a maintenance update of the UDP-Lite code. It tracks recent changes to udp.c
which make it difficult to apply earlier versions of the patch. For consistency, the
code has now also been aligned to use __be32/16.
The patch addresses the suggestions made earlier by the following people on this list:
Patrick McHardy, Herbert Xu, James Morris, Arnaldo de Melo, Yoshifuji Hideaki,
and David S. Miller.
Being small, I send it in one piece (but could dice it up); the code comprises:
* self-contained UDP-Lite extension (udplite.c and include/net/udplite.h)
* consolidated checksumming for UDPv4/6 and UDP-Litev4/6
--UDPv{4,6} checksumming therefore simpler
--enforces the mandatory UDPv6 checksums (RFC 2460, sec. 8.1)
* fixes the bug stated on http://bugzilla.kernel.org/show_bug.cgi?id=6660
( i.e. correct counting of delivered/erroneous datagrams )
* concentrates some shared code which previously re-appeared in different files
* basic xfrm/netfilter support
I have left the v6-side sitting in the drawer. But it is easy to add, since it mirrors
the structure of the v4-side - and asking whether this structure is ok with regard
to the existing code is the whole point of this patch.
- Gerrit
--
Documentation/networking/udplite.txt | 291 ++++++++++++++++++++
include/linux/in.h | 1
include/linux/socket.h | 1
include/linux/udp.h | 11
include/net/snmp.h | 2
include/net/udp.h | 95 ++++++
include/net/udplite.h | 86 ++++++
include/net/xfrm.h | 2
net/ipv4/af_inet.c | 9
net/ipv4/netfilter/ipt_LOG.c | 11
net/ipv4/proc.c | 12
net/ipv4/udp.c | 487 +++++++++++++++++++++--------------
net/ipv4/udplite.c | 186 +++++++++++++
net/ipv4/xfrm4_policy.c | 1
net/ipv6/netfilter/ip6t_LOG.c | 10
net/ipv6/udp.c | 60 ----
net/ipv6/xfrm6_policy.c | 1
net/netfilter/xt_multiport.c | 5
net/netfilter/xt_tcpudp.c | 20 +
19 files changed, 1040 insertions(+), 251 deletions(-)
diff --git a/Documentation/networking/udplite.txt b/Documentation/networking/udplite.txt
new file mode 100644
index 0000000..a899fa1
--- /dev/null
+++ b/Documentation/networking/udplite.txt
@@ -0,0 +1,291 @@
+ ===========================================================================
+ The UDP-Lite protocol (RFC 3828)
+ ===========================================================================
+ last modified: Mon 18th September 2006
+
+
+ UDP-Lite is a Standards-Track IETF transport protocol whose characteristic
+ is a variable-length checksum. This has advantages for transport of multimedia
+ (video, VoIP) over wireless networks, as partly damaged packets can still be
+ fed into the codec instead of being discarded due to a failed checksum test.
+
+ This file briefly describes the existing kernel support and the socket API.
+ For in-depth information, you can consult:
+
+ o The UDP-Lite Homepage: http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/
+ From here you can also download the latest patch for the stable
+ kernel tree and some example application source code.
+
+ o The UDP-Lite HOWTO on
+ http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/UDP-Lite-HOWTO.txt
+
+ o The Wireshark UDP-Lite WiKi (with capture files):
+ http://wiki.wireshark.org/Lightweight_User_Datagram_Protocol
+
+ o The Protocol Spec, RFC 3828, on http://www.ietf.org/rfc/rfc3828.txt
+
+
+ I) APPLICATIONS
+
+ Several applications have been ported successfully to UDP-Lite. Ethereal
+ (now called wireshark) has UDP-Litev4/v6 support by default. The tarball on
+
+ http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/udplite_linux.tar.gz
+
+ has source code for several v4/v6 client-server and network testing examples.
+
+ Porting applications to UDP-Lite is straightforward: only socket level and
+ IPPROTO need to be changed; senders additionally set the checksum coverage
+ length (default = header length = 8). Details are in the next section.
+ UDP-Lite is not enabled by default: set CONFIG_IP_UDPLITE=y to support it.
+
+
+ II) PROGRAMMING API
+
+ UDP-Lite provides a connectionless, unreliable datagram service and hence
+ uses the same socket type as UDP. In fact, porting from UDP to UDP-Lite is
+ dead easy: simply add `IPPROTO_UDPLITE' as the last argument of the socket(2)
+ call so that the statement looks like:
+
+ s = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDPLITE);
+
+ or, respectively,
+
+ s = socket(PF_INET6, SOCK_DGRAM, IPPROTO_UDPLITE);
+
+ Since both UDP-Litev4 and UDP-Litev6 are supported, the porting process is the
+ same in both cases. With just this change you are able to run UDP-Lite
+ services or connect to UDP-Lite servers. The kernel will assume that you are
+ not interested in using partial checksum coverage and so emulate UDP mode.
+
+ To make use of the partial checksum coverage facilities requires setting just
+ one socket option which takes an integer specifying the coverage length:
+
+ * Sender checksum coverage: UDPLITE_SEND_CSCOV
+
+ For example,
+
+ int val = 20;
+ setsockopt(s, SOL_UDPLITE, UDPLITE_SEND_CSCOV, &val, sizeof(int));
+
+ sets the checksum coverage length to 20 bytes (12b data + 8b header).
+ Of each packet only the first 20 bytes (plus the pseudo-header) will be
+ checksummed. This is useful for RTP applications which have a 12-byte
+ base header.
+
+
+ * Receiver checksum coverage: UDPLITE_RECV_CSCOV
+
+ This option is the receiver-side analogue. It is truly optional, i.e. not
+ required to enable traffic with partial checksum coverage. Its function is
+ that of a traffic filter: when enabled, it instructs the kernel to drop
+ all packets which have a coverage _less_ than this value. For example, if
+ RTP and UDP headers are to be protected, a receiver can enforce that only
+ packets with a minimum coverage of 20 are admitted:
+
+ int min = 20;
+ setsockopt(s, SOL_UDPLITE, UDPLITE_RECV_CSCOV, &min, sizeof(int));
+
+ The calls to getsockopt(2) are analogous. Being an extension and not a stand-
+ alone protocol, all socket options known from UDP can be used in exactly the
+ same manner as before, e.g. UDP_CORK or UDP_ENCAP.
+
+ A detailed discussion of UDP-Lite checksum coverage options is in section IV.
+
+
+
+ III) HEADER FILES
+
+ The socket API requires support through header files in /usr/include:
+
+ * /usr/include/netinet/in.h
+ to define IPPROTO_UDPLITE
+
+ * /usr/include/netinet/udplite.h
+ for UDP-Lite header fields and protocol constants
+
+ For testing purposes, the following can serve as a `mini' header file:
+
+ #define IPPROTO_UDPLITE 136
+ #define SOL_UDPLITE 136
+ #define UDPLITE_SEND_CSCOV 10
+ #define UDPLITE_RECV_CSCOV 11
+
+ Ready-made header files for various distros are in the UDP-Lite tarball.
+
+
+
+ IV) KERNEL BEHAVIOUR WITH REGARD TO THE VARIOUS SOCKET OPTIONS
+
+ To enable debugging messages, the log level must be set to 8, as most
+ messages use the KERN_DEBUG level (7).
+
+
+ 1) Sender Socket Options
+
+ If the sender specifies a value of 0 as coverage length, the module
+ assumes full coverage and transmits a packet with a coverage length of 0
+ and the corresponding checksum. If the sender specifies a coverage < 8 and
+ different from 0, the kernel assumes 8 as default value. Finally,
+ if the specified coverage length exceeds the packet length, the packet
+ length is used instead as coverage length.
+
+
+ 2) Receiver Socket Options
+
+ The receiver specifies the minimum value of the coverage length it
+ is willing to accept. A value of 0 here indicates that the receiver
+ always wants the whole of the packet covered. In this case, all
+ partially covered packets are dropped and an error is logged.
+
+ It is not possible to specify illegal values (negative, or from 1 to 7); in these
+ cases the default of 8 is assumed.
+
+ All packets arriving with a coverage value less than the specified
+ threshold are discarded; these events are also logged.
+
+
+ 3) Disabling the Checksum Computation
+
+ On both sender and receiver, checksumming will always be performed
+ and can not be disabled using SO_NO_CHECK. Thus
+
+ setsockopt(sockfd, SOL_SOCKET, SO_NO_CHECK, ... );
+
+ will always be ignored, while the value of
+
+ getsockopt(sockfd, SOL_SOCKET, SO_NO_CHECK, &value, ...);
+
+ is meaningless (as in TCP). Packets with a zero checksum field are
+ illegal (cf. RFC 3828, sec. 3.1) and will be silently discarded.
+
+
+ 4) Fragmentation
+
+ The checksum computation respects both buffersize and MTU. The size
+ of UDP-Lite packets is determined by the size of the send buffer. The
+ minimum size of the send buffer is 2048 (defined as SOCK_MIN_SNDBUF
+ in include/net/sock.h), the default value is configurable as
+ net.core.wmem_default or via setting the SO_SNDBUF socket(7)
+ option. The maximum upper bound for the send buffer is determined
+ by net.core.wmem_max.
+
+ Given a payload size larger than the send buffer size, UDP-Lite will
+ split the payload into several individual packets, filling up the
+ send buffer size in each case.
+
+ The precise value also depends on the interface MTU. The interface MTU,
+ in turn, may trigger IP fragmentation. In this case, the generated
+ UDP-Lite packet is split into several IP packets, of which only the
+ first one contains the L4 header.
+
+ The send buffer size has implications on the checksum coverage length.
+ Consider the following example:
+
+ Payload: 1536 bytes Send Buffer: 1024 bytes
+ MTU: 1500 bytes Coverage Length: 856 bytes
+
+ UDP-Lite will ship the 1536 bytes in two separate packets:
+
+ Packet 1: 1024 payload + 8 byte header + 20 byte IP header = 1052 bytes
+ Packet 2: 512 payload + 8 byte header + 20 byte IP header = 540 bytes
+
+ The coverage length covers the UDP-Lite header and 848 bytes of the
+ payload in the first packet; the second packet is fully covered. Note
+ that for the second packet, the coverage length exceeds the packet
+ length. The kernel always re-adjusts the coverage length to the packet
+ length in such cases.
+
+ To see what happens when one UDP-Lite packet is split into
+ several tiny fragments, consider the following example.
+
+ Payload: 1024 bytes Send buffer size: 1024 bytes
+ MTU: 300 bytes Coverage length: 575 bytes
+
+ +-+-----------+--------------+--------------+--------------+
+ |8| 272 | 280 | 280 | 192 |
+ +-+-----------+--------------+--------------+--------------+
+ 280 560 840 1032
+ ^
+ *****checksum coverage*************
+
+ The UDP-Lite module generates one 1032 byte packet (1024 + 8 byte
+ header). According to the interface MTU, this is split into 4 IP
+ packets (up to 280 byte IP payload + 20 byte IP header each). The kernel
+ module sums the contents of the entire first two packets, plus 15 bytes
+ of the third packet, before releasing the fragments to the IP module.
+
+ To see the analogous case for IPv6 fragmentation, consider a link
+ MTU of 1280 bytes and a write buffer of 3356 bytes. If the checksum
+ coverage is less than 1232 bytes (MTU minus IPv6/fragment header
+ lengths), only the first fragment needs to be considered. When using
+ larger checksum coverage lengths, each eligible fragment needs to be
+ checksummed. Suppose we have a checksum coverage of 3062. The buffer
+ of 3356 bytes will be split into the following fragments:
+
+ Fragment 1: 1280 bytes carrying 1232 bytes of UDP-Lite data
+ Fragment 2: 1280 bytes carrying 1232 bytes of UDP-Lite data
+ Fragment 3: 948 bytes carrying 900 bytes of UDP-Lite data
+
+ The first two fragments have to be checksummed in full; of the last
+ fragment, only 598 (= 3062 - 2*1232) bytes are checksummed.
+
+ While it is important that such cases are dealt with correctly, they
+ are (annoyingly) rare: UDP-Lite is designed for optimising multimedia
+ performance over wireless (or generally noisy) links and thus smaller
+ coverage lengths are to be expected.
+
+
+ V) UDP-LITE RUNTIME STATISTICS AND THEIR MEANING
+
+ Exceptional and error conditions are logged to syslog at the KERN_DEBUG
+ level. Live statistics about UDP-Lite are available in /proc/net/snmp
+ and can (with newer versions of netstat) be viewed using
+
+ netstat -svu
+
+ This displays UDP-Lite statistics variables, whose meaning is as follows.
+
+ InDatagrams: Total number of received datagrams.
+
+ NoPorts: Number of packets received to an unknown port.
+ These cases are counted separately (not as InErrors).
+
+ InErrors: Number of erroneous UDP-Lite packets. Errors include:
+ * internal socket queue receive errors
+ * packet too short (less than 8 bytes or stated
+ coverage length exceeds received length)
+ * xfrm4_policy_check() returned with error
+ * application has specified larger min. coverage
+ length than that of incoming packet
+ * checksum coverage violated
+ * bad checksum
+
+ OutDatagrams: Total number of sent datagrams.
+
+ These statistics derive from the UDP MIB (RFC 2013).
+
+
+ VI) IPTABLES
+
+ There is packet match support for UDP-Lite as well as support for the LOG target.
+ If you copy and paste the following line into /etc/protocols,
+
+ udplite 136 UDP-Lite # UDP-Lite [RFC 3828]
+
+ then
+ iptables -A INPUT -p udplite -j LOG
+
+ will produce logging output to syslog. Dropping and rejecting packets also works.
+
+
+ VII) MAINTAINER ADDRESS
+
+ The UDP-Lite patch was developed at
+ University of Aberdeen
+ Electronics Research Group
+ Department of Engineering
+ Fraser Noble Building
+ Aberdeen AB24 3UE; UK
+ The current maintainer is Gerrit Renker, <gerrit@erg.abdn.ac.uk>. Initial
+ code had been developed by William Stanislaus, <william@erg.abdn.ac.uk>.
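The sender-side rules from section IV.1 of the documentation above can be
sketched as a small pure function. This is an illustration of the documented
behaviour, not the kernel code itself; the names UDPLITE_HDR_LEN and
udplite_effective_send_cov() are hypothetical. It returns the number of bytes
that end up being checksummed for a packet of a given length (header included).

```c
#include <assert.h>

#define UDPLITE_HDR_LEN 8	/* hypothetical name for the 8-byte header */

/*
 * Number of bytes actually checksummed for one outgoing packet.
 *   requested  - value passed via UDPLITE_SEND_CSCOV
 *   packet_len - UDP-Lite header plus payload, in bytes
 */
static int udplite_effective_send_cov(int requested, int packet_len)
{
	assert(packet_len >= UDPLITE_HDR_LEN);

	if (requested == 0)			/* 0 means full coverage */
		return packet_len;
	if (requested < UDPLITE_HDR_LEN)	/* too small: default of 8 */
		requested = UDPLITE_HDR_LEN;
	if (requested > packet_len)		/* clamp to packet length */
		return packet_len;
	return requested;
}
```

With the first fragmentation example from section IV.4 (coverage length 856),
the function yields 856 for the 1032-byte first packet and 520 for the
520-byte second packet, i.e. the second packet is fully covered.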
diff --git a/include/linux/in.h b/include/linux/in.h
index 2619859..1912e7c 100644
--- a/include/linux/in.h
+++ b/include/linux/in.h
@@ -45,6 +45,7 @@ enum {
IPPROTO_COMP = 108, /* Compression Header protocol */
IPPROTO_SCTP = 132, /* Stream Control Transport Protocol */
+ IPPROTO_UDPLITE = 136, /* UDP-Lite (RFC 3828) */
IPPROTO_RAW = 255, /* Raw IP packets */
IPPROTO_MAX
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 3614090..592b666 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -264,6 +264,7 @@ #define SOL_UDP 17
#define SOL_IPV6 41
#define SOL_ICMPV6 58
#define SOL_SCTP 132
+#define SOL_UDPLITE 136 /* UDP-Lite (RFC 3828) */
#define SOL_RAW 255
#define SOL_IPX 256
#define SOL_AX25 257
diff --git a/include/linux/udp.h b/include/linux/udp.h
index 014b41d..1248668 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -50,12 +50,23 @@ struct udp_sock {
* when the socket is uncorked.
*/
__u16 len; /* total length of pending frames */
+ /*
+ * Fields specific to UDP-Lite.
+ */
+ __u16 pcslen;
+ __u16 pcrlen;
+/* indicator bits used by pcflag: */
+#define UDPLITE_BIT 0x1 /* set by udplite proto init function */
+#define UDPLITE_SEND_CC 0x2 /* set via udplite setsockopt */
+#define UDPLITE_RECV_CC 0x4 /* set via udplite setsockopt */
+ __u8 pcflag; /* marks socket as UDP-Lite if > 0 */
};
static inline struct udp_sock *udp_sk(const struct sock *sk)
{
return (struct udp_sock *)sk;
}
+#define IS_UDPLITE(__sk) (udp_sk(__sk)->pcflag)
#endif
diff --git a/include/net/snmp.h b/include/net/snmp.h
index 464970e..34183aa 100644
--- a/include/net/snmp.h
+++ b/include/net/snmp.h
@@ -131,6 +131,8 @@ #define SNMP_INC_STATS(mib, field) \
(per_cpu_ptr(mib[!in_softirq()], raw_smp_processor_id())->mibs[field]++)
#define SNMP_DEC_STATS(mib, field) \
(per_cpu_ptr(mib[!in_softirq()], raw_smp_processor_id())->mibs[field]--)
+#define SNMP_DEC_STATS_BH(mib, field) \
+ (per_cpu_ptr(mib[0], raw_smp_processor_id())->mibs[field]--)
#define SNMP_ADD_STATS_BH(mib, field, addend) \
(per_cpu_ptr(mib[0], raw_smp_processor_id())->mibs[field] += addend)
#define SNMP_ADD_STATS_USER(mib, field, addend) \
diff --git a/include/net/udp.h b/include/net/udp.h
index db0c05f..06476e3 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -26,9 +26,31 @@ #include <linux/list.h>
#include <net/inet_sock.h>
#include <net/sock.h>
#include <net/snmp.h>
+#include <net/ip.h>
+#include <linux/ipv6.h>
#include <linux/seq_file.h>
#define UDP_HTABLE_SIZE 128
+#include <net/udplite.h>
+
+/**
+ * struct udp_skb_cb - UDP(-Lite) private variables
+ *
+ * @header: private variables used by IPv4/IPv6
+ * @cscov: checksum coverage length (UDP-Lite only)
+ * @partial_cov: if set indicates partial csum coverage
+ */
+struct udp_skb_cb {
+ union {
+ struct inet_skb_parm h4;
+#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
+ struct inet6_skb_parm h6;
+#endif
+ } header;
+ __u16 cscov;
+ __u8 partial_cov;
+};
+#define UDP_SKB_CB(__skb) ((struct udp_skb_cb *)((__skb)->cb))
extern struct hlist_head udp_hash[UDP_HTABLE_SIZE];
extern rwlock_t udp_hash_lock;
@@ -47,6 +69,62 @@ extern struct proto udp_prot;
struct sk_buff;
+/*
+ * Generic checksumming routines for UDP(-Lite) v4 and v6
+ */
+static inline u16 __udp_lib_checksum_complete(struct sk_buff *skb)
+{
+ if (! UDP_SKB_CB(skb)->partial_cov)
+ return __skb_checksum_complete(skb);
+ return csum_fold(skb_checksum(skb, 0, UDP_SKB_CB(skb)->cscov,
+ skb->csum));
+}
+
+static __inline__ int udp_checksum_complete(struct sk_buff *skb)
+{
+ return skb->ip_summed != CHECKSUM_UNNECESSARY &&
+ __udp_lib_checksum_complete(skb);
+}
+
+/**
+ * udp_csum_outgoing - compute UDPv4/v6 checksum over fragments
+ * @sk: socket we are writing to
+ * @skb: sk_buff containing the filled-in UDP header
+ * (checksum field must be zeroed out)
+ */
+static inline u32 udp_csum_outgoing(struct sock *sk, struct sk_buff *skb)
+{
+ u32 csum = csum_partial(skb->h.raw, sizeof(struct udphdr), 0);
+
+ skb_queue_walk(&sk->sk_write_queue, skb) {
+ csum = csum_add(csum, skb->csum);
+ }
+ return csum;
+}
+
+/* hash routines shared between UDPv4/6 and UDP-Litev4/6 */
+static inline void udp_lib_hash(struct sock *sk)
+{
+ BUG();
+}
+
+static inline void udp_lib_unhash(struct sock *sk)
+{
+ write_lock_bh(&udp_hash_lock);
+ if (sk_del_node_init(sk)) {
+ inet_sk(sk)->num = 0;
+ sock_prot_dec_use(sk->sk_prot);
+ }
+ write_unlock_bh(&udp_hash_lock);
+}
+
+static inline void udp_lib_close(struct sock *sk, long timeout)
+{
+ sk_common_release(sk);
+}
+
+
+/* net/ipv4/udp.c */
extern int udp_get_port(struct sock *sk, unsigned short snum,
int (*saddr_cmp)(const struct sock *, const struct sock *));
extern void udp_err(struct sk_buff *, u32);
@@ -61,21 +139,32 @@ extern unsigned int udp_poll(struct file
poll_table *wait);
DECLARE_SNMP_STAT(struct udp_mib, udp_statistics);
-#define UDP_INC_STATS(field) SNMP_INC_STATS(udp_statistics, field)
-#define UDP_INC_STATS_BH(field) SNMP_INC_STATS_BH(udp_statistics, field)
-#define UDP_INC_STATS_USER(field) SNMP_INC_STATS_USER(udp_statistics, field)
+/*
+ * SNMP statistics for UDP and UDP-Lite
+ */
+#define UDP_INC_STATS_USER(field, is_udplite) do { \
+ if (is_udplite) SNMP_INC_STATS_USER(udplite_statistics, field); \
+ else SNMP_INC_STATS_USER(udp_statistics, field); } while(0)
+#define UDP_INC_STATS_BH(field, is_udplite) do { \
+ if (is_udplite) SNMP_INC_STATS_BH(udplite_statistics, field); \
+ else SNMP_INC_STATS_BH(udp_statistics, field); } while(0)
+#define UDP_DEC_STATS_BH(field, is_udplite) do { \
+ if (is_udplite) SNMP_DEC_STATS_BH(udplite_statistics, field); \
+ else SNMP_DEC_STATS_BH(udp_statistics, field); } while(0)
/* /proc */
struct udp_seq_afinfo {
struct module *owner;
char *name;
sa_family_t family;
+ struct hlist_head *hashtable;
int (*seq_show) (struct seq_file *m, void *v);
struct file_operations *seq_fops;
};
struct udp_iter_state {
sa_family_t family;
+ struct hlist_head *hashtable;
int bucket;
struct seq_operations seq_ops;
};
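For readers unfamiliar with partial checksum coverage, the idea behind
__udp_lib_checksum_complete() above can be illustrated in user space. The
sketch below is not the kernel routine: it is a plain RFC 1071 Internet
checksum restricted to the first cscov bytes of a buffer, it leaves out the
pseudo-header, and the function name inet_csum_partial_cov() is hypothetical.
As in UDP-Lite, a coverage of 0 is taken to mean "whole packet".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Ones' complement checksum (RFC 1071) over the first cscov bytes of
 * pkt. A cscov of 0, or one exceeding pkt_len, covers the whole buffer.
 * The pseudo-header is deliberately omitted in this sketch.
 */
static uint16_t inet_csum_partial_cov(const uint8_t *pkt, size_t pkt_len,
				      size_t cscov)
{
	uint32_t sum = 0;
	size_t i;

	if (cscov == 0 || cscov > pkt_len)
		cscov = pkt_len;

	/* sum complete 16-bit words in network byte order */
	for (i = 0; i + 1 < cscov; i += 2)
		sum += ((uint32_t)pkt[i] << 8) | pkt[i + 1];

	/* an odd trailing byte is padded with a zero byte */
	if (i < cscov)
		sum += (uint32_t)pkt[i] << 8;

	/* fold the carries back into the low 16 bits */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

Bytes beyond the coverage length do not influence the result, which is why a
receiver can still accept a packet whose uncovered part was damaged in transit.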
diff --git a/include/net/udplite.h b/include/net/udplite.h
new file mode 100644
index 0000000..90d7aec
--- /dev/null
+++ b/include/net/udplite.h
@@ -0,0 +1,86 @@
+/*
+ * Definitions for the UDP-Lite (RFC 3828) code.
+ */
+#ifndef _UDPLITE_H
+#define _UDPLITE_H
+
+/* UDP-Lite socket options */
+#define UDPLITE_SEND_CSCOV 10 /* sender partial coverage (as sent) */
+#define UDPLITE_RECV_CSCOV 11 /* receiver partial coverage (threshold ) */
+
+extern struct proto udplite_prot;
+extern struct hlist_head udplite_hash[UDP_HTABLE_SIZE];
+
+/* UDP-Lite does not have a standardized MIB yet, so we inherit from UDP */
+DECLARE_SNMP_STAT(struct udp_mib, udplite_statistics);
+
+/*
+ * Checksum computation is all in software, hence simpler getfrag.
+ */
+static __inline__ int udplite_getfrag(void *from, char *to, int offset,
+ int len, int odd, struct sk_buff *skb)
+{
+ return memcpy_fromiovecend(to, (struct iovec *) from, offset, len);
+}
+
+/*
+ * Functions used by UDP-Litev4 and UDP-Litev6
+ */
+/* calculate checksum coverage set for outgoing packets */
+static inline int udplite_sender_cscov(struct udp_sock *up, struct udphdr *uh)
+{
+ int cscov = up->len;
+
+ /*
+ * Sender has set `partial coverage' option on UDP-Lite socket
+ */
+ if (up->pcflag & UDPLITE_SEND_CC) {
+ if (up->pcslen < up->len) {
+ /* up->pcslen == 0 means that full coverage is required,
+ * partial coverage only if 0 < up->pcslen < up->len */
+ if (0 < up->pcslen) {
+ cscov = up->pcslen;
+ }
+ uh->len = htons(up->pcslen);
+ }
+ /*
+ * NOTE: Causes for the error case `up->pcslen > up->len':
+ * (i) Application error (will not be penalized).
+ * (ii) Payload too big for send buffer: data is split
+ * into several packets, each with its own header.
+ * In this case (e.g. last segment), coverage may
+ * exceed packet length.
+ * Since packets with coverage length > packet length are
+ * illegal, we fall back to the defaults here.
+ */
+ }
+ return cscov;
+}
+
+static inline u32 udplite_csum_outgoing(struct sock *sk, int cscov)
+{
+ struct sk_buff *skb;
+ int off, len;
+ u32 csum = 0;
+
+ skb_queue_walk(&sk->sk_write_queue, skb) {
+ off = skb->h.raw - skb->data;
+ len = skb->len - off;
+
+ csum = skb_checksum(skb, off, (cscov > len)? len : cscov, csum);
+
+ if ((cscov -= len) <= 0)
+ break;
+ }
+ return csum;
+}
+
+/*
+ * net/ipv4/udplite.c
+ */
+extern void udplite4_register(void);
+extern int udplite_get_port(struct sock *sk, unsigned short snum,
+ int (*scmp)(const struct sock *, const struct sock *));
+extern int udplite_checksum_init(struct sk_buff *skb, struct udphdr *uh,
+ u16 len, u32 saddr, u32 daddr );
+#endif /* _UDPLITE_H */
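The loop in udplite_csum_outgoing() above spreads the coverage length over the
queued fragments. The following user-space sketch reproduces only that
arithmetic, under the assumption that each fragment is described by the number
of UDP-Lite bytes it carries; the function name udplite_cov_per_fragment() is
hypothetical and no actual checksumming is done.

```c
#include <assert.h>
#include <stddef.h>

/*
 * For a coverage length cscov and nfrags fragments carrying frag_len[i]
 * bytes of UDP-Lite data each, store in covered[i] how many bytes of
 * fragment i fall under the checksum.
 */
static void udplite_cov_per_fragment(int cscov, const int *frag_len,
				     size_t nfrags, int *covered)
{
	size_t i;

	assert(cscov >= 0);

	for (i = 0; i < nfrags; i++) {
		/* take as much of this fragment as the coverage allows */
		int take = cscov > frag_len[i] ? frag_len[i] : cscov;

		covered[i] = take;
		cscov -= take;	/* remaining coverage for later fragments */
	}
}
```

For the IPv6 example in the documentation (coverage 3062, fragments carrying
1232, 1232 and 900 bytes) this gives 1232, 1232 and 598 bytes. For the IPv4
example (coverage 575, fragments carrying 280, 280, 280 and 192 bytes) it gives
280, 280, 15 and 0 bytes.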
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 1e2a4dd..2bd23cd 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -468,6 +468,7 @@ __be16 xfrm_flowi_sport(struct flowi *fl
switch(fl->proto) {
case IPPROTO_TCP:
case IPPROTO_UDP:
+ case IPPROTO_UDPLITE:
case IPPROTO_SCTP:
port = fl->fl_ip_sport;
break;
@@ -493,6 +494,7 @@ __be16 xfrm_flowi_dport(struct flowi *fl
switch(fl->proto) {
case IPPROTO_TCP:
case IPPROTO_UDP:
+ case IPPROTO_UDPLITE:
case IPPROTO_SCTP:
port = fl->fl_ip_dport;
break;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index edcf093..2b997b1 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1223,10 +1223,13 @@ static int __init init_ipv4_mibs(void)
tcp_statistics[1] = alloc_percpu(struct tcp_mib);
udp_statistics[0] = alloc_percpu(struct udp_mib);
udp_statistics[1] = alloc_percpu(struct udp_mib);
+ udplite_statistics[0] = alloc_percpu(struct udp_mib);
+ udplite_statistics[1] = alloc_percpu(struct udp_mib);
if (!
(net_statistics[0] && net_statistics[1] && ip_statistics[0]
&& ip_statistics[1] && tcp_statistics[0] && tcp_statistics[1]
- && udp_statistics[0] && udp_statistics[1]))
+ && udp_statistics[0] && udp_statistics[1]
+ && udplite_statistics[0] && udplite_statistics[1] ) )
return -ENOMEM;
(void) tcp_mib_init();
@@ -1313,6 +1316,10 @@ #endif
/* Setup TCP slab cache for open requests. */
tcp_init();
+ /*
+ * Add UDP-Lite (RFC 3828)
+ */
+ udplite4_register();
/*
* Set the ICMP layer up
diff --git a/net/ipv4/netfilter/ipt_LOG.c b/net/ipv4/netfilter/ipt_LOG.c
index 7dc820d..46eee64 100644
--- a/net/ipv4/netfilter/ipt_LOG.c
+++ b/net/ipv4/netfilter/ipt_LOG.c
@@ -171,11 +171,15 @@ static void dump_packet(const struct nf_
}
break;
}
- case IPPROTO_UDP: {
+ case IPPROTO_UDP:
+ case IPPROTO_UDPLITE: {
struct udphdr _udph, *uh;
- /* Max length: 10 "PROTO=UDP " */
- printk("PROTO=UDP ");
+ if (ih->protocol == IPPROTO_UDP)
+ /* Max length: 10 "PROTO=UDP " */
+ printk("PROTO=UDP " );
+ else /* Max length: 14 "PROTO=UDPLITE " */
+ printk("PROTO=UDPLITE ");
if (ntohs(ih->frag_off) & IP_OFFSET)
break;
@@ -341,6 +345,7 @@ static void dump_packet(const struct nf_
/* IP: 40+46+6+11+127 = 230 */
/* TCP: 10+max(25,20+30+13+9+32+11+127) = 252 */
/* UDP: 10+max(25,20) = 35 */
+ /* UDPLITE: 14+max(25,20) = 39 */
/* ICMP: 11+max(25, 18+25+max(19,14,24+3+n+10,3+n+10)) = 91+n */
/* ESP: 10+max(25)+15 = 50 */
/* AH: 9+max(25)+15 = 49 */
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 9c6cbe3..9b72fe4 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -66,6 +66,7 @@ static int sockstat_seq_show(struct seq_
tcp_death_row.tw_count, atomic_read(&tcp_sockets_allocated),
atomic_read(&tcp_memory_allocated));
seq_printf(seq, "UDP: inuse %d\n", fold_prot_inuse(&udp_prot));
+ seq_printf(seq, "UDPLITE: inuse %d\n", fold_prot_inuse(&udplite_prot));
seq_printf(seq, "RAW: inuse %d\n", fold_prot_inuse(&raw_prot));
seq_printf(seq, "FRAG: inuse %d memory %d\n", ip_frag_nqueues,
atomic_read(&ip_frag_mem));
@@ -304,6 +305,17 @@ static int snmp_seq_show(struct seq_file
fold_field((void **) udp_statistics,
snmp4_udp_list[i].entry));
+ /* the UDP and UDP-Lite MIBs are the same */
+ seq_puts(seq, "\nUdpLite:");
+ for (i = 0; snmp4_udp_list[i].name != NULL; i++)
+ seq_printf(seq, " %s", snmp4_udp_list[i].name);
+
+ seq_puts(seq, "\nUdpLite:");
+ for (i = 0; snmp4_udp_list[i].name != NULL; i++)
+ seq_printf(seq, " %lu",
+ fold_field((void **) udplite_statistics,
+ snmp4_udp_list[i].entry) );
+
seq_putc(seq, '\n');
return 0;
}
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 865d752..47bfbf3 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -92,10 +92,8 @@ #include <linux/errno.h>
#include <linux/timer.h>
#include <linux/mm.h>
#include <linux/inet.h>
-#include <linux/ipv6.h>
#include <linux/netdevice.h>
#include <net/snmp.h>
-#include <net/ip.h>
#include <net/tcp_states.h>
#include <net/protocol.h>
#include <linux/skbuff.h>
@@ -120,26 +118,29 @@ DEFINE_RWLOCK(udp_hash_lock);
static int udp_port_rover;
-static inline int udp_lport_inuse(u16 num)
+static inline int __udp_lib_lport_inuse(__be16 num, struct hlist_head udptable[])
{
struct sock *sk;
struct hlist_node *node;
- sk_for_each(sk, node, &udp_hash[num & (UDP_HTABLE_SIZE - 1)])
+ sk_for_each(sk, node, &udptable[num & (UDP_HTABLE_SIZE - 1)])
if (inet_sk(sk)->num == num)
return 1;
return 0;
}
/**
- * udp_get_port - common port lookup for IPv4 and IPv6
+ * __udp_lib_get_port - UDP/-Lite port lookup for IPv4 and IPv6
*
* @sk: socket struct in question
* @snum: port number to look up
+ * @udptable: hash list table, must be of UDP_HTABLE_SIZE
+ * @port_rover: pointer to record of last unallocated port
* @saddr_comp: AF-dependent comparison of bound local IP addresses
*/
-int udp_get_port(struct sock *sk, unsigned short snum,
- int (*saddr_cmp)(const struct sock *sk1, const struct sock *sk2))
+static int __udp_lib_get_port(struct sock *sk, unsigned short snum,
+ struct hlist_head udptable[], int *port_rover,
+ int (*saddr_cmp)(const struct sock *, const struct sock *))
{
struct hlist_node *node;
struct hlist_head *head;
@@ -150,15 +151,15 @@ int udp_get_port(struct sock *sk, unsign
if (snum == 0) {
int best_size_so_far, best, result, i;
- if (udp_port_rover > sysctl_local_port_range[1] ||
- udp_port_rover < sysctl_local_port_range[0])
- udp_port_rover = sysctl_local_port_range[0];
+ if (*port_rover > sysctl_local_port_range[1] ||
+ *port_rover < sysctl_local_port_range[0])
+ *port_rover = sysctl_local_port_range[0];
best_size_so_far = 32767;
- best = result = udp_port_rover;
+ best = result = *port_rover;
for (i = 0; i < UDP_HTABLE_SIZE; i++, result++) {
int size;
- head = &udp_hash[result & (UDP_HTABLE_SIZE - 1)];
+ head = &udptable[result & (UDP_HTABLE_SIZE - 1)];
if (hlist_empty(head)) {
if (result > sysctl_local_port_range[1])
result = sysctl_local_port_range[0] +
@@ -179,15 +180,15 @@ int udp_get_port(struct sock *sk, unsign
result = sysctl_local_port_range[0]
+ ((result - sysctl_local_port_range[0]) &
(UDP_HTABLE_SIZE - 1));
- if (!udp_lport_inuse(result))
+ if (! __udp_lib_lport_inuse(result, udptable))
break;
}
if (i >= (1 << 16) / UDP_HTABLE_SIZE)
goto fail;
gotit:
- udp_port_rover = snum = result;
+ *port_rover = snum = result;
} else {
- head = &udp_hash[snum & (UDP_HTABLE_SIZE - 1)];
+ head = &udptable[snum & (UDP_HTABLE_SIZE - 1)];
sk_for_each(sk2, node, head)
if (inet_sk(sk2)->num == snum &&
@@ -200,7 +201,7 @@ gotit:
}
inet_sk(sk)->num = snum;
if (sk_unhashed(sk)) {
- head = &udp_hash[snum & (UDP_HTABLE_SIZE - 1)];
+ head = &udptable[snum & (UDP_HTABLE_SIZE - 1)];
sk_add_node(sk, head);
sock_prot_inc_use(sk->sk_prot);
}
@@ -210,6 +211,12 @@ fail:
return error;
}
+__inline__ int udp_get_port(struct sock *sk, unsigned short snum,
+ int (*scmp)(const struct sock *, const struct sock *))
+{
+ return __udp_lib_get_port(sk, snum, udp_hash, &udp_port_rover, scmp);
+}
+
static inline int ipv4_rcv_saddr_equal(const struct sock *sk1, const struct sock *sk2)
{
struct inet_sock *inet1 = inet_sk(sk1), *inet2 = inet_sk(sk2);
@@ -224,34 +231,20 @@ static inline int udp_v4_get_port(struct
return udp_get_port(sk, snum, ipv4_rcv_saddr_equal);
}
-
-static void udp_v4_hash(struct sock *sk)
-{
- BUG();
-}
-
-static void udp_v4_unhash(struct sock *sk)
-{
- write_lock_bh(&udp_hash_lock);
- if (sk_del_node_init(sk)) {
- inet_sk(sk)->num = 0;
- sock_prot_dec_use(sk->sk_prot);
- }
- write_unlock_bh(&udp_hash_lock);
-}
-
/* UDP is nearly always wildcards out the wazoo, it makes no sense to try
* harder than this. -DaveM
*/
-static struct sock *udp_v4_lookup_longway(__be32 saddr, __be16 sport,
- __be32 daddr, __be16 dport, int dif)
+static struct sock *__udp4_lib_lookup(__be32 saddr, __be16 sport,
+ __be32 daddr, __be16 dport,
+ int dif, struct hlist_head udptable[])
{
struct sock *sk, *result = NULL;
struct hlist_node *node;
unsigned short hnum = ntohs(dport);
int badness = -1;
- sk_for_each(sk, node, &udp_hash[hnum & (UDP_HTABLE_SIZE - 1)]) {
+ read_lock(&udp_hash_lock);
+ sk_for_each(sk, node, &udptable[hnum & (UDP_HTABLE_SIZE - 1)]) {
struct inet_sock *inet = inet_sk(sk);
if (inet->num == hnum && !ipv6_only_sock(sk)) {
@@ -285,20 +278,16 @@ static struct sock *udp_v4_lookup_longwa
}
}
}
+ if (result)
+ sock_hold(result);
+ read_unlock(&udp_hash_lock);
return result;
}
static __inline__ struct sock *udp_v4_lookup(__be32 saddr, __be16 sport,
__be32 daddr, __be16 dport, int dif)
{
- struct sock *sk;
-
- read_lock(&udp_hash_lock);
- sk = udp_v4_lookup_longway(saddr, sport, daddr, dport, dif);
- if (sk)
- sock_hold(sk);
- read_unlock(&udp_hash_lock);
- return sk;
+ return __udp4_lib_lookup(saddr, sport, daddr, dport, dif, udp_hash);
}
static inline struct sock *udp_v4_mcast_next(struct sock *sk,
@@ -340,7 +329,8 @@ found:
* to find the appropriate port.
*/
-void udp_err(struct sk_buff *skb, u32 info)
+static void __udp4_lib_err(struct sk_buff *skb, u32 info,
+ struct hlist_head udptable[] )
{
struct inet_sock *inet;
struct iphdr *iph = (struct iphdr*)skb->data;
@@ -351,7 +341,8 @@ void udp_err(struct sk_buff *skb, u32 in
int harderr;
int err;
- sk = udp_v4_lookup(iph->daddr, uh->dest, iph->saddr, uh->source, skb->dev->ifindex);
+ sk = __udp4_lib_lookup(iph->daddr, uh->dest, iph->saddr, uh->source,
+ skb->dev->ifindex, udptable );
if (sk == NULL) {
ICMP_INC_STATS_BH(ICMP_MIB_INERRORS);
return; /* No socket for error */
@@ -405,6 +396,11 @@ out:
sock_put(sk);
}
+__inline__ void udp_err(struct sk_buff *skb, u32 info)
+{
+ __udp4_lib_err(skb, info, udp_hash);
+}
+
/*
* Throw away all pending data and cancel the corking. Socket is locked.
*/
@@ -419,6 +415,45 @@ static void udp_flush_pending_frames(str
}
}
+/**
+ * udp4_hwcsum_outgoing - handle outgoing HW checksumming
+ * @sk: socket we are sending on
+ * @skb: sk_buff containing the filled-in UDP header
+ * (checksum field must be zeroed out)
+ */
+static void udp4_hwcsum_outgoing(struct sock *sk, struct sk_buff *skb,
+ __be32 src, __be32 dst, int len )
+{
+ unsigned int csum = 0, offset;
+ struct udphdr *uh = skb->h.uh;
+
+ if (skb_queue_len(&sk->sk_write_queue) == 1) {
+ /*
+ * Only one fragment on the socket.
+ */
+ skb->csum = offsetof(struct udphdr, check);
+ uh->check = ~csum_tcpudp_magic(src, dst, len, IPPROTO_UDP, 0);
+ } else {
+ /*
+ * HW-checksum won't work as there are two or more
+ * fragments on the socket so that all csums of sk_buffs
+ * should be together
+ */
+ offset = skb->h.raw - skb->data;
+ skb->csum = skb_checksum(skb, offset, skb->len - offset, 0);
+
+ skb->ip_summed = CHECKSUM_NONE;
+
+ skb_queue_walk(&sk->sk_write_queue, skb) {
+ csum = csum_add(csum, skb->csum);
+ }
+
+ uh->check = csum_tcpudp_magic(src, dst, len, IPPROTO_UDP, csum);
+ if (uh->check == 0)
+ uh->check = -1;
+ }
+}
+
/*
* Push out all pending data as one UDP datagram. Socket is locked.
*/
@@ -429,6 +464,7 @@ static int udp_push_pending_frames(struc
struct sk_buff *skb;
struct udphdr *uh;
int err = 0;
+ u32 csum = 0;
/* Grab the skbuff where UDP header space exists. */
if ((skb = skb_peek(&sk->sk_write_queue)) == NULL)
@@ -443,52 +479,31 @@ static int udp_push_pending_frames(struc
uh->len = htons(up->len);
uh->check = 0;
- if (sk->sk_no_check == UDP_CSUM_NOXMIT) {
+ if (up->pcflag) { /* UDP-Lite */
+ int cscov = udplite_sender_cscov(up, uh);
+
+ csum = udplite_csum_outgoing(sk, cscov);
+ skb->ip_summed = CHECKSUM_NONE;
+
+ } else if (sk->sk_no_check == UDP_CSUM_NOXMIT) { /* UDP csum disabled */
+
skb->ip_summed = CHECKSUM_NONE;
goto send;
- }
- if (skb_queue_len(&sk->sk_write_queue) == 1) {
- /*
- * Only one fragment on the socket.
- */
- if (skb->ip_summed == CHECKSUM_PARTIAL) {
- skb->csum = offsetof(struct udphdr, check);
- uh->check = ~csum_tcpudp_magic(fl->fl4_src, fl->fl4_dst,
- up->len, IPPROTO_UDP, 0);
- } else {
- skb->csum = csum_partial((char *)uh,
- sizeof(struct udphdr), skb->csum);
- uh->check = csum_tcpudp_magic(fl->fl4_src, fl->fl4_dst,
- up->len, IPPROTO_UDP, skb->csum);
- if (uh->check == 0)
- uh->check = -1;
- }
- } else {
- unsigned int csum = 0;
- /*
- * HW-checksum won't work as there are two or more
- * fragments on the socket so that all csums of sk_buffs
- * should be together.
- */
- if (skb->ip_summed == CHECKSUM_PARTIAL) {
- int offset = (unsigned char *)uh - skb->data;
- skb->csum = skb_checksum(skb, offset, skb->len - offset, 0);
+ } else if (skb->ip_summed == CHECKSUM_PARTIAL) { /* UDP hardware csum */
- skb->ip_summed = CHECKSUM_NONE;
- } else {
- skb->csum = csum_partial((char *)uh,
- sizeof(struct udphdr), skb->csum);
- }
+ udp4_hwcsum_outgoing(sk, skb, fl->fl4_src,fl->fl4_dst, up->len);
+ goto send;
+
+ } else /* `normal' UDP */
+ csum = udp_csum_outgoing(sk, skb);
+
+ /* add protocol-dependent pseudo-header */
+ uh->check = csum_tcpudp_magic(fl->fl4_src, fl->fl4_dst, up->len,
+ sk->sk_protocol, csum );
+ if (uh->check == 0)
+ uh->check = -1;
- skb_queue_walk(&sk->sk_write_queue, skb) {
- csum = csum_add(csum, skb->csum);
- }
- uh->check = csum_tcpudp_magic(fl->fl4_src, fl->fl4_dst,
- up->len, IPPROTO_UDP, csum);
- if (uh->check == 0)
- uh->check = -1;
- }
send:
err = ip_push_pending_frames(sk);
out:
@@ -497,12 +512,6 @@ out:
return err;
}
-
-static unsigned short udp_check(struct udphdr *uh, int len, __be32 saddr, __be32 daddr, unsigned long base)
-{
- return(csum_tcpudp_magic(saddr, daddr, len, IPPROTO_UDP, base));
-}
-
int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
size_t len)
{
@@ -516,8 +525,9 @@ int udp_sendmsg(struct kiocb *iocb, stru
__be32 daddr, faddr, saddr;
__be16 dport;
u8 tos;
- int err;
+ int err, is_udplite = up->pcflag;
int corkreq = up->corkflag || msg->msg_flags&MSG_MORE;
+ int (*getfrag)(void *, char *, int, int, int, struct sk_buff *);
if (len > 0xFFFF)
return -EMSGSIZE;
@@ -622,7 +632,7 @@ int udp_sendmsg(struct kiocb *iocb, stru
{ .daddr = faddr,
.saddr = saddr,
.tos = tos } },
- .proto = IPPROTO_UDP,
+ .proto = sk->sk_protocol,
.uli_u = { .ports =
{ .sport = inet->sport,
.dport = dport } } };
@@ -668,8 +678,9 @@ back_from_confirm:
do_append_data:
up->len += ulen;
- err = ip_append_data(sk, ip_generic_getfrag, msg->msg_iov, ulen,
- sizeof(struct udphdr), &ipc, rt,
+ getfrag = is_udplite ? udplite_getfrag : ip_generic_getfrag;
+ err = ip_append_data(sk, getfrag, msg->msg_iov, ulen,
+ sizeof(struct udphdr), &ipc, rt,
corkreq ? msg->msg_flags|MSG_MORE : msg->msg_flags);
if (err)
udp_flush_pending_frames(sk);
@@ -684,7 +695,7 @@ out:
if (free)
kfree(ipc.opt);
if (!err) {
- UDP_INC_STATS_USER(UDP_MIB_OUTDATAGRAMS);
+ UDP_INC_STATS_USER(UDP_MIB_OUTDATAGRAMS, is_udplite);
return len;
}
/*
@@ -695,7 +706,7 @@ out:
* seems like overkill.
*/
if (err == -ENOBUFS || test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
- UDP_INC_STATS_USER(UDP_MIB_SNDBUFERRORS);
+ UDP_INC_STATS_USER(UDP_MIB_SNDBUFERRORS, is_udplite);
}
return err;
@@ -795,17 +806,6 @@ int udp_ioctl(struct sock *sk, int cmd,
return(0);
}
-static __inline__ int __udp_checksum_complete(struct sk_buff *skb)
-{
- return __skb_checksum_complete(skb);
-}
-
-static __inline__ int udp_checksum_complete(struct sk_buff *skb)
-{
- return skb->ip_summed != CHECKSUM_UNNECESSARY &&
- __udp_checksum_complete(skb);
-}
-
/*
* This should be easy, if there is something there we
* return it, otherwise we block.
@@ -817,7 +817,7 @@ static int udp_recvmsg(struct kiocb *ioc
struct inet_sock *inet = inet_sk(sk);
struct sockaddr_in *sin = (struct sockaddr_in *)msg->msg_name;
struct sk_buff *skb;
- int copied, err;
+ int copied, err, copy_only, is_udplite = IS_UDPLITE(sk);
/*
* Check any passed addresses
@@ -839,15 +839,25 @@ try_again:
msg->msg_flags |= MSG_TRUNC;
}
- if (skb->ip_summed==CHECKSUM_UNNECESSARY) {
- err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
- copied);
- } else if (msg->msg_flags&MSG_TRUNC) {
- if (__udp_checksum_complete(skb))
+ /*
+ * Decide whether to checksum and/or copy data.
+ *
+ * UDP: checksum may have been computed in HW,
+ * (re-)compute it if message is truncated.
+ * UDP-Lite: always needs to checksum, no HW support.
+ */
+ copy_only = (skb->ip_summed==CHECKSUM_UNNECESSARY);
+
+ if (is_udplite || (!copy_only && msg->msg_flags&MSG_TRUNC)) {
+ if (__udp_lib_checksum_complete(skb))
goto csum_copy_err;
- err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
- copied);
- } else {
+ copy_only = 1;
+ }
+
+ if (copy_only)
+ err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr),
+ msg->msg_iov, copied );
+ else {
err = skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov);
if (err == -EINVAL)
@@ -880,7 +890,8 @@ out:
return err;
csum_copy_err:
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite);
+ UDP_DEC_STATS_BH(UDP_MIB_INDATAGRAMS, is_udplite);
skb_kill_datagram(sk, skb, flags);
@@ -912,11 +923,6 @@ int udp_disconnect(struct sock *sk, int
return 0;
}
-static void udp_close(struct sock *sk, long timeout)
-{
- sk_common_release(sk);
-}
-
/* return:
* 1 if the the UDP system should process it
* 0 if we should drop this packet
@@ -1021,10 +1027,8 @@ static int udp_queue_rcv_skb(struct sock
/*
* Charge it to the socket, dropping if the queue is full.
*/
- if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb)) {
- kfree_skb(skb);
- return -1;
- }
+ if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb))
+ goto drop;
nf_reset(skb);
if (up->encap_type) {
@@ -1048,31 +1052,77 @@ static int udp_queue_rcv_skb(struct sock
if (ret < 0) {
/* process the ESP packet */
ret = xfrm4_rcv_encap(skb, up->encap_type);
- UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS);
+ UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS, up->pcflag);
return -ret;
}
/* FALLTHROUGH -- it's a UDP Packet */
}
- if (sk->sk_filter && skb->ip_summed != CHECKSUM_UNNECESSARY) {
- if (__udp_checksum_complete(skb)) {
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
- kfree_skb(skb);
- return -1;
+ /*
+ * UDP-Lite specific tests, ignored on UDP sockets
+ */
+ if ((up->pcflag & UDPLITE_RECV_CC) && UDP_SKB_CB(skb)->partial_cov) {
+
+ /*
+ * MIB statistics other than incrementing the error count are
+ * disabled for the following two types of errors: these depend
+ * on the application settings, not on the functioning of the
+ * protocol stack as such.
+ *
+ *
+ * RFC 3828 here recommends (sec 3.3): "There should also be a
+ * way ... to ... at least let the receiving application block
+ * delivery of packets with coverage values less than a value
+ * provided by the application."
+ */
+ if (up->pcrlen == 0) { /* full coverage was set */
+ LIMIT_NETDEBUG(KERN_WARNING "UDPLITE: partial coverage "
+ "%d while full coverage %d requested\n",
+ UDP_SKB_CB(skb)->cscov, skb->len);
+ goto drop;
+ }
+ /* The next case involves violating the min. coverage requested
+ * by the receiver. This is subtle: if receiver wants x and x is
+ * greater than the buffersize/MTU then receiver will complain
+ * that it wants x while sender emits packets of smaller size y.
+ * Therefore the above ...()->partial_cov statement is essential.
+ */
+ if (UDP_SKB_CB(skb)->cscov < up->pcrlen) {
+ LIMIT_NETDEBUG(KERN_WARNING
+ "UDPLITE: coverage %d too small, need min %d\n",
+ UDP_SKB_CB(skb)->cscov, up->pcrlen);
+ goto drop;
}
+ }
+
+ if (sk->sk_filter && skb->ip_summed != CHECKSUM_UNNECESSARY) {
+ if (__udp_lib_checksum_complete(skb))
+ goto drop;
skb->ip_summed = CHECKSUM_UNNECESSARY;
}
if ((rc = sock_queue_rcv_skb(sk,skb)) < 0) {
/* Note that an ENOMEM error is charged twice */
if (rc == -ENOMEM)
- UDP_INC_STATS_BH(UDP_MIB_RCVBUFERRORS);
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
- kfree_skb(skb);
- return -1;
+ UDP_INC_STATS_BH(UDP_MIB_RCVBUFERRORS, up->pcflag);
+ goto drop;
}
- UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS);
+
+ /*
+ * Incrementing this counter when the datagram is later taken
+ * off the queue (e.g. due to receive failure) is problematic, cf.
+ * http://bugzilla.kernel.org/show_bug.cgi?id=6660
+ * This module counts correctly by decrementing InDatagrams
+ * whenever the datagram is popped off a queue without being
+ * actually delivered: see udp_recvmsg() and udp_poll().
+ */
+ UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS, up->pcflag);
return 0;
+
+drop:
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS, up->pcflag);
+ kfree_skb(skb);
+ return -1;
}
/*
@@ -1081,14 +1131,16 @@ static int udp_queue_rcv_skb(struct sock
* Note: called only from the BH handler context,
* so we don't need to lock the hashes.
*/
-static int udp_v4_mcast_deliver(struct sk_buff *skb, struct udphdr *uh,
- __be32 saddr, __be32 daddr)
+static int __udp4_lib_mcast_deliver(struct sk_buff *skb,
+ struct udphdr *uh,
+ __be32 saddr, __be32 daddr,
+ struct hlist_head udptable[])
{
struct sock *sk;
int dif;
read_lock(&udp_hash_lock);
- sk = sk_head(&udp_hash[ntohs(uh->dest) & (UDP_HTABLE_SIZE - 1)]);
+ sk = sk_head(&udptable[ntohs(uh->dest) & (UDP_HTABLE_SIZE - 1)]);
dif = skb->dev->ifindex;
sk = udp_v4_mcast_next(sk, uh->dest, daddr, uh->source, saddr, dif);
if (sk) {
@@ -1117,6 +1169,12 @@ static int udp_v4_mcast_deliver(struct s
return 0;
}
+static __inline__ int udp_v4_mcast_deliver(struct sk_buff *skb, struct udphdr *uh,
+ __be32 saddr, __be32 daddr )
+{
+ return __udp4_lib_mcast_deliver(skb, uh, saddr, daddr, udp_hash);
+}
+
/* Initialize UDP checksum. If exited with zero value (success),
* CHECKSUM_UNNECESSARY means, that no more checks are required.
* Otherwise, csum completion requires chacksumming packet body,
@@ -1128,7 +1186,7 @@ static void udp_checksum_init(struct sk_
if (uh->check == 0) {
skb->ip_summed = CHECKSUM_UNNECESSARY;
} else if (skb->ip_summed == CHECKSUM_COMPLETE) {
- if (!udp_check(uh, ulen, saddr, daddr, skb->csum))
+ if (!csum_tcpudp_magic(saddr,daddr,ulen, IPPROTO_UDP, skb->csum))
skb->ip_summed = CHECKSUM_UNNECESSARY;
}
if (skb->ip_summed != CHECKSUM_UNNECESSARY)
@@ -1136,16 +1194,20 @@ static void udp_checksum_init(struct sk_
/* Probably, we should checksum udp header (it should be in cache
* in any case) and data in tiny packets (< rx copybreak).
*/
+
+ /* UDP = UDP-Lite with a non-partial checksum coverage */
+ UDP_SKB_CB(skb)->partial_cov = 0;
}
/*
* All we need to do is get the socket, and then do a checksum.
*/
-int udp_rcv(struct sk_buff *skb)
+static int __udp4_lib_rcv(struct sk_buff *skb,
+ struct hlist_head udptable[], int is_udplite)
{
struct sock *sk;
- struct udphdr *uh;
+ struct udphdr *uh = skb->h.uh;
unsigned short ulen;
struct rtable *rt = (struct rtable*)skb->dst;
__be32 saddr = skb->nh.iph->saddr;
@@ -1153,34 +1215,40 @@ int udp_rcv(struct sk_buff *skb)
int len = skb->len;
/*
- * Validate the packet and the UDP length.
+ * Validate the packet.
*/
if (!pskb_may_pull(skb, sizeof(struct udphdr)))
- goto no_header;
-
- uh = skb->h.uh;
+ goto drop; /* No space for header. */
ulen = ntohs(uh->len);
-
- if (ulen > len || ulen < sizeof(*uh))
+ if (ulen > len)
goto short_packet;
- if (pskb_trim_rcsum(skb, ulen))
- goto short_packet;
+ if(! is_udplite ) { /* UDP validates ulen. */
- udp_checksum_init(skb, uh, ulen, saddr, daddr);
+ if (ulen < sizeof(*uh) || pskb_trim_rcsum(skb, ulen))
+ goto short_packet;
+
+ /* note the difference: UDP uses ulen, UDP-Lite uses len */
+ udp_checksum_init(skb, uh, ulen, saddr, daddr);
+
+ } else { /* UDP-Lite validates cscov. */
+ if (! udplite_checksum_init(skb, uh, len, saddr, daddr))
+ goto csum_error;
+ }
if(rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
- return udp_v4_mcast_deliver(skb, uh, saddr, daddr);
+ return __udp4_lib_mcast_deliver(skb, uh, saddr, daddr, udptable);
- sk = udp_v4_lookup(saddr, uh->source, daddr, uh->dest, skb->dev->ifindex);
+ sk = __udp4_lib_lookup(saddr, uh->source, daddr, uh->dest,
+ skb->dev->ifindex, udptable );
if (sk != NULL) {
int ret = udp_queue_rcv_skb(sk, skb);
sock_put(sk);
/* a return value > 0 means to resubmit the input, but
- * it it wants the return to be -protocol, or 0
+ * it wants the return to be -protocol, or 0
*/
if (ret > 0)
return -ret;
@@ -1195,7 +1263,7 @@ int udp_rcv(struct sk_buff *skb)
if (udp_checksum_complete(skb))
goto csum_error;
- UDP_INC_STATS_BH(UDP_MIB_NOPORTS);
+ UDP_INC_STATS_BH(UDP_MIB_NOPORTS, is_udplite);
icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
/*
@@ -1206,35 +1274,39 @@ int udp_rcv(struct sk_buff *skb)
return(0);
short_packet:
- LIMIT_NETDEBUG(KERN_DEBUG "UDP: short packet: From %u.%u.%u.%u:%u %d/%d to %u.%u.%u.%u:%u\n",
+ LIMIT_NETDEBUG(KERN_DEBUG "UDP%s: short packet: From %u.%u.%u.%u:%u %d/%d to %u.%u.%u.%u:%u\n",
+ is_udplite? "-Lite" : "",
NIPQUAD(saddr),
ntohs(uh->source),
ulen,
len,
NIPQUAD(daddr),
ntohs(uh->dest));
-no_header:
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
- kfree_skb(skb);
- return(0);
+ goto drop;
csum_error:
/*
* RFC1122: OK. Discards the bad packet silently (as far as
* the network is concerned, anyway) as per 4.1.3.4 (MUST).
*/
- LIMIT_NETDEBUG(KERN_DEBUG "UDP: bad checksum. From %d.%d.%d.%d:%d to %d.%d.%d.%d:%d ulen %d\n",
+ LIMIT_NETDEBUG(KERN_DEBUG "UDP%s: bad checksum. From %d.%d.%d.%d:%d to %d.%d.%d.%d:%d ulen %d\n",
+ is_udplite? "-Lite" : "",
NIPQUAD(saddr),
ntohs(uh->source),
NIPQUAD(daddr),
ntohs(uh->dest),
ulen);
drop:
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite);
kfree_skb(skb);
return(0);
}
+__inline__ int udp_rcv(struct sk_buff *skb)
+{
+ return __udp4_lib_rcv(skb, udp_hash, 0);
+}
+
static int udp_destroy_sock(struct sock *sk)
{
lock_sock(sk);
@@ -1284,6 +1356,32 @@ static int do_udp_setsockopt(struct sock
}
break;
+ /*
+ * UDP-Lite's partial checksum coverage (RFC 3828).
+ */
+ /* The sender sets the actual checksum coverage length via this option.
+ * The case coverage > packet length is handled by the send module. */
+ case UDPLITE_SEND_CSCOV:
+ if (!up->pcflag) /* Disable the option on UDP sockets */
+ return -ENOPROTOOPT;
+ if (val != 0 && val < 8) /* Illegal coverage: use default (8) */
+ val = 8;
+ up->pcslen = val;
+ up->pcflag |= UDPLITE_SEND_CC;
+ break;
+
+ /* The receiver specifies a minimum checksum coverage value. To make
+ * sense, this should be set to at least 8 (as done below). If zero is
+ * used, this again means full checksum coverage. */
+ case UDPLITE_RECV_CSCOV:
+ if (!up->pcflag) /* Disable the option on UDP sockets */
+ return -ENOPROTOOPT;
+ if (val != 0 && val < 8) /* Avoid silly minimal values. */
+ val = 8;
+ up->pcrlen = val;
+ up->pcflag |= UDPLITE_RECV_CC;
+ break;
+
default:
err = -ENOPROTOOPT;
break;
@@ -1295,18 +1393,18 @@ static int do_udp_setsockopt(struct sock
static int udp_setsockopt(struct sock *sk, int level, int optname,
char __user *optval, int optlen)
{
- if (level != SOL_UDP)
- return ip_setsockopt(sk, level, optname, optval, optlen);
- return do_udp_setsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udp_setsockopt(sk, level, optname, optval, optlen);
+ return ip_setsockopt(sk, level, optname, optval, optlen);
}
#ifdef CONFIG_COMPAT
static int compat_udp_setsockopt(struct sock *sk, int level, int optname,
char __user *optval, int optlen)
{
- if (level != SOL_UDP)
- return compat_ip_setsockopt(sk, level, optname, optval, optlen);
- return do_udp_setsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udp_setsockopt(sk, level, optname, optval, optlen);
+ return compat_ip_setsockopt(sk, level, optname, optval, optlen);
}
#endif
@@ -1333,6 +1431,16 @@ static int do_udp_getsockopt(struct sock
val = up->encap_type;
break;
+ /* The following two cannot be changed on UDP sockets; the return is
+ * always 0 (which corresponds to the full checksum coverage of UDP). */
+ case UDPLITE_SEND_CSCOV:
+ val = up->pcslen;
+ break;
+
+ case UDPLITE_RECV_CSCOV:
+ val = up->pcrlen;
+ break;
+
default:
return -ENOPROTOOPT;
};
@@ -1347,18 +1455,18 @@ static int do_udp_getsockopt(struct sock
static int udp_getsockopt(struct sock *sk, int level, int optname,
char __user *optval, int __user *optlen)
{
- if (level != SOL_UDP)
- return ip_getsockopt(sk, level, optname, optval, optlen);
- return do_udp_getsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udp_getsockopt(sk, level, optname, optval, optlen);
+ return ip_getsockopt(sk, level, optname, optval, optlen);
}
#ifdef CONFIG_COMPAT
static int compat_udp_getsockopt(struct sock *sk, int level, int optname,
char __user *optval, int __user *optlen)
{
- if (level != SOL_UDP)
- return compat_ip_getsockopt(sk, level, optname, optval, optlen);
- return do_udp_getsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udp_getsockopt(sk, level, optname, optval, optlen);
+ return compat_ip_getsockopt(sk, level, optname, optval, optlen);
}
#endif
/**
@@ -1378,7 +1486,8 @@ unsigned int udp_poll(struct file *file,
{
unsigned int mask = datagram_poll(file, sock, wait);
struct sock *sk = sock->sk;
-
+ int is_lite = IS_UDPLITE(sk);
+
/* Check for false positives due to checksum errors */
if ( (mask & POLLRDNORM) &&
!(file->f_flags & O_NONBLOCK) &&
@@ -1389,7 +1498,11 @@ unsigned int udp_poll(struct file *file,
spin_lock_bh(&rcvq->lock);
while ((skb = skb_peek(rcvq)) != NULL) {
if (udp_checksum_complete(skb)) {
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
+ /* The datagram was already counted as
+ * InDatagram when it was enqueued earlier.
+ * Correct the count of datagrams actually received. */
+ UDP_DEC_STATS_BH(UDP_MIB_INDATAGRAMS, is_lite);
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_lite);
__skb_unlink(skb, rcvq);
kfree_skb(skb);
} else {
@@ -1411,7 +1524,7 @@ unsigned int udp_poll(struct file *file,
struct proto udp_prot = {
.name = "UDP",
.owner = THIS_MODULE,
- .close = udp_close,
+ .close = udp_lib_close,
.connect = ip4_datagram_connect,
.disconnect = udp_disconnect,
.ioctl = udp_ioctl,
@@ -1422,8 +1535,8 @@ struct proto udp_prot = {
.recvmsg = udp_recvmsg,
.sendpage = udp_sendpage,
.backlog_rcv = udp_queue_rcv_skb,
- .hash = udp_v4_hash,
- .unhash = udp_v4_unhash,
+ .hash = udp_lib_hash,
+ .unhash = udp_lib_unhash,
.get_port = udp_v4_get_port,
.obj_size = sizeof(struct udp_sock),
#ifdef CONFIG_COMPAT
@@ -1442,7 +1555,7 @@ static struct sock *udp_get_first(struct
for (state->bucket = 0; state->bucket < UDP_HTABLE_SIZE; ++state->bucket) {
struct hlist_node *node;
- sk_for_each(sk, node, &udp_hash[state->bucket]) {
+ sk_for_each(sk, node, state->hashtable + state->bucket) {
if (sk->sk_family == state->family)
goto found;
}
@@ -1463,7 +1576,7 @@ try_again:
} while (sk && sk->sk_family != state->family);
if (!sk && ++state->bucket < UDP_HTABLE_SIZE) {
- sk = sk_head(&udp_hash[state->bucket]);
+ sk = sk_head(state->hashtable + state->bucket);
goto try_again;
}
return sk;
@@ -1513,6 +1626,7 @@ static int udp_seq_open(struct inode *in
if (!s)
goto out;
s->family = afinfo->family;
+ s->hashtable = afinfo->hashtable;
s->seq_ops.start = udp_seq_start;
s->seq_ops.next = udp_seq_next;
s->seq_ops.show = afinfo->seq_show;
@@ -1579,7 +1693,7 @@ static void udp4_format_sock(struct sock
atomic_read(&sp->sk_refcnt), sp);
}
-static int udp4_seq_show(struct seq_file *seq, void *v)
+int udp4_seq_show(struct seq_file *seq, void *v)
{
if (v == SEQ_START_TOKEN)
seq_printf(seq, "%-127s\n",
@@ -1602,6 +1716,7 @@ static struct udp_seq_afinfo udp4_seq_af
.owner = THIS_MODULE,
.name = "udp",
.family = AF_INET,
+ .hashtable = udp_hash,
.seq_show = udp4_seq_show,
.seq_fops = &udp4_seq_fops,
};
@@ -1630,3 +1745,5 @@ #ifdef CONFIG_PROC_FS
EXPORT_SYMBOL(udp_proc_register);
EXPORT_SYMBOL(udp_proc_unregister);
#endif
+/* the extensions for UDP-Lite (RFC 3828) */
+#include "udplite.c"
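
Note on the two new socket options: the interplay between UDPLITE_RECV_CSCOV in do_udp_setsockopt() and the coverage tests in udp_queue_rcv_skb() can be summarised by the userspace sketch below. It is a model under stated assumptions, not the kernel code: cscov_normalise and udplite_recv_policy_ok are hypothetical names, and the flag and length arguments stand in for the pcflag, pcrlen, partial_cov and cscov fields used in the patch.

```c
/* Hypothetical userspace model of the option handling; not kernel code. */

/* Value stored for UDPLITE_SEND_CSCOV / UDPLITE_RECV_CSCOV. */
static int cscov_normalise(int val)
{
	/* 0 requests full coverage; anything below 8 is raised to 8 */
	if (val != 0 && val < 8)
		return 8;
	return val;
}

/*
 * Returns 1 if the datagram may be queued, 0 if it must be dropped.
 *
 * recv_cc_set: the receiver has set UDPLITE_RECV_CSCOV
 * pcrlen:      normalised minimum coverage (0 = full coverage only)
 * partial:     the datagram arrived with partial coverage
 * cscov:       coverage of the datagram in bytes
 */
static int udplite_recv_policy_ok(int recv_cc_set, int pcrlen,
				  int partial, int cscov)
{
	/* fully covered datagrams and sockets without the option pass */
	if (!recv_cc_set || !partial)
		return 1;
	/* receiver asked for full coverage but got partial coverage */
	if (pcrlen == 0)
		return 0;
	/* coverage is below the minimum requested by the receiver */
	if (cscov < pcrlen)
		return 0;
	return 1;
}
```

The early return on !partial mirrors the partial_cov test in the patch: a fully covered datagram shorter than the requested minimum is still delivered.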
diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c
new file mode 100644
index 0000000..84d09f3
--- /dev/null
+++ b/net/ipv4/udplite.c
@@ -0,0 +1,186 @@
+/*
+ * UDPLITE An implementation of the UDP-Lite protocol (RFC 3828).
+ *
+ * Version: $Id: udplite.c,v 1.24 2006/09/18 21:50:59 gerrit Exp gerrit $
+ *
+ * Authors: Gerrit Renker <gerrit@erg.abdn.ac.uk>
+ *
+ * Changes:
+ * Fixes:
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+struct hlist_head udplite_hash[UDP_HTABLE_SIZE];
+static int udplite_port_rover;
+DEFINE_SNMP_STAT(struct udp_mib, udplite_statistics) __read_mostly;
+
+/* Designate sk as UDP-Lite socket */
+static inline int udplite_sk_init(struct sock *sk)
+{
+ udp_sk(sk)->pcflag = UDPLITE_BIT;
+ return 0;
+}
+
+__inline__ int udplite_get_port(struct sock *sk, unsigned short p,
+ int (*c)(const struct sock *, const struct sock *))
+{
+ return __udp_lib_get_port(sk, p, udplite_hash, &udplite_port_rover, c);
+}
+
+static __inline__ int udplite_v4_get_port(struct sock *sk, unsigned short snum)
+{
+ return udplite_get_port(sk, snum, ipv4_rcv_saddr_equal);
+}
+
+static __inline__ struct sock *udplite_v4_lookup(u32 saddr, u16 sport,
+ u32 daddr, u16 dport, int dif)
+{
+ return __udp4_lib_lookup(saddr, sport, daddr, dport, dif, udplite_hash);
+}
+
+static __inline__ int udplite_v4_mcast_deliver(struct sk_buff *skb,
+ struct udphdr *uh, u32 saddr, u32 daddr)
+{
+ return __udp4_lib_mcast_deliver(skb, uh, saddr, daddr, udplite_hash);
+}
+
+__inline__ void udplite_err(struct sk_buff *skb, u32 info)
+{
+ __udp4_lib_err(skb, info, udplite_hash);
+}
+
+int udplite_checksum_init(struct sk_buff *skb, struct udphdr *uh,
+ u16 len, u32 saddr, u32 daddr )
+{
+ u16 cscov;
+
+ /* In UDPv4 a zero checksum means that the transmitter generated no
+ * checksum. UDP-Lite (like IPv6) mandates checksums, hence packets
+ * with a zero checksum field are illegal. */
+ if (uh->check == 0) {
+ LIMIT_NETDEBUG(KERN_DEBUG "UDPLITE: zeroed csum field "
+ "(%d.%d.%d.%d:%d -> %d.%d.%d.%d:%d)\n", NIPQUAD(saddr),
+ ntohs(uh->source), NIPQUAD(daddr), ntohs(uh->dest) );
+ return 0;
+ }
+
+ UDP_SKB_CB(skb)->partial_cov = 0;
+ cscov = ntohs(uh->len);
+
+ if (cscov == 0) /* Indicates that full coverage is required. */
+ cscov = len;
+ else if (cscov < 8 || cscov > len) {
+ /*
+ * Coverage length violates RFC 3828: log and discard silently.
+ */
+ LIMIT_NETDEBUG(KERN_DEBUG "UDPLITE: bad csum coverage %d/%d "
+ "(%d.%d.%d.%d:%d -> %d.%d.%d.%d:%d)\n", cscov, len,
+ NIPQUAD(saddr), ntohs(uh->source),
+ NIPQUAD(daddr), ntohs(uh->dest) );
+ return 0;
+
+ } else if (cscov < len)
+ UDP_SKB_CB(skb)->partial_cov = 1;
+
+ UDP_SKB_CB(skb)->cscov = cscov;
+
+ /*
+ * Initialise pseudo-header for checksum computation.
+ *
+ * There is no known NIC manufacturer supporting UDP-Lite yet,
+ * hence ip_summed is always (re-)set to CHECKSUM_NONE.
+ */
+ skb->csum = csum_tcpudp_nofold(saddr, daddr, len, IPPROTO_UDPLITE, 0);
+ skb->ip_summed = CHECKSUM_NONE;
+
+ return 1;
+}
+
+__inline__ int udplite_rcv(struct sk_buff *skb)
+{
+ return __udp4_lib_rcv(skb, udplite_hash, 1);
+}
+
+struct proto udplite_prot = {
+ .name = "UDP-Lite",
+ .owner = THIS_MODULE,
+ .close = udp_lib_close,
+ .connect = ip4_datagram_connect,
+ .disconnect = udp_disconnect,
+ .ioctl = udp_ioctl,
+ .init = udplite_sk_init,
+ .destroy = udp_destroy_sock,
+ .setsockopt = udp_setsockopt,
+ .getsockopt = udp_getsockopt,
+ .sendmsg = udp_sendmsg,
+ .recvmsg = udp_recvmsg,
+ .sendpage = udp_sendpage,
+ .backlog_rcv = udp_queue_rcv_skb,
+ .hash = udp_lib_hash,
+ .unhash = udp_lib_unhash,
+ .get_port = udplite_v4_get_port,
+ .obj_size = sizeof(struct udp_sock),
+#ifdef CONFIG_COMPAT
+ .compat_setsockopt = compat_udp_setsockopt,
+ .compat_getsockopt = compat_udp_getsockopt,
+#endif
+};
+
+static struct net_protocol udplite_protocol = {
+ .handler = udplite_rcv,
+ .err_handler = udplite_err,
+ .no_policy = 1,
+};
+
+static struct inet_protosw udplite4_protosw = {
+ .type = SOCK_DGRAM,
+ .protocol = IPPROTO_UDPLITE,
+ .prot = &udplite_prot,
+ .ops = &inet_dgram_ops,
+ .capability = -1,
+ .no_check = 0, /* must checksum (RFC 3828) */
+ .flags = INET_PROTOSW_PERMANENT,
+};
+
+#ifdef CONFIG_PROC_FS
+static struct file_operations udplite4_seq_fops;
+static struct udp_seq_afinfo udplite4_seq_afinfo = {
+ .owner = THIS_MODULE,
+ .name = "udplite",
+ .family = AF_INET,
+ .hashtable = udplite_hash,
+ .seq_show = udp4_seq_show,
+ .seq_fops = &udplite4_seq_fops,
+};
+#endif /* CONFIG_PROC_FS */
+
+
+void __init udplite4_register(void)
+{
+ if (proto_register(&udplite_prot, 1))
+ goto out_register_err;
+
+ if (inet_add_protocol(&udplite_protocol, IPPROTO_UDPLITE) < 0)
+ goto out_unregister_proto;
+
+ inet_register_protosw(&udplite4_protosw);
+
+#ifdef CONFIG_PROC_FS
+ if (udp_proc_register(&udplite4_seq_afinfo)) /* udplite4_proc_init() */
+ printk(KERN_ERR "udplite4: Cannot register /proc!\n");
+#endif /* CONFIG_PROC_FS */
+ return;
+
+out_unregister_proto:
+ proto_unregister(&udplite_prot);
+out_register_err:
+ printk(KERN_CRIT "udplite4_register: Cannot add UDP-Lite protocol.\n");
+}
+
+EXPORT_SYMBOL(udplite_hash);
+EXPORT_SYMBOL(udplite_prot);
+EXPORT_SYMBOL(udplite_get_port); /* for v6 */
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 7a7a001..923a48b 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -199,6 +199,7 @@ _decode_session4(struct sk_buff *skb, st
if (!(iph->frag_off & htons(IP_MF | IP_OFFSET))) {
switch (iph->protocol) {
case IPPROTO_UDP:
+ case IPPROTO_UDPLITE:
case IPPROTO_TCP:
case IPPROTO_SCTP:
case IPPROTO_DCCP:
diff --git a/net/ipv6/netfilter/ip6t_LOG.c b/net/ipv6/netfilter/ip6t_LOG.c
index 0cf537d..3cb6bb7 100644
--- a/net/ipv6/netfilter/ip6t_LOG.c
+++ b/net/ipv6/netfilter/ip6t_LOG.c
@@ -270,11 +270,15 @@ static void dump_packet(const struct nf_
}
break;
}
- case IPPROTO_UDP: {
+ case IPPROTO_UDP:
+ case IPPROTO_UDPLITE: {
struct udphdr _udph, *uh;
- /* Max length: 10 "PROTO=UDP " */
- printk("PROTO=UDP ");
+ if (currenthdr == IPPROTO_UDP)
+ /* Max length: 10 "PROTO=UDP " */
+ printk("PROTO=UDP " );
+ else /* Max length: 14 "PROTO=UDPLITE " */
+ printk("PROTO=UDPLITE ");
if (fragment)
break;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index e0c3934..fc9761f 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -66,21 +66,6 @@ static inline int udp_v6_get_port(struct
return udp_get_port(sk, snum, ipv6_rcv_saddr_equal);
}
-static void udp_v6_hash(struct sock *sk)
-{
- BUG();
-}
-
-static void udp_v6_unhash(struct sock *sk)
-{
- write_lock_bh(&udp_hash_lock);
- if (sk_del_node_init(sk)) {
- inet_sk(sk)->num = 0;
- sock_prot_dec_use(sk->sk_prot);
- }
- write_unlock_bh(&udp_hash_lock);
-}
-
static struct sock *udp_v6_lookup(struct in6_addr *saddr, u16 sport,
struct in6_addr *daddr, u16 dport, int dif)
{
@@ -132,15 +117,6 @@ static struct sock *udp_v6_lookup(struct
}
/*
- *
- */
-
-static void udpv6_close(struct sock *sk, long timeout)
-{
- sk_common_release(sk);
-}
-
-/*
* This should be easy, if there is something there we
* return it, otherwise we block.
*/
@@ -499,35 +475,12 @@ static int udp_v6_push_pending_frames(st
uh->len = htons(up->len);
uh->check = 0;
- if (sk->sk_no_check == UDP_CSUM_NOXMIT) {
- skb->ip_summed = CHECKSUM_NONE;
- goto send;
- }
-
- if (skb_queue_len(&sk->sk_write_queue) == 1) {
- skb->csum = csum_partial((char *)uh,
- sizeof(struct udphdr), skb->csum);
- uh->check = csum_ipv6_magic(&fl->fl6_src,
- &fl->fl6_dst,
- up->len, fl->proto, skb->csum);
- } else {
- u32 tmp_csum = 0;
-
- skb_queue_walk(&sk->sk_write_queue, skb) {
- tmp_csum = csum_add(tmp_csum, skb->csum);
- }
- tmp_csum = csum_partial((char *)uh,
- sizeof(struct udphdr), tmp_csum);
- tmp_csum = csum_ipv6_magic(&fl->fl6_src,
- &fl->fl6_dst,
- up->len, fl->proto, tmp_csum);
- uh->check = tmp_csum;
-
- }
+ uh->check = csum_ipv6_magic(&fl->fl6_src, &fl->fl6_dst,
+ up->len, fl->proto,
+ udp_csum_outgoing(sk, skb) );
if (uh->check == 0)
uh->check = -1;
-send:
err = ip6_push_pending_frames(sk);
out:
up->len = 0;
@@ -1003,6 +956,7 @@ static struct udp_seq_afinfo udp6_seq_af
.owner = THIS_MODULE,
.name = "udp6",
.family = AF_INET6,
+ .hashtable = udp_hash,
.seq_show = udp6_seq_show,
.seq_fops = &udp6_seq_fops,
};
@@ -1022,7 +976,7 @@ #endif /* CONFIG_PROC_FS */
struct proto udpv6_prot = {
.name = "UDPv6",
.owner = THIS_MODULE,
- .close = udpv6_close,
+ .close = udp_lib_close,
.connect = ip6_datagram_connect,
.disconnect = udp_disconnect,
.ioctl = udp_ioctl,
@@ -1032,8 +986,8 @@ struct proto udpv6_prot = {
.sendmsg = udpv6_sendmsg,
.recvmsg = udpv6_recvmsg,
.backlog_rcv = udpv6_queue_rcv_skb,
- .hash = udp_v6_hash,
- .unhash = udp_v6_unhash,
+ .hash = udp_lib_hash,
+ .unhash = udp_lib_unhash,
.get_port = udp_v6_get_port,
.obj_size = sizeof(struct udp6_sock),
#ifdef CONFIG_COMPAT
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 6a252e2..cb352e3 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -272,6 +272,7 @@ _decode_session6(struct sk_buff *skb, st
break;
case IPPROTO_UDP:
+ case IPPROTO_UDPLITE:
case IPPROTO_TCP:
case IPPROTO_SCTP:
case IPPROTO_DCCP:
diff --git a/net/netfilter/xt_multiport.c b/net/netfilter/xt_multiport.c
index d3aefd3..c3b4bc0 100644
--- a/net/netfilter/xt_multiport.c
+++ b/net/netfilter/xt_multiport.c
@@ -1,5 +1,5 @@
-/* Kernel module to match one of a list of TCP/UDP/SCTP/DCCP ports: ports are in
- the same place so we can treat them as equal. */
+/* Kernel module to match one of a list of TCP/UDP(-Lite)/SCTP/DCCP ports:
+ ports are in the same place so we can treat them as equal. */
/* (C) 1999-2001 Paul `Rusty' Russell
* (C) 2002-2004 Netfilter Core Team <coreteam@netfilter.org>
@@ -162,6 +162,7 @@ check(u_int16_t proto,
{
/* Must specify supported protocol, no unknown flags or bad count */
return (proto == IPPROTO_TCP || proto == IPPROTO_UDP
+ || proto == IPPROTO_UDPLITE
|| proto == IPPROTO_SCTP || proto == IPPROTO_DCCP)
&& !(ip_invflags & XT_INV_PROTO)
&& (match_flags == XT_MULTIPORT_SOURCE
diff --git a/net/netfilter/xt_tcpudp.c b/net/netfilter/xt_tcpudp.c
index e76a68e..46414b5 100644
--- a/net/netfilter/xt_tcpudp.c
+++ b/net/netfilter/xt_tcpudp.c
@@ -10,7 +10,7 @@ #include <linux/netfilter/xt_tcpudp.h>
#include <linux/netfilter_ipv4/ip_tables.h>
#include <linux/netfilter_ipv6/ip6_tables.h>
-MODULE_DESCRIPTION("x_tables match for TCP and UDP, supports IPv4 and IPv6");
+MODULE_DESCRIPTION("x_tables match for TCP and UDP(-Lite), supports IPv4 and IPv6");
MODULE_LICENSE("GPL");
MODULE_ALIAS("xt_tcp");
MODULE_ALIAS("xt_udp");
@@ -234,6 +234,24 @@ static struct xt_match xt_tcpudp_match[]
.proto = IPPROTO_UDP,
.me = THIS_MODULE,
},
+ {
+ .name = "udplite",
+ .family = AF_INET,
+ .checkentry = udp_checkentry,
+ .match = udp_match,
+ .matchsize = sizeof(struct xt_udp),
+ .proto = IPPROTO_UDPLITE,
+ .me = THIS_MODULE,
+ },
+ {
+ .name = "udplite",
+ .family = AF_INET6,
+ .checkentry = udp_checkentry,
+ .match = udp_match,
+ .matchsize = sizeof(struct xt_udp),
+ .proto = IPPROTO_UDPLITE,
+ .me = THIS_MODULE,
+ },
};
static int __init xt_tcpudp_init(void)
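
The coverage checks performed by udplite_checksum_init() in the patch above can be restated as a small stand-alone sketch. This is only an illustration of the rules as the patch applies them (0 means full coverage, values below 8 or above the datagram length are rejected, anything else below the length is partial coverage). The names classify_cscov and enum cscov_verdict are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes, for illustration only. */
enum cscov_verdict {
	CSCOV_INVALID = 0,	/* violates RFC 3828, datagram is discarded */
	CSCOV_FULL    = 1,	/* checksum covers the whole datagram */
	CSCOV_PARTIAL = 2	/* checksum covers only a prefix */
};

/*
 * raw_cscov: coverage field from the header, in host byte order
 * len:       datagram length (UDP-Lite header plus payload) in bytes
 * effective: on success, receives the number of bytes actually covered
 */
static enum cscov_verdict classify_cscov(uint16_t raw_cscov, uint16_t len,
					 uint16_t *effective)
{
	assert(effective != NULL);

	if (raw_cscov == 0) {		/* 0 requests full coverage */
		*effective = len;
		return CSCOV_FULL;
	}
	if (raw_cscov < 8 || raw_cscov > len)	/* header must be covered */
		return CSCOV_INVALID;

	*effective = raw_cscov;
	return raw_cscov < len ? CSCOV_PARTIAL : CSCOV_FULL;
}
```

For example, a 100-byte datagram with a coverage field of 20 is classified as partial with 20 covered bytes, while a coverage field of 7 or 101 is rejected.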
* Re: [PATCH-update][RFC] net: consolidated UDP / UDP-Lite code
2006-10-09 9:51 ` [PATCH-update][RFC] net: " Gerrit Renker
@ 2006-10-11 2:38 ` David Miller
2006-10-11 7:40 ` Gerrit Renker
2006-10-12 7:49 ` Gerrit Renker
1 sibling, 1 reply; 19+ messages in thread
From: David Miller @ 2006-10-11 2:38 UTC (permalink / raw)
To: gerrit; +Cc: netdev
From: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Date: Mon, 9 Oct 2006 10:51:44 +0100
> csum_copy_err:
> - UDP_INC_STATS_BH(UDP_MIB_INERRORS);
> + UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite);
> + UDP_DEC_STATS_BH(UDP_MIB_INDATAGRAMS, is_udplite);
I'm not a big fan at all of these "statistic corrections"
we're starting to place in various spots.
I really don't think it's the end of the world if we count as
INDATAGRAMS a packet that we later discover has a bad checksum.
There are even some serious issues to consider because we might,
for example, count the INDATAGRAMS on a particular cpu, and then
find the checksum problem on another cpu and thus be subtracting
a different one of the per-cpu instances of this counter. That
could make it negative or similar.
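
To illustrate the per-cpu concern, here is a minimal user-space sketch. It assumes a simplified model in which each CPU owns one unsigned slot of a counter and a reader sums all slots. The names demo_counter, demo_inc, demo_dec and demo_read are hypothetical and are not kernel API.

```c
#include <assert.h>

#define DEMO_NR_CPUS 2	/* hypothetical CPU count for the demo */

/* One slot per CPU, a simplified stand-in for a per-cpu MIB counter. */
struct demo_counter {
	unsigned long slot[DEMO_NR_CPUS];
};

/* Increment the slot that belongs to the CPU doing the counting. */
static void demo_inc(struct demo_counter *c, int cpu)
{
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	c->slot[cpu]++;
}

/* Decrement the slot of the current CPU; wraps around if it was 0. */
static void demo_dec(struct demo_counter *c, int cpu)
{
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	c->slot[cpu]--;
}

/* Fold all per-cpu slots into one value (unsigned, so it may wrap). */
static unsigned long demo_read(const struct demo_counter *c)
{
	unsigned long sum = 0;
	int i;

	for (i = 0; i < DEMO_NR_CPUS; i++)
		sum += c->slot[i];
	return sum;
}
```

If a datagram is counted with demo_inc() on CPU 0 and the bad checksum is later handled with demo_dec() on CPU 1, slot 1 wraps below zero while slot 0 holds 1. The folded sum is still 0 in this model, but any code that inspects a single slot sees a nonsensical value.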
* Re: [PATCH-update][RFC] net: consolidated UDP / UDP-Lite code
2006-10-11 2:38 ` David Miller
@ 2006-10-11 7:40 ` Gerrit Renker
0 siblings, 0 replies; 19+ messages in thread
From: Gerrit Renker @ 2006-10-11 7:40 UTC (permalink / raw)
To: David Miller; +Cc: netdev
| > csum_copy_err:
| > - UDP_INC_STATS_BH(UDP_MIB_INERRORS);
| > + UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite);
| > + UDP_DEC_STATS_BH(UDP_MIB_INDATAGRAMS, is_udplite);
|
| I'm not a big fan at all of these "statistic corrections"
| we're starting to place in various spots.
I am not really fond of this solution either. It evolved via discussion, from a
previous suggestion to place the increment of InDatagrams into udp_recvmsg().
The problem with that alternative was in dealing with applications which use the
data_ready handler (such as sunrpc).
| I really don't think it's the end of the world if we count as
| INDATAGRAMS a packet that we later discover has a bad checksum.
It would have been nice to say "all these counters count correctly". Maybe that
was over-ambitious: in tcp_ipv4.c I found that tcp_v4_rcv also counts
incoming segments even if they are bad.
I will restore the original state and remove the counter decrements in the
next version of the patch.
Thank you,
-- Gerrit
* Re: [PATCH-update][RFC] net: consolidated UDP / UDP-Lite code
2006-10-09 9:51 ` [PATCH-update][RFC] net: " Gerrit Renker
2006-10-11 2:38 ` David Miller
@ 2006-10-12 7:49 ` Gerrit Renker
2006-10-12 9:01 ` David Miller
1 sibling, 1 reply; 19+ messages in thread
From: Gerrit Renker @ 2006-10-12 7:49 UTC (permalink / raw)
To: davem; +Cc: netdev
Hi David,
please find attached the updated UDP-Lite patch - I have removed the
statistics corrections you pointed out to me.
Could you please indicate whether you are, by and large, OK with the
changes made by the patch? Although it was some time ago, this patch
implements the architectural suggestions you gave me earlier.
Since reviewing both the v4 and the v6 side takes more time, I
have still left out the v6 side (it mirrors v4). Would you prefer
that I send the whole thing?
My idea was to ask you for comments on the v4 side, incorporate these
into both the v4 and v6 sides, and then send the completed work without
any further changes, so that the final review does not take long.
Comments made by other people have been incorporated in the attached patch.
Many thanks,
Gerrit
---
Documentation/networking/udplite.txt | 291 +++++++++++++++++++++
include/linux/in.h | 1
include/linux/socket.h | 1
include/linux/udp.h | 11
include/net/udp.h | 92 ++++++
include/net/udplite.h | 86 ++++++
include/net/xfrm.h | 2
net/ipv4/af_inet.c | 9
net/ipv4/netfilter/ipt_LOG.c | 11
net/ipv4/proc.c | 12
net/ipv4/udp.c | 474 +++++++++++++++++++++--------------
net/ipv4/udplite.c | 186 +++++++++++++
net/ipv4/xfrm4_policy.c | 1
net/ipv6/netfilter/ip6t_LOG.c | 10
net/ipv6/udp.c | 60 ----
net/ipv6/xfrm6_policy.c | 1
net/netfilter/xt_multiport.c | 5
net/netfilter/xt_tcpudp.c | 20 +
18 files changed, 1022 insertions(+), 251 deletions(-)
diff --git a/include/linux/in.h b/include/linux/in.h
index 2619859..1912e7c 100644
--- a/include/linux/in.h
+++ b/include/linux/in.h
@@ -45,6 +45,7 @@ enum {
IPPROTO_COMP = 108, /* Compression Header protocol */
IPPROTO_SCTP = 132, /* Stream Control Transport Protocol */
+ IPPROTO_UDPLITE = 136, /* UDP-Lite (RFC 3828) */
IPPROTO_RAW = 255, /* Raw IP packets */
IPPROTO_MAX
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 3614090..592b666 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -264,6 +264,7 @@ #define SOL_UDP 17
#define SOL_IPV6 41
#define SOL_ICMPV6 58
#define SOL_SCTP 132
+#define SOL_UDPLITE 136 /* UDP-Lite (RFC 3828) */
#define SOL_RAW 255
#define SOL_IPX 256
#define SOL_AX25 257
diff --git a/include/linux/udp.h b/include/linux/udp.h
index 014b41d..1248668 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -50,12 +50,23 @@ struct udp_sock {
* when the socket is uncorked.
*/
__u16 len; /* total length of pending frames */
+ /*
+ * Fields specific to UDP-Lite.
+ */
+ __u16 pcslen;
+ __u16 pcrlen;
+/* indicator bits used by pcflag: */
+#define UDPLITE_BIT 0x1 /* set by udplite proto init function */
+#define UDPLITE_SEND_CC 0x2 /* set via udplite setsockopt */
+#define UDPLITE_RECV_CC 0x4 /* set via udplite setsockopt */
+ __u8 pcflag; /* marks socket as UDP-Lite if > 0 */
};
static inline struct udp_sock *udp_sk(const struct sock *sk)
{
return (struct udp_sock *)sk;
}
+#define IS_UDPLITE(__sk) (udp_sk(__sk)->pcflag)
#endif
diff --git a/include/net/udp.h b/include/net/udp.h
index db0c05f..33d61a8 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -26,9 +26,31 @@ #include <linux/list.h>
#include <net/inet_sock.h>
#include <net/sock.h>
#include <net/snmp.h>
+#include <net/ip.h>
+#include <linux/ipv6.h>
#include <linux/seq_file.h>
#define UDP_HTABLE_SIZE 128
+#include <net/udplite.h>
+
+/**
+ * struct udp_skb_cb - UDP(-Lite) private variables
+ *
+ * @header: private variables used by IPv4/IPv6
+ * @cscov: checksum coverage length (UDP-Lite only)
+ * @partial_cov: if set indicates partial csum coverage
+ */
+struct udp_skb_cb {
+ union {
+ struct inet_skb_parm h4;
+#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
+ struct inet6_skb_parm h6;
+#endif
+ } header;
+ __u16 cscov;
+ __u8 partial_cov;
+};
+#define UDP_SKB_CB(__skb) ((struct udp_skb_cb *)((__skb)->cb))
extern struct hlist_head udp_hash[UDP_HTABLE_SIZE];
extern rwlock_t udp_hash_lock;
@@ -47,6 +69,62 @@ extern struct proto udp_prot;
struct sk_buff;
+/*
+ * Generic checksumming routines for UDP(-Lite) v4 and v6
+ */
+static inline u16 __udp_lib_checksum_complete(struct sk_buff *skb)
+{
+ if (! UDP_SKB_CB(skb)->partial_cov)
+ return __skb_checksum_complete(skb);
+ return csum_fold(skb_checksum(skb, 0, UDP_SKB_CB(skb)->cscov,
+ skb->csum));
+}
+
+static __inline__ int udp_checksum_complete(struct sk_buff *skb)
+{
+ return skb->ip_summed != CHECKSUM_UNNECESSARY &&
+ __udp_lib_checksum_complete(skb);
+}
+
+/**
+ * udp_csum_outgoing - compute UDPv4/v6 checksum over fragments
+ * @sk: socket we are writing to
+ * @skb: sk_buff containing the filled-in UDP header
+ * (checksum field must be zeroed out)
+ */
+static inline u32 udp_csum_outgoing(struct sock *sk, struct sk_buff *skb)
+{
+ u32 csum = csum_partial(skb->h.raw, sizeof(struct udphdr), 0);
+
+ skb_queue_walk(&sk->sk_write_queue, skb) {
+ csum = csum_add(csum, skb->csum);
+ }
+ return csum;
+}
+
+/* hash routines shared between UDPv4/6 and UDP-Litev4/6 */
+static inline void udp_lib_hash(struct sock *sk)
+{
+ BUG();
+}
+
+static inline void udp_lib_unhash(struct sock *sk)
+{
+ write_lock_bh(&udp_hash_lock);
+ if (sk_del_node_init(sk)) {
+ inet_sk(sk)->num = 0;
+ sock_prot_dec_use(sk->sk_prot);
+ }
+ write_unlock_bh(&udp_hash_lock);
+}
+
+static inline void udp_lib_close(struct sock *sk, long timeout)
+{
+ sk_common_release(sk);
+}
+
+
+/* net/ipv4/udp.c */
extern int udp_get_port(struct sock *sk, unsigned short snum,
int (*saddr_cmp)(const struct sock *, const struct sock *));
extern void udp_err(struct sk_buff *, u32);
@@ -61,21 +139,29 @@ extern unsigned int udp_poll(struct file
poll_table *wait);
DECLARE_SNMP_STAT(struct udp_mib, udp_statistics);
-#define UDP_INC_STATS(field) SNMP_INC_STATS(udp_statistics, field)
-#define UDP_INC_STATS_BH(field) SNMP_INC_STATS_BH(udp_statistics, field)
-#define UDP_INC_STATS_USER(field) SNMP_INC_STATS_USER(udp_statistics, field)
+/*
+ * SNMP statistics for UDP and UDP-Lite
+ */
+#define UDP_INC_STATS_USER(field, is_udplite) do { \
+ if (is_udplite) SNMP_INC_STATS_USER(udplite_statistics, field); \
+ else SNMP_INC_STATS_USER(udp_statistics, field); } while(0)
+#define UDP_INC_STATS_BH(field, is_udplite) do { \
+ if (is_udplite) SNMP_INC_STATS_BH(udplite_statistics, field); \
+ else SNMP_INC_STATS_BH(udp_statistics, field); } while(0)
/* /proc */
struct udp_seq_afinfo {
struct module *owner;
char *name;
sa_family_t family;
+ struct hlist_head *hashtable;
int (*seq_show) (struct seq_file *m, void *v);
struct file_operations *seq_fops;
};
struct udp_iter_state {
sa_family_t family;
+ struct hlist_head *hashtable;
int bucket;
struct seq_operations seq_ops;
};
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 865d752..f404eee 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -92,10 +92,8 @@ #include <linux/errno.h>
#include <linux/timer.h>
#include <linux/mm.h>
#include <linux/inet.h>
-#include <linux/ipv6.h>
#include <linux/netdevice.h>
#include <net/snmp.h>
-#include <net/ip.h>
#include <net/tcp_states.h>
#include <net/protocol.h>
#include <linux/skbuff.h>
@@ -120,26 +118,29 @@ DEFINE_RWLOCK(udp_hash_lock);
static int udp_port_rover;
-static inline int udp_lport_inuse(u16 num)
+static inline int __udp_lib_lport_inuse(__be16 num, struct hlist_head udptable[])
{
struct sock *sk;
struct hlist_node *node;
- sk_for_each(sk, node, &udp_hash[num & (UDP_HTABLE_SIZE - 1)])
+ sk_for_each(sk, node, &udptable[num & (UDP_HTABLE_SIZE - 1)])
if (inet_sk(sk)->num == num)
return 1;
return 0;
}
/**
- * udp_get_port - common port lookup for IPv4 and IPv6
+ * __udp_lib_get_port - UDP/-Lite port lookup for IPv4 and IPv6
*
* @sk: socket struct in question
* @snum: port number to look up
+ * @udptable: hash list table, must be of UDP_HTABLE_SIZE
+ * @port_rover: pointer to record of last unallocated port
* @saddr_comp: AF-dependent comparison of bound local IP addresses
*/
-int udp_get_port(struct sock *sk, unsigned short snum,
- int (*saddr_cmp)(const struct sock *sk1, const struct sock *sk2))
+static int __udp_lib_get_port(struct sock *sk, unsigned short snum,
+ struct hlist_head udptable[], int *port_rover,
+ int (*saddr_cmp)(const struct sock *, const struct sock *))
{
struct hlist_node *node;
struct hlist_head *head;
@@ -150,15 +151,15 @@ int udp_get_port(struct sock *sk, unsign
if (snum == 0) {
int best_size_so_far, best, result, i;
- if (udp_port_rover > sysctl_local_port_range[1] ||
- udp_port_rover < sysctl_local_port_range[0])
- udp_port_rover = sysctl_local_port_range[0];
+ if (*port_rover > sysctl_local_port_range[1] ||
+ *port_rover < sysctl_local_port_range[0])
+ *port_rover = sysctl_local_port_range[0];
best_size_so_far = 32767;
- best = result = udp_port_rover;
+ best = result = *port_rover;
for (i = 0; i < UDP_HTABLE_SIZE; i++, result++) {
int size;
- head = &udp_hash[result & (UDP_HTABLE_SIZE - 1)];
+ head = &udptable[result & (UDP_HTABLE_SIZE - 1)];
if (hlist_empty(head)) {
if (result > sysctl_local_port_range[1])
result = sysctl_local_port_range[0] +
@@ -179,15 +180,15 @@ int udp_get_port(struct sock *sk, unsign
result = sysctl_local_port_range[0]
+ ((result - sysctl_local_port_range[0]) &
(UDP_HTABLE_SIZE - 1));
- if (!udp_lport_inuse(result))
+ if (! __udp_lib_lport_inuse(result, udptable))
break;
}
if (i >= (1 << 16) / UDP_HTABLE_SIZE)
goto fail;
gotit:
- udp_port_rover = snum = result;
+ *port_rover = snum = result;
} else {
- head = &udp_hash[snum & (UDP_HTABLE_SIZE - 1)];
+ head = &udptable[snum & (UDP_HTABLE_SIZE - 1)];
sk_for_each(sk2, node, head)
if (inet_sk(sk2)->num == snum &&
@@ -200,7 +201,7 @@ gotit:
}
inet_sk(sk)->num = snum;
if (sk_unhashed(sk)) {
- head = &udp_hash[snum & (UDP_HTABLE_SIZE - 1)];
+ head = &udptable[snum & (UDP_HTABLE_SIZE - 1)];
sk_add_node(sk, head);
sock_prot_inc_use(sk->sk_prot);
}
@@ -210,6 +211,12 @@ fail:
return error;
}
+__inline__ int udp_get_port(struct sock *sk, unsigned short snum,
+ int (*scmp)(const struct sock *, const struct sock *))
+{
+ return __udp_lib_get_port(sk, snum, udp_hash, &udp_port_rover, scmp);
+}
+
static inline int ipv4_rcv_saddr_equal(const struct sock *sk1, const struct sock *sk2)
{
struct inet_sock *inet1 = inet_sk(sk1), *inet2 = inet_sk(sk2);
@@ -224,34 +231,20 @@ static inline int udp_v4_get_port(struct
return udp_get_port(sk, snum, ipv4_rcv_saddr_equal);
}
-
-static void udp_v4_hash(struct sock *sk)
-{
- BUG();
-}
-
-static void udp_v4_unhash(struct sock *sk)
-{
- write_lock_bh(&udp_hash_lock);
- if (sk_del_node_init(sk)) {
- inet_sk(sk)->num = 0;
- sock_prot_dec_use(sk->sk_prot);
- }
- write_unlock_bh(&udp_hash_lock);
-}
-
/* UDP is nearly always wildcards out the wazoo, it makes no sense to try
* harder than this. -DaveM
*/
-static struct sock *udp_v4_lookup_longway(__be32 saddr, __be16 sport,
- __be32 daddr, __be16 dport, int dif)
+static struct sock *__udp4_lib_lookup(__be32 saddr, __be16 sport,
+ __be32 daddr, __be16 dport,
+ int dif, struct hlist_head udptable[])
{
struct sock *sk, *result = NULL;
struct hlist_node *node;
unsigned short hnum = ntohs(dport);
int badness = -1;
- sk_for_each(sk, node, &udp_hash[hnum & (UDP_HTABLE_SIZE - 1)]) {
+ read_lock(&udp_hash_lock);
+ sk_for_each(sk, node, &udptable[hnum & (UDP_HTABLE_SIZE - 1)]) {
struct inet_sock *inet = inet_sk(sk);
if (inet->num == hnum && !ipv6_only_sock(sk)) {
@@ -285,20 +278,16 @@ static struct sock *udp_v4_lookup_longwa
}
}
}
+ if (result)
+ sock_hold(result);
+ read_unlock(&udp_hash_lock);
return result;
}
static __inline__ struct sock *udp_v4_lookup(__be32 saddr, __be16 sport,
__be32 daddr, __be16 dport, int dif)
{
- struct sock *sk;
-
- read_lock(&udp_hash_lock);
- sk = udp_v4_lookup_longway(saddr, sport, daddr, dport, dif);
- if (sk)
- sock_hold(sk);
- read_unlock(&udp_hash_lock);
- return sk;
+ return __udp4_lib_lookup(saddr, sport, daddr, dport, dif, udp_hash);
}
static inline struct sock *udp_v4_mcast_next(struct sock *sk,
@@ -340,7 +329,8 @@ found:
* to find the appropriate port.
*/
-void udp_err(struct sk_buff *skb, u32 info)
+static void __udp4_lib_err(struct sk_buff *skb, u32 info,
+ struct hlist_head udptable[] )
{
struct inet_sock *inet;
struct iphdr *iph = (struct iphdr*)skb->data;
@@ -351,7 +341,8 @@ void udp_err(struct sk_buff *skb, u32 in
int harderr;
int err;
- sk = udp_v4_lookup(iph->daddr, uh->dest, iph->saddr, uh->source, skb->dev->ifindex);
+ sk = __udp4_lib_lookup(iph->daddr, uh->dest, iph->saddr, uh->source,
+ skb->dev->ifindex, udptable );
if (sk == NULL) {
ICMP_INC_STATS_BH(ICMP_MIB_INERRORS);
return; /* No socket for error */
@@ -405,6 +396,11 @@ out:
sock_put(sk);
}
+__inline__ void udp_err(struct sk_buff *skb, u32 info)
+{
+ __udp4_lib_err(skb, info, udp_hash);
+}
+
/*
* Throw away all pending data and cancel the corking. Socket is locked.
*/
@@ -419,6 +415,45 @@ static void udp_flush_pending_frames(str
}
}
+/**
+ * udp4_hwcsum_outgoing - handle outgoing HW checksumming
+ * @sk: socket we are sending on
+ * @skb: sk_buff containing the filled-in UDP header
+ * (checksum field must be zeroed out)
+ */
+static void udp4_hwcsum_outgoing(struct sock *sk, struct sk_buff *skb,
+ __be32 src, __be32 dst, int len )
+{
+ unsigned int csum = 0, offset;
+ struct udphdr *uh = skb->h.uh;
+
+ if (skb_queue_len(&sk->sk_write_queue) == 1) {
+ /*
+ * Only one fragment on the socket.
+ */
+ skb->csum = offsetof(struct udphdr, check);
+ uh->check = ~csum_tcpudp_magic(src, dst, len, IPPROTO_UDP, 0);
+ } else {
+ /*
+ * HW-checksum won't work as there are two or more
+ * fragments on the socket so that all csums of sk_buffs
+ * should be together
+ */
+ offset = skb->h.raw - skb->data;
+ skb->csum = skb_checksum(skb, offset, skb->len - offset, 0);
+
+ skb->ip_summed = CHECKSUM_NONE;
+
+ skb_queue_walk(&sk->sk_write_queue, skb) {
+ csum = csum_add(csum, skb->csum);
+ }
+
+ uh->check = csum_tcpudp_magic(src, dst, len, IPPROTO_UDP, csum);
+ if (uh->check == 0)
+ uh->check = -1;
+ }
+}
+
/*
* Push out all pending data as one UDP datagram. Socket is locked.
*/
@@ -429,6 +464,7 @@ static int udp_push_pending_frames(struc
struct sk_buff *skb;
struct udphdr *uh;
int err = 0;
+ u32 csum = 0;
/* Grab the skbuff where UDP header space exists. */
if ((skb = skb_peek(&sk->sk_write_queue)) == NULL)
@@ -443,52 +479,31 @@ static int udp_push_pending_frames(struc
uh->len = htons(up->len);
uh->check = 0;
- if (sk->sk_no_check == UDP_CSUM_NOXMIT) {
+ if (up->pcflag) { /* UDP-Lite */
+ int cscov = udplite_sender_cscov(up, uh);
+
+ csum = udplite_csum_outgoing(sk, cscov);
+ skb->ip_summed = CHECKSUM_NONE;
+
+ } else if (sk->sk_no_check == UDP_CSUM_NOXMIT) { /* UDP csum disabled */
+
skb->ip_summed = CHECKSUM_NONE;
goto send;
- }
- if (skb_queue_len(&sk->sk_write_queue) == 1) {
- /*
- * Only one fragment on the socket.
- */
- if (skb->ip_summed == CHECKSUM_PARTIAL) {
- skb->csum = offsetof(struct udphdr, check);
- uh->check = ~csum_tcpudp_magic(fl->fl4_src, fl->fl4_dst,
- up->len, IPPROTO_UDP, 0);
- } else {
- skb->csum = csum_partial((char *)uh,
- sizeof(struct udphdr), skb->csum);
- uh->check = csum_tcpudp_magic(fl->fl4_src, fl->fl4_dst,
- up->len, IPPROTO_UDP, skb->csum);
- if (uh->check == 0)
- uh->check = -1;
- }
- } else {
- unsigned int csum = 0;
- /*
- * HW-checksum won't work as there are two or more
- * fragments on the socket so that all csums of sk_buffs
- * should be together.
- */
- if (skb->ip_summed == CHECKSUM_PARTIAL) {
- int offset = (unsigned char *)uh - skb->data;
- skb->csum = skb_checksum(skb, offset, skb->len - offset, 0);
+ } else if (skb->ip_summed == CHECKSUM_PARTIAL) { /* UDP hardware csum */
- skb->ip_summed = CHECKSUM_NONE;
- } else {
- skb->csum = csum_partial((char *)uh,
- sizeof(struct udphdr), skb->csum);
- }
+ udp4_hwcsum_outgoing(sk, skb, fl->fl4_src,fl->fl4_dst, up->len);
+ goto send;
+
+ } else /* `normal' UDP */
+ csum = udp_csum_outgoing(sk, skb);
+
+ /* add protocol-dependent pseudo-header */
+ uh->check = csum_tcpudp_magic(fl->fl4_src, fl->fl4_dst, up->len,
+ sk->sk_protocol, csum );
+ if (uh->check == 0)
+ uh->check = -1;
- skb_queue_walk(&sk->sk_write_queue, skb) {
- csum = csum_add(csum, skb->csum);
- }
- uh->check = csum_tcpudp_magic(fl->fl4_src, fl->fl4_dst,
- up->len, IPPROTO_UDP, csum);
- if (uh->check == 0)
- uh->check = -1;
- }
send:
err = ip_push_pending_frames(sk);
out:
@@ -497,12 +512,6 @@ out:
return err;
}
-
-static unsigned short udp_check(struct udphdr *uh, int len, __be32 saddr, __be32 daddr, unsigned long base)
-{
- return(csum_tcpudp_magic(saddr, daddr, len, IPPROTO_UDP, base));
-}
-
int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
size_t len)
{
@@ -516,8 +525,9 @@ int udp_sendmsg(struct kiocb *iocb, stru
__be32 daddr, faddr, saddr;
__be16 dport;
u8 tos;
- int err;
+ int err, is_udplite = up->pcflag;
int corkreq = up->corkflag || msg->msg_flags&MSG_MORE;
+ int (*getfrag)(void *, char *, int, int, int, struct sk_buff *);
if (len > 0xFFFF)
return -EMSGSIZE;
@@ -622,7 +632,7 @@ int udp_sendmsg(struct kiocb *iocb, stru
{ .daddr = faddr,
.saddr = saddr,
.tos = tos } },
- .proto = IPPROTO_UDP,
+ .proto = sk->sk_protocol,
.uli_u = { .ports =
{ .sport = inet->sport,
.dport = dport } } };
@@ -668,8 +678,9 @@ back_from_confirm:
do_append_data:
up->len += ulen;
- err = ip_append_data(sk, ip_generic_getfrag, msg->msg_iov, ulen,
- sizeof(struct udphdr), &ipc, rt,
+ getfrag = is_udplite ? udplite_getfrag : ip_generic_getfrag;
+ err = ip_append_data(sk, getfrag, msg->msg_iov, ulen,
+ sizeof(struct udphdr), &ipc, rt,
corkreq ? msg->msg_flags|MSG_MORE : msg->msg_flags);
if (err)
udp_flush_pending_frames(sk);
@@ -684,7 +695,7 @@ out:
if (free)
kfree(ipc.opt);
if (!err) {
- UDP_INC_STATS_USER(UDP_MIB_OUTDATAGRAMS);
+ UDP_INC_STATS_USER(UDP_MIB_OUTDATAGRAMS, is_udplite);
return len;
}
/*
@@ -695,7 +706,7 @@ out:
* seems like overkill.
*/
if (err == -ENOBUFS || test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
- UDP_INC_STATS_USER(UDP_MIB_SNDBUFERRORS);
+ UDP_INC_STATS_USER(UDP_MIB_SNDBUFERRORS, is_udplite);
}
return err;
@@ -795,17 +806,6 @@ int udp_ioctl(struct sock *sk, int cmd,
return(0);
}
-static __inline__ int __udp_checksum_complete(struct sk_buff *skb)
-{
- return __skb_checksum_complete(skb);
-}
-
-static __inline__ int udp_checksum_complete(struct sk_buff *skb)
-{
- return skb->ip_summed != CHECKSUM_UNNECESSARY &&
- __udp_checksum_complete(skb);
-}
-
/*
* This should be easy, if there is something there we
* return it, otherwise we block.
@@ -817,7 +817,7 @@ static int udp_recvmsg(struct kiocb *ioc
struct inet_sock *inet = inet_sk(sk);
struct sockaddr_in *sin = (struct sockaddr_in *)msg->msg_name;
struct sk_buff *skb;
- int copied, err;
+ int copied, err, copy_only, is_udplite = IS_UDPLITE(sk);
/*
* Check any passed addresses
@@ -839,15 +839,25 @@ try_again:
msg->msg_flags |= MSG_TRUNC;
}
- if (skb->ip_summed==CHECKSUM_UNNECESSARY) {
- err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
- copied);
- } else if (msg->msg_flags&MSG_TRUNC) {
- if (__udp_checksum_complete(skb))
+ /*
+ * Decide whether to checksum and/or copy data.
+ *
+ * UDP: checksum may have been computed in HW,
+ * (re-)compute it if message is truncated.
+ * UDP-Lite: always needs to checksum, no HW support.
+ */
+ copy_only = (skb->ip_summed==CHECKSUM_UNNECESSARY);
+
+ if (is_udplite || (!copy_only && msg->msg_flags&MSG_TRUNC)) {
+ if (__udp_lib_checksum_complete(skb))
goto csum_copy_err;
- err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
- copied);
- } else {
+ copy_only = 1;
+ }
+
+ if (copy_only)
+ err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr),
+ msg->msg_iov, copied );
+ else {
err = skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov);
if (err == -EINVAL)
@@ -880,7 +890,7 @@ out:
return err;
csum_copy_err:
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite);
skb_kill_datagram(sk, skb, flags);
@@ -912,11 +922,6 @@ int udp_disconnect(struct sock *sk, int
return 0;
}
-static void udp_close(struct sock *sk, long timeout)
-{
- sk_common_release(sk);
-}
-
/* return:
* 1 if the the UDP system should process it
* 0 if we should drop this packet
@@ -1021,10 +1026,8 @@ static int udp_queue_rcv_skb(struct sock
/*
* Charge it to the socket, dropping if the queue is full.
*/
- if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb)) {
- kfree_skb(skb);
- return -1;
- }
+ if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb))
+ goto drop;
nf_reset(skb);
if (up->encap_type) {
@@ -1048,31 +1051,69 @@ static int udp_queue_rcv_skb(struct sock
if (ret < 0) {
/* process the ESP packet */
ret = xfrm4_rcv_encap(skb, up->encap_type);
- UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS);
+ UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS, up->pcflag);
return -ret;
}
/* FALLTHROUGH -- it's a UDP Packet */
}
- if (sk->sk_filter && skb->ip_summed != CHECKSUM_UNNECESSARY) {
- if (__udp_checksum_complete(skb)) {
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
- kfree_skb(skb);
- return -1;
+ /*
+ * UDP-Lite specific tests, ignored on UDP sockets
+ */
+ if ((up->pcflag & UDPLITE_RECV_CC) && UDP_SKB_CB(skb)->partial_cov) {
+
+ /*
+ * MIB statistics other than incrementing the error count are
+ * disabled for the following two types of errors: these depend
+ * on the application settings, not on the functioning of the
+ * protocol stack as such.
+ *
+ *
+ * RFC 3828 here recommends (sec 3.3): "There should also be a
+ * way ... to ... at least let the receiving application block
+ * delivery of packets with coverage values less than a value
+ * provided by the application."
+ */
+ if (up->pcrlen == 0) { /* full coverage was set */
+ LIMIT_NETDEBUG(KERN_WARNING "UDPLITE: partial coverage "
+ "%d while full coverage %d requested\n",
+ UDP_SKB_CB(skb)->cscov, skb->len);
+ goto drop;
}
+ /* The next case involves violating the min. coverage requested
+ * by the receiver. This is subtle: if receiver wants x and x is
+ * greater than the buffersize/MTU then receiver will complain
+ * that it wants x while sender emits packets of smaller size y.
+ * Therefore the above ...()->partial_cov statement is essential.
+ */
+ if (UDP_SKB_CB(skb)->cscov < up->pcrlen) {
+ LIMIT_NETDEBUG(KERN_WARNING
+ "UDPLITE: coverage %d too small, need min %d\n",
+ UDP_SKB_CB(skb)->cscov, up->pcrlen);
+ goto drop;
+ }
+ }
+
+ if (sk->sk_filter && skb->ip_summed != CHECKSUM_UNNECESSARY) {
+ if (__udp_lib_checksum_complete(skb))
+ goto drop;
skb->ip_summed = CHECKSUM_UNNECESSARY;
}
if ((rc = sock_queue_rcv_skb(sk,skb)) < 0) {
/* Note that an ENOMEM error is charged twice */
if (rc == -ENOMEM)
- UDP_INC_STATS_BH(UDP_MIB_RCVBUFERRORS);
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
- kfree_skb(skb);
- return -1;
+ UDP_INC_STATS_BH(UDP_MIB_RCVBUFERRORS, up->pcflag);
+ goto drop;
}
- UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS);
+
+ UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS, up->pcflag);
return 0;
+
+drop:
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS, up->pcflag);
+ kfree_skb(skb);
+ return -1;
}
/*
@@ -1081,14 +1122,16 @@ static int udp_queue_rcv_skb(struct sock
* Note: called only from the BH handler context,
* so we don't need to lock the hashes.
*/
-static int udp_v4_mcast_deliver(struct sk_buff *skb, struct udphdr *uh,
- __be32 saddr, __be32 daddr)
+static int __udp4_lib_mcast_deliver(struct sk_buff *skb,
+ struct udphdr *uh,
+ __be32 saddr, __be32 daddr,
+ struct hlist_head udptable[])
{
struct sock *sk;
int dif;
read_lock(&udp_hash_lock);
- sk = sk_head(&udp_hash[ntohs(uh->dest) & (UDP_HTABLE_SIZE - 1)]);
+ sk = sk_head(&udptable[ntohs(uh->dest) & (UDP_HTABLE_SIZE - 1)]);
dif = skb->dev->ifindex;
sk = udp_v4_mcast_next(sk, uh->dest, daddr, uh->source, saddr, dif);
if (sk) {
@@ -1117,6 +1160,12 @@ static int udp_v4_mcast_deliver(struct s
return 0;
}
+static __inline__ int udp_v4_mcast_deliver(struct sk_buff *skb, struct udphdr *uh,
+ __be32 saddr, __be32 daddr)
+{
+ return __udp4_lib_mcast_deliver(skb, uh, saddr, daddr, udp_hash);
+}
+
/* Initialize UDP checksum. If exited with zero value (success),
* CHECKSUM_UNNECESSARY means, that no more checks are required.
* Otherwise, csum completion requires chacksumming packet body,
@@ -1128,7 +1177,7 @@ static void udp_checksum_init(struct sk_
if (uh->check == 0) {
skb->ip_summed = CHECKSUM_UNNECESSARY;
} else if (skb->ip_summed == CHECKSUM_COMPLETE) {
- if (!udp_check(uh, ulen, saddr, daddr, skb->csum))
+ if (!csum_tcpudp_magic(saddr,daddr,ulen, IPPROTO_UDP, skb->csum))
skb->ip_summed = CHECKSUM_UNNECESSARY;
}
if (skb->ip_summed != CHECKSUM_UNNECESSARY)
@@ -1136,16 +1185,20 @@ static void udp_checksum_init(struct sk_
/* Probably, we should checksum udp header (it should be in cache
* in any case) and data in tiny packets (< rx copybreak).
*/
+
+ /* UDP = UDP-Lite with a non-partial checksum coverage */
+ UDP_SKB_CB(skb)->partial_cov = 0;
}
/*
* All we need to do is get the socket, and then do a checksum.
*/
-int udp_rcv(struct sk_buff *skb)
+static int __udp4_lib_rcv(struct sk_buff *skb,
+ struct hlist_head udptable[], int is_udplite)
{
struct sock *sk;
- struct udphdr *uh;
+ struct udphdr *uh = skb->h.uh;
unsigned short ulen;
struct rtable *rt = (struct rtable*)skb->dst;
__be32 saddr = skb->nh.iph->saddr;
@@ -1153,34 +1206,40 @@ int udp_rcv(struct sk_buff *skb)
int len = skb->len;
/*
- * Validate the packet and the UDP length.
+ * Validate the packet.
*/
if (!pskb_may_pull(skb, sizeof(struct udphdr)))
- goto no_header;
-
- uh = skb->h.uh;
+ goto drop; /* No space for header. */
ulen = ntohs(uh->len);
-
- if (ulen > len || ulen < sizeof(*uh))
+ if (ulen > len)
goto short_packet;
- if (pskb_trim_rcsum(skb, ulen))
- goto short_packet;
+ if (!is_udplite) { /* UDP validates ulen. */
+
+ if (ulen < sizeof(*uh) || pskb_trim_rcsum(skb, ulen))
+ goto short_packet;
+
+ /* note the difference: UDP uses ulen, UDP-Lite uses len */
+ udp_checksum_init(skb, uh, ulen, saddr, daddr);
- udp_checksum_init(skb, uh, ulen, saddr, daddr);
+ } else { /* UDP-Lite validates cscov. */
+ if (!udplite_checksum_init(skb, uh, len, saddr, daddr))
+ goto csum_error;
+ }
if(rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
- return udp_v4_mcast_deliver(skb, uh, saddr, daddr);
+ return __udp4_lib_mcast_deliver(skb, uh, saddr, daddr, udptable);
- sk = udp_v4_lookup(saddr, uh->source, daddr, uh->dest, skb->dev->ifindex);
+ sk = __udp4_lib_lookup(saddr, uh->source, daddr, uh->dest,
+ skb->dev->ifindex, udptable);
if (sk != NULL) {
int ret = udp_queue_rcv_skb(sk, skb);
sock_put(sk);
/* a return value > 0 means to resubmit the input, but
- * it it wants the return to be -protocol, or 0
+ * it wants the return to be -protocol, or 0
*/
if (ret > 0)
return -ret;
@@ -1195,7 +1254,7 @@ int udp_rcv(struct sk_buff *skb)
if (udp_checksum_complete(skb))
goto csum_error;
- UDP_INC_STATS_BH(UDP_MIB_NOPORTS);
+ UDP_INC_STATS_BH(UDP_MIB_NOPORTS, is_udplite);
icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
/*
@@ -1206,35 +1265,39 @@ int udp_rcv(struct sk_buff *skb)
return(0);
short_packet:
- LIMIT_NETDEBUG(KERN_DEBUG "UDP: short packet: From %u.%u.%u.%u:%u %d/%d to %u.%u.%u.%u:%u\n",
+ LIMIT_NETDEBUG(KERN_DEBUG "UDP%s: short packet: From %u.%u.%u.%u:%u %d/%d to %u.%u.%u.%u:%u\n",
+ is_udplite ? "-Lite" : "",
NIPQUAD(saddr),
ntohs(uh->source),
ulen,
len,
NIPQUAD(daddr),
ntohs(uh->dest));
-no_header:
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
- kfree_skb(skb);
- return(0);
+ goto drop;
csum_error:
/*
* RFC1122: OK. Discards the bad packet silently (as far as
* the network is concerned, anyway) as per 4.1.3.4 (MUST).
*/
- LIMIT_NETDEBUG(KERN_DEBUG "UDP: bad checksum. From %d.%d.%d.%d:%d to %d.%d.%d.%d:%d ulen %d\n",
+ LIMIT_NETDEBUG(KERN_DEBUG "UDP%s: bad checksum. From %d.%d.%d.%d:%d to %d.%d.%d.%d:%d ulen %d\n",
+ is_udplite ? "-Lite" : "",
NIPQUAD(saddr),
ntohs(uh->source),
NIPQUAD(daddr),
ntohs(uh->dest),
ulen);
drop:
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite);
kfree_skb(skb);
return(0);
}
+__inline__ int udp_rcv(struct sk_buff *skb)
+{
+ return __udp4_lib_rcv(skb, udp_hash, 0);
+}
+
static int udp_destroy_sock(struct sock *sk)
{
lock_sock(sk);
@@ -1284,6 +1347,32 @@ static int do_udp_setsockopt(struct sock
}
break;
+ /*
+ * UDP-Lite's partial checksum coverage (RFC 3828).
+ */
+ /* The sender sets the actual checksum coverage length via this option.
+ * The case coverage > packet length is handled by the send module. */
+ case UDPLITE_SEND_CSCOV:
+ if (!up->pcflag) /* Disable the option on UDP sockets */
+ return -ENOPROTOOPT;
+ if (val != 0 && val < 8) /* Illegal coverage: use default (8) */
+ val = 8;
+ up->pcslen = val;
+ up->pcflag |= UDPLITE_SEND_CC;
+ break;
+
+ /* The receiver specifies a minimum checksum coverage value. To make
+ * sense, this should be set to at least 8 (as done below). If zero is
+ * used, this again means full checksum coverage. */
+ case UDPLITE_RECV_CSCOV:
+ if (!up->pcflag) /* Disable the option on UDP sockets */
+ return -ENOPROTOOPT;
+ if (val != 0 && val < 8) /* Avoid silly minimal values. */
+ val = 8;
+ up->pcrlen = val;
+ up->pcflag |= UDPLITE_RECV_CC;
+ break;
+
default:
err = -ENOPROTOOPT;
break;
@@ -1295,18 +1384,18 @@ static int do_udp_setsockopt(struct sock
static int udp_setsockopt(struct sock *sk, int level, int optname,
char __user *optval, int optlen)
{
- if (level != SOL_UDP)
- return ip_setsockopt(sk, level, optname, optval, optlen);
- return do_udp_setsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udp_setsockopt(sk, level, optname, optval, optlen);
+ return ip_setsockopt(sk, level, optname, optval, optlen);
}
#ifdef CONFIG_COMPAT
static int compat_udp_setsockopt(struct sock *sk, int level, int optname,
char __user *optval, int optlen)
{
- if (level != SOL_UDP)
- return compat_ip_setsockopt(sk, level, optname, optval, optlen);
- return do_udp_setsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udp_setsockopt(sk, level, optname, optval, optlen);
+ return compat_ip_setsockopt(sk, level, optname, optval, optlen);
}
#endif
@@ -1333,6 +1422,16 @@ static int do_udp_getsockopt(struct sock
val = up->encap_type;
break;
+ /* The following two cannot be changed on UDP sockets; the return is
+ * always 0 (which corresponds to the full checksum coverage of UDP). */
+ case UDPLITE_SEND_CSCOV:
+ val = up->pcslen;
+ break;
+
+ case UDPLITE_RECV_CSCOV:
+ val = up->pcrlen;
+ break;
+
default:
return -ENOPROTOOPT;
};
@@ -1347,18 +1446,18 @@ static int do_udp_getsockopt(struct sock
static int udp_getsockopt(struct sock *sk, int level, int optname,
char __user *optval, int __user *optlen)
{
- if (level != SOL_UDP)
- return ip_getsockopt(sk, level, optname, optval, optlen);
- return do_udp_getsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udp_getsockopt(sk, level, optname, optval, optlen);
+ return ip_getsockopt(sk, level, optname, optval, optlen);
}
#ifdef CONFIG_COMPAT
static int compat_udp_getsockopt(struct sock *sk, int level, int optname,
char __user *optval, int __user *optlen)
{
- if (level != SOL_UDP)
- return compat_ip_getsockopt(sk, level, optname, optval, optlen);
- return do_udp_getsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udp_getsockopt(sk, level, optname, optval, optlen);
+ return compat_ip_getsockopt(sk, level, optname, optval, optlen);
}
#endif
/**
@@ -1378,7 +1477,8 @@ unsigned int udp_poll(struct file *file,
{
unsigned int mask = datagram_poll(file, sock, wait);
struct sock *sk = sock->sk;
-
+ int is_lite = IS_UDPLITE(sk);
+
/* Check for false positives due to checksum errors */
if ( (mask & POLLRDNORM) &&
!(file->f_flags & O_NONBLOCK) &&
@@ -1389,7 +1489,7 @@ unsigned int udp_poll(struct file *file,
spin_lock_bh(&rcvq->lock);
while ((skb = skb_peek(rcvq)) != NULL) {
if (udp_checksum_complete(skb)) {
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_lite);
__skb_unlink(skb, rcvq);
kfree_skb(skb);
} else {
@@ -1411,7 +1511,7 @@ unsigned int udp_poll(struct file *file,
struct proto udp_prot = {
.name = "UDP",
.owner = THIS_MODULE,
- .close = udp_close,
+ .close = udp_lib_close,
.connect = ip4_datagram_connect,
.disconnect = udp_disconnect,
.ioctl = udp_ioctl,
@@ -1422,8 +1522,8 @@ struct proto udp_prot = {
.recvmsg = udp_recvmsg,
.sendpage = udp_sendpage,
.backlog_rcv = udp_queue_rcv_skb,
- .hash = udp_v4_hash,
- .unhash = udp_v4_unhash,
+ .hash = udp_lib_hash,
+ .unhash = udp_lib_unhash,
.get_port = udp_v4_get_port,
.obj_size = sizeof(struct udp_sock),
#ifdef CONFIG_COMPAT
@@ -1442,7 +1542,7 @@ static struct sock *udp_get_first(struct
for (state->bucket = 0; state->bucket < UDP_HTABLE_SIZE; ++state->bucket) {
struct hlist_node *node;
- sk_for_each(sk, node, &udp_hash[state->bucket]) {
+ sk_for_each(sk, node, state->hashtable + state->bucket) {
if (sk->sk_family == state->family)
goto found;
}
@@ -1463,7 +1563,7 @@ try_again:
} while (sk && sk->sk_family != state->family);
if (!sk && ++state->bucket < UDP_HTABLE_SIZE) {
- sk = sk_head(&udp_hash[state->bucket]);
+ sk = sk_head(state->hashtable + state->bucket);
goto try_again;
}
return sk;
@@ -1513,6 +1613,7 @@ static int udp_seq_open(struct inode *in
if (!s)
goto out;
s->family = afinfo->family;
+ s->hashtable = afinfo->hashtable;
s->seq_ops.start = udp_seq_start;
s->seq_ops.next = udp_seq_next;
s->seq_ops.show = afinfo->seq_show;
@@ -1579,7 +1680,7 @@ static void udp4_format_sock(struct sock
atomic_read(&sp->sk_refcnt), sp);
}
-static int udp4_seq_show(struct seq_file *seq, void *v)
+int udp4_seq_show(struct seq_file *seq, void *v)
{
if (v == SEQ_START_TOKEN)
seq_printf(seq, "%-127s\n",
@@ -1602,6 +1703,7 @@ static struct udp_seq_afinfo udp4_seq_af
.owner = THIS_MODULE,
.name = "udp",
.family = AF_INET,
+ .hashtable = udp_hash,
.seq_show = udp4_seq_show,
.seq_fops = &udp4_seq_fops,
};
@@ -1630,3 +1732,5 @@ #ifdef CONFIG_PROC_FS
EXPORT_SYMBOL(udp_proc_register);
EXPORT_SYMBOL(udp_proc_unregister);
#endif
+/* the extensions for UDP-Lite (RFC 3828) */
+#include "udplite.c"
diff --git a/include/net/udplite.h b/include/net/udplite.h
new file mode 100644
index 0000000..90d7aec
--- /dev/null
+++ b/include/net/udplite.h
@@ -0,0 +1,86 @@
+/*
+ * Definitions for the UDP-Lite (RFC 3828) code.
+ */
+#ifndef _UDPLITE_H
+#define _UDPLITE_H
+
+/* UDP-Lite socket options */
+#define UDPLITE_SEND_CSCOV 10 /* sender partial coverage (as sent) */
+#define UDPLITE_RECV_CSCOV 11 /* receiver partial coverage (threshold) */
+
+extern struct proto udplite_prot;
+extern struct hlist_head udplite_hash[UDP_HTABLE_SIZE];
+
+/* UDP-Lite does not have a standardized MIB yet, so we inherit from UDP */
+DECLARE_SNMP_STAT(struct udp_mib, udplite_statistics);
+
+/*
+ * Checksum computation is all in software, hence simpler getfrag.
+ */
+static __inline__ int udplite_getfrag(void *from, char *to, int offset,
+ int len, int odd, struct sk_buff *skb)
+{
+ return memcpy_fromiovecend(to, (struct iovec *) from, offset, len);
+}
+
+/*
+ * Functions used by UDP-Litev4 and UDP-Litev6
+ */
+/* calculate checksum coverage set for outgoing packets */
+static inline int udplite_sender_cscov(struct udp_sock *up, struct udphdr *uh)
+{
+ int cscov = up->len;
+
+ /*
+ * Sender has set `partial coverage' option on UDP-Lite socket
+ */
+ if (up->pcflag & UDPLITE_SEND_CC) {
+ if (up->pcslen < up->len) {
+ /* up->pcslen == 0 means that full coverage is required,
+ * partial coverage only if 0 < up->pcslen < up->len */
+ if (0 < up->pcslen) {
+ cscov = up->pcslen;
+ }
+ uh->len = htons(up->pcslen);
+ }
+ /*
+ * NOTE: Causes for the error case `up->pcslen > up->len':
+ * (i) Application error (will not be penalized).
+ * (ii) Payload too big for send buffer: data is split
+ * into several packets, each with its own header.
+ * In this case (e.g. last segment), coverage may
+ * exceed packet length.
+ * Since packets with coverage length > packet length are
+ * illegal, we fall back to the defaults here.
+ */
+ }
+ return cscov;
+}
+
+static inline u32 udplite_csum_outgoing(struct sock *sk, int cscov)
+{
+ struct sk_buff *skb;
+ int off, len;
+ u32 csum = 0;
+
+ skb_queue_walk(&sk->sk_write_queue, skb) {
+ off = skb->h.raw - skb->data;
+ len = skb->len - off;
+
+ csum = skb_checksum(skb, off, (cscov > len) ? len : cscov, csum);
+
+ if ((cscov -= len) <= 0)
+ break;
+ }
+ return csum;
+}
+
+/*
+ * net/ipv4/udplite.c
+ */
+extern void udplite4_register(void);
+extern int udplite_get_port(struct sock *sk, unsigned short snum,
+ int (*scmp)(const struct sock *, const struct sock *));
+extern int udplite_checksum_init(struct sk_buff *skb, struct udphdr *uh,
+ u16 len, u32 saddr, u32 daddr);
+#endif /* _UDPLITE_H */
diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c
new file mode 100644
index 0000000..84d09f3
--- /dev/null
+++ b/net/ipv4/udplite.c
@@ -0,0 +1,186 @@
+/*
+ * UDPLITE An implementation of the UDP-Lite protocol (RFC 3828).
+ *
+ * Version: $Id: udplite.c,v 1.24 2006/09/18 21:50:59 gerrit Exp gerrit $
+ *
+ * Authors: Gerrit Renker <gerrit@erg.abdn.ac.uk>
+ *
+ * Changes:
+ * Fixes:
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+struct hlist_head udplite_hash[UDP_HTABLE_SIZE];
+static int udplite_port_rover;
+DEFINE_SNMP_STAT(struct udp_mib, udplite_statistics) __read_mostly;
+
+/* Designate sk as UDP-Lite socket */
+static inline int udplite_sk_init(struct sock *sk)
+{
+ udp_sk(sk)->pcflag = UDPLITE_BIT;
+ return 0;
+}
+
+__inline__ int udplite_get_port(struct sock *sk, unsigned short p,
+ int (*c)(const struct sock *, const struct sock *))
+{
+ return __udp_lib_get_port(sk, p, udplite_hash, &udplite_port_rover, c);
+}
+
+static __inline__ int udplite_v4_get_port(struct sock *sk, unsigned short snum)
+{
+ return udplite_get_port(sk, snum, ipv4_rcv_saddr_equal);
+}
+
+static __inline__ struct sock *udplite_v4_lookup(u32 saddr, u16 sport,
+ u32 daddr, u16 dport, int dif)
+{
+ return __udp4_lib_lookup(saddr, sport, daddr, dport, dif, udplite_hash);
+}
+
+static __inline__ int udplite_v4_mcast_deliver(struct sk_buff *skb,
+ struct udphdr *uh, u32 saddr, u32 daddr)
+{
+ return __udp4_lib_mcast_deliver(skb, uh, saddr, daddr, udplite_hash);
+}
+
+__inline__ void udplite_err(struct sk_buff *skb, u32 info)
+{
+ return __udp4_lib_err(skb, info, udplite_hash);
+}
+
+int udplite_checksum_init(struct sk_buff *skb, struct udphdr *uh,
+ u16 len, u32 saddr, u32 daddr)
+{
+ u16 cscov;
+
+ /* In UDPv4 a zero checksum means that the transmitter generated no
+ * checksum. UDP-Lite (like IPv6) mandates checksums, hence packets
+ * with a zero checksum field are illegal. */
+ if (uh->check == 0) {
+ LIMIT_NETDEBUG(KERN_DEBUG "UDPLITE: zeroed csum field "
+ "(%d.%d.%d.%d:%d -> %d.%d.%d.%d:%d)\n", NIPQUAD(saddr),
+ ntohs(uh->source), NIPQUAD(daddr), ntohs(uh->dest));
+ return 0;
+ }
+
+ UDP_SKB_CB(skb)->partial_cov = 0;
+ cscov = ntohs(uh->len);
+
+ if (cscov == 0) /* Indicates that full coverage is required. */
+ cscov = len;
+ else if (cscov < 8 || cscov > len) {
+ /*
+ * Coverage length violates RFC 3828: log and discard silently.
+ */
+ LIMIT_NETDEBUG(KERN_DEBUG "UDPLITE: bad csum coverage %d/%d "
+ "(%d.%d.%d.%d:%d -> %d.%d.%d.%d:%d)\n", cscov, len,
+ NIPQUAD(saddr), ntohs(uh->source),
+ NIPQUAD(daddr), ntohs(uh->dest));
+ return 0;
+
+ } else if (cscov < len)
+ UDP_SKB_CB(skb)->partial_cov = 1;
+
+ UDP_SKB_CB(skb)->cscov = cscov;
+
+ /*
+ * Initialise pseudo-header for checksum computation.
+ *
+ * There is no known NIC manufacturer supporting UDP-Lite yet,
+ * hence ip_summed is always (re-)set to CHECKSUM_NONE.
+ */
+ skb->csum = csum_tcpudp_nofold(saddr, daddr, len, IPPROTO_UDPLITE, 0);
+ skb->ip_summed = CHECKSUM_NONE;
+
+ return 1;
+}
+
+__inline__ int udplite_rcv(struct sk_buff *skb)
+{
+ return __udp4_lib_rcv(skb, udplite_hash, 1);
+}
+
+struct proto udplite_prot = {
+ .name = "UDP-Lite",
+ .owner = THIS_MODULE,
+ .close = udp_lib_close,
+ .connect = ip4_datagram_connect,
+ .disconnect = udp_disconnect,
+ .ioctl = udp_ioctl,
+ .init = udplite_sk_init,
+ .destroy = udp_destroy_sock,
+ .setsockopt = udp_setsockopt,
+ .getsockopt = udp_getsockopt,
+ .sendmsg = udp_sendmsg,
+ .recvmsg = udp_recvmsg,
+ .sendpage = udp_sendpage,
+ .backlog_rcv = udp_queue_rcv_skb,
+ .hash = udp_lib_hash,
+ .unhash = udp_lib_unhash,
+ .get_port = udplite_v4_get_port,
+ .obj_size = sizeof(struct udp_sock),
+#ifdef CONFIG_COMPAT
+ .compat_setsockopt = compat_udp_setsockopt,
+ .compat_getsockopt = compat_udp_getsockopt,
+#endif
+};
+
+static struct net_protocol udplite_protocol = {
+ .handler = udplite_rcv,
+ .err_handler = udplite_err,
+ .no_policy = 1,
+};
+
+static struct inet_protosw udplite4_protosw = {
+ .type = SOCK_DGRAM,
+ .protocol = IPPROTO_UDPLITE,
+ .prot = &udplite_prot,
+ .ops = &inet_dgram_ops,
+ .capability = -1,
+ .no_check = 0, /* must checksum (RFC 3828) */
+ .flags = INET_PROTOSW_PERMANENT,
+};
+
+#ifdef CONFIG_PROC_FS
+static struct file_operations udplite4_seq_fops;
+static struct udp_seq_afinfo udplite4_seq_afinfo = {
+ .owner = THIS_MODULE,
+ .name = "udplite",
+ .family = AF_INET,
+ .hashtable = udplite_hash,
+ .seq_show = udp4_seq_show,
+ .seq_fops = &udplite4_seq_fops,
+};
+#endif /* CONFIG_PROC_FS */
+
+
+void __init udplite4_register(void)
+{
+ if (proto_register(&udplite_prot, 1))
+ goto out_register_err;
+
+ if (inet_add_protocol(&udplite_protocol, IPPROTO_UDPLITE) < 0)
+ goto out_unregister_proto;
+
+ inet_register_protosw(&udplite4_protosw);
+
+#ifdef CONFIG_PROC_FS
+ if (udp_proc_register(&udplite4_seq_afinfo)) /* udplite4_proc_init() */
+ printk(KERN_ERR "udplite4: Cannot register /proc!\n");
+#endif /* CONFIG_PROC_FS */
+ return;
+
+out_unregister_proto:
+ proto_unregister(&udplite_prot);
+out_register_err:
+ printk(KERN_CRIT "udplite4_register: Cannot add UDP-Lite protocol.\n");
+}
+
+EXPORT_SYMBOL(udplite_hash);
+EXPORT_SYMBOL(udplite_prot);
+EXPORT_SYMBOL(udplite_get_port); /* for v6 */
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index e0c3934..fc9761f 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -66,21 +66,6 @@ static inline int udp_v6_get_port(struct
return udp_get_port(sk, snum, ipv6_rcv_saddr_equal);
}
-static void udp_v6_hash(struct sock *sk)
-{
- BUG();
-}
-
-static void udp_v6_unhash(struct sock *sk)
-{
- write_lock_bh(&udp_hash_lock);
- if (sk_del_node_init(sk)) {
- inet_sk(sk)->num = 0;
- sock_prot_dec_use(sk->sk_prot);
- }
- write_unlock_bh(&udp_hash_lock);
-}
-
static struct sock *udp_v6_lookup(struct in6_addr *saddr, u16 sport,
struct in6_addr *daddr, u16 dport, int dif)
{
@@ -132,15 +117,6 @@ static struct sock *udp_v6_lookup(struct
}
/*
- *
- */
-
-static void udpv6_close(struct sock *sk, long timeout)
-{
- sk_common_release(sk);
-}
-
-/*
* This should be easy, if there is something there we
* return it, otherwise we block.
*/
@@ -499,35 +475,12 @@ static int udp_v6_push_pending_frames(st
uh->len = htons(up->len);
uh->check = 0;
- if (sk->sk_no_check == UDP_CSUM_NOXMIT) {
- skb->ip_summed = CHECKSUM_NONE;
- goto send;
- }
-
- if (skb_queue_len(&sk->sk_write_queue) == 1) {
- skb->csum = csum_partial((char *)uh,
- sizeof(struct udphdr), skb->csum);
- uh->check = csum_ipv6_magic(&fl->fl6_src,
- &fl->fl6_dst,
- up->len, fl->proto, skb->csum);
- } else {
- u32 tmp_csum = 0;
-
- skb_queue_walk(&sk->sk_write_queue, skb) {
- tmp_csum = csum_add(tmp_csum, skb->csum);
- }
- tmp_csum = csum_partial((char *)uh,
- sizeof(struct udphdr), tmp_csum);
- tmp_csum = csum_ipv6_magic(&fl->fl6_src,
- &fl->fl6_dst,
- up->len, fl->proto, tmp_csum);
- uh->check = tmp_csum;
-
- }
+ uh->check = csum_ipv6_magic(&fl->fl6_src, &fl->fl6_dst,
+ up->len, fl->proto,
+ udp_csum_outgoing(sk, skb));
if (uh->check == 0)
uh->check = -1;
-send:
err = ip6_push_pending_frames(sk);
out:
up->len = 0;
@@ -1003,6 +956,7 @@ static struct udp_seq_afinfo udp6_seq_af
.owner = THIS_MODULE,
.name = "udp6",
.family = AF_INET6,
+ .hashtable = udp_hash,
.seq_show = udp6_seq_show,
.seq_fops = &udp6_seq_fops,
};
@@ -1022,7 +976,7 @@ #endif /* CONFIG_PROC_FS */
struct proto udpv6_prot = {
.name = "UDPv6",
.owner = THIS_MODULE,
- .close = udpv6_close,
+ .close = udp_lib_close,
.connect = ip6_datagram_connect,
.disconnect = udp_disconnect,
.ioctl = udp_ioctl,
@@ -1032,8 +986,8 @@ struct proto udpv6_prot = {
.sendmsg = udpv6_sendmsg,
.recvmsg = udpv6_recvmsg,
.backlog_rcv = udpv6_queue_rcv_skb,
- .hash = udp_v6_hash,
- .unhash = udp_v6_unhash,
+ .hash = udp_lib_hash,
+ .unhash = udp_lib_unhash,
.get_port = udp_v6_get_port,
.obj_size = sizeof(struct udp6_sock),
#ifdef CONFIG_COMPAT
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index edcf093..2b997b1 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1223,10 +1223,13 @@ static int __init init_ipv4_mibs(void)
tcp_statistics[1] = alloc_percpu(struct tcp_mib);
udp_statistics[0] = alloc_percpu(struct udp_mib);
udp_statistics[1] = alloc_percpu(struct udp_mib);
+ udplite_statistics[0] = alloc_percpu(struct udp_mib);
+ udplite_statistics[1] = alloc_percpu(struct udp_mib);
if (!
(net_statistics[0] && net_statistics[1] && ip_statistics[0]
&& ip_statistics[1] && tcp_statistics[0] && tcp_statistics[1]
- && udp_statistics[0] && udp_statistics[1]))
+ && udp_statistics[0] && udp_statistics[1]
+ && udplite_statistics[0] && udplite_statistics[1]))
return -ENOMEM;
(void) tcp_mib_init();
@@ -1313,6 +1316,10 @@ #endif
/* Setup TCP slab cache for open requests. */
tcp_init();
+ /*
+ * Add UDP-Lite (RFC 3828)
+ */
+ udplite4_register();
/*
* Set the ICMP layer up
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 9c6cbe3..9b72fe4 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -66,6 +66,7 @@ static int sockstat_seq_show(struct seq_
tcp_death_row.tw_count, atomic_read(&tcp_sockets_allocated),
atomic_read(&tcp_memory_allocated));
seq_printf(seq, "UDP: inuse %d\n", fold_prot_inuse(&udp_prot));
+ seq_printf(seq, "UDPLITE: inuse %d\n", fold_prot_inuse(&udplite_prot));
seq_printf(seq, "RAW: inuse %d\n", fold_prot_inuse(&raw_prot));
seq_printf(seq, "FRAG: inuse %d memory %d\n", ip_frag_nqueues,
atomic_read(&ip_frag_mem));
@@ -304,6 +305,17 @@ static int snmp_seq_show(struct seq_file
fold_field((void **) udp_statistics,
snmp4_udp_list[i].entry));
+ /* the UDP and UDP-Lite MIBs are the same */
+ seq_puts(seq, "\nUdpLite:");
+ for (i = 0; snmp4_udp_list[i].name != NULL; i++)
+ seq_printf(seq, " %s", snmp4_udp_list[i].name);
+
+ seq_puts(seq, "\nUdpLite:");
+ for (i = 0; snmp4_udp_list[i].name != NULL; i++)
+ seq_printf(seq, " %lu",
+ fold_field((void **) udplite_statistics,
+ snmp4_udp_list[i].entry));
+
seq_putc(seq, '\n');
return 0;
}
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 737fdb2..70a8d2d 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -468,6 +468,7 @@ __be16 xfrm_flowi_sport(struct flowi *fl
switch(fl->proto) {
case IPPROTO_TCP:
case IPPROTO_UDP:
+ case IPPROTO_UDPLITE:
case IPPROTO_SCTP:
port = fl->fl_ip_sport;
break;
@@ -493,6 +494,7 @@ __be16 xfrm_flowi_dport(struct flowi *fl
switch(fl->proto) {
case IPPROTO_TCP:
case IPPROTO_UDP:
+ case IPPROTO_UDPLITE:
case IPPROTO_SCTP:
port = fl->fl_ip_dport;
break;
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 1bed0cd..af6867f 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -199,6 +199,7 @@ _decode_session4(struct sk_buff *skb, st
if (!(iph->frag_off & htons(IP_MF | IP_OFFSET))) {
switch (iph->protocol) {
case IPPROTO_UDP:
+ case IPPROTO_UDPLITE:
case IPPROTO_TCP:
case IPPROTO_SCTP:
case IPPROTO_DCCP:
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 73cee2e..8a25db1 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -272,6 +272,7 @@ _decode_session6(struct sk_buff *skb, st
break;
case IPPROTO_UDP:
+ case IPPROTO_UDPLITE:
case IPPROTO_TCP:
case IPPROTO_SCTP:
case IPPROTO_DCCP:
diff --git a/net/netfilter/xt_multiport.c b/net/netfilter/xt_multiport.c
index d3aefd3..c3b4bc0 100644
--- a/net/netfilter/xt_multiport.c
+++ b/net/netfilter/xt_multiport.c
@@ -1,5 +1,5 @@
-/* Kernel module to match one of a list of TCP/UDP/SCTP/DCCP ports: ports are in
- the same place so we can treat them as equal. */
+/* Kernel module to match one of a list of TCP/UDP(-Lite)/SCTP/DCCP ports:
+ ports are in the same place so we can treat them as equal. */
/* (C) 1999-2001 Paul `Rusty' Russell
* (C) 2002-2004 Netfilter Core Team <coreteam@netfilter.org>
@@ -162,6 +162,7 @@ check(u_int16_t proto,
{
/* Must specify supported protocol, no unknown flags or bad count */
return (proto == IPPROTO_TCP || proto == IPPROTO_UDP
+ || proto == IPPROTO_UDPLITE
|| proto == IPPROTO_SCTP || proto == IPPROTO_DCCP)
&& !(ip_invflags & XT_INV_PROTO)
&& (match_flags == XT_MULTIPORT_SOURCE
diff --git a/net/netfilter/xt_tcpudp.c b/net/netfilter/xt_tcpudp.c
index e76a68e..46414b5 100644
--- a/net/netfilter/xt_tcpudp.c
+++ b/net/netfilter/xt_tcpudp.c
@@ -10,7 +10,7 @@ #include <linux/netfilter/xt_tcpudp.h>
#include <linux/netfilter_ipv4/ip_tables.h>
#include <linux/netfilter_ipv6/ip6_tables.h>
-MODULE_DESCRIPTION("x_tables match for TCP and UDP, supports IPv4 and IPv6");
+MODULE_DESCRIPTION("x_tables match for TCP and UDP(-Lite), supports IPv4 and IPv6");
MODULE_LICENSE("GPL");
MODULE_ALIAS("xt_tcp");
MODULE_ALIAS("xt_udp");
@@ -234,6 +234,24 @@ static struct xt_match xt_tcpudp_match[]
.proto = IPPROTO_UDP,
.me = THIS_MODULE,
},
+ {
+ .name = "udplite",
+ .family = AF_INET,
+ .checkentry = udp_checkentry,
+ .match = udp_match,
+ .matchsize = sizeof(struct xt_udp),
+ .proto = IPPROTO_UDPLITE,
+ .me = THIS_MODULE,
+ },
+ {
+ .name = "udplite",
+ .family = AF_INET6,
+ .checkentry = udp_checkentry,
+ .match = udp_match,
+ .matchsize = sizeof(struct xt_udp),
+ .proto = IPPROTO_UDPLITE,
+ .me = THIS_MODULE,
+ },
};
static int __init xt_tcpudp_init(void)
diff --git a/net/ipv4/netfilter/ipt_LOG.c b/net/ipv4/netfilter/ipt_LOG.c
index 7dc820d..46eee64 100644
--- a/net/ipv4/netfilter/ipt_LOG.c
+++ b/net/ipv4/netfilter/ipt_LOG.c
@@ -171,11 +171,15 @@ static void dump_packet(const struct nf_
}
break;
}
- case IPPROTO_UDP: {
+ case IPPROTO_UDP:
+ case IPPROTO_UDPLITE: {
struct udphdr _udph, *uh;
- /* Max length: 10 "PROTO=UDP " */
- printk("PROTO=UDP ");
+ if (ih->protocol == IPPROTO_UDP)
+ /* Max length: 10 "PROTO=UDP " */
+ printk("PROTO=UDP ");
+ else /* Max length: 14 "PROTO=UDPLITE " */
+ printk("PROTO=UDPLITE ");
if (ntohs(ih->frag_off) & IP_OFFSET)
break;
@@ -341,6 +345,7 @@ static void dump_packet(const struct nf_
/* IP: 40+46+6+11+127 = 230 */
/* TCP: 10+max(25,20+30+13+9+32+11+127) = 252 */
/* UDP: 10+max(25,20) = 35 */
+ /* UDPLITE: 14+max(25,20) = 39 */
/* ICMP: 11+max(25, 18+25+max(19,14,24+3+n+10,3+n+10)) = 91+n */
/* ESP: 10+max(25)+15 = 50 */
/* AH: 9+max(25)+15 = 49 */
diff --git a/net/ipv6/netfilter/ip6t_LOG.c b/net/ipv6/netfilter/ip6t_LOG.c
index 0cf537d..3cb6bb7 100644
--- a/net/ipv6/netfilter/ip6t_LOG.c
+++ b/net/ipv6/netfilter/ip6t_LOG.c
@@ -270,11 +270,15 @@ static void dump_packet(const struct nf_
}
break;
}
- case IPPROTO_UDP: {
+ case IPPROTO_UDP:
+ case IPPROTO_UDPLITE: {
struct udphdr _udph, *uh;
- /* Max length: 10 "PROTO=UDP " */
- printk("PROTO=UDP ");
+ if (currenthdr == IPPROTO_UDP)
+ /* Max length: 10 "PROTO=UDP " */
+ printk("PROTO=UDP ");
+ else /* Max length: 14 "PROTO=UDPLITE " */
+ printk("PROTO=UDPLITE ");
if (fragment)
break;
diff --git a/Documentation/networking/udplite.txt b/Documentation/networking/udplite.txt
new file mode 100644
index 0000000..a899fa1
--- /dev/null
+++ b/Documentation/networking/udplite.txt
@@ -0,0 +1,291 @@
+ ===========================================================================
+ The UDP-Lite protocol (RFC 3828)
+ ===========================================================================
+ last modified: Mon 18th September 2006
+
+
+ UDP-Lite is a Standards-Track IETF transport protocol whose characteristic
+ is a variable-length checksum. This has advantages for transport of multimedia
+ (video, VoIP) over wireless networks, as partly damaged packets can still be
+ fed into the codec instead of being discarded due to a failed checksum test.
+
+ This file briefly describes the existing kernel support and the socket API.
+ For in-depth information, you can consult:
+
+ o The UDP-Lite Homepage: http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/
+ From here you can always download the latest patch for the stable
+ kernel tree and some example application source code.
+
+ o The UDP-Lite HOWTO on
+ http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/UDP-Lite-HOWTO.txt
+
+ o The Wireshark UDP-Lite WiKi (with capture files):
+ http://wiki.wireshark.org/Lightweight_User_Datagram_Protocol
+
+ o The Protocol Spec, RFC 3828, on http://www.ietf.org/rfc/rfc3828.txt
+
+
+ I) APPLICATIONS
+
+ Several applications have been ported successfully to UDP-Lite. Ethereal
+ (now called Wireshark) has UDP-Litev4/v6 support by default. The tarball on
+
+ http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/udplite_linux.tar.gz
+
+ has source code for several v4/v6 client-server and network testing examples.
+
+ Porting applications to UDP-Lite is straightforward: only socket level and
+ IPPROTO need to be changed; senders additionally set the checksum coverage
+ length (default = header length = 8). Details are in the next section.
+ UDP-Lite is not enabled by default: set CONFIG_IP_UDPLITE=y to support it.
+
+
+ II) PROGRAMMING API
+
+ UDP-Lite provides a connectionless, unreliable datagram service and hence
+ uses the same socket type as UDP. In fact, porting from UDP to UDP-Lite is
+ dead easy: simply add `IPPROTO_UDPLITE' as the last argument of the socket(2)
+ call so that the statement looks like:
+
+ s = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDPLITE);
+
+ or, respectively,
+
+ s = socket(PF_INET6, SOCK_DGRAM, IPPROTO_UDPLITE);
+
+ Since both UDP-Litev4 and UDP-Litev6 are supported, the porting process is the
+ same in both cases. With just this change you are able to run UDP-Lite
+ services or connect to UDP-Lite servers. The kernel will assume that you are
+ not interested in using partial checksum coverage and so emulate UDP mode.
+
+ Making use of the partial checksum coverage facilities requires setting just
+ one socket option, which takes an integer specifying the coverage length:
+
+ * Sender checksum coverage: UDPLITE_SEND_CSCOV
+
+ For example,
+
+ int val = 20;
+ setsockopt(s, SOL_UDPLITE, UDPLITE_SEND_CSCOV, &val, sizeof(int));
+
+ sets the checksum coverage length to 20 bytes (12 bytes data + 8 bytes header).
+ Of each packet only the first 20 bytes (plus the pseudo-header) will be
+ checksummed. This is useful for RTP applications which have a 12-byte
+ base header.
+
+
+ * Receiver checksum coverage: UDPLITE_RECV_CSCOV
+
+ This option is the receiver-side analogue. It is truly optional, i.e. not
+ required to enable traffic with partial checksum coverage. Its function is
+ that of a traffic filter: when enabled, it instructs the kernel to drop
+ all packets which have a coverage _less_ than this value. For example, if
+ RTP and UDP headers are to be protected, a receiver can enforce that only
+ packets with a minimum coverage of 20 are admitted:
+
+ int min = 20;
+ setsockopt(s, SOL_UDPLITE, UDPLITE_RECV_CSCOV, &min, sizeof(int));
+
+ The calls to getsockopt(2) are analogous. Since UDP-Lite is an extension and
+ not a stand-alone protocol, all socket options known from UDP can be used in
+ exactly the same manner as before, e.g. UDP_CORK or UDP_ENCAP.
+
+ A detailed discussion of UDP-Lite checksum coverage options is in section IV.
+
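The socket calls described in section II can be combined into a small user-space helper. The following is a minimal sketch and not part of the patch: the helper names `udplite_open` and `udplite_get_send_cscov` are hypothetical, and the fallback constants are the ones listed in the `mini' header of section III, used only when the system headers do not provide them.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Fallback definitions, taken from the `mini' header in section III. */
#ifndef IPPROTO_UDPLITE
#define IPPROTO_UDPLITE    136
#endif
#ifndef SOL_UDPLITE
#define SOL_UDPLITE        136
#endif
#ifndef UDPLITE_SEND_CSCOV
#define UDPLITE_SEND_CSCOV 10
#endif
#ifndef UDPLITE_RECV_CSCOV
#define UDPLITE_RECV_CSCOV 11
#endif

/*
 * Hypothetical helper: open a UDP-Lite socket for the given family
 * (PF_INET or PF_INET6), then set the sender coverage and the minimum
 * coverage the receiver side is willing to accept.
 * Returns the descriptor, or -1 on failure (for instance when the
 * kernel was built without UDP-Lite support).
 */
int udplite_open(int family, int send_cov, int recv_min_cov)
{
    int s = socket(family, SOCK_DGRAM, IPPROTO_UDPLITE);

    if (s < 0)
        return -1;
    if (setsockopt(s, SOL_UDPLITE, UDPLITE_SEND_CSCOV,
                   &send_cov, sizeof(send_cov)) < 0 ||
        setsockopt(s, SOL_UDPLITE, UDPLITE_RECV_CSCOV,
                   &recv_min_cov, sizeof(recv_min_cov)) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* Hypothetical helper: read back the sender coverage, or -1 on error. */
int udplite_get_send_cscov(int s)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(s, SOL_UDPLITE, UDPLITE_SEND_CSCOV, &val, &len) < 0)
        return -1;
    return val;
}
```

An RTP-style application would call `udplite_open(PF_INET, 20, 20)` and then use the descriptor with sendto(2)/recvfrom(2) exactly as it would a UDP socket.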
+
+
+ III) HEADER FILES
+
+ The socket API requires support through header files in /usr/include:
+
+ * /usr/include/netinet/in.h
+ to define IPPROTO_UDPLITE
+
+ * /usr/include/netinet/udplite.h
+ for UDP-Lite header fields and protocol constants
+
+ For testing purposes, the following can serve as a `mini' header file:
+
+ #define IPPROTO_UDPLITE 136
+ #define SOL_UDPLITE 136
+ #define UDPLITE_SEND_CSCOV 10
+ #define UDPLITE_RECV_CSCOV 11
+
+ Ready-made header files for various distros are in the UDP-Lite tarball.
+
+
+
+ IV) KERNEL BEHAVIOUR WITH REGARD TO THE VARIOUS SOCKET OPTIONS
+
+ To enable debugging messages, the log level must be set to 8, as most
+ messages use the KERN_DEBUG level (7).
+
+
+ 1) Sender Socket Options
+
+ If the sender specifies a value of 0 as coverage length, the module
+ assumes full coverage and transmits a packet with a coverage length
+ of 0 and a matching checksum. If the sender specifies a coverage
+ < 8 and different from 0, the kernel assumes 8 as default value.
+ Finally, if the specified coverage length exceeds the packet length,
+ the packet length is used instead as coverage length.
+
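The three sender rules above amount to a small normalisation step. The function below is a user-space model of that description, not the kernel code; the name `udplite_effective_send_cov` is hypothetical, and treating negative requests like any other value below 8 is an assumption.

```c
#define UDPLITE_HDR_LEN 8   /* UDP-Lite header length in bytes */

/*
 * Hypothetical model of the sender rules.
 *   requested:  value passed via UDPLITE_SEND_CSCOV
 *   packet_len: UDP-Lite packet length, including the 8-byte header
 * Returns the number of bytes that end up covered by the checksum.
 */
int udplite_effective_send_cov(int requested, int packet_len)
{
    if (requested == 0)
        return packet_len;              /* 0 means full coverage */
    if (requested < UDPLITE_HDR_LEN)
        requested = UDPLITE_HDR_LEN;    /* the header is always covered */
    if (requested > packet_len)
        return packet_len;              /* clamp to the packet length */
    return requested;
}
```

With a requested coverage of 856, a 1032-byte packet is covered for 856 bytes, while a 520-byte packet is covered in full.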
+
+ 2) Receiver Socket Options
+
+ The receiver specifies the minimum value of the coverage length it
+ is willing to accept. A value of 0 here indicates that the receiver
+ always wants the whole of the packet covered. In this case, all
+ partially covered packets are dropped and an error is logged.
+
+ It is not possible to specify illegal values (negative, or non-zero
+ and less than 8); in these cases the default of 8 is assumed.
+
+ All packets arriving with a coverage value less than the specified
+ threshold are discarded; these events are also logged.
+
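The receiver-side filter can be modelled in a few lines. This is a sketch, not the kernel implementation; `udplite_receiver_accepts` and its parameters are hypothetical. It assumes, as RFC 3828 specifies, that a coverage value of 0 on the wire means the whole packet is covered, and it treats a packet whose coverage equals its length as fully covered, so that such packets pass regardless of the threshold.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the UDPLITE_RECV_CSCOV filter.
 *   min_cov: threshold set by the application (0 = full coverage only)
 *   pkt_cov: coverage value carried in the received packet
 *   pkt_len: UDP-Lite packet length, including the 8-byte header
 * Returns true if the packet would be delivered to the application.
 */
bool udplite_receiver_accepts(int min_cov, int pkt_cov, int pkt_len)
{
    /* A wire value of 0, or one equal to the length, is full coverage. */
    bool partial = (pkt_cov != 0 && pkt_cov < pkt_len);

    if (!partial)
        return true;        /* fully covered packets always pass */
    if (min_cov == 0)
        return false;       /* receiver insists on full coverage */
    if (min_cov < 8)
        min_cov = 8;        /* illegal values fall back to 8 */
    return pkt_cov >= min_cov;
}
```

A receiver with a threshold of 20 therefore accepts a 100-byte packet with coverage 20, rejects one with coverage 12, and still accepts a short, fully covered packet even when the threshold is larger than the packet itself.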
+
+ 3) Disabling the Checksum Computation
+
+ On both sender and receiver, checksumming will always be performed
+ and cannot be disabled using SO_NO_CHECK. Thus
+
+ setsockopt(sockfd, SOL_SOCKET, SO_NO_CHECK, ... );
+
+ will always be ignored, while the value of
+
+ getsockopt(sockfd, SOL_SOCKET, SO_NO_CHECK, &value, ...);
+
+ is meaningless (as in TCP). Packets with a zero checksum field are
+ illegal (cf. RFC 3828, sec. 3.1) and will be silently discarded.
+
+
+ 4) Fragmentation
+
+ The checksum computation respects both buffersize and MTU. The size
+ of UDP-Lite packets is determined by the size of the send buffer. The
+ minimum size of the send buffer is 2048 (defined as SOCK_MIN_SNDBUF
+ in include/net/sock.h), the default value is configurable as
+ net.core.wmem_default or via setting the SO_SNDBUF socket(7)
+ option. The upper bound for the send buffer is determined
+ by net.core.wmem_max.
+
+ Given a payload size larger than the send buffer size, UDP-Lite will
+ split the payload into several individual packets, filling up the
+ send buffer size in each case.
+
+ The precise value also depends on the interface MTU. The interface MTU,
+ in turn, may trigger IP fragmentation. In this case, the generated
+ UDP-Lite packet is split into several IP packets, of which only the
+ first one contains the L4 header.
+
+ The send buffer size has implications on the checksum coverage length.
+ Consider the following example:
+
+ Payload: 1536 bytes Send Buffer: 1024 bytes
+ MTU: 1500 bytes Coverage Length: 856 bytes
+
+ UDP-Lite will ship the 1536 bytes in two separate packets:
+
+ Packet 1: 1024 payload + 8 byte header + 20 byte IP header = 1052 bytes
+ Packet 2: 512 payload + 8 byte header + 20 byte IP header = 540 bytes
+
+ The coverage length of 856 bytes covers the UDP-Lite header and 848
+ bytes of the payload in the first packet; the second packet is fully
+ covered. Note that for the second packet, the coverage length exceeds
+ the packet length. The kernel always re-adjusts the coverage length to
+ the packet length in such cases.
+
+ To see what happens when one UDP-Lite packet is split into several
+ tiny fragments, consider the following example.
+
+ Payload: 1024 bytes Send buffer size: 1024 bytes
+ MTU: 300 bytes Coverage length: 575 bytes
+
+ +-+-----------+--------------+--------------+--------------+
+ |8| 272 | 280 | 280 | 192 |
+ +-+-----------+--------------+--------------+--------------+
+ 280 560 840 1032
+ ^
+ *****checksum coverage*************
+
+ The UDP-Lite module generates one 1032 byte packet (1024 + 8 byte
+ header). According to the interface MTU, this is split into 4 IP
+ packets (up to 280 byte IP payload + 20 byte IP header). The kernel
+ module sums the contents of the entire first two packets, plus 15 bytes
+ of the third packet, before releasing the fragments to the IP module.
+
+ To see the analogous case for IPv6 fragmentation, consider a link
+ MTU of 1280 bytes and a write buffer of 3356 bytes. If the checksum
+ coverage is less than 1232 bytes (MTU minus IPv6/fragment header
+ lengths), only the first fragment needs to be considered. When using
+ larger checksum coverage lengths, each eligible fragment needs to be
+ checksummed. Suppose we have a checksum coverage of 3062. The buffer
+ of 3356 bytes will be split into the following fragments:
+
+ Fragment 1: 1280 bytes carrying 1232 bytes of UDP-Lite data
+ Fragment 2: 1280 bytes carrying 1232 bytes of UDP-Lite data
+ Fragment 3: 948 bytes carrying 900 bytes of UDP-Lite data
+
+ The first two fragments have to be checksummed in full; of the last
+ fragment only 598 (= 3062 - 2*1232) bytes are checksummed.
+
+ While it is important that such cases are dealt with correctly, they
+ are (annoyingly) rare: UDP-Lite is designed for optimising multimedia
+ performance over wireless (or generally noisy) links and thus smaller
+ coverage lengths are to be expected.
+
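The per-fragment arithmetic in the two fragmentation examples can be reproduced with a short helper. This is a sketch for checking the numbers only, not how the kernel walks its buffers; the name `udplite_covered_in_fragment` is hypothetical, and lengths are those of the UDP-Lite packet (header included), split into fragments that each carry `frag_size` bytes of it.

```c
#include <stddef.h>

/*
 * Hypothetical helper: number of bytes of fragment `index` (0-based)
 * that fall inside the checksum coverage.
 *   total_len: UDP-Lite packet length, including the 8-byte header
 *   frag_size: UDP-Lite bytes carried per fragment (last may be shorter)
 *   coverage:  coverage length; 0 or a value above total_len means full
 */
size_t udplite_covered_in_fragment(size_t total_len, size_t frag_size,
                                   size_t coverage, size_t index)
{
    size_t start = index * frag_size;
    size_t end;

    if (frag_size == 0 || start >= total_len)
        return 0;                       /* no such fragment */
    end = start + frag_size;
    if (end > total_len)
        end = total_len;                /* last fragment is shorter */
    if (coverage == 0 || coverage > total_len)
        coverage = total_len;           /* full coverage */
    if (coverage <= start)
        return 0;                       /* fragment lies past the coverage */
    return (coverage < end ? coverage : end) - start;
}
```

For the IPv4 example (1032 bytes, 280 per fragment, coverage 575) this yields 280, 280, 15 and 0 bytes; for the IPv6 example (1232 + 1232 + 900 = 3364 bytes, coverage 3062) the third fragment contributes 598 bytes.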
+
+ V) UDP-LITE RUNTIME STATISTICS AND THEIR MEANING
+
+ Exceptional and error conditions are logged to syslog at the KERN_DEBUG
+ level. Live statistics about UDP-Lite are available in /proc/net/snmp
+ and can (with newer versions of netstat) be viewed using
+
+ netstat -svu
+
+ This displays UDP-Lite statistics variables, whose meaning is as follows.
+
+ InDatagrams: Total number of received datagrams.
+
+ NoPorts: Number of packets received for an unknown port.
+ These cases are counted separately (not as InErrors).
+
+ InErrors: Number of erroneous UDP-Lite packets. Errors include:
+ * internal socket queue receive errors
+ * packet too short (less than 8 bytes or stated
+ coverage length exceeds received length)
+ * xfrm4_policy_check() returned with error
+ * application has specified larger min. coverage
+ length than that of incoming packet
+ * checksum coverage violated
+ * bad checksum
+
+ OutDatagrams: Total number of sent datagrams.
+
+ These statistics derive from the UDP MIB (RFC 2013).
+
+
+ VI) IPTABLES
+
+ There is packet match support for UDP-Lite as well as support for the LOG target.
+ If you copy and paste the following line into /etc/protocols,
+
+ udplite 136 UDP-Lite # UDP-Lite [RFC 3828]
+
+ then
+ iptables -A INPUT -p udplite -j LOG
+
+ will produce logging output to syslog. Dropping and rejecting packets also works.
+
+
+ VII) MAINTAINER ADDRESS
+
+ The UDP-Lite patch was developed at
+ University of Aberdeen
+ Electronics Research Group
+ Department of Engineering
+ Fraser Noble Building
+ Aberdeen AB24 3UE; UK
+ The current maintainer is Gerrit Renker, <gerrit@erg.abdn.ac.uk>. The initial
+ code was developed by William Stanislaus, <william@erg.abdn.ac.uk>.
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH-update][RFC] net: consolidated UDP / UDP-Lite code
2006-10-12 7:49 ` Gerrit Renker
@ 2006-10-12 9:01 ` David Miller
2006-10-13 15:14 ` [PATCHv4 1/3] net/ipv4: UDP-Lite support (RFC 3828) Gerrit Renker
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: David Miller @ 2006-10-12 9:01 UTC (permalink / raw)
To: gerrit; +Cc: netdev
From: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Date: Thu, 12 Oct 2006 08:49:19 +0100
> please find attached the updated UDP-Lite patch - I have removed the
> statistics corrections you pointed out to me.
>
> Can you please indicate whether you are ok, by and large, with the
> changes performed by the patch? Even if it is some time ago, I
> have implemented in this patch the architectural suggestions you
> gave me a while earlier.
The patch looks pretty good. I have no problems with how
you implemented this at all.
I think we'll have no problem getting this into 2.6.20
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCHv4 1/3] net/ipv4: UDP-Lite support (RFC 3828)
2006-10-12 9:01 ` David Miller
@ 2006-10-13 15:14 ` Gerrit Renker
2006-10-13 15:14 ` [PATCHv4 2/3] net/ipv6: v6-side of UDP-Lite Gerrit Renker
2006-10-13 15:14 ` [PATCHv4 3/3] net: UDP-Lite misc files Gerrit Renker
2 siblings, 0 replies; 19+ messages in thread
From: Gerrit Renker @ 2006-10-13 15:14 UTC (permalink / raw)
To: David Miller; +Cc: netdev
Hi David,
thank you for reviewing the code. This now is the full version,
including both the v4 and the v6 side.
I would like to say `please consider for inclusion', but I think
it would be good if the IPv6 developers could first have a look
through and say whether they are ok with the changes.
The v4 side remains the same as before - apart from some re-shuffling
to integrate it with v6.
Before submitting, I have checked compilation on a small bouquet of
different architectures (worked ok, no compile or run problems).
Thanks again,
Gerrit
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
include/linux/in.h | 1
include/linux/socket.h | 1
include/linux/udp.h | 12 +
include/net/udp.h | 91 +++++++++
include/net/udplite.h | 149 +++++++++++++++
net/ipv4/af_inet.c | 10 -
net/ipv4/proc.c | 13 +
net/ipv4/udp.c | 473 +++++++++++++++++++++++++++++--------------------
net/ipv4/udplite.c | 126 +++++++++++++
9 files changed, 682 insertions(+), 194 deletions(-)
diff --git a/include/linux/in.h b/include/linux/in.h
index 2619859..1912e7c 100644
--- a/include/linux/in.h
+++ b/include/linux/in.h
@@ -45,6 +45,7 @@ enum {
IPPROTO_COMP = 108, /* Compression Header protocol */
IPPROTO_SCTP = 132, /* Stream Control Transport Protocol */
+ IPPROTO_UDPLITE = 136, /* UDP-Lite (RFC 3828) */
IPPROTO_RAW = 255, /* Raw IP packets */
IPPROTO_MAX
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 3614090..592b666 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -264,6 +264,7 @@ #define SOL_UDP 17
#define SOL_IPV6 41
#define SOL_ICMPV6 58
#define SOL_SCTP 132
+#define SOL_UDPLITE 136 /* UDP-Lite (RFC 3828) */
#define SOL_RAW 255
#define SOL_IPX 256
#define SOL_AX25 257
diff --git a/include/linux/udp.h b/include/linux/udp.h
index 014b41d..564f3b0 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -38,6 +38,7 @@ #ifdef __KERNEL__
#include <linux/types.h>
#include <net/inet_sock.h>
+#define UDP_HTABLE_SIZE 128
struct udp_sock {
/* inet_sock has to be the first member */
@@ -50,12 +51,23 @@ struct udp_sock {
* when the socket is uncorked.
*/
__u16 len; /* total length of pending frames */
+ /*
+ * Fields specific to UDP-Lite.
+ */
+ __u16 pcslen;
+ __u16 pcrlen;
+/* indicator bits used by pcflag: */
+#define UDPLITE_BIT 0x1 /* set by udplite proto init function */
+#define UDPLITE_SEND_CC 0x2 /* set via udplite setsockopt */
+#define UDPLITE_RECV_CC 0x4 /* set via udplite setsockopt */
+ __u8 pcflag; /* marks socket as UDP-Lite if > 0 */
};
static inline struct udp_sock *udp_sk(const struct sock *sk)
{
return (struct udp_sock *)sk;
}
+#define IS_UDPLITE(__sk) (udp_sk(__sk)->pcflag)
#endif
diff --git a/include/net/udp.h b/include/net/udp.h
index db0c05f..fa1552d 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -26,9 +26,28 @@ #include <linux/list.h>
#include <net/inet_sock.h>
#include <net/sock.h>
#include <net/snmp.h>
+#include <net/ip.h>
+#include <linux/ipv6.h>
#include <linux/seq_file.h>
-#define UDP_HTABLE_SIZE 128
+/**
+ * struct udp_skb_cb - UDP(-Lite) private variables
+ *
+ * @header: private variables used by IPv4/IPv6
+ * @cscov: checksum coverage length (UDP-Lite only)
+ * @partial_cov: if set indicates partial csum coverage
+ */
+struct udp_skb_cb {
+ union {
+ struct inet_skb_parm h4;
+#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
+ struct inet6_skb_parm h6;
+#endif
+ } header;
+ __u16 cscov;
+ __u8 partial_cov;
+};
+#define UDP_SKB_CB(__skb) ((struct udp_skb_cb *)((__skb)->cb))
extern struct hlist_head udp_hash[UDP_HTABLE_SIZE];
extern rwlock_t udp_hash_lock;
@@ -47,6 +66,62 @@ extern struct proto udp_prot;
struct sk_buff;
+/*
+ * Generic checksumming routines for UDP(-Lite) v4 and v6
+ */
+static inline u16 __udp_lib_checksum_complete(struct sk_buff *skb)
+{
+ if (! UDP_SKB_CB(skb)->partial_cov)
+ return __skb_checksum_complete(skb);
+ return csum_fold(skb_checksum(skb, 0, UDP_SKB_CB(skb)->cscov,
+ skb->csum));
+}
+
+static __inline__ int udp_checksum_complete(struct sk_buff *skb)
+{
+ return skb->ip_summed != CHECKSUM_UNNECESSARY &&
+ __udp_lib_checksum_complete(skb);
+}
+
+/**
+ * udp_csum_outgoing - compute UDPv4/v6 checksum over fragments
+ * @sk: socket we are writing to
+ * @skb: sk_buff containing the filled-in UDP header
+ * (checksum field must be zeroed out)
+ */
+static inline u32 udp_csum_outgoing(struct sock *sk, struct sk_buff *skb)
+{
+ u32 csum = csum_partial(skb->h.raw, sizeof(struct udphdr), 0);
+
+ skb_queue_walk(&sk->sk_write_queue, skb) {
+ csum = csum_add(csum, skb->csum);
+ }
+ return csum;
+}
+
+/* hash routines shared between UDPv4/6 and UDP-Litev4/6 */
+static inline void udp_lib_hash(struct sock *sk)
+{
+ BUG();
+}
+
+static inline void udp_lib_unhash(struct sock *sk)
+{
+ write_lock_bh(&udp_hash_lock);
+ if (sk_del_node_init(sk)) {
+ inet_sk(sk)->num = 0;
+ sock_prot_dec_use(sk->sk_prot);
+ }
+ write_unlock_bh(&udp_hash_lock);
+}
+
+static inline void udp_lib_close(struct sock *sk, long timeout)
+{
+ sk_common_release(sk);
+}
+
+
+/* net/ipv4/udp.c */
extern int udp_get_port(struct sock *sk, unsigned short snum,
int (*saddr_cmp)(const struct sock *, const struct sock *));
extern void udp_err(struct sk_buff *, u32);
@@ -61,21 +136,29 @@ extern unsigned int udp_poll(struct file
poll_table *wait);
DECLARE_SNMP_STAT(struct udp_mib, udp_statistics);
-#define UDP_INC_STATS(field) SNMP_INC_STATS(udp_statistics, field)
-#define UDP_INC_STATS_BH(field) SNMP_INC_STATS_BH(udp_statistics, field)
-#define UDP_INC_STATS_USER(field) SNMP_INC_STATS_USER(udp_statistics, field)
+/*
+ * SNMP statistics for UDP and UDP-Lite
+ */
+#define UDP_INC_STATS_USER(field, is_udplite) do { \
+ if (is_udplite) SNMP_INC_STATS_USER(udplite_statistics, field); \
+ else SNMP_INC_STATS_USER(udp_statistics, field); } while(0)
+#define UDP_INC_STATS_BH(field, is_udplite) do { \
+ if (is_udplite) SNMP_INC_STATS_BH(udplite_statistics, field); \
+ else SNMP_INC_STATS_BH(udp_statistics, field); } while(0)
/* /proc */
struct udp_seq_afinfo {
struct module *owner;
char *name;
sa_family_t family;
+ struct hlist_head *hashtable;
int (*seq_show) (struct seq_file *m, void *v);
struct file_operations *seq_fops;
};
struct udp_iter_state {
sa_family_t family;
+ struct hlist_head *hashtable;
int bucket;
struct seq_operations seq_ops;
};
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 865d752..2321553 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -92,10 +92,8 @@ #include <linux/errno.h>
#include <linux/timer.h>
#include <linux/mm.h>
#include <linux/inet.h>
-#include <linux/ipv6.h>
#include <linux/netdevice.h>
#include <net/snmp.h>
-#include <net/ip.h>
#include <net/tcp_states.h>
#include <net/protocol.h>
#include <linux/skbuff.h>
@@ -103,6 +101,7 @@ #include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <net/sock.h>
#include <net/udp.h>
+#include <net/udplite.h>
#include <net/icmp.h>
#include <net/route.h>
#include <net/inet_common.h>
@@ -120,26 +119,29 @@ DEFINE_RWLOCK(udp_hash_lock);
static int udp_port_rover;
-static inline int udp_lport_inuse(u16 num)
+static inline int __udp_lib_lport_inuse(__be16 num, struct hlist_head udptable[])
{
struct sock *sk;
struct hlist_node *node;
- sk_for_each(sk, node, &udp_hash[num & (UDP_HTABLE_SIZE - 1)])
+ sk_for_each(sk, node, &udptable[num & (UDP_HTABLE_SIZE - 1)])
if (inet_sk(sk)->num == num)
return 1;
return 0;
}
/**
- * udp_get_port - common port lookup for IPv4 and IPv6
+ * __udp_lib_get_port - UDP/-Lite port lookup for IPv4 and IPv6
*
* @sk: socket struct in question
* @snum: port number to look up
+ * @udptable: hash list table, must be of UDP_HTABLE_SIZE
+ * @port_rover: pointer to record of last unallocated port
* @saddr_comp: AF-dependent comparison of bound local IP addresses
*/
-int udp_get_port(struct sock *sk, unsigned short snum,
- int (*saddr_cmp)(const struct sock *sk1, const struct sock *sk2))
+static int __udp_lib_get_port(struct sock *sk, unsigned short snum,
+ struct hlist_head udptable[], int *port_rover,
+ int (*saddr_cmp)(const struct sock *, const struct sock *))
{
struct hlist_node *node;
struct hlist_head *head;
@@ -150,15 +152,15 @@ int udp_get_port(struct sock *sk, unsign
if (snum == 0) {
int best_size_so_far, best, result, i;
- if (udp_port_rover > sysctl_local_port_range[1] ||
- udp_port_rover < sysctl_local_port_range[0])
- udp_port_rover = sysctl_local_port_range[0];
+ if (*port_rover > sysctl_local_port_range[1] ||
+ *port_rover < sysctl_local_port_range[0])
+ *port_rover = sysctl_local_port_range[0];
best_size_so_far = 32767;
- best = result = udp_port_rover;
+ best = result = *port_rover;
for (i = 0; i < UDP_HTABLE_SIZE; i++, result++) {
int size;
- head = &udp_hash[result & (UDP_HTABLE_SIZE - 1)];
+ head = &udptable[result & (UDP_HTABLE_SIZE - 1)];
if (hlist_empty(head)) {
if (result > sysctl_local_port_range[1])
result = sysctl_local_port_range[0] +
@@ -179,15 +181,15 @@ int udp_get_port(struct sock *sk, unsign
result = sysctl_local_port_range[0]
+ ((result - sysctl_local_port_range[0]) &
(UDP_HTABLE_SIZE - 1));
- if (!udp_lport_inuse(result))
+ if (! __udp_lib_lport_inuse(result, udptable))
break;
}
if (i >= (1 << 16) / UDP_HTABLE_SIZE)
goto fail;
gotit:
- udp_port_rover = snum = result;
+ *port_rover = snum = result;
} else {
- head = &udp_hash[snum & (UDP_HTABLE_SIZE - 1)];
+ head = &udptable[snum & (UDP_HTABLE_SIZE - 1)];
sk_for_each(sk2, node, head)
if (inet_sk(sk2)->num == snum &&
@@ -200,7 +202,7 @@ gotit:
}
inet_sk(sk)->num = snum;
if (sk_unhashed(sk)) {
- head = &udp_hash[snum & (UDP_HTABLE_SIZE - 1)];
+ head = &udptable[snum & (UDP_HTABLE_SIZE - 1)];
sk_add_node(sk, head);
sock_prot_inc_use(sk->sk_prot);
}
@@ -210,6 +212,12 @@ fail:
return error;
}
+__inline__ int udp_get_port(struct sock *sk, unsigned short snum,
+ int (*scmp)(const struct sock *, const struct sock *))
+{
+ return __udp_lib_get_port(sk, snum, udp_hash, &udp_port_rover, scmp);
+}
+
static inline int ipv4_rcv_saddr_equal(const struct sock *sk1, const struct sock *sk2)
{
struct inet_sock *inet1 = inet_sk(sk1), *inet2 = inet_sk(sk2);
@@ -224,34 +232,20 @@ static inline int udp_v4_get_port(struct
return udp_get_port(sk, snum, ipv4_rcv_saddr_equal);
}
-
-static void udp_v4_hash(struct sock *sk)
-{
- BUG();
-}
-
-static void udp_v4_unhash(struct sock *sk)
-{
- write_lock_bh(&udp_hash_lock);
- if (sk_del_node_init(sk)) {
- inet_sk(sk)->num = 0;
- sock_prot_dec_use(sk->sk_prot);
- }
- write_unlock_bh(&udp_hash_lock);
-}
-
/* UDP is nearly always wildcards out the wazoo, it makes no sense to try
* harder than this. -DaveM
*/
-static struct sock *udp_v4_lookup_longway(__be32 saddr, __be16 sport,
- __be32 daddr, __be16 dport, int dif)
+static struct sock *__udp4_lib_lookup(__be32 saddr, __be16 sport,
+ __be32 daddr, __be16 dport,
+ int dif, struct hlist_head udptable[])
{
struct sock *sk, *result = NULL;
struct hlist_node *node;
unsigned short hnum = ntohs(dport);
int badness = -1;
- sk_for_each(sk, node, &udp_hash[hnum & (UDP_HTABLE_SIZE - 1)]) {
+ read_lock(&udp_hash_lock);
+ sk_for_each(sk, node, &udptable[hnum & (UDP_HTABLE_SIZE - 1)]) {
struct inet_sock *inet = inet_sk(sk);
if (inet->num == hnum && !ipv6_only_sock(sk)) {
@@ -285,20 +279,16 @@ static struct sock *udp_v4_lookup_longwa
}
}
}
+ if (result)
+ sock_hold(result);
+ read_unlock(&udp_hash_lock);
return result;
}
static __inline__ struct sock *udp_v4_lookup(__be32 saddr, __be16 sport,
__be32 daddr, __be16 dport, int dif)
{
- struct sock *sk;
-
- read_lock(&udp_hash_lock);
- sk = udp_v4_lookup_longway(saddr, sport, daddr, dport, dif);
- if (sk)
- sock_hold(sk);
- read_unlock(&udp_hash_lock);
- return sk;
+ return __udp4_lib_lookup(saddr, sport, daddr, dport, dif, udp_hash);
}
static inline struct sock *udp_v4_mcast_next(struct sock *sk,
@@ -340,7 +330,8 @@ found:
* to find the appropriate port.
*/
-void udp_err(struct sk_buff *skb, u32 info)
+static void __udp4_lib_err(struct sk_buff *skb, u32 info,
+ struct hlist_head udptable[] )
{
struct inet_sock *inet;
struct iphdr *iph = (struct iphdr*)skb->data;
@@ -351,7 +342,8 @@ void udp_err(struct sk_buff *skb, u32 in
int harderr;
int err;
- sk = udp_v4_lookup(iph->daddr, uh->dest, iph->saddr, uh->source, skb->dev->ifindex);
+ sk = __udp4_lib_lookup(iph->daddr, uh->dest, iph->saddr, uh->source,
+ skb->dev->ifindex, udptable );
if (sk == NULL) {
ICMP_INC_STATS_BH(ICMP_MIB_INERRORS);
return; /* No socket for error */
@@ -405,6 +397,11 @@ out:
sock_put(sk);
}
+__inline__ void udp_err(struct sk_buff *skb, u32 info)
+{
+ __udp4_lib_err(skb, info, udp_hash);
+}
+
/*
* Throw away all pending data and cancel the corking. Socket is locked.
*/
@@ -419,6 +416,45 @@ static void udp_flush_pending_frames(str
}
}
+/**
+ * udp4_hwcsum_outgoing - handle outgoing HW checksumming
+ * @sk: socket we are sending on
+ * @skb: sk_buff containing the filled-in UDP header
+ * (checksum field must be zeroed out)
+ */
+static void udp4_hwcsum_outgoing(struct sock *sk, struct sk_buff *skb,
+ __be32 src, __be32 dst, int len )
+{
+ unsigned int csum = 0, offset;
+ struct udphdr *uh = skb->h.uh;
+
+ if (skb_queue_len(&sk->sk_write_queue) == 1) {
+ /*
+ * Only one fragment on the socket.
+ */
+ skb->csum = offsetof(struct udphdr, check);
+ uh->check = ~csum_tcpudp_magic(src, dst, len, IPPROTO_UDP, 0);
+ } else {
+ /*
+ * HW-checksum won't work as there are two or more
+ * fragments on the socket so that all csums of sk_buffs
+ * should be together
+ */
+ offset = skb->h.raw - skb->data;
+ skb->csum = skb_checksum(skb, offset, skb->len - offset, 0);
+
+ skb->ip_summed = CHECKSUM_NONE;
+
+ skb_queue_walk(&sk->sk_write_queue, skb) {
+ csum = csum_add(csum, skb->csum);
+ }
+
+ uh->check = csum_tcpudp_magic(src, dst, len, IPPROTO_UDP, csum);
+ if (uh->check == 0)
+ uh->check = -1;
+ }
+}
+
/*
* Push out all pending data as one UDP datagram. Socket is locked.
*/
@@ -429,6 +465,7 @@ static int udp_push_pending_frames(struc
struct sk_buff *skb;
struct udphdr *uh;
int err = 0;
+ u32 csum = 0;
/* Grab the skbuff where UDP header space exists. */
if ((skb = skb_peek(&sk->sk_write_queue)) == NULL)
@@ -443,52 +480,28 @@ static int udp_push_pending_frames(struc
uh->len = htons(up->len);
uh->check = 0;
- if (sk->sk_no_check == UDP_CSUM_NOXMIT) {
+ if (up->pcflag) /* UDP-Lite */
+ csum = udplite_csum_outgoing(sk, skb);
+
+ else if (sk->sk_no_check == UDP_CSUM_NOXMIT) { /* UDP csum disabled */
+
skb->ip_summed = CHECKSUM_NONE;
goto send;
- }
- if (skb_queue_len(&sk->sk_write_queue) == 1) {
- /*
- * Only one fragment on the socket.
- */
- if (skb->ip_summed == CHECKSUM_PARTIAL) {
- skb->csum = offsetof(struct udphdr, check);
- uh->check = ~csum_tcpudp_magic(fl->fl4_src, fl->fl4_dst,
- up->len, IPPROTO_UDP, 0);
- } else {
- skb->csum = csum_partial((char *)uh,
- sizeof(struct udphdr), skb->csum);
- uh->check = csum_tcpudp_magic(fl->fl4_src, fl->fl4_dst,
- up->len, IPPROTO_UDP, skb->csum);
- if (uh->check == 0)
- uh->check = -1;
- }
- } else {
- unsigned int csum = 0;
- /*
- * HW-checksum won't work as there are two or more
- * fragments on the socket so that all csums of sk_buffs
- * should be together.
- */
- if (skb->ip_summed == CHECKSUM_PARTIAL) {
- int offset = (unsigned char *)uh - skb->data;
- skb->csum = skb_checksum(skb, offset, skb->len - offset, 0);
+ } else if (skb->ip_summed == CHECKSUM_PARTIAL) { /* UDP hardware csum */
- skb->ip_summed = CHECKSUM_NONE;
- } else {
- skb->csum = csum_partial((char *)uh,
- sizeof(struct udphdr), skb->csum);
- }
+ udp4_hwcsum_outgoing(sk, skb, fl->fl4_src,fl->fl4_dst, up->len);
+ goto send;
+
+ } else /* `normal' UDP */
+ csum = udp_csum_outgoing(sk, skb);
+
+ /* add protocol-dependent pseudo-header */
+ uh->check = csum_tcpudp_magic(fl->fl4_src, fl->fl4_dst, up->len,
+ sk->sk_protocol, csum );
+ if (uh->check == 0)
+ uh->check = -1;
- skb_queue_walk(&sk->sk_write_queue, skb) {
- csum = csum_add(csum, skb->csum);
- }
- uh->check = csum_tcpudp_magic(fl->fl4_src, fl->fl4_dst,
- up->len, IPPROTO_UDP, csum);
- if (uh->check == 0)
- uh->check = -1;
- }
send:
err = ip_push_pending_frames(sk);
out:
@@ -497,12 +510,6 @@ out:
return err;
}
-
-static unsigned short udp_check(struct udphdr *uh, int len, __be32 saddr, __be32 daddr, unsigned long base)
-{
- return(csum_tcpudp_magic(saddr, daddr, len, IPPROTO_UDP, base));
-}
-
int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
size_t len)
{
@@ -516,8 +523,9 @@ int udp_sendmsg(struct kiocb *iocb, stru
__be32 daddr, faddr, saddr;
__be16 dport;
u8 tos;
- int err;
+ int err, is_udplite = up->pcflag;
int corkreq = up->corkflag || msg->msg_flags&MSG_MORE;
+ int (*getfrag)(void *, char *, int, int, int, struct sk_buff *);
if (len > 0xFFFF)
return -EMSGSIZE;
@@ -622,7 +630,7 @@ int udp_sendmsg(struct kiocb *iocb, stru
{ .daddr = faddr,
.saddr = saddr,
.tos = tos } },
- .proto = IPPROTO_UDP,
+ .proto = sk->sk_protocol,
.uli_u = { .ports =
{ .sport = inet->sport,
.dport = dport } } };
@@ -668,8 +676,9 @@ back_from_confirm:
do_append_data:
up->len += ulen;
- err = ip_append_data(sk, ip_generic_getfrag, msg->msg_iov, ulen,
- sizeof(struct udphdr), &ipc, rt,
+ getfrag = is_udplite ? udplite_getfrag : ip_generic_getfrag;
+ err = ip_append_data(sk, getfrag, msg->msg_iov, ulen,
+ sizeof(struct udphdr), &ipc, rt,
corkreq ? msg->msg_flags|MSG_MORE : msg->msg_flags);
if (err)
udp_flush_pending_frames(sk);
@@ -684,7 +693,7 @@ out:
if (free)
kfree(ipc.opt);
if (!err) {
- UDP_INC_STATS_USER(UDP_MIB_OUTDATAGRAMS);
+ UDP_INC_STATS_USER(UDP_MIB_OUTDATAGRAMS, is_udplite);
return len;
}
/*
@@ -695,7 +704,7 @@ out:
* seems like overkill.
*/
if (err == -ENOBUFS || test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
- UDP_INC_STATS_USER(UDP_MIB_SNDBUFERRORS);
+ UDP_INC_STATS_USER(UDP_MIB_SNDBUFERRORS, is_udplite);
}
return err;
@@ -795,17 +804,6 @@ int udp_ioctl(struct sock *sk, int cmd,
return(0);
}
-static __inline__ int __udp_checksum_complete(struct sk_buff *skb)
-{
- return __skb_checksum_complete(skb);
-}
-
-static __inline__ int udp_checksum_complete(struct sk_buff *skb)
-{
- return skb->ip_summed != CHECKSUM_UNNECESSARY &&
- __udp_checksum_complete(skb);
-}
-
/*
* This should be easy, if there is something there we
* return it, otherwise we block.
@@ -817,7 +815,7 @@ static int udp_recvmsg(struct kiocb *ioc
struct inet_sock *inet = inet_sk(sk);
struct sockaddr_in *sin = (struct sockaddr_in *)msg->msg_name;
struct sk_buff *skb;
- int copied, err;
+ int copied, err, copy_only, is_udplite = IS_UDPLITE(sk);
/*
* Check any passed addresses
@@ -839,15 +837,25 @@ try_again:
msg->msg_flags |= MSG_TRUNC;
}
- if (skb->ip_summed==CHECKSUM_UNNECESSARY) {
- err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
- copied);
- } else if (msg->msg_flags&MSG_TRUNC) {
- if (__udp_checksum_complete(skb))
+ /*
+ * Decide whether to checksum and/or copy data.
+ *
+ * UDP: checksum may have been computed in HW,
+ * (re-)compute it if message is truncated.
+ * UDP-Lite: always needs to checksum, no HW support.
+ */
+ copy_only = (skb->ip_summed==CHECKSUM_UNNECESSARY);
+
+ if (is_udplite || (!copy_only && msg->msg_flags&MSG_TRUNC)) {
+ if (__udp_lib_checksum_complete(skb))
goto csum_copy_err;
- err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
- copied);
- } else {
+ copy_only = 1;
+ }
+
+ if (copy_only)
+ err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr),
+ msg->msg_iov, copied );
+ else {
err = skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov);
if (err == -EINVAL)
@@ -880,7 +888,7 @@ out:
return err;
csum_copy_err:
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite);
skb_kill_datagram(sk, skb, flags);
@@ -912,11 +920,6 @@ int udp_disconnect(struct sock *sk, int
return 0;
}
-static void udp_close(struct sock *sk, long timeout)
-{
- sk_common_release(sk);
-}
-
/* return:
* 1 if the the UDP system should process it
* 0 if we should drop this packet
@@ -1021,10 +1024,8 @@ static int udp_queue_rcv_skb(struct sock
/*
* Charge it to the socket, dropping if the queue is full.
*/
- if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb)) {
- kfree_skb(skb);
- return -1;
- }
+ if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb))
+ goto drop;
nf_reset(skb);
if (up->encap_type) {
@@ -1048,31 +1049,68 @@ static int udp_queue_rcv_skb(struct sock
if (ret < 0) {
/* process the ESP packet */
ret = xfrm4_rcv_encap(skb, up->encap_type);
- UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS);
+ UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS, up->pcflag);
return -ret;
}
/* FALLTHROUGH -- it's a UDP Packet */
}
- if (sk->sk_filter && skb->ip_summed != CHECKSUM_UNNECESSARY) {
- if (__udp_checksum_complete(skb)) {
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
- kfree_skb(skb);
- return -1;
+ /*
+ * UDP-Lite specific tests, ignored on UDP sockets
+ */
+ if ((up->pcflag & UDPLITE_RECV_CC) && UDP_SKB_CB(skb)->partial_cov) {
+
+ /*
+ * MIB statistics other than incrementing the error count are
+ * disabled for the following two types of errors: these depend
+ * on the application settings, not on the functioning of the
+ * protocol stack as such.
+ *
+ * RFC 3828 here recommends (sec 3.3): "There should also be a
+ * way ... to ... at least let the receiving application block
+ * delivery of packets with coverage values less than a value
+ * provided by the application."
+ */
+ if (up->pcrlen == 0) { /* full coverage was set */
+ LIMIT_NETDEBUG(KERN_WARNING "UDPLITE: partial coverage "
+ "%d while full coverage %d requested\n",
+ UDP_SKB_CB(skb)->cscov, skb->len);
+ goto drop;
+ }
+ /* The next case involves violating the min. coverage requested
+ * by the receiver. This is subtle: if receiver wants x and x is
+ * greater than the buffersize/MTU then receiver will complain
+ * that it wants x while sender emits packets of smaller size y.
+ * Therefore the above ...()->partial_cov statement is essential.
+ */
+ if (UDP_SKB_CB(skb)->cscov < up->pcrlen) {
+ LIMIT_NETDEBUG(KERN_WARNING
+ "UDPLITE: coverage %d too small, need min %d\n",
+ UDP_SKB_CB(skb)->cscov, up->pcrlen);
+ goto drop;
}
+ }
+
+ if (sk->sk_filter && skb->ip_summed != CHECKSUM_UNNECESSARY) {
+ if (__udp_lib_checksum_complete(skb))
+ goto drop;
skb->ip_summed = CHECKSUM_UNNECESSARY;
}
if ((rc = sock_queue_rcv_skb(sk,skb)) < 0) {
/* Note that an ENOMEM error is charged twice */
if (rc == -ENOMEM)
- UDP_INC_STATS_BH(UDP_MIB_RCVBUFERRORS);
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
- kfree_skb(skb);
- return -1;
+ UDP_INC_STATS_BH(UDP_MIB_RCVBUFERRORS, up->pcflag);
+ goto drop;
}
- UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS);
+
+ UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS, up->pcflag);
return 0;
+
+drop:
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS, up->pcflag);
+ kfree_skb(skb);
+ return -1;
}
/*
@@ -1081,14 +1119,16 @@ static int udp_queue_rcv_skb(struct sock
* Note: called only from the BH handler context,
* so we don't need to lock the hashes.
*/
-static int udp_v4_mcast_deliver(struct sk_buff *skb, struct udphdr *uh,
- __be32 saddr, __be32 daddr)
+static int __udp4_lib_mcast_deliver(struct sk_buff *skb,
+ struct udphdr *uh,
+ __be32 saddr, __be32 daddr,
+ struct hlist_head udptable[])
{
struct sock *sk;
int dif;
read_lock(&udp_hash_lock);
- sk = sk_head(&udp_hash[ntohs(uh->dest) & (UDP_HTABLE_SIZE - 1)]);
+ sk = sk_head(&udptable[ntohs(uh->dest) & (UDP_HTABLE_SIZE - 1)]);
dif = skb->dev->ifindex;
sk = udp_v4_mcast_next(sk, uh->dest, daddr, uh->source, saddr, dif);
if (sk) {
@@ -1122,65 +1162,75 @@ static int udp_v4_mcast_deliver(struct s
* Otherwise, csum completion requires chacksumming packet body,
* including udp header and folding it to skb->csum.
*/
-static void udp_checksum_init(struct sk_buff *skb, struct udphdr *uh,
- unsigned short ulen, __be32 saddr, __be32 daddr)
+static inline void udp4_csum_init(struct sk_buff *skb, struct udphdr *uh)
{
if (uh->check == 0) {
skb->ip_summed = CHECKSUM_UNNECESSARY;
} else if (skb->ip_summed == CHECKSUM_COMPLETE) {
- if (!udp_check(uh, ulen, saddr, daddr, skb->csum))
+ if (!csum_tcpudp_magic(skb->nh.iph->saddr, skb->nh.iph->daddr,
+ skb->len, IPPROTO_UDP, skb->csum ))
skb->ip_summed = CHECKSUM_UNNECESSARY;
}
if (skb->ip_summed != CHECKSUM_UNNECESSARY)
- skb->csum = csum_tcpudp_nofold(saddr, daddr, ulen, IPPROTO_UDP, 0);
+ skb->csum = csum_tcpudp_nofold(skb->nh.iph->saddr,
+ skb->nh.iph->daddr,
+ skb->len, IPPROTO_UDP, 0);
/* Probably, we should checksum udp header (it should be in cache
* in any case) and data in tiny packets (< rx copybreak).
*/
+
+ /* UDP = UDP-Lite with a non-partial checksum coverage */
+ UDP_SKB_CB(skb)->partial_cov = 0;
}
/*
* All we need to do is get the socket, and then do a checksum.
*/
-int udp_rcv(struct sk_buff *skb)
+static int __udp4_lib_rcv(struct sk_buff *skb,
+ struct hlist_head udptable[], int is_udplite)
{
struct sock *sk;
- struct udphdr *uh;
+ struct udphdr *uh = skb->h.uh;
unsigned short ulen;
struct rtable *rt = (struct rtable*)skb->dst;
__be32 saddr = skb->nh.iph->saddr;
__be32 daddr = skb->nh.iph->daddr;
- int len = skb->len;
/*
- * Validate the packet and the UDP length.
+ * Validate the packet.
*/
if (!pskb_may_pull(skb, sizeof(struct udphdr)))
- goto no_header;
-
- uh = skb->h.uh;
+ goto drop; /* No space for header. */
ulen = ntohs(uh->len);
-
- if (ulen > len || ulen < sizeof(*uh))
+ if (ulen > skb->len)
goto short_packet;
- if (pskb_trim_rcsum(skb, ulen))
- goto short_packet;
+ if(! is_udplite ) { /* UDP validates ulen. */
+
+ if (ulen < sizeof(*uh) || pskb_trim_rcsum(skb, ulen))
+ goto short_packet;
- udp_checksum_init(skb, uh, ulen, saddr, daddr);
+ udp4_csum_init(skb, uh);
+
+ } else { /* UDP-Lite validates cscov. */
+ if (udplite4_csum_init(skb, uh))
+ goto csum_error;
+ }
if(rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
- return udp_v4_mcast_deliver(skb, uh, saddr, daddr);
+ return __udp4_lib_mcast_deliver(skb, uh, saddr, daddr, udptable);
- sk = udp_v4_lookup(saddr, uh->source, daddr, uh->dest, skb->dev->ifindex);
+ sk = __udp4_lib_lookup(saddr, uh->source, daddr, uh->dest,
+ skb->dev->ifindex, udptable );
if (sk != NULL) {
int ret = udp_queue_rcv_skb(sk, skb);
sock_put(sk);
/* a return value > 0 means to resubmit the input, but
- * it it wants the return to be -protocol, or 0
+ * it wants the return to be -protocol, or 0
*/
if (ret > 0)
return -ret;
@@ -1195,7 +1245,7 @@ int udp_rcv(struct sk_buff *skb)
if (udp_checksum_complete(skb))
goto csum_error;
- UDP_INC_STATS_BH(UDP_MIB_NOPORTS);
+ UDP_INC_STATS_BH(UDP_MIB_NOPORTS, is_udplite);
icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
/*
@@ -1206,35 +1256,39 @@ int udp_rcv(struct sk_buff *skb)
return(0);
short_packet:
- LIMIT_NETDEBUG(KERN_DEBUG "UDP: short packet: From %u.%u.%u.%u:%u %d/%d to %u.%u.%u.%u:%u\n",
+ LIMIT_NETDEBUG(KERN_DEBUG "UDP%s: short packet: From %u.%u.%u.%u:%u %d/%d to %u.%u.%u.%u:%u\n",
+ is_udplite? "-Lite" : "",
NIPQUAD(saddr),
ntohs(uh->source),
ulen,
- len,
+ skb->len,
NIPQUAD(daddr),
ntohs(uh->dest));
-no_header:
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
- kfree_skb(skb);
- return(0);
+ goto drop;
csum_error:
/*
* RFC1122: OK. Discards the bad packet silently (as far as
* the network is concerned, anyway) as per 4.1.3.4 (MUST).
*/
- LIMIT_NETDEBUG(KERN_DEBUG "UDP: bad checksum. From %d.%d.%d.%d:%d to %d.%d.%d.%d:%d ulen %d\n",
+ LIMIT_NETDEBUG(KERN_DEBUG "UDP%s: bad checksum. From %d.%d.%d.%d:%d to %d.%d.%d.%d:%d ulen %d\n",
+ is_udplite? "-Lite" : "",
NIPQUAD(saddr),
ntohs(uh->source),
NIPQUAD(daddr),
ntohs(uh->dest),
ulen);
drop:
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite);
kfree_skb(skb);
return(0);
}
+__inline__ int udp_rcv(struct sk_buff *skb)
+{
+ return __udp4_lib_rcv(skb, udp_hash, 0);
+}
+
static int udp_destroy_sock(struct sock *sk)
{
lock_sock(sk);
@@ -1284,6 +1338,32 @@ static int do_udp_setsockopt(struct sock
}
break;
+ /*
+ * UDP-Lite's partial checksum coverage (RFC 3828).
+ */
+ /* The sender sets the actual checksum coverage length via this option.
+ * The case `coverage > packet length' is handled by the send module. */
+ case UDPLITE_SEND_CSCOV:
+ if (!up->pcflag) /* Disable the option on UDP sockets */
+ return -ENOPROTOOPT;
+ if (val != 0 && val < 8) /* Illegal coverage: use default (8) */
+ val = 8;
+ up->pcslen = val;
+ up->pcflag |= UDPLITE_SEND_CC;
+ break;
+
+ /* The receiver specifies a minimum checksum coverage value. To make
+ * sense, this should be set to at least 8 (as done below). If zero is
+ * used, this again means full checksum coverage. */
+ case UDPLITE_RECV_CSCOV:
+ if (!up->pcflag) /* Disable the option on UDP sockets */
+ return -ENOPROTOOPT;
+ if (val != 0 && val < 8) /* Avoid silly minimal values. */
+ val = 8;
+ up->pcrlen = val;
+ up->pcflag |= UDPLITE_RECV_CC;
+ break;
+
default:
err = -ENOPROTOOPT;
break;
@@ -1295,18 +1375,18 @@ static int do_udp_setsockopt(struct sock
static int udp_setsockopt(struct sock *sk, int level, int optname,
char __user *optval, int optlen)
{
- if (level != SOL_UDP)
- return ip_setsockopt(sk, level, optname, optval, optlen);
- return do_udp_setsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udp_setsockopt(sk, level, optname, optval, optlen);
+ return ip_setsockopt(sk, level, optname, optval, optlen);
}
#ifdef CONFIG_COMPAT
static int compat_udp_setsockopt(struct sock *sk, int level, int optname,
char __user *optval, int optlen)
{
- if (level != SOL_UDP)
- return compat_ip_setsockopt(sk, level, optname, optval, optlen);
- return do_udp_setsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udp_setsockopt(sk, level, optname, optval, optlen);
+ return compat_ip_setsockopt(sk, level, optname, optval, optlen);
}
#endif
@@ -1333,6 +1413,16 @@ static int do_udp_getsockopt(struct sock
val = up->encap_type;
break;
+ /* The following two cannot be changed on UDP sockets, where the return
+ * value is always 0 (corresponding to the full checksum coverage of UDP). */
+ case UDPLITE_SEND_CSCOV:
+ val = up->pcslen;
+ break;
+
+ case UDPLITE_RECV_CSCOV:
+ val = up->pcrlen;
+ break;
+
default:
return -ENOPROTOOPT;
};
@@ -1347,18 +1437,18 @@ static int do_udp_getsockopt(struct sock
static int udp_getsockopt(struct sock *sk, int level, int optname,
char __user *optval, int __user *optlen)
{
- if (level != SOL_UDP)
- return ip_getsockopt(sk, level, optname, optval, optlen);
- return do_udp_getsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udp_getsockopt(sk, level, optname, optval, optlen);
+ return ip_getsockopt(sk, level, optname, optval, optlen);
}
#ifdef CONFIG_COMPAT
static int compat_udp_getsockopt(struct sock *sk, int level, int optname,
char __user *optval, int __user *optlen)
{
- if (level != SOL_UDP)
- return compat_ip_getsockopt(sk, level, optname, optval, optlen);
- return do_udp_getsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udp_getsockopt(sk, level, optname, optval, optlen);
+ return compat_ip_getsockopt(sk, level, optname, optval, optlen);
}
#endif
/**
@@ -1378,7 +1468,8 @@ unsigned int udp_poll(struct file *file,
{
unsigned int mask = datagram_poll(file, sock, wait);
struct sock *sk = sock->sk;
-
+ int is_lite = IS_UDPLITE(sk);
+
/* Check for false positives due to checksum errors */
if ( (mask & POLLRDNORM) &&
!(file->f_flags & O_NONBLOCK) &&
@@ -1389,7 +1480,7 @@ unsigned int udp_poll(struct file *file,
spin_lock_bh(&rcvq->lock);
while ((skb = skb_peek(rcvq)) != NULL) {
if (udp_checksum_complete(skb)) {
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_lite);
__skb_unlink(skb, rcvq);
kfree_skb(skb);
} else {
@@ -1411,7 +1502,7 @@ unsigned int udp_poll(struct file *file,
struct proto udp_prot = {
.name = "UDP",
.owner = THIS_MODULE,
- .close = udp_close,
+ .close = udp_lib_close,
.connect = ip4_datagram_connect,
.disconnect = udp_disconnect,
.ioctl = udp_ioctl,
@@ -1422,8 +1513,8 @@ struct proto udp_prot = {
.recvmsg = udp_recvmsg,
.sendpage = udp_sendpage,
.backlog_rcv = udp_queue_rcv_skb,
- .hash = udp_v4_hash,
- .unhash = udp_v4_unhash,
+ .hash = udp_lib_hash,
+ .unhash = udp_lib_unhash,
.get_port = udp_v4_get_port,
.obj_size = sizeof(struct udp_sock),
#ifdef CONFIG_COMPAT
@@ -1442,7 +1533,7 @@ static struct sock *udp_get_first(struct
for (state->bucket = 0; state->bucket < UDP_HTABLE_SIZE; ++state->bucket) {
struct hlist_node *node;
- sk_for_each(sk, node, &udp_hash[state->bucket]) {
+ sk_for_each(sk, node, state->hashtable + state->bucket) {
if (sk->sk_family == state->family)
goto found;
}
@@ -1463,7 +1554,7 @@ try_again:
} while (sk && sk->sk_family != state->family);
if (!sk && ++state->bucket < UDP_HTABLE_SIZE) {
- sk = sk_head(&udp_hash[state->bucket]);
+ sk = sk_head(state->hashtable + state->bucket);
goto try_again;
}
return sk;
@@ -1513,6 +1604,7 @@ static int udp_seq_open(struct inode *in
if (!s)
goto out;
s->family = afinfo->family;
+ s->hashtable = afinfo->hashtable;
s->seq_ops.start = udp_seq_start;
s->seq_ops.next = udp_seq_next;
s->seq_ops.show = afinfo->seq_show;
@@ -1602,6 +1694,7 @@ static struct udp_seq_afinfo udp4_seq_af
.owner = THIS_MODULE,
.name = "udp",
.family = AF_INET,
+ .hashtable = udp_hash,
.seq_show = udp4_seq_show,
.seq_fops = &udp4_seq_fops,
};
@@ -1630,3 +1723,5 @@ #ifdef CONFIG_PROC_FS
EXPORT_SYMBOL(udp_proc_register);
EXPORT_SYMBOL(udp_proc_unregister);
#endif
+/* the extensions for UDP-Lite (RFC 3828) */
+#include "udplite.c"
diff --git a/include/net/udplite.h b/include/net/udplite.h
new file mode 100644
index 0000000..85de96c
--- /dev/null
+++ b/include/net/udplite.h
@@ -0,0 +1,149 @@
+/*
+ * Definitions for the UDP-Lite (RFC 3828) code.
+ */
+#ifndef _UDPLITE_H
+#define _UDPLITE_H
+
+/* UDP-Lite socket options */
+#define UDPLITE_SEND_CSCOV 10 /* sender partial coverage (as sent) */
+#define UDPLITE_RECV_CSCOV 11 /* receiver partial coverage (threshold ) */
+
+extern struct proto udplite_prot;
+extern struct hlist_head udplite_hash[UDP_HTABLE_SIZE];
+
+/* UDP-Lite does not have a standardized MIB yet, so we inherit from UDP */
+DECLARE_SNMP_STAT(struct udp_mib, udplite_statistics);
+
+/*
+ * Checksum computation is all in software, hence simpler getfrag.
+ */
+static __inline__ int udplite_getfrag(void *from, char *to, int offset,
+ int len, int odd, struct sk_buff *skb)
+{
+ return memcpy_fromiovecend(to, (struct iovec *) from, offset, len);
+}
+
+/* Designate sk as UDP-Lite socket */
+static inline int udplite_sk_init(struct sock *sk)
+{
+ udp_sk(sk)->pcflag = UDPLITE_BIT;
+ return 0;
+}
+
+/*
+ * Checksumming routines
+ */
+static inline int udplite_checksum_init(struct sk_buff *skb, struct udphdr *uh)
+{
+ u16 cscov;
+
+ /* In UDPv4 a zero checksum means that the transmitter generated no
+ * checksum. UDP-Lite (like IPv6) mandates checksums, hence packets
+ * with a zero checksum field are illegal. */
+ if (uh->check == 0) {
+ LIMIT_NETDEBUG(KERN_DEBUG "UDPLITE: zeroed checksum field\n");
+ return 1;
+ }
+
+ UDP_SKB_CB(skb)->partial_cov = 0;
+ cscov = ntohs(uh->len);
+
+ if (cscov == 0) /* Indicates that full coverage is required. */
+ cscov = skb->len;
+ else if (cscov < 8 || cscov > skb->len) {
+ /*
+ * Coverage length violates RFC 3828: log and discard silently.
+ */
+ LIMIT_NETDEBUG(KERN_DEBUG "UDPLITE: bad csum coverage %d/%d\n",
+ cscov, skb->len);
+ return 1;
+
+ } else if (cscov < skb->len)
+ UDP_SKB_CB(skb)->partial_cov = 1;
+
+ UDP_SKB_CB(skb)->cscov = cscov;
+
+ /*
+ * There is no known NIC manufacturer supporting UDP-Lite yet,
+ * hence ip_summed is always (re-)set to CHECKSUM_NONE.
+ */
+ skb->ip_summed = CHECKSUM_NONE;
+
+ return 0;
+}
+
+static __inline__ int udplite4_csum_init(struct sk_buff *skb, struct udphdr *uh)
+{
+ int rc = udplite_checksum_init(skb, uh);
+
+ if (!rc)
+ skb->csum = csum_tcpudp_nofold(skb->nh.iph->saddr,
+ skb->nh.iph->daddr,
+ skb->len, IPPROTO_UDPLITE, 0);
+ return rc;
+}
+
+static __inline__ int udplite6_csum_init(struct sk_buff *skb, struct udphdr *uh)
+{
+ int rc = udplite_checksum_init(skb, uh);
+
+ if (!rc)
+ skb->csum = ~csum_ipv6_magic(&skb->nh.ipv6h->saddr,
+ &skb->nh.ipv6h->daddr,
+ skb->len, IPPROTO_UDPLITE, 0);
+ return rc;
+}
+
+static inline int udplite_sender_cscov(struct udp_sock *up, struct udphdr *uh)
+{
+ int cscov = up->len;
+
+ /*
+ * Sender has set `partial coverage' option on UDP-Lite socket
+ */
+ if (up->pcflag & UDPLITE_SEND_CC) {
+ if (up->pcslen < up->len) {
+ /* up->pcslen == 0 means that full coverage is required,
+ * partial coverage only if 0 < up->pcslen < up->len */
+ if (0 < up->pcslen) {
+ cscov = up->pcslen;
+ }
+ uh->len = htons(up->pcslen);
+ }
+ /*
+ * NOTE: Causes for the error case `up->pcslen > up->len':
+ * (i) Application error (will not be penalized).
+ * (ii) Payload too big for send buffer: data is split
+ * into several packets, each with its own header.
+ * In this case (e.g. last segment), coverage may
+ * exceed packet length.
+ * Since packets with coverage length > packet length are
+ * illegal, we fall back to the defaults here.
+ */
+ }
+ return cscov;
+}
+
+static inline u32 udplite_csum_outgoing(struct sock *sk, struct sk_buff *skb)
+{
+ u32 csum = 0;
+ int off, len, cscov = udplite_sender_cscov(udp_sk(sk), skb->h.uh);
+
+ skb->ip_summed = CHECKSUM_NONE; /* no HW support for checksumming */
+
+ skb_queue_walk(&sk->sk_write_queue, skb) {
+ off = skb->h.raw - skb->data;
+ len = skb->len - off;
+
+ csum = skb_checksum(skb, off, (cscov > len)? len : cscov, csum);
+
+ if ((cscov -= len) <= 0)
+ break;
+ }
+ return csum;
+}
+
+extern void udplite4_register(void);
+extern int udplite_get_port(struct sock *sk, unsigned short snum,
+ int (*scmp)(const struct sock *, const struct sock *));
+#endif /* _UDPLITE_H */
diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c
new file mode 100644
index 0000000..89428da
--- /dev/null
+++ b/net/ipv4/udplite.c
@@ -0,0 +1,126 @@
+/*
+ * UDPLITE An implementation of the UDP-Lite protocol (RFC 3828).
+ *
+ * Version: $Id: udplite.c,v 1.24 2006/09/18 21:50:59 gerrit Exp gerrit $
+ *
+ * Authors: Gerrit Renker <gerrit@erg.abdn.ac.uk>
+ *
+ * Changes:
+ * Fixes:
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+struct hlist_head udplite_hash[UDP_HTABLE_SIZE];
+static int udplite_port_rover;
+
+DEFINE_SNMP_STAT(struct udp_mib, udplite_statistics) __read_mostly;
+
+__inline__ int udplite_get_port(struct sock *sk, unsigned short p,
+ int (*c)(const struct sock *, const struct sock *))
+{
+ return __udp_lib_get_port(sk, p, udplite_hash, &udplite_port_rover, c);
+}
+
+static __inline__ int udplite_v4_get_port(struct sock *sk, unsigned short snum)
+{
+ return udplite_get_port(sk, snum, ipv4_rcv_saddr_equal);
+}
+
+static __inline__ struct sock *udplite_v4_lookup(__be32 saddr, __be16 sport,
+ __be32 daddr, __be16 dport,
+ int dif )
+{
+ return __udp4_lib_lookup(saddr, sport, daddr, dport, dif, udplite_hash);
+}
+
+__inline__ int udplite_rcv(struct sk_buff *skb)
+{
+ return __udp4_lib_rcv(skb, udplite_hash, 1);
+}
+
+__inline__ void udplite_err(struct sk_buff *skb, u32 info)
+{
+ return __udp4_lib_err(skb, info, udplite_hash);
+}
+
+static struct net_protocol udplite_protocol = {
+ .handler = udplite_rcv,
+ .err_handler = udplite_err,
+ .no_policy = 1,
+};
+
+struct proto udplite_prot = {
+ .name = "UDP-Lite",
+ .owner = THIS_MODULE,
+ .close = udp_lib_close,
+ .connect = ip4_datagram_connect,
+ .disconnect = udp_disconnect,
+ .ioctl = udp_ioctl,
+ .init = udplite_sk_init,
+ .destroy = udp_destroy_sock,
+ .setsockopt = udp_setsockopt,
+ .getsockopt = udp_getsockopt,
+ .sendmsg = udp_sendmsg,
+ .recvmsg = udp_recvmsg,
+ .sendpage = udp_sendpage,
+ .backlog_rcv = udp_queue_rcv_skb,
+ .hash = udp_lib_hash,
+ .unhash = udp_lib_unhash,
+ .get_port = udplite_v4_get_port,
+ .obj_size = sizeof(struct udp_sock),
+#ifdef CONFIG_COMPAT
+ .compat_setsockopt = compat_udp_setsockopt,
+ .compat_getsockopt = compat_udp_getsockopt,
+#endif
+};
+
+static struct inet_protosw udplite4_protosw = {
+ .type = SOCK_DGRAM,
+ .protocol = IPPROTO_UDPLITE,
+ .prot = &udplite_prot,
+ .ops = &inet_dgram_ops,
+ .capability = -1,
+ .no_check = 0, /* must checksum (RFC 3828) */
+ .flags = INET_PROTOSW_PERMANENT,
+};
+
+#ifdef CONFIG_PROC_FS
+static struct file_operations udplite4_seq_fops;
+static struct udp_seq_afinfo udplite4_seq_afinfo = {
+ .owner = THIS_MODULE,
+ .name = "udplite",
+ .family = AF_INET,
+ .hashtable = udplite_hash,
+ .seq_show = udp4_seq_show,
+ .seq_fops = &udplite4_seq_fops,
+};
+#endif
+
+void __init udplite4_register(void)
+{
+ if (proto_register(&udplite_prot, 1))
+ goto out_register_err;
+
+ if (inet_add_protocol(&udplite_protocol, IPPROTO_UDPLITE) < 0)
+ goto out_unregister_proto;
+
+ inet_register_protosw(&udplite4_protosw);
+
+#ifdef CONFIG_PROC_FS
+ if (udp_proc_register(&udplite4_seq_afinfo)) /* udplite4_proc_init() */
+ printk(KERN_ERR "%s: Cannot register /proc!\n", __FUNCTION__);
+#endif
+ return;
+
+out_unregister_proto:
+ proto_unregister(&udplite_prot);
+out_register_err:
+ printk(KERN_CRIT "%s: Cannot add UDP-Lite protocol.\n", __FUNCTION__);
+}
+
+EXPORT_SYMBOL(udplite_hash);
+EXPORT_SYMBOL(udplite_prot);
+EXPORT_SYMBOL(udplite_get_port);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index edcf093..05b2df2 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -104,6 +104,7 @@ #include <net/ip_fib.h>
#include <net/inet_connection_sock.h>
#include <net/tcp.h>
#include <net/udp.h>
+#include <net/udplite.h>
#include <linux/skbuff.h>
#include <net/sock.h>
#include <net/raw.h>
@@ -1223,10 +1224,13 @@ static int __init init_ipv4_mibs(void)
tcp_statistics[1] = alloc_percpu(struct tcp_mib);
udp_statistics[0] = alloc_percpu(struct udp_mib);
udp_statistics[1] = alloc_percpu(struct udp_mib);
+ udplite_statistics[0] = alloc_percpu(struct udp_mib);
+ udplite_statistics[1] = alloc_percpu(struct udp_mib);
if (!
(net_statistics[0] && net_statistics[1] && ip_statistics[0]
&& ip_statistics[1] && tcp_statistics[0] && tcp_statistics[1]
- && udp_statistics[0] && udp_statistics[1]))
+ && udp_statistics[0] && udp_statistics[1]
+ && udplite_statistics[0] && udplite_statistics[1] ) )
return -ENOMEM;
(void) tcp_mib_init();
@@ -1313,6 +1317,10 @@ #endif
/* Setup TCP slab cache for open requests. */
tcp_init();
+ /*
+ * Add UDP-Lite (RFC 3828)
+ */
+ udplite4_register();
/*
* Set the ICMP layer up
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 9c6cbe3..cd873da 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -38,6 +38,7 @@ #include <net/icmp.h>
#include <net/protocol.h>
#include <net/tcp.h>
#include <net/udp.h>
+#include <net/udplite.h>
#include <linux/inetdevice.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
@@ -66,6 +67,7 @@ static int sockstat_seq_show(struct seq_
tcp_death_row.tw_count, atomic_read(&tcp_sockets_allocated),
atomic_read(&tcp_memory_allocated));
seq_printf(seq, "UDP: inuse %d\n", fold_prot_inuse(&udp_prot));
+ seq_printf(seq, "UDPLITE: inuse %d\n", fold_prot_inuse(&udplite_prot));
seq_printf(seq, "RAW: inuse %d\n", fold_prot_inuse(&raw_prot));
seq_printf(seq, "FRAG: inuse %d memory %d\n", ip_frag_nqueues,
atomic_read(&ip_frag_mem));
@@ -304,6 +306,17 @@ static int snmp_seq_show(struct seq_file
fold_field((void **) udp_statistics,
snmp4_udp_list[i].entry));
+ /* the UDP and UDP-Lite MIBs are the same */
+ seq_puts(seq, "\nUdpLite:");
+ for (i = 0; snmp4_udp_list[i].name != NULL; i++)
+ seq_printf(seq, " %s", snmp4_udp_list[i].name);
+
+ seq_puts(seq, "\nUdpLite:");
+ for (i = 0; snmp4_udp_list[i].name != NULL; i++)
+ seq_printf(seq, " %lu",
+ fold_field((void **) udplite_statistics,
+ snmp4_udp_list[i].entry) );
+
seq_putc(seq, '\n');
return 0;
}
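Note on the receive path: the coverage validation done by
udplite_checksum_init() in include/net/udplite.h can be restated as a small
standalone function. What follows is only a userspace sketch under stated
assumptions, not kernel code: the enum and the name classify_cscov() are
hypothetical, `field' stands for ntohs(uh->len) and `pkt_len' for skb->len.

```c
#include <stdint.h>

/* Hypothetical result codes for this sketch; these are not kernel names. */
enum cscov_verdict {
	CSCOV_FULL,	/* the whole datagram is covered by the checksum */
	CSCOV_PARTIAL,	/* only the first *cscov bytes are covered */
	CSCOV_INVALID	/* coverage violates RFC 3828: discard the packet */
};

/*
 * field:   coverage value from the UDP-Lite header, in host byte order
 * pkt_len: length of the datagram including the 8-byte header
 * cscov:   set to the effective coverage unless the verdict is INVALID
 */
static enum cscov_verdict
classify_cscov(uint16_t field, unsigned int pkt_len, unsigned int *cscov)
{
	if (field == 0) {		/* zero requests full coverage */
		*cscov = pkt_len;
		return CSCOV_FULL;
	}
	if (field < 8 || field > pkt_len)	/* header must be covered */
		return CSCOV_INVALID;

	*cscov = field;
	return field < pkt_len ? CSCOV_PARTIAL : CSCOV_FULL;
}
```

For a 100-byte datagram, a field value of 0 or 100 yields full coverage, 8
yields partial coverage, and 7 or 101 is rejected.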
* [PATCHv4 2/3] net/ipv6: v6-side of UDP-Lite
2006-10-12 9:01 ` David Miller
2006-10-13 15:14 ` [PATCHv4 1/3] net/ipv4: UDP-Lite support (RFC 3828) Gerrit Renker
@ 2006-10-13 15:14 ` Gerrit Renker
2006-10-13 15:14 ` [PATCHv4 3/3] net: UDP-Lite misc files Gerrit Renker
2 siblings, 0 replies; 19+ messages in thread
From: Gerrit Renker @ 2006-10-13 15:14 UTC (permalink / raw)
To: YOSHIFUJI Hideaki / 吉藤英明; +Cc: David Miller, netdev
This provides consolidated UDP-Lite support over IPv6.
Changes to net/ipv6/udp.c reflect those in net/ipv4/udp.c.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
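Note (not part of the commit message): udp_queue_rcv_skb() and
udpv6_queue_rcv_skb() apply the same receiver-side coverage policy, and the
setsockopt handler raises small non-zero coverage values to 8. A minimal
userspace sketch of that logic is given here, assuming the following
hypothetical names: clamp_cscov_option() and receiver_accepts() do not exist
in the kernel; recv_cc_set models (up->pcflag & UDPLITE_RECV_CC), min_cov
models up->pcrlen, and partial/cscov model the UDP_SKB_CB(skb) fields.

```c
#include <stdbool.h>

/* Mirror of the option handling: 0 means full coverage, 1..7 become 8. */
static int clamp_cscov_option(int val)
{
	return (val != 0 && val < 8) ? 8 : val;
}

/*
 * Decide whether a received datagram passes the receiver's coverage policy.
 * The checks only apply when the receiver set a minimum coverage and the
 * datagram arrived with partial coverage.
 */
static bool receiver_accepts(bool recv_cc_set, int min_cov,
			     bool partial, int cscov)
{
	if (!recv_cc_set || !partial)
		return true;	/* plain UDP, or fully covered datagram */
	if (min_cov == 0)
		return false;	/* receiver insists on full coverage */
	return cscov >= min_cov;	/* enforce the requested minimum */
}
```

With a requested minimum of 20, a partially covered datagram with coverage
12 is dropped, while one with coverage 20 is queued.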
include/net/ipv6.h | 10 +
include/net/transp_v6.h | 3
net/ipv6/af_inet6.c | 3
net/ipv6/proc.c | 11 +
net/ipv6/udp.c | 333 ++++++++++++++++++++++++++++--------------------
net/ipv6/udplite.c | 131 ++++++++++++++++++
6 files changed, 350 insertions(+), 141 deletions(-)
diff --git a/include/net/transp_v6.h b/include/net/transp_v6.h
index 61f724c..d5a337d 100644
--- a/include/net/transp_v6.h
+++ b/include/net/transp_v6.h
@@ -11,6 +11,7 @@ #ifdef __KERNEL__
extern struct proto rawv6_prot;
extern struct proto udpv6_prot;
+extern struct proto udplitev6_prot;
extern struct proto tcpv6_prot;
struct flowi;
@@ -24,6 +25,8 @@ extern void ipv6_destopt_init(void);
/* transport protocols */
extern void rawv6_init(void);
extern void udpv6_init(void);
+extern void udplitev6_init(void);
+extern void udplitev6_cleanup(void);
extern void tcpv6_init(void);
extern int udpv6_connect(struct sock *sk,
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index e0c3934..72f3d69 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -49,6 +49,7 @@ #include <net/ip6_route.h>
#include <net/addrconf.h>
#include <net/ip.h>
#include <net/udp.h>
+#include <net/udplite.h>
#include <net/raw.h>
#include <net/inet_common.h>
#include <net/tcp_states.h>
@@ -66,23 +67,9 @@ static inline int udp_v6_get_port(struct
return udp_get_port(sk, snum, ipv6_rcv_saddr_equal);
}
-static void udp_v6_hash(struct sock *sk)
-{
- BUG();
-}
-
-static void udp_v6_unhash(struct sock *sk)
-{
- write_lock_bh(&udp_hash_lock);
- if (sk_del_node_init(sk)) {
- inet_sk(sk)->num = 0;
- sock_prot_dec_use(sk->sk_prot);
- }
- write_unlock_bh(&udp_hash_lock);
-}
-
-static struct sock *udp_v6_lookup(struct in6_addr *saddr, u16 sport,
- struct in6_addr *daddr, u16 dport, int dif)
+static struct sock *__udp6_lib_lookup(struct in6_addr *saddr, __be16 sport,
+ struct in6_addr *daddr, __be16 dport,
+ int dif, struct hlist_head udptable[])
{
struct sock *sk, *result = NULL;
struct hlist_node *node;
@@ -90,7 +77,7 @@ static struct sock *udp_v6_lookup(struct
int badness = -1;
read_lock(&udp_hash_lock);
- sk_for_each(sk, node, &udp_hash[hnum & (UDP_HTABLE_SIZE - 1)]) {
+ sk_for_each(sk, node, &udptable[hnum & (UDP_HTABLE_SIZE - 1)]) {
struct inet_sock *inet = inet_sk(sk);
if (inet->num == hnum && sk->sk_family == PF_INET6) {
@@ -131,13 +118,11 @@ static struct sock *udp_v6_lookup(struct
return result;
}
-/*
- *
- */
-
-static void udpv6_close(struct sock *sk, long timeout)
+static __inline__ struct sock *udp_v6_lookup(struct in6_addr *saddr, __be16 sport,
+ struct in6_addr *daddr, __be16 dport,
+ int dif)
{
- sk_common_release(sk);
+ return __udp6_lib_lookup(saddr, sport, daddr, dport, dif, udp_hash);
}
/*
@@ -153,7 +138,7 @@ static int udpv6_recvmsg(struct kiocb *i
struct inet_sock *inet = inet_sk(sk);
struct sk_buff *skb;
size_t copied;
- int err;
+ int err, copy_only, is_udplite = IS_UDPLITE(sk);
if (addr_len)
*addr_len=sizeof(struct sockaddr_in6);
@@ -172,15 +157,21 @@ try_again:
msg->msg_flags |= MSG_TRUNC;
}
- if (skb->ip_summed==CHECKSUM_UNNECESSARY) {
- err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
- copied);
- } else if (msg->msg_flags&MSG_TRUNC) {
- if (__skb_checksum_complete(skb))
+ /*
+ * Decide whether to checksum and/or copy data.
+ */
+ copy_only = (skb->ip_summed==CHECKSUM_UNNECESSARY);
+
+ if (is_udplite || (!copy_only && msg->msg_flags&MSG_TRUNC)) {
+ if (__udp_lib_checksum_complete(skb))
goto csum_copy_err;
- err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
- copied);
- } else {
+ copy_only = 1;
+ }
+
+ if (copy_only)
+ err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr),
+ msg->msg_iov, copied );
+ else {
err = skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov);
if (err == -EINVAL)
goto csum_copy_err;
@@ -231,14 +222,15 @@ csum_copy_err:
skb_kill_datagram(sk, skb, flags);
if (flags & MSG_DONTWAIT) {
- UDP6_INC_STATS_USER(UDP_MIB_INERRORS);
+ UDP6_INC_STATS_USER(UDP_MIB_INERRORS, is_udplite);
return -EAGAIN;
}
goto try_again;
}
-static void udpv6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
- int type, int code, int offset, __u32 info)
+static void __udp6_lib_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
+ int type, int code, int offset, __u32 info,
+ struct hlist_head udptable[] )
{
struct ipv6_pinfo *np;
struct ipv6hdr *hdr = (struct ipv6hdr*)skb->data;
@@ -249,8 +241,8 @@ static void udpv6_err(struct sk_buff *sk
struct sock *sk;
int err;
- sk = udp_v6_lookup(daddr, uh->dest, saddr, uh->source, dev->ifindex);
-
+ sk = __udp6_lib_lookup(daddr, uh->dest,
+ saddr, uh->source, dev->ifindex, udptable);
if (sk == NULL)
return;
@@ -271,31 +263,56 @@ out:
sock_put(sk);
}
-static inline int udpv6_queue_rcv_skb(struct sock * sk, struct sk_buff *skb)
+static __inline__ void udpv6_err(struct sk_buff *skb,
+ struct inet6_skb_parm *opt, int type,
+ int code, int offset, __u32 info )
{
+ return __udp6_lib_err(skb, opt, type, code, offset, info, udp_hash);
+}
+
+static int udpv6_queue_rcv_skb(struct sock * sk, struct sk_buff *skb)
+{
+ struct udp_sock *up = udp_sk(sk);
int rc;
- if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb)) {
- kfree_skb(skb);
- return -1;
- }
+ if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb))
+ goto drop;
- if (skb_checksum_complete(skb)) {
- UDP6_INC_STATS_BH(UDP_MIB_INERRORS);
- kfree_skb(skb);
- return 0;
+ /*
+ * UDP-Lite specific tests, ignored on UDP sockets.
+ * For comments on these cases, see net/ipv4/udp.c
+ */
+ if ((up->pcflag & UDPLITE_RECV_CC) && UDP_SKB_CB(skb)->partial_cov) {
+
+ if (up->pcrlen == 0) { /* full coverage was set */
+ LIMIT_NETDEBUG(KERN_WARNING "UDPLITE6: partial coverage"
+ " %d while full coverage %d requested\n",
+ UDP_SKB_CB(skb)->cscov, skb->len);
+ goto drop;
+ }
+ if (UDP_SKB_CB(skb)->cscov < up->pcrlen) {
+ LIMIT_NETDEBUG(KERN_WARNING "UDPLITE6: coverage %d "
+ "too small, need min %d\n",
+ UDP_SKB_CB(skb)->cscov, up->pcrlen);
+ goto drop;
+ }
}
+ if (__udp_lib_checksum_complete(skb))
+ goto drop;
+
if ((rc = sock_queue_rcv_skb(sk,skb)) < 0) {
/* Note that an ENOMEM error is charged twice */
if (rc == -ENOMEM)
- UDP6_INC_STATS_BH(UDP_MIB_RCVBUFERRORS);
- UDP6_INC_STATS_BH(UDP_MIB_INERRORS);
- kfree_skb(skb);
- return 0;
+ UDP6_INC_STATS_BH(UDP_MIB_RCVBUFERRORS, up->pcflag);
+ goto drop;
}
- UDP6_INC_STATS_BH(UDP_MIB_INDATAGRAMS);
+ UDP6_INC_STATS_BH(UDP_MIB_INDATAGRAMS, up->pcflag);
return 0;
+drop:
+ UDP6_INC_STATS_BH(UDP_MIB_INERRORS, up->pcflag);
+ kfree_skb(skb);
+ return -1;
}
static struct sock *udp_v6_mcast_next(struct sock *sk,
@@ -339,15 +356,15 @@ static struct sock *udp_v6_mcast_next(st
* Note: called only from the BH handler context,
* so we don't need to lock the hashes.
*/
-static void udpv6_mcast_deliver(struct udphdr *uh,
- struct in6_addr *saddr, struct in6_addr *daddr,
- struct sk_buff *skb)
+static int __udp6_lib_mcast_deliver(struct sk_buff *skb, struct in6_addr *saddr,
+ struct in6_addr *daddr, struct hlist_head udptable[])
{
struct sock *sk, *sk2;
+ const struct udphdr *uh = skb->h.uh;
int dif;
read_lock(&udp_hash_lock);
- sk = sk_head(&udp_hash[ntohs(uh->dest) & (UDP_HTABLE_SIZE - 1)]);
+ sk = sk_head(&udptable[ntohs(uh->dest) & (UDP_HTABLE_SIZE - 1)]);
dif = skb->dev->ifindex;
sk = udp_v6_mcast_next(sk, uh->dest, daddr, uh->source, saddr, dif);
if (!sk) {
@@ -365,9 +382,34 @@ static void udpv6_mcast_deliver(struct u
udpv6_queue_rcv_skb(sk, skb);
out:
read_unlock(&udp_hash_lock);
+ return 0;
+}
+
+static inline int udp6_csum_init(struct sk_buff *skb, struct udphdr *uh)
+
+{
+ if (uh->check == 0) {
+ /* RFC 2460 section 8.1 says that we SHOULD log
+ this error. Well, it is reasonable.
+ */
+ LIMIT_NETDEBUG(KERN_INFO "IPv6: udp checksum is 0\n");
+ return 1;
+ }
+ if (skb->ip_summed == CHECKSUM_COMPLETE &&
+ !csum_ipv6_magic(&skb->nh.ipv6h->saddr, &skb->nh.ipv6h->daddr,
+ skb->len, IPPROTO_UDP, skb->csum ))
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
+
+ if (skb->ip_summed != CHECKSUM_UNNECESSARY)
+ skb->csum = ~csum_ipv6_magic(&skb->nh.ipv6h->saddr,
+ &skb->nh.ipv6h->daddr,
+ skb->len, IPPROTO_UDP, 0);
+
+ return (UDP_SKB_CB(skb)->partial_cov = 0);
}
-static int udpv6_rcv(struct sk_buff **pskb)
+static int __udp6_lib_rcv(struct sk_buff **pskb,
+ struct hlist_head udptable[], int is_udplite)
{
struct sk_buff *skb = *pskb;
struct sock *sk;
@@ -384,44 +426,39 @@ static int udpv6_rcv(struct sk_buff **ps
uh = skb->h.uh;
ulen = ntohs(uh->len);
+ if (ulen > skb->len)
+ goto short_packet;
- /* Check for jumbo payload */
- if (ulen == 0)
- ulen = skb->len;
+ if(! is_udplite ) { /* UDP validates ulen. */
- if (ulen > skb->len || ulen < sizeof(*uh))
- goto short_packet;
+ /* Check for jumbo payload */
+ if (ulen == 0)
+ ulen = skb->len;
- if (uh->check == 0) {
- /* RFC 2460 section 8.1 says that we SHOULD log
- this error. Well, it is reasonable.
- */
- LIMIT_NETDEBUG(KERN_INFO "IPv6: udp checksum is 0\n");
- goto discard;
- }
+ if (ulen < sizeof(*uh))
+ goto short_packet;
- if (ulen < skb->len) {
- if (pskb_trim_rcsum(skb, ulen))
- goto discard;
- saddr = &skb->nh.ipv6h->saddr;
- daddr = &skb->nh.ipv6h->daddr;
- uh = skb->h.uh;
- }
+ if (ulen < skb->len) {
+ if (pskb_trim_rcsum(skb, ulen))
+ goto short_packet;
+ saddr = &skb->nh.ipv6h->saddr;
+ daddr = &skb->nh.ipv6h->daddr;
+ uh = skb->h.uh;
+ }
- if (skb->ip_summed == CHECKSUM_COMPLETE &&
- !csum_ipv6_magic(saddr, daddr, ulen, IPPROTO_UDP, skb->csum))
- skb->ip_summed = CHECKSUM_UNNECESSARY;
+ if (udp6_csum_init(skb, uh))
+ goto discard;
- if (skb->ip_summed != CHECKSUM_UNNECESSARY)
- skb->csum = ~csum_ipv6_magic(saddr, daddr, ulen, IPPROTO_UDP, 0);
+ } else { /* UDP-Lite validates cscov. */
+ if (udplite6_csum_init(skb, uh))
+ goto discard;
+ }
/*
* Multicast receive code
*/
- if (ipv6_addr_is_multicast(daddr)) {
- udpv6_mcast_deliver(uh, saddr, daddr, skb);
- return 0;
- }
+ if (ipv6_addr_is_multicast(daddr))
+ return __udp6_lib_mcast_deliver(skb, saddr, daddr, udptable);
/* Unicast */
@@ -429,15 +466,16 @@ static int udpv6_rcv(struct sk_buff **ps
* check socket cache ... must talk to Alan about his plans
* for sock caches... i'll skip this for now.
*/
- sk = udp_v6_lookup(saddr, uh->source, daddr, uh->dest, dev->ifindex);
+ sk = __udp6_lib_lookup(saddr, uh->source,
+ daddr, uh->dest, dev->ifindex, udptable);
if (sk == NULL) {
if (!xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb))
goto discard;
- if (skb_checksum_complete(skb))
+ if (__udp_lib_checksum_complete(skb))
goto discard;
- UDP6_INC_STATS_BH(UDP_MIB_NOPORTS);
+ UDP6_INC_STATS_BH(UDP_MIB_NOPORTS, is_udplite);
icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_PORT_UNREACH, 0, dev);
@@ -452,14 +490,20 @@ static int udpv6_rcv(struct sk_buff **ps
return(0);
short_packet:
- if (net_ratelimit())
- printk(KERN_DEBUG "UDP: short packet: %d/%u\n", ulen, skb->len);
+ LIMIT_NETDEBUG(KERN_DEBUG "UDP%sv6: short packet: %d/%u\n",
+ is_udplite? "-Lite" : "", ulen, skb->len);
discard:
- UDP6_INC_STATS_BH(UDP_MIB_INERRORS);
+ UDP6_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite);
kfree_skb(skb);
return(0);
}
+
+static __inline__ int udpv6_rcv(struct sk_buff **pskb)
+{
+ return __udp6_lib_rcv(pskb, udp_hash, 0);
+}
+
/*
* Throw away all pending data and cancel the corking. Socket is locked.
*/
@@ -485,6 +529,7 @@ static int udp_v6_push_pending_frames(st
struct inet_sock *inet = inet_sk(sk);
struct flowi *fl = &inet->cork.fl;
int err = 0;
+ u32 csum = 0;
/* Grab the skbuff where UDP header space exists. */
if ((skb = skb_peek(&sk->sk_write_queue)) == NULL)
@@ -499,35 +544,17 @@ static int udp_v6_push_pending_frames(st
uh->len = htons(up->len);
uh->check = 0;
- if (sk->sk_no_check == UDP_CSUM_NOXMIT) {
- skb->ip_summed = CHECKSUM_NONE;
- goto send;
- }
+ if (up->pcflag)
+ csum = udplite_csum_outgoing(sk, skb);
+ else
+ csum = udp_csum_outgoing(sk, skb);
- if (skb_queue_len(&sk->sk_write_queue) == 1) {
- skb->csum = csum_partial((char *)uh,
- sizeof(struct udphdr), skb->csum);
- uh->check = csum_ipv6_magic(&fl->fl6_src,
- &fl->fl6_dst,
- up->len, fl->proto, skb->csum);
- } else {
- u32 tmp_csum = 0;
-
- skb_queue_walk(&sk->sk_write_queue, skb) {
- tmp_csum = csum_add(tmp_csum, skb->csum);
- }
- tmp_csum = csum_partial((char *)uh,
- sizeof(struct udphdr), tmp_csum);
- tmp_csum = csum_ipv6_magic(&fl->fl6_src,
- &fl->fl6_dst,
- up->len, fl->proto, tmp_csum);
- uh->check = tmp_csum;
-
- }
+ /* add protocol-dependent pseudo-header */
+ uh->check = csum_ipv6_magic(&fl->fl6_src, &fl->fl6_dst,
+ up->len, fl->proto, csum );
if (uh->check == 0)
uh->check = -1;
-send:
err = ip6_push_pending_frames(sk);
out:
up->len = 0;
@@ -555,6 +582,8 @@ static int udpv6_sendmsg(struct kiocb *i
int corkreq = up->corkflag || msg->msg_flags&MSG_MORE;
int err;
int connected = 0;
+ int is_udplite = up->pcflag;
+ int (*getfrag)(void *, char *, int, int, int, struct sk_buff *);
/* destination address check */
if (sin6) {
@@ -695,7 +724,7 @@ do_udp_sendmsg:
opt = fl6_merge_options(&opt_space, flowlabel, opt);
opt = ipv6_fixup_options(&opt_space, opt);
- fl.proto = IPPROTO_UDP;
+ fl.proto = sk->sk_protocol;
ipv6_addr_copy(&fl.fl6_dst, daddr);
if (ipv6_addr_any(&fl.fl6_src) && !ipv6_addr_any(&np->saddr))
ipv6_addr_copy(&fl.fl6_src, &np->saddr);
@@ -762,7 +791,8 @@ back_from_confirm:
do_append_data:
up->len += ulen;
- err = ip6_append_data(sk, ip_generic_getfrag, msg->msg_iov, ulen,
+ getfrag = is_udplite ? udplite_getfrag : ip_generic_getfrag;
+ err = ip6_append_data(sk, getfrag, msg->msg_iov, ulen,
sizeof(struct udphdr), hlimit, tclass, opt, &fl,
(struct rt6_info*)dst,
corkreq ? msg->msg_flags|MSG_MORE : msg->msg_flags);
@@ -794,7 +824,7 @@ #endif
out:
fl6_sock_release(flowlabel);
if (!err) {
- UDP6_INC_STATS_USER(UDP_MIB_OUTDATAGRAMS);
+ UDP6_INC_STATS_USER(UDP_MIB_OUTDATAGRAMS, is_udplite);
return len;
}
/*
@@ -805,7 +835,7 @@ out:
* seems like overkill.
*/
if (err == -ENOBUFS || test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
- UDP6_INC_STATS_USER(UDP_MIB_SNDBUFERRORS);
+ UDP6_INC_STATS_USER(UDP_MIB_SNDBUFERRORS, is_udplite);
}
return err;
@@ -855,7 +885,7 @@ static int do_udpv6_setsockopt(struct so
release_sock(sk);
}
break;
-
+
case UDP_ENCAP:
switch (val) {
case 0:
@@ -866,6 +896,24 @@ static int do_udpv6_setsockopt(struct so
break;
}
break;
+
+ case UDPLITE_SEND_CSCOV:
+ if (!up->pcflag) /* Disable the option on UDP sockets */
+ return -ENOPROTOOPT;
+ if (val != 0 && val < 8) /* Illegal coverage: use default (8) */
+ val = 8;
+ up->pcslen = val;
+ up->pcflag |= UDPLITE_SEND_CC;
+ break;
+
+ case UDPLITE_RECV_CSCOV:
+ if (!up->pcflag) /* Disable the option on UDP sockets */
+ return -ENOPROTOOPT;
+ if (val != 0 && val < 8) /* Avoid silly minimal values. */
+ val = 8;
+ up->pcrlen = val;
+ up->pcflag |= UDPLITE_RECV_CC;
+ break;
default:
err = -ENOPROTOOPT;
@@ -878,19 +926,18 @@ static int do_udpv6_setsockopt(struct so
static int udpv6_setsockopt(struct sock *sk, int level, int optname,
char __user *optval, int optlen)
{
- if (level != SOL_UDP)
- return ipv6_setsockopt(sk, level, optname, optval, optlen);
- return do_udpv6_setsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udpv6_setsockopt(sk, level, optname, optval, optlen);
+ return ipv6_setsockopt(sk, level, optname, optval, optlen);
}
#ifdef CONFIG_COMPAT
static int compat_udpv6_setsockopt(struct sock *sk, int level, int optname,
char __user *optval, int optlen)
{
- if (level != SOL_UDP)
- return compat_ipv6_setsockopt(sk, level, optname,
- optval, optlen);
- return do_udpv6_setsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udpv6_setsockopt(sk, level, optname, optval, optlen);
+ return compat_ipv6_setsockopt(sk, level, optname, optval, optlen);
}
#endif
@@ -917,6 +964,14 @@ static int do_udpv6_getsockopt(struct so
val = up->encap_type;
break;
+ case UDPLITE_SEND_CSCOV:
+ val = up->pcslen;
+ break;
+
+ case UDPLITE_RECV_CSCOV:
+ val = up->pcrlen;
+ break;
+
default:
return -ENOPROTOOPT;
};
@@ -931,19 +986,18 @@ static int do_udpv6_getsockopt(struct so
static int udpv6_getsockopt(struct sock *sk, int level, int optname,
char __user *optval, int __user *optlen)
{
- if (level != SOL_UDP)
- return ipv6_getsockopt(sk, level, optname, optval, optlen);
- return do_udpv6_getsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udpv6_getsockopt(sk, level, optname, optval, optlen);
+ return ipv6_getsockopt(sk, level, optname, optval, optlen);
}
#ifdef CONFIG_COMPAT
static int compat_udpv6_getsockopt(struct sock *sk, int level, int optname,
char __user *optval, int __user *optlen)
{
- if (level != SOL_UDP)
- return compat_ipv6_getsockopt(sk, level, optname,
- optval, optlen);
- return do_udpv6_getsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udpv6_getsockopt(sk, level, optname, optval, optlen);
+ return compat_ipv6_getsockopt(sk, level, optname, optval, optlen);
}
#endif
@@ -1003,6 +1057,7 @@ static struct udp_seq_afinfo udp6_seq_af
.owner = THIS_MODULE,
.name = "udp6",
.family = AF_INET6,
+ .hashtable = udp_hash,
.seq_show = udp6_seq_show,
.seq_fops = &udp6_seq_fops,
};
@@ -1022,7 +1077,7 @@ #endif /* CONFIG_PROC_FS */
struct proto udpv6_prot = {
.name = "UDPv6",
.owner = THIS_MODULE,
- .close = udpv6_close,
+ .close = udp_lib_close,
.connect = ip6_datagram_connect,
.disconnect = udp_disconnect,
.ioctl = udp_ioctl,
@@ -1032,8 +1087,8 @@ struct proto udpv6_prot = {
.sendmsg = udpv6_sendmsg,
.recvmsg = udpv6_recvmsg,
.backlog_rcv = udpv6_queue_rcv_skb,
- .hash = udp_v6_hash,
- .unhash = udp_v6_unhash,
+ .hash = udp_lib_hash,
+ .unhash = udp_lib_unhash,
.get_port = udp_v6_get_port,
.obj_size = sizeof(struct udp6_sock),
#ifdef CONFIG_COMPAT
@@ -1059,3 +1114,5 @@ void __init udpv6_init(void)
printk(KERN_ERR "udpv6_init: Could not register protocol\n");
inet6_register_protosw(&udpv6_protosw);
}
+/* the extensions for UDP-Lite (RFC 3828) */
+#include "udplite.c"
diff --git a/net/ipv6/udplite.c b/net/ipv6/udplite.c
new file mode 100644
index 0000000..ee410f7
--- /dev/null
+++ b/net/ipv6/udplite.c
@@ -0,0 +1,131 @@
+/*
+ * UDPLITEv6 An implementation of the UDP-Lite protocol over IPv6.
+ * See also net/ipv4/udplite.c
+ *
+ * Version: $Id: udplite.c,v 1.8 2006/07/14 09:06:24 gerrit Exp gerrit $
+ *
+ * Authors: Gerrit Renker <gerrit@erg.abdn.ac.uk>
+ *
+ * Changes:
+ * Fixes:
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+DEFINE_SNMP_STAT(struct udp_mib, udplite_stats_in6) __read_mostly;
+
+static __inline__ int udplite6_mib_init(void)
+{
+ return snmp6_mib_init((void **)udplite_stats_in6,
+ sizeof(struct udp_mib),
+ __alignof__(struct udp_mib) );
+}
+
+static __inline__ int udplite_v6_get_port(struct sock *sk, unsigned short snum)
+{
+ return udplite_get_port(sk, snum, ipv6_rcv_saddr_equal);
+}
+
+static __inline__ struct sock *udplite_v6_lookup(struct in6_addr *saddr, __be16 sport,
+ struct in6_addr *daddr, __be16 dport,
+ int dif)
+{
+ return __udp6_lib_lookup(saddr, sport, daddr, dport, dif, udplite_hash);
+}
+
+static __inline__ int udplitev6_rcv(struct sk_buff **pskb)
+{
+ return __udp6_lib_rcv(pskb, udplite_hash, 1);
+}
+
+static __inline__ void udplitev6_err(struct sk_buff *skb,
+ struct inet6_skb_parm *opt, int type,
+ int code, int offset, __u32 info )
+{
+ return __udp6_lib_err(skb, opt, type, code, offset, info, udplite_hash);
+}
+
+static struct inet6_protocol udplitev6_protocol = {
+ .handler = udplitev6_rcv,
+ .err_handler = udplitev6_err,
+ .flags = INET6_PROTO_NOPOLICY|INET6_PROTO_FINAL,
+};
+
+struct proto udplitev6_prot = {
+ .name = "UDPLITEv6",
+ .owner = THIS_MODULE,
+ .close = udp_lib_close,
+ .connect = ip6_datagram_connect,
+ .disconnect = udp_disconnect,
+ .ioctl = udp_ioctl,
+ .init = udplite_sk_init,
+ .destroy = udpv6_destroy_sock,
+ .setsockopt = udpv6_setsockopt,
+ .getsockopt = udpv6_getsockopt,
+ .sendmsg = udpv6_sendmsg,
+ .recvmsg = udpv6_recvmsg,
+ .backlog_rcv = udpv6_queue_rcv_skb,
+ .hash = udp_lib_hash,
+ .unhash = udp_lib_unhash,
+ .get_port = udplite_v6_get_port,
+ .obj_size = sizeof(struct udp6_sock),
+#ifdef CONFIG_COMPAT
+ .compat_setsockopt = compat_udpv6_setsockopt,
+ .compat_getsockopt = compat_udpv6_getsockopt,
+#endif
+};
+
+static struct inet_protosw udplite6_protosw = {
+ .type = SOCK_DGRAM,
+ .protocol = IPPROTO_UDPLITE,
+ .prot = &udplitev6_prot,
+ .ops = &inet6_dgram_ops,
+ .capability = -1,
+ .no_check = 0,
+ .flags = INET_PROTOSW_PERMANENT,
+};
+
+#ifdef CONFIG_PROC_FS
+static struct file_operations udplite6_seq_fops;
+static struct udp_seq_afinfo udplite6_seq_afinfo = {
+ .owner = THIS_MODULE,
+ .name = "udplite6",
+ .family = AF_INET6,
+ .hashtable = udplite_hash,
+ .seq_show = udp6_seq_show,
+ .seq_fops = &udplite6_seq_fops,
+};
+#endif
+
+void __init udplitev6_init(void)
+{
+ if (proto_register(&udplitev6_prot, 1))
+ goto out_register_err;
+
+ if (inet6_add_protocol(&udplitev6_protocol, IPPROTO_UDPLITE) < 0)
+ goto out_unregister_proto;
+
+ inet6_register_protosw(&udplite6_protosw);
+
+ if(udplite6_mib_init() < 0)
+ printk(KERN_ERR "%s: Can not add MIB support.\n", __FUNCTION__);
+
+#ifdef CONFIG_PROC_FS
+ if (udp_proc_register(&udplite6_seq_afinfo))
+ printk(KERN_ERR "%s: Cannot register /proc!\n", __FUNCTION__);
+#endif
+ return;
+
+out_unregister_proto:
+ proto_unregister(&udplitev6_prot);
+out_register_err:
+ printk(KERN_ERR "%s: Could not register.\n", __FUNCTION__);
+}
+
+void __exit udplitev6_cleanup(void)
+{
+ snmp6_mib_free((void **)udplite_stats_in6);
+ proto_unregister(&udplitev6_prot);
+}
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 8223c44..60cbe35 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -143,9 +143,13 @@ #define ICMP6_INC_STATS_OFFSET_BH(idev,
SNMP_INC_STATS_OFFSET_BH(icmpv6_statistics, field, _offset); \
})
DECLARE_SNMP_STAT(struct udp_mib, udp_stats_in6);
-#define UDP6_INC_STATS(field) SNMP_INC_STATS(udp_stats_in6, field)
-#define UDP6_INC_STATS_BH(field) SNMP_INC_STATS_BH(udp_stats_in6, field)
-#define UDP6_INC_STATS_USER(field) SNMP_INC_STATS_USER(udp_stats_in6, field)
+DECLARE_SNMP_STAT(struct udp_mib, udplite_stats_in6);
+#define UDP6_INC_STATS_BH(field, is_udplite) do { \
+ if (is_udplite) SNMP_INC_STATS_BH(udplite_stats_in6, field); \
+ else SNMP_INC_STATS_BH(udp_stats_in6, field); } while(0)
+#define UDP6_INC_STATS_USER(field, is_udplite) do { \
+ if (is_udplite) SNMP_INC_STATS_USER(udplite_stats_in6, field); \
+ else SNMP_INC_STATS_USER(udp_stats_in6, field); } while(0)
int snmp6_register_dev(struct inet6_dev *idev);
int snmp6_unregister_dev(struct inet6_dev *idev);
diff --git a/net/ipv6/proc.c b/net/ipv6/proc.c
index efee7a6..d72b36e 100644
--- a/net/ipv6/proc.c
+++ b/net/ipv6/proc.c
@@ -49,6 +49,8 @@ static int sockstat6_seq_show(struct seq
fold_prot_inuse(&tcpv6_prot));
seq_printf(seq, "UDP6: inuse %d\n",
fold_prot_inuse(&udpv6_prot));
+ seq_printf(seq, "UDPLITE6: inuse %d\n",
+ fold_prot_inuse(&udplitev6_prot));
seq_printf(seq, "RAW6: inuse %d\n",
fold_prot_inuse(&rawv6_prot));
seq_printf(seq, "FRAG6: inuse %d memory %d\n",
@@ -133,6 +135,14 @@ static struct snmp_mib snmp6_udp6_list[]
SNMP_MIB_SENTINEL
};
+static struct snmp_mib snmp6_udplite6_list[] = {
+ SNMP_MIB_ITEM("UdpLite6InDatagrams", UDP_MIB_INDATAGRAMS),
+ SNMP_MIB_ITEM("UdpLite6NoPorts", UDP_MIB_NOPORTS),
+ SNMP_MIB_ITEM("UdpLite6InErrors", UDP_MIB_INERRORS),
+ SNMP_MIB_ITEM("UdpLite6OutDatagrams", UDP_MIB_OUTDATAGRAMS),
+ SNMP_MIB_SENTINEL
+};
+
static unsigned long
fold_field(void *mib[], int offt)
{
@@ -166,6 +176,7 @@ static int snmp6_seq_show(struct seq_fil
snmp6_seq_show_item(seq, (void **)ipv6_statistics, snmp6_ipstats_list);
snmp6_seq_show_item(seq, (void **)icmpv6_statistics, snmp6_icmp6_list);
snmp6_seq_show_item(seq, (void **)udp_stats_in6, snmp6_udp6_list);
+ snmp6_seq_show_item(seq, (void **)udplite_stats_in6, snmp6_udplite6_list);
}
return 0;
}
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 858cae2..fdc7bef 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -49,6 +49,7 @@ #include <linux/netfilter_ipv6.h>
#include <net/ip.h>
#include <net/ipv6.h>
#include <net/udp.h>
+#include <net/udplite.h>
#include <net/tcp.h>
#include <net/ipip.h>
#include <net/protocol.h>
@@ -862,6 +863,7 @@ #endif
/* Init v6 transport protocols. */
udpv6_init();
+ udplitev6_init();
tcpv6_init();
ipv6_packet_init();
@@ -926,6 +928,7 @@ #ifdef CONFIG_IPV6_MIP6
mip6_fini();
#endif
/* Cleanup code parts. */
+ udplitev6_cleanup();
ip6_flowlabel_cleanup();
addrconf_cleanup();
ip6_route_cleanup();
* [PATCHv4 3/3] net: UDP-Lite misc files
From: Gerrit Renker @ 2006-10-13 15:14 UTC (permalink / raw)
To: David Miller; +Cc: netdev
Miscellaneous files to support UDP-Lite, including basic
xfrm and netfilter support.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
Documentation/networking/udplite.txt | 291 +++++++++++++++++++++++++++++++++++
include/net/xfrm.h | 2
net/ipv4/netfilter/ipt_LOG.c | 11 -
net/ipv4/xfrm4_policy.c | 1
net/ipv6/netfilter/ip6t_LOG.c | 10 -
net/ipv6/xfrm6_policy.c | 1
net/netfilter/xt_multiport.c | 5
net/netfilter/xt_tcpudp.c | 20 ++
8 files changed, 332 insertions(+), 9 deletions(-)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 737fdb2..70a8d2d 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -468,6 +468,7 @@ __be16 xfrm_flowi_sport(struct flowi *fl
switch(fl->proto) {
case IPPROTO_TCP:
case IPPROTO_UDP:
+ case IPPROTO_UDPLITE:
case IPPROTO_SCTP:
port = fl->fl_ip_sport;
break;
@@ -493,6 +494,7 @@ __be16 xfrm_flowi_dport(struct flowi *fl
switch(fl->proto) {
case IPPROTO_TCP:
case IPPROTO_UDP:
+ case IPPROTO_UDPLITE:
case IPPROTO_SCTP:
port = fl->fl_ip_dport;
break;
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 1bed0cd..af6867f 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -199,6 +199,7 @@ _decode_session4(struct sk_buff *skb, st
if (!(iph->frag_off & htons(IP_MF | IP_OFFSET))) {
switch (iph->protocol) {
case IPPROTO_UDP:
+ case IPPROTO_UDPLITE:
case IPPROTO_TCP:
case IPPROTO_SCTP:
case IPPROTO_DCCP:
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 73cee2e..8a25db1 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -272,6 +272,7 @@ _decode_session6(struct sk_buff *skb, st
break;
case IPPROTO_UDP:
+ case IPPROTO_UDPLITE:
case IPPROTO_TCP:
case IPPROTO_SCTP:
case IPPROTO_DCCP:
diff --git a/net/netfilter/xt_multiport.c b/net/netfilter/xt_multiport.c
index d3aefd3..c3b4bc0 100644
--- a/net/netfilter/xt_multiport.c
+++ b/net/netfilter/xt_multiport.c
@@ -1,5 +1,5 @@
-/* Kernel module to match one of a list of TCP/UDP/SCTP/DCCP ports: ports are in
- the same place so we can treat them as equal. */
+/* Kernel module to match one of a list of TCP/UDP(-Lite)/SCTP/DCCP ports:
+ ports are in the same place so we can treat them as equal. */
/* (C) 1999-2001 Paul `Rusty' Russell
* (C) 2002-2004 Netfilter Core Team <coreteam@netfilter.org>
@@ -162,6 +162,7 @@ check(u_int16_t proto,
{
/* Must specify supported protocol, no unknown flags or bad count */
return (proto == IPPROTO_TCP || proto == IPPROTO_UDP
+ || proto == IPPROTO_UDPLITE
|| proto == IPPROTO_SCTP || proto == IPPROTO_DCCP)
&& !(ip_invflags & XT_INV_PROTO)
&& (match_flags == XT_MULTIPORT_SOURCE
diff --git a/net/netfilter/xt_tcpudp.c b/net/netfilter/xt_tcpudp.c
index e76a68e..46414b5 100644
--- a/net/netfilter/xt_tcpudp.c
+++ b/net/netfilter/xt_tcpudp.c
@@ -10,7 +10,7 @@ #include <linux/netfilter/xt_tcpudp.h>
#include <linux/netfilter_ipv4/ip_tables.h>
#include <linux/netfilter_ipv6/ip6_tables.h>
-MODULE_DESCRIPTION("x_tables match for TCP and UDP, supports IPv4 and IPv6");
+MODULE_DESCRIPTION("x_tables match for TCP and UDP(-Lite), supports IPv4 and IPv6");
MODULE_LICENSE("GPL");
MODULE_ALIAS("xt_tcp");
MODULE_ALIAS("xt_udp");
@@ -234,6 +234,24 @@ static struct xt_match xt_tcpudp_match[]
.proto = IPPROTO_UDP,
.me = THIS_MODULE,
},
+ {
+ .name = "udplite",
+ .family = AF_INET,
+ .checkentry = udp_checkentry,
+ .match = udp_match,
+ .matchsize = sizeof(struct xt_udp),
+ .proto = IPPROTO_UDPLITE,
+ .me = THIS_MODULE,
+ },
+ {
+ .name = "udplite",
+ .family = AF_INET6,
+ .checkentry = udp_checkentry,
+ .match = udp_match,
+ .matchsize = sizeof(struct xt_udp),
+ .proto = IPPROTO_UDPLITE,
+ .me = THIS_MODULE,
+ },
};
static int __init xt_tcpudp_init(void)
diff --git a/net/ipv4/netfilter/ipt_LOG.c b/net/ipv4/netfilter/ipt_LOG.c
index 7dc820d..46eee64 100644
--- a/net/ipv4/netfilter/ipt_LOG.c
+++ b/net/ipv4/netfilter/ipt_LOG.c
@@ -171,11 +171,15 @@ static void dump_packet(const struct nf_
}
break;
}
- case IPPROTO_UDP: {
+ case IPPROTO_UDP:
+ case IPPROTO_UDPLITE: {
struct udphdr _udph, *uh;
- /* Max length: 10 "PROTO=UDP " */
- printk("PROTO=UDP ");
+ if (ih->protocol == IPPROTO_UDP)
+ /* Max length: 10 "PROTO=UDP " */
+ printk("PROTO=UDP " );
+ else /* Max length: 14 "PROTO=UDPLITE " */
+ printk("PROTO=UDPLITE ");
if (ntohs(ih->frag_off) & IP_OFFSET)
break;
@@ -341,6 +345,7 @@ static void dump_packet(const struct nf_
/* IP: 40+46+6+11+127 = 230 */
/* TCP: 10+max(25,20+30+13+9+32+11+127) = 252 */
/* UDP: 10+max(25,20) = 35 */
+ /* UDPLITE: 14+max(25,20) = 39 */
/* ICMP: 11+max(25, 18+25+max(19,14,24+3+n+10,3+n+10)) = 91+n */
/* ESP: 10+max(25)+15 = 50 */
/* AH: 9+max(25)+15 = 49 */
diff --git a/net/ipv6/netfilter/ip6t_LOG.c b/net/ipv6/netfilter/ip6t_LOG.c
index 0cf537d..3cb6bb7 100644
--- a/net/ipv6/netfilter/ip6t_LOG.c
+++ b/net/ipv6/netfilter/ip6t_LOG.c
@@ -270,11 +270,15 @@ static void dump_packet(const struct nf_
}
break;
}
- case IPPROTO_UDP: {
+ case IPPROTO_UDP:
+ case IPPROTO_UDPLITE: {
struct udphdr _udph, *uh;
- /* Max length: 10 "PROTO=UDP " */
- printk("PROTO=UDP ");
+ if (currenthdr == IPPROTO_UDP)
+ /* Max length: 10 "PROTO=UDP " */
+ printk("PROTO=UDP " );
+ else /* Max length: 14 "PROTO=UDPLITE " */
+ printk("PROTO=UDPLITE ");
if (fragment)
break;
diff --git a/Documentation/networking/udplite.txt b/Documentation/networking/udplite.txt
new file mode 100644
index 0000000..a899fa1
--- /dev/null
+++ b/Documentation/networking/udplite.txt
@@ -0,0 +1,291 @@
+ ===========================================================================
+ The UDP-Lite protocol (RFC 3828)
+ ===========================================================================
+ last modified: Mon 18th September 2006
+
+
+ UDP-Lite is a Standards-Track IETF transport protocol whose characteristic
+ is a variable-length checksum. This has advantages for transport of multimedia
+ (video, VoIP) over wireless networks, as partly damaged packets can still be
+ fed into the codec instead of being discarded due to a failed checksum test.
+
+ This file briefly describes the existing kernel support and the socket API.
+ For in-depth information, you can consult:
+
+ o The UDP-Lite Homepage: http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/
+ From here you can also download the latest patch for the stable
+ kernel tree and some example application source code.
+
+ o The UDP-Lite HOWTO on
+ http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/UDP-Lite-HOWTO.txt
+
+ o The Wireshark UDP-Lite Wiki (with capture files):
+ http://wiki.wireshark.org/Lightweight_User_Datagram_Protocol
+
+ o The Protocol Spec, RFC 3828, on http://www.ietf.org/rfc/rfc3828.txt
+
+
+ I) APPLICATIONS
+
+ Several applications have been ported successfully to UDP-Lite. Wireshark
+ (formerly Ethereal) has UDP-Litev4/v6 support by default. The tarball on
+
+ http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/udplite_linux.tar.gz
+
+ has source code for several v4/v6 client-server and network testing examples.
+
+ Porting applications to UDP-Lite is straightforward: only socket level and
+ IPPROTO need to be changed; senders additionally set the checksum coverage
+ length (default = header length = 8). Details are in the next section.
+ UDP-Lite is not enabled by default: set CONFIG_IP_UDPLITE=y to support it.
+
+
+ II) PROGRAMMING API
+
+ UDP-Lite provides a connectionless, unreliable datagram service and hence
+ uses the same socket type as UDP. In fact, porting from UDP to UDP-Lite is
+ dead easy: simply add `IPPROTO_UDPLITE' as the last argument of the socket(2)
+ call so that the statement looks like:
+
+ s = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDPLITE);
+
+ or, respectively,
+
+ s = socket(PF_INET6, SOCK_DGRAM, IPPROTO_UDPLITE);
+
+ Since both UDP-Litev4 and UDP-Litev6 are supported, the porting process is the
+ same in both cases. With just this change you are able to run UDP-Lite
+ services or connect to UDP-Lite servers. The kernel will assume that you are
+ not interested in using partial checksum coverage and so emulate UDP mode.
+
+ Making use of the partial checksum coverage facilities requires setting just
+ one socket option, which takes an integer specifying the coverage length:
+
+ * Sender checksum coverage: UDPLITE_SEND_CSCOV
+
+ For example,
+
+ int val = 20;
+ setsockopt(s, SOL_UDPLITE, UDPLITE_SEND_CSCOV, &val, sizeof(int));
+
+ sets the checksum coverage length to 20 bytes (12b data + 8b header).
+ Of each packet only the first 20 bytes (plus the pseudo-header) will be
+ checksummed. This is useful for RTP applications which have a 12-byte
+ base header.
+
+
+ * Receiver checksum coverage: UDPLITE_RECV_CSCOV
+
+ This option is the receiver-side analogue. It is truly optional, i.e. not
+ required to enable traffic with partial checksum coverage. Its function is
+ that of a traffic filter: when enabled, it instructs the kernel to drop
+ all packets which have a coverage _less_ than this value. For example, if
+ RTP and UDP headers are to be protected, a receiver can enforce that only
+ packets with a minimum coverage of 20 are admitted:
+
+ int min = 20;
+ setsockopt(s, SOL_UDPLITE, UDPLITE_RECV_CSCOV, &min, sizeof(int));
+
+ The calls to getsockopt(2) are analogous. Since UDP-Lite is an extension of
+ UDP and not a stand-alone protocol, all socket options known from UDP can be
+ used in exactly the same manner as before, e.g. UDP_CORK or UDP_ENCAP.
+
+ A detailed discussion of UDP-Lite checksum coverage options is in section IV.
+
+
+
+ III) HEADER FILES
+
+ The socket API requires support through header files in /usr/include:
+
+ * /usr/include/netinet/in.h
+ to define IPPROTO_UDPLITE
+
+ * /usr/include/netinet/udplite.h
+ for UDP-Lite header fields and protocol constants
+
+ For testing purposes, the following can serve as a `mini' header file:
+
+ #define IPPROTO_UDPLITE 136
+ #define SOL_UDPLITE 136
+ #define UDPLITE_SEND_CSCOV 10
+ #define UDPLITE_RECV_CSCOV 11
+
+ Ready-made header files for various distros are in the UDP-Lite tarball.
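
As a rough illustration of sections II and III taken together, the sketch below opens a UDP-Lite socket and sets both coverage options. It is only a sketch under assumptions: udplite_open() is a hypothetical helper name, and the fallback #defines are the `mini' header values quoted above, used only when the system headers do not already provide them.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fallbacks copied from the `mini' header; only used if undefined. */
#ifndef IPPROTO_UDPLITE
#define IPPROTO_UDPLITE    136
#endif
#ifndef SOL_UDPLITE
#define SOL_UDPLITE        136
#endif
#ifndef UDPLITE_SEND_CSCOV
#define UDPLITE_SEND_CSCOV 10
#endif
#ifndef UDPLITE_RECV_CSCOV
#define UDPLITE_RECV_CSCOV 11
#endif

/*
 * Hypothetical helper (not part of any library): returns a UDP-Lite
 * socket with the given sender and receiver coverage, or -1 if the
 * socket cannot be created or configured, e.g. because the running
 * kernel has no UDP-Lite support.
 */
int udplite_open(int family, int send_cov, int recv_cov)
{
	int s;

	assert(family == AF_INET || family == AF_INET6);

	s = socket(family, SOCK_DGRAM, IPPROTO_UDPLITE);
	if (s < 0)
		return -1;

	if (setsockopt(s, SOL_UDPLITE, UDPLITE_SEND_CSCOV,
		       &send_cov, sizeof(send_cov)) < 0 ||
	    setsockopt(s, SOL_UDPLITE, UDPLITE_RECV_CSCOV,
		       &recv_cov, sizeof(recv_cov)) < 0) {
		close(s);
		return -1;
	}
	return s;
}
```

An RTP-style application would call udplite_open(AF_INET, 20, 20) to protect the 8-byte UDP-Lite header and the 12-byte RTP base header in both directions.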
+
+
+
+ IV) KERNEL BEHAVIOUR WITH REGARD TO THE VARIOUS SOCKET OPTIONS
+
+ To enable debugging messages, the log level must be set to 8, as most
+ messages use the KERN_DEBUG level (7).
+
+
+ 1) Sender Socket Options
+
+ If the sender specifies a value of 0 as coverage length, the module
+ assumes full coverage and transmits a packet with a coverage length of 0
+ and a checksum computed over the entire packet. If the sender specifies
+ a coverage < 8 and different from 0, the kernel assumes 8 as default
+ value. Finally, if the specified coverage length exceeds the packet
+ length, the packet length is used instead as coverage length.
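
The three sender rules can be summarised in a few lines of C. This is a user-space sketch of the behaviour described above, not the kernel code; udplite_send_coverage() and UDPLITE_HDR_LEN are names made up for the illustration, and pkt_len is assumed to include the 8-byte header.

```c
#include <assert.h>

#define UDPLITE_HDR_LEN 8	/* UDP-Lite header size in bytes */

/*
 * Hypothetical helper: number of bytes (header included) that end up
 * under the checksum of a packet of pkt_len bytes when the sender
 * requested `requested' bytes of coverage.
 */
int udplite_send_coverage(int requested, int pkt_len)
{
	assert(pkt_len >= UDPLITE_HDR_LEN);

	if (requested == 0)			/* 0 means full coverage */
		return pkt_len;
	if (requested < UDPLITE_HDR_LEN)	/* illegal: use default 8 */
		requested = UDPLITE_HDR_LEN;
	if (requested > pkt_len)		/* clamp to packet length */
		requested = pkt_len;
	return requested;
}
```

For instance, a requested coverage of 856 yields 856 for a 1032-byte packet but only 520 for a 520-byte packet.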
+
+
+ 2) Receiver Socket Options
+
+ The receiver specifies the minimum value of the coverage length it
+ is willing to accept. A value of 0 here indicates that the receiver
+ always wants the whole of the packet covered. In this case, all
+ partially covered packets are dropped and an error is logged.
+
+ It is not possible to specify illegal values (nonzero and less than 8);
+ in these cases the default of 8 is assumed.
+
+ All packets arriving with a coverage value less than the specified
+ threshold are discarded; these events are also logged.
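
The receiver-side filter can be modelled as a small predicate. The following is a sketch only, assuming the receiver has set UDPLITE_RECV_CSCOV; udplite_recv_admits() is a hypothetical name, and cscov stands for the number of bytes actually covered by the checksum of the incoming packet.

```c
#include <assert.h>

/*
 * Hypothetical helper modelling the receiver-side coverage filter.
 *   min_cov  value set via UDPLITE_RECV_CSCOV (0 = full coverage only)
 *   cscov    bytes covered by the checksum of the incoming packet
 *   pkt_len  UDP-Lite packet length (header + payload)
 * Returns 1 if the packet is admitted, 0 if it is dropped.
 */
int udplite_recv_admits(int min_cov, int cscov, int pkt_len)
{
	assert(cscov >= 8 && cscov <= pkt_len);

	if (cscov == pkt_len)	/* fully covered: never filtered */
		return 1;
	if (min_cov == 0)	/* receiver insists on full coverage */
		return 0;
	return cscov >= min_cov;
}
```

With a minimum of 20, a packet covering only 12 bytes is dropped, while one covering 20 bytes or more passes.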
+
+
+ 3) Disabling the Checksum Computation
+
+ On both sender and receiver, checksumming will always be performed
+ and can not be disabled using SO_NO_CHECK. Thus
+
+ setsockopt(sockfd, SOL_SOCKET, SO_NO_CHECK, ... );
+
+ will always be ignored, while the value of
+
+ getsockopt(sockfd, SOL_SOCKET, SO_NO_CHECK, &value, ...);
+
+ is meaningless (as in TCP). Packets with a zero checksum field are
+ illegal (cf. RFC 3828, sec. 3.1) and will be silently discarded.
+
+
+ 4) Fragmentation
+
+ The checksum computation respects both buffersize and MTU. The size
+ of UDP-Lite packets is determined by the size of the send buffer. The
+ minimum size of the send buffer is 2048 (defined as SOCK_MIN_SNDBUF
+ in include/net/sock.h), the default value is configurable as
+ net.core.wmem_default or via setting the SO_SNDBUF socket(7)
+ option. The upper bound for the send buffer is determined
+ by net.core.wmem_max.
+
+ Given a payload size larger than the send buffer size, UDP-Lite will
+ split the payload into several individual packets, filling up the
+ send buffer size in each case.
+
+ The precise value also depends on the interface MTU. The interface MTU,
+ in turn, may trigger IP fragmentation. In this case, the generated
+ UDP-Lite packet is split into several IP packets, of which only the
+ first one contains the L4 header.
+
+ The send buffer size has implications on the checksum coverage length.
+ Consider the following example:
+
+ Payload: 1536 bytes Send Buffer: 1024 bytes
+ MTU: 1500 bytes Coverage Length: 856 bytes
+
+ UDP-Lite will ship the 1536 bytes in two separate packets:
+
+ Packet 1: 1024 payload + 8 byte header + 20 byte IP header = 1052 bytes
+ Packet 2: 512 payload + 8 byte header + 20 byte IP header = 540 bytes
+
+ The coverage length covers the UDP-Lite header and 848 bytes of the
+ payload in the first packet; the second packet is fully covered. Note
+ that for the second packet, the coverage length exceeds the packet
+ length. The kernel always re-adjusts the coverage length to the packet
+ length in such cases.
+
+ To see what happens when one UDP-Lite packet is split into
+ several tiny fragments, consider the following example.
+
+ Payload: 1024 bytes Send buffer size: 1024 bytes
+ MTU: 300 bytes Coverage length: 575 bytes
+
+   +-+-----------+--------------+--------------+--------------+
+   |8|    272    |     280      |     280      |     192      |
+   +-+-----------+--------------+--------------+--------------+
+               280            560            840           1032
+                                 ^
+   *****checksum coverage*********
+
+ The UDP-Lite module generates one 1032 byte packet (1024 + 8 byte
+ header). According to the interface MTU, this is split into 4 IP
+ fragments (at most 280 byte IP payload + 20 byte IP header each). The
+ kernel module sums the contents of the entire first two fragments, plus
+ 15 bytes of the third fragment, before releasing the fragments to the
+ IP module.
+
+ To see the analogous case for IPv6 fragmentation, consider a link
+ MTU of 1280 bytes and a write buffer of 3356 bytes. If the checksum
+ coverage is less than 1232 bytes (MTU minus IPv6/fragment header
+ lengths), only the first fragment needs to be considered. When using
+ larger checksum coverage lengths, each eligible fragment needs to be
+ checksummed. Suppose we have a checksum coverage of 3062. The buffer
+ of 3356 bytes will be split into the following fragments:
+
+ Fragment 1: 1280 bytes carrying 1232 bytes of UDP-Lite data
+ Fragment 2: 1280 bytes carrying 1232 bytes of UDP-Lite data
+ Fragment 3: 948 bytes carrying 900 bytes of UDP-Lite data
+
+ The first two fragments have to be checksummed in full; of the last
+ fragment, only 598 (= 3062 - 2*1232) bytes are checksummed.
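+
+ The IPv6 numbers can be checked the same way. This sketch assumes a
+ 40 byte IPv6 header and an 8 byte fragment header with no other
+ extension headers; ipv6_fragments() is a hypothetical helper name.
+
```python
IPV6_HEADER = 40      # fixed IPv6 header, bytes
FRAGMENT_HEADER = 8   # IPv6 fragment extension header, bytes
UDPLITE_HEADER = 8    # UDP-Lite header, bytes


def ipv6_fragments(mtu, write_len, coverage):
    """Return (wire_len, data_len, covered) for each IPv6 fragment."""
    per_frag = mtu - IPV6_HEADER - FRAGMENT_HEADER  # 1232 for MTU 1280
    remaining = write_len + UDPLITE_HEADER          # data left to send
    to_cover = coverage                             # coverage left to apply
    frags = []
    while remaining > 0:
        data = min(per_frag, remaining)
        covered = min(to_cover, data)
        frags.append((data + IPV6_HEADER + FRAGMENT_HEADER, data, covered))
        remaining -= data
        to_cover -= covered
    return frags


for wire_len, data, covered in ipv6_fragments(1280, 3356, 3062):
    print(wire_len, data, covered)
```
+
+ For a coverage below 1232 bytes, only the first tuple reports a
+ non-zero covered length, which matches the remark that only the first
+ fragment needs to be considered in that case.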
+
+ While it is important that such cases are dealt with correctly, they
+ are (annoyingly) rare: UDP-Lite is designed for optimising multimedia
+ performance over wireless (or generally noisy) links and thus smaller
+ coverage lengths are to be expected.
+
+
+ V) UDP-LITE RUNTIME STATISTICS AND THEIR MEANING
+
+ Exceptional and error conditions are logged to syslog at the KERN_DEBUG
+ level. Live statistics about UDP-Lite are available in /proc/net/snmp
+ and can (with newer versions of netstat) be viewed using
+
+ netstat -svu
+
+ This displays UDP-Lite statistics variables, whose meaning is as follows.
+
+ InDatagrams: Total number of received datagrams.
+
+ NoPorts: Number of packets received to an unknown port.
+ These cases are counted separately (not as InErrors).
+
+ InErrors: Number of erroneous UDP-Lite packets. Errors include:
+ * internal socket queue receive errors
+ * packet too short (less than 8 bytes or stated
+ coverage length exceeds received length)
+ * xfrm4_policy_check() returned with error
+ * application has specified larger min. coverage
+ length than that of incoming packet
+ * checksum coverage violated
+ * bad checksum
+
+ OutDatagrams: Total number of sent datagrams.
+
+ These statistics derive from the UDP MIB (RFC 2013).
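+
+ Where netstat is too old to show these counters, they can be read from
+ /proc/net/snmp directly. The sketch below assumes the usual layout of
+ that file (a line of counter names followed by a line of values, both
+ starting with the same prefix) and assumes the prefix is "UdpLite";
+ check the actual file for the prefix used by the running kernel. The
+ sample text and its numbers are made up for illustration.
+
```python
def parse_snmp(text, prefix="UdpLite"):
    """Return {counter_name: value} for one protocol in /proc/net/snmp text.

    The "UdpLite" default is an assumption, not taken from the patch.
    """
    rows = [line.split() for line in text.splitlines()
            if line.startswith(prefix + ":")]
    if len(rows) < 2:
        return {}
    names, values = rows[0][1:], rows[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


# Illustrative sample only; not real measurements.
SAMPLE = """\
Udp: InDatagrams NoPorts InErrors OutDatagrams
Udp: 120 3 0 98
UdpLite: InDatagrams NoPorts InErrors OutDatagrams
UdpLite: 42 1 2 40
"""

stats = parse_snmp(SAMPLE)
print(stats["InErrors"])
```
+
+ On a live system the same function could be called as
+ parse_snmp(open("/proc/net/snmp").read()).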
+
+
+ VI) IPTABLES
+
+ There is packet match support for UDP-Lite as well as support for the LOG target.
+ If you add the following line to /etc/protocols,
+
+ udplite 136 UDP-Lite # UDP-Lite [RFC 3828]
+
+ then
+ iptables -A INPUT -p udplite -j LOG
+
+ will produce logging output to syslog. Dropping and rejecting packets also works.
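+
+ A quick way to confirm that the entry is well-formed is to parse it the
+ way a protocols database lookup would. This is a simplified sketch of
+ the file format (name, number, optional aliases, "#" comments);
+ protocol_number() is a hypothetical helper, not an existing tool.
+
```python
def protocol_number(protocols_text, name):
    """Look up `name` (or one of its aliases) in /etc/protocols-style text."""
    for line in protocols_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comment
        if len(fields) >= 2 and name in (fields[0], *fields[2:]):
            return int(fields[1])
    return None


ENTRY = "udplite 136 UDP-Lite # UDP-Lite [RFC 3828]"
print(protocol_number(ENTRY, "udplite"))
```
+
+ Passing the contents of /etc/protocols instead of ENTRY shows whether
+ the name udplite resolves to protocol number 136 on the local system.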
+
+
+ VII) MAINTAINER ADDRESS
+
+ The UDP-Lite patch was developed at
+ University of Aberdeen
+ Electronics Research Group
+ Department of Engineering
+ Fraser Noble Building
+ Aberdeen AB24 3UE; UK
+ The current maintainer is Gerrit Renker, <gerrit@erg.abdn.ac.uk>. The
+ initial code was developed by William Stanislaus, <william@erg.abdn.ac.uk>.