Add PGM protocol support to the IP stack


* Add PGM protocol support to the IP stack
@ 2010-03-18 17:58 Christoph Lameter
From: Christoph Lameter @ 2010-03-18 17:58 UTC
  To: David Miller, netdev; +Cc: linux-kernel

Is there any work in progress on including PGM support (RFC 3208) in the
kernel?

I know about the openpgm implementation. Openpgm does this at the user
level and requires linking to a library; it is essentially a communication
protocol implemented in user space. It has privilege issues because it has
to create PGM packets via a raw socket, which also limits the achievable
performance. Openpgm does seem to interoperate with the major commercial
implementations of PGM.

I am looking at openpgm right now, and it seems to contain a number of
useful files and functions that could be used to implement PGM support
in the kernel.

There is also an existing socket API for PGM in another operating system
whose name we would rather not mention. That socket API could be used as
the basis. PGM could then be used without a library and without the
privilege and performance issues.

PGM support would cover two modes of communication:

1. Native PGM (allows NAK suppression by Cisco routers to be used)

	socket(AF_INET, SOCK_RDM, IPPROTO_RM)

(SOCK_RDM is defined in the kernel sources but not implemented. PGM
support would implement SOCK_RDM; IPPROTO_RM would need to be defined
as the IANA-assigned protocol number for PGM.)

2. PGM over UDP (used by many commercial products, but not by the
unspeakable OS). No router support for NAK suppression is available. For
this I guess we would have to support

	socket(AF_INET, SOCK_RDM, IPPROTO_UDP)
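A minimal userspace sketch of the two calls above, assuming protocol number 113 (the IANA assignment for PGM, used for IPPROTO_PGM in the patch below). PGM_IPPROTO, enum pgm_mode and pgm_open() are illustrative names, not an existing API. Because SOCK_RDM is not implemented for AF_INET, current kernels are expected to reject both calls, so the sketch returns -errno instead of assuming a particular error code.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical constant: IANA protocol number for PGM (RFC 3208). */
#define PGM_IPPROTO 113

enum pgm_mode {
	PGM_MODE_NATIVE,	/* PGM directly over IP */
	PGM_MODE_UDP,		/* PGM encapsulated in UDP */
};

/* Protocol argument for socket(AF_INET, SOCK_RDM, ...) in each mode. */
static int pgm_mode_protocol(enum pgm_mode mode)
{
	assert(mode == PGM_MODE_NATIVE || mode == PGM_MODE_UDP);
	return mode == PGM_MODE_NATIVE ? PGM_IPPROTO : IPPROTO_UDP;
}

/*
 * Try to open a reliable multicast socket in the given mode.
 * Returns a file descriptor (release it with close()), or -errno
 * if the kernel refuses the SOCK_RDM/protocol combination.
 */
static int pgm_open(enum pgm_mode mode)
{
	int fd = socket(AF_INET, SOCK_RDM, pgm_mode_protocol(mode));

	return fd >= 0 ? fd : -errno;
}
```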

I would like to find others who are interested in such a project, or
maybe there is already a project in the works? If not, I will try to
come up with some code to get this going. Any help you could offer
would be appreciated.


* Re: Add PGM protocol support to the IP stack
From: Christoph Lameter @ 2010-03-18 21:58 UTC
  To: David Miller, netdev; +Cc: linux-kernel

Here is what I have so far after a couple of hours, hacked together
from openpgm and udplite.

---
 Documentation/networking/pgm/TODO       |    8
 Documentation/networking/pgm/references |    2
 Documentation/networking/pgm/usage      |   91 ++++
 include/linux/in.h                      |    2
 include/linux/pgm.h                     |  720 ++++++++++++++++++++++++++++++++
 net/ipv4/Kconfig                        |   14
 net/ipv4/Makefile                       |    3
 net/ipv4/pgm.c                          |  143 ++++++
 8 files changed, 983 insertions(+)

Index: linux-2.6/include/linux/in.h
===================================================================
--- linux-2.6.orig/include/linux/in.h	2010-03-18 11:05:24.000000000 -0500
+++ linux-2.6/include/linux/in.h	2010-03-18 15:47:59.000000000 -0500
@@ -44,6 +44,7 @@ enum {
   IPPROTO_PIM    = 103,		/* Protocol Independent Multicast	*/

   IPPROTO_COMP   = 108,                /* Compression Header protocol */
+  IPPROTO_PGM	 = 113,		/* Pragmatic General Multicast		*/
   IPPROTO_SCTP   = 132,		/* Stream Control Transport Protocol	*/
   IPPROTO_UDPLITE = 136,	/* UDP-Lite (RFC 3828)			*/

@@ -51,6 +52,7 @@ enum {
   IPPROTO_MAX
 };

+#define IPPROTO_RM IPPROTO_PGM

 /* Internet address. */
 struct in_addr {
Index: linux-2.6/include/linux/pgm.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/include/linux/pgm.h	2010-03-18 16:56:19.000000000 -0500
@@ -0,0 +1,720 @@
+/*
+ * PGM packet formats, RFC 3208.
+ *
+ * Copyright (c) 2006 Miru Limited.
+ * Copyright (c) 2010 Christoph Lameter, The Linux Foundation.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *
+ * March 17, 2010 Christoph Lameter
+ *		Basic PGM definitions extracted from openpgm project.
+ * March 18, 2010
+ *		Socket API and document intended usage.
+ *		Basic protocol environment (from udplite.c)
+ */
+
+#ifndef _LINUX_PGM_H
+#define _LINUX_PGM_H
+
+#include <linux/types.h>
+
+/* PGM socket options */
+
+/* Transmitter */
+#define RM_LATEJOIN				1	/* X Not supported on receive so why have it? */
+#define RM_RATE_WINDOW_SIZE			2	/* See struct pgm_send_window */
+#define RM_SEND_WINDOW_ADV_RATE			3	/* X Increase of send window in percentage of window */
+#define RM_SENDER_STATISTICS			4	/* see struct pgm_sender_stats */
+#define RM_SENDER_WINDOW_ADVANCE_METHOD		5	/* X seems obsolete */
+#define RM_SET_MCAST_TTL			6	/* X Can be set via IP_MULTICAST_TTL */
+#define RM_SET_MESSAGE_BOUNDARY			7	/* Fix the size of the messages in bytes */
+#define RM_SET_SEND_IF				8	/* X use IP_MULTICAST_IF etc instead */
+#define RM_USE_FEC				9
+
+/* Receiver */
+#define RM_ADD_RECEIVE_IF			100	/* X ???? IP_MULTICAST_IF instead? */
+#define RM_DEL_RECEIVE_IF			101	/* X IP_MULTICAST_IF */
+#define RM_HIGH_SPEED_INTRANET_OPT		102	/* X PGM should adapt automatically to high speed networks */
+#define RM_RECEIVER_STATISTICS			103	/* See struct pgm_receiver_stats */
+
+/* Socket API structures (established by M$DN) */
+struct pgm_receiver_stats {
+	u64	NumODataPacketsReceived;	/* Number of ODATA (original) sequences */
+	u64	NumRDataPacketsReceived;	/* Number of RDATA (repair) sequences */
+	u64	NumDuplicateDataPackets;	/* Duplicate sequences */
+	u64	DataBytesReceived;
+	u64	TotalBytesReceived;
+	u64	RateKBitsPerSecOverall;		/* Receive rate since start of session X */
+	u64	RateKBitsPerSecLast;		/* Receive rate for last second X*/
+	u64	TrailingEdgeSeqId;		/* Oldest sequence in the receive window */
+	u64	LeadingEdgeSeqId;		/* Newest sequence in the receive window */
+	u64	AverageSequencesInWindow;	/* Average number of sequences in receive window X */
+	u64	MinSequencesInWindow;		/* The minimum number of sequences */
+	u64	MaxSequencesInWindow;		/* The maximum number of sequences */
+	u64	FirstNakSequenceNumber;		/* First outstanding NAK sequence number */
+	u64	NumPendingNaks;			/* Number of sequences waiting for NCF */
+	u64	NumOutstandingNaks;		/* Number of sequences waiting for RDATA */
+	u64	NumDataPacketsBuffered;		/* Number of packets currently buffered */
+	u64	TotalSelectiveNaksSent;		/* Number of NAKs sent total */
+	u64	TotalParityNaksSent;		/* Number of parity NAKs sent */
+};
+
+struct pgm_sender_stats {
+	u64	DataBytesSent;
+	u64	TotalBytesSent;
+	u64	NaksReceived;
+	u64	NaksReceivedTooLate;		/* NAKs received after receive window advanced */
+	u64	NumOutstandingNaks;		/* Number of NAKs awaiting response */
+	u64	NumNaksAfterRData;		/* Number of NAKs after RDATA sequences were sent which were ignored */
+	u64	RepairPacketsSent;
+	u64	BufferSpaceAvailable;		/* Number of partial messages dropped */
+	u64	TrailingEdgeSeqId;		/* Oldest sequence id in window */
+	u64	LeadingEdgeSeqId;		/* Newest sequence id in window */
+	u64	RateKBitsPerSecOverall;		/* Rate since start of session X */
+	u64	RateKBitsPerSecLast;		/* Rate in last second X */
+	u64	TotalODataPacketsSent;		/* Total data packets transmitted */
+};
+
+/* Setup of sender: RateKbitsPerSec = 8 * WindowSizeInBytes / WindowSizeInMSecs */
+struct pgm_send_window {
+	u64	RateKbitsPerSec;		/* Allowed rate for the sender in kbits per second */
+	u64	WindowSizeInMSecs;		/* Send window size in time */
+	u64	WindowSizeInBytes;		/* Window size in bytes */
+};
+
+struct pgm_fec_info {
+	u16	FECBlockSize;			/* Maximum number of packets for a group. Default and max = 255 */
+	u16	FECProActivePackets;		/* Number of proactive packets per group. */
+	u8	FECGroupSize;			/* Number of packets to be treated as a group. Power of two */
+	int	fFECOnDemandParityEnabled;	/* Allow sender to send parity repair packets */
+};
+
+/* address family indicator, rfc 1700 (ADDRESS FAMILY NUMBERS) */
+#ifndef AFI_IP
+#define AFI_IP	    1	    /* IP (IP version 4) */
+#define AFI_IP6	    2	    /* IP6 (IP version 6) */
+#endif
+
+/* UDP ports for UDP encapsulation, as per IBM WebSphere MQ */
+#define PGM_DEFAULT_UDP_ENCAP_UCAST_PORT	3055
+#define PGM_DEFAULT_UDP_ENCAP_MCAST_PORT	3056
+
+/* PGM default ports */
+#define PGM_DEFAULT_DATA_DESTINATION_PORT	7500
+#define PGM_DEFAULT_DATA_SOURCE_PORT	0	/* random */
+
+/* DoS limitation to protocol (MS08-036, KB950762) */
+#define PGM_MAX_APDU			UINT16_MAX
+
+/* Cisco default: 24 (max 8200), Juniper & H3C default: 16 */
+#define PGM_MAX_FRAGMENTS		16
+
+enum pgm_type {
+    PGM_SPM = 0x00,	/* 8.1: source path message */
+    PGM_POLL = 0x01,	/* 14.7.1: poll request */
+    PGM_POLR = 0x02,	/* 14.7.2: poll response */
+    PGM_ODATA = 0x04,	/* 8.2: original data */
+    PGM_RDATA = 0x05,	/* 8.2: repair data */
+    PGM_NAK = 0x08,	/* 8.3: NAK or negative acknowledgement */
+    PGM_NNAK = 0x09,	/* 8.3: N-NAK or null negative acknowledgement */
+    PGM_NCF = 0x0a,	/* 8.3: NCF or NAK confirmation */
+    PGM_SPMR = 0x0c,	/* 13.6: SPM request */
+    PGM_MAX = 0xff
+};
+
+#define PGM_OPT_LENGTH		    0x00	/* options length */
+#define PGM_OPT_FRAGMENT	    0x01	/* fragmentation */
+#define PGM_OPT_NAK_LIST	    0x02	/* list of nak entries */
+#define PGM_OPT_JOIN		    0x03	/* late joining */
+#define PGM_OPT_REDIRECT	    0x07	/* redirect */
+#define PGM_OPT_SYN		    0x0d	/* synchronisation */
+#define PGM_OPT_FIN		    0x0e	/* session end */
+#define PGM_OPT_RST		    0x0f	/* session reset */
+
+#define PGM_OPT_PARITY_PRM	    0x08	/* forward error correction parameters */
+#define PGM_OPT_PARITY_GRP	    0x09	/*   group number */
+#define PGM_OPT_CURR_TGSIZE	    0x0a	/*   group size */
+
+#define PGM_OPT_CR		    0x10	/* congestion report */
+#define PGM_OPT_CRQST		    0x11	/* congestion report request */
+
+#define PGM_OPT_NAK_BO_IVL	    0x04	/* nak back-off interval */
+#define PGM_OPT_NAK_BO_RNG	    0x05	/* nak back-off range */
+#define PGM_OPT_NBR_UNREACH	    0x0b	/* neighbour unreachable */
+#define PGM_OPT_PATH_NLA	    0x0c	/* path nla */
+
+#define PGM_OPT_INVALID		    0x7f	/* option invalidated */
+
+/* 8. PGM header */
+struct pgm_header {
+	u16		sport;			/* source port: tsi::sport or UDP port depending on direction */
+	u16		dport;			/* destination port */
+	u8		type;			/* version / packet type */
+	u8		options;		/* options */
+#define PGM_OPT_PARITY		0x80	/* parity packet */
+#define PGM_OPT_VAR_PKTLEN	0x40	/* + variable sized packets */
+#define PGM_OPT_NETWORK		0x02    /* network-significant: must be interpreted by network elements */
+#define PGM_OPT_PRESENT		0x01	/* option extensions are present */
+	u16		checksum;		/* checksum */
+	u8		gsi[6];			/* global source id */
+	u16		tsdu_length;		/* tsdu length */
+				/* tpdu length = th length (header + options) + tsdu length */
+};
+
+/* 8.1.  Source Path Messages (SPM) */
+struct pgm_spm {
+	u32		sqn;			/* spm sequence number */
+	u32		trail;			/* trailing edge sequence number */
+	u32		lead;			/* leading edge sequence number */
+	u16		nla_afi;		/* nla afi */
+	u16		reserved;		/* reserved */
+	struct in_addr spm_nla;		/* path nla */
+	/* ... option extensions */
+};
+
+struct pgm_spm6 {
+	u32		sqn;			/* spm sequence number */
+	u32		trail;			/* trailing edge sequence number */
+	u32		lead;			/* leading edge sequence number */
+	u16		nla_afi;		/* nla afi */
+	u16		reserved;		/* reserved */
+	struct in6_addr spm6_nla;		/* path nla */
+	/* ... option extensions */
+};
+
+/* 8.2.  Data Packet */
+struct pgm_data {
+	u32		sqn;			/* data packet sequence number */
+	u32		trail;			/* trailing edge sequence number */
+	/* ... option extensions */
+	/* ... data */
+};
+
+/* 8.3.  Negative Acknowledgments and Confirmations (NAK, N-NAK, & NCF) */
+struct pgm_nak {
+	u32		sqn;			/* requested sequence number */
+	u16		src_nla_afi;		/* nla afi */
+	u16		reserved;		/* reserved */
+	struct in_addr src_nla;		/* source nla */
+	u16		grp_nla_afi;		/* nla afi */
+	u16		reserved2;		/* reserved */
+	struct in_addr grp_nla;		/* multicast group nla */
+	/* ... option extension */
+};
+
+struct pgm_nak6 {
+	u32		sqn;			/* requested sequence number */
+	u16		src_nla_afi;		/* nla afi */
+	u16		reserved;		/* reserved */
+	struct in6_addr src_nla;		/* source nla */
+	u16		grp_nla_afi;		/* nla afi */
+	u16		reserved2;		/* reserved */
+	struct in6_addr grp_nla;		/* multicast group nla */
+	/* ... option extension */
+};
+
+/* 9.  Option header (max 16 per packet) */
+struct pgm_opt_header {
+	u8		type;			/* option type */
+#define PGM_OPT_MASK	0x7f
+#define PGM_OPT_END	0x80		/* end of options flag */
+	u8		length;			/* option length */
+	u8		reserved;
+#define PGM_OP_ENCODED		0x8	/* F-bit */
+#define PGM_OPX_MASK		0x3
+#define PGM_OPX_IGNORE		0x0	/* extensibility bits */
+#define PGM_OPX_INVALIDATE	0x1
+#define PGM_OPX_DISCARD		0x2
+#define PGM_OP_ENCODED_NULL	0x80	/* U-bit */
+};
+
+/* 9.1.  Option extension length - OPT_LENGTH */
+struct pgm_opt_length {
+	u8		type;			/* include header as total length overwrites reserved/OPX bits */
+	u8		length;
+	u16		total_length;	    	/* total length of all options */
+};
+
+/* 9.2.  Option fragment - OPT_FRAGMENT */
+struct pgm_opt_fragment {
+	u8		reserved;		/* reserved */
+	u32		sqn;			/* first sequence number */
+	u32		frag_off;		/* offset */
+	u32		frag_len;		/* length */
+};
+
+/* 9.3.5.  Option NAK List - OPT_NAK_LIST */
+struct pgm_opt_nak_list {
+	u8		reserved;		/* reserved */
+	u32		sqn[];
+};
+
+/* 9.4.2.  Option Join - OPT_JOIN */
+struct pgm_opt_join {
+	u8		reserved;		    /* reserved */
+	u32		join_min;		    /* minimum sequence number */
+};
+
+/* 9.5.5.  Option Redirect - OPT_REDIRECT */
+struct pgm_opt_redirect {
+	u8		reserved;		/* reserved */
+	u16		nla_afi;		/* nla afi */
+	u16		reserved2;		/* reserved */
+	struct in_addr nla;		/* dlr nla */
+};
+
+struct pgm_opt6_redirect {
+	u8		reserved;		/* reserved */
+	u16		nla_afi;		/* nla afi */
+	u16		reserved2;		/* reserved */
+	struct in6_addr opt6_nla;		/* dlr nla */
+};
+
+/* 9.6.2.  Option Sources - OPT_SYN */
+struct pgm_opt_syn {
+	u8	    	reserved;		/* reserved */
+};
+
+/* 9.7.4.  Option End Session - OPT_FIN */
+struct pgm_opt_fin {
+	u8		reserved;		/* reserved */
+};
+
+/* 9.8.4.  Option Reset - OPT_RST */
+struct pgm_opt_rst {
+	u8		reserved;		/* reserved */
+};
+
+
+/*
+ * Forward Error Correction - FEC
+ */
+
+/* 11.8.1.  Option Parity - OPT_PARITY_PRM */
+struct pgm_opt_parity_prm {
+	u8	reserved;			/* reserved */
+#define PGM_PARITY_PRM_MASK 0x3
+#define PGM_PARITY_PRM_PRO  0x1		/* source provides pro-active parity packets */
+#define PGM_PARITY_PRM_OND  0x2		/*                 on-demand parity packets */
+	u32		tgs;			/* transmission group size */
+};
+
+/* 11.8.2.  Option Parity Group - OPT_PARITY_GRP */
+struct pgm_opt_parity_grp {
+	u8	reserved;			/* reserved */
+	u32	group;				/* parity group number */
+};
+
+/* 11.8.3.  Option Current Transmission Group Size - OPT_CURR_TGSIZE */
+struct pgm_opt_curr_tgsize {
+	u8	reserved;			/* reserved */
+	u32	atgsize;			/* actual transmission group size */
+};
+
+/*
+ * Congestion Control
+ */
+
+/* 12.7.1.  Option Congestion Report - OPT_CR */
+struct pgm_opt_cr {
+	u8		reserved;		/* reserved */
+	u32		cr_lead;		/* congestion report reference sqn */
+	u16		cr_ne_wl;		/* ne worst link */
+	u16		cr_ne_wp;		/* ne worst path */
+	u16		cr_rx_wp;		/* rcvr worst path */
+	u16		reserved2;		/* reserved */
+	u16		nla_afi;		/* nla afi */
+	u16		reserved3;		/* reserved */
+	u32		cr_rcvr;		/* worst receivers nla */
+};
+
+/* 12.7.2.  Option Congestion Report Request - OPT_CRQST */
+struct pgm_opt_crqst {
+	u8	reserved;			/* reserved */
+};
+
+
+/*
+ * SPM Requests
+ */
+
+/* 13.6.  SPM Requests */
+struct pgm_spmr {
+    /* ... option extensions */
+};
+
+
+/*
+ * Poll Mechanism
+ */
+
+/* 14.7.1.  Poll Request */
+struct pgm_poll {
+	u32		sqn;			/* poll sequence number */
+	u16		round;			/* poll round */
+	u16		type;			/* poll sub-type */
+#define PGM_POLL_GENERAL	0x0	/* general poll  */
+#define PGM_POLL_DLR		0x1	/* DLR poll */
+	u16		nla_afi;		/* nla afi */
+	u16		reserved;		/* reserved */
+	struct in_addr nla;			/* path nla */
+	u32		bo_ivl;			/* poll back-off interval */
+	char	rand[4];		/* random string */
+	u32		mask;			/* matching bit-mask */
+	/* ... option extensions */
+};
+
+struct pgm_poll6 {
+	u32		sqn;			/* poll sequence number */
+	u16		round;		    	/* poll round */
+	u16		s_type;			/* poll sub-type */
+	u16		nla_afi;		/* nla afi */
+	u16		reserved;		/* reserved */
+	struct in6_addr nla;		/* path nla */
+	u32		bo_ivl;			/* poll back-off interval */
+	char	rand[4];		/* random string */
+	u32		mask;			/* matching bit-mask */
+	/* ... option extensions */
+};
+
+/* 14.7.2.  Poll Response */
+struct pgm_polr {
+	u32		sqn;			/* polr sequence number */
+	u16		round;			/* polr round */
+	u16		reserved;		/* reserved */
+	/* ... option extensions */
+};
+
+
+/*
+ * Implosion Prevention
+ */
+
+/* 15.4.1.  Option NAK Back-Off Interval - OPT_NAK_BO_IVL */
+struct pgm_opt_nak_bo_ivl {
+	u8		opt_reserved;		/* reserved */
+	u32		opt_nak_bo_ivl;		/* nak back-off interval */
+	u32		opt_nak_bo_ivl_sqn;	/* nak back-off interval sqn */
+};
+
+/* 15.4.2.  Option NAK Back-Off Range - OPT_NAK_BO_RNG */
+struct pgm_opt_nak_bo_rng {
+	u8		opt_reserved;		/* reserved */
+	u32		opt_nak_max_bo_ivl;	/* maximum nak back-off interval */
+	u32		opt_nak_min_bo_ivl;	/* minimum nak back-off interval */
+};
+
+/* 15.4.3.  Option Neighbour Unreachable - OPT_NBR_UNREACH */
+struct pgm_opt_nbr_unreach {
+	u8		opt_reserved;		/* reserved */
+};
+
+/* 15.4.4.  Option Path - OPT_PATH_NLA */
+struct pgm_opt_path_nla {
+ 	u8		reserved;		/* reserved */
+	struct in_addr opt_path_nla;	/* path nla */
+};
+
+struct pgm_opt6_path_nla {
+	u8		reserved;		/* reserved */
+	struct in6_addr opt6_path_nla;	/* path nla */
+};
+
+#ifdef __KERNEL__
+
+#include <net/inet_sock.h>
+#include <linux/skbuff.h>
+#include <net/netns/hash.h>
+#include <linux/rslib.h>
+
+static inline int pgm_is_upstream(u8 type)
+{
+    return (type == PGM_NAK ||		/* unicast */
+	    type == PGM_NNAK ||		/* unicast */
+	    type == PGM_SPMR ||		/* multicast + unicast */
+	    type == PGM_POLR);		/* unicast */
+}
+
+static inline int pgm_is_peer(u8 type)
+{
+    return (type == PGM_SPMR);		/* multicast */
+}
+
+static inline int pgm_is_downstream (u8 type)
+{
+    return (type == PGM_SPM   ||	/* all multicast */
+	    type == PGM_ODATA ||
+	    type == PGM_RDATA ||
+	    type == PGM_POLL  ||
+	    type == PGM_NCF);
+}
+
+int pgm_verify_spm(struct sk_buff *);
+int pgm_verify_spmr(struct sk_buff *);
+int pgm_verify_nak(struct sk_buff *);
+int pgm_verify_nnak(struct sk_buff *);
+int pgm_verify_ncf(struct sk_buff *);
+int pgm_verify_poll(struct sk_buff *);
+int pgm_verify_polr(struct sk_buff *);
+
+/* Global source ID */
+struct pgm_gsi {
+	char gsi[6];
+};
+
+struct pgm_tsi {
+	char	gsi[6];		/* global source identifier */
+	u16	sport;		/* source port: a random number to help detect session restarts */
+};
+
+/* Receiver data structures */
+
+enum pgm_pkt_state {
+	PGM_PKT_ERROR_STATE,
+	PGM_PKT_BACK_OFF_STATE,	    /* PGM protocol recovery states */
+	PGM_PKT_WAIT_NCF_STATE,
+	PGM_PKT_WAIT_DATA_STATE,
+
+	PGM_PKT_HAVE_DATA_STATE,	    /* data received waiting to commit to application layer */
+
+	PGM_PKT_HAVE_PARITY_STATE,	    /* contains parity information not original data */
+	PGM_PKT_COMMIT_DATA_STATE,	    /* committed data waiting for purging */
+	PGM_PKT_LOST_DATA_STATE,	    /* if recovery fails, but packet has not yet been committed */
+};
+
+enum pgm_rxw_returns {
+	PGM_RXW_OK,
+	PGM_RXW_INSERTED,
+	PGM_RXW_APPENDED,
+	PGM_RXW_UPDATED,
+	PGM_RXW_MISSING,
+	PGM_RXW_DUPLICATE,
+	PGM_RXW_MALFORMED,
+	PGM_RXW_BOUNDS,
+	PGM_RXW_SLOW_CONSUMER,
+	PGM_RXW_UNKNOWN,
+};
+
+struct pgm_rxw_state {
+	unsigned long	nak_rb_expiry;
+	unsigned long	nak_rpt_expiry;
+	unsigned long	nak_rdata_expiry;
+
+	enum pgm_pkt_state state;
+
+	u8		nak_transmit_count;
+	u8		ncf_retry_count;
+	u8		data_retry_count;
+
+/* only valid on tg_sqn::pkt_sqn = 0 */
+	unsigned	is_contiguous:1;	/* transmission group */
+};
+
+struct pgm_rxw {
+	struct pgm_tsi *	tsi;
+
+        struct list_head backoff_queue;
+        struct list_head wait_ncf_queue;
+        struct list_head wait_data_queue;
+
+	/* window context counters */
+	u32		lost_count;		/* failed to repair */
+	u32		fragment_count;		/* incomplete apdu */
+	u32		parity_count;		/* parity for repairs */
+	u32		committed_count;	/* but still in window */
+
+        u16		max_tpdu;               /* maximum packet size */
+        u32		lead, trail;
+        u32		rxw_trail, rxw_trail_init;
+	u32		commit_lead;
+        unsigned        is_constrained:1;
+        unsigned        is_defined:1;
+	unsigned	has_event:1;		/* edge triggered */
+	unsigned	is_fec_available:1;
+	struct rs_t	rs;
+	u32		tg_size;		/* transmission group size for parity recovery */
+	unsigned	tg_sqn_shift;
+
+	u32		min_fill_time;		/* restricted from pgm_time_t */
+	u32		max_fill_time;
+	u32		min_nak_transmit_count;
+	u32		max_nak_transmit_count;
+	u32		cumulative_losses;
+	u32		bytes_delivered;		/* Fix this: Will overflow */
+	u32		msgs_delivered;
+
+	size_t		size;			/* in bytes */
+	unsigned	alloc;			/* in pkts */
+	struct sk_buff *pdata[];
+};
+
+struct pgm_rxw *pgm_rxw_create(struct pgm_tsi *, u16, u32, unsigned, unsigned);
+void pgm_rxw_destroy(struct pgm_rxw *);
+int pgm_rxw_add(struct pgm_rxw *, struct sk_buff *, u64, u64);
+void pgm_rxw_remove_commit(struct pgm_rxw *);
+size_t pgm_rxw_readv(struct pgm_rxw *, struct kiovec *, unsigned int);
+unsigned int pgm_rxw_remove_trail (struct pgm_rxw *);
+unsigned int pgm_rxw_update(struct pgm_rxw *, u32, u32, u64, u64);
+void pgm_rxw_update_fec(struct pgm_rxw *, unsigned int);
+int pgm_rxw_confirm(struct pgm_rxw *, u32, u64, u64, u64);
+void pgm_rxw_lost(struct  pgm_rxw *, u32);
+void pgm_rxw_state(struct pgm_rxw *, struct sk_buff *, enum pgm_pkt_state);
+struct sk_buff *pgm_rxw_peek(struct pgm_rxw *, u32);
+
+static inline int pgm_rxw_max_length(struct pgm_rxw *window)
+{
+	return window->alloc;
+}
+
+static inline u32 pgm_rxw_length(struct pgm_rxw *window)
+{
+	return ( 1 + window->lead ) - window->trail;
+}
+
+static inline size_t pgm_rxw_size(struct pgm_rxw *window)
+{
+	return window->size;
+}
+
+static inline int pgm_rxw_is_empty(struct pgm_rxw *window)
+{
+	return pgm_rxw_length (window) == 0;
+}
+
+static inline int pgm_rxw_is_full(struct pgm_rxw *window)
+{
+	return pgm_rxw_length (window) == pgm_rxw_max_length (window);
+}
+
+static inline u32 pgm_rxw_lead(struct pgm_rxw *window)
+{
+	return window->lead;
+}
+
+static inline u32 pgm_rxw_next_lead(struct pgm_rxw *window)
+{
+	return pgm_rxw_lead(window) + 1;
+}
+
+/* Transmitter data structures */
+
+struct pgm_txw_state {
+	u32		unfolded_checksum;	/* first 32-bit word must be checksum */
+
+	unsigned	waiting_retransmit:1;	/* in retransmit queue */
+	unsigned	retransmit_count:15;
+	unsigned	nak_elimination_count:16;
+
+        unsigned long	expiry;			/* Advance with time */
+        unsigned long	last_retransmit;	/* NAK elimination */
+};
+
+struct pgm_txw {
+	struct pgm_tsi*		tsi;
+
+/* option: lockless atomics */
+        u32			lead;
+        u32			trail;
+
+        struct list_head	retransmit_queue;
+
+	struct rs_t		rs;
+	unsigned int		tg_sqn_shift;
+	struct sk_buff *	parity_buffer;
+	unsigned		is_fec_enabled:1;
+
+	u32			size;			/* window content size in bytes */
+	u32			alloc;			/* length of pdata[] */
+	struct sk_buff*	pdata[];
+};
+
+struct pgm_txw *pgm_txw_create(struct pgm_tsi *, u16, u32, unsigned int,
+			unsigned int, int, unsigned int, unsigned int);
+void pgm_txw_shutdown (struct pgm_txw *);
+void pgm_txw_add(struct pgm_txw *, struct sk_buff *);
+struct sk_buff* pgm_txw_peek(struct pgm_txw* , u32);
+int pgm_txw_retransmit_push(struct pgm_txw *, u32, int, unsigned int);
+struct sk_buff* pgm_txw_retransmit_try_peek(struct pgm_txw *);
+void pgm_txw_retransmit_remove_head(struct pgm_txw *);
+
+static inline unsigned int pgm_txw_max_length(struct pgm_txw *window)
+{
+	return window->alloc;
+}
+
+static inline u32 pgm_txw_length(struct pgm_txw *window)
+{
+	return ( 1 + window->lead ) - window->trail;
+}
+
+static inline u32 pgm_txw_size(struct pgm_txw *window)
+{
+	return window->size;
+}
+
+static inline int pgm_txw_is_empty(struct pgm_txw *window)
+{
+	return pgm_txw_length(window) == 0;
+}
+
+static inline int pgm_txw_is_full(struct pgm_txw *window)
+{
+	return pgm_txw_length(window) == pgm_txw_max_length(window);
+}
+
+static inline u32 pgm_txw_lead(struct pgm_txw *window)
+{
+	return window->lead;
+}
+
+static inline u32 pgm_txw_next_lead(struct pgm_txw *window)
+{
+	return pgm_txw_lead (window) + 1;
+}
+
+static inline u32 pgm_txw_trail(struct pgm_txw *window)
+{
+	return window->trail;
+}
+
+static inline u32 pgm_txw_get_unfolded_checksum(struct sk_buff *skb)
+{
+	struct pgm_txw_state *state = (void *)&skb->cb;
+
+	return state->unfolded_checksum;
+}
+
+static inline void pgm_txw_set_unfolded_checksum(struct sk_buff* skb, u32 csum)
+{
+	struct pgm_txw_state *state = (void *)&skb->cb;
+
+	state->unfolded_checksum = csum;
+}
+
+static inline void pgm_txw_inc_retransmit_count(struct sk_buff * skb)
+{
+	struct pgm_txw_state *state = (void *)&skb->cb;
+
+	state->retransmit_count++;
+}
+
+static inline int pgm_txw_retransmit_is_empty(struct pgm_txw *window)
+{
+	return list_empty(&window->retransmit_queue);
+}
+
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_PGM_H */
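struct pgm_header above describes the 16-byte common header of RFC 3208 (source and destination port, type, options, checksum, 6-byte GSI, TSDU length), whose multi-byte fields travel in network byte order. A userspace sketch of decoding it byte by byte, which avoids depending on host endianness or struct layout, follows. pgm_hdr_view, get_be16() and pgm_parse_header() are hypothetical helpers, and the type byte is treated as a plain packet type, as enum pgm_type does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PGM_HEADER_LEN 16	/* size of the common header on the wire */

/* Packet type values, as in enum pgm_type (RFC 3208). */
enum {
	PGM_SPM = 0x00, PGM_POLL = 0x01, PGM_POLR = 0x02,
	PGM_ODATA = 0x04, PGM_RDATA = 0x05, PGM_NAK = 0x08,
	PGM_NNAK = 0x09, PGM_NCF = 0x0a, PGM_SPMR = 0x0c,
};

/* Host-byte-order copy of the common header fields. */
struct pgm_hdr_view {
	uint16_t sport;
	uint16_t dport;
	uint8_t type;
	uint8_t options;
	uint16_t checksum;
	uint8_t gsi[6];
	uint16_t tsdu_length;
};

/* Read a 16-bit big-endian (network order) value. */
static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode the common header; returns 0 on success, -1 if buf is too short. */
static int pgm_parse_header(const uint8_t *buf, size_t len,
			    struct pgm_hdr_view *h)
{
	assert(buf != NULL && h != NULL);
	if (len < PGM_HEADER_LEN)
		return -1;
	h->sport = get_be16(buf + 0);
	h->dport = get_be16(buf + 2);
	h->type = buf[4];
	h->options = buf[5];
	h->checksum = get_be16(buf + 6);
	memcpy(h->gsi, buf + 8, sizeof(h->gsi));
	h->tsdu_length = get_be16(buf + 14);
	return 0;
}

/* Same rules as pgm_is_upstream() and pgm_is_downstream() in the patch. */
static int pgm_type_is_upstream(uint8_t type)
{
	return type == PGM_NAK || type == PGM_NNAK ||
	       type == PGM_SPMR || type == PGM_POLR;
}

static int pgm_type_is_downstream(uint8_t type)
{
	return type == PGM_SPM || type == PGM_ODATA || type == PGM_RDATA ||
	       type == PGM_POLL || type == PGM_NCF;
}
```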
Index: linux-2.6/Documentation/networking/pgm/TODO
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/Documentation/networking/pgm/TODO	2010-03-18 13:14:59.000000000 -0500
@@ -0,0 +1,8 @@
+- Define Socket API
+- Define /proc and sys api
+- Implement base logic
+- PGM over UDP
+- FEC Forward Error correction
+- Verify interaction with Cisco and other switches
+- Verify interaction with IBM Websphere, TIBCO, openpgm etc.
+
Index: linux-2.6/Documentation/networking/pgm/references
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/Documentation/networking/pgm/references	2010-03-18 13:14:59.000000000 -0500
@@ -0,0 +1,2 @@
+RFC3208
+
Index: linux-2.6/Documentation/networking/pgm/usage
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/Documentation/networking/pgm/usage	2010-03-18 15:55:17.000000000 -0500
@@ -0,0 +1,91 @@
+1. Opening a socket
+
+	A. Native PGM
+
+		fd = socket(AF_INET, SOCK_RDM, IPPROTO_PGM)
+
+	B. PGM over UDP
+
+		fd = socket(AF_INET, SOCK_RDM, IPPROTO_UDP)
+
+	C. PGM over SHM (?)
+
+		fd = socket(AF_UNIX, SOCK_RDM, 0)
+
+
+2. Binding to a multicast address
+
+	A. Sender
+
+		Connect the socket to a MC address and port using connect().
+
+		Note that the port is significant since multiple streams on different
+		ports can be run over the same MC addr.
+
+	B. Receiver
+
+		I. Bind the socket to the MC address and port of interest.
+
+		II. Listen to the socket.
+
+			Process will wait until a PGM packet destined to the port of interest
+			is received.
+
+		III. Accept a connection.
+
+			Establishes a session. Data can then be received.
+
+
+3. Sending and receiving
+
+	Use the usual socket read and write operations and the various flavors of waiting
+	for a packet via select, poll, epoll etc.
+
+	Message sizes are determined by the amount of data passed to a single sendmsg() unless
+	overridden by the RM_SET_MESSAGE_BOUNDARY socket option.
+
+	The sender will block when the send window is full unless a non-blocking write is performed.
+
+	The receiver shows the usual wait semantics. If the stream is set to unreliable, then
+	packets may arrive out of order. If the socket is set to RM_LISTEN_ONLY, then packets may
+	simply be missing.
+
+4. 	Transmitter Socket Options
+
+
+	A. Setting the window size / rate.
+
+		struct pgm_send_window x;
+		x.RateKbitsPerSec = 56;
+		x.WindowSizeInMSecs = 60000;
+		x.WindowSizeInBytes = 10000000;
+
+		setsockopt(fd, IPPROTO_PGM, RM_RATE_WINDOW_SIZE, &x, sizeof(x));
+
+		Default is sending at 56Kbps with a buffer of 10 Megabytes and buffering for a minute.
+
+	B. FEC mode
+
+		struct pgm_fec_info x;
+
+		x.FECBlockSize = 255;
+		x.FECProActivePackets = 0;
+		x.FECGroupSize = 0;
+		x.fFECOnDemandParityEnabled = 1;
+
+		setsockopt(fd, IPPROTO_PGM, RM_USE_FEC, &x, sizeof(x));
+
+
+5.	Receiver Socket Options
+
+	None?
+
+
+Possible Extensions
+
+	RM_UNORDERED	accept unordered packet avoiding delays when packets arrive out of sequence.
+			packet is still NAKed.
+
+	RM_RECEIVE_ONLY	Simply ignore missed packets. Do not send any replies.
+
+
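The three send-window fields are related: one byte per millisecond is 8 kbit/s, so the byte size of the window is RateKbitsPerSec * WindowSizeInMSecs / 8. A small sketch of that arithmetic follows; pgm_window_bytes() and pgm_window_rate_kbps() are hypothetical helpers, not part of the proposed socket API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes needed to buffer `msecs` milliseconds of data sent at
 * `rate_kbps` kbit/s: kbit/s * ms gives bits, divided by 8 for bytes.
 */
static uint64_t pgm_window_bytes(uint64_t rate_kbps, uint64_t msecs)
{
	return rate_kbps * msecs / 8;
}

/* Inverse: rate in kbit/s implied by a byte budget spread over `msecs` ms. */
static uint64_t pgm_window_rate_kbps(uint64_t bytes, uint64_t msecs)
{
	assert(msecs != 0);
	return bytes * 8 / msecs;
}
```

For the example values in section 4.A, a 56 kbit/s rate over 60000 ms corresponds to 420000 bytes, while a 10000000-byte window over 60000 ms implies about 1333 kbit/s.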
Index: linux-2.6/net/ipv4/pgm.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/net/ipv4/pgm.c	2010-03-18 16:37:17.000000000 -0500
@@ -0,0 +1,143 @@
+/*
+ *  PGM		An implementation of the PGM (Pragmatic General Multicast)
+ *              protocol (RFC 3208).
+ *
+ *  Authors:    Christoph Lameter      <cl@linux-foundation.org>
+ *
+ *  Changes:
+ *  Fixes:
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ */
+#include "udp_impl.h"
+
+struct udp_table 	pgm_table __read_mostly;
+EXPORT_SYMBOL(pgm_table);
+
+static int pgm_rcv(struct sk_buff *skb)
+{
+	/* TBD */
+	return __udp4_lib_rcv(skb, &pgm_table, IPPROTO_UDPLITE);
+}
+
+static void pgm_err(struct sk_buff *skb, u32 info)
+{
+	__udp4_lib_err(skb, info, &pgm_table);
+}
+
+static const struct net_protocol pgm_protocol = {
+	.handler	= pgm_rcv,
+	.err_handler	= pgm_err,
+	.no_policy	= 1,
+	.netns_ok	= 1,
+};
+
+struct proto 	pgm_prot = {
+	.name		   = "PGM",
+	.owner		   = THIS_MODULE,
+	.close		   = udp_lib_close,
+	.connect	   = ip4_datagram_connect,
+	.disconnect	   = udp_disconnect,
+	.ioctl		   = udp_ioctl,
+	.init		   = pgm_sk_init,
+	.destroy	   = udp_destroy_sock,
+	.setsockopt	   = pgm_setsockopt,
+	.getsockopt	   = pgm_getsockopt,
+	.sendmsg	   = pgm_sendmsg,
+	.recvmsg	   = pgm_recvmsg,
+	.sendpage	   = pgm_sendpage,
+	.backlog_rcv	   = udp_queue_rcv_skb,
+	.hash		   = udp_lib_hash,
+	.unhash		   = udp_lib_unhash,
+	.get_port	   = udp_v4_get_port,
+	.obj_size	   = sizeof(struct udp_sock),
+	.slab_flags	   = SLAB_DESTROY_BY_RCU,
+	.h.udp_table	   = &pgm_table,
+#ifdef CONFIG_COMPAT
+	.compat_setsockopt = compat_pgm_setsockopt,
+	.compat_getsockopt = compat_pgm_getsockopt,
+#endif
+};
+
+static struct inet_protosw pgm_ip_protosw = {
+	.type		=  SOCK_RDM,
+	.protocol	=  IPPROTO_PGM,
+	.prot		=  &pgm_ip_prot,
+	.ops		=  &inet_pgm_ops,
+	.no_check	=  0,		/* must checksum (RFC 3828) */
+	.flags		=  INET_PROTOSW_PERMANENT,
+};
+
+static struct inet_protosw pgm_udp_protosw = {
+	.type		=  SOCK_RDM,
+	.protocol	=  IPPROTO_UDP,
+	.prot		=  &pgm_prot,
+	.ops		=  &inet_pgm_ops,
+	.no_check	=  0,		/* checksum as in RFC 3208 */
+	.flags		=  INET_PROTOSW_PERMANENT,
+};
+
+#ifdef CONFIG_PROC_FS
+static struct udp_seq_afinfo pgm_seq_afinfo = {
+	.name		= "pgm",
+	.family		= AF_INET,
+	.udp_table 	= &pgm_table,
+	.seq_fops	= {
+		.owner	=	THIS_MODULE,
+	},
+	.seq_ops	= {
+		.show		= udp4_seq_show,
+	},
+};
+
+static int __net_init pgm_proc_init_net(struct net *net)
+{
+	return udp_proc_register(net, &pgm_seq_afinfo);
+}
+
+static void __net_exit pgm_proc_exit_net(struct net *net)
+{
+	udp_proc_unregister(net, &pgm_seq_afinfo);
+}
+
+static struct pernet_operations pgm4_net_ops = {
+	.init = pgm_proc_init_net,
+	.exit = pgm_proc_exit_net,
+};
+
+static __init int pgm_proc_init(void)
+{
+	return register_pernet_subsys(&pgm4_net_ops);
+}
+#else
+static inline int pgm_proc_init(void)
+{
+	return 0;
+}
+#endif
+
+void __init pgm_register(void)
+{
+	udp_table_init(&pgm_table, "PGM");
+	if (proto_register(&pgm_prot, 1))
+		goto out_register_err;
+
+	if (inet_add_protocol(&pgm_protocol, IPPROTO_PGM) < 0)
+		goto out_unregister_proto;
+
+	inet_register_protosw(&pgm_ip_protosw);
+	inet_register_protosw(&pgm_udp_protosw);
+
+	if (pgm_proc_init())
+		printk(KERN_ERR "%s: Cannot register /proc!\n", __func__);
+	return;
+
+out_unregister_proto:
+	proto_unregister(&pgm_prot);
+out_register_err:
+	printk(KERN_CRIT "%s: Cannot add PGM protocol.\n", __func__);
+}
+
+EXPORT_SYMBOL(pgm_prot);
Index: linux-2.6/net/ipv4/Kconfig
===================================================================
--- linux-2.6.orig/net/ipv4/Kconfig	2010-03-18 16:16:34.000000000 -0500
+++ linux-2.6/net/ipv4/Kconfig	2010-03-18 16:39:36.000000000 -0500
@@ -14,6 +14,20 @@ config IP_MULTICAST
 	  <file:Documentation/networking/multicast.txt>. For most people, it's
 	  safe to say N.

+config IP_PGM
+	bool "IP: Pragmatic General Multicast (RFC3208) support"
+	depends on IP_MULTICAST && EXPERIMENTAL
+	help
+	   This is an implementation of reliable multicasting following
+	   RFC3208. PGM is used for publisher-subscriber based information
+	   services on private networks. The PGM protocol allows for recovery
+	   of lost packets through retransmission requests (NAKs) and through
+	   forward error correction (FEC). PGM is supported by router vendors
+	   through logic that suppresses duplicate NAKs so that the network
+	   is not flooded with them (a NAK storm). PGM is widely used in the
+	   financial industry, and various commercial applications support
+	   this protocol.
+
 config IP_ADVANCED_ROUTER
 	bool "IP: advanced router"
 	---help---
Index: linux-2.6/net/ipv4/Makefile
===================================================================
--- linux-2.6.orig/net/ipv4/Makefile	2010-03-18 16:16:07.000000000 -0500
+++ linux-2.6/net/ipv4/Makefile	2010-03-18 16:24:04.000000000 -0500
@@ -52,3 +52,6 @@ obj-$(CONFIG_NETLABEL) += cipso_ipv4.o

 obj-$(CONFIG_XFRM) += xfrm4_policy.o xfrm4_state.o xfrm4_input.o \
 		      xfrm4_output.o
+
+obj-$(CONFIG_IP_PGM)	+= pgm.o
+

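For illustration, here is the checksum that `.no_check = 0` refers to. RFC 3208 defines the PGM checksum as the usual Internet checksum (the 16-bit one's complement of the one's complement sum) computed over the entire PGM packet. The user-space sketch below assumes that definition. The name `pgm_csum` is hypothetical, and kernel code would presumably use the existing `csum_*` helpers rather than an open-coded loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: RFC 1071-style Internet checksum over buf[0..len). */
static uint16_t pgm_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	/* Add up the data as 16-bit big-endian words. */
	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];

	/* An odd trailing byte is treated as if padded with a zero byte. */
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;

	/* Fold the carries back into the low 16 bits (end-around carry). */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

For the RFC 1071 example bytes `00 01 f2 03 f4 f5 f6 f7` it returns 0x220d. Summing the same bytes plus that checksum yields 0, and this is how a receiver can verify a packet.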

* Re: Add PGM protocol support to the IP stack
  2010-03-18 17:58 Add PGM protocol support to the IP stack Christoph Lameter
  2010-03-18 21:58 ` Christoph Lameter
@ 2010-03-19 17:18 ` Andi Kleen
  2010-03-19 21:53   ` David Miller
  2010-03-22 14:20   ` Christoph Lameter
  1 sibling, 2 replies; 21+ messages in thread
From: Andi Kleen @ 2010-03-19 17:18 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: David Miller, netdev, linux-kernel

Christoph Lameter <cl@linux-foundation.org> writes:
>
> I know about the openpgm implementation. Openpgm does this at the user
> level and requires linking to a library. It is essentially a communication
> protocol done in user space. It has privilege issues because it has to
> create PGM packets via a raw socket.

That seems like a poor reason alone to put something into the kernel.
Perhaps you rather need some way to have unprivileged raw sockets?

The classical way to do this is to start suid root, only open
the socket and then drop privileges.
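Here is a minimal sketch of that pattern. It assumes the binary is installed setuid root. It opens a raw socket for IP protocol 113 (the IANA number for PGM), then permanently switches back to the real group and user IDs. `PGM_IPPROTO` is defined locally because libc headers may not name it, and the function name is hypothetical.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

#define PGM_IPPROTO 113		/* IANA protocol number for PGM */

/*
 * Open a raw PGM socket while still privileged, then drop privileges for
 * good. Returns the fd, or -1 with errno set. Privileges are dropped even
 * when socket() fails.
 */
static int open_pgm_raw_socket(void)
{
	int fd = socket(AF_INET, SOCK_RAW, PGM_IPPROTO);
	int saved_errno = errno;

	/* Group first: once the uid is dropped, setgid() would be refused. */
	if (setgid(getgid()) != 0 || setuid(getuid()) != 0) {
		saved_errno = errno;
		if (fd >= 0)
			close(fd);
		errno = saved_errno;
		return -1;
	}

	/* Paranoia: if root can be regained, the drop did not work. */
	if (getuid() != 0 && setuid(0) == 0)
		abort();

	errno = saved_errno;
	return fd;
}
```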

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: Add PGM protocol support to the IP stack
  2010-03-19 17:18 ` Andi Kleen
@ 2010-03-19 21:53   ` David Miller
  2010-03-19 22:26     ` H. Peter Anvin
  2010-03-22 14:20   ` Christoph Lameter
  1 sibling, 1 reply; 21+ messages in thread
From: David Miller @ 2010-03-19 21:53 UTC (permalink / raw)
  To: andi; +Cc: cl, netdev, linux-kernel

From: Andi Kleen <andi@firstfloor.org>
Date: Fri, 19 Mar 2010 18:18:36 +0100

> Christoph Lameter <cl@linux-foundation.org> writes:
>>
>> I know about the openpgm implementation. Openpgm does this at the user
>> level and requires linking to a library. It is essentially a communication
>> protocol done in user space. It has privilege issues because it has to
>> create PGM packets via a raw socket.
> 
> That seems like a poor reason alone to put something into the kernel.
> Perhaps you rather need some way to have unprivileged raw sockets?
> 
> The classical way to do this is to start suid root, only open
> the socket and then drop privileges.

I completely agree.

We should be able to make a way for unprivileged users to
use RAW sockets in some limited capacity, for cases like this.

But I also don't consider what openpgm has to do right now to
be all that much of a restriction.  You need privileges to
add the protocol to the kernel, you need privileges to run
the userspace variant, there is no real difference.


* Re: Add PGM protocol support to the IP stack
  2010-03-19 21:53   ` David Miller
@ 2010-03-19 22:26     ` H. Peter Anvin
  2010-03-22 14:24       ` Christoph Lameter
  0 siblings, 1 reply; 21+ messages in thread
From: H. Peter Anvin @ 2010-03-19 22:26 UTC (permalink / raw)
  To: David Miller; +Cc: andi, cl, netdev, linux-kernel

On 03/19/2010 02:53 PM, David Miller wrote:
> But I also don't consider what openpgm has to do right now to
> be all that much of a restriction.  You need privileges to
> add the protocol to the kernel, you need privileges to run
> the userspace variant, there is no real difference.

The real difference is if multiplex is needed between multiple
unprivileged users.

	-hpa


* Re: Add PGM protocol support to the IP stack
  2010-03-19 22:26     ` H. Peter Anvin
@ 2010-03-22 14:24       ` Christoph Lameter
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Lameter @ 2010-03-22 14:24 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: David Miller, andi, netdev, linux-kernel

On Fri, 19 Mar 2010, H. Peter Anvin wrote:

> On 03/19/2010 02:53 PM, David Miller wrote:
> > But I also don't consider what openpgm has to do right now to
> > be all that much of a restriction.  You need privileges to
> > add the protocol to the kernel, you need privileges to run
> > the userspace variant, there is no real difference.
>
> The real difference is if multiplex is needed between multiple
> unprivileged users.

It is needed. PGM ports exist and work similarly to UDP and TCP ports.

PGM as provided by openpgm and other solutions avoids native PGM and
instead uses PGM over UDP. But the routers do not support PGM over UDP in
the same way as native PGM. So the NAK suppression and other advanced
features available in Juniper and Cisco switches cannot be used.

openpgm can work with the native PGM protocol via a raw socket, but then
one cannot run multiple processes communicating via different ports
effectively.

Fragmenting and reassembling packets in user space is a pain.
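Port-based demultiplexing maps closely onto the UDP code. The 16-byte PGM common header in RFC 3208 begins with 16-bit source and destination ports, just as UDP does. These are followed by type, options, checksum, the 6-byte Global Source ID and the TSDU length. The decoder below is a user-space sketch with hypothetical names, and it covers only a subset of the packet types.

```c
#include <stddef.h>
#include <stdint.h>

/* A subset of the packet types from RFC 3208. */
enum pgm_type {
	PGM_SPM   = 0x00,
	PGM_ODATA = 0x04,
	PGM_RDATA = 0x05,
	PGM_NAK   = 0x08,
	PGM_NNAK  = 0x09,
	PGM_NCF   = 0x0a,
	PGM_SPMR  = 0x0c,
};

#define PGM_HDR_LEN 16

/* Host-order copy of the common header (illustrative names). */
struct pgm_hdr {
	uint16_t sport;		/* source port, same position as in UDP */
	uint16_t dport;		/* destination port, same position as in UDP */
	uint8_t  type;		/* see enum pgm_type */
	uint8_t  options;	/* option flags */
	uint16_t checksum;
	uint8_t  gsi[6];	/* Global Source ID */
	uint16_t tsdu_len;	/* length of the transport data unit */
};

static uint16_t get16(const uint8_t *p)
{
	return (uint16_t)(p[0] << 8 | p[1]);	/* network byte order */
}

/* Decode the header at buf; returns 0 on success, -1 if too short. */
static int pgm_parse_hdr(const uint8_t *buf, size_t len, struct pgm_hdr *h)
{
	size_t i;

	if (len < PGM_HDR_LEN)
		return -1;
	h->sport    = get16(buf + 0);
	h->dport    = get16(buf + 2);
	h->type     = buf[4];
	h->options  = buf[5];
	h->checksum = get16(buf + 6);
	for (i = 0; i < sizeof(h->gsi); i++)
		h->gsi[i] = buf[8 + i];
	h->tsdu_len = get16(buf + 14);
	return 0;
}
```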


* Re: Add PGM protocol support to the IP stack
  2010-03-19 17:18 ` Andi Kleen
  2010-03-19 21:53   ` David Miller
@ 2010-03-22 14:20   ` Christoph Lameter
  2010-03-22 16:36     ` Andi Kleen
  1 sibling, 1 reply; 21+ messages in thread
From: Christoph Lameter @ 2010-03-22 14:20 UTC (permalink / raw)
  To: Andi Kleen; +Cc: David Miller, netdev, linux-kernel

On Fri, 19 Mar 2010, Andi Kleen wrote:

> Christoph Lameter <cl@linux-foundation.org> writes:
> >
> > I know about the openpgm implementation. Openpgm does this at the user
> > level and requires linking to a library. It is essentially a communication
> > protocol done in user space. It has privilege issues because it has to
> > create PGM packets via a raw socket.
>
> That seems like a poor reason alone to put something into the kernel.
> Perhaps you rather need some way to have unprivileged raw sockets?

Not the only reason. There are also performance implications. NAKing and
other control messages from user space are a pain and the available
implementations add numerous threads just to control the timing of control
messages and the expiration of data etc. It's difficult to listen to a PGM
port from user space. You have to get all messages for the PGM protocol
and then filter in each process.
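The cost is easier to see in code. This is roughly the check each listening process must run on every packet a raw PGM socket hands it, since IPv4 raw sockets deliver the IP header too. The sketch assumes IPv4 and the RFC 3208 layout, in which the PGM destination port occupies bytes 2 and 3 of the PGM header. `pgm_raw_wants` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define PGM_IPPROTO 113		/* IANA protocol number for PGM */

/* Return 1 if the raw IPv4 datagram in pkt carries PGM for dport. */
static int pgm_raw_wants(const uint8_t *pkt, size_t len, uint16_t dport)
{
	size_t ihl;

	if (len < 20 || (pkt[0] >> 4) != 4)
		return 0;				/* not IPv4 */
	ihl = (size_t)(pkt[0] & 0x0f) * 4;		/* IP header length */
	if (ihl < 20 || len < ihl + 16)
		return 0;				/* truncated */
	if (pkt[9] != PGM_IPPROTO)
		return 0;				/* IP protocol field */
	/* PGM destination port, network byte order. */
	return ((pkt[ihl + 2] << 8) | pkt[ihl + 3]) == dport;
}
```

With N listening processes, each PGM packet is copied to all N sockets and rejected by N-1 of them. A kernel implementation could instead look the port up once.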

PGM operates on the same level as TCP and UDP.

> The classical way to do this is to start suid root, only open
> the socket and then drop privileges.

Yes, those solutions exist, and the experience with their limitations is
the reason to try to get PGM in the kernel.


* Re: Add PGM protocol support to the IP stack
  2010-03-22 14:20   ` Christoph Lameter
@ 2010-03-22 16:36     ` Andi Kleen
  2010-03-22 16:51       ` Christoph Lameter
  0 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2010-03-22 16:36 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Andi Kleen, David Miller, netdev, linux-kernel

On Mon, Mar 22, 2010 at 09:20:42AM -0500, Christoph Lameter wrote:
> On Fri, 19 Mar 2010, Andi Kleen wrote:
> 
> > Christoph Lameter <cl@linux-foundation.org> writes:
> > >
> > > I know about the openpgm implementation. Openpgm does this at the user
> > > level and requires linking to a library. It is essentially a communication
> > > protocol done in user space. It has privilege issues because it has to
> > > create PGM packets via a raw socket.
> >
> > That seems like a poor reason alone to put something into the kernel.
> > Perhaps you rather need some way to have unprivileged raw sockets?
> 
> Not the only reason. There are also performance implications. NAKing and
> other control messages from user space are a pain and the available
> implementations add numerous threads just to control the timing of control
> messages and the expiration of data etc. It's difficult to listen to a PGM
> port from user space. You have to get all messages for the PGM protocol
> and then filter in each process.

Ok that sounds like a good reason to have a kernel protocol.
Thanks.

Multicast reliable kernel protocols are somewhat new, I guess one
would need to make sure to come up with a clean generic interface 
for them first.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: Add PGM protocol support to the IP stack
  2010-03-22 16:36     ` Andi Kleen
@ 2010-03-22 16:51       ` Christoph Lameter
  2010-03-22 17:43         ` Andi Kleen
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Lameter @ 2010-03-22 16:51 UTC (permalink / raw)
  To: Andi Kleen; +Cc: David Miller, netdev, linux-kernel

On Mon, 22 Mar 2010, Andi Kleen wrote:

> Multicast reliable kernel protocols are somewhat new, I guess one
> would need to make sure to come up with a clean generic interface
> for them first.

It has been around for a long time in another OS. I wonder if I should use
the socket API realized there as a model or come up with something new
from scratch?

What I have right now is:

1. Opening a socket

        A. Native PGM

                fd = socket(AF_INET, SOCK_RDM, IPPROTO_PGM)

        B. PGM over UDP

                fd = socket(AF_INET, SOCK_RDM, IPPROTO_UDP)

        C. PGM over SHM (?)

                fd = socket(AF_UNIX, SOCK_RDM, 0)


2. Binding to a multicast address

        A. Sender

                Connect the socket to a MC address and port using connect().

                Note that the port is significant since multiple streams on different
                ports can be run over the same MC addr.

        B. Receiver

                I. Bind the socket to the MC address and port of interest.

                II. Listen to the socket.

                        Process will wait until a PGM packet destined to the port of interest
                        is received.

                III. Accept a connection.

                        Establishes a session. Data can then be received.
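For the accept() step, RFC 3208 identifies a session by its Transport Session Identifier (TSI). The TSI is the sender's 6-byte Global Source ID plus its source port. The following sketches how a listening socket might decide that a packet starts a new session. The fixed-size table and all names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Transport Session Identifier: GSI plus the sender's source port. */
struct pgm_tsi {
	uint8_t  gsi[6];	/* Global Source ID */
	uint16_t sport;		/* source port, host byte order */
};

#define MAX_SESSIONS 16		/* hypothetical per-socket limit */

struct pgm_sessions {
	struct pgm_tsi tsi[MAX_SESSIONS];
	int count;
};

static int tsi_equal(const struct pgm_tsi *a, const struct pgm_tsi *b)
{
	return a->sport == b->sport &&
	       memcmp(a->gsi, b->gsi, sizeof(a->gsi)) == 0;
}

/*
 * Look up the session for key. Returns its slot and sets *is_new to 1 if
 * the TSI was unknown (the point at which accept() would return a new
 * connection). Returns -1 if the table is full.
 */
static int pgm_session_find_or_add(struct pgm_sessions *t,
				   const struct pgm_tsi *key, int *is_new)
{
	int i;

	for (i = 0; i < t->count; i++) {
		if (tsi_equal(&t->tsi[i], key)) {
			*is_new = 0;
			return i;
		}
	}
	if (t->count == MAX_SESSIONS)
		return -1;
	t->tsi[t->count] = *key;
	*is_new = 1;
	return t->count++;
}
```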


3. Sending and receiving

        Use the usual socket read and write operations and the various flavors of waiting
        for a packet via select, poll, epoll etc.

        Packet sizes are determined by the number of  packets in a single sendmsg() unless
        overridden by the RM_SET_MESSAGE_BOUNDARY socket option.

        The sender will block when the send window is full unless a non-blocking write is performed.

        The receiver shows the usual wait semantics. If the socket is set to unordered delivery
        (RM_UNORDERED below) then packets may arrive out of order. If it is set to RM_RECEIVE_ONLY
        then packets may simply be missing.
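This is a sketch of a sender written against the interface proposed above. `SOCK_RDM` exists in the Linux headers. `IPPROTO_PGM` is defined here locally as 113, an assumption based on the IANA protocol number. On a kernel without PGM support, `socket()` simply fails. Function names are hypothetical.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef IPPROTO_PGM
#define IPPROTO_PGM 113		/* assumption: IANA protocol number for PGM */
#endif

/* Open a PGM sender connected to group:port; -1 on failure. */
static int pgm_sender_open(const char *group, unsigned short port)
{
	struct sockaddr_in dst;
	int fd;

	fd = socket(AF_INET, SOCK_RDM, IPPROTO_PGM);
	if (fd < 0)
		return -1;	/* e.g. the kernel has no PGM support */

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(port);
	if (inet_pton(AF_INET, group, &dst.sin_addr) != 1 ||
	    connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Queue one message without blocking. Returns the number of bytes
 * accepted, 0 if the send window is currently full, or -1 on error.
 */
static ssize_t pgm_sender_try_send(int fd, const void *msg, size_t len)
{
	ssize_t n = send(fd, msg, len, MSG_DONTWAIT);

	if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return 0;
	return n;
}
```

A caller would keep the descriptor open for the lifetime of the stream, because closing it would also discard the sender's repair buffer. Treating EAGAIN as "window full" follows the non-blocking behaviour described above.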

4.      Transmitter Socket Options


        A. Setting the window size / rate.

                struct pgm_send_window x;
                x.RateKbitsPerSec = 56;
                x.WindowSizeInMSecs = 60000;
                x.WindowSizeInBytes = 10000000;

                setsockopt(fd, IPPROTO_PGM, RM_RATE_WINDOW_SIZE, &x, sizeof(x));

                Default is sending at 56Kbps with a buffer of 10 Megabytes and buffering for a minute.
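The three window parameters are related. A rate in kbit/s divided by 8 is a rate in bytes per millisecond, so WindowSizeInBytes = RateKbitsPerSec / 8 * WindowSizeInMSecs. The small sketch below (hypothetical helper names) derives one value from the other two, multiplying before dividing to avoid truncation.

```c
#include <stdint.h>

/* kbit/s divided by 8 is bytes per millisecond. */
static uint64_t pgm_window_bytes(uint64_t rate_kbits, uint64_t window_msecs)
{
	return rate_kbits * window_msecs / 8;
}

/* How long a buffer of window_bytes lasts at rate_kbits; 0 if no rate. */
static uint64_t pgm_window_msecs(uint64_t rate_kbits, uint64_t window_bytes)
{
	return rate_kbits ? window_bytes * 8 / rate_kbits : 0;
}
```

By this relation, 56 kbit/s over 60000 ms is 420000 bytes. A 10000000-byte buffer at 56 kbit/s covers about 1428571 ms (roughly 24 minutes). The three defaults above therefore cannot all hold at once, and an implementation would need to decide which one takes precedence.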

        B. FEC mode

                struct pgm_fec_info x;

                x.FECBlockSize = 255;
                x.FECProActivePackets = 0;
                x.FECGroupSize = 0;
                x.fFECOnDemandParityEnabled = 1;

                setsockopt(fd, IPPROTO_PGM, RM_USE_FEC, &x, sizeof(x));


5.      Receiver Socket Options

        None?


Possible Extensions

        RM_UNORDERED    Accept packets out of order, avoiding delays when packets arrive out of sequence.
                        Missing packets are still NAKed.

        RM_RECEIVE_ONLY Simply ignore missed packets. Do not send any replies.
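A small receive window shows the difference between default ordered delivery and the proposed RM_UNORDERED. In ordered mode a packet is held until every earlier sequence number has arrived. In unordered mode it is handed up at once, and the gap still waits for repair. PGM sequence numbers are 32-bit and wrap, hence the serial-number comparison. All names and the 64-packet window are hypothetical, and NAK generation is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define RXW_SIZE 64	/* window in packets; must divide 2^32 */

struct rxw {
	uint32_t next;			/* trailing edge: next seq owed in order */
	bool have[RXW_SIZE];		/* received but not yet passed over */
	bool unordered;			/* RM_UNORDERED-style delivery */
};

/* Wrap-safe "a comes before b" for 32-bit sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/*
 * Record arrival of seq and write the sequence numbers handed to the
 * application into out[] (room for RXW_SIZE entries). Returns how many.
 * Gaps stay open; a real receiver would NAK them.
 */
static int rxw_receive(struct rxw *w, uint32_t seq, uint32_t *out)
{
	int n = 0;

	if (seq_before(seq, w->next) || !seq_before(seq, w->next + RXW_SIZE))
		return 0;		/* already passed, or beyond the window */
	if (w->have[seq % RXW_SIZE])
		return 0;		/* duplicate */
	w->have[seq % RXW_SIZE] = true;

	if (w->unordered)
		out[n++] = seq;		/* deliver immediately */

	/* Advance the trailing edge across every contiguous packet. */
	while (w->have[w->next % RXW_SIZE]) {
		w->have[w->next % RXW_SIZE] = false;
		if (!w->unordered)
			out[n++] = w->next;	/* in-order delivery */
		w->next++;
	}
	return n;
}
```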



Existing socket options in the other OS (X marks options that look
screwy and should be avoided)

/* PGM socket options */

/* Transmitter */
#define RM_LATEJOIN                             1       /* X Not supported on receive so why have it? */
#define RM_RATE_WINDOW_SIZE                     2       /* See struct pgm_send_window */
#define RM_SEND_WINDOW_ADV_RATE                 3       /* X Increase of send window in percentage of window */
#define RM_SENDER_STATISTICS                    4       /* see struct pgm_sender_stats */
#define RM_SENDER_WINDOW_ADVANCE_METHOD         5       /* X seems obsolete */
#define RM_SET_MCAST_TTL                        6       /* X Can be set via IP_MULTICAST_TTL */
#define RM_SET_MESSAGE_BOUNDARY                 7       /* Fix the size of the messages in bytes */
#define RM_SET_SEND_IF                          8       /* X use IP_MULTICAST_IF etc instead */
#define RM_USE_FEC                              9

/* Receiver */
#define RM_ADD_RECEIVE_IF                       100     /* X ???? IP_MULTICAST_IF instead? */
#define RM_DEL_RECEIVE_IF                       101     /* X IP_MULTICAST_IF */
#define RM_HIGH_SPEED_INTRANET_OPT              102     /* X PGM should adapt automatically to high speed networks */
#define RM_RECEIVER_STATISTICS                  103     /* See struct pgm_receiver_stats */


/* Socket API structures (established by M$DN) */
struct pgm_receiver_stats {
        u64     NumODataPacketsReceived;        /* Number of ODATA (original) sequences */
        u64     NumRDataPacketsReceived;        /* Number of RDATA (repair) sequences */
        u64     NumDuplicateDataPackets;        /* Duplicate sequences */
        u64     DataBytesReceived;
        u64     TotalBytesReceived;
        u64     RateKBitsPerSecOverall;         /* Receive rate since start of session X */
        u64     RateKBitsPerSecLast;            /* Receive rate for last second X*/
        u64     TrailingEdgeSeqId;              /* Oldest sequence in the receive window */
        u64     LeadingEdgeSeqId;               /* Newest sequence in the receive window */
        u64     AverageSequencesInWindow;       /* Average number of sequences in receive window X */
        u64     MinSequencesInWindow;           /* The minimum number of sequences */
        u64     MaxSequencesInWindow;           /* The maximum number of sequences */
        u64     FirstNakSequenceNumber;         /* First outstanding nack sequence number */
        u64     NumPendingNaks;                 /* Number of sequences waiting for NCF */
        u64     NumOutstandingNaks;             /* Number of sequences waiting for RDATA */
        u64     NumDataPacketsBuffered;         /* Number of packets currently buffered */
        u64     TotalSelectiveNaksSent;         /* Number of NAKs sent total */
        u64     TotalParityNaksSent;            /* Number of parity NAKs sent */
};

struct pgm_sender_stats {
        u64     DataBytesSent;
        u64     TotalBytesSent;
        u64     NaksReceived;
        u64     NaksReceivedTooLate;            /* NAKs received after receive window advanced */
        u64     NumOutstandingNaks;             /* Number of NAKs awaiting response */
        u64     NumNaksAfterRData;              /* Number of NAKs after RDATA sequences were sent which were ignored */
        u64     RepairPacketsSent;
        u64     BufferSpaceAvailable;           /* Number of partial messages dropped */
        u64     TrailingEdgeSeqId;              /* Oldest sequence id in window */
        u64     LeadingEdgeSeqId;               /* Newest sequence id in window */
        u64     RateKBitsPerSecOverall;         /* Rate since start of session X */
        u64     RateKBitsPerSecLast;            /* Rate in last second X */
        u64     TotalODataPacketsSent;          /* Total data packets transmitted */
};

/* Setup of sender: RateKbitsPerSec = WindowSizeInBytes * 8 / WindowSizeInMSecs */
struct pgm_send_window {
        u64     RateKbitsPerSec;                /* Allowed rate for the sender in kbits per second */
        u64     WindowSizeInMSecs;              /* Send window size in time */
        u64     WindowSizeInBytes;              /* Window size in bytes */
};

struct pgm_fec_info {
        u16     FECBlockSize;                   /* Maximum number of packets for a group. Default and max = 255 */
        u16     FECProActivePackets;            /* Number of proactive packets per group. */
        u8      FECGroupSize;                   /* Number of packets to be treated as a group. Power of two */
        int     fFECOnDemandParityEnabled;      /* Allow sender to send parity repair packets */
};


* Re: Add PGM protocol support to the IP stack
  2010-03-22 16:51       ` Christoph Lameter
@ 2010-03-22 17:43         ` Andi Kleen
  2010-03-22 18:07           ` Christoph Lameter
  0 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2010-03-22 17:43 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: David Miller, netdev, linux-kernel

Christoph Lameter <cl@linux-foundation.org> writes:

> On Mon, 22 Mar 2010, Andi Kleen wrote:
>
>> Multicast reliable kernel protocols are somewhat new, I guess one
>> would need to make sure to come up with a clean generic interface
>> for them first.
>
> It has been around for a long time in another OS. I wonder if I should use
> the socket API realized there as a model or come up with something new
> from scratch?

If the other API doesn't have a serious flaw I guess it's better
to aim for a sub/superset at least, to make porting applications easier.

>
> What I have right now is:
>
> 1. Opening a socket

>
>         A. Native PGM
>
>                 fd = socket(AF_INET, SOCK_RDM, IPPROTO_PGM)

RDM = Reliable ? Multicast ? 

>         B. PGM over UDP
>
>                 fd = socket(AF_INET, SOCK_RDM, IPPROTO_UDP)
>
>         C. PGM over SHM (?)
>
>                 fd = socket(AF_UNIX, SOCK_RDM, 0)

Not sure how that should work.

> 3. Sending and receiving
>
>         Use the usual socket read and write operations and the various flavors of waiting
>         for a packet via select, poll, epoll etc.
>
>         Packet sizes are determined by the number of  packets in a single sendmsg() unless

Number of bytes surely?

>         overridden by the RM_SET_MESSAGE_BOUNDARY socket option.

It's unusual to have such an option (except the MTU). What is it good for?

>
> 4.      Transmitter Socket Options
>
>
>         A. Setting the window size / rate.
>
>                 struct pgm_send_window x;
>                 x.RateKbitsPerSec = 56;
>                 x.WindowSizeInMSecs = 60000;
>                 x.WindowSizeInBytes = 10000000;
>
>                 setsockopt(fd, IPPROTO_PGM, RM_RATE_WINDOW_SIZE, &x, sizeof(x));
>
>                 Default is sending at 56Kbps with a buffer of 10 Megabytes and buffering for a minute.

That's a very large buffer for a socket. It would be better to use the usual
auto shrinking/increasing mechanisms.

>         B. FEC mode
>
>                 struct pgm_fec_info x;
>
>                 x.FECBlockSize = 255;
>                 x.FECProActivePackets = 0;
>                 x.FECGroupSize = 0;
>                 x.fFECOnDemandParityEnabled = 1;
>
>                 setsockopt(fd, IPPROTO_PGM, RM_USE_FEC, &x, sizeof(x));

Is that mode really needed?

> /* Socket API structures (established by M$DN) */
> struct pgm_receiver_stats {
>         u64     NumODataPacketsReceived;        /* Number of ODATA (original) sequences */

It's difficult to maintain 64 bit counters on 32bit hosts on all targets.
But I guess it would be ok to only fill in 32bit in this case.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: Add PGM protocol support to the IP stack
  2010-03-22 17:43         ` Andi Kleen
@ 2010-03-22 18:07           ` Christoph Lameter
  2010-03-22 18:53             ` Andi Kleen
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Lameter @ 2010-03-22 18:07 UTC (permalink / raw)
  To: Andi Kleen; +Cc: David Miller, netdev, linux-kernel

On Mon, 22 Mar 2010, Andi Kleen wrote:

> > What I have right now is:
> >
> > 1. Opening a socket
>
> >
> >         A. Native PGM
> >
> >                 fd = socket(AF_INET, SOCK_RDM, IPPROTO_PGM)
>
> RDM = Reliable ? Multicast ?

RDM is Reliable Datagram Multicast I believe. I'd rather have SOCK_PGM if
I could choose.

>
> >         B. PGM over UDP
> >
> >                 fd = socket(AF_INET, SOCK_RDM, IPPROTO_UDP)
> >
> >         C. PGM over SHM (?)
> >
> >                 fd = socket(AF_UNIX, SOCK_RDM, 0)
>
> Not sure how that should work.

Multiple processes would communicate via shm segments. Maybe defer to the
future, but it's an important operation mode as systems grow bigger and bigger.
SHM segment would have to contain some sort of ring buffer that the
receivers could tap into. But that mode has not really been thought
through.

> > 3. Sending and receiving
> >
> >         Use the usual socket read and write operations and the various flavors of waiting
> >         for a packet via select, poll, epoll etc.
> >
> >         Packet sizes are determined by the number of  packets in a single sendmsg() unless
>
> Number of bytes surely?

Sorry yes you are right.

> >         overridden by the RM_SET_MESSAGE_BOUNDARY socket option.
>
> It's unusual to have such an option (except the MTU). What is it good for?

No idea why it was implemented. It allows send() to be used for portions
of a message; the message is only sent once all bytes have been provided.
Probably necessary if one wants to have very long (megabytes) messages.
Esoteric and likely not going to be in a first release.

> > 4.      Transmitter Socket Options
> >
> >
> >         A. Setting the window size / rate.
> >
> >                 struct pgm_send_window x;
> >                 x.RateKbitsPerSec = 56;
> >                 x.WindowSizeInMSecs = 60000;
> >                 x.WindowSizeInBytes = 10000000;
> >
> >                 setsockopt(fd, IPPROTO_PGM, RM_RATE_WINDOW_SIZE, &x, sizeof(x));
> >
> >                 Default is sending at 56Kbps with a buffer of 10 Megabytes and buffering for a minute.
>
> That's a very large buffer for a socket. It would be better to use the usual
> auto shrinking/increasing mechanisms.

Reliable multicast protocols have a defined time period / "reliability
buffer" so that they can resend a message that was missed for a time
period. It is customary to either specify a time period or define the size
of the "reliability buffer".
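Here is a sketch of such a reliability buffer, bounded by a time period, a byte size, or both. Packets are kept in send order. The oldest ones (the trailing edge of the window) are discarded once either limit is exceeded, after which they can no longer be repaired. The names and the fixed ring size are hypothetical. A kernel version would hold skbs and charge them to socket memory.

```c
#include <stddef.h>
#include <stdint.h>

#define TXW_SLOTS 1024	/* hypothetical ring capacity, in packets */

struct txw_entry {
	uint32_t seq;
	uint32_t len;		/* payload bytes */
	uint64_t sent_ms;	/* time of original transmission */
};

struct txw {
	struct txw_entry e[TXW_SLOTS];
	size_t head;		/* index of the oldest entry (trailing edge) */
	size_t count;		/* entries currently held */
	uint64_t bytes;		/* total payload held */
	uint64_t max_bytes;	/* size limit, 0 = none */
	uint64_t max_ms;	/* time limit, 0 = none */
};

/* Discard packets that fall outside either limit; returns how many. */
static size_t txw_trim(struct txw *w, uint64_t now_ms)
{
	size_t dropped = 0;

	while (w->count > 0) {
		const struct txw_entry *old = &w->e[w->head];
		/* now_ms is assumed monotonic, so the subtraction cannot wrap. */
		int too_old = w->max_ms && now_ms - old->sent_ms > w->max_ms;
		int too_big = w->max_bytes && w->bytes > w->max_bytes;

		if (!too_old && !too_big)
			break;
		w->bytes -= old->len;
		w->head = (w->head + 1) % TXW_SLOTS;
		w->count--;
		dropped++;
	}
	return dropped;
}

/* Keep a copy of a transmitted packet for repairs; -1 if the ring is full. */
static int txw_add(struct txw *w, uint32_t seq, uint32_t len, uint64_t now_ms)
{
	struct txw_entry *slot;

	txw_trim(w, now_ms);
	if (w->count == TXW_SLOTS)
		return -1;
	slot = &w->e[(w->head + w->count) % TXW_SLOTS];
	slot->seq = seq;
	slot->len = len;
	slot->sent_ms = now_ms;
	w->count++;
	w->bytes += len;
	txw_trim(w, now_ms);	/* the new packet may push the size over */
	return 0;
}
```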

> >         B. FEC mode
> >
> >                 struct pgm_fec_info x;
> >
> >                 x.FECBlockSize = 255;
> >                 x.FECProActivePackets = 0;
> >                 x.FECGroupSize = 0;
> >                 x.fFECOnDemandParityEnabled = 1;
> >
> >                 setsockopt(fd, IPPROTO_PGM, RM_USE_FEC, &x, sizeof(x));
>
> Is that mode really needed?

Never used it. I'd rather skip for now. Maybe later.

>
> > /* Socket API structures (established by M$DN) */
> > struct pgm_receiver_stats {
> >         u64     NumODataPacketsReceived;        /* Number of ODATA (original) sequences */
>
> It's difficult to maintain 64 bit counters on 32bit hosts on all targets.
> But I guess it would be ok to only fill in 32bit in this case.

32 bit counters have the awful habit of overflowing.


* Re: Add PGM protocol support to the IP stack
  2010-03-22 18:07           ` Christoph Lameter
@ 2010-03-22 18:53             ` Andi Kleen
  2010-03-22 19:32               ` Christoph Lameter
                                 ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Andi Kleen @ 2010-03-22 18:53 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Andi Kleen, David Miller, netdev, linux-kernel

On Mon, Mar 22, 2010 at 01:07:37PM -0500, Christoph Lameter wrote:
> > >         B. PGM over UDP
> > >
> > >                 fd = socket(AF_INET, SOCK_RDM, IPPROTO_UDP)
> > >
> > >         C. PGM over SHM (?)
> > >
> > >                 fd = socket(AF_UNIX, SOCK_RDM, 0)
> >
> > Not sure how that should work.
> 
> Multiple processes would communicate via shm segments. Maybe defer to the
> future, but it's an important operation mode as systems grow bigger and bigger.
> SHM segment would have to contain some sort of ring buffer that the
> receivers could tap into. But that mode has not really been thought
> through.

AF_UNIX is not SHM today.

The only point is to avoid one copy? (user1 -> kernel -> user2  to user1 -> user2) 
Not sure if that is really worth it. Don't you need another copy to the reliability
buffer anyways?

Letting kernel parse a data structure in user defined memory is also
always somewhat tricky.

But in principle AF_INET over localhost should not be much less efficient
than AF_UNIX, so you can probably drop it for now (unless you need special AF_UNIX
features like credentials)

> > >
> > >         Packet sizes are determined by the number of  packets in a single sendmsg() unless
> >
> > Number of bytes surely?
> 
> Sorry yes you are right.
> 
> > >         overridden by the RM_SET_MESSAGE_BOUNDARY socket option.
> >
> > It's unusual to have such an option (except the MTU). What is it good for?
> 
> No idea why it was implemented. It allows send() to be used for portions
> of a message; the message is only sent once all bytes have been provided.
> Probably necessary if one wants to have very long (megabytes) messages.

Those could be a problem in kernel memory consumption. One would need
to be very careful to have a good memory management scheme for the socket
in place.

> > >
> > >         A. Setting the window size / rate.
> > >
> > >                 struct pgm_send_window x;
> > >                 x.RateKbitsPerSec = 56;
> > >                 x.WindowSizeInMSecs = 60000;
> > >                 x.WindowSizeInBytes = 10000000;
> > >
> > >                 setsockopt(fd, IPPROTO_PGM, RM_RATE_WINDOW_SIZE, &x, sizeof(x));
> > >
> > >                 Default is sending at 56Kbps with a buffer of 10 Megabytes and buffering for a minute.
> >
> > That's a very large buffer for a socket. It would be better to use the usual
> > auto shrinking/increasing mechanisms.
> 
> Reliable multicast protocols have a defined time period / "reliability
> buffer" so that they can resend a message that was missed for a time
> period. It is customary to either specify a time period or define the size
> of the "reliability buffer".

One problem is memory management then. What happens when a process opens 100 of those
sockets and fills them all?

I guess you would still need a suitable global limit like TCP has.
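Here is a much simplified sketch of such a global limit. Every socket charges the bytes it buffers against one shared counter, and a charge is refused once a system-wide cap would be exceeded. TCP's real scheme (the `tcp_mem` thresholds with a memory-pressure mode) is more elaborate. The names and the hard cap here are hypothetical, and the C11 atomic stands in for whatever kernel primitive would be used.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* One counter shared by every PGM socket in this sketch. */
static _Atomic uint64_t pgm_mem_allocated;
static uint64_t pgm_mem_limit = 64ULL << 20;	/* hypothetical 64 MiB cap */

/* Charge len bytes against the global limit; false if it would overflow. */
static bool pgm_mem_charge(uint64_t len)
{
	uint64_t cur = atomic_load(&pgm_mem_allocated);

	do {
		if (cur > pgm_mem_limit || len > pgm_mem_limit - cur)
			return false;
		/* On failure, cur is reloaded with the current value; retry. */
	} while (!atomic_compare_exchange_weak(&pgm_mem_allocated, &cur, cur + len));
	return true;
}

/* Give back bytes released when packets leave the reliability buffer. */
static void pgm_mem_uncharge(uint64_t len)
{
	atomic_fetch_sub(&pgm_mem_allocated, len);
}
```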

> Never used it. I'd rather skip for now. Maybe later.
> 
> >
> > > /* Socket API structures (established by M$DN) */
> > > struct pgm_receiver_stats {
> > >         u64     NumODataPacketsReceived;        /* Number of ODATA (original) sequences */
> >
> > It's difficult to maintain 64 bit counters on 32bit hosts on all targets.
> > But I guess it would be ok to only fill in 32bit in this case.
> 
> 32 bit counters have the awful habit of overflowing.

There's just no portable atomic64_t. Ok maybe you can use the socket lock
to synchronize all the counts if they are only per socket.
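The suggestion can be sketched as follows. If all statistics are per socket and only touched with the socket lock held, plain 64-bit additions suffice even on 32-bit hosts, because the lock (not an atomic add) prevents torn or lost updates. Readers take the same lock and copy a snapshot. In this user-space sketch a C11 `atomic_flag` spinlock stands in for the kernel's socket lock. The struct is cut down to two counters, and all names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

struct pgm_stats {			/* cut-down subset of the counters */
	uint64_t data_bytes_sent;
	uint64_t naks_received;
};

struct pgm_sock {
	atomic_flag lock;		/* stand-in for the socket lock */
	struct pgm_stats stats;		/* only touched with lock held */
};

static void sock_lock(struct pgm_sock *s)
{
	while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
		;			/* spin until the holder releases it */
}

static void sock_unlock(struct pgm_sock *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

static void pgm_sock_init(struct pgm_sock *s)
{
	atomic_flag_clear(&s->lock);
	s->stats = (struct pgm_stats){ 0 };
}

/* Plain 64-bit adds are safe: the lock serializes all writers. */
static void pgm_count_send(struct pgm_sock *s, uint64_t bytes)
{
	sock_lock(s);
	s->stats.data_bytes_sent += bytes;
	sock_unlock(s);
}

static void pgm_count_nak(struct pgm_sock *s)
{
	sock_lock(s);
	s->stats.naks_received++;
	sock_unlock(s);
}

/* Readers take the same lock, so a 32-bit host never sees a torn value. */
static struct pgm_stats pgm_get_stats(struct pgm_sock *s)
{
	struct pgm_stats snap;

	sock_lock(s);
	snap = s->stats;
	sock_unlock(s);
	return snap;
}
```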

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: Add PGM protocol support to the IP stack
  2010-03-22 18:53             ` Andi Kleen
@ 2010-03-22 19:32               ` Christoph Lameter
  2010-03-26 17:33               ` Christoph Lameter
  2010-03-29 23:01               ` H. Peter Anvin
  2 siblings, 0 replies; 21+ messages in thread
From: Christoph Lameter @ 2010-03-22 19:32 UTC (permalink / raw)
  To: Andi Kleen; +Cc: David Miller, netdev, linux-kernel

On Mon, 22 Mar 2010, Andi Kleen wrote:

> > Multiple processes would communicate via shm segments. Maybe defer to the
> > future, but it's an important operation mode as systems grow bigger and bigger.
> > SHM segment would have to contain some sort of ring buffer that the
> > receivers could tap into. But that mode has not really been thought
> > through.
>
> AF_UNIX is not SHM today.
>
> The only point is to avoid one copy? (user1 -> kernel -> user2  to user1 -> user2)
> Not sure if that is really worth it. Don't you need another copy to the reliability
> buffer anyways?

Not sure either. Access of multiple processes to one reliability buffer
would be best. Some sort of multiended pipe I guess.

> But in principle AF_INET over localhost should not be much less efficient
> than AF_UNIX, so you can probably drop it for now (unless you need special AF_UNIX
> features like credentials)

Well, let's skip it for now and see if there are performance implications in
the future.

> > > It's unusual to have such an option (except the MTU). What is it good for?
> >
> > No idea why it was implemented. It allows send() to be used for portions
> > of a message; the message is only sent once all bytes have been provided.
> > Probably necessary if one wants to have very long (megabytes) messages.
>
> Those could be a problem in kernel memory consumption. One would need
> to be very careful to have a good memory management scheme for the socket
> in place.

Let's not support it, then, unless someone can make a convincing case.

> > Reliable multicast protocols have a defined time period / "reliabilty
> > buffer" so that they can resend a message that was missed for a time
> > period. It is customary to either specify a time period or define the size
> > of the "reliability buffer".
>
> One problem is memory management then. What happens when a process opens 100 of those
> sockets and fills them all?

Pushes out the app? Same as the user-space apps now. Some sort of
upper limit is needed, I guess.

> I guess you would still need a suitable global limit like TCP has.

Yes.
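As a rough illustration of such a global limit, here is a minimal C sketch
of tcp_mem-style page accounting shared by all PGM sockets. All names are
hypothetical, the code is single-threaded, and it models only the min/max
behaviour later described for the proposed pgm_mem sysctl; a kernel
implementation would need atomic or per-CPU counters.

```c
#include <stdbool.h>

/* Hypothetical global limits in pages, modelled on the proposed pgm_mem vector. */
struct pgm_mem_limits {
    long min;       /* above this, PGM starts to moderate memory usage */
    long pressure;  /* kept only for tcp_mem format compatibility */
    long max;       /* pages allowed for queueing by all PGM sockets */
};

/* Hypothetical system-wide accounting shared by all PGM sockets. */
struct pgm_mem_account {
    struct pgm_mem_limits limits;
    long allocated;  /* pages currently charged */
};

enum pgm_mem_level { PGM_MEM_OK, PGM_MEM_MODERATE, PGM_MEM_FULL };

/* Charge pages to the global counter; refuse if max would be exceeded. */
static bool pgm_mem_charge(struct pgm_mem_account *acct, long pages)
{
    if (acct->allocated + pages > acct->limits.max)
        return false;
    acct->allocated += pages;
    return true;
}

/* Give pages back when queued data is freed (e.g. it ages out of the window). */
static void pgm_mem_uncharge(struct pgm_mem_account *acct, long pages)
{
    acct->allocated -= pages;
    if (acct->allocated < 0)
        acct->allocated = 0;
}

/* Classify usage so a socket can decide whether to shrink its window. */
static enum pgm_mem_level pgm_mem_classify(const struct pgm_mem_account *acct)
{
    if (acct->allocated >= acct->limits.max)
        return PGM_MEM_FULL;
    if (acct->allocated > acct->limits.min)
        return PGM_MEM_MODERATE;
    return PGM_MEM_OK;
}
```

With limits of 100/150/200 pages, charging 120 pages moves the state to
PGM_MEM_MODERATE, and a further 100-page charge is refused.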

* Re: Add PGM protocol support to the IP stack
  2010-03-22 18:53             ` Andi Kleen
  2010-03-22 19:32               ` Christoph Lameter
@ 2010-03-26 17:33               ` Christoph Lameter
  2010-03-27 13:11                 ` Andi Kleen
  2010-03-29 23:01               ` H. Peter Anvin
  2 siblings, 1 reply; 21+ messages in thread
From: Christoph Lameter @ 2010-03-26 17:33 UTC
  To: Andi Kleen; +Cc: David Miller, netdev, linux-kernel

Here is a pgm.7 manpage describing how the socket API could look like for
a PGM implementation.

I dumped the RM_* based socket options from the other OS since most of the
options were unusable.

.\" This man page is Copyright (C) 2010 Christoph Lameter <cl@linux-foundation.org>.
.\" Permission is granted to distribute possibly modified copies
.\" of this page provided the header is included verbatim,
.\" and in case of nontrivial modification author and date
.\" of the modification is added to the header.
.\"
.TH PGM  7 2010-08-01 "Linux" "Linux Programmer's Manual"
.SH NAME
pgm \- Pragmatic General Multicast Protocol Support for IPv4
.SH SYNOPSIS
.B #include <sys/socket.h>
.br
.B #include <netinet/in.h>
.br
.B #include <linux/pgm.h>
.sp
.B pgm_socket = socket(AF_INET, SOCK_RDM, IPPROTO_PGM);
.br
.B pgm_socket = socket(AF_INET, SOCK_RDM, IPPROTO_UDP);
.SH DESCRIPTION
This is an implementation of the Pragmatic General Multicast Protocol
described in RFC\ 3208.
PGM is a connection-oriented, reliable datagram messaging protocol
(hence SOCK_RDM). Packets are delivered in order even though the
network may have reordered, duplicated or dropped packets.
Receivers may ask for retransmission of missed packets by sending a
NAK (negative acknowledgement). Transmitters do not keep receiver
state, so an individual sender is able to interact with an unlimited
number of receivers.
The recovery mechanism can limit the scalability of PGM if too
many receivers send NAKs. Therefore measures exist at various layers
to reduce the potential repair volume that a transmitter may have to
deal with.

PGM supports two variants. The first one is the
.B native PGM protocol
which uses its own IP protocol number, at the same level as TCP and UDP.
Native PGM supports NAK suppression ("assist") by network elements (Cisco,
Juniper and other commercially available routers support PGM), which
is an important means of reducing the NAK volume in case of packet loss during
multicast replication of messages in the network. Routers can consolidate
multiple NAKs from downstream into a single upstream NAK and are also able to
use
.B FEC
(Forward Error Correction) to directly provide repair data without having to
forward NAKs to a transmitter.

The second variant is
.B PGM over UDP.
PGM packets are encapsulated in UDP datagrams instead of being carried
directly over IP. PGM over UDP does
.B not
support assist from network elements and
therefore offers only limited NAK suppression. PGM over UDP mainly exists
because kernel-based PGM implementations have been lacking. Using raw sockets
to create and receive packets is inefficient and slow. User-space
PGM implementations are typically restricted to a single stream, or to multiple
streams within the same process, since the in-kernel multiplexing available
for TCP and UDP does not exist for them.
PGM over UDP can use UDP port multiplexing instead, which allows
efficient operation of multiple streams on a single system even if the
OS has no native support for PGM.

A newly created PGM socket is unconnected.
A sender must connect it to a multicast address before it can send messages.
A receiver needs to bind to the multicast address and port number of interest
and then call
.BR listen (2)
on the socket.
The receiver can accept a connection once PGM traffic is
received on the chosen PGM multicast address and port. It is then
possible to receive datagrams on the PGM socket.

When
.BR connect (2)
is called on the socket, the multicast destination address is set and
datagrams can then be sent using
.BR send (2)
or
.BR write (2).
It is not possible to send to destinations other than the single multicast
address the socket is connected to. Note that send operations will throttle
the application if the maximum transmission rate is exceeded.
Blocking due to throttling can be avoided by setting the socket to
non-blocking mode or by using
.BR MSG_DONTWAIT .
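A minimal C sketch of the sender side of the proposed API follows.
IPPROTO_PGM is not defined by current system headers, so the sketch falls
back to 113, the IANA protocol number for PGM; the helper names are
hypothetical, and on a kernel without the proposed support socket() simply
fails. It also assumes that a throttled non-blocking send fails with EAGAIN,
which the text above does not state explicitly.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: fall back to the IANA protocol number if headers lack it. */
#ifndef IPPROTO_PGM
#define IPPROTO_PGM 113
#endif

/* Hypothetical helper: fill out with a multicast group; reject anything else. */
static int pgm_group_addr(const char *ip, uint16_t port, struct sockaddr_in *out)
{
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &out->sin_addr) != 1)
        return -1;
    if (!IN_MULTICAST(ntohl(out->sin_addr.s_addr)))
        return -1;
    return 0;
}

/* Hypothetical helper: socket() then connect() to the group, per the proposal. */
static int pgm_open_sender(const char *group, uint16_t port)
{
    struct sockaddr_in sa;

    if (pgm_group_addr(group, port, &sa) != 0) {
        errno = EINVAL;
        return -1;
    }
    int fd = socket(AF_INET, SOCK_RDM, IPPROTO_PGM);
    if (fd < 0)
        return -1;  /* expected on kernels without PGM support */
    if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Non-blocking send: 0 if sent, 1 if throttled (assumed EAGAIN), -1 on error. */
static int pgm_try_send(int fd, const void *buf, size_t len)
{
    ssize_t n = send(fd, buf, len, MSG_DONTWAIT);
    if (n >= 0)
        return 0;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 1;
    return -1;
}
```

Rejecting non-multicast addresses before calling socket() mirrors the rule
that a PGM socket can only be connected to a single multicast address.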

In order to receive packets, the socket needs to be bound to a multicast
address first by using
.BR bind (2).

All receive operations return only one packet.
When the packet is smaller than the passed buffer, only that much
data is returned; when it is bigger, the packet is truncated and the
.B MSG_TRUNC
flag is set.
.B MSG_WAITALL
is not supported.
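The one-packet-per-call rule can be sketched with recvmsg(), which reports
truncation through msg_flags. The helper name is hypothetical; because
AF_UNIX datagram sockets on Linux follow the same semantics, the sketch can
be exercised without PGM support.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: receive exactly one datagram into buf. Sets
 * *truncated when the kernel reports MSG_TRUNC (datagram larger than len). */
static ssize_t pgm_recv_one(int fd, void *buf, size_t len, int *truncated)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    ssize_t n = recvmsg(fd, &msg, 0);
    if (n >= 0)
        *truncated = (msg.msg_flags & MSG_TRUNC) != 0;
    return n;  /* bytes copied into buf, or -1 on error */
}
```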

Some IP options may be sent or received using the socket options described in
.BR ip (7).
However, multicast join and leave operations are not supported.
See
.BR ip (7).

By default, Linux PGM does path MTU (Maximum Transmission Unit) discovery.
This means the kernel
will keep track of the MTU to a specific target IP address and return
.B EMSGSIZE
when a PGM packet write exceeds it.
When this happens, the application should decrease the packet size.
Path MTU discovery can also be turned off using the
.B IP_MTU_DISCOVER
socket option or the
.I /proc/sys/net/ipv4/ip_no_pmtu_disc
file; see
.BR ip (7)
for details.
When turned off, PGM will fragment outgoing PGM packets
that exceed the interface MTU.
However, disabling it is not recommended
for performance and reliability reasons.
.SS "Address Format"
The PGM protocol supports IPv4 and IPv6, but Linux currently supports only IPv4. The
.I sockaddr_in
address format described in
.BR ip (7)
is used.
.SS "Error Handling"
All fatal errors will be passed to the user as an error return even
when the socket is not connected.
This includes asynchronous errors
received from the network.
You may get an error for an earlier packet
that was sent on the same socket.

When the
.B IP_RECVERR
option is enabled, all errors are stored in the socket error queue,
and can be received by
.BR recvmsg (2)
with the
.B MSG_ERRQUEUE
flag set.
.SS /proc interfaces
System-wide PGM parameter settings can be accessed by files in the directory
.IR /proc/sys/net/ipv4/ .
.TP
.IR pgm_mem " "
This is a vector of three integers governing the number
of pages allowed for queueing by all PGM sockets.
.RS
.TP 10
.I min
Below this number of pages, PGM is not bothered about its
memory appetite.
When the amount of memory allocated by PGM exceeds
this number, PGM starts to moderate memory usage.
.TP
.I pressure
This value was introduced to follow the format of
.IR tcp_mem
(see
.BR tcp (7)).
.TP
.I max
Number of pages allowed for queueing by all PGM sockets.
.RE
.IP
Default values for these three items are
calculated at boot time from the amount of available memory.
.TP
.IR pgm_window_size_default " (integer; default value: 10 MB)"
Default size, in bytes, of receive and transmit windows used by PGM sockets.
Each PGM socket may use this amount for its receive data window,
even if the total pages used by PGM sockets exceed the
.I pgm_mem
pressure threshold.
.TP
.IR pgm_window_msec_default " (integer; default value: 2000)"
Default time for which packets are kept in the transmit and receive windows.
Each PGM socket may keep data for this period so that it can be resent,
even if the total pages used by PGM sockets exceed the
.I pgm_mem
pressure threshold.
.TP
.IR pgm_ambient_spm_msecs " (integer; default value: 15000, i.e. 15 seconds)"
Interval of the unconditional heartbeat (ambient SPM) that PGM transmitters
send to periodically notify receivers about the stream status.
.TP
.IR pgm_spm_list_usec " (integers; default value: 1000 1000 4000 8000 16000 32000 64000 128000 256000 1000000 2000000 8000000)"
Intervals for successive SPM heartbeats when the connection goes idle.
Initial SPMs are sent rapidly to allow fast discovery of a missed packet;
the interval then backs off until the unconditional heartbeat interval is reached.
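A minimal sketch of the idle SPM schedule: the n-th SPM after the stream goes
idle uses the n-th list entry, and once the list is exhausted (or an entry
exceeds the ambient interval) the transmitter falls back to the ambient
heartbeat. It treats the values as microseconds, as the sysctl name suggests
(the struct field spm_msecs suggests milliseconds, so the unit is an open
question), and assumes the eighth default entry is 128000. The function name
is hypothetical.

```c
#include <stddef.h>

/* Idle SPM intervals copied from the proposed pgm_spm_list_usec default;
 * the 8th entry is assumed to be 128000 (continuing the doubling). */
static const long spm_idle_usec[] = {
    1000, 1000, 4000, 8000, 16000, 32000,
    64000, 128000, 256000, 1000000, 2000000, 8000000
};

#define SPM_IDLE_STEPS (sizeof(spm_idle_usec) / sizeof(spm_idle_usec[0]))

/* Hypothetical helper: delay before the n-th SPM (n = 0, 1, ...) after the
 * stream goes idle. Backs off along the list, capped at the ambient SPM. */
static long spm_next_delay_usec(size_t n, long ambient_usec)
{
    if (n < SPM_IDLE_STEPS && spm_idle_usec[n] < ambient_usec)
        return spm_idle_usec[n];
    return ambient_usec;
}
```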
.TP
.IR pgm_transmitter_rate_kbps " (integer; default value: 56)"
Default limit on the rate of traffic produced by a single transmitter.
The rate is an overall maximum covering both repair and original data.
The limit is set low because transmitters can do a lot of harm to the network
(especially to WAN links) if they send at high rates. It is advisable to
be careful when increasing the rate.
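The proposal does not say how the rate limit is enforced. A token bucket is
one common way to do it; the sketch below assumes kbps means kilobits per
second and takes a hypothetical burst size in bytes. Original and repair data
draw from the same bucket, matching the overall maximum described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical token bucket. Credit is kept in microbytes (1 byte =
 * 1000000 microbytes) so refilling by elapsed_usec * bytes_per_sec is exact. */
struct tx_bucket {
    uint64_t rate_Bps;   /* bytes per second */
    uint64_t cap_ub;     /* bucket capacity (burst) in microbytes */
    uint64_t credit_ub;  /* current credit in microbytes */
    uint64_t last_usec;  /* time of the last refill */
};

/* Assumption: rate_kbps is kilobits per second; burst_bytes is hypothetical. */
static void tx_bucket_init(struct tx_bucket *b, uint32_t rate_kbps,
                           uint64_t burst_bytes, uint64_t now_usec)
{
    b->rate_Bps = (uint64_t)rate_kbps * 1000 / 8;
    b->cap_ub = burst_bytes * 1000000;
    b->credit_ub = b->cap_ub;  /* start with a full bucket */
    b->last_usec = now_usec;
}

/* Returns true and consumes credit if len bytes (ODATA or RDATA) may be
 * sent at now_usec; false means the sender must wait (throttle). */
static bool tx_bucket_take(struct tx_bucket *b, uint64_t len, uint64_t now_usec)
{
    uint64_t elapsed = now_usec - b->last_usec;
    uint64_t room = b->cap_ub - b->credit_ub;

    /* Refill; comparing against room first avoids overflowing elapsed * rate. */
    if (b->rate_Bps != 0 && elapsed > room / b->rate_Bps)
        b->credit_ub = b->cap_ub;
    else
        b->credit_ub += elapsed * b->rate_Bps;
    b->last_usec = now_usec;

    uint64_t need = len * 1000000;
    if (b->credit_ub < need)
        return false;
    b->credit_ub -= need;
    return true;
}
```

At the default 56 kbps (7000 bytes/s) with a 1400-byte burst, a full-size
packet can be sent immediately, but the next one has to wait 0.2 seconds.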
.TP
.IR pgm_transmitter_repair_rate_kbps " (integer; default value: 30)"
Default limit on the amount of repair data sent by a single transmitter.
.TP
.IR pgm_transmitter_nak_ignore_after_rdata_msec " (integer; default 50)"
Period during which receiver NAKs are ignored after repair data was sent
(usually set to match the maximum observed WAN delay). This avoids
sending useless additional repair data while a NAK or the corresponding
repair data is still in flight.
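One way to implement the NAK-ignore period is to remember when RDATA was last
sent for each sequence id and drop NAKs for that id until the period has
elapsed. The fixed-size table and all names below are hypothetical; a real
transmitter would tie this state to its transmit window.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RDATA_SLOTS 1024  /* hypothetical table size */

/* Hypothetical record of when RDATA was last sent for recent sequence ids. */
struct rdata_log {
    uint32_t seq[RDATA_SLOTS];
    uint64_t sent_msec[RDATA_SLOTS];
    bool used[RDATA_SLOTS];
};

static void rdata_log_init(struct rdata_log *tbl)
{
    memset(tbl, 0, sizeof(*tbl));
}

/* Remember that repair data for seq went out at now_msec. */
static void rdata_log_sent(struct rdata_log *tbl, uint32_t seq, uint64_t now_msec)
{
    size_t i = seq % RDATA_SLOTS;
    tbl->seq[i] = seq;
    tbl->sent_msec[i] = now_msec;
    tbl->used[i] = true;
}

/* True if a NAK for seq arrives within ignore_msec of the last RDATA for it. */
static bool nak_should_ignore(const struct rdata_log *tbl, uint32_t seq,
                              uint64_t now_msec, uint64_t ignore_msec)
{
    size_t i = seq % RDATA_SLOTS;
    return tbl->used[i] && tbl->seq[i] == seq &&
           now_msec - tbl->sent_msec[i] < ignore_msec;
}
```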
.TP
.IR pgm_crybaby_rate_kbps " (integer; default 20)"
Maximum rate of repair traffic to a single receiver. A single receiver may
be slow and unable to keep up, and may therefore continually ask for repairs (hence
.BR crybaby ).
This parameter limits the impact of the continual repair traffic caused by the crybaby.
It typically causes the crybaby to fall so far out of sync that the receiver eventually
has to give up, since the messages it needs repaired have already expired on the transmitter side.
Note that transmitters do not keep track of receivers; crybaby detection is an
opportunistic heuristic.
.TP
.IR pgm_fec_proactive_packets " (integer; default 0)"
The number of parity packets to insert in each sequence of
.B pgm_fec_group_size
packets. FEC (Forward Error Correction) is another means to reduce NAK
traffic in configurations with a large number of receivers. Receivers
(and network elements) will be able to reconstruct missed packets on their
own without resorting to NAKs. However, if too many packets are missed and
recovery is not possible, NAKs will still be sent.
.TP
.IR pgm_fec_group_size " (integer; default 16)"
Defines a unit of packets for which FEC parity packets are created.
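To show how parity lets a receiver rebuild a missed packet without sending a
NAK, the sketch below uses a single XOR parity packet per group. This is a
deliberate simplification: it repairs only one loss per group, whereas PGM
FEC is normally built on Reed-Solomon codes (OpenPGM, for example, uses them)
so that several parity packets can repair several losses. The group size,
packet length and function names are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FEC_GROUP 4  /* packets per group (illustrative; proposed default is 16) */
#define PKT_LEN   8  /* fixed payload length (illustrative) */

/* Hypothetical helper: parity packet = XOR of every packet in the group. */
static void fec_make_parity(uint8_t pkts[FEC_GROUP][PKT_LEN], uint8_t parity[PKT_LEN])
{
    memset(parity, 0, PKT_LEN);
    for (size_t i = 0; i < FEC_GROUP; i++)
        for (size_t j = 0; j < PKT_LEN; j++)
            parity[j] ^= pkts[i][j];
}

/* Hypothetical helper: rebuild the one missing packet at index lost as the
 * XOR of the parity packet and all surviving packets. */
static void fec_recover(uint8_t pkts[FEC_GROUP][PKT_LEN], size_t lost,
                        const uint8_t parity[PKT_LEN])
{
    memcpy(pkts[lost], parity, PKT_LEN);
    for (size_t i = 0; i < FEC_GROUP; i++) {
        if (i == lost)
            continue;
        for (size_t j = 0; j < PKT_LEN; j++)
            pkts[lost][j] ^= pkts[i][j];
    }
}
```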
.TP
.IR pgm_nak_retries " (integer; default 20)"
The number of recovery attempts to make for a single message before giving up.
.TP
.IR pgm_naks_per_sec " (integer; default 50)"
The maximum number of NAKs to send per second.
.TP
.IR pgm_debug " (integer; default 0)"
Enables diagnostics for PGM interaction on the network.
If set to 1, PGM logs all recovery activities.
If set to 2, PGM additionally logs SPMs, SPMRs, and connection setup and teardown.
If set to 3, PGM logs all activities to the syslog.

.SS "Socket Options"
To set or get a PGM socket option, call
.BR getsockopt (2)
to read or
.BR setsockopt (2)
to write the option with the option level argument set to
.BR IPPROTO_PGM .
.TP
.BR PGM_TRANSMITTER_CONFIG
This option is used to set up parameters for the transmitter before
connecting to a multicast address. The option cannot be used on a
connected SOCK_RDM socket. It is recommended to first get the
configuration data (which will contain the configured OS defaults) and
then modify individual fields as needed.
.sp
.in +4n
.nf
struct pgm_transmitter_config {
        int rate_kbyte;                         /* Maximum rate per second */
        int window_msecs;                       /* Window maximum packet age */
        int window_kbytes;                      /* Window maximum size in kbytes */
        int ambient_spm_msecs;                  /* Unconditional SPM */
        int spm_msecs[12];                      /* Idle SPM backoff */
        int repeat_nak_ignore_msecs;            /* How long to ignore NAKs after sending RDATA */
        int repair_rate_kbyte;                  /* Max permitted rate of repair traffic */
        int crybaby_rate_kbyte;                 /* Max rate of repair traffic to individual receiver */
        unsigned int transmit_only:1;           /* If set, do not process feedback from receivers */
        unsigned int fec:1;                     /* Enable forward error correction */
        unsigned int fec_parity:1;              /* Respond to parity repair packet requests */
        int fec_packets_per_group;              /* Maximum number of packets for a group */
        int fec_proactive_packets;              /* Number of proactive packets per group */
        int fec_group_size;                     /* Number of packets treated as a group (power of two) */
};
.fi
.in
.TP
.BR PGM_TRANSMITTER_STATISTICS
Retrieves transmitter statistics.
.sp
.in +4n
.nf
struct pgm_transmitter_stats {
        u64     bytes_received;
        u64     data_send;
        u64     naks_received;
        u64     naks_too_late;                  /* NAKs received after receive window advanced */
        u64     naks_outstanding;               /* Number of NAKs awaiting response */
        u64     naks_after_rdata;               /* Number of NAKs after RDATA sequences were sent which were ignored */
        u64     rdata_packets;                  /* Repair data */
        u64     odata_packets;                  /* Original data */
        u32     first_seqid;                    /* Oldest sequence id in window */
        u32     last_seqid;                     /* Newest sequence id in window */
};
.fi
.in
.TP
.BR PGM_RECEIVER_CONFIG
Used to set up receiver parameters before accepting a connection.
The option cannot be used on a connected SOCK_RDM socket.
.sp
.in +4n
.nf
struct pgm_receiver_config {
        int window_msecs;                       /* Receive window maximum age (per transmitter) */
        int window_kbyte;                       /* Receive window maximum size (per transmitter) */
        int nak_retries;                        /* NAK retries before giving up */
        int nak_ncf_retries;                    /* NAK retries after NCF before giving up */
        int nak_backoff_interval;               /* Time to back off on NAK failure */
        int naks_per_sec;                       /* Limit on NAKs per second */
        int peer_timeout;                       /* Discard peer if silent for this time period */
        int spmr_timeout;                       /* Abort connection if no SPMR response */
        unsigned int receive_only:1;            /* Never send data to sender */
};
.fi
.in
.TP
.BR PGM_RECEIVER_STATISTICS
Retrieves receiver statistics.
.sp
.in +4n
.nf
struct pgm_receiver_stats {
        u64     bytes_received;                 /* Total bytes received */
        u64     data_received;                  /* Useful data bytes received */
        u64     odata_packets;                  /* Number of ODATA (original) sequences */
        u64     rdata_packets;                  /* Number of RDATA (repair) sequences */
        u64     odata_duplicates;               /* Duplicate ODATA */
        u64     rdata_duplicates;               /* Duplicate RDATA */
        u32     first_seqid;                    /* First buffered sequence id (first transmitter) */
        u32     last_seqid;                     /* Last buffered sequence id (first transmitter) */
        u32     first_naked_seqid;              /* First sequence id that was NAKed */
        u64     pending_naks;                   /* Outstanding NAKs */
        u64     pending_ncfs;                   /* Outstanding NCFs */
        u64     naks_sent;
        u64     parity_naks_sent;
        u32     active_transmitters;            /* Number of transmitters */
};
.fi
.in
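Because PGM sequence numbers are 32-bit values that wrap around, comparing
first_seqid with last_seqid needs modulo 2^32 arithmetic rather than a plain
less-than. A minimal sketch with hypothetical helper names; the signed cast
relies on two's-complement conversion, which GCC and Clang provide and which
the kernel's own TCP sequence helpers also assume.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if a comes before b in modulo-2^32 sequence space. */
static bool pgm_seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Hypothetical helper: number of ids in [first, last], correct across wraparound. */
static uint32_t pgm_seq_window_len(uint32_t first, uint32_t last)
{
    return last - first + 1;
}
```

For example, a window running from 0xfffffffe to 1 holds four sequence ids
even though its last id is numerically smaller than its first.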
.SS Ioctls
These ioctls can be accessed using
.BR ioctl (2).
The correct syntax is:
.PP
.RS
.nf
.BI int " value";
.IB error " = ioctl(" pgm_socket ", " ioctl_type ", &" value ");"
.fi
.RE
.TP
.BR FIONREAD " (" SIOCINQ )
Gets a pointer to an integer as argument.
Returns the size of the next pending datagram in the integer in bytes,
or 0 when no datagram is pending.
.TP
.BR TIOCOUTQ " (" SIOCOUTQ )
Returns the number of data bytes in the local send queue.
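Since FIONREAD reports the size of the next datagram, a receiver can size its
buffer exactly before receiving. The helper names below are hypothetical, and
the sketch is exercised on an AF_UNIX datagram socket, which on Linux reports
the same next-datagram size. A result of 0 cannot distinguish an empty queue
from a zero-length datagram.

```c
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: size of the next pending datagram (0 if none), or -1. */
static int next_datagram_size(int fd)
{
    int value = 0;
    if (ioctl(fd, FIONREAD, &value) != 0)
        return -1;
    return value;
}

/* Hypothetical helper: allocate a buffer sized by FIONREAD and receive
 * exactly the next datagram into it. The caller frees *out. */
static ssize_t recv_sized(int fd, char **out)
{
    int size = next_datagram_size(fd);

    *out = NULL;
    if (size <= 0)
        return size;  /* -1 on error, 0 when nothing is pending */
    *out = malloc((size_t)size);
    if (*out == NULL)
        return -1;
    ssize_t n = recv(fd, *out, (size_t)size, MSG_DONTWAIT);
    if (n < 0) {
        free(*out);
        *out = NULL;
    }
    return n;
}
```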
.PP
In addition, all ioctls documented in
.BR ip (7)
and
.BR socket (7)
are supported.
.SH ERRORS
All errors documented for
.BR socket (7)
or
.BR ip (7)
may be returned by a send or receive on a PGM socket.
.TP
.B ECONNREFUSED
The socket was not associated with a multicast address. For a receiver
this may mean that no PGM traffic was detected on the given port.
Alternatively, the specified address may not be a valid multicast address.
.TP
.B ENOTCONN
Socket is not connected.
.TP
.B EISCONN
Socket is already connected.
.TP
.B ECONNABORTED
Receiver was not able to keep up. Connection was
torn down.
.\" .SH CREDITS
.\" This man page was written by Christoph Lameter.
.SH "SEE ALSO"
.BR ip (7),
.BR raw (7),
.BR socket (7),
.BR udp (7)

RFC\ 3208 for the Pragmatic General Multicast protocol.
.br
RFC\ 1122 for the host requirements.
.br
RFC\ 1191 for a description of path MTU discovery.
.SH COLOPHON
This page is part of release 3.xx of the Linux
.I man-pages
project.
A description of the project,
and information about reporting bugs,
can be found at
http://www.kernel.org/doc/man-pages/.

* Re: Add PGM protocol support to the IP stack
  2010-03-26 17:33               ` Christoph Lameter
@ 2010-03-27 13:11                 ` Andi Kleen
  2010-03-27 16:54                   ` Martin Sustrik
  2010-03-29 15:00                   ` Christoph Lameter
  0 siblings, 2 replies; 21+ messages in thread
From: Andi Kleen @ 2010-03-27 13:11 UTC
  To: Christoph Lameter; +Cc: Andi Kleen, David Miller, netdev, linux-kernel

On Fri, Mar 26, 2010 at 12:33:07PM -0500, Christoph Lameter wrote:
> Here is a pgm.7 manpage describing how the socket API could look like for
> a PGM implementation.
> 
> I dumped the RM_* based socket options from the other OS since most of the
> options were unusable.

I did a quick read and the manpage/interface seem reasonable to me.

You changed the parameter struct fields to lower case. While
that looks definitely more Linuxy than before does it mean programs
have to #ifdef this? It might be good idea to have at least some
optional compat header that #defines.

-Andi

* Re: Add PGM protocol support to the IP stack
  2010-03-27 13:11                 ` Andi Kleen
@ 2010-03-27 16:54                   ` Martin Sustrik
  2010-03-29 14:50                     ` Christoph Lameter
  2010-03-29 15:00                   ` Christoph Lameter
  1 sibling, 1 reply; 21+ messages in thread
From: Martin Sustrik @ 2010-03-27 16:54 UTC
  To: Andi Kleen; +Cc: Christoph Lameter, David Miller, netdev, linux-kernel

Andi Kleen wrote:

> I did a quick read and the manpage/interface seem reasonable to me.

You may also have a look at original PGM implementation by Luigi Rizzo 
(FreeBSD). It's not maintained, but it might give you broader view.

http://info.iet.unipi.it/~luigi/pgm-code/

Martin

* Re: Add PGM protocol support to the IP stack
  2010-03-27 16:54                   ` Martin Sustrik
@ 2010-03-29 14:50                     ` Christoph Lameter
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Lameter @ 2010-03-29 14:50 UTC
  To: Martin Sustrik; +Cc: Andi Kleen, David Miller, netdev, linux-kernel

On Sat, 27 Mar 2010, Martin Sustrik wrote:

> Andi Kleen wrote:
>
> > I did a quick read and the manpage/interface seem reasonable to me.
>
> You may also have a look at original PGM implementation by Luigi Rizzo
> (FreeBSD). It's not maintained, but it might give you broader view.
>
> http://info.iet.unipi.it/~luigi/pgm-code/

Interesting. Which files in that directory contain the most current code?

Looks like the tcpdump patch has been merged.

Here is another tcpdump patch that implements decoding PGM via UDP. Anyone
know how to submit something like that?

(Need to specify -Tpgm option to use pgm decoder on UDP traffic)

Index: tcpdump/interface.h
===================================================================
--- tcpdump.orig/interface.h	2010-02-26 18:50:39.411609391 -0600
+++ tcpdump/interface.h	2010-02-26 18:51:04.270350179 -0600
@@ -74,6 +74,7 @@
 #define PT_CNFP		7	/* Cisco NetFlow protocol */
 #define PT_TFTP		8	/* trivial file transfer protocol */
 #define PT_AODV		9	/* Ad-hoc On-demand Distance Vector Protocol */
+#define PT_PGM		10	/* The PGM protocol */

 #ifndef min
 #define min(a,b) ((a)>(b)?(b):(a))
Index: tcpdump/print-udp.c
===================================================================
--- tcpdump.orig/print-udp.c	2010-02-26 18:51:35.921610552 -0600
+++ tcpdump/print-udp.c	2010-02-26 18:53:54.440349950 -0600
@@ -520,6 +520,11 @@
 			tftp_print(cp, length);
 			break;

+		case PT_PGM:
+			udpipaddr_print(ip, sport, dport);
+			pgm_print(cp, length, (const u_char *)ip);
+			break;
+
 		case PT_AODV:
 			udpipaddr_print(ip, sport, dport);
 			aodv_print((const u_char *)(up + 1), length,
Index: tcpdump/tcpdump.c
===================================================================
--- tcpdump.orig/tcpdump.c	2010-02-26 18:37:13.971601597 -0600
+++ tcpdump/tcpdump.c	2010-02-26 18:37:43.290033748 -0600
@@ -854,6 +854,8 @@
 				packettype = PT_TFTP;
 			else if (strcasecmp(optarg, "aodv") == 0)
 				packettype = PT_AODV;
+			else if (strcasecmp(optarg, "pgm") == 0)
+				packettype = PT_PGM;
 			else
 				error("unknown packet type `%s'", optarg);
 			break;

* Re: Add PGM protocol support to the IP stack
  2010-03-27 13:11                 ` Andi Kleen
  2010-03-27 16:54                   ` Martin Sustrik
@ 2010-03-29 15:00                   ` Christoph Lameter
  2010-03-29 21:43                     ` Andi Kleen
  1 sibling, 1 reply; 21+ messages in thread
From: Christoph Lameter @ 2010-03-29 15:00 UTC
  To: Andi Kleen; +Cc: David Miller, netdev, linux-kernel

On Sat, 27 Mar 2010, Andi Kleen wrote:

> On Fri, Mar 26, 2010 at 12:33:07PM -0500, Christoph Lameter wrote:
> > Here is a pgm.7 manpage describing how the socket API could look like for
> > a PGM implementation.
> >
> > I dumped the RM_* based socket options from the other OS since most of the
> > options were unusable.
>
> I did a quick read and the manpage/interface seem reasonable to me.

Thanks. I will then proceed to get a patch out that implements the
network environment. Then we can plug the openpgm logic in there.

> You changed the parameter struct fields to lower case. While
> that looks definitely more Linuxy than before does it mean programs
> have to #ifdef this? It might be good idea to have at least some
> optional compat header that #defines.

The socket API will be completely different. The basic handling of the
sockets is the same (binding, listening, connecting). There is no way of
mapping M$ socket options to Linux socket options with the approach that
I proposed in the manpage. The stats structure is different too since some
key elements were missing.

What users are there of the M$ api? I have seen vendors supplying their
own pgm implementation (guess due to bit rot in the old M$
implementation).

* Re: Add PGM protocol support to the IP stack
  2010-03-29 15:00                   ` Christoph Lameter
@ 2010-03-29 21:43                     ` Andi Kleen
  0 siblings, 0 replies; 21+ messages in thread
From: Andi Kleen @ 2010-03-29 21:43 UTC
  To: Christoph Lameter; +Cc: Andi Kleen, David Miller, netdev, linux-kernel

On Mon, Mar 29, 2010 at 10:00:57AM -0500, Christoph Lameter wrote:
> On Sat, 27 Mar 2010, Andi Kleen wrote:
> 
> > On Fri, Mar 26, 2010 at 12:33:07PM -0500, Christoph Lameter wrote:
> > > Here is a pgm.7 manpage describing how the socket API could look like for
> > > a PGM implementation.
> > >
> > > I dumped the RM_* based socket options from the other OS since most of the
> > > options were unusable.
> >
> > I did a quick read and the manpage/interface seem reasonable to me.
> 
> Thanks. I will then proceed to get a patch out that implements the
> network environment. Then we can plug the openpgm logic in there.

You might still need some reviewing from network maintainers.

> 
> > You changed the parameter struct fields to lower case. While
> > that looks definitely more Linuxy than before does it mean programs
> > have to #ifdef this? It might be good idea to have at least some
> > optional compat header that #defines.
> 
> The socket API will be completely different. The basic handling of the
> sockets is the same (binding, listening, connecting). There is no way of
> mapping M$ socket options to Linux socket options with the approach that
> I proposed in the manpage. The stats structure is different too since some
> key elements were missing.

Ok.

> 
> What users are there of the M$ api? I have seen vendors supplying their
> own pgm implementation (guess due to bit rot in the old M$
> implementation).

I don't know, it was just a general consideration.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

* Re: Add PGM protocol support to the IP stack
  2010-03-22 18:53             ` Andi Kleen
  2010-03-22 19:32               ` Christoph Lameter
  2010-03-26 17:33               ` Christoph Lameter
@ 2010-03-29 23:01               ` H. Peter Anvin
  2010-03-30 18:12                 ` Christoph Lameter
  2 siblings, 1 reply; 21+ messages in thread
From: H. Peter Anvin @ 2010-03-29 23:01 UTC
  To: Andi Kleen; +Cc: Christoph Lameter, David Miller, netdev, linux-kernel

On 03/22/2010 11:53 AM, Andi Kleen wrote:
> 
> There's just no portable atomic64_t. Ok maybe you can use the socket lock
> to synchronize all the counts if they are only per socket.
> 

In 2.6.34 there is (although some arches which could support it natively
don't as of yet... but that's fixable.)  See lib/atomic64.c.

	-hpa

* Re: Add PGM protocol support to the IP stack
  2010-03-29 23:01               ` H. Peter Anvin
@ 2010-03-30 18:12                 ` Christoph Lameter
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Lameter @ 2010-03-30 18:12 UTC
  To: H. Peter Anvin; +Cc: Andi Kleen, David Miller, netdev, linux-kernel

On Mon, 29 Mar 2010, H. Peter Anvin wrote:

> On 03/22/2010 11:53 AM, Andi Kleen wrote:
> >
> > There's just no portable atomic64_t. Ok maybe you can use the socket lock
> > to synchronize all the counts if they are only per socket.
> >
>
> In 2.6.34 there is (although some arches which could support it natively
> don't as of yet... but that's fixable.)  See lib/atomic64.c.

There are also the 64-bit this_cpu operations that were merged in 2.6.33.
They do the right thing if the arch does not provide the operations itself.

Thread overview: 21+ messages
2010-03-18 17:58 Add PGM protocol support to the IP stack Christoph Lameter
2010-03-18 21:58 ` Christoph Lameter
2010-03-19 17:18 ` Andi Kleen
2010-03-19 21:53   ` David Miller
2010-03-19 22:26     ` H. Peter Anvin
2010-03-22 14:24       ` Christoph Lameter
2010-03-22 14:20   ` Christoph Lameter
2010-03-22 16:36     ` Andi Kleen
2010-03-22 16:51       ` Christoph Lameter
2010-03-22 17:43         ` Andi Kleen
2010-03-22 18:07           ` Christoph Lameter
2010-03-22 18:53             ` Andi Kleen
2010-03-22 19:32               ` Christoph Lameter
2010-03-26 17:33               ` Christoph Lameter
2010-03-27 13:11                 ` Andi Kleen
2010-03-27 16:54                   ` Martin Sustrik
2010-03-29 14:50                     ` Christoph Lameter
2010-03-29 15:00                   ` Christoph Lameter
2010-03-29 21:43                     ` Andi Kleen
2010-03-29 23:01               ` H. Peter Anvin
2010-03-30 18:12                 ` Christoph Lameter
