Netdev List
 help / color / mirror / Atom feed
* Realtek 8139 Flow Control?
From: Sven Anders @ 2011-07-25 19:42 UTC (permalink / raw)
  To: netdev; +Cc: becker, jgarzik

[-- Attachment #1: Type: text/plain, Size: 946 bytes --]

Hello!

I have just a short question.

We have appliances with Realtek 8139C/8139C+ (rev 10) chipsets (with the
PCI IDs: 10ec:8139). We are using the 8139too driver (version: 0.9.28)
According the datasheet the chipset supports Flow Control (IEEE 802.3x).


I want to know, if the support for enabling flow control is missing in
the driver by purpose or is only not implemented due to missing time?


Regards
 Sven

-- 
 Sven Anders <anders@anduras.de>                 () UTF-8 Ribbon Campaign
                                                 /\ Support plain text e-mail
 ANDURAS intranet security AG
 Messestraße 3 - 94036 Passau - Germany
 Web: www.anduras.de - Tel: +49 (0)851-4 90 50-0 - Fax: +49 (0)851-4 90 50-55

Rechtsform: Aktiengesellschaft - Sitz: Passau - Amtsgericht: Passau HRB 6032
Mitglieder des Vorstands: Dipl.-Inf. Sven Anders, Dipl.-Inf. Marcus Junker
Vorsitzender des Aufsichtsrats: RA Mark Peters

[-- Attachment #2: anders.vcf --]
[-- Type: text/x-vcard, Size: 339 bytes --]

begin:vcard
fn:Sven Anders
n:Anders;Sven
org:ANDURAS AG;Research and Development
adr;quoted-printable:;;Messestra=C3=9Fe 3;Passau;Bavaria;94036;Germany
email;internet:anders@anduras.de
title:Dipl. Inf.
tel;work:++49 (0)851 / 490 50 -0
tel;fax:++49 (0)851 / 590 50 - 55
x-mozilla-html:FALSE
url:http://www.anduras.de
version:2.1
end:vcard


^ permalink raw reply

* [PATCH 2/2] net: Audit drivers to identify those needing IFF_TX_SKB_SHARING cleared
From: Neil Horman @ 2011-07-25 19:45 UTC (permalink / raw)
  To: netdev
  Cc: Neil Horman, Karsten Keil, David S. Miller, Jay Vosburgh,
	Andy Gospodarek, Patrick McHardy, Krzysztof Halasa,
	John W. Linville, Greg Kroah-Hartman, Marcel Holtmann,
	Johannes Berg
In-Reply-To: <1311623120-26839-1-git-send-email-nhorman@tuxdriver.com>

After the last patch, We are left in a state in which only drivers calling
ether_setup have IFF_TX_SKB_SHARING set (we assume that drivers touching real
hardware call ether_setup for their net_devices and don't hold any state in
their skbs.  There are a handful of drivers that violate this assumption of
course, and need to be fixed up.  This patch identifies those drivers, and marks
them as not being able to support the safe transmission of skbs by clearning the
IFF_TX_SKB_SHARING flag in priv_flags

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Karsten Keil <isdn@linux-pingi.de>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Patrick McHardy <kaber@trash.net>
CC: Krzysztof Halasa <khc@pm.waw.pl>
CC: "John W. Linville" <linville@tuxdriver.com>
CC: Greg Kroah-Hartman <gregkh@suse.de>
CC: Marcel Holtmann <marcel@holtmann.org>
CC: Johannes Berg <johannes@sipsolutions.net>
---
 drivers/isdn/i4l/isdn_net.c                  |    3 +++
 drivers/net/bonding/bond_main.c              |    6 ++++--
 drivers/net/ifb.c                            |    2 +-
 drivers/net/macvlan.c                        |    2 +-
 drivers/net/tun.c                            |    1 +
 drivers/net/veth.c                           |    2 ++
 drivers/net/wan/hdlc_fr.c                    |    5 +++--
 drivers/net/wireless/airo.c                  |    1 +
 drivers/net/wireless/hostap/hostap_main.c    |    1 +
 drivers/staging/ath6kl/os/linux/ar6000_drv.c |    1 +
 net/8021q/vlan_dev.c                         |    2 +-
 net/bluetooth/bnep/netdev.c                  |    1 +
 net/l2tp/l2tp_eth.c                          |    2 +-
 net/mac80211/iface.c                         |    1 +
 14 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/drivers/isdn/i4l/isdn_net.c b/drivers/isdn/i4l/isdn_net.c
index 48e9cc0..1f73d7f 100644
--- a/drivers/isdn/i4l/isdn_net.c
+++ b/drivers/isdn/i4l/isdn_net.c
@@ -2532,6 +2532,9 @@ static void _isdn_setup(struct net_device *dev)
 
 	/* Setup the generic properties */
 	dev->flags = IFF_NOARP|IFF_POINTOPOINT;
+
+	/* isdn prepends a header in the tx path, can't share skbs */
+	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 	dev->header_ops = NULL;
 	dev->netdev_ops = &isdn_netdev_ops;
 
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 02842d0..df21e84 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1557,8 +1557,10 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 
 			if (slave_dev->type != ARPHRD_ETHER)
 				bond_setup_by_slave(bond_dev, slave_dev);
-			else
+			else {
 				ether_setup(bond_dev);
+				bond_dev->priv_flags &= ~IFF_TX_SKB_SHARING;
+			}
 
 			netdev_bonding_change(bond_dev,
 					      NETDEV_POST_TYPE_CHANGE);
@@ -4330,7 +4332,7 @@ static void bond_setup(struct net_device *bond_dev)
 	bond_dev->tx_queue_len = 0;
 	bond_dev->flags |= IFF_MASTER|IFF_MULTICAST;
 	bond_dev->priv_flags |= IFF_BONDING;
-	bond_dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
+	bond_dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE | IFF_TX_SKB_SHARING);
 
 	/* At first, we block adding VLANs. That's the only way to
 	 * prevent problems that occur when adding VLANs over an
diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 6e82dd3..46b5f5f 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -183,7 +183,7 @@ static void ifb_setup(struct net_device *dev)
 
 	dev->flags |= IFF_NOARP;
 	dev->flags &= ~IFF_MULTICAST;
-	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
+	dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE | IFF_TX_SKB_SHARING);
 	random_ether_addr(dev->dev_addr);
 }
 
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index ba631fc..05172c3 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -572,7 +572,7 @@ void macvlan_common_setup(struct net_device *dev)
 {
 	ether_setup(dev);
 
-	dev->priv_flags	       &= ~IFF_XMIT_DST_RELEASE;
+	dev->priv_flags	       &= ~(IFF_XMIT_DST_RELEASE | IFF_TX_SKB_SHARING);
 	dev->netdev_ops		= &macvlan_netdev_ops;
 	dev->destructor		= free_netdev;
 	dev->header_ops		= &macvlan_hard_header_ops,
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 9a6b382..71f3d1a 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -528,6 +528,7 @@ static void tun_net_init(struct net_device *dev)
 		dev->netdev_ops = &tap_netdev_ops;
 		/* Ethernet TAP Device */
 		ether_setup(dev);
+		dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 
 		random_ether_addr(dev->dev_addr);
 
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 7f78db7..5b23767 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -263,6 +263,8 @@ static void veth_setup(struct net_device *dev)
 {
 	ether_setup(dev);
 
+	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
+
 	dev->netdev_ops = &veth_netdev_ops;
 	dev->ethtool_ops = &veth_ethtool_ops;
 	dev->features |= NETIF_F_LLTX;
diff --git a/drivers/net/wan/hdlc_fr.c b/drivers/net/wan/hdlc_fr.c
index b25c922..eb20281 100644
--- a/drivers/net/wan/hdlc_fr.c
+++ b/drivers/net/wan/hdlc_fr.c
@@ -1074,9 +1074,10 @@ static int fr_add_pvc(struct net_device *frad, unsigned int dlci, int type)
 
 	used = pvc_is_used(pvc);
 
-	if (type == ARPHRD_ETHER)
+	if (type == ARPHRD_ETHER) {
 		dev = alloc_netdev(0, "pvceth%d", ether_setup);
-	else
+		dev->priv_flags &= ~IFF_TX_SKB_SHARING;
+	} else
 		dev = alloc_netdev(0, "pvc%d", pvc_setup);
 
 	if (!dev) {
diff --git a/drivers/net/wireless/airo.c b/drivers/net/wireless/airo.c
index 55cf71f..e1b3e3c 100644
--- a/drivers/net/wireless/airo.c
+++ b/drivers/net/wireless/airo.c
@@ -2823,6 +2823,7 @@ static struct net_device *_init_airo_card( unsigned short irq, int port,
 	dev->wireless_data = &ai->wireless_data;
 	dev->irq = irq;
 	dev->base_addr = port;
+	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 
 	SET_NETDEV_DEV(dev, dmdev);
 
diff --git a/drivers/net/wireless/hostap/hostap_main.c b/drivers/net/wireless/hostap/hostap_main.c
index d508482..89a116f 100644
--- a/drivers/net/wireless/hostap/hostap_main.c
+++ b/drivers/net/wireless/hostap/hostap_main.c
@@ -855,6 +855,7 @@ void hostap_setup_dev(struct net_device *dev, local_info_t *local,
 
 	iface = netdev_priv(dev);
 	ether_setup(dev);
+	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 
 	/* kernel callbacks */
 	if (iface) {
diff --git a/drivers/staging/ath6kl/os/linux/ar6000_drv.c b/drivers/staging/ath6kl/os/linux/ar6000_drv.c
index 48dd9e3..8ff5289 100644
--- a/drivers/staging/ath6kl/os/linux/ar6000_drv.c
+++ b/drivers/staging/ath6kl/os/linux/ar6000_drv.c
@@ -6179,6 +6179,7 @@ int ar6000_create_ap_interface(struct ar6_softc *ar, char *ap_ifname)
     
     ether_setup(dev);
     init_netdev(dev, ap_ifname);
+    dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 
     if (register_netdev(dev)) {
         AR_DEBUG_PRINTF(ATH_DEBUG_ERR,("ar6000_create_ap_interface: register_netdev failed\n"));
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 934e221..9d40a07 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -695,7 +695,7 @@ void vlan_setup(struct net_device *dev)
 	ether_setup(dev);
 
 	dev->priv_flags		|= IFF_802_1Q_VLAN;
-	dev->priv_flags		&= ~IFF_XMIT_DST_RELEASE;
+	dev->priv_flags		&= ~(IFF_XMIT_DST_RELEASE | IFF_TX_SKB_SHARING);
 	dev->tx_queue_len	= 0;
 
 	dev->netdev_ops		= &vlan_netdev_ops;
diff --git a/net/bluetooth/bnep/netdev.c b/net/bluetooth/bnep/netdev.c
index 8c100c9..d4f5dff 100644
--- a/net/bluetooth/bnep/netdev.c
+++ b/net/bluetooth/bnep/netdev.c
@@ -231,6 +231,7 @@ void bnep_net_setup(struct net_device *dev)
 	dev->addr_len = ETH_ALEN;
 
 	ether_setup(dev);
+	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 	dev->netdev_ops = &bnep_netdev_ops;
 
 	dev->watchdog_timeo  = HZ * 2;
diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
index a8193f5..d2726a7 100644
--- a/net/l2tp/l2tp_eth.c
+++ b/net/l2tp/l2tp_eth.c
@@ -103,7 +103,7 @@ static struct net_device_ops l2tp_eth_netdev_ops = {
 static void l2tp_eth_dev_setup(struct net_device *dev)
 {
 	ether_setup(dev);
-
+	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 	dev->netdev_ops		= &l2tp_eth_netdev_ops;
 	dev->destructor		= free_netdev;
 }
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index cd5fb40..556e7e6 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -698,6 +698,7 @@ static const struct net_device_ops ieee80211_monitorif_ops = {
 static void ieee80211_if_setup(struct net_device *dev)
 {
 	ether_setup(dev);
+	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 	dev->netdev_ops = &ieee80211_dataif_ops;
 	dev->destructor = free_netdev;
 }
-- 
1.7.6


^ permalink raw reply related

* [PATCH 1/2] net: add IFF_SKB_TX_SHARED flag to priv_flags
From: Neil Horman @ 2011-07-25 19:45 UTC (permalink / raw)
  To: netdev
  Cc: Neil Horman, Robert Olsson, Eric Dumazet, Alexey Dobriyan,
	David S. Miller
In-Reply-To: <1311623120-26839-1-git-send-email-nhorman@tuxdriver.com>

Pktgen attempts to transmit shared skbs to net devices, which can't be used by
some drivers as they keep state information in skbs.  This patch adds a flag
marking drivers as being able to handle shared skbs in their tx path.  Drivers
are defaulted to being unable to do so, but calling ether_setup enables this
flag, as 90% of the drivers calling ether_setup touch real hardware and can
handle shared skbs.  A subsequent patch will audit drivers to ensure that the
flag is set properly

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Jiri Pirko <jpirko@redhat.com>
CC: Robert Olsson <robert.olsson@its.uu.se>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: David S. Miller <davem@davemloft.net>
---
 include/linux/if.h |    2 ++
 net/core/pktgen.c  |    8 +++++---
 net/ethernet/eth.c |    1 +
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/if.h b/include/linux/if.h
index 3bc63e6..03489ca 100644
--- a/include/linux/if.h
+++ b/include/linux/if.h
@@ -76,6 +76,8 @@
 #define IFF_BRIDGE_PORT	0x4000		/* device used as bridge port */
 #define IFF_OVS_DATAPATH	0x8000	/* device used as Open vSwitch
 					 * datapath port */
+#define IFF_TX_SKB_SHARING	0x10000	/* The interface supports sharing
+					 * skbs on transmit */
 
 #define IF_GET_IFACE	0x0001		/* for querying only */
 #define IF_GET_PROTO	0x0002
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index f76079c..53f3f15 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -1070,7 +1070,9 @@ static ssize_t pktgen_if_write(struct file *file,
 		len = num_arg(&user_buffer[i], 10, &value);
 		if (len < 0)
 			return len;
-
+		if ((len > 0) &&
+		    (!(pkt_dev->odev->priv_flags & IFF_TX_SKB_SHARING)))
+			return -ENOTSUPP;
 		i += len;
 		pkt_dev->clone_skb = value;
 
@@ -3555,7 +3557,6 @@ static int pktgen_add_device(struct pktgen_thread *t, const char *ifname)
 	pkt_dev->min_pkt_size = ETH_ZLEN;
 	pkt_dev->max_pkt_size = ETH_ZLEN;
 	pkt_dev->nfrags = 0;
-	pkt_dev->clone_skb = pg_clone_skb_d;
 	pkt_dev->delay = pg_delay_d;
 	pkt_dev->count = pg_count_d;
 	pkt_dev->sofar = 0;
@@ -3563,7 +3564,6 @@ static int pktgen_add_device(struct pktgen_thread *t, const char *ifname)
 	pkt_dev->udp_src_max = 9;
 	pkt_dev->udp_dst_min = 9;
 	pkt_dev->udp_dst_max = 9;
-
 	pkt_dev->vlan_p = 0;
 	pkt_dev->vlan_cfi = 0;
 	pkt_dev->vlan_id = 0xffff;
@@ -3575,6 +3575,8 @@ static int pktgen_add_device(struct pktgen_thread *t, const char *ifname)
 	err = pktgen_setup_dev(pkt_dev, ifname);
 	if (err)
 		goto out1;
+	if (pkt_dev->odev->priv_flags & IFF_TX_SKB_SHARING)
+		pkt_dev->clone_skb = pg_clone_skb_d;
 
 	pkt_dev->entry = proc_create_data(ifname, 0600, pg_proc_dir,
 					  &pktgen_if_fops, pkt_dev);
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 5cffb63..4866330 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -339,6 +339,7 @@ void ether_setup(struct net_device *dev)
 	dev->addr_len		= ETH_ALEN;
 	dev->tx_queue_len	= 1000;	/* Ethernet wants good queues */
 	dev->flags		= IFF_BROADCAST|IFF_MULTICAST;
+	dev->priv_flags		= IFF_TX_SKB_SHARING;
 
 	memset(dev->broadcast, 0xFF, ETH_ALEN);
 
-- 
1.7.6


^ permalink raw reply related

* [PATCH 0/2] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods (v2)
From: Neil Horman @ 2011-07-25 19:45 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1311105179-26408-1-git-send-email-nhorman@tuxdriver.com>

Ok, after considering all your comments, Dave suggested this as an alternate
approach:

1) We create a new priv_flag, IFF_SKB_TX_SHARED, to identify drivers capable of
handling shared skbs.  Default is to not set this flag

2) Modify ether_setup to enable this flag, under the assumption that any driver
calling this  function is initalizing a real ethernet device and as such can
handle shared skbs since they don't tend to store state in the skb struct.
Pktgen can then query this flag when a user script attempts to issue the
clone_skb command and decide if it is to be alowed or not.

3) Audit the network drivers calling ether_setup to identify any code doing so
that can't actualy handle shared skbs and manually disable the new flag.  There
are about 10 drivers in this category.

Thoughts/reviews aprpeciated.
Neil
 

^ permalink raw reply

* Re: [lm-sensors] [PATCH 01/34] System Firmware Interface
From: Jean Delvare @ 2011-07-25 19:03 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-ia64-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	platform-driver-x86-u79uwXL29TY76Z2rM5mHXA,
	grant.likely-s3s/WqlpOiPyB63q8FvJNQ,
	linux-ide-u79uwXL29TY76Z2rM5mHXA,
	linux-i2c-u79uwXL29TY76Z2rM5mHXA,
	device-drivers-devel-ZG0+EudsQA8dtHy/vicBwGD2FQJk+8+b,
	abelay-3s7WtUTddSA, eric.piel-VkQ1JFuSMpfAbQlEx87xDw,
	x86-DgEjT+Ai2ygdnm+yROfE0A, lm-sensors-GZX6beZjE8VD60Wz+7aTrA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-input-u79uwXL29TY76Z2rM5mHXA,
	linux-media-u79uwXL29TY76Z2rM5mHXA,
	johnpol-9fLWQ3dKdXwox3rIn2DAYQ,
	linux-watchdog-u79uwXL29TY76Z2rM5mHXA,
	rtc-linux-/JYPxA39Uh5TLH3MbocFFw, dz-8fiUuRrzOP0dnm+yROfE0A,
	openipmi-developer-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	evel-gWbeCf7V1WCQmaza687I9mD2FQJk+8+b,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	rpurdie-Fm38FmjxZ/leoWH0uzbU5w,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1310994528-26276-2-git-send-email-prarit-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Mon, 18 Jul 2011 09:08:15 -0400, Prarit Bhargava wrote:
> This patch introduces a general System Firmware interface to the kernel, called
> sysfw.
> 
> Inlcluded in this interface is the ability to search a standard set of fields,
> sysfw_lookup().  The fields are currently based upon the x86 and ia64 SMBIOS
> fields but exapandable to fields that other arches may introduce.  Also
> included is  the ability to search and match against those fields, and run
> a callback function against the matches, sysfw_callback().
> 
> Modify module code to use sysfw instead of old DMI interface.

This is a HUGE patch set. You'd need to have a good reason for such a
big and intrusive change, yet I see no such reason explained. I
understand that we _can_ abstract system information interfaces, but
just because we can doesn't mean we have to. I would at least wait for
a second DMI-like interface to be widely implemented and support before
any attempt to abstract, otherwise your design is bound to be missing
the target. And even then, you'd still need to convince me that there
is a need for a unified interface to access both backends at once. I
would guess that you know what backend is present on a system when you
try to identify it.

At this point, I see the work needed to review your patches, the risk
of regressions due to the large size of the patch set, but I don't see
any immediate benefit. Thus I am not going to look into it at all,
sorry.

-- 
Jean Delvare

^ permalink raw reply

* [PATCH] net: fix eth.c kernel-doc warning
From: Randy Dunlap @ 2011-07-25 18:41 UTC (permalink / raw)
  To: netdev; +Cc: davem

From: Randy Dunlap <rdunlap@xenotime.net>

Fix new kernel-doc warning in eth.c:

Warning(net/ethernet/eth.c:237): No description found for parameter 'type'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
---
 net/ethernet/eth.c |    1 +
 1 file changed, 1 insertion(+)

--- linux-3.0-git4.orig/net/ethernet/eth.c
+++ linux-3.0-git4/net/ethernet/eth.c
@@ -231,6 +231,7 @@ EXPORT_SYMBOL(eth_header_parse);
  * eth_header_cache - fill cache entry from neighbour
  * @neigh: source neighbour
  * @hh: destination cache entry
+ * @type: Ethernet type field
  * Create an Ethernet header template from the neighbour.
  */
 int eth_header_cache(const struct neighbour *neigh, struct hh_cache *hh, __be16 type)

^ permalink raw reply

* Re: write() udp socket
From: Rick Jones @ 2011-07-25 17:38 UTC (permalink / raw)
  To: Huajun Li; +Cc: ZHOU Xiaobo, netdev
In-Reply-To: <CA+v9cxZAF27mXc6aCEPyOdqbeLved3_xibvDS4TQugHR=7owPg@mail.gmail.com>

On 07/24/2011 01:33 AM, Huajun Li wrote:
> 2011/7/23 ZHOU Xiaobo<xb.zhou@qq.com>:
>> question No1:
>> When I call
>> ssize_t write(int fd, const void *buf, size_t count);
>>
>>
>> on a nonblocking UDP socket, is the return value  always equal to 'count'?
>>
>>
>
> I don't think so.  The function may be interrupt by signal or return
> due to other reason, so the return value only represents the size it
> writes successfully to the fd.

I believe it should either appaear to succeed or fail.  write() best not 
be sending partial UDP datagrams.  That would be "bad."

^ permalink raw reply

* Attention
From: MONEYGRAM CENTER @ 2011-07-25 17:18 UTC (permalink / raw)





My associate has helped me to send your first payment of $5000 USD to you
as instructed by the Malaysian Government and Mr. David Cameron the United
Kingdom prime minister after the last G20 meeting that was held in
Malaysia, making you one of the beneficiaries.Here is the information
below.

MONEY TRANSFER REFERENCE: 2116-3297
SENDER'S NAME: Patrick Lee Chun
AMOUNT: US$5000

We will keep sending you $5000 USD twice a week until the FULL payment of
($820000.00 United State Dollars)is completed.

A certificate will be made to change the Receivers Name to your name as
stated by the Malaysian Government,reconfirm your {1}Full Names {2}address
{3}Mobile Number

via Email to: center.moneygram@yahoo.com.my to proceed.

Note: You cannot pickup the money until the certificate is obtained by you.

Best Regards

Money Gram Agent
Mr. Allan Davis.
Tel: +(60)1-6240-6207
For more info: www.g20.org

I wrote to know if this is your valid email. Please, let me know if this
email is valid.I have an information to pass to you.
My valid Email is: center.moneygram@yahoo.com.my

Money Gram Agent
Mr. Allan Davis.
Tel: +(60)1-6240-6207
For more info: www.g20.org


^ permalink raw reply

* Re: [PATCH] net: Fix security_socket_sendmsg() bypass problem.
From: Casey Schaufler @ 2011-07-25 17:00 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: anton, mjt, davem, netdev, linux-security-module, Casey Schaufler
In-Reply-To: <201107260143.CGH18263.FOOSVMOQFJFLHt@I-love.SAKURA.ne.jp>

On 7/25/2011 9:43 AM, Tetsuo Handa wrote:
> Casey Schaufler wrote:
>>> OK. Then, the question is how to reduce performance loss by redundant
>>> security_socket_sendmsg() calls.
>> Not to be splitting hairs, but if the packets are headed to
>> different destinations the calls to security_socket_sendmsg()
>> are not redundant, they are necessary and appropriate. What
>> you have with sendmmsg() is an optimization that sacrifices
>> correctness for performance.
> Excuse me, but this thread is not trying to remove necessary and appropriate
> security_socket_sendmsg() calls. Linux 3.0 was released without necessary and
> appropriate security_socket_sendmsg() calls, and I'm trying to correct it (via
> msg11504.html or msg11510.html) for Linux 3.0.x stable release.

I understand. Sorry if I did a poor job of jumping into
the thread.

>> I fear that you are going to find that the work you have
>> to do to reduce the number of calls is going to outweigh
>> the benefits of your optimization, as has been pointed out
>> earlier.
> I fear it too. Unless many dozens (maybe some hundreds) of packets are sent by
> sendmmsg(), msg11504.html might show better performance than msg11510.html .
> But I don't have a machine to benchmark.

Is there some chance that the original authors could step up
to help with the benchmarking effort on this repair? Having been
on the end where I introduced problems more than once, I have a
good understanding of the principle "you broke it, you bought it".


^ permalink raw reply

* Re: IPv6: autoconfiguration and suspend/resume or link down/up
From: Stephen Hemminger @ 2011-07-25 16:55 UTC (permalink / raw)
  To: Herbert Xu, Stephen Hemminger
  Cc: Nicolas de Pesloüan, David Miller, jbohac, netdev

who manages link wit sriov? I assume it is up to the guest. And it is not really safe to assume that network is the same after migration. It makes sense to do DAD again.

Herbert Xu <herbert@gondor.apana.org.au> wrote:

>On Sun, Jul 24, 2011 at 08:26:20PM -0700, Stephen Hemminger wrote:
>>
>> Since virtual machines should be using virtio network devices, shouldn't
>> the suspend/resume in that device just work. It doesn't need to drop the link.
>
>The VM may also be using SRIOV.
>
>Cheers,
>-- 
>Email: Herbert Xu <herbert@gondor.apana.org.au>
>Home Page: http://gondor.apana.org.au/~herbert/
>PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: IPv6: autoconfiguration and suspend/resume or link down/up
From: Stephen Hemminger @ 2011-07-19 19:42 UTC (permalink / raw)
  To: Jiri Bohac, netdev; +Cc: Herbert Xu,  David S. Miller, stephen hemminger

bridge forwarding table; route cache; and neighbor table could have same problem. I thought carrier is supposed to toggle on suspend or hibernate

Jiri Bohac <jbohac@suse.cz> wrote:

>Hi,
>
>I came over a surprising behaviour with IPv6 autoconfiguration,
>which I think is a bug, but I would first like to hear other
>people's opinions before trying to fix this:
>
>Problem 1: all the address/route lifetimes are kept in jiffies
>and jiffies don't get incremented on resume. So when a
>route/address lifetime is 30 minutes and the system resumes after
>1 hour, the route/address should be considered expired, but it is
>not.
>
>Problem 2: when a system is moved to a new network a RS is not
>sent. Thus, IPv6 does not autoconfigure until the router sends a
>periodic RA. This can occur both while the system is alive and
>while it is suspended. I think the autoconfigured state should be
>discarded when the kernel suspects the system could have been
>moved to a different network.
>
>When the cable is unplugged and plugged in again, we already get
>notified through linkwatch -> netdev_state_change ->
>  -> call_netdevice_notifiers(NETDEV_CHANGE, ...)
>However, if the device has already been autoconfigured,
>addrconf_notify() only handles this event by printing a
>message.
>
>So my idea was to:
>- handle link up/down in addrconf_notify() similarly to
>  NETDEV_UP/NETDEV_DOWN
>
>- on suspend, faking a link down event; on resume, faking a link up event
>  (or better, having a special event type for suspend/resume)
>
>This would cause autoconfiguration to be restarted on resume as
>well as cable plug/unplug, solving both the above problems.
>
>Or do we want to completely rely on userspace tools
>(networkmanager/ifplug) and expect them to do NETDEV_DOWN on
>unplug/suspend and NETDEV_UP on plug/resume?
>
>Any thoughts?
>
>-- 
>Jiri Bohac <jbohac@suse.cz>
>SUSE Labs, SUSE CZ
>

^ permalink raw reply

* Re: [PATCH] net: Fix security_socket_sendmsg() bypass problem.
From: Tetsuo Handa @ 2011-07-25 16:43 UTC (permalink / raw)
  To: casey; +Cc: anton, mjt, davem, netdev, linux-security-module
In-Reply-To: <4E2D8F4D.2000009@schaufler-ca.com>

Casey Schaufler wrote:
> > OK. Then, the question is how to reduce performance loss by redundant
> > security_socket_sendmsg() calls.
> 
> Not to be splitting hairs, but if the packets are headed to
> different destinations the calls to security_socket_sendmsg()
> are not redundant, they are necessary and appropriate. What
> you have with sendmmsg() is an optimization that sacrifices
> correctness for performance.

Excuse me, but this thread is not trying to remove necessary and appropriate
security_socket_sendmsg() calls. Linux 3.0 was released without necessary and
appropriate security_socket_sendmsg() calls, and I'm trying to correct it (via
msg11504.html or msg11510.html) for Linux 3.0.x stable release.

> I fear that you are going to find that the work you have
> to do to reduce the number of calls is going to outweigh
> the benefits of your optimization, as has been pointed out
> earlier.

I fear it too. Unless many dozens (maybe some hundreds) of packets are sent by
sendmmsg(), msg11504.html might show better performance than msg11510.html .
But I don't have a machine to benchmark.

^ permalink raw reply

* Re: [PATCH 0/7] More sane neigh infrastructure
From: Roland Dreier @ 2011-07-25 16:34 UTC (permalink / raw)
  To: David Miller; +Cc: linux-rdma, netdev
In-Reply-To: <20110725.030109.1723861338142084129.davem@davemloft.net>

On Mon, Jul 25, 2011 at 3:01 AM, David Miller <davem@davemloft.net> wrote:
> Devices provide up to three things:
>
> 1) netdev->neighpriv_len, length of per-neighbour device private
>   state, accessible via neighbour_priv(neigh)
>
> 2) net_device_ops->ndo_neigh_construct(), invoked right after
>   neigh_tbl->constructor(), can fail

Hey Dave,

I'll definitely look at converting IPoIB over to using this stuff.
Would love to get rid of all the dicy handling of ipoib_neigh lifetime
that we currently have.  However, I have a question about what the
intention for ndo_neigh_construct() is in the IPoIB case.

As we talked about, IPoIB has to trigger a path lookup to the subnet
manager (SM) when it gets a remote port ID.  However the SM is a
remote entity, so this lookup means we send a message and then
asynchronously wait for it to complete (or possibly timeout), just
like the ARP itself.  But this is done after we get the port ID via
normal RFC 826 ARP (with an address format as specified by RFC 4391).

So I don't think we can use custom neigh_ops with a new solict method
the way clip does -- we actually want to let the normal stack do ARP
or ND, but then extend the process by another message/response step.
I'm sure this is possible within your scheme but I'm not sure I
understand what the "right" way is.

Thanks!
  Roland

^ permalink raw reply

* Re: [PATCH] net: Fix security_socket_sendmsg() bypass problem.
From: Casey Schaufler @ 2011-07-25 15:44 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: anton, mjt, davem, netdev, linux-security-module, Casey Schaufler
In-Reply-To: <201107252215.GBG95887.OQVMFOOJLSFFHt@I-love.SAKURA.ne.jp>

On 7/25/2011 6:15 AM, Tetsuo Handa wrote:
> Anton Blanchard wrote:
>>>> When I saw recvmmsg()/sendmmsg() here, my first thought was an
>>>> authoritative DNS server which can read several requests at a
>>>> time and answer them all at once too - this way it all will go
>>>> to different addresses.
>>> I don't know what application wants sendmmsg(). Since users can send
>>> up to UIO_MAXIOV (= 1024) "struct iovec" blocks using sendmsg(), they
>>> will use sendmsg() rather than sendmmsg() if the destination address
>>> are the same.
>> But if an application needs to maintain packet boundaries, then sendmsg
>> isn't going to help is it?
> Well, such application might want to use RDM or SeqPacket... but your point is
> to maintain packet boundaries. You are assuming that sendmmsg() will be used
> for sending as much data as possible while preserving packet boundaries.
>
> OK. Then, the question is how to reduce performance loss by redundant
> security_socket_sendmsg() calls.

Not to be splitting hairs, but if the packets are headed to
different destinations the calls to security_socket_sendmsg()
are not redundant, they are necessary and appropriate. What
you have with sendmmsg() is an optimization that sacrifices
correctness for performance.

> If sendmmsg() likely contains single (or few)
> destination(s), trying to optimize security_socket_sendmsg() calls by comparing
> destination address (as proposed at
> http://www.spinics.net/linux/fedora/linux-security-module/msg11510.html
> ) would help. Otherwise, no optimization (as proposed at
> http://www.spinics.net/linux/fedora/linux-security-module/msg11504.html
> ) would be better. Which approach do you like?

I fear that you are going to find that the work you have
to do to reduce the number of calls is going to outweigh
the benefits of your optimization, as has been pointed out
earlier. My recommendation is that the sendmmsg() interface
is ill conceived and that you should look for alternative
ways to improve the performance of the use case.


^ permalink raw reply

* Pick Up Your $5000.00USD,
From: WESTERN UNION OFFICE @ 2011-07-25 10:01 UTC (permalink / raw)




How are you today?


I write to inform you that we have already sent you $5,000.00USD
through Western union as we have been given the mandate to transfer
your full compensation payment of  $1.800,000.00USD via western union
by this government.

I called to give you the information through phone as internet hackers
were many but i cannot reach you yesterday even this morning,So I
decided to email you the (MTCN) and sender name so that you can pick
up this $5,000.00USD to enable us send another $5,000.00USD by
tomorrow as you knows we will be sending you only $5,000.00USD per
day.Please pick up this information and run to any western union
(OUTLET) in your country and pick up this $5,000.00USD and send us an
email back,so that we can send another $5,000.00USD by tomorrow.

Manager: Mr Frank Amos
email me on:western-money677@hotmail.com
call us on: +234-7031908911
once you picked up this $5000.00USD today.

Here is the western union information to pick up the $5000.00USD,

MTCN :___________MTCN 9500834460
first name: ______Appoline
Second Name: ______Ouedraoge
Text Question: ___________When
Answer: ___________________2Hours ago
Amount:______________________ $5,000.00 United State Dollars

I am waiting for your E-mail once you pick up $5000.00USD,

Thanks
Mr Frank Amos.


^ permalink raw reply

* [PATCH net-next] net: Convert struct net_device uc_promisc to bool
From: Joe Perches @ 2011-07-25 14:41 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, linux-kernel

No need to use int, its uses are boolean.
May save a few bytes one day.

Signed-off-by: Joe Perches <joe@perches.com>
---
 include/linux/netdevice.h |    2 +-
 net/core/dev.c            |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 34f3abc..1d92acc 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1132,7 +1132,7 @@ struct net_device {
 	spinlock_t		addr_list_lock;
 	struct netdev_hw_addr_list	uc;	/* Unicast mac addresses */
 	struct netdev_hw_addr_list	mc;	/* Multicast mac addresses */
-	int			uc_promisc;
+	bool			uc_promisc;
 	unsigned int		promiscuity;
 	unsigned int		allmulti;
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 9444c5c..17d67b5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4497,10 +4497,10 @@ void __dev_set_rx_mode(struct net_device *dev)
 		 */
 		if (!netdev_uc_empty(dev) && !dev->uc_promisc) {
 			__dev_set_promiscuity(dev, 1);
-			dev->uc_promisc = 1;
+			dev->uc_promisc = true;
 		} else if (netdev_uc_empty(dev) && dev->uc_promisc) {
 			__dev_set_promiscuity(dev, -1);
-			dev->uc_promisc = 0;
+			dev->uc_promisc = false;
 		}
 
 		if (ops->ndo_set_multicast_list)
-- 
1.7.6.131.g99019


^ permalink raw reply related

* Re: [bisected regression] Partial breakage of forcedeth driver
From: Jiri Pirko @ 2011-07-25 14:36 UTC (permalink / raw)
  To: walt; +Cc: netdev
In-Reply-To: <4E2CA683.4010005@gmail.com>

Mon, Jul 25, 2011 at 01:10:59AM CEST, w41ter@gmail.com wrote:
>Hi gang.
>
>commit 3326c784c9f492e988617d93f647ae0cfd4c8d09
>Author: Jiri Pirko <jpirko@>
>Date:   Wed Jul 20 04:54:38 2011 +0000
>
>    forcedeth: do vlan cleanup
>    
>    - unify vlan and nonvlan rx path
>    - kill np->vlangrp and nv_vlan_rx_register
>    - allow to turn on/off rx vlan accel via ethtool (set_features)
>    
>    Signed-off-by: Jiri Pirko <jpirko@>
>    Signed-off-by: David S. Miller <davem@>
>
>This commit causes networking trouble for my nForce-based motherboard.
>(Details happily supplied if needed.)
>
>I say 'partial' breakage because networking is not completely dead,
>just limping very slowly :)
>
>The simplest test is to ping any host (in my LAN or on the internet,
>makes no difference).  The replies to my ping come at very irregular
>intervals, ranging from normal to 5-10 seconds or so, seemingly at
>random AFAICT.
>
>I'm happy to try any patches, tests, whatever, to help fix this.
>
>Many thanks!


Same is happening on machine I borrowed. ccing list.
Will look at it very soon (no time today).

jirka

^ permalink raw reply

* [PATCH 7/7] dccp ccid-2: check Ack Ratio when reducing cwnd
From: Gerrit Renker @ 2011-07-25 13:36 UTC (permalink / raw)
  To: davem; +Cc: dccp, netdev, Samuel Jero
In-Reply-To: <test_tree_patch_set_update_2011-07-25>

From: Samuel Jero <sj323707@ohio.edu>

This patch causes CCID-2 to check the Ack Ratio after reducing the congestion
window. If the Ack Ratio is greater than the congestion window, it is
reduced. This prevents timeouts caused by an Ack Ratio larger than the
congestion window.

In this situation, we choose to set the Ack Ratio to half the congestion window
(or one if that's zero) so that if we loose one ack we don't trigger a timeout.

Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
 net/dccp/ccids/ccid2.c |   26 +++++++++++++++++++++++---
 1 files changed, 23 insertions(+), 3 deletions(-)

--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -101,6 +101,24 @@ static void ccid2_change_l_ack_ratio(struct sock *sk, u32 val)
 				   min_t(u32, val, DCCPF_ACK_RATIO_MAX));
 }
 
+static void ccid2_check_l_ack_ratio(struct sock *sk)
+{
+	struct ccid2_hc_tx_sock *hc = ccid2_hc_tx_sk(sk);
+
+	/*
+	 * After a loss, idle period, application limited period, or RTO we
+	 * need to check that the ack ratio is still less than the congestion
+	 * window. Otherwise, we will send an entire congestion window of
+	 * packets and got no response because we haven't sent ack ratio
+	 * packets yet.
+	 * If the ack ratio does need to be reduced, we reduce it to half of
+	 * the congestion window (or 1 if that's zero) instead of to the
+	 * congestion window. This prevents problems if one ack is lost.
+	 */
+	if (dccp_feat_nn_get(sk, DCCPF_ACK_RATIO) > hc->tx_cwnd)
+		ccid2_change_l_ack_ratio(sk, hc->tx_cwnd/2 ? : 1U);
+}
+
 static void ccid2_change_l_seq_window(struct sock *sk, u64 val)
 {
 	dccp_feat_signal_nn_change(sk, DCCPF_SEQUENCE_WINDOW,
@@ -187,6 +205,8 @@ static void ccid2_cwnd_application_limited(struct sock *sk, const u32 now)
 	}
 	hc->tx_cwnd_used  = 0;
 	hc->tx_cwnd_stamp = now;
+
+	ccid2_check_l_ack_ratio(sk);
 }
 
 /* This borrows the code of tcp_cwnd_restart() */
@@ -205,6 +225,8 @@ static void ccid2_cwnd_restart(struct sock *sk, const u32 now)
 
 	hc->tx_cwnd_stamp = now;
 	hc->tx_cwnd_used  = 0;
+
+	ccid2_check_l_ack_ratio(sk);
 }
 
 static void ccid2_hc_tx_packet_sent(struct sock *sk, unsigned int len)
@@ -461,9 +483,7 @@ static void ccid2_congestion_event(struct sock *sk, struct ccid2_seq *seqp)
 	hc->tx_cwnd      = hc->tx_cwnd / 2 ? : 1U;
 	hc->tx_ssthresh  = max(hc->tx_cwnd, 2U);
 
-	/* Avoid spurious timeouts resulting from Ack Ratio > cwnd */
-	if (dccp_sk(sk)->dccps_l_ack_ratio > hc->tx_cwnd)
-		ccid2_change_l_ack_ratio(sk, hc->tx_cwnd);
+	ccid2_check_l_ack_ratio(sk);
 }
 
 static int ccid2_hc_tx_parse_options(struct sock *sk, u8 packet_type,

^ permalink raw reply

* [PATCH 5/7] dccp ccid-2: prevent cwnd > Sequence Window
From: Gerrit Renker @ 2011-07-25 13:36 UTC (permalink / raw)
  To: davem; +Cc: dccp, netdev, Samuel Jero
In-Reply-To: <test_tree_patch_set_update_2011-07-25>

From: Samuel Jero <sj323707@ohio.edu>

Add a check to prevent CCID-2 from increasing the cwnd greater than the
Sequence Window.

When the congestion window becomes bigger than the Sequence Window, CCID-2
will attempt to keep more data in the network than the DCCP Sequence Window
code considers possible. This results in the Sequence Window code issuing
a Sync, thereby inducing needless overhead. Further, if this occurs at the
sender, CCID-2 will never detect the problem because the Acks it receives
will indicate no losses. I have seen this cause a drop of 1/3rd in throughput
for a connection.

Also add code to adjust the Sequence Window to be about 5 times the number of
packets in the network (RFC 4340, 7.5.2) and to adjust the Ack Ratio so that
the remote Sequence Window will hold about 5 times the number of packets in
the network. This allows the congestion window to increase correctly without
being limited by the Sequence Window.

Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
 net/dccp/ccids/ccid2.c |   50 +++++++++++++++++++++++++++++++++--------------
 net/dccp/ccids/ccid2.h |    6 +++++
 2 files changed, 41 insertions(+), 15 deletions(-)

--- a/net/dccp/ccids/ccid2.h
+++ b/net/dccp/ccids/ccid2.h
@@ -43,6 +43,12 @@ struct ccid2_seq {
 #define CCID2_SEQBUF_LEN 1024
 #define CCID2_SEQBUF_MAX 128
 
+/*
+ * Multiple of congestion window to keep the sequence window at
+ * (RFC 4340 7.5.2)
+ */
+#define CCID2_WIN_CHANGE_FACTOR 5
+
 /**
  * struct ccid2_hc_tx_sock - CCID2 TX half connection
  * @tx_{cwnd,ssthresh,pipe}: as per RFC 4341, section 5
--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -85,7 +85,6 @@ static int ccid2_hc_tx_send_packet(struct sock *sk, struct sk_buff *skb)
 
 static void ccid2_change_l_ack_ratio(struct sock *sk, u32 val)
 {
-	struct dccp_sock *dp = dccp_sk(sk);
 	u32 max_ratio = DIV_ROUND_UP(ccid2_hc_tx_sk(sk)->tx_cwnd, 2);
 
 	/*
@@ -98,14 +97,15 @@ static void ccid2_change_l_ack_ratio(struct sock *sk, u32 val)
 		DCCP_WARN("Limiting Ack Ratio (%u) to %u\n", val, max_ratio);
 		val = max_ratio;
 	}
-	if (val > DCCPF_ACK_RATIO_MAX)
-		val = DCCPF_ACK_RATIO_MAX;
-
-	if (val == dp->dccps_l_ack_ratio)
-		return;
+	dccp_feat_signal_nn_change(sk, DCCPF_ACK_RATIO,
+				   min_t(u32, val, DCCPF_ACK_RATIO_MAX));
+}
 
-	ccid2_pr_debug("changing local ack ratio to %u\n", val);
-	dp->dccps_l_ack_ratio = val;
+static void ccid2_change_l_seq_window(struct sock *sk, u64 val)
+{
+	dccp_feat_signal_nn_change(sk, DCCPF_SEQUENCE_WINDOW,
+				   clamp_val(val, DCCPF_SEQ_WMIN,
+						  DCCPF_SEQ_WMAX));
 }
 
 static void ccid2_hc_tx_rto_expire(unsigned long data)
@@ -405,17 +405,37 @@ static void ccid2_new_ack(struct sock *sk, struct ccid2_seq *seqp,
 			  unsigned int *maxincr)
 {
 	struct ccid2_hc_tx_sock *hc = ccid2_hc_tx_sk(sk);
-
-	if (hc->tx_cwnd < hc->tx_ssthresh) {
-		if (*maxincr > 0 && ++hc->tx_packets_acked == 2) {
+	struct dccp_sock *dp = dccp_sk(sk);
+	int r_seq_used = hc->tx_cwnd / dp->dccps_l_ack_ratio;
+
+	if (hc->tx_cwnd < dp->dccps_l_seq_win &&
+	    r_seq_used < dp->dccps_r_seq_win) {
+		if (hc->tx_cwnd < hc->tx_ssthresh) {
+			if (*maxincr > 0 && ++hc->tx_packets_acked == 2) {
+				hc->tx_cwnd += 1;
+				*maxincr    -= 1;
+				hc->tx_packets_acked = 0;
+			}
+		} else if (++hc->tx_packets_acked >= hc->tx_cwnd) {
 			hc->tx_cwnd += 1;
-			*maxincr    -= 1;
 			hc->tx_packets_acked = 0;
 		}
-	} else if (++hc->tx_packets_acked >= hc->tx_cwnd) {
-			hc->tx_cwnd += 1;
-			hc->tx_packets_acked = 0;
 	}
+
+	/*
+	 * Adjust the local sequence window and the ack ratio to allow about
+	 * 5 times the number of packets in the network (RFC 4340 7.5.2)
+	 */
+	if (r_seq_used * CCID2_WIN_CHANGE_FACTOR >= dp->dccps_r_seq_win)
+		ccid2_change_l_ack_ratio(sk, dp->dccps_l_ack_ratio * 2);
+	else if (r_seq_used * CCID2_WIN_CHANGE_FACTOR < dp->dccps_r_seq_win/2)
+		ccid2_change_l_ack_ratio(sk, dp->dccps_l_ack_ratio / 2 ? : 1U);
+
+	if (hc->tx_cwnd * CCID2_WIN_CHANGE_FACTOR >= dp->dccps_l_seq_win)
+		ccid2_change_l_seq_window(sk, dp->dccps_l_seq_win * 2);
+	else if (hc->tx_cwnd * CCID2_WIN_CHANGE_FACTOR < dp->dccps_l_seq_win/2)
+		ccid2_change_l_seq_window(sk, dp->dccps_l_seq_win / 2);
+
 	/*
 	 * FIXME: RTT is sampled several times per acknowledgment (for each
 	 * entry in the Ack Vector), instead of once per Ack (as in TCP SACK).

^ permalink raw reply

* [PATCH 6/7] dccp ccid-2: increment cwnd correctly
From: Gerrit Renker @ 2011-07-25 13:36 UTC (permalink / raw)
  To: davem; +Cc: dccp, netdev, Samuel Jero
In-Reply-To: <test_tree_patch_set_update_2011-07-25>

From: Samuel Jero <sj323707@ohio.edu>

This patch fixes an issue where CCID-2 will not increase the congestion
window for numerous RTTs after an idle period, application-limited period,
or a loss once the algorithm is in Congestion Avoidance.

What happens is that, when CCID-2 is in Congestion Avoidance mode, it will
increase hc->tx_packets_acked by one for every packet and will increment cwnd
every cwnd packets. However, if there is now an idle period in the connection,
cwnd will be reduced, possibly below the slow start threshold. This will
cause the connection to go into Slow Start. However, in Slow Start CCID-2
performs this test to increment cwnd every second ack:

	++hc->tx_packets_acked == 2

Unfortunately, this will be incorrect, if cwnd previous to the idle period
was larger than 2 and if tx_packets_acked was close to cwnd. For example:
	cwnd=50  and  tx_packets_acked=45.

In this case, the current code, will increment tx_packets_acked until it
equals two, which will only be once tx_packets_acked (an unsigned 32-bit
integer) overflows.

My fix is simply to change that test for tx_packets_acked greater than or
equal to two in slow start.

Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
 net/dccp/ccids/ccid2.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -411,7 +411,7 @@ static void ccid2_new_ack(struct sock *sk, struct ccid2_seq *seqp,
 	if (hc->tx_cwnd < dp->dccps_l_seq_win &&
 	    r_seq_used < dp->dccps_r_seq_win) {
 		if (hc->tx_cwnd < hc->tx_ssthresh) {
-			if (*maxincr > 0 && ++hc->tx_packets_acked == 2) {
+			if (*maxincr > 0 && ++hc->tx_packets_acked >= 2) {
 				hc->tx_cwnd += 1;
 				*maxincr    -= 1;
 				hc->tx_packets_acked = 0;

^ permalink raw reply

* [PATCH 2/7] dccp: support for exchanging of NN options in established state 2/2
From: Gerrit Renker @ 2011-07-25 13:36 UTC (permalink / raw)
  To: davem; +Cc: dccp, netdev
In-Reply-To: <test_tree_patch_set_update_2011-07-25>

This patch adds the receiver side and the (fast-path) activation part for
dynamic changes of non-negotiable (NN) parameters in (PART)OPEN state.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.uk>
---
 net/dccp/feat.c |  116 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 116 insertions(+), 0 deletions(-)

--- a/net/dccp/feat.c
+++ b/net/dccp/feat.c
@@ -344,6 +344,20 @@ static int __dccp_feat_activate(struct sock *sk, const int idx,
 	return dccp_feat_table[idx].activation_hdlr(sk, val, rx);
 }
 
+/**
+ * dccp_feat_activate  -  Activate feature value on socket
+ * @sk: fully connected DCCP socket (after handshake is complete)
+ * @feat_num: feature to activate, one of %dccp_feature_numbers
+ * @local: whether local (1) or remote (0) @feat_num is meant
+ * @fval: the value (SP or NN) to activate, or NULL to use the default value
+ * For general use this function is preferable over __dccp_feat_activate().
+ */
+static int dccp_feat_activate(struct sock *sk, u8 feat_num, bool local,
+			      dccp_feat_val const *fval)
+{
+	return __dccp_feat_activate(sk, dccp_feat_index(feat_num), local, fval);
+}
+
 /* Test for "Req'd" feature (RFC 4340, 6.4) */
 static inline int dccp_feat_must_be_understood(u8 feat_num)
 {
@@ -1252,6 +1266,100 @@ confirmation_failed:
 }
 
 /**
+ * dccp_feat_handle_nn_established  -  Fast-path reception of NN options
+ * @sk:		socket of an established DCCP connection
+ * @mandatory:	whether @opt was preceded by a Mandatory option
+ * @opt:	%DCCPO_CHANGE_L | %DCCPO_CONFIRM_R (NN only)
+ * @feat:	NN number, one of %dccp_feature_numbers
+ * @val:	NN value
+ * @len:	length of @val in bytes
+ * This function combines the functionality of change_recv/confirm_recv, with
+ * the following differences (reset codes are the same):
+ *    - cleanup after receiving the Confirm;
+ *    - values are directly activated after successful parsing;
+ *    - deliberately restricted to NN features.
+ * The restriction to NN features is essential since SP features can have non-
+ * predictable outcomes (depending on the remote configuration), and are inter-
+ * dependent (CCIDs for instance cause further dependencies).
+ */
+static u8 dccp_feat_handle_nn_established(struct sock *sk, u8 mandatory, u8 opt,
+					  u8 feat, u8 *val, u8 len)
+{
+	struct list_head *fn = &dccp_sk(sk)->dccps_featneg;
+	const bool local = (opt == DCCPO_CONFIRM_R);
+	struct dccp_feat_entry *entry;
+	u8 type = dccp_feat_type(feat);
+	dccp_feat_val fval;
+
+	dccp_feat_print_opt(opt, feat, val, len, mandatory);
+
+	/* Ignore non-mandatory unknown and non-NN features */
+	if (type == FEAT_UNKNOWN) {
+		if (local && !mandatory)
+			return 0;
+		goto fast_path_unknown;
+	} else if (type != FEAT_NN) {
+		return 0;
+	}
+
+	/*
+	 * We don't accept empty Confirms, since in fast-path feature
+	 * negotiation the values are enabled immediately after sending
+	 * the Change option.
+	 * Empty Changes on the other hand are invalid (RFC 4340, 6.1).
+	 */
+	if (len == 0 || len > sizeof(fval.nn))
+		goto fast_path_unknown;
+
+	if (opt == DCCPO_CHANGE_L) {
+		fval.nn = dccp_decode_value_var(val, len);
+		if (!dccp_feat_is_valid_nn_val(feat, fval.nn))
+			goto fast_path_unknown;
+
+		if (dccp_feat_push_confirm(fn, feat, local, &fval) ||
+		    dccp_feat_activate(sk, feat, local, &fval))
+			return DCCP_RESET_CODE_TOO_BUSY;
+
+		/* set the `Ack Pending' flag to piggyback a Confirm */
+		inet_csk_schedule_ack(sk);
+
+	} else if (opt == DCCPO_CONFIRM_R) {
+		entry = dccp_feat_list_lookup(fn, feat, local);
+		if (entry == NULL || entry->state != FEAT_CHANGING)
+			return 0;
+
+		fval.nn = dccp_decode_value_var(val, len);
+		/*
+		 * Just ignore a value that doesn't match our current value.
+		 * If the option changes twice within two RTTs, then at least
+		 * one CONFIRM will be received for the old value after a
+		 * new CHANGE was sent.
+		 */
+		if (fval.nn != entry->val.nn)
+			return 0;
+
+		/* Only activate after receiving the Confirm option (6.6.1). */
+		dccp_feat_activate(sk, feat, local, &fval);
+
+		/* It has been confirmed - so remove the entry */
+		dccp_feat_list_pop(entry);
+
+	} else {
+		DCCP_WARN("Received illegal option %u\n", opt);
+		goto fast_path_failed;
+	}
+	return 0;
+
+fast_path_unknown:
+	if (!mandatory)
+		return dccp_push_empty_confirm(fn, feat, local);
+
+fast_path_failed:
+	return mandatory ? DCCP_RESET_CODE_MANDATORY_ERROR
+			 : DCCP_RESET_CODE_OPTION_ERROR;
+}
+
+/**
  * dccp_feat_parse_options  -  Process Feature-Negotiation Options
  * @sk: for general use and used by the client during connection setup
  * @dreq: used by the server during connection setup
@@ -1286,6 +1394,14 @@ int dccp_feat_parse_options(struct sock *sk, struct dccp_request_sock *dreq,
 			return dccp_feat_confirm_recv(fn, mandatory, opt, feat,
 						      val, len, server);
 		}
+		break;
+	/*
+	 *	Support for exchanging NN options on an established connection.
+	 */
+	case DCCP_OPEN:
+	case DCCP_PARTOPEN:
+		return dccp_feat_handle_nn_established(sk, mandatory, opt, feat,
+						       val, len);
 	}
 	return 0;	/* ignore FN options in all other states */
 }

^ permalink raw reply

* [PATCH 3/7] dccp: send Confirm options only once
From: Gerrit Renker @ 2011-07-25 13:36 UTC (permalink / raw)
  To: davem; +Cc: dccp, netdev, Samuel Jero
In-Reply-To: <test_tree_patch_set_update_2011-07-25>

From: Samuel Jero <sj323707@ohio.edu>

If a connection is in the OPEN state, remove feature negotiation Confirm
options from the list of options after sending them once; as such options
are NOT supposed to be retransmitted and are ONLY supposed to be sent in
response to a Change option (RFC 4340 6.2).

Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
 net/dccp/feat.c |   21 ++++++++++++++++-----
 1 files changed, 16 insertions(+), 5 deletions(-)

--- a/net/dccp/feat.c
+++ b/net/dccp/feat.c
@@ -665,11 +665,22 @@ int dccp_feat_insert_opts(struct dccp_sock *dp, struct dccp_request_sock *dreq,
 			return -1;
 		if (pos->needs_mandatory && dccp_insert_option_mandatory(skb))
 			return -1;
-		/*
-		 * Enter CHANGING after transmitting the Change option (6.6.2).
-		 */
-		if (pos->state == FEAT_INITIALISING)
-			pos->state = FEAT_CHANGING;
+
+		if (skb->sk->sk_state == DCCP_OPEN &&
+		    (opt == DCCPO_CONFIRM_R || opt == DCCPO_CONFIRM_L)) {
+			/*
+			 * Confirms don't get retransmitted (6.6.3) once the
+			 * connection is in state OPEN
+			 */
+			dccp_feat_list_pop(pos);
+		} else {
+			/*
+			 * Enter CHANGING after transmitting the Change
+			 * option (6.6.2).
+			 */
+			if (pos->state == FEAT_INITIALISING)
+				pos->state = FEAT_CHANGING;
+		}
 	}
 	return 0;
 }

^ permalink raw reply

* [PATCH 1/7] dccp: support for the exchange of NN options in established state 1/2
From: Gerrit Renker @ 2011-07-25 13:36 UTC (permalink / raw)
  To: davem; +Cc: dccp, netdev
In-Reply-To: <test_tree_patch_set_update_2011-07-25>

In contrast to static feature negotiation at the begin of a connection, this
patch introduces support for exchange of dynamically changing options.

Such an update/exchange is necessary in at least two cases:
 * CCID-2's Ack Ratio (RFC 4341, 6.1.2) which changes during the connection;
 * Sequence Window values that, as per RFC 4340, 7.5.2, should be sent "as
   the connection progresses".

Both are non-negotiable (NN) features, which means that no new capabilities
are negotiated, but rather that changes in known parameters are brought
up-to-date at either end.

Thse characteristics are reflected by the implementation:
 * only NN options can be exchanged after connection setup;
 * an ack is scheduled directly after activation to speed up the update;
 * CCIDs may request changes to an NN feature even if a negotiation for that
   feature is already underway: this is required by CCID-2, where changes in
   cwnd necessitate Ack Ratio changes, such that the previous Ack Ratio (which
   is still being negotiated) would cause irrecoverable RTO timeouts (thanks
   to work by Samuel Jero).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.uk>
---
 net/dccp/dccp.h |    1 +
 net/dccp/feat.c |   65 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 net/dccp/feat.h |    1 +
 3 files changed, 67 insertions(+), 0 deletions(-)

--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -474,6 +474,7 @@ static inline int dccp_ack_pending(const struct sock *sk)
 	return dccp_ackvec_pending(sk) || inet_csk_ack_scheduled(sk);
 }
 
+extern int  dccp_feat_signal_nn_change(struct sock *sk, u8 feat, u64 nn_val);
 extern int  dccp_feat_finalise_settings(struct dccp_sock *dp);
 extern int  dccp_feat_server_ccid_dependencies(struct dccp_request_sock *dreq);
 extern int  dccp_feat_insert_opts(struct dccp_sock*, struct dccp_request_sock*,
--- a/net/dccp/feat.h
+++ b/net/dccp/feat.h
@@ -129,6 +129,7 @@ extern int  dccp_feat_clone_list(struct list_head const *, struct list_head *);
 
 extern void dccp_encode_value_var(const u64 value, u8 *to, const u8 len);
 extern u64  dccp_decode_value_var(const u8 *bf, const u8 len);
+extern u64  dccp_feat_nn_get(struct sock *sk, u8 feat);
 
 extern int  dccp_insert_option_mandatory(struct sk_buff *skb);
 extern int  dccp_insert_fn_opt(struct sk_buff *skb, u8 type, u8 feat,
--- a/net/dccp/feat.c
+++ b/net/dccp/feat.c
@@ -12,6 +12,7 @@
  *  -----------
  *  o Feature negotiation is coordinated with connection setup (as in TCP), wild
  *    changes of parameters of an established connection are not supported.
+ *  o Changing non-negotiable (NN) values is supported in state OPEN/PARTOPEN.
  *  o All currently known SP features have 1-byte quantities. If in the future
  *    extensions of RFCs 4340..42 define features with item lengths larger than
  *    one byte, a feature-specific extension of the code will be required.
@@ -730,6 +731,70 @@ int dccp_feat_register_sp(struct sock *sk, u8 feat, u8 is_local,
 				  0, list, len);
 }
 
+/**
+ * dccp_feat_nn_get  -  Query current/pending value of NN feature
+ * @sk: DCCP socket of an established connection
+ * @feat: NN feature number from %dccp_feature_numbers
+ * For a known NN feature, returns value currently being negotiated, or
+ * current (confirmed) value if no negotiation is going on.
+ */
+u64 dccp_feat_nn_get(struct sock *sk, u8 feat)
+{
+	if (dccp_feat_type(feat) == FEAT_NN) {
+		struct dccp_sock *dp = dccp_sk(sk);
+		struct dccp_feat_entry *entry;
+
+		entry = dccp_feat_list_lookup(&dp->dccps_featneg, feat, 1);
+		if (entry != NULL)
+			return entry->val.nn;
+
+		switch (feat) {
+		case DCCPF_ACK_RATIO:
+			return dp->dccps_l_ack_ratio;
+		case DCCPF_SEQUENCE_WINDOW:
+			return dp->dccps_l_seq_win;
+		}
+	}
+	DCCP_BUG("attempt to look up unsupported feature %u", feat);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(dccp_feat_nn_get);
+
+/**
+ * dccp_feat_signal_nn_change  -  Update NN values for an established connection
+ * @sk: DCCP socket of an established connection
+ * @feat: NN feature number from %dccp_feature_numbers
+ * @nn_val: the new value to use
+ * This function is used to communicate NN updates out-of-band.
+ */
+int dccp_feat_signal_nn_change(struct sock *sk, u8 feat, u64 nn_val)
+{
+	struct list_head *fn = &dccp_sk(sk)->dccps_featneg;
+	dccp_feat_val fval = { .nn = nn_val };
+	struct dccp_feat_entry *entry;
+
+	if (sk->sk_state != DCCP_OPEN && sk->sk_state != DCCP_PARTOPEN)
+		return 0;
+
+	if (dccp_feat_type(feat) != FEAT_NN ||
+	    !dccp_feat_is_valid_nn_val(feat, nn_val))
+		return -EINVAL;
+
+	if (nn_val == dccp_feat_nn_get(sk, feat))
+		return 0;	/* already set or negotiation under way */
+
+	entry = dccp_feat_list_lookup(fn, feat, 1);
+	if (entry != NULL) {
+		dccp_pr_debug("Clobbering existing NN entry %llu -> %llu\n",
+			      (unsigned long long)entry->val.nn,
+			      (unsigned long long)nn_val);
+		dccp_feat_list_pop(entry);
+	}
+
+	inet_csk_schedule_ack(sk);
+	return dccp_feat_push_change(fn, feat, 1, 0, &fval);
+}
+EXPORT_SYMBOL_GPL(dccp_feat_signal_nn_change);
 
 /*
  *	Tracking features whose value depend on the choice of CCID

^ permalink raw reply

* [PATCH 4/7] dccp ccid-2: use feature-negotiation to report Ack Ratio changes
From: Gerrit Renker @ 2011-07-25 13:36 UTC (permalink / raw)
  To: davem; +Cc: dccp, netdev
In-Reply-To: <test_tree_patch_set_update_2011-07-25>

This uses the new feature-negotiation framework to signal Ack Ratio changes,
as required by RFC 4341, sec. 6.1.2.

That raises some problems with CCID-2, which at the moment can not cope
gracefully with Ack Ratios > 1. Since these issues are not directly related
to feature negotiation, they are marked by a FIXME.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.uk>
---
 net/dccp/ccids/ccid2.c |   10 +++++++++-
 net/dccp/proto.c       |    1 -
 2 files changed, 9 insertions(+), 2 deletions(-)

--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -184,7 +184,6 @@ int dccp_init_sock(struct sock *sk, const __u8 ctl_sock_initialized)
 	dp->dccps_rate_last	= jiffies;
 	dp->dccps_role		= DCCP_ROLE_UNDEFINED;
 	dp->dccps_service	= DCCP_SERVICE_CODE_IS_ABSENT;
-	dp->dccps_l_ack_ratio	= dp->dccps_r_ack_ratio = 1;
 	dp->dccps_tx_qlen	= sysctl_dccp_tx_qlen;
 
 	dccp_init_xmit_timers(sk);
--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -494,8 +494,16 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
 			if (hc->tx_rpdupack >= NUMDUPACK) {
 				hc->tx_rpdupack = -1; /* XXX lame */
 				hc->tx_rpseq    = 0;
-
+#ifdef __CCID2_COPES_GRACEFULLY_WITH_ACK_CONGESTION_CONTROL__
+				/*
+				 * FIXME: Ack Congestion Control is broken; in
+				 * the current state instabilities occurred with
+				 * Ack Ratios greater than 1; causing hang-ups
+				 * and long RTO timeouts. This needs to be fixed
+				 * before opening up dynamic changes. -- gerrit
+				 */
 				ccid2_change_l_ack_ratio(sk, 2 * dp->dccps_l_ack_ratio);
+#endif
 			}
 		}
 	}

^ permalink raw reply

* net-next-2.6 [PATCH 0/7] dccp: add support for dynamic parameter updates
From: Gerrit Renker @ 2011-07-25 13:36 UTC (permalink / raw)
  To: davem; +Cc: dccp, netdev
In-Reply-To: <test_tree_patch_set_update_2011-07-25>

Hi Dave,

please find attached a 2-part patch set to implement features required by the RFCs:

 a) exchange of "non-negotiable" (NN) feature options (RFC 4340, 6.3.2), which are
    used to dynamically update known parameters during an established connection;
 b) use of this new API to improve the current state of the CCID-2 (RFC 4341)
    implementation for updating Ack Ratio and Sequence Window features.

Both sets are thanks to the good work done by Samuel Jero.    


General DCCP part:
 Patch #1: introduces sender-signalling part for exchange of NN options.
 Patch #2: implements the receiver-side and activation part for NN options.
 Patch #3: bug-fix to send Confirm options in the RFC-specified manner.

CCID-2 part: 
 Patch #4: adds initial code for CCID-2 Ack Ratio exchange.
 Patch #5: fixes issues with cwnd/Sequence Window relationship in CCID-2. 
 Patch #6: fixes a bug in incrementing the cwnd of CCID-2.
 Patch #7: fixes a bug in updating Ack Ratio relative to cwnd in CCID-2.


I have also placed this in into a fresh (today's) copy of net-next-2.6, on

    git://eden-feed.erg.abdn.ac.uk/net-next-2.6        [subtree 'dccp']

---
 ccids/ccid2.c |   88 +++++++++++++++++++------
 ccids/ccid2.h |    6 +
 dccp.h        |    1 
 feat.c        |  202 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 feat.h        |    1 
 proto.c       |    1 
 6 files changed, 273 insertions(+), 26 deletions(-)

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox