Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH net V3] xen-netback: use jiffies_64 value to calculate credit timeout
From: Wei Liu @ 2013-10-28 12:07 UTC (permalink / raw)
  To: xen-devel, netdev
  Cc: david.vrabel, jbeulich, annie.li, Wei Liu, Ian Campbell,
	Jason Luan

time_after_eq() only works if the delta is < MAX_ULONG/2.

For a 32bit Dom0, if netfront sends packets at a very low rate, the time
between subsequent calls to tx_credit_exceeded() may exceed MAX_ULONG/2
and the test for timer_after_eq() will be incorrect. Credit will not be
replenished and the guest may become unable to send packets (e.g., if
prior to the long gap, all credit was exhausted).

Use jiffies_64 variant to mitigate this problem for 32bit Dom0.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jason Luan <jianhai.luan@oracle.com>
---
 drivers/net/xen-netback/common.h    |    1 +
 drivers/net/xen-netback/interface.c |    3 +--
 drivers/net/xen-netback/netback.c   |   10 +++++-----
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 5715318..400fea1 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -163,6 +163,7 @@ struct xenvif {
 	unsigned long   credit_usec;
 	unsigned long   remaining_credit;
 	struct timer_list credit_timeout;
+	u64 credit_window_start;
 
 	/* Statistics */
 	unsigned long rx_gso_checksum_fixup;
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 01bb854..459935a 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -312,8 +312,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	vif->credit_bytes = vif->remaining_credit = ~0UL;
 	vif->credit_usec  = 0UL;
 	init_timer(&vif->credit_timeout);
-	/* Initialize 'expires' now: it's used to track the credit window. */
-	vif->credit_timeout.expires = jiffies;
+	vif->credit_window_start = get_jiffies_64();
 
 	dev->netdev_ops	= &xenvif_netdev_ops;
 	dev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO;
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index f3e591c..900da4b 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1185,9 +1185,8 @@ out:
 
 static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 {
-	unsigned long now = jiffies;
-	unsigned long next_credit =
-		vif->credit_timeout.expires +
+	u64 now = get_jiffies_64();
+	u64 next_credit = vif->credit_window_start +
 		msecs_to_jiffies(vif->credit_usec / 1000);
 
 	/* Timer could already be pending in rare cases. */
@@ -1195,8 +1194,8 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 		return true;
 
 	/* Passed the point where we can replenish credit? */
-	if (time_after_eq(now, next_credit)) {
-		vif->credit_timeout.expires = now;
+	if (time_after_eq64(now, next_credit)) {
+		vif->credit_window_start = now;
 		tx_add_credit(vif);
 	}
 
@@ -1208,6 +1207,7 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 			tx_credit_callback;
 		mod_timer(&vif->credit_timeout,
 			  next_credit);
+		vif->credit_window_start = next_credit;
 
 		return true;
 	}
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH net V2] xen-netback: use jiffies_64 value to calculate credit timeout
From: Wei Liu @ 2013-10-28 12:01 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, david.vrabel, Ian Campbell, xen-devel, annie.li,
	Jason Luan, netdev
In-Reply-To: <526E5D7302000078000FD41A@nat28.tlf.novell.com>

On Mon, Oct 28, 2013 at 11:49:55AM +0000, Jan Beulich wrote:
> >>> On 28.10.13 at 12:35, Wei Liu <wei.liu2@citrix.com> wrote:
> 
> Two formal things:
> 
> > --- a/drivers/net/xen-netback/netback.c
> > +++ b/drivers/net/xen-netback/netback.c
> > @@ -1185,18 +1185,17 @@ out:
> >  
> >  static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
> >  {
> > -	unsigned long now = jiffies;
> > -	unsigned long next_credit =
> > -		vif->credit_timeout.expires +
> > -		msecs_to_jiffies(vif->credit_usec / 1000);
> > +	u64 now = get_jiffies_64();
> > +	u64 next_credit = vif->credit_window_start +
> > +		(u64)msecs_to_jiffies(vif->credit_usec / 1000);
> 
> The cast to u64 seems pointless here.
> 
> > @@ -1207,7 +1206,8 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
> >  		vif->credit_timeout.function =
> >  			tx_credit_callback;
> >  		mod_timer(&vif->credit_timeout,
> > -			  next_credit);
> > +			  (unsigned long)next_credit);
> 
> And the cast here seems pointless too (gcc doesn't warn about
> truncations).
> 

Will fix these, thanks.

Wei.

> Jan

^ permalink raw reply

* [PATCH 5/5] net/usb/r8152: code adjust
From: Hayes Wang @ 2013-10-28 11:58 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, linux-usb, Hayes Wang
In-Reply-To: <1382961494-2272-1-git-send-email-hayeswang@realtek.com>

 -Remove rtl8152_get_stats()
 -Fix the wrong name
 -Something else

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/usb/r8152.c | 46 +++++++++++++++++++++-------------------------
 1 file changed, 21 insertions(+), 25 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 90bc105..d252ff6 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -218,7 +218,7 @@
 #define POWER_CUT		0x0100
 
 /* USB_PM_CTRL_STATUS */
-#define RWSUME_INDICATE		0x0001
+#define RESUME_INDICATE		0x0001
 
 /* USB_USB_CTRL */
 #define RX_AGG_DISABLE		0x0010
@@ -376,7 +376,8 @@ struct r8152 {
 enum rtl_version {
 	RTL_VER_UNKNOWN = 0,
 	RTL_VER_01,
-	RTL_VER_02
+	RTL_VER_02,
+	RTL_VER_INVALLID = -1
 };
 
 /* Maximum number of multicast addresses to filter (vs. Rx-all-multicast).
@@ -422,14 +423,15 @@ int set_registers(struct r8152 *tp, u16 value, u16 index, u16 size, void *data)
 			       value, index, tmp, size, 500);
 
 	kfree(tmp);
+
 	return ret;
 }
 
 static int generic_ocp_read(struct r8152 *tp, u16 index, u16 size,
 				void *data, u16 type)
 {
-	u16	limit = 64;
-	int	ret = 0;
+	u16 limit = 64;
+	int ret = 0;
 
 	if (test_bit(RTL8152_UNPLUG, &tp->flags))
 		return -ENODEV;
@@ -468,9 +470,9 @@ static int generic_ocp_read(struct r8152 *tp, u16 index, u16 size,
 static int generic_ocp_write(struct r8152 *tp, u16 index, u16 byteen,
 				u16 size, void *data, u16 type)
 {
-	int	ret;
-	u16	byteen_start, byteen_end, byen;
-	u16	limit = 512;
+	int ret;
+	u16 byteen_start, byteen_end, byen;
+	u16 limit = 512;
 
 	if (test_bit(RTL8152_UNPLUG, &tp->flags))
 		return -ENODEV;
@@ -763,11 +765,6 @@ static int rtl8152_set_mac_address(struct net_device *netdev, void *p)
 	return 0;
 }
 
-static struct net_device_stats *rtl8152_get_stats(struct net_device *dev)
-{
-	return &dev->stats;
-}
-
 static void read_bulk_callback(struct urb *urb)
 {
 	struct net_device *netdev;
@@ -836,6 +833,7 @@ static void read_bulk_callback(struct urb *urb)
 static void write_bulk_callback(struct urb *urb)
 {
 	struct net_device_stats *stats;
+	struct net_device *netdev;
 	unsigned long flags;
 	struct tx_agg *agg;
 	struct r8152 *tp;
@@ -849,7 +847,8 @@ static void write_bulk_callback(struct urb *urb)
 	if (!tp)
 		return;
 
-	stats = rtl8152_get_stats(tp->netdev);
+	netdev = tp->netdev;
+	stats = &netdev->stats;
 	if (status) {
 		pr_warn_ratelimited("Tx status %d\n", status);
 		stats->tx_errors += agg->skb_num;
@@ -862,7 +861,7 @@ static void write_bulk_callback(struct urb *urb)
 	list_add_tail(&agg->list, &tp->tx_free);
 	spin_unlock_irqrestore(&tp->tx_lock, flags);
 
-	if (!netif_carrier_ok(tp->netdev))
+	if (!netif_carrier_ok(netdev))
 		return;
 
 	if (!test_bit(WORK_ENABLE, &tp->flags))
@@ -1214,7 +1213,7 @@ static void rx_bottom(struct r8152 *tp)
 
 		while (urb->actual_length > len_used) {
 			struct net_device *netdev = tp->netdev;
-			struct net_device_stats *stats;
+			struct net_device_stats *stats = &netdev->stats;
 			unsigned pkt_len;
 			struct sk_buff *skb;
 
@@ -1226,8 +1225,6 @@ static void rx_bottom(struct r8152 *tp)
 			if (urb->actual_length < len_used)
 				break;
 
-			stats = rtl8152_get_stats(netdev);
-
 			pkt_len -= 4; /* CRC */
 			rx_data += sizeof(struct rx_desc);
 
@@ -1281,7 +1278,7 @@ static void tx_bottom(struct r8152 *tp)
 			unsigned long flags;
 
 			netdev = tp->netdev;
-			stats = rtl8152_get_stats(netdev);
+			stats = &netdev->stats;
 
 			if (res == -ENODEV) {
 				netif_device_detach(netdev);
@@ -1476,7 +1473,7 @@ static int rtl8152_enable(struct r8152 *tp)
 
 static void rtl8152_disable(struct r8152 *tp)
 {
-	struct net_device_stats *stats = rtl8152_get_stats(tp->netdev);
+	struct net_device_stats *stats = &tp->netdev->stats;
 	struct sk_buff *skb;
 	u32 ocp_data;
 	int i;
@@ -1600,8 +1597,8 @@ static void r8152b_exit_oob(struct r8152 *tp)
 
 static void r8152b_enter_oob(struct r8152 *tp)
 {
-	u32	ocp_data;
-	int	i;
+	u32 ocp_data;
+	int i;
 
 	ocp_data = ocp_read_byte(tp, MCU_TYPE_PLA, PLA_OOB_CTRL);
 	ocp_data &= ~NOW_IS_OOB;
@@ -1722,7 +1719,6 @@ static int rtl8152_set_speed(struct r8152 *tp, u8 autoneg, u16 speed, u8 duplex)
 	r8152_mdio_write(tp, MII_BMCR, bmcr);
 
 out:
-
 	return ret;
 }
 
@@ -1840,7 +1836,7 @@ static void rtl_clear_bp(struct r8152 *tp)
 
 static void r8152b_enable_eee(struct r8152 *tp)
 {
-	u32	ocp_data;
+	u32 ocp_data;
 
 	ocp_data = ocp_read_word(tp, MCU_TYPE_PLA, PLA_EEE_CR);
 	ocp_data |= EEE_RX_EN | EEE_TX_EN;
@@ -1896,7 +1892,7 @@ static void r8152b_init(struct r8152 *tp)
 	ocp_write_word(tp, MCU_TYPE_USB, USB_UPS_CTRL, ocp_data);
 
 	ocp_data = ocp_read_word(tp, MCU_TYPE_USB, USB_PM_CTRL_STATUS);
-	ocp_data &= ~RWSUME_INDICATE;
+	ocp_data &= ~RESUME_INDICATE;
 	ocp_write_word(tp, MCU_TYPE_USB, USB_PM_CTRL_STATUS, ocp_data);
 
 	r8152b_exit_oob(tp);
@@ -2148,7 +2144,7 @@ static void rtl8152_unload(struct r8152 *tp)
 	}
 
 	ocp_data = ocp_read_word(tp, MCU_TYPE_USB, USB_PM_CTRL_STATUS);
-	ocp_data &= ~RWSUME_INDICATE;
+	ocp_data &= ~RESUME_INDICATE;
 	ocp_write_word(tp, MCU_TYPE_USB, USB_PM_CTRL_STATUS, ocp_data);
 }
 
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH 4/5] net/usb/r8152: fix incorrect type in assignment
From: Hayes Wang @ 2013-10-28 11:58 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, linux-usb, Hayes Wang
In-Reply-To: <1382961494-2272-1-git-send-email-hayeswang@realtek.com>

Correct some declaration.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/usb/r8152.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index a711025..90bc105 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -309,22 +309,22 @@ enum rtl8152_flags {
 #define MCU_TYPE_USB			0x0000
 
 struct rx_desc {
-	u32 opts1;
+	__le32 opts1;
 #define RX_LEN_MASK			0x7fff
-	u32 opts2;
-	u32 opts3;
-	u32 opts4;
-	u32 opts5;
-	u32 opts6;
+	__le32 opts2;
+	__le32 opts3;
+	__le32 opts4;
+	__le32 opts5;
+	__le32 opts6;
 };
 
 struct tx_desc {
-	u32 opts1;
+	__le32 opts1;
 #define TX_FS			(1 << 31) /* First segment of a packet */
 #define TX_LS			(1 << 30) /* Final segment of a packet */
 #define TX_LEN_MASK		0x3ffff
 
-	u32 opts2;
+	__le32 opts2;
 #define UDP_CS			(1 << 31) /* Calculate UDP/IP checksum */
 #define TCP_CS			(1 << 30) /* Calculate TCP/IP checksum */
 #define IPV4_CS			(1 << 29) /* Calculate IPv4 checksum */
@@ -878,7 +878,7 @@ static void write_bulk_callback(struct urb *urb)
 static void intr_callback(struct urb *urb)
 {
 	struct r8152 *tp;
-	__u16 *d;
+	__le16 *d;
 	int status = urb->status;
 	int res;
 
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH 3/5] net/usb/r8152: modify the tx flow
From: Hayes Wang @ 2013-10-28 11:58 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, linux-usb, Hayes Wang
In-Reply-To: <1382961494-2272-1-git-send-email-hayeswang@realtek.com>

Let rtl8152_start_xmit() to queue packet only, and tx_bottom() would
send all of the packets. This simplify the code and make sure all the
packet would be sent by the original order.

Support stopping and waking tx queue. The maximum tx queue length
is 60.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/usb/r8152.c | 52 ++++++++++---------------------------------------
 1 file changed, 10 insertions(+), 42 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 815d890..a711025 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -276,6 +276,8 @@ enum rtl_register_content {
 
 #define INTR_LINK		0x0004
 
+#define TX_QLEN			60
+
 #define RTL8152_REQT_READ	0xc0
 #define RTL8152_REQT_WRITE	0x40
 #define RTL8152_REQ_GET_REGS	0x05
@@ -1174,6 +1176,9 @@ static int r8152_tx_agg_fill(struct r8152 *tp, struct tx_agg *agg)
 		remain = rx_buf_sz - (int)(tx_agg_align(tx_data) - agg->head);
 	}
 
+	if (netif_queue_stopped(tp->netdev))
+		netif_wake_queue(tp->netdev);
+
 	usb_fill_bulk_urb(agg->urb, tp->udev, usb_sndbulkpipe(tp->udev, 2),
 			  agg->head, (int)(tx_data - (u8 *)agg->head),
 			  (usb_complete_t)write_bulk_callback, agg);
@@ -1389,53 +1394,16 @@ static netdev_tx_t rtl8152_start_xmit(struct sk_buff *skb,
 					    struct net_device *netdev)
 {
 	struct r8152 *tp = netdev_priv(netdev);
-	struct net_device_stats *stats = rtl8152_get_stats(netdev);
-	unsigned long flags;
-	struct tx_agg *agg = NULL;
-	struct tx_desc *tx_desc;
-	unsigned int len;
-	u8 *tx_data;
-	int res;
 
 	skb_tx_timestamp(skb);
 
-	/* If tx_queue is not empty, it means at least one previous packt */
-	/* is waiting for sending. Don't send current one before it.      */
-	if (skb_queue_empty(&tp->tx_queue))
-		agg = r8152_get_tx_agg(tp);
-
-	if (!agg) {
-		skb_queue_tail(&tp->tx_queue, skb);
-		return NETDEV_TX_OK;
-	}
+	skb_queue_tail(&tp->tx_queue, skb);
 
-	tx_desc = (struct tx_desc *)agg->head;
-	tx_data = agg->head + sizeof(*tx_desc);
-	agg->skb_num = agg->skb_len = 0;
+	if (skb_queue_len(&tp->tx_queue) > TX_QLEN)
+		netif_stop_queue(netdev);
 
-	len = skb->len;
-	r8152_tx_csum(tp, tx_desc, skb);
-	memcpy(tx_data, skb->data, len);
-	dev_kfree_skb_any(skb);
-	agg->skb_num++;
-	agg->skb_len += len;
-	usb_fill_bulk_urb(agg->urb, tp->udev, usb_sndbulkpipe(tp->udev, 2),
-			  agg->head, len + sizeof(*tx_desc),
-			  (usb_complete_t)write_bulk_callback, agg);
-	res = usb_submit_urb(agg->urb, GFP_ATOMIC);
-	if (res) {
-		/* Can we get/handle EPIPE here? */
-		if (res == -ENODEV) {
-			netif_device_detach(tp->netdev);
-		} else {
-			netif_warn(tp, tx_err, netdev,
-				   "failed tx_urb %d\n", res);
-			stats->tx_dropped++;
-			spin_lock_irqsave(&tp->tx_lock, flags);
-			list_add_tail(&agg->list, &tp->tx_free);
-			spin_unlock_irqrestore(&tp->tx_lock, flags);
-		}
-	}
+	if (!list_empty(&tp->tx_free))
+		tasklet_schedule(&tp->tl);
 
 	return NETDEV_TX_OK;
 }
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH 2/5] net/usb/r8152: make sure the tx checksum setting is correct
From: Hayes Wang @ 2013-10-28 11:58 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, linux-usb, Hayes Wang
In-Reply-To: <1382961494-2272-1-git-send-email-hayeswang@realtek.com>

Clear the checksum setting before checking the necessary of hardware
checksum.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/usb/r8152.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 5dbfe50..815d890 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -1094,6 +1094,7 @@ r8152_tx_csum(struct r8152 *tp, struct tx_desc *desc, struct sk_buff *skb)
 	memset(desc, 0, sizeof(*desc));
 
 	desc->opts1 = cpu_to_le32((skb->len & TX_LEN_MASK) | TX_FS | TX_LS);
+	desc->opts2 = 0;
 
 	if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		__be16 protocol;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH 1/5] net/usb/r8152: fix tx/rx memory overflow
From: Hayes Wang @ 2013-10-28 11:58 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, linux-usb, Hayes Wang
In-Reply-To: <1382961494-2272-1-git-send-email-hayeswang@realtek.com>

The tx/rx would access the memory which is out of the desired range.
Modify the method of checking the end of the memory to avoid it.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/usb/r8152.c | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index f3fce41..5dbfe50 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -24,7 +24,7 @@
 #include <linux/ipv6.h>
 
 /* Version Information */
-#define DRIVER_VERSION "v1.01.0 (2013/08/12)"
+#define DRIVER_VERSION "v1.02.0 (2013/10/28)"
 #define DRIVER_AUTHOR "Realtek linux nic maintainers <nic_swsd@realtek.com>"
 #define DRIVER_DESC "Realtek RTL8152 Based USB 2.0 Ethernet Adapters"
 #define MODULENAME "r8152"
@@ -1136,14 +1136,14 @@ r8152_tx_csum(struct r8152 *tp, struct tx_desc *desc, struct sk_buff *skb)
 
 static int r8152_tx_agg_fill(struct r8152 *tp, struct tx_agg *agg)
 {
-	u32 remain;
+	int remain;
 	u8 *tx_data;
 
 	tx_data = agg->head;
 	agg->skb_num = agg->skb_len = 0;
-	remain = rx_buf_sz - sizeof(struct tx_desc);
+	remain = rx_buf_sz;
 
-	while (remain >= ETH_ZLEN) {
+	while (remain >= ETH_ZLEN + sizeof(struct tx_desc)) {
 		struct tx_desc *tx_desc;
 		struct sk_buff *skb;
 		unsigned int len;
@@ -1152,12 +1152,14 @@ static int r8152_tx_agg_fill(struct r8152 *tp, struct tx_agg *agg)
 		if (!skb)
 			break;
 
+		remain -= sizeof(*tx_desc);
 		len = skb->len;
 		if (remain < len) {
 			skb_queue_head(&tp->tx_queue, skb);
 			break;
 		}
 
+		tx_data = tx_agg_align(tx_data);
 		tx_desc = (struct tx_desc *)tx_data;
 		tx_data += sizeof(*tx_desc);
 
@@ -1167,9 +1169,8 @@ static int r8152_tx_agg_fill(struct r8152 *tp, struct tx_agg *agg)
 		agg->skb_len += len;
 		dev_kfree_skb_any(skb);
 
-		tx_data = tx_agg_align(tx_data + len);
-		remain = rx_buf_sz - sizeof(*tx_desc) -
-			 (u32)((void *)tx_data - agg->head);
+		tx_data += len;
+		remain = rx_buf_sz - (int)(tx_agg_align(tx_data) - agg->head);
 	}
 
 	usb_fill_bulk_urb(agg->urb, tp->udev, usb_sndbulkpipe(tp->udev, 2),
@@ -1188,7 +1189,6 @@ static void rx_bottom(struct r8152 *tp)
 	list_for_each_safe(cursor, next, &tp->rx_done) {
 		struct rx_desc *rx_desc;
 		struct rx_agg *agg;
-		unsigned pkt_len;
 		int len_used = 0;
 		struct urb *urb;
 		u8 *rx_data;
@@ -1204,17 +1204,22 @@ static void rx_bottom(struct r8152 *tp)
 
 		rx_desc = agg->head;
 		rx_data = agg->head;
-		pkt_len = le32_to_cpu(rx_desc->opts1) & RX_LEN_MASK;
-		len_used += sizeof(struct rx_desc) + pkt_len;
+		len_used += sizeof(struct rx_desc);
 
-		while (urb->actual_length >= len_used) {
+		while (urb->actual_length > len_used) {
 			struct net_device *netdev = tp->netdev;
 			struct net_device_stats *stats;
+			unsigned pkt_len;
 			struct sk_buff *skb;
 
+			pkt_len = le32_to_cpu(rx_desc->opts1) & RX_LEN_MASK;
 			if (pkt_len < ETH_ZLEN)
 				break;
 
+			len_used += pkt_len;
+			if (urb->actual_length < len_used)
+				break;
+
 			stats = rtl8152_get_stats(netdev);
 
 			pkt_len -= 4; /* CRC */
@@ -1234,9 +1239,8 @@ static void rx_bottom(struct r8152 *tp)
 
 			rx_data = rx_agg_align(rx_data + pkt_len + 4);
 			rx_desc = (struct rx_desc *)rx_data;
-			pkt_len = le32_to_cpu(rx_desc->opts1) & RX_LEN_MASK;
 			len_used = (int)(rx_data - (u8 *)agg->head);
-			len_used += sizeof(struct rx_desc) + pkt_len;
+			len_used += sizeof(struct rx_desc);
 		}
 
 submit:
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH 0/5] r8152 bug fixes
From: Hayes Wang @ 2013-10-28 11:58 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, linux-usb, Hayes Wang

These could fix some driver issues.

Hayes Wang (5):
  net/usb/r8152: fix tx/rx memory overflow
  net/usb/r8152: make sure the tx checksum setting is correct
  net/usb/r8152: modify the tx flow
  net/usb/r8152: fix incorrect type in assignment
  net/usb/r8152: code adjust

 drivers/net/usb/r8152.c | 145 +++++++++++++++++++-----------------------------
 1 file changed, 57 insertions(+), 88 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* Bug in skb_segment: fskb->len != len
From: Christoph Paasch @ 2013-10-28 11:55 UTC (permalink / raw)
  To: Eric Dumazet, Herbert Xu; +Cc: netdev

Hello,

I have been seeing the below BUG in skb_segment with the latest net-next
head on my router.

I am forwarding Multipath TCP-traffic on this router. The MPTCP-sender is simply
doing an iperf-session. Strangely, I cannot reproduce the bug when sending
regular TCP-traffic across the router.
Note: The crash happens on a vanilla net-next kernel. It does not has any
MPTCP-code in it.

I bisected it down to 8a29111c7c (net: gro: allow to build full sized skb),
but I guess 8a29111c7c is just revealing a more fundamental bug in skb_segment.

Some info I found:
In skb_segment, when the bug happens, fskb->len is 4284 but the mss and len is 1428.
Shortly before the bug happens, skb_gro_receive is building a packet where
lp->len is equal to 4284 inside the frag_list.


Seems like skb_segment cannot handle those bigger skb's in the frag_list.


Cheers,
Christoph


Here the crash-dump:

[  399.832854] ------------[ cut here ]------------
[  399.888048] kernel BUG at /home/cpaasch/builder/net-next/net/core/skbuff.c:2796!
[  399.976504] invalid opcode: 0000 [#1] SMP 
[  400.025675] Modules linked in:
[  400.062270] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.12.0-rc6-mptcp #231
[  400.145531] Hardware name: HP ProLiant DL120 G6/ProLiant DL120 G6, BIOS O26    09/06/2010
[  400.243342] task: ffff88042d8a4680 ti: ffff88042d8ce000 task.ti: ffff88042d8ce000
[  400.332841] RIP: 0010:[<ffffffff81447d21>]  [<ffffffff81447d21>] skb_segment+0x1aa/0x5fa
[  400.429722] RSP: 0018:ffff88043fd03770  EFLAGS: 00010212
[  400.493231] RAX: 0000000000000594 RBX: ffff8800ba89ac00 RCX: 00000000000064be
[  400.578574] RDX: 0000000000000000 RSI: 0000000000000011 RDI: ffff8804273a7080
[  400.663918] RBP: ffff88043fd03820 R08: 0000000000000000 R09: ffff88042c4d4600
[  400.749259] R10: 0000000000010000 R11: ffff88042d801900 R12: ffff88042c7ca000
[  400.834596] R13: ffff88042c5d5400 R14: 0000000000001650 R15: 0000000000000056
[  400.919934] FS:  0000000000000000(0000) GS:ffff88043fd00000(0000) knlGS:0000000000000000
[  401.016711] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  401.085422] CR2: ffffffffff600400 CR3: 000000042c86b000 CR4: 00000000000007e0
[  401.170765] Stack:
[  401.194780]  ffff88042d94e900 ffff88042c4d46f0 0000000000000000 0000000000000042
[  401.283663]  0100000000000000 0000000000000001 0000001100000594 0000000000000056
[  401.372555]  0000000000000000 0000004200000098 ffffffffffffffaa 0000001100000001
[  401.461445] Call Trace:
[  401.490658]  <IRQ> 
[  401.513631]  [<ffffffff8149b077>] tcp_gso_segment+0x168/0x395
[  401.584644]  [<ffffffff814a5ba1>] inet_gso_segment+0x175/0x2a9
[  401.654396]  [<ffffffff8144fb40>] skb_mac_gso_segment+0x10a/0x16a
[  401.727264]  [<ffffffff81451062>] __skb_gso_segment+0xaf/0xb4
[  401.795977]  [<ffffffff814515ae>] dev_hard_start_xmit+0x215/0x40a
[  401.868846]  [<ffffffff814689ed>] sch_direct_xmit+0x6b/0x195
[  401.936519]  [<ffffffff81451988>] dev_queue_xmit+0x1e5/0x3ac
[  402.004193]  [<ffffffff814b6461>] ? iptable_filter_hook+0x41/0x4c
[  402.077061]  [<ffffffff8148039d>] ip_finish_output+0x2f6/0x351
[  402.146812]  [<ffffffff8147c6dc>] ? ip_frag_mem+0x34/0x34
[  402.211366]  [<ffffffff81480470>] ip_output+0x78/0x7f
[  402.271765]  [<ffffffff8147c71c>] ip_forward_finish+0x40/0x44
[  402.340475]  [<ffffffff8147c9c5>] ip_forward+0x2a5/0x300
[  402.403993]  [<ffffffff8147b104>] ip_rcv_finish+0x214/0x22c
[  402.470625]  [<ffffffff8147b3cd>] ip_rcv+0x2b1/0x2e9
[  402.529983]  [<ffffffff81446a19>] ? skb_gro_receive+0x562/0x582
[  402.600773]  [<ffffffff8144dcd8>] __netif_receive_skb_core+0x49a/0x4cd
[  402.678840]  [<ffffffff8144dd60>] __netif_receive_skb+0x55/0x5a
[  402.749631]  [<ffffffff81450190>] netif_receive_skb+0x71/0x78
[  402.818344]  [<ffffffff8149af07>] ? tcp4_gro_receive+0xf4/0xfc
[  402.888095]  [<ffffffff81450249>] napi_gro_complete+0xb2/0xba
[  402.956808]  [<ffffffff8145045f>] dev_gro_receive+0x20e/0x34d
[  403.025519]  [<ffffffff81450ae5>] napi_gro_receive+0x92/0xf1
[  403.093195]  [<ffffffff813acfe2>] netxen_process_rcv_ring+0x1b0/0x767
[  403.170222]  [<ffffffff810b3ae8>] ? kmem_cache_free+0xef/0xf3
[  403.238931]  [<ffffffff81450fb1>] ? dev_kfree_skb_any+0x2e/0x30
[  403.309723]  [<ffffffff813acc42>] ? netxen_process_cmd_ring+0x33/0x223
[  403.387790]  [<ffffffff813a8f70>] netxen_nic_poll+0x35/0x9a
[  403.454423]  [<ffffffff814506dc>] net_rx_action+0xa7/0x1d2
[  403.520017]  [<ffffffff8103605d>] __do_softirq+0xbd/0x17e
[  403.584572]  [<ffffffff815289bc>] call_softirq+0x1c/0x26
[  403.648085]  [<ffffffff81003bbb>] do_softirq+0x33/0x68
[  403.709523]  [<ffffffff81035efb>] irq_exit+0x40/0x4e
[  403.768880]  [<ffffffff81003423>] do_IRQ+0x98/0xaf
[  403.826158]  [<ffffffff8152716a>] common_interrupt+0x6a/0x6a
[  403.893829]  <EOI> 
[  403.916800]  [<ffffffff8100933d>] ? default_idle+0x6/0x8
[  403.982604]  [<ffffffff81009542>] arch_cpu_idle+0x13/0x18
[  404.047159]  [<ffffffff8105ea2b>] cpu_startup_entry+0xa4/0xf1
[  404.115873]  [<ffffffff8102320b>] start_secondary+0x1b2/0x1b7
[  404.184582] Code: bd 7f ff ff ff 00 74 04 44 8b 75 c0 45 85 f6 0f 85 e5 00 00 00 8b 75 84 39 75 ac 0f 8c d9 00 00 00 45 8b 75 68 44 3b 75 c0 74 04 <0f> 0b eb fe 4c 89 ef be 20 00 00 00 e8 08 f1 ff ff 48 85 c0 48 
[  404.417106] RIP  [<ffffffff81447d21>] skb_segment+0x1aa/0x5fa
[  404.485928]  RSP <ffff88043fd03770>
[  404.527614] ---[ end trace 32152a68c7bdc3ac ]---

^ permalink raw reply

* Re: [Xen-devel] [PATCH net V2] xen-netback: use jiffies_64 value to calculate credit timeout
From: David Vrabel @ 2013-10-28 11:53 UTC (permalink / raw)
  To: Wei Liu, xen-devel, netdev
  Cc: Ian Campbell, annie.li, david.vrabel, jbeulich, Jason Luan
In-Reply-To: <1382960117-13053-1-git-send-email-wei.liu2@citrix.com>

On 28/10/2013 11:35, Wei Liu wrote:
> time_after_eq() only works if the delta is < MAX_ULONG/2.
> 
> For a 32bit Dom0, if netfront sends packets at a very low rate, the time
> between subsequent calls to tx_credit_exceeded() may exceed MAX_ULONG/2
> and the test for timer_after_eq() will be incorrect. Credit will not be
> replenished and the guest may become unable to send packets (e.g., if
> prior to the long gap, all credit was exhausted).
> 
> Use jiffies_64 variant to mitigate this problem for 32bit Dom0.

Reviewed-by: David Vrabel <david.vrabel@citrix.com>

David

^ permalink raw reply

* Re: [PATCH net V2] xen-netback: use jiffies_64 value to calculate credit timeout
From: Jan Beulich @ 2013-10-28 11:49 UTC (permalink / raw)
  To: Wei Liu; +Cc: david.vrabel, Ian Campbell, xen-devel, annie.li, Jason Luan,
	netdev
In-Reply-To: <1382960117-13053-1-git-send-email-wei.liu2@citrix.com>

>>> On 28.10.13 at 12:35, Wei Liu <wei.liu2@citrix.com> wrote:

Two formal things:

> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -1185,18 +1185,17 @@ out:
>  
>  static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
>  {
> -	unsigned long now = jiffies;
> -	unsigned long next_credit =
> -		vif->credit_timeout.expires +
> -		msecs_to_jiffies(vif->credit_usec / 1000);
> +	u64 now = get_jiffies_64();
> +	u64 next_credit = vif->credit_window_start +
> +		(u64)msecs_to_jiffies(vif->credit_usec / 1000);

The cast to u64 seems pointless here.

> @@ -1207,7 +1206,8 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
>  		vif->credit_timeout.function =
>  			tx_credit_callback;
>  		mod_timer(&vif->credit_timeout,
> -			  next_credit);
> +			  (unsigned long)next_credit);

And the cast here seems pointless too (gcc doesn't warn about
truncations).

Jan

^ permalink raw reply

* [PATCH iproute2] tc: add cls_bpf frontend
From: Daniel Borkmann @ 2013-10-28 11:35 UTC (permalink / raw)
  To: davem; +Cc: netdev, Thomas Graf

This is the iproute2 part of the kernel patch "net: sched:
add BPF-based traffic classifier".

[Will re-submit later again for iproute2 when window for
 -next submissions opens.]

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
---
 include/linux/pkt_cls.h |  14 +++
 tc/Makefile             |   1 +
 tc/f_bpf.c              | 288 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 303 insertions(+)
 create mode 100644 tc/f_bpf.c

diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h
index 082eafa..25731df 100644
--- a/include/linux/pkt_cls.h
+++ b/include/linux/pkt_cls.h
@@ -388,6 +388,20 @@ enum {
 
 #define TCA_CGROUP_MAX (__TCA_CGROUP_MAX - 1)
 
+/* BPF classifier */
+
+enum {
+	TCA_BPF_UNSPEC,
+	TCA_BPF_ACT,
+	TCA_BPF_POLICE,
+	TCA_BPF_CLASSID,
+	TCA_BPF_OPS_LEN,
+	TCA_BPF_OPS,
+	__TCA_BPF_MAX,
+};
+
+#define TCA_BPF_MAX (__TCA_BPF_MAX - 1)
+
 /* Extended Matches */
 
 struct tcf_ematch_tree_hdr {
diff --git a/tc/Makefile b/tc/Makefile
index f54a955..84215c0 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -22,6 +22,7 @@ TCMODULES += f_u32.o
 TCMODULES += f_route.o
 TCMODULES += f_fw.o
 TCMODULES += f_basic.o
+TCMODULES += f_bpf.o
 TCMODULES += f_flow.o
 TCMODULES += f_cgroup.o
 TCMODULES += q_dsmark.o
diff --git a/tc/f_bpf.c b/tc/f_bpf.c
new file mode 100644
index 0000000..d52d7d8
--- /dev/null
+++ b/tc/f_bpf.c
@@ -0,0 +1,288 @@
+/*
+ * f_bpf.c	BPF-based Classifier
+ *
+ *		This program is free software; you can distribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Daniel Borkmann <dborkman@redhat.com>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <syslog.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <string.h>
+#include <stdbool.h>
+#include <errno.h>
+#include <linux/filter.h>
+#include <linux/if.h>
+
+#include "utils.h"
+#include "tc_util.h"
+
+static void explain(void)
+{
+	fprintf(stderr, "Usage: ... bpf ...\n");
+	fprintf(stderr, "\n");
+	fprintf(stderr, " [inline]:     run bytecode BPF_BYTECODE\n");
+	fprintf(stderr, " [from file]:  run bytecode-file FILE\n");
+	fprintf(stderr, "\n");
+	fprintf(stderr, "               [ police POLICE_SPEC ] [ action ACTION_SPEC ]\n");
+	fprintf(stderr, "               [ classid CLASSID ]\n");
+	fprintf(stderr, "\n");
+	fprintf(stderr, "Where BPF_BYTECODE := \'s,c t f k,c t f k,c t f k,...\'\n");
+	fprintf(stderr, "      c,t,f,k and s are decimals; s denotes number of 4-tuples\n");
+	fprintf(stderr, "Where FILE points to a file containing the BPF_BYTECODE string\n");
+	fprintf(stderr, "\nNOTE: CLASSID is parsed as hexadecimal input.\n");
+}
+
+static int bpf_parse_string(char *arg, bool from_file, __u16 *bpf_len,
+			    char **bpf_string, bool *need_release,
+			    const char separator)
+{
+	char sp;
+
+	if (from_file) {
+		size_t tmp_len, op_len = sizeof("65535 255 255 4294967295,");
+		char *tmp_string;
+		FILE *fp;
+
+		tmp_len = sizeof("4096,") + BPF_MAXINSNS * op_len;
+		tmp_string = malloc(tmp_len);
+		if (tmp_string == NULL)
+			return -ENOMEM;
+
+		memset(tmp_string, 0, tmp_len);
+
+		fp = fopen(arg, "r");
+		if (fp == NULL) {
+			perror("Cannot fopen");
+			free(tmp_string);
+			return -ENOENT;
+		}
+
+		if (!fgets(tmp_string, tmp_len, fp)) {
+			free(tmp_string);
+			fclose(fp);
+			return -EIO;
+		}
+
+		fclose(fp);
+
+		*need_release = true;
+		*bpf_string = tmp_string;
+	} else {
+		*need_release = false;
+		*bpf_string = arg;
+	}
+
+	if (sscanf(*bpf_string, "%hu%c", bpf_len, &sp) != 2 ||
+	    sp != separator) {
+		if (*need_release)
+			free(*bpf_string);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int bpf_parse_ops(int argc, char **argv, struct nlmsghdr *n,
+			 bool from_file)
+{
+	char *bpf_string, *token, separator = ',';
+	struct sock_filter bpf_ops[BPF_MAXINSNS];
+	int ret = 0, i = 0;
+	bool need_release;
+	__u16 bpf_len = 0;
+
+	if (argc < 1)
+		return -EINVAL;
+	if (bpf_parse_string(argv[0], from_file, &bpf_len, &bpf_string,
+			     &need_release, separator))
+		return -EINVAL;
+	if (bpf_len == 0 || bpf_len > BPF_MAXINSNS) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	token = bpf_string;
+	while ((token = strchr(token, separator)) && (++token)[0]) {
+		if (i >= bpf_len) {
+			fprintf(stderr, "Real program length exceeds encoded "
+				"length parameter!\n");
+			ret = -EINVAL;
+			goto out;
+		}
+
+		if (sscanf(token, "%hu %hhu %hhu %u,",
+			   &bpf_ops[i].code, &bpf_ops[i].jt,
+			   &bpf_ops[i].jf, &bpf_ops[i].k) != 4) {
+			fprintf(stderr, "Error at instruction %d!\n", i);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		i++;
+	}
+
+	if (i != bpf_len) {
+		fprintf(stderr, "Parsed program length is less than encoded"
+			"length parameter!\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	addattr_l(n, MAX_MSG, TCA_BPF_OPS_LEN, &bpf_len, sizeof(bpf_len));
+	addattr_l(n, MAX_MSG, TCA_BPF_OPS, &bpf_ops,
+		  bpf_len * sizeof(struct sock_filter));
+out:
+	if (need_release)
+		free(bpf_string);
+
+	return ret;
+}
+
+static void bpf_print_ops(FILE *f, struct rtattr *bpf_ops, __u16 len)
+{
+	struct sock_filter *ops = (struct sock_filter *) RTA_DATA(bpf_ops);
+	int i;
+
+	if (len == 0)
+		return;
+
+	fprintf(f, "bytecode \'%u,", len);
+
+	for (i = 0; i < len - 1; i++)
+		fprintf(f, "%hu %hhu %hhu %u,", ops[i].code, ops[i].jt,
+			ops[i].jf, ops[i].k);
+
+	fprintf(f, "%hu %hhu %hhu %u\'\n", ops[i].code, ops[i].jt,
+		ops[i].jf, ops[i].k);
+}
+
+static int bpf_parse_opt(struct filter_util *qu, char *handle,
+			 int argc, char **argv, struct nlmsghdr *n)
+{
+	struct tcmsg *t = NLMSG_DATA(n);
+	struct rtattr *tail;
+	long h = 0;
+
+	if (argc == 0)
+		return 0;
+
+	if (handle) {
+		h = strtol(handle, NULL, 0);
+		if (h == LONG_MIN || h == LONG_MAX) {
+			fprintf(stderr, "Illegal handle \"%s\", must be "
+				"numeric.\n", handle);
+			return -1;
+		}
+	}
+
+	t->tcm_handle = h;
+
+	tail = (struct rtattr*)(((void*)n)+NLMSG_ALIGN(n->nlmsg_len));
+	addattr_l(n, MAX_MSG, TCA_OPTIONS, NULL, 0);
+
+	while (argc > 0) {
+		if (matches(*argv, "run") == 0) {
+			bool from_file;
+			NEXT_ARG();
+			if (strcmp(*argv, "bytecode-file") == 0) {
+				from_file = true;
+			} else if (strcmp(*argv, "bytecode") == 0) {
+				from_file = false;
+			} else {
+				fprintf(stderr, "What is \"%s\"?\n", *argv);
+				explain();
+				return -1;
+			}
+			NEXT_ARG();
+			if (bpf_parse_ops(argc, argv, n, from_file)) {
+				fprintf(stderr, "Illegal \"bytecode\"\n");
+				return -1;
+			}
+		} else if (matches(*argv, "classid") == 0 ||
+			   strcmp(*argv, "flowid") == 0) {
+			unsigned handle;
+			NEXT_ARG();
+			if (get_tc_classid(&handle, *argv)) {
+				fprintf(stderr, "Illegal \"classid\"\n");
+				return -1;
+			}
+			addattr_l(n, MAX_MSG, TCA_BPF_CLASSID, &handle, 4);
+		} else if (matches(*argv, "action") == 0) {
+			NEXT_ARG();
+			if (parse_action(&argc, &argv, TCA_BPF_ACT, n)) {
+				fprintf(stderr, "Illegal \"action\"\n");
+				return -1;
+			}
+			continue;
+		} else if (matches(*argv, "police") == 0) {
+			NEXT_ARG();
+			if (parse_police(&argc, &argv, TCA_BPF_POLICE, n)) {
+				fprintf(stderr, "Illegal \"police\"\n");
+				return -1;
+			}
+			continue;
+		} else if (strcmp(*argv, "help") == 0) {
+			explain();
+			return -1;
+		} else {
+			fprintf(stderr, "What is \"%s\"?\n", *argv);
+			explain();
+			return -1;
+		}
+		argc--; argv++;
+	}
+
+	tail->rta_len = (((void*)n)+n->nlmsg_len) - (void*)tail;
+	return 0;
+}
+
+static int bpf_print_opt(struct filter_util *qu, FILE *f,
+			 struct rtattr *opt, __u32 handle)
+{
+	struct rtattr *tb[TCA_BPF_MAX + 1];
+
+	if (opt == NULL)
+		return 0;
+
+	parse_rtattr_nested(tb, TCA_BPF_MAX, opt);
+
+	if (handle)
+		fprintf(f, "handle 0x%x ", handle);
+
+	if (tb[TCA_BPF_CLASSID]) {
+		SPRINT_BUF(b1);
+		fprintf(f, "flowid %s ",
+			sprint_tc_classid(rta_getattr_u32(tb[TCA_BPF_CLASSID]), b1));
+	}
+
+	if (tb[TCA_BPF_OPS] && tb[TCA_BPF_OPS_LEN])
+		bpf_print_ops(f, tb[TCA_BPF_OPS],
+			      rta_getattr_u16(tb[TCA_BPF_OPS_LEN]));
+
+	if (tb[TCA_BPF_POLICE]) {
+		fprintf(f, "\n");
+		tc_print_police(f, tb[TCA_BPF_POLICE]);
+	}
+
+	if (tb[TCA_BPF_ACT]) {
+		tc_print_action(f, tb[TCA_BPF_ACT]);
+	}
+
+	return 0;
+}
+
+struct filter_util bpf_filter_util = {
+	.id = "bpf",
+	.parse_fopt = bpf_parse_opt,
+	.print_fopt = bpf_print_opt,
+};
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next] net: sched: cls_bpf: add BPF-based classifier
From: Daniel Borkmann @ 2013-10-28 11:35 UTC (permalink / raw)
  To: davem; +Cc: netdev, Thomas Graf

This work contains a lightweight BPF-based traffic classifier that can
serve as a flexible alternative to ematch-based tree classification, i.e.
now that BPF filter engine can also be JITed in the kernel. Naturally, tc
actions and policies are supported as well with cls_bpf. Multiple BPF
programs/filter can be attached for a class, or they can just as well be
written within a single BPF program, that's really up to the user how he
wishes to run/optimize the code, e.g. also for inversion of verdicts etc.
The notion of a BPF program's return/exit codes is being kept as follows:
non-zero for match, zero for mismatch.

As a minimal usage example with iproute2, we use a 3 band prio root qdisc
on a router with sfq each as leave, and assign ssh and icmp bpf-based
filters to band 1, http traffic to band 2 and the rest to band 3. For the
first two bands we load the bytecode from a file, in the 2nd we load it
inline as an example:

echo 1 > /proc/sys/net/core/bpf_jit_enable

tc qdisc del dev em1 root
tc qdisc add dev em1 root handle 1: prio bands 3 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

tc qdisc add dev em1 parent 1:1 sfq perturb 16
tc qdisc add dev em1 parent 1:2 sfq perturb 16
tc qdisc add dev em1 parent 1:3 sfq perturb 16

tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/ssh.bpf flowid 1:1
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/icmp.bpf flowid 1:1
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/http.bpf flowid 1:2
tc filter add dev em1 parent 1: bpf run bytecode "`bpfc -f tc -i misc.ops`" flowid 1:3

BPF programs can be easily created and passed to tc, either as inline
'bytecode' or 'bytecode-file'. There are a couple of front-ends that can
compile opcodes, for example:

1) People familiar with tcpdump-like filters:

   tcpdump -iem1 -ddd port 22 | tr '\n' ',' > /etc/tc/ssh.bpf

2) People that want to low-level program their filters or use BPF
   extensions that lack support by libpcap's compiler:

   bpfc -f tc -i ssh.ops > /etc/tc/ssh.bpf

   ssh.ops example code:
   ldh [12]
   jne #0x800, drop
   ldb [23]
   jneq #6, drop
   ldh [20]
   jset #0x1fff, drop
   ldxb 4 * ([14] & 0xf)
   ldh [%x + 14]
   jeq #0x16, pass
   ldh [%x + 16]
   jne #0x16, drop
   pass: ret #-1
   drop: ret #0

It was chosen to load bytecode into tc, since the reverse operation,
tc filter list dev em1, is then able to show the exact commands again.
Possible follow-up work could also include a small expression compiler
for iproute2. Tested with the help of bmon. This idea came up during
the Netfilter Workshop 2013 in Copenhagen.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
---
 include/uapi/linux/pkt_cls.h |  14 ++
 net/sched/Kconfig            |  10 ++
 net/sched/Makefile           |   1 +
 net/sched/cls_bpf.c          | 380 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 405 insertions(+)
 create mode 100644 net/sched/cls_bpf.c

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 082eafa..25731df 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -388,6 +388,20 @@ enum {
 
 #define TCA_CGROUP_MAX (__TCA_CGROUP_MAX - 1)
 
+/* BPF classifier */
+
+enum {
+	TCA_BPF_UNSPEC,
+	TCA_BPF_ACT,
+	TCA_BPF_POLICE,
+	TCA_BPF_CLASSID,
+	TCA_BPF_OPS_LEN,
+	TCA_BPF_OPS,
+	__TCA_BPF_MAX,
+};
+
+#define TCA_BPF_MAX (__TCA_BPF_MAX - 1)
+
 /* Extended Matches */
 
 struct tcf_ematch_tree_hdr {
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index c03a32a..989464c 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -443,6 +443,16 @@ config NET_CLS_CGROUP
 	  To compile this code as a module, choose M here: the
 	  module will be called cls_cgroup.
 
+config NET_CLS_BPF
+	tristate "BPF-based classifier"
+	select NET_CLS
+	---help---
+	  If you say Y here, you will be able to classify packets based on
+	  programmable BPF (JIT'ed) filters as an alternative to ematches.
+
+	  To compile this code as a module, choose M here: the module will
+	  be called cls_bpf.
+
 config NET_EMATCH
 	bool "Extended Matches"
 	select NET_CLS
diff --git a/net/sched/Makefile b/net/sched/Makefile
index e5f9abe..35fa47a 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -50,6 +50,7 @@ obj-$(CONFIG_NET_CLS_RSVP6)	+= cls_rsvp6.o
 obj-$(CONFIG_NET_CLS_BASIC)	+= cls_basic.o
 obj-$(CONFIG_NET_CLS_FLOW)	+= cls_flow.o
 obj-$(CONFIG_NET_CLS_CGROUP)	+= cls_cgroup.o
+obj-$(CONFIG_NET_CLS_BPF)	+= cls_bpf.o
 obj-$(CONFIG_NET_EMATCH)	+= ematch.o
 obj-$(CONFIG_NET_EMATCH_CMP)	+= em_cmp.o
 obj-$(CONFIG_NET_EMATCH_NBYTE)	+= em_nbyte.o
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
new file mode 100644
index 0000000..778ca86
--- /dev/null
+++ b/net/sched/cls_bpf.c
@@ -0,0 +1,380 @@
+/*
+ * Berkeley Packet Filter based traffic classifier
+ *
+ * Might be used to classify traffic through flexible, user-defined and
+ * possibly JIT-ed BPF filters for traffic control as an alternative to
+ * ematches.
+ *
+ * (C) 2013 Daniel Borkmann <dborkman@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/skbuff.h>
+#include <linux/filter.h>
+#include <net/rtnetlink.h>
+#include <net/pkt_cls.h>
+#include <net/sock.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Daniel Borkmann <dborkman@redhat.com>");
+MODULE_DESCRIPTION("TC BPF based classifier");
+
+struct cls_bpf_head {
+	struct list_head plist;
+	u32 hgen;
+};
+
+struct cls_bpf_prog {
+	struct sk_filter *filter;
+	struct sock_filter *bpf_ops;
+	struct tcf_exts exts;
+	struct tcf_result res;
+	struct list_head link;
+	u32 handle;
+	u16 bpf_len;
+};
+
+static const struct nla_policy bpf_policy[TCA_BPF_MAX + 1] = {
+	[TCA_BPF_CLASSID]	= { .type = NLA_U32 },
+	[TCA_BPF_OPS_LEN]	= { .type = NLA_U16 },
+	[TCA_BPF_OPS]		= { .type = NLA_BINARY,
+				    .len = sizeof(struct sock_filter) * BPF_MAXINSNS },
+};
+
+static const struct tcf_ext_map bpf_ext_map = {
+	.action = TCA_BPF_ACT,
+	.police = TCA_BPF_POLICE,
+};
+
+static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
+			    struct tcf_result *res)
+{
+	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_prog *prog;
+	int ret;
+
+	list_for_each_entry(prog, &head->plist, link) {
+		if (SK_RUN_FILTER(prog->filter, skb) == 0)
+			continue;
+
+		*res = prog->res;
+		ret = tcf_exts_exec(skb, &prog->exts, res);
+		if (ret < 0)
+			continue;
+
+		return ret;
+	}
+
+	return -1;
+}
+
+static int cls_bpf_init(struct tcf_proto *tp)
+{
+	struct cls_bpf_head *head;
+
+	head = kzalloc(sizeof(*head), GFP_KERNEL);
+	if (head == NULL)
+		return -ENOBUFS;
+
+	INIT_LIST_HEAD(&head->plist);
+	tp->root = head;
+
+	return 0;
+}
+
+static void cls_bpf_delete_prog(struct tcf_proto *tp, struct cls_bpf_prog *prog)
+{
+	tcf_unbind_filter(tp, &prog->res);
+	tcf_exts_destroy(tp, &prog->exts);
+
+	sk_unattached_filter_destroy(prog->filter);
+
+	kfree(prog->bpf_ops);
+	kfree(prog);
+}
+
+static int cls_bpf_delete(struct tcf_proto *tp, unsigned long arg)
+{
+	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_prog *prog, *todel = (struct cls_bpf_prog *) arg;
+
+	list_for_each_entry(prog, &head->plist, link) {
+		if (prog == todel) {
+			tcf_tree_lock(tp);
+			list_del(&prog->link);
+			tcf_tree_unlock(tp);
+
+			cls_bpf_delete_prog(tp, prog);
+			return 0;
+		}
+	}
+
+	return -ENOENT;
+}
+
+static void cls_bpf_destroy(struct tcf_proto *tp)
+{
+	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_prog *prog, *tmp;
+
+	list_for_each_entry_safe(prog, tmp, &head->plist, link) {
+		list_del(&prog->link);
+		cls_bpf_delete_prog(tp, prog);
+	}
+
+	kfree(head);
+}
+
+static unsigned long cls_bpf_get(struct tcf_proto *tp, u32 handle)
+{
+	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_prog *prog;
+	unsigned long ret = 0UL;
+
+	if (head == NULL)
+		return 0UL;
+
+	list_for_each_entry(prog, &head->plist, link) {
+		if (prog->handle == handle) {
+			ret = (unsigned long) prog;
+			break;
+		}
+	}
+
+	return ret;
+}
+
+static void cls_bpf_put(struct tcf_proto *tp, unsigned long f)
+{
+}
+
+static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp,
+				   struct cls_bpf_prog *prog,
+				   unsigned long base, struct nlattr **tb,
+				   struct nlattr *est)
+{
+	struct sock_filter *bpf_ops, *bpf_old;
+	struct tcf_exts exts;
+	struct sock_fprog tmp;
+	struct sk_filter *fp, *fp_old;
+	u16 bpf_size, bpf_len;
+	u32 classid;
+	int ret;
+
+	if (!tb[TCA_BPF_OPS_LEN] || !tb[TCA_BPF_OPS] || !tb[TCA_BPF_CLASSID])
+		return -EINVAL;
+
+	ret = tcf_exts_validate(net, tp, tb, est, &exts, &bpf_ext_map);
+	if (ret < 0)
+		return ret;
+
+	classid = nla_get_u32(tb[TCA_BPF_CLASSID]);
+	bpf_len = nla_get_u16(tb[TCA_BPF_OPS_LEN]);
+	if (bpf_len > BPF_MAXINSNS || bpf_len == 0) {
+		ret = -EINVAL;
+		goto errout;
+	}
+
+	bpf_size = bpf_len * sizeof(*bpf_ops);
+	bpf_ops = kzalloc(bpf_size, GFP_KERNEL);
+	if (bpf_ops == NULL) {
+		ret = -ENOMEM;
+		goto errout;
+	}
+
+	memcpy(bpf_ops, nla_data(tb[TCA_BPF_OPS]), bpf_size);
+
+	tmp.len = bpf_len;
+	tmp.filter = (struct sock_filter __user *) bpf_ops;
+
+	ret = sk_unattached_filter_create(&fp, &tmp);
+	if (ret)
+		goto errout_free;
+
+	tcf_tree_lock(tp);
+	fp_old = prog->filter;
+	bpf_old = prog->bpf_ops;
+
+	prog->bpf_len = bpf_len;
+	prog->bpf_ops = bpf_ops;
+	prog->filter = fp;
+	prog->res.classid = classid;
+	tcf_tree_unlock(tp);
+
+	tcf_bind_filter(tp, &prog->res, base);
+	tcf_exts_change(tp, &prog->exts, &exts);
+
+	if (fp_old)
+		sk_unattached_filter_destroy(fp_old);
+	if (bpf_old)
+		kfree(bpf_old);
+
+	return 0;
+
+errout_free:
+	kfree(bpf_ops);
+errout:
+	tcf_exts_destroy(tp, &exts);
+	return ret;
+}
+
+static u32 cls_bpf_grab_new_handle(struct tcf_proto *tp,
+				   struct cls_bpf_head *head)
+{
+	unsigned int i = 0x80000000;
+
+	do {
+		if (++head->hgen == 0x7FFFFFFF)
+			head->hgen = 1;
+	} while (--i > 0 && cls_bpf_get(tp, head->hgen));
+	if (i == 0)
+		pr_err("Insufficient number of handles\n");
+
+	return i;
+}
+
+static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
+			  struct tcf_proto *tp, unsigned long base,
+			  u32 handle, struct nlattr **tca,
+			  unsigned long *arg)
+{
+	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_prog *prog = (struct cls_bpf_prog *) *arg;
+	struct nlattr *tb[TCA_BPF_MAX + 1];
+	int ret;
+
+	if (tca[TCA_OPTIONS] == NULL)
+		return -EINVAL;
+
+	ret = nla_parse_nested(tb, TCA_BPF_MAX, tca[TCA_OPTIONS], bpf_policy);
+	if (ret < 0)
+		return ret;
+
+	if (prog != NULL) {
+		if (handle && prog->handle != handle)
+			return -EINVAL;
+		return cls_bpf_modify_existing(net, tp, prog, base, tb,
+					       tca[TCA_RATE]);
+	}
+
+	prog = kzalloc(sizeof(*prog), GFP_KERNEL);
+	if (prog == NULL)
+		return -ENOBUFS;
+
+	if (handle == 0)
+		prog->handle = cls_bpf_grab_new_handle(tp, head);
+	else
+		prog->handle = handle;
+	if (prog->handle == 0) {
+		ret = -EINVAL;
+		goto errout;
+	}
+
+	ret = cls_bpf_modify_existing(net, tp, prog, base, tb, tca[TCA_RATE]);
+	if (ret < 0)
+		goto errout;
+
+	tcf_tree_lock(tp);
+	list_add(&prog->link, &head->plist);
+	tcf_tree_unlock(tp);
+
+	*arg = (unsigned long) prog;
+
+	return 0;
+errout:
+	if (*arg == 0UL && prog)
+		kfree(prog);
+
+	return ret;
+}
+
+static int cls_bpf_dump(struct tcf_proto *tp, unsigned long fh,
+			struct sk_buff *skb, struct tcmsg *tm)
+{
+	struct cls_bpf_prog *prog = (struct cls_bpf_prog *) fh;
+	struct nlattr *nest, *nla;
+
+	if (prog == NULL)
+		return skb->len;
+
+	tm->tcm_handle = prog->handle;
+
+	nest = nla_nest_start(skb, TCA_OPTIONS);
+	if (nest == NULL)
+		goto nla_put_failure;
+
+	if (nla_put_u32(skb, TCA_BPF_CLASSID, prog->res.classid))
+		goto nla_put_failure;
+	if (nla_put_u16(skb, TCA_BPF_OPS_LEN, prog->bpf_len))
+		goto nla_put_failure;
+
+	nla = nla_reserve(skb, TCA_BPF_OPS, prog->bpf_len *
+			  sizeof(struct sock_filter));
+	if (nla == NULL)
+		goto nla_put_failure;
+
+        memcpy(nla_data(nla), prog->bpf_ops, nla_len(nla));
+
+	if (tcf_exts_dump(skb, &prog->exts, &bpf_ext_map) < 0)
+		goto nla_put_failure;
+
+	nla_nest_end(skb, nest);
+
+	if (tcf_exts_dump_stats(skb, &prog->exts, &bpf_ext_map) < 0)
+		goto nla_put_failure;
+
+	return skb->len;
+
+nla_put_failure:
+	nla_nest_cancel(skb, nest);
+	return -1;
+}
+
+static void cls_bpf_walk(struct tcf_proto *tp, struct tcf_walker *arg)
+{
+	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_prog *prog;
+
+	list_for_each_entry(prog, &head->plist, link) {
+		if (arg->count < arg->skip)
+			goto skip;
+		if (arg->fn(tp, (unsigned long) prog, arg) < 0) {
+			arg->stop = 1;
+			break;
+		}
+skip:
+		arg->count++;
+	}
+}
+
+static struct tcf_proto_ops cls_bpf_ops __read_mostly = {
+	.kind		=	"bpf",
+	.owner		=	THIS_MODULE,
+	.classify	=	cls_bpf_classify,
+	.init		=	cls_bpf_init,
+	.destroy	=	cls_bpf_destroy,
+	.get		=	cls_bpf_get,
+	.put		=	cls_bpf_put,
+	.change		=	cls_bpf_change,
+	.delete		=	cls_bpf_delete,
+	.walk		=	cls_bpf_walk,
+	.dump		=	cls_bpf_dump,
+};
+
+static int __init cls_bpf_init_mod(void)
+{
+	return register_tcf_proto_ops(&cls_bpf_ops);
+}
+
+static void __exit cls_bpf_exit_mod(void)
+{
+	unregister_tcf_proto_ops(&cls_bpf_ops);
+}
+
+module_init(cls_bpf_init_mod);
+module_exit(cls_bpf_exit_mod);
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net V2] xen-netback: use jiffies_64 value to calculate credit timeout
From: Wei Liu @ 2013-10-28 11:35 UTC (permalink / raw)
  To: xen-devel, netdev
  Cc: david.vrabel, jbeulich, annie.li, Wei Liu, Ian Campbell,
	Jason Luan

time_after_eq() only works if the delta is < MAX_ULONG/2.

For a 32bit Dom0, if netfront sends packets at a very low rate, the time
between subsequent calls to tx_credit_exceeded() may exceed MAX_ULONG/2
and the test for timer_after_eq() will be incorrect. Credit will not be
replenished and the guest may become unable to send packets (e.g., if
prior to the long gap, all credit was exhausted).

Use jiffies_64 variant to mitigate this problem for 32bit Dom0.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jason Luan <jianhai.luan@oracle.com>
---
 drivers/net/xen-netback/common.h    |    1 +
 drivers/net/xen-netback/interface.c |    3 +--
 drivers/net/xen-netback/netback.c   |   14 +++++++-------
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 5715318..400fea1 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -163,6 +163,7 @@ struct xenvif {
 	unsigned long   credit_usec;
 	unsigned long   remaining_credit;
 	struct timer_list credit_timeout;
+	u64 credit_window_start;
 
 	/* Statistics */
 	unsigned long rx_gso_checksum_fixup;
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 01bb854..459935a 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -312,8 +312,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	vif->credit_bytes = vif->remaining_credit = ~0UL;
 	vif->credit_usec  = 0UL;
 	init_timer(&vif->credit_timeout);
-	/* Initialize 'expires' now: it's used to track the credit window. */
-	vif->credit_timeout.expires = jiffies;
+	vif->credit_window_start = get_jiffies_64();
 
 	dev->netdev_ops	= &xenvif_netdev_ops;
 	dev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO;
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index f3e591c..8644aca 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1185,18 +1185,17 @@ out:
 
 static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 {
-	unsigned long now = jiffies;
-	unsigned long next_credit =
-		vif->credit_timeout.expires +
-		msecs_to_jiffies(vif->credit_usec / 1000);
+	u64 now = get_jiffies_64();
+	u64 next_credit = vif->credit_window_start +
+		(u64)msecs_to_jiffies(vif->credit_usec / 1000);
 
 	/* Timer could already be pending in rare cases. */
 	if (timer_pending(&vif->credit_timeout))
 		return true;
 
 	/* Passed the point where we can replenish credit? */
-	if (time_after_eq(now, next_credit)) {
-		vif->credit_timeout.expires = now;
+	if (time_after_eq64(now, next_credit)) {
+		vif->credit_window_start = now;
 		tx_add_credit(vif);
 	}
 
@@ -1207,7 +1206,8 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 		vif->credit_timeout.function =
 			tx_credit_callback;
 		mod_timer(&vif->credit_timeout,
-			  next_credit);
+			  (unsigned long)next_credit);
+		vif->credit_window_start = next_credit;
 
 		return true;
 	}
-- 
1.7.10.4

^ permalink raw reply related

* Re: Big performance loss from 3.4.63 to 3.10.13 when routing ipv4
From: Steffen Klassert @ 2013-10-28 11:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Wolfgang Walter, David Miller, hannes, netdev, klassert
In-Reply-To: <1382941051.13037.13.camel@edumazet-glaptop.roam.corp.google.com>

On Sun, Oct 27, 2013 at 11:17:31PM -0700, Eric Dumazet wrote:
> On Fri, 2013-10-25 at 11:20 +0200, Steffen Klassert wrote:
> > On Fri, Oct 25, 2013 at 01:50:28AM -0700, Eric Dumazet wrote:
> > > 
> > > 32768 as the default seems fine to me
> > > 
> > > 448 bytes per dst -> thats less than 30 Mbytes of memory if we hit 65536
> > > dst.
> > > 
> > 
> > Ok, I'll add the patch below to to the ipsec tree if everyone is fine
> > with that threshold.
> > 
> > Subject: [PATCH RFC] xfrm: Increase the garbage collector threshold
> > 
> > With the removal of the routing cache, we lost the
> > option to tweak the garbage collector threshold
> > along with the maximum routing cache size. So git
> > commit 703fb94ec ("xfrm: Fix the gc threshold value
> > for ipv4") moved back to a static threshold.
> > 
> > It turned out that the current threshold before we
> > start garbage collecting is much to small for some
> > workloads, so increase it from 1024 to 32768. This
> > means that we start the garbage collector if we have
> > more than 32768 dst entries in the system and refuse
> > new allocations if we are above 65536.
> > 
> > Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> > ---
> 
> Sure please add :
> 
> Reported-by: Wolfgang Walter <linux@stwm.de>

Done.

Now applied to the ipsec tree, thanks everyone!

^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: use jiffies_64 value to calculate credit timeout
From: Wei Liu @ 2013-10-28 11:30 UTC (permalink / raw)
  To: annie li
  Cc: Wei Liu, Ian Campbell, netdev, xen-devel, david.vrabel, jbeulich,
	Jason Luan
In-Reply-To: <526DD2D8.5030405@oracle.com>

On Mon, Oct 28, 2013 at 10:58:32AM +0800, annie li wrote:
[...]
> >>      if (timer_pending(&vif->credit_timeout))
> >>          return true;
> >>        /* Passed the point where we can replenish credit? */
> >>-    if (time_after_eq(now, next_credit)) {
> >>-        vif->credit_timeout.expires = now;
> >>+    if (time_after_eq64(now, next_credit)) {
> >>+        vif->credit_timeout.expires = (unsigned long)now;
> >
> >updates credit_window_start as following,
> >vif->credit_window_start = (unsigned long)now;
> 
> both credit_window_start and credit_timeout.expires need to be
> updated here,
> 
> vif->credit_window_start = (unsigned long)now;
> vif->credit_timeout.expires = (unsigned long)now;
> 

IMHO we don't need to update .expires anymore -- we now track the window
with another variable.

Wei.

^ permalink raw reply

* Re: [PATCH net-next 2/3] tipc: message reassembly using fragment chain
From: Jon Maloy @ 2013-10-28 10:24 UTC (permalink / raw)
  Cc: jon.maloy, paul.gortmaker, erik.hugne, ying.xue, tipc-discussion,
	netdev, David Miller
In-Reply-To: <20131028.010737.400887499395331496.davem@davemloft.net>

On 10/28/2013 01:07 AM, David Miller wrote:
> From: Jon Maloy <jon.maloy@ericsson.com>
> Date: Sat, 26 Oct 2013 14:41:02 -0400
>
>> +			int ret = tipc_link_recv_fragment(
>> +					&node->bclink.reasm_head,
>> +					&node->bclink.reasm_tail,
>> +					&buf);
> This is not the correct way to indent a function call that spans
> multiple lines.  In such a situation the arguments that appear
> on the second and subsequent lines must start at the first column
> after the openning parenthesis of the function call.
>
> Like this:
>
> 	func(a, b, c,
> 	     d, e, f);
>
> Please audit this in your entire set of patches and resubmit,
> thanks.

Doing as David says here means that some lines will be >80 chars.
This was the reason for the somewhat strange indentation.
I tried to rename the function to tipc_link_rcv_fragm(), but one
line was still too long. The problem we have goes deeper.

In Linus' coding style manual I read that the 80 char limit is a hard limit,
a limit we violate in several places. One offender is that we have too
many indentation levels, at least in tipc_recv_msg() and probably in
some other places. This is sensitive code, that I don't feel for touching
right now.  A more low hanging fruit is our local variable names:
names such as l_ptr, n_ptr, b_ptr is exactly what Linus characterizes
as "brain dead Hungarian style", and I never liked that naming anyway.
For me l, n, and b is good enough as long as the context is clear.

But, doing so, at least in tipc_recv_msg(), would require another, separate,
patch, and it would lead to style inconsistency.

In brief, I am at loss about to proceed here, and I am not going to submit
this patch again until I have some feedback from somebody who can tell me
what is the right thing to do. Maybe > 80 chars is fine for now?

///jon

^ permalink raw reply

* Re: PROBLEM: GRE forwarding not working with 3.10.x, WCCPv2 and Cisco 7200 Router showing IP0 bad-hlen 4 in tcpdump
From: Peter Schmitt @ 2013-10-28 10:10 UTC (permalink / raw)
  To: Pravin Shelar; +Cc: Eric Dumazet, netdev@vger.kernel.org
In-Reply-To: <CALnjE+pdvpfCTo=bTtSrY29mxc0yZKMj0XSc2gQVhGjbpByhcA@mail.gmail.com>

Hi Pravin,

I tested this patch and it works fine with 3.10.17! Now I get correct packets from the GRE-device and can surf again using WCCP.

Thank you very much!

Best regards,
Peter





> On Friday, October 25, 2013 5:25 PM, Pravin Shelar <pshelar@nicira.com> wrote:
> > Hi Peter,
> I think its easy to fix this case, Can you try attached patch?
> If it works I will send formal patch for stable.
> 
> Thanks,
> Pravin.
> 
> 
> On Fri, Oct 18, 2013 at 4:39 AM, Peter Schmitt <peter.schmitt82@yahoo.de> 
> wrote:
>>  Hi,
>> 
>> 
>> 
>>>  On Thursday, October 17, 2013 3:44 PM, Eric Dumazet 
> <eric.dumazet@gmail.com> wrote:
>>>  > On Thu, 2013-10-17 at 11:53 +0100, Peter Schmitt wrote:
>>>>   Hi,
>>>> 
>>>> 
>>>>   >
>>>>   > 3.10 is buggy (for ETH_P_WCCP support I mean), 3.11 has the 
> probable
>>>>   > fix :
>>>>   >
>>>>   > commit 3d7b46cd20e300bd6989fb1f43d46f1b9645816e
>>>>   > ("ip_tunnel: push generic protocol handling to ip_tunnel
>>>  module.")
>>>>   >
>>>>   > The problem is 3.10 do not pull the extra 4 bytes of WCCP
>>>>   >
>>>>   > This is now properly done in iptunnel_pull_header()
>>>>   >
>>>> 
>>>>   First of all, many thanks for your help on this.
>>>> 
>>>>   I tried to apply the fix to the 3.10.16 sources, as I would like 
> to
>>>>   stay with the long-term line. Unfortunately it does not apply, as
>>>>   there are a lot of dependencies on other patches.
>>>> 
>>>>   Do you think there will be a fix for the long-time 3.10 kernel 
> line?
>>>>   Or can you guide me on how to apply this fix to the long-term 
> kernel?
>>> 
>>>  3.10 is right in the middle of GRE refactoring, and many bugs are in 
> it.
>>> 
>>>  It might be very painful to get a complete list of patches to backport.
>> 
>>  thanks again. Yes I saw that there were lots of changes.
>> 
>>  Do you think there will be a fix for all the 3.10.x stable long term kernel 
> users? Many of them might not be able to use some newer kernel easily or will 
> this be a "won't fix"?
>> 
>>  Best regards,
>>  Peter
>> 
> 

^ permalink raw reply

* [PATCH net-next v3 5/5] 6lowpan: remove unnecessary break
From: Alexander Aring @ 2013-10-28  9:24 UTC (permalink / raw)
  To: alex.bluesman.smirnov
  Cc: linux-zigbee-devel, werner, dbaryshkov, netdev, Alexander Aring
In-Reply-To: <1382952260-23174-1-git-send-email-alex.aring@gmail.com>

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
---
 net/ieee802154/6lowpan.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
index 3bc32fb..fde90e6 100644
--- a/net/ieee802154/6lowpan.c
+++ b/net/ieee802154/6lowpan.c
@@ -440,7 +440,6 @@ lowpan_uncompress_udp_header(struct sk_buff *skb, struct udphdr *uh)
 		default:
 			pr_debug("ERROR: unknown UDP format\n");
 			goto err;
-			break;
 		}
 
 		pr_debug("uncompressed UDP ports: src = %d, dst = %d\n",
-- 
1.8.4.1

^ permalink raw reply related

* [PATCH net-next v3 2/5] 6lowpan: remove unnecessary check on err >= 0
From: Alexander Aring @ 2013-10-28  9:24 UTC (permalink / raw)
  To: alex.bluesman.smirnov
  Cc: linux-zigbee-devel, werner, dbaryshkov, netdev, Alexander Aring
In-Reply-To: <1382952260-23174-1-git-send-email-alex.aring@gmail.com>

The err variable can only be zero in this case.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
---
 net/ieee802154/6lowpan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
index d288035..9057f83 100644
--- a/net/ieee802154/6lowpan.c
+++ b/net/ieee802154/6lowpan.c
@@ -1179,7 +1179,7 @@ lowpan_skb_fragmentation(struct sk_buff *skb, struct net_device *dev)
 	head[0] &= ~LOWPAN_DISPATCH_FRAG1;
 	head[0] |= LOWPAN_DISPATCH_FRAGN;
 
-	while ((payload_length - offset > 0) && (err >= 0)) {
+	while (payload_length - offset > 0) {
 		int len = LOWPAN_FRAG_SIZE;
 
 		head[4] = offset / 8;
-- 
1.8.4.1

^ permalink raw reply related

* [PATCH net-next v3 0/5] 6lowpan: trivial changes
From: Alexander Aring @ 2013-10-28  9:24 UTC (permalink / raw)
  To: alex.bluesman.smirnov
  Cc: linux-zigbee-devel, werner, dbaryshkov, netdev, Alexander Aring

This patch series includes some trivial changes to prepare the 6lowpan stack
for upcomming patch-series which mainly fix fragmentation according to rfc4944
and udp handling(which is currently broken).

Changes since v3:
  - really fix intendation in patch 3/5

Changes since v2:
  - change intendation in patch 3/5
  - fix typo in 5/5 unecessary -> unnecessary
  - add missing 6lowpan tag in cover-letter

Alexander Aring (5):
  6lowpan: remove unnecessary ret variable
  6lowpan: remove unnecessary check on err >= 0
  6lowpan: use netdev_alloc_skb instead dev_alloc_skb
  6lowpan: remove skb->dev assignment
  6lowpan: remove unnecessary break

 net/ieee802154/6lowpan.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

-- 
1.8.4.1

^ permalink raw reply

* [PATCH net-next v3 4/5] 6lowpan: remove skb->dev assignment
From: Alexander Aring @ 2013-10-28  9:24 UTC (permalink / raw)
  To: alex.bluesman.smirnov-Re5JQEeQqe8AvxtiuMwx3w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-zigbee-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
In-Reply-To: <1382952260-23174-1-git-send-email-alex.aring-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

This patch removes the assignment of skb->dev. We don't need it here because
we use the netdev_alloc_skb_ip_align function which already sets the
skb->dev.

Signed-off-by: Alexander Aring <alex.aring-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Reviewed-by: Werner Almesberger <werner-SEdMjqphH88wryQfseakQg@public.gmane.org>
---
 net/ieee802154/6lowpan.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
index 7506d83..3bc32fb 100644
--- a/net/ieee802154/6lowpan.c
+++ b/net/ieee802154/6lowpan.c
@@ -785,7 +785,6 @@ lowpan_alloc_new_frame(struct sk_buff *skb, u16 len, u16 tag)
 		goto skb_err;
 
 	frame->skb->priority = skb->priority;
-	frame->skb->dev = skb->dev;
 
 	/* reserve headroom for uncompressed ipv6 header */
 	skb_reserve(frame->skb, sizeof(struct ipv6hdr));
-- 
1.8.4.1


------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135991&iu=/4140/ostg.clktrk

^ permalink raw reply related

* [PATCH net-next v3 3/5] 6lowpan: use netdev_alloc_skb instead dev_alloc_skb
From: Alexander Aring @ 2013-10-28  9:24 UTC (permalink / raw)
  To: alex.bluesman.smirnov-Re5JQEeQqe8AvxtiuMwx3w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-zigbee-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
In-Reply-To: <1382952260-23174-1-git-send-email-alex.aring-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

This patch uses the netdev_alloc_skb instead dev_alloc_skb function and
drops the seperate assignment to skb->dev.

Signed-off-by: Alexander Aring <alex.aring-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Reviewed-by: Werner Almesberger <werner-SEdMjqphH88wryQfseakQg@public.gmane.org>
---
 net/ieee802154/6lowpan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
index 9057f83..7506d83 100644
--- a/net/ieee802154/6lowpan.c
+++ b/net/ieee802154/6lowpan.c
@@ -1127,12 +1127,12 @@ lowpan_fragment_xmit(struct sk_buff *skb, u8 *head,
 
 	lowpan_raw_dump_inline(__func__, "6lowpan fragment header", head, hlen);
 
-	frag = dev_alloc_skb(hlen + mlen + plen + IEEE802154_MFR_SIZE);
+	frag = netdev_alloc_skb(skb->dev,
+				hlen + mlen + plen + IEEE802154_MFR_SIZE);
 	if (!frag)
 		return -ENOMEM;
 
 	frag->priority = skb->priority;
-	frag->dev = skb->dev;
 
 	/* copy header, MFR and payload */
 	memcpy(skb_put(frag, mlen), skb->data, mlen);
-- 
1.8.4.1


------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135991&iu=/4140/ostg.clktrk

^ permalink raw reply related

* [PATCH net-next v3 1/5] 6lowpan: remove unnecessary ret variable
From: Alexander Aring @ 2013-10-28  9:24 UTC (permalink / raw)
  To: alex.bluesman.smirnov-Re5JQEeQqe8AvxtiuMwx3w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-zigbee-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
In-Reply-To: <1382952260-23174-1-git-send-email-alex.aring-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Signed-off-by: Alexander Aring <alex.aring-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Reviewed-by: Werner Almesberger <werner-SEdMjqphH88wryQfseakQg@public.gmane.org>
---
 net/ieee802154/6lowpan.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
index ff41b4d..d288035 100644
--- a/net/ieee802154/6lowpan.c
+++ b/net/ieee802154/6lowpan.c
@@ -1120,7 +1120,7 @@ lowpan_fragment_xmit(struct sk_buff *skb, u8 *head,
 			int mlen, int plen, int offset, int type)
 {
 	struct sk_buff *frag;
-	int hlen, ret;
+	int hlen;
 
 	hlen = (type == LOWPAN_DISPATCH_FRAG1) ?
 			LOWPAN_FRAG1_HEAD_SIZE : LOWPAN_FRAGN_HEAD_SIZE;
@@ -1145,9 +1145,7 @@ lowpan_fragment_xmit(struct sk_buff *skb, u8 *head,
 	lowpan_raw_dump_table(__func__, " raw fragment dump", frag->data,
 								frag->len);
 
-	ret = dev_queue_xmit(frag);
-
-	return ret;
+	return dev_queue_xmit(frag);
 }
 
 static int
-- 
1.8.4.1


------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60135991&iu=/4140/ostg.clktrk

^ permalink raw reply related

* RE: [PATCH v5] net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)
From: David Laight @ 2013-10-28  9:17 UTC (permalink / raw)
  To: Joe Perches, Arvid Brodin
  Cc: netdev, David Miller, Stephen Hemminger, Javier Boticario,
	balferreira, Elías Molina Muñoz
In-Reply-To: <1382725918.2425.2.camel@joe-AO722>

> Subject: Re: [PATCH v5] net/hsr: Add support for the High-availability Seamless Redundancy protocol
> (HSRv0)
> 
> On Fri, 2013-10-25 at 20:20 +0200, Arvid Brodin wrote:
> > On 2013-10-23 18:52, Joe Perches wrote:
> > > On Wed, 2013-10-23 at 18:09 +0200, Arvid Brodin wrote:
> []
> > >> +/* above(a, b) - return 1 if a > b, 0 otherwise.
> > >> + */
> > >> +static bool above(u16 a, u16 b)
> > >> +{
> > >> +	/* Remove inconsistency where above(a, b) == below(a, b) */
> > >> +	if ((int) b - a == 32768)
> > >> +		return 0;
> > >> +
> > >> +	return (((s16) (b - a)) < 0);
> > >> +}
> > >> +#define below(a, b)		above((b), (a))
> > >> +#define above_or_eq(a, b)	(!below((a), (b)))
> > >> +#define below_or_eq(a, b)	(!above((a), (b)))
> > >
> > > This looks odd.
> >
> > It relies on unsigned arithmetic to compare two values that may wrap. I.e.,
> > it doesn't care about the absolute sizes, but only about the distance
> > between the numbers.

I'd have thought it wasn't really worth worrying about what happens
when a ^ b == 0x8000. Anything using modulo 16 distances shouldn't
really see such values.
Perhaps just document the effect.

Also, if these definitions are in a .h file they need better names

	David

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox