Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 15/16] wl1251: Add sysfs file tx_mgmt_frm_rate for setting rate
From: Johannes Berg @ 2013-10-28 13:45 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Luciano Coelho, John W. Linville, David S. Miller, linux-wireless,
	netdev, linux-kernel, freemangordon, aaro.koskinen, pavel, sre,
	joni.lapilainen
In-Reply-To: <1382819655-30430-16-git-send-email-pali.rohar@gmail.com>

On Sat, 2013-10-26 at 22:34 +0200, Pali Rohár wrote:
> This patch was extracted from Maemo 2.6.28 kernel

That's not a description or justification for the patch ....

but again, it seems like a bad idea to use sysfs.

johannes

^ permalink raw reply

* Re: [PATCH 16/16] wl1251: Add sysfs file address for setting permanent mac address
From: Johannes Berg @ 2013-10-28 13:45 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Luciano Coelho, John W. Linville, David S. Miller, linux-wireless,
	netdev, linux-kernel, freemangordon, aaro.koskinen, pavel, sre,
	joni.lapilainen
In-Reply-To: <1382819655-30430-17-git-send-email-pali.rohar@gmail.com>

On Sat, 2013-10-26 at 22:34 +0200, Pali Rohár wrote:
> Driver wl1251 generating mac address randomly at startup and there is no way to
> set permanent mac address via SET_IEEE80211_PERM_ADDR. This patch export sysfs
> file which can set permanent mac address by userspace helper program. Patch is
> needed for devices which do not store mac address in internal wl1251 eeprom.

This doesn't really seem like a good idea since you can also just use
'ip' or whatever to set the MAC address.

johannes

^ permalink raw reply

* [PATCH] bgmac: separate RX descriptor setup code into a new function
From: Rafał Miłecki @ 2013-10-28 13:40 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: Rafał Miłecki

This cleans code a bit and will be useful when allocating buffers in
other places (like RX path, to avoid skb_copy_from_linear_data_offset).

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
---
 drivers/net/ethernet/broadcom/bgmac.c |   41 ++++++++++++++++++---------------
 1 file changed, 22 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c
index e7d98ea..6b7541f 100644
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -269,6 +269,26 @@ static int bgmac_dma_rx_skb_for_slot(struct bgmac *bgmac,
 	return 0;
 }
 
+static void bgmac_dma_rx_setup_desc(struct bgmac *bgmac,
+				    struct bgmac_dma_ring *ring, int desc_idx)
+{
+	struct bgmac_dma_desc *dma_desc = ring->cpu_base + desc_idx;
+	u32 ctl0 = 0, ctl1 = 0;
+
+	if (desc_idx == ring->num_slots - 1)
+		ctl0 |= BGMAC_DESC_CTL0_EOT;
+	ctl1 |= BGMAC_RX_BUF_SIZE & BGMAC_DESC_CTL1_LEN;
+	/* Is there any BGMAC device that requires extension? */
+	/* ctl1 |= (addrext << B43_DMA64_DCTL1_ADDREXT_SHIFT) &
+	 * B43_DMA64_DCTL1_ADDREXT_MASK;
+	 */
+
+	dma_desc->addr_low = cpu_to_le32(lower_32_bits(ring->slots[desc_idx].dma_addr));
+	dma_desc->addr_high = cpu_to_le32(upper_32_bits(ring->slots[desc_idx].dma_addr));
+	dma_desc->ctl0 = cpu_to_le32(ctl0);
+	dma_desc->ctl1 = cpu_to_le32(ctl1);
+}
+
 static int bgmac_dma_rx_read(struct bgmac *bgmac, struct bgmac_dma_ring *ring,
 			     int weight)
 {
@@ -491,8 +511,6 @@ err_dma_free:
 static void bgmac_dma_init(struct bgmac *bgmac)
 {
 	struct bgmac_dma_ring *ring;
-	struct bgmac_dma_desc *dma_desc;
-	u32 ctl0, ctl1;
 	int i;
 
 	for (i = 0; i < BGMAC_MAX_TX_RINGS; i++) {
@@ -525,23 +543,8 @@ static void bgmac_dma_init(struct bgmac *bgmac)
 		if (ring->unaligned)
 			bgmac_dma_rx_enable(bgmac, ring);
 
-		for (j = 0, dma_desc = ring->cpu_base; j < ring->num_slots;
-		     j++, dma_desc++) {
-			ctl0 = ctl1 = 0;
-
-			if (j == ring->num_slots - 1)
-				ctl0 |= BGMAC_DESC_CTL0_EOT;
-			ctl1 |= BGMAC_RX_BUF_SIZE & BGMAC_DESC_CTL1_LEN;
-			/* Is there any BGMAC device that requires extension? */
-			/* ctl1 |= (addrext << B43_DMA64_DCTL1_ADDREXT_SHIFT) &
-			 * B43_DMA64_DCTL1_ADDREXT_MASK;
-			 */
-
-			dma_desc->addr_low = cpu_to_le32(lower_32_bits(ring->slots[j].dma_addr));
-			dma_desc->addr_high = cpu_to_le32(upper_32_bits(ring->slots[j].dma_addr));
-			dma_desc->ctl0 = cpu_to_le32(ctl0);
-			dma_desc->ctl1 = cpu_to_le32(ctl1);
-		}
+		for (j = 0; j < ring->num_slots; j++)
+			bgmac_dma_rx_setup_desc(bgmac, ring, j);
 
 		bgmac_write(bgmac, ring->mmio_base + BGMAC_DMA_RX_INDEX,
 			    ring->index_base +
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH net-next] net: sched: cls_bpf: add BPF-based classifier
From: Eric Dumazet @ 2013-10-28 13:34 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, netdev, Thomas Graf
In-Reply-To: <1382960120-2198-1-git-send-email-dborkman@redhat.com>

On Mon, 2013-10-28 at 12:35 +0100, Daniel Borkmann wrote:
> This work contains a lightweight BPF-based traffic classifier that can
> serve as a flexible alternative to ematch-based tree classification, i.e.
> now that BPF filter engine can also be JITed in the kernel. Naturally, tc
> actions and policies are supported as well with cls_bpf. Multiple BPF
> programs/filter can be attached for a class, or they can just as well be
> written within a single BPF program, that's really up to the user how he
> wishes to run/optimize the code, e.g. also for inversion of verdicts etc.
> The notion of a BPF program's return/exit codes is being kept as follows:
> non-zero for match, zero for mismatch.
> 
> As a minimal usage example with iproute2, we use a 3 band prio root qdisc
> on a router with sfq each as leave, and assign ssh and icmp bpf-based
> filters to band 1, http traffic to band 2 and the rest to band 3. For the
> first two bands we load the bytecode from a file, in the 2nd we load it
> inline as an example:
> 
> echo 1 > /proc/sys/net/core/bpf_jit_enable
> 
> tc qdisc del dev em1 root
> tc qdisc add dev em1 root handle 1: prio bands 3 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


> tc qdisc add dev em1 parent 1:1 sfq perturb 16
> tc qdisc add dev em1 parent 1:2 sfq perturb 16
> tc qdisc add dev em1 parent 1:3 sfq perturb 16
> 
> tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/ssh.bpf flowid 1:1
> tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/icmp.bpf flowid 1:1
> tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/http.bpf flowid 1:2
> tc filter add dev em1 parent 1: bpf run bytecode "`bpfc -f tc -i misc.ops`" flowid 1:3
> 
> BPF programs can be easily created and passed to tc, either as inline
> 'bytecode' or 'bytecode-file'. There are a couple of front-ends that can
> compile opcodes, for example:
> 
> 1) People familiar with tcpdump-like filters:
> 
>    tcpdump -iem1 -ddd port 22 | tr '\n' ',' > /etc/tc/ssh.bpf
> 
> 2) People that want to low-level program their filters or use BPF
>    extensions that lack support by libpcap's compiler:
> 
>    bpfc -f tc -i ssh.ops > /etc/tc/ssh.bpf
> 
>    ssh.ops example code:
>    ldh [12]
>    jne #0x800, drop
>    ldb [23]
>    jneq #6, drop
>    ldh [20]
>    jset #0x1fff, drop
>    ldxb 4 * ([14] & 0xf)
>    ldh [%x + 14]
>    jeq #0x16, pass
>    ldh [%x + 16]
>    jne #0x16, drop
>    pass: ret #-1
>    drop: ret #0
> 
> It was chosen to load bytecode into tc, since the reverse operation,
> tc filter list dev em1, is then able to show the exact commands again.
> Possible follow-up work could also include a small expression compiler
> for iproute2. Tested with the help of bmon. This idea came up during
> the Netfilter Workshop 2013 in Copenhagen.
> 

Well, running a large amount of filters might be very expensive [1],
have you considered returning the flowid from the filter, instead of
returning 0 and !0 ?

0 : would mean : not matched filter
<>0 : flowid

[1] Because of lot of duplicated code in all filters...

^ permalink raw reply

* Re: Bug in skb_segment: fskb->len != len
From: Christoph Paasch @ 2013-10-28 13:28 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Herbert Xu, netdev
In-Reply-To: <1382966471.13037.18.camel@edumazet-glaptop.roam.corp.google.com>

On 28/10/13 - 06:21:11, Eric Dumazet wrote:
> On Mon, 2013-10-28 at 12:55 +0100, Christoph Paasch wrote:
> > I have been seeing the below BUG in skb_segment with the latest net-next
> > head on my router.
> > 
> > I am forwarding Multipath TCP-traffic on this router. The MPTCP-sender is simply
> > doing an iperf-session. Strangely, I cannot reproduce the bug when sending
> > regular TCP-traffic across the router.
> > Note: The crash happens on a vanilla net-next kernel. It does not has any
> > MPTCP-code in it.
> > 
> > I bisected it down to 8a29111c7c (net: gro: allow to build full sized skb),
> > but I guess 8a29111c7c is just revealing a more fundamental bug in skb_segment.
> > 
> > Some info I found:
> > In skb_segment, when the bug happens, fskb->len is 4284 but the mss and len is 1428.
> 
> fskb seems to contain 3 segments -> 3*1428 = 4284, so it looks fine
> 
> But what do you mean by 'len is 1428' ?

I meant that the variable "len" equals 1428. And thus BUG_ON(fskb->len != len) triggers.

> > Shortly before the bug happens, skb_gro_receive is building a packet where
> > lp->len is equal to 4284 inside the frag_list.
> > 
> > 
> > Seems like skb_segment cannot handle those bigger skb's in the frag_list.
> > 
> 
> Thanks for the report, I'll take a look.
> 
> As mentioned earlier, building very large skbs (with a frag_list) for a
> router makes little sense, because we need to segment them before NIC
> ndo_start_xmit()
> 
> But we also need to fix the skb_segment() bug anyway.
> 
> Thanks !

Let me know if I should provide more info or test a patch.


Cheers,
Christoph

^ permalink raw reply

* Re: Bug in skb_segment: fskb->len != len
From: Eric Dumazet @ 2013-10-28 13:21 UTC (permalink / raw)
  To: Christoph Paasch; +Cc: Herbert Xu, netdev
In-Reply-To: <20131028115552.GC4408@cpaasch-mac>

On Mon, 2013-10-28 at 12:55 +0100, Christoph Paasch wrote:
> Hello,
> 
> I have been seeing the below BUG in skb_segment with the latest net-next
> head on my router.
> 
> I am forwarding Multipath TCP-traffic on this router. The MPTCP-sender is simply
> doing an iperf-session. Strangely, I cannot reproduce the bug when sending
> regular TCP-traffic across the router.
> Note: The crash happens on a vanilla net-next kernel. It does not has any
> MPTCP-code in it.
> 
> I bisected it down to 8a29111c7c (net: gro: allow to build full sized skb),
> but I guess 8a29111c7c is just revealing a more fundamental bug in skb_segment.
> 
> Some info I found:
> In skb_segment, when the bug happens, fskb->len is 4284 but the mss and len is 1428.

fskb seems to contain 3 segments -> 3*1428 = 4284, so it looks fine

But what do you mean by 'len is 1428' ?

> Shortly before the bug happens, skb_gro_receive is building a packet where
> lp->len is equal to 4284 inside the frag_list.
> 
> 
> Seems like skb_segment cannot handle those bigger skb's in the frag_list.
> 

Thanks for the report, I'll take a look.

As mentioned earlier, building very large skbs (with a frag_list) for a
router makes little sense, because we need to segment them before NIC
ndo_start_xmit()

But we also need to fix the skb_segment() bug anyway.

Thanks !

^ permalink raw reply

* Re: [virtio-net] BUG: sleeping function called from invalid context at kernel/mutex.c:616
From: Fengguang Wu @ 2013-10-28 12:49 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <526DF9A4.70709@redhat.com>

On Mon, Oct 28, 2013 at 01:44:04PM +0800, Jason Wang wrote:
> On 10/24/2013 07:20 AM, Fengguang Wu wrote:
> > Yes it reduces the sleeping function bug:
> >
> > /kernel/x86_64-lkp-CONFIG_SCSI_DEBUG/7c4ed2767afb813493b0a8fb18d666cd44550963
> >
> > +------------------------------------------------------------------------------------+-----------+--------------+--------------+
> > |                                                                                    | v3.12-rc3 | 3ab098df35f8 | 7c4ed2767afb |
> > +------------------------------------------------------------------------------------+-----------+--------------+--------------+
> > | good_boots                                                                         | 30        | 0            | 93           |
> > | has_kernel_error_warning                                                           | 0         | 20           | 7            |
> > | BUG:sleeping_function_called_from_invalid_context_at_kernel/mutex.c                | 0         | 20           |              |
> > | INFO:rcu_sched_self-detected_stall_on_CPU(t=jiffies_g=c=q=)                        | 0         | 0            | 1            |
> > | INFO:task_blocked_for_more_than_seconds                                            | 0         | 0            | 2            |
> > | INFO:NMI_handler(arch_trigger_all_cpu_backtrace_handler)took_too_long_to_run:msecs | 0         | 0            | 1            |
> > | Kernel_panic-not_syncing:hung_task:blocked_tasks                                   | 0         | 0            | 1            |
> > | BUG:kernel_test_crashed                                                            | 0         | 0            | 3            |
> > | BUG:kernel_test_hang                                                               | 0         | 0            | 1            |
> > +------------------------------------------------------------------------------------+-----------+--------------+--------------+
> >
> > However I'll need to increase tests on v3.12-rc3 to make sure it's not
> > the patch that added the other error messages.
> >
> > Thanks,
> > Fengguang
> 
> Thanks, any more update on the result of v3.12-rc3 for this?

Yes, it's confirmed that there are no new error types in 7c4ed2767afb
comparing to v3.12-rc3, so your fix is fine.

/kernel/x86_64-lkp-CONFIG_SCSI_DEBUG/7c4ed2767afb813493b0a8fb18d666cd44550963

+------------------------------------------------------------------------------------+-----------+--------------+--------------+
|                                                                                    | v3.12-rc3 | 3ab098df35f8 | 7c4ed2767afb |
+------------------------------------------------------------------------------------+-----------+--------------+--------------+
| good_boots                                                                         | 877       | 0            | 93           |
| has_kernel_error_warning                                                           | 154       | 20           | 7            |
| BUG:kernel_test_hang                                                               | 112       | 0            | 1            |
| INFO:rcu_sched_self-detected_stall_on_CPU(t=jiffies_g=c=q=)                        | 1         | 0            | 1            |
| INFO:task_blocked_for_more_than_seconds                                            | 18        | 0            | 2            |
| INFO:NMI_handler(arch_trigger_all_cpu_backtrace_handler)took_too_long_to_run:msecs | 11        | 0            | 1            |
| Kernel_panic-not_syncing:hung_task:blocked_tasks                                   | 11        | 0            | 1            |
| BUG:kernel_test_crashed                                                            | 13        | 0            | 3            |
| Out_of_memory:Kill_process                                                         | 9         |              |              |
| BUG:kernel_early_hang_without_any_printk_output                                    | 1         |              |              |
| page_allocation_failure:order:,mode                                                | 1         |              |              |
| Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes                   | 2         |              |              |
| BUG:sleeping_function_called_from_invalid_context_at_kernel/mutex.c                | 0         | 20           |              |
+------------------------------------------------------------------------------------+-----------+--------------+--------------+

Thanks,
Fengguang

^ permalink raw reply

* [PATCH net V3] xen-netback: use jiffies_64 value to calculate credit timeout
From: Wei Liu @ 2013-10-28 12:07 UTC (permalink / raw)
  To: xen-devel, netdev
  Cc: david.vrabel, jbeulich, annie.li, Wei Liu, Ian Campbell,
	Jason Luan

time_after_eq() only works if the delta is < MAX_ULONG/2.

For a 32bit Dom0, if netfront sends packets at a very low rate, the time
between subsequent calls to tx_credit_exceeded() may exceed MAX_ULONG/2
and the test for timer_after_eq() will be incorrect. Credit will not be
replenished and the guest may become unable to send packets (e.g., if
prior to the long gap, all credit was exhausted).

Use jiffies_64 variant to mitigate this problem for 32bit Dom0.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jason Luan <jianhai.luan@oracle.com>
---
 drivers/net/xen-netback/common.h    |    1 +
 drivers/net/xen-netback/interface.c |    3 +--
 drivers/net/xen-netback/netback.c   |   10 +++++-----
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 5715318..400fea1 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -163,6 +163,7 @@ struct xenvif {
 	unsigned long   credit_usec;
 	unsigned long   remaining_credit;
 	struct timer_list credit_timeout;
+	u64 credit_window_start;
 
 	/* Statistics */
 	unsigned long rx_gso_checksum_fixup;
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 01bb854..459935a 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -312,8 +312,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	vif->credit_bytes = vif->remaining_credit = ~0UL;
 	vif->credit_usec  = 0UL;
 	init_timer(&vif->credit_timeout);
-	/* Initialize 'expires' now: it's used to track the credit window. */
-	vif->credit_timeout.expires = jiffies;
+	vif->credit_window_start = get_jiffies_64();
 
 	dev->netdev_ops	= &xenvif_netdev_ops;
 	dev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO;
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index f3e591c..900da4b 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1185,9 +1185,8 @@ out:
 
 static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 {
-	unsigned long now = jiffies;
-	unsigned long next_credit =
-		vif->credit_timeout.expires +
+	u64 now = get_jiffies_64();
+	u64 next_credit = vif->credit_window_start +
 		msecs_to_jiffies(vif->credit_usec / 1000);
 
 	/* Timer could already be pending in rare cases. */
@@ -1195,8 +1194,8 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 		return true;
 
 	/* Passed the point where we can replenish credit? */
-	if (time_after_eq(now, next_credit)) {
-		vif->credit_timeout.expires = now;
+	if (time_after_eq64(now, next_credit)) {
+		vif->credit_window_start = now;
 		tx_add_credit(vif);
 	}
 
@@ -1208,6 +1207,7 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 			tx_credit_callback;
 		mod_timer(&vif->credit_timeout,
 			  next_credit);
+		vif->credit_window_start = next_credit;
 
 		return true;
 	}
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH net V2] xen-netback: use jiffies_64 value to calculate credit timeout
From: Wei Liu @ 2013-10-28 12:01 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, david.vrabel, Ian Campbell, xen-devel, annie.li,
	Jason Luan, netdev
In-Reply-To: <526E5D7302000078000FD41A@nat28.tlf.novell.com>

On Mon, Oct 28, 2013 at 11:49:55AM +0000, Jan Beulich wrote:
> >>> On 28.10.13 at 12:35, Wei Liu <wei.liu2@citrix.com> wrote:
> 
> Two formal things:
> 
> > --- a/drivers/net/xen-netback/netback.c
> > +++ b/drivers/net/xen-netback/netback.c
> > @@ -1185,18 +1185,17 @@ out:
> >  
> >  static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
> >  {
> > -	unsigned long now = jiffies;
> > -	unsigned long next_credit =
> > -		vif->credit_timeout.expires +
> > -		msecs_to_jiffies(vif->credit_usec / 1000);
> > +	u64 now = get_jiffies_64();
> > +	u64 next_credit = vif->credit_window_start +
> > +		(u64)msecs_to_jiffies(vif->credit_usec / 1000);
> 
> The cast to u64 seems pointless here.
> 
> > @@ -1207,7 +1206,8 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
> >  		vif->credit_timeout.function =
> >  			tx_credit_callback;
> >  		mod_timer(&vif->credit_timeout,
> > -			  next_credit);
> > +			  (unsigned long)next_credit);
> 
> And the cast here seems pointless too (gcc doesn't warn about
> truncations).
> 

Will fix these, thanks.

Wei.

> Jan

^ permalink raw reply

* [PATCH 5/5] net/usb/r8152: code adjust
From: Hayes Wang @ 2013-10-28 11:58 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, linux-usb, Hayes Wang
In-Reply-To: <1382961494-2272-1-git-send-email-hayeswang@realtek.com>

 -Remove rtl8152_get_stats()
 -Fix the wrong name
 -Something else

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/usb/r8152.c | 46 +++++++++++++++++++++-------------------------
 1 file changed, 21 insertions(+), 25 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 90bc105..d252ff6 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -218,7 +218,7 @@
 #define POWER_CUT		0x0100
 
 /* USB_PM_CTRL_STATUS */
-#define RWSUME_INDICATE		0x0001
+#define RESUME_INDICATE		0x0001
 
 /* USB_USB_CTRL */
 #define RX_AGG_DISABLE		0x0010
@@ -376,7 +376,8 @@ struct r8152 {
 enum rtl_version {
 	RTL_VER_UNKNOWN = 0,
 	RTL_VER_01,
-	RTL_VER_02
+	RTL_VER_02,
+	RTL_VER_INVALLID = -1
 };
 
 /* Maximum number of multicast addresses to filter (vs. Rx-all-multicast).
@@ -422,14 +423,15 @@ int set_registers(struct r8152 *tp, u16 value, u16 index, u16 size, void *data)
 			       value, index, tmp, size, 500);
 
 	kfree(tmp);
+
 	return ret;
 }
 
 static int generic_ocp_read(struct r8152 *tp, u16 index, u16 size,
 				void *data, u16 type)
 {
-	u16	limit = 64;
-	int	ret = 0;
+	u16 limit = 64;
+	int ret = 0;
 
 	if (test_bit(RTL8152_UNPLUG, &tp->flags))
 		return -ENODEV;
@@ -468,9 +470,9 @@ static int generic_ocp_read(struct r8152 *tp, u16 index, u16 size,
 static int generic_ocp_write(struct r8152 *tp, u16 index, u16 byteen,
 				u16 size, void *data, u16 type)
 {
-	int	ret;
-	u16	byteen_start, byteen_end, byen;
-	u16	limit = 512;
+	int ret;
+	u16 byteen_start, byteen_end, byen;
+	u16 limit = 512;
 
 	if (test_bit(RTL8152_UNPLUG, &tp->flags))
 		return -ENODEV;
@@ -763,11 +765,6 @@ static int rtl8152_set_mac_address(struct net_device *netdev, void *p)
 	return 0;
 }
 
-static struct net_device_stats *rtl8152_get_stats(struct net_device *dev)
-{
-	return &dev->stats;
-}
-
 static void read_bulk_callback(struct urb *urb)
 {
 	struct net_device *netdev;
@@ -836,6 +833,7 @@ static void read_bulk_callback(struct urb *urb)
 static void write_bulk_callback(struct urb *urb)
 {
 	struct net_device_stats *stats;
+	struct net_device *netdev;
 	unsigned long flags;
 	struct tx_agg *agg;
 	struct r8152 *tp;
@@ -849,7 +847,8 @@ static void write_bulk_callback(struct urb *urb)
 	if (!tp)
 		return;
 
-	stats = rtl8152_get_stats(tp->netdev);
+	netdev = tp->netdev;
+	stats = &netdev->stats;
 	if (status) {
 		pr_warn_ratelimited("Tx status %d\n", status);
 		stats->tx_errors += agg->skb_num;
@@ -862,7 +861,7 @@ static void write_bulk_callback(struct urb *urb)
 	list_add_tail(&agg->list, &tp->tx_free);
 	spin_unlock_irqrestore(&tp->tx_lock, flags);
 
-	if (!netif_carrier_ok(tp->netdev))
+	if (!netif_carrier_ok(netdev))
 		return;
 
 	if (!test_bit(WORK_ENABLE, &tp->flags))
@@ -1214,7 +1213,7 @@ static void rx_bottom(struct r8152 *tp)
 
 		while (urb->actual_length > len_used) {
 			struct net_device *netdev = tp->netdev;
-			struct net_device_stats *stats;
+			struct net_device_stats *stats = &netdev->stats;
 			unsigned pkt_len;
 			struct sk_buff *skb;
 
@@ -1226,8 +1225,6 @@ static void rx_bottom(struct r8152 *tp)
 			if (urb->actual_length < len_used)
 				break;
 
-			stats = rtl8152_get_stats(netdev);
-
 			pkt_len -= 4; /* CRC */
 			rx_data += sizeof(struct rx_desc);
 
@@ -1281,7 +1278,7 @@ static void tx_bottom(struct r8152 *tp)
 			unsigned long flags;
 
 			netdev = tp->netdev;
-			stats = rtl8152_get_stats(netdev);
+			stats = &netdev->stats;
 
 			if (res == -ENODEV) {
 				netif_device_detach(netdev);
@@ -1476,7 +1473,7 @@ static int rtl8152_enable(struct r8152 *tp)
 
 static void rtl8152_disable(struct r8152 *tp)
 {
-	struct net_device_stats *stats = rtl8152_get_stats(tp->netdev);
+	struct net_device_stats *stats = &tp->netdev->stats;
 	struct sk_buff *skb;
 	u32 ocp_data;
 	int i;
@@ -1600,8 +1597,8 @@ static void r8152b_exit_oob(struct r8152 *tp)
 
 static void r8152b_enter_oob(struct r8152 *tp)
 {
-	u32	ocp_data;
-	int	i;
+	u32 ocp_data;
+	int i;
 
 	ocp_data = ocp_read_byte(tp, MCU_TYPE_PLA, PLA_OOB_CTRL);
 	ocp_data &= ~NOW_IS_OOB;
@@ -1722,7 +1719,6 @@ static int rtl8152_set_speed(struct r8152 *tp, u8 autoneg, u16 speed, u8 duplex)
 	r8152_mdio_write(tp, MII_BMCR, bmcr);
 
 out:
-
 	return ret;
 }
 
@@ -1840,7 +1836,7 @@ static void rtl_clear_bp(struct r8152 *tp)
 
 static void r8152b_enable_eee(struct r8152 *tp)
 {
-	u32	ocp_data;
+	u32 ocp_data;
 
 	ocp_data = ocp_read_word(tp, MCU_TYPE_PLA, PLA_EEE_CR);
 	ocp_data |= EEE_RX_EN | EEE_TX_EN;
@@ -1896,7 +1892,7 @@ static void r8152b_init(struct r8152 *tp)
 	ocp_write_word(tp, MCU_TYPE_USB, USB_UPS_CTRL, ocp_data);
 
 	ocp_data = ocp_read_word(tp, MCU_TYPE_USB, USB_PM_CTRL_STATUS);
-	ocp_data &= ~RWSUME_INDICATE;
+	ocp_data &= ~RESUME_INDICATE;
 	ocp_write_word(tp, MCU_TYPE_USB, USB_PM_CTRL_STATUS, ocp_data);
 
 	r8152b_exit_oob(tp);
@@ -2148,7 +2144,7 @@ static void rtl8152_unload(struct r8152 *tp)
 	}
 
 	ocp_data = ocp_read_word(tp, MCU_TYPE_USB, USB_PM_CTRL_STATUS);
-	ocp_data &= ~RWSUME_INDICATE;
+	ocp_data &= ~RESUME_INDICATE;
 	ocp_write_word(tp, MCU_TYPE_USB, USB_PM_CTRL_STATUS, ocp_data);
 }
 
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH 4/5] net/usb/r8152: fix incorrect type in assignment
From: Hayes Wang @ 2013-10-28 11:58 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, linux-usb, Hayes Wang
In-Reply-To: <1382961494-2272-1-git-send-email-hayeswang@realtek.com>

Correct some declaration.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/usb/r8152.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index a711025..90bc105 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -309,22 +309,22 @@ enum rtl8152_flags {
 #define MCU_TYPE_USB			0x0000
 
 struct rx_desc {
-	u32 opts1;
+	__le32 opts1;
 #define RX_LEN_MASK			0x7fff
-	u32 opts2;
-	u32 opts3;
-	u32 opts4;
-	u32 opts5;
-	u32 opts6;
+	__le32 opts2;
+	__le32 opts3;
+	__le32 opts4;
+	__le32 opts5;
+	__le32 opts6;
 };
 
 struct tx_desc {
-	u32 opts1;
+	__le32 opts1;
 #define TX_FS			(1 << 31) /* First segment of a packet */
 #define TX_LS			(1 << 30) /* Final segment of a packet */
 #define TX_LEN_MASK		0x3ffff
 
-	u32 opts2;
+	__le32 opts2;
 #define UDP_CS			(1 << 31) /* Calculate UDP/IP checksum */
 #define TCP_CS			(1 << 30) /* Calculate TCP/IP checksum */
 #define IPV4_CS			(1 << 29) /* Calculate IPv4 checksum */
@@ -878,7 +878,7 @@ static void write_bulk_callback(struct urb *urb)
 static void intr_callback(struct urb *urb)
 {
 	struct r8152 *tp;
-	__u16 *d;
+	__le16 *d;
 	int status = urb->status;
 	int res;
 
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH 3/5] net/usb/r8152: modify the tx flow
From: Hayes Wang @ 2013-10-28 11:58 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, linux-usb, Hayes Wang
In-Reply-To: <1382961494-2272-1-git-send-email-hayeswang@realtek.com>

Let rtl8152_start_xmit() to queue packet only, and tx_bottom() would
send all of the packets. This simplify the code and make sure all the
packet would be sent by the original order.

Support stopping and waking tx queue. The maximum tx queue length
is 60.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/usb/r8152.c | 52 ++++++++++---------------------------------------
 1 file changed, 10 insertions(+), 42 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 815d890..a711025 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -276,6 +276,8 @@ enum rtl_register_content {
 
 #define INTR_LINK		0x0004
 
+#define TX_QLEN			60
+
 #define RTL8152_REQT_READ	0xc0
 #define RTL8152_REQT_WRITE	0x40
 #define RTL8152_REQ_GET_REGS	0x05
@@ -1174,6 +1176,9 @@ static int r8152_tx_agg_fill(struct r8152 *tp, struct tx_agg *agg)
 		remain = rx_buf_sz - (int)(tx_agg_align(tx_data) - agg->head);
 	}
 
+	if (netif_queue_stopped(tp->netdev))
+		netif_wake_queue(tp->netdev);
+
 	usb_fill_bulk_urb(agg->urb, tp->udev, usb_sndbulkpipe(tp->udev, 2),
 			  agg->head, (int)(tx_data - (u8 *)agg->head),
 			  (usb_complete_t)write_bulk_callback, agg);
@@ -1389,53 +1394,16 @@ static netdev_tx_t rtl8152_start_xmit(struct sk_buff *skb,
 					    struct net_device *netdev)
 {
 	struct r8152 *tp = netdev_priv(netdev);
-	struct net_device_stats *stats = rtl8152_get_stats(netdev);
-	unsigned long flags;
-	struct tx_agg *agg = NULL;
-	struct tx_desc *tx_desc;
-	unsigned int len;
-	u8 *tx_data;
-	int res;
 
 	skb_tx_timestamp(skb);
 
-	/* If tx_queue is not empty, it means at least one previous packt */
-	/* is waiting for sending. Don't send current one before it.      */
-	if (skb_queue_empty(&tp->tx_queue))
-		agg = r8152_get_tx_agg(tp);
-
-	if (!agg) {
-		skb_queue_tail(&tp->tx_queue, skb);
-		return NETDEV_TX_OK;
-	}
+	skb_queue_tail(&tp->tx_queue, skb);
 
-	tx_desc = (struct tx_desc *)agg->head;
-	tx_data = agg->head + sizeof(*tx_desc);
-	agg->skb_num = agg->skb_len = 0;
+	if (skb_queue_len(&tp->tx_queue) > TX_QLEN)
+		netif_stop_queue(netdev);
 
-	len = skb->len;
-	r8152_tx_csum(tp, tx_desc, skb);
-	memcpy(tx_data, skb->data, len);
-	dev_kfree_skb_any(skb);
-	agg->skb_num++;
-	agg->skb_len += len;
-	usb_fill_bulk_urb(agg->urb, tp->udev, usb_sndbulkpipe(tp->udev, 2),
-			  agg->head, len + sizeof(*tx_desc),
-			  (usb_complete_t)write_bulk_callback, agg);
-	res = usb_submit_urb(agg->urb, GFP_ATOMIC);
-	if (res) {
-		/* Can we get/handle EPIPE here? */
-		if (res == -ENODEV) {
-			netif_device_detach(tp->netdev);
-		} else {
-			netif_warn(tp, tx_err, netdev,
-				   "failed tx_urb %d\n", res);
-			stats->tx_dropped++;
-			spin_lock_irqsave(&tp->tx_lock, flags);
-			list_add_tail(&agg->list, &tp->tx_free);
-			spin_unlock_irqrestore(&tp->tx_lock, flags);
-		}
-	}
+	if (!list_empty(&tp->tx_free))
+		tasklet_schedule(&tp->tl);
 
 	return NETDEV_TX_OK;
 }
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH 2/5] net/usb/r8152: make sure the tx checksum setting is correct
From: Hayes Wang @ 2013-10-28 11:58 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, linux-usb, Hayes Wang
In-Reply-To: <1382961494-2272-1-git-send-email-hayeswang@realtek.com>

Clear the checksum setting before checking the necessary of hardware
checksum.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/usb/r8152.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 5dbfe50..815d890 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -1094,6 +1094,7 @@ r8152_tx_csum(struct r8152 *tp, struct tx_desc *desc, struct sk_buff *skb)
 	memset(desc, 0, sizeof(*desc));
 
 	desc->opts1 = cpu_to_le32((skb->len & TX_LEN_MASK) | TX_FS | TX_LS);
+	desc->opts2 = 0;
 
 	if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		__be16 protocol;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH 1/5] net/usb/r8152: fix tx/rx memory overflow
From: Hayes Wang @ 2013-10-28 11:58 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, linux-usb, Hayes Wang
In-Reply-To: <1382961494-2272-1-git-send-email-hayeswang@realtek.com>

The tx/rx would access the memory which is out of the desired range.
Modify the method of checking the end of the memory to avoid it.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/usb/r8152.c | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index f3fce41..5dbfe50 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -24,7 +24,7 @@
 #include <linux/ipv6.h>
 
 /* Version Information */
-#define DRIVER_VERSION "v1.01.0 (2013/08/12)"
+#define DRIVER_VERSION "v1.02.0 (2013/10/28)"
 #define DRIVER_AUTHOR "Realtek linux nic maintainers <nic_swsd@realtek.com>"
 #define DRIVER_DESC "Realtek RTL8152 Based USB 2.0 Ethernet Adapters"
 #define MODULENAME "r8152"
@@ -1136,14 +1136,14 @@ r8152_tx_csum(struct r8152 *tp, struct tx_desc *desc, struct sk_buff *skb)
 
 static int r8152_tx_agg_fill(struct r8152 *tp, struct tx_agg *agg)
 {
-	u32 remain;
+	int remain;
 	u8 *tx_data;
 
 	tx_data = agg->head;
 	agg->skb_num = agg->skb_len = 0;
-	remain = rx_buf_sz - sizeof(struct tx_desc);
+	remain = rx_buf_sz;
 
-	while (remain >= ETH_ZLEN) {
+	while (remain >= ETH_ZLEN + sizeof(struct tx_desc)) {
 		struct tx_desc *tx_desc;
 		struct sk_buff *skb;
 		unsigned int len;
@@ -1152,12 +1152,14 @@ static int r8152_tx_agg_fill(struct r8152 *tp, struct tx_agg *agg)
 		if (!skb)
 			break;
 
+		remain -= sizeof(*tx_desc);
 		len = skb->len;
 		if (remain < len) {
 			skb_queue_head(&tp->tx_queue, skb);
 			break;
 		}
 
+		tx_data = tx_agg_align(tx_data);
 		tx_desc = (struct tx_desc *)tx_data;
 		tx_data += sizeof(*tx_desc);
 
@@ -1167,9 +1169,8 @@ static int r8152_tx_agg_fill(struct r8152 *tp, struct tx_agg *agg)
 		agg->skb_len += len;
 		dev_kfree_skb_any(skb);
 
-		tx_data = tx_agg_align(tx_data + len);
-		remain = rx_buf_sz - sizeof(*tx_desc) -
-			 (u32)((void *)tx_data - agg->head);
+		tx_data += len;
+		remain = rx_buf_sz - (int)(tx_agg_align(tx_data) - agg->head);
 	}
 
 	usb_fill_bulk_urb(agg->urb, tp->udev, usb_sndbulkpipe(tp->udev, 2),
@@ -1188,7 +1189,6 @@ static void rx_bottom(struct r8152 *tp)
 	list_for_each_safe(cursor, next, &tp->rx_done) {
 		struct rx_desc *rx_desc;
 		struct rx_agg *agg;
-		unsigned pkt_len;
 		int len_used = 0;
 		struct urb *urb;
 		u8 *rx_data;
@@ -1204,17 +1204,22 @@ static void rx_bottom(struct r8152 *tp)
 
 		rx_desc = agg->head;
 		rx_data = agg->head;
-		pkt_len = le32_to_cpu(rx_desc->opts1) & RX_LEN_MASK;
-		len_used += sizeof(struct rx_desc) + pkt_len;
+		len_used += sizeof(struct rx_desc);
 
-		while (urb->actual_length >= len_used) {
+		while (urb->actual_length > len_used) {
 			struct net_device *netdev = tp->netdev;
 			struct net_device_stats *stats;
+			unsigned pkt_len;
 			struct sk_buff *skb;
 
+			pkt_len = le32_to_cpu(rx_desc->opts1) & RX_LEN_MASK;
 			if (pkt_len < ETH_ZLEN)
 				break;
 
+			len_used += pkt_len;
+			if (urb->actual_length < len_used)
+				break;
+
 			stats = rtl8152_get_stats(netdev);
 
 			pkt_len -= 4; /* CRC */
@@ -1234,9 +1239,8 @@ static void rx_bottom(struct r8152 *tp)
 
 			rx_data = rx_agg_align(rx_data + pkt_len + 4);
 			rx_desc = (struct rx_desc *)rx_data;
-			pkt_len = le32_to_cpu(rx_desc->opts1) & RX_LEN_MASK;
 			len_used = (int)(rx_data - (u8 *)agg->head);
-			len_used += sizeof(struct rx_desc) + pkt_len;
+			len_used += sizeof(struct rx_desc);
 		}
 
 submit:
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH 0/5] r8152 bug fixes
From: Hayes Wang @ 2013-10-28 11:58 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, linux-usb, Hayes Wang

These could fix some driver issues.

Hayes Wang (5):
  net/usb/r8152: fix tx/rx memory overflow
  net/usb/r8152: make sure the tx checksum setting is correct
  net/usb/r8152: modify the tx flow
  net/usb/r8152: fix incorrect type in assignment
  net/usb/r8152: code adjust

 drivers/net/usb/r8152.c | 145 +++++++++++++++++++-----------------------------
 1 file changed, 57 insertions(+), 88 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* Bug in skb_segment: fskb->len != len
From: Christoph Paasch @ 2013-10-28 11:55 UTC (permalink / raw)
  To: Eric Dumazet, Herbert Xu; +Cc: netdev

Hello,

I have been seeing the below BUG in skb_segment with the latest net-next
head on my router.

I am forwarding Multipath TCP-traffic on this router. The MPTCP-sender is simply
doing an iperf-session. Strangely, I cannot reproduce the bug when sending
regular TCP-traffic across the router.
Note: The crash happens on a vanilla net-next kernel. It does not has any
MPTCP-code in it.

I bisected it down to 8a29111c7c (net: gro: allow to build full sized skb),
but I guess 8a29111c7c is just revealing a more fundamental bug in skb_segment.

Some info I found:
In skb_segment, when the bug happens, fskb->len is 4284 but the mss and len is 1428.
Shortly before the bug happens, skb_gro_receive is building a packet where
lp->len is equal to 4284 inside the frag_list.


Seems like skb_segment cannot handle those bigger skb's in the frag_list.


Cheers,
Christoph


Here the crash-dump:

[  399.832854] ------------[ cut here ]------------
[  399.888048] kernel BUG at /home/cpaasch/builder/net-next/net/core/skbuff.c:2796!
[  399.976504] invalid opcode: 0000 [#1] SMP 
[  400.025675] Modules linked in:
[  400.062270] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.12.0-rc6-mptcp #231
[  400.145531] Hardware name: HP ProLiant DL120 G6/ProLiant DL120 G6, BIOS O26    09/06/2010
[  400.243342] task: ffff88042d8a4680 ti: ffff88042d8ce000 task.ti: ffff88042d8ce000
[  400.332841] RIP: 0010:[<ffffffff81447d21>]  [<ffffffff81447d21>] skb_segment+0x1aa/0x5fa
[  400.429722] RSP: 0018:ffff88043fd03770  EFLAGS: 00010212
[  400.493231] RAX: 0000000000000594 RBX: ffff8800ba89ac00 RCX: 00000000000064be
[  400.578574] RDX: 0000000000000000 RSI: 0000000000000011 RDI: ffff8804273a7080
[  400.663918] RBP: ffff88043fd03820 R08: 0000000000000000 R09: ffff88042c4d4600
[  400.749259] R10: 0000000000010000 R11: ffff88042d801900 R12: ffff88042c7ca000
[  400.834596] R13: ffff88042c5d5400 R14: 0000000000001650 R15: 0000000000000056
[  400.919934] FS:  0000000000000000(0000) GS:ffff88043fd00000(0000) knlGS:0000000000000000
[  401.016711] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  401.085422] CR2: ffffffffff600400 CR3: 000000042c86b000 CR4: 00000000000007e0
[  401.170765] Stack:
[  401.194780]  ffff88042d94e900 ffff88042c4d46f0 0000000000000000 0000000000000042
[  401.283663]  0100000000000000 0000000000000001 0000001100000594 0000000000000056
[  401.372555]  0000000000000000 0000004200000098 ffffffffffffffaa 0000001100000001
[  401.461445] Call Trace:
[  401.490658]  <IRQ> 
[  401.513631]  [<ffffffff8149b077>] tcp_gso_segment+0x168/0x395
[  401.584644]  [<ffffffff814a5ba1>] inet_gso_segment+0x175/0x2a9
[  401.654396]  [<ffffffff8144fb40>] skb_mac_gso_segment+0x10a/0x16a
[  401.727264]  [<ffffffff81451062>] __skb_gso_segment+0xaf/0xb4
[  401.795977]  [<ffffffff814515ae>] dev_hard_start_xmit+0x215/0x40a
[  401.868846]  [<ffffffff814689ed>] sch_direct_xmit+0x6b/0x195
[  401.936519]  [<ffffffff81451988>] dev_queue_xmit+0x1e5/0x3ac
[  402.004193]  [<ffffffff814b6461>] ? iptable_filter_hook+0x41/0x4c
[  402.077061]  [<ffffffff8148039d>] ip_finish_output+0x2f6/0x351
[  402.146812]  [<ffffffff8147c6dc>] ? ip_frag_mem+0x34/0x34
[  402.211366]  [<ffffffff81480470>] ip_output+0x78/0x7f
[  402.271765]  [<ffffffff8147c71c>] ip_forward_finish+0x40/0x44
[  402.340475]  [<ffffffff8147c9c5>] ip_forward+0x2a5/0x300
[  402.403993]  [<ffffffff8147b104>] ip_rcv_finish+0x214/0x22c
[  402.470625]  [<ffffffff8147b3cd>] ip_rcv+0x2b1/0x2e9
[  402.529983]  [<ffffffff81446a19>] ? skb_gro_receive+0x562/0x582
[  402.600773]  [<ffffffff8144dcd8>] __netif_receive_skb_core+0x49a/0x4cd
[  402.678840]  [<ffffffff8144dd60>] __netif_receive_skb+0x55/0x5a
[  402.749631]  [<ffffffff81450190>] netif_receive_skb+0x71/0x78
[  402.818344]  [<ffffffff8149af07>] ? tcp4_gro_receive+0xf4/0xfc
[  402.888095]  [<ffffffff81450249>] napi_gro_complete+0xb2/0xba
[  402.956808]  [<ffffffff8145045f>] dev_gro_receive+0x20e/0x34d
[  403.025519]  [<ffffffff81450ae5>] napi_gro_receive+0x92/0xf1
[  403.093195]  [<ffffffff813acfe2>] netxen_process_rcv_ring+0x1b0/0x767
[  403.170222]  [<ffffffff810b3ae8>] ? kmem_cache_free+0xef/0xf3
[  403.238931]  [<ffffffff81450fb1>] ? dev_kfree_skb_any+0x2e/0x30
[  403.309723]  [<ffffffff813acc42>] ? netxen_process_cmd_ring+0x33/0x223
[  403.387790]  [<ffffffff813a8f70>] netxen_nic_poll+0x35/0x9a
[  403.454423]  [<ffffffff814506dc>] net_rx_action+0xa7/0x1d2
[  403.520017]  [<ffffffff8103605d>] __do_softirq+0xbd/0x17e
[  403.584572]  [<ffffffff815289bc>] call_softirq+0x1c/0x26
[  403.648085]  [<ffffffff81003bbb>] do_softirq+0x33/0x68
[  403.709523]  [<ffffffff81035efb>] irq_exit+0x40/0x4e
[  403.768880]  [<ffffffff81003423>] do_IRQ+0x98/0xaf
[  403.826158]  [<ffffffff8152716a>] common_interrupt+0x6a/0x6a
[  403.893829]  <EOI> 
[  403.916800]  [<ffffffff8100933d>] ? default_idle+0x6/0x8
[  403.982604]  [<ffffffff81009542>] arch_cpu_idle+0x13/0x18
[  404.047159]  [<ffffffff8105ea2b>] cpu_startup_entry+0xa4/0xf1
[  404.115873]  [<ffffffff8102320b>] start_secondary+0x1b2/0x1b7
[  404.184582] Code: bd 7f ff ff ff 00 74 04 44 8b 75 c0 45 85 f6 0f 85 e5 00 00 00 8b 75 84 39 75 ac 0f 8c d9 00 00 00 45 8b 75 68 44 3b 75 c0 74 04 <0f> 0b eb fe 4c 89 ef be 20 00 00 00 e8 08 f1 ff ff 48 85 c0 48 
[  404.417106] RIP  [<ffffffff81447d21>] skb_segment+0x1aa/0x5fa
[  404.485928]  RSP <ffff88043fd03770>
[  404.527614] ---[ end trace 32152a68c7bdc3ac ]---

^ permalink raw reply

* Re: [Xen-devel] [PATCH net V2] xen-netback: use jiffies_64 value to calculate credit timeout
From: David Vrabel @ 2013-10-28 11:53 UTC (permalink / raw)
  To: Wei Liu, xen-devel, netdev
  Cc: Ian Campbell, annie.li, david.vrabel, jbeulich, Jason Luan
In-Reply-To: <1382960117-13053-1-git-send-email-wei.liu2@citrix.com>

On 28/10/2013 11:35, Wei Liu wrote:
> time_after_eq() only works if the delta is < MAX_ULONG/2.
> 
> For a 32bit Dom0, if netfront sends packets at a very low rate, the time
> between subsequent calls to tx_credit_exceeded() may exceed MAX_ULONG/2
> and the test for timer_after_eq() will be incorrect. Credit will not be
> replenished and the guest may become unable to send packets (e.g., if
> prior to the long gap, all credit was exhausted).
> 
> Use jiffies_64 variant to mitigate this problem for 32bit Dom0.

Reviewed-by: David Vrabel <david.vrabel@citrix.com>

David

^ permalink raw reply

* Re: [PATCH net V2] xen-netback: use jiffies_64 value to calculate credit timeout
From: Jan Beulich @ 2013-10-28 11:49 UTC (permalink / raw)
  To: Wei Liu; +Cc: david.vrabel, Ian Campbell, xen-devel, annie.li, Jason Luan,
	netdev
In-Reply-To: <1382960117-13053-1-git-send-email-wei.liu2@citrix.com>

>>> On 28.10.13 at 12:35, Wei Liu <wei.liu2@citrix.com> wrote:

Two formal things:

> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -1185,18 +1185,17 @@ out:
>  
>  static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
>  {
> -	unsigned long now = jiffies;
> -	unsigned long next_credit =
> -		vif->credit_timeout.expires +
> -		msecs_to_jiffies(vif->credit_usec / 1000);
> +	u64 now = get_jiffies_64();
> +	u64 next_credit = vif->credit_window_start +
> +		(u64)msecs_to_jiffies(vif->credit_usec / 1000);

The cast to u64 seems pointless here.

> @@ -1207,7 +1206,8 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
>  		vif->credit_timeout.function =
>  			tx_credit_callback;
>  		mod_timer(&vif->credit_timeout,
> -			  next_credit);
> +			  (unsigned long)next_credit);

And the cast here seems pointless too (gcc doesn't warn about
truncations).

Jan

^ permalink raw reply

* [PATCH iproute2] tc: add cls_bpf frontend
From: Daniel Borkmann @ 2013-10-28 11:35 UTC (permalink / raw)
  To: davem; +Cc: netdev, Thomas Graf

This is the iproute2 part of the kernel patch "net: sched:
add BPF-based traffic classifier".

[Will re-submit later again for iproute2 when window for
 -next submissions opens.]

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
---
 include/linux/pkt_cls.h |  14 +++
 tc/Makefile             |   1 +
 tc/f_bpf.c              | 288 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 303 insertions(+)
 create mode 100644 tc/f_bpf.c

diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h
index 082eafa..25731df 100644
--- a/include/linux/pkt_cls.h
+++ b/include/linux/pkt_cls.h
@@ -388,6 +388,20 @@ enum {
 
 #define TCA_CGROUP_MAX (__TCA_CGROUP_MAX - 1)
 
+/* BPF classifier */
+
+enum {
+	TCA_BPF_UNSPEC,
+	TCA_BPF_ACT,
+	TCA_BPF_POLICE,
+	TCA_BPF_CLASSID,
+	TCA_BPF_OPS_LEN,
+	TCA_BPF_OPS,
+	__TCA_BPF_MAX,
+};
+
+#define TCA_BPF_MAX (__TCA_BPF_MAX - 1)
+
 /* Extended Matches */
 
 struct tcf_ematch_tree_hdr {
diff --git a/tc/Makefile b/tc/Makefile
index f54a955..84215c0 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -22,6 +22,7 @@ TCMODULES += f_u32.o
 TCMODULES += f_route.o
 TCMODULES += f_fw.o
 TCMODULES += f_basic.o
+TCMODULES += f_bpf.o
 TCMODULES += f_flow.o
 TCMODULES += f_cgroup.o
 TCMODULES += q_dsmark.o
diff --git a/tc/f_bpf.c b/tc/f_bpf.c
new file mode 100644
index 0000000..d52d7d8
--- /dev/null
+++ b/tc/f_bpf.c
@@ -0,0 +1,288 @@
+/*
+ * f_bpf.c	BPF-based Classifier
+ *
+ *		This program is free software; you can distribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Daniel Borkmann <dborkman@redhat.com>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <syslog.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <string.h>
+#include <stdbool.h>
+#include <errno.h>
+#include <linux/filter.h>
+#include <linux/if.h>
+
+#include "utils.h"
+#include "tc_util.h"
+
+static void explain(void)
+{
+	fprintf(stderr, "Usage: ... bpf ...\n");
+	fprintf(stderr, "\n");
+	fprintf(stderr, " [inline]:     run bytecode BPF_BYTECODE\n");
+	fprintf(stderr, " [from file]:  run bytecode-file FILE\n");
+	fprintf(stderr, "\n");
+	fprintf(stderr, "               [ police POLICE_SPEC ] [ action ACTION_SPEC ]\n");
+	fprintf(stderr, "               [ classid CLASSID ]\n");
+	fprintf(stderr, "\n");
+	fprintf(stderr, "Where BPF_BYTECODE := \'s,c t f k,c t f k,c t f k,...\'\n");
+	fprintf(stderr, "      c,t,f,k and s are decimals; s denotes number of 4-tuples\n");
+	fprintf(stderr, "Where FILE points to a file containing the BPF_BYTECODE string\n");
+	fprintf(stderr, "\nNOTE: CLASSID is parsed as hexadecimal input.\n");
+}
+
+static int bpf_parse_string(char *arg, bool from_file, __u16 *bpf_len,
+			    char **bpf_string, bool *need_release,
+			    const char separator)
+{
+	char sp;
+
+	if (from_file) {
+		size_t tmp_len, op_len = sizeof("65535 255 255 4294967295,");
+		char *tmp_string;
+		FILE *fp;
+
+		tmp_len = sizeof("4096,") + BPF_MAXINSNS * op_len;
+		tmp_string = malloc(tmp_len);
+		if (tmp_string == NULL)
+			return -ENOMEM;
+
+		memset(tmp_string, 0, tmp_len);
+
+		fp = fopen(arg, "r");
+		if (fp == NULL) {
+			perror("Cannot fopen");
+			free(tmp_string);
+			return -ENOENT;
+		}
+
+		if (!fgets(tmp_string, tmp_len, fp)) {
+			free(tmp_string);
+			fclose(fp);
+			return -EIO;
+		}
+
+		fclose(fp);
+
+		*need_release = true;
+		*bpf_string = tmp_string;
+	} else {
+		*need_release = false;
+		*bpf_string = arg;
+	}
+
+	if (sscanf(*bpf_string, "%hu%c", bpf_len, &sp) != 2 ||
+	    sp != separator) {
+		if (*need_release)
+			free(*bpf_string);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int bpf_parse_ops(int argc, char **argv, struct nlmsghdr *n,
+			 bool from_file)
+{
+	char *bpf_string, *token, separator = ',';
+	struct sock_filter bpf_ops[BPF_MAXINSNS];
+	int ret = 0, i = 0;
+	bool need_release;
+	__u16 bpf_len = 0;
+
+	if (argc < 1)
+		return -EINVAL;
+	if (bpf_parse_string(argv[0], from_file, &bpf_len, &bpf_string,
+			     &need_release, separator))
+		return -EINVAL;
+	if (bpf_len == 0 || bpf_len > BPF_MAXINSNS) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	token = bpf_string;
+	while ((token = strchr(token, separator)) && (++token)[0]) {
+		if (i >= bpf_len) {
+			fprintf(stderr, "Real program length exceeds encoded "
+				"length parameter!\n");
+			ret = -EINVAL;
+			goto out;
+		}
+
+		if (sscanf(token, "%hu %hhu %hhu %u,",
+			   &bpf_ops[i].code, &bpf_ops[i].jt,
+			   &bpf_ops[i].jf, &bpf_ops[i].k) != 4) {
+			fprintf(stderr, "Error at instruction %d!\n", i);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		i++;
+	}
+
+	if (i != bpf_len) {
+		fprintf(stderr, "Parsed program length is less than encoded"
+			"length parameter!\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	addattr_l(n, MAX_MSG, TCA_BPF_OPS_LEN, &bpf_len, sizeof(bpf_len));
+	addattr_l(n, MAX_MSG, TCA_BPF_OPS, &bpf_ops,
+		  bpf_len * sizeof(struct sock_filter));
+out:
+	if (need_release)
+		free(bpf_string);
+
+	return ret;
+}
+
+static void bpf_print_ops(FILE *f, struct rtattr *bpf_ops, __u16 len)
+{
+	struct sock_filter *ops = (struct sock_filter *) RTA_DATA(bpf_ops);
+	int i;
+
+	if (len == 0)
+		return;
+
+	fprintf(f, "bytecode \'%u,", len);
+
+	for (i = 0; i < len - 1; i++)
+		fprintf(f, "%hu %hhu %hhu %u,", ops[i].code, ops[i].jt,
+			ops[i].jf, ops[i].k);
+
+	fprintf(f, "%hu %hhu %hhu %u\'\n", ops[i].code, ops[i].jt,
+		ops[i].jf, ops[i].k);
+}
+
+static int bpf_parse_opt(struct filter_util *qu, char *handle,
+			 int argc, char **argv, struct nlmsghdr *n)
+{
+	struct tcmsg *t = NLMSG_DATA(n);
+	struct rtattr *tail;
+	long h = 0;
+
+	if (argc == 0)
+		return 0;
+
+	if (handle) {
+		h = strtol(handle, NULL, 0);
+		if (h == LONG_MIN || h == LONG_MAX) {
+			fprintf(stderr, "Illegal handle \"%s\", must be "
+				"numeric.\n", handle);
+			return -1;
+		}
+	}
+
+	t->tcm_handle = h;
+
+	tail = (struct rtattr*)(((void*)n)+NLMSG_ALIGN(n->nlmsg_len));
+	addattr_l(n, MAX_MSG, TCA_OPTIONS, NULL, 0);
+
+	while (argc > 0) {
+		if (matches(*argv, "run") == 0) {
+			bool from_file;
+			NEXT_ARG();
+			if (strcmp(*argv, "bytecode-file") == 0) {
+				from_file = true;
+			} else if (strcmp(*argv, "bytecode") == 0) {
+				from_file = false;
+			} else {
+				fprintf(stderr, "What is \"%s\"?\n", *argv);
+				explain();
+				return -1;
+			}
+			NEXT_ARG();
+			if (bpf_parse_ops(argc, argv, n, from_file)) {
+				fprintf(stderr, "Illegal \"bytecode\"\n");
+				return -1;
+			}
+		} else if (matches(*argv, "classid") == 0 ||
+			   strcmp(*argv, "flowid") == 0) {
+			unsigned handle;
+			NEXT_ARG();
+			if (get_tc_classid(&handle, *argv)) {
+				fprintf(stderr, "Illegal \"classid\"\n");
+				return -1;
+			}
+			addattr_l(n, MAX_MSG, TCA_BPF_CLASSID, &handle, 4);
+		} else if (matches(*argv, "action") == 0) {
+			NEXT_ARG();
+			if (parse_action(&argc, &argv, TCA_BPF_ACT, n)) {
+				fprintf(stderr, "Illegal \"action\"\n");
+				return -1;
+			}
+			continue;
+		} else if (matches(*argv, "police") == 0) {
+			NEXT_ARG();
+			if (parse_police(&argc, &argv, TCA_BPF_POLICE, n)) {
+				fprintf(stderr, "Illegal \"police\"\n");
+				return -1;
+			}
+			continue;
+		} else if (strcmp(*argv, "help") == 0) {
+			explain();
+			return -1;
+		} else {
+			fprintf(stderr, "What is \"%s\"?\n", *argv);
+			explain();
+			return -1;
+		}
+		argc--; argv++;
+	}
+
+	tail->rta_len = (((void*)n)+n->nlmsg_len) - (void*)tail;
+	return 0;
+}
+
+static int bpf_print_opt(struct filter_util *qu, FILE *f,
+			 struct rtattr *opt, __u32 handle)
+{
+	struct rtattr *tb[TCA_BPF_MAX + 1];
+
+	if (opt == NULL)
+		return 0;
+
+	parse_rtattr_nested(tb, TCA_BPF_MAX, opt);
+
+	if (handle)
+		fprintf(f, "handle 0x%x ", handle);
+
+	if (tb[TCA_BPF_CLASSID]) {
+		SPRINT_BUF(b1);
+		fprintf(f, "flowid %s ",
+			sprint_tc_classid(rta_getattr_u32(tb[TCA_BPF_CLASSID]), b1));
+	}
+
+	if (tb[TCA_BPF_OPS] && tb[TCA_BPF_OPS_LEN])
+		bpf_print_ops(f, tb[TCA_BPF_OPS],
+			      rta_getattr_u16(tb[TCA_BPF_OPS_LEN]));
+
+	if (tb[TCA_BPF_POLICE]) {
+		fprintf(f, "\n");
+		tc_print_police(f, tb[TCA_BPF_POLICE]);
+	}
+
+	if (tb[TCA_BPF_ACT]) {
+		tc_print_action(f, tb[TCA_BPF_ACT]);
+	}
+
+	return 0;
+}
+
+struct filter_util bpf_filter_util = {
+	.id = "bpf",
+	.parse_fopt = bpf_parse_opt,
+	.print_fopt = bpf_print_opt,
+};
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next] net: sched: cls_bpf: add BPF-based classifier
From: Daniel Borkmann @ 2013-10-28 11:35 UTC (permalink / raw)
  To: davem; +Cc: netdev, Thomas Graf

This work contains a lightweight BPF-based traffic classifier that can
serve as a flexible alternative to ematch-based tree classification, i.e.
now that BPF filter engine can also be JITed in the kernel. Naturally, tc
actions and policies are supported as well with cls_bpf. Multiple BPF
programs/filter can be attached for a class, or they can just as well be
written within a single BPF program, that's really up to the user how he
wishes to run/optimize the code, e.g. also for inversion of verdicts etc.
The notion of a BPF program's return/exit codes is being kept as follows:
non-zero for match, zero for mismatch.

As a minimal usage example with iproute2, we use a 3 band prio root qdisc
on a router with sfq each as leave, and assign ssh and icmp bpf-based
filters to band 1, http traffic to band 2 and the rest to band 3. For the
first two bands we load the bytecode from a file, in the 2nd we load it
inline as an example:

echo 1 > /proc/sys/net/core/bpf_jit_enable

tc qdisc del dev em1 root
tc qdisc add dev em1 root handle 1: prio bands 3 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

tc qdisc add dev em1 parent 1:1 sfq perturb 16
tc qdisc add dev em1 parent 1:2 sfq perturb 16
tc qdisc add dev em1 parent 1:3 sfq perturb 16

tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/ssh.bpf flowid 1:1
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/icmp.bpf flowid 1:1
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/http.bpf flowid 1:2
tc filter add dev em1 parent 1: bpf run bytecode "`bpfc -f tc -i misc.ops`" flowid 1:3

BPF programs can be easily created and passed to tc, either as inline
'bytecode' or 'bytecode-file'. There are a couple of front-ends that can
compile opcodes, for example:

1) People familiar with tcpdump-like filters:

   tcpdump -iem1 -ddd port 22 | tr '\n' ',' > /etc/tc/ssh.bpf

2) People that want to low-level program their filters or use BPF
   extensions that lack support by libpcap's compiler:

   bpfc -f tc -i ssh.ops > /etc/tc/ssh.bpf

   ssh.ops example code:
   ldh [12]
   jne #0x800, drop
   ldb [23]
   jneq #6, drop
   ldh [20]
   jset #0x1fff, drop
   ldxb 4 * ([14] & 0xf)
   ldh [%x + 14]
   jeq #0x16, pass
   ldh [%x + 16]
   jne #0x16, drop
   pass: ret #-1
   drop: ret #0

It was chosen to load bytecode into tc, since the reverse operation,
tc filter list dev em1, is then able to show the exact commands again.
Possible follow-up work could also include a small expression compiler
for iproute2. Tested with the help of bmon. This idea came up during
the Netfilter Workshop 2013 in Copenhagen.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
---
 include/uapi/linux/pkt_cls.h |  14 ++
 net/sched/Kconfig            |  10 ++
 net/sched/Makefile           |   1 +
 net/sched/cls_bpf.c          | 380 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 405 insertions(+)
 create mode 100644 net/sched/cls_bpf.c

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 082eafa..25731df 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -388,6 +388,20 @@ enum {
 
 #define TCA_CGROUP_MAX (__TCA_CGROUP_MAX - 1)
 
+/* BPF classifier */
+
+enum {
+	TCA_BPF_UNSPEC,
+	TCA_BPF_ACT,
+	TCA_BPF_POLICE,
+	TCA_BPF_CLASSID,
+	TCA_BPF_OPS_LEN,
+	TCA_BPF_OPS,
+	__TCA_BPF_MAX,
+};
+
+#define TCA_BPF_MAX (__TCA_BPF_MAX - 1)
+
 /* Extended Matches */
 
 struct tcf_ematch_tree_hdr {
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index c03a32a..989464c 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -443,6 +443,16 @@ config NET_CLS_CGROUP
 	  To compile this code as a module, choose M here: the
 	  module will be called cls_cgroup.
 
+config NET_CLS_BPF
+	tristate "BPF-based classifier"
+	select NET_CLS
+	---help---
+	  If you say Y here, you will be able to classify packets based on
+	  programmable BPF (JIT'ed) filters as an alternative to ematches.
+
+	  To compile this code as a module, choose M here: the module will
+	  be called cls_bpf.
+
 config NET_EMATCH
 	bool "Extended Matches"
 	select NET_CLS
diff --git a/net/sched/Makefile b/net/sched/Makefile
index e5f9abe..35fa47a 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -50,6 +50,7 @@ obj-$(CONFIG_NET_CLS_RSVP6)	+= cls_rsvp6.o
 obj-$(CONFIG_NET_CLS_BASIC)	+= cls_basic.o
 obj-$(CONFIG_NET_CLS_FLOW)	+= cls_flow.o
 obj-$(CONFIG_NET_CLS_CGROUP)	+= cls_cgroup.o
+obj-$(CONFIG_NET_CLS_BPF)	+= cls_bpf.o
 obj-$(CONFIG_NET_EMATCH)	+= ematch.o
 obj-$(CONFIG_NET_EMATCH_CMP)	+= em_cmp.o
 obj-$(CONFIG_NET_EMATCH_NBYTE)	+= em_nbyte.o
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
new file mode 100644
index 0000000..778ca86
--- /dev/null
+++ b/net/sched/cls_bpf.c
@@ -0,0 +1,380 @@
+/*
+ * Berkeley Packet Filter based traffic classifier
+ *
+ * Might be used to classify traffic through flexible, user-defined and
+ * possibly JIT-ed BPF filters for traffic control as an alternative to
+ * ematches.
+ *
+ * (C) 2013 Daniel Borkmann <dborkman@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/skbuff.h>
+#include <linux/filter.h>
+#include <net/rtnetlink.h>
+#include <net/pkt_cls.h>
+#include <net/sock.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Daniel Borkmann <dborkman@redhat.com>");
+MODULE_DESCRIPTION("TC BPF based classifier");
+
+struct cls_bpf_head {
+	struct list_head plist;
+	u32 hgen;
+};
+
+struct cls_bpf_prog {
+	struct sk_filter *filter;
+	struct sock_filter *bpf_ops;
+	struct tcf_exts exts;
+	struct tcf_result res;
+	struct list_head link;
+	u32 handle;
+	u16 bpf_len;
+};
+
+static const struct nla_policy bpf_policy[TCA_BPF_MAX + 1] = {
+	[TCA_BPF_CLASSID]	= { .type = NLA_U32 },
+	[TCA_BPF_OPS_LEN]	= { .type = NLA_U16 },
+	[TCA_BPF_OPS]		= { .type = NLA_BINARY,
+				    .len = sizeof(struct sock_filter) * BPF_MAXINSNS },
+};
+
+static const struct tcf_ext_map bpf_ext_map = {
+	.action = TCA_BPF_ACT,
+	.police = TCA_BPF_POLICE,
+};
+
+static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
+			    struct tcf_result *res)
+{
+	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_prog *prog;
+	int ret;
+
+	list_for_each_entry(prog, &head->plist, link) {
+		if (SK_RUN_FILTER(prog->filter, skb) == 0)
+			continue;
+
+		*res = prog->res;
+		ret = tcf_exts_exec(skb, &prog->exts, res);
+		if (ret < 0)
+			continue;
+
+		return ret;
+	}
+
+	return -1;
+}
+
+static int cls_bpf_init(struct tcf_proto *tp)
+{
+	struct cls_bpf_head *head;
+
+	head = kzalloc(sizeof(*head), GFP_KERNEL);
+	if (head == NULL)
+		return -ENOBUFS;
+
+	INIT_LIST_HEAD(&head->plist);
+	tp->root = head;
+
+	return 0;
+}
+
+static void cls_bpf_delete_prog(struct tcf_proto *tp, struct cls_bpf_prog *prog)
+{
+	tcf_unbind_filter(tp, &prog->res);
+	tcf_exts_destroy(tp, &prog->exts);
+
+	sk_unattached_filter_destroy(prog->filter);
+
+	kfree(prog->bpf_ops);
+	kfree(prog);
+}
+
+static int cls_bpf_delete(struct tcf_proto *tp, unsigned long arg)
+{
+	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_prog *prog, *todel = (struct cls_bpf_prog *) arg;
+
+	list_for_each_entry(prog, &head->plist, link) {
+		if (prog == todel) {
+			tcf_tree_lock(tp);
+			list_del(&prog->link);
+			tcf_tree_unlock(tp);
+
+			cls_bpf_delete_prog(tp, prog);
+			return 0;
+		}
+	}
+
+	return -ENOENT;
+}
+
+static void cls_bpf_destroy(struct tcf_proto *tp)
+{
+	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_prog *prog, *tmp;
+
+	list_for_each_entry_safe(prog, tmp, &head->plist, link) {
+		list_del(&prog->link);
+		cls_bpf_delete_prog(tp, prog);
+	}
+
+	kfree(head);
+}
+
+static unsigned long cls_bpf_get(struct tcf_proto *tp, u32 handle)
+{
+	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_prog *prog;
+	unsigned long ret = 0UL;
+
+	if (head == NULL)
+		return 0UL;
+
+	list_for_each_entry(prog, &head->plist, link) {
+		if (prog->handle == handle) {
+			ret = (unsigned long) prog;
+			break;
+		}
+	}
+
+	return ret;
+}
+
+static void cls_bpf_put(struct tcf_proto *tp, unsigned long f)
+{
+}
+
+static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp,
+				   struct cls_bpf_prog *prog,
+				   unsigned long base, struct nlattr **tb,
+				   struct nlattr *est)
+{
+	struct sock_filter *bpf_ops, *bpf_old;
+	struct tcf_exts exts;
+	struct sock_fprog tmp;
+	struct sk_filter *fp, *fp_old;
+	u16 bpf_size, bpf_len;
+	u32 classid;
+	int ret;
+
+	if (!tb[TCA_BPF_OPS_LEN] || !tb[TCA_BPF_OPS] || !tb[TCA_BPF_CLASSID])
+		return -EINVAL;
+
+	ret = tcf_exts_validate(net, tp, tb, est, &exts, &bpf_ext_map);
+	if (ret < 0)
+		return ret;
+
+	classid = nla_get_u32(tb[TCA_BPF_CLASSID]);
+	bpf_len = nla_get_u16(tb[TCA_BPF_OPS_LEN]);
+	if (bpf_len > BPF_MAXINSNS || bpf_len == 0) {
+		ret = -EINVAL;
+		goto errout;
+	}
+
+	bpf_size = bpf_len * sizeof(*bpf_ops);
+	bpf_ops = kzalloc(bpf_size, GFP_KERNEL);
+	if (bpf_ops == NULL) {
+		ret = -ENOMEM;
+		goto errout;
+	}
+
+	memcpy(bpf_ops, nla_data(tb[TCA_BPF_OPS]), bpf_size);
+
+	tmp.len = bpf_len;
+	tmp.filter = (struct sock_filter __user *) bpf_ops;
+
+	ret = sk_unattached_filter_create(&fp, &tmp);
+	if (ret)
+		goto errout_free;
+
+	tcf_tree_lock(tp);
+	fp_old = prog->filter;
+	bpf_old = prog->bpf_ops;
+
+	prog->bpf_len = bpf_len;
+	prog->bpf_ops = bpf_ops;
+	prog->filter = fp;
+	prog->res.classid = classid;
+	tcf_tree_unlock(tp);
+
+	tcf_bind_filter(tp, &prog->res, base);
+	tcf_exts_change(tp, &prog->exts, &exts);
+
+	if (fp_old)
+		sk_unattached_filter_destroy(fp_old);
+	if (bpf_old)
+		kfree(bpf_old);
+
+	return 0;
+
+errout_free:
+	kfree(bpf_ops);
+errout:
+	tcf_exts_destroy(tp, &exts);
+	return ret;
+}
+
+static u32 cls_bpf_grab_new_handle(struct tcf_proto *tp,
+				   struct cls_bpf_head *head)
+{
+	unsigned int i = 0x80000000;
+
+	do {
+		if (++head->hgen == 0x7FFFFFFF)
+			head->hgen = 1;
+	} while (--i > 0 && cls_bpf_get(tp, head->hgen));
+	if (i == 0)
+		pr_err("Insufficient number of handles\n");
+
+	return i;
+}
+
+static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
+			  struct tcf_proto *tp, unsigned long base,
+			  u32 handle, struct nlattr **tca,
+			  unsigned long *arg)
+{
+	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_prog *prog = (struct cls_bpf_prog *) *arg;
+	struct nlattr *tb[TCA_BPF_MAX + 1];
+	int ret;
+
+	if (tca[TCA_OPTIONS] == NULL)
+		return -EINVAL;
+
+	ret = nla_parse_nested(tb, TCA_BPF_MAX, tca[TCA_OPTIONS], bpf_policy);
+	if (ret < 0)
+		return ret;
+
+	if (prog != NULL) {
+		if (handle && prog->handle != handle)
+			return -EINVAL;
+		return cls_bpf_modify_existing(net, tp, prog, base, tb,
+					       tca[TCA_RATE]);
+	}
+
+	prog = kzalloc(sizeof(*prog), GFP_KERNEL);
+	if (prog == NULL)
+		return -ENOBUFS;
+
+	if (handle == 0)
+		prog->handle = cls_bpf_grab_new_handle(tp, head);
+	else
+		prog->handle = handle;
+	if (prog->handle == 0) {
+		ret = -EINVAL;
+		goto errout;
+	}
+
+	ret = cls_bpf_modify_existing(net, tp, prog, base, tb, tca[TCA_RATE]);
+	if (ret < 0)
+		goto errout;
+
+	tcf_tree_lock(tp);
+	list_add(&prog->link, &head->plist);
+	tcf_tree_unlock(tp);
+
+	*arg = (unsigned long) prog;
+
+	return 0;
+errout:
+	if (*arg == 0UL && prog)
+		kfree(prog);
+
+	return ret;
+}
+
+static int cls_bpf_dump(struct tcf_proto *tp, unsigned long fh,
+			struct sk_buff *skb, struct tcmsg *tm)
+{
+	struct cls_bpf_prog *prog = (struct cls_bpf_prog *) fh;
+	struct nlattr *nest, *nla;
+
+	if (prog == NULL)
+		return skb->len;
+
+	tm->tcm_handle = prog->handle;
+
+	nest = nla_nest_start(skb, TCA_OPTIONS);
+	if (nest == NULL)
+		goto nla_put_failure;
+
+	if (nla_put_u32(skb, TCA_BPF_CLASSID, prog->res.classid))
+		goto nla_put_failure;
+	if (nla_put_u16(skb, TCA_BPF_OPS_LEN, prog->bpf_len))
+		goto nla_put_failure;
+
+	nla = nla_reserve(skb, TCA_BPF_OPS, prog->bpf_len *
+			  sizeof(struct sock_filter));
+	if (nla == NULL)
+		goto nla_put_failure;
+
+        memcpy(nla_data(nla), prog->bpf_ops, nla_len(nla));
+
+	if (tcf_exts_dump(skb, &prog->exts, &bpf_ext_map) < 0)
+		goto nla_put_failure;
+
+	nla_nest_end(skb, nest);
+
+	if (tcf_exts_dump_stats(skb, &prog->exts, &bpf_ext_map) < 0)
+		goto nla_put_failure;
+
+	return skb->len;
+
+nla_put_failure:
+	nla_nest_cancel(skb, nest);
+	return -1;
+}
+
+static void cls_bpf_walk(struct tcf_proto *tp, struct tcf_walker *arg)
+{
+	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_prog *prog;
+
+	list_for_each_entry(prog, &head->plist, link) {
+		if (arg->count < arg->skip)
+			goto skip;
+		if (arg->fn(tp, (unsigned long) prog, arg) < 0) {
+			arg->stop = 1;
+			break;
+		}
+skip:
+		arg->count++;
+	}
+}
+
+static struct tcf_proto_ops cls_bpf_ops __read_mostly = {
+	.kind		=	"bpf",
+	.owner		=	THIS_MODULE,
+	.classify	=	cls_bpf_classify,
+	.init		=	cls_bpf_init,
+	.destroy	=	cls_bpf_destroy,
+	.get		=	cls_bpf_get,
+	.put		=	cls_bpf_put,
+	.change		=	cls_bpf_change,
+	.delete		=	cls_bpf_delete,
+	.walk		=	cls_bpf_walk,
+	.dump		=	cls_bpf_dump,
+};
+
+static int __init cls_bpf_init_mod(void)
+{
+	return register_tcf_proto_ops(&cls_bpf_ops);
+}
+
+static void __exit cls_bpf_exit_mod(void)
+{
+	unregister_tcf_proto_ops(&cls_bpf_ops);
+}
+
+module_init(cls_bpf_init_mod);
+module_exit(cls_bpf_exit_mod);
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net V2] xen-netback: use jiffies_64 value to calculate credit timeout
From: Wei Liu @ 2013-10-28 11:35 UTC (permalink / raw)
  To: xen-devel, netdev
  Cc: david.vrabel, jbeulich, annie.li, Wei Liu, Ian Campbell,
	Jason Luan

time_after_eq() only works if the delta is < MAX_ULONG/2.

For a 32bit Dom0, if netfront sends packets at a very low rate, the time
between subsequent calls to tx_credit_exceeded() may exceed MAX_ULONG/2
and the test for timer_after_eq() will be incorrect. Credit will not be
replenished and the guest may become unable to send packets (e.g., if
prior to the long gap, all credit was exhausted).

Use jiffies_64 variant to mitigate this problem for 32bit Dom0.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jason Luan <jianhai.luan@oracle.com>
---
 drivers/net/xen-netback/common.h    |    1 +
 drivers/net/xen-netback/interface.c |    3 +--
 drivers/net/xen-netback/netback.c   |   14 +++++++-------
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 5715318..400fea1 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -163,6 +163,7 @@ struct xenvif {
 	unsigned long   credit_usec;
 	unsigned long   remaining_credit;
 	struct timer_list credit_timeout;
+	u64 credit_window_start;
 
 	/* Statistics */
 	unsigned long rx_gso_checksum_fixup;
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 01bb854..459935a 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -312,8 +312,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	vif->credit_bytes = vif->remaining_credit = ~0UL;
 	vif->credit_usec  = 0UL;
 	init_timer(&vif->credit_timeout);
-	/* Initialize 'expires' now: it's used to track the credit window. */
-	vif->credit_timeout.expires = jiffies;
+	vif->credit_window_start = get_jiffies_64();
 
 	dev->netdev_ops	= &xenvif_netdev_ops;
 	dev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO;
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index f3e591c..8644aca 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1185,18 +1185,17 @@ out:
 
 static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 {
-	unsigned long now = jiffies;
-	unsigned long next_credit =
-		vif->credit_timeout.expires +
-		msecs_to_jiffies(vif->credit_usec / 1000);
+	u64 now = get_jiffies_64();
+	u64 next_credit = vif->credit_window_start +
+		(u64)msecs_to_jiffies(vif->credit_usec / 1000);
 
 	/* Timer could already be pending in rare cases. */
 	if (timer_pending(&vif->credit_timeout))
 		return true;
 
 	/* Passed the point where we can replenish credit? */
-	if (time_after_eq(now, next_credit)) {
-		vif->credit_timeout.expires = now;
+	if (time_after_eq64(now, next_credit)) {
+		vif->credit_window_start = now;
 		tx_add_credit(vif);
 	}
 
@@ -1207,7 +1206,8 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 		vif->credit_timeout.function =
 			tx_credit_callback;
 		mod_timer(&vif->credit_timeout,
-			  next_credit);
+			  (unsigned long)next_credit);
+		vif->credit_window_start = next_credit;
 
 		return true;
 	}
-- 
1.7.10.4

^ permalink raw reply related

* Re: Big performance loss from 3.4.63 to 3.10.13 when routing ipv4
From: Steffen Klassert @ 2013-10-28 11:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Wolfgang Walter, David Miller, hannes, netdev, klassert
In-Reply-To: <1382941051.13037.13.camel@edumazet-glaptop.roam.corp.google.com>

On Sun, Oct 27, 2013 at 11:17:31PM -0700, Eric Dumazet wrote:
> On Fri, 2013-10-25 at 11:20 +0200, Steffen Klassert wrote:
> > On Fri, Oct 25, 2013 at 01:50:28AM -0700, Eric Dumazet wrote:
> > > 
> > > 32768 as the default seems fine to me
> > > 
> > > 448 bytes per dst -> thats less than 30 Mbytes of memory if we hit 65536
> > > dst.
> > > 
> > 
> > Ok, I'll add the patch below to to the ipsec tree if everyone is fine
> > with that threshold.
> > 
> > Subject: [PATCH RFC] xfrm: Increase the garbage collector threshold
> > 
> > With the removal of the routing cache, we lost the
> > option to tweak the garbage collector threshold
> > along with the maximum routing cache size. So git
> > commit 703fb94ec ("xfrm: Fix the gc threshold value
> > for ipv4") moved back to a static threshold.
> > 
> > It turned out that the current threshold before we
> > start garbage collecting is much to small for some
> > workloads, so increase it from 1024 to 32768. This
> > means that we start the garbage collector if we have
> > more than 32768 dst entries in the system and refuse
> > new allocations if we are above 65536.
> > 
> > Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> > ---
> 
> Sure please add :
> 
> Reported-by: Wolfgang Walter <linux@stwm.de>

Done.

Now applied to the ipsec tree, thanks everyone!

^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: use jiffies_64 value to calculate credit timeout
From: Wei Liu @ 2013-10-28 11:30 UTC (permalink / raw)
  To: annie li
  Cc: Wei Liu, Ian Campbell, netdev, xen-devel, david.vrabel, jbeulich,
	Jason Luan
In-Reply-To: <526DD2D8.5030405@oracle.com>

On Mon, Oct 28, 2013 at 10:58:32AM +0800, annie li wrote:
[...]
> >>      if (timer_pending(&vif->credit_timeout))
> >>          return true;
> >>        /* Passed the point where we can replenish credit? */
> >>-    if (time_after_eq(now, next_credit)) {
> >>-        vif->credit_timeout.expires = now;
> >>+    if (time_after_eq64(now, next_credit)) {
> >>+        vif->credit_timeout.expires = (unsigned long)now;
> >
> >updates credit_window_start as following,
> >vif->credit_window_start = (unsigned long)now;
> 
> both credit_window_start and credit_timeout.expires need to be
> updated here,
> 
> vif->credit_window_start = (unsigned long)now;
> vif->credit_timeout.expires = (unsigned long)now;
> 

IMHO we don't need to update .expires anymore -- we now track the window
with another variable.

Wei.

^ permalink raw reply

* Re: [PATCH net-next 2/3] tipc: message reassembly using fragment chain
From: Jon Maloy @ 2013-10-28 10:24 UTC (permalink / raw)
  Cc: jon.maloy, paul.gortmaker, erik.hugne, ying.xue, tipc-discussion,
	netdev, David Miller
In-Reply-To: <20131028.010737.400887499395331496.davem@davemloft.net>

On 10/28/2013 01:07 AM, David Miller wrote:
> From: Jon Maloy <jon.maloy@ericsson.com>
> Date: Sat, 26 Oct 2013 14:41:02 -0400
>
>> +			int ret = tipc_link_recv_fragment(
>> +					&node->bclink.reasm_head,
>> +					&node->bclink.reasm_tail,
>> +					&buf);
> This is not the correct way to indent a function call that spans
> multiple lines.  In such a situation the arguments that appear
> on the second and subsequent lines must start at the first column
> after the openning parenthesis of the function call.
>
> Like this:
>
> 	func(a, b, c,
> 	     d, e, f);
>
> Please audit this in your entire set of patches and resubmit,
> thanks.

Doing as David says here means that some lines will be >80 chars.
This was the reason for the somewhat strange indentation.
I tried to rename the function to tipc_link_rcv_fragm(), but one
line was still too long. The problem we have goes deeper.

In Linus' coding style manual I read that the 80 char limit is a hard limit,
a limit we violate in several places. One offender is that we have too
many indentation levels, at least in tipc_recv_msg() and probably in
some other places. This is sensitive code, that I don't feel for touching
right now.  A more low hanging fruit is our local variable names:
names such as l_ptr, n_ptr, b_ptr is exactly what Linus characterizes
as "brain dead Hungarian style", and I never liked that naming anyway.
For me l, n, and b is good enough as long as the context is clear.

But, doing so, at least in tipc_recv_msg(), would require another, separate,
patch, and it would lead to style inconsistency.

In brief, I am at loss about to proceed here, and I am not going to submit
this patch again until I have some feedback from somebody who can tell me
what is the right thing to do. Maybe > 80 chars is fine for now?

///jon

^ permalink raw reply

* Re: PROBLEM: GRE forwarding not working with 3.10.x, WCCPv2 and Cisco 7200 Router showing IP0 bad-hlen 4 in tcpdump
From: Peter Schmitt @ 2013-10-28 10:10 UTC (permalink / raw)
  To: Pravin Shelar; +Cc: Eric Dumazet, netdev@vger.kernel.org
In-Reply-To: <CALnjE+pdvpfCTo=bTtSrY29mxc0yZKMj0XSc2gQVhGjbpByhcA@mail.gmail.com>

Hi Pravin,

I tested this patch and it works fine with 3.10.17! Now I get correct packets from the GRE-device and can surf again using WCCP.

Thank you very much!

Best regards,
Peter





> On Friday, October 25, 2013 5:25 PM, Pravin Shelar <pshelar@nicira.com> wrote:
> > Hi Peter,
> I think its easy to fix this case, Can you try attached patch?
> If it works I will send formal patch for stable.
> 
> Thanks,
> Pravin.
> 
> 
> On Fri, Oct 18, 2013 at 4:39 AM, Peter Schmitt <peter.schmitt82@yahoo.de> 
> wrote:
>>  Hi,
>> 
>> 
>> 
>>>  On Thursday, October 17, 2013 3:44 PM, Eric Dumazet 
> <eric.dumazet@gmail.com> wrote:
>>>  > On Thu, 2013-10-17 at 11:53 +0100, Peter Schmitt wrote:
>>>>   Hi,
>>>> 
>>>> 
>>>>   >
>>>>   > 3.10 is buggy (for ETH_P_WCCP support I mean), 3.11 has the 
> probable
>>>>   > fix :
>>>>   >
>>>>   > commit 3d7b46cd20e300bd6989fb1f43d46f1b9645816e
>>>>   > ("ip_tunnel: push generic protocol handling to ip_tunnel
>>>  module.")
>>>>   >
>>>>   > The problem is 3.10 do not pull the extra 4 bytes of WCCP
>>>>   >
>>>>   > This is now properly done in iptunnel_pull_header()
>>>>   >
>>>> 
>>>>   First of all, many thanks for your help on this.
>>>> 
>>>>   I tried to apply the fix to the 3.10.16 sources, as I would like 
> to
>>>>   stay with the long-term line. Unfortunately it does not apply, as
>>>>   there are a lot of dependencies on other patches.
>>>> 
>>>>   Do you think there will be a fix for the long-time 3.10 kernel 
> line?
>>>>   Or can you guide me on how to apply this fix to the long-term 
> kernel?
>>> 
>>>  3.10 is right in the middle of GRE refactoring, and many bugs are in 
> it.
>>> 
>>>  It might be very painful to get a complete list of patches to backport.
>> 
>>  thanks again. Yes I saw that there were lots of changes.
>> 
>>  Do you think there will be a fix for all the 3.10.x stable long term kernel 
> users? Many of them might not be able to use some newer kernel easily or will 
> this be a "won't fix"?
>> 
>>  Best regards,
>>  Peter
>> 
> 

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox