Netdev List


* Re: [PATCH net-next-2.6] bonding: move dev_addr cpy to bond_enslave
From: David Miller @ 2010-06-02 11:17 UTC (permalink / raw)
  To: jpirko; +Cc: netdev, fubar, bonding-devel
In-Reply-To: <20100519111428.GA2788@psychotron.lab.eng.brq.redhat.com>

From: Jiri Pirko <jpirko@redhat.com>
Date: Wed, 19 May 2010 13:14:29 +0200

> Move the code that copies slave's mac address in case that's the first slave into
> bond_enslave. Ifenslave app does this also but that's not a problem. This is
> something that should be done in bond_enslave, and it should not matter from
> where it is called.
> 
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>

(Jiri, please number your patches in a set, even if they should apply
 properly independently, thanks)

Applied.


* Re: [PATCH net-next-2.6] bonding: remove unused original_flags struct slave member
From: David Miller @ 2010-06-02 11:17 UTC (permalink / raw)
  To: jpirko; +Cc: netdev, fubar, bonding-devel
In-Reply-To: <20100519111740.GB2788@psychotron.lab.eng.brq.redhat.com>

From: Jiri Pirko <jpirko@redhat.com>
Date: Wed, 19 May 2010 13:17:41 +0200

> This is stored but never restored. So remove this as it is useless.
> 
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>

Applied.


* Re: [PATCH net-next-2.6] bonding: optimize tlb_get_least_loaded_slave
From: David Miller @ 2010-06-02 11:17 UTC (permalink / raw)
  To: jpirko; +Cc: netdev, fubar, bonding-devel
In-Reply-To: <20100519132638.GC2788@psychotron.lab.eng.brq.redhat.com>

From: Jiri Pirko <jpirko@redhat.com>
Date: Wed, 19 May 2010 15:26:39 +0200

> In the worst case, when the first loop breaks at the end of the slave list,
> the slave list is iterated through twice. This patch reduces this
> function only to one loop. Also makes it simpler.
> 
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>

Applied.


* Re: [PATCH net-next-2.6] bonding: move dev_addr cpy to bond_enslave
From: Jiri Pirko @ 2010-06-02 11:20 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, fubar, bonding-devel
In-Reply-To: <20100602.041714.236244214.davem@davemloft.net>

Wed, Jun 02, 2010 at 01:17:14PM CEST, davem@davemloft.net wrote:
>From: Jiri Pirko <jpirko@redhat.com>
>Date: Wed, 19 May 2010 13:14:29 +0200
>
>> Move the code that copies slave's mac address in case that's the first slave into
>> bond_enslave. Ifenslave app does this also but that's not a problem. This is
>> something that should be done in bond_enslave, and it should not matter from
>> where it is called.
>> 
>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>
>(Jiri, please number your patches in a set, even if they should apply
> properly independently, thanks)

Sorry, this was not meant to be a set. I was just posting patches as I went
through the code. Will try to "buffer" it next time.

Thanks.

>
>Applied.


* [PATCH 1/2] pktgen: increasing transmission granularity
From: Daniel Turull @ 2010-06-02 11:49 UTC (permalink / raw)
  To: netdev; +Cc: robert, jens.laas

This patch increases the granularity of the rate generated by pktgen.
The previous version of pktgen used microsecond (udelay) resolution for its
delays, which left gaps between the achievable rates. The delay now uses
nanosecond (ndelay) resolution, so rates can be set in much finer steps.

It also allows the desired rate to be set in Mb/s or in packets per second.

The documentation has been updated.

Signed-off-by: Daniel Turull <daniel.turull@gmail.com>

---
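A minimal userspace sketch (not part of the patch) of the arithmetic behind the
new "rate" and "ratep" commands, assuming the same integer formulas as the
pktgen_if_write() hunk below. The function names delay_from_mbps() and
delay_from_pps() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL
#define NSEC_PER_SEC  1000000000ULL

/*
 * "rate <Mb/s>": bits per packet divided by Mb/s gives microseconds per
 * packet, so multiplying by NSEC_PER_USEC yields the delay in nanoseconds.
 */
static uint64_t delay_from_mbps(uint64_t pkt_size_bytes, uint64_t mbps)
{
	assert(mbps != 0);	/* the patch returns early on a zero value */
	return pkt_size_bytes * 8 * NSEC_PER_USEC / mbps;
}

/* "ratep <pps>": one second divided by the packet rate, in nanoseconds. */
static uint64_t delay_from_pps(uint64_t pps)
{
	assert(pps != 0);
	return NSEC_PER_SEC / pps;
}
```

With 60-byte packets, delay_from_mbps(60, 300) gives a 1600 ns gap, and
delay_from_pps(1000000) gives 1000 ns. Both divisions truncate, so the
resulting rate can be slightly above the requested one.
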
diff --git a/Documentation/networking/pktgen.txt b/Documentation/networking/pktgen.txt
index 61bb645..ac0e4ff 100644
--- a/Documentation/networking/pktgen.txt
+++ b/Documentation/networking/pktgen.txt
@@ -78,6 +78,9 @@ Examples:
 
  pgset "delay 5000"      adds delay to hard_start_xmit(). nanoseconds
 
+ pgset "rate 300M"       set rate to 300 Mb/s
+ pgset "ratep 1000000"   set rate to 1Mpps
+
  pgset "dst 10.0.0.1"    sets IP destination address
                          (BEWARE! This generator is very aggressive!)
 
@@ -200,6 +203,9 @@ debug
 frags
 delay
 
+rate
+ratep
+
 src_mac_count
 dst_mac_count
 
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 2ad68da..6428653 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -169,7 +169,7 @@
 #include <asm/dma.h>
 #include <asm/div64.h>		/* do_div */
 
-#define VERSION 	"2.73"
+#define VERSION	"2.74"
 #define IP_NAME_SZ 32
 #define MAX_MPLS_LABELS 16 /* This is the max label stack depth */
 #define MPLS_STACK_BOTTOM htonl(0x00000100)
@@ -980,6 +980,40 @@ static ssize_t pktgen_if_write(struct file *file,
 			(unsigned long long) pkt_dev->delay);
 		return count;
 	}
+	if (!strcmp(name, "rate")) {
+		len = num_arg(&user_buffer[i], 10, &value);
+		if (len < 0)
+			return len;
+
+		i += len;
+		if (!value)
+			return len;
+		pkt_dev->delay = pkt_dev->min_pkt_size*8*NSEC_PER_USEC/value;
+		if (debug)
+			printk(KERN_INFO
+				 "pktgen: Delay set at: %llu ns\n",
+					pkt_dev->delay);
+
+		sprintf(pg_result, "OK: rate=%lu", value);
+		return count;
+	}
+	if (!strcmp(name, "ratep")) {
+		len = num_arg(&user_buffer[i], 10, &value);
+		if (len < 0)
+			return len;
+
+		i += len;
+		if (!value)
+			return len;
+		pkt_dev->delay = NSEC_PER_SEC/value;
+		if (debug)
+			printk(KERN_INFO
+				 "pktgen: Delay set at: %llu ns\n",
+					pkt_dev->delay);
+
+		sprintf(pg_result, "OK: rate=%lu", value);
+		return count;
+	}
 	if (!strcmp(name, "udp_src_min")) {
 		len = num_arg(&user_buffer[i], 10, &value);
 		if (len < 0)
@@ -2142,15 +2176,15 @@ static void spin(struct pktgen_dev *pkt_dev, ktime_t spin_until)
 	hrtimer_init_on_stack(&t.timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
 	hrtimer_set_expires(&t.timer, spin_until);
 
-	remaining = ktime_to_us(hrtimer_expires_remaining(&t.timer));
+	remaining = ktime_to_ns(hrtimer_expires_remaining(&t.timer));
 	if (remaining <= 0) {
 		pkt_dev->next_tx = ktime_add_ns(spin_until, pkt_dev->delay);
 		return;
 	}
 
 	start_time = ktime_now();
-	if (remaining < 100)
-		udelay(remaining); 	/* really small just spin */
+	if (remaining < 100000)
+		ndelay(remaining);	/* really small just spin */
 	else {
 		/* see do_nanosleep */
 		hrtimer_init_sleeper(&t, current);
@@ -2170,7 +2204,7 @@ static void spin(struct pktgen_dev *pkt_dev, ktime_t spin_until)
 	end_time = ktime_now();
 
 	pkt_dev->idle_acc += ktime_to_ns(ktime_sub(end_time, start_time));
-	pkt_dev->next_tx = ktime_add_ns(end_time, pkt_dev->delay);
+	pkt_dev->next_tx = ktime_add_ns(spin_until, pkt_dev->delay);
 }
 
 static inline void set_pkt_overhead(struct pktgen_dev *pkt_dev)



* [PATCH 2/2] pktgen: receive packets and process incoming rate
From: Daniel Turull @ 2010-06-02 11:49 UTC (permalink / raw)
  To: netdev; +Cc: robert, jens.laas

This patch adds a receiver part to pktgen, taking advantage of SMP systems
with multiple rx queues:
- Creates a new proc file, /proc/net/pktgen/pgrx, to control and display the receiver.
- Uses a per-CPU variable to store the results for each CPU.
- Results are displayed per CPU and aggregated.
- The packet handler is added to the protocol handlers (dev_add_pack()).
- Available statistics: packets and bytes received, work time and rate.
- Only pktgen packets are processed.
- The incoming interface can be selected.
- Documentation is updated with the new commands to control the receiver part.

Signed-off-by: Daniel Turull <daniel.turull@gmail.com>

---
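A minimal userspace sketch (not part of the patch) of how the per-CPU and
global rates shown in pgrx could be derived from the byte and packet counters
and the work time in microseconds, assuming the same integer arithmetic as
pgrx_show() below. The struct rx_rate type and rx_rate_compute() are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL

struct rx_rate {
	uint64_t pps;	/* packets per second */
	uint64_t bps;	/* bits per second */
	uint64_t mbps;	/* megabits per second, truncated */
};

/* Derive rates from counters and elapsed time; all divisions truncate. */
static struct rx_rate rx_rate_compute(uint64_t packets, uint64_t bytes,
				      uint64_t work_time_us)
{
	struct rx_rate r;

	assert(work_time_us != 0);	/* pgrx_show() skips this case */
	r.bps = bytes * 8 * USEC_PER_SEC / work_time_us;
	r.mbps = r.bps / 1000000;
	r.pps = packets * USEC_PER_SEC / work_time_us;
	return r;
}
```

Feeding in the "CPU 1" counters from the documentation sample in this patch
(2502400 packets, 150144000 bytes, 112625248 us) reproduces the 22218 pps,
10 Mb/sec and 10665033 bps shown there.
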
diff --git a/Documentation/networking/pktgen.txt b/Documentation/networking/pktgen.txt
index ac0e4ff..20a8810 100644
--- a/Documentation/networking/pktgen.txt
+++ b/Documentation/networking/pktgen.txt
@@ -22,7 +22,7 @@ For monitoring and control pktgen creates:
 	/proc/net/pktgen/pgctrl
 	/proc/net/pktgen/kpktgend_X
         /proc/net/pktgen/ethX
-
+	/proc/net/pktgen/pgrx
 
 Viewing threads
 ===============
@@ -155,6 +155,43 @@ Examples:
  pgset stop    	          aborts injection. Also, ^C aborts generator.
 
 
+=======
+Viewing receiver
+================
+
+/proc/net/pktgen/pgrx
+
+		RECEPTION STATISTICS
+	PER-CPU Stats.
+CPU 0: 	Rx packets: 0	 Rx bytes: 0
+CPU 1: 	Rx packets: 2502400	 Rx bytes: 150144000
+	Rate:  22218pps 10 Mb/sec (10665033bps)
+	Worktime 112625248 us
+CPU 2: 	Rx packets: 1251200	 Rx bytes: 75072000
+	Rate:  11109pps 5 Mb/sec (5332412bps)
+	Worktime 112627453 us
+CPU 3: 	Rx packets: 1251100	 Rx bytes: 75066000
+	Rate:  11108pps 5 Mb/sec (5332035bps)
+	Worktime 112626413 us
+CPU 4: 	Rx packets: 1251400	 Rx bytes: 75084000
+	Rate:  11111pps 5 Mb/sec (5333458bps)
+	Worktime 112623364 us
+CPU 5: 	Rx packets: 1251200	 Rx bytes: 75072000
+	Rate:  11110pps 5 Mb/sec (5332844bps)
+	Worktime 112618314 us
+CPU 6: 	Rx packets: 1241500	 Rx bytes: 74490000
+	Rate:  11023pps 5 Mb/sec (5291273bps)
+	Worktime 112623172 us
+CPU 7: 	Rx packets: 1251200	 Rx bytes: 75072000
+	Rate:  11109pps 5 Mb/sec (5332628bps)
+	Worktime 112622877 us
+
+	Global Statistics
+Packets Rx: 10000000	 Bytes Rx: 600000000
+Start: 64952587054 us	 Stop: 65065224574 us	 Worktime  112637519 us
+Received throughput:
+  88780pps 42 Mb/sec (42614574bps)
+
 Example scripts
 ===============
 
@@ -247,6 +284,12 @@ src6
 flows
 flowlen
 
+**Receiver commands:
+
+rx [device]
+rx_reset
+rx_disable
+
 References:
 ftp://robur.slu.se/pub/Linux/net-development/pktgen-testing/
 ftp://robur.slu.se/pub/Linux/net-development/pktgen-testing/examples/
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 6428653..1ee67d0 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -114,6 +114,7 @@
  * Fixed src_mac command to set source mac of packet to value specified in
  * command by Adit Ranadive <adit.262@gmail.com>
  *
+ * Receiver support and rate control by Daniel Turull <daniel.turull@gmail.com>
  */
 #include <linux/sys.h>
 #include <linux/types.h>
@@ -168,6 +169,7 @@
 #include <linux/uaccess.h>
 #include <asm/dma.h>
 #include <asm/div64.h>		/* do_div */
+#include "kmap_skb.h"
 
 #define VERSION	"2.74"
 #define IP_NAME_SZ 32
@@ -204,8 +206,10 @@
 
 /* Used to help with determining the pkts on receive */
 #define PKTGEN_MAGIC 0xbe9be955
+#define PKTGEN_MAGIC_NET htonl(PKTGEN_MAGIC)
 #define PG_PROC_DIR "pktgen"
 #define PGCTRL	    "pgctrl"
+#define PGRX        "pgrx"
 static struct proc_dir_entry *pg_proc_dir;
 
 #define MAX_CFLOWS  65536
@@ -406,6 +410,15 @@ struct pktgen_thread {
 	struct completion start_done;
 };
 
+/* Receiver parameters per CPU */
+struct pktgen_rx {
+	u64 rx_packets;		/*packets arrived*/
+	u64 rx_bytes;		/*bytes arrived*/
+
+	ktime_t start_time;	/*first time stamp of a packet*/
+	ktime_t last_time;	/*last packet arrival */
+};
+
 #define REMOVE 1
 #define FIND   0
 
@@ -438,6 +451,13 @@ static void pktgen_stop_all_threads_ifs(void);
 static void pktgen_stop(struct pktgen_thread *t);
 static void pktgen_clear_counters(struct pktgen_dev *pkt_dev);
 
+/*Receiver functions*/
+static int pktgen_rcv_basic(struct sk_buff *skb, struct net_device *dev,
+	struct packet_type *pt, struct net_device *orig_dev);
+static int pktgen_add_rx(const char *ifname);
+static int pktgen_clean_rx(void);
+static void pg_reset_rx(void);
+
 static unsigned int scan_ip6(const char *s, char ip[16]);
 static unsigned int fmt_ip6(char *s, const char ip[16]);
 
@@ -450,10 +470,19 @@ static int debug  __read_mostly;
 static DEFINE_MUTEX(pktgen_thread_lock);
 static LIST_HEAD(pktgen_threads);
 
+DEFINE_PER_CPU(struct pktgen_rx, pktgen_rx_data);
+static int pg_initialized;
+
 static struct notifier_block pktgen_notifier_block = {
 	.notifier_call = pktgen_device_event,
 };
 
+/*Reception functions test*/
+static struct packet_type pktgen_packet_type __read_mostly = {
+	.type = __constant_htons(ETH_P_IP),
+	.func = pktgen_rcv_basic,
+};
+
 /*
  * /proc handling functions
  *
@@ -1876,6 +1905,188 @@ static const struct file_operations pktgen_thread_fops = {
 	.release = single_release,
 };
 
+/*
+ * Function that shows receiver statistics
+ */
+static int pgrx_show(struct seq_file *seq, void *v)
+{
+	__u64 bps, mbps, pps;
+	int cpu;
+	u64 total_packets = 0, total_bytes = 0, work_time_us = 0;
+	u64 packets = 0, bytes = 0;
+	ktime_t start_global, stop_global, tmp;
+	start_global.tv64 = 0;
+	stop_global.tv64 = 0;
+
+	seq_puts(seq, "\t\tRECEPTION STATISTICS\n");
+	if (pg_initialized == 0) {
+		seq_puts(seq, "Not enabled.\n");
+		return 0;
+	}
+	seq_puts(seq, "\tPER-CPU Stats.\n");
+
+	for_each_online_cpu(cpu) {
+		seq_printf(seq, "CPU %d: ", cpu);
+		packets = per_cpu(pktgen_rx_data, cpu).rx_packets;
+		bytes = per_cpu(pktgen_rx_data, cpu).rx_bytes;
+
+		total_packets += packets;
+		total_bytes += bytes;
+		seq_printf(seq, "\tRx packets: %llu\t Rx bytes: %llu\n",
+			packets, bytes);
+
+		tmp = per_cpu(pktgen_rx_data, cpu).start_time;
+		if (start_global.tv64 == 0 && tmp.tv64 != 0)
+			start_global = tmp;
+		else if (tmp.tv64 < start_global.tv64 && tmp.tv64 != 0)
+			start_global = tmp;
+
+		tmp = per_cpu(pktgen_rx_data, cpu).last_time;
+		if (ktime_to_ns(tmp) > ktime_to_ns(stop_global))
+			stop_global = tmp;
+
+		work_time_us = ktime_to_us(ktime_sub(
+			per_cpu(pktgen_rx_data, cpu).last_time,
+			per_cpu(pktgen_rx_data, cpu).start_time));
+
+		if (!work_time_us)
+			continue;
+
+		bps = div64_u64(bytes*8*USEC_PER_SEC, work_time_us);
+		mbps = bps;
+		do_div(mbps, 1000000);
+		pps = div64_u64(packets * USEC_PER_SEC, work_time_us);
+
+		seq_printf(seq, "\tRate:  %llupps %llu Mb/sec (%llubps)\n",
+				(unsigned long long)pps,
+				(unsigned long long)mbps,
+				(unsigned long long)bps);
+		seq_printf(seq, "\tWork time %llu us\n", work_time_us);
+
+	}
+
+	seq_puts(seq, "\n\tGlobal Statistics\n");
+
+	seq_printf(seq, "Packets Rx: %llu\t Bytes Rx: %llu\n",
+		(unsigned long long) total_packets,
+		(unsigned long long) total_bytes);
+
+	/*Bandwidth*/
+	work_time_us = ktime_to_us(ktime_sub(stop_global, start_global));
+
+	seq_printf(seq, "Start: %llu us\t Stop: %llu us\t Work time  %llu us\n",
+		ktime_to_us(start_global),
+		ktime_to_us(stop_global),
+		work_time_us);
+
+	if (!work_time_us)
+		return 0;
+
+	bps = div64_u64(total_bytes*8*USEC_PER_SEC, work_time_us);
+	mbps = bps;
+	do_div(mbps, 1000000);
+	pps = div64_u64(total_packets * USEC_PER_SEC, work_time_us);
+
+	seq_puts(seq, "Received throughput:\n");
+
+	seq_printf(seq, "  %llupps %llu Mb/sec (%llubps)\n",
+		     (unsigned long long)pps,
+		     (unsigned long long)mbps,
+		     (unsigned long long)bps);
+
+	return 0;
+}
+/*receiver configuration*/
+static ssize_t pgrx_write(struct file *file, const char __user * user_buffer,
+				size_t count, loff_t *ppos)
+{
+	int i = 0, max, len, ret;
+	char name[40];
+
+	if (count < 1)
+		return -EINVAL;
+
+	max = count - i;
+	len = count_trail_chars(&user_buffer[i], max);
+	if (len < 0)
+		return len;
+
+	i += len;
+
+	/* Read variable name */
+
+	len = strn_len(&user_buffer[i], sizeof(name) - 1);
+	if (len < 0)
+		return len;
+
+	memset(name, 0, sizeof(name));
+	if (copy_from_user(name, &user_buffer[i], len))
+		return -EFAULT;
+	i += len;
+
+	max = count - i;
+	len = count_trail_chars(&user_buffer[i], max);
+	if (len < 0)
+		return len;
+
+	i += len;
+
+	if (debug)
+		printk(KERN_DEBUG "pktgen: t=%s, count=%lu\n",
+		       name, (unsigned long)count);
+
+	if (!strcmp(name, "rx")) {
+		char f[32];
+		memset(f, 0, 32);
+		len = strn_len(&user_buffer[i], sizeof(f) - 1);
+		if (len < 0) {
+			ret = len;
+			goto out;
+		}
+		if (copy_from_user(f, &user_buffer[i], len))
+			return -EFAULT;
+		i += len;
+
+		if (debug)
+			printk(KERN_INFO "pktgen: Adding rx %s\n", f);
+		pktgen_add_rx(f);
+		ret = count;
+		goto out;
+	} else if (!strcmp(name, "rx_reset")) {
+		ret = count;
+		pg_reset_rx();
+		if (debug)
+			printk(KERN_INFO "pktgen: Resetting reception\n");
+		goto out;
+	} else if (!strcmp(name, "rx_disable")) {
+		ret = count;
+		pktgen_clean_rx();
+		if (debug)
+			printk(KERN_INFO "pktgen: Cleaning reception\n");
+		goto out;
+	} else
+		printk(KERN_WARNING "pktgen: Unknown command: %s\n", name);
+
+	ret = count;
+
+out:
+	return ret;
+}
+
+static int pgrx_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, pgrx_show, PDE(inode)->data);
+}
+
+static const struct file_operations pktgen_rx_fops = {
+	.owner   = THIS_MODULE,
+	.open    = pgrx_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.write   = pgrx_write,
+	.release = single_release,
+};
+
 /* Think find or remove for NN */
 static struct pktgen_dev *__pktgen_NN_threads(const char *ifname, int remove)
 {
@@ -3886,6 +4097,88 @@ static int pktgen_remove_device(struct pktgen_thread *t,
 	return 0;
 }
 
+static void pg_reset_rx(void)
+{
+	int cpu;
+	for_each_online_cpu(cpu) {
+		per_cpu(pktgen_rx_data, cpu).rx_packets = 0;
+		per_cpu(pktgen_rx_data, cpu).rx_bytes = 0;
+		per_cpu(pktgen_rx_data, cpu).last_time.tv64 = 0;
+		per_cpu(pktgen_rx_data, cpu).start_time.tv64 = 0;
+	}
+}
+
+static int pktgen_add_rx(const char *ifname)
+{
+	int err = 0;
+	struct net_device *idev = NULL;
+
+	pg_reset_rx();
+
+	idev = pktgen_dev_get_by_name(NULL, ifname);
+	if (!idev)
+		printk(KERN_INFO
+			"pktgen: device not present %s. Using all\n", ifname);
+
+	if (!pg_initialized) {
+		pktgen_packet_type.dev = idev;
+		dev_add_pack(&pktgen_packet_type);
+		err = 0;
+		net_disable_timestamp();
+		pg_initialized = 1;
+	} else {
+		dev_remove_pack(&pktgen_packet_type);
+		pktgen_packet_type.dev = idev;
+		dev_add_pack(&pktgen_packet_type);
+		err = 0;
+	}
+	if (idev)
+		dev_put(idev);
+	return err;
+}
+
+static int pktgen_clean_rx(void)
+{
+	if (pg_initialized) {
+		dev_remove_pack(&pktgen_packet_type);
+		pg_initialized = 0;
+	}
+	return 0;
+}
+
+static int pktgen_rcv_basic(struct sk_buff *skb, struct net_device *dev,
+			    struct packet_type *pt, struct net_device *orig_dev)
+{
+	/* Check magic*/
+	struct iphdr *iph = ip_hdr(skb);
+	struct pktgen_hdr *pgh;
+	void *vaddr;
+	if (skb_is_nonlinear(skb)) {
+		vaddr = kmap_skb_frag(&skb_shinfo(skb)->frags[0]);
+		pgh = (struct pktgen_hdr *)
+			(vaddr+skb_shinfo(skb)->frags[0].page_offset);
+	} else {
+		pgh = (struct pktgen_hdr *)(((char *)(iph)) + 28);
+	}
+
+	if (unlikely(pgh->pgh_magic != PKTGEN_MAGIC_NET))
+		goto end;
+
+	if (unlikely(!__get_cpu_var(pktgen_rx_data).rx_packets))
+		__get_cpu_var(pktgen_rx_data).start_time = ktime_now();
+
+	__get_cpu_var(pktgen_rx_data).last_time = ktime_now();
+
+	/* Update counter of packets*/
+	__get_cpu_var(pktgen_rx_data).rx_packets++;
+	__get_cpu_var(pktgen_rx_data).rx_bytes += skb->len+14;
+end:
+	if (skb_is_nonlinear(skb))
+		kunmap_skb_frag(vaddr);
+	kfree_skb(skb);
+	return 0;
+}
+
 static int __init pg_init(void)
 {
 	int cpu;
@@ -3908,6 +4201,15 @@ static int __init pg_init(void)
 	/* Register us to receive netdevice events */
 	register_netdevice_notifier(&pktgen_notifier_block);
 
+	/*Create proc rx*/
+	pe = proc_create(PGRX, 0600, pg_proc_dir, &pktgen_rx_fops);
+	if (pe == NULL) {
+		printk(KERN_ERR "pktgen: ERROR: cannot create %s "
+		       "procfs entry.\n", PGRX);
+		proc_net_remove(&init_net, PG_PROC_DIR);
+		return -EINVAL;
+	}
+
 	for_each_online_cpu(cpu) {
 		int err;
 
@@ -3921,6 +4223,8 @@ static int __init pg_init(void)
 		printk(KERN_ERR "pktgen: ERROR: Initialization failed for "
 		       "all threads\n");
 		unregister_netdevice_notifier(&pktgen_notifier_block);
+		pktgen_clean_rx();
+		remove_proc_entry(PGRX, pg_proc_dir);
 		remove_proc_entry(PGCTRL, pg_proc_dir);
 		proc_net_remove(&init_net, PG_PROC_DIR);
 		return -ENODEV;
@@ -3947,6 +4251,9 @@ static void __exit pg_cleanup(void)
 	/* Un-register us from receiving netdevice events */
 	unregister_netdevice_notifier(&pktgen_notifier_block);
 
+	pktgen_clean_rx();
+	remove_proc_entry(PGRX, pg_proc_dir);
+
 	/* Clean up proc file system */
 	remove_proc_entry(PGCTRL, pg_proc_dir);
 	proc_net_remove(&init_net, PG_PROC_DIR);



* [PATCH] TCP: tcp_hybla: Fix integer overflow in slow start increment
From: Daniele Lacamera @ 2010-06-02 12:02 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuznet, root, linux-kernel

From: root <root@kitchen.(none)>

For large values of rtt, the 2^rho operation may overflow u32. Clamp the
2^rho factor to 2^16.

Signed-off-by: Daniele Lacamera <root@danielinux.net>
---
 net/ipv4/tcp_hybla.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)
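
A minimal userspace sketch (not part of the patch) of the clamped slow-start
increment, assuming that the fraction argument is a fixed-point value scaled
by 128 (<<7), as the surrounding comment in tcp_hybla.c indicates. The name
hybla_ss_increment() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Slow-start increment in <<7 fixed point. The shift count is clamped to 16
 * so that (1 << shift) * fraction stays well inside 32 bits; shifting a
 * 32-bit value by 32 or more would be undefined behaviour in C.
 */
static uint32_t hybla_ss_increment(uint32_t rho, uint32_t fraction)
{
	uint32_t shift = rho < 16U ? rho : 16U;

	assert(fraction >= 128);	/* 2^fract is at least 1.0 in <<7 */
	return ((1U << shift) * fraction) - 128;
}
```

For rho = 3 and a fraction of 128 this gives 896, and any rho of 16 or more
yields the same result as rho = 16.
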

diff --git a/net/ipv4/tcp_hybla.c b/net/ipv4/tcp_hybla.c
index c209e05..8db01d4 100644
--- a/net/ipv4/tcp_hybla.c
+++ b/net/ipv4/tcp_hybla.c
@@ -126,8 +126,8 @@ static void hybla_cong_avoid(struct sock *sk, u32 ack, u32 in_flight)
 		 * calculate 2^fract in a <<7 value.
 		 */
 		is_slowstart = 1;
-		increment = ((1 << ca->rho) * hybla_fraction(rho_fractions))
-			- 128;
+		increment = ((1 << min(ca->rho, 16U)) *
+			hybla_fraction(rho_fractions)) - 128;
 	} else {
 		/*
 		 * congestion avoidance
-- 
1.5.6.5


* [PATCH] netfilter: Xtables: idletimer target implementation
From: Luciano Coelho @ 2010-06-02 11:58 UTC (permalink / raw)
  To: netdev; +Cc: netfilter-devel, kaber, jengelh, Timo Teras

This patch implements an idletimer Xtables target that can be used to
identify when interfaces have been idle for a certain period of time.

Timers are identified by labels and are created when a rule is set with a new
label.  The rules also take a timeout value (in seconds) as an option.  If
more than one rule uses the same timer label, the timer will be restarted
whenever any of the rules get a hit.

One entry for each timer is created in sysfs.  This attribute contains the
time remaining until the timer expires.  The attributes are located under
the module's object:

/sys/module/xt_IDLETIMER/idletimer/<label>

When the timer expires, the target module sends a sysfs notification to
userspace, which can then decide what to do (e.g. disconnect to save power).

Cc: Timo Teras <timo.teras@iki.fi>
Signed-off-by: Luciano Coelho <luciano.coelho@nokia.com>
---
 include/linux/netfilter/xt_IDLETIMER.h |   40 ++++
 net/netfilter/Kconfig                  |   11 +
 net/netfilter/Makefile                 |    1 +
 net/netfilter/xt_IDLETIMER.c           |  347 ++++++++++++++++++++++++++++++++
 4 files changed, 399 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_IDLETIMER.h
 create mode 100644 net/netfilter/xt_IDLETIMER.c
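
A minimal userspace sketch (not part of the patch) of how the "time remaining"
value exposed through sysfs could be derived from a free-running tick counter
in a way that survives counter wrap-around. The names tick_after() and
remaining_seconds() and the 32-bit counter width are assumptions made for
illustration; the module itself works on jiffies.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap-safe "a is later than b" for a free-running 32-bit tick counter. */
static int tick_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Whole seconds left until 'expires', or 0 if it has already passed. */
static uint32_t remaining_seconds(uint32_t expires, uint32_t now,
				  uint32_t ticks_per_sec)
{
	assert(ticks_per_sec != 0);
	if (!tick_after(expires, now))
		return 0;
	/* Unsigned subtraction gives the right distance across a wrap. */
	return (expires - now) / ticks_per_sec;
}
```

A plain "expires > now" comparison would report 0 whenever the expiry time
lies past the point where the counter wraps; the signed-difference form
avoids that.
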

diff --git a/include/linux/netfilter/xt_IDLETIMER.h b/include/linux/netfilter/xt_IDLETIMER.h
new file mode 100644
index 0000000..5958b31
--- /dev/null
+++ b/include/linux/netfilter/xt_IDLETIMER.h
@@ -0,0 +1,40 @@
+/*
+ * linux/include/linux/netfilter/xt_IDLETIMER.h
+ *
+ * Header file for Xtables timer target module.
+ *
+ * Copyright (C) 2004, 2010 Nokia Corporation
+ * Written by Timo Teras <ext-timo.teras@nokia.com>
+ *
+ * Converted to x_tables and forward-ported to 2.6.34
+ * by Luciano Coelho <luciano.coelho@nokia.com>
+ *
+ * Contact: Luciano Coelho <luciano.coelho@nokia.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
+ * 02110-1301 USA
+ */
+
+#ifndef _XT_IDLETIMER_H
+#define _XT_IDLETIMER_H
+
+#define MAX_LABEL_SIZE 32
+
+struct idletimer_tg_info {
+	unsigned int timeout;
+
+	char label[MAX_LABEL_SIZE];
+};
+
+#endif
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 8593a77..e891c61 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -424,6 +424,17 @@ config NETFILTER_XT_TARGET_HL
 	since you can easily create immortal packets that loop
 	forever on the network.
 
+config NETFILTER_XT_TARGET_IDLETIMER
+	tristate  "IDLETIMER target support"
+	help
+
+	  This option adds the `IDLETIMER' target.  Each matching packet
+	  resets the timer associated with label specified when the rule is
+	  added.  When the timer expires, it triggers a sysfs notification.
+	  The remaining time for expiration can be read via sysfs.
+
+	  To compile it as a module, choose M here.  If unsure, say N.
+
 config NETFILTER_XT_TARGET_LED
 	tristate '"LED" target support'
 	depends on LEDS_CLASS && LEDS_TRIGGERS
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 14e3a8f..e28420a 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -61,6 +61,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_TCPMSS) += xt_TCPMSS.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP) += xt_TCPOPTSTRIP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TEE) += xt_TEE.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TRACE) += xt_TRACE.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o
 
 # matches
 obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o
diff --git a/net/netfilter/xt_IDLETIMER.c b/net/netfilter/xt_IDLETIMER.c
new file mode 100644
index 0000000..4a72b2f
--- /dev/null
+++ b/net/netfilter/xt_IDLETIMER.c
@@ -0,0 +1,347 @@
+/*
+ * linux/net/netfilter/xt_IDLETIMER.c
+ *
+ * Netfilter module to trigger a timer when packet matches.
+ * After timer expires a kevent will be sent.
+ *
+ * Copyright (C) 2004, 2010 Nokia Corporation
+ * Written by Timo Teras <ext-timo.teras@nokia.com>
+ *
+ * Converted to x_tables and reworked for upstream inclusion
+ * by Luciano Coelho <luciano.coelho@nokia.com>
+ *
+ * Contact: Luciano Coelho <luciano.coelho@nokia.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
+ * 02110-1301 USA
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/module.h>
+#include <linux/timer.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_IDLETIMER.h>
+#include <linux/kobject.h>
+#include <linux/workqueue.h>
+#include <linux/sysfs.h>
+
+struct idletimer_tg {
+	struct list_head entry;
+	struct timer_list timer;
+	struct work_struct work;
+
+	struct kobject *kobj;
+	struct idletimer_tg_attr *attr;
+
+	unsigned int refcnt;
+};
+
+/* Start with an empty group, we'll add attributes dynamically */
+static struct attribute *idletimer_tg_group_attrs[] = {
+	NULL,
+};
+
+static struct attribute_group idletimer_tg_group = {
+	.attrs = idletimer_tg_group_attrs,
+};
+
+struct idletimer_tg_attr {
+	struct attribute attr;
+	ssize_t	(*show)(struct kobject *kobj,
+			struct attribute *attr, char *buf);
+};
+
+static LIST_HEAD(idletimer_tg_list);
+static DEFINE_SPINLOCK(list_lock);
+
+static struct kobject *idletimer_tg_kobj;
+
+static
+struct idletimer_tg *__idletimer_tg_find_by_label(const char *label)
+{
+	struct idletimer_tg *entry;
+
+	BUG_ON(!label);
+
+	list_for_each_entry(entry, &idletimer_tg_list, entry) {
+		if (!strcmp(label, entry->attr->attr.name))
+			return entry;
+	}
+
+	return NULL;
+}
+
+static ssize_t idletimer_tg_show(struct kobject *kobj, struct attribute *attr,
+				 char *buf)
+{
+	struct idletimer_tg *timer;
+	unsigned long expires = 0;
+
+	spin_lock_bh(&list_lock);
+	timer =	__idletimer_tg_find_by_label(attr->name);
+	if (timer)
+		expires = timer->timer.expires;
+	spin_unlock_bh(&list_lock);
+
+	if (time_after(expires, jiffies))
+		return sprintf(buf, "%u\n",
+			       jiffies_to_msecs(expires - jiffies) / 1000);
+
+	return sprintf(buf, "0\n");
+}
+
+static void idletimer_tg_delete(const struct idletimer_tg_info *info)
+{
+	struct idletimer_tg *timer;
+
+	spin_lock_bh(&list_lock);
+	timer = __idletimer_tg_find_by_label(info->label);
+	if (!timer) {
+		spin_unlock_bh(&list_lock);
+		return;
+	}
+
+	if (--timer->refcnt == 0) {
+		pr_debug("deleting timer %s\n", info->label);
+
+		list_del(&timer->entry);
+		del_timer_sync(&timer->timer);
+		spin_unlock_bh(&list_lock);
+
+		sysfs_remove_file_from_group(idletimer_tg_kobj,
+					     &timer->attr->attr, NULL);
+		kfree(timer->attr->attr.name);
+		kfree(timer->attr);
+		kfree(timer);
+	}
+	else {
+		spin_unlock_bh(&list_lock);
+		pr_debug("decreased refcnt of timer %s to %u\n",
+			 info->label, timer->refcnt);
+	}
+}
+
+static void idletimer_tg_work(struct work_struct *work)
+{
+	struct idletimer_tg *timer = container_of(work, struct idletimer_tg,
+						  work);
+
+	sysfs_notify(idletimer_tg_kobj, NULL,
+		     timer->attr->attr.name);
+}
+
+static void idletimer_tg_expired(unsigned long data)
+{
+	struct idletimer_tg *timer = (struct idletimer_tg *) data;
+
+	pr_debug("timer %s expired\n",
+		 timer->attr->attr.name);
+
+	schedule_work(&timer->work);
+}
+
+static
+struct idletimer_tg *idletimer_tg_create(const struct idletimer_tg_info *info)
+{
+	struct idletimer_tg *timer;
+	struct idletimer_tg_attr *attr;
+
+	attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+	if (!attr) {
+		pr_debug("couldn't alloc attribute\n");
+		return NULL;
+	}
+
+	attr->attr.name = kstrdup(info->label, GFP_KERNEL);
+	attr->attr.mode = 0444;
+	attr->show = idletimer_tg_show;
+
+	if (sysfs_add_file_to_group(idletimer_tg_kobj, &attr->attr, NULL)) {
+		pr_debug("couldn't add attr to sysfs\n");
+		goto out_free;
+	}
+
+	timer = kmalloc(sizeof(struct idletimer_tg), GFP_KERNEL);
+	if (!timer) {
+		pr_debug("couldn't alloc timer\n");
+		goto out_free;
+	}
+
+	spin_lock_bh(&list_lock);
+	list_add(&timer->entry, &idletimer_tg_list);
+
+	init_timer(&timer->timer);
+	setup_timer(&timer->timer, idletimer_tg_expired, (unsigned long) timer);
+	mod_timer(&timer->timer,
+		  msecs_to_jiffies(info->timeout * 1000) + jiffies);
+
+	timer->attr = attr;
+	timer->refcnt = 0;
+
+	INIT_WORK(&timer->work, idletimer_tg_work);
+	spin_unlock_bh(&list_lock);
+
+	return timer;
+
+out_free:
+	kfree(attr->attr.name);
+	kfree(attr);
+	return NULL;
+}
+
+static void idletimer_tg_cleanup(void)
+{
+	struct idletimer_tg *timer;
+
+	sysfs_remove_group(idletimer_tg_kobj,
+			   &idletimer_tg_group);
+
+	spin_lock(&list_lock);
+	list_for_each_entry(timer, &idletimer_tg_list, entry) {
+		pr_debug("deleting timer %s\n", timer->attr->attr.name);
+
+		list_del(&timer->entry);
+		del_timer_sync(&timer->timer);
+		kfree(timer->attr->attr.name);
+		kfree(timer->attr);
+		kfree(timer);
+	}
+	spin_unlock(&list_lock);
+}
+
+/*
+ * The actual xt_tables plugin.
+ */
+static unsigned int idletimer_tg_target(struct sk_buff *skb,
+					 const struct xt_action_param *par)
+{
+	const struct idletimer_tg_info *info = par->targinfo;
+	struct idletimer_tg *timer;
+
+	pr_debug("resetting timer %s, timeout period %u\n",
+		 info->label, info->timeout);
+
+	spin_lock(&list_lock);
+	timer = __idletimer_tg_find_by_label(info->label);
+
+	BUG_ON(!timer);
+
+	mod_timer(&timer->timer,
+		  msecs_to_jiffies(info->timeout * 1000) + jiffies);
+	spin_unlock(&list_lock);
+
+	return XT_CONTINUE;
+}
+
+static int idletimer_tg_checkentry(const struct xt_tgchk_param *par)
+{
+	const struct idletimer_tg_info *info = par->targinfo;
+	struct idletimer_tg *timer;
+
+	pr_debug("checkentry targinfo %s\n", info->label);
+
+	if (info->timeout == 0) {
+		pr_debug("timeout value is zero\n");
+		return -EINVAL;
+	}
+
+	if (!info->label || strlen(info->label) == 0) {
+		pr_debug("label is missing\n");
+		return -EINVAL;
+	}
+
+	spin_lock(&list_lock);
+	timer = __idletimer_tg_find_by_label(info->label);
+	if (!timer) {
+		spin_unlock(&list_lock);
+		timer = idletimer_tg_create(info);
+		if (!timer) {
+			pr_debug("failed to create timer\n");
+			return -ENOMEM;
+		}
+		spin_lock(&list_lock);
+	}
+
+	timer->refcnt++;
+	mod_timer(&timer->timer,
+		  msecs_to_jiffies(info->timeout * 1000) + jiffies);
+	spin_unlock(&list_lock);
+
+	return 0;
+}
+
+static void idletimer_tg_destroy(const struct xt_tgdtor_param *par)
+{
+	const struct idletimer_tg_info *info = par->targinfo;
+
+	pr_debug("destroy targinfo %s\n", info->label);
+
+	idletimer_tg_delete(info);
+}
+
+static struct xt_target idletimer_tg __read_mostly = {
+	.name		= "IDLETIMER",
+	.family		= NFPROTO_IPV4,
+	.target		= idletimer_tg_target,
+	.targetsize     = sizeof(struct idletimer_tg_info),
+	.checkentry	= idletimer_tg_checkentry,
+	.destroy        = idletimer_tg_destroy,
+	.me		= THIS_MODULE,
+};
+
+static int __init idletimer_tg_init(void)
+{
+	int ret;
+
+	idletimer_tg_kobj = kobject_create_and_add("idletimer",
+						   &THIS_MODULE->mkobj.kobj);
+	if (!idletimer_tg_kobj)
+		return -ENOMEM;
+
+	/* FIXME: do we want to keep it in the module or in the net class? */
+	ret = sysfs_create_group(idletimer_tg_kobj,
+				 &idletimer_tg_group);
+	if (ret < 0) {
+		pr_debug("failed to create sysfs group\n");
+		return ret;
+	}
+
+	ret = xt_register_target(&idletimer_tg);
+	if (ret < 0) {
+		idletimer_tg_cleanup();
+		kobject_put(idletimer_tg_kobj);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void __exit idletimer_tg_exit(void)
+{
+	xt_unregister_target(&idletimer_tg);
+	idletimer_tg_cleanup();
+	kobject_put(idletimer_tg_kobj);
+}
+
+module_init(idletimer_tg_init);
+module_exit(idletimer_tg_exit);
+
+MODULE_AUTHOR("Timo Teras <ext-timo.teras@nokia.com>");
+MODULE_AUTHOR("Luciano Coelho <luciano.coelho@nokia.com>");
+MODULE_DESCRIPTION("Xtables: idle time monitor");
+MODULE_LICENSE("GPL v2");
-- 
1.6.3.3


^ permalink raw reply related

* Re: [PATCH] net: add additional lock to qdisc to increase throughput
From: David Miller @ 2010-06-02 12:10 UTC (permalink / raw)
  To: alexander.h.duyck; +Cc: eric.dumazet, netdev
In-Reply-To: <80769D7B14936844A23C0C43D9FBCF0F256284B068@orsmsx501.amr.corp.intel.com>

From: "Duyck, Alexander H" <alexander.h.duyck@intel.com>
Date: Fri, 21 May 2010 13:04:20 -0700

> Eric Dumazet wrote:
>> Tests with following script gave a boost from ~50.000 pps to ~600.000
>> pps on a dual quad core machine (E5450 @3.00GHz), tg3 driver.
>> (A single netperf flow can reach ~800.000 pps on this platform)
>> 
>> for j in `seq 0 3`; do
>>   for i in `seq 0 7`; do
>>     netperf -H 192.168.0.1 -t UDP_STREAM -l 60 -N -T $i -- -m 6 &
>>   done
>> done
> 
> Running the same script with your patch my results went from 200Kpps to 1.2Mpps on a dual Xeon 5570.
> 
> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>

Applied, thanks guys.

^ permalink raw reply

* Re: [patch 1/2] caif: remove unneeded variable from caif_net_open()
From: David Miller @ 2010-06-02 12:16 UTC (permalink / raw)
  To: error27; +Cc: sjur.brandeland, netdev, kernel-janitors
In-Reply-To: <20100601070855.GD5483@bicker>

From: Dan Carpenter <error27@gmail.com>
Date: Tue, 1 Jun 2010 09:08:55 +0200

> We don't use the "ser" variable so I've removed it.
> 
> Signed-off-by: Dan Carpenter <error27@gmail.com>

Applied.

^ permalink raw reply

* Re: [patch 2/2] caif: add newlines after declarations in caif_serial.c
From: David Miller @ 2010-06-02 12:16 UTC (permalink / raw)
  To: error27; +Cc: sjur.brandeland, netdev, kernel-janitors
In-Reply-To: <20100601070933.GE5483@bicker>

From: Dan Carpenter <error27@gmail.com>
Date: Tue, 1 Jun 2010 09:09:33 +0200

> I added newlines after the declarations in caif_serial.c.  This is
> normal kernel style, although I can't see anywhere it's documented.
> 
> Signed-off-by: Dan Carpenter <error27@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH 1/2] mac8390: propagate error code from request_irq
From: David Miller @ 2010-06-02 12:17 UTC (permalink / raw)
  To: fthain; +Cc: geert, joe, p_gortmaker, netdev, linux-kernel, linux-m68k
In-Reply-To: <alpine.OSX.2.00.1006011738330.299@localhost>

From: Finn Thain <fthain@telegraphics.com.au>
Date: Tue, 1 Jun 2010 22:18:32 +1000 (EST)

> 
> Use the request_irq() error code as the return value for mac8390_open(). 
> EAGAIN doesn't make sense for Nubus slot IRQs. Only this driver can claim 
> this IRQ (until the NIC is removed, which means everything is powered 
> down).
> 
> Signed-off-by: Finn Thain <fthain@telegraphics.com.au>

Applied.
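
The change described above amounts to returning whatever request_irq()
reports instead of a fixed -EAGAIN. A minimal userspace sketch of that
pattern follows; fake_request_irq() and demo_open() are hypothetical
stand-ins, not the kernel API or the driver's code:

```c
#include <errno.h>

/* Hypothetical stand-in for request_irq(): fails with -EBUSY when the
 * line is already claimed, otherwise succeeds. */
static int fake_request_irq(int irq_claimed)
{
	return irq_claimed ? -EBUSY : 0;
}

/* Open routine that hands the real error back to the caller instead of
 * flattening every failure into -EAGAIN. */
static int demo_open(int irq_claimed)
{
	int err = fake_request_irq(irq_claimed);

	if (err)
		return err;	/* propagate the original code unchanged */

	return 0;
}
```

The caller then sees the actual cause of the failure rather than a
hint to retry.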

^ permalink raw reply

* Re: [PATCH] ppp: eliminate shadowed variable name
From: David Miller @ 2010-06-02 12:17 UTC (permalink / raw)
  To: shemminger; +Cc: paulus, netdev
In-Reply-To: <20100601090546.79c1fca0@nehalam>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Tue, 1 Jun 2010 09:05:46 -0700

> Sparse complains about shadowed declaration of skb. So use other
> name.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6] net: CONFIG_NET_NS reduction
From: David Miller @ 2010-06-02 12:17 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, kaber
In-Reply-To: <1275411079.2738.231.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 01 Jun 2010 18:51:19 +0200

> Use read_pnet() and write_pnet() to reduce number of ifdef CONFIG_NET_NS
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks Eric.
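
The idea of hiding the config option behind a pair of accessors can be
sketched in plain C. Everything prefixed demo_ or DEMO_ below is
hypothetical and only mirrors the shape of read_pnet()/write_pnet();
it is not the kernel's definition:

```c
struct demo_net { int id; };

/* The single namespace that exists when the option is compiled out. */
static struct demo_net demo_init_net = { 0 };

struct demo_dev {
#ifdef DEMO_NET_NS
	struct demo_net *nd_net;	/* only present with namespaces */
#endif
	int ifindex;
};

#ifdef DEMO_NET_NS
static inline void demo_write_pnet(struct demo_net **pnet,
				   struct demo_net *net)
{
	*pnet = net;
}

static inline struct demo_net *demo_read_pnet(struct demo_net *const *pnet)
{
	return *pnet;
}
#else
/* The pnet argument is never expanded, so callers may name a field
 * that does not exist in this configuration. */
#define demo_write_pnet(pnet, net)	do { (void)(net); } while (0)
#define demo_read_pnet(pnet)		(&demo_init_net)
#endif

/* Callers need no #ifdef of their own. */
static void demo_dev_set_net(struct demo_dev *dev, struct demo_net *net)
{
	demo_write_pnet(&dev->nd_net, net);
}

static struct demo_net *demo_dev_net(const struct demo_dev *dev)
{
	return demo_read_pnet(&dev->nd_net);
}
```

With DEMO_NET_NS undefined, demo_dev_net() always yields the initial
namespace; with it defined, the stored pointer is returned.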

^ permalink raw reply

* Re: [PATCH] cls_u32: use skb_copy_bits() to dereference data safely
From: jamal @ 2010-06-02 12:20 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netdev
In-Reply-To: <AANLkTilkweiJPmQOUv78lKL9ohZo1StzlJcBw0johYi0@mail.gmail.com>

Hi Changli,

On Wed, 2010-06-02 at 01:47 +0800, Changli Gao wrote:

> 
> I added the following debug code into cls_u32.c
> 
>                 for (i = n->sel.nkeys; i>0; i--, key++) {
> +                       int off;
> +
> +                       off = key->off+(off2&key->offmask) + (ptr - skb->data);
> +                       if (off + 4 > skb->len)
> +                               printk("skb->len: %d, off: %d\n",
> +                                      skb->len, off);
> 

Ok, makes more sense. And thanks for taking time to construct a
meaningful example.
It is not a common use - but i agree it is a bug.
I am surprised we never caught this in all these years, and I wonder why
this never crashed in your example.
Can we make the fix very simple please? i.e no copy bits, this is the
fast path.

> It isn't an optimization, but an error exit. :)

What I meant was: if you can tell immediately what the maximum offset is,
then you don't need to go through the for loop comparing against each
key. You could bail out immediately, which is an optimization ;->
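
A rough userspace sketch of that idea: compute the furthest byte any
key touches once, when the keys are installed, then do a single length
check per packet. The demo_ names are hypothetical, and the sketch
ignores the variable part of the offset (off2 & offmask), which is an
assumption that does not hold for every u32 filter:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct demo_key {
	uint32_t mask;
	uint32_t val;
	size_t off;	/* byte offset of the 32-bit word to compare */
};

/* Computed once, when the keys are installed. */
static size_t demo_max_end(const struct demo_key *keys, size_t nkeys)
{
	size_t max_end = 0;

	for (size_t i = 0; i < nkeys; i++) {
		size_t end = keys[i].off + sizeof(uint32_t);

		if (end > max_end)
			max_end = end;
	}
	return max_end;
}

/* Returns 1 on match, 0 on mismatch, -1 if the packet is too short. */
static int demo_match(const uint8_t *data, size_t len,
		      const struct demo_key *keys, size_t nkeys,
		      size_t max_end)
{
	if (max_end > len)
		return -1;	/* one check covers every key */

	for (size_t i = 0; i < nkeys; i++) {
		uint32_t word;

		memcpy(&word, data + keys[i].off, sizeof(word));
		if ((word ^ keys[i].val) & keys[i].mask)
			return 0;
	}
	return 1;
}
```

The per-key loop then runs without any further bounds checks, which
keeps the fast path short.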

cheers,
jamal

^ permalink raw reply

* Re: [PATCH] cls_u32: use skb_copy_bits() to dereference data safely
From: jamal @ 2010-06-02 12:25 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netdev
In-Reply-To: <1275481219.14363.6.camel@bigi>

On Wed, 2010-06-02 at 08:21 -0400, jamal wrote:

> Can we make the fix very simple please? i.e no copy bits, this is the
> fast path.

For example, something along the lines of:

---
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 9627542..dde7a23 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -135,6 +135,9 @@ next_knode:
 
 for (i = n->sel.nkeys; i>0; i--, key++) {
 
+        int toff = key->off+(off2&key->offmask)- 4;
+        if (unlikely(toff > skb->len))
+              /* bailout here - needs some thought */
         if ((*(__be32*)(ptr+key->off+(off2&key->offmask))^key->v
             n = n->next;
             goto next_knode;
----


cheers,
jamal


^ permalink raw reply related

* Re: [PATCH] cls_u32: use skb_copy_bits() to dereference data safely
From: David Miller @ 2010-06-02 12:45 UTC (permalink / raw)
  To: hadi; +Cc: xiaosuo, netdev
In-Reply-To: <1275481538.14363.10.camel@bigi>

From: jamal <hadi@cyberus.ca>
Date: Wed, 02 Jun 2010 08:25:38 -0400

> --- a/net/sched/cls_u32.c
> +++ b/net/sched/cls_u32.c
> @@ -135,6 +135,9 @@ next_knode:
>  
>  for (i = n->sel.nkeys; i>0; i--, key++) {
>  
> +        int toff = key->off+(off2&key->offmask)- 4;
> +        if (unlikely(toff > skb->len))
> +              /* bailout here - needs some thought */
>          if ((*(__be32*)(ptr+key->off+(off2&key->offmask))^key->v

I don't think it's that simple.

You can't dereference from the skb->data linear area if your offset is
beyond "skb->len - skb->data_len" (aka. skb_headlen()) since that's
where the paged or fragmented portion starts.

We really need to use skb_copy_bits() if we want to allow
any offset into the SKB, and because of all the ways
packets can be transformed and constructed we absolutely
have to.
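
To illustrate the point about the linear area versus the paged part,
here is a simplified userspace model. struct demo_skb and
demo_copy_bits() are hypothetical; they only imitate the behaviour of
gathering bytes across segments and are not the kernel implementation
of skb_copy_bits():

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct demo_frag {
	const uint8_t *data;
	size_t len;
};

struct demo_skb {
	const uint8_t *head;		/* linear area */
	size_t headlen;			/* bytes valid in the linear area */
	const struct demo_frag *frags;	/* paged part */
	size_t nr_frags;
};

static size_t demo_skb_len(const struct demo_skb *skb)
{
	size_t len = skb->headlen;

	for (size_t i = 0; i < skb->nr_frags; i++)
		len += skb->frags[i].len;
	return len;
}

/* Copy len bytes starting at offset, crossing segment boundaries. */
static int demo_copy_bits(const struct demo_skb *skb, size_t offset,
			  void *to, size_t len)
{
	size_t total = demo_skb_len(skb);
	size_t start = skb->headlen;	/* offset where the current frag begins */
	uint8_t *out = to;

	if (offset > total || len > total - offset)
		return -EFAULT;

	if (offset < skb->headlen) {
		size_t n = skb->headlen - offset;

		if (n > len)
			n = len;
		memcpy(out, skb->head + offset, n);
		out += n;
		offset += n;
		len -= n;
	}

	for (size_t i = 0; i < skb->nr_frags && len > 0; i++) {
		const struct demo_frag *f = &skb->frags[i];

		if (offset < start + f->len) {
			size_t n = start + f->len - offset;

			if (n > len)
				n = len;
			memcpy(out, f->data + (offset - start), n);
			out += n;
			offset += n;
			len -= n;
		}
		start += f->len;
	}
	return 0;
}
```

A 32-bit read that starts in the linear area and ends in the first
fragment succeeds here, whereas a plain pointer dereference would walk
off the end of the linear buffer.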

^ permalink raw reply

* Re: [PATCH] cls_u32: use skb_copy_bits() to dereference data safely
From: David Miller @ 2010-06-02 12:47 UTC (permalink / raw)
  To: hadi; +Cc: xiaosuo, netdev
In-Reply-To: <1275481219.14363.6.camel@bigi>

From: jamal <hadi@cyberus.ca>
Date: Wed, 02 Jun 2010 08:20:19 -0400

> I am surprised we never caught this in all these years, and I wonder why
> this never crashed in your example.

Well for one thing there is all sorts of "stuff" past the end of the
valid skb->data area.  For example, there is some padding and then
there is skb_shared_info().

Furthermore, the kernel allocator can round up the size it uses for
SLAB objects which gives even more padding past the end of even
skb_shared_info().

Furthermore, the chance that the page after the one skb->data is in
is invalid is very low.  You'd have to have invalid memory in the
page after skb->data.

All of this conspires to just letting blind reads work in a large
number of illegal cases.
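
The rounding effect can be pictured with a toy model. The sketch below
assumes power-of-two size classes starting at 32 bytes, which is a
simplification; the real allocator's classes depend on its
configuration. The demo_ names are hypothetical:

```c
#include <stddef.h>

/* Round a request up to the next power-of-two size class.
 * Assumes size is far below SIZE_MAX / 2. */
static size_t demo_size_class(size_t size)
{
	size_t cls = 32;

	while (cls < size)
		cls <<= 1;
	return cls;
}

/* Bytes past the requested size that still belong to the object. */
static size_t demo_slack(size_t size)
{
	return demo_size_class(size) - size;
}
```

Under these assumptions a 1500-byte request lands in a 2048-byte
object, so a read a few bytes past the end still hits memory owned by
the same allocation.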

^ permalink raw reply

* Re: [PATCH]: vxge: Fix checkstack warning in vxge_probe()
From: David Miller @ 2010-06-02 12:51 UTC (permalink / raw)
  To: prarit; +Cc: netdev, mschmidt
In-Reply-To: <20100528175735.30903.21134.sendpatchset@prarit.bos.redhat.com>

From: Prarit Bhargava <prarit@redhat.com>
Date: Fri, 28 May 2010 14:01:49 -0400

> Linux 2.6.33 reports this checkstack warning:
> 
> drivers/net/vxge/vxge-main.c: In function 'vxge_probe':
> drivers/net/vxge/vxge-main.c:4409: warning: the frame size of 1028 bytes is larger than 1024 bytes
> 
> This warning does not occur in the latest linux-2.6 or linux-next, however,
> when I do a 'make -j32 CONFIG_FRAME_WARN=512' instead of 1024 I see
> 
> drivers/net/vxge/vxge-main.c: In function ‘vxge_probe’:
> drivers/net/vxge/vxge-main.c:4423: warning: the frame size of 1024 bytes is larger than 512 bytes
> 
> This patch moves the large vxge_config struct off the stack.
> 
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>

Applied to net-next-2.6, thanks.
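
The general shape of such a fix, sketched in userspace C: keep only a
pointer on the stack and allocate the large structure dynamically.
struct demo_config and demo_probe() are hypothetical, and calloc()/
free() stand in for the kernel's kzalloc()/kfree():

```c
#include <errno.h>
#include <stdlib.h>

/* Stand-in for a configuration struct of roughly 1 KiB. */
struct demo_config {
	unsigned char blob[1024];
	int vpath_count;
};

static int demo_probe(void)
{
	struct demo_config *cfg;	/* pointer on the stack, struct on the heap */
	int ret;

	cfg = calloc(1, sizeof(*cfg));
	if (!cfg)
		return -ENOMEM;

	cfg->vpath_count = 4;
	ret = cfg->vpath_count > 0 ? 0 : -EINVAL;

	free(cfg);	/* released on every path past the allocation */
	return ret;
}
```

The cost is one extra failure path (-ENOMEM) in exchange for a much
smaller stack frame.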

^ permalink raw reply

* Re: [PATCH] packet_mmap: expose hw packet timestamps to network packet capture utilities
From: David Miller @ 2010-06-02 12:54 UTC (permalink / raw)
  To: scott.a.mcmillan; +Cc: netdev, tcpdump-workers
In-Reply-To: <09ED21B37E0F694688A2317C4FED9ED3046E53F595@azsmsx504.amr.corp.intel.com>

From: "Mcmillan, Scott A" <scott.a.mcmillan@intel.com>
Date: Thu, 27 May 2010 09:58:19 -0700

> Subject: [RFC PATCH] packet_mmap: expose hw packet timestamps to network packet capture utilities
> 
> This patch adds a setting, PACKET_TIMESTAMP, to specify the packet
> timestamp source that is exported to capture utilities like tcpdump
> by packet_mmap.
> 
> PACKET_TIMESTAMP accepts the same integer bit field as
> SO_TIMESTAMPING.  However, only the SOF_TIMESTAMPING_SYS_HARDWARE
> and SOF_TIMESTAMPING_RAW_HARDWARE values are currently recognized by
> PACKET_TIMESTAMP.  SOF_TIMESTAMPING_SYS_HARDWARE takes precedence
> over SOF_TIMESTAMPING_RAW_HARDWARE if both bits are set.
> 
> If PACKET_TIMESTAMP is not set, a software timestamp generated
> inside the networking stack is used (the behavior before this
> setting was added).
> 
> Signed-off-by: Scott McMillan <scott.a.mcmillan@intel.com>

Ok, this looks fine, applied.

Thanks.
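
The precedence rule in the description can be written down as a small
selection function. This is only a sketch of the stated rule; the
DEMO_ flag values are illustrative placeholders, not the real
SOF_TIMESTAMPING_* constants:

```c
/* Placeholder bits standing in for SOF_TIMESTAMPING_SYS_HARDWARE and
 * SOF_TIMESTAMPING_RAW_HARDWARE; the numeric values are made up. */
enum demo_ts_flag {
	DEMO_TS_SYS_HARDWARE = 1 << 0,
	DEMO_TS_RAW_HARDWARE = 1 << 1,
};

enum demo_ts_source {
	DEMO_SRC_SOFTWARE,
	DEMO_SRC_SYS_HW,
	DEMO_SRC_RAW_HW,
};

static enum demo_ts_source demo_pick_source(unsigned int flags)
{
	/* SYS_HARDWARE wins when both bits are set. */
	if (flags & DEMO_TS_SYS_HARDWARE)
		return DEMO_SRC_SYS_HW;
	if (flags & DEMO_TS_RAW_HARDWARE)
		return DEMO_SRC_RAW_HW;

	/* Nothing requested: software timestamp, as before. */
	return DEMO_SRC_SOFTWARE;
}
```

Any other bits in the field are ignored by this sketch, matching the
statement that only the two hardware values are recognized.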

^ permalink raw reply

* Re: [PATCH] netfilter: Xtables: idletimer target implementation
From: Jan Engelhardt @ 2010-06-02 12:54 UTC (permalink / raw)
  To: Luciano Coelho; +Cc: netdev, netfilter-devel, kaber, Timo Teras
In-Reply-To: <1275479897-16779-1-git-send-email-luciano.coelho@nokia.com>


On Wednesday 2010-06-02 13:58, Luciano Coelho wrote:
>+
>+#ifndef _XT_IDLETIMER_H
>+#define _XT_IDLETIMER_H
>+
>+#define MAX_LABEL_SIZE 32
>+
>+struct idletimer_tg_info {
>+	unsigned int timeout;
>+
>+	char label[MAX_LABEL_SIZE];
>+};

As per "Writing Netfilter Modules" e-book, using "int" is a no-no.
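
The usual reasoning behind that rule is that a structure shared
between kernel and userspace should have the same layout on every
ABI, which fixed-width types make explicit. A userspace sketch, with
uint32_t where a kernel header would use __u32 and hypothetical demo_
names:

```c
#include <stdint.h>

#define DEMO_MAX_LABEL_SIZE 32

struct demo_idletimer_info {
	uint32_t timeout;	/* explicit width instead of "unsigned int" */

	char label[DEMO_MAX_LABEL_SIZE];
};

/* Fails the build if the layout ever differs from 4 + 32 bytes. */
_Static_assert(sizeof(struct demo_idletimer_info) == 36,
	       "layout must not depend on the ABI");
```
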

> 
>+config NETFILTER_XT_TARGET_IDLETIMER
>+	tristate  "IDLETIMER target support"

depends on NETFILTER_ADVANCED

>xt_IDLETIMER.c
>+struct idletimer_tg_attr {
>+        struct attribute attr;
>+	ssize_t	(*show)(struct kobject *kobj,
>+			struct attribute *attr, char *buf);
>+};

Some indent seems to have gone wrong.

>+	attr->attr.name = kstrdup(info->label, GFP_KERNEL);

Need to check return value!
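
A sketch of what checking that allocation looks like, in userspace C.
demo_strdup() stands in for kstrdup() and the demo_attr type is
hypothetical; the point is only that a failed duplicate must unwind
the earlier allocation:

```c
#include <stdlib.h>
#include <string.h>

struct demo_attr {
	char *name;
};

/* Local duplicate helper; returns NULL on allocation failure. */
static char *demo_strdup(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p)
		memcpy(p, s, n);
	return p;
}

static struct demo_attr *demo_attr_create(const char *label)
{
	struct demo_attr *attr = calloc(1, sizeof(*attr));

	if (!attr)
		return NULL;

	attr->name = demo_strdup(label);
	if (!attr->name) {
		free(attr);	/* undo the first allocation */
		return NULL;
	}
	return attr;
}

static void demo_attr_destroy(struct demo_attr *attr)
{
	if (!attr)
		return;
	free(attr->name);
	free(attr);
}
```
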

>+	attr->attr.mode = 0444;

attr->attr.mode = S_IRUGO;

>+static struct xt_target idletimer_tg __read_mostly = {
>+	.name		= "IDLETIMER",
>+	.family		= NFPROTO_IPV4,

NFPROTO_UNSPEC

>+	.target		= idletimer_tg_target,
>+	.targetsize     = sizeof(struct idletimer_tg_info),
>+	.checkentry	= idletimer_tg_checkentry,
>+	.destroy        = idletimer_tg_destroy,
>+	.me		= THIS_MODULE,
>+};
>+
>+static int __init idletimer_tg_init(void)
>+{
>+	int ret;
>+
>+	idletimer_tg_kobj = kobject_create_and_add("idletimer",
>+						   &THIS_MODULE->mkobj.kobj);
>+	if (!idletimer_tg_kobj)
>+		return -ENOMEM;
>+
>+	/* FIXME: do we want to keep it in the module or in the net class? */

I have only ever seen interfaces in /sys/class/net, so it might be
wise to keep it that way in light of scripts doing 
echo /sys/class/net/*  to get a list of interfaces.


Looks quite ok.

^ permalink raw reply

* Re: [PATCH] phylib: Add support for the LXT973 phy.
From: Richard Cochran @ 2010-06-02 12:55 UTC (permalink / raw)
  To: Andy Fleming; +Cc: netdev
In-Reply-To: <AANLkTilKJVZiKLRUlL7AeaqSnlAs2u5sTW7o2vfvEKns@mail.gmail.com>

On Tue, Jun 01, 2010 at 05:39:22PM -0500, Andy Fleming wrote:
> That's a bit hacky.  There is a dev_flags field, which could be used
> for this.  Probably, we should add a more general way of saying what
> sort of port this is.  But don't use the presence and absence of
> "priv", as it could one day get used for a different purpose, and this
> seems like it would leave us open to strange bugs.

Okay, I changed it.

At first, I was worried about using 'dev_flags' because I couldn't
tell exactly who may write to this field. Looking at tg3.c and
broadcom.c, it appears that the MAC drivers may also write this
field. In contrast, the 'priv' field is surely private.

> Also, is this erratum true for all lxt973 models, or is it fixed in
> some revisions?

The documentation http://www.cortina-systems.com/products/download/266
says, "Status: This erratum has been previously fixed." However, I
could not find a reference to when this was fixed.

Richard

Date: Wed, 2 Jun 2010 13:47:02 +0200
Subject: [PATCH] phylib: Add support for the LXT973 phy.

This patch implements a work around for Erratum 5, "3.3 V Fiber Speed
Selection." If the hardware wiring does not respect this erratum, then
fiber optic mode will not work properly.

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
---
 drivers/net/phy/lxt.c |   52 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 51 insertions(+), 1 deletions(-)

diff --git a/drivers/net/phy/lxt.c b/drivers/net/phy/lxt.c
index 8ee929b..4f97fdc 100644
--- a/drivers/net/phy/lxt.c
+++ b/drivers/net/phy/lxt.c
@@ -53,6 +53,9 @@
 
 #define MII_LXT971_ISR		19  /* Interrupt Status Register */
 
+/* register definitions for the 973 */
+#define MII_LXT973_PCR 16 /* Port Configuration Register */
+#define PCR_FIBER_SELECT 1
 
 MODULE_DESCRIPTION("Intel LXT PHY driver");
 MODULE_AUTHOR("Andy Fleming");
@@ -119,6 +122,34 @@ static int lxt971_config_intr(struct phy_device *phydev)
 	return err;
 }
 
+static int lxt973_probe(struct phy_device *phydev)
+{
+	int val = phy_read(phydev, MII_LXT973_PCR);
+
+	if (val & PCR_FIBER_SELECT) {
+		/*
+		 * If fiber is selected, then the only correct setting
+		 * is 100Mbps, full duplex, and auto negotiation off.
+		 */
+		val = phy_read(phydev, MII_BMCR);
+		val |= (BMCR_SPEED100 | BMCR_FULLDPLX);
+		val &= ~BMCR_ANENABLE;
+		phy_write(phydev, MII_BMCR, val);
+		/* Remember that the port is in fiber mode. */
+		phydev->dev_flags = PCR_FIBER_SELECT;
+	} else {
+		phydev->dev_flags = 0;
+	}
+	return 0;
+}
+
+static int lxt973_config_aneg(struct phy_device *phydev)
+{
+	/* Do nothing if port is in fiber mode. */
+	return PCR_FIBER_SELECT == phydev->dev_flags ?
+		0 : genphy_config_aneg(phydev);
+}
+
 static struct phy_driver lxt970_driver = {
 	.phy_id		= 0x78100000,
 	.name		= "LXT970",
@@ -146,6 +177,18 @@ static struct phy_driver lxt971_driver = {
 	.driver 	= { .owner = THIS_MODULE,},
 };
 
+static struct phy_driver lxt973_driver = {
+	.phy_id		= 0x00137a10,
+	.name		= "LXT973",
+	.phy_id_mask	= 0xfffffff0,
+	.features	= PHY_BASIC_FEATURES,
+	.flags		= 0,
+	.probe		= lxt973_probe,
+	.config_aneg	= lxt973_config_aneg,
+	.read_status	= genphy_read_status,
+	.driver		= { .owner = THIS_MODULE,},
+};
+
 static int __init lxt_init(void)
 {
 	int ret;
@@ -157,9 +200,15 @@ static int __init lxt_init(void)
 	ret = phy_driver_register(&lxt971_driver);
 	if (ret)
 		goto err2;
+
+	ret = phy_driver_register(&lxt973_driver);
+	if (ret)
+		goto err3;
 	return 0;
 
- err2:	
+ err3:
+	phy_driver_unregister(&lxt971_driver);
+ err2:
 	phy_driver_unregister(&lxt970_driver);
  err1:
 	return ret;
@@ -169,6 +218,7 @@ static void __exit lxt_exit(void)
 {
 	phy_driver_unregister(&lxt970_driver);
 	phy_driver_unregister(&lxt971_driver);
+	phy_driver_unregister(&lxt973_driver);
 }
 
 module_init(lxt_init);
-- 
1.6.3.3



^ permalink raw reply related

* Re: [PATCH] phylib: Add support for the LXT973 phy.
From: Richard Cochran @ 2010-06-02 13:07 UTC (permalink / raw)
  To: Andy Fleming; +Cc: netdev
In-Reply-To: <20100602125527.GA20396@riccoc20.at.omicron.at>

On Wed, Jun 02, 2010 at 02:55:27PM +0200, Richard Cochran wrote:
> On Tue, Jun 01, 2010 at 05:39:22PM -0500, Andy Fleming wrote:
> > Also, is this erratum true for all lxt973 models, or is it fixed in
> > some revisions?
> 
> The documentation http://www.cortina-systems.com/products/download/266
> says, "Status: This erratum has been previously fixed." However, I
> could not find a reference to when this was fixed.

In any case, the PHY only supports 100 Mbps when in fiber mode, so the
fix is always safe to use.
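
For reference, the register manipulation the workaround performs can
be checked in isolation. The bit values below are the standard MII
basic mode control register bits (the same values linux/mii.h uses
for BMCR_*); the DEMO_ names and demo_fiber_bmcr() are hypothetical:

```c
#include <stdint.h>

#define DEMO_BMCR_FULLDPLX	0x0100	/* full duplex */
#define DEMO_BMCR_ANENABLE	0x1000	/* auto-negotiation enable */
#define DEMO_BMCR_SPEED100	0x2000	/* select 100 Mbps */

/* Force 100 Mbps, full duplex, auto-negotiation off. */
static uint16_t demo_fiber_bmcr(uint16_t bmcr)
{
	bmcr |= DEMO_BMCR_SPEED100 | DEMO_BMCR_FULLDPLX;
	bmcr &= (uint16_t)~DEMO_BMCR_ANENABLE;
	return bmcr;
}
```

Applying the function twice gives the same value as applying it once,
which is consistent with the fix being harmless on parts where the
erratum is already resolved.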

Richard

^ permalink raw reply

* Re: [PATCH 1/3] korina: fix deadlock on RX FIFO overrun
From: David Miller @ 2010-06-02 13:12 UTC (permalink / raw)
  To: phil; +Cc: florian, netdev
In-Reply-To: <20100529232343.ABC1D4CD45@orbit.nwl.cc>

From: Phil Sutter <phil@nwl.cc>
Date: Sun, 30 May 2010 01:23:34 +0200

> By calling korina_restart(), the IRQ handler tries to disable the
> interrupt it's currently serving. This leads to a deadlock since
> disable_irq() waits for any running IRQ handlers to finish before
> returning. This patch addresses the issue by turning korina_restart()
> into a workqueue task, which is then scheduled when needed.
> 
> Reproducing the deadlock is easily done using e.g. GNU netcat to send
> large amounts of UDP data to the host running this driver.
> 
> Note that the same problem (and fix) applies to TX FIFO underruns, but
> apparently these are less easy to trigger.
> 
> Signed-off-by: Phil Sutter <phil@nwl.cc>

Applied.
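
The deferral pattern described in the commit message can be modelled
in a few lines of userspace C. This is only a sketch with hypothetical
demo_ names: the handler records that a restart is needed, and a
worker running outside interrupt context performs it (in the kernel
this role is played by a work item queued with schedule_work()):

```c
#include <stdbool.h>

struct demo_nic {
	bool restart_pending;
	bool in_irq;
	unsigned int restarts;
};

/* Stand-in for the restart routine; it must not run from the handler,
 * because it waits for that same handler to finish. */
static int demo_restart(struct demo_nic *nic)
{
	if (nic->in_irq)
		return -1;	/* this is where the real driver deadlocked */
	nic->restarts++;
	return 0;
}

/* Interrupt handler: only marks the work as pending. */
static void demo_irq_handler(struct demo_nic *nic, bool fifo_overrun)
{
	nic->in_irq = true;
	if (fifo_overrun)
		nic->restart_pending = true;
	nic->in_irq = false;
}

/* Worker: runs later, outside interrupt context. */
static void demo_worker(struct demo_nic *nic)
{
	if (nic->restart_pending) {
		nic->restart_pending = false;
		demo_restart(nic);
	}
}
```
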

^ permalink raw reply

* Re: [PATCH 2/3] korina: use netdev_alloc_skb_ip_align() here, too
From: David Miller @ 2010-06-02 13:12 UTC (permalink / raw)
  To: phil; +Cc: florian, netdev
In-Reply-To: <20100529232348.3179F4CD57@orbit.nwl.cc>

From: Phil Sutter <phil@nwl.cc>
Date: Sun, 30 May 2010 01:23:35 +0200

> This patch completes commit 89d71a66c40d629e3b1285def543ab1425558cd5
> which missed this spot, as it seems.
> 
> Signed-off-by: Phil Sutter <phil@nwl.cc>

Applied.

^ permalink raw reply
