Netdev List
* linux-next: manual merge of the net tree with the net-current tree
From: Stephen Rothwell @ 2010-08-02  1:04 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: linux-next, linux-kernel, Herbert Xu

Hi all,

Today's linux-next merge of the net tree got a conflict in
net/bridge/br_device.c between commit
6d1d1d398cb7db7a12c5d652d50f85355345234f ("bridge: Fix skb leak when
multicast parsing fails on TX") from the net-current tree and commit
91d2c34a4eed32876ca333b0ca44f3bc56645805 ("bridge: Fix netpoll support")
from the net tree.

Just context changes. I fixed it up (see below) and can carry the fix for
a while.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc net/bridge/br_device.c
index f49bcd9,075c435..0000000
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@@ -46,12 -48,13 +48,16 @@@ netdev_tx_t br_dev_xmit(struct sk_buff 
  	skb_reset_mac_header(skb);
  	skb_pull(skb, ETH_HLEN);
  
 +	rcu_read_lock();
  	if (is_multicast_ether_addr(dest)) {
+ 		if (unlikely(netpoll_tx_running(dev))) {
+ 			br_flood_deliver(br, skb);
+ 			goto out;
+ 		}
 -		if (br_multicast_rcv(br, NULL, skb))
 +		if (br_multicast_rcv(br, NULL, skb)) {
 +			kfree_skb(skb);
  			goto out;
 +		}
  
  		mdst = br_mdb_get(br, skb);
  		if (mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb))


* [PATCH net-next] drivers/net/wan/farsync.c: Use standard pr_<level>
From: Joe Perches @ 2010-08-02  3:20 UTC (permalink / raw)
  To: Kevin Curtis; +Cc: David S. Miller, netdev, LKML

Remove locally defined equivalents

Signed-off-by: Joe Perches <joe@perches.com>
---
 drivers/net/wan/farsync.c |  111 +++++++++++++++++++++-----------------------
 1 files changed, 53 insertions(+), 58 deletions(-)

diff --git a/drivers/net/wan/farsync.c b/drivers/net/wan/farsync.c
index 43b7727..06965a0 100644
--- a/drivers/net/wan/farsync.c
+++ b/drivers/net/wan/farsync.c
@@ -15,6 +15,8 @@
  *      Maintainer:  Kevin Curtis  <kevin.curtis@farsite.co.uk>
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include <linux/module.h>
 #include <linux/kernel.h>
 #include <linux/version.h>
@@ -511,21 +513,19 @@ static int fst_debug_mask = { FST_DEBUG };
  * support variable numbers of macro parameters. The inverted if prevents us
  * eating someone else's else clause.
  */
-#define dbg(F,fmt,A...) if ( ! ( fst_debug_mask & (F))) \
-                                ; \
-                        else \
-                                printk ( KERN_DEBUG FST_NAME ": " fmt, ## A )
-
+#define dbg(F, fmt, args...)					\
+do {								\
+	if (fst_debug_mask & (F))				\
+		printk(KERN_DEBUG pr_fmt(fmt), ##args);		\
+} while (0)
 #else
-#define dbg(X...)		/* NOP */
+#define dbg(F, fmt, args...)					\
+do {								\
+	if (0)							\
+		printk(KERN_DEBUG pr_fmt(fmt), ##args);		\
+} while (0)
 #endif
 
-/*      Printing short cuts
- */
-#define printk_err(fmt,A...)    printk ( KERN_ERR     FST_NAME ": " fmt, ## A )
-#define printk_warn(fmt,A...)   printk ( KERN_WARNING FST_NAME ": " fmt, ## A )
-#define printk_info(fmt,A...)   printk ( KERN_INFO    FST_NAME ": " fmt, ## A )
-
 /*
  *      PCI ID lookup table
  */
@@ -961,7 +961,7 @@ fst_issue_cmd(struct fst_port_info *port, unsigned short cmd)
 		spin_lock_irqsave(&card->card_lock, flags);
 
 		if (++safety > 2000) {
-			printk_err("Mailbox safety timeout\n");
+			pr_err("Mailbox safety timeout\n");
 			break;
 		}
 
@@ -1241,8 +1241,8 @@ fst_intr_rx(struct fst_card_info *card, struct fst_port_info *port)
 		 * This seems to happen on the TE1 interface sometimes
 		 * so throw the frame away and log the event.
 		 */
-		printk_err("Frame received with 0 length. Card %d Port %d\n",
-			   card->card_no, port->index);
+		pr_err("Frame received with 0 length. Card %d Port %d\n",
+		       card->card_no, port->index);
 		/* Return descriptor to card */
 		FST_WRB(card, rxDescrRing[pi][rxp].bits, DMA_OWN);
 
@@ -1486,9 +1486,8 @@ fst_intr(int dummy, void *dev_id)
 	 */
 	dbg(DBG_INTR, "intr: %d %p\n", card->irq, card);
 	if (card->state != FST_RUNNING) {
-		printk_err
-		    ("Interrupt received for card %d in a non running state (%d)\n",
-		     card->card_no, card->state);
+		pr_err("Interrupt received for card %d in a non running state (%d)\n",
+		       card->card_no, card->state);
 
 		/* 
 		 * It is possible to really be running, i.e. we have re-loaded
@@ -1614,8 +1613,7 @@ fst_intr(int dummy, void *dev_id)
 			break;
 
 		default:
-			printk_err("intr: unknown card event %d. ignored\n",
-				   event);
+			pr_err("intr: unknown card event %d. ignored\n", event);
 			break;
 		}
 
@@ -1637,13 +1635,13 @@ check_started_ok(struct fst_card_info *card)
 
 	/* Check structure version and end marker */
 	if (FST_RDW(card, smcVersion) != SMC_VERSION) {
-		printk_err("Bad shared memory version %d expected %d\n",
-			   FST_RDW(card, smcVersion), SMC_VERSION);
+		pr_err("Bad shared memory version %d expected %d\n",
+		       FST_RDW(card, smcVersion), SMC_VERSION);
 		card->state = FST_BADVERSION;
 		return;
 	}
 	if (FST_RDL(card, endOfSmcSignature) != END_SIG) {
-		printk_err("Missing shared memory signature\n");
+		pr_err("Missing shared memory signature\n");
 		card->state = FST_BADVERSION;
 		return;
 	}
@@ -1651,11 +1649,11 @@ check_started_ok(struct fst_card_info *card)
 	if ((i = FST_RDB(card, taskStatus)) == 0x01) {
 		card->state = FST_RUNNING;
 	} else if (i == 0xFF) {
-		printk_err("Firmware initialisation failed. Card halted\n");
+		pr_err("Firmware initialisation failed. Card halted\n");
 		card->state = FST_HALTED;
 		return;
 	} else if (i != 0x00) {
-		printk_err("Unknown firmware status 0x%x\n", i);
+		pr_err("Unknown firmware status 0x%x\n", i);
 		card->state = FST_HALTED;
 		return;
 	}
@@ -1665,9 +1663,10 @@ check_started_ok(struct fst_card_info *card)
 	 * existing firmware etc so we just report it for the moment.
 	 */
 	if (FST_RDL(card, numberOfPorts) != card->nports) {
-		printk_warn("Port count mismatch on card %d."
-			    " Firmware thinks %d we say %d\n", card->card_no,
-			    FST_RDL(card, numberOfPorts), card->nports);
+		pr_warning("Port count mismatch on card %d. "
+			   "Firmware thinks %d we say %d\n",
+			   card->card_no,
+			   FST_RDL(card, numberOfPorts), card->nports);
 	}
 }
 
@@ -2090,9 +2089,8 @@ fst_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 		 */
 
 		if (card->state != FST_RUNNING) {
-			printk_err
-			    ("Attempt to configure card %d in non-running state (%d)\n",
-			     card->card_no, card->state);
+			pr_err("Attempt to configure card %d in non-running state (%d)\n",
+			       card->card_no, card->state);
 			return -EIO;
 		}
 		if (copy_from_user(&info, ifr->ifr_data, sizeof (info))) {
@@ -2384,8 +2382,8 @@ fst_init_card(struct fst_card_info *card)
                 err = register_hdlc_device(card->ports[i].dev);
                 if (err < 0) {
 			int j;
-                        printk_err ("Cannot register HDLC device for port %d"
-                                    " (errno %d)\n", i, -err );
+			pr_err("Cannot register HDLC device for port %d (errno %d)\n",
+			       i, -err);
 			for (j = i; j < card->nports; j++) {
 				free_netdev(card->ports[j].dev);
 				card->ports[j].dev = NULL;
@@ -2395,10 +2393,10 @@ fst_init_card(struct fst_card_info *card)
                 }
 	}
 
-	printk_info("%s-%s: %s IRQ%d, %d ports\n",
-	       port_to_dev(&card->ports[0])->name,
-	       port_to_dev(&card->ports[card->nports - 1])->name,
-	       type_strings[card->type], card->irq, card->nports);
+	pr_info("%s-%s: %s IRQ%d, %d ports\n",
+		port_to_dev(&card->ports[0])->name,
+		port_to_dev(&card->ports[card->nports - 1])->name,
+		type_strings[card->type], card->irq, card->nports);
 }
 
 static const struct net_device_ops fst_ops = {
@@ -2417,19 +2415,17 @@ static const struct net_device_ops fst_ops = {
 static int __devinit
 fst_add_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
-	static int firsttime_done = 0;
 	static int no_of_cards_added = 0;
 	struct fst_card_info *card;
 	int err = 0;
 	int i;
 
-	if (!firsttime_done) {
-		printk_info("FarSync WAN driver " FST_USER_VERSION
-		       " (c) 2001-2004 FarSite Communications Ltd.\n");
-		firsttime_done = 1;
-		dbg(DBG_ASS, "The value of debug mask is %x\n", fst_debug_mask);
-	}
-
+	printk_once(KERN_INFO
+		    pr_fmt("FarSync WAN driver " FST_USER_VERSION
+			   " (c) 2001-2004 FarSite Communications Ltd.\n"));
+#if FST_DEBUG
+	dbg(DBG_ASS, "The value of debug mask is %x\n", fst_debug_mask);
+#endif
 	/*
 	 * We are going to be clever and allow certain cards not to be
 	 * configured.  An exclude list can be provided in /etc/modules.conf
@@ -2441,8 +2437,8 @@ fst_add_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		 */
 		for (i = 0; i < fst_excluded_cards; i++) {
 			if ((pdev->devfn) >> 3 == fst_excluded_list[i]) {
-				printk_info("FarSync PCI device %d not assigned\n",
-				       (pdev->devfn) >> 3);
+				pr_info("FarSync PCI device %d not assigned\n",
+					(pdev->devfn) >> 3);
 				return -EBUSY;
 			}
 		}
@@ -2451,20 +2447,19 @@ fst_add_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* Allocate driver private data */
 	card = kzalloc(sizeof (struct fst_card_info), GFP_KERNEL);
 	if (card == NULL) {
-		printk_err("FarSync card found but insufficient memory for"
-			   " driver storage\n");
+		pr_err("FarSync card found but insufficient memory for driver storage\n");
 		return -ENOMEM;
 	}
 
 	/* Try to enable the device */
 	if ((err = pci_enable_device(pdev)) != 0) {
-		printk_err("Failed to enable card. Err %d\n", -err);
+		pr_err("Failed to enable card. Err %d\n", -err);
 		kfree(card);
 		return err;
 	}
 
 	if ((err = pci_request_regions(pdev, "FarSync")) !=0) {
-	        printk_err("Failed to allocate regions. Err %d\n", -err);
+		pr_err("Failed to allocate regions. Err %d\n", -err);
 		pci_disable_device(pdev);
 		kfree(card);
 	        return err;
@@ -2475,14 +2470,14 @@ fst_add_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	card->phys_mem = pci_resource_start(pdev, 2);
 	card->phys_ctlmem = pci_resource_start(pdev, 3);
 	if ((card->mem = ioremap(card->phys_mem, FST_MEMSIZE)) == NULL) {
-		printk_err("Physical memory remap failed\n");
+		pr_err("Physical memory remap failed\n");
 		pci_release_regions(pdev);
 		pci_disable_device(pdev);
 		kfree(card);
 		return -ENODEV;
 	}
 	if ((card->ctlmem = ioremap(card->phys_ctlmem, 0x10)) == NULL) {
-		printk_err("Control memory remap failed\n");
+		pr_err("Control memory remap failed\n");
 		pci_release_regions(pdev);
 		pci_disable_device(pdev);
 		kfree(card);
@@ -2492,7 +2487,7 @@ fst_add_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	/* Register the interrupt handler */
 	if (request_irq(pdev->irq, fst_intr, IRQF_SHARED, FST_DEV_NAME, card)) {
-		printk_err("Unable to register interrupt %d\n", card->irq);
+		pr_err("Unable to register interrupt %d\n", card->irq);
 		pci_release_regions(pdev);
 		pci_disable_device(pdev);
 		iounmap(card->ctlmem);
@@ -2523,7 +2518,7 @@ fst_add_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		if (!dev) {
 			while (i--)
 				free_netdev(card->ports[i].dev);
-			printk_err ("FarSync: out of memory\n");
+			pr_err("FarSync: out of memory\n");
                         free_irq(card->irq, card);
                         pci_release_regions(pdev);
                         pci_disable_device(pdev);
@@ -2587,7 +2582,7 @@ fst_add_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		    pci_alloc_consistent(card->device, FST_MAX_MTU,
 					 &card->rx_dma_handle_card);
 		if (card->rx_dma_handle_host == NULL) {
-			printk_err("Could not allocate rx dma buffer\n");
+			pr_err("Could not allocate rx dma buffer\n");
 			fst_disable_intr(card);
 			pci_release_regions(pdev);
 			pci_disable_device(pdev);
@@ -2600,7 +2595,7 @@ fst_add_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		    pci_alloc_consistent(card->device, FST_MAX_MTU,
 					 &card->tx_dma_handle_card);
 		if (card->tx_dma_handle_host == NULL) {
-			printk_err("Could not allocate tx dma buffer\n");
+			pr_err("Could not allocate tx dma buffer\n");
 			fst_disable_intr(card);
 			pci_release_regions(pdev);
 			pci_disable_device(pdev);
@@ -2672,7 +2667,7 @@ fst_init(void)
 static void __exit
 fst_cleanup_module(void)
 {
-	printk_info("FarSync WAN driver unloading\n");
+	pr_info("FarSync WAN driver unloading\n");
 	pci_unregister_driver(&fst_driver);
 }
 


* Re: [PATCH] net: wl12xx: do not use kfree'd memory
From: Juuso Oikarinen @ 2010-08-02  5:04 UTC (permalink / raw)
  To: ext Kulikov Vasiliy
  Cc: kernel-janitors-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Coelho Luciano (Nokia-MS/Helsinki), John W. Linville,
	Paasikivi Teemu.3 (EXT-Ixonos/Tampere),
	linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1280594039-3246-1-git-send-email-segooon-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Sat, 2010-07-31 at 18:33 +0200, ext Kulikov Vasiliy wrote:
> wl1271_dump() uses cmd after kfree(cmd). Move kfree() just after
> wl1271_dump().
> 
> Signed-off-by: Kulikov Vasiliy <segooon-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
>  drivers/net/wireless/wl12xx/wl1271_spi.c |    3 +--
>  1 files changed, 1 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/wireless/wl12xx/wl1271_spi.c b/drivers/net/wireless/wl12xx/wl1271_spi.c
> index 96d25fb..4cb99c5 100644
> --- a/drivers/net/wireless/wl12xx/wl1271_spi.c
> +++ b/drivers/net/wireless/wl12xx/wl1271_spi.c
> @@ -160,9 +160,8 @@ static void wl1271_spi_init(struct wl1271 *wl)
>  	spi_message_add_tail(&t, &m);
>  
>  	spi_sync(wl_to_spi(wl), &m);
> -	kfree(cmd);
> -
>  	wl1271_dump(DEBUG_SPI, "spi init -> ", cmd, WSPI_INIT_CMD_LEN);
> +	kfree(cmd);
>  }
>  
>  #define WL1271_BUSY_WORD_TIMEOUT 1000

Whoops ;) Good catch, thanks.

Acked-by: Juuso Oikarinen <juuso.oikarinen-xNZwKgViW5gAvxtiuMwx3w@public.gmane.org>

-Juuso



* Re: shape traffic on tun interfaces
From: Franchoze Eric @ 2010-08-02  7:46 UTC (permalink / raw)
  To: alex; +Cc: lartc, netdev
In-Reply-To: <301941280614170@web105.yandex.ru>

>Do the QoS on your next hop router 
There is no next router. And QoS should be on the same machine.

>or on the interface all your de-encapsulated VPN traffic flows over (ie. 'eth0') instead.

It is not a problem to find the interface with de-encapsulated traffic. The problem is that tc rules have to be written per network device.
And it is really inconvenient to clone these rules when they differ only in the destination IP.

For example, look here. A subclass has to be created for each destination IP.

#class
tc class add dev $DEV parent 1: classid 1:1 htb rate ${SPEED}kbit
#subclass
# high priority traffic (where we get money, http for example)
tc class add dev $DEV parent 1:1 classid 1:2 htb rate ${SPEED}kbit ceil ${SPEED}kbit prio 0
# low priority traffic - no ads - no money - low speed
tc class add dev $DEV parent 1:1 classid 1:3 htb rate $((SPEED / 2))kbit ceil ${SPEED}kbit prio 1
#handle
tc qdisc add dev $DEV parent 1:2 handle 2: sfq perturb 10
tc qdisc add dev $DEV parent 1:3 handle 3: sfq perturb 10
#connect with
tc filter add dev $DEV parent 1:0 prio 0 protocol ip handle 2 fw flowid 1:1001
tc filter add dev $DEV parent 1:0 prio 0 protocol ip handle 3 fw flowid 1:1002 



* [PATCH] tc: make symbols loaded from tc action modules global.
From: Andreas Henriksson @ 2010-08-02  7:30 UTC (permalink / raw)
  To: shemminger; +Cc: netdev

Fixes problems with the xtables-based MARK target ("ipt" module).
When tc loaded the "ipt" (xt) module it kept the symbols local,
so libxtables, when loaded, could not find the required struct.

Currently ipt/xt is the only tc action module.
iproute2 never seems to call dlclose.
Hopefully the modules don't export more symbols than needed.

In this situation hopefully the RTLD_GLOBAL flag won't hurt us.

I've been using this patch in the Debian package of iproute for
the last 3 weeks and no one has complained.
( This fixes http://bugs.debian.org/584898 )

Signed-off-by: Andreas Henriksson <andreas@fatal.se>
---
 tc/m_action.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tc/m_action.c b/tc/m_action.c
index a198158..6464b2e 100644
--- a/tc/m_action.c
+++ b/tc/m_action.c
@@ -99,7 +99,7 @@ restart_s:
 	}
 
 	snprintf(buf, sizeof(buf), "%s/m_%s.so", get_tc_lib(), str);
-	dlh = dlopen(buf, RTLD_LAZY);
+	dlh = dlopen(buf, RTLD_LAZY | RTLD_GLOBAL);
 	if (dlh == NULL) {
 		dlh = aBODY;
 		if (dlh == NULL) {
-- 
1.7.1



* why do we need printk on sending syn flood cookie?
From: Franchoze Eric @ 2010-08-02  7:58 UTC (permalink / raw)
  To: netdev

 Just curious: why do we need a printk every 60 seconds (60*HZ) about a possible SYN flood? It really floods dmesg. Is there something dangerous? I suggest adding a way to turn off the printk about sending TCP cookies.

Something like this:
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index fe193e5..5574adc 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1230,8 +1230,10 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 	__u32 isn = TCP_SKB_CB(skb)->when;
 #ifdef CONFIG_SYN_COOKIES
 	int want_cookie = 0;
+	int want_cookie_no_warn = 0;
 #else
 #define want_cookie 0 /* Argh, why doesn't gcc optimize this :( */
+#define want_cookie_no_warn 0 
 #endif
 
 	/* Never answer to SYNs send to broadcast or multicast */
@@ -1246,7 +1248,10 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 #ifdef CONFIG_SYN_COOKIES
 		if (sysctl_tcp_syncookies) {
 			want_cookie = 1;
-		} else
+			if (sysctl_tcp_syncookies == 2)
+				want_cookie_no_warn = 1;
+		}
+		else
 #endif
 		goto drop;
 	}
@@ -1296,6 +1301,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 
 #ifdef CONFIG_SYN_COOKIES
 		want_cookie = 0;	/* not our kind of cookie */
+		want_cookie_no_warn = 0;	/* no printk on syn flood */
 #endif
 		tmp_ext.cookie_out_never = 0; /* false */
 		tmp_ext.cookie_plus = tmp_opt.cookie_plus;
@@ -1328,7 +1334,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 
 	if (want_cookie) {
 #ifdef CONFIG_SYN_COOKIES
-		syn_flood_warning(skb);
+		if (!want_cookie_no_warn)
+			syn_flood_warning(skb);
 		req->cookie_ts = tmp_opt.tstamp_ok;
 #endif
 		isn = cookie_v4_init_sequence(sk, skb, &req->mss);


* Re: why do we need printk on sending syn flood cookie?
From: Florian Westphal @ 2010-08-02  8:17 UTC (permalink / raw)
  To: Franchoze Eric; +Cc: netdev
In-Reply-To: <480391280735894@web102.yandex.ru>

Franchoze Eric <franchoze@yandex.ru> wrote:
>  Just curious: why do we need a printk every 60 seconds (60*HZ) about a possible SYN flood? It really floods dmesg. Is there something dangerous? I suggest adding a way to turn off the printk about sending TCP cookies.

It is handled exactly like other printks in the networking path,
e.g. receipt of tcp wscale == 15.

Why does this need special treatment?


* Re: shape traffic on tun interfaces
From: Changli Gao @ 2010-08-02  8:36 UTC (permalink / raw)
  To: Franchoze Eric; +Cc: alex, lartc, netdev
In-Reply-To: <474381280735172@web82.yandex.ru>

On Mon, Aug 2, 2010 at 3:46 PM, Franchoze Eric <franchoze@yandex.ru> wrote:
>>Do the QoS on your next hop router
> There is no next router. And QoS should be on the same machine.
>
>>or on the interface all your de-encapsulated VPN traffic flows over (ie. 'eth0') instead.
>
> It is not a problem to find the interface with de-encapsulated traffic. The problem is that tc rules have to be written per network device.

try ifb:

config IFB
        tristate "Intermediate Functional Block support"
        depends on NET_CLS_ACT
        ---help---
          This is an intermediate driver that allows sharing of
          resources.
          To compile this driver as a module, choose M here: the module
          will be called ifb.  If you want to use more than one ifb
          device at a time, you need to compile this driver as a module.
          Instead of 'ifb', the devices will then be called 'ifb0',
          'ifb1' etc.
          Look at the iproute2 documentation directory for usage etc

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


* Re: Is it a possible bug in dev_gro_receive()?
From: Jarek Poplawski @ 2010-08-02 10:29 UTC (permalink / raw)
  To: Xin Xiaohui; +Cc: netdev, herbert, davem
In-Reply-To: <1280454855-7893-1-git-send-email-xiaohui.xin@intel.com>

Xin Xiaohui wrote:
> I looked into the code dev_gro_receive(), found the code here:
> if the frags[0] is pulled to 0, then the page will be released,
> and memmove() frags left.
> Is that right? I'm not sure if memmove do right or not, but
> frags[0].size is never set after memove at least. what I think
> a simple way is not to do anything if we found frags[0].size == 0.
> The patch is as followed.
> 
> Or am I missing something here?

I think, you're right, but fixing memmove looks nicer to me:

 -	--skb_shinfo(skb)->nr_frags);
 +	--skb_shinfo(skb)->nr_frags * sizeof(skb_frag_t));
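
To illustrate why the byte count matters: memmove() takes its length in bytes, so passing only the element count moves just the first few bytes of the array. A minimal userspace sketch, assuming a hypothetical `struct frag` as a stand-in for skb_frag_t and a made-up helper `drop_first_frag()` (neither is kernel code):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for skb_frag_t; the real layout differs. */
struct frag {
	void *page;
	unsigned int page_offset;
	unsigned int size;
};

/*
 * Drop frags[0] and shift the remaining entries down by one slot.
 * Returns the new number of entries.
 */
static unsigned int drop_first_frag(struct frag *frags, unsigned int nr_frags)
{
	assert(nr_frags > 0);
	nr_frags--;
	/* memmove() counts bytes, so scale the element count. */
	memmove(frags, frags + 1, nr_frags * sizeof(*frags));
	return nr_frags;
}
```

With three entries, a call leaves the former frags[1] and frags[2] in slots 0 and 1 with their offsets and sizes intact; without the sizeof() factor only two bytes would have been copied.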

Jarek P.

> 
> ---
>  net/core/dev.c |    7 -------
>  1 files changed, 0 insertions(+), 7 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 264137f..28cdbbf 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2730,13 +2730,6 @@ pull:
>  
>  		skb_shinfo(skb)->frags[0].page_offset += grow;
>  		skb_shinfo(skb)->frags[0].size -= grow;
> -
> -		if (unlikely(!skb_shinfo(skb)->frags[0].size)) {
> -			put_page(skb_shinfo(skb)->frags[0].page);
> -			memmove(skb_shinfo(skb)->frags,
> -				skb_shinfo(skb)->frags + 1,
> -				--skb_shinfo(skb)->nr_frags);
> -		}
>  	}
>  
>  ok:





* Re: Is it a possible bug in dev_gro_receive()?
From: Herbert Xu @ 2010-08-02 11:04 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Xin Xiaohui, netdev, davem
In-Reply-To: <20100802102906.GA8439@ff.dom.local>

On Mon, Aug 02, 2010 at 10:29:06AM +0000, Jarek Poplawski wrote:
> Xin Xiaohui wrote:
> > I looked into the code dev_gro_receive(), found the code here:
> > if the frags[0] is pulled to 0, then the page will be released,
> > and memmove() frags left.
> > Is that right? I'm not sure if memmove do right or not, but
> > frags[0].size is never set after memove at least. what I think
> > a simple way is not to do anything if we found frags[0].size == 0.
> > The patch is as followed.
> > 
> > Or am I missing something here?
> 
> I think, you're right, but fixing memmove looks nicer to me:
> 
>  -	--skb_shinfo(skb)->nr_frags);
>  +	--skb_shinfo(skb)->nr_frags * sizeof(skb_frag_t));

I agree with the diagnosis and your proposed fix.

Thanks for catching this Xiaohui!

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* RFC: New BGF 'LOOP' instruction
From: Paul LeoNerd Evans @ 2010-08-02 11:03 UTC (permalink / raw)
  To: netdev


---
 Proposal: Create a new BPF instruction, "LOOP", which can implement a
   specific time-bounded kind of while() loop over packet contents
---

IPv6 packets contain a linked-list of headers. Some other network
protocols may also contain linked-list structure.

BPF cannot implement loops.

Currently therefore, it is impossible to efficiently parse IPv6 packets
without resorting to such annoying tricks as statically unrolling a loop
into a long list of instructions. In IPv6's case this gets very large
very quickly, as different header types have different lengths, or
structure layouts.

I propose to add a new instruction, "LOOP", with the following
semantics:

 BPF_JMP|BPF_LOOP, jt

    If A == 0, fallthrough to the next instruction.
      (TODO: Or perhaps this should be considered a hard error which
       immediately aborts the filter, similar to divide by zero?)
    Otherwise:
       X += A.
       If X < len, jump backwards jt instructions.
       Otherwise, fallthrough to the next instruction

The following static checks would be enforced:

 None of the 'jt' instructions preceding the LOOP instruction (i.e. the
 body of the loop) may themselves be LOOP instructions, nor may they be
 STX.

The intention of this instruction is to be able to implement a loop in
which successive iterations advance the index register along the packet
buffer. By comparing X to the packet length, we can bound the running
time of the loop instruction, avoiding it locking up the kernel. By
banning STX instructions within the body of the loop, we can ensure that
X must be a strictly monotonically increasing sequence. At absolute
worst, X is increased by 1 each time, meaning at worst the body of the
loop must execute for every byte in the packet. By banning further
nested LOOP instructions, we can ensure at worst a linear running time.
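
The proposed semantics can be modelled in a few lines of userspace C. This is only a sketch of the rule as described above, not kernel code: `loop_step()` and `walk_headers()` are hypothetical names, and the header layout (a length byte at the start of each header) is an assumption chosen to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Model of the proposed LOOP rule: returns 1 if the filter should jump
 * back to the start of the loop body, 0 if it should fall through.
 */
static int loop_step(uint32_t a, uint32_t *x, uint32_t len)
{
	if (a == 0)
		return 0;	/* no progress possible: fall through */
	*x += a;
	return *x < len;
}

/*
 * Walk a chain of headers whose first byte holds the header length
 * (an assumed layout). Returns the number of loop body executions.
 */
static unsigned int walk_headers(const uint8_t *pkt, uint32_t len)
{
	uint32_t x = 0;
	unsigned int iterations = 0;

	assert(pkt != NULL);
	if (len == 0)
		return 0;
	do {
		/* The loop body would inspect the header at pkt[x] here. */
		iterations++;
	} while (loop_step(pkt[x], &x, len));
	return iterations;
}
```

In the worst case every length byte read is 1, so the body runs once per packet byte, which is the linear bound argued for above.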


I believe this addition should have minimal impact on existing users of
the filter layer, as it simply adds a new instruction and does not
otherwise change the semantics of any existing code. I also believe it
to be useful in writing filters that process IPv6 packets. I also
believe that the semantics and static checks are sufficient to preserve
the termination guarantee of BPF filter programs, ensuring each packet's
fate is decided in a timely fashion to avoid locking up the kernel.

Any comments on this, while I proceed? Barring any major complaints,
I'll have a hack at some code and present a patch in due course...

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk
ICQ# 4135350       |  Registered Linux# 179460
http://www.leonerd.org.uk/



* Re: RFC: New BPF 'LOOP' instruction
From: Paul LeoNerd Evans @ 2010-08-02 11:13 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20100802110334.GK11110@cel.leo>


*ahem* Typo in the subject line there, sorry. I meant "BPF".


-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk
ICQ# 4135350       |  Registered Linux# 179460
http://www.leonerd.org.uk/



* Re: [PATCH 01/11] pcmcia: use pcmica_{read,write}_config_byte
From: Komuro @ 2010-08-02 11:59 UTC (permalink / raw)
  To: Dominik Brodowski
  Cc: Michael Buesch, netdev, linux-pcmcia, linux-wireless,
	Dominik Brodowski, linux-serial
In-Reply-To: <1280667550-3040-1-git-send-email-linux@dominikbrodowski.net>

Hi,

>--- a/drivers/net/pcmcia/xirc2ps_cs.c
>+++ b/drivers/net/pcmcia/xirc2ps_cs.c


>+	if (err)
> 	    goto config_error;
>-	reg.Action = CS_WRITE;
>-	reg.Offset = CISREG_IOBASE_1;
>-	reg.Value = (link->io.BasePort2 >> 8) & 0xff;
>-	if ((err = pcmcia_access_configuration_register(link, &reg)))
>+
>+	err = pcmcia_write_config_byte(link, CISREG_IOBASE_1,
>+				link->io.BasePort2 & 0xff);

It should be

	err = pcmcia_write_config_byte(link, CISREG_IOBASE_1,
				(link->io.BasePort2 >> 8) & 0xff);
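
The difference is easy to check in isolation. A small sketch of how a 16-bit base port splits into two 8-bit register values; the helper names are hypothetical, and the assumption that the low byte belongs in CISREG_IOBASE_0 comes from the usual register pairing, not from the hunk quoted above:

```c
#include <assert.h>
#include <stdint.h>

/* Low byte of a 16-bit I/O base (assumed to go into CISREG_IOBASE_0). */
static uint8_t iobase_lo(uint16_t base)
{
	return base & 0xff;
}

/* High byte of a 16-bit I/O base, as written to CISREG_IOBASE_1. */
static uint8_t iobase_hi(uint16_t base)
{
	return (base >> 8) & 0xff;
}

/* Rebuild the base from both bytes and check that the split round-trips. */
static uint16_t iobase_join(uint8_t lo, uint8_t hi)
{
	uint16_t base = (uint16_t)(lo | (hi << 8));

	assert(iobase_lo(base) == lo && iobase_hi(base) == hi);
	return base;
}
```

For a base of 0x02f8, the low byte is 0xf8 and the high byte is 0x02; writing the low byte to both registers would program 0xf8f8 instead.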


* Re: [PATCH] net: Add getsockopt support for TCP thin-streams
From: Andreas Petlund @ 2010-08-02 11:46 UTC (permalink / raw)
  To: Josh Hunt; +Cc: davem, kuznet, jmorris, kaber, netdev, linux-kernel, juhlenko
In-Reply-To: <1280533775-7700-1-git-send-email-johunt@akamai.com>

On 07/31/2010 01:49 AM, Josh Hunt wrote:
> Initial TCP thin-stream commit did not add getsockopt support for the new
> socket options: TCP_THIN_LINEAR_TIMEOUTS and TCP_THIN_DUPACK. This adds support
> for them.
> 
> Signed-off-by: Josh Hunt <johunt@akamai.com>
> ---
>  net/ipv4/tcp.c |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 65afeae..3ed3525 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2591,6 +2591,12 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
>  			return -EFAULT;
>  		return 0;
>  	}
> +	case TCP_THIN_LINEAR_TIMEOUTS:
> +		val = tp->thin_lto;
> +		break;
> +	case TCP_THIN_DUPACK:
> +		val = tp->thin_dupack;
> +		break;
>  	default:
>  		return -ENOPROTOOPT;
>  	}

Thanks for noticing and fixing this :)
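
For reference, a minimal userspace sketch of reading these options back once the patch is applied. `get_thin_opt()` is a hypothetical helper name; the fallback option values are the ones from the Linux UAPI header and are only needed with older libc headers.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Values from the Linux UAPI header; older libc headers may lack them. */
#ifndef TCP_THIN_LINEAR_TIMEOUTS
#define TCP_THIN_LINEAR_TIMEOUTS 16
#endif
#ifndef TCP_THIN_DUPACK
#define TCP_THIN_DUPACK 17
#endif

/*
 * Read one of the thin-stream options from a TCP socket.
 * Returns 0 on success and stores the value in *val, or -1 with errno set
 * (for example ENOPROTOOPT on a kernel without getsockopt support).
 */
static int get_thin_opt(int fd, int opt, int *val)
{
	socklen_t len = sizeof(*val);

	assert(opt == TCP_THIN_LINEAR_TIMEOUTS || opt == TCP_THIN_DUPACK);
	return getsockopt(fd, IPPROTO_TCP, opt, val, &len);
}
```

A caller would create a TCP socket, optionally enable an option with setsockopt(), and then call get_thin_opt() to confirm the setting.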

Tested-by: Andreas Petlund <apetlund@simula.no>
Acked-by: Andreas Petlund <apetlund@simula.no>

-Andreas


* Re: [PATCH] Multiqueue macvtap driver
From: Krishna Kumar2 @ 2010-08-02 12:37 UTC (permalink / raw)
  To: David Miller; +Cc: arnd, bhutchings, mst, netdev
In-Reply-To: <20100801.003406.02275545.davem@davemloft.net>

Thanks Ben & Dave. A question though - which of the following
is preferable for the macvtap driver:

1. Calculate flow and use that to find a queue; or
2. First check if skb_rx_queue_recorded is true and if so use
    that; otherwise calculate the flow as in #1.

I guess #1 is better, since packets for a single flow will go to the
same queue even if they arrive on different rxqs of a mq driver.
But I want to make sure.

Thanks,

- KK

David Miller <davem@davemloft.net> wrote on 08/01/2010 01:04:06 PM:

> David Miller <davem@davemloft.net>
> 08/01/2010 01:04 PM
>
> To
>
> bhutchings@solarflare.com
>
> cc
>
> Krishna Kumar2/India/IBM@IBMIN, arnd@arndb.de,
> netdev@vger.kernel.org, mst@redhat.com
>
> Subject
>
> Re: [PATCH] Multiqueue macvtap driver
>
> From: Ben Hutchings <bhutchings@solarflare.com>
> Date: Sat, 31 Jul 2010 20:18:27 +0100
>
> > On Sat, 2010-07-31 at 19:27 +0530, Krishna Kumar wrote:
> > [...]
> >> @@ -136,39 +158,68 @@ static void macvtap_put_queue(struct mac
> >>  }
> >>
> >>  /*
> >> - * Since we only support one queue, just dereference the pointer.
> >> + * Select a queue based on the rxq of the device on which this packet
> >> + * arrived. If the incoming device is not mq, then use our cpu number
> >> + * to select a queue. vlan->numvtaps is cached in case it changes
> >> + * during the execution of this function.
> >>   */
> > [...]
> >
> > This can result in reordering if a single-queue device's RX interrupt's
> > CPU affinity is changed.  We generally try to avoid that.  You should
> > really use or generate a flow hash.  There is code for this in
> > net/core/dev.c:get_rps_cpu() which could be factored out into a separate
> > function.
>
> Agreed.



* Fwd: a Great Idea - include Kademlia networking protocol in kernel
From: hp fk @ 2010-08-02 13:16 UTC (permalink / raw)
  To: netdev
In-Reply-To: <AANLkTik7powjUaJ51bA_bgPVb03U-RVZay3NffHBYLzx@mail.gmail.com>

---------- Forwarded message ----------
From: hp fk <fkhp101@gmail.com>
Date: 2010/8/2
Subject: a Great Idea - include Kademlia networking protocol in kernel
To: linux-kernel@vger.kernel.org


The Kademlia protocol is widely used by many p2p applications. If the
kernel supported the Kademlia protocol and its p2p network infrastructure,
it would be easier to bring us into a new stage of networking with much
more freedom and equality.

1. Systems could get a public identity on the network much more easily,
without depending on the DNS system, which is usually controlled by
bureaucratic agencies.
2. A system's presence on the network would be easier to resolve in the
dynamic IP environments that many ISPs use to make effective use of IP
addresses, so communication could be much easier.
3. Many p2p applications could be relieved of building the p2p
network infrastructure repeatedly and could focus on application features.
4. Some restrictions imposed by bureaucratic censorship would be broken,
giving us much more freedom of communication.
5. All Linux systems could be equal citizens of a unified, great Linux
p2p network, promoting the development of p2p networks.

^ permalink raw reply

* Re: [PATCH] Multiqueue macvtap driver
From: Ben Hutchings @ 2010-08-02 13:29 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: David Miller, arnd, mst, netdev
In-Reply-To: <OF8519C301.F0E5FA07-ON65257772.004983CC-65257773.00453C8B@in.ibm.com>

On Mon, 2010-08-02 at 18:07 +0530, Krishna Kumar2 wrote:
> Thanks Ben & Dave. A question though - which of the following
> is preferable for the macvtap driver:
> 
> 1. Calculate flow and use that to find a queue; or
> 2. First check if skb_rx_queue_recorded is true and if so use
>     that; otherwise calculate the flow as in #1.
> 
> I guess #1 is better, since packets for a single flow will go to the
> same queue even if they arrive on different rxqs of a mq driver.
> But I want to make sure.
[...]

#2 is right.  Just as macvtap should provide a stable flow to RX-queue
mapping, so should the drivers it interfaces with.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
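
For illustration only, option #2 as discussed above can be sketched in user
space like this. It is a minimal model under stated assumptions, not the
macvtap code: struct pkt_meta and select_queue() are hypothetical, and the
flow hash is assumed to have been computed elsewhere. A recorded RX queue is
used when the lower driver supplied one; otherwise the flow hash decides.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packet metadata; the kernel keeps this in struct sk_buff. */
struct pkt_meta {
	bool rxq_recorded;	/* did the driver record an RX queue? */
	uint16_t rxq;		/* recorded RX queue index */
	uint32_t flow_hash;	/* precomputed flow hash */
};

/* Pick one of nqueues queues; the caller ensures nqueues > 0. */
static unsigned int select_queue(const struct pkt_meta *m, unsigned int nqueues)
{
	if (m->rxq_recorded)
		return m->rxq % nqueues;	/* trust the driver's mapping */
	return m->flow_hash % nqueues;		/* otherwise hash the flow */
}
```

Both branches are pure functions of per-packet data, so packets of one flow
keep landing on the same queue as long as the driver's own flow to RX-queue
mapping is stable.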


^ permalink raw reply

* Re: [patch] usbnet: fix 100% CPU use on suspended device
From: Elly Jones @ 2010-08-02 13:31 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Alan Stern, David Miller, netdev-u79uwXL29TY76Z2rM5mHXA, USB list
In-Reply-To: <201007261821.15020.oliver-GvhC2dPhHPQdnm+yROfE0A@public.gmane.org>

On Mon, Jul 26, 2010 at 12:21 PM, Oliver Neukum <oliver-GvhC2dPhHPQdnm+yROfE0A@public.gmane.org> wrote:
> Am Montag, 26. Juli 2010, 17:13:23 schrieb Alan Stern:
>> On Mon, 26 Jul 2010, Elly Jones wrote:
>>
>> > > This isn't right.  The problem should be fixed some other way.  Under
>> > > what circumstances are URBs submitted incorrectly?
>> >
>> > When the device is autosuspended. What is the proper thing for a
>> > device to do here?
>>
>> From looking at the code, it appears that the EVENT_DEV_ASLEEP flag
>> should be tested in usbnet_bh() the way it is in rx_submit().  But I'm
>> not an expert on usbnet; we should ask someone who is, like Oliver.
>
> Sorry, I didn't notice this thread.
>
> The correct way to check for autosuspend in usbnet is to look
> at EVENT_DEV_ASLEEP under txq.lock. That being said, usbnet_bh()
> uses rx_submit() which does the correct check. The bug seems to be
> a lack of error handling in usbnet_bh() regarding the return of rx_submit()

If rx_submit() fails, should usbnet_bh() just not tasklet_schedule() itself?

>        Regards
>                Oliver

-- Elly
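
For illustration only, the behaviour asked about above could look like the
following user-space model. It is a sketch under stated assumptions, not
usbnet code, and the thread does not confirm that this is the right fix:
refill_rx(), submit_fn and the two stub callbacks are hypothetical. The
refill loop reports whether the caller should reschedule itself, and stops
asking for a reschedule as soon as a submit fails.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical submit callback: returns 0 on success, negative on error. */
typedef int (*submit_fn)(void *ctx);

/* Stub callbacks used only to exercise the sketch. */
static int submit_always_ok(void *ctx)   { (void)ctx; return 0; }
static int submit_always_fail(void *ctx) { (void)ctx; return -1; }

/*
 * Try to top the RX queue up to 'want' entries, at most 'batch' per call.
 * Returns true if the caller should reschedule itself to continue, false
 * if a submit failed or the queue is already full.
 */
static bool refill_rx(int *queued, int want, int batch,
		      submit_fn submit, void *ctx)
{
	int i;

	for (i = 0; i < batch && *queued < want; i++) {
		if (submit(ctx) < 0)
			return false;	/* e.g. device asleep: do not spin */
		(*queued)++;
	}
	return *queued < want;
}
```

With a callback that always fails, refill_rx() returns false immediately, so
a caller that only reschedules on true would not loop at 100% CPU.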

^ permalink raw reply

* Re: Fwd: a Great Idea - include Kademlia networking protocol in kernel
From: Paulius Zaleckas @ 2010-08-02 13:51 UTC (permalink / raw)
  To: hp fk; +Cc: netdev
In-Reply-To: <AANLkTik5qnmHALi7=AMA+Lai6a+9STgub9BnQAV6EVkh@mail.gmail.com>

On 08/02/2010 04:16 PM, hp fk wrote:
> ---------- Forwarded message ----------
> From: hp fk<fkhp101@gmail.com>
> Date: 2010/8/2
> Subject: a Great Idea - include Kademlia networking protocol in kernel
> To: linux-kernel@vger.kernel.org
>
>
> The Kademlia protocol is widely used by many p2p applications. If the
> kernel supported the Kademlia protocol and its p2p network infrastructure,
> it would be easier to bring us into a new stage of networking with much
> more freedom and equality.
>
> 1. Systems could get a public identity on the network much more easily,
> and need not depend on the DNS system, which is usually controlled by
> bureaucratic agents.
> 2. System presence on the network is easier to resolve in a dynamic IP
> environment, which many ISPs use to make effective use of IP addresses,
> and communication could be much easier.
> 3. Many p2p applications could be relieved from building the p2p
> network infrastructure repeatedly and could focus on application features.
> 4. Some restrictions imposed by bureaucratic censorship would be broken,
> giving us much more communication freedom.
> 5. All Linux systems could be equal citizens of a unified, great Linux
> p2p network and promote the development of p2p networks.

I don't see any valid reason why this should be done in the kernel rather
than in userspace. Doing it in userspace is actually better, since it
would allow you to support other OSes such as BSD, Windows, etc.

^ permalink raw reply

* [PATCH v2 1/2] core: Factor out flow calculation from get_rps_cpu
From: Krishna Kumar @ 2010-08-02 14:33 UTC (permalink / raw)
  To: davem, arnd; +Cc: bhutchings, netdev, therbert, Krishna Kumar, mst

From: Krishna Kumar <krkumar2@in.ibm.com>

Factor out the flow calculation code from get_rps_cpu, since the macvtap
driver can use the same code.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
 net/core/dev.c |   94 +++++++++++++++++++++++++++++------------------
 1 file changed, 58 insertions(+), 36 deletions(-)

diff -ruNp org/net/core/dev.c new/net/core/dev.c
--- org/net/core/dev.c	2010-08-02 10:06:59.000000000 +0530
+++ new/net/core/dev.c	2010-08-02 19:29:34.000000000 +0530
@@ -2263,51 +2263,24 @@ static inline void ____napi_schedule(str
 	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
 }
 
-#ifdef CONFIG_RPS
-
-/* One global table that all flow-based protocols share. */
-struct rps_sock_flow_table *rps_sock_flow_table __read_mostly;
-EXPORT_SYMBOL(rps_sock_flow_table);
-
 /*
- * get_rps_cpu is called from netif_receive_skb and returns the target
- * CPU from the RPS map of the receiving queue for a given skb.
- * rcu_read_lock must be held on entry.
+ * skb_calculate_flow: calculate a flow hash based on src/dst addresses
+ * and src/dst port numbers. On success, returns a hash number (> 0),
+ * otherwise -1.
  */
-static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
-		       struct rps_dev_flow **rflowp)
+int skb_calculate_flow(struct net_device *dev, struct sk_buff *skb)
 {
+	int hash = skb->rxhash;
 	struct ipv6hdr *ip6;
 	struct iphdr *ip;
-	struct netdev_rx_queue *rxqueue;
-	struct rps_map *map;
-	struct rps_dev_flow_table *flow_table;
-	struct rps_sock_flow_table *sock_flow_table;
-	int cpu = -1;
 	u8 ip_proto;
-	u16 tcpu;
 	u32 addr1, addr2, ihl;
 	union {
 		u32 v32;
 		u16 v16[2];
 	} ports;
 
-	if (skb_rx_queue_recorded(skb)) {
-		u16 index = skb_get_rx_queue(skb);
-		if (unlikely(index >= dev->num_rx_queues)) {
-			WARN_ONCE(dev->num_rx_queues > 1, "%s received packet "
-				"on queue %u, but number of RX queues is %u\n",
-				dev->name, index, dev->num_rx_queues);
-			goto done;
-		}
-		rxqueue = dev->_rx + index;
-	} else
-		rxqueue = dev->_rx;
-
-	if (!rxqueue->rps_map && !rxqueue->rps_flow_table)
-		goto done;
-
-	if (skb->rxhash)
+	if (hash)
 		goto got_hash; /* Skip hash computation on packet header */
 
 	switch (skb->protocol) {
@@ -2334,6 +2307,7 @@ static int get_rps_cpu(struct net_device
 	default:
 		goto done;
 	}
+
 	switch (ip_proto) {
 	case IPPROTO_TCP:
 	case IPPROTO_UDP:
@@ -2356,11 +2330,59 @@ static int get_rps_cpu(struct net_device
 	/* get a consistent hash (same value on both flow directions) */
 	if (addr2 < addr1)
 		swap(addr1, addr2);
-	skb->rxhash = jhash_3words(addr1, addr2, ports.v32, hashrnd);
-	if (!skb->rxhash)
-		skb->rxhash = 1;
+
+	hash = jhash_3words(addr1, addr2, ports.v32, hashrnd);
+	if (!hash)
+		hash = 1;
 
 got_hash:
+	return hash;
+
+done:
+	return -1;
+}
+EXPORT_SYMBOL(skb_calculate_flow);
+
+#ifdef CONFIG_RPS
+
+/* One global table that all flow-based protocols share. */
+struct rps_sock_flow_table *rps_sock_flow_table __read_mostly;
+EXPORT_SYMBOL(rps_sock_flow_table);
+
+/*
+ * get_rps_cpu is called from netif_receive_skb and returns the target
+ * CPU from the RPS map of the receiving queue for a given skb.
+ * rcu_read_lock must be held on entry.
+ */
+static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
+		       struct rps_dev_flow **rflowp)
+{
+	struct netdev_rx_queue *rxqueue;
+	struct rps_map *map;
+	struct rps_dev_flow_table *flow_table;
+	struct rps_sock_flow_table *sock_flow_table;
+	int cpu = -1;
+	u16 tcpu;
+
+	if (skb_rx_queue_recorded(skb)) {
+		u16 index = skb_get_rx_queue(skb);
+		if (unlikely(index >= dev->num_rx_queues)) {
+			WARN_ONCE(dev->num_rx_queues > 1, "%s received packet "
+				"on queue %u, but number of RX queues is %u\n",
+				dev->name, index, dev->num_rx_queues);
+			goto done;
+		}
+		rxqueue = dev->_rx + index;
+	} else
+		rxqueue = dev->_rx;
+
+	if (!rxqueue->rps_map && !rxqueue->rps_flow_table)
+		goto done;
+
+	skb->rxhash = skb_calculate_flow(dev, skb);
+	if (skb->rxhash < 0)
+		goto done;
+
 	flow_table = rcu_dereference(rxqueue->rps_flow_table);
 	sock_flow_table = rcu_dereference(rps_sock_flow_table);
 	if (flow_table && sock_flow_table) {
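
For illustration only: the "-1 on failure" contract above interacts with
unsigned storage. Assuming skb->rxhash is an unsigned 32-bit field (as it
appears to be in kernels of this era), a check such as "skb->rxhash < 0"
cannot be true once the int result has been stored there. The user-space
sketch below shows the conversion and one possible alternative contract.
store_in_u32() and flow_hash_get() are hypothetical names, and the
parameters 'parsable' and 'raw' stand in for "headers could be parsed" and
"the jhash result".

```c
#include <stdbool.h>
#include <stdint.h>

/* What a u32 field holds after being assigned an int error code. */
static uint32_t store_in_u32(int ret)
{
	return (uint32_t)ret;	/* -1 becomes 0xffffffff, never "negative" */
}

/*
 * Hypothetical alternative contract: report failure separately from the
 * hash, so that all 32 bits of the hash remain usable.
 */
static bool flow_hash_get(bool parsable, uint32_t raw, uint32_t *out)
{
	if (!parsable)
		return false;
	*out = raw ? raw : 1;	/* keep 0 reserved for "not computed" */
	return true;
}
```

A caller would test the boolean result and only then store *out, so hash
values with the top bit set are not mistaken for errors.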

^ permalink raw reply

* [PATCH v2 2/2] macvtap: Implement multiqueue macvtap driver
From: Krishna Kumar @ 2010-08-02 14:33 UTC (permalink / raw)
  To: davem, arnd; +Cc: bhutchings, netdev, mst, Krishna Kumar, therbert
In-Reply-To: <20100802143304.1517.42494.sendpatchset@krkumar2.in.ibm.com>

From: Krishna Kumar <krkumar2@in.ibm.com>

Implement a multiqueue facility for the macvtap driver. The idea is that
a macvtap device can be opened multiple times and the fds can then be
registered, e.g. as backends for vhost.

Please review.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
 drivers/net/macvtap.c      |   89 ++++++++++++++++++++++++++++-------
 include/linux/if_macvlan.h |   11 +++-
 include/linux/netdevice.h  |    1 
 3 files changed, 83 insertions(+), 18 deletions(-)

diff -ruNp org/include/linux/netdevice.h new/include/linux/netdevice.h
--- org/include/linux/netdevice.h	2010-07-25 16:57:07.000000000 +0530
+++ new/include/linux/netdevice.h	2010-08-02 16:05:57.000000000 +0530
@@ -2253,6 +2253,7 @@ static inline const char *netdev_name(co
 	return dev->name;
 }
 
+extern int skb_calculate_flow(struct net_device *dev, struct sk_buff *skb);
 extern int netdev_printk(const char *level, const struct net_device *dev,
 			 const char *format, ...)
 	__attribute__ ((format (printf, 3, 4)));
diff -ruNp org/include/linux/if_macvlan.h new/include/linux/if_macvlan.h
--- org/include/linux/if_macvlan.h	2010-08-02 15:32:33.000000000 +0530
+++ new/include/linux/if_macvlan.h	2010-08-02 15:32:33.000000000 +0530
@@ -40,6 +40,14 @@ struct macvlan_rx_stats {
 	unsigned long		rx_errors;
 };
 
+#define MIN(x, y)		(((x) < (y)) ? (x) : (y))
+
+/*
+ * Maximum times a macvtap device can be opened. This can be used to
+ * configure the number of receive queue, e.g. for multiqueue virtio.
+ */
+#define MAX_MACVTAP_QUEUES	MIN(16, NR_CPUS)
+
 struct macvlan_dev {
 	struct net_device	*dev;
 	struct list_head	list;
@@ -50,7 +58,8 @@ struct macvlan_dev {
 	enum macvlan_mode	mode;
 	int (*receive)(struct sk_buff *skb);
 	int (*forward)(struct net_device *dev, struct sk_buff *skb);
-	struct macvtap_queue	*tap;
+	struct macvtap_queue	*taps[MAX_MACVTAP_QUEUES];
+	int			numvtaps;
 };
 
 static inline void macvlan_count_rx(const struct macvlan_dev *vlan,
diff -ruNp org/drivers/net/macvtap.c new/drivers/net/macvtap.c
--- org/drivers/net/macvtap.c	2010-07-28 15:10:10.000000000 +0530
+++ new/drivers/net/macvtap.c	2010-08-02 17:48:38.000000000 +0530
@@ -84,26 +84,45 @@ static const struct proto_ops macvtap_so
 static DEFINE_SPINLOCK(macvtap_lock);
 
 /*
- * Choose the next free queue, for now there is only one
+ * get_slot: return a [unused/occupied] slot in vlan->taps[]:
+ *	- if 'q' is NULL, return the first empty slot;
+ *	- otherwise, return the slot this pointer occupies.
  */
+static int get_slot(struct macvlan_dev *vlan, struct macvtap_queue *q)
+{
+	int i;
+
+	for (i = 0; i < MAX_MACVTAP_QUEUES; i++) {
+		if (rcu_dereference(vlan->taps[i]) == q)
+			return i;
+	}
+
+	/* Should never happen */
+	BUG_ON(1);
+}
+
 static int macvtap_set_queue(struct net_device *dev, struct file *file,
 				struct macvtap_queue *q)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
+	int index;
 	int err = -EBUSY;
 
 	spin_lock(&macvtap_lock);
-	if (rcu_dereference(vlan->tap))
+	if (vlan->numvtaps == MAX_MACVTAP_QUEUES)
 		goto out;
 
 	err = 0;
+	index = get_slot(vlan, NULL);
 	rcu_assign_pointer(q->vlan, vlan);
-	rcu_assign_pointer(vlan->tap, q);
+	rcu_assign_pointer(vlan->taps[index], q);
 	sock_hold(&q->sk);
 
 	q->file = file;
 	file->private_data = q;
 
+	vlan->numvtaps++;
+
 out:
 	spin_unlock(&macvtap_lock);
 	return err;
@@ -124,9 +143,12 @@ static void macvtap_put_queue(struct mac
 	spin_lock(&macvtap_lock);
 	vlan = rcu_dereference(q->vlan);
 	if (vlan) {
-		rcu_assign_pointer(vlan->tap, NULL);
+		int index = get_slot(vlan, q);
+
+		rcu_assign_pointer(vlan->taps[index], NULL);
 		rcu_assign_pointer(q->vlan, NULL);
 		sock_put(&q->sk);
+		--vlan->numvtaps;
 	}
 
 	spin_unlock(&macvtap_lock);
@@ -136,39 +158,72 @@ static void macvtap_put_queue(struct mac
 }
 
 /*
- * Since we only support one queue, just dereference the pointer.
+ * Select a queue based on the rxq of the device on which this packet
+ * arrived. If the incoming device is not mq, calculate a flow hash to
+ * select a queue. vlan->numvtaps is cached in case it reduces during
+ * the execution of this function.
  */
 static struct macvtap_queue *macvtap_get_queue(struct net_device *dev,
 					       struct sk_buff *skb)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
+	struct macvtap_queue *tap = NULL;
+	int numvtaps = vlan->numvtaps;
+	u16 rxq;
+
+	if (!numvtaps)
+		goto out;
+
+	if (skb_rx_queue_recorded(skb)) {
+		rxq = skb_get_rx_queue(skb);
+
+		while (unlikely(rxq >= numvtaps))
+			rxq -= numvtaps;
 
-	return rcu_dereference(vlan->tap);
+		tap = rcu_dereference(vlan->taps[rxq]);
+		if (tap)
+			goto out;
+	}
+
+	rxq = skb_calculate_flow(dev, skb);
+	if (rxq < 0)
+		rxq = smp_processor_id();
+
+	tap = rcu_dereference(vlan->taps[rxq & (numvtaps - 1)]);
+
+out:
+	return tap;
 }
 
 /*
  * The net_device is going away, give up the reference
- * that it holds on the queue (all the queues one day)
- * and safely set the pointer from the queues to NULL.
+ * that it holds on all queues and safely set the pointer
+ * from the queues to NULL.
  */
 static void macvtap_del_queues(struct net_device *dev)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
-	struct macvtap_queue *q;
+	struct macvtap_queue *q, *qlist[MAX_MACVTAP_QUEUES];
+	int i, j = 0;
 
+	/* macvtap_put_queue can free some slots, so go through all slots */
 	spin_lock(&macvtap_lock);
-	q = rcu_dereference(vlan->tap);
-	if (!q) {
-		spin_unlock(&macvtap_lock);
-		return;
+	for (i = 0; i < MAX_MACVTAP_QUEUES && vlan->numvtaps; i++) {
+		q = rcu_dereference(vlan->taps[i]);
+		if (q) {
+			qlist[j++] = q;
+			rcu_assign_pointer(vlan->taps[i], NULL);
+			rcu_assign_pointer(q->vlan, NULL);
+			vlan->numvtaps--;
+		}
 	}
-
-	rcu_assign_pointer(vlan->tap, NULL);
-	rcu_assign_pointer(q->vlan, NULL);
+	BUG_ON(vlan->numvtaps != 0);
 	spin_unlock(&macvtap_lock);
 
 	synchronize_rcu();
-	sock_put(&q->sk);
+
+	for (--j; j >= 0; j--)
+		sock_put(&qlist[j]->sk);
 }
 
 /*
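
For illustration only: macvtap_get_queue() above reduces the hash with
"rxq & (numvtaps - 1)". Masking covers every slot only when the count is a
power of two; for other counts some slots are never selected. Separately,
rxq is declared u16 in that function, so the "rxq < 0" test cannot be true.
The user-space sketch below compares masking with modulo. pick_by_mask(),
pick_by_mod() and reachable() are hypothetical helpers, and reachable()
assumes n is at most 32.

```c
#include <stdint.h>

/* Masking is only a valid reduction when n is a power of two. */
static unsigned int pick_by_mask(uint32_t hash, unsigned int n)
{
	return hash & (n - 1);
}

/* Modulo reaches every index in [0, n) for any n > 0. */
static unsigned int pick_by_mod(uint32_t hash, unsigned int n)
{
	return hash % n;
}

/* Count how many of the n slots (n <= 32) are hit by hashes 0..63. */
static unsigned int reachable(unsigned int (*pick)(uint32_t, unsigned int),
			      unsigned int n)
{
	uint32_t seen = 0, h;
	unsigned int count = 0;

	for (h = 0; h < 64; h++)
		seen |= 1u << pick(h, n);
	for (; seen; seen >>= 1)
		count += seen & 1;
	return count;
}
```

With three open taps, masking selects only slots 0 and 2, while modulo
selects all three; with four taps both approaches reach every slot.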

^ permalink raw reply

* Re: [PATCH nf-next-2.6] netfilter: {ip,ip6,arp}_tables: dont block bottom half more than necessary
From: Patrick McHardy @ 2010-08-02 14:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Netfilter Development Mailinglist, netdev
In-Reply-To: <1279898595.2481.69.camel@edumazet-laptop>

On 23.07.2010 17:23, Eric Dumazet wrote:
> We currently disable BH for the whole duration of get_counters()
> 
> On machines with a lot of cpus and large tables, this might be too long.
> 
> We can disable preemption during the whole function, and disable BH only
> while fetching counters for the current cpu.
> 

Applied, thanks Eric.


^ permalink raw reply

* Re: [PATCH repost] sched: export sched_set/getaffinity to modules
From: Oleg Nesterov @ 2010-08-02 15:02 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Sridhar Samudrala, Peter Zijlstra, Tejun Heo, Ingo Molnar, netdev,
	lkml, kvm@vger.kernel.org, Andrew Morton, Dmitri Vorobiev,
	Jiri Kosina, Thomas Gleixner, Andi Kleen
In-Reply-To: <20100801085022.GD16158@redhat.com>

On 08/01, Michael S. Tsirkin wrote:
>
> Oleg, I mean Ack the exporting of get/set affinity.

Ah, I misunderstood.

Yes, I believe the exporting is the lesser evil. Please feel free
to add my ack.

Oleg.

^ permalink raw reply

* Re: [PATCH] nf_conntrack_extend: introduce __nf_ct_ext_exist()
From: Patrick McHardy @ 2010-08-02 15:07 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netfilter-devel, netdev
In-Reply-To: <1280068734-5332-1-git-send-email-xiaosuo@gmail.com>

On 25.07.2010 16:38, Changli Gao wrote:
> some users of nf_ct_ext_exist() know ct->ext isn't NULL. For these users, the
> check for ct->ext isn't necessary, the function __nf_ct_ext_exist() can be
> used instead.
> 
> the type of the return value of nf_ct_ext_exist() is changed to bool.

Applied, thanks.

^ permalink raw reply

* Re: [patch] ipvs: remove EXPERIMENTAL tag
From: Patrick McHardy @ 2010-08-02 15:08 UTC (permalink / raw)
  To: Simon Horman; +Cc: lvs-devel, netdev, Wensong Zhang, Julian Anastasov
In-Reply-To: <20100726075609.GA11755@verge.net.au>

On 26.07.2010 09:56, Simon Horman wrote:
> IPVS was merged into the kernel quite a long time ago and
> has been seeing wide-spread production use for even longer.
> 
> It seems appropriate for it to be no longer tagged as EXPERIMENTAL

Applied, thanks.

^ permalink raw reply

