Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 1/2] bridge: leave carrier on for empty bridge
From: Nicolas de Pesloüan @ 2011-09-04  7:35 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David S. Miller, netdev
In-Reply-To: <20110903211438.2a43d2f2@nehalam.ftrdhcpuser.net>

Le 04/09/2011 06:14, Stephen Hemminger a écrit :

>> Instead of asserting carrier when the bridge have no port, can't we assert carrier when the three
>> following condition are true at the same time :
>>
>> - The bridge have no port.
>> - At least one IP address is setup on the bridge.
>> - The two above conditions are true for more than a configurable amount of seconds, with a default
>> of 10, for example.
>>
>> This would only delay carrier on for a few seconds for the regression and keep the current behavior
>> (carrier off until at least 1 port is on) for DHCP.
>
> This fails on two counts:
> 1. Bridge's often run without IP addresses!
> 2. DHCP won't try and send out request until carrier is true.

Sorry, I missed to say that we should of course also assert carrier on if one port has carrier on.

And rethinking about it, the delay is probably useless :

bridge_carrier_on = at_least_one_port_has_carrier_on | (bridge_has_no_port & bridge_has_at_least_one_ip)

That way :
- for those using bridge without any port, manually setting the IP will assert carrier on. (By the 
way, why don't they use a dummy device instead?)

- for those using bridge with ports:
-- Using any kind of autoconfig will work as expected. Carrier will only be asserted at the time 
first port get carrier.
-- Using static IP confifiguration, carrier will possibly be erroneously reported as on during the 
small time gap between IP address configuration and first port is added to the bridge. This time gap 
may be removed by simply configuring the IP after the first port is added. This is probably already 
true for most distribs. And anyway, this time gap is probably not a problem.
-- Carrier will also be erroneously reported as on after removing the last port, if the bridge still 
has an IP. (But we can arrange for this not to happen).

And in order to ensure user really understand why carrier is on of off, we can simply issue an INFO 
message for the non-natural case (bridge_has_no_port & bridga_has_at_least_one_ip).

I consider all this reasonable.

	Nicolas.

^ permalink raw reply

* Re: [PATCH 02/11] net: add function to allocate skbuff head without data area
From: Eric Dumazet @ 2011-09-04  8:12 UTC (permalink / raw)
  To: kaber; +Cc: davem, netfilter-devel, netdev
In-Reply-To: <1315070771-18576-3-git-send-email-kaber@trash.net>

Le samedi 03 septembre 2011 à 19:26 +0200, kaber@trash.net a écrit :

> +struct sk_buff *__alloc_skb_head(gfp_t gfp_mask, int node)
> +{
> +	struct sk_buff *skb;
> +
> +	/* Get the HEAD */
> +	skb = kmem_cache_alloc_node(skbuff_head_cache,
> +				    gfp_mask & ~__GFP_DMA, node);
> +	if (!skb)
> +		goto out;
> +	prefetchw(skb);

Please remove this prefetchw(), since we have no delay between it and
actual memset(skb).

> +
> +	/*
> +	 * Only clear those fields we need to clear, not those that we will
> +	 * actually initialise below. Hence, don't put any more fields after
> +	 * the tail pointer in struct sk_buff!
> +	 */
> +	memset(skb, 0, offsetof(struct sk_buff, tail));



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH -next v2] unix stream: Fix use-after-free crashes
From: Yan, Zheng @ 2011-09-04  8:23 UTC (permalink / raw)
  To: sedat.dilek
  Cc: netdev@vger.kernel.org, davem@davemloft.net, sfr@canb.auug.org.au,
	tim.c.chen@linux.intel.com, jirislaby@gmail.com
In-Reply-To: <CA+icZUU25QMMCVT1VfF9XB0wrkppNmKQbHwoHDAQ-qV9bi+HqQ@mail.gmail.com>

On Sun, Sep 4, 2011 at 3:12 PM, Sedat Dilek <sedat.dilek@googlemail.com> wrote:
> On Sun, Sep 4, 2011 at 7:44 AM, Yan, Zheng <zheng.z.yan@intel.com> wrote:
>> Commit 0856a30409 (Scm: Remove unnecessary pid & credential references
>> in Unix socket's send and receive path) introduced a use-after-free bug.
>> It passes the scm reference to the first skb. Skb(s) afterwards may
>> reference freed data structure because the first skb can be destructed
>> by the receiver at anytime. The fix is by passing the scm reference to
>> the very last skb.
>>
>
> s/by passing/bypassing ?

No

>
>> Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
>> Reported-by: Jiri Slaby <jirislaby@gmail.com>
>> ---
>
> Tested on i386 against linux-next (next-20110831).
>

Thank you.

> - Sedat -
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* Re: [PATCH] usbnet: ignore get interface retval of -EINPROGRESS
From: Oliver Neukum @ 2011-09-04 10:16 UTC (permalink / raw)
  To: Jim Wylder; +Cc: netdev
In-Reply-To: <CAPopfEWi0v9VXeDYcBfEgnWYxg7WcTTj3GBc6USzwcN+aGrKOQ@mail.gmail.com>

Am Samstag, 3. September 2011, 19:32:44 schrieb Jim Wylder:

Hi,

> Thanks for the quick feedback.  True usbnet_start_xmit() could be
> running at anytime, but the usb_autopm_get_interface_async() will only
> return -EINPROGRESS when rpm_resume detects that
> dev->power.runtime_status == RPM_RESUMING.  My understanding is that
> for an asynchronous request, the promise that the device is resuming
> would be equivalent to cases where usb_autopm_get_interface_async()
> returns success.

If CONFIG_PM is set.

> In all other cases, when we are not attempting to resume an already
> resuming interface, this change should have no impact.
> 
> Are you recommending that I add an additional check for DEV_ASLEEP,

The check is already there but depending on CONFIG_PM.

> possibly to decide whether to drop the or continue on?  Or am I
> missing your point?  I had not done anything similar because my
> understanding was that knowing that the device is in fact resuming
> would be sufficient.

At that point it is sufficient. Upon further thought it looks like your check
is correct.
We just need to make very sure that for the decision to queue or
to submit EVENT_DEV_ASLEEP is relevant.
Would you consider defining a macro with a nice name for the
==0 || == -EINPROGRESS check, so that any reader knows what
this is about? This is because I doubt only usbnet is affected.

	Regards
		Oliver

^ permalink raw reply

* Re: linux-next: Tree for Aug 26 (git-bisect: [0856a30] Scm: Remove unnecessary pid & credential references in Unix socket's send and receive path)
From: Sedat Dilek @ 2011-09-04 10:43 UTC (permalink / raw)
  To: Stephen Rothwell, tim.c.chen
  Cc: linux-next, LKML, jirislaby, David Miller, Yan, Zheng,
	Valdis.Kletnieks, netdev
In-Reply-To: <CA+icZUXOnoG07Z8-AAUNqvuJmq6-OY1_dTzkxsBOLczoNYYGEw@mail.gmail.com>

On Fri, Sep 2, 2011 at 11:43 PM, Sedat Dilek <sedat.dilek@googlemail.com> wrote:
> On Fri, Aug 26, 2011 at 7:00 AM, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>> Hi all,
>>
>> The powerpc allyesconfig build still fails today.
>>
>> Changes since 20110825:
>>
>> The slave-dma tree gained a conflict against Linus' tree.
>>
>> I have reverted the x86/spinlocks branch from the tip tree for today.
>>
>> The ptrace tree lost its build failure.
>>
>> The xen tree lost its conflict since I reveretd the version of the same
>> patches in the tip tree..
>>
>> The tty tree still has its build failure but I applied a supplied test
>> patch for it.
>>
>> The staging tree lost its build failures and gained a conflict against
>> the net tree.
>>
>> The moduleh tree lost a conflict.
>>
>> The akpm tree lost its build failure.
>>
>> ----------------------------------------------------------------------------
>>
>
> Hi,
>
> I saw diverse strange kernel-panics letting my notebook blink and
> scream a high-frequency tone, really ugly.
>
> My last good linux-next (l-n) kernel was next-20110818 and the next
> l-n kernel in series which I built and was broken: next-20110826.
> ( BTW, 29/31 Aug are broken too. )
>
> For the correctness I built 22/23/24/25 Aug which were all good.
> As usual when I do once a year a git-bisect... its the last step of 12
> steps in total (see git-bisect_visualize.txt and
> git-bisect_view-stat.txt).
>
> I was irritated by using next-20110825 and next-20110826 as git-bisect
> good and bad, as the first offered commit-id pointed me to
> 3.1-rc1-665-gec5efe7, but I was sure the culprit is definitely
> v3.1-rc3+ and furthermore I got no info how many revisions have to be
> walked through. So, I cancelled this run.
>
> Next thought was to do a git format-patch using above mentionned
> commit-ids (of the tags): 819 single patches.
> I took the commit-ids of 0001 (good) and 0818 (bad) - 0819 was
> linux-specific mumbo-jumbo.
>
> Jiri has reported a similiar breakage in [2] and returning CULPRIT
> commit [1] helped him.
> ( This git-bisect steal a whole Friday. Anyway, it's good to see we
> isolated the "bad" patch. )
>
> As a personal conclusion, trust more git-bisect...
> It does not hurt to NOT turn off brain my looking over the single
> patches, there were lots of staging, arm, powerpc which could be
> surely ignored, but in the end you are wiser - 12 steps are faster
> than reverting xxx single commits on spec.
>
> - Sedat -
>
> [1] http://git.us.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;h=0856a304091b33a8e8f9f9c98e776f425af2b625
> [2] http://lkml.org/lkml/2011/9/1/332
> [3] http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
> [4] http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#using-bisect
>
> P.S.: Used 0001 (good) and 0818 (bad) for git-bisect
>
> $ head -5 0001-target-Change-TCM_NON_EXISTENT_LUN-response-to-ASC-L.patch
> 0818-Fixes-the-deadlock-when-inserting-and-removing-the-d.patch
> ==> 0001-target-Change-TCM_NON_EXISTENT_LUN-response-to-ASC-L.patch <==
> From eb39d34004888afcc0a44d9c36383cd69fa3b3b9 Mon Sep 17 00:00:00 2001
> From: Nicholas Bellinger <nab@linux-iscsi.org>
> Date: Tue, 26 Jul 2011 16:59:00 -0700
> Subject: [PATCH 001/819] target: Change TCM_NON_EXISTENT_LUN response to
>  ASC=LOGICAL UNIT NOT SUPPORTED
>
> ==> 0818-Fixes-the-deadlock-when-inserting-and-removing-the-d.patch <==
> From 9b198e560524e3fe341cd2ae478a1cdf2f57fc33 Mon Sep 17 00:00:00 2001
> From: Clifton Barnes <cabarnes@indesign-llc.com>
> Date: Thu, 25 Aug 2011 09:47:59 +1000
> Subject: [PATCH 818/819] Fixes the deadlock when inserting and removing the
>  ds2780.
>

Issue fixed, see "[PATCH -next v2] unix stream: Fix use-after-free crashes" [1].

- Sedat -

[1] http://marc.info/?l=linux-netdev&m=131511507007022&w=2

^ permalink raw reply

* [PATCH 2/2] net: Handle different key sizes between address families in flow cache
From: David Ward @ 2011-09-04 13:07 UTC (permalink / raw)
  To: netdev; +Cc: Julian Anastasov, David Ward
In-Reply-To: <alpine.LFD.2.00.1109031004500.1492@ja.ssi.bg>

With the conversion of struct flowi to a union of AF-specific structs, some
operations on the flow cache need to account for the exact size of the key.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
---
 include/net/flow.h |   19 +++++++++++++++++++
 net/core/flow.c    |   28 ++++++++++++++++------------
 2 files changed, 35 insertions(+), 12 deletions(-)

diff --git a/include/net/flow.h b/include/net/flow.h
index 2ec377d..99119f6 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -7,6 +7,7 @@
 #ifndef _NET_FLOW_H
 #define _NET_FLOW_H
 
+#include <linux/socket.h>
 #include <linux/in6.h>
 #include <linux/atomic.h>
 
@@ -161,6 +162,24 @@ static inline struct flowi *flowidn_to_flowi(struct flowidn *fldn)
 	return container_of(fldn, struct flowi, u.dn);
 }
 
+typedef unsigned long flow_compare_t;
+
+static inline size_t flowi_size(u16 family)
+{
+	switch (family) {
+	case AF_INET:
+		BUILD_BUG_ON(sizeof(struct flowi4) % sizeof(flow_compare_t));
+		return sizeof(struct flowi4);
+	case AF_INET6:
+		BUILD_BUG_ON(sizeof(struct flowi6) % sizeof(flow_compare_t));
+		return sizeof(struct flowi6);
+	case AF_DECnet:
+		BUILD_BUG_ON(sizeof(struct flowidn) % sizeof(flow_compare_t));
+		return sizeof(struct flowidn);
+	}
+	return 0;
+}
+
 #define FLOW_DIR_IN	0
 #define FLOW_DIR_OUT	1
 #define FLOW_DIR_FWD	2
diff --git a/net/core/flow.c b/net/core/flow.c
index bf32c33..1545445 100644
--- a/net/core/flow.c
+++ b/net/core/flow.c
@@ -172,26 +172,24 @@ static void flow_new_hash_rnd(struct flow_cache *fc,
 
 static u32 flow_hash_code(struct flow_cache *fc,
 			  struct flow_cache_percpu *fcp,
-			  const struct flowi *key)
+			  const struct flowi *key,
+			  size_t keysize)
 {
 	const u32 *k = (const u32 *) key;
 
-	return jhash2(k, (sizeof(*key) / sizeof(u32)), fcp->hash_rnd)
+	return jhash2(k, (keysize / sizeof(u32)), fcp->hash_rnd)
 		& (flow_cache_hash_size(fc) - 1);
 }
 
-typedef unsigned long flow_compare_t;
-
 /* I hear what you're saying, use memcmp.  But memcmp cannot make
  * important assumptions that we can here, such as alignment and
- * constant size.
+ * size.
  */
-static int flow_key_compare(const struct flowi *key1, const struct flowi *key2)
+static int flow_key_compare(const struct flowi *key1, const struct flowi *key2,
+			    size_t keysize)
 {
 	const flow_compare_t *k1, *k1_lim, *k2;
-	const int n_elem = sizeof(struct flowi) / sizeof(flow_compare_t);
-
-	BUILD_BUG_ON(sizeof(struct flowi) % sizeof(flow_compare_t));
+	const int n_elem = keysize / sizeof(flow_compare_t);
 
 	k1 = (const flow_compare_t *) key1;
 	k1_lim = k1 + n_elem;
@@ -215,6 +213,7 @@ flow_cache_lookup(struct net *net, const struct flowi *key, u16 family, u8 dir,
 	struct flow_cache_entry *fle, *tfle;
 	struct hlist_node *entry;
 	struct flow_cache_object *flo;
+	size_t keysize;
 	unsigned int hash;
 
 	local_bh_disable();
@@ -222,6 +221,11 @@ flow_cache_lookup(struct net *net, const struct flowi *key, u16 family, u8 dir,
 
 	fle = NULL;
 	flo = NULL;
+
+	keysize = flowi_size(family);
+	if (!keysize)
+		goto nocache;
+
 	/* Packet really early in init?  Making flow_cache_init a
 	 * pre-smp initcall would solve this.  --RR */
 	if (!fcp->hash_table)
@@ -230,11 +234,11 @@ flow_cache_lookup(struct net *net, const struct flowi *key, u16 family, u8 dir,
 	if (fcp->hash_rnd_recalc)
 		flow_new_hash_rnd(fc, fcp);
 
-	hash = flow_hash_code(fc, fcp, key);
+	hash = flow_hash_code(fc, fcp, key, keysize);
 	hlist_for_each_entry(tfle, entry, &fcp->hash_table[hash], u.hlist) {
 		if (tfle->family == family &&
 		    tfle->dir == dir &&
-		    flow_key_compare(key, &tfle->key) == 0) {
+		    flow_key_compare(key, &tfle->key, keysize) == 0) {
 			fle = tfle;
 			break;
 		}
@@ -248,7 +252,7 @@ flow_cache_lookup(struct net *net, const struct flowi *key, u16 family, u8 dir,
 		if (fle) {
 			fle->family = family;
 			fle->dir = dir;
-			memcpy(&fle->key, key, sizeof(*key));
+			memcpy(&fle->key, key, keysize);
 			fle->object = NULL;
 			hlist_add_head(&fle->u.hlist, &fcp->hash_table[hash]);
 			fcp->hash_count++;
-- 
1.7.1

^ permalink raw reply related

* [PATCH 0/2] Fixes to flow cache for AF-specifc flowi structs
From: David Ward @ 2011-09-04 13:07 UTC (permalink / raw)
  To: netdev; +Cc: Julian Anastasov, David Ward
In-Reply-To: <alpine.LFD.2.00.1109031004500.1492@ja.ssi.bg>

These fixes to the flow cache are needed with the conversion to AF-specific
flowi structs. They are written so as to avoid introducing AF-specific code
into net/core/flow.c.

Note that __xfrm_policy_check (in net/xfrm/xfrm_policy.c) still allocates a
struct flowi on the stack and passes it to flow_cache_lookup. My understanding
is that since this is on the stack, this will not be aligned, and therefore it
will cause problems with flow_hash_code and flow_key_compare. Is that correct?

Signed-off-by: David Ward <david.ward@ll.mit.edu>

David Ward (2):
  net: Align AF-specific flowi structs to long
  net: Handle different key sizes between address families in flow
    cache

 include/net/flow.h |   25 ++++++++++++++++++++++---
 net/core/flow.c    |   28 ++++++++++++++++------------
 2 files changed, 38 insertions(+), 15 deletions(-)

^ permalink raw reply

* [PATCH 1/2] net: Align AF-specific flowi structs to long
From: David Ward @ 2011-09-04 13:07 UTC (permalink / raw)
  To: netdev; +Cc: Julian Anastasov, David Ward
In-Reply-To: <alpine.LFD.2.00.1109031004500.1492@ja.ssi.bg>

AF-specific flowi structs are now passed to flow_key_compare, which must
also be aligned to a long.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
---
 include/net/flow.h |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/net/flow.h b/include/net/flow.h
index 78113da..2ec377d 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -68,7 +68,7 @@ struct flowi4 {
 #define fl4_ipsec_spi		uli.spi
 #define fl4_mh_type		uli.mht.type
 #define fl4_gre_key		uli.gre_key
-};
+} __attribute__((__aligned__(BITS_PER_LONG/8)));
 
 static inline void flowi4_init_output(struct flowi4 *fl4, int oif,
 				      __u32 mark, __u8 tos, __u8 scope,
@@ -112,7 +112,7 @@ struct flowi6 {
 #define fl6_ipsec_spi		uli.spi
 #define fl6_mh_type		uli.mht.type
 #define fl6_gre_key		uli.gre_key
-};
+} __attribute__((__aligned__(BITS_PER_LONG/8)));
 
 struct flowidn {
 	struct flowi_common	__fl_common;
@@ -127,7 +127,7 @@ struct flowidn {
 	union flowi_uli		uli;
 #define fld_sport		uli.ports.sport
 #define fld_dport		uli.ports.dport
-};
+} __attribute__((__aligned__(BITS_PER_LONG/8)));
 
 struct flowi {
 	union {
-- 
1.7.1

^ permalink raw reply related

* Re: [next] unix stream crashes
From: Valdis.Kletnieks @ 2011-09-04 14:43 UTC (permalink / raw)
  To: Yan, Zheng 
  Cc: Jiri Slaby, sedat.dilek, Sedat Dilek, Tim Chen, David S. Miller,
	ML netdev, LKML, Stephen Rothwell
In-Reply-To: <CAAM7YAkB3VVNMmBMVuvEZuV6oGZeyog37_sjFGUunu+15apvZA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 321 bytes --]

On Sat, 03 Sep 2011 20:30:19 +0800, "Yan, Zheng " said:
> The skb can be destructed before the while loop in unix_stream_sendmsg stops.
> please try below patch.

Tried this on top of next-20110831, and system is up and running fine.

Feel free to stick this on it:

Tested-By: Valdis Kletnieks <valdis.kletnieks@vt.edu>

[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]

^ permalink raw reply

* Re: [PATCH 2/2] net: Handle different key sizes between address families in flow cache
From: Michał Mirosław @ 2011-09-04 15:34 UTC (permalink / raw)
  To: David Ward; +Cc: netdev, Julian Anastasov
In-Reply-To: <1315141641-3120-3-git-send-email-david.ward@ll.mit.edu>

2011/9/4 David Ward <david.ward@ll.mit.edu>:
> With the conversion of struct flowi to a union of AF-specific structs, some
> operations on the flow cache need to account for the exact size of the key.
[...]
> --- a/include/net/flow.h
> +++ b/include/net/flow.h
[...]
> +typedef unsigned long flow_compare_t;
> +
> +static inline size_t flowi_size(u16 family)
> +{
> +       switch (family) {
> +       case AF_INET:
> +               BUILD_BUG_ON(sizeof(struct flowi4) % sizeof(flow_compare_t));
> +               return sizeof(struct flowi4);
> +       case AF_INET6:
> +               BUILD_BUG_ON(sizeof(struct flowi6) % sizeof(flow_compare_t));
> +               return sizeof(struct flowi6);
> +       case AF_DECnet:
> +               BUILD_BUG_ON(sizeof(struct flowidn) % sizeof(flow_compare_t));
> +               return sizeof(struct flowidn);
> +       }
> +       return 0;
> +}

Since most called user (flow_key_compare) uses returned value didided
by sizeof(flow_compare_t), you could just return divided value here
and save some shift operations by it.

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [PATCH -next v2] unix stream: Fix use-after-free crashes
From: Joe Perches @ 2011-09-04 15:50 UTC (permalink / raw)
  To: Yan, Zheng
  Cc: sedat.dilek, netdev@vger.kernel.org, davem@davemloft.net,
	sfr@canb.auug.org.au, tim.c.chen@linux.intel.com,
	jirislaby@gmail.com
In-Reply-To: <CAAM7YAk0csJLLVFF6KYA8vMG_BN6QDvmYvtnY0Sq7P=cBsuKfg@mail.gmail.com>

On Sun, 2011-09-04 at 16:23 +0800, Yan, Zheng wrote:
> On Sun, Sep 4, 2011 at 3:12 PM, Sedat Dilek <sedat.dilek@googlemail.com> wrote:
> > On Sun, Sep 4, 2011 at 7:44 AM, Yan, Zheng <zheng.z.yan@intel.com> wrote:
> >> It passes the scm reference to the first skb. Skb(s) afterwards may
> >> reference freed data structure because the first skb can be destructed
> >> by the receiver at anytime. The fix is by passing the scm reference to
> >> the very last skb.
> > s/by passing/bypassing ?
> No

(putting on my Randy Dunlap hat)

The issue was fixed by passing...
or maybe
The fix is to pass...

^ permalink raw reply

* Re: [PATCH 08/11] netlink: implement memory mapped sendmsg()
From: Michał Mirosław @ 2011-09-04 16:18 UTC (permalink / raw)
  To: kaber; +Cc: davem, netfilter-devel, netdev
In-Reply-To: <1315070771-18576-9-git-send-email-kaber@trash.net>

On Sat, Sep 03, 2011 at 07:26:08PM +0200, kaber@trash.net wrote:
> From: Patrick McHardy <kaber@trash.net>
> 
> Add support for memory mapped sendmsg() to netlink. Userspace queued to
> be processed frames into the TX ring and invokes sendmsg with
> msg.iov.iov_base = NULL to trigger processing of all pending messages.
> 
> Since the kernel usually performs full message validation before beginning
> processing, userspace must be prevented from modifying the message
> contents while the kernel is processing them. In order to do so, the
> frames contents are copied to an allocated skb in case the the ring is
> mapped more than once or the file descriptor is shared (f.i. through
> AF_UNIX file descriptor passing).
> 
> Otherwise an skb without a data area is allocated, the data pointer set
> to point to the data area of the ring frame and the skb is processed.
> Once the skb is freed, the destructor releases the frame back to userspace
> by setting the status to NL_MMAP_STATUS_UNUSED.

Is this protected from threads? Like: one thread waits on sendmsg() and
another (same process) changes the buffer.

Best Regards,
Michał Mirosław
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 1/2] bridge: leave carrier on for empty bridge
From: Stephen Hemminger @ 2011-09-04 16:36 UTC (permalink / raw)
  To: Nicolas de Pesloüan; +Cc: David S. Miller, netdev
In-Reply-To: <4E632A2E.5040805@gmail.com>

On Sun, 04 Sep 2011 09:35:10 +0200
Nicolas de Pesloüan <nicolas.2p.debian@gmail.com> wrote:

> Le 04/09/2011 06:14, Stephen Hemminger a écrit :
> 
> >> Instead of asserting carrier when the bridge have no port, can't we assert carrier when the three
> >> following condition are true at the same time :
> >>
> >> - The bridge have no port.
> >> - At least one IP address is setup on the bridge.
> >> - The two above conditions are true for more than a configurable amount of seconds, with a default
> >> of 10, for example.
> >>
> >> This would only delay carrier on for a few seconds for the regression and keep the current behavior
> >> (carrier off until at least 1 port is on) for DHCP.
> >
> > This fails on two counts:
> > 1. Bridge's often run without IP addresses!
> > 2. DHCP won't try and send out request until carrier is true.
> 
> Sorry, I missed to say that we should of course also assert carrier on if one port has carrier on.
> 
> And rethinking about it, the delay is probably useless :
> 
> bridge_carrier_on = at_least_one_port_has_carrier_on | (bridge_has_no_port & bridge_has_at_least_one_ip)
> 
> That way :
> - for those using bridge without any port, manually setting the IP will assert carrier on. (By the 
> way, why don't they use a dummy device instead?)
> 
> - for those using bridge with ports:
> -- Using any kind of autoconfig will work as expected. Carrier will only be asserted at the time 
> first port get carrier.
> -- Using static IP confifiguration, carrier will possibly be erroneously reported as on during the 
> small time gap between IP address configuration and first port is added to the bridge. This time gap 
> may be removed by simply configuring the IP after the first port is added. This is probably already 
> true for most distribs. And anyway, this time gap is probably not a problem.
> -- Carrier will also be erroneously reported as on after removing the last port, if the bridge still 
> has an IP. (But we can arrange for this not to happen).
> 
> And in order to ensure user really understand why carrier is on of off, we can simply issue an INFO 
> message for the non-natural case (bridge_has_no_port & bridga_has_at_least_one_ip).
> 
> I consider all this reasonable.
> 
> 	Nicolas.

Any bridge behaviour based on IP address configuration is a
layering violation and won't work.  The problem is related to dynamic issues
with IPv6 and DHCP and needs to be addressed at that level.

^ permalink raw reply

* Re: [PATCH 1/2] bridge: leave carrier on for empty bridge
From: Krzysztof Olędzki @ 2011-09-04 17:12 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Nicolas de Pesloüan, David S. Miller, netdev
In-Reply-To: <20110904093634.685d7c56@nehalam.ftrdhcpuser.net>

On 2011-09-04 18:36, Stephen Hemminger wrote:
> On Sun, 04 Sep 2011 09:35:10 +0200
> Nicolas de Pesloüan<nicolas.2p.debian@gmail.com>  wrote:
>
>> Le 04/09/2011 06:14, Stephen Hemminger a écrit :
>>
>>>> Instead of asserting carrier when the bridge have no port, can't we assert carrier when the three
>>>> following condition are true at the same time :
>>>>
>>>> - The bridge have no port.
>>>> - At least one IP address is setup on the bridge.
>>>> - The two above conditions are true for more than a configurable amount of seconds, with a default
>>>> of 10, for example.
>>>>
>>>> This would only delay carrier on for a few seconds for the regression and keep the current behavior
>>>> (carrier off until at least 1 port is on) for DHCP.
>>>
>>> This fails on two counts:
>>> 1. Bridge's often run without IP addresses!
>>> 2. DHCP won't try and send out request until carrier is true.
>>
>> Sorry, I missed to say that we should of course also assert carrier on if one port has carrier on.
>>
>> And rethinking about it, the delay is probably useless :
>>
>> bridge_carrier_on = at_least_one_port_has_carrier_on | (bridge_has_no_port&  bridge_has_at_least_one_ip)
>>
>> That way :
>> - for those using bridge without any port, manually setting the IP will assert carrier on. (By the
>> way, why don't they use a dummy device instead?)
>>
>> - for those using bridge with ports:
>> -- Using any kind of autoconfig will work as expected. Carrier will only be asserted at the time
>> first port get carrier.
>> -- Using static IP confifiguration, carrier will possibly be erroneously reported as on during the
>> small time gap between IP address configuration and first port is added to the bridge. This time gap
>> may be removed by simply configuring the IP after the first port is added. This is probably already
>> true for most distribs. And anyway, this time gap is probably not a problem.
>> -- Carrier will also be erroneously reported as on after removing the last port, if the bridge still
>> has an IP. (But we can arrange for this not to happen).
>>
>> And in order to ensure user really understand why carrier is on of off, we can simply issue an INFO
>> message for the non-natural case (bridge_has_no_port&  bridga_has_at_least_one_ip).
>>
>> I consider all this reasonable.
>>
>> 	Nicolas.
>
> Any bridge behaviour based on IP address configuration is a
> layering violation and won't work.  The problem is related to dynamic issues
> with IPv6 and DHCP and needs to be addressed at that level.

Maybe we can simply add a switch controlling if a bridge with no 
attached ports has carrier off (default) or on.

For example:
  echo {0|1} > /sys/devices/virtual/net/brX/bridge/orphaned_carrier

  brctl orphaned_carrier brX {on|off}

Best regards,

				Krzysztof Olędzki

^ permalink raw reply

* Re: [PATCH 2/2] Add a netlink attribute INET_DIAG_SECCTX
From: Rongqing Li @ 2011-09-05  0:32 UTC (permalink / raw)
  To: Paul Moore; +Cc: netdev, selinux, linux-security-module
In-Reply-To: <1971656.b979ahlREa@sifl>

On 09/01/2011 08:28 PM, Paul Moore wrote:
> On Thursday, September 01, 2011 05:33:07 PM Rongqing Li wrote:
>> On 09/01/2011 05:18 AM, Paul Moore wrote:
>>> On Wednesday, August 31, 2011 04:36:17 PM rongqing.li@windriver.com wrote:
>>>> From: Roy.Li<rongqing.li@windriver.com>
>>>>
>>>> Add a new netlink attribute INET_DIAG_SECCTX to dump the security
>>>> context of TCP sockets.
>>>
>>> You'll have to forgive me, I'm not familiar with the netlink code used
>>> by
>>> netstat and friends, but is there anyway to report back the security
>>> context of UDP sockets?  Or does the code below handle that already?
>>>
>>> In general, AF_INET and AF_INET6 sockets, regardless of any upper level
>>> protocols, have security contexts associated with them and it would be
>>> nice to see them in netstat.
>>
>> Yes, this is real concern, If the dumping tcp security context can be
>> accepted by netdev, I am planning to implement it for ipv4 udp socket, unix
>> socket. then ipv6..
>
> Great, I'm glad to hear you're planning on implementing this for more than
> just TCP.
>
> I understand your desire to have the basic idea accepted with only TCP
> implemented - and that is fine with me - but I would like to see support for
> all of the protocols merged at the same time.  In other words, seeking the
> basic ACKs for TCP from the davem, et al is okay but I'd like to defer merging
> TCP support until you have everything implemented and ready to be merged.
>

Ok, I will try

-- 
Best Reagrds,
Roy | RongQing Li

^ permalink raw reply

* Re: [PATCH 1/2] bridge: leave carrier on for empty bridge
From: Ang Way Chuang @ 2011-09-05  4:51 UTC (permalink / raw)
  To: Krzysztof Olędzki
  Cc: Stephen Hemminger, Nicolas de Pesloüan, David S. Miller,
	netdev, Achmad Basuki
In-Reply-To: <4E63B175.6020106@ans.pl>

On 05/09/11 02:12, Krzysztof Olędzki wrote:
> On 2011-09-04 18:36, Stephen Hemminger wrote:
>> On Sun, 04 Sep 2011 09:35:10 +0200
>> Nicolas de Pesloüan<nicolas.2p.debian@gmail.com>  wrote:
>>
>>> Le 04/09/2011 06:14, Stephen Hemminger a écrit :
>>>
>>>>> Instead of asserting carrier when the bridge have no port, can't we assert carrier when the three
>>>>> following condition are true at the same time :
>>>>>
>>>>> - The bridge have no port.
>>>>> - At least one IP address is setup on the bridge.
>>>>> - The two above conditions are true for more than a configurable amount of seconds, with a default
>>>>> of 10, for example.
>>>>>
>>>>> This would only delay carrier on for a few seconds for the regression and keep the current behavior
>>>>> (carrier off until at least 1 port is on) for DHCP.
>>>>
>>>> This fails on two counts:
>>>> 1. Bridge's often run without IP addresses!
>>>> 2. DHCP won't try and send out request until carrier is true.
>>>
>>> Sorry, I missed to say that we should of course also assert carrier on if one port has carrier on.
>>>
>>> And rethinking about it, the delay is probably useless :
>>>
>>> bridge_carrier_on = at_least_one_port_has_carrier_on | (bridge_has_no_port&  bridge_has_at_least_one_ip)
>>>
>>> That way :
>>> - for those using bridge without any port, manually setting the IP will assert carrier on. (By the
>>> way, why don't they use a dummy device instead?)
>>>
>>> - for those using bridge with ports:
>>> -- Using any kind of autoconfig will work as expected. Carrier will only be asserted at the time
>>> first port get carrier.
>>> -- Using static IP confifiguration, carrier will possibly be erroneously reported as on during the
>>> small time gap between IP address configuration and first port is added to the bridge. This time gap
>>> may be removed by simply configuring the IP after the first port is added. This is probably already
>>> true for most distribs. And anyway, this time gap is probably not a problem.
>>> -- Carrier will also be erroneously reported as on after removing the last port, if the bridge still
>>> has an IP. (But we can arrange for this not to happen).
>>>
>>> And in order to ensure user really understand why carrier is on of off, we can simply issue an INFO
>>> message for the non-natural case (bridge_has_no_port&  bridga_has_at_least_one_ip).
>>>
>>> I consider all this reasonable.
>>>
>>>     Nicolas.
>>
>> Any bridge behaviour based on IP address configuration is a
>> layering violation and won't work.  The problem is related to dynamic issues
>> with IPv6 and DHCP and needs to be addressed at that level.
>
> Maybe we can simply add a switch controlling if a bridge with no attached ports has carrier off (default) or on.
>
> For example:
>  echo {0|1} > /sys/devices/virtual/net/brX/bridge/orphaned_carrier
>
>  brctl orphaned_carrier brX {on|off}
>
> Best regards,
>
>                 Krzysztof Olędzki
My colleague investigated the problem. In the case of libvirt, there are 2 issues at hand:
- dnsmasq cannot bind to port 53 for IPv6 address, therefore fails.
- radvd fails to start because the interface is treated as being down.

This blog entry from Daniel Berrange explains the configuration used by libvirt:
http://berrange.com/posts/2011/06/16/providing-ipv6-connectivity-to-virtual-guests-with-libvirt-and-kvm/
It's quite possible that other users may have the same problem in the future if they plan to use radvd or
dnsmasq on a bridge interface. I'm not sure if there are other applications that depend on the carrier on
a bridge interface to function properly. Come to think of it, unless the user is aware of the change
in carrier on the bridge interface, it is very hard for them to diagnose the problem and apply the right
solution (toggling the carrier). Ideally, there is no reason why dnsmasq and radvd should be started if
there is no port attached to a bridge interface. But, we cannot assume that's how it is done.

I think that it is prudent that the whole commit should be reverted because the problem may not be limited
to libvirt use case.
>
> -- 
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* [PATCH] ipv4: Fix fib_info->fib_metrics leak
From: Yan, Zheng @ 2011-09-05  6:24 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: davem@davemloft.net, laijs

Commit 4670994d(net,rcu: convert call_rcu(fc_rport_free_rcu) to
kfree_rcu()) introduced a memory leak. This patch reverts it.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
---
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 33e2c35..80106d8 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -142,6 +142,14 @@ const struct fib_prop fib_props[RTN_MAX + 1] = {
 };
 
 /* Release a nexthop info record */
+static void free_fib_info_rcu(struct rcu_head *head)
+{
+	struct fib_info *fi = container_of(head, struct fib_info, rcu);
+
+	if (fi->fib_metrics != (u32 *) dst_default_metrics)
+		kfree(fi->fib_metrics);
+	kfree(fi);
+}
 
 void free_fib_info(struct fib_info *fi)
 {
@@ -156,7 +164,7 @@ void free_fib_info(struct fib_info *fi)
 	} endfor_nexthops(fi);
 	fib_info_cnt--;
 	release_net(fi->fib_net);
-	kfree_rcu(fi, rcu);
+	call_rcu(&fi->rcu, free_fib_info_rcu);
 }
 
 void fib_release_info(struct fib_info *fi)

^ permalink raw reply related

* RE: [PATCH net-next v4 4/4] r8169: support new chips of RTL8111F
From: hayeswang @ 2011-09-05 12:13 UTC (permalink / raw)
  To: 'Francois Romieu'; +Cc: netdev, linux-kernel
In-Reply-To: <20110902162831.GA25544@electric-eye.fr.zoreil.com>

 

> -----Original Message-----
> From: Francois Romieu [mailto:romieu@fr.zoreil.com] 
> Sent: Saturday, September 03, 2011 12:29 AM
> To: Hayeswang
> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH net-next v4 4/4] r8169: support new chips 
> of RTL8111F
> 
> Hayes Wang <hayeswang@realtek.com> :
> > Support new chips of RTL8111F.
> 
> Amongst other things :o)
> 
> > diff --git a/drivers/net/ethernet/realtek/r8169.c 
> > b/drivers/net/ethernet/realtek/r8169.c
> > index 175c769..8e6a200 100644
> > --- a/drivers/net/ethernet/realtek/r8169.c
> > +++ b/drivers/net/ethernet/realtek/r8169.c
> [...]
> > @@ -711,7 +719,10 @@ MODULE_FIRMWARE(FIRMWARE_8168D_1);
> >  MODULE_FIRMWARE(FIRMWARE_8168D_2);
> >  MODULE_FIRMWARE(FIRMWARE_8168E_1);
> >  MODULE_FIRMWARE(FIRMWARE_8168E_2);
> > +MODULE_FIRMWARE(FIRMWARE_8168E_3);
> 
> This one is relevant for Linus's tree.
> 
> Don't worry about submitting again, I'll send it separately.
> 

OK. I would adjust it again.

> No opinion regarding the jumbo fixes patches I sent on 2011/07/17 ?
> 

It seems fine except that you don't need to disable rx checksum when using jumbo
frame. The chips don't support tx checksum when using jumbo frame because of the
feature of early tx. The early tx means that the hw would start to send the
packet without loading the all data into fifo, so the software checksum is
necessary. However, there is no influence for rx, so you could still enable rx
checksum.
 
Best Regards,
Hayes

^ permalink raw reply

* RE: [linux-firmware v4 2/2] rtl_nic: add new firmware for RTL8111F
From: hayeswang @ 2011-09-05 12:33 UTC (permalink / raw)
  To: 'Ben Hutchings'; +Cc: dwmw2, romieu, netdev
In-Reply-To: <1314969392.3092.157.camel@deadeye>

 

> -----Original Message-----
> From: Ben Hutchings [mailto:ben@decadent.org.uk] 
> Sent: Friday, September 02, 2011 9:17 PM
> To: Hayeswang
> Cc: dwmw2@infradead.org; romieu@fr.zoreil.com; netdev@vger.kernel.org
> Subject: Re: [linux-firmware v4 2/2] rtl_nic: add new 
> firmware for RTL8111F
> 
> On Thu, 2011-09-01 at 15:07 +0800, Hayes Wang wrote:
> > Add new firmware:
> > 1. rtl_nic/rtl8168f-1.fw
> >    version: 0.0.2
> > 2. rtl_nic/rtl8168f-2.fw
> >    version: 0.0.2
> [...]
> 
> Applied.  (And please do send firmware files to me as well as 
> David; we can both commit to linux-firmware.git.)
> 

Fine. Thanks.
 
Best Regards,
Hayes

^ permalink raw reply

* 3.1-rc2: (on a second machine): WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x23f/0x250()
From: Justin Piszcz @ 2011-09-05 12:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, Alan Piszcz

Hello,

Kernel crash: 3.1-rc2:

Sep  5 08:28:36 l1 kernel: [1194854.704017] ------------[ cut here ]------------
Sep  5 08:28:36 l1 kernel: [1194854.704030] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x23f/0x250()
Sep  5 08:28:36 l1 kernel: [1194854.704033] Hardware name: P35-DS4
Sep  5 08:28:36 l1 kernel: [1194854.704036] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Sep  5 08:28:36 l1 kernel: [1194854.704039] Modules linked in: tcp_diag ub ppdev lp inet_diag w83795 ipmi_msghandler dm_mod nouveau ttm snd_hda_codec_realtek snd_hda_intel snd_hda_codec drm_kms_helper snd_pcm_oss snd_mixer_oss drm snd_pcm i2c_algo_bit mxm_wmi wmi intel_agp intel_gtt agpgart floppy snd_timer firewire_ohci firewire_core video snd soundcore snd_page_alloc parport_pc parport
Sep  5 08:28:36 l1 kernel: [1194854.704077] Pid: 19, comm: ksoftirqd/3 Not tainted 3.1.0-rc2 #1
Sep  5 08:28:36 l1 kernel: [1194854.704080] Call Trace:
Sep  5 08:28:36 l1 kernel: [1194854.704088]  [<ffffffff810372da>] warn_slowpath_common+0x7a/0xb0
Sep  5 08:28:36 l1 kernel: [1194854.704093]  [<ffffffff810373b1>] warn_slowpath_fmt+0x41/0x50
Sep  5 08:28:36 l1 kernel: [1194854.704099]  [<ffffffff815ee154>] ? schedule+0x2e4/0x950
Sep  5 08:28:36 l1 kernel: [1194854.704103]  [<ffffffff814fbc7f>] dev_watchdog+0x23f/0x250
Sep  5 08:28:36 l1 kernel: [1194854.704109]  [<ffffffff81043162>] run_timer_softirq+0xf2/0x220
Sep  5 08:28:36 l1 kernel: [1194854.704113]  [<ffffffff814fba40>] ? qdisc_reset+0x50/0x50
Sep  5 08:28:36 l1 kernel: Sep  5 08:31:47 l1 syslogd 1.5.0#6.1: restart (remote reception).

Thoughts?

Justin.

^ permalink raw reply

* Re: [PATCH 1/2] sfc: Remove requirement to set tx-usecs-irq for shared channels when modifying coalescing parameters
From: Ben Hutchings @ 2011-09-05 14:12 UTC (permalink / raw)
  To: Ripduman Sohan; +Cc: linux-net-drivers, shodgson, netdev
In-Reply-To: <1314784356-30619-1-git-send-email-ripduman.sohan@cl.cam.ac.uk>

On Wed, 2011-08-31 at 10:52 +0100, Ripduman Sohan wrote:
> Shared TX/RX channels possess a single channel timer controlled by the
> rx-usecs-irq parameter.  Changing coalescing parameters required
> explicitly setting the tx-usecs-irq parameter to 0.  Ethtool (to HEAD
> of tree) does not do this and instead retrieves and re-submits the
> current tx-usecs-irq value resulting in an unsupported operation
> error.  I found this behaviour counter-intuitive and was only able to
> work out correct moderation parameters by studying the driver code.
> 
> This patch relaxes the requirement to set tx-usecs-irq to 0 by only
> erring if the presented tx-usecs-irq value differs from the current
> value.  I acknowledge, however, that there may be existing scripts
> relying on the old behaviour and so this condition is only triggered
> if a value for tx-usecs-irq is actually presented.

After you first reminded me about this in email, I had a good look at
our ethtool coalescing control functions and tried to fix them
thoroughly.

Eli Cohen also recently queried about the semantics of the fields in
struct ethtool_coalesce
<http://thread.gmane.org/gmane.linux.network/202368>, so I updated the
kernel-doc comments for it (19e2f6f..a27fc96).

So I have some changes of my own in preparation, which I'll send as
follow-ups.

> ---
>  drivers/net/sfc/efx.c     |    6 +++---
>  drivers/net/sfc/efx.h     |    1 +
>  drivers/net/sfc/ethtool.c |    4 +++-
>  3 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/sfc/efx.c b/drivers/net/sfc/efx.c
> index faca764..9a313cd 100644
> --- a/drivers/net/sfc/efx.c
> +++ b/drivers/net/sfc/efx.c
> @@ -1556,7 +1556,7 @@ static void efx_remove_all(struct efx_nic *efx)
>   *
>   **************************************************************************/
>  
> -static unsigned irq_mod_ticks(int usecs, int resolution)
> +unsigned efx_irq_mod_ticks(int usecs, int resolution)
>  {
>  	if (usecs <= 0)
>  		return 0; /* cannot receive interrupts ahead of time :-) */
> @@ -1570,8 +1570,8 @@ void efx_init_irq_moderation(struct efx_nic *efx, int tx_usecs, int rx_usecs,
>  			     bool rx_adaptive)
>  {
>  	struct efx_channel *channel;
> -	unsigned tx_ticks = irq_mod_ticks(tx_usecs, EFX_IRQ_MOD_RESOLUTION);
> -	unsigned rx_ticks = irq_mod_ticks(rx_usecs, EFX_IRQ_MOD_RESOLUTION);
> +	unsigned tx_ticks = efx_irq_mod_ticks(tx_usecs, EFX_IRQ_MOD_RESOLUTION);
> +	unsigned rx_ticks = efx_irq_mod_ticks(rx_usecs, EFX_IRQ_MOD_RESOLUTION);
>  
>  	EFX_ASSERT_RESET_SERIALISED(efx);
>  

I would rather add a efx_get_irq_moderation() function for use in
ethtool.c.

> diff --git a/drivers/net/sfc/efx.h b/drivers/net/sfc/efx.h
> index b0d1209..ddfcc7e 100644
> --- a/drivers/net/sfc/efx.h
> +++ b/drivers/net/sfc/efx.h
> @@ -113,6 +113,7 @@ extern int efx_reset_up(struct efx_nic *efx, enum reset_type method, bool ok);
>  extern void efx_schedule_reset(struct efx_nic *efx, enum reset_type type);
>  extern void efx_init_irq_moderation(struct efx_nic *efx, int tx_usecs,
>  				    int rx_usecs, bool rx_adaptive);
> +extern unsigned efx_irq_mod_ticks(int usecs, int resolution);
>  
>  /* Dummy PHY ops for PHY drivers */
>  extern int efx_port_dummy_op_int(struct efx_nic *efx);
> diff --git a/drivers/net/sfc/ethtool.c b/drivers/net/sfc/ethtool.c
> index bc4643a..0a52447 100644
> --- a/drivers/net/sfc/ethtool.c
> +++ b/drivers/net/sfc/ethtool.c
> @@ -644,7 +644,9 @@ static int efx_ethtool_set_coalesce(struct net_device *net_dev,
>  	efx_for_each_channel(channel, efx) {
>  		if (efx_channel_has_rx_queue(channel) &&
>  		    efx_channel_has_tx_queues(channel) &&
> -		    tx_usecs) {
> +		    tx_usecs &&
> +		    efx_irq_mod_ticks(tx_usecs, EFX_IRQ_MOD_RESOLUTION) !=
> +		    channel->irq_moderation) {
>  			netif_err(efx, drv, efx->net_dev, "Channel is shared. "
>  				  "Only RX coalescing may be set\n");
>  			return -EOPNOTSUPP;

channel->irq_moderation will be the value selected by the adaptive
moderation algorithm.  efx->rx_irq_moderation is the appropriate value
to use here.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH] usbnet: ignore get interface retval of -EINPROGRESS
From: Jim Wylder @ 2011-09-05 16:29 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: netdev
In-Reply-To: <201109041216.12127.oliver@neukum.org>

When calling pm_runtime_get, usb_autopm_get_interface_async
treats a return value of -EINPROGRESS as a success and
increments the usage count.  Since the interface is resuming,
it is safe for usbnet_start_xmit to submit the urb.  If instead,
usbnet_start_xmit treats this as an error the packet will be
dropped.  Additionally, a corresponding tx_complete will not
run to offset the earlier increment of the usage count from the
call to usb_autopm_get_interface_async.

Signed-off-by: James Wylder <james.wylder@motorola.com>
---
 drivers/net/usb/usbnet.c |   16 +++++++++++++++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index ce395fe..7dcef05 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -81,6 +81,19 @@

 /*-------------------------------------------------------------------------*/

+/* When power management is configured, usb_autopm_get_interface_async()
+ * will return -EINPROGRESS to signify that the interface was already
+ * resuming at the time that the function was called.  In our case
+ * (all cases?) this not an error condition.
+ * */
+#ifdef CONFIG_PM
+#define USBNET_PM_INTF_ALREADY_RESUMING(x) (x == -EINPROGRESS)
+#else
+#define USBNET_PM_INTF_ALREADY_RESUMING(x) 0
+#endif
+
+/*-------------------------------------------------------------------------*/
+
 // randomly generated ethernet address
 static u8	node_id [ETH_ALEN];

@@ -1042,6 +1055,7 @@ EXPORT_SYMBOL_GPL(usbnet_tx_timeout);

 /*-------------------------------------------------------------------------*/

+
 netdev_tx_t usbnet_start_xmit (struct sk_buff *skb,
 				     struct net_device *net)
 {
@@ -1105,7 +1119,7 @@ netdev_tx_t usbnet_start_xmit (struct sk_buff *skb,

 	spin_lock_irqsave(&dev->txq.lock, flags);
 	retval = usb_autopm_get_interface_async(dev->intf);
-	if (retval < 0) {
+	if (retval < 0 && !USBNET_PM_INTF_ALREADY_RESUMING(retval)) {
 		spin_unlock_irqrestore(&dev->txq.lock, flags);
 		goto drop;
 	}
-- 
1.7.6

On Sun, Sep 4, 2011 at 5:16 AM, Oliver Neukum <oliver@neukum.org> wrote:
> Am Samstag, 3. September 2011, 19:32:44 schrieb Jim Wylder:
>
> Hi,
>
>> Thanks for the quick feedback.  True usbnet_start_xmit() could be
>> running at anytime, but the usb_autopm_get_interface_async() will only
>> return -EINPROGRESS when rpm_resume detects that
>> dev->power.runtime_status == RPM_RESUMING.  My understanding is that
>> for an asynchronous request, the promise that the device is resuming
>> would be equivalent to cases where usb_autopm_get_interface_async()
>> returns success.
>
> If CONFIG_PM is set.
>
>> In all other cases, when we are not attempting to resume an already
>> resuming interface, this change should have no impact.
>>
>> Are you recommending that I add an additional check for DEV_ASLEEP,
>
> The check is already there but depending on CONFIG_PM.
>
>> possibly to decide whether to drop the or continue on?  Or am I
>> missing your point?  I had not done anything similar because my
>> understanding was that knowing that the device is in fact resuming
>> would be sufficient.
>
> At that point it is sufficient. Upon further thought it looks like your check
> is correct.
> We just need to make very sure that for the decision to queue or
> to submit EVENT_DEV_ASLEEP is relevant.
> Would you consider defining a macro with a nice name for the
> ==0 || == -EINPROGRESS check, so that any reader knows what
> this is about? This is because I doubt only usbnet is affected.
>
>        Regards
>                Oliver
>

^ permalink raw reply related

* Re: [PATCH] usbnet: ignore get interface retval of -EINPROGRESS
From: Jim Wylder @ 2011-09-05 16:36 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: netdev
In-Reply-To: <CAPopfEXz+HPdwWh_GQ2WzMnHh5DuXTse78NAU=-iN15wv-0C5g@mail.gmail.com>

Oliver,

I have a couple of questions about this:

- Does the macro as implemented properly communicate the intent as you
requested?

- You made a comment about this being useful outside of usbnet.  The
only appropriate place that I could identify for this would be all the
way up in usb.h.  Do you think something at that level is more
appropriate?

Jim

^ permalink raw reply

* Re: [PATCH 1/2] bridge: leave carrier on for empty bridge
From: Nicolas de Pesloüan @ 2011-09-05 17:18 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David S. Miller, netdev
In-Reply-To: <20110904093634.685d7c56@nehalam.ftrdhcpuser.net>

Le 04/09/2011 18:36, Stephen Hemminger a écrit :
> On Sun, 04 Sep 2011 09:35:10 +0200
> Nicolas de Pesloüan<nicolas.2p.debian@gmail.com>  wrote:
>
>> Le 04/09/2011 06:14, Stephen Hemminger a écrit :
>>
>>>> Instead of asserting carrier when the bridge have no port, can't we assert carrier when the three
>>>> following condition are true at the same time :
>>>>
>>>> - The bridge have no port.
>>>> - At least one IP address is setup on the bridge.
>>>> - The two above conditions are true for more than a configurable amount of seconds, with a default
>>>> of 10, for example.
>>>>
>>>> This would only delay carrier on for a few seconds for the regression and keep the current behavior
>>>> (carrier off until at least 1 port is on) for DHCP.
>>>
>>> This fails on two counts:
>>> 1. Bridge's often run without IP addresses!
>>> 2. DHCP won't try and send out request until carrier is true.
>>
>> Sorry, I missed to say that we should of course also assert carrier on if one port has carrier on.
>>
>> And rethinking about it, the delay is probably useless :
>>
>> bridge_carrier_on = at_least_one_port_has_carrier_on | (bridge_has_no_port&  bridge_has_at_least_one_ip)
>>
>> That way :
>> - for those using bridge without any port, manually setting the IP will assert carrier on. (By the
>> way, why don't they use a dummy device instead?)
>>
>> - for those using bridge with ports:
>> -- Using any kind of autoconfig will work as expected. Carrier will only be asserted at the time
>> first port get carrier.
>> -- Using static IP confifiguration, carrier will possibly be erroneously reported as on during the
>> small time gap between IP address configuration and first port is added to the bridge. This time gap
>> may be removed by simply configuring the IP after the first port is added. This is probably already
>> true for most distribs. And anyway, this time gap is probably not a problem.
>> -- Carrier will also be erroneously reported as on after removing the last port, if the bridge still
>> has an IP. (But we can arrange for this not to happen).
>>
>> And in order to ensure user really understand why carrier is on of off, we can simply issue an INFO
>> message for the non-natural case (bridge_has_no_port&  bridga_has_at_least_one_ip).
>>
>> I consider all this reasonable.
>>
>> 	Nicolas.
>
> Any bridge behaviour based on IP address configuration is a
> layering violation and won't work.  The problem is related to dynamic issues
> with IPv6 and DHCP and needs to be addressed at that level.

Well, this is not a bridge behavior, this is a behavior of the local (non physical) interface of the 
bridge. If this interface were physical, it would have been external to the bridge (on one of the 
hosts connected to the bridge).  As such, this behavior is not related to the layering of the 
bridge, so I don't consider it a layering violation.

My proposed change doesn't impact the way the bridge works (forwards packets) in any way.

And it really solves both issues.

	Nicolas.

^ permalink raw reply

* [PATCH net-next 1/5] sfc: Correct error code for unsupported interrupt coalescing parameters
From: Ben Hutchings @ 2011-09-05 17:41 UTC (permalink / raw)
  To: David Miller, Ripduman Sohan; +Cc: netdev, linux-net-drivers
In-Reply-To: <1315231921.3028.11.camel@bwh-desktop>

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/ethtool.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ethtool.c b/drivers/net/ethernet/sfc/ethtool.c
index bc4643a..6de2715 100644
--- a/drivers/net/ethernet/sfc/ethtool.c
+++ b/drivers/net/ethernet/sfc/ethtool.c
@@ -628,12 +628,12 @@ static int efx_ethtool_set_coalesce(struct net_device *net_dev,
 	unsigned tx_usecs, rx_usecs, adaptive;
 
 	if (coalesce->use_adaptive_tx_coalesce)
-		return -EOPNOTSUPP;
+		return -EINVAL;
 
 	if (coalesce->rx_coalesce_usecs || coalesce->tx_coalesce_usecs) {
 		netif_err(efx, drv, efx->net_dev, "invalid coalescing setting. "
 			  "Only rx/tx_coalesce_usecs_irq are supported\n");
-		return -EOPNOTSUPP;
+		return -EINVAL;
 	}
 
 	rx_usecs = coalesce->rx_coalesce_usecs_irq;
@@ -647,7 +647,7 @@ static int efx_ethtool_set_coalesce(struct net_device *net_dev,
 		    tx_usecs) {
 			netif_err(efx, drv, efx->net_dev, "Channel is shared. "
 				  "Only RX coalescing may be set\n");
-			return -EOPNOTSUPP;
+			return -EINVAL;
 		}
 	}
 
-- 
1.7.4.4



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox