Netdev List

* SMSC 9303 support
From: Gary Thomas @ 2013-09-24 12:21 UTC (permalink / raw)
  To: netdev

I need to support the SMSC9303 in an embedded system.  I'm not
finding any [explicit] support for this device in the latest
mainline kernel.  Did I miss something?

To be clear, the SMSC9303 is a 3-port managed ethernet switch
capable of supporting 802.1D/802.1Q directly. This switch is
driven by a single MAC via MII/RMII and exposes the other two
ports via physical PHYs.  I need those two ports to behave like
two separate external devices.  My thinking is to treat them as
VLAN devices, since the switch can manage the routing.

Does this seem like a reasonable approach?
How do I "hook up" my normal ethernet driver to it?  To the hardware
it just looks like any other MII/RMII PHY.  The device is managed
separately via I2C.  I can have that set up separately if necessary.

Thanks for any pointers/ideas

-- 
------------------------------------------------------------
Gary Thomas                 |  Consulting for the
MLB Associates              |    Embedded world
------------------------------------------------------------

^ permalink raw reply

* Re: [pchecks v1 2/4] Use raw cpu ops for calls that would trigger with checks
From: Eric Dumazet @ 2013-09-24 12:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Christoph Lameter, Tejun Heo, akpm, Steven Rostedt, linux-kernel,
	Peter Zijlstra, netdev
In-Reply-To: <20130924073250.GD28538@gmail.com>

On Tue, 2013-09-24 at 09:32 +0200, Ingo Molnar wrote:
> (netdev Cc:-ed)
> 
> * Christoph Lameter <cl@linux.com> wrote:
> 
> > These locations triggered during testing with KVM.
> > 
> > These are fetches without preemption off where we judged that
> > to be more performance efficient or where other means of
> > providing synchronization (BH handling) are available.
> 
> > Index: linux/include/net/snmp.h
> > ===================================================================
> > --- linux.orig/include/net/snmp.h	2013-09-12 13:26:29.216103951 -0500
> > +++ linux/include/net/snmp.h	2013-09-12 13:26:29.208104037 -0500
> > @@ -126,7 +126,7 @@ struct linux_xfrm_mib {
> >  	extern __typeof__(type) __percpu *name[SNMP_ARRAY_SZ]
> >  
> >  #define SNMP_INC_STATS_BH(mib, field)	\
> > -			__this_cpu_inc(mib[0]->mibs[field])
> > +			raw_cpu_inc(mib[0]->mibs[field])
> >  
> >  #define SNMP_INC_STATS_USER(mib, field)	\
> >  			this_cpu_inc(mib[0]->mibs[field])
> > @@ -141,7 +141,7 @@ struct linux_xfrm_mib {
> >  			this_cpu_dec(mib[0]->mibs[field])
> >  
> >  #define SNMP_ADD_STATS_BH(mib, field, addend)	\
> > -			__this_cpu_add(mib[0]->mibs[field], addend)
> > +			raw_cpu_add(mib[0]->mibs[field], addend)
> 
> Are the networking folks fine with allowing unsafe operations on SNMP stats 
> in preemptible sections, or should the kernel produce an optional warning 
> message if CONFIG_PREEMPT_DEBUG=y and these ops are used in preemptible 
> (non-bh, non-irq-handler, non-irqs-off, etc.) sections?
> 
> RAW_SNMP_*_STATS() ops could be used to annotate those places where that 
> kind of usage is safe.


I would rather not use a RAW_ prefix in the macro, but add a debugging
check to make sure we use the _BH() variant in the right context.

BUG_ON(!in_softirq())

^ permalink raw reply

* Re: [PATCH] ptp: add range check on n_samples
From: Richard Cochran @ 2013-09-24 12:50 UTC (permalink / raw)
  To: Dong Zhu; +Cc: David Miller, netdev, linux-kernel
In-Reply-To: <20130924070557.GA28795@zhudong.nay.redhat.com>

On Tue, Sep 24, 2013 at 03:05:57PM +0800, Dong Zhu wrote:
> From d4eb97e8d5def76d46167c91059147e2c7d33433 Mon Sep 17 00:00:00 2001
> 
> When using the PTP_SYS_OFFSET ioctl to measure the time offset between the
> PHC and the system clock, we need to specify the number of measurements.
> The valid range of n_samples is 1 to 25; a value <= 0 or > 25 makes no
> sense, so this patch adds a range check.

The field, n_samples, is unsigned, so the check is not needed.

Thanks,
Richard
 
> Signed-off-by: Dong Zhu <bluezhudong@gmail.com>
> ---
>  drivers/ptp/ptp_chardev.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
> index 34a0c60..4e85b23 100644
> --- a/drivers/ptp/ptp_chardev.c
> +++ b/drivers/ptp/ptp_chardev.c
> @@ -104,7 +104,8 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
>  			err = -EFAULT;
>  			break;
>  		}
> -		if (sysoff->n_samples > PTP_MAX_SAMPLES) {
> +		if (sysoff->n_samples <= 0 ||
> +		    sysoff->n_samples > PTP_MAX_SAMPLES) {
>  			err = -EINVAL;
>  			break;
>  		}
> -- 
> 1.7.11.7
> 
> -- 
> Best Regards,
> Dong Zhu
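
Richard's point about the unsigned field can be checked in plain userspace C. The sketch below is only an illustration: MAX_SAMPLES stands in for PTP_MAX_SAMPLES (the value 25 is taken from the commit message above), and the function names are hypothetical. It suggests that a negative count from userspace arrives as a very large unsigned value, which the existing upper-bound test already rejects, so the two checks can differ only for n_samples == 0.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SAMPLES 25	/* stand-in for PTP_MAX_SAMPLES, value from the commit message */

/* Check as it stands before the patch (hypothetical name). */
bool rejected_before(unsigned int n_samples)
{
	return n_samples > MAX_SAMPLES;
}

/* Check as proposed; on an unsigned value "<= 0" can only mean "== 0". */
bool rejected_after(unsigned int n_samples)
{
	return n_samples == 0 || n_samples > MAX_SAMPLES;
}

/* Self-check: a "negative" count wraps to a huge value and is caught either way. */
void check_range_logic(void)
{
	assert(rejected_before((unsigned int)-1));
	assert(rejected_after((unsigned int)-1));
	assert(!rejected_before(0) && rejected_after(0));
}
```

Whether a count of zero should be an error is not settled in this thread; the sketch only shows where the two forms differ.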

^ permalink raw reply

* [PATCH] net: net_secret should not depend on TCP
From: Eric Dumazet @ 2013-09-24 13:19 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: Tom Herbert, davem, netdev, jesse.brandeburg
In-Reply-To: <20130924054532.GA24446@order.stressinduktion.org>

From: Eric Dumazet <edumazet@google.com>

A host might need net_secret[] and never open a single socket. 

Problem added in commit aebda156a570782
("net: defer net_secret[] initialization")

Based on prior patch from Hannes Frederic Sowa.

Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/secure_seq.h |    1 -
 net/core/secure_seq.c    |   27 ++++++++++++++++++++++++---
 net/ipv4/af_inet.c       |    4 +---
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/include/net/secure_seq.h b/include/net/secure_seq.h
index 6ca975b..c2e542b 100644
--- a/include/net/secure_seq.h
+++ b/include/net/secure_seq.h
@@ -3,7 +3,6 @@
 
 #include <linux/types.h>
 
-extern void net_secret_init(void);
 extern __u32 secure_ip_id(__be32 daddr);
 extern __u32 secure_ipv6_id(const __be32 daddr[4]);
 extern u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport);
diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index 6a2f13c..3f1ec15 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -10,11 +10,24 @@
 
 #include <net/secure_seq.h>
 
-static u32 net_secret[MD5_MESSAGE_BYTES / 4] ____cacheline_aligned;
+#define NET_SECRET_SIZE (MD5_MESSAGE_BYTES / 4)
 
-void net_secret_init(void)
+static u32 net_secret[NET_SECRET_SIZE] ____cacheline_aligned;
+
+static void net_secret_init(void)
 {
-	get_random_bytes(net_secret, sizeof(net_secret));
+	u32 tmp;
+	int i;
+
+	if (likely(net_secret[0]))
+		return;
+
+	for (i = NET_SECRET_SIZE; i > 0;) {
+		do {
+			get_random_bytes(&tmp, sizeof(tmp));
+		} while (!tmp);
+		cmpxchg(&net_secret[--i], 0, tmp);
+	}
 }
 
 #ifdef CONFIG_INET
@@ -42,6 +55,7 @@ __u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
 	u32 hash[MD5_DIGEST_WORDS];
 	u32 i;
 
+	net_secret_init();
 	memcpy(hash, saddr, 16);
 	for (i = 0; i < 4; i++)
 		secret[i] = net_secret[i] + (__force u32)daddr[i];
@@ -63,6 +77,7 @@ u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
 	u32 hash[MD5_DIGEST_WORDS];
 	u32 i;
 
+	net_secret_init();
 	memcpy(hash, saddr, 16);
 	for (i = 0; i < 4; i++)
 		secret[i] = net_secret[i] + (__force u32) daddr[i];
@@ -82,6 +97,7 @@ __u32 secure_ip_id(__be32 daddr)
 {
 	u32 hash[MD5_DIGEST_WORDS];
 
+	net_secret_init();
 	hash[0] = (__force __u32) daddr;
 	hash[1] = net_secret[13];
 	hash[2] = net_secret[14];
@@ -96,6 +112,7 @@ __u32 secure_ipv6_id(const __be32 daddr[4])
 {
 	__u32 hash[4];
 
+	net_secret_init();
 	memcpy(hash, daddr, 16);
 	md5_transform(hash, net_secret);
 
@@ -107,6 +124,7 @@ __u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
 {
 	u32 hash[MD5_DIGEST_WORDS];
 
+	net_secret_init();
 	hash[0] = (__force u32)saddr;
 	hash[1] = (__force u32)daddr;
 	hash[2] = ((__force u16)sport << 16) + (__force u16)dport;
@@ -121,6 +139,7 @@ u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport)
 {
 	u32 hash[MD5_DIGEST_WORDS];
 
+	net_secret_init();
 	hash[0] = (__force u32)saddr;
 	hash[1] = (__force u32)daddr;
 	hash[2] = (__force u32)dport ^ net_secret[14];
@@ -140,6 +159,7 @@ u64 secure_dccp_sequence_number(__be32 saddr, __be32 daddr,
 	u32 hash[MD5_DIGEST_WORDS];
 	u64 seq;
 
+	net_secret_init();
 	hash[0] = (__force u32)saddr;
 	hash[1] = (__force u32)daddr;
 	hash[2] = ((__force u16)sport << 16) + (__force u16)dport;
@@ -164,6 +184,7 @@ u64 secure_dccpv6_sequence_number(__be32 *saddr, __be32 *daddr,
 	u64 seq;
 	u32 i;
 
+	net_secret_init();
 	memcpy(hash, saddr, 16);
 	for (i = 0; i < 4; i++)
 		secret[i] = net_secret[i] + daddr[i];
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 7a1874b..cfeb85c 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -263,10 +263,8 @@ void build_ehash_secret(void)
 		get_random_bytes(&rnd, sizeof(rnd));
 	} while (rnd == 0);
 
-	if (cmpxchg(&inet_ehash_secret, 0, rnd) == 0) {
+	if (cmpxchg(&inet_ehash_secret, 0, rnd) == 0)
 		get_random_bytes(&ipv6_hash_secret, sizeof(ipv6_hash_secret));
-		net_secret_init();
-	}
 }
 EXPORT_SYMBOL(build_ehash_secret);
 

^ permalink raw reply related
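
The lazy initialization in the net_secret patch above can be sketched in userspace with C11 atomics. This is an illustration under stated assumptions, not the kernel code: SECRET_WORDS, secret_init(), secret_word() and weak_random_u32() are hypothetical names, rand() is only a placeholder for get_random_bytes(), and atomic_compare_exchange_strong() plays the role of cmpxchg(). As in the patch, words are filled from the highest index down, so a nonzero word 0 implies that every word has been set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define SECRET_WORDS 16	/* assumed size; the patch uses MD5_MESSAGE_BYTES / 4 */

static _Atomic uint32_t secret[SECRET_WORDS];

/* Placeholder entropy source for the sketch; NOT suitable for real secrets. */
static uint32_t weak_random_u32(void)
{
	return (uint32_t)rand();
}

void secret_init(void)
{
	int i;

	/* Word 0 is filled last, so nonzero here means all words are set. */
	if (atomic_load(&secret[0]))
		return;

	for (i = SECRET_WORDS; i > 0;) {
		uint32_t expected = 0;
		uint32_t tmp;

		do {
			tmp = weak_random_u32();
		} while (!tmp);
		/* Only the first writer wins; a later caller leaves the word alone. */
		atomic_compare_exchange_strong(&secret[--i], &expected, tmp);
	}
}

/* Every reader initializes on demand, as the secure_*() helpers do in the patch. */
uint32_t secret_word(int i)
{
	assert(i >= 0 && i < SECRET_WORDS);
	secret_init();
	return atomic_load(&secret[i]);
}
```

Calling the init routine from every reader removes the dependency on any one protocol having been used first, at the cost of one load on the fast path.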

* Re: [PATCH net 0/4] bridge: Fix problems around the PVID
From: Vlad Yasevich @ 2013-09-24 13:35 UTC (permalink / raw)
  To: Toshiaki Makita
  Cc: Toshiaki Makita, David Miller, netdev, Fernando Luis Vazquez Cao,
	Patrick McHardy
In-Reply-To: <1380023107.3162.53.camel@ubuntu-vm-makita>

On 09/24/2013 07:45 AM, Toshiaki Makita wrote:
> On Mon, 2013-09-23 at 10:41 -0400, Vlad Yasevich wrote:
>> On 09/17/2013 04:12 AM, Toshiaki Makita wrote:
>>> On Mon, 2013-09-16 at 13:49 -0400, Vlad Yasevich wrote:
>>>> On 09/13/2013 08:06 AM, Toshiaki Makita wrote:
>>>>> On Thu, 2013-09-12 at 16:00 -0400, David Miller wrote:
>>>>>> From: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
>>>>>> Date: Tue, 10 Sep 2013 19:27:54 +0900
>>>>>>
>>>>>>> There seem to be some undesirable behaviors related with PVID.
>>>>>>> 1. It has no effect assigning PVID to a port. PVID cannot be applied
>>>>>>> to any frame regardless of whether we set it or not.
>>>>>>> 2. FDB entries learned via frames applied PVID are registered with
>>>>>>> VID 0 rather than VID value of PVID.
>>>>>>> 3. We can set 0 or 4095 as a PVID that are not allowed in IEEE 802.1Q.
>>>>>>> This leads interoperational problems such as sending frames with VID
>>>>>>> 4095, which is not allowed in IEEE 802.1Q, and treating frames with VID
>>>>>>> 0 as they belong to VLAN 0, which is expected to be handled as they have
>>>>>>> no VID according to IEEE 802.1Q.
>>>>>>>
>>>>>>> Note: 2nd and 3rd problems are potential and not exposed unless 1st problem
>>>>>>> is fixed, because we cannot activate PVID due to it.
>>>>>>
>>>>>> Please work out the issues in patch #2 with Vlad and resubmit this
>>>>>> series.
>>>>>>
>>>>>> Thank you.
>>>>>
>>>>> I'm torn between fixing the issue by changing the vlan 0 interface
>>>>> behavior in the 8021q module, enabling a bridge port to send
>>>>> priority-tagged frames, or some other, better way.
>>>>>
>>>>> If you could comment on it, I'd appreciate it :)
>>>>>
>>>>>
>>>>> BTW, I think what is discussed in patch #2 is another problem about
>>>>> handling priority-tags, and it exists without this patch set applied.
>>>>> It looks like that we should prepare another patch set than this to fix
>>>>> that problem.
>>>>>
>>>>> Should I include patches that fix the priority-tags problem in this
>>>>> patch set and resubmit them all together?
>>>>>
>>>>
>>>> I am thinking that we might need to do it in bridge and it looks like
>>>> the simplest way to do it is to have default priority regeneration table
>>>> (table 6-5 from 802.1Q doc).
>>>>
>>>> That way I think we would conform to the spec.
>>>>
>>>> -vlad
>>>
>>> Unfortunately I don't think the default priority regeneration table
>>> resolves the problem because IEEE 802.1Q says that a VLAN-aware bridge
>>> can transmit untagged or VLAN-tagged frames only (the end of section 7.5
>>> and 8.1.7).
>>>
>>> No mechanism to send priority-tagged frames is found as far as I can see
>>> the standard. I think the regenerated priority is used for outgoing PCP
>>> field only if egress policy is not untagged (i.e. transmitting as
>>> VLAN-tagged), and unused if untagged (Section 6.9.2 3rd/4th Paragraph).
>>>
>>> If we want to transmit priority-tagged frames from a bridge port, I
>>> think we need to implement a new (optional) feature that is above the
>>> standard, as I stated previously.
>>>
>>> How do you feel about adding a per-port policy that enables a bridge to
>>> send priority-tagged frames instead of untagged frames when egress
>>> policy for the port is untagged?
>>> With this change, we can transmit frames for a given vlan as either all
>>> untagged, all priority-tagged or all VLAN-tagged.
>>
>> That would work.  What I am thinking is that we do it by special casing
>> the vid 0 egress policy specification.  Let it be untagged by default
>> and if it is tagged, then we preserve the priority field and forward
>> it on.
>>
>> This keeps the API stable and doesn't require the user/admin to know
>> exactly what happens.  Default operation conforms to the spec and allows
>> a simple change to make it backward-compatible.
>>
>> What do you think?  I've done a simple prototype of this and it seems to
>> work with the VMs I am testing with.
>
> Are you saying that
> - by default, set the 0th bit of untagged_bitmap; and
> - if we unset the 0th bit and set the "vid"th bit, we transmit frames
> classified as belonging to VLAN "vid" as priority-tagged?
>
> If so, though it's attractive to keep the current API, I'm worried that
> it could be confusing and unintuitive for kernel/iproute2
> developers that VID 0 has a special meaning only in the egress policy.
> Wouldn't it be better to add a new member to struct net_port_vlans
> instead of using VID 0 of untagged_bitmap?
>
> Or are you saying that we use a new flag in struct net_port_vlans but
> use the BRIDGE_VLAN_INFO_UNTAGGED bit with VID 0 in netlink to set the
> flag?
>
> Even in that case, I'm afraid that it might be confusing for developers
> for the same reason. We are going to prohibit specifying VID 0 (and
> 4095) when adding/deleting an FDB entry or a vlan filtering entry, but it
> would allow us to use VID 0 only when a vlan filtering entry is
> configured.
> I think a new nlattr is the straightforward way to configure
> it.

Making this an explicit attribute makes vid 0 a special case for
any automatic tool that provisions such filtering.  On seeing vid 0,
these tools would have to know to translate it into a different
attribute instead of setting the policy values.

How it is implemented internally in the kernel isn't as big of an issue.
We can do it as a separate flag or as part of existing policy.

-vlad

>
> Thanks,
>
> Toshiaki Makita
>
>>
>> -vlad
>>
>>>
>>> Thanks,
>>>
>>> Toshiaki Makita
>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Toshiaki Makita
>>>>>
>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>
>

^ permalink raw reply

* [PATCH net-next v3 0/2] ipv4: per-datagram IP_TOS and IP_TTL via sendmsg()
From: Francesco Fusco @ 2013-09-24 13:43 UTC (permalink / raw)
  To: davem; +Cc: netdev

There is no way to set the IP_TOS field on a per-packet basis in IPv4, while
IPv6 has such a mechanism. Therefore one has to fall back to the setsockopt()
in case of IPv4. 

Using the existing per-socket option is inconvenient, particularly when
multiple threads share the same socket but need per-thread TOS values.
In that case every sendmsg() would have to be preceded by a setsockopt()
call.
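
From userspace, the per-datagram interface described above might be used roughly as follows. This is a minimal sketch, assuming a kernel with this series applied; set_tos_cmsg() is a hypothetical helper, not part of the patches. It attaches a single IP_TOS control message carrying an int, which is the payload size the series checks for.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Fill msg so that it carries one IP_TOS control message.
 * buf must be suitably aligned and provide at least
 * CMSG_SPACE(sizeof(int)) bytes. (Hypothetical helper.)
 */
void set_tos_cmsg(struct msghdr *msg, void *buf, size_t buflen, int tos)
{
	struct cmsghdr *cmsg;

	assert(buflen >= CMSG_SPACE(sizeof(int)));
	memset(buf, 0, buflen);
	msg->msg_control = buf;
	msg->msg_controllen = CMSG_SPACE(sizeof(int));

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = IPPROTO_IP;
	cmsg->cmsg_type = IP_TOS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &tos, sizeof(tos));
}
```

The caller would set msg_name and msg_iov as usual and pass the msghdr to sendmsg(); each thread can then pick its own TOS without touching the shared socket state.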

Francesco Fusco (2):
  ipv4: IP_TOS and IP_TTL can be specified as ancillary data
  ipv4: processing ancillary IP_TOS or IP_TTL

 include/net/inet_sock.h |  3 +++
 include/net/ip.h        | 14 ++++++++++++++
 include/net/route.h     |  1 +
 net/ipv4/icmp.c         |  5 +++++
 net/ipv4/ip_output.c    | 13 ++++++++++---
 net/ipv4/ip_sockglue.c  | 20 +++++++++++++++++++-
 net/ipv4/ping.c         |  4 +++-
 net/ipv4/raw.c          |  4 +++-
 net/ipv4/udp.c          |  4 +++-
 9 files changed, 61 insertions(+), 7 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* [PATCH net-next v3 1/2] ipv4: IP_TOS and IP_TTL can be specified as ancillary data
From: Francesco Fusco @ 2013-09-24 13:43 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <cover.1379944641.git.ffusco@redhat.com>

This patch enables the IP_TTL and IP_TOS values passed from userspace to
be stored in the ipcm_cookie struct. Three fields are added to the struct:

- the TTL, expressed as __u8.
  The allowed values are in the range [1,255].
  A value of 0 means that the TTL is not specified.

- the TOS, expressed as __s16.
  The allowed values are in the range [0,255].
  A value of -1 means that the TOS is not specified.

- the priority, expressed as a char and computed when
  handling the ancillary data.

Signed-off-by: Francesco Fusco <ffusco@redhat.com>
---
 v1->v2
  - changed the ipcm_cookie ttl field from __s16 to __u8.
    A value of 0 means that the TTL has not been specified
  - the tos field is still __s16. The user can specify
    values in the range 0-255 inclusive, therefore I use
    a value of -1 as a flag saying that the value has
    not been specified
  - the priority is now a char instead of a __u32,
    which is the return type of rt_tos2priority
  - improved commit message
 v2->v3
  - no code changes, rebase to net-next

 include/net/ip.h       |  3 +++
 net/ipv4/ip_sockglue.c | 20 +++++++++++++++++++-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index c1f192b..0135f38 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -56,6 +56,9 @@ struct ipcm_cookie {
 	int			oif;
 	struct ip_options_rcu	*opt;
 	__u8			tx_flags;
+	__u8			ttl;
+	__s16			tos;
+	char			priority;
 };
 
 #define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb))
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index d9c4f11..56e3445 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -189,7 +189,7 @@ EXPORT_SYMBOL(ip_cmsg_recv);
 
 int ip_cmsg_send(struct net *net, struct msghdr *msg, struct ipcm_cookie *ipc)
 {
-	int err;
+	int err, val;
 	struct cmsghdr *cmsg;
 
 	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
@@ -215,6 +215,24 @@ int ip_cmsg_send(struct net *net, struct msghdr *msg, struct ipcm_cookie *ipc)
 			ipc->addr = info->ipi_spec_dst.s_addr;
 			break;
 		}
+		case IP_TTL:
+			if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
+				return -EINVAL;
+			val = *(int *)CMSG_DATA(cmsg);
+			if (val < 1 || val > 255)
+				return -EINVAL;
+			ipc->ttl = val;
+			break;
+		case IP_TOS:
+			if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
+				return -EINVAL;
+			val = *(int *)CMSG_DATA(cmsg);
+			if (val < 0 || val > 255)
+				return -EINVAL;
+			ipc->tos = val;
+			ipc->priority = rt_tos2priority(ipc->tos);
+			break;
+
 		default:
 			return -EINVAL;
 		}
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next v3 2/2] ipv4: processing ancillary IP_TOS or IP_TTL
From: Francesco Fusco @ 2013-09-24 13:43 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <cover.1379944641.git.ffusco@redhat.com>

If IP_TOS or IP_TTL are specified as ancillary data, then sendmsg() sends out
packets with the specified TTL or TOS overriding the socket values specified
with the traditional setsockopt().

The struct inet_cork stores the values of TOS, TTL and priority that are
passed through the struct ipcm_cookie. If there are user-specified TOS
(tos != -1) or TTL (ttl != 0) in the struct ipcm_cookie, these values are
used to override the per-socket values. In case of TOS also the priority
is changed accordingly.

Two helper functions get_rttos and get_rtconn_flags are defined to take
into account the presence of a user specified TOS value when computing
RT_TOS and RT_CONN_FLAGS.

Signed-off-by: Francesco Fusco <ffusco@redhat.com>
---
 v1->v2
  - reworked the entire patch
  - modified the ttl field in the struct inet_cork from __s16 to __u8:
    0 means that the TTL is not specified
  - the tos field in the struct inet_cork is still __s16: 
    -1 means that the tos is not set
  - modified the priority field in the struct inet_cork from __u32 to 
    char.
  - introduced the get_rttos and get_rtconn_flags functions
 v2->v3
  - no code changes, rebase to net-next

 include/net/inet_sock.h |  3 +++
 include/net/ip.h        | 11 +++++++++++
 include/net/route.h     |  1 +
 net/ipv4/icmp.c         |  5 +++++
 net/ipv4/ip_output.c    | 13 ++++++++++---
 net/ipv4/ping.c         |  4 +++-
 net/ipv4/raw.c          |  4 +++-
 net/ipv4/udp.c          |  4 +++-
 8 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 636d203..f314177 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -103,6 +103,9 @@ struct inet_cork {
 	int			length; /* Total length of all frames */
 	struct dst_entry	*dst;
 	u8			tx_flags;
+	__u8			ttl;
+	__s16			tos;
+	char			priority;
 };
 
 struct inet_cork_full {
diff --git a/include/net/ip.h b/include/net/ip.h
index 0135f38..77b4f9b 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -28,6 +28,7 @@
 #include <linux/skbuff.h>
 
 #include <net/inet_sock.h>
+#include <net/route.h>
 #include <net/snmp.h>
 #include <net/flow.h>
 
@@ -140,6 +141,16 @@ static inline struct sk_buff *ip_finish_skb(struct sock *sk, struct flowi4 *fl4)
 	return __ip_make_skb(sk, fl4, &sk->sk_write_queue, &inet_sk(sk)->cork.base);
 }
 
+static inline __u8 get_rttos(struct ipcm_cookie* ipc, struct inet_sock *inet)
+{
+	return (ipc->tos != -1) ? RT_TOS(ipc->tos) : RT_TOS(inet->tos);
+}
+
+static inline __u8 get_rtconn_flags(struct ipcm_cookie* ipc, struct sock* sk)
+{
+	return (ipc->tos != -1) ? RT_CONN_FLAGS_TOS(sk, ipc->tos) : RT_CONN_FLAGS(sk);
+}
+
 /* datagram.c */
 int ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len);
 
diff --git a/include/net/route.h b/include/net/route.h
index 6f572ca..0ad8e01 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -39,6 +39,7 @@
 #define RTO_ONLINK	0x01
 
 #define RT_CONN_FLAGS(sk)   (RT_TOS(inet_sk(sk)->tos) | sock_flag(sk, SOCK_LOCALROUTE))
+#define RT_CONN_FLAGS_TOS(sk,tos)   (RT_TOS(tos) | sock_flag(sk, SOCK_LOCALROUTE))
 
 struct fib_nh;
 struct fib_info;
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 5f7d11a..5c0e8bc 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -353,6 +353,9 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 	saddr = fib_compute_spec_dst(skb);
 	ipc.opt = NULL;
 	ipc.tx_flags = 0;
+	ipc.ttl = 0;
+	ipc.tos = -1;
+
 	if (icmp_param->replyopts.opt.opt.optlen) {
 		ipc.opt = &icmp_param->replyopts.opt;
 		if (ipc.opt->opt.srr)
@@ -608,6 +611,8 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 	ipc.addr = iph->saddr;
 	ipc.opt = &icmp_param->replyopts.opt;
 	ipc.tx_flags = 0;
+	ipc.ttl = 0;
+	ipc.tos = -1;
 
 	rt = icmp_route_lookup(net, &fl4, skb_in, iph, saddr, tos,
 			       type, code, icmp_param);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index a04d872..7d8357b 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1060,6 +1060,9 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
 			 rt->dst.dev->mtu : dst_mtu(&rt->dst);
 	cork->dst = &rt->dst;
 	cork->length = 0;
+	cork->ttl = ipc->ttl;
+	cork->tos = ipc->tos;
+	cork->priority = ipc->priority;
 	cork->tx_flags = ipc->tx_flags;
 
 	return 0;
@@ -1311,7 +1314,9 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
 	if (cork->flags & IPCORK_OPT)
 		opt = cork->opt;
 
-	if (rt->rt_type == RTN_MULTICAST)
+	if (cork->ttl != 0)
+		ttl = cork->ttl;
+	else if (rt->rt_type == RTN_MULTICAST)
 		ttl = inet->mc_ttl;
 	else
 		ttl = ip_select_ttl(inet, &rt->dst);
@@ -1319,7 +1324,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
 	iph = ip_hdr(skb);
 	iph->version = 4;
 	iph->ihl = 5;
-	iph->tos = inet->tos;
+	iph->tos = (cork->tos != -1) ? cork->tos : inet->tos;
 	iph->frag_off = df;
 	iph->ttl = ttl;
 	iph->protocol = sk->sk_protocol;
@@ -1331,7 +1336,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
 		ip_options_build(skb, opt, cork->addr, rt, 0);
 	}
 
-	skb->priority = sk->sk_priority;
+	skb->priority = (cork->tos != -1) ? cork->priority: sk->sk_priority;
 	skb->mark = sk->sk_mark;
 	/*
 	 * Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec
@@ -1481,6 +1486,8 @@ void ip_send_unicast_reply(struct net *net, struct sk_buff *skb, __be32 daddr,
 	ipc.addr = daddr;
 	ipc.opt = NULL;
 	ipc.tx_flags = 0;
+	ipc.ttl = 0;
+	ipc.tos = -1;
 
 	if (replyopts.opt.opt.optlen) {
 		ipc.opt = &replyopts.opt;
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index d7d9882..706d108e 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -713,6 +713,8 @@ int ping_v4_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	ipc.opt = NULL;
 	ipc.oif = sk->sk_bound_dev_if;
 	ipc.tx_flags = 0;
+	ipc.ttl = 0;
+	ipc.tos = -1;
 
 	sock_tx_timestamp(sk, &ipc.tx_flags);
 
@@ -744,7 +746,7 @@ int ping_v4_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 			return -EINVAL;
 		faddr = ipc.opt->opt.faddr;
 	}
-	tos = RT_TOS(inet->tos);
+	tos = get_rttos(&ipc, inet);
 	if (sock_flag(sk, SOCK_LOCALROUTE) ||
 	    (msg->msg_flags & MSG_DONTROUTE) ||
 	    (ipc.opt && ipc.opt->opt.is_strictroute)) {
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index bfec521..a3fe534 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -517,6 +517,8 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	ipc.addr = inet->inet_saddr;
 	ipc.opt = NULL;
 	ipc.tx_flags = 0;
+	ipc.ttl = 0;
+	ipc.tos = -1;
 	ipc.oif = sk->sk_bound_dev_if;
 
 	if (msg->msg_controllen) {
@@ -556,7 +558,7 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 			daddr = ipc.opt->opt.faddr;
 		}
 	}
-	tos = RT_CONN_FLAGS(sk);
+	tos = get_rtconn_flags(&ipc, sk);
 	if (msg->msg_flags & MSG_DONTROUTE)
 		tos |= RTO_ONLINK;
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 74d2c95..22462d94 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -855,6 +855,8 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 
 	ipc.opt = NULL;
 	ipc.tx_flags = 0;
+	ipc.ttl = 0;
+	ipc.tos = -1;
 
 	getfrag = is_udplite ? udplite_getfrag : ip_generic_getfrag;
 
@@ -938,7 +940,7 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		faddr = ipc.opt->opt.faddr;
 		connected = 0;
 	}
-	tos = RT_TOS(inet->tos);
+	tos = get_rttos(&ipc, inet);
 	if (sock_flag(sk, SOCK_LOCALROUTE) ||
 	    (msg->msg_flags & MSG_DONTROUTE) ||
 	    (ipc.opt && ipc.opt->opt.is_strictroute)) {
-- 
1.8.3.1
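
The sentinel convention used by the two patches above (ttl 0 and tos -1 both mean "not specified") can be sketched outside the kernel. The struct and function names below are hypothetical; only the value ranges and the fallback rule are taken from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of the per-message fields added by the series. */
struct msg_opts {
	uint8_t ttl;	/* 0 means "not specified" */
	int16_t tos;	/* -1 means "not specified" */
};

/* Store a TTL from ancillary data; returns 0 on success, -1 if out of range. */
int opts_set_ttl(struct msg_opts *o, int val)
{
	assert(o != NULL);
	if (val < 1 || val > 255)
		return -1;
	o->ttl = (uint8_t)val;
	return 0;
}

/* Store a TOS from ancillary data; returns 0 on success, -1 if out of range. */
int opts_set_tos(struct msg_opts *o, int val)
{
	assert(o != NULL);
	if (val < 0 || val > 255)
		return -1;
	o->tos = (int16_t)val;
	return 0;
}

/* A per-message value overrides the socket value only when it was set. */
uint8_t effective_ttl(const struct msg_opts *o, uint8_t sock_ttl)
{
	return o->ttl != 0 ? o->ttl : sock_ttl;
}

uint8_t effective_tos(const struct msg_opts *o, uint8_t sock_tos)
{
	return o->tos != -1 ? (uint8_t)o->tos : sock_tos;
}
```

A signed 16-bit tos is needed because 0 is itself a valid TOS, whereas a TTL of 0 is never valid and can double as the "unset" marker in a single byte.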

^ permalink raw reply related

* Re: [PATCH 01/10] can: Remove extern from function prototypes
From: David Miller @ 2013-09-24 14:02 UTC (permalink / raw)
  To: mkl; +Cc: joe, netdev, linux-kernel, wg, linux-can
In-Reply-To: <52414128.8030706@pengutronix.de>

From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Tue, 24 Sep 2013 09:37:12 +0200

> On 09/24/2013 12:11 AM, Joe Perches wrote:
>> There are a mix of function prototypes with and without extern
>> in the kernel sources.  Standardize on not using extern for
>> function prototypes.
>> 
>> Function prototypes don't need to be written with extern.
>> extern is assumed by the compiler.  Its use is as unnecessary as
>> using auto to declare automatic/local variables in a block.
>> 
>> Signed-off-by: Joe Perches <joe@perches.com>
> 
> Thx, added to linux-can-next. The patch will be included in the next
> pull request to David.

Marc, I'm trying to just quickly apply these all into my tree directly.

Thanks.

^ permalink raw reply

* Re: [PATCH 01/10] can: Remove extern from function prototypes
From: David Miller @ 2013-09-24 14:11 UTC (permalink / raw)
  To: joe; +Cc: netdev, linux-kernel, wg, mkl, linux-can
In-Reply-To: <5570169a078375fa8662adeb2a7f24c1ae718bfb.1379974101.git.joe@perches.com>


Series applied, thanks Joe.

^ permalink raw reply

* Re: [net 5/6] i40e: better return values
From: David Miller @ 2013-09-24 14:12 UTC (permalink / raw)
  To: joe; +Cc: jeffrey.t.kirsher, jesse.brandeburg, netdev, gospo, sassmann
In-Reply-To: <1380022486.3575.74.camel@joe-AO722>

From: Joe Perches <joe@perches.com>
Date: Tue, 24 Sep 2013 04:34:46 -0700

> On Tue, 2013-09-24 at 02:45 -0700, Jeff Kirsher wrote:
> 
>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> []
>> @@ -3339,9 +3345,7 @@ static u8 i40e_dcb_get_num_tc(struct i40e_dcbx_config *dcbcfg)
>>  	/* Traffic class index starts from zero so
>>  	 * increment to return the actual count
>>  	 */
>> -	num_tc++;
>> -
>> -	return num_tc;
>> +	return num_tc++;
> 
> Ick.  post_increment problem.
> 
> 	return ++num_tc;
> 
> There's nothing wrong with the original code
> unless this is a bugfix which should be documented
> better than "better return values".

Agreed, this style of coding is asking for a bug.

If you want to return "num_tc PLUS ONE" just say that:

	return num_tc + 1;

Why even use pre/post increment when the variable has no other
use than as a return value?
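
The difference between the three spellings is easy to demonstrate in isolation. The functions below are hypothetical stand-ins for the i40e helper, reduced to the return statement only.

```c
#include <assert.h>

/* Post-increment: the function returns the OLD value; the increment is lost. */
unsigned char count_post(unsigned char num_tc)
{
	return num_tc++;
}

/* Pre-increment: returns the incremented value. */
unsigned char count_pre(unsigned char num_tc)
{
	return ++num_tc;
}

/* Plain arithmetic: same result as pre-increment, with no side effect to misread. */
unsigned char count_plus_one(unsigned char num_tc)
{
	return (unsigned char)(num_tc + 1);
}

/* Self-check for a highest traffic class index of 3. */
void check_return_forms(void)
{
	assert(count_post(3) == 3);
	assert(count_pre(3) == 4);
	assert(count_plus_one(3) == 4);
}
```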

^ permalink raw reply

* Re: [PATCH ] net: udp: do not report ICMP redirects to user space
From: David Miller @ 2013-09-24 14:17 UTC (permalink / raw)
  To: duanj.fnst; +Cc: netdev
In-Reply-To: <523C216C.90707@cn.fujitsu.com>

From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Date: Fri, 20 Sep 2013 18:20:28 +0800

> From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
> 
> Redirect isn't an error condition, it should leave
> the error handler without touching the socket.
> 
> Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>

Applied.

^ permalink raw reply

* Re: [PATCH ] net: raw: do not report ICMP redirects to user space
From: David Miller @ 2013-09-24 14:17 UTC (permalink / raw)
  To: duanj.fnst; +Cc: netdev
In-Reply-To: <523C21A5.4050504@cn.fujitsu.com>

From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Date: Fri, 20 Sep 2013 18:21:25 +0800

> From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
> 
> Redirect isn't an error condition, it should leave
> the error handler without touching the socket.
> 
> Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>

Applied.

^ permalink raw reply

* Re: [PATCH] skge: fix invalid value passed to pci_unmap_sigle
From: David Miller @ 2013-09-24 14:18 UTC (permalink / raw)
  To: mpatocka; +Cc: netdev, romieu, i.gnatenko.brain, stephen
In-Reply-To: <alpine.LRH.2.02.1309201352010.1763@file01.intranet.prod.int.rdu2.redhat.com>

From: Mikulas Patocka <mpatocka@redhat.com>
Date: Fri, 20 Sep 2013 13:53:22 -0400 (EDT)

> In my patch c194992cbe71c20bb3623a566af8d11b0bfaa721 I didn't fix the skge

Always refer to commits, not just by SHA ID, but also with the commit
header line text in parenthesis and double quotes, for this you'd say:

c194992cbe71c20bb3623a566af8d11b0bfaa721 ("skge: fix broken driver")

Using just the SHA ID is completely ambiguous, because the SHA ID will
be entirely different if this commit is added to a different tree, such
as -stable.

> bug correctly. The value of the new mapping (not old) was passed to
> pci_unmap_single.
> 
> If we enable CONFIG_DMA_API_DEBUG, it results in this warning:
> WARNING: CPU: 0 PID: 0 at lib/dma-debug.c:986 check_sync+0x4c4/0x580()
> skge 0000:02:07.0: DMA-API: device driver tries to sync DMA memory it has
> not allocated [device address=0x000000023a0096c0] [size=1536 bytes]
> 
> This patch makes the skge driver pass the correct value to
> pci_unmap_single and fixes the warning. It copies the old descriptor to
> on-stack variable "ee" and unmaps it if mapping of the new descriptor
> succeeded.
> 
> This patch should be backported to 3.11-stable.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Reported-by: Francois Romieu <romieu@fr.zoreil.com>
> Tested-by: Mikulas Patocka <mpatocka@redhat.com>

Applied and queued up for -stable, thanks.
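
The ordering described in the quoted changelog (copy the old descriptor to
an on-stack variable, map the new buffer, unmap the old one only on success)
can be sketched outside the driver. This is a minimal illustration, not the
skge code: struct ring_elem, mock_map(), mock_unmap() and replace_buffer()
are all hypothetical stand-ins so the pattern can run without hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring element holding one DMA mapping. */
struct ring_elem {
	uint64_t mapaddr;
	uint32_t maplen;
};

static uint64_t last_unmapped;	/* address most recently passed to mock_unmap() */
static bool fail_next_map;	/* set to force the next mock_map() to fail */

/* Mock mapping helper: on failure the element is left untouched. */
static bool mock_map(struct ring_elem *e, uint64_t addr, uint32_t len)
{
	if (fail_next_map) {
		fail_next_map = false;
		return false;
	}
	e->mapaddr = addr;
	e->maplen = len;
	return true;
}

/* Mock unmap helper: only records which address was released. */
static void mock_unmap(uint64_t addr, uint32_t len)
{
	(void)len;
	last_unmapped = addr;
}

/* Swap the buffer behind e. The old mapping is saved on the stack first,
 * so the value handed to unmap is the old one, never the new one.
 */
static bool replace_buffer(struct ring_elem *e, uint64_t new_addr, uint32_t len)
{
	struct ring_elem old = *e;

	if (!mock_map(e, new_addr, len))
		return false;	/* old mapping stays in place */

	mock_unmap(old.mapaddr, old.maplen);
	return true;
}

static void replace_buffer_example(void)
{
	struct ring_elem e = { 0x1000, 1536 };

	assert(replace_buffer(&e, 0x2000, 1536));
	assert(last_unmapped == 0x1000 && e.mapaddr == 0x2000);
}
```

The bug class being fixed is unmapping e->mapaddr after it has already been
overwritten; saving the old element first makes that mistake impossible in
this sketch.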

^ permalink raw reply

* Re: [PATCH 01/10] can: Remove extern from function prototypes
From: Marc Kleine-Budde @ 2013-09-24 14:22 UTC (permalink / raw)
  To: David Miller; +Cc: joe, netdev, linux-kernel, wg, linux-can
In-Reply-To: <20130924.100209.1219862004963547693.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 1094 bytes --]

On 09/24/2013 04:02 PM, David Miller wrote:
> From: Marc Kleine-Budde <mkl@pengutronix.de>
> Date: Tue, 24 Sep 2013 09:37:12 +0200
> 
>> On 09/24/2013 12:11 AM, Joe Perches wrote:
>>> There are a mix of function prototypes with and without extern
>>> in the kernel sources.  Standardize on not using extern for
>>> function prototypes.
>>>
>>> Function prototypes don't need to be written with extern.
>>> extern is assumed by the compiler.  Its use is as unnecessary as
>>> using auto to declare automatic/local variables in a block.
>>>
>>> Signed-off-by: Joe Perches <joe@perches.com>
>>
>> Thx, added to linux-can-next. The patch will be included in the next
>> pull request to David.
> 
> Marc, I'm trying to just quickly apply these all into my tree directly.

Fine with me.

Tnx, Marc
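
For reference, a small illustration of the point in the quoted changelog:
a function declaration has external linkage with or without the extern
keyword. The function names below are made up for the example.

```c
#include <assert.h>

/* These two prototypes declare functions with identical linkage;
 * the extern keyword on the first one is redundant.
 */
extern int sum_with_extern(int a, int b);
int sum_without_extern(int a, int b);

int sum_with_extern(int a, int b)
{
	return a + b;
}

int sum_without_extern(int a, int b)
{
	return a + b;
}
```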


-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply

* Re: [net-next PATCH 0/4] cpsw: support for control module register
From: David Miller @ 2013-09-24 14:34 UTC (permalink / raw)
  To: mugunthanvnm
  Cc: netdev, zonque, bcousson, tony, devicetree, linux-omap,
	linux-arm-kernel
In-Reply-To: <1379704841-32693-1-git-send-email-mugunthanvnm@ti.com>

From: Mugunthan V N <mugunthanvnm@ti.com>
Date: Sat, 21 Sep 2013 00:50:37 +0530

> This patch series adds the support for configuring GMII_SEL register
> of control module to select the phy mode type and also to configure
> the clock source for RMII phy mode whether to use internal clock or
> the external clock from the phy itself.
> 
> Till now CPSW works as this configuration is done in U-Boot and carried
> over to the kernel. But during suspend/resume Control module tends to
> lose its configured value for GMII_SEL register in AM33xx PG1.0, so
> if CPSW is used in RMII or RGMII mode, on resume cpsw is not working
> as GMII_SEL register lost its configuration values.
> 
> The initial version of the patch is done by Daniel Mack but as per
> Tony's comment he wants it as a separate driver as it is done in USB
> control module. I have created a separate driver for the same.

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH v3 -next 1/2] tcp: syncookies: reduce cookie lifetime to 128 seconds
From: David Miller @ 2013-09-24 14:40 UTC (permalink / raw)
  To: fw; +Cc: netdev
In-Reply-To: <1379709176-1625-1-git-send-email-fw@strlen.de>

From: Florian Westphal <fw@strlen.de>
Date: Fri, 20 Sep 2013 22:32:55 +0200

> We currently accept cookies that were created less than 4 minutes ago
> (ie, cookies with counter delta 0-3).  Combined with the 8 mss table
> values, this yields 32 possible values (out of 2**32) that will be valid.
> 
> Reducing the lifetime to < 2 minutes halves the guessing chance while
> still providing a large enough period.
> 
> While at it, get rid of jiffies value -- they overflow too quickly on
> 32 bit platforms.
> 
> getnstimeofday is used to create a counter that increments every 64s.
> perf shows getnstimeofday cost is negligible compared to sha_transform;
> normal tcp initial sequence number generation uses getnstimeofday, too.
> 
> Reported-by: Jakob Lell <jakob@jakoblell.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>

Applied.
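
The lifetime rule in the quoted changelog can be sketched as follows. This
is a rough illustration under stated assumptions, not the kernel code: it
assumes a counter that ticks every 64 seconds and accepts counter deltas 0
and 1, and the names cookie_tick() and cookie_is_fresh() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: one tick per 64 seconds, two ticks accepted. */
#define COOKIE_TICK_SECONDS	64u
#define COOKIE_MAX_AGE_TICKS	2u

/* Map a time in seconds to the coarse cookie counter. */
static uint32_t cookie_tick(uint64_t now_seconds)
{
	return (uint32_t)(now_seconds / COOKIE_TICK_SECONDS);
}

/* Return true if a cookie issued at issue_tick is still acceptable. */
static bool cookie_is_fresh(uint32_t issue_tick, uint64_t now_seconds)
{
	/* Unsigned subtraction stays correct across counter wraparound. */
	uint32_t delta = cookie_tick(now_seconds) - issue_tick;

	return delta < COOKIE_MAX_AGE_TICKS;
}

static void cookie_example(void)
{
	uint32_t issued = cookie_tick(1000);	/* 1000 / 64 = tick 15 */

	assert(cookie_is_fresh(issued, 1000));	/* delta 0 */
	assert(cookie_is_fresh(issued, 1087));	/* tick 16, delta 1 */
	assert(!cookie_is_fresh(issued, 1088));	/* tick 17, delta 2 */
}
```

With these assumptions a cookie lives for less than 128 seconds, and for as
little as 64 seconds if it was issued just before a tick boundary.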

^ permalink raw reply

* Re: [PATCH v3 -next 2/2] tcp: syncookies: reduce mss table to four values
From: David Miller @ 2013-09-24 14:40 UTC (permalink / raw)
  To: fw; +Cc: netdev
In-Reply-To: <1379709176-1625-2-git-send-email-fw@strlen.de>

From: Florian Westphal <fw@strlen.de>
Date: Fri, 20 Sep 2013 22:32:56 +0200

> Halve mss table size to make blind cookie guessing more difficult.
> This is sad since the tables were already small, but there
> is little alternative except perhaps adding more precise mss information
> in the tcp timestamp.  Timestamps are unfortunately not ubiquitous.
> 
> Guessing all possible cookie values still has an 8-in-2**32 chance.
> 
> Reported-by: Jakob Lell <jakob@jakoblell.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>

Applied.

Thanks for following up on all of my feedback.
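
The guessing odds quoted in the two changelogs follow from multiplying the
number of accepted counter values by the number of mss table entries. A
small sketch of that arithmetic (valid_cookies() is a hypothetical helper,
not a kernel function):

```c
#include <assert.h>
#include <stdint.h>

/* Number of cookie values a blind guess can hit: every accepted counter
 * value can be combined with every mss table index.
 */
static uint32_t valid_cookies(uint32_t accepted_counters, uint32_t mss_entries)
{
	return accepted_counters * mss_entries;
}

static void guess_space_example(void)
{
	assert(valid_cookies(4, 8) == 32);	/* before the series */
	assert(valid_cookies(2, 8) == 16);	/* after 1/2: lifetime halved */
	assert(valid_cookies(2, 4) == 8);	/* after 2/2: mss table halved */
}
```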

^ permalink raw reply

* [PATCH net v3 0/1] xen-netback: windows frontend compatibility fixes
From: Paul Durrant @ 2013-09-24 14:51 UTC (permalink / raw)
  To: xen-devel, netdev

The following patches fix a couple more issues found when testing with
Windows frontends.

v3:
- Collapse both v2 patches into a single patch that introduces a new
  function to handle backend state transitions. By doing this we ensure that
  we always transition through intermediate states and that we don't attempt
  repeated connects or disconnects.

v2:
- Add comment in 2/2 to note that state transitions from Connected to Closed
  are incorrect.

^ permalink raw reply

* [PATCH net v3 1/1] xen-netback: Handle backend state transitions in a more robust way
From: Paul Durrant @ 2013-09-24 14:51 UTC (permalink / raw)
  To: xen-devel, netdev; +Cc: Paul Durrant, Ian Campbell, Wei Liu, David Vrabel
In-Reply-To: <1380034282-11210-1-git-send-email-paul.durrant@citrix.com>

When the frontend state changes netback now specifies its desired state to
a new function, set_backend_state(), which transitions through any
necessary intermediate states.
This fixes an issue observed with some old Windows frontend drivers where
they failed to transition through the Closing state and netback would not
behave correctly.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
---
 drivers/net/xen-netback/xenbus.c |  145 ++++++++++++++++++++++++++++++--------
 1 file changed, 114 insertions(+), 31 deletions(-)

diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index a53782e..716b167 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -24,6 +24,7 @@
 struct backend_info {
 	struct xenbus_device *dev;
 	struct xenvif *vif;
+	enum xenbus_state state;
 	enum xenbus_state frontend_state;
 	struct xenbus_watch hotplug_status_watch;
 	u8 have_hotplug_status_watch:1;
@@ -136,6 +137,8 @@ static int netback_probe(struct xenbus_device *dev,
 	if (err)
 		goto fail;
 
+	be->state = XenbusStateInitWait;
+
 	/* This kicks hotplug scripts, so do it immediately. */
 	backend_create_xenvif(be);
 
@@ -208,24 +211,113 @@ static void backend_create_xenvif(struct backend_info *be)
 	kobject_uevent(&dev->dev.kobj, KOBJ_ONLINE);
 }
 
-
-static void disconnect_backend(struct xenbus_device *dev)
+static void backend_disconnect(struct backend_info *be)
 {
-	struct backend_info *be = dev_get_drvdata(&dev->dev);
-
 	if (be->vif)
 		xenvif_disconnect(be->vif);
 }
 
-static void destroy_backend(struct xenbus_device *dev)
+static void backend_connect(struct backend_info *be)
 {
-	struct backend_info *be = dev_get_drvdata(&dev->dev);
+	if (be->vif)
+		connect(be);
+}
 
-	if (be->vif) {
-		kobject_uevent(&dev->dev.kobj, KOBJ_OFFLINE);
-		xenbus_rm(XBT_NIL, dev->nodename, "hotplug-status");
-		xenvif_free(be->vif);
-		be->vif = NULL;
+static inline void backend_switch_state(struct backend_info *be,
+					enum xenbus_state state)
+{
+	struct xenbus_device *dev = be->dev;
+
+	pr_debug("%s -> %s\n", dev->nodename, xenbus_strstate(state));
+	be->state = state;
+
+	/* If we are waiting for a hotplug script then defer the
+	 * actual xenbus state change.
+	 */
+	if (!be->have_hotplug_status_watch)
+		xenbus_switch_state(dev, state);
+}
+
+/* Handle backend state transitions:
+ *
+ * The backend state starts in InitWait and the following transitions are
+ * allowed.
+ *
+ * InitWait -> Connected
+ *
+ *    ^    \         |
+ *    |     \        |
+ *    |      \       |
+ *    |       \      |
+ *    |        \     |
+ *    |         \    |
+ *    |          V   V
+ *
+ *  Closed  <-> Closing
+ *
+ * The state argument specifies the eventual state of the backend and the
+ * function transitions to that state via the shortest path.
+ */
+static void set_backend_state(struct backend_info *be,
+			      enum xenbus_state state)
+{
+	while (be->state != state) {
+		switch (be->state) {
+		case XenbusStateClosed:
+			switch (state) {
+			case XenbusStateInitWait:
+			case XenbusStateConnected:
+				pr_info("%s: prepare for reconnect\n",
+					be->dev->nodename);
+				backend_switch_state(be, XenbusStateInitWait);
+				break;
+			case XenbusStateClosing:
+				backend_switch_state(be, XenbusStateClosing);
+				break;
+			default:
+				BUG();
+			}
+			break;
+		case XenbusStateInitWait:
+			switch (state) {
+			case XenbusStateConnected:
+				backend_connect(be);
+				backend_switch_state(be, XenbusStateConnected);
+				break;
+			case XenbusStateClosing:
+			case XenbusStateClosed:
+				backend_switch_state(be, XenbusStateClosing);
+				break;
+			default:
+				BUG();
+			}
+			break;
+		case XenbusStateConnected:
+			switch (state) {
+			case XenbusStateInitWait:
+			case XenbusStateClosing:
+			case XenbusStateClosed:
+				backend_disconnect(be);
+				backend_switch_state(be, XenbusStateClosing);
+				break;
+			default:
+				BUG();
+			}
+			break;
+		case XenbusStateClosing:
+			switch (state) {
+			case XenbusStateInitWait:
+			case XenbusStateConnected:
+			case XenbusStateClosed:
+				backend_switch_state(be, XenbusStateClosed);
+				break;
+			default:
+				BUG();
+			}
+			break;
+		default:
+			BUG();
+		}
 	}
 }
 
@@ -237,40 +329,33 @@ static void frontend_changed(struct xenbus_device *dev,
 {
 	struct backend_info *be = dev_get_drvdata(&dev->dev);
 
-	pr_debug("frontend state %s\n", xenbus_strstate(frontend_state));
+	pr_debug("%s -> %s\n", dev->otherend, xenbus_strstate(frontend_state));
 
 	be->frontend_state = frontend_state;
 
 	switch (frontend_state) {
 	case XenbusStateInitialising:
-		if (dev->state == XenbusStateClosed) {
-			pr_info("%s: prepare for reconnect\n", dev->nodename);
-			xenbus_switch_state(dev, XenbusStateInitWait);
-		}
+		set_backend_state(be, XenbusStateInitWait);
 		break;
 
 	case XenbusStateInitialised:
 		break;
 
 	case XenbusStateConnected:
-		if (dev->state == XenbusStateConnected)
-			break;
-		if (be->vif)
-			connect(be);
+		set_backend_state(be, XenbusStateConnected);
 		break;
 
 	case XenbusStateClosing:
-		disconnect_backend(dev);
-		xenbus_switch_state(dev, XenbusStateClosing);
+		set_backend_state(be, XenbusStateClosing);
 		break;
 
 	case XenbusStateClosed:
-		xenbus_switch_state(dev, XenbusStateClosed);
+		set_backend_state(be, XenbusStateClosed);
 		if (xenbus_dev_is_online(dev))
 			break;
-		destroy_backend(dev);
 		/* fall through if not online */
 	case XenbusStateUnknown:
+		set_backend_state(be, XenbusStateClosed);
 		device_unregister(&dev->dev);
 		break;
 
@@ -363,7 +448,9 @@ static void hotplug_status_changed(struct xenbus_watch *watch,
 	if (IS_ERR(str))
 		return;
 	if (len == sizeof("connected")-1 && !memcmp(str, "connected", len)) {
-		xenbus_switch_state(be->dev, XenbusStateConnected);
+		/* Complete any pending state change */
+		xenbus_switch_state(be->dev, be->state);
+
 		/* Not interested in this watch anymore. */
 		unregister_hotplug_status_watch(be);
 	}
@@ -389,16 +476,12 @@ static void connect(struct backend_info *be)
 			  &be->vif->credit_usec);
 	be->vif->remaining_credit = be->vif->credit_bytes;
 
-	unregister_hotplug_status_watch(be);
+	BUG_ON(be->have_hotplug_status_watch);
 	err = xenbus_watch_pathfmt(dev, &be->hotplug_status_watch,
 				   hotplug_status_changed,
 				   "%s/%s", dev->nodename, "hotplug-status");
-	if (err) {
-		/* Switch now, since we can't do a watch. */
-		xenbus_switch_state(dev, XenbusStateConnected);
-	} else {
+	if (!err)
 		be->have_hotplug_status_watch = 1;
-	}
 
 	netif_wake_queue(be->vif->dev);
 }
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH] net: net_secret should not depend on TCP
From: Hannes Frederic Sowa @ 2013-09-24 15:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tom Herbert, davem, netdev, jesse.brandeburg
In-Reply-To: <1380028797.3165.65.camel@edumazet-glaptop>

On Tue, Sep 24, 2013 at 06:19:57AM -0700, Eric Dumazet wrote:
> -void net_secret_init(void)
> +static u32 net_secret[NET_SECRET_SIZE] ____cacheline_aligned;
> +
> +static void net_secret_init(void)
>  {
> -	get_random_bytes(net_secret, sizeof(net_secret));
> +	u32 tmp;
> +	int i;
> +
> +	if (likely(net_secret[0]))
> +		return;
> +
> +	for (i = NET_SECRET_SIZE; i > 0;) {
> +		do {
> +			get_random_bytes(&tmp, sizeof(tmp));
> +		} while (!tmp);

I am afraid we can block here on embedded systems in an atomic section? Is
this actually an issue? It does get called in a spin_lock_bh.

^ permalink raw reply

* Re: [PATCH net-next] net: introduce SO_MAX_PACING_RATE
From: Eric Dumazet @ 2013-09-24 15:14 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Steinar H. Gunderson, Michael Kerrisk
In-Reply-To: <1379949014.3165.24.camel@edumazet-glaptop>

On Mon, 2013-09-23 at 08:10 -0700, Eric Dumazet wrote:

> +
> +	case SO_MAX_PACING_RATE:
> +		sk->sk_max_pacing_rate = val;
> +		break;
> +

I'll send a v2, adding here:

sk->sk_pacing_rate = min(sk->sk_pacing_rate, sk->sk_max_pacing_rate);

to enforce current limit for non TCP protocols.
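
A standalone sketch of that clamp, for illustration only. struct
pacing_sock and set_max_pacing_rate() are hypothetical stand-ins for the
socket fields and the setsockopt path, not the kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two socket fields involved. */
struct pacing_sock {
	uint32_t pacing_rate;		/* current rate, bytes per second */
	uint32_t max_pacing_rate;	/* cap set by the application */
};

/* Record the new cap and pull the current rate down if it exceeds it,
 * so the limit takes effect immediately rather than on the next update.
 */
static void set_max_pacing_rate(struct pacing_sock *sk, uint32_t val)
{
	sk->max_pacing_rate = val;
	if (sk->pacing_rate > sk->max_pacing_rate)
		sk->pacing_rate = sk->max_pacing_rate;
}

static void pacing_example(void)
{
	struct pacing_sock sk = { 1000000, UINT32_MAX };

	set_max_pacing_rate(&sk, 250000);
	assert(sk.pacing_rate == 250000);	/* clamped to the new cap */

	set_max_pacing_rate(&sk, 500000);
	assert(sk.pacing_rate == 250000);	/* raising the cap does not raise the rate */
}
```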

^ permalink raw reply

* Re: [REGRESSION][BISECTED] skge: add dma_mapping check
From: Joseph Salisbury @ 2013-09-24 15:16 UTC (permalink / raw)
  To: Igor Gnatenko, Francois Romieu, mpatocka
  Cc: Mirko Lindner, linux-kernel, Stephen Hemminger, netdev,
	member graysky, Greg KH
In-Reply-To: <1379581391.2403.5.camel@ThinkPad-X230.localdomain>

On 09/19/2013 05:03 AM, Igor Gnatenko wrote:
> Please, send patch.
>
The patch is in mainline as of 3.12-rc2 as commit:

Author: Mikulas Patocka <mpatocka@redhat.com>
Date:   Thu Sep 19 14:13:17 2013 -0400

    skge: fix broken driver

I don't see that the commit was Cc'd to stable.  Mikulas, we might need
to send a request directly to the stable maintainers and request that
the commit be pulled into stable, in case they didn't notice the request
in the commit message.

^ permalink raw reply

* Re: [PATCH net-next] tcp: fix dynamic right sizing
From: David Miller @ 2013-09-24 15:16 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, ncardwell, ycheng, vanj
In-Reply-To: <1379710618.3431.5.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 20 Sep 2013 13:56:58 -0700

> Dynamic Right Sizing (DRS) is supposed to open TCP receive window
> automatically, but suffers from two bugs, presented by order
> of importance.
> 
> 1) tcp_rcv_space_adjust() fix :
> 
> Using twice the last received amount is very pessimistic,
> because it doesn't allow fast recovery or proper slow start
> ramp up, if sender wants to increase cwin by 100% every RTT.
> 
> copied = bytes received in previous RTT
> 
> 2*copied = bytes we expect to receive in next RTT
> 
> 4*copied = bytes we need to advertise in rwin at end of next RTT
> 
> DRS is one RTT late, it needs a 4x factor.
> 
> If sender is not using ABC, and increases cwin by 50% every rtt,
> then we needed 1.5*1.5 = 2.25 factor.
> This is probably why this bug was not really noticed.
> 
> 2) There is no window adjustment after first RTT. DRS triggers only
>   after the second RTT.
>   DRS needs two RTT to initialize, so tcp_fixup_rcvbuf() should setup
>   sk_rcvbuf to allow proper window grow for first two RTT.
> 
> This patch increases TCP efficiency particularly for large RTT flows
> when autotuning is used at the receiver, and more particularly
> in presence of packet losses.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Cc: Van Jacobson <vanj@google.com>

Looks good, applied, thanks Eric.
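
The factors in the quoted analysis can be checked with a short calculation.
This is only a sketch of the arithmetic, not of tcp_rcv_space_adjust()
itself; drs_rwin_needed() is a hypothetical helper.

```c
#include <assert.h>

/* copied: bytes received in the previous RTT.
 * growth: factor by which the sender grows its window each RTT
 *         (2.0 for a 100% increase, 1.5 for a 50% increase).
 * Because DRS runs one RTT late, the advertised window has to cover
 * two growth steps, hence growth squared.
 */
static double drs_rwin_needed(double copied, double growth)
{
	return copied * growth * growth;
}

static void drs_example(void)
{
	assert(drs_rwin_needed(1000.0, 2.0) == 4000.0);	/* the 4x factor */
	assert(drs_rwin_needed(1000.0, 1.5) == 2250.0);	/* 1.5 * 1.5 = 2.25 */
}
```

Doubling the last received amount covers the 2.25x case only narrowly
short, which fits the changelog's remark that the bug was not really
noticed.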

^ permalink raw reply

* Re: [PATCH v2] qlge: call ql_core_dump() only if dump memory was allocated.
From: David Miller @ 2013-09-24 15:20 UTC (permalink / raw)
  To: malahal; +Cc: netdev
In-Reply-To: <1379712077-31750-1-git-send-email-malahal@us.ibm.com>

From: Malahal Naineni <malahal@us.ibm.com>
Date: Fri, 20 Sep 2013 16:21:17 -0500

> Also changed a log message to indicate that memory was not allocated
> instead of memory not available!
> 
> Signed-off-by: Malahal Naineni <malahal@us.ibm.com>

Applied.

^ permalink raw reply
