Netdev List

Netdev List
 help / color / mirror / Atom feed

* RE: Bonding
From: Gustavo Pimentel @ 2014-02-11 17:15 UTC (permalink / raw)
  To: Veaceslav Falico; +Cc: netdev@vger.kernel.org
In-Reply-To: <20140211141549.GB29570@redhat.com>

Hi Veaceslav,

It's quite different from broadcast mode, each frame sent through the slaves has attached a Redundancy Control Trailer also known as RCT (this trailer is compose by a LAN identifier, sequence number, a LSDU size and a PRP suffix).
Also the equipment with PRP capability has to send periodically a supervision frame to both similar LANs. Each device on the network has to keep track of receive sequence numbers received, if the received a sequence number for instance from LAN A of specific device and it doesn't exist on internal table, the device should accept the frame and update the internal table. When receiving the same sequence number from the LAN B, the device should discard it, providing zero downtime redundancy.

I can supply you information about this redundancy protocol, if you like. This type of network redundancy is now being large deployed on electrical power stations (like thermal and hydro) and transmission power stations instead of teaming / bonding that depends on RSTP for redundancy.

> -----Original Message-----
> From: Veaceslav Falico [mailto:vfalico@redhat.com]
> Sent: terça-feira, 11 de Fevereiro de 2014 14:16
> To: Gustavo Pimentel
> Cc: netdev@vger.kernel.org
> Subject: Re: Bonding
> 
> On Tue, Feb 11, 2014 at 01:53:32PM +0000, Gustavo Pimentel wrote:
> >Hi,
> 
> Hi Gustavo,
> 
> >
> >I'm writing you because because I'm have implemented a new mode (PRP Parallel
> Redundancy Protocol) for bonding kernel driver. This new mode is quite simple, I
> don't know if you have heard about PRP, but it's a new standard that allows to
> overcome any single network failure without affecting the data transmission. The
> general idea resides on having two separate LAN (A & B) very similar and
> transmitting the almost the same frame through both LANs and the end device
> should accept one frame and discard the other according to a known mechanism.
> 
> Isn't that the current 'broadcast' mode, where every packet is transmitted over all
> the slaves? After quick googling/reading I don't see any difference there, though I
> might have missed something.
> 
> >
> >I have implemented this new mode on bonding driver, but I have some
> difficulties:
> >. Writing linux driver is quite new for me. I don't' know if exists guide lines for
> driver coding.
> 
> You can find everything under Documentation/, but without the code I can't tell you
> exact documents. CodingStyle and SubmittingPatches might be the first ones.
> 
> Also, try CC-ing relevant people for more feedback, specifically bonding
> maintainers.
> 
> >. I don't know how to submit the code to be include on kernel repository.
> >. Maybe another pair of eyes could find help to improve the writing code for this
> mode.
> 
> Try sending an RFC when net-next opens.
> 
> >
> >I think my driver code is 99% complete. I'm currently testing with 3 equipments (1
> pc + 1 embedded device running both my modify bonding driver) and a third party
> equipment called RedBox.
> >
> >Would you be interested in participating / helping this project?
> >
> >With my best regards,
> >
> >Gustavo Gama da Rocha Pimentel
> >Power Systems Automation / Innovation & Development Efacec Engenharia e
> >Sistemas, S.A.
> >Phone: +351229403391
> >Disclaimer
> >
> >
> >--
> >To unsubscribe from this list: send the line "unsubscribe netdev" in
> >the body of a message to majordomo@vger.kernel.org More majordomo info
> >at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 13/13] net: Mark functions as static in net/sunrpc/svc_xprt.c
From: J. Bruce Fields @ 2014-02-11 17:12 UTC (permalink / raw)
  To: Rashika Kheria
  Cc: linux-kernel, Trond Myklebust, David S. Miller, linux-nfs, netdev,
	josh
In-Reply-To: <259f3dc38d7115eccf3aaa2aa7de2414a47f90a2.1391888654.git.rashika.kheria@gmail.com>

On Sun, Feb 09, 2014 at 01:40:54AM +0530, Rashika Kheria wrote:
> Mark functions as static in net/sunrpc/svc_xprt.c because they are not
> used outside this file.
> 
> This eliminates the following warning in net/sunrpc/svc_xprt.c:
> net/sunrpc/svc_xprt.c:574:5: warning: no previous prototype for ‘svc_alloc_arg’ [-Wmissing-prototypes]
> net/sunrpc/svc_xprt.c:615:18: warning: no previous prototype for ‘svc_get_next_xprt’ [-Wmissing-prototypes]
> net/sunrpc/svc_xprt.c:694:6: warning: no previous prototype for ‘svc_add_new_temp_xprt’ [-Wmissing-prototypes]

Thanks, applying.--b.

> 
> Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
> ---
>  net/sunrpc/svc_xprt.c |    6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 80a6640..06c6ff0 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -571,7 +571,7 @@ static void svc_check_conn_limits(struct svc_serv *serv)
>  	}
>  }
>  
> -int svc_alloc_arg(struct svc_rqst *rqstp)
> +static int svc_alloc_arg(struct svc_rqst *rqstp)
>  {
>  	struct svc_serv *serv = rqstp->rq_server;
>  	struct xdr_buf *arg;
> @@ -612,7 +612,7 @@ int svc_alloc_arg(struct svc_rqst *rqstp)
>  	return 0;
>  }
>  
> -struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp, long timeout)
> +static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp, long timeout)
>  {
>  	struct svc_xprt *xprt;
>  	struct svc_pool		*pool = rqstp->rq_pool;
> @@ -691,7 +691,7 @@ struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp, long timeout)
>  	return xprt;
>  }
>  
> -void svc_add_new_temp_xprt(struct svc_serv *serv, struct svc_xprt *newxpt)
> +static void svc_add_new_temp_xprt(struct svc_serv *serv, struct svc_xprt *newxpt)
>  {
>  	spin_lock_bh(&serv->sv_lock);
>  	set_bit(XPT_TEMP, &newxpt->xpt_flags);
> -- 
> 1.7.9.5
> 

^ permalink raw reply

* Re: RFC: bridge get fdb by bridge device
From: Jamal Hadi Salim @ 2014-02-11 17:07 UTC (permalink / raw)
  To: vyasevic, netdev@vger.kernel.org
  Cc: Stephen Hemminger, Scott Feldman, John Fastabend
In-Reply-To: <52F8FEF1.60407@redhat.com>

On 02/10/14 11:31, Vlad Yasevich wrote:
> On 02/09/2014 10:06 AM, Jamal Hadi Salim wrote:

>> +	ndm = nlmsg_data(cb->nlh);
>> +	if (ndm->ndm_ifindex) {
>
> We get really lucky here that ndm_ifindex and ifi_index happen to map to
> the same location.
>

Didnt follow - but I have a feeling you are looking at the reference
point of a bridge port.
Note as per my response to John: The target is a bridge device, not
a bridge port.

>
> I agree with both of Johns commens fro the above code.
> I think you can use ndo_dflt_fdb_dump() here and remove the first check
> for IFF_EBRIDGE.
>

Same comment i made to John. The goal is to emulate
brctl showmacs <bridge>
ndo_dflt_fdb_dump() gives me in theory all the bridge ports
unicast and multicast MAC addresses. There is a posibility that
the bridgeport is a bridge - in which case I can find out from
user space and safely request for it directly instead of via
its parent.

> The only odd thing is that it would permit syntax like:
>   # bridge fbd show br eth0
> or
>   # bridge fdb show br macvlan0
>
> but I think that's ok.

Ok, since both you and John point to macvlan - is that
considered as something with an fdb? It doesnt forward
packets between two devices.

>> diff --git a/bridge/fdb.c b/bridge/fdb.c
>> index e2e53f1..f3073d6 100644
>> --- a/bridge/fdb.c
>> +++ b/bridge/fdb.c
>> @@ -33,7 +33,7 @@ static void usage(void)
>>   	fprintf(stderr, "Usage: bridge fdb { add | append | del | replace }
> ADDR dev DEV {self|master} [ temp ]\n"
>>   		        "              [router] [ dst IPADDR] [ vlan VID ]\n"
>>   		        "              [ port PORT] [ vni VNI ] [via DEV]\n");
>> -	fprintf(stderr, "       bridge fdb {show} [ dev DEV ]\n");
>> +	fprintf(stderr, "       bridge fdb {show} [ br BRDEV ] [ dev DEV ]\n");
>
> 'port' option is now allowed in the show operation
>

Thanks - it is already taken seems by vxlan using the same interface.

cheers,
jamal

^ permalink raw reply

* Re: RFC: bridge get fdb by bridge device
From: Jamal Hadi Salim @ 2014-02-11 17:03 UTC (permalink / raw)
  To: John Fastabend
  Cc: netdev@vger.kernel.org, vyasevic, Stephen Hemminger,
	Scott Feldman, John Fastabend
In-Reply-To: <52F7D7EE.7010302@gmail.com>

On 02/09/14 14:33, John Fastabend wrote:
> On 02/09/2014 07:06 AM, Jamal Hadi Salim wrote:

>
>> +    if (ndm->ndm_ifindex) {
>> +        dev = __dev_get_by_index(net, ndm->ndm_ifindex);
>> +        if (dev == NULL) {
>> +            pr_info("PF_BRIDGE: RTM_GETNEIGH with unknown ifindex\n");
>> +            return -ENODEV;
>> +        }
>> +
>> +        if (!(dev->priv_flags & IFF_EBRIDGE)) {
>
> Can we drop this 'if case' and just use the 'if (ops->ndo_fdb_dump)'
> below? IFF_EBRIDGE is specific to ./net/bridge so it will fail for
> macvlans and I think the command is useful in both cases.
>

My only worry is:
The target ndm_ifindex is to a bridge device i.e something
that has an "fdb" (user space says "br X" then we use X to
mean we are talking about a bridge device. This is what
brctl showmacs <X> means.
I know this doesnt seem right relative to current point of
view where the target is the bridge port ifindex.
I'd be more than happy to make the change if it makes sense,
but i am struggling with that thought.

>
>> +            pr_info("PF_BRIDGE: RTM_GETNEIGH %s not a bridge device\n",
>> +                dev->name);
>> +            return -EINVAL;
>>          }
>> +        ops = dev->netdev_ops;
>> +        if (ops->ndo_fdb_dump) {
>> +            idx = ops->ndo_fdb_dump(skb, cb, dev, idx);
>> +        } else {
>
> Is there any problem with using the ndo_dflt_fdb_dump() in the else
> here?
>

If the assumption is the target ifindex is to a bridge device, I would
expect it MUST have a ops->ndo_fdb_dump(), no?
the default dumper deals with bridge ports (which may be bridges, but
if i wanted their databases i would address them)

> Userspace should be able to easily learn which ports are ebridge ports
> so I don't think that should be an issue. Anyways with the above
> IFF_EBRIDGE check you should never hit this else case although I think
> its safe to drop the above check as noted.
>

Right - so if i can find out which is an ebridge i can send it the
same request.

cheers,
jamal

^ permalink raw reply

* Re: [PATCH] netfilter: nf_nat_snmp_basic: fix duplicates in if/else branches
From: Patrick McHardy @ 2014-02-11 16:11 UTC (permalink / raw)
  To: Francois-Xavier Le Bail, NETDEV, David Miller, netfilter-devel,
	netfilter, coreteam
In-Reply-To: <1392130165-4313-1-git-send-email-fx.lebail@yahoo.com>

On 11. Februar 2014 14:49:25 GMT+00:00, Francois-Xavier Le Bail <fx.lebail@yahoo.com> wrote:
>The solution was found by Patrick in 2.4 kernel sources.
>
>Cc: Patrick McHardy <kaber@trash.net>

Acked-by: Patrick McHardy <kaber@trash.net>

>Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
>---
> net/ipv4/netfilter/nf_nat_snmp_basic.c |    4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/net/ipv4/netfilter/nf_nat_snmp_basic.c
>b/net/ipv4/netfilter/nf_nat_snmp_basic.c
>index d551e31..7c67667 100644
>--- a/net/ipv4/netfilter/nf_nat_snmp_basic.c
>+++ b/net/ipv4/netfilter/nf_nat_snmp_basic.c
>@@ -1198,8 +1198,8 @@ static int snmp_translate(struct nf_conn *ct,
> 		map.to = NOCT1(&ct->tuplehash[!dir].tuple.dst.u3.ip);
> 	} else {
> 		/* DNAT replies */
>-		map.from = NOCT1(&ct->tuplehash[dir].tuple.src.u3.ip);
>-		map.to = NOCT1(&ct->tuplehash[!dir].tuple.dst.u3.ip);
>+		map.from = NOCT1(&ct->tuplehash[!dir].tuple.src.u3.ip);
>+		map.to = NOCT1(&ct->tuplehash[dir].tuple.dst.u3.ip);
> 	}
> 
> 	if (map.from == map.to)
>--
>To unsubscribe from this list: send the line "unsubscribe
>netfilter-devel" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply

* Re: [PATCH] net: rtnetlink: fix correct size given to memset
From: François-Xavier Le Bail @ 2014-02-11 16:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: NETDEV, David Miller
In-Reply-To: <1392132522.6615.85.camel@edumazet-glaptop2.roam.corp.google.com>

> From: Eric Dumazet <eric.dumazet@gmail.com>

> To: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
>>  Find by cppcheck:
>>  [net/core/rtnetlink.c:1842]: (warning) Using size of pointer linkinfo 
> instead of size of its data.
>> 
>>  Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
>>  ---
>> 
>>  The diagnosis of cppcheck seems relevant.
>> 
>>   net/core/rtnetlink.c |    2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>> 
> This is absolutely wrong.
> 
>     struct nlattr *linkinfo[IFLA_INFO_MAX+1];
> 
> sizeof(linkinfo) is totally appropriate 
> 
> sizeof(*linkinfo) is equal to sizeof(void *), and this is not what we
> need here.

Oops, sorry for the noise.

^ permalink raw reply

* RE: [PATCH] net: rtnetlink: fix correct size given to memset
From: David Laight @ 2014-02-11 15:54 UTC (permalink / raw)
  To: 'Eric Dumazet', Francois-Xavier Le Bail; +Cc: NETDEV, David Miller
In-Reply-To: <1392132522.6615.85.camel@edumazet-glaptop2.roam.corp.google.com>

From: Eric Dumazet
> On Tue, 2014-02-11 at 15:06 +0100, Francois-Xavier Le Bail wrote:
> > Find by cppcheck:
> > [net/core/rtnetlink.c:1842]: (warning) Using size of pointer linkinfo instead of size of its data.
> >
> > Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
> > ---
> >
> > The diagnosis of cppcheck seems relevant.
> >
> >  net/core/rtnetlink.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > index 048dc8d..9a5dbf1 100644
> > --- a/net/core/rtnetlink.c
> > +++ b/net/core/rtnetlink.c
> > @@ -1839,7 +1839,7 @@ replay:
> >  		if (err < 0)
> >  			return err;
> >  	} else
> > -		memset(linkinfo, 0, sizeof(linkinfo));
> > +		memset(linkinfo, 0, sizeof(*linkinfo));
> >
> >  	if (linkinfo[IFLA_INFO_KIND]) {
> >  		nla_strlcpy(kind, linkinfo[IFLA_INFO_KIND], sizeof(kind));
> 
> 
> This is absolutely wrong.
> 
> 	struct nlattr *linkinfo[IFLA_INFO_MAX+1];
> 
> sizeof(linkinfo) is totally appropriate
> 
> sizeof(*linkinfo) is equal to sizeof(void *), and this is not what we
> need here.

I guess that changing it to:
	memset(&linkinfo, 0, sizeof(linkinfo));

might make the tool happy, but it really ought to be taught about arrays.

	David


^ permalink raw reply

* Re: [PATCH v2 2/2] sctp: optimize the sctp_sysctl_net_register
From: Sergei Shtylyov @ 2014-02-11 16:47 UTC (permalink / raw)
  To: Wang Weidong, nhorman, davem, vyasevich; +Cc: dborkman, netdev
In-Reply-To: <1392083346-9420-3-git-send-email-wangweidong1@huawei.com>

Hello.

On 02/11/2014 04:49 AM, Wang Weidong wrote:

> Here, when the net is init_net, we needn't to kmemdup the ctl_table
> again. So add a check for net. Also we can save some memory.

> Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
> ---
>   net/sctp/sysctl.c | 16 +++++++++-------
>   1 file changed, 9 insertions(+), 7 deletions(-)

> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
> index 2ddb401..b65396b 100644
> --- a/net/sctp/sysctl.c
> +++ b/net/sctp/sysctl.c
> @@ -403,15 +403,17 @@ static int proc_sctp_do_rto_max(struct ctl_table *ctl, int write,
>
>   int sctp_sysctl_net_register(struct net *net)
>   {
> -	struct ctl_table *table;
> -	int i;
> +	struct ctl_table *table = sctp_net_table;
>
> -	table = kmemdup(sctp_net_table, sizeof(sctp_net_table), GFP_KERNEL);
> -	if (!table)
> -		return -ENOMEM;
> +	if (!net_eq(net, &init_net)) {
> +		int i;

    Empty line after declaration wouldn't hurt, just like above.

> +		table = kmemdup(sctp_net_table, sizeof(sctp_net_table), GFP_KERNEL);
> +		if (!table)
> +			return -ENOMEM;

WBR, Sergei

^ permalink raw reply

* Re: [PATCH] tun: use netif_receive_skb instead of netif_rx_ni
From: Michael S. Tsirkin @ 2014-02-11 15:47 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Qin Chuanyu, davem, jasowang, Anthony Liguori, KVM list, netdev
In-Reply-To: <1392132244.6615.83.camel@edumazet-glaptop2.roam.corp.google.com>

On Tue, Feb 11, 2014 at 07:24:04AM -0800, Eric Dumazet wrote:
> On Tue, 2014-02-11 at 22:25 +0800, Qin Chuanyu wrote:
> > we could xmit directly instead of going through softirq to gain 
> > throughput and lantency improved.
> > test model: VM-Host-Host just do transmit. with vhost thread and nic
> > interrupt bind cpu1. netperf do throuhput test and qperf do lantency test.
> > Host OS: suse11sp3, Guest OS: suse11sp3
> > 
> > latency result(us):
> > packet_len 64 256 512 1460
> > old(UDP)   44  47  48   66
> > new(UDP)   38  41  42   66
> > 
> > old(TCP)   52  55  70  117
> > new(TCP)   45  48  61  114
> > 
> > throughput result(Gbit/s):
> > packet_len   64   512   1024   1460
> > old(UDP)   0.42  2.02   3.75   4.68
> > new(UDP)   0.45  2.14   3.77   5.06
> > 
> > TCP due to the latency, client couldn't send packet big enough
> > to get benefit from TSO of nic, so the result show it will send
> > more packet per sencond but get lower throughput.
> > 
> > Eric mentioned that it would has problem with cgroup, but the patch
> > had been sent by Herbert Xu.
> > patch_id f845172531fb7410c7fb7780b1a6e51ee6df7d52
> > 
> > Signed-off-by: Chuanyu Qin <qinchuanyu@huawei.com>
> > ---
> >   drivers/net/tun.c |    4 +++-
> >   1 files changed, 3 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index 44c4db8..90b4e58 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -1184,7 +1184,9 @@ static ssize_t tun_get_user(struct tun_struct 
> > *tun, struct tun_file *tfile,
> >   	skb_probe_transport_header(skb, 0);
> > 
> >   	rxhash = skb_get_hash(skb);
> > -	netif_rx_ni(skb);
> > +	rcu_read_lock_bh();
> > +	netif_receive_skb(skb);
> > +	rcu_read_unlock_bh();
> > 
> >   	tun->dev->stats.rx_packets++;
> >   	tun->dev->stats.rx_bytes += len;
> 
> I already said this patch is not good :
> 
> rcu_read_lock_bh() makes no sense here.
> 
> What is really needed is local_bh_disable();
> 
> Herbert patch ( http://patchwork.ozlabs.org/patch/52963/ ) had a much
> cleaner form.
> 
> Just use it, CC him, credit him, please ?
> 

But not before making sure (testing) that patch does not break the
cgroups classifier please.

-- 
MST

^ permalink raw reply

* [PATCH v2 2/2] dp83640: Get pin and master/slave configuration from DT
From: Stefan Sørensen @ 2014-02-11 15:29 UTC (permalink / raw)
  To: richardcochran, grant.likely, robh+dt, mark.rutland, netdev,
	linux-kernel, devicetree
  Cc: Stefan Sørensen
In-Reply-To: <1392132562-23644-1-git-send-email-stefan.sorensen@spectralink.com>

This patch adds configuration of the periodic output and external timestamp
pins available on the dp83640 family of PHYs. It also configures the
master/slave relationship in a group of PHYs on the same MDIO bus and the pins
used for clock calibration in the group.

The configuration is retrieved from DT through the properties
    dp83640,slave
    dp83640,calibrate-pin
    dp83640,perout-pins
    dp83640,extts-pins
The configuration module parameters are retained as fallback for the non-DT
case.

Since the pin configuration is now stored for each clock device, groups of
devices on different mdio busses can now have different pin configurations.

Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
---
 Documentation/devicetree/bindings/net/dp83640.txt |  29 +++++
 drivers/net/phy/dp83640.c                         | 152 ++++++++++++++++++----
 2 files changed, 154 insertions(+), 27 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/dp83640.txt

diff --git a/Documentation/devicetree/bindings/net/dp83640.txt b/Documentation/devicetree/bindings/net/dp83640.txt
new file mode 100644
index 0000000..b9a57c0
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/dp83640.txt
@@ -0,0 +1,29 @@
+Required properties for the National DP83640 ethernet phy:
+
+- compatible : Must contain "national,dp83640"
+
+Optional properties:
+
+- dp83640,slave: If present, this phy will be slave to another dp83640
+  on the same mdio bus.
+- dp83640,perout-pins : List of the pin pins used for periodic output
+  triggers.
+- dp83640,extts-pins : List of the pin pins used for external event
+  timestamping.
+- dp83640,calibrate-pin : The pin used for master/slave calibration.
+
+Example:
+
+	ethernet-phy@1 {
+		compatible = "national,dp83640", "ethernet-phy-ieee802.3-c22";
+		reg = <1>;
+		dp83640,perout-pins = <2>;
+		dp83640,extts-pins = <3 4 8 9 10 11>;
+		dp83640,calibrate-pin = <1>;
+	};
+
+	ethernet-phy@2 {
+		compatible = "national,dp83640", "ethernet-phy-ieee802.3-c22";
+		reg = <2>;
+		dp83640,slave;
+	};
diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
index d4fe95d..a9bd553 100644
--- a/drivers/net/phy/dp83640.c
+++ b/drivers/net/phy/dp83640.c
@@ -30,6 +30,7 @@
 #include <linux/phy.h>
 #include <linux/ptp_classify.h>
 #include <linux/ptp_clock_kernel.h>
+#include <linux/of_device.h>
 
 #include "dp83640_reg.h"
 
@@ -119,6 +120,8 @@ struct dp83640_private {
 	/* queues of incoming and outgoing packets */
 	struct sk_buff_head rx_queue;
 	struct sk_buff_head tx_queue;
+	/* is this phyter a slave */
+	bool slave;
 };
 
 struct dp83640_clock {
@@ -140,6 +143,7 @@ struct dp83640_clock {
 	struct list_head phylist;
 	/* reference to our PTP hardware clock */
 	struct ptp_clock *ptp_clock;
+	u32 perout_pins[N_EXT], extts_pins[N_EXT], calibrate_pin;
 };
 
 
@@ -263,8 +267,8 @@ static void periodic_output(struct dp83640_clock *clock,
 	u32 sec, nsec, period;
 	u16 gpio, ptp_trig, trigger, val;
 
-	gpio = on ? perout_pins[index] : 0;
-	trigger = n_ext_ts + index;
+	gpio = on ? clock->perout_pins[index] : 0;
+	trigger = clock->caps.n_ext_ts + index;
 
 	ptp_trig = TRIG_WR |
 		(trigger & TRIG_CSEL_MASK) << TRIG_CSEL_SHIFT |
@@ -419,12 +423,12 @@ static int ptp_dp83640_enable(struct ptp_clock_info *ptp,
 	switch (rq->type) {
 	case PTP_CLK_REQ_EXTTS:
 		index = rq->extts.index;
-		if (index < 0 || index >= n_ext_ts)
+		if (index < 0 || index >= clock->caps.n_ext_ts)
 			return -EINVAL;
 		event_num = index;
 		evnt = EVNT_WR | (event_num & EVNT_SEL_MASK) << EVNT_SEL_SHIFT;
 		if (on) {
-			gpio_num = extts_pins[index];
+			gpio_num = clock->extts_pins[index];
 			evnt |= (gpio_num & EVNT_GPIO_MASK) << EVNT_GPIO_SHIFT;
 			evnt |= EVNT_RISE;
 		}
@@ -433,7 +437,7 @@ static int ptp_dp83640_enable(struct ptp_clock_info *ptp,
 
 	case PTP_CLK_REQ_PEROUT:
 		index = rq->perout.index;
-		if (index < 0 || index >= n_per_out)
+		if (index < 0 || index >= clock->caps.n_per_out)
 			return -EINVAL;
 		periodic_output(clock, rq, index, on);
 		return 0;
@@ -530,7 +534,7 @@ static void recalibrate(struct dp83640_clock *clock)
 	struct phy_device *master = clock->chosen->phydev;
 	u16 cfg0, evnt, ptp_trig, trigger, val;
 
-	trigger = n_ext_ts + n_per_out;
+	trigger = clock->caps.n_ext_ts + clock->caps.n_per_out;
 
 	mutex_lock(&clock->extreg_lock);
 
@@ -554,7 +558,7 @@ static void recalibrate(struct dp83640_clock *clock)
 	 */
 	evnt = EVNT_WR | EVNT_RISE | EVNT_SINGLE;
 	evnt |= (trigger & EVNT_SEL_MASK) << EVNT_SEL_SHIFT;
-	evnt |= (calibrate_pin & EVNT_GPIO_MASK) << EVNT_GPIO_SHIFT;
+	evnt |= (clock->calibrate_pin & EVNT_GPIO_MASK) << EVNT_GPIO_SHIFT;
 
 	list_for_each(this, &clock->phylist) {
 		tmp = list_entry(this, struct dp83640_private, list);
@@ -567,7 +571,7 @@ static void recalibrate(struct dp83640_clock *clock)
 	 */
 	ptp_trig = TRIG_WR | TRIG_IF_LATE | TRIG_PULSE;
 	ptp_trig |= (trigger  & TRIG_CSEL_MASK) << TRIG_CSEL_SHIFT;
-	ptp_trig |= (calibrate_pin & TRIG_GPIO_MASK) << TRIG_GPIO_SHIFT;
+	ptp_trig |= (clock->calibrate_pin & TRIG_GPIO_MASK) << TRIG_GPIO_SHIFT;
 	ext_write(0, master, PAGE5, PTP_TRIG, ptp_trig);
 
 	/* load trigger */
@@ -672,7 +676,7 @@ static int decode_evnt(struct dp83640_private *dp83640,
 	event.type = PTP_CLOCK_EXTTS;
 	event.timestamp = phy2txts(&dp83640->edata);
 
-	for (i = 0; i < n_ext_ts; i++) {
+	for (i = 0; i < dp83640->clock->caps.n_ext_ts; i++) {
 		if (ext_status & exts_chan_to_edata(i)) {
 			event.index = i;
 			ptp_clock_event(dp83640->clock->ptp_clock, &event);
@@ -874,12 +878,11 @@ static void dp83640_clock_init(struct dp83640_clock *clock, struct mii_bus *bus)
 	mutex_init(&clock->extreg_lock);
 	mutex_init(&clock->clock_lock);
 	INIT_LIST_HEAD(&clock->phylist);
+	clock->calibrate_pin = -1;
 	clock->caps.owner = THIS_MODULE;
 	sprintf(clock->caps.name, "dp83640 timer");
 	clock->caps.max_adj	= 1953124;
 	clock->caps.n_alarm	= 0;
-	clock->caps.n_ext_ts	= n_ext_ts;
-	clock->caps.n_per_out	= n_per_out;
 	clock->caps.pps		= 0;
 	clock->caps.adjfreq	= ptp_dp83640_adjfreq;
 	clock->caps.adjtime	= ptp_dp83640_adjtime;
@@ -892,18 +895,6 @@ static void dp83640_clock_init(struct dp83640_clock *clock, struct mii_bus *bus)
 	get_device(&bus->dev);
 }
 
-static int choose_this_phy(struct dp83640_clock *clock,
-			   struct phy_device *phydev)
-{
-	if (chosen_phy == -1 && !clock->chosen)
-		return 1;
-
-	if (chosen_phy == phydev->addr)
-		return 1;
-
-	return 0;
-}
-
 static struct dp83640_clock *dp83640_clock_get(struct dp83640_clock *clock)
 {
 	if (clock)
@@ -949,6 +940,95 @@ static void dp83640_clock_put(struct dp83640_clock *clock)
 	mutex_unlock(&clock->clock_lock);
 }
 
+#ifdef CONFIG_OF
+static int dp83640_probe_dt(struct device_node *node,
+			    struct dp83640_private *dp83640)
+{
+	struct dp83640_clock *clock = dp83640->clock;
+	struct property *prop;
+	int err, proplen;
+
+	dp83640->slave = of_property_read_bool(node, "dp83640,slave");
+	if (!dp83640->slave && clock->chosen) {
+		pr_err("dp83640,slave must be set if more than one device on the same bus");
+		return -EINVAL;
+	}
+
+	prop = of_find_property(node, "dp83640,perout-pins", &proplen);
+	if (prop) {
+		if (dp83640->slave) {
+			pr_err("dp83640,perout-pins property can not be set together with dp83640,slave");
+			return -EINVAL;
+		}
+
+		clock->caps.n_per_out = proplen / sizeof(u32);
+		if (clock->caps.n_per_out > N_EXT) {
+			pr_err("dp83640,perout-pins may not have more than %d entries",
+			       N_EXT);
+			return -EINVAL;
+		}
+		err = of_property_read_u32_array(node, "dp83640,perout-pins",
+						 clock->perout_pins,
+						 clock->caps.n_per_out);
+		if (err < 0)
+			return err;
+	}
+
+	prop = of_find_property(node, "dp83640,extts-pins", &proplen);
+	if (prop) {
+		if (dp83640->slave) {
+			pr_err("dp83640,extts-pins property can not be set together with dp83640,slave");
+			return -EINVAL;
+		}
+
+		clock->caps.n_ext_ts = proplen / sizeof(u32);
+		if (clock->caps.n_ext_ts > N_EXT) {
+			pr_err("dp83640,extts-pins may not have more than %d entries",
+			       N_EXT);
+			return -EINVAL;
+		}
+		err = of_property_read_u32_array(node, "dp83640,extts-pins",
+						 clock->extts_pins,
+						 clock->caps.n_ext_ts);
+		if (err < 0)
+			return err;
+	}
+
+	prop = of_find_property(node, "dp83640,calibrate-pin", &proplen);
+	if (prop) {
+		if (dp83640->slave) {
+			pr_err("dp83640,calibrate-pin property can not be set together with dp83640,slave");
+			return -EINVAL;
+		}
+		of_property_read_u32(node, "dp83640,calibrate-pin",
+				     &clock->calibrate_pin);
+	}
+
+	if (!dp83640->slave) {
+		if (clock->caps.n_per_out + clock->caps.n_ext_ts +
+		    (clock->calibrate_pin != -1 ? 1 : 0) > N_EXT) {
+			pr_err("Too many pins configured");
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+static const struct of_device_id dp83640_of_match_table[] = {
+	{ .compatible = "national,dp83640", },
+	{ },
+};
+MODULE_DEVICE_TABLE(of, dp83640_of_match_table);
+#else
+
+static inline int dp83640_probe_dt(struct device_node *node,
+				   struct dp83640_private *dp83640)
+{
+	return 0;
+}
+#endif
+
 static int dp83640_probe(struct phy_device *phydev)
 {
 	struct dp83640_clock *clock;
@@ -966,7 +1046,24 @@ static int dp83640_probe(struct phy_device *phydev)
 	if (!dp83640)
 		goto no_memory;
 
+	dp83640->clock = clock;
 	dp83640->phydev = phydev;
+
+	if (phydev->dev.of_node) {
+		err = dp83640_probe_dt(phydev->dev.of_node, dp83640);
+		if (err)
+			return err;
+	} else {
+		clock->calibrate_pin = calibrate_pin;
+		memcpy(clock->perout_pins, perout_pins,
+		       sizeof(clock->perout_pins));
+		memcpy(clock->extts_pins, extts_pins,
+		       sizeof(clock->extts_pins));
+		if (clock->chosen ||
+		    (chosen_phy != -1 && phydev->addr != chosen_phy))
+			dp83640->slave = true;
+	}
+
 	INIT_WORK(&dp83640->ts_work, rx_timestamp_work);
 
 	INIT_LIST_HEAD(&dp83640->rxts);
@@ -980,9 +1077,7 @@ static int dp83640_probe(struct phy_device *phydev)
 	skb_queue_head_init(&dp83640->rx_queue);
 	skb_queue_head_init(&dp83640->tx_queue);
 
-	dp83640->clock = clock;
-
-	if (choose_this_phy(clock, phydev)) {
+	if (!dp83640->slave) {
 		clock->chosen = dp83640;
 		clock->ptp_clock = ptp_clock_register(&clock->caps, &phydev->dev);
 		if (IS_ERR(clock->ptp_clock)) {
@@ -1327,7 +1422,10 @@ static struct phy_driver dp83640_driver = {
 	.hwtstamp	= dp83640_hwtstamp,
 	.rxtstamp	= dp83640_rxtstamp,
 	.txtstamp	= dp83640_txtstamp,
-	.driver		= {.owner = THIS_MODULE,}
+	.driver		= {
+		.owner = THIS_MODULE,
+		.of_match_table = of_match_ptr(dp83640_of_match_table),
+	}
 };
 
 static int __init dp83640_init(void)
-- 
1.8.5.3

^ permalink raw reply related

* [PATCH v2 1/2] dp83640: Support a configurable number of periodic outputs
From: Stefan Sørensen @ 2014-02-11 15:29 UTC (permalink / raw)
  To: richardcochran, grant.likely, robh+dt, mark.rutland, netdev,
	linux-kernel, devicetree
  Cc: Stefan Sørensen
In-Reply-To: <1392132562-23644-1-git-send-email-stefan.sorensen@spectralink.com>

The driver is currently limited to a single periodic output. This patch makes
the number of peridodic output dynamic by dropping the gpio_tab module
parameter and adding calibrate_pin, perout_pins, and extts_pins parameters.

Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
---
 drivers/net/phy/dp83640.c | 75 ++++++++++++++++++++---------------------------
 1 file changed, 32 insertions(+), 43 deletions(-)

diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
index 547725f..d4fe95d 100644
--- a/drivers/net/phy/dp83640.c
+++ b/drivers/net/phy/dp83640.c
@@ -38,15 +38,11 @@
 #define LAYER4		0x02
 #define LAYER2		0x01
 #define MAX_RXTS	64
-#define N_EXT_TS	6
+#define N_EXT		8
 #define PSF_PTPVER	2
 #define PSF_EVNT	0x4000
 #define PSF_RX		0x2000
 #define PSF_TX		0x1000
-#define EXT_EVENT	1
-#define CAL_EVENT	7
-#define CAL_TRIGGER	7
-#define PER_TRIGGER	6
 
 #define MII_DP83640_MICR 0x11
 #define MII_DP83640_MISR 0x12
@@ -146,32 +142,24 @@ struct dp83640_clock {
 	struct ptp_clock *ptp_clock;
 };
 
-/* globals */
-
-enum {
-	CALIBRATE_GPIO,
-	PEROUT_GPIO,
-	EXTTS0_GPIO,
-	EXTTS1_GPIO,
-	EXTTS2_GPIO,
-	EXTTS3_GPIO,
-	EXTTS4_GPIO,
-	EXTTS5_GPIO,
-	GPIO_TABLE_SIZE
-};
 
 static int chosen_phy = -1;
-static ushort gpio_tab[GPIO_TABLE_SIZE] = {
-	1, 2, 3, 4, 8, 9, 10, 11
-};
+static int calibrate_pin = 1;
+static int perout_pins[N_EXT] = {2};
+static int n_per_out = 1;
+static int extts_pins[N_EXT] = {3, 4, 8, 9, 10, 11};
+static int n_ext_ts = 6;
 
 module_param(chosen_phy, int, 0444);
-module_param_array(gpio_tab, ushort, NULL, 0444);
+module_param(calibrate_pin, int, 0444);
+module_param_array(perout_pins, int, &n_per_out, 0444);
+module_param_array(extts_pins, int, &n_ext_ts, 0444);
 
 MODULE_PARM_DESC(chosen_phy, \
 	"The address of the PHY to use for the ancillary clock features");
-MODULE_PARM_DESC(gpio_tab, \
-	"Which GPIO line to use for which purpose: cal,perout,extts1,...,extts6");
+MODULE_PARM_DESC(calibrate_pin, "Which pin to use for calibration");
+MODULE_PARM_DESC(perout_pins, "Which pins to use for periodic output");
+MODULE_PARM_DESC(extts_pins, "Which pins to use for external timestamping");
 
 /* a list of clocks and a mutex to protect it */
 static LIST_HEAD(phyter_clocks);
@@ -267,15 +255,16 @@ static u64 phy2txts(struct phy_txts *p)
 }
 
 static void periodic_output(struct dp83640_clock *clock,
-			    struct ptp_clock_request *clkreq, bool on)
+			    struct ptp_clock_request *clkreq, int index,
+			    bool on)
 {
 	struct dp83640_private *dp83640 = clock->chosen;
 	struct phy_device *phydev = dp83640->phydev;
 	u32 sec, nsec, period;
 	u16 gpio, ptp_trig, trigger, val;
 
-	gpio = on ? gpio_tab[PEROUT_GPIO] : 0;
-	trigger = PER_TRIGGER;
+	gpio = on ? perout_pins[index] : 0;
+	trigger = n_ext_ts + index;
 
 	ptp_trig = TRIG_WR |
 		(trigger & TRIG_CSEL_MASK) << TRIG_CSEL_SHIFT |
@@ -430,12 +419,12 @@ static int ptp_dp83640_enable(struct ptp_clock_info *ptp,
 	switch (rq->type) {
 	case PTP_CLK_REQ_EXTTS:
 		index = rq->extts.index;
-		if (index < 0 || index >= N_EXT_TS)
+		if (index < 0 || index >= n_ext_ts)
 			return -EINVAL;
-		event_num = EXT_EVENT + index;
+		event_num = index;
 		evnt = EVNT_WR | (event_num & EVNT_SEL_MASK) << EVNT_SEL_SHIFT;
 		if (on) {
-			gpio_num = gpio_tab[EXTTS0_GPIO + index];
+			gpio_num = extts_pins[index];
 			evnt |= (gpio_num & EVNT_GPIO_MASK) << EVNT_GPIO_SHIFT;
 			evnt |= EVNT_RISE;
 		}
@@ -443,9 +432,10 @@ static int ptp_dp83640_enable(struct ptp_clock_info *ptp,
 		return 0;
 
 	case PTP_CLK_REQ_PEROUT:
-		if (rq->perout.index != 0)
+		index = rq->perout.index;
+		if (index < 0 || index >= n_per_out)
 			return -EINVAL;
-		periodic_output(clock, rq, on);
+		periodic_output(clock, rq, index, on);
 		return 0;
 
 	default:
@@ -538,10 +528,9 @@ static void recalibrate(struct dp83640_clock *clock)
 	struct list_head *this;
 	struct dp83640_private *tmp;
 	struct phy_device *master = clock->chosen->phydev;
-	u16 cal_gpio, cfg0, evnt, ptp_trig, trigger, val;
+	u16 cfg0, evnt, ptp_trig, trigger, val;
 
-	trigger = CAL_TRIGGER;
-	cal_gpio = gpio_tab[CALIBRATE_GPIO];
+	trigger = n_ext_ts + n_per_out;
 
 	mutex_lock(&clock->extreg_lock);
 
@@ -564,8 +553,8 @@ static void recalibrate(struct dp83640_clock *clock)
 	 * enable an event timestamp
 	 */
 	evnt = EVNT_WR | EVNT_RISE | EVNT_SINGLE;
-	evnt |= (CAL_EVENT & EVNT_SEL_MASK) << EVNT_SEL_SHIFT;
-	evnt |= (cal_gpio & EVNT_GPIO_MASK) << EVNT_GPIO_SHIFT;
+	evnt |= (trigger & EVNT_SEL_MASK) << EVNT_SEL_SHIFT;
+	evnt |= (calibrate_pin & EVNT_GPIO_MASK) << EVNT_GPIO_SHIFT;
 
 	list_for_each(this, &clock->phylist) {
 		tmp = list_entry(this, struct dp83640_private, list);
@@ -578,7 +567,7 @@ static void recalibrate(struct dp83640_clock *clock)
 	 */
 	ptp_trig = TRIG_WR | TRIG_IF_LATE | TRIG_PULSE;
 	ptp_trig |= (trigger  & TRIG_CSEL_MASK) << TRIG_CSEL_SHIFT;
-	ptp_trig |= (cal_gpio & TRIG_GPIO_MASK) << TRIG_GPIO_SHIFT;
+	ptp_trig |= (calibrate_pin & TRIG_GPIO_MASK) << TRIG_GPIO_SHIFT;
 	ext_write(0, master, PAGE5, PTP_TRIG, ptp_trig);
 
 	/* load trigger */
@@ -642,7 +631,7 @@ static void recalibrate(struct dp83640_clock *clock)
 
 static inline u16 exts_chan_to_edata(int ch)
 {
-	return 1 << ((ch + EXT_EVENT) * 2);
+	return 1 << ((ch) * 2);
 }
 
 static int decode_evnt(struct dp83640_private *dp83640,
@@ -676,14 +665,14 @@ static int decode_evnt(struct dp83640_private *dp83640,
 		parsed = words + 2;
 	} else {
 		parsed = words + 1;
-		i = ((ests >> EVNT_NUM_SHIFT) & EVNT_NUM_MASK) - EXT_EVENT;
+		i = ((ests >> EVNT_NUM_SHIFT) & EVNT_NUM_MASK);
 		ext_status = exts_chan_to_edata(i);
 	}
 
 	event.type = PTP_CLOCK_EXTTS;
 	event.timestamp = phy2txts(&dp83640->edata);
 
-	for (i = 0; i < N_EXT_TS; i++) {
+	for (i = 0; i < n_ext_ts; i++) {
 		if (ext_status & exts_chan_to_edata(i)) {
 			event.index = i;
 			ptp_clock_event(dp83640->clock->ptp_clock, &event);
@@ -889,8 +878,8 @@ static void dp83640_clock_init(struct dp83640_clock *clock, struct mii_bus *bus)
 	sprintf(clock->caps.name, "dp83640 timer");
 	clock->caps.max_adj	= 1953124;
 	clock->caps.n_alarm	= 0;
-	clock->caps.n_ext_ts	= N_EXT_TS;
-	clock->caps.n_per_out	= 1;
+	clock->caps.n_ext_ts	= n_ext_ts;
+	clock->caps.n_per_out	= n_per_out;
 	clock->caps.pps		= 0;
 	clock->caps.adjfreq	= ptp_dp83640_adjfreq;
 	clock->caps.adjtime	= ptp_dp83640_adjtime;
-- 
1.8.5.3

^ permalink raw reply related

* [PATCH v2 0/2] dp83640: Get pin and master/slave configuration from DT
From: Stefan Sørensen @ 2014-02-11 15:29 UTC (permalink / raw)
  To: richardcochran, grant.likely, robh+dt, mark.rutland, netdev,
	linux-kernel, devicetree
  Cc: Stefan Sørensen

This patch series add DT configuration to the DP83640 PHY driver and makes
the configuration of periodic output pins dynamic.

Changes since v1:
 - Add bindings documentation
 - Keep module parameters
 - Rename gpio->pin
 - Split patch into DT part and dynamic periodic output support

Stefan


Stefan Sørensen (2):
  dp83640: Support a configurable number of periodic outputs
  dp83640: Get pin and master/slave configuration from DT

 Documentation/devicetree/bindings/net/dp83640.txt |  29 +++
 drivers/net/phy/dp83640.c                         | 205 +++++++++++++++-------
 2 files changed, 175 insertions(+), 59 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/dp83640.txt

-- 
1.8.5.3

^ permalink raw reply

* Re: [PATCH] net: rtnetlink: fix correct size given to memset
From: Eric Dumazet @ 2014-02-11 15:28 UTC (permalink / raw)
  To: Francois-Xavier Le Bail; +Cc: NETDEV, David Miller
In-Reply-To: <1392127598-3719-1-git-send-email-fx.lebail@yahoo.com>

On Tue, 2014-02-11 at 15:06 +0100, Francois-Xavier Le Bail wrote:
> Find by cppcheck:
> [net/core/rtnetlink.c:1842]: (warning) Using size of pointer linkinfo instead of size of its data.
> 
> Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
> ---
> 
> The diagnosis of cppcheck seems relevant.
> 
>  net/core/rtnetlink.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 048dc8d..9a5dbf1 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -1839,7 +1839,7 @@ replay:
>  		if (err < 0)
>  			return err;
>  	} else
> -		memset(linkinfo, 0, sizeof(linkinfo));
> +		memset(linkinfo, 0, sizeof(*linkinfo));
>  
>  	if (linkinfo[IFLA_INFO_KIND]) {
>  		nla_strlcpy(kind, linkinfo[IFLA_INFO_KIND], sizeof(kind));


This is absolutely wrong.

	struct nlattr *linkinfo[IFLA_INFO_MAX+1];

sizeof(linkinfo) is totally appropriate 

sizeof(*linkinfo) is equal to sizeof(void *), and this is not what we
need here.

^ permalink raw reply

* Re: [PATCH] tun: use netif_receive_skb instead of netif_rx_ni
From: Eric Dumazet @ 2014-02-11 15:24 UTC (permalink / raw)
  To: Qin Chuanyu
  Cc: davem, jasowang, Michael S. Tsirkin, Anthony Liguori, KVM list,
	netdev
In-Reply-To: <52FA32C5.9040601@huawei.com>

On Tue, 2014-02-11 at 22:25 +0800, Qin Chuanyu wrote:
> we could xmit directly instead of going through softirq to gain 
> throughput and lantency improved.
> test model: VM-Host-Host just do transmit. with vhost thread and nic
> interrupt bind cpu1. netperf do throuhput test and qperf do lantency test.
> Host OS: suse11sp3, Guest OS: suse11sp3
> 
> latency result(us):
> packet_len 64 256 512 1460
> old(UDP)   44  47  48   66
> new(UDP)   38  41  42   66
> 
> old(TCP)   52  55  70  117
> new(TCP)   45  48  61  114
> 
> throughput result(Gbit/s):
> packet_len   64   512   1024   1460
> old(UDP)   0.42  2.02   3.75   4.68
> new(UDP)   0.45  2.14   3.77   5.06
> 
> TCP due to the latency, client couldn't send packet big enough
> to get benefit from TSO of nic, so the result show it will send
> more packet per sencond but get lower throughput.
> 
> Eric mentioned that it would has problem with cgroup, but the patch
> had been sent by Herbert Xu.
> patch_id f845172531fb7410c7fb7780b1a6e51ee6df7d52
> 
> Signed-off-by: Chuanyu Qin <qinchuanyu@huawei.com>
> ---
>   drivers/net/tun.c |    4 +++-
>   1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 44c4db8..90b4e58 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1184,7 +1184,9 @@ static ssize_t tun_get_user(struct tun_struct 
> *tun, struct tun_file *tfile,
>   	skb_probe_transport_header(skb, 0);
> 
>   	rxhash = skb_get_hash(skb);
> -	netif_rx_ni(skb);
> +	rcu_read_lock_bh();
> +	netif_receive_skb(skb);
> +	rcu_read_unlock_bh();
> 
>   	tun->dev->stats.rx_packets++;
>   	tun->dev->stats.rx_bytes += len;

I already said this patch is not good :

rcu_read_lock_bh() makes no sense here.

What is really needed is local_bh_disable();

Herbert patch ( http://patchwork.ozlabs.org/patch/52963/ ) had a much
cleaner form.

Just use it, CC him, credit him, please ?




^ permalink raw reply

* [PATCH] netfilter: nf_nat_snmp_basic: fix duplicates in if/else branches
From: Francois-Xavier Le Bail @ 2014-02-11 14:49 UTC (permalink / raw)
  To: NETDEV, David Miller, Patrick McHardy, netfilter-devel, netfilter,
	coreteam

The solution was found by Patrick in 2.4 kernel sources.

Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
---
 net/ipv4/netfilter/nf_nat_snmp_basic.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/netfilter/nf_nat_snmp_basic.c b/net/ipv4/netfilter/nf_nat_snmp_basic.c
index d551e31..7c67667 100644
--- a/net/ipv4/netfilter/nf_nat_snmp_basic.c
+++ b/net/ipv4/netfilter/nf_nat_snmp_basic.c
@@ -1198,8 +1198,8 @@ static int snmp_translate(struct nf_conn *ct,
 		map.to = NOCT1(&ct->tuplehash[!dir].tuple.dst.u3.ip);
 	} else {
 		/* DNAT replies */
-		map.from = NOCT1(&ct->tuplehash[dir].tuple.src.u3.ip);
-		map.to = NOCT1(&ct->tuplehash[!dir].tuple.dst.u3.ip);
+		map.from = NOCT1(&ct->tuplehash[!dir].tuple.src.u3.ip);
+		map.to = NOCT1(&ct->tuplehash[dir].tuple.dst.u3.ip);
 	}
 
 	if (map.from == map.to)

^ permalink raw reply related

* Re: [PATCH v2] net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer
From: Vlad Yasevich @ 2014-02-11 14:52 UTC (permalink / raw)
  To: Matija Glavinic Pecotic, linux-sctp@vger.kernel.org
  Cc: netdev@vger.kernel.org, Alexander Sverdlin
In-Reply-To: <52F72AFE.4060802@nsn.com>

Hi Matija

On 02/09/2014 02:15 AM, Matija Glavinic Pecotic wrote:
>
> Proposed solution:
>
> Both problems share the same root cause, and that is improper scaling
of socket
> buffer with rwnd. Solution in which sizeof(sk_buff) is taken into
concern while
> calculating rwnd is not possible due to fact that there is no linear
> relationship between amount of data blamed in increase/decrease with
IP packet
> in which payload arrived. Even in case such solution would be followed,
> complexity of the code would increase. Due to nature of current rwnd
handling,
> slow increase (in sctp_assoc_rwnd_increase) of rwnd after pressure
state is
> entered is rationale, but it gives false representation to the sender
of current
> buffer space. Furthermore, it implements additional congestion control
mechanism
> which is defined on implementation, and not on standard basis.
>
> Proposed solution simplifies whole algorithm having on mind definition
from rfc:
>
> o  Receiver Window (rwnd): This gives the sender an indication of the
space
>    available in the receiver's inbound buffer.
>
> Core of the proposed solution is given with these lines:
>
> sctp_assoc_rwnd_update:
> 	if ((asoc->base.sk->sk_rcvbuf - rx_count) > 0)
> 		asoc->rwnd = (asoc->base.sk->sk_rcvbuf - rx_count) >> 1;
> 	else
> 		asoc->rwnd = 0;
>
> We advertise to sender (half of) actual space we have. Half is in the
braces
> depending whether you would like to observe size of socket buffer as
SO_RECVBUF
> or twice the amount, i.e. size is the one visible from userspace, that is,
> from kernelspace.
> In this way sender is given with good approximation of our buffer space,
> regardless of the buffer policy - we always advertise what we have.
Proposed
> solution fixes described problems and removes necessity for rwnd
restoration
> algorithm. Finally, as proposed solution is simplification, some lines
of code,
> along with some bytes in struct sctp_association are saved.
>
> Signed-off-by: Matija Glavinic Pecotic
<matija.glavinic-pecotic.ext@nsn.com>
> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nsn.com>
>
> --- net-next.orig/net/sctp/associola.c
> +++ net-next/net/sctp/associola.c
> @@ -1367,44 +1367,35 @@ static inline bool sctp_peer_needs_updat
>  	return false;
>  }
>
> -/* Increase asoc's rwnd by len and send any window update SACK if
needed. */
> -void sctp_assoc_rwnd_increase(struct sctp_association *asoc, unsigned
int len)
> +/* Update asoc's rwnd for the approximated state in the buffer,
> + * and check whether SACK needs to be sent.
> + */
> +void sctp_assoc_rwnd_update(struct sctp_association *asoc, bool
update_peer)
>  {
> +	int rx_count;
>  	struct sctp_chunk *sack;
>  	struct timer_list *timer;
>
> -	if (asoc->rwnd_over) {
> -		if (asoc->rwnd_over >= len) {
> -			asoc->rwnd_over -= len;
> -		} else {
> -			asoc->rwnd += (len - asoc->rwnd_over);
> -			asoc->rwnd_over = 0;
> -		}
> -	} else {
> -		asoc->rwnd += len;
> -	}
> +	if (asoc->ep->rcvbuf_policy)
> +		rx_count = atomic_read(&asoc->rmem_alloc);
> +	else
> +		rx_count = atomic_read(&asoc->base.sk->sk_rmem_alloc);
>
> -	/* If we had window pressure, start recovering it
> -	 * once our rwnd had reached the accumulated pressure
> -	 * threshold.  The idea is to recover slowly, but up
> -	 * to the initial advertised window.
> -	 */
> -	if (asoc->rwnd_press && asoc->rwnd >= asoc->rwnd_press) {
> -		int change = min(asoc->pathmtu, asoc->rwnd_press);
> -		asoc->rwnd += change;
> -		asoc->rwnd_press -= change;
> -	}
> +	if ((asoc->base.sk->sk_rcvbuf - rx_count) > 0)
> +		asoc->rwnd = (asoc->base.sk->sk_rcvbuf - rx_count) >> 1;
> +	else
> +		asoc->rwnd = 0;
>
> -	pr_debug("%s: asoc:%p rwnd increased by %d to (%u, %u) - %u\n",
> -		 __func__, asoc, len, asoc->rwnd, asoc->rwnd_over,
> -		 asoc->a_rwnd);
> +	pr_debug("%s: asoc:%p rwnd=%u, rx_count=%d, sk_rcvbuf=%d\n",
> +		 __func__, asoc, asoc->rwnd, rx_count,
> +		 asoc->base.sk->sk_rcvbuf);
>
>  	/* Send a window update SACK if the rwnd has increased by at least the
>  	 * minimum of the association's PMTU and half of the receive buffer.
>  	 * The algorithm used is similar to the one described in
>  	 * Section 4.2.3.3 of RFC 1122.
>  	 */
> -	if (sctp_peer_needs_update(asoc)) {
> +	if (update_peer && sctp_peer_needs_update(asoc)) {
>  		asoc->a_rwnd = asoc->rwnd;
>
>  		pr_debug("%s: sending window update SACK- asoc:%p rwnd:%u "
> @@ -1426,45 +1417,6 @@ void sctp_assoc_rwnd_increase(struct sct
>  	}
>  }
>
> -/* Decrease asoc's rwnd by len. */
> -void sctp_assoc_rwnd_decrease(struct sctp_association *asoc, unsigned
int len)
> -{
> -	int rx_count;
> -	int over = 0;
> -
> -	if (unlikely(!asoc->rwnd || asoc->rwnd_over))
> -		pr_debug("%s: association:%p has asoc->rwnd:%u, "
> -			 "asoc->rwnd_over:%u!\n", __func__, asoc,
> -			 asoc->rwnd, asoc->rwnd_over);
> -
> -	if (asoc->ep->rcvbuf_policy)
> -		rx_count = atomic_read(&asoc->rmem_alloc);
> -	else
> -		rx_count = atomic_read(&asoc->base.sk->sk_rmem_alloc);
> -
> -	/* If we've reached or overflowed our receive buffer, announce
> -	 * a 0 rwnd if rwnd would still be positive.  Store the
> -	 * the potential pressure overflow so that the window can be restored
> -	 * back to original value.
> -	 */
> -	if (rx_count >= asoc->base.sk->sk_rcvbuf)
> -		over = 1;
> -
> -	if (asoc->rwnd >= len) {
> -		asoc->rwnd -= len;
> -		if (over) {
> -			asoc->rwnd_press += asoc->rwnd;
> -			asoc->rwnd = 0;
> -		}
> -	} else {
> -		asoc->rwnd_over = len - asoc->rwnd;
> -		asoc->rwnd = 0;
> -	}
> -
> -	pr_debug("%s: asoc:%p rwnd decreased by %d to (%u, %u, %u)\n",
> -		 __func__, asoc, len, asoc->rwnd, asoc->rwnd_over,
> -		 asoc->rwnd_press);
> -}
>
>  /* Build the bind address list for the association based on info from the
>   * local endpoint and the remote peer.
> --- net-next.orig/include/net/sctp/structs.h
> +++ net-next/include/net/sctp/structs.h
> @@ -1653,17 +1653,6 @@ struct sctp_association {
>  	/* This is the last advertised value of rwnd over a SACK chunk. */
>  	__u32 a_rwnd;
>
> -	/* Number of bytes by which the rwnd has slopped.  The rwnd is allowed
> -	 * to slop over a maximum of the association's frag_point.
> -	 */
> -	__u32 rwnd_over;
> -
> -	/* Keeps treack of rwnd pressure.  This happens when we have
> -	 * a window, but not recevie buffer (i.e small packets).  This one
> -	 * is releases slowly (1 PMTU at a time ).
> -	 */
> -	__u32 rwnd_press;
> -
>  	/* This is the sndbuf size in use for the association.
>  	 * This corresponds to the sndbuf size for the association,
>  	 * as specified in the sk->sndbuf.
> @@ -1892,8 +1881,7 @@ void sctp_assoc_update(struct sctp_assoc
>  __u32 sctp_association_get_next_tsn(struct sctp_association *);
>
>  void sctp_assoc_sync_pmtu(struct sock *, struct sctp_association *);
> -void sctp_assoc_rwnd_increase(struct sctp_association *, unsigned int);
> -void sctp_assoc_rwnd_decrease(struct sctp_association *, unsigned int);
> +void sctp_assoc_rwnd_update(struct sctp_association *, bool);
>  void sctp_assoc_set_primary(struct sctp_association *,
>  			    struct sctp_transport *);
>  void sctp_assoc_del_nonprimary_peers(struct sctp_association *,
> --- net-next.orig/net/sctp/sm_statefuns.c
> +++ net-next/net/sctp/sm_statefuns.c
> @@ -6176,7 +6176,7 @@ static int sctp_eat_data(const struct sc
>  	 * PMTU.  In cases, such as loopback, this might be a rather
>  	 * large spill over.
>  	 */
> -	if ((!chunk->data_accepted) && (!asoc->rwnd || asoc->rwnd_over ||
> +	if ((!chunk->data_accepted) && (!asoc->rwnd ||
>  	    (datalen > asoc->rwnd + asoc->frag_point))) {
>
>  		/* If this is the next TSN, consider reneging to make
> --- net-next.orig/net/sctp/socket.c
> +++ net-next/net/sctp/socket.c
> @@ -2092,12 +2092,6 @@ static int sctp_recvmsg(struct kiocb *io
>  		sctp_skb_pull(skb, copied);
>  		skb_queue_head(&sk->sk_receive_queue, skb);
>
> -		/* When only partial message is copied to the user, increase
> -		 * rwnd by that amount. If all the data in the skb is read,
> -		 * rwnd is updated when the event is freed.
> -		 */
> -		if (!sctp_ulpevent_is_notification(event))
> -			sctp_assoc_rwnd_increase(event->asoc, copied);
>  		goto out;
>  	} else if ((event->msg_flags & MSG_NOTIFICATION) ||
>  		   (event->msg_flags & MSG_EOR))
> --- net-next.orig/net/sctp/ulpevent.c
> +++ net-next/net/sctp/ulpevent.c
> @@ -989,7 +989,7 @@ static void sctp_ulpevent_receive_data(s
>  	skb = sctp_event2skb(event);
>  	/* Set the owner and charge rwnd for bytes received.  */
>  	sctp_ulpevent_set_owner(event, asoc);
> -	sctp_assoc_rwnd_decrease(asoc, skb_headlen(skb));
> +	sctp_assoc_rwnd_update(asoc, false);
>
>  	if (!skb->data_len)
>  		return;
> @@ -1035,8 +1035,9 @@ static void sctp_ulpevent_release_data(s
>  	}
>
>  done:
> -	sctp_assoc_rwnd_increase(event->asoc, len);
> -	sctp_ulpevent_release_owner(event);
> +	atomic_sub(event->rmem_len, &event->asoc->rmem_alloc);
> +	sctp_assoc_rwnd_update(event->asoc, true);
> +	sctp_association_put(event->asoc)

Can't we simply change the order of window update and release instead
of open coding it like this?


-vlad

>  }
>
>  static void sctp_ulpevent_release_frag_data(struct sctp_ulpevent *event)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* [PATCH] net: rtnetlink: fix correct size given to memset
From: Francois-Xavier Le Bail @ 2014-02-11 14:06 UTC (permalink / raw)
  To: NETDEV, David Miller

Find by cppcheck:
[net/core/rtnetlink.c:1842]: (warning) Using size of pointer linkinfo instead of size of its data.

Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
---

The diagnosis of cppcheck seems relevant.

 net/core/rtnetlink.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 048dc8d..9a5dbf1 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1839,7 +1839,7 @@ replay:
 		if (err < 0)
 			return err;
 	} else
-		memset(linkinfo, 0, sizeof(linkinfo));
+		memset(linkinfo, 0, sizeof(*linkinfo));
 
 	if (linkinfo[IFLA_INFO_KIND]) {
 		nla_strlcpy(kind, linkinfo[IFLA_INFO_KIND], sizeof(kind));

^ permalink raw reply related

* [PATCH] tun: use netif_receive_skb instead of netif_rx_ni
From: Qin Chuanyu @ 2014-02-11 14:25 UTC (permalink / raw)
  To: davem
  Cc: jasowang, Michael S. Tsirkin, Anthony Liguori, KVM list, netdev,
	Eric Dumazet

we could xmit directly instead of going through softirq to gain 
throughput and lantency improved.
test model: VM-Host-Host just do transmit. with vhost thread and nic
interrupt bind cpu1. netperf do throuhput test and qperf do lantency test.
Host OS: suse11sp3, Guest OS: suse11sp3

latency result(us):
packet_len 64 256 512 1460
old(UDP)   44  47  48   66
new(UDP)   38  41  42   66

old(TCP)   52  55  70  117
new(TCP)   45  48  61  114

throughput result(Gbit/s):
packet_len   64   512   1024   1460
old(UDP)   0.42  2.02   3.75   4.68
new(UDP)   0.45  2.14   3.77   5.06

TCP due to the latency, client couldn't send packet big enough
to get benefit from TSO of nic, so the result show it will send
more packet per sencond but get lower throughput.

Eric mentioned that it would has problem with cgroup, but the patch
had been sent by Herbert Xu.
patch_id f845172531fb7410c7fb7780b1a6e51ee6df7d52

Signed-off-by: Chuanyu Qin <qinchuanyu@huawei.com>
---
  drivers/net/tun.c |    4 +++-
  1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 44c4db8..90b4e58 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1184,7 +1184,9 @@ static ssize_t tun_get_user(struct tun_struct 
*tun, struct tun_file *tfile,
  	skb_probe_transport_header(skb, 0);

  	rxhash = skb_get_hash(skb);
-	netif_rx_ni(skb);
+	rcu_read_lock_bh();
+	netif_receive_skb(skb);
+	rcu_read_unlock_bh();

  	tun->dev->stats.rx_packets++;
  	tun->dev->stats.rx_bytes += len;
-- 
1.7.3.1.msysgit.0

^ permalink raw reply related

* Re: Bonding
From: Veaceslav Falico @ 2014-02-11 14:15 UTC (permalink / raw)
  To: Gustavo Pimentel; +Cc: netdev@vger.kernel.org
In-Reply-To: <8532C3BD1ECBC64BA47CE43AFED6D38EC37E5AA5@S103.efacec.pt>

On Tue, Feb 11, 2014 at 01:53:32PM +0000, Gustavo Pimentel wrote:
>Hi,

Hi Gustavo,

>
>I'm writing you because because I'm have implemented a new mode (PRP Parallel Redundancy Protocol) for bonding kernel driver. This new mode is quite simple, I don't know if you have heard about PRP, but it's a new standard that allows to overcome any single network failure without affecting the data transmission. The general idea resides on having two separate LAN (A & B) very similar and transmitting the almost the same frame through both LANs and the end device should accept one frame and discard the other according to a known mechanism.

Isn't that the current 'broadcast' mode, where every packet is transmitted
over all the slaves? After quick googling/reading I don't see any
difference there, though I might have missed something.

>
>I have implemented this new mode on bonding driver, but I have some difficulties:
>. Writing linux driver is quite new for me. I don't' know if exists guide lines for driver coding.

You can find everything under Documentation/, but without the code I can't
tell you exact documents. CodingStyle and SubmittingPatches might be the
first ones.

Also, try CC-ing relevant people for more feedback, specifically bonding
maintainers.

>. I don't know how to submit the code to be include on kernel repository.
>. Maybe another pair of eyes could find help to improve the writing code for this mode.

Try sending an RFC when net-next opens.

>
>I think my driver code is 99% complete. I'm currently testing with 3 equipments (1 pc + 1 embedded device running both my modify bonding driver) and a third party equipment called RedBox.
>
>Would you be interested in participating / helping this project?
>
>With my best regards,
>
>Gustavo Gama da Rocha Pimentel
>Power Systems Automation / Innovation & Development
>Efacec Engenharia e Sistemas, S.A.
>Phone: +351229403391
>Disclaimer
>
>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] can: xilinx CAN controller support.
From: Arnd Bergmann @ 2014-02-11 14:12 UTC (permalink / raw)
  To: monstr
  Cc: Marc Kleine-Budde, Michal Simek, robh+dt@kernel.org,
	Appana Durga Kedareswara Rao, wg@grandegger.com,
	grant.likely@linaro.org, linux-can@vger.kernel.org,
	netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, devicetree@vger.kernel.org
In-Reply-To: <52FA2419.70102@monstr.eu>

On Tuesday 11 February 2014, Michal Simek wrote:
> >> The only remaining question is if we should create generic DT binding
> >> for fifo depth. Arnd, Rob: Any opinion about it?
> >> Definitely will be worth to have one generic binding if this is generic feature.
> >> But if this is just specific feature for us then current properties should
> >> be fine.
> >>
> >> In general all these xlnx,XXX properties just reflect all configurable options
> >> which you can setup in design tool which means that provide full hw description
> >> with all variants and they are automatically generated from tools.
> >>
> >> Please let me know what you think.
> > 
> > I like:
> > 
> >     rx-fifo-depth
> >     tx-fifo-depth
> 
> No problem with that. Let Kedar to fix it according this and he will send v2 with this.
> 

Sounds reasonable to me too.

	Arnd

^ permalink raw reply

* Bonding
From: Gustavo Pimentel @ 2014-02-11 13:53 UTC (permalink / raw)
  To: netdev@vger.kernel.org

Hi,

I'm writing you because because I'm have implemented a new mode (PRP Parallel Redundancy Protocol) for bonding kernel driver. This new mode is quite simple, I don't know if you have heard about PRP, but it's a new standard that allows to overcome any single network failure without affecting the data transmission. The general idea resides on having two separate LAN (A & B) very similar and transmitting the almost the same frame through both LANs and the end device should accept one frame and discard the other according to a known mechanism. 

I have implemented this new mode on bonding driver, but I have some difficulties:
. Writing linux driver is quite new for me. I don't' know if exists guide lines for driver coding.
. I don't know how to submit the code to be include on kernel repository.
. Maybe another pair of eyes could find help to improve the writing code for this mode.

I think my driver code is 99% complete. I'm currently testing with 3 equipments (1 pc + 1 embedded device running both my modify bonding driver) and a third party equipment called RedBox.

Would you be interested in participating / helping this project?

With my best regards, 

Gustavo Gama da Rocha Pimentel
Power Systems Automation / Innovation & Development
Efacec Engenharia e Sistemas, S.A.
Phone: +351229403391
Disclaimer

^ permalink raw reply

* Re: [PATCH] can: xilinx CAN controller support.
From: Michal Simek @ 2014-02-11 13:22 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: Michal Simek, robh+dt@kernel.org, Arnd Bergmann,
	Appana Durga Kedareswara Rao, wg@grandegger.com,
	grant.likely@linaro.org, linux-can@vger.kernel.org,
	netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, devicetree@vger.kernel.org
In-Reply-To: <52FA1910.2060101@pengutronix.de>

[-- Attachment #1: Type: text/plain, Size: 6218 bytes --]

On 02/11/2014 01:35 PM, Marc Kleine-Budde wrote:
> On 02/11/2014 12:45 PM, Michal Simek wrote:
>> Hi Marc,
>>
>> On 02/07/2014 10:37 AM, Marc Kleine-Budde wrote:
>>> On 02/07/2014 09:42 AM, Appana Durga Kedareswara Rao wrote:
>>>>>> ---
>>>>>> This patch is rebased on the 3.14 rc1 kernel.
>>>>>> ---
>>>>>>  .../devicetree/bindings/net/can/xilinx_can.txt     |   43 +
>>>>>>  drivers/net/can/Kconfig                            |    8 +
>>>>>>  drivers/net/can/Makefile                           |    1 +
>>>>>>  drivers/net/can/xilinx_can.c                       | 1150 ++++++++++++++++++++
>>>>>>  4 files changed, 1202 insertions(+), 0 deletions(-)  create mode
>>>>>> 100644 Documentation/devicetree/bindings/net/can/xilinx_can.txt
>>>>>>  create mode 100644 drivers/net/can/xilinx_can.c
>>>>>>
>>>>>> diff --git a/Documentation/devicetree/bindings/net/can/xilinx_can.txt
>>>>>> b/Documentation/devicetree/bindings/net/can/xilinx_can.txt
>>>>>> new file mode 100644
>>>>>> index 0000000..34f9643
>>>>>> --- /dev/null
>>>>>> +++ b/Documentation/devicetree/bindings/net/can/xilinx_can.txt
>>>>>> @@ -0,0 +1,43 @@
>>>>>> +Xilinx Axi CAN/Zynq CANPS controller Device Tree Bindings
>>>>>> +---------------------------------------------------------
>>>>>> +
>>>>>> +Required properties:
>>>>>> +- compatible               : Should be "xlnx,zynq-can-1.00.a" for Zynq
>>>>> CAN
>>>>>> +                     controllers and "xlnx,axi-can-1.00.a" for Axi CAN
>>>>>> +                     controllers.
>>>>>> +- reg                      : Physical base address and size of the Axi CAN/Zynq
>>>>>> +                     CANPS registers map.
>>>>>> +- interrupts               : Property with a value describing the interrupt
>>>>>> +                     number.
>>>>>> +- interrupt-parent : Must be core interrupt controller
>>>>>> +- clock-names              : List of input clock names - "ref_clk",
>>>>> "aper_clk"
>>>>>> +                     (See clock bindings for details. Two clocks are
>>>>>> +                      required for Zynq CAN. For Axi CAN
>>>>>> +                      case it is one(ref_clk)).
>>>>>> +- clocks           : Clock phandles (see clock bindings for details).
>>>>>> +- xlnx,can-tx-dpth : Can Tx fifo depth (Required for Axi CAN).
>>>>>> +- xlnx,can-rx-dpth : Can Rx fifo depth (Required for Axi CAN).
>>>>>> +
>>>>>> +
>>>>>> +Example:
>>>>>> +
>>>>>> +For Zynq CANPS Dts file:
>>>>>> +   zynq_can_0: zynq-can@e0008000 {
>>>>>> +                   compatible = "xlnx,zynq-can-1.00.a";
>>>>>> +                   clocks = <&clkc 19>, <&clkc 36>;
>>>>>> +                   clock-names = "ref_clk", "aper_clk";
>>>>>> +                   reg = <0xe0008000 0x1000>;
>>>>>> +                   interrupts = <0 28 4>;
>>>>>> +                   interrupt-parent = <&intc>;
>>>>>
>>>>> Above xlnx,can-{rx,tx}-dpth is mentioned as required, but it's not in the
>>>>> Zynq example.
>>>>
>>>> One of the Difference b/w the AXI CAN and zynq CAN is in AXI CAN the fifo depth(tx,rx)
>>>> Is user configurable. But in case of ZYNQ CAN  the fifo depth  is fixed for tx and rx fifo's(64)
>>>> Xlnx,can-{rx,tx}-dpth is required only for AXI CAN case it is not required for zynq CAN.
>>>> That's why didn't putted that property in device tree.
>>>
>>> The device tree should be a hardware only description and should not
>>> hold any user configurable data. Please split your patch into two
>>> patches. The first one should add the driver with a fixed fifo size
>>> (e.g. 0x40) for the AXI, too. The second patch should make the fifo
>>> configurable via device tree.
>>
>> can-rx/tx-dpth is not user configurable data as you think.
>> This is FPGA where you can configure this parameter in design tools.
>> It means these 2 values just describe real hardware and user can't just change it
>> for different software behaviour.
> 
> I see, thanks for the clarification. I had a short grep over the
> arm/boot/dts folder and it seems that fifo-depth is a more or less
> common property. I think it should be called {rx,tx}-fifo-depth. I'm
> unsure whether we need the xlnx or not.

We are using xlnx prefix for all generated properties that's why
Kedar just kept it there.


>> Also I don't think it is worth to create 2 patches for the same driver
>> where the first one is useless for axi can device. But if you think
>> that it is worth to do we can create 2 patches as you suggested.
>>
>> Also what we can do is to define that this property is required also
>> for zynq which is 0x40 and change code according too.
> 
> Good idea, I think this would make the driver more uniform.

ok.


>>> If it's acceptable to describe the fifo usage by device tree, I'd like
>>> to make it a generic CAN driver option. But we have to look around, e.g.
>>> what the Ethernet driver use to configure their hardware.
>>
>> I think the real question is not if this is acceptable or not. It is just
>> reality that we can setup hardware fifo depth and driver has to reflect this
>> because without it driver just doesn't work for axi can.
>>
>> The only remaining question is if we should create generic DT binding
>> for fifo depth. Arnd, Rob: Any opinion about it?
>> Definitely will be worth to have one generic binding if this is generic feature.
>> But if this is just specific feature for us then current properties should
>> be fine.
>>
>> In general all these xlnx,XXX properties just reflect all configurable options
>> which you can setup in design tool which means that provide full hw description
>> with all variants and they are automatically generated from tools.
>>
>> Please let me know what you think.
> 
> I like:
> 
>     rx-fifo-depth
>     tx-fifo-depth

No problem with that. Let Kedar to fix it according this and he will send v2 with this.

Thanks,
Michal



-- 
Michal Simek, Ing. (M.Eng), OpenPGP -> KeyID: FE3D1F91
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel - Microblaze cpu - http://www.monstr.eu/fdt/
Maintainer of Linux kernel - Xilinx Zynq ARM architecture
Microblaze U-BOOT custodian and responsible for u-boot arm zynq platform



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply

* Re: 8% performance improved by change tap interact with kernel stack
From: Qin Chuanyu @ 2014-02-11 13:21 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: jasowang, Michael S. Tsirkin, Anthony Liguori, KVM list, netdev,
	Peter Klausler, davem
In-Reply-To: <1390920560.28432.8.camel@edumazet-glaptop2.roam.corp.google.com>

On 2014/1/28 22:49, Eric Dumazet wrote:
> On Tue, 2014-01-28 at 16:14 +0800, Qin Chuanyu wrote:
>> according perf test result，I found that there are 5%-8% cpu cost on
>> softirq by use netif_rx_ni called in tun_get_user.
>>
>> so I changed the function which cause skb transmitted more quickly.
>> from
>> 	tun_get_user	->
>> 		 netif_rx_ni(skb);
>> to
>> 	tun_get_user	->
>> 		rcu_read_lock_bh();
>> 		netif_receive_skb(skb);
>> 		rcu_read_unlock_bh();
>
> No idea why you use rcu here ?
>
>>
>> The test result is as below:
>> 	CPU: Intel(R) Xeon(R) CPU E5620  @ 2.40GHz
>> 	NIC: intel 82599
>> 	Host OS/Guest OS:suse11sp3
>> 	Qemu-1.6
>> 	netperf udp 512(VM tx)
>> 	test model: VM->host->host
>>
>> 	modified before : 2.00Gbps 461146pps
>> 	modified after  : 2.16Gbps 498782pps
>>
>> 8% performance gained from this change,
>> Is there any problem for this patch ?
>
> http://patchwork.ozlabs.org/patch/52963/
the problem list above show that:

   Since the cgroup classifier has this check:
     if (softirq_count() != SOFTIRQ_OFFSET))
	   return -1;
   We still fail to classify the frame.

but the source in the recently version has changed as below:
	if (in_serving_softirq()) {
		/* If there is an sk_classid we'll use that. */
		if (!skb->sk)
			return -1;
		classid = skb->sk->sk_classid;
	}

skb is allocated by tun_alloc_skb, so skb->sk is not NULL.
I think the problem is not existed anymore.



^ permalink raw reply

* Re: [PATCH v5 2/2] net: ip, ipv6: handle gso skbs in forwarding path
From: Herbert Xu @ 2014-02-11 13:14 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, Eric Dumazet
In-Reply-To: <1392064537-30646-2-git-send-email-fw@strlen.de>

On Mon, Feb 10, 2014 at 09:35:37PM +0100, Florian Westphal wrote:
> Marcelo Ricardo Leitner reported problems when the forwarding link path
> has a lower mtu than the incoming one if the inbound interface supports GRO.
> 
> Given:
> Host <mtu1500> R1 <mtu1200> R2
> 
> Host sends tcp stream which is routed via R1 and R2.  R1 performs GRO.
> 
> In this case, the kernel will fail to send ICMP fragmentation needed
> messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu
> checks in forward path. Instead, Linux tries to send out packets exceeding
> the mtu.
> 
> When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does
> not fragment the packets when forwarding, and again tries to send out
> packets exceeding R1-R2 link mtu.
> 
> This alters the forwarding dstmtu checks to take the individual gso
> segment lengths into account.
> 
> For ipv6, we send out pkt too big error for gso if the individual
> segments are too big.
> 
> For ipv4, we either send icmp fragmentation needed, or, if the DF bit
> is not set, perform software segmentation and let the output path
> create fragments when the packet is leaving the machine.
> It is not 100% correct as the error message will contain the headers of
> the GRO skb instead of the original/segmented one, but it seems to
> work fine in my (limited) tests.
> 
> Eric Dumazet suggested to simply shrink mss via ->gso_size to avoid
> sofware segmentation.
> 
> However it turns out that skb_segment() assumes skb nr_frags is related
> to mss size so we would BUG there.  I don't want to mess with it considering
> Herbert and Eric disagree on what the correct behavior should be.
> 
> Hannes Frederic Sowa notes that when we would shrink gso_size
> skb_segment would then also need to deal with the case where
> SKB_MAX_FRAGS would be exceeded.
> 
> This uses sofware segmentation in the forward path when we hit ipv4
> non-DF packets and the outgoing link mtu is too small.  Its not perfect,
> but given the lack of bug reports wrt. GRO fwd being broken this is a
> rare case anyway.  Also its not like this could not be improved later
> once the dust settles.
> 
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>

Although I think we're adding too much complexity for ~DF packets,
I don't see anything wrong with this patch per se since we're
already aggregating ~DF packets.

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] can: xilinx CAN controller support.
From: Marc Kleine-Budde @ 2014-02-11 12:35 UTC (permalink / raw)
  To: Michal Simek, robh+dt@kernel.org, Arnd Bergmann
  Cc: Appana Durga Kedareswara Rao, wg@grandegger.com,
	grant.likely@linaro.org, linux-can@vger.kernel.org,
	netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, devicetree@vger.kernel.org
In-Reply-To: <da80c522-5524-4f53-9373-3f2bb6f81ddc@CO9EHSMHS015.ehs.local>

[-- Attachment #1: Type: text/plain, Size: 5805 bytes --]

On 02/11/2014 12:45 PM, Michal Simek wrote:
> Hi Marc,
> 
> On 02/07/2014 10:37 AM, Marc Kleine-Budde wrote:
>> On 02/07/2014 09:42 AM, Appana Durga Kedareswara Rao wrote:
>>>>> ---
>>>>> This patch is rebased on the 3.14 rc1 kernel.
>>>>> ---
>>>>>  .../devicetree/bindings/net/can/xilinx_can.txt     |   43 +
>>>>>  drivers/net/can/Kconfig                            |    8 +
>>>>>  drivers/net/can/Makefile                           |    1 +
>>>>>  drivers/net/can/xilinx_can.c                       | 1150 ++++++++++++++++++++
>>>>>  4 files changed, 1202 insertions(+), 0 deletions(-)  create mode
>>>>> 100644 Documentation/devicetree/bindings/net/can/xilinx_can.txt
>>>>>  create mode 100644 drivers/net/can/xilinx_can.c
>>>>>
>>>>> diff --git a/Documentation/devicetree/bindings/net/can/xilinx_can.txt
>>>>> b/Documentation/devicetree/bindings/net/can/xilinx_can.txt
>>>>> new file mode 100644
>>>>> index 0000000..34f9643
>>>>> --- /dev/null
>>>>> +++ b/Documentation/devicetree/bindings/net/can/xilinx_can.txt
>>>>> @@ -0,0 +1,43 @@
>>>>> +Xilinx Axi CAN/Zynq CANPS controller Device Tree Bindings
>>>>> +---------------------------------------------------------
>>>>> +
>>>>> +Required properties:
>>>>> +- compatible               : Should be "xlnx,zynq-can-1.00.a" for Zynq
>>>> CAN
>>>>> +                     controllers and "xlnx,axi-can-1.00.a" for Axi CAN
>>>>> +                     controllers.
>>>>> +- reg                      : Physical base address and size of the Axi CAN/Zynq
>>>>> +                     CANPS registers map.
>>>>> +- interrupts               : Property with a value describing the interrupt
>>>>> +                     number.
>>>>> +- interrupt-parent : Must be core interrupt controller
>>>>> +- clock-names              : List of input clock names - "ref_clk",
>>>> "aper_clk"
>>>>> +                     (See clock bindings for details. Two clocks are
>>>>> +                      required for Zynq CAN. For Axi CAN
>>>>> +                      case it is one(ref_clk)).
>>>>> +- clocks           : Clock phandles (see clock bindings for details).
>>>>> +- xlnx,can-tx-dpth : Can Tx fifo depth (Required for Axi CAN).
>>>>> +- xlnx,can-rx-dpth : Can Rx fifo depth (Required for Axi CAN).
>>>>> +
>>>>> +
>>>>> +Example:
>>>>> +
>>>>> +For Zynq CANPS Dts file:
>>>>> +   zynq_can_0: zynq-can@e0008000 {
>>>>> +                   compatible = "xlnx,zynq-can-1.00.a";
>>>>> +                   clocks = <&clkc 19>, <&clkc 36>;
>>>>> +                   clock-names = "ref_clk", "aper_clk";
>>>>> +                   reg = <0xe0008000 0x1000>;
>>>>> +                   interrupts = <0 28 4>;
>>>>> +                   interrupt-parent = <&intc>;
>>>>
>>>> Above xlnx,can-{rx,tx}-dpth is mentioned as required, but it's not in the
>>>> Zynq example.
>>>
>>> One of the Difference b/w the AXI CAN and zynq CAN is in AXI CAN the fifo depth(tx,rx)
>>> Is user configurable. But in case of ZYNQ CAN  the fifo depth  is fixed for tx and rx fifo's(64)
>>> Xlnx,can-{rx,tx}-dpth is required only for AXI CAN case it is not required for zynq CAN.
>>> That's why didn't putted that property in device tree.
>>
>> The device tree should be a hardware only description and should not
>> hold any user configurable data. Please split your patch into two
>> patches. The first one should add the driver with a fixed fifo size
>> (e.g. 0x40) for the AXI, too. The second patch should make the fifo
>> configurable via device tree.
> 
> can-rx/tx-dpth is not user configurable data as you think.
> This is FPGA where you can configure this parameter in design tools.
> It means these 2 values just describe real hardware and user can't just change it
> for different software behaviour.

I see, thanks for the clarification. I had a short grep over the
arm/boot/dts folder and it seems that fifo-depth is a more or less
common property. I think it should be called {rx,tx}-fifo-depth. I'm
unsure whether we need the xlnx or not.

> Also I don't think it is worth to create 2 patches for the same driver
> where the first one is useless for axi can device. But if you think
> that it is worth to do we can create 2 patches as you suggested.
> 
> Also what we can do is to define that this property is required also
> for zynq which is 0x40 and change code according too.

Good idea, I think this would make the driver more uniform.

>> If it's acceptable to describe the fifo usage by device tree, I'd like
>> to make it a generic CAN driver option. But we have to look around, e.g.
>> what the Ethernet driver use to configure their hardware.
> 
> I think the real question is not if this is acceptable or not. It is just
> reality that we can setup hardware fifo depth and driver has to reflect this
> because without it driver just doesn't work for axi can.
> 
> The only remaining question is if we should create generic DT binding
> for fifo depth. Arnd, Rob: Any opinion about it?
> Definitely will be worth to have one generic binding if this is generic feature.
> But if this is just specific feature for us then current properties should
> be fine.
> 
> In general all these xlnx,XXX properties just reflect all configurable options
> which you can setup in design tool which means that provide full hw description
> with all variants and they are automatically generated from tools.
> 
> Please let me know what you think.

I like:

    rx-fifo-depth
    tx-fifo-depth

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 242 bytes --]

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox