Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH][v4] tcp: fix ICMP-RTO war
From: Damian Lukowski @ 2010-05-08  8:30 UTC (permalink / raw)
  To: Jerry Chu; +Cc: ilpo.jarvinen, netdev
In-Reply-To: <n2gd1c2719f1005071625nd3df3d72t9655a1f64f2fd825@mail.gmail.com>

> I'm working on a patch that tries to measure and use the RTT for the passive
> open side when the TS option is NOT enabled. My code conflicts with your
> recently added "tcp_ack_update_rtt(sk, 0, 0);" Could you tell me why do you
> force this call for the no-TS case when obviously "0" is not a measured RTT?
> If you try to force icsk_rto to be initialized correctly, it is
> already initialized to
> TCP_TIMEOUT_INIT by tcp_create_openreq_child(). What am I missing?

Hi,
the backoff reversion code uses __tcp_set_rto() to recalculate icsk_rto,
which itself relies on tp->srtt and rttvar.
srtt is explicitly set to 0 in tcp_create_openreq_child(), so I didn't change it.
Initializing it with TCP_TIMEOUT_INIT should also fix that specific bug,
but I don't know if there are other impacts.

Regards
 Damian

> Thanks,
> 
> Jerry
> 
>> From: David Miller <davem@davemloft.net>
>>
>> Date: Sun, Feb 21, 2010 at 7:10 PM
>> Subject: Re: [PATCH][v4] tcp: fix ICMP-RTO war
>> To: ilpo.jarvinen@helsinki.fi
>> Cc: damian@tvk.rwth-aachen.de, netdev@vger.kernel.org
>>
>>
>> From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
>> Date: Tue, 16 Feb 2010 14:45:25 +0200 (EET)
>>
>>> On Wed, 10 Feb 2010, Damian Lukowski wrote:
>>>
>>>> @@ -5783,12 +5783,10 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
>>>>
>>>>                              /* tcp_ack considers this ACK as duplicate
>>>>                               * and does not calculate rtt.
>>>> -                             * Fix it at least with timestamps.
>>>> +                             * Force it here.
>>>>                               */
>>>> -                            if (tp->rx_opt.saw_tstamp &&
>>>> -                                tp->rx_opt.rcv_tsecr && !tp->srtt)
>>>> -                                    tcp_ack_saw_tstamp(sk, 0);
>>>> -
>>>> +                            tcp_ack_update_rtt(sk, 0, 0);
>>>> +
>>>
>>> ...Here a zero seq_rtt is given to RTT estimator (it will be effective
>>> only in the case w/o timestamps, TS case recalculates it from the stored
>>> timestamps). Maybe we could use some field (timestamp related one comes to
>>> my mind) in request sock to get a real RTT estimate for non-timestamp case
>>> too. ...It seems possible to me, though tricky because the request_sock is
>>> no longer that easily available here so some parameter passing would be
>>> needed.
>>
>> Agreed.
>>
>> But even more simply I think we should make even the current
>> tcp_ack_update_rtt() call here conditional on at least
>> tp->srtt being zero.
>>
>> Damian do you at least agree with that?
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply

* Re: [PATCH 1/2] netdev/fec: fix performance impact from mdio poll operation
From: Bryan Wu @ 2010-05-08 10:07 UTC (permalink / raw)
  To: Andy Fleming
  Cc: davem@davemloft.net, Sascha Hauer, Greg Ungerer, Amit Kucheria,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <i2l2acbd3e41005070906me50416bapd8154a567406b886@mail.gmail.com>

On 05/08/2010 12:06 AM, Andy Fleming wrote:
> On Thursday, May 6, 2010, Bryan Wu<bryan.wu@canonical.com>  wrote:
>> BugLink: http://bugs.launchpad.net/bugs/546649
>> BugLink: http://bugs.launchpad.net/bugs/457878
>>
>> After introducing phylib supporting, users experienced performace drop. That is
>> because of the mdio polling operation of phylib. Use msleep to replace the busy
>> waiting cpu_relax() and remove the warning message.
>
> I'm a little confused by this patch.  In order to improve performance,
> you made the mdio functions fail silently?  Why are you getting
> timeouts at all?

 From the BugLink, we experienced a lot of error message about mdio_read 
timeout. Actually, the logical in this function is quite simple and I still have 
no idea about why mdio_read timeout. That might be hardware defect.

> The Phy lib is not going to respond well if an mdio
> write times out without returning an error.  And returning an error
> means the whole state machine has to reset, as a failed write is not
> something we currently handle gracefully.
>

Right, if we return error to phylib due to mdio read times out, the performance 
will drop a lot.

> Is it possible to use an interrupt to get the phy state change?  This
> would allow the phy lib to stop its incessant polling.  It doesn't
> solve the question of why it's timing out, though.
>

I'm going to try the interrupt, but fec driver is shared by several SoCs. It 
might be a little complicated and still cannot fix the timing out issue. Is 
there any example driver to use an interrupt to get the phy state change?

Thanks,
-Bryan

>>
>> Signed-off-by: Bryan Wu<bryan.wu@canonical.com>
>> Acked-by: Andy Whitcroft<apw@canonical.com>
>> ---
>>   drivers/net/fec.c |   45 +++++++++++++++++++++------------------------
>>   1 files changed, 21 insertions(+), 24 deletions(-)
>>
>> diff --git a/drivers/net/fec.c b/drivers/net/fec.c
>> index 2b1651a..9c58f6b 100644
>> --- a/drivers/net/fec.c
>> +++ b/drivers/net/fec.c
>> @@ -203,7 +203,7 @@ static void fec_stop(struct net_device *dev);
>>   #define FEC_MMFR_TA            (2<<  16)
>>   #define FEC_MMFR_DATA(v)       (v&  0xffff)
>>
>> -#define FEC_MII_TIMEOUT                10000
>> +#define FEC_MII_TIMEOUT                10
>>
>>   /* Transmitter timeout */
>>   #define TX_TIMEOUT (2 * HZ)
>> @@ -611,13 +611,29 @@ spin_unlock:
>>   /*
>>    * NOTE: a MII transaction is during around 25 us, so polling it...
>>    */
>> -static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
>> +static int fec_enet_mdio_poll(struct fec_enet_private *fep)
>>   {
>> -       struct fec_enet_private *fep = bus->priv;
>>          int timeout = FEC_MII_TIMEOUT;
>>
>>          fep->mii_timeout = 0;
>>
>> +       /* wait for end of transfer */
>> +       while (!(readl(fep->hwp + FEC_IEVENT)&  FEC_ENET_MII)) {
>> +               msleep(1);
>> +               if (timeout--<  0) {
>> +                       fep->mii_timeout = 1;
>> +                       break;
>> +               }
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
>> +{
>> +       struct fec_enet_private *fep = bus->priv;
>> +
>> +
>>          /* clear MII end of transfer bit*/
>>          writel(FEC_ENET_MII, fep->hwp + FEC_IEVENT);
>>
>> @@ -626,15 +642,7 @@ static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
>>                  FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
>>                  FEC_MMFR_TA, fep->hwp + FEC_MII_DATA);
>>
>> -       /* wait for end of transfer */
>> -       while (!(readl(fep->hwp + FEC_IEVENT)&  FEC_ENET_MII)) {
>> -               cpu_relax();
>> -               if (timeout--<  0) {
>> -                       fep->mii_timeout = 1;
>> -                       printk(KERN_ERR "FEC: MDIO read timeout\n");
>> -                       return -ETIMEDOUT;
>> -               }
>> -       }
>> +       fec_enet_mdio_poll(fep);
>>
>>          /* return value */
>>          return FEC_MMFR_DATA(readl(fep->hwp + FEC_MII_DATA));
>> @@ -644,9 +652,6 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
>>                             u16 value)
>>   {
>>          struct fec_enet_private *fep = bus->priv;
>> -       int timeout = FEC_MII_TIMEOUT;
>> -
>> -       fep->mii_timeout = 0;
>>
>>          /* clear MII end of transfer bit*/
>>          writel(FEC_ENET_MII, fep->hwp + FEC_IEVENT);
>> @@ -657,15 +662,7 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
>>                  FEC_MMFR_TA | FEC_MMFR_DATA(value),
>>                  fep->hwp + FEC_MII_DATA);
>>
>> -       /* wait for end of transfer */
>> -       while (!(readl(fep->hwp + FEC_IEVENT)&  FEC_ENET_MII)) {
>> -               cpu_relax();
>> -               if (timeout--<  0) {
>> -                       fep->mii_timeout = 1;
>> -                       printk(KERN_ERR "FEC: MDIO write timeout\n");
>> -                       return -ETIMEDOUT;
>> -               }
>> -       }
>> +       fec_enet_mdio_poll(fep);
>>
>>          return 0;
>>   }
>> --
>> 1.7.0.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>

^ permalink raw reply

* Re: [PATCH] ipv4: remove ip_rt_secret timer (v4)
From: Neil Horman @ 2010-05-08 12:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <1273300597.2325.55.camel@edumazet-laptop>

On Sat, May 08, 2010 at 08:36:37AM +0200, Eric Dumazet wrote:
> Le vendredi 07 mai 2010 à 21:01 -0400, Neil Horman a écrit :
> > Hey, Had a moment tonight so I spun version 4 of the patch.
> > 
> > Change notes:
> > 1) Redid the initialization, in light of Erics clarification.  We randomly seed
> > all 32 bits of the rt_genid for a netns, but still just do 256 byte
> > modifications on cache invalidations
> > 
> > 2) got rid of the dup checking crap.
> > 
> > 
> > Summary:
> > 
> > A while back there was a discussion regarding the rt_secret_interval timer.
> > Given that we've had the ability to do emergency route cache rebuilds for awhile
> > now, based on a statistical analysis of the various hash chain lengths in the
> > cache, the use of the flush timer is somewhat redundant.  This patch removes the
> > rt_secret_interval sysctl, allowing us to rely solely on the statistical
> > analysis mechanism to determine the need for route cache flushes.
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > 
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Sorry for the confusion Neil, thanks !
Its my fault, it was more clear to me on a second reading.  Apologies!
Neil

> 
> 
> 

^ permalink raw reply

* Re: [PATCH 1/2] netdev/fec: fix performance impact from mdio poll operation
From: Andy Fleming @ 2010-05-08 15:25 UTC (permalink / raw)
  To: bryan.wu@canonical.com
  Cc: davem@davemloft.net, Sascha Hauer, Greg Ungerer, Amit Kucheria,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <4BE537E2.6010809@canonical.com>

On Saturday, May 8, 2010, Bryan Wu <bryan.wu@canonical.com> wrote:
> On 05/08/2010 12:06 AM, Andy Fleming wrote:

> From the BugLink, we experienced a lot of error
>
> The Phy lib is not going to respond well if an mdio
> write times out without returning an error.  And returning an error
> means the whole state machine has to reset, as a failed write is not
> something we currently handle gracefully.
>
>
>
> Right, if we return error to phylib due to mdio read times out, the performance will drop a lot.


Yes, and if you *don't* return an error, you risk incorrect behavior.
This patch changes the driver to return the read result, after
detecting that the read result is invalid.  I don't know what the
value in that register will be, but it can't be correct.  If the
hardware is broken, then we need to find a workaround that doesn't
break things for chips which have working mdio.  It's also possible we
just need a longer timeout.  Does the mdio bus ever return correct
values?




>
>
> Is it possible to use an interrupt to get the phy state change?  This
> would allow the phy lib to stop its incessant polling.  It doesn't
> solve the question of why it's timing out, though.
>
>
>
> I'm going to try the interrupt, but fec driver is shared by several SoCs. It might be a little complicated and still cannot fix the timing out issue. Is there any example driver to use an interrupt to get the phy state change?

Well, there are examples, but they may be only somewhat helpful.  From
the perspective of the PHY Lib, all you have to do is put a correct
value into the irqs array in the mdio bus struct for your bus.  The
part where you will probably have to solve a problem is in figuring
out how to convey that information to the bus driver.  On Power CPUs
and socs, we use a device tree, which has an interrupt associated with
each PHY.  Look at wherever your mdio bus *device* is registered.
That's probably where you'll have that information.

Feel free to look at drivers/net/fsl_pq_mdio.c

Another possibility is to use the fixed link phy driver, which will
never attempt to use the mdio bus, and won't poll.

^ permalink raw reply

* Re: [PATCH][v4] tcp: fix ICMP-RTO war
From: Jerry Chu @ 2010-05-08 17:27 UTC (permalink / raw)
  To: Damian Lukowski; +Cc: ilpo.jarvinen, netdev
In-Reply-To: <4BE5213C.1030300@tvk.rwth-aachen.de>

On Sat, May 8, 2010 at 1:30 AM, Damian Lukowski
<damian@tvk.rwth-aachen.de> wrote:
>
> > I'm working on a patch that tries to measure and use the RTT for the passive
> > open side when the TS option is NOT enabled. My code conflicts with your
> > recently added "tcp_ack_update_rtt(sk, 0, 0);" Could you tell me why do you
> > force this call for the no-TS case when obviously "0" is not a measured RTT?
> > If you try to force icsk_rto to be initialized correctly, it is
> > already initialized to
> > TCP_TIMEOUT_INIT by tcp_create_openreq_child(). What am I missing?
>
> Hi,
> the backoff reversion code uses __tcp_set_rto() to recalculate icsk_rto,
> which itself relies on tp->srtt and rttvar.

Guess you are talking about

inet_csk(sk)->icsk_rto = __tcp_set_rto(tp) <<
                                         icsk->icsk_backoff;

inside tcp_v4_err(), right? (I'm looking at 2.6.33 kernel.)

Yes it seems to be a bug when __tcp_set_rto() is called before
tcp_rtt_estimator()
gets a chance to initialize all the variables properly.

But I don't like your fix of adding tcp_ack_update_rtt(sk, 0, 0); to
tcp_rcv_state_process()
because that means you've got a measured RTT of 0 (really meaning < 1 tick) for
the no-TS case, which will cause tcp_rtt_estimator() to compute all
the variables as if
there has been a valid RTT measurement of 1.

A better fix IMHO is to make sure all the variables are properly
initialized when exiting
tcp_init_metrics(), e.g, if srtt remains 0, make sure
tp->mdev = tp->mdev_max = tp->rttvar = TCP_TIMEOUT_INIT;
(mdev already been initialized to TCP_TIMEOUT_INIT. I think you got
hit by rttvar == 0)

> srtt is explicitly set to 0 in tcp_create_openreq_child(), so I didn't change it.
> Initializing it with TCP_TIMEOUT_INIT should also fix that specific bug,
> but I don't know if there are other impacts.

So what do I care? Because I'm mucking with the code in this area and your fix
causes some conflict with my logic.

What do you think?

Best,

Jerry

>
> Regards
>  Damian
>
> > Thanks,
> >
> > Jerry
> >
> >> From: David Miller <davem@davemloft.net>
> >>
> >> Date: Sun, Feb 21, 2010 at 7:10 PM
> >> Subject: Re: [PATCH][v4] tcp: fix ICMP-RTO war
> >> To: ilpo.jarvinen@helsinki.fi
> >> Cc: damian@tvk.rwth-aachen.de, netdev@vger.kernel.org
> >>
> >>
> >> From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
> >> Date: Tue, 16 Feb 2010 14:45:25 +0200 (EET)
> >>
> >>> On Wed, 10 Feb 2010, Damian Lukowski wrote:
> >>>
> >>>> @@ -5783,12 +5783,10 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
> >>>>
> >>>>                              /* tcp_ack considers this ACK as duplicate
> >>>>                               * and does not calculate rtt.
> >>>> -                             * Fix it at least with timestamps.
> >>>> +                             * Force it here.
> >>>>                               */
> >>>> -                            if (tp->rx_opt.saw_tstamp &&
> >>>> -                                tp->rx_opt.rcv_tsecr && !tp->srtt)
> >>>> -                                    tcp_ack_saw_tstamp(sk, 0);
> >>>> -
> >>>> +                            tcp_ack_update_rtt(sk, 0, 0);
> >>>> +
> >>>
> >>> ...Here a zero seq_rtt is given to RTT estimator (it will be effective
> >>> only in the case w/o timestamps, TS case recalculates it from the stored
> >>> timestamps). Maybe we could use some field (timestamp related one comes to
> >>> my mind) in request sock to get a real RTT estimate for non-timestamp case
> >>> too. ...It seems possible to me, though tricky because the request_sock is
> >>> no longer that easily available here so some parameter passing would be
> >>> needed.
> >>
> >> Agreed.
> >>
> >> But even more simply I think we should make even the current
> >> tcp_ack_update_rtt() call here conditional on at least
> >> tp->srtt being zero.
> >>
> >> Damian do you at least agree with that?
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* Re: [PATCH][v4] tcp: fix ICMP-RTO war
From: Ilpo Järvinen @ 2010-05-08 22:00 UTC (permalink / raw)
  To: Jerry Chu; +Cc: Damian Lukowski, Netdev
In-Reply-To: <k2xd1c2719f1005081027v376a4ebfp300c6272f9ea91df@mail.gmail.com>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2678 bytes --]

On Sat, 8 May 2010, Jerry Chu wrote:

> On Sat, May 8, 2010 at 1:30 AM, Damian Lukowski
> <damian@tvk.rwth-aachen.de> wrote:
> >
> > > I'm working on a patch that tries to measure and use the RTT for the passive
> > > open side when the TS option is NOT enabled. My code conflicts with your
> > > recently added "tcp_ack_update_rtt(sk, 0, 0);" Could you tell me why do you
> > > force this call for the no-TS case when obviously "0" is not a measured RTT?
> > > If you try to force icsk_rto to be initialized correctly, it is
> > > already initialized to
> > > TCP_TIMEOUT_INIT by tcp_create_openreq_child(). What am I missing?
> >
> > Hi,
> > the backoff reversion code uses __tcp_set_rto() to recalculate icsk_rto,
> > which itself relies on tp->srtt and rttvar.
> 
> Guess you are talking about
> 
> inet_csk(sk)->icsk_rto = __tcp_set_rto(tp) <<
>                                          icsk->icsk_backoff;
> 
> inside tcp_v4_err(), right? (I'm looking at 2.6.33 kernel.)
> 
> Yes it seems to be a bug when __tcp_set_rto() is called before
> tcp_rtt_estimator()
> gets a chance to initialize all the variables properly.
>
> But I don't like your fix of adding tcp_ack_update_rtt(sk, 0, 0); to
> tcp_rcv_state_process()
> because that means you've got a measured RTT of 0 (really meaning < 1 tick) for
> the no-TS case, which will cause tcp_rtt_estimator() to compute all
> the variables as if
> there has been a valid RTT measurement of 1.
> 
> A better fix IMHO is to make sure all the variables are properly
> initialized when exiting
> tcp_init_metrics(), e.g, if srtt remains 0, make sure
> tp->mdev = tp->mdev_max = tp->rttvar = TCP_TIMEOUT_INIT;
> (mdev already been initialized to TCP_TIMEOUT_INIT. I think you got
> hit by rttvar == 0)
> 
> > srtt is explicitly set to 0 in tcp_create_openreq_child(), so I didn't change it.
> > Initializing it with TCP_TIMEOUT_INIT should also fix that specific bug,
> > but I don't know if there are other impacts.
> 
> So what do I care? Because I'm mucking with the code in this area and your fix
> causes some conflict with my logic.
> 
> What do you think?

I didn't/don't like the fix either, but it's better than ICMP-RTO war that 
it fixed.

I think that most sensible solution to this issue would not be to 
initialize with the TCP_TIMEOUT_INIT bogosity either but do the real 
measurement for rtt which should be possible already. I suggested this 
already when the fix was made but nobody has yet found the time and
energy to code the rtt measurement for the early rtt when not using TS. 
Since you're going to make that lack of feature to go away, please change 
this illogical call too then while you go :-).

-- 
 i.

^ permalink raw reply

* [PATCH] virtif: initial interface extensions
From: Arnd Bergmann @ 2010-05-08 23:20 UTC (permalink / raw)
  To: Scott Feldman; +Cc: davem, netdev, chrisw
In-Reply-To: <201005061842.51981.arnd@arndb.de>

Building on the work of Scott Feldman, this extends the netlink interface
to deal with not only port profiles but also CDCP multichannel and VDP
VSI registration.

The protocols are split apart into separate netlink attributes for
a cleaner separation. A device can have multiple IFLA_VIRTIF attributes,
each pointing to one of the slaves that get registered using one of
the protocols. In case of VDP, each IFLA_VIRTIF attribute needs
both an IFLA_VIRTIF_VSI and one or more IFLA_VIRTIF_VSI_MAC_VLAN
attributes.

The VF number is split out because it only makes sense when the
implementation is in the device driver or firmware, but not for
software-only implementations like we do for the initial VDP code
using LLDPAD. If a IFLA_VIRTIF_VF attribute is given, the kernel
takes care of the association through the device driver for that
VF, otherwise we do it in user space. This should work for each
of the three protocols.

This code is a first rough prototype, completely untested and
very likely buggy. This is also the first time I'm trying
to deal with netlink in the kernel, so chances are that I've
misunderstood something in a major way.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/if_link.h   |   81 ++++++++++++++++++++++++++++------
 include/linux/netdevice.h |    8 ++--
 net/core/rtnetlink.c      |  106 ++++++++++++++++++++++++++++++++++-----------
 3 files changed, 152 insertions(+), 43 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index d763358..675d190 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -116,7 +116,7 @@ enum {
 	IFLA_VF_TX_RATE,	/* TX Bandwidth Allocation */
 	IFLA_VFINFO,
 	IFLA_STATS64,
-	IFLA_VF_PORT_PROFILE,
+	IFLA_VIRTIF,
 	__IFLA_MAX
 };
 
@@ -262,26 +262,79 @@ struct ifla_vf_info {
 };
 
 enum {
-	IFLA_VF_PORT_PROFILE_STATUS_UNKNOWN,
-	IFLA_VF_PORT_PROFILE_STATUS_SUCCESS,
-	IFLA_VF_PORT_PROFILE_STATUS_INPROGRESS,
-	IFLA_VF_PORT_PROFILE_STATUS_ERROR,
+	IFLA_VIRTIF_UNSPEC,
+	IFLA_VIRTIF_VF,
+	IFLA_VIRTIF_PORT_PROFILE, /* Cisco enic */
+	IFLA_VIRTIF_CHANNEL,	  /* 802.1Qbg CDCP */
+	IFLA_VIRTIF_VSI,	  /* 802.1Qbg VDP */
+	IFLA_VIRTIF_VSI_MAC_VLAN,
+	__IFLA_VIRTIF_MAX,
 };
 
-#define IFLA_VF_PORT_PROFILE_MAX	40
-#define IFLA_VF_UUID_MAX		40
-#define IFLA_VF_CLIENT_NAME_MAX		40
+#define IFLA_VIRTIF_MAX (__IFLA_VIRTIF_MAX - 1)
 
-struct ifla_vf_port_profile {
-	__u32 vf;
+enum {
+	VIRTIF_PORT_PROFILE_STATUS_UNKNOWN,
+	VIRTIF_PORT_PROFILE_STATUS_SUCCESS,
+	VIRTIF_PORT_PROFILE_STATUS_INPROGRESS,
+	VIRTIF_PORT_PROFILE_STATUS_ERROR,
+};
+
+#define VIRTIF_PORT_PROFILE_MAX	40
+#define VIRTIF_UUID_MAX		40
+#define VIRTIF_CLIENT_NAME_MAX	40
+
+struct ifla_virtif_port_profile {
 	__u32 flags;
 	__u32 status;
-	__u8 port_profile[IFLA_VF_PORT_PROFILE_MAX];
+	__u8 port_profile[VIRTIF_PORT_PROFILE_MAX];
 	__u8 mac[32];					/* MAX_ADDR_LEN */
 	/* UUID e.g. "CEEFD3B1-9E11-11DE-BDFD-000BAB01C0FB" */
-	__u8 host_uuid[IFLA_VF_UUID_MAX];
-	__u8 client_uuid[IFLA_VF_UUID_MAX];
-	__u8 client_name[IFLA_VF_CLIENT_NAME_MAX];	/* e.g. "vm0-eth1" */
+	__u8 host_uuid[VIRTIF_UUID_MAX];
+	__u8 client_uuid[VIRTIF_UUID_MAX];
+	__u8 client_name[VIRTIF_CLIENT_NAME_MAX];	/* e.g. "vm0-eth1" */
+};
+
+/*
+ * CDCP and VDP come from the 802.1Qbg standard, these definitions are taken
+ * from the respective TLV definitions in there. Adapters implementing CDCP
+ * or VDP can encapsulate the structures in LLDP/ECP headers and send them
+ * to the switch.
+ */
+struct ifla_virtif_channel {
+	__u16 scid;	/* S-Channel ID */
+	__u16 svid;	/* S-VLAN ID */
+};
+
+enum {
+	VIRTIF_VDP_REQUEST_PREASSOCIATE = 0,
+	VIRVIF_VDP_REQUEST_PREASSOCIATE_RR,
+	VIRVIF_VDP_REQUEST_ASSOCIATE,
+	VIRVIF_VDP_REQUEST_DISASSOCIATE,
+};
+
+enum {
+	VIRTIF_VDP_RESPONSE_SUCCESS = 0,
+	VIRTIF_VDP_RESPONSE_INVALID_FORMAT,
+	VIRTIF_VDP_RESPONSE_INSUFFICIENT_RESOURCES,
+	VIRTIF_VDP_RESPONSE_UNUSED_VTID,
+	VIRTIF_VDP_RESPONSE_VTID_VIOLATION,
+	VIRTIF_VDP_RESPONSE_VTID_VERSION_VIOLATION,
+	VIRTIF_VDP_RESPONSE_OUT_OF_SYNC,
+};
+
+struct ifla_virtif_vsi {
+	__u8 mode_request;
+	__u8 mode_response;
+	__u8 vsi_mgr_id;	/* these three define the policy */
+	__u8 vsi_type_id[3]; 	/*   for the guest */
+	__u8 vsi_type_version;
+	__u8 vsi_instance[16];	/* identifies the guest */
+};
+
+struct ifla_virtif_vsi_mac_vlan {
+	__u8 mac[6];
+	__u8 vlan_id[2];
 };
 
 #endif /* _LINUX_IF_LINK_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f5b0be5..d549a3d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -687,9 +687,9 @@ struct netdev_rx_queue {
  * int (*ndo_get_vf_config)(struct net_device *dev,
  *			    int vf, struct ifla_vf_info *ivf);
  * int (*ndo_set_vf_port_profile)(struct net_device *dev, int vf,
- *				  struct ifla_vf_port_profile *ivp);
+ *				  struct virtif_port_profile *ivp);
  * int (*ndo_get_vf_port_profile)(struct net_device *dev, int vf,
- *				  struct ifla_vf_port_profile *ivp);
+ *				  struct virtif_port_profile *ivp);
  */
 #define HAVE_NET_DEVICE_OPS
 struct net_device_ops {
@@ -741,10 +741,10 @@ struct net_device_ops {
 						     struct ifla_vf_info *ivf);
 	int			(*ndo_set_vf_port_profile)(
 					struct net_device *dev, int vf,
-					struct ifla_vf_port_profile *ivp);
+					struct ifla_virtif_port_profile *ivp);
 	int			(*ndo_get_vf_port_profile)(
 					struct net_device *dev, int vf,
-					struct ifla_vf_port_profile *ivp);
+					struct ifla_virtif_port_profile *ivp);
 #if defined(CONFIG_FCOE) || defined(CONFIG_FCOE_MODULE)
 	int			(*ndo_fcoe_enable)(struct net_device *dev);
 	int			(*ndo_fcoe_disable)(struct net_device *dev);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d2ef45b..cf5e328 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -653,6 +653,22 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev)
 		return 0;
 }
 
+static size_t rtnl_virtif_get_size(const struct net_device *dev)
+{
+	size_t total = 0;
+
+	if (dev->netdev_ops->ndo_get_vf_port_profile) {
+		total = dev_num_vf(dev->dev.parent) *
+			(nla_total_size(sizeof(struct ifla_virtif_port_profile))
+			 + nla_total_size(4)); /* IFLA_VIRTIF_VF */
+	}
+
+	/* fill in IFLA_VIRTIF_CHANNEL and IFLA_VIRTIF_VSI when needed */
+		
+	return total;
+}
+
+
 static inline size_t if_nlmsg_size(const struct net_device *dev)
 {
 	return NLMSG_ALIGN(sizeof(struct ifinfomsg))
@@ -674,6 +690,38 @@ static inline size_t if_nlmsg_size(const struct net_device *dev)
 	       + nla_total_size(4) /* IFLA_NUM_VF */
 	       + nla_total_size(rtnl_vfinfo_size(dev)) /* IFLA_VFINFO */
 	       + rtnl_link_get_size(dev); /* IFLA_LINKINFO */
+	       + rtnl_virtif_get_size(dev); /* IFLA_VIRTIF */
+}
+
+static int rtnl_virtif_fill_ifinfo(struct sk_buff *skb, struct net_device *dev)
+{
+	if (dev->netdev_ops->ndo_get_vf_port_profile && dev->dev.parent) {
+		struct ifla_virtif_port_profile ivp;
+
+		if (dev_num_vf(dev->dev.parent)) {
+			int i;
+
+			for (i = 0; i < dev_num_vf(dev->dev.parent); i++) {
+				if (dev->netdev_ops->ndo_get_vf_port_profile(
+					dev, i, &ivp))
+					break;
+				NLA_PUT_U32(skb, IFLA_VIRTIF_VF, i);
+				NLA_PUT(skb, IFLA_VIRTIF_PORT_PROFILE,
+					sizeof(ivp), &ivp);
+			}
+		}
+#if 0 /* Not sure how we should do this -arnd */
+		 else if (!dev->netdev_ops->ndo_get_vf_port_profile(dev,
+			0, &ivp)) {
+			NLA_PUT(skb, VIRTIF_PORT_PROFILE,
+				sizeof(ivp), &ivp);
+		}
+#endif
+	}
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
 }
 
 static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
@@ -761,25 +809,8 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 		}
 	}
 
-	if (dev->netdev_ops->ndo_get_vf_port_profile && dev->dev.parent) {
-		struct ifla_vf_port_profile ivp;
-
-		if (dev_num_vf(dev->dev.parent)) {
-			int i;
-
-			for (i = 0; i < dev_num_vf(dev->dev.parent); i++) {
-				if (dev->netdev_ops->ndo_get_vf_port_profile(
-					dev, i, &ivp))
-					break;
-				NLA_PUT(skb, IFLA_VF_PORT_PROFILE,
-					sizeof(ivp), &ivp);
-			}
-		} else if (!dev->netdev_ops->ndo_get_vf_port_profile(dev,
-			0, &ivp)) {
-			NLA_PUT(skb, IFLA_VF_PORT_PROFILE,
-				sizeof(ivp), &ivp);
-		}
-	}
+	if (rtnl_virtif_fill_ifinfo(skb, dev))
+		goto nla_put_failure;
 
 	if (dev->rtnl_link_ops) {
 		if (rtnl_link_fill(skb, dev) < 0)
@@ -847,8 +878,7 @@ const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 				    .len = sizeof(struct ifla_vf_vlan) },
 	[IFLA_VF_TX_RATE]	= { .type = NLA_BINARY,
 				    .len = sizeof(struct ifla_vf_tx_rate) },
-	[IFLA_VF_PORT_PROFILE]	= { .type = NLA_BINARY,
-				    .len = sizeof(struct ifla_vf_port_profile)},
+	[IFLA_VIRTIF]		= { .type = NLA_NESTED },
 };
 EXPORT_SYMBOL(ifla_policy);
 
@@ -857,6 +887,18 @@ static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
 	[IFLA_INFO_DATA]	= { .type = NLA_NESTED },
 };
 
+static const struct nla_policy ifla_virtif_policy[IFLA_VIRTIF_MAX+1] = {
+	[IFLA_VIRTIF_VF]	   = { .type = NLA_U32 },
+	[IFLA_VIRTIF_PORT_PROFILE] = { .type = NLA_BINARY,
+				       .len = sizeof(struct ifla_virtif_port_profile) },
+	[IFLA_VIRTIF_CHANNEL]	   = { .type = NLA_BINARY,
+				       .len = sizeof(struct ifla_virtif_channel) },
+	[IFLA_VIRTIF_VSI]	   = { .type = NLA_BINARY,
+				       .len = sizeof(struct ifla_virtif_vsi) },
+	[IFLA_VIRTIF_VSI_MAC_VLAN] = { .type = NLA_BINARY,
+				       .len = sizeof(struct ifla_virtif_vsi_mac_vlan) },
+};
+				    
 struct net *rtnl_link_get_net(struct net *src_net, struct nlattr *tb[])
 {
 	struct net *net;
@@ -1053,16 +1095,30 @@ static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm,
 	}
 	err = 0;
 
-	if (tb[IFLA_VF_PORT_PROFILE]) {
-		struct ifla_vf_port_profile *ivp;
-		ivp = nla_data(tb[IFLA_VF_PORT_PROFILE]);
+	if (tb[IFLA_VIRTIF]) {
+		struct ifla_virtif_port_profile *ivp;
+		struct nlattr *virtif[IFLA_VIRTIF_MAX+1];
+		u32 vf;
+
+		err = nla_parse_nested(virtif, IFLA_VIRTIF_MAX,
+				       tb[IFLA_VIRTIF], ifla_virtif_policy);
+		if (err < 0)
+			return err;
+
+		if (!virtif[IFLA_VIRTIF_VF] || !virtif[IFLA_VIRTIF_PORT_PROFILE])
+			goto novirtif; /* IFLA_VIRTIF may be directed at user space */
+
+		vf = nla_get_u32(virtif[IFLA_VIRTIF_VF]);
+		ivp = nla_data(virtif[IFLA_VIRTIF_PORT_PROFILE]);
+
 		err = -EOPNOTSUPP;
 		if (ops->ndo_set_vf_port_profile)
-			err = ops->ndo_set_vf_port_profile(dev, ivp->vf, ivp);
+			err = ops->ndo_set_vf_port_profile(dev, vf, ivp);
 		if (err < 0)
 			goto errout;
 		modified = 1;
 	}
+novirtif:
 	err = 0;
 
 errout:
-- 
1.6.3.3


^ permalink raw reply related

* [PATCH -mm] Documentation: add networking driver's mapping error handling to DMA-API-HOWTO
From: FUJITA Tomonori @ 2010-05-09  4:00 UTC (permalink / raw)
  To: akpm, davem; +Cc: randy.dunlap, netdev

I think that it's a good idea to add what drivers should to do exactly
in the DMA mapping failure case.

Seems some networking drivers call dev_kfree_skb() and return
NETDEV_TX_OK, however, some return NETDEV_TX_BUSY without freeing the
skb. (and many don't even check the failure, which we need to fix).

What networking drivers are supposed to do in the case?

=
From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Subject: [PATCH -mm] Documentation: add networking driver's mapping error handling to DMA-API-HOWTO

Adds the concrete DMA mapping error handling for Networking drivers on
the transmit path.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
 Documentation/DMA-API-HOWTO.txt |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/Documentation/DMA-API-HOWTO.txt b/Documentation/DMA-API-HOWTO.txt
index 52618ab..8c3235c 100644
--- a/Documentation/DMA-API-HOWTO.txt
+++ b/Documentation/DMA-API-HOWTO.txt
@@ -740,6 +740,11 @@ failure can be determined by:
 		 */
 	}
 
+Networking drivers must call dev_kfree_skb to free the socket buffer
+and return NETDEV_TX_OK if the DMA mapping fails on the transmit hook
+(ndo_start_xmit). This means that the socket buffer is just dropped in
+the failure case.
+
 			   Closing
 
 This document, and the API itself, would not be in it's current
-- 
1.6.5


^ permalink raw reply related

* Re: IPv6: race condition in __ipv6_ifa_notify() and dst_free() ?
From: Herbert Xu @ 2010-05-09  6:48 UTC (permalink / raw)
  To: Jiri Bohac; +Cc: David Miller, yoshfuji, netdev, shemminger
In-Reply-To: <20100427155533.GA5709@gondor.apana.org.au>

On Tue, Apr 27, 2010 at 11:55:33PM +0800, Herbert Xu wrote:
> On Tue, Apr 27, 2010 at 05:50:34PM +0200, Jiri Bohac wrote:
> >
> >   I think that what Herbert is doing is only going to enforce that
> >   the ip6_del_rt() is never going to fail, so the dst_free()
> >   won't ever be called anyway, right?
> 
> Yes I think you're right.  But let's fix the bigger problem first,
> and then we can audit this and possibly turn it into a WARN_ON.

Sorry for not fixing this sooner, but I'm back on the case.

Also, the dst_free really is needed.  It's there for the case
where we delete the address before DAD completion.  In that
case the route object is allocated, but not inserted in the
routing table.  So we must manually dst_free it.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH -mm] Documentation: add networking driver's mapping error handling to DMA-API-HOWTO
From: David Miller @ 2010-05-09  8:46 UTC (permalink / raw)
  To: fujita.tomonori; +Cc: akpm, randy.dunlap, netdev
In-Reply-To: <20100509130439D.fujita.tomonori@lab.ntt.co.jp>

From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Date: Sun, 9 May 2010 13:00:49 +0900

> I think that it's a good idea to add what drivers should to do exactly
> in the DMA mapping failure case.
> 
> Seems some networking drivers call dev_kfree_skb() and return
> NETDEV_TX_OK, however, some return NETDEV_TX_BUSY without freeing the
> skb. (and many don't even check the failure, which we need to fix).
> 
> What networking drivers are supposed to do in the case?

The best thing is probably just to drop, pretending the queue
is full and passing the packet back to the stack is just
full of complications and not worth it, so...

> From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> Subject: [PATCH -mm] Documentation: add networking driver's mapping error handling to DMA-API-HOWTO
> 
> Adds the concrete DMA mapping error handling for Networking drivers on
> the transmit path.
> 
> Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* Re: virtio: put last_used and last_avail index into ring itself.
From: Michael S. Tsirkin @ 2010-05-09  8:57 UTC (permalink / raw)
  To: Rusty Russell
  Cc: netdev, virtualization, kvm, linux-kernel, mingo, linux-mm, akpm,
	hpa, gregory.haskins, s.hetze, Daniel Walker, Eric Dumazet
In-Reply-To: <201005071235.40590.rusty@rustcorp.com.au>

On Fri, May 07, 2010 at 12:35:39PM +0930, Rusty Russell wrote:
> On Thu, 6 May 2010 03:57:55 pm Michael S. Tsirkin wrote:
> > On Thu, May 06, 2010 at 10:22:12AM +0930, Rusty Russell wrote:
> > > On Wed, 5 May 2010 03:52:36 am Michael S. Tsirkin wrote:
> > > > What do you think?
> > > 
> > > I think everyone is settled on 128 byte cache lines for the forseeable
> > > future, so it's not really an issue.
> > 
> > You mean with 64 bit descriptors we will be bouncing a cache line
> > between host and guest, anyway?
> 
> I'm confused by this entire thread.
> 
> Descriptors are 16 bytes.  They are at the start, so presumably aligned to
> cache boundaries.
> 
> Available ring follows that at 2 bytes per entry, so it's also packed nicely
> into cachelines.
> 
> Then there's padding to page boundary.  That puts us on a cacheline again
> for the used ring; also 2 bytes per entry.
> 

Hmm, is used ring really 2 bytes per entry?


/* u32 is used here for ids for padding reasons. */
struct vring_used_elem {
        /* Index of start of used descriptor chain. */
        __u32 id;
        /* Total length of the descriptor chain which was used (written to) */
        __u32 len;
};

struct vring_used {
        __u16 flags;
        __u16 idx;
        struct vring_used_elem ring[];
};

> I don't see how any change in layout could be more cache friendly?
> Rusty.

I thought that used ring has 8 bytes per entry, and that struct
vring_used is aligned at page boundary, this
would mean that ring element is at offset 4 bytes from page boundary.
Thus with cacheline size 128 bytes, each 4th element crosses
a cacheline boundary. If we had a 4 byte padding after idx, each
used element would always be completely within a single cacheline.

What am I missing?
-- 
MST

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [RFC][PATCH v4 00/18] Provide a zero-copy method on KVM virtio-net.
From: Michael S. Tsirkin @ 2010-05-09  9:26 UTC (permalink / raw)
  To: Xin, Xiaohui
  Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mingo@elte.hu, davem@davemloft.net,
	jdike@linux.intel.com
In-Reply-To: <F2E9EB7348B8264F86B6AB8151CE2D790AB4A3F137@shsmsx502.ccr.corp.intel.com>

On Sat, May 08, 2010 at 03:55:48PM +0800, Xin, Xiaohui wrote:
> Michael,
> Sorry, somehow I missed this mail. :-(
> 
> >> Here, we have ever considered 2 ways to utilize the page constructor
> >> API to dispense the user buffers.
> >> 
> >> One:	Modify __alloc_skb() function a bit, it can only allocate a 
> >> 	structure of sk_buff, and the data pointer is pointing to a 
> >> 	user buffer which is coming from a page constructor API.
> >> 	Then the shinfo of the skb is also from guest.
> >> 	When packet is received from hardware, the skb->data is filled
> >> 	directly by h/w. What we have done is in this way.
> >> 
> >> 	Pros:	We can avoid any copy here.
> >> 	Cons:	Guest virtio-net driver needs to allocate skb as almost
> >> 		the same method with the host NIC drivers, say the size
> >> 		of netdev_alloc_skb() and the same reserved space in the
> >> 		head of skb. Many NIC drivers are the same with guest and
> >> 		ok for this. But some lastest NIC drivers reserves special
> >> 		room in skb head. To deal with it, we suggest to provide
> >> 		a method in guest virtio-net driver to ask for parameter
> >> 		we interest from the NIC driver when we know which device 
> >> 		we have bind to do zero-copy. Then we ask guest to do so.
> >> 		Is that reasonable?
> 
> >Do you still do this?
> 
> Currently, we still use the first way. But we now ignore the room which 
> host skb_reserve() required when device is doing zero-copy. Now we don't 
> taint guest virtio-net driver with a new method by this way.
> 
> >> Two:	Modify driver to get user buffer allocated from a page constructor
> >> 	API(to substitute alloc_page()), the user buffer are used as payload
> >> 	buffers and filled by h/w directly when packet is received. Driver
> >> 	should associate the pages with skb (skb_shinfo(skb)->frags). For 
> >> 	the head buffer side, let host allocates skb, and h/w fills it. 
> >> 	After that, the data filled in host skb header will be copied into
> >> 	guest header buffer which is submitted together with the payload buffer.
> >> 
> >> 	Pros:	We could less care the way how guest or host allocates their
> >> 		buffers.
> >> 	Cons:	We still need a bit copy here for the skb header.
> >> 
> >> We are not sure which way is the better here. This is the first thing we want
> >> to get comments from the community. We wish the modification to the network
> >> part will be generic which not used by vhost-net backend only, but a user
> >> application may use it as well when the zero-copy device may provides async
> >> read/write operations later.
> 
> >I commented on this in the past. Do you still want comments?
> 
> Now we continue with the first way and try to push it. But any comments about the two methods are still welcome.
> 
> >That's nice. The thing to do is probably to enable GSO/TSO
> >and see what we get this way. Also, mergeable buffer support
> >was recently posted and I hope to merge it for 2.6.35.
> >You might want to take a look.
> 
> I'm looking at the mergeable buffer. I think GSO/GRO support with zero-copy also needs it.
> Currently, GSO/TSO is still not supported by vhost-net?

GSO/TSO are currently supported with tap and macvtap,
AF_PACKET socket backend still needs some work to
enable GSO.

> -- 
> MST

^ permalink raw reply

* [PATCH 03/11] netdev: bfin_mac: invalid data cache only once for each new rx skb buffer
From: Mike Frysinger @ 2010-05-09 10:18 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: uclinux-dist-devel, Sonic Zhang
In-Reply-To: <1273400337-26501-1-git-send-email-vapier@gentoo.org>

From: Sonic Zhang <sonic.zhang@analog.com>

The skb buffer isn't actually used until we finish transferring and pass
it up to higher layers, so only invalidate the range once before we start
receiving actual data.  This also avoids the problem with data invalidating
on Blackfin systems -- there is no invalidate-only, just invalidate+flush.
So when running in writeback mode, there is the small (but not uncommon)
possibility of the flush overwriting valid DMA-ed data from the cache.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 drivers/net/bfin_mac.c |   13 ++++++++-----
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index c888465..f9ba598 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -203,6 +203,11 @@ static int desc_list_init(void)
 			goto init_error;
 		}
 		skb_reserve(new_skb, NET_IP_ALIGN);
+		/* Invidate the data cache of skb->data range when it is write back
+		 * cache. It will prevent overwritting the new data from DMA
+		 */
+		blackfin_dcache_invalidate_range((unsigned long)new_skb->head,
+					 (unsigned long)new_skb->end);
 		r->skb = new_skb;
 
 		/*
@@ -1012,19 +1017,17 @@ static void bfin_mac_rx(struct net_device *dev)
 	}
 	/* reserve 2 bytes for RXDWA padding */
 	skb_reserve(new_skb, NET_IP_ALIGN);
-	current_rx_ptr->skb = new_skb;
-	current_rx_ptr->desc_a.start_addr = (unsigned long)new_skb->data - 2;
-
 	/* Invidate the data cache of skb->data range when it is write back
 	 * cache. It will prevent overwritting the new data from DMA
 	 */
 	blackfin_dcache_invalidate_range((unsigned long)new_skb->head,
 					 (unsigned long)new_skb->end);
 
+	current_rx_ptr->skb = new_skb;
+	current_rx_ptr->desc_a.start_addr = (unsigned long)new_skb->data - 2;
+
 	len = (unsigned short)((current_rx_ptr->status.status_word) & RX_FRLEN);
 	skb_put(skb, len);
-	blackfin_dcache_invalidate_range((unsigned long)skb->head,
-					 (unsigned long)skb->tail);
 
 	skb->protocol = eth_type_trans(skb, dev);
 
-- 
1.7.1


^ permalink raw reply related

* [PATCH 02/11] netdev: bfin_mac: handler RX status errors
From: Mike Frysinger @ 2010-05-09 10:18 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: uclinux-dist-devel, Peter Meerwald, Graf Yang
In-Reply-To: <1273400337-26501-1-git-send-email-vapier@gentoo.org>

From: Peter Meerwald <pmeerw@pmeerw.net>

Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
Signed-off-by: Graf Yang <graf.yang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 drivers/net/bfin_mac.c |   17 ++++++++++++++---
 1 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index 6a9519f..c888465 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -981,12 +981,25 @@ out:
 	return NETDEV_TX_OK;
 }
 
+#define RX_ERROR_MASK (RX_LONG | RX_ALIGN | RX_CRC | RX_LEN | \
+	RX_FRAG | RX_ADDR | RX_DMAO | RX_PHY | RX_LATE | RX_RANGE)
+
 static void bfin_mac_rx(struct net_device *dev)
 {
 	struct sk_buff *skb, *new_skb;
 	unsigned short len;
 	struct bfin_mac_local *lp __maybe_unused = netdev_priv(dev);
 
+	/* check if frame status word reports an error condition
+	 * we which case we simply drop the packet
+	 */
+	if (current_rx_ptr->status.status_word & RX_ERROR_MASK) {
+		printk(KERN_NOTICE DRV_NAME
+		       ": rx: receive error - packet dropped\n");
+		dev->stats.rx_dropped++;
+		goto out;
+	}
+
 	/* allocate a new skb for next time receive */
 	skb = current_rx_ptr->skb;
 
@@ -1025,11 +1038,9 @@ static void bfin_mac_rx(struct net_device *dev)
 	netif_rx(skb);
 	dev->stats.rx_packets++;
 	dev->stats.rx_bytes += len;
+out:
 	current_rx_ptr->status.status_word = 0x00000000;
 	current_rx_ptr = current_rx_ptr->next;
-
-out:
-	return;
 }
 
 /* interrupt routine to handle rx and error signal */
-- 
1.7.1


^ permalink raw reply related

* [PATCH 05/11] netdev: bfin_mac: clear RXCKS if hardware generated checksum is not enabled
From: Mike Frysinger @ 2010-05-09 10:18 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: uclinux-dist-devel, Sonic Zhang
In-Reply-To: <1273400337-26501-1-git-send-email-vapier@gentoo.org>

From: Sonic Zhang <sonic.zhang@analog.com>

Otherwise we might be get a setting mismatch from a previous module or
bootloader and what the driver currently expects.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 drivers/net/bfin_mac.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index 5b00fc8..6d69bbb 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -515,10 +515,11 @@ void setup_system_regs(struct net_device *dev)
 	 * Configure checksum support and rcve frame word alignment
 	 */
 	sysctl = bfin_read_EMAC_SYSCTL();
+	sysctl |= RXDWA;
 #if defined(BFIN_MAC_CSUM_OFFLOAD)
-	sysctl |= RXDWA | RXCKS;
+	sysctl |= RXCKS;
 #else
-	sysctl |= RXDWA;
+	sysctl &= ~RXCKS;
 #endif
 	bfin_write_EMAC_SYSCTL(sysctl);
 
-- 
1.7.1


^ permalink raw reply related

* [PATCH 06/11] netdev: bfin_mac: avoid tx skb overflows in the tx DMA ring
From: Mike Frysinger @ 2010-05-09 10:18 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: uclinux-dist-devel, Sonic Zhang
In-Reply-To: <1273400337-26501-1-git-send-email-vapier@gentoo.org>

From: Sonic Zhang <sonic.zhang@analog.com>

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 drivers/net/bfin_mac.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index 6d69bbb..0b5ea01 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -920,6 +920,9 @@ static int bfin_mac_hard_start_xmit(struct sk_buff *skb,
 	u32 data_align = (unsigned long)(skb->data) & 0x3;
 	union skb_shared_tx *shtx = skb_tx(skb);
 
+	if (current_tx_ptr->next == tx_list_head)
+		return NETDEV_TX_BUSY;
+
 	current_tx_ptr->skb = skb;
 
 	if (data_align == 0x2) {
-- 
1.7.1


^ permalink raw reply related

* [PATCH 07/11] netdev: bfin_mac: add support for wake-on-lan magic packets
From: Mike Frysinger @ 2010-05-09 10:18 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: uclinux-dist-devel, Michael Hennerich
In-Reply-To: <1273400337-26501-1-git-send-email-vapier@gentoo.org>

From: Michael Hennerich <michael.hennerich@analog.com>

Note that WOL works only in PM Suspend Standby Mode (Sleep Mode).

Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 drivers/net/bfin_mac.c |   76 +++++++++++++++++++++++++++++++++++++++++++++--
 drivers/net/bfin_mac.h |    3 ++
 2 files changed, 75 insertions(+), 4 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index 0b5ea01..e80ccaa 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -464,6 +464,14 @@ static int mii_probe(struct net_device *dev)
  * Ethtool support
  */
 
+/*
+ * interrupt routine for magic packet wakeup
+ */
+static irqreturn_t bfin_mac_wake_interrupt(int irq, void *dev_id)
+{
+	return IRQ_HANDLED;
+}
+
 static int
 bfin_mac_ethtool_getsettings(struct net_device *dev, struct ethtool_cmd *cmd)
 {
@@ -498,11 +506,57 @@ static void bfin_mac_ethtool_getdrvinfo(struct net_device *dev,
 	strcpy(info->bus_info, dev_name(&dev->dev));
 }
 
+static void bfin_mac_ethtool_getwol(struct net_device *dev,
+	struct ethtool_wolinfo *wolinfo)
+{
+	struct bfin_mac_local *lp = netdev_priv(dev);
+
+	wolinfo->supported = WAKE_MAGIC;
+	wolinfo->wolopts = lp->wol;
+}
+
+static int bfin_mac_ethtool_setwol(struct net_device *dev,
+	struct ethtool_wolinfo *wolinfo)
+{
+	struct bfin_mac_local *lp = netdev_priv(dev);
+	int rc;
+
+	if (wolinfo->wolopts & (WAKE_MAGICSECURE |
+				WAKE_UCAST |
+				WAKE_MCAST |
+				WAKE_BCAST |
+				WAKE_ARP))
+		return -EOPNOTSUPP;
+
+	lp->wol = wolinfo->wolopts;
+
+	if (lp->wol && !lp->irq_wake_requested) {
+		/* register wake irq handler */
+		rc = request_irq(IRQ_MAC_WAKEDET, bfin_mac_wake_interrupt,
+				 IRQF_DISABLED, "EMAC_WAKE", dev);
+		if (rc)
+			return rc;
+		lp->irq_wake_requested = true;
+	}
+
+	if (!lp->wol && lp->irq_wake_requested) {
+		free_irq(IRQ_MAC_WAKEDET, dev);
+		lp->irq_wake_requested = false;
+	}
+
+	/* Make sure the PHY driver doesn't suspend */
+	device_init_wakeup(&dev->dev, lp->wol);
+
+	return 0;
+}
+
 static const struct ethtool_ops bfin_mac_ethtool_ops = {
 	.get_settings = bfin_mac_ethtool_getsettings,
 	.set_settings = bfin_mac_ethtool_setsettings,
 	.get_link = ethtool_op_get_link,
 	.get_drvinfo = bfin_mac_ethtool_getdrvinfo,
+	.get_wol = bfin_mac_ethtool_getwol,
+	.set_wol = bfin_mac_ethtool_setwol,
 };
 
 /**************************************************************************/
@@ -1472,9 +1526,16 @@ static int __devexit bfin_mac_remove(struct platform_device *pdev)
 static int bfin_mac_suspend(struct platform_device *pdev, pm_message_t mesg)
 {
 	struct net_device *net_dev = platform_get_drvdata(pdev);
+	struct bfin_mac_local *lp = netdev_priv(net_dev);
 
-	if (netif_running(net_dev))
-		bfin_mac_close(net_dev);
+	if (lp->wol) {
+		bfin_write_EMAC_OPMODE((bfin_read_EMAC_OPMODE() & ~TE) | RE);
+		bfin_write_EMAC_WKUP_CTL(MPKE);
+		enable_irq_wake(IRQ_MAC_WAKEDET);
+	} else {
+		if (netif_running(net_dev))
+			bfin_mac_close(net_dev);
+	}
 
 	return 0;
 }
@@ -1482,9 +1543,16 @@ static int bfin_mac_suspend(struct platform_device *pdev, pm_message_t mesg)
 static int bfin_mac_resume(struct platform_device *pdev)
 {
 	struct net_device *net_dev = platform_get_drvdata(pdev);
+	struct bfin_mac_local *lp = netdev_priv(net_dev);
 
-	if (netif_running(net_dev))
-		bfin_mac_open(net_dev);
+	if (lp->wol) {
+		bfin_write_EMAC_OPMODE(bfin_read_EMAC_OPMODE() | TE);
+		bfin_write_EMAC_WKUP_CTL(0);
+		disable_irq_wake(IRQ_MAC_WAKEDET);
+	} else {
+		if (netif_running(net_dev))
+			bfin_mac_open(net_dev);
+	}
 
 	return 0;
 }
diff --git a/drivers/net/bfin_mac.h b/drivers/net/bfin_mac.h
index 87c454f..1ae7b82 100644
--- a/drivers/net/bfin_mac.h
+++ b/drivers/net/bfin_mac.h
@@ -66,6 +66,9 @@ struct bfin_mac_local {
 	unsigned char Mac[6];	/* MAC address of the board */
 	spinlock_t lock;
 
+	int wol;		/* Wake On Lan */
+	int irq_wake_requested;
+
 	/* MII and PHY stuffs */
 	int old_link;          /* used by bf537_adjust_link */
 	int old_speed;
-- 
1.7.1


^ permalink raw reply related

* [PATCH 08/11] netdev: bfin_mac: use promiscuous flag for promiscuous mode
From: Mike Frysinger @ 2010-05-09 10:18 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: uclinux-dist-devel, Sonic Zhang
In-Reply-To: <1273400337-26501-1-git-send-email-vapier@gentoo.org>

From: Sonic Zhang <sonic.zhang@analog.com>

Rather than using the Receive All Frames (RAF) bit to enable promiscuous
mode, use the Promiscuous (PR) bit.  This lowers overhead at runtime as
we let the hardware process the packets that should actually be checked.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 drivers/net/bfin_mac.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index e80ccaa..04cda8f 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -1274,7 +1274,7 @@ static void bfin_mac_set_multicast_list(struct net_device *dev)
 	if (dev->flags & IFF_PROMISC) {
 		printk(KERN_INFO "%s: set to promisc mode\n", dev->name);
 		sysctl = bfin_read_EMAC_OPMODE();
-		sysctl |= RAF;
+		sysctl |= PR;
 		bfin_write_EMAC_OPMODE(sysctl);
 	} else if (dev->flags & IFF_ALLMULTI) {
 		/* accept all multicast */
-- 
1.7.1


^ permalink raw reply related

* [PATCH 09/11] netdev: bfin_mac: only use hardware checksum in normal IPv4 mode
From: Mike Frysinger @ 2010-05-09 10:18 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: uclinux-dist-devel, Jon Kowal, Sonic Zhang
In-Reply-To: <1273400337-26501-1-git-send-email-vapier@gentoo.org>

From: Jon Kowal <jon.kowal@dspecialists.de>

The Blackfin on-chip MAC checksum logic only works when the IP packet has
a header length of 20 bytes.  This is true for most IPv4 packets, but not
for IPv6 packets or IPv4 packets which use header options.  So only use
the hardware checksum when appropriate.

Signed-off-by: Jon Kowal <jon.kowal@dspecialists.de>
Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 drivers/net/bfin_mac.c |   42 +++++++++++++++++++++++++-----------------
 1 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index 04cda8f..00d0f25 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -1044,6 +1044,7 @@ out:
 	return NETDEV_TX_OK;
 }
 
+#define IP_HEADER_OFF  0
 #define RX_ERROR_MASK (RX_LONG | RX_ALIGN | RX_CRC | RX_LEN | \
 	RX_FRAG | RX_ADDR | RX_DMAO | RX_PHY | RX_LATE | RX_RANGE)
 
@@ -1098,25 +1099,32 @@ static void bfin_mac_rx(struct net_device *dev)
 	bfin_rx_hwtstamp(dev, skb);
 
 #if defined(BFIN_MAC_CSUM_OFFLOAD)
-	skb->csum = current_rx_ptr->status.ip_payload_csum;
-	/*
-	 * Deduce Ethernet FCS from hardware generated IP payload checksum.
-	 * IP checksum is based on 16-bit one's complement algorithm.
-	 * To deduce a value from checksum is equal to add its inversion.
-	 * If the IP payload len is odd, the inversed FCS should also
-	 * begin from odd address and leave first byte zero.
+	/* Checksum offloading only works for IPv4 packets with the standard IP header
+	 * length of 20 bytes, because the blackfin MAC checksum calculation is
+	 * based on that assumption. We must NOT use the calculated checksum if our
+	 * IP version or header break that assumption.
 	 */
-	if (skb->len % 2) {
-		fcs[0] = 0;
-		for (i = 0; i < ETH_FCS_LEN; i++)
-			fcs[i + 1] = ~skb->data[skb->len + i];
-		skb->csum = csum_partial(fcs, ETH_FCS_LEN + 1, skb->csum);
-	} else {
-		for (i = 0; i < ETH_FCS_LEN; i++)
-			fcs[i] = ~skb->data[skb->len + i];
-		skb->csum = csum_partial(fcs, ETH_FCS_LEN, skb->csum);
+	if (skb->data[IP_HEADER_OFF] == 0x45) {
+		skb->csum = current_rx_ptr->status.ip_payload_csum;
+		/*
+		 * Deduce Ethernet FCS from hardware generated IP payload checksum.
+		 * IP checksum is based on 16-bit one's complement algorithm.
+		 * To deduce a value from checksum is equal to add its inversion.
+		 * If the IP payload len is odd, the inversed FCS should also
+		 * begin from odd address and leave first byte zero.
+		 */
+		if (skb->len % 2) {
+			fcs[0] = 0;
+			for (i = 0; i < ETH_FCS_LEN; i++)
+				fcs[i + 1] = ~skb->data[skb->len + i];
+			skb->csum = csum_partial(fcs, ETH_FCS_LEN + 1, skb->csum);
+		} else {
+			for (i = 0; i < ETH_FCS_LEN; i++)
+				fcs[i] = ~skb->data[skb->len + i];
+			skb->csum = csum_partial(fcs, ETH_FCS_LEN, skb->csum);
+		}
+		skb->ip_summed = CHECKSUM_COMPLETE;
 	}
-	skb->ip_summed = CHECKSUM_COMPLETE;
 #endif
 
 	netif_rx(skb);
-- 
1.7.1


^ permalink raw reply related

* [PATCH 10/11] netdev: bfin_mac: handle timeouts with the MDIO registers gracefully
From: Mike Frysinger @ 2010-05-09 10:18 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: uclinux-dist-devel
In-Reply-To: <1273400337-26501-1-git-send-email-vapier@gentoo.org>

Have the low level MDIO functions pass back up timeout information so we
don't waste time polling them multiple times when there is a problem, and
so we don't let higher layers think the device is available when it isn't.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 drivers/net/bfin_mac.c |   53 ++++++++++++++++++++++++++++++-----------------
 1 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index 00d0f25..a4c9ab2 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -81,9 +81,6 @@ static u16 pin_req[] = P_RMII0;
 static u16 pin_req[] = P_MII0;
 #endif
 
-static void bfin_mac_disable(void);
-static void bfin_mac_enable(void);
-
 static void desc_list_free(void)
 {
 	struct net_dma_desc_rx *r;
@@ -260,7 +257,7 @@ init_error:
  * MII operations
  */
 /* Wait until the previous MDC/MDIO transaction has completed */
-static void bfin_mdio_poll(void)
+static int bfin_mdio_poll(void)
 {
 	int timeout_cnt = MAX_TIMEOUT_CNT;
 
@@ -270,22 +267,30 @@ static void bfin_mdio_poll(void)
 		if (timeout_cnt-- < 0) {
 			printk(KERN_ERR DRV_NAME
 			": wait MDC/MDIO transaction to complete timeout\n");
-			break;
+			return -ETIMEDOUT;
 		}
 	}
+
+	return 0;
 }
 
 /* Read an off-chip register in a PHY through the MDC/MDIO port */
 static int bfin_mdiobus_read(struct mii_bus *bus, int phy_addr, int regnum)
 {
-	bfin_mdio_poll();
+	int ret;
+
+	ret = bfin_mdio_poll();
+	if (ret)
+		return ret;
 
 	/* read mode */
 	bfin_write_EMAC_STAADD(SET_PHYAD((u16) phy_addr) |
 				SET_REGAD((u16) regnum) |
 				STABUSY);
 
-	bfin_mdio_poll();
+	ret = bfin_mdio_poll();
+	if (ret)
+		return ret;
 
 	return (int) bfin_read_EMAC_STADAT();
 }
@@ -294,7 +299,11 @@ static int bfin_mdiobus_read(struct mii_bus *bus, int phy_addr, int regnum)
 static int bfin_mdiobus_write(struct mii_bus *bus, int phy_addr, int regnum,
 			      u16 value)
 {
-	bfin_mdio_poll();
+	int ret;
+
+	ret = bfin_mdio_poll();
+	if (ret)
+		return ret;
 
 	bfin_write_EMAC_STADAT((u32) value);
 
@@ -304,9 +313,7 @@ static int bfin_mdiobus_write(struct mii_bus *bus, int phy_addr, int regnum,
 				STAOP |
 				STABUSY);
 
-	bfin_mdio_poll();
-
-	return 0;
+	return bfin_mdio_poll();
 }
 
 static int bfin_mdiobus_reset(struct mii_bus *bus)
@@ -1184,8 +1191,9 @@ static void bfin_mac_disable(void)
 /*
  * Enable Interrupts, Receive, and Transmit
  */
-static void bfin_mac_enable(void)
+static int bfin_mac_enable(void)
 {
+	int ret;
 	u32 opmode;
 
 	pr_debug("%s: %s\n", DRV_NAME, __func__);
@@ -1195,7 +1203,9 @@ static void bfin_mac_enable(void)
 	bfin_write_DMA1_CONFIG(rx_list_head->desc_a.config);
 
 	/* Wait MII done */
-	bfin_mdio_poll();
+	ret = bfin_mdio_poll();
+	if (ret)
+		return ret;
 
 	/* We enable only RX here */
 	/* ASTP   : Enable Automatic Pad Stripping
@@ -1219,6 +1229,8 @@ static void bfin_mac_enable(void)
 #endif
 	/* Turn on the EMAC rx */
 	bfin_write_EMAC_OPMODE(opmode);
+
+	return 0;
 }
 
 /* Our watchdog timed out. Called by the networking layer */
@@ -1333,7 +1345,7 @@ static void bfin_mac_shutdown(struct net_device *dev)
 static int bfin_mac_open(struct net_device *dev)
 {
 	struct bfin_mac_local *lp = netdev_priv(dev);
-	int retval;
+	int ret;
 	pr_debug("%s: %s\n", dev->name, __func__);
 
 	/*
@@ -1347,18 +1359,21 @@ static int bfin_mac_open(struct net_device *dev)
 	}
 
 	/* initial rx and tx list */
-	retval = desc_list_init();
-
-	if (retval)
-		return retval;
+	ret = desc_list_init();
+	if (ret)
+		return ret;
 
 	phy_start(lp->phydev);
 	phy_write(lp->phydev, MII_BMCR, BMCR_RESET);
 	setup_system_regs(dev);
 	setup_mac_addr(dev->dev_addr);
+
 	bfin_mac_disable();
-	bfin_mac_enable();
+	ret = bfin_mac_enable();
+	if (ret)
+		return ret;
 	pr_debug("hardware init finished\n");
+
 	netif_start_queue(dev);
 	netif_carrier_on(dev);
 
-- 
1.7.1


^ permalink raw reply related

* [PATCH 11/11] netdev: bfin_mac: check for mii_bus platform data
From: Mike Frysinger @ 2010-05-09 10:18 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: uclinux-dist-devel, Sonic Zhang
In-Reply-To: <1273400337-26501-1-git-send-email-vapier@gentoo.org>

From: Sonic Zhang <sonic.zhang@analog.com>

If the platform data for the mii_bus is missing, gracefully error out
rather than deference NULL pointers.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 drivers/net/bfin_mac.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index a4c9ab2..c65d5e1 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -1472,6 +1472,11 @@ static int __devinit bfin_mac_probe(struct platform_device *pdev)
 	}
 	pd = pdev->dev.platform_data;
 	lp->mii_bus = platform_get_drvdata(pd);
+	if (!lp->mii_bus) {
+		dev_err(&pdev->dev, "Cannot get mii_bus!\n");
+		rc = -ENODEV;
+		goto out_err_mii_bus_probe;
+	}
 	lp->mii_bus->priv = ndev;
 
 	rc = mii_probe(ndev);
@@ -1517,6 +1522,7 @@ out_err_request_irq:
 out_err_mii_probe:
 	mdiobus_unregister(lp->mii_bus);
 	mdiobus_free(lp->mii_bus);
+out_err_mii_bus_probe:
 	peripheral_free_list(pin_req);
 out_err_probe_mac:
 	platform_set_drvdata(pdev, NULL);
-- 
1.7.1


^ permalink raw reply related

* [PATCH 01/11] netdev: bfin_mac: add support for IEEE 1588 PTP
From: Mike Frysinger @ 2010-05-09 10:18 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA, David S. Miller
  Cc: uclinux-dist-devel-ZG0+EudsQA8dtHy/vicBwGD2FQJk+8+b, Barry Song

From: Barry Song <barry.song-OyLXuOCK7orQT0dZR+AlfA@public.gmane.org>

Newer on-chip MAC peripherals support IEEE 1588 PTP in the hardware, so
extend the driver to support this functionality.

Signed-off-by: Barry Song <barry.song-OyLXuOCK7orQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
---
 drivers/net/Kconfig    |    7 +
 drivers/net/bfin_mac.c |  341 +++++++++++++++++++++++++++++++++++++++++++++++-
 drivers/net/bfin_mac.h |   15 ++
 3 files changed, 362 insertions(+), 1 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 7b832c7..ac536b9 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -887,6 +887,13 @@ config BFIN_MAC_RMII
 	help
 	  Use Reduced PHY MII Interface
 
+config BFIN_MAC_USE_HWSTAMP
+	bool "Use IEEE 1588 hwstamp"
+	depends on BFIN_MAC && BF518
+	default y
+	help
+	  To support the IEEE 1588 Precision Time Protocol (PTP), select y here
+
 config SMC9194
 	tristate "SMC 9194 support"
 	depends on NET_VENDOR_SMC && (ISA || MAC && BROKEN)
diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index 587f93c..6a9519f 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -33,6 +33,7 @@
 #include <asm/dma.h>
 #include <linux/dma-mapping.h>
 
+#include <asm/div64.h>
 #include <asm/dpmc.h>
 #include <asm/blackfin.h>
 #include <asm/cacheflush.h>
@@ -551,6 +552,309 @@ static int bfin_mac_set_mac_address(struct net_device *dev, void *p)
 	return 0;
 }
 
+#ifdef CONFIG_BFIN_MAC_USE_HWSTAMP
+#define bfin_mac_hwtstamp_is_none(cfg) ((cfg) == HWTSTAMP_FILTER_NONE)
+
+static int bfin_mac_hwtstamp_ioctl(struct net_device *netdev,
+		struct ifreq *ifr, int cmd)
+{
+	struct hwtstamp_config config;
+	struct bfin_mac_local *lp = netdev_priv(netdev);
+	u16 ptpctl;
+	u32 ptpfv1, ptpfv2, ptpfv3, ptpfoff;
+
+	if (copy_from_user(&config, ifr->ifr_data, sizeof(config)))
+		return -EFAULT;
+
+	pr_debug("%s config flag:0x%x, tx_type:0x%x, rx_filter:0x%x\n",
+			__func__, config.flags, config.tx_type, config.rx_filter);
+
+	/* reserved for future extensions */
+	if (config.flags)
+		return -EINVAL;
+
+	if ((config.tx_type != HWTSTAMP_TX_OFF) &&
+			(config.tx_type != HWTSTAMP_TX_ON))
+		return -ERANGE;
+
+	ptpctl = bfin_read_EMAC_PTP_CTL();
+
+	switch (config.rx_filter) {
+	case HWTSTAMP_FILTER_NONE:
+		/*
+		 * Dont allow any timestamping
+		 */
+		ptpfv3 = 0xFFFFFFFF;
+		bfin_write_EMAC_PTP_FV3(ptpfv3);
+		break;
+	case HWTSTAMP_FILTER_PTP_V1_L4_EVENT:
+	case HWTSTAMP_FILTER_PTP_V1_L4_SYNC:
+	case HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ:
+		/*
+		 * Clear the five comparison mask bits (bits[12:8]) in EMAC_PTP_CTL)
+		 * to enable all the field matches.
+		 */
+		ptpctl &= ~0x1F00;
+		bfin_write_EMAC_PTP_CTL(ptpctl);
+		/*
+		 * Keep the default values of the EMAC_PTP_FOFF register.
+		 */
+		ptpfoff = 0x4A24170C;
+		bfin_write_EMAC_PTP_FOFF(ptpfoff);
+		/*
+		 * Keep the default values of the EMAC_PTP_FV1 and EMAC_PTP_FV2
+		 * registers.
+		 */
+		ptpfv1 = 0x11040800;
+		bfin_write_EMAC_PTP_FV1(ptpfv1);
+		ptpfv2 = 0x0140013F;
+		bfin_write_EMAC_PTP_FV2(ptpfv2);
+		/*
+		 * The default value (0xFFFC) allows the timestamping of both
+		 * received Sync messages and Delay_Req messages.
+		 */
+		ptpfv3 = 0xFFFFFFFC;
+		bfin_write_EMAC_PTP_FV3(ptpfv3);
+
+		config.rx_filter = HWTSTAMP_FILTER_PTP_V1_L4_EVENT;
+		break;
+	case HWTSTAMP_FILTER_PTP_V2_L4_EVENT:
+	case HWTSTAMP_FILTER_PTP_V2_L4_SYNC:
+	case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
+		/* Clear all five comparison mask bits (bits[12:8]) in the
+		 * EMAC_PTP_CTL register to enable all the field matches.
+		 */
+		ptpctl &= ~0x1F00;
+		bfin_write_EMAC_PTP_CTL(ptpctl);
+		/*
+		 * Keep the default values of the EMAC_PTP_FOFF register, except set
+		 * the PTPCOF field to 0x2A.
+		 */
+		ptpfoff = 0x2A24170C;
+		bfin_write_EMAC_PTP_FOFF(ptpfoff);
+		/*
+		 * Keep the default values of the EMAC_PTP_FV1 and EMAC_PTP_FV2
+		 * registers.
+		 */
+		ptpfv1 = 0x11040800;
+		bfin_write_EMAC_PTP_FV1(ptpfv1);
+		ptpfv2 = 0x0140013F;
+		bfin_write_EMAC_PTP_FV2(ptpfv2);
+		/*
+		 * To allow the timestamping of Pdelay_Req and Pdelay_Resp, set
+		 * the value to 0xFFF0.
+		 */
+		ptpfv3 = 0xFFFFFFF0;
+		bfin_write_EMAC_PTP_FV3(ptpfv3);
+
+		config.rx_filter = HWTSTAMP_FILTER_PTP_V2_L4_EVENT;
+		break;
+	case HWTSTAMP_FILTER_PTP_V2_L2_EVENT:
+	case HWTSTAMP_FILTER_PTP_V2_L2_SYNC:
+	case HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ:
+		/*
+		 * Clear bits 8 and 12 of the EMAC_PTP_CTL register to enable only the
+		 * EFTM and PTPCM field comparison.
+		 */
+		ptpctl &= ~0x1100;
+		bfin_write_EMAC_PTP_CTL(ptpctl);
+		/*
+		 * Keep the default values of all the fields of the EMAC_PTP_FOFF
+		 * register, except set the PTPCOF field to 0x0E.
+		 */
+		ptpfoff = 0x0E24170C;
+		bfin_write_EMAC_PTP_FOFF(ptpfoff);
+		/*
+		 * Program bits [15:0] of the EMAC_PTP_FV1 register to 0x88F7, which
+		 * corresponds to PTP messages on the MAC layer.
+		 */
+		ptpfv1 = 0x110488F7;
+		bfin_write_EMAC_PTP_FV1(ptpfv1);
+		ptpfv2 = 0x0140013F;
+		bfin_write_EMAC_PTP_FV2(ptpfv2);
+		/*
+		 * To allow the timestamping of Pdelay_Req and Pdelay_Resp
+		 * messages, set the value to 0xFFF0.
+		 */
+		ptpfv3 = 0xFFFFFFF0;
+		bfin_write_EMAC_PTP_FV3(ptpfv3);
+
+		config.rx_filter = HWTSTAMP_FILTER_PTP_V2_L2_EVENT;
+		break;
+	default:
+		return -ERANGE;
+	}
+
+	if (config.tx_type == HWTSTAMP_TX_OFF &&
+	    bfin_mac_hwtstamp_is_none(config.rx_filter)) {
+		ptpctl &= ~PTP_EN;
+		bfin_write_EMAC_PTP_CTL(ptpctl);
+
+		SSYNC();
+	} else {
+		ptpctl |= PTP_EN;
+		bfin_write_EMAC_PTP_CTL(ptpctl);
+
+		/*
+		 * clear any existing timestamp
+		 */
+		bfin_read_EMAC_PTP_RXSNAPLO();
+		bfin_read_EMAC_PTP_RXSNAPHI();
+
+		bfin_read_EMAC_PTP_TXSNAPLO();
+		bfin_read_EMAC_PTP_TXSNAPHI();
+
+		/*
+		 * Set registers so that rollover occurs soon to test this.
+		 */
+		bfin_write_EMAC_PTP_TIMELO(0x00000000);
+		bfin_write_EMAC_PTP_TIMEHI(0xFF800000);
+
+		SSYNC();
+
+		lp->compare.last_update = 0;
+		timecounter_init(&lp->clock,
+				&lp->cycles,
+				ktime_to_ns(ktime_get_real()));
+		timecompare_update(&lp->compare, 0);
+	}
+
+	lp->stamp_cfg = config;
+	return copy_to_user(ifr->ifr_data, &config, sizeof(config)) ?
+		-EFAULT : 0;
+}
+
+static void bfin_dump_hwtamp(char *s, ktime_t *hw, ktime_t *ts, struct timecompare *cmp)
+{
+	ktime_t sys = ktime_get_real();
+
+	pr_debug("%s %s hardware:%d,%d transform system:%d,%d system:%d,%d, cmp:%lld, %lld\n",
+			__func__, s, hw->tv.sec, hw->tv.nsec, ts->tv.sec, ts->tv.nsec, sys.tv.sec,
+			sys.tv.nsec, cmp->offset, cmp->skew);
+}
+
+static void bfin_tx_hwtstamp(struct net_device *netdev, struct sk_buff *skb)
+{
+	struct bfin_mac_local *lp = netdev_priv(netdev);
+	union skb_shared_tx *shtx = skb_tx(skb);
+
+	if (shtx->hardware) {
+		int timeout_cnt = MAX_TIMEOUT_CNT;
+
+		/* When doing time stamping, keep the connection to the socket
+		 * a while longer
+		 */
+		shtx->in_progress = 1;
+
+		/*
+		 * The timestamping is done at the EMAC module's MII/RMII interface
+		 * when the module sees the Start of Frame of an event message packet. This
+		 * interface is the closest possible place to the physical Ethernet transmission
+		 * medium, providing the best timing accuracy.
+		 */
+		while ((!(bfin_read_EMAC_PTP_ISTAT() & TXTL)) && (--timeout_cnt))
+			udelay(1);
+		if (timeout_cnt == 0)
+			printk(KERN_ERR DRV_NAME
+					": fails to timestamp the TX packet\n");
+		else {
+			struct skb_shared_hwtstamps shhwtstamps;
+			u64 ns;
+			u64 regval;
+
+			regval = bfin_read_EMAC_PTP_TXSNAPLO();
+			regval |= (u64)bfin_read_EMAC_PTP_TXSNAPHI() << 32;
+			memset(&shhwtstamps, 0, sizeof(shhwtstamps));
+			ns = timecounter_cyc2time(&lp->clock,
+					regval);
+			timecompare_update(&lp->compare, ns);
+			shhwtstamps.hwtstamp = ns_to_ktime(ns);
+			shhwtstamps.syststamp =
+				timecompare_transform(&lp->compare, ns);
+			skb_tstamp_tx(skb, &shhwtstamps);
+
+			bfin_dump_hwtamp("TX", &shhwtstamps.hwtstamp, &shhwtstamps.syststamp, &lp->compare);
+		}
+	}
+}
+
+static void bfin_rx_hwtstamp(struct net_device *netdev, struct sk_buff *skb)
+{
+	struct bfin_mac_local *lp = netdev_priv(netdev);
+	u32 valid;
+	u64 regval, ns;
+	struct skb_shared_hwtstamps *shhwtstamps;
+
+	if (bfin_mac_hwtstamp_is_none(lp->stamp_cfg.rx_filter))
+		return;
+
+	valid = bfin_read_EMAC_PTP_ISTAT() & RXEL;
+	if (!valid)
+		return;
+
+	shhwtstamps = skb_hwtstamps(skb);
+
+	regval = bfin_read_EMAC_PTP_RXSNAPLO();
+	regval |= (u64)bfin_read_EMAC_PTP_RXSNAPHI() << 32;
+	ns = timecounter_cyc2time(&lp->clock, regval);
+	timecompare_update(&lp->compare, ns);
+	memset(shhwtstamps, 0, sizeof(*shhwtstamps));
+	shhwtstamps->hwtstamp = ns_to_ktime(ns);
+	shhwtstamps->syststamp = timecompare_transform(&lp->compare, ns);
+
+	bfin_dump_hwtamp("RX", &shhwtstamps->hwtstamp, &shhwtstamps->syststamp, &lp->compare);
+}
+
+/*
+ * bfin_read_clock - read raw cycle counter (to be used by time counter)
+ */
+static cycle_t bfin_read_clock(const struct cyclecounter *tc)
+{
+	u64 stamp;
+
+	stamp =  bfin_read_EMAC_PTP_TIMELO();
+	stamp |= (u64)bfin_read_EMAC_PTP_TIMEHI() << 32ULL;
+
+	return stamp;
+}
+
+#define PTP_CLK 25000000
+
+static void bfin_mac_hwtstamp_init(struct net_device *netdev)
+{
+	struct bfin_mac_local *lp = netdev_priv(netdev);
+	u64 append;
+
+	/* Initialize hardware timer */
+	append = PTP_CLK * (1ULL << 32);
+	do_div(append, get_sclk());
+	bfin_write_EMAC_PTP_ADDEND((u32)append);
+
+	memset(&lp->cycles, 0, sizeof(lp->cycles));
+	lp->cycles.read = bfin_read_clock;
+	lp->cycles.mask = CLOCKSOURCE_MASK(64);
+	lp->cycles.mult = 1000000000 / PTP_CLK;
+	lp->cycles.shift = 0;
+
+	/* Synchronize our NIC clock against system wall clock */
+	memset(&lp->compare, 0, sizeof(lp->compare));
+	lp->compare.source = &lp->clock;
+	lp->compare.target = ktime_get_real;
+	lp->compare.num_samples = 10;
+
+	/* Initialize hwstamp config */
+	lp->stamp_cfg.rx_filter = HWTSTAMP_FILTER_NONE;
+	lp->stamp_cfg.tx_type = HWTSTAMP_TX_OFF;
+}
+
+#else
+# define bfin_mac_hwtstamp_is_none(cfg) 0
+# define bfin_mac_hwtstamp_init(dev)
+# define bfin_mac_hwtstamp_ioctl(dev, ifr, cmd) (-EOPNOTSUPP)
+# define bfin_rx_hwtstamp(dev, skb)
+# define bfin_tx_hwtstamp(dev, skb)
+#endif
+
 static void adjust_tx_list(void)
 {
 	int timeout_cnt = MAX_TIMEOUT_CNT;
@@ -608,18 +912,32 @@ static int bfin_mac_hard_start_xmit(struct sk_buff *skb,
 {
 	u16 *data;
 	u32 data_align = (unsigned long)(skb->data) & 0x3;
+	union skb_shared_tx *shtx = skb_tx(skb);
+
 	current_tx_ptr->skb = skb;
 
 	if (data_align == 0x2) {
 		/* move skb->data to current_tx_ptr payload */
 		data = (u16 *)(skb->data) - 1;
-				*data = (u16)(skb->len);
+		*data = (u16)(skb->len);
+		/*
+		 * When transmitting an Ethernet packet, the PTP_TSYNC module requires
+		 * a DMA_Length_Word field associated with the packet. The lower 12 bits
+		 * of this field are the length of the packet payload in bytes and the higher
+		 * 4 bits are the timestamping enable field.
+		 */
+		if (shtx->hardware)
+			*data |= 0x1000;
+
 		current_tx_ptr->desc_a.start_addr = (u32)data;
 		/* this is important! */
 		blackfin_dcache_flush_range((u32)data,
 				(u32)((u8 *)data + skb->len + 4));
 	} else {
 		*((u16 *)(current_tx_ptr->packet)) = (u16)(skb->len);
+		/* enable timestamping for the sent packet */
+		if (shtx->hardware)
+			*((u16 *)(current_tx_ptr->packet)) |= 0x1000;
 		memcpy((u8 *)(current_tx_ptr->packet + 2), skb->data,
 			skb->len);
 		current_tx_ptr->desc_a.start_addr =
@@ -653,6 +971,9 @@ static int bfin_mac_hard_start_xmit(struct sk_buff *skb,
 
 out:
 	adjust_tx_list();
+
+	bfin_tx_hwtstamp(dev, skb);
+
 	current_tx_ptr = current_tx_ptr->next;
 	dev->trans_start = jiffies;
 	dev->stats.tx_packets++;
@@ -664,9 +985,11 @@ static void bfin_mac_rx(struct net_device *dev)
 {
 	struct sk_buff *skb, *new_skb;
 	unsigned short len;
+	struct bfin_mac_local *lp __maybe_unused = netdev_priv(dev);
 
 	/* allocate a new skb for next time receive */
 	skb = current_rx_ptr->skb;
+
 	new_skb = dev_alloc_skb(PKT_BUF_SZ + NET_IP_ALIGN);
 	if (!new_skb) {
 		printk(KERN_NOTICE DRV_NAME
@@ -691,6 +1014,9 @@ static void bfin_mac_rx(struct net_device *dev)
 					 (unsigned long)skb->tail);
 
 	skb->protocol = eth_type_trans(skb, dev);
+
+	bfin_rx_hwtstamp(dev, skb);
+
 #if defined(BFIN_MAC_CSUM_OFFLOAD)
 	skb->csum = current_rx_ptr->status.ip_payload_csum;
 	skb->ip_summed = CHECKSUM_COMPLETE;
@@ -874,6 +1200,16 @@ static void bfin_mac_set_multicast_list(struct net_device *dev)
 	}
 }
 
+static int bfin_mac_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
+{
+	switch (cmd) {
+	case SIOCSHWTSTAMP:
+		return bfin_mac_hwtstamp_ioctl(netdev, ifr, cmd);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
 /*
  * this puts the device in an inactive state
  */
@@ -958,6 +1294,7 @@ static const struct net_device_ops bfin_mac_netdev_ops = {
 	.ndo_set_mac_address	= bfin_mac_set_mac_address,
 	.ndo_tx_timeout		= bfin_mac_timeout,
 	.ndo_set_multicast_list	= bfin_mac_set_multicast_list,
+	.ndo_do_ioctl           = bfin_mac_ioctl,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_change_mtu		= eth_change_mtu,
 #ifdef CONFIG_NET_POLL_CONTROLLER
@@ -1049,6 +1386,8 @@ static int __devinit bfin_mac_probe(struct platform_device *pdev)
 		goto out_err_reg_ndev;
 	}
 
+	bfin_mac_hwtstamp_init(ndev);
+
 	/* now, print out the card info, in a short format.. */
 	dev_info(&pdev->dev, "%s, Version %s\n", DRV_DESC, DRV_VERSION);
 
diff --git a/drivers/net/bfin_mac.h b/drivers/net/bfin_mac.h
index 052b5dc..87c454f 100644
--- a/drivers/net/bfin_mac.h
+++ b/drivers/net/bfin_mac.h
@@ -7,6 +7,12 @@
  *
  * Licensed under the GPL-2 or later.
  */
+#ifndef _BFIN_MAC_H_
+#define _BFIN_MAC_H_
+
+#include <linux/net_tstamp.h>
+#include <linux/clocksource.h>
+#include <linux/timecompare.h>
 
 #define BFIN_MAC_CSUM_OFFLOAD
 
@@ -67,6 +73,15 @@ struct bfin_mac_local {
 
 	struct phy_device *phydev;
 	struct mii_bus *mii_bus;
+
+#if defined(CONFIG_BFIN_MAC_USE_HWSTAMP)
+	struct cyclecounter cycles;
+	struct timecounter clock;
+	struct timecompare compare;
+	struct hwtstamp_config stamp_cfg;
+#endif
 };
 
 extern void bfin_get_ether_addr(char *addr);
+
+#endif
-- 
1.7.1

^ permalink raw reply related

* [PATCH 04/11] netdev: bfin_mac: deduce Ethernet FCS from hardware IP payload checksum
From: Mike Frysinger @ 2010-05-09 10:18 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA, David S. Miller
  Cc: uclinux-dist-devel-ZG0+EudsQA8dtHy/vicBwGD2FQJk+8+b
In-Reply-To: <1273400337-26501-1-git-send-email-vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>

From: Sonic Zhang <sonic.zhang-OyLXuOCK7orQT0dZR+AlfA@public.gmane.org>

IP checksum is based on 16-bit one's complement algorithm.
To deduce a value from checksum is equal to add its complement.

Signed-off-by: Sonic Zhang <sonic.zhang-OyLXuOCK7orQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
---
 drivers/net/bfin_mac.c |   23 +++++++++++++++++++++++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index f9ba598..5b00fc8 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -994,6 +994,10 @@ static void bfin_mac_rx(struct net_device *dev)
 	struct sk_buff *skb, *new_skb;
 	unsigned short len;
 	struct bfin_mac_local *lp __maybe_unused = netdev_priv(dev);
+#if defined(BFIN_MAC_CSUM_OFFLOAD)
+	unsigned int i;
+	unsigned char fcs[ETH_FCS_LEN + 1];
+#endif
 
 	/* check if frame status word reports an error condition
 	 * we which case we simply drop the packet
@@ -1027,6 +1031,8 @@ static void bfin_mac_rx(struct net_device *dev)
 	current_rx_ptr->desc_a.start_addr = (unsigned long)new_skb->data - 2;
 
 	len = (unsigned short)((current_rx_ptr->status.status_word) & RX_FRLEN);
+	/* Deduce Ethernet FCS length from Ethernet payload length */
+	len -= ETH_FCS_LEN;
 	skb_put(skb, len);
 
 	skb->protocol = eth_type_trans(skb, dev);
@@ -1035,6 +1041,23 @@ static void bfin_mac_rx(struct net_device *dev)
 
 #if defined(BFIN_MAC_CSUM_OFFLOAD)
 	skb->csum = current_rx_ptr->status.ip_payload_csum;
+	/*
+	 * Deduce Ethernet FCS from hardware generated IP payload checksum.
+	 * IP checksum is based on 16-bit one's complement algorithm.
+	 * To deduce a value from checksum is equal to add its inversion.
+	 * If the IP payload len is odd, the inversed FCS should also
+	 * begin from odd address and leave first byte zero.
+	 */
+	if (skb->len % 2) {
+		fcs[0] = 0;
+		for (i = 0; i < ETH_FCS_LEN; i++)
+			fcs[i + 1] = ~skb->data[skb->len + i];
+		skb->csum = csum_partial(fcs, ETH_FCS_LEN + 1, skb->csum);
+	} else {
+		for (i = 0; i < ETH_FCS_LEN; i++)
+			fcs[i] = ~skb->data[skb->len + i];
+		skb->csum = csum_partial(fcs, ETH_FCS_LEN, skb->csum);
+	}
 	skb->ip_summed = CHECKSUM_COMPLETE;
 #endif
 
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH] fix multi-buffer logging with mergeable buffers
From: Michael S. Tsirkin @ 2010-05-09 11:10 UTC (permalink / raw)
  To: David L Stevens; +Cc: netdev
In-Reply-To: <1273263068.24088.1.camel@w-dls.beaverton.ibm.com>

On Fri, May 07, 2010 at 01:11:08PM -0700, David L Stevens wrote:
> This patch fixes the multibuffer case of logging with
> mergeable buffers.
> 
> Signed-off-by: David L Stevens <dlstevens@us.ibm.com>

So, I folded this into your original patch submission,
this way we don't get broken logging followed
by a fix (better for bisect). Further, I think it's better to only use log_num
on success. After fixing whitespace (+ needs space around it:
didn't checkpatch complain?) I merged the below,
and rebased my tweaks patch on top.

The result can be found here:
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-mrg-rxbuf

Compiles fine but I'm a bit busy with other things,
and didn't test at all yet, I'd appreciate testing and reports.

--

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 8ef5e3f..1964ab0 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -877,7 +877,7 @@ int vhost_get_desc_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
 	unsigned int out, in;
 	int seg = 0;
 	int headcount = 0;
-	int r;
+	int r, nlogs = 0;
 
 	while (datalen > 0) {
 		if (headcount >= VHOST_NET_MAX_SG) {
@@ -897,12 +897,18 @@ int vhost_get_desc_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
 			r = -EINVAL;
 			goto err;
 		}
+		if (unlikely(log)) {
+			nlogs += *log_num;
+			log += *log_num;
+		}
 		heads[headcount].len = iov_length(vq->iov + seg, in);
 		datalen -= heads[headcount].len;
 		++headcount;
 		seg += in;
 	}
 	*iovcount = seg;
+	if (unlikely(log))
+		*log_num = nlogs;
 	return headcount;
 err:
 	vhost_discard_desc(vq, headcount);
-- 
MST

^ permalink raw reply related

* [PATCH] X25: Replace BKL in sockopts calls
From: Andrew Hendry @ 2010-05-09 12:45 UTC (permalink / raw)
  To: netdev


x25_setsockopt only updates the socket
x25_get only reads

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>

---
 net/x25/af_x25.c |   10 ++++------
 1 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
index 296e65e..9f177a1 100644
--- a/net/x25/af_x25.c
+++ b/net/x25/af_x25.c
@@ -453,7 +453,6 @@ static int x25_setsockopt(struct socket *sock, int level, int optname,
 	struct sock *sk = sock->sk;
 	int rc = -ENOPROTOOPT;
 
-	lock_kernel();
 	if (level != SOL_X25 || optname != X25_QBITINCL)
 		goto out;
 
@@ -465,20 +464,20 @@ static int x25_setsockopt(struct socket *sock, int level, int optname,
 	if (get_user(opt, (int __user *)optval))
 		goto out;
 
+	lock_sock(sk);
 	x25_sk(sk)->qbitincl = !!opt;
+	release_sock(sk);
 	rc = 0;
 out:
-	unlock_kernel();
 	return rc;
 }
 
 static int x25_getsockopt(struct socket *sock, int level, int optname,
 			  char __user *optval, int __user *optlen)
 {
-	struct sock *sk = sock->sk;
+	struct x25_sock *sk = x25_sk(sock->sk);
 	int val, len, rc = -ENOPROTOOPT;
 
-	lock_kernel();
 	if (level != SOL_X25 || optname != X25_QBITINCL)
 		goto out;
 
@@ -496,10 +495,9 @@ static int x25_getsockopt(struct socket *sock, int level, int optname,
 	if (put_user(len, optlen))
 		goto out;
 
-	val = x25_sk(sk)->qbitincl;
+	val = sk->qbitincl;
 	rc = copy_to_user(optval, &val, len) ? -EFAULT : 0;
 out:
-	unlock_kernel();
 	return rc;
 }
 
-- 
1.7.0.4



^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox