Netdev List

Netdev List
 help / color / mirror / Atom feed

* [RFC-PATCH] caif: Add debug connection type for CAIF.
From: sjur.brandeland @ 2010-05-08  7:23 UTC (permalink / raw)
  To: netdev; +Cc: marcel, sjurbr, Sjur Braendeland

From: Sjur Braendeland <sjur.brandeland@stericsson.com>

Added new CAIF protocol type CAIFPROTO_DEBUG for accessing
CAIF debug on the ST Ericsson modems.

There are two debug servers on the modem, one for radio related
debug (CAIF_RADIO_DEBUG_SERVICE) and the other for
communication/application related debug (CAIF_COM_DEBUG_SERVICE).

The debug connection can contain trace debug printouts or
interactive debug used for debugging and test.

Debug connections can be of type STREAM or SEQPACKET.

Feedback is welcomed.
---
 include/linux/caif/caif_socket.h |   36 ++++++++++++++++++++++++++++++++++++
 net/caif/caif_config_util.c      |    5 +++++
 net/caif/caif_socket.c           |    5 -----
 3 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/include/linux/caif/caif_socket.h b/include/linux/caif/caif_socket.h
index 2a61eb1..1a8fd1f 100644
--- a/include/linux/caif/caif_socket.h
+++ b/include/linux/caif/caif_socket.h
@@ -62,6 +62,7 @@ enum caif_channel_priority {
  * @CAIFPROTO_DATAGRAM_LOOP:	Datagram loopback channel, used for testing.
  * @CAIFPROTO_UTIL:		Utility (Psock) channel.
  * @CAIFPROTO_RFM:		Remote File Manager
+ * @CAIFPROTO_DEBUG:		Debug link
  *
  * This enum defines the CAIF Channel type to be used. This defines
  * the service to connect to on the modem.
@@ -72,6 +73,7 @@ enum caif_protocol_type {
 	CAIFPROTO_DATAGRAM_LOOP,
 	CAIFPROTO_UTIL,
 	CAIFPROTO_RFM,
+	CAIFPROTO_DEBUG,
 	_CAIFPROTO_MAX
 };
 #define	CAIFPROTO_MAX _CAIFPROTO_MAX
@@ -85,6 +87,29 @@ enum caif_at_type {
 };
 
 /**
+ * enum caif_debug_type - Content selection for debug connection
+ * @CAIF_DEBUG_TRACE_INTERACTIVE: Connection will contain
+ *				both trace and interactive debug.
+ * @CAIF_DEBUG_TRACE:		Connection contains trace only.
+ * @CAIF_DEBUG_INTERACTIVE:	Connection to interactive debug.
+ */
+enum caif_debug_type {
+	CAIF_DEBUG_TRACE_INTERACTIVE = 0,
+	CAIF_DEBUG_TRACE,
+	CAIF_DEBUG_INTERACTIVE,
+};
+
+/**
+ * enum caif_debug_service - Debug Service to connect.
+ * @CAIF_RADIO_DEBUG_SERVICE:	Debug on the Radio sub-system
+ * @CAIF_COM_DEBUG_SERVICE:	Debug on the communication sub-system
+ */
+enum caif_debug_service {
+	CAIF_RADIO_DEBUG_SERVICE = 1,
+	CAIF_COM_DEBUG_SERVICE
+};
+
+/**
  * struct sockaddr_caif - the sockaddr structure for CAIF sockets.
  * @family:		     Address family number, must be AF_CAIF.
  * @u:			     Union of address data 'switched' by family.
@@ -109,6 +134,13 @@ enum caif_at_type {
  *
  * @u.rfm.volume:            Volume to mount.
  *
+ * @u.dbg:                    Applies when family = CAIFPROTO_DEBUG.
+ *
+ * @u.dbg.type:		      Type of debug connection to set up
+ *			       (caif_debug_type).
+ *
+ * @u.dbg.service:            Service sub-system to connect (caif_debug_service)
+ *
  * Description:
  * This structure holds the connect parameters used for setting up a
  * CAIF Channel. It defines the service to connect to on the modem.
@@ -130,6 +162,10 @@ struct sockaddr_caif {
 			__u32 connection_id;
 			char	  volume[16];
 		} rfm;				/* CAIFPROTO_RFM */
+		struct {
+			__u8  type;		/* type:enum caif_debug_type */
+			__u8  service;		/* service:caif_debug_service */
+		} dbg;				/* CAIFPROTO_DEBUG */
 	} u;
 };
 
diff --git a/net/caif/caif_config_util.c b/net/caif/caif_config_util.c
index 6f36580..76ae683 100644
--- a/net/caif/caif_config_util.c
+++ b/net/caif/caif_config_util.c
@@ -80,6 +80,11 @@ int connect_req_to_link_param(struct cfcnfg *cnfg,
 		       l->u.utility.paramlen);
 
 		break;
+	case CAIFPROTO_DEBUG:
+		l->linktype = CFCTRL_SRV_DBG;
+		l->endpoint = s->sockaddr.u.dbg.service;
+		l->chtype = s->sockaddr.u.dbg.type;
+		break;
 	default:
 		return -EINVAL;
 	}
diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index c3a70c5..4e6c610 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -843,11 +843,6 @@ static int caif_connect(struct socket *sock, struct sockaddr *uaddr,
 	if (uaddr->sa_family != AF_CAIF)
 		goto out;
 
-	err = -ESOCKTNOSUPPORT;
-	if (unlikely(!(sk->sk_type == SOCK_STREAM &&
-		       cf_sk->sk.sk_protocol == CAIFPROTO_AT) &&
-		       sk->sk_type != SOCK_SEQPACKET))
-		goto out;
 	switch (sock->state) {
 	case SS_UNCONNECTED:
 		/* Normal case, a fresh connect */
-- 
1.6.3.3


^ permalink raw reply related

* Re: [PATCH] ipv4: remove ip_rt_secret timer (v4)
From: Eric Dumazet @ 2010-05-08  6:36 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, davem, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <20100508010108.GA3028@localhost.localdomain>

Le vendredi 07 mai 2010 à 21:01 -0400, Neil Horman a écrit :
> Hey, Had a moment tonight so I spun version 4 of the patch.
> 
> Change notes:
> 1) Redid the initialization, in light of Erics clarification.  We randomly seed
> all 32 bits of the rt_genid for a netns, but still just do 256 byte
> modifications on cache invalidations
> 
> 2) got rid of the dup checking crap.
> 
> 
> Summary:
> 
> A while back there was a discussion regarding the rt_secret_interval timer.
> Given that we've had the ability to do emergency route cache rebuilds for awhile
> now, based on a statistical analysis of the various hash chain lengths in the
> cache, the use of the flush timer is somewhat redundant.  This patch removes the
> rt_secret_interval sysctl, allowing us to rely solely on the statistical
> analysis mechanism to determine the need for route cache flushes.
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> 

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Sorry for the confusion Neil, thanks !



^ permalink raw reply

* RE: [RFC][PATCH v4 00/18] Provide a zero-copy method on KVM virtio-net.
From: Xin, Xiaohui @ 2010-05-08  7:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mingo@elte.hu, davem@davemloft.net,
	jdike@linux.intel.com
In-Reply-To: <20100425121420.GB10238@redhat.com>

Michael,
Sorry, somehow I missed this mail. :-(

>> Here, we have ever considered 2 ways to utilize the page constructor
>> API to dispense the user buffers.
>> 
>> One:	Modify __alloc_skb() function a bit, it can only allocate a 
>> 	structure of sk_buff, and the data pointer is pointing to a 
>> 	user buffer which is coming from a page constructor API.
>> 	Then the shinfo of the skb is also from guest.
>> 	When packet is received from hardware, the skb->data is filled
>> 	directly by h/w. What we have done is in this way.
>> 
>> 	Pros:	We can avoid any copy here.
>> 	Cons:	Guest virtio-net driver needs to allocate skb as almost
>> 		the same method with the host NIC drivers, say the size
>> 		of netdev_alloc_skb() and the same reserved space in the
>> 		head of skb. Many NIC drivers are the same with guest and
>> 		ok for this. But some lastest NIC drivers reserves special
>> 		room in skb head. To deal with it, we suggest to provide
>> 		a method in guest virtio-net driver to ask for parameter
>> 		we interest from the NIC driver when we know which device 
>> 		we have bind to do zero-copy. Then we ask guest to do so.
>> 		Is that reasonable?

>Do you still do this?

Currently, we still use the first way. But we now ignore the room which 
host skb_reserve() required when device is doing zero-copy. Now we don't 
taint guest virtio-net driver with a new method by this way.

>> Two:	Modify driver to get user buffer allocated from a page constructor
>> 	API(to substitute alloc_page()), the user buffer are used as payload
>> 	buffers and filled by h/w directly when packet is received. Driver
>> 	should associate the pages with skb (skb_shinfo(skb)->frags). For 
>> 	the head buffer side, let host allocates skb, and h/w fills it. 
>> 	After that, the data filled in host skb header will be copied into
>> 	guest header buffer which is submitted together with the payload buffer.
>> 
>> 	Pros:	We could less care the way how guest or host allocates their
>> 		buffers.
>> 	Cons:	We still need a bit copy here for the skb header.
>> 
>> We are not sure which way is the better here. This is the first thing we want
>> to get comments from the community. We wish the modification to the network
>> part will be generic which not used by vhost-net backend only, but a user
>> application may use it as well when the zero-copy device may provides async
>> read/write operations later.

>I commented on this in the past. Do you still want comments?

Now we continue with the first way and try to push it. But any comments about the two methods are still welcome.

>That's nice. The thing to do is probably to enable GSO/TSO
>and see what we get this way. Also, mergeable buffer support
>was recently posted and I hope to merge it for 2.6.35.
>You might want to take a look.

I'm looking at the mergeable buffer. I think GSO/GRO support with zero-copy also needs it.
Currently, GSO/TSO is still not supported by vhost-net?
-- 
MST

^ permalink raw reply

* Re: [PATCH] ipv4: remove ip_rt_secret timer (v4)
From: David Miller @ 2010-05-08  8:58 UTC (permalink / raw)
  To: eric.dumazet; +Cc: nhorman, netdev, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <1273300597.2325.55.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 08 May 2010 08:36:37 +0200

> Le vendredi 07 mai 2010 à 21:01 -0400, Neil Horman a écrit :
>> A while back there was a discussion regarding the rt_secret_interval timer.
>> Given that we've had the ability to do emergency route cache rebuilds for awhile
>> now, based on a statistical analysis of the various hash chain lengths in the
>> cache, the use of the flush timer is somewhat redundant.  This patch removes the
>> rt_secret_interval sysctl, allowing us to rely solely on the statistical
>> analysis mechanism to determine the need for route cache flushes.
>> 
>> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>> 
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks guys!

^ permalink raw reply

* Re: [PATCH][v4] tcp: fix ICMP-RTO war
From: Damian Lukowski @ 2010-05-08  8:30 UTC (permalink / raw)
  To: Jerry Chu; +Cc: ilpo.jarvinen, netdev
In-Reply-To: <n2gd1c2719f1005071625nd3df3d72t9655a1f64f2fd825@mail.gmail.com>

> I'm working on a patch that tries to measure and use the RTT for the passive
> open side when the TS option is NOT enabled. My code conflicts with your
> recently added "tcp_ack_update_rtt(sk, 0, 0);" Could you tell me why do you
> force this call for the no-TS case when obviously "0" is not a measured RTT?
> If you try to force icsk_rto to be initialized correctly, it is
> already initialized to
> TCP_TIMEOUT_INIT by tcp_create_openreq_child(). What am I missing?

Hi,
the backoff reversion code uses __tcp_set_rto() to recalculate icsk_rto,
which itself relies on tp->srtt and rttvar.
srtt is explicitly set to 0 in tcp_create_openreq_child(), so I didn't change it.
Initializing it with TCP_TIMEOUT_INIT should also fix that specific bug,
but I don't know if there are other impacts.

Regards
 Damian

> Thanks,
> 
> Jerry
> 
>> From: David Miller <davem@davemloft.net>
>>
>> Date: Sun, Feb 21, 2010 at 7:10 PM
>> Subject: Re: [PATCH][v4] tcp: fix ICMP-RTO war
>> To: ilpo.jarvinen@helsinki.fi
>> Cc: damian@tvk.rwth-aachen.de, netdev@vger.kernel.org
>>
>>
>> From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
>> Date: Tue, 16 Feb 2010 14:45:25 +0200 (EET)
>>
>>> On Wed, 10 Feb 2010, Damian Lukowski wrote:
>>>
>>>> @@ -5783,12 +5783,10 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
>>>>
>>>>                              /* tcp_ack considers this ACK as duplicate
>>>>                               * and does not calculate rtt.
>>>> -                             * Fix it at least with timestamps.
>>>> +                             * Force it here.
>>>>                               */
>>>> -                            if (tp->rx_opt.saw_tstamp &&
>>>> -                                tp->rx_opt.rcv_tsecr && !tp->srtt)
>>>> -                                    tcp_ack_saw_tstamp(sk, 0);
>>>> -
>>>> +                            tcp_ack_update_rtt(sk, 0, 0);
>>>> +
>>>
>>> ...Here a zero seq_rtt is given to RTT estimator (it will be effective
>>> only in the case w/o timestamps, TS case recalculates it from the stored
>>> timestamps). Maybe we could use some field (timestamp related one comes to
>>> my mind) in request sock to get a real RTT estimate for non-timestamp case
>>> too. ...It seems possible to me, though tricky because the request_sock is
>>> no longer that easily available here so some parameter passing would be
>>> needed.
>>
>> Agreed.
>>
>> But even more simply I think we should make even the current
>> tcp_ack_update_rtt() call here conditional on at least
>> tp->srtt being zero.
>>
>> Damian do you at least agree with that?
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply

* Re: [PATCH 1/2] netdev/fec: fix performance impact from mdio poll operation
From: Bryan Wu @ 2010-05-08 10:07 UTC (permalink / raw)
  To: Andy Fleming
  Cc: davem@davemloft.net, Sascha Hauer, Greg Ungerer, Amit Kucheria,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <i2l2acbd3e41005070906me50416bapd8154a567406b886@mail.gmail.com>

On 05/08/2010 12:06 AM, Andy Fleming wrote:
> On Thursday, May 6, 2010, Bryan Wu<bryan.wu@canonical.com>  wrote:
>> BugLink: http://bugs.launchpad.net/bugs/546649
>> BugLink: http://bugs.launchpad.net/bugs/457878
>>
>> After introducing phylib supporting, users experienced performace drop. That is
>> because of the mdio polling operation of phylib. Use msleep to replace the busy
>> waiting cpu_relax() and remove the warning message.
>
> I'm a little confused by this patch.  In order to improve performance,
> you made the mdio functions fail silently?  Why are you getting
> timeouts at all?

 From the BugLink, we experienced a lot of error message about mdio_read 
timeout. Actually, the logical in this function is quite simple and I still have 
no idea about why mdio_read timeout. That might be hardware defect.

> The Phy lib is not going to respond well if an mdio
> write times out without returning an error.  And returning an error
> means the whole state machine has to reset, as a failed write is not
> something we currently handle gracefully.
>

Right, if we return error to phylib due to mdio read times out, the performance 
will drop a lot.

> Is it possible to use an interrupt to get the phy state change?  This
> would allow the phy lib to stop its incessant polling.  It doesn't
> solve the question of why it's timing out, though.
>

I'm going to try the interrupt, but fec driver is shared by several SoCs. It 
might be a little complicated and still cannot fix the timing out issue. Is 
there any example driver to use an interrupt to get the phy state change?

Thanks,
-Bryan

>>
>> Signed-off-by: Bryan Wu<bryan.wu@canonical.com>
>> Acked-by: Andy Whitcroft<apw@canonical.com>
>> ---
>>   drivers/net/fec.c |   45 +++++++++++++++++++++------------------------
>>   1 files changed, 21 insertions(+), 24 deletions(-)
>>
>> diff --git a/drivers/net/fec.c b/drivers/net/fec.c
>> index 2b1651a..9c58f6b 100644
>> --- a/drivers/net/fec.c
>> +++ b/drivers/net/fec.c
>> @@ -203,7 +203,7 @@ static void fec_stop(struct net_device *dev);
>>   #define FEC_MMFR_TA            (2<<  16)
>>   #define FEC_MMFR_DATA(v)       (v&  0xffff)
>>
>> -#define FEC_MII_TIMEOUT                10000
>> +#define FEC_MII_TIMEOUT                10
>>
>>   /* Transmitter timeout */
>>   #define TX_TIMEOUT (2 * HZ)
>> @@ -611,13 +611,29 @@ spin_unlock:
>>   /*
>>    * NOTE: a MII transaction is during around 25 us, so polling it...
>>    */
>> -static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
>> +static int fec_enet_mdio_poll(struct fec_enet_private *fep)
>>   {
>> -       struct fec_enet_private *fep = bus->priv;
>>          int timeout = FEC_MII_TIMEOUT;
>>
>>          fep->mii_timeout = 0;
>>
>> +       /* wait for end of transfer */
>> +       while (!(readl(fep->hwp + FEC_IEVENT)&  FEC_ENET_MII)) {
>> +               msleep(1);
>> +               if (timeout--<  0) {
>> +                       fep->mii_timeout = 1;
>> +                       break;
>> +               }
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
>> +{
>> +       struct fec_enet_private *fep = bus->priv;
>> +
>> +
>>          /* clear MII end of transfer bit*/
>>          writel(FEC_ENET_MII, fep->hwp + FEC_IEVENT);
>>
>> @@ -626,15 +642,7 @@ static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
>>                  FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
>>                  FEC_MMFR_TA, fep->hwp + FEC_MII_DATA);
>>
>> -       /* wait for end of transfer */
>> -       while (!(readl(fep->hwp + FEC_IEVENT)&  FEC_ENET_MII)) {
>> -               cpu_relax();
>> -               if (timeout--<  0) {
>> -                       fep->mii_timeout = 1;
>> -                       printk(KERN_ERR "FEC: MDIO read timeout\n");
>> -                       return -ETIMEDOUT;
>> -               }
>> -       }
>> +       fec_enet_mdio_poll(fep);
>>
>>          /* return value */
>>          return FEC_MMFR_DATA(readl(fep->hwp + FEC_MII_DATA));
>> @@ -644,9 +652,6 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
>>                             u16 value)
>>   {
>>          struct fec_enet_private *fep = bus->priv;
>> -       int timeout = FEC_MII_TIMEOUT;
>> -
>> -       fep->mii_timeout = 0;
>>
>>          /* clear MII end of transfer bit*/
>>          writel(FEC_ENET_MII, fep->hwp + FEC_IEVENT);
>> @@ -657,15 +662,7 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
>>                  FEC_MMFR_TA | FEC_MMFR_DATA(value),
>>                  fep->hwp + FEC_MII_DATA);
>>
>> -       /* wait for end of transfer */
>> -       while (!(readl(fep->hwp + FEC_IEVENT)&  FEC_ENET_MII)) {
>> -               cpu_relax();
>> -               if (timeout--<  0) {
>> -                       fep->mii_timeout = 1;
>> -                       printk(KERN_ERR "FEC: MDIO write timeout\n");
>> -                       return -ETIMEDOUT;
>> -               }
>> -       }
>> +       fec_enet_mdio_poll(fep);
>>
>>          return 0;
>>   }
>> --
>> 1.7.0.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>

^ permalink raw reply

* Re: [PATCH] ipv4: remove ip_rt_secret timer (v4)
From: Neil Horman @ 2010-05-08 12:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <1273300597.2325.55.camel@edumazet-laptop>

On Sat, May 08, 2010 at 08:36:37AM +0200, Eric Dumazet wrote:
> Le vendredi 07 mai 2010 à 21:01 -0400, Neil Horman a écrit :
> > Hey, Had a moment tonight so I spun version 4 of the patch.
> > 
> > Change notes:
> > 1) Redid the initialization, in light of Erics clarification.  We randomly seed
> > all 32 bits of the rt_genid for a netns, but still just do 256 byte
> > modifications on cache invalidations
> > 
> > 2) got rid of the dup checking crap.
> > 
> > 
> > Summary:
> > 
> > A while back there was a discussion regarding the rt_secret_interval timer.
> > Given that we've had the ability to do emergency route cache rebuilds for awhile
> > now, based on a statistical analysis of the various hash chain lengths in the
> > cache, the use of the flush timer is somewhat redundant.  This patch removes the
> > rt_secret_interval sysctl, allowing us to rely solely on the statistical
> > analysis mechanism to determine the need for route cache flushes.
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > 
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Sorry for the confusion Neil, thanks !
Its my fault, it was more clear to me on a second reading.  Apologies!
Neil

> 
> 
> 

^ permalink raw reply

* Re: [PATCH 1/2] netdev/fec: fix performance impact from mdio poll operation
From: Andy Fleming @ 2010-05-08 15:25 UTC (permalink / raw)
  To: bryan.wu@canonical.com
  Cc: davem@davemloft.net, Sascha Hauer, Greg Ungerer, Amit Kucheria,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <4BE537E2.6010809@canonical.com>

On Saturday, May 8, 2010, Bryan Wu <bryan.wu@canonical.com> wrote:
> On 05/08/2010 12:06 AM, Andy Fleming wrote:

> From the BugLink, we experienced a lot of error
>
> The Phy lib is not going to respond well if an mdio
> write times out without returning an error.  And returning an error
> means the whole state machine has to reset, as a failed write is not
> something we currently handle gracefully.
>
>
>
> Right, if we return error to phylib due to mdio read times out, the performance will drop a lot.

Yes, and if you *don't* return an error, you risk incorrect behavior.
This patch changes the driver to return the read result, after
detecting that the read result is invalid.  I don't know what the
value in that register will be, but it can't be correct.  If the
hardware is broken, then we need to find a workaround that doesn't
break things for chips which have working mdio.  It's also possible we
just need a longer timeout.  Does the mdio bus ever return correct
values?

>
>
> Is it possible to use an interrupt to get the phy state change?  This
> would allow the phy lib to stop its incessant polling.  It doesn't
> solve the question of why it's timing out, though.
>
>
>
> I'm going to try the interrupt, but fec driver is shared by several SoCs. It might be a little complicated and still cannot fix the timing out issue. Is there any example driver to use an interrupt to get the phy state change?

Well, there are examples, but they may be only somewhat helpful.  From
the perspective of the PHY Lib, all you have to do is put a correct
value into the irqs array in the mdio bus struct for your bus.  The
part where you will probably have to solve a problem is in figuring
out how to convey that information to the bus driver.  On Power CPUs
and socs, we use a device tree, which has an interrupt associated with
each PHY.  Look at wherever your mdio bus *device* is registered.
That's probably where you'll have that information.

Feel free to look at drivers/net/fsl_pq_mdio.c

Another possibility is to use the fixed link phy driver, which will
never attempt to use the mdio bus, and won't poll.

^ permalink raw reply

* Re: [PATCH][v4] tcp: fix ICMP-RTO war
From: Jerry Chu @ 2010-05-08 17:27 UTC (permalink / raw)
  To: Damian Lukowski; +Cc: ilpo.jarvinen, netdev
In-Reply-To: <4BE5213C.1030300@tvk.rwth-aachen.de>

On Sat, May 8, 2010 at 1:30 AM, Damian Lukowski
<damian@tvk.rwth-aachen.de> wrote:
>
> > I'm working on a patch that tries to measure and use the RTT for the passive
> > open side when the TS option is NOT enabled. My code conflicts with your
> > recently added "tcp_ack_update_rtt(sk, 0, 0);" Could you tell me why do you
> > force this call for the no-TS case when obviously "0" is not a measured RTT?
> > If you try to force icsk_rto to be initialized correctly, it is
> > already initialized to
> > TCP_TIMEOUT_INIT by tcp_create_openreq_child(). What am I missing?
>
> Hi,
> the backoff reversion code uses __tcp_set_rto() to recalculate icsk_rto,
> which itself relies on tp->srtt and rttvar.

Guess you are talking about

inet_csk(sk)->icsk_rto = __tcp_set_rto(tp) <<
                                         icsk->icsk_backoff;

inside tcp_v4_err(), right? (I'm looking at 2.6.33 kernel.)

Yes it seems to be a bug when __tcp_set_rto() is called before
tcp_rtt_estimator()
gets a chance to initialize all the variables properly.

But I don't like your fix of adding tcp_ack_update_rtt(sk, 0, 0); to
tcp_rcv_state_process()
because that means you've got a measured RTT of 0 (really meaning < 1 tick) for
the no-TS case, which will cause tcp_rtt_estimator() to compute all
the variables as if
there has been a valid RTT measurement of 1.

A better fix IMHO is to make sure all the variables are properly
initialized when exiting
tcp_init_metrics(), e.g, if srtt remains 0, make sure
tp->mdev = tp->mdev_max = tp->rttvar = TCP_TIMEOUT_INIT;
(mdev already been initialized to TCP_TIMEOUT_INIT. I think you got
hit by rttvar == 0)

> srtt is explicitly set to 0 in tcp_create_openreq_child(), so I didn't change it.
> Initializing it with TCP_TIMEOUT_INIT should also fix that specific bug,
> but I don't know if there are other impacts.

So what do I care? Because I'm mucking with the code in this area and your fix
causes some conflict with my logic.

What do you think?

Best,

Jerry

>
> Regards
>  Damian
>
> > Thanks,
> >
> > Jerry
> >
> >> From: David Miller <davem@davemloft.net>
> >>
> >> Date: Sun, Feb 21, 2010 at 7:10 PM
> >> Subject: Re: [PATCH][v4] tcp: fix ICMP-RTO war
> >> To: ilpo.jarvinen@helsinki.fi
> >> Cc: damian@tvk.rwth-aachen.de, netdev@vger.kernel.org
> >>
> >>
> >> From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
> >> Date: Tue, 16 Feb 2010 14:45:25 +0200 (EET)
> >>
> >>> On Wed, 10 Feb 2010, Damian Lukowski wrote:
> >>>
> >>>> @@ -5783,12 +5783,10 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
> >>>>
> >>>>                              /* tcp_ack considers this ACK as duplicate
> >>>>                               * and does not calculate rtt.
> >>>> -                             * Fix it at least with timestamps.
> >>>> +                             * Force it here.
> >>>>                               */
> >>>> -                            if (tp->rx_opt.saw_tstamp &&
> >>>> -                                tp->rx_opt.rcv_tsecr && !tp->srtt)
> >>>> -                                    tcp_ack_saw_tstamp(sk, 0);
> >>>> -
> >>>> +                            tcp_ack_update_rtt(sk, 0, 0);
> >>>> +
> >>>
> >>> ...Here a zero seq_rtt is given to RTT estimator (it will be effective
> >>> only in the case w/o timestamps, TS case recalculates it from the stored
> >>> timestamps). Maybe we could use some field (timestamp related one comes to
> >>> my mind) in request sock to get a real RTT estimate for non-timestamp case
> >>> too. ...It seems possible to me, though tricky because the request_sock is
> >>> no longer that easily available here so some parameter passing would be
> >>> needed.
> >>
> >> Agreed.
> >>
> >> But even more simply I think we should make even the current
> >> tcp_ack_update_rtt() call here conditional on at least
> >> tp->srtt being zero.
> >>
> >> Damian do you at least agree with that?
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* Re: [PATCH][v4] tcp: fix ICMP-RTO war
From: Ilpo Järvinen @ 2010-05-08 22:00 UTC (permalink / raw)
  To: Jerry Chu; +Cc: Damian Lukowski, Netdev
In-Reply-To: <k2xd1c2719f1005081027v376a4ebfp300c6272f9ea91df@mail.gmail.com>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2678 bytes --]

On Sat, 8 May 2010, Jerry Chu wrote:

> On Sat, May 8, 2010 at 1:30 AM, Damian Lukowski
> <damian@tvk.rwth-aachen.de> wrote:
> >
> > > I'm working on a patch that tries to measure and use the RTT for the passive
> > > open side when the TS option is NOT enabled. My code conflicts with your
> > > recently added "tcp_ack_update_rtt(sk, 0, 0);" Could you tell me why do you
> > > force this call for the no-TS case when obviously "0" is not a measured RTT?
> > > If you try to force icsk_rto to be initialized correctly, it is
> > > already initialized to
> > > TCP_TIMEOUT_INIT by tcp_create_openreq_child(). What am I missing?
> >
> > Hi,
> > the backoff reversion code uses __tcp_set_rto() to recalculate icsk_rto,
> > which itself relies on tp->srtt and rttvar.
> 
> Guess you are talking about
> 
> inet_csk(sk)->icsk_rto = __tcp_set_rto(tp) <<
>                                          icsk->icsk_backoff;
> 
> inside tcp_v4_err(), right? (I'm looking at 2.6.33 kernel.)
> 
> Yes it seems to be a bug when __tcp_set_rto() is called before
> tcp_rtt_estimator()
> gets a chance to initialize all the variables properly.
>
> But I don't like your fix of adding tcp_ack_update_rtt(sk, 0, 0); to
> tcp_rcv_state_process()
> because that means you've got a measured RTT of 0 (really meaning < 1 tick) for
> the no-TS case, which will cause tcp_rtt_estimator() to compute all
> the variables as if
> there has been a valid RTT measurement of 1.
> 
> A better fix IMHO is to make sure all the variables are properly
> initialized when exiting
> tcp_init_metrics(), e.g, if srtt remains 0, make sure
> tp->mdev = tp->mdev_max = tp->rttvar = TCP_TIMEOUT_INIT;
> (mdev already been initialized to TCP_TIMEOUT_INIT. I think you got
> hit by rttvar == 0)
> 
> > srtt is explicitly set to 0 in tcp_create_openreq_child(), so I didn't change it.
> > Initializing it with TCP_TIMEOUT_INIT should also fix that specific bug,
> > but I don't know if there are other impacts.
> 
> So what do I care? Because I'm mucking with the code in this area and your fix
> causes some conflict with my logic.
> 
> What do you think?

I didn't/don't like the fix either, but it's better than ICMP-RTO war that 
it fixed.

I think that most sensible solution to this issue would not be to 
initialize with the TCP_TIMEOUT_INIT bogosity either but do the real 
measurement for rtt which should be possible already. I suggested this 
already when the fix was made but nobody has yet found the time and
energy to code the rtt measurement for the early rtt when not using TS. 
Since you're going to make that lack of feature to go away, please change 
this illogical call too then while you go :-).

-- 
 i.

^ permalink raw reply

* [PATCH] virtif: initial interface extensions
From: Arnd Bergmann @ 2010-05-08 23:20 UTC (permalink / raw)
  To: Scott Feldman; +Cc: davem, netdev, chrisw
In-Reply-To: <201005061842.51981.arnd@arndb.de>

Building on the work of Scott Feldman, this extends the netlink interface
to deal with not only port profiles but also CDCP multichannel and VDP
VSI registration.

The protocols are split apart into separate netlink attributes for
a cleaner separation. A device can have multiple IFLA_VIRTIF attributes,
each pointing to one of the slaves that get registered using one of
the protocols. In case of VDP, each IFLA_VIRTIF attribute needs
both an IFLA_VIRTIF_VSI and one or more IFLA_VIRTIF_VSI_MAC_VLAN
attributes.

The VF number is split out because it only makes sense when the
implementation is in the device driver or firmware, but not for
software-only implementations like we do for the initial VDP code
using LLDPAD. If a IFLA_VIRTIF_VF attribute is given, the kernel
takes care of the association through the device driver for that
VF, otherwise we do it in user space. This should work for each
of the three protocols.

This code is a first rough prototype, completely untested and
very likely buggy. This is also the first time I'm trying
to deal with netlink in the kernel, so chances are that I've
misunderstood something in a major way.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/if_link.h   |   81 ++++++++++++++++++++++++++++------
 include/linux/netdevice.h |    8 ++--
 net/core/rtnetlink.c      |  106 ++++++++++++++++++++++++++++++++++-----------
 3 files changed, 152 insertions(+), 43 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index d763358..675d190 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -116,7 +116,7 @@ enum {
 	IFLA_VF_TX_RATE,	/* TX Bandwidth Allocation */
 	IFLA_VFINFO,
 	IFLA_STATS64,
-	IFLA_VF_PORT_PROFILE,
+	IFLA_VIRTIF,
 	__IFLA_MAX
 };
 
@@ -262,26 +262,79 @@ struct ifla_vf_info {
 };
 
 enum {
-	IFLA_VF_PORT_PROFILE_STATUS_UNKNOWN,
-	IFLA_VF_PORT_PROFILE_STATUS_SUCCESS,
-	IFLA_VF_PORT_PROFILE_STATUS_INPROGRESS,
-	IFLA_VF_PORT_PROFILE_STATUS_ERROR,
+	IFLA_VIRTIF_UNSPEC,
+	IFLA_VIRTIF_VF,
+	IFLA_VIRTIF_PORT_PROFILE, /* Cisco enic */
+	IFLA_VIRTIF_CHANNEL,	  /* 802.1Qbg CDCP */
+	IFLA_VIRTIF_VSI,	  /* 802.1Qbg VDP */
+	IFLA_VIRTIF_VSI_MAC_VLAN,
+	__IFLA_VIRTIF_MAX,
 };
 
-#define IFLA_VF_PORT_PROFILE_MAX	40
-#define IFLA_VF_UUID_MAX		40
-#define IFLA_VF_CLIENT_NAME_MAX		40
+#define IFLA_VIRTIF_MAX (__IFLA_VIRTIF_MAX - 1)
 
-struct ifla_vf_port_profile {
-	__u32 vf;
+enum {
+	VIRTIF_PORT_PROFILE_STATUS_UNKNOWN,
+	VIRTIF_PORT_PROFILE_STATUS_SUCCESS,
+	VIRTIF_PORT_PROFILE_STATUS_INPROGRESS,
+	VIRTIF_PORT_PROFILE_STATUS_ERROR,
+};
+
+#define VIRTIF_PORT_PROFILE_MAX	40
+#define VIRTIF_UUID_MAX		40
+#define VIRTIF_CLIENT_NAME_MAX	40
+
+struct ifla_virtif_port_profile {
 	__u32 flags;
 	__u32 status;
-	__u8 port_profile[IFLA_VF_PORT_PROFILE_MAX];
+	__u8 port_profile[VIRTIF_PORT_PROFILE_MAX];
 	__u8 mac[32];					/* MAX_ADDR_LEN */
 	/* UUID e.g. "CEEFD3B1-9E11-11DE-BDFD-000BAB01C0FB" */
-	__u8 host_uuid[IFLA_VF_UUID_MAX];
-	__u8 client_uuid[IFLA_VF_UUID_MAX];
-	__u8 client_name[IFLA_VF_CLIENT_NAME_MAX];	/* e.g. "vm0-eth1" */
+	__u8 host_uuid[VIRTIF_UUID_MAX];
+	__u8 client_uuid[VIRTIF_UUID_MAX];
+	__u8 client_name[VIRTIF_CLIENT_NAME_MAX];	/* e.g. "vm0-eth1" */
+};
+
+/*
+ * CDCP and VDP come from the 802.1Qbg standard, these definitions are taken
+ * from the respective TLV definitions in there. Adapters implementing CDCP
+ * or VDP can encapsulate the structures in LLDP/ECP headers and send them
+ * to the switch.
+ */
+struct ifla_virtif_channel {
+	__u16 scid;	/* S-Channel ID */
+	__u16 svid;	/* S-VLAN ID */
+};
+
+enum {
+	VIRTIF_VDP_REQUEST_PREASSOCIATE = 0,
+	VIRVIF_VDP_REQUEST_PREASSOCIATE_RR,
+	VIRVIF_VDP_REQUEST_ASSOCIATE,
+	VIRVIF_VDP_REQUEST_DISASSOCIATE,
+};
+
+enum {
+	VIRTIF_VDP_RESPONSE_SUCCESS = 0,
+	VIRTIF_VDP_RESPONSE_INVALID_FORMAT,
+	VIRTIF_VDP_RESPONSE_INSUFFICIENT_RESOURCES,
+	VIRTIF_VDP_RESPONSE_UNUSED_VTID,
+	VIRTIF_VDP_RESPONSE_VTID_VIOLATION,
+	VIRTIF_VDP_RESPONSE_VTID_VERSION_VIOLATION,
+	VIRTIF_VDP_RESPONSE_OUT_OF_SYNC,
+};
+
+struct ifla_virtif_vsi {
+	__u8 mode_request;
+	__u8 mode_response;
+	__u8 vsi_mgr_id;	/* these three define the policy */
+	__u8 vsi_type_id[3]; 	/*   for the guest */
+	__u8 vsi_type_version;
+	__u8 vsi_instance[16];	/* identifies the guest */
+};
+
+struct ifla_virtif_vsi_mac_vlan {
+	__u8 mac[6];
+	__u8 vlan_id[2];
 };
 
 #endif /* _LINUX_IF_LINK_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f5b0be5..d549a3d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -687,9 +687,9 @@ struct netdev_rx_queue {
  * int (*ndo_get_vf_config)(struct net_device *dev,
  *			    int vf, struct ifla_vf_info *ivf);
  * int (*ndo_set_vf_port_profile)(struct net_device *dev, int vf,
- *				  struct ifla_vf_port_profile *ivp);
+ *				  struct virtif_port_profile *ivp);
  * int (*ndo_get_vf_port_profile)(struct net_device *dev, int vf,
- *				  struct ifla_vf_port_profile *ivp);
+ *				  struct virtif_port_profile *ivp);
  */
 #define HAVE_NET_DEVICE_OPS
 struct net_device_ops {
@@ -741,10 +741,10 @@ struct net_device_ops {
 						     struct ifla_vf_info *ivf);
 	int			(*ndo_set_vf_port_profile)(
 					struct net_device *dev, int vf,
-					struct ifla_vf_port_profile *ivp);
+					struct ifla_virtif_port_profile *ivp);
 	int			(*ndo_get_vf_port_profile)(
 					struct net_device *dev, int vf,
-					struct ifla_vf_port_profile *ivp);
+					struct ifla_virtif_port_profile *ivp);
 #if defined(CONFIG_FCOE) || defined(CONFIG_FCOE_MODULE)
 	int			(*ndo_fcoe_enable)(struct net_device *dev);
 	int			(*ndo_fcoe_disable)(struct net_device *dev);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d2ef45b..cf5e328 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -653,6 +653,22 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev)
 		return 0;
 }
 
+static size_t rtnl_virtif_get_size(const struct net_device *dev)
+{
+	size_t total = 0;
+
+	if (dev->netdev_ops->ndo_get_vf_port_profile) {
+		total = dev_num_vf(dev->dev.parent) *
+			(nla_total_size(sizeof(struct ifla_virtif_port_profile))
+			 + nla_total_size(4)); /* IFLA_VIRTIF_VF */
+	}
+
+	/* fill in IFLA_VIRTIF_CHANNEL and IFLA_VIRTIF_VSI when needed */
+		
+	return total;
+}
+
+
 static inline size_t if_nlmsg_size(const struct net_device *dev)
 {
 	return NLMSG_ALIGN(sizeof(struct ifinfomsg))
@@ -674,6 +690,38 @@ static inline size_t if_nlmsg_size(const struct net_device *dev)
 	       + nla_total_size(4) /* IFLA_NUM_VF */
 	       + nla_total_size(rtnl_vfinfo_size(dev)) /* IFLA_VFINFO */
 	       + rtnl_link_get_size(dev); /* IFLA_LINKINFO */
+	       + rtnl_virtif_get_size(dev); /* IFLA_VIRTIF */
+}
+
+static int rtnl_virtif_fill_ifinfo(struct sk_buff *skb, struct net_device *dev)
+{
+	if (dev->netdev_ops->ndo_get_vf_port_profile && dev->dev.parent) {
+		struct ifla_virtif_port_profile ivp;
+
+		if (dev_num_vf(dev->dev.parent)) {
+			int i;
+
+			for (i = 0; i < dev_num_vf(dev->dev.parent); i++) {
+				if (dev->netdev_ops->ndo_get_vf_port_profile(
+					dev, i, &ivp))
+					break;
+				NLA_PUT_U32(skb, IFLA_VIRTIF_VF, i);
+				NLA_PUT(skb, IFLA_VIRTIF_PORT_PROFILE,
+					sizeof(ivp), &ivp);
+			}
+		}
+#if 0 /* Not sure how we should do this -arnd */
+		 else if (!dev->netdev_ops->ndo_get_vf_port_profile(dev,
+			0, &ivp)) {
+			NLA_PUT(skb, VIRTIF_PORT_PROFILE,
+				sizeof(ivp), &ivp);
+		}
+#endif
+	}
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
 }
 
 static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
@@ -761,25 +809,8 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 		}
 	}
 
-	if (dev->netdev_ops->ndo_get_vf_port_profile && dev->dev.parent) {
-		struct ifla_vf_port_profile ivp;
-
-		if (dev_num_vf(dev->dev.parent)) {
-			int i;
-
-			for (i = 0; i < dev_num_vf(dev->dev.parent); i++) {
-				if (dev->netdev_ops->ndo_get_vf_port_profile(
-					dev, i, &ivp))
-					break;
-				NLA_PUT(skb, IFLA_VF_PORT_PROFILE,
-					sizeof(ivp), &ivp);
-			}
-		} else if (!dev->netdev_ops->ndo_get_vf_port_profile(dev,
-			0, &ivp)) {
-			NLA_PUT(skb, IFLA_VF_PORT_PROFILE,
-				sizeof(ivp), &ivp);
-		}
-	}
+	if (rtnl_virtif_fill_ifinfo(skb, dev))
+		goto nla_put_failure;
 
 	if (dev->rtnl_link_ops) {
 		if (rtnl_link_fill(skb, dev) < 0)
@@ -847,8 +878,7 @@ const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 				    .len = sizeof(struct ifla_vf_vlan) },
 	[IFLA_VF_TX_RATE]	= { .type = NLA_BINARY,
 				    .len = sizeof(struct ifla_vf_tx_rate) },
-	[IFLA_VF_PORT_PROFILE]	= { .type = NLA_BINARY,
-				    .len = sizeof(struct ifla_vf_port_profile)},
+	[IFLA_VIRTIF]		= { .type = NLA_NESTED },
 };
 EXPORT_SYMBOL(ifla_policy);
 
@@ -857,6 +887,18 @@ static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
 	[IFLA_INFO_DATA]	= { .type = NLA_NESTED },
 };
 
+static const struct nla_policy ifla_virtif_policy[IFLA_VIRTIF_MAX+1] = {
+	[IFLA_VIRTIF_VF]	   = { .type = NLA_U32 },
+	[IFLA_VIRTIF_PORT_PROFILE] = { .type = NLA_BINARY,
+				       .len = sizeof(struct ifla_virtif_port_profile) },
+	[IFLA_VIRTIF_CHANNEL]	   = { .type = NLA_BINARY,
+				       .len = sizeof(struct ifla_virtif_channel) },
+	[IFLA_VIRTIF_VSI]	   = { .type = NLA_BINARY,
+				       .len = sizeof(struct ifla_virtif_vsi) },
+	[IFLA_VIRTIF_VSI_MAC_VLAN] = { .type = NLA_BINARY,
+				       .len = sizeof(struct ifla_virtif_vsi_mac_vlan) },
+};
+				    
 struct net *rtnl_link_get_net(struct net *src_net, struct nlattr *tb[])
 {
 	struct net *net;
@@ -1053,16 +1095,30 @@ static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm,
 	}
 	err = 0;
 
-	if (tb[IFLA_VF_PORT_PROFILE]) {
-		struct ifla_vf_port_profile *ivp;
-		ivp = nla_data(tb[IFLA_VF_PORT_PROFILE]);
+	if (tb[IFLA_VIRTIF]) {
+		struct ifla_virtif_port_profile *ivp;
+		struct nlattr *virtif[IFLA_VIRTIF_MAX+1];
+		u32 vf;
+
+		err = nla_parse_nested(virtif, IFLA_VIRTIF_MAX,
+				       tb[IFLA_VIRTIF], ifla_virtif_policy);
+		if (err < 0)
+			return err;
+
+		if (!virtif[IFLA_VIRTIF_VF] || !virtif[IFLA_VIRTIF_PORT_PROFILE])
+			goto novirtif; /* IFLA_VIRTIF may be directed at user space */
+
+		vf = nla_get_u32(virtif[IFLA_VIRTIF_VF]);
+		ivp = nla_data(virtif[IFLA_VIRTIF_PORT_PROFILE]);
+
 		err = -EOPNOTSUPP;
 		if (ops->ndo_set_vf_port_profile)
-			err = ops->ndo_set_vf_port_profile(dev, ivp->vf, ivp);
+			err = ops->ndo_set_vf_port_profile(dev, vf, ivp);
 		if (err < 0)
 			goto errout;
 		modified = 1;
 	}
+novirtif:
 	err = 0;
 
 errout:
-- 
1.6.3.3


^ permalink raw reply related

* [PATCH -mm] Documentation: add networking driver's mapping error handling to DMA-API-HOWTO
From: FUJITA Tomonori @ 2010-05-09  4:00 UTC (permalink / raw)
  To: akpm, davem; +Cc: randy.dunlap, netdev

I think that it's a good idea to add what drivers should to do exactly
in the DMA mapping failure case.

Seems some networking drivers call dev_kfree_skb() and return
NETDEV_TX_OK, however, some return NETDEV_TX_BUSY without freeing the
skb. (and many don't even check the failure, which we need to fix).

What networking drivers are supposed to do in the case?

=
From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Subject: [PATCH -mm] Documentation: add networking driver's mapping error handling to DMA-API-HOWTO

Adds the concrete DMA mapping error handling for Networking drivers on
the transmit path.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
 Documentation/DMA-API-HOWTO.txt |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/Documentation/DMA-API-HOWTO.txt b/Documentation/DMA-API-HOWTO.txt
index 52618ab..8c3235c 100644
--- a/Documentation/DMA-API-HOWTO.txt
+++ b/Documentation/DMA-API-HOWTO.txt
@@ -740,6 +740,11 @@ failure can be determined by:
 		 */
 	}
 
+Networking drivers must call dev_kfree_skb to free the socket buffer
+and return NETDEV_TX_OK if the DMA mapping fails on the transmit hook
+(ndo_start_xmit). This means that the socket buffer is just dropped in
+the failure case.
+
 			   Closing
 
 This document, and the API itself, would not be in it's current
-- 
1.6.5


^ permalink raw reply related

* Re: IPv6: race condition in __ipv6_ifa_notify() and dst_free() ?
From: Herbert Xu @ 2010-05-09  6:48 UTC (permalink / raw)
  To: Jiri Bohac; +Cc: David Miller, yoshfuji, netdev, shemminger
In-Reply-To: <20100427155533.GA5709@gondor.apana.org.au>

On Tue, Apr 27, 2010 at 11:55:33PM +0800, Herbert Xu wrote:
> On Tue, Apr 27, 2010 at 05:50:34PM +0200, Jiri Bohac wrote:
> >
> >   I think that what Herbert is doing is only going to enforce that
> >   the ip6_del_rt() is never going to fail, so the dst_free()
> >   won't ever be called anyway, right?
> 
> Yes I think you're right.  But let's fix the bigger problem first,
> and then we can audit this and possibly turn it into a WARN_ON.

Sorry for not fixing this sooner, but I'm back on the case.

Also, the dst_free really is needed.  It's there for the case
where we delete the address before DAD completion.  In that
case the route object is allocated, but not inserted in the
routing table.  So we must manually dst_free it.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH -mm] Documentation: add networking driver's mapping error handling to DMA-API-HOWTO
From: David Miller @ 2010-05-09  8:46 UTC (permalink / raw)
  To: fujita.tomonori; +Cc: akpm, randy.dunlap, netdev
In-Reply-To: <20100509130439D.fujita.tomonori@lab.ntt.co.jp>

From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Date: Sun, 9 May 2010 13:00:49 +0900

> I think that it's a good idea to add what drivers should to do exactly
> in the DMA mapping failure case.
> 
> Seems some networking drivers call dev_kfree_skb() and return
> NETDEV_TX_OK, however, some return NETDEV_TX_BUSY without freeing the
> skb. (and many don't even check the failure, which we need to fix).
> 
> What networking drivers are supposed to do in the case?

The best thing is probably just to drop, pretending the queue
is full and passing the packet back to the stack is just
full of complications and not worth it, so...

> From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> Subject: [PATCH -mm] Documentation: add networking driver's mapping error handling to DMA-API-HOWTO
> 
> Adds the concrete DMA mapping error handling for Networking drivers on
> the transmit path.
> 
> Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* Re: virtio: put last_used and last_avail index into ring itself.
From: Michael S. Tsirkin @ 2010-05-09  8:57 UTC (permalink / raw)
  To: Rusty Russell
  Cc: netdev, virtualization, kvm, linux-kernel, mingo, linux-mm, akpm,
	hpa, gregory.haskins, s.hetze, Daniel Walker, Eric Dumazet
In-Reply-To: <201005071235.40590.rusty@rustcorp.com.au>

On Fri, May 07, 2010 at 12:35:39PM +0930, Rusty Russell wrote:
> On Thu, 6 May 2010 03:57:55 pm Michael S. Tsirkin wrote:
> > On Thu, May 06, 2010 at 10:22:12AM +0930, Rusty Russell wrote:
> > > On Wed, 5 May 2010 03:52:36 am Michael S. Tsirkin wrote:
> > > > What do you think?
> > > 
> > > I think everyone is settled on 128 byte cache lines for the forseeable
> > > future, so it's not really an issue.
> > 
> > You mean with 64 bit descriptors we will be bouncing a cache line
> > between host and guest, anyway?
> 
> I'm confused by this entire thread.
> 
> Descriptors are 16 bytes.  They are at the start, so presumably aligned to
> cache boundaries.
> 
> Available ring follows that at 2 bytes per entry, so it's also packed nicely
> into cachelines.
> 
> Then there's padding to page boundary.  That puts us on a cacheline again
> for the used ring; also 2 bytes per entry.
> 

Hmm, is used ring really 2 bytes per entry?


/* u32 is used here for ids for padding reasons. */
struct vring_used_elem {
        /* Index of start of used descriptor chain. */
        __u32 id;
        /* Total length of the descriptor chain which was used (written to) */
        __u32 len;
};

struct vring_used {
        __u16 flags;
        __u16 idx;
        struct vring_used_elem ring[];
};

> I don't see how any change in layout could be more cache friendly?
> Rusty.

I thought that used ring has 8 bytes per entry, and that struct
vring_used is aligned at page boundary, this
would mean that ring element is at offset 4 bytes from page boundary.
Thus with cacheline size 128 bytes, each 4th element crosses
a cacheline boundary. If we had a 4 byte padding after idx, each
used element would always be completely within a single cacheline.

What am I missing?
-- 
MST

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [RFC][PATCH v4 00/18] Provide a zero-copy method on KVM virtio-net.
From: Michael S. Tsirkin @ 2010-05-09  9:26 UTC (permalink / raw)
  To: Xin, Xiaohui
  Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mingo@elte.hu, davem@davemloft.net,
	jdike@linux.intel.com
In-Reply-To: <F2E9EB7348B8264F86B6AB8151CE2D790AB4A3F137@shsmsx502.ccr.corp.intel.com>

On Sat, May 08, 2010 at 03:55:48PM +0800, Xin, Xiaohui wrote:
> Michael,
> Sorry, somehow I missed this mail. :-(
> 
> >> Here, we have ever considered 2 ways to utilize the page constructor
> >> API to dispense the user buffers.
> >> 
> >> One:	Modify __alloc_skb() function a bit, it can only allocate a 
> >> 	structure of sk_buff, and the data pointer is pointing to a 
> >> 	user buffer which is coming from a page constructor API.
> >> 	Then the shinfo of the skb is also from guest.
> >> 	When packet is received from hardware, the skb->data is filled
> >> 	directly by h/w. What we have done is in this way.
> >> 
> >> 	Pros:	We can avoid any copy here.
> >> 	Cons:	Guest virtio-net driver needs to allocate skb as almost
> >> 		the same method with the host NIC drivers, say the size
> >> 		of netdev_alloc_skb() and the same reserved space in the
> >> 		head of skb. Many NIC drivers are the same with guest and
> >> 		ok for this. But some lastest NIC drivers reserves special
> >> 		room in skb head. To deal with it, we suggest to provide
> >> 		a method in guest virtio-net driver to ask for parameter
> >> 		we interest from the NIC driver when we know which device 
> >> 		we have bind to do zero-copy. Then we ask guest to do so.
> >> 		Is that reasonable?
> 
> >Do you still do this?
> 
> Currently, we still use the first way. But we now ignore the room which 
> host skb_reserve() required when device is doing zero-copy. Now we don't 
> taint guest virtio-net driver with a new method by this way.
> 
> >> Two:	Modify driver to get user buffer allocated from a page constructor
> >> 	API(to substitute alloc_page()), the user buffer are used as payload
> >> 	buffers and filled by h/w directly when packet is received. Driver
> >> 	should associate the pages with skb (skb_shinfo(skb)->frags). For 
> >> 	the head buffer side, let host allocates skb, and h/w fills it. 
> >> 	After that, the data filled in host skb header will be copied into
> >> 	guest header buffer which is submitted together with the payload buffer.
> >> 
> >> 	Pros:	We could less care the way how guest or host allocates their
> >> 		buffers.
> >> 	Cons:	We still need a bit copy here for the skb header.
> >> 
> >> We are not sure which way is the better here. This is the first thing we want
> >> to get comments from the community. We wish the modification to the network
> >> part will be generic which not used by vhost-net backend only, but a user
> >> application may use it as well when the zero-copy device may provides async
> >> read/write operations later.
> 
> >I commented on this in the past. Do you still want comments?
> 
> Now we continue with the first way and try to push it. But any comments about the two methods are still welcome.
> 
> >That's nice. The thing to do is probably to enable GSO/TSO
> >and see what we get this way. Also, mergeable buffer support
> >was recently posted and I hope to merge it for 2.6.35.
> >You might want to take a look.
> 
> I'm looking at the mergeable buffer. I think GSO/GRO support with zero-copy also needs it.
> Currently, GSO/TSO is still not supported by vhost-net?

GSO/TSO are currently supported with tap and macvtap,
AF_PACKET socket backend still needs some work to
enable GSO.

> -- 
> MST

^ permalink raw reply

* [PATCH 03/11] netdev: bfin_mac: invalid data cache only once for each new rx skb buffer
From: Mike Frysinger @ 2010-05-09 10:18 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: uclinux-dist-devel, Sonic Zhang
In-Reply-To: <1273400337-26501-1-git-send-email-vapier@gentoo.org>

From: Sonic Zhang <sonic.zhang@analog.com>

The skb buffer isn't actually used until we finish transferring and pass
it up to higher layers, so only invalidate the range once before we start
receiving actual data.  This also avoids the problem with data invalidating
on Blackfin systems -- there is no invalidate-only, just invalidate+flush.
So when running in writeback mode, there is the small (but not uncommon)
possibility of the flush overwriting valid DMA-ed data from the cache.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 drivers/net/bfin_mac.c |   13 ++++++++-----
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index c888465..f9ba598 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -203,6 +203,11 @@ static int desc_list_init(void)
 			goto init_error;
 		}
 		skb_reserve(new_skb, NET_IP_ALIGN);
+		/* Invidate the data cache of skb->data range when it is write back
+		 * cache. It will prevent overwritting the new data from DMA
+		 */
+		blackfin_dcache_invalidate_range((unsigned long)new_skb->head,
+					 (unsigned long)new_skb->end);
 		r->skb = new_skb;

 		/*
@@ -1012,19 +1017,17 @@ static void bfin_mac_rx(struct net_device *dev)
 	}
 	/* reserve 2 bytes for RXDWA padding */
 	skb_reserve(new_skb, NET_IP_ALIGN);
-	current_rx_ptr->skb = new_skb;
-	current_rx_ptr->desc_a.start_addr = (unsigned long)new_skb->data - 2;
-
 	/* Invidate the data cache of skb->data range when it is write back
 	 * cache. It will prevent overwritting the new data from DMA
 	 */
 	blackfin_dcache_invalidate_range((unsigned long)new_skb->head,
 					 (unsigned long)new_skb->end);

+	current_rx_ptr->skb = new_skb;
+	current_rx_ptr->desc_a.start_addr = (unsigned long)new_skb->data - 2;
+
 	len = (unsigned short)((current_rx_ptr->status.status_word) & RX_FRLEN);
 	skb_put(skb, len);
-	blackfin_dcache_invalidate_range((unsigned long)skb->head,
-					 (unsigned long)skb->tail);

 	skb->protocol = eth_type_trans(skb, dev);

-- 
1.7.1

^ permalink raw reply related

* [PATCH 02/11] netdev: bfin_mac: handler RX status errors
From: Mike Frysinger @ 2010-05-09 10:18 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: uclinux-dist-devel, Peter Meerwald, Graf Yang
In-Reply-To: <1273400337-26501-1-git-send-email-vapier@gentoo.org>

From: Peter Meerwald <pmeerw@pmeerw.net>

Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
Signed-off-by: Graf Yang <graf.yang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 drivers/net/bfin_mac.c |   17 ++++++++++++++---
 1 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index 6a9519f..c888465 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -981,12 +981,25 @@ out:
 	return NETDEV_TX_OK;
 }
 
+#define RX_ERROR_MASK (RX_LONG | RX_ALIGN | RX_CRC | RX_LEN | \
+	RX_FRAG | RX_ADDR | RX_DMAO | RX_PHY | RX_LATE | RX_RANGE)
+
 static void bfin_mac_rx(struct net_device *dev)
 {
 	struct sk_buff *skb, *new_skb;
 	unsigned short len;
 	struct bfin_mac_local *lp __maybe_unused = netdev_priv(dev);
 
+	/* check if frame status word reports an error condition
+	 * we which case we simply drop the packet
+	 */
+	if (current_rx_ptr->status.status_word & RX_ERROR_MASK) {
+		printk(KERN_NOTICE DRV_NAME
+		       ": rx: receive error - packet dropped\n");
+		dev->stats.rx_dropped++;
+		goto out;
+	}
+
 	/* allocate a new skb for next time receive */
 	skb = current_rx_ptr->skb;
 
@@ -1025,11 +1038,9 @@ static void bfin_mac_rx(struct net_device *dev)
 	netif_rx(skb);
 	dev->stats.rx_packets++;
 	dev->stats.rx_bytes += len;
+out:
 	current_rx_ptr->status.status_word = 0x00000000;
 	current_rx_ptr = current_rx_ptr->next;
-
-out:
-	return;
 }
 
 /* interrupt routine to handle rx and error signal */
-- 
1.7.1


^ permalink raw reply related

* [PATCH 05/11] netdev: bfin_mac: clear RXCKS if hardware generated checksum is not enabled
From: Mike Frysinger @ 2010-05-09 10:18 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: uclinux-dist-devel, Sonic Zhang
In-Reply-To: <1273400337-26501-1-git-send-email-vapier@gentoo.org>

From: Sonic Zhang <sonic.zhang@analog.com>

Otherwise we might be get a setting mismatch from a previous module or
bootloader and what the driver currently expects.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 drivers/net/bfin_mac.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index 5b00fc8..6d69bbb 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -515,10 +515,11 @@ void setup_system_regs(struct net_device *dev)
 	 * Configure checksum support and rcve frame word alignment
 	 */
 	sysctl = bfin_read_EMAC_SYSCTL();
+	sysctl |= RXDWA;
 #if defined(BFIN_MAC_CSUM_OFFLOAD)
-	sysctl |= RXDWA | RXCKS;
+	sysctl |= RXCKS;
 #else
-	sysctl |= RXDWA;
+	sysctl &= ~RXCKS;
 #endif
 	bfin_write_EMAC_SYSCTL(sysctl);

-- 
1.7.1

^ permalink raw reply related

* [PATCH 06/11] netdev: bfin_mac: avoid tx skb overflows in the tx DMA ring
From: Mike Frysinger @ 2010-05-09 10:18 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: uclinux-dist-devel, Sonic Zhang
In-Reply-To: <1273400337-26501-1-git-send-email-vapier@gentoo.org>

From: Sonic Zhang <sonic.zhang@analog.com>

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 drivers/net/bfin_mac.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index 6d69bbb..0b5ea01 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -920,6 +920,9 @@ static int bfin_mac_hard_start_xmit(struct sk_buff *skb,
 	u32 data_align = (unsigned long)(skb->data) & 0x3;
 	union skb_shared_tx *shtx = skb_tx(skb);
 
+	if (current_tx_ptr->next == tx_list_head)
+		return NETDEV_TX_BUSY;
+
 	current_tx_ptr->skb = skb;
 
 	if (data_align == 0x2) {
-- 
1.7.1


^ permalink raw reply related

* [PATCH 07/11] netdev: bfin_mac: add support for wake-on-lan magic packets
From: Mike Frysinger @ 2010-05-09 10:18 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: uclinux-dist-devel, Michael Hennerich
In-Reply-To: <1273400337-26501-1-git-send-email-vapier@gentoo.org>

From: Michael Hennerich <michael.hennerich@analog.com>

Note that WOL works only in PM Suspend Standby Mode (Sleep Mode).

Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 drivers/net/bfin_mac.c |   76 +++++++++++++++++++++++++++++++++++++++++++++--
 drivers/net/bfin_mac.h |    3 ++
 2 files changed, 75 insertions(+), 4 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index 0b5ea01..e80ccaa 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -464,6 +464,14 @@ static int mii_probe(struct net_device *dev)
  * Ethtool support
  */
 
+/*
+ * interrupt routine for magic packet wakeup
+ */
+static irqreturn_t bfin_mac_wake_interrupt(int irq, void *dev_id)
+{
+	return IRQ_HANDLED;
+}
+
 static int
 bfin_mac_ethtool_getsettings(struct net_device *dev, struct ethtool_cmd *cmd)
 {
@@ -498,11 +506,57 @@ static void bfin_mac_ethtool_getdrvinfo(struct net_device *dev,
 	strcpy(info->bus_info, dev_name(&dev->dev));
 }
 
+static void bfin_mac_ethtool_getwol(struct net_device *dev,
+	struct ethtool_wolinfo *wolinfo)
+{
+	struct bfin_mac_local *lp = netdev_priv(dev);
+
+	wolinfo->supported = WAKE_MAGIC;
+	wolinfo->wolopts = lp->wol;
+}
+
+static int bfin_mac_ethtool_setwol(struct net_device *dev,
+	struct ethtool_wolinfo *wolinfo)
+{
+	struct bfin_mac_local *lp = netdev_priv(dev);
+	int rc;
+
+	if (wolinfo->wolopts & (WAKE_MAGICSECURE |
+				WAKE_UCAST |
+				WAKE_MCAST |
+				WAKE_BCAST |
+				WAKE_ARP))
+		return -EOPNOTSUPP;
+
+	lp->wol = wolinfo->wolopts;
+
+	if (lp->wol && !lp->irq_wake_requested) {
+		/* register wake irq handler */
+		rc = request_irq(IRQ_MAC_WAKEDET, bfin_mac_wake_interrupt,
+				 IRQF_DISABLED, "EMAC_WAKE", dev);
+		if (rc)
+			return rc;
+		lp->irq_wake_requested = true;
+	}
+
+	if (!lp->wol && lp->irq_wake_requested) {
+		free_irq(IRQ_MAC_WAKEDET, dev);
+		lp->irq_wake_requested = false;
+	}
+
+	/* Make sure the PHY driver doesn't suspend */
+	device_init_wakeup(&dev->dev, lp->wol);
+
+	return 0;
+}
+
 static const struct ethtool_ops bfin_mac_ethtool_ops = {
 	.get_settings = bfin_mac_ethtool_getsettings,
 	.set_settings = bfin_mac_ethtool_setsettings,
 	.get_link = ethtool_op_get_link,
 	.get_drvinfo = bfin_mac_ethtool_getdrvinfo,
+	.get_wol = bfin_mac_ethtool_getwol,
+	.set_wol = bfin_mac_ethtool_setwol,
 };
 
 /**************************************************************************/
@@ -1472,9 +1526,16 @@ static int __devexit bfin_mac_remove(struct platform_device *pdev)
 static int bfin_mac_suspend(struct platform_device *pdev, pm_message_t mesg)
 {
 	struct net_device *net_dev = platform_get_drvdata(pdev);
+	struct bfin_mac_local *lp = netdev_priv(net_dev);
 
-	if (netif_running(net_dev))
-		bfin_mac_close(net_dev);
+	if (lp->wol) {
+		bfin_write_EMAC_OPMODE((bfin_read_EMAC_OPMODE() & ~TE) | RE);
+		bfin_write_EMAC_WKUP_CTL(MPKE);
+		enable_irq_wake(IRQ_MAC_WAKEDET);
+	} else {
+		if (netif_running(net_dev))
+			bfin_mac_close(net_dev);
+	}
 
 	return 0;
 }
@@ -1482,9 +1543,16 @@ static int bfin_mac_suspend(struct platform_device *pdev, pm_message_t mesg)
 static int bfin_mac_resume(struct platform_device *pdev)
 {
 	struct net_device *net_dev = platform_get_drvdata(pdev);
+	struct bfin_mac_local *lp = netdev_priv(net_dev);
 
-	if (netif_running(net_dev))
-		bfin_mac_open(net_dev);
+	if (lp->wol) {
+		bfin_write_EMAC_OPMODE(bfin_read_EMAC_OPMODE() | TE);
+		bfin_write_EMAC_WKUP_CTL(0);
+		disable_irq_wake(IRQ_MAC_WAKEDET);
+	} else {
+		if (netif_running(net_dev))
+			bfin_mac_open(net_dev);
+	}
 
 	return 0;
 }
diff --git a/drivers/net/bfin_mac.h b/drivers/net/bfin_mac.h
index 87c454f..1ae7b82 100644
--- a/drivers/net/bfin_mac.h
+++ b/drivers/net/bfin_mac.h
@@ -66,6 +66,9 @@ struct bfin_mac_local {
 	unsigned char Mac[6];	/* MAC address of the board */
 	spinlock_t lock;
 
+	int wol;		/* Wake On Lan */
+	int irq_wake_requested;
+
 	/* MII and PHY stuffs */
 	int old_link;          /* used by bf537_adjust_link */
 	int old_speed;
-- 
1.7.1


^ permalink raw reply related

* [PATCH 08/11] netdev: bfin_mac: use promiscuous flag for promiscuous mode
From: Mike Frysinger @ 2010-05-09 10:18 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: uclinux-dist-devel, Sonic Zhang
In-Reply-To: <1273400337-26501-1-git-send-email-vapier@gentoo.org>

From: Sonic Zhang <sonic.zhang@analog.com>

Rather than using the Receive All Frames (RAF) bit to enable promiscuous
mode, use the Promiscuous (PR) bit.  This lowers overhead at runtime as
we let the hardware process the packets that should actually be checked.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 drivers/net/bfin_mac.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index e80ccaa..04cda8f 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -1274,7 +1274,7 @@ static void bfin_mac_set_multicast_list(struct net_device *dev)
 	if (dev->flags & IFF_PROMISC) {
 		printk(KERN_INFO "%s: set to promisc mode\n", dev->name);
 		sysctl = bfin_read_EMAC_OPMODE();
-		sysctl |= RAF;
+		sysctl |= PR;
 		bfin_write_EMAC_OPMODE(sysctl);
 	} else if (dev->flags & IFF_ALLMULTI) {
 		/* accept all multicast */
-- 
1.7.1

^ permalink raw reply related

* [PATCH 09/11] netdev: bfin_mac: only use hardware checksum in normal IPv4 mode
From: Mike Frysinger @ 2010-05-09 10:18 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: uclinux-dist-devel, Jon Kowal, Sonic Zhang
In-Reply-To: <1273400337-26501-1-git-send-email-vapier@gentoo.org>

From: Jon Kowal <jon.kowal@dspecialists.de>

The Blackfin on-chip MAC checksum logic only works when the IP packet has
a header length of 20 bytes.  This is true for most IPv4 packets, but not
for IPv6 packets or IPv4 packets which use header options.  So only use
the hardware checksum when appropriate.

Signed-off-by: Jon Kowal <jon.kowal@dspecialists.de>
Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 drivers/net/bfin_mac.c |   42 +++++++++++++++++++++++++-----------------
 1 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index 04cda8f..00d0f25 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -1044,6 +1044,7 @@ out:
 	return NETDEV_TX_OK;
 }
 
+#define IP_HEADER_OFF  0
 #define RX_ERROR_MASK (RX_LONG | RX_ALIGN | RX_CRC | RX_LEN | \
 	RX_FRAG | RX_ADDR | RX_DMAO | RX_PHY | RX_LATE | RX_RANGE)
 
@@ -1098,25 +1099,32 @@ static void bfin_mac_rx(struct net_device *dev)
 	bfin_rx_hwtstamp(dev, skb);
 
 #if defined(BFIN_MAC_CSUM_OFFLOAD)
-	skb->csum = current_rx_ptr->status.ip_payload_csum;
-	/*
-	 * Deduce Ethernet FCS from hardware generated IP payload checksum.
-	 * IP checksum is based on 16-bit one's complement algorithm.
-	 * To deduce a value from checksum is equal to add its inversion.
-	 * If the IP payload len is odd, the inversed FCS should also
-	 * begin from odd address and leave first byte zero.
+	/* Checksum offloading only works for IPv4 packets with the standard IP header
+	 * length of 20 bytes, because the blackfin MAC checksum calculation is
+	 * based on that assumption. We must NOT use the calculated checksum if our
+	 * IP version or header break that assumption.
 	 */
-	if (skb->len % 2) {
-		fcs[0] = 0;
-		for (i = 0; i < ETH_FCS_LEN; i++)
-			fcs[i + 1] = ~skb->data[skb->len + i];
-		skb->csum = csum_partial(fcs, ETH_FCS_LEN + 1, skb->csum);
-	} else {
-		for (i = 0; i < ETH_FCS_LEN; i++)
-			fcs[i] = ~skb->data[skb->len + i];
-		skb->csum = csum_partial(fcs, ETH_FCS_LEN, skb->csum);
+	if (skb->data[IP_HEADER_OFF] == 0x45) {
+		skb->csum = current_rx_ptr->status.ip_payload_csum;
+		/*
+		 * Deduce Ethernet FCS from hardware generated IP payload checksum.
+		 * IP checksum is based on 16-bit one's complement algorithm.
+		 * To deduce a value from checksum is equal to add its inversion.
+		 * If the IP payload len is odd, the inversed FCS should also
+		 * begin from odd address and leave first byte zero.
+		 */
+		if (skb->len % 2) {
+			fcs[0] = 0;
+			for (i = 0; i < ETH_FCS_LEN; i++)
+				fcs[i + 1] = ~skb->data[skb->len + i];
+			skb->csum = csum_partial(fcs, ETH_FCS_LEN + 1, skb->csum);
+		} else {
+			for (i = 0; i < ETH_FCS_LEN; i++)
+				fcs[i] = ~skb->data[skb->len + i];
+			skb->csum = csum_partial(fcs, ETH_FCS_LEN, skb->csum);
+		}
+		skb->ip_summed = CHECKSUM_COMPLETE;
 	}
-	skb->ip_summed = CHECKSUM_COMPLETE;
 #endif
 
 	netif_rx(skb);
-- 
1.7.1


^ permalink raw reply related

* [PATCH 10/11] netdev: bfin_mac: handle timeouts with the MDIO registers gracefully
From: Mike Frysinger @ 2010-05-09 10:18 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: uclinux-dist-devel
In-Reply-To: <1273400337-26501-1-git-send-email-vapier@gentoo.org>

Have the low level MDIO functions pass back up timeout information so we
don't waste time polling them multiple times when there is a problem, and
so we don't let higher layers think the device is available when it isn't.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 drivers/net/bfin_mac.c |   53 ++++++++++++++++++++++++++++++-----------------
 1 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index 00d0f25..a4c9ab2 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -81,9 +81,6 @@ static u16 pin_req[] = P_RMII0;
 static u16 pin_req[] = P_MII0;
 #endif
 
-static void bfin_mac_disable(void);
-static void bfin_mac_enable(void);
-
 static void desc_list_free(void)
 {
 	struct net_dma_desc_rx *r;
@@ -260,7 +257,7 @@ init_error:
  * MII operations
  */
 /* Wait until the previous MDC/MDIO transaction has completed */
-static void bfin_mdio_poll(void)
+static int bfin_mdio_poll(void)
 {
 	int timeout_cnt = MAX_TIMEOUT_CNT;
 
@@ -270,22 +267,30 @@ static void bfin_mdio_poll(void)
 		if (timeout_cnt-- < 0) {
 			printk(KERN_ERR DRV_NAME
 			": wait MDC/MDIO transaction to complete timeout\n");
-			break;
+			return -ETIMEDOUT;
 		}
 	}
+
+	return 0;
 }
 
 /* Read an off-chip register in a PHY through the MDC/MDIO port */
 static int bfin_mdiobus_read(struct mii_bus *bus, int phy_addr, int regnum)
 {
-	bfin_mdio_poll();
+	int ret;
+
+	ret = bfin_mdio_poll();
+	if (ret)
+		return ret;
 
 	/* read mode */
 	bfin_write_EMAC_STAADD(SET_PHYAD((u16) phy_addr) |
 				SET_REGAD((u16) regnum) |
 				STABUSY);
 
-	bfin_mdio_poll();
+	ret = bfin_mdio_poll();
+	if (ret)
+		return ret;
 
 	return (int) bfin_read_EMAC_STADAT();
 }
@@ -294,7 +299,11 @@ static int bfin_mdiobus_read(struct mii_bus *bus, int phy_addr, int regnum)
 static int bfin_mdiobus_write(struct mii_bus *bus, int phy_addr, int regnum,
 			      u16 value)
 {
-	bfin_mdio_poll();
+	int ret;
+
+	ret = bfin_mdio_poll();
+	if (ret)
+		return ret;
 
 	bfin_write_EMAC_STADAT((u32) value);
 
@@ -304,9 +313,7 @@ static int bfin_mdiobus_write(struct mii_bus *bus, int phy_addr, int regnum,
 				STAOP |
 				STABUSY);
 
-	bfin_mdio_poll();
-
-	return 0;
+	return bfin_mdio_poll();
 }
 
 static int bfin_mdiobus_reset(struct mii_bus *bus)
@@ -1184,8 +1191,9 @@ static void bfin_mac_disable(void)
 /*
  * Enable Interrupts, Receive, and Transmit
  */
-static void bfin_mac_enable(void)
+static int bfin_mac_enable(void)
 {
+	int ret;
 	u32 opmode;
 
 	pr_debug("%s: %s\n", DRV_NAME, __func__);
@@ -1195,7 +1203,9 @@ static void bfin_mac_enable(void)
 	bfin_write_DMA1_CONFIG(rx_list_head->desc_a.config);
 
 	/* Wait MII done */
-	bfin_mdio_poll();
+	ret = bfin_mdio_poll();
+	if (ret)
+		return ret;
 
 	/* We enable only RX here */
 	/* ASTP   : Enable Automatic Pad Stripping
@@ -1219,6 +1229,8 @@ static void bfin_mac_enable(void)
 #endif
 	/* Turn on the EMAC rx */
 	bfin_write_EMAC_OPMODE(opmode);
+
+	return 0;
 }
 
 /* Our watchdog timed out. Called by the networking layer */
@@ -1333,7 +1345,7 @@ static void bfin_mac_shutdown(struct net_device *dev)
 static int bfin_mac_open(struct net_device *dev)
 {
 	struct bfin_mac_local *lp = netdev_priv(dev);
-	int retval;
+	int ret;
 	pr_debug("%s: %s\n", dev->name, __func__);
 
 	/*
@@ -1347,18 +1359,21 @@ static int bfin_mac_open(struct net_device *dev)
 	}
 
 	/* initial rx and tx list */
-	retval = desc_list_init();
-
-	if (retval)
-		return retval;
+	ret = desc_list_init();
+	if (ret)
+		return ret;
 
 	phy_start(lp->phydev);
 	phy_write(lp->phydev, MII_BMCR, BMCR_RESET);
 	setup_system_regs(dev);
 	setup_mac_addr(dev->dev_addr);
+
 	bfin_mac_disable();
-	bfin_mac_enable();
+	ret = bfin_mac_enable();
+	if (ret)
+		return ret;
 	pr_debug("hardware init finished\n");
+
 	netif_start_queue(dev);
 	netif_carrier_on(dev);
 
-- 
1.7.1


^ permalink raw reply related

* [PATCH 11/11] netdev: bfin_mac: check for mii_bus platform data
From: Mike Frysinger @ 2010-05-09 10:18 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: uclinux-dist-devel, Sonic Zhang
In-Reply-To: <1273400337-26501-1-git-send-email-vapier@gentoo.org>

From: Sonic Zhang <sonic.zhang@analog.com>

If the platform data for the mii_bus is missing, gracefully error out
rather than deference NULL pointers.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 drivers/net/bfin_mac.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index a4c9ab2..c65d5e1 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -1472,6 +1472,11 @@ static int __devinit bfin_mac_probe(struct platform_device *pdev)
 	}
 	pd = pdev->dev.platform_data;
 	lp->mii_bus = platform_get_drvdata(pd);
+	if (!lp->mii_bus) {
+		dev_err(&pdev->dev, "Cannot get mii_bus!\n");
+		rc = -ENODEV;
+		goto out_err_mii_bus_probe;
+	}
 	lp->mii_bus->priv = ndev;
 
 	rc = mii_probe(ndev);
@@ -1517,6 +1522,7 @@ out_err_request_irq:
 out_err_mii_probe:
 	mdiobus_unregister(lp->mii_bus);
 	mdiobus_free(lp->mii_bus);
+out_err_mii_bus_probe:
 	peripheral_free_list(pin_req);
 out_err_probe_mac:
 	platform_set_drvdata(pdev, NULL);
-- 
1.7.1


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox