Netdev List

Netdev List
 help / color / mirror / Atom feed

* RE: How to use gretap with bridge?
From: Neulinger, Nathan @ 2009-10-29 19:01 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20091029170631.GA29405@gondor.apana.org.au>

Further testing - if the leading octet of the 'local' address is even,
it allows it to be added to bridge, if it's odd, it won't.

Any ideas?

-- Nathan

------------------------------------------------------------
Nathan Neulinger                       nneul@mst.edu
Missouri S&T Information Technology    (573) 612-1412
System Administrator - Principal       KD0DMH


> -----Original Message-----
> From: Neulinger, Nathan
> Sent: Thursday, October 29, 2009 1:39 PM
> To: 'netdev@vger.kernel.org'
> Subject: RE: How to use gretap with bridge?
> 
> I've been able to reproduce this with a upstream kernel (2.6.32-rc5) -
> symptom appears to be specific to the IP addresses specified on the ip
> command, but not in any clear way. I assume that remote should be the
> ip of the host at the remote end of the tunnel, and local should be an
> IP address of a real interface on the this machine?
> 
> Am I missing something obvious here? At the time of the below
commands,
> br0 exists, but has no members, and eth0 is configured and up with ip
> 131.151.0.36/255.255.254.0. All other interfaces are down.
> 
> [root@bridge-rol ~]# ip link add gre3 type gretap remote 131.151.35.35
> local 131.151.0.36
> [root@bridge-rol ~]# brctl addif br0 gre3
> can't add gre3 to bridge br0: Invalid argument
> 
> [root@bridge-rol ~]# ip link del gre3
> [root@bridge-rol ~]# ip link add gre3 type gretap remote 131.151.35.35
> local 131.151.0.35
> [root@bridge-rol ~]# brctl addif br0 gre3
> can't add gre3 to bridge br0: Invalid argument
> 
> [root@bridge-rol ~]# ip link del gre3
> [root@bridge-rol ~]# ip link add gre3 type gretap remote 131.151.35.35
> local 10.151.0.35
> [root@bridge-rol ~]# brctl addif br0 gre3
> 
> [root@bridge-rol ~]# ip link del gre3
> [root@bridge-rol ~]# ip link add gre3 type gretap remote 131.151.35.35
> local 131.1.1.1
> [root@bridge-rol ~]# brctl addif br0 gre3
> can't add gre3 to bridge br0: Invalid argument
> 
> 
> 
> -- Nathan
> 
> ------------------------------------------------------------
> Nathan Neulinger                       nneul@mst.edu
> Missouri S&T Information Technology    (573) 612-1412
> System Administrator - Principal       KD0DMH
> 
> 
> > -----Original Message-----
> > From: Herbert Xu [mailto:herbert@gondor.apana.org.au]
> > Sent: Thursday, October 29, 2009 12:07 PM
> > To: Neulinger, Nathan
> > Subject: Re: How to use gretap with bridge?
> >
> > On Thu, Oct 29, 2009 at 10:41:18AM -0500, Neulinger, Nathan wrote:
> > > Is there some trick I'm missing to adding a gretap interface to a
> > > bridge?
> > >
> > > ip link add gre1 type gretap remote 131.151.0.36 local
131.151.0.35
> > > ip link set gre1 up
> > > brctl addbr br0
> > > brctl addif br0 gre1
> > >
> > > This results in an Invalid argument error when issuing the addif.
> > > Testing with latest fc12 2.6.31.5-96 kernel.
> > >
> > > Any suggestions?
> >
> > I can't reproduce this here.  Can you please try the latest
> > upstream kernel? If it still does the same thing, please post
> > to netdev@vger.kernel.org.
> >
> > Thanks!
> > --
> > Visit Openswan at http://www.openswan.org/
> > Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> > Home Page: http://gondor.apana.org.au/~herbert/
> > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: pull request: wireless-next-2.6 2009-10-28
From: Pekka Enberg @ 2009-10-29 19:45 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: David Miller, linux-wireless, netdev, linux-kernel, linville
In-Reply-To: <200910291514.40318.bzolnier@gmail.com>

Hi Bart,

On Thu, Oct 29, 2009 at 4:14 PM, Bartlomiej Zolnierkiewicz
<bzolnier@gmail.com> wrote:
>> lots of cleanups to the staging drivers, why not direct some of that
>> energy to the drivers/net/wireless ones?
>
> When did we start to apply "fix it yourself" rule instead of "submitter
> should fix it" one to the _new_ code..

Don't be silly, I didn't say that.

I was simply pointing out that your time would probably be better
spent in improving the "proper" ralink wireless drivers but if you
_really_ prefer to spend your time in pointless arguments, go ahead.
It should be pretty obvious by now that the best way to improve things
is to work with the relevant maintainers, not against them. (Unless
you wish your work to be ignored, of course.)

                        Pekka

^ permalink raw reply

* Re: How to use gretap with bridge?
From: Stephen Hemminger @ 2009-10-29 20:00 UTC (permalink / raw)
  To: Neulinger, Nathan; +Cc: netdev
In-Reply-To: <846C5B546E47494CBBD796CA8CA1617EA3B428@MST-VMAIL1.srv.mst.edu>

On Thu, 29 Oct 2009 14:01:31 -0500
"Neulinger, Nathan" <nneul@mst.edu> wrote:

> Further testing - if the leading octet of the 'local' address is even,
> it allows it to be added to bridge, if it's odd, it won't.
> 
> Any ideas?
> 

If leading octet of MAC address is odd, then bridge thinks it
is not a valid ethernet for bridging because it is a multicast
address.

-- 

^ permalink raw reply

* Re: [PATCH 3/3] net: TCP thin dupack
From: Ilpo Järvinen @ 2009-10-29 20:14 UTC (permalink / raw)
  To: Andreas Petlund; +Cc: Netdev, LKML, shemminger, David Miller
In-Reply-To: <879da81bfa8a9f0f34717c64b08332ed.squirrel@webmail.uio.no>

On Thu, 29 Oct 2009, apetlund@simula.no wrote:

> I apologise that some of you received this mail more than once. My email
> client played a HTML-trick on me.
> 
> >> +	/* If a thin stream is detected, retransmit after first
> >> +	 * received dupack */
> >> +	if ((tp->thin_dupack || sysctl_tcp_force_thin_dupack) &&
> >> +	    tcp_dupack_heurestics(tp) > 1 && tcp_stream_is_thin(tp))
> >> +		return 1;
> >> +
> >>  	return 0;
> >>  }
> >
> > Have you tested it? ...I doubt this will work like you say and
> retransmit
> > something when the window is small. ...Besides, you should have built
> this
> > patch on top of the function rename you submitted earlier as after DaveM
> applied that this will no longer even compile...
> >
> > --
> >  i.
> >
> 
> We have performed extensive tests mapping the effect of the patch you
> commented on some months ago. Since then, the only change was the one you
> requested of switching tcp_fackets_out() with tcp_dupack_heurestics().
> After inspecting the code, I believed the effect should be equal to the
> previous, only making considerations for SACK and FACK availability.
> Please tell if this will break the intended effect, and I will modify the
> patch accordingly.

Ah, you're of course right. FACK retransmits the head always but RFC3517 
mode doesn't. I think you'd need to artificially lower (ie., to calculate)
the dupthresh (from tp->reordering) to be 1 for it to work as intented.

> Graphs from our tests of the original patch can be found at the location
> linked to below.  I have tested the new one for functionality, but have
> not et performed tests on this scope as the changes were minor. I will, of
> course, fix the function rename in the next iteration. Sorry for that.
> 
> http://folk.uio.no/apetlund/lktmp/

You curiousity, have you run this more aggressive form of early retransmit 
against the one ID gives? ...I checked your results but if I understood 
them correctly the IDish early retransmit wasn't among the variants used.

-- 
 i.

^ permalink raw reply

* Re: [PATCH 2/3] [RFC] Add c/r support for connected INET sockets (v3)
From: Oren Laadan @ 2009-10-29 20:15 UTC (permalink / raw)
  To: Dan Smith; +Cc: containers, netdev, John Dykstra
In-Reply-To: <1256666008-8231-3-git-send-email-danms@us.ibm.com>


Dan Smith wrote:
> This patch adds basic support for C/R of open INET sockets.  I think that
> all the important bits of the TCP and ICSK socket structures is saved,
> but I think there is still some additional IPv6 stuff that needs to be
> handled.
> 
> With this patch applied, the following script can be used to demonstrate
> the functionality:
> 
>   https://lists.linux-foundation.org/pipermail/containers/2009-October/021239.html
> 
> It shows that this enables migration of a sendmail process with open
> connections from one machine to another without dropping.
> 
> We probably need comments from the netdev people about the quality of
> sanity checking we do on the values in the ckpt_hdr_socket_inet
> structure on restart.
> 
> Note that this still doesn't address lingering sockets yet.
> 
> Changes in v3:
>  - Prevent restart from allowing a bind on a <1024 port unless the
>    user is granted that capability
>  - Add some sanity checking in the inet_precheck() function to make sure
>    the values read from the checkpoint image are within acceptable ranges
>  - Check the result of sock_restore_header_info() and fail if needed
> 
> Changes in v2:
>  - Restore saddr, rcv_saddr, daddr, sport, and dport from the sockaddr
>    structure instead of saving them separately
>  - Fix 'sock' naming in sock_cptrst()
>  - Don't take the queue lock before skb_queue_tail() since it is
>    done for us
>  - Allow "listen only" restore behavior if RESTART_SOCK_LISTENONLY
>    flag is specified on sys_restart()
>  - Pull the implementation of the list of listening sockets back into
>    this patch
>  - Fix dangling printk
>  - Add some comments around the parent/child restore logic
> 
> Cc: netdev@vger.kernel.org
> Cc: Oren Laadan <orenl@librato.com>
> Cc: John Dykstra <jdykstra72@gmail.com>
> Signed-off-by: Dan Smith <danms@us.ibm.com>
> ---

This looks good:
Acked-by: Oren Laadan <orenl@cs.columbia.edu>

I still want to move this to right under the restart-specific fields:
	struct list_head listen_sockets;/* listening parent sockets */

Also, I'm looking for a better name for RESTART_SOCK_LISTENONLY
(it isn't listen only, udp sockets are preserved...), something
that will convey the idea that we drop old connections,
perhaps:
	RESTART_NET_RESET
	RESTART_CONN_RESET
	RESTART_DROPCONN
?

Oren.


^ permalink raw reply

* RE: How to use gretap with bridge?
From: Neulinger, Nathan @ 2009-10-29 20:22 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20091029130036.1e61f415@nehalam>

I was referring to the local IP in the "ip link add ... remote x.z.z.z
local y.z.z.z" command specifying the endpoints of the tunnel. It lets
it be added to the bridge if y is even, but not if y is odd. Why should
it care what the IP of the tunnel endpoints are?

-- Nathan

------------------------------------------------------------
Nathan Neulinger                       nneul@mst.edu
Missouri S&T Information Technology    (573) 612-1412
System Administrator - Principal       KD0DMH

> -----Original Message-----
> From: Stephen Hemminger [mailto:shemminger@vyatta.com]
> Sent: Thursday, October 29, 2009 3:01 PM
> To: Neulinger, Nathan
> Cc: netdev@vger.kernel.org
> Subject: Re: How to use gretap with bridge?
> 
> On Thu, 29 Oct 2009 14:01:31 -0500
> "Neulinger, Nathan" <nneul@mst.edu> wrote:
> 
> > Further testing - if the leading octet of the 'local' address is
> even,
> > it allows it to be added to bridge, if it's odd, it won't.
> >
> > Any ideas?
> >
> 
> If leading octet of MAC address is odd, then bridge thinks it
> is not a valid ethernet for bridging because it is a multicast
> address.
> 
> 
> --

^ permalink raw reply

* Re: [PATCH 1/3] net: TCP thin-stream detection
From: Ilpo Järvinen @ 2009-10-29 20:26 UTC (permalink / raw)
  To: Arnd Hannemann
  Cc: Andreas Petlund, William Allen Simpson, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, shemminger@vyatta.com,
	davem@davemloft.net, Christian Samsel
In-Reply-To: <4AE9C396.3040705@nets.rwth-aachen.de>

On Thu, 29 Oct 2009, Arnd Hannemann wrote:

> Andreas Petlund schrieb:
> > Den 28. okt. 2009 kl. 04.09 skrev William Allen Simpson:
> > 
> >> Andreas Petlund wrote:
> >>> +/* Determines whether this is a thin stream (which may suffer from
> >>> + * increased latency). Used to trigger latency-reducing mechanisms.
> >>> + */
> >>> +static inline unsigned int tcp_stream_is_thin(const struct  
> >>> tcp_sock *tp)
> >>> +{
> >>> +	return tp->packets_out < 4;
> >>> +}
> >>> +
> >> This bothers me a bit.  Having just looked at your Linux presentation,
> >> and not (yet) read your papers, it seems much of your justification  
> >> was
> >> with 1 packet per RTT.  Here, you seem to be concentrating on 4,  
> >> probably
> >> because many implementations quickly ramp up to 4.
> >>
> > 
> > The limit of 4 packets in flight is based on the fact that less than 4  
> > packets in flight makes fast retransmissions impossible, thus limiting  
> > the retransmit options to timeout-retransmissions. The criterion is  
> 
> There is Limited Transmit! So this is generally not true.
> 
> > therefore as conservative as possible while still serving its purpose.  
> > If further losses occur, the exponential backoff will increase latency  
> > further. The concept of using this limit is also discussed in the  
> > Internet draft for Early Retransmit by Allman et al.:
> > http://www.icir.org/mallman/papers/draft-ietf-tcpm-early-rexmt-01.txt
> 
> This ID is covering exactly the cases which Limited Transmit does not
> cover and works "automagically" without help of application. So why not
> just implement this ID?

I even gave some advise recently to one guy how to polish up the early 
retransmit implementation of his. ...However, I think we haven't heard 
from that since then... I added him as CC if he happens to have it already 
done.

It is actually so that patches 1+3 implement sort of an early retransmit, 
just slightly more aggressive of it than what is given in ID but I find 
the difference in the aggressiveness rather insignificant. ...Whereas the 
RTO stuff is more questionable.


-- 
 i.

^ permalink raw reply

* Re: [PATCH 2/3] net: TCP thin linear timeouts
From: Ilpo Järvinen @ 2009-10-29 20:52 UTC (permalink / raw)
  To: Andreas Petlund
  Cc: Eric Dumazet, Arnd Hannemann, Netdev, LKML, shemminger,
	David Miller
In-Reply-To: <69812160e5682c9fb4acba05bc082664.squirrel@webmail.uio.no>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4304 bytes --]

On Thu, 29 Oct 2009, apetlund@simula.no wrote:

> > Andreas Petlund a écrit :
> >
> >> The removal of exponential backoff on a general basis has been
> >> investigated and discussed already, for instance here:
> >> http://ccr.sigcomm.org/online/?q=node/416
> >> Such steps are, however considered drastic, and I agree that caution
> must be made to thoroughly investigate the effects of such changes. The
> changes introduced by the proposed patches, however, are not
> default
> >> behaviour, but an option for applications that suffer from the
> >> thin-stream TCP increased retransmission latencies. They will, as such,
> not affect all streams. In addition, the changes will only be active
> for
> >> streams which are perpetually thin or in the early phase of expanding
> their cwnd. Also, experiments performed on congested bottlenecks with
> tail-drop queues show very little (if any at all) effect on goodput for
> the modified scenario compared to a scenario with unmodified TCP
> streams.
> >> Graphs both for latency-results and fairness tests can be found here:
> http://folk.uio.no/apetlund/lktmp/
> >
> > There should be a limit to linear timeouts, to say ... no more than 6
> retransmits
> > (eventually tunable), then switch to exponential backoff. Maybe your
> patch
> > already implement such heuristic ?
> 
> The limitation you suggest to the linear timeouts makes very good sense.
> Our experiments performed on the Internet indicate that it is extremely
> rare that more than 6 retransmissions are needed to recover. It is not
> included in the current patch, so I will include this in the next
> iteration.

I've heard that BSD would use linear for first three and then exponential 
but this is based on some gossip (which could well turn out to be a myth) 
rather than checking it out myself. But if it is true, it certainly hasn't 
been that devastating.

> > True link collapses do happen, it would be good if not all streams
> wakeup
> > in the same
> > second and make recovery very slow.
> >
> 
> Each stream will have its own schedule for wakeup, so such events will
> still be subject to coincidence. The timer granularity of the TCP wakeup
> timer will also influence how many streams will wake at the same time. The
> experiments we have performed on severely congested bottlenecks (link
> above) indicate that the modifications will not create a large negative
> effect. In fact, when goodput is drastically reduced due to severe
> overload, regular TCP and the LT and dupACK modifications seem to perform
> nearly identically. Other scenarios may exist where different effects can
> be observed, and I am open to suggestions for further testing.

Could you point out where exactly where the goodput results? ...I only 
seem to find latency results which is not exactly the same. I don't except
some that is in order of what Nagle talks (32kbps -> 40bps irc) but 10-50% 
goodput reduction over a relatively short period of time (until RTTs top 
RTOs once again preventing spurious RTOs and thus also segment duplication 
due to retransmissions ceases).

Were these results obtained with Linux, and if so what was FRTO set to?

> > Thats too easy to accept possibly dangerous features with the excuse of
> saying
> > "It wont be used very much", because you cannot predict the future.
> 
> I agree that it is no argument to say that it won't be used much; indeed,
> my hope is that it will be used much. However, our experiments indicate no
> negative effects while showing a large improvement on retransmission
> latency for the scenario in question. I therefore think that the option
> for such an improvement should be made available for time-dependent
> thin-stream applications.

Everyone can right away tell that most RTOs are not due to extreme 
congestion, so some linear back off seems sensible when dupACK feedback 
is lacking for some reason. Of course it is a tradeoff as there's that 
chance for getting 1/(n+1) goodput only (where n is the number of linear 
steps) step if RTOs were spurious (and without FRTO even more unnecessary 
retransmission will be triggered so in fact even could be slightly worse 
in theory). But that to happen in the first place requires of course this 
RTT > RTO situation which is hard to see to be a persisting state.


-- 
 i.

^ permalink raw reply

* RE: How to use gretap with bridge?
From: Neulinger, Nathan @ 2009-10-29 20:59 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev
In-Reply-To: <20091029130036.1e61f415@nehalam>

As a note - the bridging/tunneling is working perfectly once I force it
to use a bogus IP range that starts with an even number, but
unfortunately, that's not going to work so good given that our primary
address space is 131.151.x.x. 

Any ideas on what is up with the even/odd error?

-- Nathan

------------------------------------------------------------
Nathan Neulinger                       nneul@mst.edu
Missouri S&T Information Technology    (573) 612-1412
System Administrator - Principal       KD0DMH


> -----Original Message-----
> From: Neulinger, Nathan
> Sent: Thursday, October 29, 2009 3:22 PM
> To: 'Stephen Hemminger'
> Cc: netdev@vger.kernel.org
> Subject: RE: How to use gretap with bridge?
> 
> I was referring to the local IP in the "ip link add ... remote x.z.z.z
> local y.z.z.z" command specifying the endpoints of the tunnel. It lets
> it be added to the bridge if y is even, but not if y is odd. Why
should
> it care what the IP of the tunnel endpoints are?
> 
> -- Nathan
> 
> ------------------------------------------------------------
> Nathan Neulinger                       nneul@mst.edu
> Missouri S&T Information Technology    (573) 612-1412
> System Administrator - Principal       KD0DMH
> 
> 
> > -----Original Message-----
> > From: Stephen Hemminger [mailto:shemminger@vyatta.com]
> > Sent: Thursday, October 29, 2009 3:01 PM
> > To: Neulinger, Nathan
> > Cc: netdev@vger.kernel.org
> > Subject: Re: How to use gretap with bridge?
> >
> > On Thu, 29 Oct 2009 14:01:31 -0500
> > "Neulinger, Nathan" <nneul@mst.edu> wrote:
> >
> > > Further testing - if the leading octet of the 'local' address is
> > even,
> > > it allows it to be added to bridge, if it's odd, it won't.
> > >
> > > Any ideas?
> > >
> >
> > If leading octet of MAC address is odd, then bridge thinks it
> > is not a valid ethernet for bridging because it is a multicast
> > address.
> >
> >
> > --

^ permalink raw reply

* Re: [RFC-PATCH] dhcp provisioning support in cxgb3i
From: Mike Christie @ 2009-10-29 21:09 UTC (permalink / raw)
  To: Rakesh Ranjan
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, James Bottomley, Karen Xie,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org, LKML,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <4AE995C4.4080909-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>


Rakesh Ranjan wrote:
> Hi Mike,
> 
> Herein attached patches for having dhcp provisioning support in cxgb3i. 
> I have added one new iscsi netlink message ISCSI_UEVENT_REQ_IPCONF. 

Is the idea to have iscsid/uip send down this msg?

Was it not possible to hook in more like how bnx2i does dhcp?

^ permalink raw reply

* Re: [RFC] multiqueue changes
From: Jarek Poplawski @ 2009-10-29 21:15 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Eric Dumazet, David S. Miller, Linux Netdev List
In-Reply-To: <4AE9C4C3.9040503@trash.net>

On Thu, Oct 29, 2009 at 05:37:23PM +0100, Patrick McHardy wrote:
...
> Well, we do need both values for supporting changes to the actually
> used numbers of TX queues. If I understood Dave's explanation correctly,
> this is also what's intended. It also doesn't seem unreasonable
> what bnx2 is doing.

Exactly. With a growing number of cores, both available and powered
off, these values will be soon treated more carefully than now.

> But getting back to the problem Eric reported - so you're suggesting
> that bnx2.c should also adjust num_tx_queues in case the hardware
> doesn't support multiqueue? That seems reasonable as well.

Currently, declaring num_tx_queues with alloc_netdev_mq() looks like
too soon for some drivers. It seems they should be able to do it
separately later during the .probe. There is a question if .ndo_open
should be considered too.

Jarek P.

^ permalink raw reply

* Re: pull request: wireless-next-2.6 2009-10-28
From: Bartlomiej Zolnierkiewicz @ 2009-10-29 21:48 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: David Miller, linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linville-2XuSBdqkA4R54TAoqtyWWQ
In-Reply-To: <84144f020910291245l1a7a3fd6o8822ecd4ce3b5504-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Thursday 29 October 2009 20:45:07 Pekka Enberg wrote:
> Hi Bart,
> 
> On Thu, Oct 29, 2009 at 4:14 PM, Bartlomiej Zolnierkiewicz
> <bzolnier-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> >> lots of cleanups to the staging drivers, why not direct some of that
> >> energy to the drivers/net/wireless ones?
> >
> > When did we start to apply "fix it yourself" rule instead of "submitter
> > should fix it" one to the _new_ code..
> 
> Don't be silly, I didn't say that.

Sorry, I must have misunderstood you.

> I was simply pointing out that your time would probably be better
> spent in improving the "proper" ralink wireless drivers but if you

Thanks for the concern.  However recent discussions made my realize how
I should really be spending my time effectively way too well.

> _really_ prefer to spend your time in pointless arguments, go ahead.

I don't think that my technical arguments are pointless.

Quite the contrary, I'm pretty confident that addressing my review concerns
would result in better RT28x00 / RT30x0 support in the very near future.

> It should be pretty obvious by now that the best way to improve things
> is to work with the relevant maintainers, not against them. (Unless
> you wish your work to be ignored, of course.)

I work with a lot of other maintainers.  I would say that providing valuable
review feedback is also "working with" (at least I would be very happy with
such feedback in my projects).

--
Bartlomiej Zolnierkiewicz
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: How to use gretap with bridge?
From: Neulinger, Nathan @ 2009-10-29 22:04 UTC (permalink / raw)
  To: Herbert Xu, Stephen Hemminger; +Cc: netdev
In-Reply-To: <20091029130036.1e61f415@nehalam>

Now I see it - Stephen actually had it right on - the problem is that
the gre tunnel is creating a MAC address on the fly based on the tunnel
endpoint ip address, so if the tunnel endpoint address starts with an
odd number, it hits the multicast check in the bridging code. (I'm sure
that's what he meant and I just missed it entirely.)

Simplest option would probably be to just mask off the first octet with
0xFD or using the ip as the last four octets of the mac instead of the
first four.

-- Nathan

------------------------------------------------------------
Nathan Neulinger                       nneul@mst.edu
Missouri S&T Information Technology    (573) 612-1412
System Administrator - Principal       KD0DMH


> -----Original Message-----
> From: Neulinger, Nathan
> Sent: Thursday, October 29, 2009 3:59 PM
> To: 'Herbert Xu'
> Cc: 'netdev@vger.kernel.org'
> Subject: RE: How to use gretap with bridge?
> 
> As a note - the bridging/tunneling is working perfectly once I force
it
> to use a bogus IP range that starts with an even number, but
> unfortunately, that's not going to work so good given that our primary
> address space is 131.151.x.x.
> 
> Any ideas on what is up with the even/odd error?
> 
> -- Nathan
> 
> ------------------------------------------------------------
> Nathan Neulinger                       nneul@mst.edu
> Missouri S&T Information Technology    (573) 612-1412
> System Administrator - Principal       KD0DMH
> 
> 
> > -----Original Message-----
> > From: Neulinger, Nathan
> > Sent: Thursday, October 29, 2009 3:22 PM
> > To: 'Stephen Hemminger'
> > Cc: netdev@vger.kernel.org
> > Subject: RE: How to use gretap with bridge?
> >
> > I was referring to the local IP in the "ip link add ... remote
> x.z.z.z
> > local y.z.z.z" command specifying the endpoints of the tunnel. It
> lets
> > it be added to the bridge if y is even, but not if y is odd. Why
> should
> > it care what the IP of the tunnel endpoints are?
> >
> > -- Nathan
> >
> > ------------------------------------------------------------
> > Nathan Neulinger                       nneul@mst.edu
> > Missouri S&T Information Technology    (573) 612-1412
> > System Administrator - Principal       KD0DMH
> >
> >
> > > -----Original Message-----
> > > From: Stephen Hemminger [mailto:shemminger@vyatta.com]
> > > Sent: Thursday, October 29, 2009 3:01 PM
> > > To: Neulinger, Nathan
> > > Cc: netdev@vger.kernel.org
> > > Subject: Re: How to use gretap with bridge?
> > >
> > > On Thu, 29 Oct 2009 14:01:31 -0500
> > > "Neulinger, Nathan" <nneul@mst.edu> wrote:
> > >
> > > > Further testing - if the leading octet of the 'local' address is
> > > even,
> > > > it allows it to be added to bridge, if it's odd, it won't.
> > > >
> > > > Any ideas?
> > > >
> > >
> > > If leading octet of MAC address is odd, then bridge thinks it
> > > is not a valid ethernet for bridging because it is a multicast
> > > address.
> > >
> > >
> > > --

^ permalink raw reply

* Re: How to use gretap with bridge?
From: Stephen Hemminger @ 2009-10-29 22:12 UTC (permalink / raw)
  To: Neulinger, Nathan; +Cc: netdev
In-Reply-To: <846C5B546E47494CBBD796CA8CA1617EA3B431@MST-VMAIL1.srv.mst.edu>

On Thu, 29 Oct 2009 15:22:19 -0500
"Neulinger, Nathan" <nneul@mst.edu> wrote:

> I was referring to the local IP in the "ip link add ... remote x.z.z.z
> local y.z.z.z" command specifying the endpoints of the tunnel. It lets
> it be added to the bridge if y is even, but not if y is odd. Why should
> it care what the IP of the tunnel endpoints are?
> 
> -- Nathan
> 

It looks like a GRE driver bug. It uses IP address for dev->dev_addr
when it really should be generating a Ethernet address when using
Ethernet wrapper mode.

^ permalink raw reply

* Re: [RFC] multiqueue changes
From: Patrick McHardy @ 2009-10-29 22:12 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Eric Dumazet, David S. Miller, Linux Netdev List
In-Reply-To: <20091029211543.GA3036@ami.dom.local>

Jarek Poplawski wrote:
> On Thu, Oct 29, 2009 at 05:37:23PM +0100, Patrick McHardy wrote:
> ...
>> Well, we do need both values for supporting changes to the actually
>> used numbers of TX queues. If I understood Dave's explanation correctly,
>> this is also what's intended. It also doesn't seem unreasonable
>> what bnx2 is doing.
> 
> Exactly. With a growing number of cores, both available and powered
> off, these values will be soon treated more carefully than now.
> 
>> But getting back to the problem Eric reported - so you're suggesting
>> that bnx2.c should also adjust num_tx_queues in case the hardware
>> doesn't support multiqueue? That seems reasonable as well.
> 
> Currently, declaring num_tx_queues with alloc_netdev_mq() looks like
> too soon for some drivers. It seems they should be able to do it
> separately later during the .probe.

The value passed into alloc_netdev_mq() is just used for allocation
purposes, from what I can tell there's no downside in reducing it
before the dev_activate() call.

> There is a question if .ndo_open should be considered too.

I currently can't see any purpose in decreasing num_tx_queues after
registration instead of real_num_tx_queues. But it depends on how
exactly this will be implemented and how it interacts with qdiscs
(hence me previous mail, where I tried to point out possible
inconsistencies from using real_num_tx_queues).

^ permalink raw reply

* [RFC] bridge: check address size
From: Stephen Hemminger @ 2009-10-29 22:24 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Neulinger, Nathan, netdev
In-Reply-To: <20091029151222.156945ca@nehalam>

Check the address size of underlying device because the bridge assumes
the underlying device has ethernet address format. See forwarding table
and STP, for places where this true.

Also, add some comments to explain errors.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/net/bridge/br_if.c	2009-10-29 15:18:48.363916679 -0700
+++ b/net/bridge/br_if.c	2009-10-29 15:21:38.142667043 -0700
@@ -377,12 +377,17 @@ int br_add_if(struct net_bridge *br, str
 	struct net_bridge_port *p;
 	int err = 0;
 
-	if (dev->flags & IFF_LOOPBACK || dev->type != ARPHRD_ETHER)
+	/* Don't allow bridging non ethernet like devices */
+	if (dev->flags & IFF_LOOPBACK
+	    || dev->type != ARPHRD_ETHER
+	    || dev->addr_len != ETH_ALEN)
 		return -EINVAL;
 
+	/* No bridging of bridges */
 	if (dev->netdev_ops->ndo_start_xmit == br_dev_xmit)
 		return -ELOOP;
 
+	/* Device is already being bridged */
 	if (dev->br_port != NULL)
 		return -EBUSY;
 


^ permalink raw reply

* Re: [PATCH] net: allow netdev_wait_allrefs() to run faster
From: Eric W. Biederman @ 2009-10-29 23:07 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Eric Dumazet, Octavian Purdila, netdev, Cosmin Ratiu
In-Reply-To: <20091021165139.GL877@kvack.org>

Benjamin LaHaise <bcrl@lhnet.ca> writes:

> On Wed, Oct 21, 2009 at 05:40:07PM +0200, Eric Dumazet wrote:
>> Ben patch only address interface deletion, and one part of the problem,
>> maybe the more visible one for the current kernel.
>
> The first part I've been tackling has been the overhead in procfs, sysctl 
> and sysfs.  

Could you keep me in the loop with that.  I have some pending cleanups for
all of those pieces of code and may be able to help/advice/review.

Eric

^ permalink raw reply

* Re: [PATCH] net: allow netdev_wait_allrefs() to run faster
From: Benjamin LaHaise @ 2009-10-29 23:38 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Eric Dumazet, Octavian Purdila, netdev, Cosmin Ratiu
In-Reply-To: <m1d445yf2x.fsf@fess.ebiederm.org>

On Thu, Oct 29, 2009 at 04:07:18PM -0700, Eric W. Biederman wrote:
> Could you keep me in the loop with that.  I have some pending cleanups for
> all of those pieces of code and may be able to help/advice/review.

Here are the sysfs scaling improvements.  I have to break them up, as there 
are 3 separate changes in this patch: 1. use an rbtree for name lookup in 
sysfs, 2. keep track of the number of directories for the purpose of 
generating the link count, as otherwise too much cpu time is spent in 
sysfs_count_nlink when new entries are added, and 3. when adding a new 
sysfs_dirent, walk the list backwards when linking it in, as higher 
numbered inodes tend to be at the end of the list, not the beginning.

		-ben


diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 5fad489..38ad7c8 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -43,10 +43,18 @@ static DEFINE_IDA(sysfs_ino_ida);
 static void sysfs_link_sibling(struct sysfs_dirent *sd)
 {
 	struct sysfs_dirent *parent_sd = sd->s_parent;
-	struct sysfs_dirent **pos;
+	struct sysfs_dirent **pos, *prev = NULL;
+	struct rb_node **new, *parent;
 
 	BUG_ON(sd->s_sibling);
 
+	if (parent_sd->s_dir.children_tail &&
+	    parent_sd->s_dir.children_tail->s_ino < sd->s_ino) {
+		prev = parent_sd->s_dir.children_tail;
+		pos = &prev->s_sibling;
+		goto got_it;
+	}
+
 	/* Store directory entries in order by ino.  This allows
 	 * readdir to properly restart without having to add a
 	 * cursor into the s_dir.children list.
@@ -54,9 +62,36 @@ static void sysfs_link_sibling(struct sysfs_dirent *sd)
 	for (pos = &parent_sd->s_dir.children; *pos; pos = &(*pos)->s_sibling) {
 		if (sd->s_ino < (*pos)->s_ino)
 			break;
+		prev = *pos;
 	}
+got_it:
+	if (prev == parent_sd->s_dir.children_tail)
+		parent_sd->s_dir.children_tail = sd;
 	sd->s_sibling = *pos;
+	sd->s_sibling_prev = prev;
 	*pos = sd;
+	parent_sd->s_nr_children_dir += (sysfs_type(sd) == SYSFS_DIR);
+
+	// rb tree insert
+	new = &(parent_sd->s_dir.child_rb_root.rb_node);
+	parent = NULL;
+
+	while (*new) {
+		struct sysfs_dirent *this =
+			container_of(*new, struct sysfs_dirent, s_rb_node);
+		int result = strcmp(sd->s_name, this->s_name);
+
+		parent = *new;
+		if (result < 0)
+			new = &((*new)->rb_left);
+		else if (result > 0)
+			new = &((*new)->rb_right);
+		else
+			BUG();
+	}
+
+	rb_link_node(&sd->s_rb_node, parent, new);
+	rb_insert_color(&sd->s_rb_node, &parent_sd->s_dir.child_rb_root);
 }
 
 /**
@@ -71,16 +106,22 @@ static void sysfs_link_sibling(struct sysfs_dirent *sd)
  */
 static void sysfs_unlink_sibling(struct sysfs_dirent *sd)
 {
-	struct sysfs_dirent **pos;
+	struct sysfs_dirent **pos, *prev = NULL;
 
-	for (pos = &sd->s_parent->s_dir.children; *pos;
-	     pos = &(*pos)->s_sibling) {
-		if (*pos == sd) {
-			*pos = sd->s_sibling;
-			sd->s_sibling = NULL;
-			break;
-		}
-	}
+	prev = sd->s_sibling_prev;
+	if (prev)
+		pos = &prev->s_sibling;
+	else
+		pos = &sd->s_parent->s_dir.children;
+	if (sd == sd->s_parent->s_dir.children_tail)
+		sd->s_parent->s_dir.children_tail = prev;
+	*pos = sd->s_sibling;
+	if (sd->s_sibling)
+		sd->s_sibling->s_sibling_prev = prev;
+	
+	sd->s_parent->s_nr_children_dir -= (sysfs_type(sd) == SYSFS_DIR);
+	sd->s_sibling_prev = NULL;
+	rb_erase(&sd->s_rb_node, &sd->s_parent->s_dir.child_rb_root);
 }
 
 /**
@@ -331,6 +372,9 @@ struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type)
 	sd->s_mode = mode;
 	sd->s_flags = type;
 
+	if (type == SYSFS_DIR)
+		sd->s_dir.child_rb_root = RB_ROOT;
+
 	return sd;
 
  err_out2:
@@ -630,11 +674,20 @@ void sysfs_addrm_finish(struct sysfs_addrm_cxt *acxt)
 struct sysfs_dirent *sysfs_find_dirent(struct sysfs_dirent *parent_sd,
 				       const unsigned char *name)
 {
-	struct sysfs_dirent *sd;
-
-	for (sd = parent_sd->s_dir.children; sd; sd = sd->s_sibling)
-		if (!strcmp(sd->s_name, name))
-			return sd;
+	struct rb_node *node = parent_sd->s_dir.child_rb_root.rb_node;
+
+	while (node) {
+		struct sysfs_dirent *data =
+			container_of(node, struct sysfs_dirent, s_rb_node);
+		int result;
+		result = strcmp(name, data->s_name);
+		if (result < 0)
+			node = node->rb_left;
+		else if (result > 0)
+			node = node->rb_right;
+		else
+			return data;
+	}
 	return NULL;
 }
 
diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index e28cecf..ff6e960 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -191,14 +191,7 @@ static struct lock_class_key sysfs_inode_imutex_key;
 
 static int sysfs_count_nlink(struct sysfs_dirent *sd)
 {
-	struct sysfs_dirent *child;
-	int nr = 0;
-
-	for (child = sd->s_dir.children; child; child = child->s_sibling)
-		if (sysfs_type(child) == SYSFS_DIR)
-			nr++;
-
-	return nr + 2;
+	return sd->s_nr_children_dir + 2;
 }
 
 static void sysfs_init_inode(struct sysfs_dirent *sd, struct inode *inode)
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index af4c4e7..22fd1bc 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -9,6 +9,7 @@
  */
 
 #include <linux/fs.h>
+#include <linux/rbtree.h>
 
 struct sysfs_open_dirent;
 
@@ -16,7 +17,10 @@ struct sysfs_open_dirent;
 struct sysfs_elem_dir {
 	struct kobject		*kobj;
 	/* children list starts here and goes through sd->s_sibling */
+	
 	struct sysfs_dirent	*children;
+	struct sysfs_dirent	*children_tail;
+	struct rb_root		child_rb_root;
 };
 
 struct sysfs_elem_symlink {
@@ -52,6 +56,8 @@ struct sysfs_dirent {
 	atomic_t		s_active;
 	struct sysfs_dirent	*s_parent;
 	struct sysfs_dirent	*s_sibling;
+	struct sysfs_dirent	*s_sibling_prev;
+	struct rb_node		s_rb_node;
 	const char		*s_name;
 
 	union {
@@ -65,6 +71,8 @@ struct sysfs_dirent {
 	ino_t			s_ino;
 	umode_t			s_mode;
 	struct sysfs_inode_attrs *s_iattr;
+
+	int			s_nr_children_dir;
 };
 
 #define SD_DEACTIVATED_BIAS		INT_MIN

^ permalink raw reply related

* [net-2.6 PATCH] e100: e100_phy_init() isolates selected PHY, causes 10 second boot delay
From: Jeff Kirsher @ 2009-10-29 23:42 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Bernhard Kaindl, Bruce Allan, Jeff Kirsher

From: Bruce Allan <bruce.w.allan@intel.com>

A change in how PHYs are electrically isolated caused all PHYs to be
isolated followed by reverting that isolation for the selected PHY.
Unfortunately, isolating the selected PHY for even a short period of
time can result in DHCP negotiation taking more than 10 seconds on certain
embedded configurations delaying boot time as reported by Bernhard Kaindl.
This patch reverts the change to how PHYs are isolated yet still works
around the issue for 82552 needing the selected PHY's BMCR register to
be written after the unused PHYs are isolated.  This code is moved below
the setting of nic->phy ID in order to do the 82552-specific workaround.

Cc: Bernhard Kaindl <bernhard.kaindl@gmx.net>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/e100.c |   26 +++++++++++++++++++-------
 1 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/net/e100.c b/drivers/net/e100.c
index 679965c..d19b084 100644
--- a/drivers/net/e100.c
+++ b/drivers/net/e100.c
@@ -1426,19 +1426,31 @@ static int e100_phy_init(struct nic *nic)
 	} else
 		DPRINTK(HW, DEBUG, "phy_addr = %d\n", nic->mii.phy_id);
 
-	/* Isolate all the PHY ids */
-	for (addr = 0; addr < 32; addr++)
-		mdio_write(netdev, addr, MII_BMCR, BMCR_ISOLATE);
-	/* Select the discovered PHY */
-	bmcr &= ~BMCR_ISOLATE;
-	mdio_write(netdev, nic->mii.phy_id, MII_BMCR, bmcr);
-
 	/* Get phy ID */
 	id_lo = mdio_read(netdev, nic->mii.phy_id, MII_PHYSID1);
 	id_hi = mdio_read(netdev, nic->mii.phy_id, MII_PHYSID2);
 	nic->phy = (u32)id_hi << 16 | (u32)id_lo;
 	DPRINTK(HW, DEBUG, "phy ID = 0x%08X\n", nic->phy);
 
+	/* Select the phy and isolate the rest */
+	for (addr = 0; addr < 32; addr++) {
+		if (addr != nic->mii.phy_id) {
+			mdio_write(netdev, addr, MII_BMCR, BMCR_ISOLATE);
+		} else if (nic->phy != phy_82552_v) {
+			bmcr = mdio_read(netdev, addr, MII_BMCR);
+			mdio_write(netdev, addr, MII_BMCR,
+				bmcr & ~BMCR_ISOLATE);
+		}
+	}
+	/*
+	 * Workaround for 82552:
+	 * Clear the ISOLATE bit on selected phy_id last (mirrored on all
+	 * other phy_id's) using bmcr value from addr discovery loop above.
+	 */
+	if (nic->phy == phy_82552_v)
+		mdio_write(netdev, nic->mii.phy_id, MII_BMCR,
+			bmcr & ~BMCR_ISOLATE);
+
 	/* Handle National tx phys */
 #define NCS_PHY_MODEL_MASK	0xFFF0FFFF
 	if ((nic->phy & NCS_PHY_MODEL_MASK) == phy_nsc_tx) {


^ permalink raw reply related

* [net-2.6 PATCH 1/2] e1000e: config PHY via software after resets
From: Jeff Kirsher @ 2009-10-29 23:45 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Bruce Allan, Jeff Kirsher

From: Bruce Allan <bruce.w.allan@intel.com>

On PCH-based (82577/82578) and some ICH8-based parts (82566) there is an
issue with the hardware automatically configuring the PHY with contents
from the EEPROM after the PHY is reset, so do the configuration by the
driver instead.  This was already similarly done for some 82566 parts in
e1000_phy_hw_reset_ich8lan() but needs to be done after other resets,
so move the PHY configuration code to its own function and call after
all PHY resets.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/e1000e/defines.h |    1 
 drivers/net/e1000e/ich8lan.c |  295 +++++++++++++++++++++++++++++++-----------
 2 files changed, 218 insertions(+), 78 deletions(-)

diff --git a/drivers/net/e1000e/defines.h b/drivers/net/e1000e/defines.h
index c0f185b..4741ef9 100644
--- a/drivers/net/e1000e/defines.h
+++ b/drivers/net/e1000e/defines.h
@@ -347,6 +347,7 @@
 /* Extended Configuration Control and Size */
 #define E1000_EXTCNF_CTRL_MDIO_SW_OWNERSHIP      0x00000020
 #define E1000_EXTCNF_CTRL_LCD_WRITE_ENABLE       0x00000001
+#define E1000_EXTCNF_CTRL_OEM_WRITE_ENABLE       0x00000008
 #define E1000_EXTCNF_CTRL_SWFLAG                 0x00000020
 #define E1000_EXTCNF_SIZE_EXT_PCIE_LENGTH_MASK   0x00FF0000
 #define E1000_EXTCNF_SIZE_EXT_PCIE_LENGTH_SHIFT          16
diff --git a/drivers/net/e1000e/ich8lan.c b/drivers/net/e1000e/ich8lan.c
index b6388b9..095ffa5 100644
--- a/drivers/net/e1000e/ich8lan.c
+++ b/drivers/net/e1000e/ich8lan.c
@@ -124,9 +124,20 @@
 
 #define SW_FLAG_TIMEOUT    1000 /* SW Semaphore flag timeout in milliseconds */
 
+/* SMBus Address Phy Register */
+#define HV_SMB_ADDR            PHY_REG(768, 26)
+#define HV_SMB_ADDR_PEC_EN     0x0200
+#define HV_SMB_ADDR_VALID      0x0080
+
+/* Strapping Option Register - RO */
+#define E1000_STRAP                     0x0000C
+#define E1000_STRAP_SMBUS_ADDRESS_MASK  0x00FE0000
+#define E1000_STRAP_SMBUS_ADDRESS_SHIFT 17
+
 /* OEM Bits Phy Register */
 #define HV_OEM_BITS            PHY_REG(768, 25)
 #define HV_OEM_BITS_LPLU       0x0004 /* Low Power Link Up */
+#define HV_OEM_BITS_GBE_DIS    0x0040 /* Gigabit Disable */
 #define HV_OEM_BITS_RESTART_AN 0x0400 /* Restart Auto-negotiation */
 
 /* ICH GbE Flash Hardware Sequencing Flash Status Register bit breakdown */
@@ -208,6 +219,7 @@ static s32 e1000_cleanup_led_pchlan(struct e1000_hw *hw);
 static s32 e1000_led_on_pchlan(struct e1000_hw *hw);
 static s32 e1000_led_off_pchlan(struct e1000_hw *hw);
 static s32 e1000_set_lplu_state_pchlan(struct e1000_hw *hw, bool active);
+static void e1000_lan_init_done_ich8lan(struct e1000_hw *hw);
 
 static inline u16 __er16flash(struct e1000_hw *hw, unsigned long reg)
 {
@@ -794,6 +806,191 @@ static s32 e1000_phy_force_speed_duplex_ich8lan(struct e1000_hw *hw)
 }
 
 /**
+ *  e1000_sw_lcd_config_ich8lan - SW-based LCD Configuration
+ *  @hw:   pointer to the HW structure
+ *
+ *  SW should configure the LCD from the NVM extended configuration region
+ *  as a workaround for certain parts.
+ **/
+static s32 e1000_sw_lcd_config_ich8lan(struct e1000_hw *hw)
+{
+	struct e1000_phy_info *phy = &hw->phy;
+	u32 i, data, cnf_size, cnf_base_addr, sw_cfg_mask;
+	s32 ret_val;
+	u16 word_addr, reg_data, reg_addr, phy_page = 0;
+
+	ret_val = hw->phy.ops.acquire_phy(hw);
+	if (ret_val)
+		return ret_val;
+
+	/*
+	 * Initialize the PHY from the NVM on ICH platforms.  This
+	 * is needed due to an issue where the NVM configuration is
+	 * not properly autoloaded after power transitions.
+	 * Therefore, after each PHY reset, we will load the
+	 * configuration data out of the NVM manually.
+	 */
+	if ((hw->mac.type == e1000_ich8lan && phy->type == e1000_phy_igp_3) ||
+		(hw->mac.type == e1000_pchlan)) {
+		struct e1000_adapter *adapter = hw->adapter;
+
+		/* Check if SW needs to configure the PHY */
+		if ((adapter->pdev->device == E1000_DEV_ID_ICH8_IGP_M_AMT) ||
+		    (adapter->pdev->device == E1000_DEV_ID_ICH8_IGP_M) ||
+		    (hw->mac.type == e1000_pchlan))
+			sw_cfg_mask = E1000_FEXTNVM_SW_CONFIG_ICH8M;
+		else
+			sw_cfg_mask = E1000_FEXTNVM_SW_CONFIG;
+
+		data = er32(FEXTNVM);
+		if (!(data & sw_cfg_mask))
+			goto out;
+
+		/* Wait for basic configuration completes before proceeding */
+		e1000_lan_init_done_ich8lan(hw);
+
+		/*
+		 * Make sure HW does not configure LCD from PHY
+		 * extended configuration before SW configuration
+		 */
+		data = er32(EXTCNF_CTRL);
+		if (data & E1000_EXTCNF_CTRL_LCD_WRITE_ENABLE)
+			goto out;
+
+		cnf_size = er32(EXTCNF_SIZE);
+		cnf_size &= E1000_EXTCNF_SIZE_EXT_PCIE_LENGTH_MASK;
+		cnf_size >>= E1000_EXTCNF_SIZE_EXT_PCIE_LENGTH_SHIFT;
+		if (!cnf_size)
+			goto out;
+
+		cnf_base_addr = data & E1000_EXTCNF_CTRL_EXT_CNF_POINTER_MASK;
+		cnf_base_addr >>= E1000_EXTCNF_CTRL_EXT_CNF_POINTER_SHIFT;
+
+		if (!(data & E1000_EXTCNF_CTRL_OEM_WRITE_ENABLE) &&
+		    (hw->mac.type == e1000_pchlan)) {
+			/*
+			 * HW configures the SMBus address and LEDs when the
+			 * OEM and LCD Write Enable bits are set in the NVM.
+			 * When both NVM bits are cleared, SW will configure
+			 * them instead.
+			 */
+			data = er32(STRAP);
+			data &= E1000_STRAP_SMBUS_ADDRESS_MASK;
+			reg_data = data >> E1000_STRAP_SMBUS_ADDRESS_SHIFT;
+			reg_data |= HV_SMB_ADDR_PEC_EN | HV_SMB_ADDR_VALID;
+			ret_val = e1000_write_phy_reg_hv_locked(hw, HV_SMB_ADDR,
+			                                        reg_data);
+			if (ret_val)
+				goto out;
+
+			data = er32(LEDCTL);
+			ret_val = e1000_write_phy_reg_hv_locked(hw,
+			                                        HV_LED_CONFIG,
+			                                        (u16)data);
+			if (ret_val)
+				goto out;
+		}
+		/* Configure LCD from extended configuration region. */
+
+		/* cnf_base_addr is in DWORD */
+		word_addr = (u16)(cnf_base_addr << 1);
+
+		for (i = 0; i < cnf_size; i++) {
+			ret_val = e1000_read_nvm(hw, (word_addr + i * 2), 1,
+			                           &reg_data);
+			if (ret_val)
+				goto out;
+
+			ret_val = e1000_read_nvm(hw, (word_addr + i * 2 + 1),
+			                           1, &reg_addr);
+			if (ret_val)
+				goto out;
+
+			/* Save off the PHY page for future writes. */
+			if (reg_addr == IGP01E1000_PHY_PAGE_SELECT) {
+				phy_page = reg_data;
+				continue;
+			}
+
+			reg_addr &= PHY_REG_MASK;
+			reg_addr |= phy_page;
+
+			ret_val = phy->ops.write_phy_reg_locked(hw,
+			                                    (u32)reg_addr,
+			                                    reg_data);
+			if (ret_val)
+				goto out;
+		}
+	}
+
+out:
+	hw->phy.ops.release_phy(hw);
+	return ret_val;
+}
+
+/**
+ *  e1000_oem_bits_config_ich8lan - SW-based LCD Configuration
+ *  @hw:       pointer to the HW structure
+ *  @d0_state: boolean if entering d0 or d3 device state
+ *
+ *  SW will configure Gbe Disable and LPLU based on the NVM. The four bits are
+ *  collectively called OEM bits.  The OEM Write Enable bit and SW Config bit
+ *  in NVM determines whether HW should configure LPLU and Gbe Disable.
+ **/
+static s32 e1000_oem_bits_config_ich8lan(struct e1000_hw *hw, bool d0_state)
+{
+	s32 ret_val = 0;
+	u32 mac_reg;
+	u16 oem_reg;
+
+	if (hw->mac.type != e1000_pchlan)
+		return ret_val;
+
+	ret_val = hw->phy.ops.acquire_phy(hw);
+	if (ret_val)
+		return ret_val;
+
+	mac_reg = er32(EXTCNF_CTRL);
+	if (mac_reg & E1000_EXTCNF_CTRL_OEM_WRITE_ENABLE)
+		goto out;
+
+	mac_reg = er32(FEXTNVM);
+	if (!(mac_reg & E1000_FEXTNVM_SW_CONFIG_ICH8M))
+		goto out;
+
+	mac_reg = er32(PHY_CTRL);
+
+	ret_val = hw->phy.ops.read_phy_reg_locked(hw, HV_OEM_BITS, &oem_reg);
+	if (ret_val)
+		goto out;
+
+	oem_reg &= ~(HV_OEM_BITS_GBE_DIS | HV_OEM_BITS_LPLU);
+
+	if (d0_state) {
+		if (mac_reg & E1000_PHY_CTRL_GBE_DISABLE)
+			oem_reg |= HV_OEM_BITS_GBE_DIS;
+
+		if (mac_reg & E1000_PHY_CTRL_D0A_LPLU)
+			oem_reg |= HV_OEM_BITS_LPLU;
+	} else {
+		if (mac_reg & E1000_PHY_CTRL_NOND0A_GBE_DISABLE)
+			oem_reg |= HV_OEM_BITS_GBE_DIS;
+
+		if (mac_reg & E1000_PHY_CTRL_NOND0A_LPLU)
+			oem_reg |= HV_OEM_BITS_LPLU;
+	}
+	/* Restart auto-neg to activate the bits */
+	oem_reg |= HV_OEM_BITS_RESTART_AN;
+	ret_val = hw->phy.ops.write_phy_reg_locked(hw, HV_OEM_BITS, oem_reg);
+
+out:
+	hw->phy.ops.release_phy(hw);
+
+	return ret_val;
+}
+
+
+/**
  *  e1000_hv_phy_workarounds_ich8lan - A series of Phy workarounds to be
  *  done after every PHY reset.
  **/
@@ -882,11 +1079,8 @@ static void e1000_lan_init_done_ich8lan(struct e1000_hw *hw)
  **/
 static s32 e1000_phy_hw_reset_ich8lan(struct e1000_hw *hw)
 {
-	struct e1000_phy_info *phy = &hw->phy;
-	u32 i;
-	u32 data, cnf_size, cnf_base_addr, sw_cfg_mask;
-	s32 ret_val;
-	u16 reg, word_addr, reg_data, reg_addr, phy_page = 0;
+	s32 ret_val = 0;
+	u16 reg;
 
 	ret_val = e1000e_phy_hw_reset_generic(hw);
 	if (ret_val)
@@ -905,81 +1099,16 @@ static s32 e1000_phy_hw_reset_ich8lan(struct e1000_hw *hw)
 	if (hw->mac.type == e1000_pchlan)
 		e1e_rphy(hw, BM_WUC, &reg);
 
-	/*
-	 * Initialize the PHY from the NVM on ICH platforms.  This
-	 * is needed due to an issue where the NVM configuration is
-	 * not properly autoloaded after power transitions.
-	 * Therefore, after each PHY reset, we will load the
-	 * configuration data out of the NVM manually.
-	 */
-	if (hw->mac.type == e1000_ich8lan && phy->type == e1000_phy_igp_3) {
-		struct e1000_adapter *adapter = hw->adapter;
-
-		/* Check if SW needs configure the PHY */
-		if ((adapter->pdev->device == E1000_DEV_ID_ICH8_IGP_M_AMT) ||
-		    (adapter->pdev->device == E1000_DEV_ID_ICH8_IGP_M))
-			sw_cfg_mask = E1000_FEXTNVM_SW_CONFIG_ICH8M;
-		else
-			sw_cfg_mask = E1000_FEXTNVM_SW_CONFIG;
-
-		data = er32(FEXTNVM);
-		if (!(data & sw_cfg_mask))
-			return 0;
-
-		/* Wait for basic configuration completes before proceeding */
-		e1000_lan_init_done_ich8lan(hw);
-
-		/*
-		 * Make sure HW does not configure LCD from PHY
-		 * extended configuration before SW configuration
-		 */
-		data = er32(EXTCNF_CTRL);
-		if (data & E1000_EXTCNF_CTRL_LCD_WRITE_ENABLE)
-			return 0;
-
-		cnf_size = er32(EXTCNF_SIZE);
-		cnf_size &= E1000_EXTCNF_SIZE_EXT_PCIE_LENGTH_MASK;
-		cnf_size >>= E1000_EXTCNF_SIZE_EXT_PCIE_LENGTH_SHIFT;
-		if (!cnf_size)
-			return 0;
-
-		cnf_base_addr = data & E1000_EXTCNF_CTRL_EXT_CNF_POINTER_MASK;
-		cnf_base_addr >>= E1000_EXTCNF_CTRL_EXT_CNF_POINTER_SHIFT;
-
-		/* Configure LCD from extended configuration region. */
-
-		/* cnf_base_addr is in DWORD */
-		word_addr = (u16)(cnf_base_addr << 1);
-
-		for (i = 0; i < cnf_size; i++) {
-			ret_val = e1000_read_nvm(hw,
-						(word_addr + i * 2),
-						1,
-						&reg_data);
-			if (ret_val)
-				return ret_val;
-
-			ret_val = e1000_read_nvm(hw,
-						(word_addr + i * 2 + 1),
-						1,
-						&reg_addr);
-			if (ret_val)
-				return ret_val;
-
-			/* Save off the PHY page for future writes. */
-			if (reg_addr == IGP01E1000_PHY_PAGE_SELECT) {
-				phy_page = reg_data;
-				continue;
-			}
-
-			reg_addr |= phy_page;
+	/* Configure the LCD with the extended configuration region in NVM */
+	ret_val = e1000_sw_lcd_config_ich8lan(hw);
+	if (ret_val)
+		goto out;
 
-			ret_val = e1e_wphy(hw, (u32)reg_addr, reg_data);
-			if (ret_val)
-				return ret_val;
-		}
-	}
+	/* Configure the LCD with the OEM bits in NVM */
+	if (hw->mac.type == e1000_pchlan)
+		ret_val = e1000_oem_bits_config_ich8lan(hw, true);
 
+out:
 	return 0;
 }
 
@@ -2386,6 +2515,15 @@ static s32 e1000_reset_hw_ich8lan(struct e1000_hw *hw)
 	if (hw->mac.type == e1000_pchlan)
 		e1e_rphy(hw, BM_WUC, &reg);
 
+	ret_val = e1000_sw_lcd_config_ich8lan(hw);
+	if (ret_val)
+		goto out;
+
+	if (hw->mac.type == e1000_pchlan) {
+		ret_val = e1000_oem_bits_config_ich8lan(hw, true);
+		if (ret_val)
+			goto out;
+	}
 	/*
 	 * For PCH, this write will make sure that any noise
 	 * will be detected as a CRC error and be dropped rather than show up
@@ -2404,6 +2542,7 @@ static s32 e1000_reset_hw_ich8lan(struct e1000_hw *hw)
 	if (hw->mac.type == e1000_pchlan)
 		ret_val = e1000_hv_phy_workarounds_ich8lan(hw);
 
+out:
 	return ret_val;
 }
 


^ permalink raw reply related

* [net-2.6 PATCH 2/2] e1000e: rework disable K1 at 1000Mbps for 82577/82578
From: Jeff Kirsher @ 2009-10-29 23:46 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Bruce Allan, Jeff Kirsher
In-Reply-To: <20091029234545.5413.9667.stgit@localhost.localdomain>

From: Bruce Allan <bruce.w.allan@intel.com>

This patch reworks a previous workaround (commit 7d3cabbcc) for an issue
in hardware where noise on the interconnect between the MAC and PHY could
be generated by a lower power mode (K1) at 1000Mbps resulting in bad
packets.  Disable K1 while at 1000 Mbps but keep it enabled for 10/100Mbps
and when the cable is disconnected.  The original version of this
workaround was found to be incomplete.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/e1000e/defines.h |    1 
 drivers/net/e1000e/e1000.h   |   14 +++
 drivers/net/e1000e/hw.h      |    1 
 drivers/net/e1000e/ich8lan.c |  187 ++++++++++++++++++++++++++++++++++++++----
 drivers/net/e1000e/phy.c     |   15 +--
 5 files changed, 190 insertions(+), 28 deletions(-)

diff --git a/drivers/net/e1000e/defines.h b/drivers/net/e1000e/defines.h
index 4741ef9..1190167 100644
--- a/drivers/net/e1000e/defines.h
+++ b/drivers/net/e1000e/defines.h
@@ -76,6 +76,7 @@
 /* Extended Device Control */
 #define E1000_CTRL_EXT_SDP7_DATA 0x00000080 /* Value of SW Definable Pin 7 */
 #define E1000_CTRL_EXT_EE_RST    0x00002000 /* Reinitialize from EEPROM */
+#define E1000_CTRL_EXT_SPD_BYPS  0x00008000 /* Speed Select Bypass */
 #define E1000_CTRL_EXT_RO_DIS    0x00020000 /* Relaxed Ordering disable */
 #define E1000_CTRL_EXT_DMA_DYN_CLK_EN 0x00080000 /* DMA Dynamic Clock Gating */
 #define E1000_CTRL_EXT_LINK_MODE_MASK 0x00C00000
diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
index 405a144..189dfa2 100644
--- a/drivers/net/e1000e/e1000.h
+++ b/drivers/net/e1000e/e1000.h
@@ -141,6 +141,20 @@ struct e1000_info;
 #define HV_TNCRS_UPPER		PHY_REG(778, 29) /* Transmit with no CRS */
 #define HV_TNCRS_LOWER		PHY_REG(778, 30)
 
+/* BM PHY Copper Specific Status */
+#define BM_CS_STATUS                      17
+#define BM_CS_STATUS_LINK_UP              0x0400
+#define BM_CS_STATUS_RESOLVED             0x0800
+#define BM_CS_STATUS_SPEED_MASK           0xC000
+#define BM_CS_STATUS_SPEED_1000           0x8000
+
+/* 82577 Mobile Phy Status Register */
+#define HV_M_STATUS                       26
+#define HV_M_STATUS_AUTONEG_COMPLETE      0x1000
+#define HV_M_STATUS_SPEED_MASK            0x0300
+#define HV_M_STATUS_SPEED_1000            0x0200
+#define HV_M_STATUS_LINK_UP               0x0040
+
 enum e1000_boards {
 	board_82571,
 	board_82572,
diff --git a/drivers/net/e1000e/hw.h b/drivers/net/e1000e/hw.h
index 7b05cf4..aaea41e 100644
--- a/drivers/net/e1000e/hw.h
+++ b/drivers/net/e1000e/hw.h
@@ -903,6 +903,7 @@ struct e1000_shadow_ram {
 struct e1000_dev_spec_ich8lan {
 	bool kmrn_lock_loss_workaround_enabled;
 	struct e1000_shadow_ram shadow_ram[E1000_ICH8_SHADOW_RAM_WORDS];
+	bool nvm_k1_enabled;
 };
 
 struct e1000_hw {
diff --git a/drivers/net/e1000e/ich8lan.c b/drivers/net/e1000e/ich8lan.c
index 095ffa5..51ddb04 100644
--- a/drivers/net/e1000e/ich8lan.c
+++ b/drivers/net/e1000e/ich8lan.c
@@ -140,6 +140,9 @@
 #define HV_OEM_BITS_GBE_DIS    0x0040 /* Gigabit Disable */
 #define HV_OEM_BITS_RESTART_AN 0x0400 /* Restart Auto-negotiation */
 
+#define E1000_NVM_K1_CONFIG 0x1B /* NVM K1 Config Word */
+#define E1000_NVM_K1_ENABLE 0x1  /* NVM Enable K1 bit */
+
 /* ICH GbE Flash Hardware Sequencing Flash Status Register bit breakdown */
 /* Offset 04h HSFSTS */
 union ich8_hws_flash_status {
@@ -220,6 +223,8 @@ static s32 e1000_led_on_pchlan(struct e1000_hw *hw);
 static s32 e1000_led_off_pchlan(struct e1000_hw *hw);
 static s32 e1000_set_lplu_state_pchlan(struct e1000_hw *hw, bool active);
 static void e1000_lan_init_done_ich8lan(struct e1000_hw *hw);
+static s32  e1000_k1_gig_workaround_hv(struct e1000_hw *hw, bool link);
+static s32 e1000_configure_k1_ich8lan(struct e1000_hw *hw, bool k1_enable);
 
 static inline u16 __er16flash(struct e1000_hw *hw, unsigned long reg)
 {
@@ -495,14 +500,6 @@ static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
 		goto out;
 	}
 
-	if (hw->mac.type == e1000_pchlan) {
-		ret_val = e1000e_write_kmrn_reg(hw,
-		                                   E1000_KMRNCTRLSTA_K1_CONFIG,
-		                                   E1000_KMRNCTRLSTA_K1_ENABLE);
-		if (ret_val)
-			goto out;
-	}
-
 	/*
 	 * First we want to see if the MII Status Register reports
 	 * link.  If so, then we want to get the current speed/duplex
@@ -512,6 +509,12 @@ static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
 	if (ret_val)
 		goto out;
 
+	if (hw->mac.type == e1000_pchlan) {
+		ret_val = e1000_k1_gig_workaround_hv(hw, link);
+		if (ret_val)
+			goto out;
+	}
+
 	if (!link)
 		goto out; /* No link detected */
 
@@ -929,6 +932,141 @@ out:
 }
 
 /**
+ *  e1000_k1_gig_workaround_hv - K1 Si workaround
+ *  @hw:   pointer to the HW structure
+ *  @link: link up bool flag
+ *
+ *  If K1 is enabled for 1Gbps, the MAC might stall when transitioning
+ *  from a lower speed.  This workaround disables K1 whenever link is at 1Gig
+ *  If link is down, the function will restore the default K1 setting located
+ *  in the NVM.
+ **/
+static s32 e1000_k1_gig_workaround_hv(struct e1000_hw *hw, bool link)
+{
+	s32 ret_val = 0;
+	u16 status_reg = 0;
+	bool k1_enable = hw->dev_spec.ich8lan.nvm_k1_enabled;
+
+	if (hw->mac.type != e1000_pchlan)
+		goto out;
+
+	/* Wrap the whole flow with the sw flag */
+	ret_val = hw->phy.ops.acquire_phy(hw);
+	if (ret_val)
+		goto out;
+
+	/* Disable K1 when link is 1Gbps, otherwise use the NVM setting */
+	if (link) {
+		if (hw->phy.type == e1000_phy_82578) {
+			ret_val = hw->phy.ops.read_phy_reg_locked(hw,
+			                                          BM_CS_STATUS,
+			                                          &status_reg);
+			if (ret_val)
+				goto release;
+
+			status_reg &= BM_CS_STATUS_LINK_UP |
+			              BM_CS_STATUS_RESOLVED |
+			              BM_CS_STATUS_SPEED_MASK;
+
+			if (status_reg == (BM_CS_STATUS_LINK_UP |
+			                   BM_CS_STATUS_RESOLVED |
+			                   BM_CS_STATUS_SPEED_1000))
+				k1_enable = false;
+		}
+
+		if (hw->phy.type == e1000_phy_82577) {
+			ret_val = hw->phy.ops.read_phy_reg_locked(hw,
+			                                          HV_M_STATUS,
+			                                          &status_reg);
+			if (ret_val)
+				goto release;
+
+			status_reg &= HV_M_STATUS_LINK_UP |
+			              HV_M_STATUS_AUTONEG_COMPLETE |
+			              HV_M_STATUS_SPEED_MASK;
+
+			if (status_reg == (HV_M_STATUS_LINK_UP |
+			                   HV_M_STATUS_AUTONEG_COMPLETE |
+			                   HV_M_STATUS_SPEED_1000))
+				k1_enable = false;
+		}
+
+		/* Link stall fix for link up */
+		ret_val = hw->phy.ops.write_phy_reg_locked(hw, PHY_REG(770, 19),
+		                                           0x0100);
+		if (ret_val)
+			goto release;
+
+	} else {
+		/* Link stall fix for link down */
+		ret_val = hw->phy.ops.write_phy_reg_locked(hw, PHY_REG(770, 19),
+		                                           0x4100);
+		if (ret_val)
+			goto release;
+	}
+
+	ret_val = e1000_configure_k1_ich8lan(hw, k1_enable);
+
+release:
+	hw->phy.ops.release_phy(hw);
+out:
+	return ret_val;
+}
+
+/**
+ *  e1000_configure_k1_ich8lan - Configure K1 power state
+ *  @hw: pointer to the HW structure
+ *  @enable: K1 state to configure
+ *
+ *  Configure the K1 power state based on the provided parameter.
+ *  Assumes semaphore already acquired.
+ *
+ *  Success returns 0, Failure returns -E1000_ERR_PHY (-2)
+ **/
+static s32 e1000_configure_k1_ich8lan(struct e1000_hw *hw, bool k1_enable)
+{
+	s32 ret_val = 0;
+	u32 ctrl_reg = 0;
+	u32 ctrl_ext = 0;
+	u32 reg = 0;
+	u16 kmrn_reg = 0;
+
+	ret_val = e1000e_read_kmrn_reg_locked(hw,
+	                                     E1000_KMRNCTRLSTA_K1_CONFIG,
+	                                     &kmrn_reg);
+	if (ret_val)
+		goto out;
+
+	if (k1_enable)
+		kmrn_reg |= E1000_KMRNCTRLSTA_K1_ENABLE;
+	else
+		kmrn_reg &= ~E1000_KMRNCTRLSTA_K1_ENABLE;
+
+	ret_val = e1000e_write_kmrn_reg_locked(hw,
+	                                      E1000_KMRNCTRLSTA_K1_CONFIG,
+	                                      kmrn_reg);
+	if (ret_val)
+		goto out;
+
+	udelay(20);
+	ctrl_ext = er32(CTRL_EXT);
+	ctrl_reg = er32(CTRL);
+
+	reg = ctrl_reg & ~(E1000_CTRL_SPD_1000 | E1000_CTRL_SPD_100);
+	reg |= E1000_CTRL_FRCSPD;
+	ew32(CTRL, reg);
+
+	ew32(CTRL_EXT, ctrl_ext | E1000_CTRL_EXT_SPD_BYPS);
+	udelay(20);
+	ew32(CTRL, ctrl_reg);
+	ew32(CTRL_EXT, ctrl_ext);
+	udelay(20);
+
+out:
+	return ret_val;
+}
+
+/**
  *  e1000_oem_bits_config_ich8lan - SW-based LCD Configuration
  *  @hw:       pointer to the HW structure
  *  @d0_state: boolean if entering d0 or d3 device state
@@ -1030,10 +1168,20 @@ static s32 e1000_hv_phy_workarounds_ich8lan(struct e1000_hw *hw)
 	ret_val = hw->phy.ops.acquire_phy(hw);
 	if (ret_val)
 		return ret_val;
+
 	hw->phy.addr = 1;
-	e1000e_write_phy_reg_mdic(hw, IGP01E1000_PHY_PAGE_SELECT, 0);
+	ret_val = e1000e_write_phy_reg_mdic(hw, IGP01E1000_PHY_PAGE_SELECT, 0);
+	if (ret_val)
+		goto out;
 	hw->phy.ops.release_phy(hw);
 
+	/*
+	 * Configure the K1 Si workaround during phy reset assuming there is
+	 * link so that it disables K1 if link is in 1Gbps.
+	 */
+	ret_val = e1000_k1_gig_workaround_hv(hw, true);
+
+out:
 	return ret_val;
 }
 
@@ -2435,6 +2583,7 @@ static s32 e1000_get_bus_info_ich8lan(struct e1000_hw *hw)
  **/
 static s32 e1000_reset_hw_ich8lan(struct e1000_hw *hw)
 {
+	struct e1000_dev_spec_ich8lan *dev_spec = &hw->dev_spec.ich8lan;
 	u16 reg;
 	u32 ctrl, icr, kab;
 	s32 ret_val;
@@ -2470,6 +2619,18 @@ static s32 e1000_reset_hw_ich8lan(struct e1000_hw *hw)
 		ew32(PBS, E1000_PBS_16K);
 	}
 
+	if (hw->mac.type == e1000_pchlan) {
+		/* Save the NVM K1 bit setting*/
+		ret_val = e1000_read_nvm(hw, E1000_NVM_K1_CONFIG, 1, &reg);
+		if (ret_val)
+			return ret_val;
+
+		if (reg & E1000_NVM_K1_ENABLE)
+			dev_spec->nvm_k1_enabled = true;
+		else
+			dev_spec->nvm_k1_enabled = false;
+	}
+
 	ctrl = er32(CTRL);
 
 	if (!e1000_check_reset_block(hw)) {
@@ -2847,14 +3008,6 @@ static s32 e1000_get_link_up_info_ich8lan(struct e1000_hw *hw, u16 *speed,
 	if (ret_val)
 		return ret_val;
 
-	if ((hw->mac.type == e1000_pchlan) && (*speed == SPEED_1000)) {
-		ret_val = e1000e_write_kmrn_reg(hw,
-		                                  E1000_KMRNCTRLSTA_K1_CONFIG,
-		                                  E1000_KMRNCTRLSTA_K1_DISABLE);
-		if (ret_val)
-			return ret_val;
-	}
-
 	if ((hw->mac.type == e1000_ich8lan) &&
 	    (hw->phy.type == e1000_phy_igp_3) &&
 	    (*speed == SPEED_1000)) {
diff --git a/drivers/net/e1000e/phy.c b/drivers/net/e1000e/phy.c
index f9d33ab..03175b3 100644
--- a/drivers/net/e1000e/phy.c
+++ b/drivers/net/e1000e/phy.c
@@ -95,13 +95,6 @@ static const u16 e1000_igp_2_cable_length_table[] =
 /* BM PHY Copper Specific Control 1 */
 #define BM_CS_CTRL1                       16
 
-/* BM PHY Copper Specific Status */
-#define BM_CS_STATUS                      17
-#define BM_CS_STATUS_LINK_UP              0x0400
-#define BM_CS_STATUS_RESOLVED             0x0800
-#define BM_CS_STATUS_SPEED_MASK           0xC000
-#define BM_CS_STATUS_SPEED_1000           0x8000
-
 #define HV_MUX_DATA_CTRL               PHY_REG(776, 16)
 #define HV_MUX_DATA_CTRL_GEN_TO_MAC    0x0400
 #define HV_MUX_DATA_CTRL_FORCE_SPEED   0x0004
@@ -563,7 +556,7 @@ s32 e1000e_read_kmrn_reg(struct e1000_hw *hw, u32 offset, u16 *data)
 }
 
 /**
- *  e1000_read_kmrn_reg_locked -  Read kumeran register
+ *  e1000e_read_kmrn_reg_locked -  Read kumeran register
  *  @hw: pointer to the HW structure
  *  @offset: register offset to be read
  *  @data: pointer to the read data
@@ -572,7 +565,7 @@ s32 e1000e_read_kmrn_reg(struct e1000_hw *hw, u32 offset, u16 *data)
  *  information retrieved is stored in data.
  *  Assumes semaphore already acquired.
  **/
-s32 e1000_read_kmrn_reg_locked(struct e1000_hw *hw, u32 offset, u16 *data)
+s32 e1000e_read_kmrn_reg_locked(struct e1000_hw *hw, u32 offset, u16 *data)
 {
 	return __e1000_read_kmrn_reg(hw, offset, data, true);
 }
@@ -631,7 +624,7 @@ s32 e1000e_write_kmrn_reg(struct e1000_hw *hw, u32 offset, u16 data)
 }
 
 /**
- *  e1000_write_kmrn_reg_locked -  Write kumeran register
+ *  e1000e_write_kmrn_reg_locked -  Write kumeran register
  *  @hw: pointer to the HW structure
  *  @offset: register offset to write to
  *  @data: data to write at register offset
@@ -639,7 +632,7 @@ s32 e1000e_write_kmrn_reg(struct e1000_hw *hw, u32 offset, u16 data)
  *  Write the data to PHY register at the offset using the kumeran interface.
  *  Assumes semaphore already acquired.
  **/
-s32 e1000_write_kmrn_reg_locked(struct e1000_hw *hw, u32 offset, u16 data)
+s32 e1000e_write_kmrn_reg_locked(struct e1000_hw *hw, u32 offset, u16 data)
 {
 	return __e1000_write_kmrn_reg(hw, offset, data, true);
 }


^ permalink raw reply related

* Re: How to use gretap with bridge?
From: Herbert Xu @ 2009-10-29 23:47 UTC (permalink / raw)
  To: Neulinger, Nathan; +Cc: Stephen Hemminger, netdev
In-Reply-To: <846C5B546E47494CBBD796CA8CA1617EA3B439@MST-VMAIL1.srv.mst.edu>

On Thu, Oct 29, 2009 at 05:04:02PM -0500, Neulinger, Nathan wrote:
> Now I see it - Stephen actually had it right on - the problem is that
> the gre tunnel is creating a MAC address on the fly based on the tunnel
> endpoint ip address, so if the tunnel endpoint address starts with an
> odd number, it hits the multicast check in the bridging code. (I'm sure
> that's what he meant and I just missed it entirely.)
> 
> Simplest option would probably be to just mask off the first octet with
> 0xFD or using the ip as the last four octets of the mac instead of the
> first four.

This looks like a bug in either iproute or the kernel.  It's
not supposed to set a MAC address unless the user specifically
gives one.  If one is not given the kernel will generate a valid
MAC address.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* RE: How to use gretap with bridge?
From: Neulinger, Nathan @ 2009-10-29 23:51 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Stephen Hemminger, netdev
In-Reply-To: <20091029234756.GA1595@gondor.apana.org.au>

Actually, it looks like it's broke here in ipgre_tunnel_init:

--- net/ipv4/ip_gre.c.orig      2009-10-29 18:45:29.335723326 -0500
+++ net/ipv4/ip_gre.c   2009-10-29 18:45:13.069697015 -0500
@@ -1240,7 +1240,8 @@
        tunnel->dev = dev;
        strcpy(tunnel->parms.name, dev->name);
 
-       memcpy(dev->dev_addr, &tunnel->parms.iph.saddr, 4);
+       /* assign random mac addr on init */
+       random_ether_addr(dev->dev_addr);
        memcpy(dev->broadcast, &tunnel->parms.iph.daddr, 4);
 
        if (iph->daddr) {

The above change fixes it for me, but I'm no expert on this chunk of
code. (Perhaps it it shouldn't set dev_addr at all?)

-- Nathan

------------------------------------------------------------
Nathan Neulinger                       nneul@mst.edu
Missouri S&T Information Technology    (573) 612-1412
System Administrator - Principal       KD0DMH


> -----Original Message-----
> From: Herbert Xu [mailto:herbert@gondor.apana.org.au]
> Sent: Thursday, October 29, 2009 6:48 PM
> To: Neulinger, Nathan
> Cc: Stephen Hemminger; netdev@vger.kernel.org
> Subject: Re: How to use gretap with bridge?
> 
> On Thu, Oct 29, 2009 at 05:04:02PM -0500, Neulinger, Nathan wrote:
> > Now I see it - Stephen actually had it right on - the problem is
that
> > the gre tunnel is creating a MAC address on the fly based on the
> tunnel
> > endpoint ip address, so if the tunnel endpoint address starts with
an
> > odd number, it hits the multicast check in the bridging code. (I'm
> sure
> > that's what he meant and I just missed it entirely.)
> >
> > Simplest option would probably be to just mask off the first octet
> with
> > 0xFD or using the ip as the last four octets of the mac instead of
> the
> > first four.
> 
> This looks like a bug in either iproute or the kernel.  It's
> not supposed to set a MAC address unless the user specifically
> gives one.  If one is not given the kernel will generate a valid
> MAC address.
> 
> Cheers,
> --
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* [PATCH] sky2: set carrier off in probe
From: Brandon Philips @ 2009-10-29 23:58 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

Before bringing up a sky2 interface up ethtool reports 
"Link detected: yes". Do as ixgbe does and netif_carrier_off() on
probe().

Signed-off-by: Brandon Philips <bphilips@suse.de>

---
 drivers/net/sky2.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6/drivers/net/sky2.c
===================================================================
--- linux-2.6.orig/drivers/net/sky2.c
+++ linux-2.6/drivers/net/sky2.c
@@ -4538,6 +4538,8 @@ static int __devinit sky2_probe(struct p
 		goto err_out_free_netdev;
 	}
 
+	netif_carrier_off(dev);
+
 	netif_napi_add(dev, &hw->napi, sky2_poll, NAPI_WEIGHT);
 
 	err = request_irq(pdev->irq, sky2_intr,

^ permalink raw reply

* [PATCH 0/6] Bonding simplifications and netns support
From: Eric W. Biederman @ 2009-10-30  0:16 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jay Vosburgh

I recently had it pointed out to me that the bonding driver does not
work in a network namespace.  So I have simplified the bonding driver
a bit, added support for ip link add and ip link del, and finally made
the bonding driver work in multiple network namespaces.

The most note worthy change in the patchset is the addition of support
in the networking core for registering a sysfs group for a device.

Using this in the bonding driver simplifies the code and removes a
userspace race between actions triggered by the netlink event and the
bonding sysfs attributes appearing.

Eric

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox