Netdev List

Netdev List
 help / color / mirror / Atom feed

* schedule() does not work...
From: benzi vizman @ 2006-05-04 14:59 UTC (permalink / raw)
  To: netdev

Hi,

I try to run the following code in kernel (which is similar to apm() function) but it does not seem to work...

if (smp_processor_id() != cpu) {
      current->cpus_allowed = (1UL << cpu);
      schedule();
      if (smp_processor_id() != cpu) {
          BUG();
      }
}

Any idea why?

Benzi.

^ permalink raw reply

* schedule() does not work...
From: benzi vizman @ 2006-05-04 15:08 UTC (permalink / raw)
  To: netdev

Hi,

I try to run the following code in kernel (which is similar to apm() function) but it does not seem to work...

if (smp_processor_id() != cpu) {
      current->cpus_allowed = (1UL << cpu);
      schedule();
      if (smp_processor_id() != cpu) {
          BUG();
      }
}

Any idea why?

Benzi.

^ permalink raw reply

* Re: [DCCP]: Fix sock_orphan dead lock
From: Arnaldo Carvalho de Melo @ 2006-05-04 15:16 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, mingo, arjan, netdev
In-Reply-To: <20060504071750.GA17180@gondor.apana.org.au>

On 5/4/06, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Wed, May 03, 2006 at 11:32:23PM -0700, David S. Miller wrote:
> >
> > Ok, it took me a while to review this one as verifying such
> > a set of changes is always tricky, but looks good!
>
> Thanks.  It took me quite a while to assure myself that it was OK
> as well :)
>
> Anyway, the same issue affects DCCP (and SCTP too probably).  Here
> is a patch that does the same thing for DCCP.
>
> [DCCP]: Fix sock_orphan dead lock
>
> Calling sock_orphan inside bh_lock_sock in dccp_close can lead to dead
> locks.  For example, the inet_diag code holds sk_callback_lock without
> disabling BH.  If an inbound packet arrives during that admittedly tiny
> window, it will cause a dead lock on bh_lock_sock.  Another possible
> path would be through sock_wfree if the network device driver frees the
> tx skb in process context with BH enabled.
>
> We can fix this by moving sock_orphan out of bh_lock_sock.
>
> The tricky bit is to work out when we need to destroy the socket
> ourselves and when it has already been destroyed by someone else.
>
> By moving sock_orphan before the release_sock we can solve this
> problem.  This is because as long as we own the socket lock its
> state cannot change.
>
> So we simply record the socket state before the release_sock
> and then check the state again after we regain the socket lock.
> If the socket state has transitioned to DCCP_CLOSED in the time being,
> we know that the socket has been destroyed.  Otherwise the socket is
> still ours to keep.
>
> This problem was discoverd by Ingo Molnar using his lock validator.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks Herbert,

Acked-by: Arnaldo Carvalho de Melo <acme@mandriva.com>

- Arnaldo

^ permalink raw reply

* Re: Question on e1000 patch, rx-copy-break related.
From: Jesse Brandeburg @ 2006-05-04 16:00 UTC (permalink / raw)
  To: Ben Greear; +Cc: NetDev, jeffrey.t.kirsher
In-Reply-To: <4458F2A2.60605@candelatech.com>

On 5/3/06, Ben Greear <greearb@candelatech.com> wrote:
> Jesse Brandeburg wrote:
> > On 5/2/06, Ben Greear <greearb@candelatech.com> wrote:
> >
> >> In commit:  a292ca6efbc1f259ddfb9c902367f2588e0e8b0f
> >> to e1000_main.c, there is the change below.
> >>
> >> I am curious why the skb_put no longer subtracts ETHERNET_FCS_SIZE
> >> from the length.  Is the idea that we will now always include the
> >> FCS at the end of the skb?
> >
> >
> > This is a long and hairy story behind this, but there is a bit called
> > SECRC that controls hardware stripping of the CRC.  In *this* driver
> > we turned that bit on, so that we didn't have to mess with skb->len -=
> > 4 and the nasty negative unwrap if we were using multiple descriptors
> > for rx.
> >
> > Since then, we've removed multiple descriptor rx.
> >
> > And after that, I've discovered that some BMCs are very unhappy if we
> > strip CRCs using SECRC.
> >
> > So, my related question is: is it okay if we just always leave the CRC
> > on the end of the data?  It doesn't seem to harm anything.
>
> I believe it might mess up bridging logic if it tries to send the entire
> skb to a peer interface, ie cause an extra 4 bytes to be written.

hm, good point, we'll look into that.

> My personal preference is to set a flag in the skb struct indicating whether
> or not the crc is appended (and skb_put).  Then, bridging code can ignore it if needed,
> and sniffers and such can get the CRC in user-land.  To remain backwards compat,
> at least the skb-put of the crc logic should default to OFF so that we don't
> break any existing user-land bridging logic.  I have the ethtool API logic written to
> twiddle this save-crc behaviour if someone decides this is worthy of the kernel.

that seems like an excellent idea, however, finding room in the skb
struct is fun.

> > Well, its a changing picture.  I had planned to eventually enable the
> > hardware to strip the CRC if we aren't connected to some kind of
> > offboard management.  We'll get there in steps.
>
> So, as of 2.6.16.13, is the hardware stripping (SERC) enabled?  Could
> you also let me know where this bit is defined in case I want to twiddle
> it myself (a quick grep for SERC in 2.6.16.13 yields nothing.)

Yes, SECRC is enabled in 2.6.16, for both packet split and legacy
receive paths.  It will probably stay enabled for 2.6.17 too unless
the BMC communication bug is rated important enough for the change to
be made.  Unfortunately right now due to SECRC bit being set BMC
communication over SMBUS is likely broken.

Jesse

^ permalink raw reply

* dBm cutoff at -1dBm is too low
From: Pavel Roskin @ 2006-05-04 16:37 UTC (permalink / raw)
  To: Jean Tourrilhes; +Cc: NetDev

Hello, Jean!

I'm converting Orinoco to the dBm reporting, and it turns out that the
best signal iwconfig will report is -1dBm (0.8mW).  This would happen if
qual->level has its highest value of 255.  Please see this code from
wireless_tools.29.pre10:

len = snprintf(buffer, buflen, "Signal level%c%d dBm  ",
               qual->updated & IW_QUAL_LEVEL_UPDATED ? '=' : ':',
               qual->level - 0x100);

With most cards transmitting at 100mW and some going as high as 500mW,
it's not unreasonable to expect that the received signal may exceed 1mW
for closely located receivers with good antennas.  I have seen HostAP
reporting as much as 3mW through the proc filesystem!

Wouldn't it be better to put the cutoff at a higher value?  The simplest
approach would be to treat qual->level and qual->noise as signed char,
which would put the cutoff and 127dBm.  500 gigawatts should be enough
for everyone :-)

-- 
Regards,
Pavel Roskin

^ permalink raw reply

* Re: [PATCH wireless-dev] d80211: Add support for user space client MLME
From: Jouni Malinen @ 2006-05-04 16:44 UTC (permalink / raw)
  To: Jiri Benc; +Cc: John W. Linville, netdev, jkmaline
In-Reply-To: <20060504110028.5af3a22a@griffin.suse.cz>

On Thu, May 04, 2006 at 11:00:28AM +0200, Jiri Benc wrote:

> Ok. So what about PRISM2_HOSTAPD_MGMT_IF ioctl that will add management
> interface (if not added yet) and return its ifindex? It would accept a
> boolean parameter (like PRISM2_PARAM_HOSTAPD) to add/remove that
> interface. If you agree with this, I will make a patch.

Yes, that would be fine.

> Of course, both hostapd and wpa_supplicant needs to be changed for that.
> This should not be a problem now as the d80211 stack is not in a common
> use yet.

Sure, no problems with that at the moment.

> So wpa_supplicant needs to know if the card is softmac or fullmac and
> behave accordingly, right?

Yes. Currently, I'm just using a configuration file parameter for this,
but this information should eventually be exported by the kernel code to
allow wpa_supplicant to select which mode to use without requiring user
configuration.

-- 
Jouni Malinen                                            PGP id EFC895FA

^ permalink raw reply

* Re: Question on e1000 patch, rx-copy-break related.
From: Ben Greear @ 2006-05-04 16:48 UTC (permalink / raw)
  To: Jesse Brandeburg; +Cc: NetDev, jeffrey.t.kirsher
In-Reply-To: <4807377b0605040900y31bfaec3o73cb04c5c9a2e4e@mail.gmail.com>

Jesse Brandeburg wrote:

>> My personal preference is to set a flag in the skb struct indicating 
>> whether
>> or not the crc is appended (and skb_put).  Then, bridging code can 
>> ignore it if needed,
>> and sniffers and such can get the CRC in user-land.  To remain 
>> backwards compat,
>> at least the skb-put of the crc logic should default to OFF so that we 
>> don't
>> break any existing user-land bridging logic.  I have the ethtool API 
>> logic written to
>> twiddle this save-crc behaviour if someone decides this is worthy of 
>> the kernel.
> 
> 
> that seems like an excellent idea, however, finding room in the skb
> struct is fun.

This field has 2 bits free.  We only need one bit for this feature.

	__u8			pkt_type:3,
				fclone:2,
				ipvs_property:1;

>> > Well, its a changing picture.  I had planned to eventually enable the
>> > hardware to strip the CRC if we aren't connected to some kind of
>> > offboard management.  We'll get there in steps.
>>
>> So, as of 2.6.16.13, is the hardware stripping (SERC) enabled?  Could
>> you also let me know where this bit is defined in case I want to twiddle
>> it myself (a quick grep for SERC in 2.6.16.13 yields nothing.)
> 
> 
> Yes, SECRC is enabled in 2.6.16, for both packet split and legacy
> receive paths.  It will probably stay enabled for 2.6.17 too unless
> the BMC communication bug is rated important enough for the change to
> be made.  Unfortunately right now due to SECRC bit being set BMC
> communication over SMBUS is likely broken.

Ok, thanks for the info.

Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

* [PATCH] Re: a question on tcp_highspeed.c (in 2.6.16)
From: John Heffner @ 2006-05-04 19:06 UTC (permalink / raw)
  To: Xiaoliang (David) Wei; +Cc: netdev
In-Reply-To: <7335583a0605040703x1d6e8a20n515b22241795d3ab@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 970 bytes --]

Xiaoliang (David) Wei wrote:
> Hi gurus,
> 
>    I am reading the code of tcp_highspeed.c in the kernel and have a
> question on the hstcp_cong_avoid function, specifically the following
> AI part (line 136~143 in net/ipv4/tcp_highspeed.c ):
> 
>                /* Do additive increase */
>                if (tp->snd_cwnd < tp->snd_cwnd_clamp) {
>                        tp->snd_cwnd_cnt += ca->ai;
>                        if (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
>                                tp->snd_cwnd++;
>                                tp->snd_cwnd_cnt -= tp->snd_cwnd;
>                        }
>                }
> 
>    In this part, when (tp->snd_cwnd_cnt == tp->snd_cwnd),
> snd_cwnd_cnt will be -1... snd_cwnd_cnt is defined as u16, will this
> small chance of getting -1 becomes a problem?
> Shall we change it by reversing the order of the cwnd++ and cwnd_cnt -= 
> cwnd?

Absolutely correct.  Thanks.

Signed-off-by: John Heffner <jheffner@psc.edu>

[-- Attachment #2: highspeed_cwnd_cnt.patch --]
[-- Type: text/plain, Size: 434 bytes --]

diff --git a/net/ipv4/tcp_highspeed.c b/net/ipv4/tcp_highspeed.c
index e0e9d13..b72fa55 100644
--- a/net/ipv4/tcp_highspeed.c
+++ b/net/ipv4/tcp_highspeed.c
@@ -137,8 +137,8 @@ static void hstcp_cong_avoid(struct sock
 		if (tp->snd_cwnd < tp->snd_cwnd_clamp) {
 			tp->snd_cwnd_cnt += ca->ai;
 			if (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
-				tp->snd_cwnd++;
 				tp->snd_cwnd_cnt -= tp->snd_cwnd;
+				tp->snd_cwnd++;
 			}
 		}
 	}

^ permalink raw reply related

* Re: [PATCH] bcm43xx-d80211: proper implementation of virtual interface support
From: Marcus Better @ 2006-05-04 19:26 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20060502132037.059a7cb2@griffin.suse.cz>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jiri Benc wrote:
> This is unnecessary. AFAIK bcm43xx hardware doesn't support more than one
> interface at a time (not counting monitor interfaces as those are somewhat
> special).

I wonder what exactly is causing this limitation? Why isn't it just a matter
of software support?

Marcus

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)

iD8DBQFEWlUrXjXn6TzcAQkRAoa6AJ97kI8F7Q42orjNQC6HBIXwj3I/DwCgux18
RBkhEcOFBDtQGEHFq2eMxKU=
=3WoL
-----END PGP SIGNATURE-----


^ permalink raw reply

* [rfc][patch] ipvs: use proper timeout instead of fixed value
From: Andy Gospodarek @ 2006-05-04 20:11 UTC (permalink / raw)
  To: netdev; +Cc: wensong, horms, ja


Instead of using the default timeout of 3 minutes, this uses the timeout
specific to the protocol used for the connection. The 3 minute timeout
seems somewhat arbitrary (though I know it is used other places in the
ipvs code) and when failing over it would be much nicer to use one of
the configured timeout values.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
---

 ip_vs_sync.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ipvs/ip_vs_sync.c b/net/ipv4/ipvs/ip_vs_sync.c
--- a/net/ipv4/ipvs/ip_vs_sync.c
+++ b/net/ipv4/ipvs/ip_vs_sync.c
@@ -67,7 +67,6 @@ struct ip_vs_sync_conn_options {
 	struct ip_vs_seq        out_seq;        /* outgoing seq. struct */
 };
 
-#define IP_VS_SYNC_CONN_TIMEOUT (3*60*HZ)
 #define SIMPLE_CONN_SIZE  (sizeof(struct ip_vs_sync_conn))
 #define FULL_CONN_SIZE  \
 (sizeof(struct ip_vs_sync_conn) + sizeof(struct ip_vs_sync_conn_options))
@@ -279,6 +278,7 @@ static void ip_vs_process_message(const 
 	struct ip_vs_sync_conn *s;
 	struct ip_vs_sync_conn_options *opt;
 	struct ip_vs_conn *cp;
+	struct ip_vs_protocol *pp;
 	char *p;
 	int i;
 
@@ -337,7 +337,8 @@ static void ip_vs_process_message(const 
 			p += SIMPLE_CONN_SIZE;
 
 		atomic_set(&cp->in_pkts, sysctl_ip_vs_sync_threshold[0]);
-		cp->timeout = IP_VS_SYNC_CONN_TIMEOUT;
+		pp = ip_vs_proto_get(s->protocol);
+		cp->timeout = pp->timeout_table[cp->state];
 		ip_vs_conn_put(cp);
 
 		if (p > buffer+buflen) {

^ permalink raw reply

* Re: [PATCH] bcm43xx-d80211: proper implementation of virtual interface support
From: Ivo van Doorn @ 2006-05-04 20:55 UTC (permalink / raw)
  To: Marcus Better; +Cc: netdev
In-Reply-To: <e3dkfd$pc5$1@sea.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1071 bytes --]

On Thursday 4 May 2006 21:26, Marcus Better wrote:
> Jiri Benc wrote:
> > This is unnecessary. AFAIK bcm43xx hardware doesn't support more than one
> > interface at a time (not counting monitor interfaces as those are somewhat
> > special).
> 
> I wonder what exactly is causing this limitation? Why isn't it just a matter
> of software support?

Hardware limitations probably. For example the Ralink devices.
The MAC address must be written into the registers for correct behaviour.
When multiple virtual interfaces are going to be used, the device
should be working under multiple MAC addresses. While the device
could be tricked into doing that (I believe rt61 and rt73 drivers from Ralink
are making such an attempt) it could give quite some overhead when for
each frame that is being send the driver should sort out from which
virtual interface it came, write the matching MAC address to the register,
and hope that the device will like it when the MAC address is changed so often
and the RX is not being affected because of some hardware filter.

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Re: [PATCH] bcm43xx-d80211: proper implementation of virtual interface support
From: Jiri Benc @ 2006-05-04 22:22 UTC (permalink / raw)
  To: Ivo van Doorn; +Cc: Marcus Better, netdev
In-Reply-To: <200605042256.02333.IvDoorn@gmail.com>

Thu, 4 May 2006 22:55:58 +0200, Ivo van Doorn pise:
> On Thursday 4 May 2006 21:26, Marcus Better wrote:
> > I wonder what exactly is causing this limitation? Why isn't it just a
> > matter of software support?
> 
> Hardware limitations probably. For example the Ralink devices.
> The MAC address must be written into the registers for correct behaviour.
> When multiple virtual interfaces are going to be used, the device
> should be working under multiple MAC addresses. While the device
> could be tricked into doing that (I believe rt61 and rt73 drivers from
> Ralink are making such an attempt) it could give quite some overhead when
> for each frame that is being send the driver should sort out from which
> virtual interface it came, write the matching MAC address to the register,
> and hope that the device will like it when the MAC address is changed so
> often and the RX is not being affected because of some hardware filter.

The actual problem is with receiving. ACK frames need to be sent exactly
after SIFS interval (10 or 16 microseconds depending on a PHY, +/- 10%)
after last bit of the frame was received. This means it cannot be done in a
software. You need to tell the hardware about MAC address you are using and
hardware will ACK all frames destined to that address (well, actually only
that received with valid FCS, but it doesn't matter). And most hardware
don't allow to specify more MAC addresses.

 Jiri

-- 
Jiri Benc
SUSE Labs

^ permalink raw reply

* Re: VJ Channel API - driver level (PATCH)
From: Alex Aizman @ 2006-05-04 22:49 UTC (permalink / raw)
  To: Caitlin Bestler
  Cc: David S. Miller, johnpol, Leonid.Grossman, shemminger, netdev
In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F149E8D5@NT-SJCA-0751.brcm.ad.broadcom.com>

So, what are the requirements? The hardware already parses L2, L3, L4 headers, 
and for the future generation we could add to the set of already supported 
steering/filtering criteria. Having some discussion on the essential vs. 
optional requirements seems like the right thing at this point.

On one hand, this describes what's available:

http://www.spinics.net/lists/netdev/msg04001.html

OTOH, and this is just my opinion - it'd be unrealistic to expect a general 
purpose NIC to offload the entire netfilter. On the third hand, one could 
think of IPsec and/or NAT, and what happens then with the hardware-supported 
filtering. And so on.

There's also a question of relative importance specifically for the Data 
Center environment.

Anyway, discussion would help.

Caitlin Bestler wrote:
> David S. Miller wrote:
>> From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
>> Date: Wed, 3 May 2006 22:07:40 +0400
>>
>>> On Wed, May 03, 2006 at 08:56:23AM -0700, Caitlin Bestler
>> (caitlinb@broadcom.com) wrote:
>>>>> I'd expect high end NIC ASICs to implement rx steering based upon
>>>>> some sort of hash (for load balancing), as well as explicit "1:1"
>>>>> steering between a sw channel and a hw channel. Both options for
>>>>> channel configuration are present in the driver interface.
>>>>> If netfilter assists can be done in hardware, I agree the driver
>>>>> interface will need to add support for these - otherwise,
>>>>> netfilter processing will stay above the driver.
>>>>>
>>>>>
>>>> Even if the hardware cannot fully implement netfilter rules there is
>>>> still value in having an interface that documents exactly how much
>>>> filtering a given piece of hardware can do.
>>>> There is no point in having the kernel repeat packet classifications
>>>> that have already been done by the NIC.
>>> Please do not suppose that vj channel must rely on underlaying
>>> hardware. 
>> I am not.  I am just saying that it is futile to build
>> hardware that cannot handle netfilter at least to some
>> extent, because this will result in HW net channels being
>> disabled for %99 of real users which makes the hardware just a waste.
> 
> Or netfilters being disabled, which would be just as bad or worse.
> The kernel and hardware need to co-operate so that users are not
> asked to make artificial choices between performance and security.
> 
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply

* Re: VJ Channel API - driver level (PATCH)
From: David S. Miller @ 2006-05-04 23:04 UTC (permalink / raw)
  To: alex; +Cc: caitlinb, johnpol, Leonid.Grossman, shemminger, netdev
In-Reply-To: <445A84E7.1060906@neterion.com>

From: Alex Aizman <alex@neterion.com>
Date: Thu, 04 May 2006 15:49:11 -0700

> So, what are the requirements?

I will say it a 10th time, "we simply don't know yet."

Please be patient and let us design the net channel infrastructure
properly, then we can think clearly about how hardware might support
things.

Hardware folks are jumping the gun and it's very annoying and takes
precious time away from thinking about and working on the
implementation.

^ permalink raw reply

* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
From: David S. Miller @ 2006-05-04 23:11 UTC (permalink / raw)
  To: kelly; +Cc: netdev, rusty
In-Reply-To: <200605041728.27131.kelly@au.ibm.com>

From: Kelly Daly <kelly@au1.ibm.com>
Date: Thu, 4 May 2006 17:28:27 +1000

> On Wednesday 26 April 2006 17:59, David S. Miller wrote:
> > Next, you can't even begin to work on the protocol channels before you
> > do one very important piece of work.  Integration of all of the ipv4
> > and ipv6 protocol hash tables into a central code, it's a total
> > prerequisite.  Then you modify things to use a generic
> > inet_{,listen_}lookup() or inet6_{,listen_}lookup() that takes a
> > protocol number as well as saddr/daddr/sport/dport and searches
> > from a central table.
> 
> Back here again  ;)
> 
> Is this on the right track (see patch below)?

It is on the right track.

I very much fear abuse of the inet_hashes[] array.  So I'd rather
hide it behind a programmatic interface, something like:

extern struct sock *inet_lookup_proto(u16 protocol, u32 saddr, u16 sport,
				      u32 daddr, u16 dport, int ifindex);

and export that from inet_hashtables.c

Then you have registry and unregistry functions in inet_hashtables.c
that setup the static inet_hashes[] array.  So TCP would go:

	inet_hash_register(IPPROTO_TCP, &tcp_hashinfo);

instead of the direct assignment to inet_hashes[] it makes right
now in your patch.

Thanks!

^ permalink raw reply

* pci_enable_msix throws up error
From: Ravinandan Arakali @ 2006-05-04 23:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ananda. Raju, netdev, Leonid Grossman

Hi,
I am seeing the following problem with MSI/MSI-X.

Note: I am copying netdev since other network drivers use
this feature and somebody on the list could throw light.

Our 10G network card(Xframe II) supports MSI and MSI-X.
When I load/unload the driver with MSI support followed
by an attempt to load with MSI-X, I get the following
message from pci_enable_msix:

"Can't enable MSI-X.  Device already has an MSI vector assigned"

I seem to be doing the correct things when unloading the
MSI driver. Basically, I do free_irq() followed by pci_disable_msi().
Any idea what I am missing ?

Further analysis:
Looking at the code, the following check(when it finds a match) in
msi_lookup_vector(called by pci_enable_msix) seems to throw up this
message:
if (!msi_desc[vector] || msi_desc[vector]->dev != dev ||
    msi_desc[vector]->msi_attrib.type != type ||
    msi_desc[vector]->msi_attrib.default_vector != dev->irq)

pci_enable_msi, on successful completion will populate the
fields in msi_desc. But neither pci_disable_msi nor free_irq
seems to undo/unpopulate the msi_desc table.
Could this be the cause for the problem ?

Thanks,
Ravi

^ permalink raw reply

* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
From: David S. Miller @ 2006-05-04 23:22 UTC (permalink / raw)
  To: kelly; +Cc: rusty, netdev
In-Reply-To: <200605041259.23204.kelly@au.ibm.com>

From: Kelly Daly <kelly@au1.ibm.com>
Date: Thu, 4 May 2006 12:59:23 +1000

> We DID write an infrastructure to resolve this issue, although it is more 
> complex than the dynamic descriptor scheme for userspace.  And we want to 
> keep this simple - right?

Yes.

I wonder if it is possible to manage the buffer pool just like a SLAB
cache to deal with the variable lifetimes.  The system has a natural
"working set" size of networking buffers at a given point in time and
even the default net channel can grow to accomodate that with some
kind of limit.

This is kind of what I was alluding to in the past, in that we now
have globals limits on system TCP socket memory when really what we
want to do is have a set of global generic system packet memory
limits.

These two things can tie in together.

Note that this means we need a callback in the SKB to free the memory
up.  For direct net channels to a socket, you don't need any callbacks
of course because as you mentioned you know the buffer lifetimes.

People want such a callback anyways in order to experiment with SKB
recycling in drivers.

Note that some kind of "shrink" callback would need to be implemented.
It would only be needed for the default channel.  We need to seriously
avoid needing something like this over the socket net channels because
that is serious complexity.

Finally... if we go the global packet memory route, we will need hard
and soft limits.  There is a danger in such a scheme of not being able
to get critical control packets out (ACKs, etc.).  Also, there are all
kinds of classification and drop algorithms (see RED) which could be
used to handle overload situations gracefully.

So, are you still sure you want to do away with the descriptors for
the default channel?  Is the scheme I have outlined above doable or
is there some critical barrier or some complexity issue which makes
it undesirable?

^ permalink raw reply

* Re: latest -stable breaks Squid
From: David S. Miller @ 2006-05-04 23:25 UTC (permalink / raw)
  To: imcdnzl; +Cc: greearb, herbert, davej, netdev, stable, fedora-list, fedoralist
In-Reply-To: <cbec11ac0605031859n5ef0babcib5f970da1ced7df7@mail.gmail.com>

From: "Ian McDonald" <imcdnzl@gmail.com>
Date: Thu, 4 May 2006 13:59:04 +1200

> Wouldn't it be more likely commit 5d0b6f2bdaf7e016e750cd24164a241512d968a3
> 
> as this touches net/ipv4/tcp_output.c and is also in same general area?

This commit makes us account transmit memory properly.  Previously we
were underaccounting which is a serious error and in fact could result
in assertion failures due to sk->sk_forward_alloc going negative if
things were just right.

If this change is what makes an application go slower, then the
problem is likely that the socket send buffer limits are not being set
large enough.

That being said, the first thing that should be tried is reverting
the above mentioned change and see if the problem goes away.  If
so, then we need to investigate what the bandwidth delay product is
for the connection, and whether the socket send buffer is set large
enough for that size of pipe.

^ permalink raw reply

* [PATCH] bcm43xx: check for valid MAC address in SPROM
From: Stefano Brivio @ 2006-05-04 23:26 UTC (permalink / raw)
  To: John W. Linville, Andrew Morton, bcm43xx-dev, netdev

Please apply to 2.6.17, as it fixes a problem that prevents bcm43xx devices
which support 802.11a in addition to 802.11b/g from working, as the MAC
address isn't detected correctly. This applies to 2.6.17-rc3.

--

Check for valid MAC address in SPROM fields instead of relying on PHY type
while setting the MAC address in the networking subsystem, as some devices
have multiple PHYs.

Signed-off-by: Stefano Brivio <stefano.brivio@polimi.it>

Index: linux-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c
===================================================================
--- linux-2.6.orig/drivers/net/wireless/bcm43xx/bcm43xx_main.c	2006-05-05 00:50:00.370034536 +0200
+++ linux-2.6/drivers/net/wireless/bcm43xx/bcm43xx_main.c	2006-05-05 00:50:03.926493872 +0200
@@ -3482,7 +3482,7 @@
	bcm43xx_pctl_set_crystal(bcm, 0);

	/* Set the MAC address in the networking subsystem */
-	if (bcm43xx_current_phy(bcm)->type == BCM43xx_PHYTYPE_A)
+	if (is_valid_ether_addr(bcm->sprom.et1macaddr))
		memcpy(bcm->net_dev->dev_addr, bcm->sprom.et1macaddr, 6);
	else
		memcpy(bcm->net_dev->dev_addr, bcm->sprom.il0macaddr, 6);

--
Ciao
Stefano

^ permalink raw reply

* Re: latest -stable breaks Squid
From: Herbert Xu @ 2006-05-04 23:30 UTC (permalink / raw)
  To: David S. Miller
  Cc: imcdnzl, greearb, davej, netdev, stable, fedora-list, fedoralist
In-Reply-To: <20060504.162546.88959729.davem@davemloft.net>

On Thu, May 04, 2006 at 04:25:46PM -0700, David S. Miller wrote:
> 
> That being said, the first thing that should be tried is reverting
> the above mentioned change and see if the problem goes away.  If
> so, then we need to investigate what the bandwidth delay product is
> for the connection, and whether the socket send buffer is set large
> enough for that size of pipe.

I've tracked the networking breakage down to a broken Xen patch.
So 2.6.16.13 is all OK.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: how to change classful netem loss probability?
From: George P Nychis @ 2006-05-04 23:35 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: lartc, netdev
In-Reply-To: <20060428102421.7d304ca6@localhost.localdomain>

where did you send this patch?

I don't think I ever got it

> Loss was broken, patch sent.
> 
> The following works now:
> 
> # tc qdisc add dev eth1 root handle 1:0 netem loss 20%
> 
> # tc qdisc add dev eth1 parent 1:1 handle 10: tbf \ rate 256kbit buffer
> 1600 limit 3000 # ping -f -c 1000 shell
> 
> 1000 packets transmitted, 781 received, 21% packet loss, time 3214ms rtt
> min/avg/max/mdev = 0.187/0.398/3.763/0.730 ms, ipg/ewma 3.217/0.538 ms
> 
> # tc qdisc chang dev eth1 handle 1: netem loss 1% # ping -f -c 1000 shell
> 
> 1000 packets transmitted, 990 received, 1% packet loss, time 2922ms rtt
> min/avg/max/mdev = 0.187/2.739/3.298/0.789 ms, ipg/ewma 2.924/2.084 ms
> 
> 
> 
> 


-- 

^ permalink raw reply

* [PATCH] bridge: keep track of received multicast packets
From: Stephen Hemminger @ 2006-05-04 23:10 UTC (permalink / raw)
  To: Paul C. Diem, David S. Miller; +Cc: bridge, netdev
In-Reply-To: <FFEPKIHJDEDDAAFOFDGFMEJGCBAA.PCDiem@FoxValley.net>

It makes sense to add this simple statistic to keep track of received
multicast packets.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>

--- bridge.orig/net/bridge/br_input.c	2006-04-21 14:28:55.000000000 -0700
+++ bridge/net/bridge/br_input.c	2006-05-04 16:07:24.000000000 -0700
@@ -66,6 +66,7 @@
 	}

 	if (is_multicast_ether_addr(dest)) {
+		br->statistics.multicast++;
 		br_flood_forward(br, skb, !passedup);
 		if (!passedup)
 			br_pass_frame_up(br, skb);

^ permalink raw reply

* Re: latest -stable breaks Squid
From: Dave Jones @ 2006-05-04 23:47 UTC (permalink / raw)
  To: David S. Miller
  Cc: imcdnzl, greearb, herbert, netdev, stable, fedora-list,
	fedoralist
In-Reply-To: <20060504.162546.88959729.davem@davemloft.net>

On Thu, May 04, 2006 at 04:25:46PM -0700, David S. Miller wrote:

 > That being said, the first thing that should be tried is reverting
 > the above mentioned change and see if the problem goes away.  If
 > so, then we need to investigate what the bandwidth delay product is
 > for the connection, and whether the socket send buffer is set large
 > enough for that size of pipe.

It's now believed (after some detective work from Herbert) that
this round of problems wasn't caused by the -stable patch, but
by a bogus update to Xen which we carry in the Fedora kernel
that sneaked in without a changelog (which is why I didn't even suspect that thing
given it worked fine previously).

Until today I had no idea just how much that thing poked into net/

Ugh.  Apologies for the false alarm.

		Dave

-- 
http://www.codemonkey.org.uk

^ permalink raw reply

* Re: [PATCH 2/2] ipg: redundancy with mii.h
From: Francois Romieu @ 2006-05-04 23:55 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: David Vrabel, linux-kernel, netdev, david
In-Reply-To: <1146750276.11422.0.camel@localhost>

Pekka Enberg <penberg@cs.helsinki.fi> :
[...]
> My initial patches are here:
> 
> http://www.cs.helsinki.fi/u/penberg/linux/ip1000/

Nice. Thanks.

> I consider your tree the master now. Please let me know when you have
> recreated the history, so I can clone your tree. Thanks.

The new serie is available in branch 'netdev-ipg' at
git://electric-eye.fr.zoreil.com/home/romieu/linux-2.6.git

Former branch 'netdev-ipg' was moved to 'netdev-ipg.log'.

I have added a new patch (see (way) below).

$ git log master..

commit 17adeb85054a693224a2ab7787a224c722bdd4eb
Author: Francois Romieu <romieu@fr.zoreil.com>
Date:   Fri May 5 01:42:12 2006 +0200

    ip1000: ethtool {get/set}_settings support
    
    The patch shoehorns the driver into the usual mii/ethtool framework.
    mii_mutex will prove useful when the link management tasks issued in
    irq context are moved to user/workqueue context. Next step.
    
    At least the current code should not be _too_ broken.
    
    Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>

commit 997a02f4c4b3cf2ec0177ba33ffda6d550396eb7
Author: Francois Romieu <romieu@fr.zoreil.com>
Date:   Thu May 4 00:29:59 2006 +0200

    ip1000: remove forward declarations
    
    It makes no sense in a new driver.
    
    Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>

commit b141c4ec3572f3151b2eb037ef5e7aba8d190445
Author: Francois Romieu <romieu@fr.zoreil.com>
Date:   Thu May 4 00:04:57 2006 +0200

    ip1000: replace #define with enum
    
    Added some underscores to improve readability.
    
    Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>

commit 9d27e1ab51b2295d72ea254bbc8de5b5797a4139
Author: Francois Romieu <romieu@fr.zoreil.com>
Date:   Wed May 3 22:51:16 2006 +0200

    ip1000: removal of useless #defines
    
    IPG_TX_NOTBUSY apart (one occurence in ipg.c), the #defines appear
    nowhere in the sources.
    
    Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>

commit 42ad0c39990c1897ee56dfc34bb4a7b2de76ace0
Author: Francois Romieu <romieu@fr.zoreil.com>
Date:   Wed May 3 22:44:47 2006 +0200

    ip1000: redundancy with mii.h - take II
    
    Replace a bunch of #define with their counterpart from mii.h
    
    It is applied to the usual MII registers this time.
    
    Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>

commit be97a643f5d955208c0be7aa9e16a05c7737a2be
Author: Romieu Francois <romieu@fr.zoreil.com>
Date:   Tue May 2 23:25:44 2006 +0200

    ip1000: redundancy with mii.h
    
    Replace a bunch of #define with their counterpart from mii.h
    
    Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>

commit 273a5a5f0815a35cf06faeafea6fc47503d364cb
Author: Romieu Francois <romieu@fr.zoreil.com>
Date:   Tue May 2 22:15:34 2006 +0200

    ip1000: sanitize the pci device table
    
    - vendor id belong to include/linux/pci_id.h ;
    - the pci table does not include all the devices in nics_supported ;
    - qualify the pci table as __devinitdata ;
    - kill 50 LOC.
    
    Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>

commit e1e773d79369783d68fd7f66fffbe154929df837
Author: Romieu Francois <romieu@fr.zoreil.com>
Date:   Tue May 2 01:07:48 2006 +0200

    ip1000: plug leaks in the error path of ipg_nic_open
    
    Added ipg_{rx/tx}_clear() to factor out some code.
    
    Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>

commit 9cb655fddfd2d169bc7574e849fa5eb179fef12c
Author: Romieu Francois <romieu@fr.zoreil.com>
Date:   Tue May 2 00:15:54 2006 +0200

    ip1000: leaks in ipg_probe
    
    The error paths are badly broken.
    
    Bonus:
    - remove duplicate initialization of sp;
    - remove useless NULL initialization of dev;
    - USE_IO_OPS is not used (and the driver does not seem to care about
      posted writes, rejoice).
    
    Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>

commit 2b5d40c1c4cf795c31d2683bcdff646a8984115d
Author: Romieu Francois <romieu@fr.zoreil.com>
Date:   Mon May 1 23:52:37 2006 +0200

    ip1000: removal of unreachable code
    
    map/unmap is done in ipg_{probe/remove}
    
    Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>

commit 5b8bde4d4c02e3d54672d0c431941d9357c35417
Author: Romieu Francois <romieu@fr.zoreil.com>
Date:   Mon May 1 22:40:29 2006 +0200

    ip1000: speed-up access to the PHY registers
    
    Reduce delays when reading/writing the PHY registers so we clock the
    MII management interface at 2.5 MHz (the maximum according to the
    datasheet) instead of 500 Hz.
    
    Signed-off-by: David Vrabel <dvrabel@cantab.net>

commit d4a621ceadb0e4ee83a644be4ffef8bdef687249
Author: David Vrabel <dvrabel@cantab.net>
Date:   Mon May 1 21:34:19 2006 +0200

    ip1000: root_dev removal and PHY initialization
    
    - Remove ether_crc_le() -- use crc32_le() instead.
    - No more nonsense with root_dev -- ipg_remove() now works.
    - Move PHY and MAC address initialization into the ipg_probe().
      It was previously filling in the MAC address on open which breaks
      some user space.
    - Folded ipg_nic_init into ipg_probe since it was broke otherwise.
    
    Signed-off-by: David Vrabel <dvrabel@cantab.net>

commit ae6f7b51bfb69a60062d390f10c83396c37f0301
Author: David Vrabel <dvrabel@cantab.net>
Date:   Mon May 1 13:20:49 2006 +0200

    ip1000: remove changelogs
    
    Signed-off-by: David Vrabel <dvrabel@cantab.net>

commit 7aeb86e2db5deac7d8357d6d794550999fd5e7c6
Author: Pekka Enberg <penberg@cs.helsinki.fi>
Date:   Sun Apr 30 12:45:16 2006 +0300

    ip1000: alloc_etherdev already allocates memory for private data
    
    Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>

commit 49b94ecdf8124c988160585d460b1cc643ee07e0
Author: Pekka Enberg <penberg@cs.helsinki.fi>
Date:   Sun Apr 30 12:42:02 2006 +0300

    ip1000: remove unused forward declarations
    
    Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>

commit e85531a2d8c23b6ff17b3a8cd6d14e6c09954240
Author: Pekka Enberg <penberg@cs.helsinki.fi>
Date:   Sun Apr 30 12:16:58 2006 +0300

    ip1000: interrupt handler code cleanups
    
    Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>

commit 822b9f528c47df857514e33d1ad98a9a551a3b00
Author: Pekka Enberg <penberg@cs.helsinki.fi>
Date:   Sun Apr 30 12:05:09 2006 +0300

    ip1000: rename baseaddr to ioaddr
    
    Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>

commit e8e5f8dece06429793074df6f956a8c4cb60c362
Author: Pekka Enberg <penberg@cs.helsinki.fi>
Date:   Sun Apr 30 12:02:51 2006 +0300

    ip1000: remove device register write macros
    
    Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>

commit 8e1a207d9e0d355aa2949ef4e5710e8c8987b01b
Author: Pekka Enberg <penberg@cs.helsinki.fi>
Date:   Sun Apr 30 11:41:01 2006 +0300

    ip1000: kill obfuscating register read macros
    
    Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>

commit fd3e0c5b3599375024e1e7fc3ddb00f7f5bc0c53
Author: Pekka Enberg <penberg@cs.helsinki.fi>
Date:   Sun Apr 30 00:03:23 2006 +0300

    ip1000: use iomap for pio/mmio
    
    Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>

commit 7ed56b01680978afc5db87f684de81398366bf84
Author: Pekka Enberg <penberg@cs.helsinki.fi>
Date:   Sat Apr 29 14:02:34 2006 +0300

    ip1000: use static for private symbols
    
    Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>

commit b83bac329befd061bb66377d33b1efd4dd39a931
Author: Pekka Enberg <penberg@cs.helsinki.fi>
Date:   Sat Apr 29 13:19:36 2006 +0300

    ip1000: sparse fixes
    
    Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>

commit 65229dd8a40d887364f3aa60277ac3f0e8c64041
Author: Pekka Enberg <penberg@cs.helsinki.fi>
Date:   Sat Apr 29 13:06:27 2006 +0300

    ip1000: iomem annotations
    
    Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>

commit d79a8a846188eee278f5b490b7f15c070f1245d4
Author: Pekka Enberg <penberg@cs.helsinki.fi>
Date:   Sat Apr 29 12:41:45 2006 +0300

    ip1000: new gigabit ethernet device driver
    
    Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>

----------------8<-----------------------------------------------------------

Patch of the day:

The patch shoehorns the driver into the usual mii/ethtool framework.
mii_mutex will prove useful when the link management tasks issued in
irq context are moved to user/workqueue context. Next step.

At least the current code should not be _too_ broken.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>

---

 drivers/net/ipg.c |  344 ++++++++++++++++++++---------------------------------
 drivers/net/ipg.h |    3 
 2 files changed, 130 insertions(+), 217 deletions(-)

8e15d782fa4a0f483a71faec7ba17697d9397e76
diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index 31cde6c..5d19269 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -15,7 +15,9 @@
  * craig_rich@sundanceti.com
  */
 #include <linux/crc32.h>
+#include <linux/ethtool.h>
 #include <linux/mii.h>
+#include <linux/mutex.h>
 
 #define IPG_RX_RING_BYTES	(sizeof(struct RFD) * IPG_RFDLIST_LENGTH)
 #define IPG_TX_RING_BYTES	(sizeof(struct TFD) * IPG_TFDLIST_LENGTH)
@@ -161,8 +163,7 @@ static u16 read_phy_bit(void __iomem * i
 	return bit_data;
 }
 
-u16 read_phy_register(struct net_device * dev,
-		      int phy_address, int phy_register)
+static int mdio_read(struct net_device * dev, int phy_address, int phy_register)
 {
 	/* Read a register from the Physical Layer device located
 	 * on the IPG NIC, using the IPG PHYCTRL register.
@@ -176,7 +177,7 @@ u16 read_phy_register(struct net_device 
 	u8 databit;
 	u8 phyctrlpolarity;
 
-	IPG_DEBUG_MSG("read_phy_register\n");
+	IPG_DEBUG_MSG("mdio_read\n");
 
 	/* The GMII mangement frame structure for a read is as follows:
 	 *
@@ -274,8 +275,8 @@ u16 read_phy_register(struct net_device 
 	return field[6];
 }
 
-static void write_phy_register(struct net_device *dev,
-			       int phy_address, int phy_register, u16 writeval)
+static void mdio_write(struct net_device *dev,
+		       int phy_address, int phy_register, int val)
 {
 	/* Write to a register from the Physical Layer device located
 	 * on the IPG NIC, using the IPG PHYCTRL register.
@@ -289,7 +290,7 @@ static void write_phy_register(struct ne
 	u8 databit;
 	u8 phyctrlpolarity;
 
-	IPG_DEBUG_MSG("write_phy_register\n");
+	IPG_DEBUG_MSG("mdio_write\n");
 
 	/* The GMII mangement frame structure for a read is as follows:
 	 *
@@ -318,7 +319,7 @@ static void write_phy_register(struct ne
 	fieldlen[4] = 5;	/* REGAD */
 	field[5] = 0x0002;
 	fieldlen[5] = 2;	/* TA */
-	field[6] = writeval;
+	field[6] = val & 0xffff;
 	fieldlen[6] = 16;	/* DATA */
 	field[7] = 0x0000;
 	fieldlen[7] = 1;	/* IDLE */
@@ -486,7 +487,7 @@ static int ipg_tmi_fiber_detect(struct n
 
 	IPG_DEBUG_MSG("_tmi_fiber_detect\n");
 
-	phyid = read_phy_register(dev, phyaddr, MII_PHYSID1);
+	phyid = mdio_read(dev, phyaddr, MII_PHYSID1);
 
 	IPG_DEBUG_MSG("PHY ID = %x\n", phyid);
 
@@ -497,15 +498,14 @@ static int ipg_tmi_fiber_detect(struct n
 }
 #endif
 
+/* Find the GMII PHY address. */
 static int ipg_find_phyaddr(struct net_device *dev)
 {
-	/* Find the GMII PHY address. */
-
-	int i;
-	int phyaddr;
-	u32 status;
+	int phyaddr, i;
 
 	for (i = 0; i < 32; i++) {
+		u32 status;
+
 		/* Search for the correct PHY address among 32 possible. */
 		phyaddr = (IPG_NIC_PHY_ADDRESS + i) % 32;
 
@@ -513,13 +513,13 @@ static int ipg_find_phyaddr(struct net_d
 		   GMII_PHY_ID1
 		 */
 
-		status = read_phy_register(dev, phyaddr, MII_BMSR);
+		status = mdio_read(dev, phyaddr, MII_BMSR);
 
 		if ((status != 0xFFFF) && (status != 0))
 			return phyaddr;
 	}
 
-	return -1;
+	return 0x1f;
 }
 
 #ifdef NOTGRACE
@@ -618,10 +618,11 @@ #endif
 		       dev->name);
 		return -EILSEQ;
 	}
+	sp->mii_if.phy_id = phyaddr;
 
 	IPG_DEBUG_MSG("GMII/MII PHY address = %x\n", phyaddr);
 
-	status = read_phy_register(dev, phyaddr, MII_BMSR);
+	status = mdio_read(dev, phyaddr, MII_BMSR);
 
 	printk("PHYStatus = %x \n", status);
 	if ((status & BMSR_ANEGCAPABLE) == 0) {
@@ -631,8 +632,8 @@ #endif
 		return -EILSEQ;
 	}
 
-	advertisement = read_phy_register(dev, phyaddr, MII_ADVERTISE);
-	linkpartner_ability = read_phy_register(dev, phyaddr, MII_LPA);
+	advertisement = mdio_read(dev, phyaddr, MII_ADVERTISE);
+	linkpartner_ability = mdio_read(dev, phyaddr, MII_LPA);
 
 	printk("PHYadvertisement=%x LinkPartner=%x \n", advertisement,
 	       linkpartner_ability);
@@ -660,8 +661,8 @@ #endif
 			/* In 1000BASE-X using IPG's internal PCS
 			 * layer, so write to the GMII duplex bit.
 			 */
-			write_phy_register(dev, phyaddr, MII_BMCR,
-				read_phy_register(dev, phyaddr, MII_BMCR) |
+			mdio_write(dev, phyaddr, MII_BMCR,
+				mdio_read(dev, phyaddr, MII_BMCR) |
 					   ADVERTISE_1000HALF); // Typo ?
 
 		} else {
@@ -670,8 +671,8 @@ #endif
 			/* In 1000BASE-X using IPG's internal PCS
 			 * layer, so write to the GMII duplex bit.
 			 */
-			write_phy_register(dev, phyaddr, MII_BMCR,
-				read_phy_register(dev, phyaddr, MII_BMCR) &
+			mdio_write(dev, phyaddr, MII_BMCR,
+				mdio_read(dev, phyaddr, MII_BMCR) &
 					   ~ADVERTISE_1000HALF); // Typo ?
 		}
 	}
@@ -683,10 +684,8 @@ #endif
 		 * link partner abilities exchanged via next page
 		 * transfers.
 		 */
-		gigadvertisement =
-			read_phy_register(dev, phyaddr, MII_CTRL1000);
-		giglinkpartner_ability =
-			read_phy_register(dev, phyaddr, MII_STAT1000);
+		gigadvertisement = mdio_read(dev, phyaddr, MII_CTRL1000);
+		giglinkpartner_ability = mdio_read(dev, phyaddr, MII_STAT1000);
 
 		/* Compare the full duplex bits in the 1000BASE-T GMII
 		 * registers for the local device, and the link partner.
@@ -795,11 +794,10 @@ #define LPA_PAUSE_ANY	(LPA_1000XPAUSE_AS
 			/* PAUSE is not being advertised. Advertise
 			 * PAUSE and restart auto-negotiation.
 			 */
-			write_phy_register(dev, phyaddr, MII_ADVERTISE,
+			mdio_write(dev, phyaddr, MII_ADVERTISE,
 				(advertisement |
 				 ADVERTISE_PAUSE_CAP | ADVERTISE_PAUSE_ASYM));
-			write_phy_register(dev, phyaddr, MII_BMCR,
-					   BMCR_ANRESTART);
+			mdio_write(dev, phyaddr, MII_BMCR, BMCR_ANRESTART);
 
 			return -EAGAIN;
 		}
@@ -845,10 +843,9 @@ #define LPA_PAUSE_ANY	(LPA_1000XPAUSE_AS
 			/* PAUSE is not being advertised. Advertise
 			 * PAUSE and restart auto-negotiation.
 			 */
-			write_phy_register(dev, phyaddr, MII_ADVERTISE,
-					   advertisement | ADVERTISE_PAUSE_CAP);
-			write_phy_register(dev, phyaddr, MII_BMCR,
-					   BMCR_ANRESTART);
+			mdio_write(dev, phyaddr, MII_ADVERTISE,
+				   advertisement | ADVERTISE_PAUSE_CAP);
+			mdio_write(dev, phyaddr, MII_BMCR, BMCR_ANRESTART);
 			return -EAGAIN;
 		}
 
@@ -2671,8 +2668,7 @@ static void ipg_set_phy_default_param(un
 				address = *phy_param;
 				value = *(phy_param + 1);
 				phy_param += 2;
-				write_phy_register(dev,
-						   phy_address, address, value);
+				mdio_write(dev, phy_address, address, value);
 				length -= 4;
 			}
 
@@ -2710,14 +2706,47 @@ static int read_eeprom(struct net_device
 	return ret;
 }
 
+static void ipg_init_mii(struct net_device *dev)
+{
+	struct ipg_nic_private *sp = netdev_priv(dev);
+	struct mii_if_info *mii_if = &sp->mii_if;
+	int phyaddr;
+
+	mii_if->dev          = dev;
+	mii_if->mdio_read    = mdio_read;
+	mii_if->mdio_write   = mdio_write;
+	mii_if->phy_id_mask  = 0x1f;
+	mii_if->reg_num_mask = 0x1f;
+
+	mii_if->phy_id = phyaddr = ipg_find_phyaddr(dev);
+
+	if (phyaddr != 0x1f) {
+		u16 mii_phyctrl, mii_1000cr;
+		u8 revisionid = 0;
+
+		mii_1000cr  = mdio_read(dev, phyaddr, MII_CTRL1000);
+		mii_1000cr |= ADVERTISE_1000FULL | ADVERTISE_1000HALF |
+			GMII_PHY_1000BASETCONTROL_PreferMaster;
+		mdio_write(dev, phyaddr, MII_CTRL1000, mii_1000cr);
+
+		mii_phyctrl = mdio_read(dev, phyaddr, MII_BMCR);
+
+		/* Set default phyparam */
+		pci_read_config_byte(sp->pdev, PCI_REVISION_ID, &revisionid);
+		ipg_set_phy_default_param(revisionid, dev, phyaddr);
+
+		/* Reset PHY */
+		mii_phyctrl |= BMCR_RESET | BMCR_ANRESTART;
+		mdio_write(dev, phyaddr, MII_BMCR, mii_phyctrl);
+
+	}
+}
+
 static int ipg_hw_init(struct net_device *dev)
 {
-	int phyaddr = 0;
-	int error = 0;
-	int i;
-	void __iomem *ioaddr = ipg_ioaddr(dev);
-	u8 revisionid = 0;
 	struct ipg_nic_private *sp = netdev_priv(dev);
+	void __iomem *ioaddr = sp->ioaddr;
+	int i, rc;
 
 	/* Reset all functions within the IPG. Do not assert
 	 * RST_OUT as not compatible with some PHYs.
@@ -2729,34 +2758,11 @@ static int ipg_hw_init(struct net_device
 	/* Read LED Mode Configuration from EEPROM */
 	sp->LED_Mode = read_eeprom(dev, 6);
 
-	error = ipg_reset(dev, i);
-	if (error < 0) {
-		return error;
-	}
-
-	ioaddr = ipg_ioaddr(dev);
-
-	/* Reset PHY. */
-	phyaddr = ipg_find_phyaddr(dev);
-
-	if (phyaddr != -1) {
-		u16 mii_phyctrl, mii_1000cr;
-		mii_1000cr =
-			read_phy_register(dev, phyaddr, MII_CTRL1000);
-		write_phy_register(dev, phyaddr, MII_CTRL1000,
-			mii_1000cr | ADVERTISE_1000FULL | ADVERTISE_1000HALF |
-				   GMII_PHY_1000BASETCONTROL_PreferMaster);
-
-		mii_phyctrl = read_phy_register(dev, phyaddr, MII_BMCR);
-		/* Set default phyparam */
-		pci_read_config_byte(sp->pdev, PCI_REVISION_ID, &revisionid);
-		ipg_set_phy_default_param(revisionid, dev, phyaddr);
-
-		/* reset Phy */
-		write_phy_register(dev, phyaddr, MII_BMCR,
-			(mii_phyctrl | BMCR_RESET | BMCR_ANRESTART));
+	rc = ipg_reset(dev, i);
+	if (rc < 0)
+		goto out;
 
-	}
+	ipg_init_mii(dev);
 
 	/* Read MAC Address from EERPOM Jesse20040128EEPROM_VALUE */
 	sp->StationAddr0 = read_eeprom(dev, 16);
@@ -2774,161 +2780,21 @@ static int ipg_hw_init(struct net_device
 	dev->dev_addr[3] = (ioread16(ioaddr + STATION_ADDRESS_1) & 0xff00) >> 8;
 	dev->dev_addr[4] =  ioread16(ioaddr + STATION_ADDRESS_2) & 0x00ff;
 	dev->dev_addr[5] = (ioread16(ioaddr + STATION_ADDRESS_2) & 0xff00) >> 8;
-
-	return 0;
+out:
+	return rc;
 }
 
-static int ipg_nic_do_ioctl(struct net_device *dev, struct ifreq *req, int cmd)
+static int ipg_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 {
-	/* IOCTL commands for IPG NIC.
-	 *
-	 * SIOCDEVPRIVATE       nothing
-	 * SIOCDEVPRIVATE+1     register read
-	 *                      ifr_data[0] = 0x08, 0x10, 0x20
-	 *                      ifr_data[1] = register offset
-	 *                      ifr_data[2] = value read
-	 * SIOCDEVPRIVATE+2     register write
-	 *                      ifr_data[0] = 0x08, 0x10, 0x20
-	 *                      ifr_data[1] = register offset
-	 *                      ifr_data[2] = value to write
-	 * SIOCDEVPRIVATE+3     GMII register read
-	 *                      ifr_data[1] = register offset
-	 * SIOCDEVPRIVATE+4     GMII register write
-	 *                      ifr_data[1] = register offset
-	 *                      ifr_data[2] = value to write
-	 * SIOCDEVPRIVATE+5     PCI register read
-	 *                      ifr_data[0] = 0x08, 0x10, 0x20
-	 *                      ifr_data[1] = register offset
-	 *                      ifr_data[2] = value read
-	 * SIOCDEVPRIVATE+6     PCI register write
-	 *                      ifr_data[0] = 0x08, 0x10, 0x20
-	 *                      ifr_data[1] = register offset
-	 *                      ifr_data[2] = value to write
-	 *
-	 */
-
-	u8 val8;
-	u16 val16;
-	u32 val32;
-	unsigned int *data;
-	int phyaddr = 0;
-	void __iomem *ioaddr = ipg_ioaddr(dev);
 	struct ipg_nic_private *sp = netdev_priv(dev);
+	struct mii_ioctl_data *data = if_mii(ifr);
+	int rc;
 
-	IPG_DEBUG_MSG("_nic_do_ioctl\n");
-
-	data = (unsigned int *)&req->ifr_data;
-
-	switch (cmd) {
-	case SIOCDEVPRIVATE:
-		return 0;
-
-	case SIOCDEVPRIVATE + 1:
-		switch (data[0]) {
-		case 0x08:
-			data[2] = ioread8(ioaddr + data[1]);
-			return 0;
-
-		case 0x10:
-			data[2] = ioread16(ioaddr + data[1]);
-			return 0;
-
-		case 0x20:
-			data[2] = ioread32(ioaddr + data[1]);
-			return 0;
-
-		default:
-			data[2] = 0x00;
-			return -EINVAL;
-		}
-
-	case SIOCDEVPRIVATE + 2:
-		switch (data[0]) {
-		case 0x08:
-			iowrite8(data[2], ioaddr + data[1]);
-			return 0;
-
-		case 0x10:
-			iowrite16(data[2], ioaddr + data[1]);
-			return 0;
-
-		case 0x20:
-			iowrite32(data[2], ioaddr + data[1]);
-			return 0;
-
-		default:
-			return -EINVAL;
-		}
-
-	case SIOCDEVPRIVATE + 3:
-		phyaddr = ipg_find_phyaddr(dev);
-
-		if (phyaddr == -1)
-			return -EINVAL;
-
-		data[2] = read_phy_register(dev, phyaddr, data[1]);
-
-		return 0;
-
-	case SIOCDEVPRIVATE + 4:
-		phyaddr = ipg_find_phyaddr(dev);
-
-		if (phyaddr == -1)
-			return -EINVAL;
-
-		write_phy_register(dev, phyaddr, data[1], (u16) data[2]);
-
-		return 0;
-
-	case SIOCDEVPRIVATE + 5:
-		switch (data[0]) {
-		case 0x08:
-			pci_read_config_byte(sp->pdev, data[1], &val8);
-			data[2] = (unsigned int)val8;
-			return 0;
-
-		case 0x10:
-			pci_read_config_word(sp->pdev, data[1], &val16);
-			data[2] = (unsigned int)val16;
-			return 0;
-
-		case 0x20:
-			pci_read_config_dword(sp->pdev, data[1], &val32);
-			data[2] = (unsigned int)val32;
-			return 0;
-
-		default:
-			data[2] = 0x00;
-			return -EINVAL;
-		}
-
-	case SIOCDEVPRIVATE + 6:
-		switch (data[0]) {
-		case 0x08:
-			pci_write_config_byte(sp->pdev, data[1], (u8) data[2]);
-			return 0;
-
-		case 0x10:
-			pci_write_config_word(sp->pdev, data[1], (u16) data[2]);
-			return 0;
-
-		case 0x20:
-			pci_write_config_dword(sp->pdev, data[1],
-					       (u32) data[2]);
-			return 0;
+	mutex_lock(&sp->mii_mutex);
+	rc = generic_mii_ioctl(&sp->mii_if, data, cmd, NULL);
+	mutex_unlock(&sp->mii_mutex);
 
-		default:
-			return -EINVAL;
-		}
-
-	case SIOCSIFMTU:
-		{
-			return 0;
-		}
-
-	default:
-		return -EOPNOTSUPP;
-	}
+	return rc;
 }
 
 static int ipg_nic_change_mtu(struct net_device *dev, int new_mtu)
@@ -2953,6 +2819,48 @@ static int ipg_nic_change_mtu(struct net
 	return 0;
 }
 
+static int ipg_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+{
+	struct ipg_nic_private *sp = netdev_priv(dev);
+	int rc;
+
+	mutex_lock(&sp->mii_mutex);
+	rc = mii_ethtool_gset(&sp->mii_if, cmd);
+	mutex_unlock(&sp->mii_mutex);
+
+	return rc;
+}
+
+static int ipg_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+{
+	struct ipg_nic_private *sp = netdev_priv(dev);
+	int rc;
+
+	mutex_lock(&sp->mii_mutex);
+	rc = mii_ethtool_sset(&sp->mii_if, cmd);
+	mutex_unlock(&sp->mii_mutex);
+
+	return rc;
+}
+
+static int ipg_nway_reset(struct net_device *dev)
+{
+	struct ipg_nic_private *sp = netdev_priv(dev);
+	int rc;
+
+	mutex_lock(&sp->mii_mutex);
+	rc = mii_nway_restart(&sp->mii_if);
+	mutex_unlock(&sp->mii_mutex);
+
+	return rc;
+}
+
+static struct ethtool_ops ipg_ethtool_ops = {
+	.get_settings = ipg_get_settings,
+	.set_settings = ipg_set_settings,
+	.nway_reset   = ipg_nway_reset,
+};
+
 static void ipg_remove(struct pci_dev *pdev)
 {
 	struct net_device *dev = pci_get_drvdata(pdev);
@@ -3011,6 +2919,7 @@ static int __devinit ipg_probe(struct pc
 
 	sp = netdev_priv(dev);
 	spin_lock_init(&sp->lock);
+	mutex_init(&sp->mii_mutex);
 
 	/* Declare IPG NIC functions for Ethernet device methods.
 	 */
@@ -3019,11 +2928,12 @@ static int __devinit ipg_probe(struct pc
 	dev->hard_start_xmit = &ipg_nic_hard_start_xmit;
 	dev->get_stats = &ipg_nic_get_stats;
 	dev->set_multicast_list = &ipg_nic_set_multicast_list;
-	dev->do_ioctl = &ipg_nic_do_ioctl;
+	dev->do_ioctl = ipg_ioctl;
 	dev->change_mtu = &ipg_nic_change_mtu;
 
 	SET_MODULE_OWNER(dev);
 	SET_NETDEV_DEV(dev, &pdev->dev);
+	SET_ETHTOOL_OPS(dev, &ipg_ethtool_ops);
 
 	rc = pci_request_regions(pdev, DRV_NAME);
 	if (rc)
diff --git a/drivers/net/ipg.h b/drivers/net/ipg.h
index 6c55b0b..58b1417 100644
--- a/drivers/net/ipg.h
+++ b/drivers/net/ipg.h
@@ -870,6 +870,9 @@ #endif
 	u16 StationAddr1;	/* Station Address in EEPROM Reg 0x11 */
 	u16 StationAddr2;	/* Station Address in EEPROM Reg 0x12 */
 
+	struct mutex		mii_mutex;
+	struct mii_if_info	mii_if;
+
 #ifdef IPG_DEBUG
 	int TFDunavailCount;
 	int RFDlistendCount;
-- 
1.3.1


^ permalink raw reply related

* Re: [PATCH 2/2] ipg: redundancy with mii.h
From: Francois Romieu @ 2006-05-05  0:24 UTC (permalink / raw)
  To: David Vrabel; +Cc: Pekka J Enberg, linux-kernel, netdev, david
In-Reply-To: <4459A4A6.1080207@cantab.net>

David Vrabel <dvrabel@cantab.net> :
[...]
> >    ipg: replace #define with enum
> >    
> >    Added some underscores to improve readability.
> >    
> >    Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
> 
> Nack.  Register names in code should match those used in the 
> documentation (even if they are a bit unreadable).  Though I will 
> conceed that the available datasheet doesn't actually describe the 
> majority of the registers.

We will have to agree to disagree. Feel free to open a different
branch.

-- 
Ueimor

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox