From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeff Garzik Subject: Re: [RFR] gianfar ethernet driver Date: Wed, 07 Jul 2004 01:27:40 -0400 Sender: netdev-bounce@oss.sgi.com Message-ID: <40EB89CC.2040100@pobox.com> References: <89563A5C-CFAE-11D8-BA44-000393C30512@freescale.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Cc: Kumar Gala , netdev@oss.sgi.com, dwmw2@infradead.org, hadi@cyberus.ca Return-path: To: Andy Fleming In-Reply-To: <89563A5C-CFAE-11D8-BA44-000393C30512@freescale.com> Errors-to: netdev-bounce@oss.sgi.com List-Id: netdev.vger.kernel.org Andy Fleming wrote: >>> 7) ethtool_ops should not be kmalloc'd. >>> + dev_ethtool_ops = >>> + (struct ethtool_ops *)kmalloc(sizeof(struct >>> ethtool_ops), >>> + GFP_KERNEL); > > > Is there a particular reason? The reason I did it this way is because > the driver supports a 10/100 controller which does not have the RMON > statistics, and therefore should not read from that register space. So > I needed to use different functions to fill in the statistics lists. Just switch out one static pointer for another, once you know the hardware you're dealing with. You _must_ have that knowledge anyway, before calling register_netdev(), otherwise you're got a race in initialization versus interface-up. >>> 8) I recommend enabling _both_ hardware interrupt coalescing and NAPI, >>> at the same time. Several other Linux net drivers need to be changed to >>> do this, as well > > > Ok... but this is possible with the driver as it is. Interrupt > Coalescing is runtime configurable, and NAPI is a compile-time option. > NAPI can sometimes hurt performance, and so we'd like to have the > ability to disable it for some deployments. I guess I'm not sure what > change this suggestion was meant to cause. Default hardware mitigation to on in all cases, and preferably NAPI as well. If you see cases that hurt performance, that wants investigating. >>> 14) surely gfar_set_mac_address() needs some sort of synchronization? > > > I'm not sure why. It only gets called during gfar_open(), and > startup_gfar() gets called after it. Incorrect. Set-mac-address can be called when the interface is up and active, such as by the bonding driver. >>> 15) gfar_change_mtu() synchronization appears absent? > > > I'm not exactly sure what kind of synchronization is expected here. > stop_gfar() and startup_gfar() do modify the appropriate hardware values. Same conditions (and response) as set-mac-address. These can be called while you're actively DMAing packets. >>> 23) gfar_set_multi() does not appear to take into account huge >>> multi-cast lists. Drivers usually handle large dev->mc_count by falling >>> back to ALLMULTI behavior. > > > Actually, it does. The bits which are set represent hash table values. > Essentially, the MAC addr is converted to an 8-bit CRC. This value > then serves as an index to the 256-bit hash table. If the list is > large, then certain bits may be written twice, but if all the bits are > set, then the behavior is the same as in the ALLMULTI behavior -- every > multicast packet arrives. It would be a mistake, IMHO, to cut it off > after N entries, as several of those entries could overlap in the > bitmap. Clearly, the less bits which are set, the more effective the > hardware filter, so checking each address will, I think, always be a > performance win or tie (on the filtering side) over ALLMULTI ok >>> 24) setting IFF_MULTICAST is suspicious at best, possibly wrong: >>> + dev->mtu = 1500; >>> + dev->set_multicast_list = gfar_set_multi; >>> + dev->flags |= IFF_MULTICAST; >>> + > > > I'll believe you, but is there a reason for this? I guess it's already > set, by default, so easy fix. follow the other drivers, and behave predictably :) >>> 28) I see you support register dump (good!), now please send the >>> register-pretty-print patch to the ethtool package :) > > > Heh. Well, I haven't written that, yet. There's actually a slight > problem, in that the 85xx registers are 32-bit, and refuse to be > accessed as bytes. I'll be sure to look at that in the near...ish future. Nothing wrong with accessing registers as 32-bit quantities. That attribute is common to a lot of NICs. >>> 31) infinite loops, particularly on 1-bits, are discouraged: >>> + /* Wait for the transaction to finish */ >>> + while (gfar_read(®base->miimind) & (MIIMIND_NOTVALID | >>> MIIMIND_BUSY)) >>> + cpu_relax(); > > > What is the suggested method for waiting for the bus to be free? Should > I timeout after some time, and bring the driver down? it's really up to you, as it depends on the implementation platforms. The most simple is to include a counter that counts down to zero, and starts some absurdly large number like 100000. >>> 33) merge-stopper: mii_parse_sr(). never wait for autonegotiation to >>> complete. it is an asynchronous operation that could exceed 30 seconds. > > > Hmm... > >>> 34) merge-stopper: dm9161_wait(). same thing as #33. > > > This may be a problem. That function is there to work around an issue > in the PHY, wherein if you try to configure it before it has come up all > the way, it refuses to bring the link up. We've sworn at this code many > times, but there has, as of yet, not been a good suggestion as to how we > can ensure that the 9161 is ready before we configure it. I interpret that as a driver bug :) As common sense, regardless of phy bugs, you should not be trying to configure the MAC _or_ the phy in the middle of autonegotiation. Presumeably you are using a combination of netif_stop_queue(), netif_carrier_off(), and netif_device_detach() to achieve this. >>> 35) liberally borrow code and ideas from Ben H's sungem_phy.c. >>> Eventually we want to move to a generic phy module. > > > Heh. ironically, I stole liberally from the ibm_emac driver, which now > looks exactly like the sungem_phy code for phy handling. A generic phy > module would be a good thing, and I'm even interested in helping with > code and/or suggestions. If nothing else, I'll jump on the new generic > phy module bandwagon! Cool. This item #35 is more of a long term "think about it" type of request. Please do not hesitate to think of ways that you could share code with sungem_phy.c, and/or make them both use the same API, and submit patches along those lines :) It's not enough to just write a driver, help change Linux for the better :) > From Jamal: > >> >> 1) The check (in gfar_start_xmit()): >> >> Should happen much sooner - i.e before the skb is touched. > > > I'm not sure I agree here. What I am doing is detecting the full state > before an skb needs to be rejected. I am testing the NEXT descriptor to > see if it is ready (if not, dirty_tx will be pointing to it). This way, > I always handle any skb that is passed to gfar_start_xmit() See my comments to jamal as well: if you guarantee that you always have room on the DMA ring for an additional skb, that check should never be needed. Some extremely cautious/paranoid programmers will add a check, with a printk noting it's a BUG (i.e. impossible) condition, such as if (unlikely(... no more descriptors ...)) { printk(KERN_ERR "%s: BUG: no room for TX\n", dev->name); netif_stop_queue(); spin_unlock_irqrestore(); return 1; } ... queue TX to hardware ... if (no more descriptors) netif_stop_queue() spin_unlock_irqrestore() >> 2) Also its pretty scary if you are doing: >> txbdp->status |= TXBD_INTERRUPT for every packet. >> Look at other drivers, they try to do this every few packets; >> or enable tx mitigation to slow the rate of those interupts. > > > I don't believe this is as dire as you think. This bit only indicates > that, if the conditions are right, an interrupt will be generated once > that descriptor is processed, and its data sent. Conditions which > mitigate that are: > 1) coalescing is on, and the timer and counter have not triggered the > interrupt yet > 2) NAPI is enabled, and so interrupts are disabled after the first > packet arrives > 3) NAPI is disabled, but the driver is currently handling a previous > interrupt, so the interrupts are disabled for now. Even at 10/100 speeds, you really don't want to be generating one interrupt per Tx... Jeff