RE: Submission for S2io 10GbE driver

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* RE: Submission for S2io 10GbE driver
@ 2004-02-25  6:03 raghavendra.koushik
  2004-02-26  7:40 ` Jeff Garzik
  0 siblings, 1 reply; 18+ messages in thread
From: raghavendra.koushik @ 2004-02-25  6:03 UTC (permalink / raw)
  To: jgarzik, leonid.grossman; +Cc: netdev, raghavendra.koushik, ravinandan.arakali

Jeff,
	some questions on few of your comments.

>>30) do not call netif_stop_queue() and netif_wake_queue() on link 
>>events, in s2io_link.  Simply call netif_carrier_{on,off}.

When link goes down and I just call netif_carrier_off the upper layer
still continues to send packets to the s2io_xmit routine. In order to 
avoid this, I stop the queue and a corresponding wake when link returns.
Is there any particular reason why this should be avoided?

>>28) are you aware that all of s2io_tx_watchdog is inside the 
>>dev->tx_lock spinlock?  I am concern s2io_tx_watchdog execution time may
>>be quite excessive a duration to hold a spinlock.

Actually no. The intention is to reset the NIC and re-initialize it in the
tx_watchdog function and I'am not sure how else to do this.
Do you foresee a problem with the current method, because for most part of 
the function the queue would be in a stopped state (the netif_stop_queue is 
called right on top of s2io_close and the queue is woken up at almost
the end of s2io_open). 

>>29) never call netif_wake_queue() unconditionally.  only call it if you 
>>are 100% certain that the net stack is allowed to add another packet to 
>>your hardware's TX queue(s).

I wake the queue in txIntrHandler without checking anything because at this 
point I'am certain that some free transmit descriptors are available for 
new xmit. The tx Interrupt arrives only after one or more Tx descriptor and
buffer were successfully DMA'ed to the NIC and the ownership of these 
descriptor(s) is returned to the host.

Regards

Koushik

-----Original Message-----
From: Jeff Garzik [mailto:jgarzik@pobox.com] 
Sent: Tuesday, February 17, 2004 5:59 AM
To: Leonid Grossman
Cc: netdev@oss.sgi.com; raghavendra.koushik@s2io.com; 'ravinandan arakali'
Subject: Re: Submission for S2io 10GbE driver

Comments:

1) use ULL suffix on u64 constants.

static u64 round_robin_reg0 = 0x0001020304000105;
static u64 round_robin_reg1 = 0x0200030106000204;
static u64 round_robin_reg2 = 0x0103000502010007;
static u64 round_robin_reg3 = 0x0304010002060500;
static u64 round_robin_reg4 = 0x0103020400000000;

2) you'll want to (unfortunately) add #ifdefs around the PCI_xxx_ID 
constants, because a full submission to the kernel includes a patch to 
include/linux/pci_ids.h.

/* VENDOR and DEVICE ID of XENA. */
#define PCI_VENDOR_ID_S2IO      0x17D5
#define PCI_DEVICE_ID_S2IO_WIN  0x5731
#define PCI_DEVICE_ID_S2IO_UNI  0x5831

3) AS_A_MODULE is incorrect.

/* Load driver as a module */
#define AS_A_MODULE

First, it is defined unconditionally.  Second, it should not even exist. 
  The kernel module API is intentionally designed such that the source 
code functions whether a kernel module or built into vmlinux, without 
#ifdefs.  So, simply remove the ifdefs.

As a general rule, Linux kernel source code tries to be as free of 
ifdefs as possible.

4) You will of course need to change CONFIGURE_ETHTOOL_SUPPORT, 
CONFIGURE_NAPI_SUPPORT to Kconfig-generate CONFIG_xxx defines, when 
submitting.

5) again, follow the kernel's no-ifdef philosophy:

#ifdef KERN_26
static irqreturn_t s2io_isr(int irq, void *dev_id, struct pt_regs *regs); #else void s2io_isr(int irq, void *dev_id, struct pt_regs *regs); #endif /** KERN_26 **/

The "irqreturn_t" type was designed specifically to work without #ifdefs 
in earlier kernels.  Here is the proper compatibility code, taken from 
release kernel 2.4.25's include/linux/interrupt.h:

	/* For 2.6.x compatibility */
	typedef void irqreturn_t;
	#define IRQ_NONE
	#define IRQ_HANDLED
	#define IRQ_RETVAL(x)

I hope you notice a key philosophy emerging ;-)  You want to write a 
no-ifdef driver for 2.6, and then use the C pre-processor, typedefs, and 
other tricks to make the driver work on earlier kernels with as little 
modification as possible.

Look at http://sf.net/projects/gkernel/  module "kcompat" for an example 
of a toolkit which allows you to write a current driver, and then use it 
on older kernels.

6) delete, not needed

#ifdef UNDEFINED
         suspend:NULL,
         resume:NULL,
#endif

7) memory leak on error
                 /*  Allocating all the Rx blocks */
                 for (j = 0; j < blk_cnt; j++) {
                         size = (MAX_RXDS_PER_BLOCK + 1) * (sizeof(RxD_t));
                         tmp_v_addr = pci_alloc_consistent(nic->pdev, size,
                                                           &tmp_p_addr);
                         if (tmp_v_addr == NULL) {
                                 return -ENOMEM;
                         }
                         memset(tmp_v_addr, 0, size);

8) memory leak on error

/* Allocation and initialization of Statistics block */
         size = sizeof(StatInfo_t);
         mac_control->stats_mem = pci_alloc_consistent
             (nic->pdev, size, &mac_control->stats_mem_phy);

         if (!mac_control->stats_mem) {
                 return -ENOMEM;
         }

9) if you store a pointer for your shared memory, it is wasteful to 
store an -additional- flag indicating this memory has been allocated. 
simply check for NULL.

         if (nic->_fResource & TXD_ALLOCED) {
                 nic->_fResource &= ~TXD_ALLOCED;
                 pci_free_consistent(nic->pdev,
                                     mac_control->txd_list_mem_sz,

10) ULL suffix

         write64(&bar0->swapper_ctrl, 0xffffffffffffffff);
         val64 = (SWAPPER_CTRL_PIF_R_FE |

11) ditto this for other 64-bit constants

12) never mdelay() for this long.  Either create a timer, or make sure 
you're in process constant and sleep via schedule_timeout().

/* Remove XGXS from reset state*/
         val64 = 0;
         write64(&bar0->sw_reset, val64);
         mdelay(500);

13) memory writes without memory reads following them are often the 
victims of PCI write posting bugs.  At the very least, this driver 
appears to have many PCI write posting issues.

         write64(&bar0->dtx_control, 0x8000051500000000);
         udelay(50);
         write64(&bar0->dtx_control, 0x80000515000000E0);
         udelay(50);
         write64(&bar0->dtx_control, 0x80000515D93500E4);
         udelay(50);

         write64(&bar0->dtx_control, 0x8001051500000000);
         udelay(50);
         write64(&bar0->dtx_control, 0x80010515000000E0);
         udelay(50);
         write64(&bar0->dtx_control, 0x80010515001E00E4);
         udelay(50);

You are not guaranteed that the write will have completed, by the end of 
each udelay(), unless you first issue a PCI read of some sort.

14) another mdelay(500) loop to be fixed

/*  Wait for the operation to complete */
         time = 0;
         while (TRUE) {
                 val64 = read64(&bar0->rti_command_mem);
                 if (!(val64 & TTI_CMD_MEM_STROBE_NEW_CMD)) {
                         break;
                 }
                 if (time > 50) {
                         DBG_PRINT(ERR_DBG, "%s: RTI init Failed\n",
                                   dev->name);
                         return -1;
                 }
                 time++;
                 mdelay(10);

15) you obviously mean TASK_UNINTERRUPTIBLE here:

/* Enabling MC-RLDRAM */
         val64 = read64(&bar0->mc_rldram_mrs);
         val64 |= MC_RLDRAM_QUEUE_SIZE_ENABLE | MC_RLDRAM_MRS_ENABLE;
         write64(&bar0->mc_rldram_mrs, val64);
         set_current_state(TASK_INTERRUPTIBLE);
         schedule_timeout(HZ / 10);

16) get this from struct pci_dev, not directly from the PCI bus:

         /* SXE-002: Initialize link and activity LED */
         ret =
             pci_read_config_word(nic->pdev, PCI_SUBSYSTEM_ID,
                                  (u16 *) & subid);

17) question: do you not support more advanced checksum offload?  like 
ipv6 or "hey I put the packet checksum <here>"

18) waitForCmdComplete can mdelay() an unacceptably long time

19) ditto s2io_reset.

20) your driver has its spinlocks backwards!  Your interrupt handler 
uses spin_lock_irqsave(), and your non-interrupt handling code uses 
spin_lock().  That's backwards from correct.

21) s2io_close could mdelay() for unacceptably long time.  Fortunately, 
you -can- sleep here, so just replace with schedule_timeout() calls.

22) remove the commented-out MOD_{INC,DEC}_USE_COUNT.

23) your tx_lock spinlock is completely unused.  oops.  :)  the spinlock 
covers two areas of code, both of which are mutually exclusive.

Given this and #20... you might want to make sure to build and test on 
SMP.  Even SMP kernels on uniprocessor hardware helps find spinlock 
deadlocks.

24) your tx_lock does not cover the interrupt handler code.  I presume 
this is an oversight?

25) delete s2io_set_mac_addr.  It's not needed.  It is preferred to use 
the default eth_mac_addr.  Follow this procedure, usually:

	a) During probe, obtain MAC address from "original source",
	usually EEPROM / SROM.
	b) Each time dev->open() is called, write MAC address to h/w.

26) check and make sure you initialize your link to off 
(netif_carrier_off(dev)), in your dev->open() function.  In the 
background, your phy state machine should call netif_carrier_on() once 
it is certain link has been established.

this _must_ be an asynchronous process.  You may not sleep and wait for 
link, in dev->open().

27) for current 2.4 and 2.6 kernels, please use struct ethtool_ops 
rather than a large C switch statement.

28) are you aware that all of s2io_tx_watchdog is inside the 
dev->tx_lock spinlock?  I am concern s2io_tx_watchdog execution time may
be quite excessive a duration to hold a spinlock.

29) never call netif_wake_queue() unconditionally.  only call it if you 
are 100% certain that the net stack is allowed to add another packet to 
your hardware's TX queue(s).

30) do not call netif_stop_queue() and netif_wake_queue() on link 
events, in s2io_link.  Simply call netif_carrier_{on,off}.

31) ULL suffix

         } else if (!pci_set_dma_mask(pdev, 0xffffffff)) {

32) missing call to pci_disable_device() on error:

                 if (pci_set_consistent_dma_mask
                     (pdev, 0xffffffffffffffffULL)) {
                         DBG_PRINT(ERR_DBG,
                                   "Unable to obtain 64bit DMA for \
                                         consistent allocations\n");
                         return -ENOMEM;

33) if you use CHECKSUM_UNNECESSARY, you should be using the 
less-capable NETIF_F_IP_CSUM.

         dev->features |= NETIF_F_SG | NETIF_F_HW_CSUM;

NETIF_F_HW_CSUM requires the actual checksum value.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Submission for S2io 10GbE driver
  2004-02-25  6:03 Submission for S2io 10GbE driver raghavendra.koushik
@ 2004-02-26  7:40 ` Jeff Garzik
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff Garzik @ 2004-02-26  7:40 UTC (permalink / raw)
  To: raghavendra.koushik
  Cc: leonid.grossman, netdev, raghavendra.koushik, ravinandan.arakali

raghavendra.koushik@wipro.com wrote:
> Jeff,
> 	some questions on few of your comments.
> 
> 
>>>30) do not call netif_stop_queue() and netif_wake_queue() on link
>>>events, in s2io_link.  Simply call netif_carrier_{on,off}.
> 
> 
> When link goes down and I just call netif_carrier_off the upper layer
> still continues to send packets to the s2io_xmit routine. In order to
> avoid this, I stop the queue and a corresponding wake when link returns.
> Is there any particular reason why this should be avoided?

Ignore me on this one, I am incorrect.


>>>28) are you aware that all of s2io_tx_watchdog is inside the
>>>dev->tx_lock spinlock?  I am concern s2io_tx_watchdog execution time may
>>>be quite excessive a duration to hold a spinlock.
> 
> 
> Actually no. The intention is to reset the NIC and re-initialize it in the
> tx_watchdog function and I'am not sure how else to do this.
> Do you foresee a problem with the current method, because for most part of
> the function the queue would be in a stopped state (the netif_stop_queue is
> called right on top of s2io_close and the queue is woken up at almost
> the end of s2io_open).

It is incorrect to perform any slow operation inside spin_lock_bh().

You can schedule_work() to perform the actual reset, for example, to get 
around this.


>>>29) never call netif_wake_queue() unconditionally.  only call it if you
>>>are 100% certain that the net stack is allowed to add another packet to
>>>your hardware's TX queue(s).
> 
> 
> I wake the queue in txIntrHandler without checking anything because at this
> point I'am certain that some free transmit descriptors are available for
> new xmit. The tx Interrupt arrives only after one or more Tx descriptor and
> buffer were successfully DMA'ed to the NIC and the ownership of these
> descriptor(s) is returned to the host.

OK, all good then.

	Jeff

^ permalink raw reply	[flat|nested] 18+ messages in thread

[parent not found: <403573B5.4050100@pobox.com>]

* RE: Submission for S2io 10GbE driver
       [not found] <403573B5.4050100@pobox.com>
@ 2004-02-20  2:59 ` ravinandan arakali
  2004-02-20  3:30   ` Jeff Garzik
  0 siblings, 1 reply; 18+ messages in thread
From: ravinandan arakali @ 2004-02-20  2:59 UTC (permalink / raw)
  To: 'Jeff Garzik'
  Cc: raghavendra.koushik, leonid.grossman, netdev, raghavendra.koushik

Hi Jeff,
Since the whole nic structure is zeroed out initially, the
"block_virt_addr" is initially NULL. So, if the allocation
succeeds, it will hold a non-NULL value.
So, in the freeSharedMem() if we check for it's non-NULL
value and free it, we should be okay. Do you agree ?

Thanks,
Ravi

-----Original Message-----
From: Jeff Garzik [mailto:jgarzik@pobox.com] 
Sent: Thursday, February 19, 2004 6:41 PM
To: ravinandan arakali
Cc: raghavendra.koushik@wipro.com; leonid.grossman@s2io.com;
netdev@oss.sgi.com; raghavendra.koushik@s2io.com
Subject: Re: Submission for S2io 10GbE driver

ravinandan arakali wrote:
> Hi Jeff,
> For #7, the temporary variable tmp_v_addr is assigned as follows
> (immediately below the code you have quoted):
> 
> nic->rx_blocks[i][j].block_virt_addr = tmp_v_addr;

Yes... but only after the allocation.


> In the freeSharedMem() routine, we go thru' each block of each
> receive ring and free it. Following is the relevant piece of
> code:
> for (i = 0; i < config->RxRingNum; i++) {
>      blk_cnt = nic->block_count[i];
>      for (j = 0; j < blk_cnt; j++) {
>           tmp_v_addr = nic->rx_blocks[i][j].block_virt_addr;
>           tmp_p_addr = nic->rx_blocks[i][j].block_dma_addr;
>           pci_free_consistent(nic->pdev, size, tmp_v_addr,
tmp_p_addr);
>      }
> }
> 
> But I guess in the above piece of code, it may be a good idea to
> check if nic->rx_blocks[i][j].block_virt_addr is non-NULL before
> doing the pci_free_consistent().

Yes, you could add the check there.

	Jeff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Submission for S2io 10GbE driver
  2004-02-20  2:59 ` ravinandan arakali
@ 2004-02-20  3:30   ` Jeff Garzik
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff Garzik @ 2004-02-20  3:30 UTC (permalink / raw)
  To: ravinandan arakali
  Cc: raghavendra.koushik, leonid.grossman, netdev, raghavendra.koushik

ravinandan arakali wrote:
> Hi Jeff,
> Since the whole nic structure is zeroed out initially, the
> "block_virt_addr" is initially NULL. So, if the allocation
> succeeds, it will hold a non-NULL value.
> So, in the freeSharedMem() if we check for it's non-NULL
> value and free it, we should be okay. Do you agree ?


Yes, agreed, sorry if I was not clear earlier.

	Jeff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: Submission for S2io 10GbE driver
@ 2004-02-19  7:16 raghavendra.koushik
  2004-02-19  8:14 ` Jeff Garzik
  0 siblings, 1 reply; 18+ messages in thread
From: raghavendra.koushik @ 2004-02-19  7:16 UTC (permalink / raw)
  To: jgarzik, leonid.grossman; +Cc: netdev, raghavendra.koushik, ravinandan.arakali

Hi Jeff,

1. points 7 and 8, when initSharedMem returns error, I call
freeSharedMem which should free any partially alloced memory.

2. For point 17 and 33 
We do support IPV6 checksum offload. There is one issue though,
our hardware only says whether the checksum is Ok or not it does
not actually return the checksum values! 
The value I put into skb->csum is a dummy value and hence set 
ip_summed as UNNECESSARY in Rx path if checksums are reported OK.

If I say features as NETIF_F_IP_CSUM instead of NETIF_F_HW_CSUM,
then I cannot utilize it's entire gamut of checksum offload feature
as the offload will be limited to just TCP/UDP over IPV4.

Regards
Koushik

-----Original Message-----
From: Jeff Garzik [mailto:jgarzik@pobox.com] 
Sent: Tuesday, February 17, 2004 5:59 AM
To: Leonid Grossman
Cc: netdev@oss.sgi.com; raghavendra.koushik@s2io.com; 'ravinandan arakali'
Subject: Re: Submission for S2io 10GbE driver

Comments:

1) use ULL suffix on u64 constants.

static u64 round_robin_reg0 = 0x0001020304000105;
static u64 round_robin_reg1 = 0x0200030106000204;
static u64 round_robin_reg2 = 0x0103000502010007;
static u64 round_robin_reg3 = 0x0304010002060500;
static u64 round_robin_reg4 = 0x0103020400000000;

2) you'll want to (unfortunately) add #ifdefs around the PCI_xxx_ID 
constants, because a full submission to the kernel includes a patch to 
include/linux/pci_ids.h.

/* VENDOR and DEVICE ID of XENA. */
#define PCI_VENDOR_ID_S2IO      0x17D5
#define PCI_DEVICE_ID_S2IO_WIN  0x5731
#define PCI_DEVICE_ID_S2IO_UNI  0x5831

3) AS_A_MODULE is incorrect.

/* Load driver as a module */
#define AS_A_MODULE

First, it is defined unconditionally.  Second, it should not even exist. 
  The kernel module API is intentionally designed such that the source 
code functions whether a kernel module or built into vmlinux, without 
#ifdefs.  So, simply remove the ifdefs.

As a general rule, Linux kernel source code tries to be as free of 
ifdefs as possible.

4) You will of course need to change CONFIGURE_ETHTOOL_SUPPORT, 
CONFIGURE_NAPI_SUPPORT to Kconfig-generate CONFIG_xxx defines, when 
submitting.

5) again, follow the kernel's no-ifdef philosophy:

#ifdef KERN_26
static irqreturn_t s2io_isr(int irq, void *dev_id, struct pt_regs *regs); #else void s2io_isr(int irq, void *dev_id, struct pt_regs *regs); #endif /** KERN_26 **/

The "irqreturn_t" type was designed specifically to work without #ifdefs 
in earlier kernels.  Here is the proper compatibility code, taken from 
release kernel 2.4.25's include/linux/interrupt.h:

	/* For 2.6.x compatibility */
	typedef void irqreturn_t;
	#define IRQ_NONE
	#define IRQ_HANDLED
	#define IRQ_RETVAL(x)

I hope you notice a key philosophy emerging ;-)  You want to write a 
no-ifdef driver for 2.6, and then use the C pre-processor, typedefs, and 
other tricks to make the driver work on earlier kernels with as little 
modification as possible.

Look at http://sf.net/projects/gkernel/  module "kcompat" for an example 
of a toolkit which allows you to write a current driver, and then use it 
on older kernels.

6) delete, not needed

#ifdef UNDEFINED
         suspend:NULL,
         resume:NULL,
#endif

7) memory leak on error
                 /*  Allocating all the Rx blocks */
                 for (j = 0; j < blk_cnt; j++) {
                         size = (MAX_RXDS_PER_BLOCK + 1) * (sizeof(RxD_t));
                         tmp_v_addr = pci_alloc_consistent(nic->pdev, size,
                                                           &tmp_p_addr);
                         if (tmp_v_addr == NULL) {
                                 return -ENOMEM;
                         }
                         memset(tmp_v_addr, 0, size);

8) memory leak on error

/* Allocation and initialization of Statistics block */
         size = sizeof(StatInfo_t);
         mac_control->stats_mem = pci_alloc_consistent
             (nic->pdev, size, &mac_control->stats_mem_phy);

         if (!mac_control->stats_mem) {
                 return -ENOMEM;
         }

9) if you store a pointer for your shared memory, it is wasteful to 
store an -additional- flag indicating this memory has been allocated. 
simply check for NULL.

         if (nic->_fResource & TXD_ALLOCED) {
                 nic->_fResource &= ~TXD_ALLOCED;
                 pci_free_consistent(nic->pdev,
                                     mac_control->txd_list_mem_sz,

10) ULL suffix

         write64(&bar0->swapper_ctrl, 0xffffffffffffffff);
         val64 = (SWAPPER_CTRL_PIF_R_FE |

11) ditto this for other 64-bit constants

12) never mdelay() for this long.  Either create a timer, or make sure 
you're in process constant and sleep via schedule_timeout().

/* Remove XGXS from reset state*/
         val64 = 0;
         write64(&bar0->sw_reset, val64);
         mdelay(500);

13) memory writes without memory reads following them are often the 
victims of PCI write posting bugs.  At the very least, this driver 
appears to have many PCI write posting issues.

         write64(&bar0->dtx_control, 0x8000051500000000);
         udelay(50);
         write64(&bar0->dtx_control, 0x80000515000000E0);
         udelay(50);
         write64(&bar0->dtx_control, 0x80000515D93500E4);
         udelay(50);

         write64(&bar0->dtx_control, 0x8001051500000000);
         udelay(50);
         write64(&bar0->dtx_control, 0x80010515000000E0);
         udelay(50);
         write64(&bar0->dtx_control, 0x80010515001E00E4);
         udelay(50);

You are not guaranteed that the write will have completed, by the end of 
each udelay(), unless you first issue a PCI read of some sort.

14) another mdelay(500) loop to be fixed

/*  Wait for the operation to complete */
         time = 0;
         while (TRUE) {
                 val64 = read64(&bar0->rti_command_mem);
                 if (!(val64 & TTI_CMD_MEM_STROBE_NEW_CMD)) {
                         break;
                 }
                 if (time > 50) {
                         DBG_PRINT(ERR_DBG, "%s: RTI init Failed\n",
                                   dev->name);
                         return -1;
                 }
                 time++;
                 mdelay(10);

15) you obviously mean TASK_UNINTERRUPTIBLE here:

/* Enabling MC-RLDRAM */
         val64 = read64(&bar0->mc_rldram_mrs);
         val64 |= MC_RLDRAM_QUEUE_SIZE_ENABLE | MC_RLDRAM_MRS_ENABLE;
         write64(&bar0->mc_rldram_mrs, val64);
         set_current_state(TASK_INTERRUPTIBLE);
         schedule_timeout(HZ / 10);

16) get this from struct pci_dev, not directly from the PCI bus:

         /* SXE-002: Initialize link and activity LED */
         ret =
             pci_read_config_word(nic->pdev, PCI_SUBSYSTEM_ID,
                                  (u16 *) & subid);

17) question: do you not support more advanced checksum offload?  like 
ipv6 or "hey I put the packet checksum <here>"

18) waitForCmdComplete can mdelay() an unacceptably long time

19) ditto s2io_reset.

20) your driver has its spinlocks backwards!  Your interrupt handler 
uses spin_lock_irqsave(), and your non-interrupt handling code uses 
spin_lock().  That's backwards from correct.

21) s2io_close could mdelay() for unacceptably long time.  Fortunately, 
you -can- sleep here, so just replace with schedule_timeout() calls.

22) remove the commented-out MOD_{INC,DEC}_USE_COUNT.

23) your tx_lock spinlock is completely unused.  oops.  :)  the spinlock 
covers two areas of code, both of which are mutually exclusive.

Given this and #20... you might want to make sure to build and test on 
SMP.  Even SMP kernels on uniprocessor hardware helps find spinlock 
deadlocks.

24) your tx_lock does not cover the interrupt handler code.  I presume 
this is an oversight?

25) delete s2io_set_mac_addr.  It's not needed.  It is preferred to use 
the default eth_mac_addr.  Follow this procedure, usually:

	a) During probe, obtain MAC address from "original source",
	usually EEPROM / SROM.
	b) Each time dev->open() is called, write MAC address to h/w.

26) check and make sure you initialize your link to off 
(netif_carrier_off(dev)), in your dev->open() function.  In the 
background, your phy state machine should call netif_carrier_on() once 
it is certain link has been established.

this _must_ be an asynchronous process.  You may not sleep and wait for 
link, in dev->open().

27) for current 2.4 and 2.6 kernels, please use struct ethtool_ops 
rather than a large C switch statement.

28) are you aware that all of s2io_tx_watchdog is inside the 
dev->tx_lock spinlock?  I am concern s2io_tx_watchdog execution time may
be quite excessive a duration to hold a spinlock.

29) never call netif_wake_queue() unconditionally.  only call it if you 
are 100% certain that the net stack is allowed to add another packet to 
your hardware's TX queue(s).

30) do not call netif_stop_queue() and netif_wake_queue() on link 
events, in s2io_link.  Simply call netif_carrier_{on,off}.

31) ULL suffix

         } else if (!pci_set_dma_mask(pdev, 0xffffffff)) {

32) missing call to pci_disable_device() on error:

                 if (pci_set_consistent_dma_mask
                     (pdev, 0xffffffffffffffffULL)) {
                         DBG_PRINT(ERR_DBG,
                                   "Unable to obtain 64bit DMA for \
                                         consistent allocations\n");
                         return -ENOMEM;

33) if you use CHECKSUM_UNNECESSARY, you should be using the 
less-capable NETIF_F_IP_CSUM.

         dev->features |= NETIF_F_SG | NETIF_F_HW_CSUM;

NETIF_F_HW_CSUM requires the actual checksum value.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Submission for S2io 10GbE driver
  2004-02-19  7:16 raghavendra.koushik
@ 2004-02-19  8:14 ` Jeff Garzik
  2004-02-20  2:33   ` ravinandan arakali
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff Garzik @ 2004-02-19  8:14 UTC (permalink / raw)
  To: raghavendra.koushik
  Cc: leonid.grossman, netdev, raghavendra.koushik, ravinandan.arakali

raghavendra.koushik@wipro.com wrote:
> Hi Jeff,
> 
> 1. points 7 and 8, when initSharedMem returns error, I call
> freeSharedMem which should free any partially alloced memory.

For #8 quite possibly, and if so, I stand corrected.

For #7, it's a temporary variable so this would be impossible.

> 2. For point 17 and 33
> We do support IPV6 checksum offload. There is one issue though,
> our hardware only says whether the checksum is Ok or not it does
> not actually return the checksum values!
[...]
> If I say features as NETIF_F_IP_CSUM instead of NETIF_F_HW_CSUM,
> then I cannot utilize it's entire gamut of checksum offload feature
> as the offload will be limited to just TCP/UDP over IPV4.

Correct.  Your hardware cannot utilize NETIF_F_HW_CSUM.  You must be 
able to supply a valid csum from hardware, to use NETIF_F_HW_CSUM. 
Using NETIF_F_HW_CSUM as s2io does is abuse of the API, and prone to 
breakage...

For the future, it sounds like you should create a NETIF_F_IPV6_CSUM 
that works for both IPv4 and IPv6, and more closely matches your 
hardware.  We need to do this anyway, because most future cards will 
almost certainly offload IPv6 as well as IPv4.

For the present, NETIF_F_IP_CSUM is unfortunately your only choice. 
Zero-copy only occurs for sendfile(2) system call, which works fine with 
NETIF_F_IP_CSUM, so no big deal.

In general, I certainly want to encourage s2io to participate in adding 
features to Linux that is needed to more fully utilize the hardware. 
Some of the proposed features might not be appropriate, but adding 
NETIF_F_IPV6_CSUM for you guys certainly seems reasonable.

	Jeff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: Submission for S2io 10GbE driver
  2004-02-19  8:14 ` Jeff Garzik
@ 2004-02-20  2:33   ` ravinandan arakali
  0 siblings, 0 replies; 18+ messages in thread
From: ravinandan arakali @ 2004-02-20  2:33 UTC (permalink / raw)
  To: 'Jeff Garzik', raghavendra.koushik
  Cc: leonid.grossman, netdev, raghavendra.koushik

Hi Jeff,
For #7, the temporary variable tmp_v_addr is assigned as follows
(immediately below the code you have quoted):

nic->rx_blocks[i][j].block_virt_addr = tmp_v_addr;

In the freeSharedMem() routine, we go thru' each block of each
receive ring and free it. Following is the relevant piece of
code:
for (i = 0; i < config->RxRingNum; i++) {
     blk_cnt = nic->block_count[i];
     for (j = 0; j < blk_cnt; j++) {
          tmp_v_addr = nic->rx_blocks[i][j].block_virt_addr;
          tmp_p_addr = nic->rx_blocks[i][j].block_dma_addr;
          pci_free_consistent(nic->pdev, size, tmp_v_addr, tmp_p_addr);
     }
}

But I guess in the above piece of code, it may be a good idea to
check if nic->rx_blocks[i][j].block_virt_addr is non-NULL before
doing the pci_free_consistent().

Thanks,
Ravi

-----Original Message-----
From: Jeff Garzik [mailto:jgarzik@pobox.com] 
Sent: Thursday, February 19, 2004 12:15 AM
To: raghavendra.koushik@wipro.com
Cc: leonid.grossman@s2io.com; netdev@oss.sgi.com;
raghavendra.koushik@s2io.com; ravinandan.arakali@s2io.com
Subject: Re: Submission for S2io 10GbE driver

raghavendra.koushik@wipro.com wrote:
> Hi Jeff,
> 
> 1. points 7 and 8, when initSharedMem returns error, I call
> freeSharedMem which should free any partially alloced memory.

For #8 quite possibly, and if so, I stand corrected.

For #7, it's a temporary variable so this would be impossible.

> 2. For point 17 and 33
> We do support IPV6 checksum offload. There is one issue though,
> our hardware only says whether the checksum is Ok or not it does
> not actually return the checksum values!
[...]
> If I say features as NETIF_F_IP_CSUM instead of NETIF_F_HW_CSUM,
> then I cannot utilize it's entire gamut of checksum offload feature
> as the offload will be limited to just TCP/UDP over IPV4.

Correct.  Your hardware cannot utilize NETIF_F_HW_CSUM.  You must be 
able to supply a valid csum from hardware, to use NETIF_F_HW_CSUM. 
Using NETIF_F_HW_CSUM as s2io does is abuse of the API, and prone to 
breakage...

For the future, it sounds like you should create a NETIF_F_IPV6_CSUM 
that works for both IPv4 and IPv6, and more closely matches your 
hardware.  We need to do this anyway, because most future cards will 
almost certainly offload IPv6 as well as IPv4.

For the present, NETIF_F_IP_CSUM is unfortunately your only choice. 
Zero-copy only occurs for sendfile(2) system call, which works fine with

NETIF_F_IP_CSUM, so no big deal.

In general, I certainly want to encourage s2io to participate in adding 
features to Linux that is needed to more fully utilize the hardware. 
Some of the proposed features might not be appropriate, but adding 
NETIF_F_IPV6_CSUM for you guys certainly seems reasonable.

	Jeff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: FW: Submission for S2io 10GbE driver
@ 2004-02-05  0:49 Grant Grundler
  2004-02-16 21:16 ` Leonid Grossman
  0 siblings, 1 reply; 18+ messages in thread
From: Grant Grundler @ 2004-02-05  0:49 UTC (permalink / raw)
  To: Leonid Grossman
  Cc: 'Andi Kleen', netdev, 'Ragu Vatsavayi',
	'Grant Grundler'

On Wed, Feb 04, 2004 at 12:44:21PM -0800, Leonid Grossman wrote:
> 1) We did't find quad word memory operations(writeq and readq) on PCI
> bus for PPC64 architecture.

I would consider that a bug in the PPC64 port. You can submit
a patch to add a readq/writeq implementation using 32-bit ops
if the PCI Host controller doesn't support 64-bit transactions.
This doesn't always work since two 32-bit writes to a 64-bit register
is not the same as one 64-bit write. Your HW needs to be aware
of that if you want to support that platform.

> 2) We did't had a chance to test the driver on other big endian systems
> apart from PPC64. In PPC64 architecture though,
> I write/read a value to/from ioremaped address it swaps the value and
> behaves like a little endian m/c:  
> 
> For ex: if I  do writel(0x12345678, addr) then in it writes 
> addr: 78, (addr+1):56, (addr+2):34, (addr+3):12
...
> On a little endian m/c like IA32 also writel writes same values in a
> similar manner as shown above. 
> So the question is -
> Do all big endian machines with linux OS swap the values and write in
> little endian format?? 

Yes - I believe so.
There are ways to avoid the byte swap but I'm really not keen
on advocating this path. byte swapping is so much cheaper than
the PIO Reads, it just doesn't make sense to obfuscate the
code most of the time.

> 3)In PPC64 architecture dma_addr_t is u32, unlike remaining 64 bit
> architectures where it is defined as u64. Because of this we have
> to deal separately for PP64. 

You shouldn't be using dma_addr_t for layout of data structures
shared with the card. Several architectures *only* use 32-bit
DMA addresses (IOMMU for all DMA). It makes sense on those
architectures to define dma_addr_t as u32.
Try "fgrep dma_addr_t include/asm*/* | fgrep typedef" on linux-2.6 tree.

> So any suggestions will be useful, .i.e. generally how PPC64 developers
> deal with this?

Only use dma_addr_t to map data and not to define data structures
consumed by the card.

Some history from my perspective:
Fri, 9 Feb 2001, I started a discussion on linux-mm@kvack.org mailing
list on exactly this issue for linux-2.4. At the time Dave Miller
"owned" DMA interface and insisted dma_addr_t be u32.
He offered dma64_addr_t would be introduced in linux-2.5.
So far so good.

The only problem was IA64 was already defining dma_addr_t as u64
and mips was using "unsigned long".

Issue came up again on linux-kernel/linux-ia64 Feb 6, 2002.
(Subject: Proper fix for sym53c8xx_2 driver and dma64_addr_t)
It was pretty clear pci_dac_XXX() was the "preferred" interface
to use. But it made it impossible for a driver to support multiple
chips that supported both.

Somewhere along the line, I think with more thrashing about on
pci_set_dma_mask(), it was decided it was ok for dma_addr_t to
be u64 on architectures that supported it.

Last step into this mess was "64 Bits DMA Addresses for Alloc Consistent
Interfaces" thread started by SGI. Altix box needed it in order to
support PCI-X devices in PCI-X mode. (15 May 2003)
This resulted in "pci_set_consistent_dma_mask()" interface
in order to change the dma_mask for pci_alloc_consistent() calls.
AlexW just submitted support for this a week or two ago to 2.6 kernel.

> Say if we have some structure corresponding to memory region of our 
> hardware device which should be of 16 bytes.
...
> Then above definition will not work for PPC64, we need to modify it like
> this:
> struct hw {
> dma_addr_t  phys;
> char            *virt;
> #ifdef  ARCH32 
> u64 dummy;
> #endif
> #ifdef ARCH_PPC64
> u32 dummy;
> #endif
> }

Use "u64 phys" and it will always be correct.

And "char * virt" has the same problem. "char *" will vary
in size depending arch (ILP32 vs LP64 data model) as well.
You did claim this code worked on i386, ia64 and PPC64, right?

hth,
grant

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: Submission for S2io 10GbE driver
  2004-02-05  0:49 FW: " Grant Grundler
@ 2004-02-16 21:16 ` Leonid Grossman
  2004-02-16 22:12   ` Jeff Garzik
  2004-02-17  0:11   ` Christoph Hellwig
  0 siblings, 2 replies; 18+ messages in thread
From: Leonid Grossman @ 2004-02-16 21:16 UTC (permalink / raw)
  To: netdev
  Cc: 'Andi Kleen', 'Jeff Garzik',
	'Stephen Hemminger', 'Francois Romieu',
	'jamal', 'Grant Grundler',
	'Anton Blanchard', 'Jes Sorensen',
	raghavendra.koushik, 'ravinandan arakali'

[-- Attachment #1: Type: text/plain, Size: 1123 bytes --]

Hi all,
Attached is the second submission for our 10GbE Adapter.

Many thanks for the input; I believe we have addressed all the comments
to date (I cc everybody who helped us with the comments, to make sure
their concerns have been addressed).

s2io_linux_drv_submission02.tar : Contains the new driver source files
s2io_linux_drv_patches.tar : Contains the patch files that can be
applied on the files we originally submitted.

All the utilities/diagnostics/tuning scripts are removed from the
submission.

Also, couple questions - 

1. At the moment, lspci output looks like
"02:02.0 Ethernet controller: Unknown device 17d5:5831 (rev 02)"; do we
need to submit a patch for drivers/pci/pci.ids?

2. The card fully supports Ethernet and TCP header separation in
hardware (so called receive 3-buffer mode).
The mode may have some performance advantages but so far we did not
implement the mode in Linux since it seems that Linux stack can't handle
the fragmented buffers in the receive path. Is this a correct
assumption, does receive buffer has to be continuous?

Thanks, Leonid

[-- Attachment #2: s2io_linux_drv_patches.tar --]
[-- Type: application/octet-stream, Size: 274786 bytes --]

[-- Attachment #3: s2io_linux_drv_submission02.tar --]
[-- Type: application/octet-stream, Size: 211585 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Submission for S2io 10GbE driver
  2004-02-16 21:16 ` Leonid Grossman
@ 2004-02-16 22:12   ` Jeff Garzik
  2004-02-16 23:53     ` Leonid Grossman
  2004-02-17  0:11   ` Christoph Hellwig
  1 sibling, 1 reply; 18+ messages in thread
From: Jeff Garzik @ 2004-02-16 22:12 UTC (permalink / raw)
  To: Leonid Grossman
  Cc: netdev, 'Andi Kleen', 'Stephen Hemminger',
	'Francois Romieu', 'jamal',
	'Grant Grundler', 'Anton Blanchard',
	'Jes Sorensen', raghavendra.koushik,
	'ravinandan arakali'

Leonid Grossman wrote:
> 1. At the moment, lspci output looks like
> "02:02.0 Ethernet controller: Unknown device 17d5:5831 (rev 02)"; do we
> need to submit a patch for drivers/pci/pci.ids?

http://pciids.sourceforge.net/

The file drivers/pci/pci.ids is only associated with /proc/pci strings, 
and I'm trying to deprecate it :)


> 2. The card fully supports Ethernet and TCP header separation in
> hardware (so called receive 3-buffer mode).
> The mode may have some performance advantages but so far we did not
> implement the mode in Linux since it seems that Linux stack can't handle
> the fragmented buffers in the receive path. Is this a correct
> assumption, does receive buffer has to be continuous?

In theory, the skb can be fragmented.  I'm not as much as an expert in 
the ipv4/tcp/socket levels of the net stack, but I don't recall any 
place that yet supports skb frags on receive?

I think that's likely an area that would need some minor 
adjustments/additions in the upstream kernels, but not major surgery, 
since the skb already supports creating, noticing, and freeing frags.

	Jeff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: Submission for S2io 10GbE driver
  2004-02-16 22:12   ` Jeff Garzik
@ 2004-02-16 23:53     ` Leonid Grossman
  0 siblings, 0 replies; 18+ messages in thread
From: Leonid Grossman @ 2004-02-16 23:53 UTC (permalink / raw)
  To: 'Jeff Garzik'
  Cc: netdev, raghavendra.koushik, 'ravinandan arakali'

> http://pciids.sourceforge.net/
>The file drivers/pci/pci.ids is only associated with /proc/pci strings,

>and I'm trying to deprecate it :)

Done, thanks for the pointer!


>> 2. The card fully supports Ethernet and TCP header separation in 
>> hardware (so called receive 3-buffer mode). The mode may have some 
>> performance advantages but so far we did not implement the mode in 
>> Linux since it seems that Linux stack can't handle the fragmented 
>> buffers in the receive path. Is this a correct assumption, does 
>> receive buffer has to be continuous?

>In theory, the skb can be fragmented.  I'm not as much as an expert in 
>the ipv4/tcp/socket levels of the net stack, but I don't recall any 
>place that yet supports skb frags on receive?

>I think that's likely an area that would need some minor 
>adjustments/additions in the upstream kernels, but not major surgery, 
>since the skb already supports creating, noticing, and freeing frags.

>	Jeff


I think it will be a good idea to have full support for skb frags in
general, and not just for our product.

Please let me know if there is a consensus (and hopefully a timeframe
:-) for this; 
We volunteer to implement the mode in our driver and test the solution.

Leonid

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Submission for S2io 10GbE driver
  2004-02-16 21:16 ` Leonid Grossman
  2004-02-16 22:12   ` Jeff Garzik
@ 2004-02-17  0:11   ` Christoph Hellwig
  2004-02-17  0:16     ` Stephen Hemminger
  1 sibling, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2004-02-17  0:11 UTC (permalink / raw)
  To: Leonid Grossman
  Cc: netdev, 'Andi Kleen', 'Jeff Garzik',
	'Stephen Hemminger', 'Francois Romieu',
	'jamal', 'Grant Grundler',
	'Anton Blanchard', 'Jes Sorensen',
	raghavendra.koushik, 'ravinandan arakali'

A bunch of comments:

 - if you want to submit the driver for inclusion please submit a patch against a kernel tree,
   not a tarball.
 - please try to avoid version ifdefs by provoding the newer APIs on older kernels, e.g.:

#ifndef IRQ_RETVAL
#define irqreturn_t                     void
#define IRQ_RETVAL(foo)
#endif

#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,00)
#define free_netdev kfree
#endif

 - your AS_A_MODULE ifdef is bogs - everything under it is fine for a
   non-modular driver, too
 - your XENA_ARCH_64 is not good - just cast 64bit values to
   (unsigned long long) always and use the ll format specifier always.
   You missed a few 64bit arches, btw :)
 - can you get rid of all those BOOL/TRUE/FALSE/SUCCESS/etc.. ifdefs?
 - s2io_driver wants to be converted to C99 initializers (.foo instead of foo:)
 - In Linux comments usually are on the same indentation level as surrounding
   code
 - CONFIGURE_NAPI_SUPPORT should probably become CONFIG_XENA_NAPI or whatever
 - you want to use ethtool_ops instead of the ioctl variant
 - there's lots of non-static symbols in s2io.c - these shouldn't exist in a
   single source-file driver
 - you can't return -ENOMEM from the isr - just IRQ_HANDLED or IRQ_NONE
 - having the RCS log at the end of the source files looks ... odd

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Submission for S2io 10GbE driver
  2004-02-17  0:11   ` Christoph Hellwig
@ 2004-02-17  0:16     ` Stephen Hemminger
  0 siblings, 0 replies; 18+ messages in thread
From: Stephen Hemminger @ 2004-02-17  0:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Leonid Grossman, netdev, 'Andi Kleen',
	'Jeff Garzik', 'Francois Romieu', 'jamal',
	'Grant Grundler', 'Anton Blanchard',
	'Jes Sorensen', raghavendra.koushik,
	'ravinandan arakali'

On Tue, 17 Feb 2004 00:11:12 +0000
Christoph Hellwig <hch@infradead.org> wrote:

> A bunch of comments:
> 
>  - if you want to submit the driver for inclusion please submit a patch against a kernel tree,
>    not a tarball.
>  - please try to avoid version ifdefs by provoding the newer APIs on older kernels, e.g.:
> 
> #ifndef IRQ_RETVAL
> #define irqreturn_t                     void
> #define IRQ_RETVAL(foo)
> #endif
> 
> #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,00)
> #define free_netdev kfree
> #endif

The proper way for that is:

#ifndef HAVE_FREE_NETDEV
#define free_netdev(x)  kfree(x)
#endif

^ permalink raw reply	[flat|nested] 18+ messages in thread

* FW: Submission for S2io 10GbE driver
@ 2004-01-23 21:22 Leonid Grossman
  2004-01-23 21:54 ` Stephen Hemminger
  2004-01-23 22:22 ` FW: " Andi Kleen
  0 siblings, 2 replies; 18+ messages in thread
From: Leonid Grossman @ 2004-01-23 21:22 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 1917 bytes --]

Hi all,
Please fund attached a source code for S2io 10GbE adapter (with some
disclaimers below).
Send me your comments/suggestions on the source please,  and we will
address the code changes (if any) in real time.

Regards, Leonid

Leonid Grossman 
Vice President, SW Engineering 
S2io Inc. 
www.s2io.com

-----Original Message-----
From: Leonid Grossman [mailto:leonid.grossman@s2io.com] 
Sent: Tuesday, January 20, 2004 8:13 PM
To: 'Jeff Garzik'
Subject: Submission for S2io 10GbE driver

Hi Jeff, 
Per your suggestion, attached is the driver source.

Couple disclaimers - 
1. There are some history logs from our CVS at the end of the source
files, you will probably want us to remove these. 

2. We are working on adding more loadable tunable parameters. 

3. This is a shipping (as of last October) card, and Linux driver got
some reasonable mileage on 2.6 kernel and number of 2.4 distributions,
mostly 2.4.21 based. Having said that, we still run performance and
stress tests in QA since 10GbE is fairly new.

All comments/suggestions are very welcome; let me know if you need
anything besides the source from our end.

Regards, Leonid

> -----Original Message-----
> From: Jeff Garzik [mailto:jgarzik@pobox.com]
> Sent: Monday, January 12, 2004 1:36 PM
> To: Leonid Grossman
> Subject: Re: FW: Submission for 10GbE driver
> 
> Just a suggestion, but it might be a good idea to email me
> your driver 
> sooner than later.
> 
> _Almost all_ drivers require changes before being submitting, so
> delaying source release due to a Q/A cycle or similar is 
> probably just 
> wasting time, if the driver will likely have to be updated and 
> re-submitted, possibly several times.
> 
> Don't let me scare you :)  Going through 1-2 iterations
> before a driver 
> passes review is normal for most vendors, particularly the first one. 
> The key is listening to kernel developers' feedback.
> 
> 	Jeff
> 
> 
> 

[-- Attachment #2: xframe_rc1.65.tar --]
[-- Type: application/octet-stream, Size: 440320 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Submission for S2io 10GbE driver
  2004-01-23 21:22 FW: " Leonid Grossman
@ 2004-01-23 21:54 ` Stephen Hemminger
  2004-01-23 21:58   ` Leonid Grossman
  2004-01-23 22:22 ` FW: " Andi Kleen
  1 sibling, 1 reply; 18+ messages in thread
From: Stephen Hemminger @ 2004-01-23 21:54 UTC (permalink / raw)
  To: Leonid Grossman; +Cc: netdev

On Fri, 23 Jan 2004 13:22:11 -0800
"Leonid Grossman" <leonid.grossman@s2io.com> wrote:

> Hi all,
> Please fund attached a source code for S2io 10GbE adapter (with some
> disclaimers below).
> Send me your comments/suggestions on the source please,  and we will
> address the code changes (if any) in real time.
> 
> Regards, Leonid
> 
> 
> 
> Leonid Grossman 
> Vice President, SW Engineering 
> S2io Inc. 
> www.s2io.com

Okay, but next time how about only sending the source of the driver
as a kernel patch.  Not a whole build directory including binaries
and CVS!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: Submission for S2io 10GbE driver
  2004-01-23 21:54 ` Stephen Hemminger
@ 2004-01-23 21:58   ` Leonid Grossman
  0 siblings, 0 replies; 18+ messages in thread
From: Leonid Grossman @ 2004-01-23 21:58 UTC (permalink / raw)
  To: 'Stephen Hemminger'; +Cc: netdev



> -----Original Message-----
> From: Stephen Hemminger [mailto:shemminger@osdl.org] 
> Sent: Friday, January 23, 2004 1:54 PM
> To: Leonid Grossman
> Cc: netdev@oss.sgi.com
> Subject: Re: Submission for S2io 10GbE driver
> Okay, but next time how about only sending the source of the 
> driver as a kernel patch.  Not a whole build directory 
> including binaries and CVS!


Will do next time. 
Leonid
 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: FW: Submission for S2io 10GbE driver
  2004-01-23 21:22 FW: " Leonid Grossman
  2004-01-23 21:54 ` Stephen Hemminger
@ 2004-01-23 22:22 ` Andi Kleen
  2004-01-24  0:21   ` Stephen Hemminger
  1 sibling, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2004-01-23 22:22 UTC (permalink / raw)
  To: Leonid Grossman; +Cc: netdev

On Fri, 23 Jan 2004 13:22:11 -0800
"Leonid Grossman" <leonid.grossman@s2io.com> wrote:

> Hi all,
> Please fund attached a source code for S2io 10GbE adapter (with some
> disclaimers below).
> Send me your comments/suggestions on the source please,  and we will
> address the code changes (if any) in real time.

>From a quick look:

The debugging ioctls look quite dangerous. iirc they are not root projected
by higher level code. Either remove them or add a root check at least,
otherwise you'll have a potential root hole.

All the ARCH_PPC64 ifdefs shouldn't be needed. Can you remove that?
If there are problems in ppc64 code they should be fixed there, not worked
around. Same with the ifdefs for kernel 2.6 features. An driver integrated
into the kernel should not contain such ifdefs.

mdelay(100) is quite evil. Better make that a schedule_timeout() in 
process context.

ETH_GREGS/GEEPROM/SEEPROM/GSTRINGS ioctl handlers: seems to leak memory on error paths

ETH_SEEPROM: len should be limit checked

Running the driver through scripts/Lindent may not be a bad idea.

initNic: the jiffies check does not handle jiffies wrapping. 2.6 forces a jiffies wrap
5 minutes after boot, so this may be fatal. Use the right macros for this.

init_nic: x86-64 supports consistent dma masks too. in fact they should be just
used unconditionally.

-Andi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Submission for S2io 10GbE driver
  2004-01-23 22:22 ` FW: " Andi Kleen
@ 2004-01-24  0:21   ` Stephen Hemminger
  2004-01-27  5:32     ` Leonid Grossman
  0 siblings, 1 reply; 18+ messages in thread
From: Stephen Hemminger @ 2004-01-24  0:21 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Leonid Grossman, netdev

Noticed the setup loopback test seems to register for a packet type
and then forget to unregister that type! 

Also nothing really restricts the packet type to only coming in on the expected
interface; therefore if someone sends the same packet in over another interface,
then sp->loop_pkt_cnt will end up incrementing some other drivers private
data structure *bad*.

IMHO the whole loopback test frame stuff seems like something in a test
bed driver, not production code.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: Submission for S2io 10GbE driver
  2004-01-24  0:21   ` Stephen Hemminger
@ 2004-01-27  5:32     ` Leonid Grossman
  2004-01-27  6:08       ` Jeff Garzik
  0 siblings, 1 reply; 18+ messages in thread
From: Leonid Grossman @ 2004-01-27  5:32 UTC (permalink / raw)
  To: 'Stephen Hemminger', 'Andi Kleen'
  Cc: netdev, raghavendra.koushik

Hi Stephen,
Below are responses from our developer.

Thanks for the input, Leonid

> -----Original Message-----
> From: Stephen Hemminger [mailto:shemminger@osdl.org] 
> Sent: Friday, January 23, 2004 4:22 PM
> To: Andi Kleen
> Cc: Leonid Grossman; netdev@oss.sgi.com
> Subject: Re: Submission for S2io 10GbE driver
> 
> 
> Noticed the setup loopback test seems to register for a 
> packet type and then forget to unregister that type! 

The packet type gets unregistered at the end of the test in the
'reset_loopback'
function through the system call 'dev_remove_pack()'

> 
> Also nothing really restricts the packet type to only coming 
> in on the expected interface; therefore if someone sends the 
> same packet in over another interface, then sp->loop_pkt_cnt 
> will end up incrementing some other drivers private data 
> structure *bad*.

Correct, that's why in the s2io.c source where I define the
packet_type's protocol value
through the Macro 'ETH_LOOP_TEST_TYPE' there's a ToDo to obtain a
private protocol ID. 
This way no app in the real world can ever pass a frame with that T/L
field in the packet.
Also, the problem can only happen during the 3 second duration when this
test is in progress.

> 
> IMHO the whole loopback test frame stuff seems like something 
> in a test bed driver, not production code.

The loopback test is there as a part of the ethtool's diagnostic option.

There are pros and cons of having the test in there I guess, anyone else
has an opinion on this?

Do other net drivers normally support loopback and other diag tests as a
part of the ethtool support,
or they provide little/no support for the option and ship a standalone
diag program instead?

Thanks, Leonid

> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Submission for S2io 10GbE driver
  2004-01-27  5:32     ` Leonid Grossman
@ 2004-01-27  6:08       ` Jeff Garzik
  2004-01-27  6:19         ` Leonid Grossman
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff Garzik @ 2004-01-27  6:08 UTC (permalink / raw)
  To: Leonid Grossman
  Cc: 'Stephen Hemminger', 'Andi Kleen', netdev,
	raghavendra.koushik

Leonid Grossman wrote:
> The loopback test is there as a part of the ethtool's diagnostic option.
> 
> 
> There are pros and cons of having the test in there I guess, anyone else
> has an opinion on this?
> 
> Do other net drivers normally support loopback and other diag tests as a
> part of the ethtool support,
> or they provide little/no support for the option and ship a standalone
> diag program instead?

The ethtool diag stuff is more of a quick sanity test than anything 
exhaustive.

I definitely want to discourage tons of test code in drivers, as its 
code that users will almost-never run, it bloats the driver, and can be 
done with a special diag-only driver or diag program (or a combination 
of both).

A lot of the 10/100 drivers originated from Donald Becker, who typically 
creates a userland (i.e. separate) diag program for each driver he writes.

	Jeff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: Submission for S2io 10GbE driver
  2004-01-27  6:08       ` Jeff Garzik
@ 2004-01-27  6:19         ` Leonid Grossman
  0 siblings, 0 replies; 18+ messages in thread
From: Leonid Grossman @ 2004-01-27  6:19 UTC (permalink / raw)
  To: 'Jeff Garzik'
  Cc: 'Stephen Hemminger', 'Andi Kleen', netdev,
	raghavendra.koushik



> 
> The ethtool diag stuff is more of a quick sanity test than anything 
> exhaustive.
> 
> I definitely want to discourage tons of test code in drivers, as its 
> code that users will almost-never run, it bloats the driver, 
> and can be 
> done with a special diag-only driver or diag program (or a 
> combination 
> of both).
> 
> A lot of the 10/100 drivers originated from Donald Becker, 
> who typically 
> creates a userland (i.e. separate) diag program for each 
> driver he writes.


I see the point; we'll go ahead and get rid of the loopback.
We actually have pretty extensive standalone diag tool that is based on
diag driver.

Thanks, Leonid


> 
> 	Jeff
> 
> 
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2004-02-26  7:40 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-02-25  6:03 Submission for S2io 10GbE driver raghavendra.koushik
2004-02-26  7:40 ` Jeff Garzik
     [not found] <403573B5.4050100@pobox.com>
2004-02-20  2:59 ` ravinandan arakali
2004-02-20  3:30   ` Jeff Garzik
  -- strict thread matches above, loose matches on Subject: below --
2004-02-19  7:16 raghavendra.koushik
2004-02-19  8:14 ` Jeff Garzik
2004-02-20  2:33   ` ravinandan arakali
2004-02-05  0:49 FW: " Grant Grundler
2004-02-16 21:16 ` Leonid Grossman
2004-02-16 22:12   ` Jeff Garzik
2004-02-16 23:53     ` Leonid Grossman
2004-02-17  0:11   ` Christoph Hellwig
2004-02-17  0:16     ` Stephen Hemminger
2004-01-23 21:22 FW: " Leonid Grossman
2004-01-23 21:54 ` Stephen Hemminger
2004-01-23 21:58   ` Leonid Grossman
2004-01-23 22:22 ` FW: " Andi Kleen
2004-01-24  0:21   ` Stephen Hemminger
2004-01-27  5:32     ` Leonid Grossman
2004-01-27  6:08       ` Jeff Garzik
2004-01-27  6:19         ` Leonid Grossman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).