Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 2/14] nes: device structures and defines
From: Michael Buesch @ 2007-08-08 13:48 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jeff Garzik, ggrundstrom, rdreier, ewg, netdev
In-Reply-To: <20070808133850.GE14419@one.firstfloor.org>

On Wednesday 08 August 2007 15:38:50 Andi Kleen wrote:
> On Wed, Aug 08, 2007 at 03:28:33PM +0200, Michael Buesch wrote:
> > On Wednesday 08 August 2007 15:08:28 Andi Kleen wrote:
> > > On Wed, Aug 08, 2007 at 03:02:35PM +0200, Michael Buesch wrote:
> > > > On Wednesday 08 August 2007 14:55:11 Andi Kleen wrote:
> > > > > On Wed, Aug 08, 2007 at 01:50:35PM +0200, Michael Buesch wrote:
> > > > > > On Wednesday 08 August 2007 14:38:10 Andi Kleen wrote:
> > > > > > > Jeff Garzik <jeff@garzik.org> writes:
> > > > > > > > > +			val, reg_index, addr, addr+4); */
> > > > > > > > > +	writel(cpu_to_le32(reg_index), addr);
> > > > > > > > > +	writel(cpu_to_le32(val),(u8 *)addr + 4);
> > > > > > > > 
> > > > > > > > wrong -- endian conversion macros not needed with writel()
> > 
> > > > Fact is that we do _not_ need cpu_to_le32 with writel.
> > > 
> > > We do on a big endian platform if the register is LE. I assume that's the case 
> > > on this hardware.
> > 
> > That is not true.
> > writeX does automatically convert to bus-endian.
> > Which, in case of the PCI bus, is little endian.
> > So if your register is LE (which it is most likely), you don't
> > need any conversion. If your register is BE (which I very much doubt),
> > then you need swab32().
> > In _no_ case you need cpu_to_xx().
> 
> Hmm, I checked a couple of BE architectures and none seem to convert explicitely.

That depends on the arch.
Look at this from ppc, for example:

184 static inline void writel(__u32 b, volatile void __iomem *addr)
185 {
186         out_le32(addr, b);
187 }

out_le32 writes a little endian value. So it converts it.
Also note that b is __u32. Sparse would complain, if someone used cpu_to_xx
on it.

> But ok it's possible that their PCI bridges convert. I'll take your
> word for it although it sounds somewhat dubious.
>
> However if that's true then there are quite some drivers wrong:
> 
> % grep -r write[bwl]\(cpu_to *   | wc -l
> 57

Yeah, seems so.
Here comes an example (with 16bit values)

Little endian register
	LittleEndian arch		BigEngian arch
value	0xBBAA				0xAABB
writew	0xBBAA				0xAABB
on bus	0xBBAA				0xBBAA
on dev	0xBBAA				0xBBAA

Big endian register
	LittleEndian arch		BigEngian arch
value	0xBBAA				0xAABB
swab16	0xAABB				0xBBAA
writew	0xAABB				0xBBAA
on bus	0xAABB				0xAABB
on dev	0xAABB				0xAABB

I hope I got it right. :)
Same result on every arch.

Most hardware has little endian registers. Some can switch
endianess based on some bit, too. So in case of a BE register that
bit has to be flipped, or if not possible swabX() has to be used.

-- 
Greetings Michael.

^ permalink raw reply

* Re: [PATCH 2/14] nes: device structures and defines
From: Michael Buesch @ 2007-08-08 13:55 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jeff Garzik, ggrundstrom, rdreier, ewg, netdev
In-Reply-To: <200708081548.25399.mb@bu3sch.de>

On Wednesday 08 August 2007 15:48:25 Michael Buesch wrote:
> > However if that's true then there are quite some drivers wrong:
> > 
> > % grep -r write[bwl]\(cpu_to *   | wc -l
> > 57
> 
> Yeah, seems so.

Most of them seem to be
 __raw_writew(cpu_toXX(...

which I think might be valid.
But there are indeed a few cases that look wrong.

arch/mips/pci/ops-bonito64.c:           writel(cpu_to_le32(*data), addrp);
arch/x86_64/kernel/pci-calgary.c:       writel(cpu_to_be32(val), target);
arch/x86_64/kernel/pci-calgary.c:       writel(cpu_to_be32(val), target);
arch/x86_64/kernel/pci-calgary.c:               writel(cpu_to_be32(val32), target);
arch/x86_64/kernel/pci-calgary.c:       writel(cpu_to_be32(val), target);
arch/x86_64/kernel/pci-calgary.c:       writel(cpu_to_be32(val32), target);
arch/x86_64/kernel/pci-calgary.c:       writel(cpu_to_be32(val32), target);
drivers/atm/fore200e.c:    writel(cpu_to_le32(val), addr);
drivers/net/starfire.c:         writew(cpu_to_be16(eaddrs[2]), setup_frm); setup_frm += 4;
drivers/net/starfire.c:         writew(cpu_to_be16(eaddrs[1]), setup_frm); setup_frm += 4;
drivers/net/starfire.c:         writew(cpu_to_be16(eaddrs[0]), setup_frm); setup_frm += 8;
drivers/net/starfire.c:                         writew(cpu_to_be16(i), filter_addr);
drivers/net/starfire.c:                 writew(cpu_to_be16(eaddrs[2]), filter_addr); filter_addr += 4;
drivers/net/starfire.c:                 writew(cpu_to_be16(eaddrs[1]), filter_addr); filter_addr += 4;
drivers/net/starfire.c:                 writew(cpu_to_be16(eaddrs[0]), filter_addr); filter_addr += 8;
drivers/net/starfire.c:                 writew(cpu_to_be16(eaddrs[0]), filter_addr); filter_addr += 4;
drivers/net/starfire.c:                 writew(cpu_to_be16(eaddrs[1]), filter_addr); filter_addr += 4;
drivers/net/starfire.c:                 writew(cpu_to_be16(eaddrs[2]), filter_addr); filter_addr += 8;
drivers/net/starfire.c:                 writew(cpu_to_be16(eaddrs[0]), filter_addr); filter_addr += 4;
drivers/net/starfire.c:                 writew(cpu_to_be16(eaddrs[1]), filter_addr); filter_addr += 4;
drivers/net/starfire.c:                 writew(cpu_to_be16(eaddrs[2]), filter_addr); filter_addr += 8;
drivers/net/hamachi.c:  writel(cpu_to_le64(hmp->rx_ring_dma), ioaddr + RxPtr);
drivers/net/hamachi.c:  writel(cpu_to_le64(hmp->rx_ring_dma) >> 32, ioaddr + RxPtr + 4);
drivers/net/hamachi.c:  writel(cpu_to_le64(hmp->tx_ring_dma), ioaddr + TxPtr);
drivers/net/hamachi.c:  writel(cpu_to_le64(hmp->tx_ring_dma) >> 32, ioaddr + TxPtr + 4);
drivers/net/hamachi.c:  writel(cpu_to_le32(hmp->rx_ring_dma), ioaddr + RxPtr);
drivers/net/hamachi.c:  writel(cpu_to_le32(hmp->tx_ring_dma), ioaddr + TxPtr);
drivers/net/tokenring/lanstreamer.c:            writel(cpu_to_le32(pci_map_single(streamer_priv->pci_dev, rx_ring, 512, PCI_DMA_FROMDEVICE)),
drivers/net/tokenring/lanstreamer.c:    writel(cpu_to_le32(pci_map_single(streamer_priv->pci_dev, &streamer_priv->streamer_rx_ring[0],
drivers/net/tokenring/lanstreamer.c:    writel(cpu_to_le32(pci_map_single(streamer_priv->pci_dev, &streamer_priv->streamer_rx_ring[STREAMER_RX_RING_SIZE - 1],
drivers/net/tokenring/lanstreamer.c:                                    writel(cpu_to_le32(pci_map_single(streamer_priv->pci_dev, 
drivers/net/tokenring/lanstreamer.c:                                            writel(cpu_to_le32(pci_map_single(streamer_priv->pci_dev, 
drivers/net/tokenring/lanstreamer.c:            writel(cpu_to_le32(pci_map_single(streamer_priv->pci_dev, 
drivers/net/via-velocity.c:             writel(cpu_to_le32(vptr->rd_pool_dma), &regs->RDBaseLo);
drivers/net/via-velocity.c:                     writel(cpu_to_le32(vptr->td_pool_dma[i]), &(regs->TDBaseLo[i]));
drivers/scsi/nsp32_io.h:        writew(cpu_to_le16(val), ptr);
drivers/scsi/nsp32_io.h:        writel(cpu_to_le32(val), ptr);
drivers/scsi/nsp32_io.h:        writew(cpu_to_le16(val), data_ptr );
drivers/block/umem.c:   writel(cpu_to_le32((page->page_dma+offset)&0xffffffff),
drivers/block/umem.c:   writel(cpu_to_le32(((u64)page->page_dma)>>32),
drivers/block/umem.c:   writel(cpu_to_le32(DMASCR_GO | DMASCR_CHAIN_EN | pci_cmds),
drivers/block/umem.c:           writel(cpu_to_le32(DMASCR_DMA_COMPLETE|DMASCR_CHAIN_COMPLETE),

-- 
Greetings Michael.

^ permalink raw reply

* Re: [PATCH 3/9 Rev3] [sched] Modify qdisc_run to support batching
From: Patrick McHardy @ 2007-08-08 14:05 UTC (permalink / raw)
  To: Krishna Kumar
  Cc: johnpol, davem, shemminger, sri, jagana, Robert.Olsson,
	peter.p.waskiewicz.jr, herbert, gaagaan, kumarkr, rdreier,
	rick.jones2, mcarlson, jeff, general, mchan, tgraf, hadi, netdev,
	xma
In-Reply-To: <20070808093145.15396.91711.sendpatchset@localhost.localdomain>

Krishna Kumar wrote:
> + * Algorithm to get skb(s) is:
> + *	- Non batching drivers, or if the batch list is empty and there is
> + *	  1 skb in the queue - dequeue skb and put it in *skbp to tell the
> + *	  caller to use the single xmit API.
> + *	- Batching drivers where the batch list already contains atleast one
> + *	  skb, or if there are multiple skbs in the queue: keep dequeue'ing
> + *	  skb's upto a limit and set *skbp to NULL to tell the caller to use
> + *	  the multiple xmit API.
> + *
> + * Returns:
> + *	1 - atleast one skb is to be sent out, *skbp contains skb or NULL
> + *	    (in case >1 skbs present in blist for batching)
> + *	0 - no skbs to be sent.
> + */
> +static inline int get_skb(struct net_device *dev, struct Qdisc *q,
> +			  struct sk_buff_head *blist, struct sk_buff **skbp)
> +{
> +	if (likely(!blist || (!skb_queue_len(blist) && qdisc_qlen(q) <= 1))) {
> +		return likely((*skbp = dev_dequeue_skb(dev, q)) != NULL);
> +	} else {
> +		int max = dev->tx_queue_len - skb_queue_len(blist);
> +		struct sk_buff *skb;
> +
> +		while (max > 0 && (skb = dev_dequeue_skb(dev, q)) != NULL)
> +			max -= dev_add_skb_to_blist(skb, dev);
> +
> +		*skbp = NULL;
> +		return 1;	/* we have atleast one skb in blist */
> +	}
> +}


I think you missed my previous reply to this in the flood of
responses (or I missed your reply), so I'm copying it below:

The entire idea of a single queue after qdiscs that is refilled
independantly of transmissions times etc. make be worry a bit.
By changing the timing you're effectively changing the qdiscs
behaviour, at least in some cases. SFQ is a good example, but I
believe it affects most work-conserving qdiscs. Think of this
situation:

100 packets of flow 1 arrive
50 packets of flow 1 are sent
100 packets for flow 2 arrive
remaining packets are sent

On the wire you'll first see 50 packets of flow 1, than 100 packets
alternate of flow 1 and 2, then 50 packets flow 2.

With your additional queue all packets of flow 1 are pulled out of
the qdisc immediately and put in the fifo. When the 100 packets of
the second flow arrive they will also get pulled out immediately
and are put in the fifo behind the remaining 50 packets of flow 1.
So what you get on the wire is:

100 packets of flow 1
100 packets of flow 1

So SFQ is without any effect. This is not completely avoidable of
course, but you can and should limit the damage by only pulling
out as much packets as the driver can take and have the driver
stop the queue afterwards.


^ permalink raw reply

* RE: [PATCH 6/14] nes: hardware init
From: Glenn Grundstrom @ 2007-08-08 14:33 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: rdreier, ewg, netdev
In-Reply-To: <46B92342.9090802@garzik.org>

Jeff,

Thanks for the input.  I'll take your suggestions into account for the
patch v2 posting.

Glenn.

-----Original Message-----
From: Jeff Garzik [mailto:jeff@garzik.org] 
Sent: Tuesday, August 07, 2007 8:58 PM
To: Glenn Grundstrom
Cc: rdreier@cisco.com; ewg@lists.openfabrics.org; netdev@vger.kernel.org
Subject: Re: [PATCH 6/14] nes: hardware init

ggrundstrom@neteffect.com wrote:
> +struct nes_adapter *nes_init_adapter(struct nes_device *nesdev, u8
hw_rev) {
> +	struct nes_adapter *nesadapter = NULL;
> +	unsigned long num_pds;
> +	u32 u32temp;
> +	u32 port_count;
> +	u16 max_rq_wrs;
> +	u16 max_sq_wrs;
> +	u32 max_mr;
> +	u32 max_256pbl;
> +	u32 max_4kpbl;
> +	u32 max_qp;
> +	u32 max_irrq;
> +	u32 max_cq;
> +	u32 hte_index_mask;
> +	u32 adapter_size;
> +	u32 arp_table_size;
> +	u8  OneG_Mode;
> +
> +	/* search the list of existing adapters */
> +	list_for_each_entry(nesadapter, &nes_adapter_list, list) {
> +		dprintk("Searching Adapter list for PCI devfn = 0x%X,"
> +				" adapter PCI slot/bus = %u/%u, pci
devices PCI slot/bus = %u/%u, .\n",
> +				nesdev->pcidev->devfn,
> +				PCI_SLOT(nesadapter->devfn),
> +				nesadapter->bus_number,
> +				PCI_SLOT(nesdev->pcidev->devfn),
> +				nesdev->pcidev->bus->number );
> +		if ((PCI_SLOT(nesadapter->devfn) ==
PCI_SLOT(nesdev->pcidev->devfn)) &&
> +			(nesadapter->bus_number ==
nesdev->pcidev->bus->number)) {
> +			nesadapter->ref_count++;
> +			return(nesadapter);

you don't need any of this PCI bus scanning at all.  Please convert to
normal PCI usage



> +	/* no adapter found */
> +	num_pds = pci_resource_len(nesdev->pcidev, BAR_1) / 4096;

see, this is why the BAR_1 define should go away -- it's actually define

to the value '2'


> +	if (hw_rev != NE020_REV) {
> +		dprintk("%s: NE020 driver detected unknown hardware
revision 0x%x\n",
> +				__FUNCTION__, hw_rev);
> +		return(NULL);
> +	}

move this test to the top of the function


> +	dprintk("%s:%u Determine Soft Reset, QP_control=0x%x, CPU0=0x%x,
CPU1=0x%x, CPU2=0x%x\n",
> +			__FUNCTION__, __LINE__,
> +			nes_read_indexed(nesdev, NES_IDX_QP_CONTROL +
PCI_FUNC(nesdev->pcidev->devfn) * 8),
> +			nes_read_indexed(nesdev,
NES_IDX_INT_CPU_STATUS),
> +			nes_read_indexed(nesdev, 0x00A4),
> +			nes_read_indexed(nesdev, 0x00A8));
> +
> +		dprintk("%s: Reset and init NE020\n", __FUNCTION__);
> +		if ((port_count = nes_reset_adapter_ne020(nesdev,
&OneG_Mode)) == 0) {
> +			return(NULL);
> +		}
> +		if (nes_init_serdes(nesdev, port_count)) {
> +			return(NULL);
> +		}

kill braces


> +		nes_init_csr_ne020(nesdev, hw_rev, port_count);
> +
> +	/* Setup and enable the periodic timer */
> +	nesdev->et_rx_coalesce_usecs_irq = interrupt_mod_interval;
> +	if (nesdev->et_rx_coalesce_usecs_irq) {
> +		nes_write32(nesdev->regs+NES_PERIODIC_CONTROL,
0x80000000 |
> +				((u32)(nesdev->et_rx_coalesce_usecs_irq
* 8)));
> +	} else {
> +		nes_write32(nesdev->regs+NES_PERIODIC_CONTROL,
0x00000000);
> +	}
> +
> +	max_qp = nes_read_indexed(nesdev, NES_IDX_QP_CTX_SIZE);
> +	dprintk("%s: QP_CTX_SIZE=%u\n", __FUNCTION__, max_qp);
> +
> +	u32temp = nes_read_indexed(nesdev,
NES_IDX_QUAD_HASH_TABLE_SIZE);
> +	if (max_qp > ((u32)1 << (u32temp & 0x001f))) {
> +		dprintk("Reducing Max QPs to %u due to hash table size =
0x%08X\n",
> +				max_qp, u32temp);
> +		max_qp = (u32)1 << (u32temp & 0x001f);
> +	}
> +
> +	hte_index_mask = ((u32)1 << ((u32temp & 0x001f)+1))-1;
> +	dprintk("Max QP = %u, hte_index_mask = 0x%08X.\n", max_qp,
hte_index_mask);
> +
> +	u32temp = nes_read_indexed(nesdev, NES_IDX_IRRQ_COUNT);
> +
> +	max_irrq = 1 << (u32temp & 0x001f);
> +
> +	if (max_qp > max_irrq) {
> +		max_qp = max_irrq;
> +		dprintk("Reducing Max QPs to %u due to Available
Q1s.\n", max_qp);
> +	}
> +
> +	/* there should be no reason to allocate more pds than qps */
> +	if (num_pds > max_qp)
> +		num_pds = max_qp;
> +
> +	u32temp = nes_read_indexed(nesdev, NES_IDX_MRT_SIZE);
> +	max_mr = (u32)8192 << (u32temp & 0x7);
> +
> +	u32temp = nes_read_indexed(nesdev, NES_IDX_PBL_REGION_SIZE);
> +	max_256pbl = (u32)1 << (u32temp & 0x0000001f);
> +	max_4kpbl = (u32)1 << ((u32temp >> 16) & 0x0000001f);
> +	max_cq = nes_read_indexed(nesdev, NES_IDX_CQ_CTX_SIZE);
> +
> +	u32temp = nes_read_indexed(nesdev, NES_IDX_ARP_CACHE_SIZE);
> +	arp_table_size = 1 << u32temp;
> +
> +	adapter_size = (sizeof(struct nes_adapter) +
> +			(sizeof(unsigned long)-1)) & (~(sizeof(unsigned
long)-1));
> +	adapter_size += sizeof(unsigned long) * BITS_TO_LONGS(max_qp);
> +	adapter_size += sizeof(unsigned long) * BITS_TO_LONGS(max_mr);
> +	adapter_size += sizeof(unsigned long) * BITS_TO_LONGS(max_cq);
> +	adapter_size += sizeof(unsigned long) * BITS_TO_LONGS(num_pds);
> +	adapter_size += sizeof(unsigned long) *
BITS_TO_LONGS(arp_table_size);
> +	adapter_size += sizeof(struct nes_qp **) * max_qp;
> +
> +	/* allocate a new adapter struct */
> +	nesadapter = kmalloc(adapter_size, GFP_KERNEL);
> +	if (nesadapter == NULL) {
> +		return(NULL);
> +	}
> +	memset(nesadapter, 0, adapter_size);

kzalloc


> +	dprintk("Allocating new nesadapter @ %p, size = %u (actual size
= %u).\n",
> +			nesadapter, (u32)sizeof(struct nes_adapter),
adapter_size);
> +
> +	/* populate the new nesadapter */
> +	nesadapter->devfn = nesdev->pcidev->devfn;
> +	nesadapter->bus_number = nesdev->pcidev->bus->number;

wrong wrong wrong.  Use pci_dev pointer for comparison.  Store that, not

bus number and devfn.


> +	nesadapter->ref_count = 1;
> +	nesadapter->timer_int_req = 0xffff0000;
> +	nesadapter->OneG_Mode = OneG_Mode;
> +
> +	/* nesadapter->tick_delta = clk_divisor; */
> +	nesadapter->hw_rev = hw_rev;
> +	nesadapter->port_count = port_count;
> +
> +	nesadapter->max_qp = max_qp;
> +	nesadapter->hte_index_mask = hte_index_mask;
> +	nesadapter->max_irrq = max_irrq;
> +	nesadapter->max_mr = max_mr;
> +	nesadapter->max_256pbl = max_256pbl - 1;
> +	nesadapter->max_4kpbl = max_4kpbl - 1;
> +	nesadapter->max_cq = max_cq;
> +	nesadapter->free_256pbl = max_256pbl - 1;
> +	nesadapter->free_4kpbl = max_4kpbl - 1;
> +	nesadapter->max_pd = num_pds;
> +	nesadapter->arp_table_size = arp_table_size;
> +	nesadapter->base_pd = 1;
> +
> +	nesadapter->device_cap_flags =
> +			IB_DEVICE_ZERO_STAG | IB_DEVICE_SEND_W_INV |
IB_DEVICE_MEM_WINDOW;
> +
> +	nesadapter->allocated_qps = (unsigned long *)&(((unsigned char
*)nesadapter)
> +			[(sizeof(struct nes_adapter)+(sizeof(unsigned
long)-1))&(~(sizeof(unsigned long)-1))]);
> +	nesadapter->allocated_cqs =
&nesadapter->allocated_qps[BITS_TO_LONGS(max_qp)];
> +	nesadapter->allocated_mrs =
&nesadapter->allocated_cqs[BITS_TO_LONGS(max_cq)];
> +	nesadapter->allocated_pds =
&nesadapter->allocated_mrs[BITS_TO_LONGS(max_mr)];
> +	nesadapter->allocated_arps =
&nesadapter->allocated_pds[BITS_TO_LONGS(num_pds)];
> +	nesadapter->qp_table = (struct nes_qp
**)(&nesadapter->allocated_arps[BITS_TO_LONGS(arp_table_size)]);
> +
> +
> +	/* mark the usual suspect QPs and CQs as in use */
> +	for (u32temp = 0; u32temp < NES_FIRST_QPN; u32temp++) {
> +		set_bit(u32temp, nesadapter->allocated_qps);
> +		set_bit(u32temp, nesadapter->allocated_cqs);
> +	}
> +
> +	u32temp = nes_read_indexed(nesdev, NES_IDX_QP_MAX_CFG_SIZES);
> +
> +	max_rq_wrs = ((u32temp >> 8) & 3);
> +	switch (max_rq_wrs) {
> +		case 0:
> +			max_rq_wrs = 4;
> +			break;
> +		case 1:
> +			max_rq_wrs = 16;
> +			break;
> +		case 2:
> +			max_rq_wrs = 32;
> +			break;
> +		case 3:
> +			max_rq_wrs = 512;
> +			break;
> +	}
> +
> +	max_sq_wrs = (u32temp & 3);
> +	switch (max_sq_wrs) {
> +		case 0:
> +			max_sq_wrs = 4;
> +			break;
> +		case 1:
> +			max_sq_wrs = 16;
> +			break;
> +		case 2:
> +			max_sq_wrs = 32;
> +			break;
> +		case 3:
> +			max_sq_wrs = 512;
> +			break;
> +	}
> +	nesadapter->max_qp_wr = min(max_rq_wrs, max_sq_wrs);
> +	dprintk("Max wqes = %u.\n", nesadapter->max_qp_wr);
> +
> +	nesadapter->max_irrq_wr = (u32temp >> 16) & 3;
> +	dprintk("%s: Max IRRQ wqes = %u.\n", __FUNCTION__,
nesadapter->max_irrq_wr);
> +
> +
> +	nesadapter->max_sge = 4;
> +	nesadapter->max_cqe = 32767;
> +
> +	if (nes_read_eeprom_values(nesdev, nesadapter)) {
> +		printk(KERN_ERR PFX "Unable to read EEPROM data.\n");
> +		kfree(nesadapter);
> +		return(NULL);
> +	}
> +
> +	u32temp = nes_read_indexed(nesdev, NES_IDX_TCP_TIMER_CONFIG);
> +	nes_write_indexed(nesdev, NES_IDX_TCP_TIMER_CONFIG,
> +			(u32temp & 0xff000000) |
(nesadapter->tcp_timer_core_clk_divisor & 0x00ffffff));
> +	dprintk("%s: TCP Timer Config0=%08x\n", __FUNCTION__,
> +			nes_read_indexed(nesdev,
NES_IDX_TCP_TIMER_CONFIG));
> +
> +	/* setup port configuration */
> +	if (nesadapter->port_count == 1) {
> +		u32temp = 0x00000000;
> +		if (nes_drv_opt & NES_DRV_OPT_DUAL_LOGICAL_PORT) {
> +			nes_write_indexed(nesdev, NES_IDX_TX_POOL_SIZE,
0x00000002);
> +		} else {
> +			nes_write_indexed(nesdev, NES_IDX_TX_POOL_SIZE,
0x00000003);
> +		}
> +	} else {
> +		if (nesadapter->port_count == 2) {
> +			u32temp = 0x00000044;
> +		} else {
> +			u32temp = 0x000000e4;
> +		}
> +		nes_write_indexed(nesdev, NES_IDX_TX_POOL_SIZE,
0x00000003);
> +	}
> +
> +	nes_write_indexed(nesdev, NES_IDX_NIC_LOGPORT_TO_PHYPORT,
u32temp);
> +	dprintk("%s: Probe time, LOG2PHY=%u\n", __FUNCTION__,
> +			nes_read_indexed(nesdev,
NES_IDX_NIC_LOGPORT_TO_PHYPORT));
> +
> +	spin_lock_init(&nesadapter->resource_lock);
> +	spin_lock_init(&nesadapter->phy_lock);
> +
> +	init_timer(&nesadapter->mh_timer);
> +	nesadapter->mh_timer.function = nes_mh_fix;
> +	nesadapter->mh_timer.expires = jiffies + (HZ/5);  /* 1 second */
> +	nesadapter->mh_timer.data = (unsigned long)nesdev;
> +	add_timer(&nesadapter->mh_timer);
> +
> +	INIT_LIST_HEAD(&nesadapter->nesvnic_list[0]);
> +	INIT_LIST_HEAD(&nesadapter->nesvnic_list[1]);
> +	INIT_LIST_HEAD(&nesadapter->nesvnic_list[2]);
> +	INIT_LIST_HEAD(&nesadapter->nesvnic_list[3]);
> +
> +	list_add_tail(&nesadapter->list, &nes_adapter_list);
> +
> +	return(nesadapter);
> +}
> +
> +
> +/**
> + * nes_reset_adapter_ne020
> + */
> +unsigned int nes_reset_adapter_ne020(struct nes_device *nesdev, u8
*OneG_Mode)
> +{
> +	u32 port_count;
> +	u32 u32temp;
> +	u32 i;
> +
> +	u32temp = nes_read32(nesdev->regs+NES_SOFTWARE_RESET);
> +	port_count = ((u32temp & 0x00000300) >> 8) + 1;
> +	/* TODO: assuming that both SERDES are set the same for now */
> +	*OneG_Mode = (u32temp & 0x00003c00) ? 0 : 1;
> +	dprintk("%s: Initial Software Reset = 0x%08X, port_count=%u\n",
__FUNCTION__, u32temp, port_count);
> +	if (*OneG_Mode) {
> +		dprintk("%s: Running in 1G mode.\n", __FUNCTION__);
> +	}
> +	u32temp &= 0xff00ffc0;
> +	switch (port_count) {
> +		case 1:
> +			u32temp |= 0x00ee0000;
> +			break;
> +		case 2:
> +			u32temp |= 0x00cc0000;
> +			break;
> +		case 4:
> +			u32temp |= 0x00000000;
> +			break;
> +		default:
> +			return (0);
> +			break;
> +	}
> +
> +	/* check and do full reset if needed */
> +	if (nes_read_indexed(nesdev,
NES_IDX_QP_CONTROL+(PCI_FUNC(nesdev->pcidev->devfn)*8))) {
> +		dprintk("Issuing Full Soft reset = 0x%08X\n", u32temp |
0xd);
> +		nes_write32(nesdev->regs+NES_SOFTWARE_RESET, u32temp |
0xd);
> +
> +		i = 0;
> +		while (((nes_read32(nesdev->regs+NES_SOFTWARE_RESET) &
0x00000040) == 0) && i++ < 10000) {
> +			mdelay(1);
> +		}
> +		if (i >= 10000) {
> +			dprintk("Did not see full soft reset done.\n");
> +			return (0);
> +		}
> +	}
> +
> +	/* port reset */
> +	switch (port_count) {
> +		case 1:
> +			u32temp |= 0x00ee0010;
> +			break;
> +		case 2:
> +			u32temp |= 0x00cc0030;
> +			break;
> +		case 4:
> +			u32temp |= 0x00000030;
> +			break;
> +	}
> +
> +	dprintk("Issuing Port Soft reset = 0x%08X\n", u32temp | 0xd);
> +	nes_write32(nesdev->regs+NES_SOFTWARE_RESET, u32temp | 0xd);
> +
> +	i = 0;
> +	while (((nes_read32(nesdev->regs+NES_SOFTWARE_RESET) &
0x00000040) == 0) && i++ < 10000) {
> +		mdelay(1);
> +	}
> +	if (i >= 10000) {
> +		dprintk("Did not see port soft reset done.\n");
> +		return (0);
> +	}
> +
> +	/* serdes 0 */
> +	i = 0;
> +	while (((u32temp = (nes_read_indexed(nesdev,
NES_IDX_ETH_SERDES_COMMON_STATUS0)
> +			& 0x0000000f)) != 0x0000000f) && i++ < 5000) {
> +		mdelay(1);

all these delays should be msleep()


> +	if (i >= 5000) {
> +		dprintk("Serdes 0 not ready, status=%x\n", u32temp);
> +		return (0);
> +	}
> +
> +	/* serdes 1 */
> +	if (port_count > 1) {
> +		i = 0;
> +		while (((u32temp = (nes_read_indexed(nesdev,
NES_IDX_ETH_SERDES_COMMON_STATUS1)
> +				& 0x0000000f)) != 0x0000000f) && i++ <
5000) {
> +			mdelay(1);
> +		}
> +		if (i >= 5000) {
> +			dprintk("Serdes 1 not ready, status=%x\n",
u32temp);
> +			return (0);
> +		}
> +	}
> +
> +	i = 0;
> +	while ((nes_read_indexed(nesdev, NES_IDX_INT_CPU_STATUS) !=
0x80) && i++ < 10000) {
> +		mdelay(1);
> +	}
> +	dprintk("%s:%u CPU_STATUS loops=%u\n", __FUNCTION__, __LINE__,
i);
> +	if (i >= 10000) {
> +		printk(KERN_ERR PFX "Internal CPU not ready, status =
%02X\n",
> +				nes_read_indexed(nesdev,
NES_IDX_INT_CPU_STATUS));
> +		return (0);
> +	}
> +
> +	return (port_count);
> +}
> +
> +
> +/**
> + * nes_init_serdes
> + */
> +int nes_init_serdes(struct nes_device *nesdev, u8 port_count)
> +{
> +	int i;
> +	u32 u32temp;
> +
> +	/* init serdes 0 */
> +	nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0,
0x00000008);
> +	i = 0;
> +	while (((u32temp = (nes_read_indexed(nesdev,
NES_IDX_ETH_SERDES_COMMON_STATUS0)
> +			& 0x0000000f)) != 0x0000000f) && i++ < 5000) {
> +		mdelay(1);
> +	}
> +	if (i >= 5000) {
> +		dprintk("Init: serdes 0 not ready, status=%x\n",
u32temp);
> +		return (1);
> +	}
> +	nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_TX_EMP0,
0x000bdef7);
> +	nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_TX_DRIVE0,
0x9ce73000);
> +	nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_RX_MODE0,
0x0ff00000);
> +	nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_RX_SIGDET0,
0x00000000);
> +	nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_BYPASS0,
0x00000000);
> +	nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_LOOPBACK_CONTROL0,
0x00000000);
> +	nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_RX_EQ_CONTROL0,
0xf0002222);
> +	nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_CDR_CONTROL0,
0x000000ff);
> +
> +	if (port_count > 1) {
> +		/* init serdes 1 */
> +		nes_write_indexed(nesdev,
NES_IDX_ETH_SERDES_COMMON_CONTROL1, 0x00000048);
> +		i = 0;
> +		while (((u32temp = (nes_read_indexed(nesdev,
NES_IDX_ETH_SERDES_COMMON_STATUS1) & 0x0000000f)) != 0x0000000f) &&
> +			   (i++ < 5000)) {
> +			mdelay(1);
> +		}
> +		if (i >= 5000) {
> +			printk("%s: Init: serdes 1 not ready,
status=%x\n", __FUNCTION__, u32temp);
> +			/* return 1; */
> +		}
> +		nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_TX_EMP1,
0x000bdef7);
> +		nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_TX_DRIVE1,
0x9ce73000);
> +		nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_RX_MODE1,
0x0ff00000);
> +		nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_RX_SIGDET1,
0x00000000);
> +		nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_BYPASS1,
0x00000000);
> +		nes_write_indexed(nesdev,
NES_IDX_ETH_SERDES_LOOPBACK_CONTROL1, 0x00000000);
> +		nes_write_indexed(nesdev,
NES_IDX_ETH_SERDES_RX_EQ_CONTROL1, 0xf0002222);
> +		nes_write_indexed(nesdev,
NES_IDX_ETH_SERDES_CDR_CONTROL1, 0x000000ff);
> +	}
> +	return (0);
> +}
> +
> +
> +/**
> + * nes_init_csr_ne020
> + * Initialize registers for ne020 hardware
> + */
> +void nes_init_csr_ne020(struct nes_device *nesdev, u8 hw_rev, u8
port_count)
> +{
> +
> +	nes_write_indexed(nesdev, 0x000001E4, 0x00000007);	
> +	/* nes_write_indexed(nesdev, 0x000001E8, 0x000208C4); */	
> +	nes_write_indexed(nesdev, 0x000001E8, 0x00020844);	
> +	nes_write_indexed(nesdev, 0x000001D8, 0x00048002);	
> +	/* nes_write_indexed(nesdev, 0x000001D8, 0x0004B002); */  
> +	nes_write_indexed(nesdev, 0x000001FC, 0x00050005);	
> +	nes_write_indexed(nesdev, 0x00000600, 0x55555555);	
> +	nes_write_indexed(nesdev, 0x00000604, 0x55555555);	
> +
> +	/* TODO: move these MAC register settings to NIC bringup */
> +	nes_write_indexed(nesdev, 0x00002000, 0x00000001);	
> +	nes_write_indexed(nesdev, 0x00002004, 0x00000001);	
> +	nes_write_indexed(nesdev, 0x00002008, 0x0000FFFF);	
> +	nes_write_indexed(nesdev, 0x0000200C, 0x00000001);	
> +	nes_write_indexed(nesdev, 0x00002010, 0x000003c1);	
> +	nes_write_indexed(nesdev, 0x0000201C, 0x75345678);	
> +	if (port_count > 1) {
> +		nes_write_indexed(nesdev, 0x00002200, 0x00000001);	
> +		nes_write_indexed(nesdev, 0x00002204, 0x00000001);	
> +		nes_write_indexed(nesdev, 0x00002208, 0x0000FFFF);	
> +		nes_write_indexed(nesdev, 0x0000220C, 0x00000001);	
> +		nes_write_indexed(nesdev, 0x00002210, 0x000003c1);	
> +		nes_write_indexed(nesdev, 0x0000221C, 0x75345678);	
> +	}
> +	if (port_count > 2) {
> +		nes_write_indexed(nesdev, 0x00002400, 0x00000001);	
> +		nes_write_indexed(nesdev, 0x00002404, 0x00000001);	
> +		nes_write_indexed(nesdev, 0x00002408, 0x0000FFFF);	
> +		nes_write_indexed(nesdev, 0x0000240C, 0x00000001);	
> +		nes_write_indexed(nesdev, 0x00002410, 0x000003c1);	
> +		nes_write_indexed(nesdev, 0x0000241C, 0x75345678);	
> +
> +		nes_write_indexed(nesdev, 0x00002600, 0x00000001);	
> +		nes_write_indexed(nesdev, 0x00002604, 0x00000001);	
> +		nes_write_indexed(nesdev, 0x00002608, 0x0000FFFF);	
> +		nes_write_indexed(nesdev, 0x0000260C, 0x00000001);	
> +		nes_write_indexed(nesdev, 0x00002610, 0x000003c1);	
> +		nes_write_indexed(nesdev, 0x0000261C, 0x75345678);	
> +	}
> +
> +	nes_write_indexed(nesdev, 0x00005000, 0x00018000);	
> +	/* nes_write_indexed(nesdev, 0x00005000, 0x00010000); */  
> +	nes_write_indexed(nesdev, 0x00005004, 0x00020001);	
> +	nes_write_indexed(nesdev, 0x00005008, 0x1F1F1F1F);	
> +	nes_write_indexed(nesdev, 0x00005010, 0x1F1F1F1F);	
> +	nes_write_indexed(nesdev, 0x00005018, 0x1F1F1F1F);	
> +	nes_write_indexed(nesdev, 0x00005020, 0x1F1F1F1F);	
> +	nes_write_indexed(nesdev, 0x00006090, 0xFFFFFFFF);	
> +
> +	/* TODO: move this to code, get from EEPROM */
> +	nes_write_indexed(nesdev, 0x00000900, 0x20000001);	
> +	nes_write_indexed(nesdev, 0x000060C0, 0x0000028e);	
> +	nes_write_indexed(nesdev, 0x000060C8, 0x00000020);	
> +
> +	nes_write_indexed(nesdev, 0x000001EC, 0x5b2625a0);	
> +	/* nes_write_indexed(nesdev, 0x000001EC, 0x5f2625a0); */  
> +	
> +}
> +
> +
> +/**
> + * nes_destroy_adapter - destroy the adapter structure
> + */
> +void nes_destroy_adapter(struct nes_adapter *nesadapter)
> +{
> +	struct nes_adapter *tmp_adapter;
> +
> +	list_for_each_entry(tmp_adapter, &nes_adapter_list, list) {
> +		dprintk("%s: Nes Adapter list entry = 0x%p.\n",
__FUNCTION__, tmp_adapter);
> +	}
> +
> +	nesadapter->ref_count--;
> +	if (!nesadapter->ref_count) {
> +			del_timer(&nesadapter->mh_timer);
> +
> +		dprintk("nes_destroy_adapter: Deleting adapter from
adapter list.\n");
> +		list_del(&nesadapter->list);
> +		dprintk("nes_destroy_adapter: Freeing adapter
structure.\n");
> +		kfree(nesadapter);
> +	}
> +	dprintk("%s: Done.\n", __FUNCTION__);
> +}
> +
> +
> +/**
> + * nes_init_cqp
> + */
> +int nes_init_cqp(struct nes_device *nesdev)
> +{
> +	struct nes_adapter *nesadapter = nesdev->nesadapter;
> +	struct nes_hw_cqp_qp_context *cqp_qp_context;
> +	struct nes_hw_cqp_wqe *cqp_wqe;
> +	struct nes_hw_ceq *ceq;
> +	struct nes_hw_ceq *nic_ceq;
> +	struct nes_hw_aeq *aeq;
> +	void *vmem;
> +	dma_addr_t pmem;
> +	u32 count=0;
> +	u32 cqp_head;
> +	u64 u64temp;
> +	u32 u32temp;
> +
> +#define NES_NIC_CEQ_SIZE 8
> +/* NICs will be on a separate CQ */
> +#define NES_CCEQ_SIZE ((nesadapter->max_cq / nesadapter->port_count)
- 32)
> +
> +	/* allocate CQP memory */
> +	/* Need to add max_cq to the aeq size once cq overflow checking
is added back */
> +	/* SQ is 512 byte aligned, others are 256 byte aligned */
> +	nesdev->cqp_mem_size = 512 +
> +			(sizeof(struct nes_hw_cqp_wqe) *
NES_CQP_SQ_SIZE) +
> +			(sizeof(struct nes_hw_cqe) * NES_CCQ_SIZE) +
> +			max(((u32)sizeof(struct nes_hw_ceqe) *
NES_CCEQ_SIZE), (u32)256) +
> +			max(((u32)sizeof(struct nes_hw_ceqe) *
NES_NIC_CEQ_SIZE), (u32)256) +
> +			(sizeof(struct nes_hw_aeqe) *
nesadapter->max_qp) +
> +			sizeof(struct nes_hw_cqp_qp_context);
> +
> +	nesdev->cqp_vbase = pci_alloc_consistent(nesdev->pcidev,
nesdev->cqp_mem_size,
> +			&nesdev->cqp_pbase);
> +	if (!nesdev->cqp_vbase) {
> +		dprintk(KERN_ERR PFX "Unable to allocate memory for host
descriptor rings\n");
> +		return(-ENOMEM);
> +	}
> +	memset(nesdev->cqp_vbase, 0, nesdev->cqp_mem_size);
> +
> +	/* Allocate a twice the number of CQP requests as the SQ size */
> +	nesdev->nes_cqp_requests = kmalloc(sizeof(struct
nes_cqp_request) *
> +			2 * NES_CQP_SQ_SIZE, GFP_KERNEL);

kzalloc


> +	if (NULL == nesdev->nes_cqp_requests) {

kernel standard:  variable comes first, then comparison constant

or just 'if (!foo)'


> +		dprintk(KERN_ERR PFX "Unable to allocate memory CQP
request entries.\n");
> +		pci_free_consistent(nesdev->pcidev,
nesdev->cqp_mem_size, nesdev->cqp.sq_vbase,
> +				nesdev->cqp.sq_pbase);
> +		return(-ENOMEM);
> +	}
> +	memset(nesdev->nes_cqp_requests, 0, sizeof(struct
nes_cqp_request) *
> +			2 * NES_CQP_SQ_SIZE);
> +	dprintk("Allocated CQP structures at %p (phys = %016lX), size =
%u.\n",
> +			nesdev->cqp_vbase, (unsigned
long)nesdev->cqp_pbase, nesdev->cqp_mem_size);
> +
> +	spin_lock_init(&nesdev->cqp.lock);
> +	init_waitqueue_head(&nesdev->cqp.waitq);
> +
> +	/* Setup Various Structures */
> +	vmem = (void *)(((unsigned long long)nesdev->cqp_vbase + (512 -
1)) &
> +			~(unsigned long long)(512 - 1));
> +	pmem = (dma_addr_t)(((unsigned long long)nesdev->cqp_pbase +
(512 - 1)) &
> +			~(unsigned long long)(512 - 1));
> +
> +	nesdev->cqp.sq_vbase = vmem;
> +	nesdev->cqp.sq_pbase = pmem;
> +	nesdev->cqp.sq_size = NES_CQP_SQ_SIZE;
> +	nesdev->cqp.sq_head = 0;
> +	nesdev->cqp.sq_tail = 0;
> +	nesdev->cqp.qp_id = PCI_FUNC(nesdev->pcidev->devfn);

kill


> +	dprintk("CQP at %p (phys = %016lX).\n",
> +			nesdev->cqp.sq_vbase,	(unsigned
long)nesdev->cqp.sq_pbase);
> +
> +	vmem += (sizeof(struct nes_hw_cqp_wqe) * nesdev->cqp.sq_size);
> +	pmem += (sizeof(struct nes_hw_cqp_wqe) * nesdev->cqp.sq_size);
> +
> +	nesdev->ccq.cq_vbase = vmem;
> +	nesdev->ccq.cq_pbase = pmem;
> +	nesdev->ccq.cq_size = NES_CCQ_SIZE;
> +	nesdev->ccq.cq_head = 0;
> +	nesdev->ccq.ce_handler = nes_cqp_ce_handler;
> +	nesdev->ccq.cq_number = PCI_FUNC(nesdev->pcidev->devfn);

kill


> +	dprintk("CCQ at %p (phys = %016lX).\n",
> +			nesdev->ccq.cq_vbase, (unsigned
long)nesdev->ccq.cq_pbase);
> +
> +	vmem += (sizeof(struct nes_hw_cqe) * nesdev->ccq.cq_size);
> +	pmem += (sizeof(struct nes_hw_cqe) * nesdev->ccq.cq_size);
> +
> +	nesdev->ceq_index = PCI_FUNC(nesdev->pcidev->devfn);
> +	ceq = &nesadapter->ceq[nesdev->ceq_index];
> +	ceq->ceq_vbase = vmem;
> +	ceq->ceq_pbase = pmem;
> +	ceq->ceq_size = NES_CCEQ_SIZE;
> +	ceq->ceq_head = 0;
> +	dprintk("CEQ at %p (phys = %016lX).\n",
> +			ceq->ceq_vbase, (unsigned long)ceq->ceq_pbase);
> +
> +	vmem += max(((u32)sizeof(struct nes_hw_ceqe) * ceq->ceq_size),
(u32)256);
> +	pmem += max(((u32)sizeof(struct nes_hw_ceqe) * ceq->ceq_size),
(u32)256);
> +
> +	nesdev->nic_ceq_index = PCI_FUNC(nesdev->pcidev->devfn) + 8;
> +	nic_ceq = &nesadapter->ceq[nesdev->nic_ceq_index];
> +	nic_ceq->ceq_vbase = vmem;
> +	nic_ceq->ceq_pbase = pmem;
> +	nic_ceq->ceq_size = NES_NIC_CEQ_SIZE;
> +	nic_ceq->ceq_head = 0;
> +	dprintk("NIC CEQ at %p (phys = %016lX).\n",
> +			nic_ceq->ceq_vbase, (unsigned
long)nic_ceq->ceq_pbase);
> +
> +	vmem += max(((u32)sizeof(struct nes_hw_ceqe) *
nic_ceq->ceq_size), (u32)256);
> +	pmem += max(((u32)sizeof(struct nes_hw_ceqe) *
nic_ceq->ceq_size), (u32)256);
> +
> +	aeq = &nesadapter->aeq[PCI_FUNC(nesdev->pcidev->devfn)];
> +	aeq->aeq_vbase = vmem;
> +	aeq->aeq_pbase = pmem;
> +	aeq->aeq_size = nesadapter->max_qp;
> +	aeq->aeq_head = 0;
> +	dprintk("AEQ  at %p (phys = %016lX).\n",
> +			aeq->aeq_vbase, (unsigned long)aeq->aeq_pbase);
> +
> +	/* Setup QP Context */
> +	vmem += (sizeof(struct nes_hw_aeqe) * aeq->aeq_size);
> +	pmem += (sizeof(struct nes_hw_aeqe) * aeq->aeq_size);
> +
> +	cqp_qp_context = vmem;
> +	cqp_qp_context->context_words[0] =
(PCI_FUNC(nesdev->pcidev->devfn) << 12) + (2 << 10);
> +	cqp_qp_context->context_words[1] = 0;
> +	cqp_qp_context->context_words[2] = (u32)nesdev->cqp.sq_pbase;
> +	cqp_qp_context->context_words[3] = ((u64)nesdev->cqp.sq_pbase)
>> 32;
> +
> +	dprintk("Address of CQP Context = %p.\n", cqp_qp_context);
> +	for (count=0;count<4 ; count++) {
> +		dprintk("CQP Context, Line %u = %08X.\n",
> +				count,
cqp_qp_context->context_words[count]);
> +	}
> +
> +	/* Write the address to Create CQP */
> +	if ((sizeof(dma_addr_t) > 4)) {
> +		nes_write_indexed(nesdev,
> +				NES_IDX_CREATE_CQP_HIGH +
(PCI_FUNC(nesdev->pcidev->devfn) * 8),
> +				((u64)pmem) >> 32);
> +	} else {
> +		nes_write_indexed(nesdev,
> +				NES_IDX_CREATE_CQP_HIGH +
(PCI_FUNC(nesdev->pcidev->devfn) * 8), 0);
> +	}
> +	nes_write_indexed(nesdev,
> +			NES_IDX_CREATE_CQP_LOW +
(PCI_FUNC(nesdev->pcidev->devfn) * 8),
> +			(u32)pmem);
> +
> +	dprintk("Address of CQP SQ = %p.\n", nesdev->cqp.sq_vbase);
> +
> +	INIT_LIST_HEAD(&nesdev->cqp_avail_reqs);
> +	INIT_LIST_HEAD(&nesdev->cqp_pending_reqs);
> +
> +	for (count=0; count<2*NES_CQP_SQ_SIZE; count++) {
> +
init_waitqueue_head(&nesdev->nes_cqp_requests[count].waitq);
> +		list_add_tail(&nesdev->nes_cqp_requests[count].list,
&nesdev->cqp_avail_reqs);
> +		/* dprintk("Adding cqp request %p to the available list
\n",
> +				&nesdev->nes_cqp_requests[count]); */
> +	}
> +
> +	/* Write Create CCQ WQE */
> +	cqp_head = nesdev->cqp.sq_head++;
> +	cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head];
> +	cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] =
cpu_to_le32(NES_CQP_CREATE_CQ | NES_CQP_CQ_CEQ_VALID |
> +			NES_CQP_CQ_CHK_OVERFLOW |
((u32)nesdev->ccq.cq_size << 16));
> +	cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX] =
cpu_to_le32(nesdev->ccq.cq_number |
> +			((u32)nesdev->ceq_index<<16));
> +	cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0;
> +	*((struct nes_hw_cqp
**)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp;
> +	cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = 0;
> +	cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0;
> +	u64temp = (u64)nesdev->ccq.cq_pbase;
> +	cqp_wqe->wqe_words[NES_CQP_CQ_WQE_PBL_LOW_IDX] =
cpu_to_le32((u32)u64temp);
> +	cqp_wqe->wqe_words[NES_CQP_CQ_WQE_PBL_HIGH_IDX] =
cpu_to_le32((u32)(u64temp >> 32));
> +	cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_HIGH_IDX] = 0;
> +	/* TODO: the following 2 lines likely have endian issues */
> +	*((struct nes_hw_cq
**)&cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_LOW_IDX]) =
&nesdev->ccq;
> +	*((u64 *)&cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_LOW_IDX])
>>= 1;
> +	dprintk("%s: CQ%u context = 0x%08X:0x%08X.\n", __FUNCTION__,
nesdev->ccq.cq_number,
> +
cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_HIGH_IDX],
> +
cqp_wqe->wqe_words[NES_CQP_CQ_WQE_CQ_CONTEXT_LOW_IDX]);
> +
> +	cqp_wqe->wqe_words[NES_CQP_CQ_WQE_DOORBELL_INDEX_HIGH_IDX] = 0;
> +
> +	/* Write Create CEQ WQE */
> +	cqp_head = nesdev->cqp.sq_head++;
> +	cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head];
> +	cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] =
cpu_to_le32(NES_CQP_CREATE_CEQ +
> +			((u32)nesdev->ceq_index << 8));
> +	cqp_wqe->wqe_words[NES_CQP_CEQ_WQE_ELEMENT_COUNT_IDX] =
cpu_to_le32(ceq->ceq_size);
> +	cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0;
> +	*((struct nes_hw_cqp
**)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp;
> +	*((struct nes_cqp_request
**)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX]) = NULL;
> +	u64temp = (u64)ceq->ceq_pbase;
> +	cqp_wqe->wqe_words[NES_CQP_CEQ_WQE_PBL_LOW_IDX] =
cpu_to_le32((u32)u64temp);
> +	cqp_wqe->wqe_words[NES_CQP_CEQ_WQE_PBL_HIGH_IDX] =
cpu_to_le32((u32)(u64temp >> 32));
> +
> +	/* Write Create AEQ WQE */
> +	cqp_head = nesdev->cqp.sq_head++;
> +	cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head];
> +	cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] =
cpu_to_le32(NES_CQP_CREATE_AEQ +
> +			((u32)PCI_FUNC(nesdev->pcidev->devfn) << 8));
> +	cqp_wqe->wqe_words[NES_CQP_AEQ_WQE_ELEMENT_COUNT_IDX] =
cpu_to_le32(aeq->aeq_size);
> +	cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0;
> +	*((struct nes_hw_cqp
**)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp;
> +	cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = 0;
> +	cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0;
> +	u64temp = (u64)aeq->aeq_pbase;
> +	cqp_wqe->wqe_words[NES_CQP_AEQ_WQE_PBL_LOW_IDX] =
cpu_to_le32((u32)u64temp);
> +	cqp_wqe->wqe_words[NES_CQP_AEQ_WQE_PBL_HIGH_IDX] =
cpu_to_le32((u32)(u64temp >> 32));
> +
> +	/* Write Create CEQ WQE */
> +	cqp_head = nesdev->cqp.sq_head++;
> +	cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head];
> +	cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] =
cpu_to_le32(NES_CQP_CREATE_CEQ +
> +			((u32)nesdev->nic_ceq_index << 8));
> +	cqp_wqe->wqe_words[NES_CQP_CEQ_WQE_ELEMENT_COUNT_IDX] =
cpu_to_le32(nic_ceq->ceq_size);
> +	cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0;
> +	*((struct nes_hw_cqp
**)&cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_LOW_IDX]) = &nesdev->cqp;
> +	cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX] = 0;
> +	cqp_wqe->wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX] = 0;
> +	u64temp = (u64)nic_ceq->ceq_pbase;
> +	cqp_wqe->wqe_words[NES_CQP_CEQ_WQE_PBL_LOW_IDX] =
cpu_to_le32((u32)u64temp);
> +	cqp_wqe->wqe_words[NES_CQP_CEQ_WQE_PBL_HIGH_IDX] =
cpu_to_le32((u32)(u64temp >> 32));

remove all these pointless u32 casts


> +	/* Poll until CCQP done */
> +	count = 0;
> +	do {
> +		if (count++ > 1000) {
> +			printk(KERN_ERR PFX "Error creating CQP\n");
> +			pci_free_consistent(nesdev->pcidev,
nesdev->cqp_mem_size,
> +					nesdev->cqp_vbase,
nesdev->cqp_pbase);

consolidate duplicated error handling code


> +			return(-1);
> +		}
> +		udelay(10);
> +	} while (!(nes_read_indexed(nesdev,
> +			NES_IDX_QP_CONTROL +
(PCI_FUNC(nesdev->pcidev->devfn) * 8)) & (1 << 8)));
> +
> +	dprintk("CQP Status = 0x%08X\n", nes_read_indexed(nesdev,
> +
NES_IDX_QP_CONTROL+(PCI_FUNC(nesdev->pcidev->devfn)*8)));
> +
> +	u32temp = 0x04800000;
> +	nes_write32(nesdev->regs+NES_WQE_ALLOC, u32temp |
nesdev->cqp.qp_id);
> +
> +	/* wait for the CCQ, CEQ, and AEQ to get created */
> +	count = 0;
> +	do {
> +		if (count++ > 1000) {
> +			printk(KERN_ERR PFX "Error creating CCQ, CEQ,
and AEQ\n");
> +			pci_free_consistent(nesdev->pcidev,
nesdev->cqp_mem_size,
> +					nesdev->cqp_vbase,
nesdev->cqp_pbase);
> +			return(-1);
> +		}
> +		udelay(10);
> +	} while (((nes_read_indexed(nesdev,
> +			NES_IDX_QP_CONTROL +
(PCI_FUNC(nesdev->pcidev->devfn)*8)) & (15<<8)) != (15<<8)));
> +
> +	/* dump the QP status value */
> +	dprintk("QP Status = 0x%08X\n", nes_read_indexed(nesdev,
> +
NES_IDX_QP_CONTROL+(PCI_FUNC(nesdev->pcidev->devfn)*8)));
> +
> +	nesdev->cqp.sq_tail++;
> +
> +	return (0);
> +}
> +
> +
> +/**
> + * nes_destroy_cqp
> + */
> +int nes_destroy_cqp(struct nes_device *nesdev)
> +{
> +	struct nes_hw_cqp_wqe *cqp_wqe;
> +	u32 count=0;
> +	u32 cqp_head;
> +	unsigned long flags;
> +
> +	dprintk("Waiting for CQP work to complete.\n");
> +	do {
> +		if (count++ > 1000)	break;
> +		udelay(10);
> +	} while (!(nesdev->cqp.sq_head == nesdev->cqp.sq_tail));
> +
> +	/* Reset CCQ */
> +	nes_write32(nesdev->regs+NES_CQE_ALLOC, NES_CQE_ALLOC_RESET |
> +			nesdev->ccq.cq_number);
> +
> +	/* Disable device interrupts */
> +	nes_write32(nesdev->regs+NES_INT_MASK, 0x7fffffff);
> +	/* Destroy the AEQ */
> +	spin_lock_irqsave(&nesdev->cqp.lock, flags);
> +	cqp_head = nesdev->cqp.sq_head++;
> +	nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1;
> +	cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head];
> +	cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] =
cpu_to_le32(NES_CQP_DESTROY_AEQ |
> +			((u32)PCI_FUNC(nesdev->pcidev->devfn)<<8));
> +	cqp_wqe->wqe_words[NES_CQP_WQE_COMP_CTX_HIGH_IDX] = 0;
> +	/* Destroy the NIC CEQ */
> +	cqp_head = nesdev->cqp.sq_head++;
> +	nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1;
> +	cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head];
> +	cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] =
cpu_to_le32(NES_CQP_DESTROY_CEQ |
> +			((u32)nesdev->nic_ceq_index<<8));
> +	/* Destroy the CEQ */
> +	cqp_head = nesdev->cqp.sq_head++;
> +	nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1;
> +	cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head];
> +	cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] =
cpu_to_le32(NES_CQP_DESTROY_CEQ |
> +			(nesdev->ceq_index<<8));
> +	/* Destroy the CCQ */
> +	cqp_head = nesdev->cqp.sq_head++;
> +	nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1;
> +	cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head];
> +	cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] =
cpu_to_le32(NES_CQP_DESTROY_CQ);
> +	cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX] = cpu_to_le32(
nesdev->ccq.cq_number ||
> +			((u32)nesdev->ceq_index<<16));
> +	/* Destroy CQP */
> +	cqp_head = nesdev->cqp.sq_head++;
> +	nesdev->cqp.sq_head &= nesdev->cqp.sq_size-1;
> +	cqp_wqe = &nesdev->cqp.sq_vbase[cqp_head];
> +	cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] =
cpu_to_le32(NES_CQP_DESTROY_QP |
> +			NES_CQP_QP_TYPE_CQP);
> +	cqp_wqe->wqe_words[NES_CQP_WQE_ID_IDX] =
cpu_to_le32(nesdev->cqp.qp_id);
> +
> +	barrier();

insufficient


> +	/* Ring doorbell (4 WQEs) */
> +	nes_write32(nesdev->regs+NES_WQE_ALLOC, 0x05800000 |
nesdev->cqp.qp_id);
> +
> +	/* Wait for the destroy to complete */
> +	spin_unlock_irqrestore(&nesdev->cqp.lock, flags);
> +
> +	/* wait for the CCQ, CEQ, and AEQ to get destroyed */
> +	count = 0;
> +	do {
> +		if (count++ > 1000) {
> +			printk(KERN_ERR PFX "Function%d: Error
destroying CCQ, CEQ, and AEQ\n",
> +
PCI_FUNC(nesdev->pcidev->devfn));
> +			break;
> +		}
> +		udelay(10);
> +	} while (((nes_read_indexed(nesdev,
> +			NES_IDX_QP_CONTROL +
(PCI_FUNC(nesdev->pcidev->devfn)*8)) & (15<<8)) != 0));
> +
> +	/* dump the QP status value */
> +	dprintk("Function%d: QP Status = 0x%08X\n",
> +			PCI_FUNC(nesdev->pcidev->devfn),
> +			nes_read_indexed(nesdev,
> +
NES_IDX_QP_CONTROL+(PCI_FUNC(nesdev->pcidev->devfn)*8)));
> +
> +	kfree(nesdev->nes_cqp_requests);
> +
> +	/* Free the control structures */
> +	pci_free_consistent(nesdev->pcidev, nesdev->cqp_mem_size,
nesdev->cqp.sq_vbase,
> +			nesdev->cqp.sq_pbase);
> +
> +	return (0);
> +}
> +
> +
> +/**
> + * nes_init_phy
> + */
> +int nes_init_phy(struct nes_device *nesdev)
> +{
> +	struct nes_adapter *nesadapter = nesdev->nesadapter;
> +	u32 counter = 0;
> +	u32 mac_index = nesdev->mac_index;
> +	u16 phy_data;
> +
> +	if (nesadapter->OneG_Mode) {
> +		dprintk("1G PHY, mac_index = %d.\n", mac_index);
> +		nes_read_1G_phy_reg(nesdev, 1,
nesadapter->phy_index[mac_index], &phy_data);
> +		dprintk("Phy data from register 1 phy address %u =
0x%X.\n",
> +				nesadapter->phy_index[mac_index],
phy_data);
> +
> +		nes_write_1G_phy_reg(nesdev, 23,
nesadapter->phy_index[mac_index],  0xb000);
> +
> +		/* Reset the PHY */
> +		nes_write_1G_phy_reg(nesdev, 0,
nesadapter->phy_index[mac_index], 0x8000);
> +		udelay(100);
> +		counter = 0;
> +		do {
> +			nes_read_1G_phy_reg(nesdev, 0,
nesadapter->phy_index[mac_index], &phy_data);
> +			dprintk("Phy data from register 0 = 0x%X.\n",
phy_data);
> +			if (counter++ > 100) break;
> +		} while (phy_data & 0x8000);
> +
> +		/* Setting no phy loopback */
> +		phy_data &= 0xbfff;
> +		phy_data |= 0x1140;
> +		nes_write_1G_phy_reg(nesdev, 0,
nesadapter->phy_index[mac_index],  phy_data);
> +		nes_read_1G_phy_reg(nesdev, 0,
nesadapter->phy_index[mac_index], &phy_data);
> +		dprintk("Phy data from register 0 = 0x%X.\n", phy_data);
> +
> +		nes_read_1G_phy_reg(nesdev, 0x17,
nesadapter->phy_index[mac_index], &phy_data);
> +		dprintk("Phy data from register 0x17 = 0x%X.\n",
phy_data);
> +
> +		nes_read_1G_phy_reg(nesdev, 0x1e,
nesadapter->phy_index[mac_index], &phy_data);
> +		dprintk("Phy data from register 0x1e = 0x%X.\n",
phy_data);
> +
> +		/* Setting the interrupt mask */
> +		nes_read_1G_phy_reg(nesdev, 0x19,
nesadapter->phy_index[mac_index], &phy_data);
> +		dprintk("Phy data from register 0x19 = 0x%X.\n",
phy_data);
> +		nes_write_1G_phy_reg(nesdev, 0x19,
nesadapter->phy_index[mac_index], 0xffee);
> +
> +		nes_read_1G_phy_reg(nesdev, 0x19,
nesadapter->phy_index[mac_index], &phy_data);
> +		dprintk("Phy data from register 0x19 = 0x%X.\n",
phy_data);
> +
> +		/* turning on flow control */
> +		nes_read_1G_phy_reg(nesdev, 4,
nesadapter->phy_index[mac_index], &phy_data);
> +		dprintk("Phy data from register 0x4 = 0x%X.\n",
phy_data);
> +		nes_write_1G_phy_reg(nesdev, 4,
nesadapter->phy_index[mac_index],
> +				(phy_data & ~(0x03E0)) | 0xc00);
> +		/* nes_write_1G_phy_reg(nesdev, 4,
nesadapter->phy_index[mac_index],
> +				phy_data | 0xc00); */
> +		nes_read_1G_phy_reg(nesdev, 4,
nesadapter->phy_index[mac_index], &phy_data);
> +		dprintk("Phy data from register 0x4 = 0x%X.\n",
phy_data);
> +
> +		nes_read_1G_phy_reg(nesdev, 9,
nesadapter->phy_index[mac_index], &phy_data);
> +		dprintk("Phy data from register 0x9 = 0x%X.\n",
phy_data);
> +		/* Clear Half duplex */
> +		nes_write_1G_phy_reg(nesdev, 9,
nesadapter->phy_index[mac_index],
> +				phy_data & ~(0x0100));
> +		nes_read_1G_phy_reg(nesdev, 9,
nesadapter->phy_index[mac_index], &phy_data);
> +		dprintk("Phy data from register 0x9 = 0x%X.\n",
phy_data);
> +
> +		nes_read_1G_phy_reg(nesdev, 0,
nesadapter->phy_index[mac_index], &phy_data);
> +		nes_write_1G_phy_reg(nesdev, 0,
nesadapter->phy_index[mac_index], phy_data | 0x0300);

move all code down one indentation level.  adjust test at top of 
function to return accordingly, or use a goto

I stopped reviewing here.  please do all style changes, so we can review

this driver in depth.


^ permalink raw reply

* [ofa-general] Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB
From: jamal @ 2007-08-08 15:14 UTC (permalink / raw)
  To: Herbert Xu
  Cc: johnpol, peter.p.waskiewicz.jr, jeff, gaagaan, Robert.Olsson,
	kumarkr, rdreier, mcarlson, kaber, jagana, general, mchan, tgraf,
	netdev, shemminger, David Miller, sri
In-Reply-To: <20070808134247.GA9942@gondor.apana.org.au>

On Wed, 2007-08-08 at 21:42 +0800, Herbert Xu wrote:
> On Wed, Aug 08, 2007 at 03:49:00AM -0700, David Miller wrote:
> > 
> > Not because I think it obviates your work, but rather because I'm
> > curious, could you test a TSO-in-hardware driver converted to
> > batching and see how TSO alone compares to batching for a pure
> > TCP workload?
> 
> You could even lower the bar by disabling TSO and enabling
> software GSO.

>From my observation for TCP packets slightly above MDU (upto 2K), GSO
gives worse performance than non-GSO throughput-wise. Actually this has
nothing to do with batching, rather the behavior is consistent with or
without batching changes. 

> > I personally don't think it will help for that case at all as
> > TSO likely does better job of coalescing the work _and_ reducing
> > bus traffic as well as work in the TCP stack.
> 
> I agree.
> I suspect the bulk of the effort is in getting
> these skb's created and processed by the stack so that by
> the time that they're exiting the qdisc there's not much
> to be saved anymore.

pktgen shows a clear win if you test the driver path - which is what you
should test because thats where the batching changes are. 
Using TCP or UDP adds other variables[1] that need to be isolated first
in order to quantify the effect of batching. 
For throughput and CPU utilization, the benefit will be clear when there
are a lot more flows. 

cheers,
jamal

[1] I think there are too many other variables in play unfortunately
when you are dealing with a path that starts above the driver and one
that covers end to end effect:  traffic/app source, system clock sources
as per my recent discovery, congestion control algorithms used, tuning
of recevier etc.

^ permalink raw reply

* [ofa-general] Re: [PATCH 3/9 Rev3] [sched] Modify qdisc_run to support batching
From: jamal @ 2007-08-08 15:26 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: johnpol, herbert, gaagaan, Robert.Olsson, kumarkr, rdreier,
	peter.p.waskiewicz.jr, mcarlson, netdev, jagana, general, mchan,
	tgraf, jeff, shemminger, davem, sri
In-Reply-To: <46B9CD9D.4020506@trash.net>

On Wed, 2007-08-08 at 16:05 +0200, Patrick McHardy wrote:

> This is not completely avoidable of
> course, but you can and should limit the damage by only pulling
> out as much packets as the driver can take and have the driver
> stop the queue afterwards.

This why the dev->xmit_win exists in my patches. The driver says how
much space it has using this variable - and only those packets get
pulled off; so there is no conflict with any scheduling algorithm be it
work/non-work conserving. 
The ideal case is never to carry over anything in dev->blist. However,
there is a challenge with GSO/TSO: if say the descriptors in the driver
are less than the list of skbs, you are faced with the dilema of sending
a fraction of the gso list first.
My current approach is to transmit as much as the driver would allow.
On the next opportunity to transmit, you do immediately send anything
remaining first before you ask the qdisc for more packets. An
alternative approach i had entertained was not to send anything from the
skb list when hitting such a case, but it hasnt seem needed so far
(havent seen any leftovers from my experiments).

cheers,
jamal

^ permalink raw reply

* RE: [PATCH 3/14] nes: connection manager routines
From: Glenn Grundstrom @ 2007-08-08 15:31 UTC (permalink / raw)
  To: Andi Kleen; +Cc: rdreier, ewg, netdev
In-Reply-To: <p73d4xyryg2.fsf@bingen.suse.de>

This code is far from a TCP stack.  It's main purpose is to exchange
RDMA MPA request/response messages that are required by the iWarp specs
and therefore needed by our hardware.  All RNIC hardware vendors need
this MPA message exchange, including those already accepted into
kernel.org.  Do you have an alternative suggestion?

Thanks,
Glenn.

-----Original Message-----
From: ak@suse.de [mailto:ak@suse.de] On Behalf Of Andi Kleen
Sent: Wednesday, August 08, 2007 7:41 AM
To: Glenn Grundstrom
Cc: rdreier@cisco.com; ewg@lists.openfabrics.org; netdev@vger.kernel.org
Subject: Re: [PATCH 3/14] nes: connection manager routines

ggrundstrom@neteffect.com writes:

> NetEffect connection manager routines.

This seems more like a new TCP stack. I don't think such code is
appropiate, since Linux already has one.

-Andi

^ permalink raw reply

* Re: TCP's initial cwnd setting correct?...
From: John Heffner @ 2007-08-08 15:26 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Netdev, David Miller
In-Reply-To: <Pine.LNX.4.64.0708062129440.29123@kivilampi-30.cs.helsinki.fi>

That sounds right to me.

   -John


Ilpo Järvinen wrote:
> On Mon, 6 Aug 2007, Ilpo Järvinen wrote:
> 
>> ...Goto logic could be cleaner (somebody has any suggestion for better 
>> way to structure it?)
> 
> ...I could probably move the setting of snd_cwnd earlier to avoid 
> this problem if this seems a valid fix at all.
> 


^ permalink raw reply

* Re: [PATCH RFC]: napi_struct V5
From: jamal @ 2007-08-08 15:32 UTC (permalink / raw)
  To: Shirley Ma
  Cc: David Miller, jgarzik, netdev, netdev-owner, rdreier, rusty,
	shemminger
In-Reply-To: <OFA2F18805.38AA0BD0-ON87257331.005367FB-88257331.0027BEED@us.ibm.com>

On Wed, 2007-08-08 at 08:14 -0700, Shirley Ma wrote:

> Dave, could you please hold this portion of the patch for a moment. I
> will test this patch ASAP. According to our previous experience, this
> changes significant changes some IPoIB driver performance.
> 
If you adjust your quantum while doing that testing you may find an
optimal value. 
Think of a box where you have other network interfaces, the way you
are implementing currently implies you are going to be very unfair to 
the other interfaces on the box. 

cheers,
jamal  



^ permalink raw reply

* Re: TCP's initial cwnd setting correct?...
From: John Heffner @ 2007-08-08 15:20 UTC (permalink / raw)
  To: David Miller; +Cc: ilpo.jarvinen, netdev
In-Reply-To: <20070807.182823.123919803.davem@davemloft.net>

I believe the current calculation is correct.  The RFC specifies a 
window of no more than 4380 bytes unless 2*MSS > 4380.  If you change 
the code in this way, then MSS=1461 will give you an initial window of 
3*MSS == 4383, violating the spec.  Reading the pseudocode in the RFC 
3390 is a bit misleading because they use a clamp at 4380 bytes rather 
than use a multiplier in the relevant range.

   -John


David Miller wrote:
> From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
> Date: Mon, 6 Aug 2007 15:37:15 +0300 (EEST)
> 
>> @@ -805,13 +805,13 @@ void tcp_update_metrics(struct sock *sk)
>>  	}
>>  }
>>  
>> -/* Numbers are taken from RFC2414.  */
>> +/* Numbers are taken from RFC3390.  */
>>  __u32 tcp_init_cwnd(struct tcp_sock *tp, struct dst_entry *dst)
>>  {
>>  	__u32 cwnd = (dst ? dst_metric(dst, RTAX_INITCWND) : 0);
>>  
>>  	if (!cwnd) {
>> -		if (tp->mss_cache > 1460)
>> +		if (tp->mss_cache >= 2190)
>>  			cwnd = 2;
>>  		else
>>  			cwnd = (tp->mss_cache > 1095) ? 3 : 4;
> 
> I remember suggesting something similar about 5 or 6 years
> ago and Alexey Kuznetsov at the time explained the numbers
> which are there and why they should not be changed.
> 
> I forget the reasons though, and I'll try to do the research.
> 
> These numbers have been like this forever, FWIW.
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply

* Re: [PATCH] ixgbe: New driver for Pci-Express 10GbE 82598 support
From: Kok, Auke @ 2007-08-08 15:56 UTC (permalink / raw)
  To: netdev
  Cc: jeff, ayyappan.veeraiyan, akpm, arjan, hch, shemminger, nhorman,
	inaky, mb, john.ronciak
In-Reply-To: <20070804040502.D9133F2C0F@doppio.foo-projects.org>

Auke Kok wrote:
> This patch adds support for the Intel 82598 PCI-Express 10GbE
> chipset. Devices will be available on the market soon.
> 
> This version of the driver is largely the same as the last release:
> 
> * Driver uses a single RX and single TX queue, each using 1 MSI-X
>   irq vector.
> * Driver runs in NAPI mode only
> * Driver is largely multiqueue-ready (TM)


For those interested in the specs of the product and all the features that it 
supports, I just noticed that the product brief was posted on our website with 
all the features and facts on the 82598-based products. Please take a look:


Overview:
   http://www.intel.com/design/network/products/lan/controllers/82598.htm

Product brief:
   http://download.intel.com/design/network/prodbrf/317796.pdf


Cheers,

Auke

^ permalink raw reply

* [RFT 0/4] fib_trie cleanup patches
From: Robert Olsson @ 2007-08-08 16:07 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David S. Miller, akpm, netdev
In-Reply-To: <20070727075917.470055328@linux-foundation.org>


Stephen Hemminger writes:

 > I don't have a machine with anywhere near enough routes to test this,
 > so would someone with many routes give it a go and make sure nothing
 > got busted in the process.

 Hello! 

 It's not only the numbers of routes thats important... 

 Anyway I've done what can to verify that route selection (comparing
 lookups via netlink and userland longest-prefix-match using the same
 full routing table) and locking (concurrent rDoS on multiple CPU:s 
 also while loading the full routing table) is intact.

 Keep TKEY_GET_MASK in patch #2 as it gives some hint whats going on.

 And how about Paul E. McKenney comment about rcu_dereference() 
 covering the initial fetch?

 BTW. Right now the lab is setup for tests described above...

 Cheers.

					--ro

 

^ permalink raw reply

* Re: [PATCH 2/14] nes: device structures and defines
From: Roland Dreier @ 2007-08-08 16:18 UTC (permalink / raw)
  To: Michael Buesch; +Cc: Andi Kleen, Jeff Garzik, ggrundstrom, ewg, netdev
In-Reply-To: <200708081555.03588.mb@bu3sch.de>

 > But there are indeed a few cases that look wrong.

yes...

 > arch/x86_64/kernel/pci-calgary.c:       writel(cpu_to_be32(val), target);

eg this almost certainly wants to be

	writel(swab32(val), target);

or something equivalent like

	__raw_writel(cpu_to_be32(val), target);
	/* plus some suffficent memory ordering */

 - R.

^ permalink raw reply

* Re: [PATCH 2/14] nes: device structures and defines
From: Jeff Garzik @ 2007-08-08 16:19 UTC (permalink / raw)
  To: Andi Kleen; +Cc: ggrundstrom, rdreier, ewg, netdev
In-Reply-To: <p73hcnarykt.fsf@bingen.suse.de>

Andi Kleen wrote:
> Jeff Garzik <jeff@garzik.org> writes:
>>> +			val, reg_index, addr, addr+4); */
>>> +	writel(cpu_to_le32(reg_index), addr);
>>> +	writel(cpu_to_le32(val),(u8 *)addr + 4);
>> wrong -- endian conversion macros not needed with writel()
> 
> Are you sure?

Yes.

read[bwl] and write[bwl] are always defined in terms of the 
little-endian PCI bus.  This has been true since my first days in the 
kernel ages (decade+) ago, when we had a long discussion about it with 
regards to framebuffer drivers.

If you want to skip barriers and endian conversions, __raw_write[bwl]() 
exists.

The rare exceptions are a few embedded arches that implemented writel() 
for a non-PCI bus.  Those cases need to be renamed to mybus_writel(), 
but at least they do not interfere with mainstream drivers and APIs.


> I don't think that's true. e.g. powerpc writel 
> doesn't convert endian

Incorrect -- read the code.  PPC most certainly does convert endian.

Ten years ago Linus said something along the lines of "writel() means 
PCI means little endian.  period."  ;-)

	Jeff



^ permalink raw reply

* RE: [PATCH 68] drivers/net/s2io.c: kmalloc + memset conversion to k[cz]alloc
From: Ramkrishna Vepa @ 2007-08-08 16:16 UTC (permalink / raw)
  To: Jeff Garzik, Mariusz Kozlowski
  Cc: linux-kernel, kernel-janitors, Andrew Morton, netdev
In-Reply-To: <46B8EAD0.2040303@pobox.com>

We have a few patches queued and can send this patch in along with ours.

Ram

> -----Original Message-----
> From: netdev-owner@vger.kernel.org
[mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Jeff Garzik
> Sent: Tuesday, August 07, 2007 2:58 PM
> To: Mariusz Kozlowski
> Cc: linux-kernel@vger.kernel.org; kernel-janitors@vger.kernel.org;
Andrew
> Morton; netdev@vger.kernel.org
> Subject: Re: [PATCH 68] drivers/net/s2io.c: kmalloc + memset
conversion to
> k[cz]alloc
> 
> Mariusz Kozlowski wrote:
> > Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
> >
> >  drivers/net/s2io.c | 235587 -> 235340 (-247 bytes)
> >  drivers/net/s2io.o | 460768 -> 460120 (-648 bytes)
> >
> >  drivers/net/s2io.c |   14 +++++---------
> >  1 file changed, 5 insertions(+), 9 deletions(-)
> 
> ACK but didn't apply, please wait 24-48 hours (so that s2io fixes go
> upstream), then rediff and resend
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 2/14] nes: device structures and defines
From: Jeff Garzik @ 2007-08-08 16:25 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Michael Buesch, Andi Kleen, ggrundstrom, ewg, netdev
In-Reply-To: <aday7gmov8o.fsf@cisco.com>

Roland Dreier wrote:
>  > But there are indeed a few cases that look wrong.
> 
> yes...
> 
>  > arch/x86_64/kernel/pci-calgary.c:       writel(cpu_to_be32(val), target);
> 
> eg this almost certainly wants to be
> 
> 	writel(swab32(val), target);
> 
> or something equivalent like
> 
> 	__raw_writel(cpu_to_be32(val), target);
> 	/* plus some suffficent memory ordering */

Precisely.  Some of those cases are "we know the underlying writel() 
swaps... we want to swap in this case anyway"

	Jeff




^ permalink raw reply

* Re: [PATCH RFC]: napi_struct V6
From: Jeff Garzik @ 2007-08-08 16:27 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev, hadi, rusty
In-Reply-To: <20070808070628.284d805c@oldman>

Someone please make sure Documentation/networking/netdevices.txt remains 
correct with regards to APIs and locking.

It -is- up to date, unlike NAPI howto.  It should not change very much 
due to this napi_struct work though, if at all.

	Jeff




^ permalink raw reply

* Re: [PATCH 2/14] nes: device structures and defines
From: Michael Buesch @ 2007-08-08 16:30 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Andi Kleen, Jeff Garzik, ggrundstrom, ewg, netdev
In-Reply-To: <aday7gmov8o.fsf@cisco.com>

On Wednesday 08 August 2007 18:18:31 Roland Dreier wrote:
>  > But there are indeed a few cases that look wrong.
> 
> yes...
> 
>  > arch/x86_64/kernel/pci-calgary.c:       writel(cpu_to_be32(val), target);
> 
> eg this almost certainly wants to be
> 
> 	writel(swab32(val), target);
> 
> or something equivalent like
> 
> 	__raw_writel(cpu_to_be32(val), target);
> 	/* plus some suffficent memory ordering */
> 
>  - R.
> 
> 

certainly, yes.
Most likely the __raw_writel variant is portable, but I am not
sure. Anybody sure?

-- 
Greetings Michael.

^ permalink raw reply

* Re: [PATCH 2/14] nes: device structures and defines
From: Jeff Garzik @ 2007-08-08 16:33 UTC (permalink / raw)
  To: Michael Buesch; +Cc: Roland Dreier, Andi Kleen, ggrundstrom, ewg, netdev
In-Reply-To: <200708081830.20129.mb@bu3sch.de>

Michael Buesch wrote:
> On Wednesday 08 August 2007 18:18:31 Roland Dreier wrote:
>>  > But there are indeed a few cases that look wrong.
>>
>> yes...
>>
>>  > arch/x86_64/kernel/pci-calgary.c:       writel(cpu_to_be32(val), target);
>>
>> eg this almost certainly wants to be
>>
>> 	writel(swab32(val), target);
>>
>> or something equivalent like
>>
>> 	__raw_writel(cpu_to_be32(val), target);
>> 	/* plus some suffficent memory ordering */
>>
>>  - R.
>>
>>
> 
> certainly, yes.
> Most likely the __raw_writel variant is portable, but I am not
> sure. Anybody sure?

Yes, it's portable.  You must however be aware of the guarantees that 
writel() provides and __raw_writel() does not:  no barriers or flushes, 
no endian conversions, no ordering constraints, ...  Probably a few more 
details I'm forgetting too :)

	Jeff




^ permalink raw reply

* Re: [PATCH 68] drivers/net/s2io.c: kmalloc + memset conversion to k[cz]alloc
From: Jeff Garzik @ 2007-08-08 16:33 UTC (permalink / raw)
  To: Ramkrishna Vepa
  Cc: Mariusz Kozlowski, linux-kernel, kernel-janitors, Andrew Morton,
	netdev
In-Reply-To: <78C9135A3D2ECE4B8162EBDCE82CAD77020354C0@nekter>

Ramkrishna Vepa wrote:
> We have a few patches queued and can send this patch in along with ours.

That would be great, thanks.

	Jeff




^ permalink raw reply

* Re: [PATCH 2/14] nes: device structures and defines
From: Roland Dreier @ 2007-08-08 16:39 UTC (permalink / raw)
  To: Michael Buesch; +Cc: Andi Kleen, Jeff Garzik, ggrundstrom, ewg, netdev
In-Reply-To: <200708081830.20129.mb@bu3sch.de>

 > Most likely the __raw_writel variant is portable, but I am not
 > sure. Anybody sure?

Yes, it should be fine.  I use that construct in a few IB drivers.

 - R.

^ permalink raw reply

* Re: [PATCH 2/14] nes: device structures and defines
From: Michael Buesch @ 2007-08-08 16:43 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Roland Dreier, Andi Kleen, ggrundstrom, ewg, netdev
In-Reply-To: <46B9F054.4030000@garzik.org>

On Wednesday 08 August 2007 18:33:24 Jeff Garzik wrote:
> Michael Buesch wrote:
> > On Wednesday 08 August 2007 18:18:31 Roland Dreier wrote:
> >>  > But there are indeed a few cases that look wrong.
> >>
> >> yes...
> >>
> >>  > arch/x86_64/kernel/pci-calgary.c:       writel(cpu_to_be32(val), target);
> >>
> >> eg this almost certainly wants to be
> >>
> >> 	writel(swab32(val), target);
> >>
> >> or something equivalent like
> >>
> >> 	__raw_writel(cpu_to_be32(val), target);
> >> 	/* plus some suffficent memory ordering */
> >>
> >>  - R.
> >>
> >>
> > 
> > certainly, yes.
> > Most likely the __raw_writel variant is portable, but I am not
> > sure. Anybody sure?
> 
> Yes, it's portable.  You must however be aware of the guarantees that 
> writel() provides and __raw_writel() does not:  no barriers or flushes, 
> no endian conversions, no ordering constraints, ...  Probably a few more 
> details I'm forgetting too :)

writel doesn't guarantee flushing either.
readl does.
The barrier/ordering issue however might be a critical thing,
when using __raw_XXX. So one must always mmiowb() after such a write.

-- 
Greetings Michael.

^ permalink raw reply

* Re: [PATCH 2/14] nes: device structures and defines
From: Roland Dreier @ 2007-08-08 16:46 UTC (permalink / raw)
  To: Michael Buesch; +Cc: Jeff Garzik, Andi Kleen, ggrundstrom, ewg, netdev
In-Reply-To: <200708081843.36370.mb@bu3sch.de>

 > The barrier/ordering issue however might be a critical thing,
 > when using __raw_XXX. So one must always mmiowb() after such a write.

Not mmiowb() -- that is for ordering between CPUs, eg on systems like
Altix where PCI transactions might get reordered in the system fabric
before reaching the PCI bus.

You need a full wmb() to order between __raw_writel()s.

 - R.

^ permalink raw reply

* Re: [PATCH 2/14] nes: device structures and defines
From: akepner @ 2007-08-08 16:57 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Michael Buesch, Jeff Garzik, Andi Kleen, ggrundstrom, ewg, netdev
In-Reply-To: <adair7qotyf.fsf@cisco.com>

On Wed, Aug 08, 2007 at 09:46:16AM -0700, Roland Dreier wrote:

> ....
> Not mmiowb() -- that is for ordering between CPUs, eg on systems like
> Altix where PCI transactions might get reordered in the system fabric
> before reaching the PCI bus.
> 

Yes, that's right. This is a continual source of confusion. 
FWIW, Jesse Barnes documented mmiowb() in 
Documentation/DocBook/deviceiobook.tmpl 

-- 
Arthur


^ permalink raw reply

* Re: [PATCH 2/14] nes: device structures and defines
From: Jeff Garzik @ 2007-08-08 16:59 UTC (permalink / raw)
  To: Michael Buesch; +Cc: Roland Dreier, Andi Kleen, ggrundstrom, ewg, netdev
In-Reply-To: <200708081843.36370.mb@bu3sch.de>

Michael Buesch wrote:
> writel doesn't guarantee flushing either.
> readl does.


Not quite -- there are multiple kinds of flushing.  You're thinking 
about flushing across PCI bridges, which is correct, but you also have 
CPU write posting and CPU write ordering and such.

Without taking all that into account, you might be tempted to think that 
__raw_readl() will perform all flushes necessary following a 
__raw_writel() -- but that would be incorrect.

	Jeff



^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox