* Re: [REGRESSION] tg3 dead after s2ram
From: Michael Chan @ 2007-08-01 21:00 UTC (permalink / raw)
To: Joachim Deguara
Cc: Andrew Morton, lkml List, Michal Piotrowski, netdev, linux-acpi
In-Reply-To: <1185990466.5552.34.camel@dell>
On Wed, 2007-08-01 at 10:47 -0700, Michael Chan wrote:
> You have 2 Broadcom devices in your system. 07:00.0 is a wireless
> device, I think. 8:4.0 is the tg3 device.
>
> It's clear that the tg3 device is still in D3 state after resume and
> that explains why all register accesses fail. tg3_resume() should put
> the device back in D0 state in a very straight forward way and I don't
> see how that can fail. It worked for me when I tested it last night.
> Can you add some printk() to tg3_resume() to see what's happening? Let
> me know if you want me to send you some debug patches to do that.
I misread the PCI registers below. The power state was ok.
The problem is that memory enable and bus master were not set in PCI
register 4 after resume. This also explains the register access
failures.
In tg3_resume(), we call pci_restore_state() which should re-enable
those 2 bits in PCI register 4. Can you add some printk() to see why
those bits are not restored after pci_restore_state()?
Thanks.
>
> Here's the before and after for 8:4.0:
>
> >
> > Before
> >
> > 08:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5788 Gigabit
> > Ethernet (rev 03)
> > Subsystem: Acer Incorporated [ALI] Unknown device 010e
> > Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
> > SERR- FastB2B-
> > Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort-
> > <MAbort- >SERR- <PERR-
> > Latency: 0 (16000ns min)
> > Interrupt: pin A routed to IRQ 22
> > Region 0: Memory at d0300000 (32-bit, non-prefetchable) [size=64K]
> > Expansion ROM at <ignored> [disabled]
> > Capabilities: [48] Power Management version 2
> > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
> > Status: D0 PME-Enable- DSel=0 DScale=1 PME-
> > Capabilities: [50] Vital Product Data
> > Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3
> > Enable-
> > Address: ffff6bfe7fffffb8 Data: fec7
> > 00: e4 14 9c 16 06 00 b0 02 03 00 00 02 00 00 00 00
> > 10: 00 00 30 d0 00 00 00 00 00 00 00 00 00 00 00 00
> > 20: 00 00 00 00 00 00 00 00 07 00 00 00 25 10 0e 01
> > 30: 00 00 ff ff 48 00 00 00 00 00 00 00 0a 01 40 00
> > 40: 00 00 00 00 00 00 00 00 01 50 02 c0 00 20 00 00
> > 50: 03 58 fc 00 6f bf be 7f 05 00 86 00 b8 ff ff 7f
> > 60: fe 6b ff ff c7 fe 00 00 98 00 03 30 00 00 3f 76
> > 70: f6 10 00 00 20 00 00 80 14 04 00 00 00 00 00 00
> > 80: 85 6b d0 36 03 40 00 0c 34 00 13 04 82 90 09 04
> > 90: 09 97 00 01 00 00 00 00 00 00 00 00 eb 01 00 00
> > a0: 00 00 00 00 23 01 00 00 00 00 00 00 cb 00 00 00
> > b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >
> > After
> >
> > 08:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5788 Gigabit
> > Ethernet (rev 03)
> > Subsystem: Acer Incorporated [ALI] Unknown device 010e
> > Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
> > SERR- FastB2B-
> > Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort-
> > <MAbort- >SERR- <PERR-
> > Interrupt: pin A routed to IRQ 22
> > Region 0: Memory at d0300000 (32-bit, non-prefetchable) [disabled] [size=64K]
> > Expansion ROM at <ignored> [disabled]
> > Capabilities: [48] Power Management version 2
> > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
> > Status: D0 PME-Enable+ DSel=0 DScale=1 PME-
> > Capabilities: [50] Vital Product Data
> > Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3
> > Enable-
> > Address: ffff6bfe7fffffb8 Data: fec7
> > 00: e4 14 9c 16 00 00 b0 02 03 00 00 02 00 00 00 00
> > 10: 00 00 30 d0 00 00 00 00 00 00 00 00 00 00 00 00
> > 20: 00 00 00 00 00 00 00 00 07 00 00 00 25 10 0e 01
> > 30: 00 00 ff ff 48 00 00 00 00 00 00 00 0a 01 40 00
> > 40: 00 00 00 00 00 00 00 00 01 50 02 c0 00 21 00 64
> > 50: 03 58 fc 00 6f bf be 7f 05 00 86 00 b8 ff ff 7f
> > 60: fe 6b ff ff c7 fe 00 00 9a 00 03 30 00 00 00 00
> > 70: 76 10 00 00 40 00 00 00 50 00 00 00 00 00 00 00
> > 80: 03 58 fc 00 00 00 00 00 00 00 00 00 fe 90 09 04
> > 90: 01 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >
^ permalink raw reply
* Re: [RFC][BNX2X]: New driver for Broadcom 10Gb Ethernet.
From: Michael Buesch @ 2007-08-01 22:06 UTC (permalink / raw)
To: Michael Chan; +Cc: davem, jeff, netdev, eliezert, lusinsky, eilong
In-Reply-To: <1185957077.5552.22.camel@dell>
On Wednesday 01 August 2007 10:31:17 Michael Chan wrote:
> +typedef struct {
> + u8 reserved[64];
> +} license_key_t;
No typedef.
What is a "license key" used for, anyway?
> +#define RUN_AT(x) (jiffies + (x))
That macro does only obfuscate code, in my opinion.
If you want jiffies+x, better opencode it.
> +typedef enum {
> + BCM5710 = 0,
> +} board_t;
No typedef. Do
enum bnx2x_board {
BCM5710 = 0,
};
Or something like that.
> +static struct pci_device_id bnx2x_pci_tbl[] = {
static const struct...
> + { PCI_VENDOR_ID_BROADCOM, PCI_DEVICE_ID_NX2_5710,
> + PCI_ANY_ID, PCI_ANY_ID, 0, 0, BCM5710 },
> + { 0 }
> +};
> +static inline u32 bnx2x_bits_en(struct bnx2x *bp, u32 block, u32 reg,
> + u32 bits)
Does that really need to be inline? I'd suggest dropping inline.
> +{
> + u32 val = REG_RD(bp, block, reg);
> +
> + val |= bits;
> + REG_WR(bp, block, reg, val);
> + return val;
> +}
> +
> +static inline u32 bnx2x_bits_dis(struct bnx2x *bp, u32 block, u32 reg,
> + u32 bits)
Same here.
> +{
> + u32 val = REG_RD(bp, block, reg);
> +
> + val &= ~bits;
> + REG_WR(bp, block, reg, val);
> + return val;
> +}
> +
> +static int bnx2x_mdio22_write(struct bnx2x *bp, u32 reg, u32 val)
> +{
> + int rc;
> + u32 tmp, i;
> + int port = bp->port;
> + u32 emac_base = port ? GRCBASE_EMAC1 : GRCBASE_EMAC0;
> +
> + if (bp->phy_flags & PHY_INT_MODE_AUTO_POLLING_FLAG) {
> +
> + tmp = REG_RD(bp, emac_base, EMAC_REG_EMAC_MDIO_MODE);
> + tmp &= ~EMAC_MDIO_MODE_AUTO_POLL;
> + EMAC_WR(EMAC_REG_EMAC_MDIO_MODE, tmp);
> + REG_RD(bp, emac_base, EMAC_REG_EMAC_MDIO_MODE);
> + udelay(40);
> + }
> +
> + tmp = ((bp->phy_addr << 21) | (reg << 16) |
> + (val & EMAC_MDIO_COMM_DATA) |
> + EMAC_MDIO_COMM_COMMAND_WRITE_22 |
> + EMAC_MDIO_COMM_START_BUSY);
> + EMAC_WR(EMAC_REG_EMAC_MDIO_COMM, tmp);
> +
> + for (i = 0; i < 50; i++) {
> + udelay(10);
> +
> + tmp = REG_RD(bp, emac_base, EMAC_REG_EMAC_MDIO_COMM);
> + if (!(tmp & EMAC_MDIO_COMM_START_BUSY)) {
> + udelay(5);
> + break;
> + }
> + }
> +
> + if (tmp & EMAC_MDIO_COMM_START_BUSY) {
> + BNX2X_ERR("write phy register failed\n");
> +
> + rc = -EBUSY;
> + } else {
> + rc = 0;
> + }
> +
> + if (bp->phy_flags & PHY_INT_MODE_AUTO_POLLING_FLAG) {
> +
> + tmp = REG_RD(bp, emac_base, EMAC_REG_EMAC_MDIO_MODE);
> + tmp |= EMAC_MDIO_MODE_AUTO_POLL;
> + EMAC_WR(EMAC_REG_EMAC_MDIO_MODE, tmp);
> + }
> +
> + return rc;
> +}
> +
> +static int bnx2x_mdio22_read(struct bnx2x *bp, u32 reg, u32 *ret_val)
> +{
> + int rc;
> + u32 val, i;
> + int port = bp->port;
> + u32 emac_base = port ? GRCBASE_EMAC1 : GRCBASE_EMAC0;
> +
> + if (bp->phy_flags & PHY_INT_MODE_AUTO_POLLING_FLAG) {
> +
> + val = REG_RD(bp, emac_base, EMAC_REG_EMAC_MDIO_MODE);
> + val &= ~EMAC_MDIO_MODE_AUTO_POLL;
> + EMAC_WR(EMAC_REG_EMAC_MDIO_MODE, val);
> + REG_RD(bp, emac_base, EMAC_REG_EMAC_MDIO_MODE);
> + udelay(40);
> + }
> +
> + val = ((bp->phy_addr << 21) | (reg << 16) |
> + EMAC_MDIO_COMM_COMMAND_READ_22 |
> + EMAC_MDIO_COMM_START_BUSY);
> + EMAC_WR(EMAC_REG_EMAC_MDIO_COMM, val);
> +
> + for (i = 0; i < 50; i++) {
> + udelay(10);
> +
> + val = REG_RD(bp, emac_base, EMAC_REG_EMAC_MDIO_COMM);
> + if (!(val & EMAC_MDIO_COMM_START_BUSY)) {
No udelay(5) here, like in write above?
> + val &= EMAC_MDIO_COMM_DATA;
> + break;
> + }
> + }
> +static int bnx2x_mdio45_vwrite(struct bnx2x *bp, u32 reg, u32 addr, u32 val)
> +{
> + int i;
> + u32 rd_val;
> +
> + for (i = 0; i < 10; i++) {
> + bnx2x_mdio45_write(bp, reg, addr, val);
> + mdelay(5);
Can you msleep(5) here?
> + bnx2x_mdio45_read(bp, reg, addr, &rd_val);
> + /* if the read value is not the same as the value we wrote,
> + we should write it again */
> + if (rd_val == val) {
> + return 0;
> + }
> + }
> + BNX2X_ERR("MDIO write in CL45 failed\n");
> + return -EBUSY;
> +}
> +/* DMAE command positions used
> + * Port0 14
> + * Port1 15
> + */
> +static void bnx2x_wb_write_dmae(struct bnx2x *bp, u32 wb_addr, u32 *wb_write,
> + u32 wb_len)
> +{
> + struct dmae_command *dmae = &bp->dmae;
> + int port = bp->port;
> + u32 *wb_comp = bnx2x_sp(bp, wb_comp);
> +
> + memcpy(bnx2x_sp(bp, wb_write[0]), wb_write, wb_len * 4);
> + memset(dmae, 0, sizeof(struct dmae_command));
> +
> + dmae->opcode = (DMAE_CMD_SRC_PCI | DMAE_CMD_DST_GRC |
> + DMAE_CMD_C_DST_PCI | DMAE_CMD_C_ENABLE |
> + DMAE_CMD_SRC_RESET | DMAE_CMD_DST_RESET |
> + DMAE_CMD_ENDIANITY_DW_SWAP |
> + (port? DMAE_CMD_PORT_1 : DMAE_CMD_PORT_0));
> + dmae->src_addr_lo = U64_LO(bnx2x_sp_mapping(bp, wb_write));
> + dmae->src_addr_hi = U64_HI(bnx2x_sp_mapping(bp, wb_write));
> + dmae->dst_addr_lo = wb_addr >> 2;
> + dmae->dst_addr_hi = 0;
> + dmae->len = wb_len;
> + dmae->comp_addr_lo = U64_LO(bnx2x_sp_mapping(bp, wb_comp));
> + dmae->comp_addr_hi = U64_HI(bnx2x_sp_mapping(bp, wb_comp));
> + dmae->comp_val = BNX2X_WB_COMP_VAL;
> + bnx2x_post_dmae(bp, port? 15 : 14);
> +
> + *wb_comp = 0;
wmb();
> + REG_WR32(bp, GRCBASE_DMAE,
> + (bp->port? DMAE_REGISTERS_GO_C15 :
> + DMAE_REGISTERS_GO_C14), 1);
> + udelay(5);
> + while (*wb_comp != BNX2X_WB_COMP_VAL) {
> + udelay(5);
Not sure what this is doing here.
Do we need some DMA syncing here?
Anyway, we _do_ need a timeout here.
Probably also some kind of memory barrier, if no DMA syncing is needed.
> + }
> +}
> +static void bnx2x_emac_enable(struct bnx2x *bp)
> +{
> + u32 val;
> + int port = bp->port;
> + u32 emac_base = port ? GRCBASE_EMAC1 : GRCBASE_EMAC0;
> + while (val & EMAC_MODE_RESET) {
> + val = REG_RD(bp, emac_base, EMAC_REG_EMAC_MODE);
> + DP(NETIF_MSG_LINK, "EMAC reset reg is %u\n", val);
> + }
A timeout, please.
> + /* reset tx part */
> + EMAC_WR(EMAC_REG_EMAC_TX_MODE, EMAC_TX_MODE_RESET);
> +
> + while (val & EMAC_TX_MODE_RESET) {
> + val = REG_RD(bp, emac_base, EMAC_REG_EMAC_TX_MODE);
> + DP(NETIF_MSG_LINK, "EMAC reset reg is %u\n", val);
> + }
A timeout, please.
> +static void bnx2x_pbf_update(struct bnx2x *bp)
> +{
> + int port = bp->port;
> + u32 init_crd, crd;
> + u32 count = 1000;
> + u32 pause = 0;
> +
> + /* disable port */
> + REG_WR(bp, GRCBASE_PBF,
> + PBF_REGISTERS_DISABLE_NEW_TASK_PROC_P0 + port*4, 0x1);
> +
> + /* wait for init credit */
> + init_crd = REG_RD(bp, GRCBASE_PBF,
> + PBF_REGISTERS_P0_INIT_CRD + port*4);
> + crd = REG_RD(bp, GRCBASE_PBF, PBF_REGISTERS_P0_CREDIT + port*8);
> + DP(NETIF_MSG_LINK, "init_crd 0x%x crd 0x%x\n", init_crd, crd);
> +
> + while ((init_crd != crd) && count) {
> + mdelay(5);
msleep(5)?
> + crd = REG_RD(bp, GRCBASE_PBF,
> + PBF_REGISTERS_P0_CREDIT + port*8);
> + count--;
> + }
> + /* probe the credit changes */
> + REG_WR(bp, GRCBASE_PBF, PBF_REGISTERS_INIT_P0 + port*4, 0x1);
> + mdelay(5);
msleep(5)?
> + REG_WR(bp, GRCBASE_PBF, PBF_REGISTERS_INIT_P0 + port*4, 0x0);
> +
> + /* enable port */
> + REG_WR(bp, GRCBASE_PBF,
> + PBF_REGISTERS_DISABLE_NEW_TASK_PROC_P0 + port*4, 0x0);
> +}
> +/* This function is called upon link interrupt */
> +static void bnx2x_link_update(struct bnx2x *bp)
> +{
> + u32 gp_status;
> + int port = bp->port;
> + int i;
> + int link_10g;
> +
> + DP(NETIF_MSG_LINK, "port %x, is xgxs %x, stat_mask 0x%x, "
> + "int_mask 0x%x, saved_mask 0x%x, MI_INT %x, SERDES_LINK %x,"
> + " 10G %x, XGXS_LINK %x\n",
> + port, (bp->phy_flags & PHY_XGSX_FLAG),
> + REG_RD(bp, GRCBASE_NIG,
> + NIG_REGISTERS_STATUS_INTERRUPT_PORT0 + port*4),
> + REG_RD(bp, GRCBASE_NIG,
> + NIG_REGISTERS_MASK_INTERRUPT_PORT0 + port*4),
> + bp->nig_mask,
> + REG_RD(bp, GRCBASE_NIG,
> + NIG_REGISTERS_EMAC0_STATUS_MISC_MI_INT + port*0x18),
> + REG_RD(bp, GRCBASE_NIG,
> + NIG_REGISTERS_SERDES0_STATUS_LINK_STATUS + port*0x3c),
> + REG_RD(bp, GRCBASE_NIG,
> + NIG_REGISTERS_XGXS0_STATUS_LINK10G + port*0x68),
> + REG_RD(bp, GRCBASE_NIG,
> + NIG_REGISTERS_XGXS0_STATUS_LINK_STATUS + port*0x68)
> + );
> +
> + MDIO_SET_REG_BANK(bp, MDIO_REG_BANK_GP_STATUS);
> + /* avoid fast toggling */
> + for (i = 0 ; i < 10 ; i++) {
> + mdelay(10);
msleep(10)?
> + bnx2x_mdio22_read(bp, MDIO_GP_STATUS_TOP_AN_STATUS1,
> + &gp_status);
> + }
> +
> + bnx2x_link_settings_status(bp, gp_status);
> +
> + /* anything 10 and over uses the bmac */
> + link_10g = ((bp->line_speed >= SPEED_10000) &&
> + (bp->line_speed <= SPEED_16000));
> +
> + bnx2x_link_int_ack(bp, link_10g);
> +
> + /* link is up only if both local phy and external phy are up */
> + if (bp->link_up && bnx2x_ext_phy_is_link_up(bp)) {
> + if (link_10g) {
> + bnx2x_bmac_enable(bp, 0);
> + bnx2x_leds_set(bp, SPEED_10000);
> +
> + } else {
> + bnx2x_emac_enable(bp);
> + bnx2x_emac_program(bp);
> +
> + /* AN complete? */
> + if (gp_status & MDIO_AN_CL73_OR_37_COMPLETE) {
> + if (!(bp->phy_flags & PHY_SGMII_FLAG)) {
> + bnx2x_set_sgmii_tx_driver(bp);
> + }/* Not SGMII */
> + }
> + }
> + bnx2x_link_up(bp);
> +
> + } else { /* link down */
> + bnx2x_leds_unset(bp);
> + bnx2x_link_down(bp);
> + }
> +}
> +static inline u16 bnx2x_update_dsb_idx(struct bnx2x *bp)
Too big for inlining.
If this func is only used in one place, gcc will inline it automatically.
> +{
> + struct host_def_status_block *dsb = bp->def_status_blk;
> + u16 rc = 0;
> +
> + if (bp->def_att_idx != dsb->atten_status_block.attn_bits_index) {
> + bp->def_att_idx = dsb->atten_status_block.attn_bits_index;
> + rc |= 1;
> + }
> + if (bp->def_c_idx != dsb->c_def_status_block.status_block_index) {
> + bp->def_c_idx = dsb->c_def_status_block.status_block_index;
> + rc |= 2;
> + }
> + if (bp->def_u_idx != dsb->u_def_status_block.status_block_index) {
> + bp->def_u_idx = dsb->u_def_status_block.status_block_index;
> + rc |= 4;
> + }
> + if (bp->def_x_idx != dsb->x_def_status_block.status_block_index) {
> + bp->def_x_idx = dsb->x_def_status_block.status_block_index;
> + rc |= 8;
> + }
> + if (bp->def_t_idx != dsb->t_def_status_block.status_block_index) {
> + bp->def_t_idx = dsb->t_def_status_block.status_block_index;
> + rc |= 16;
> + }
> + rmb(); /* TBD chack this */
> + return rc;
> +}
> +
> +static inline u16 bnx2x_update_fpsb_idx(struct bnx2x_fastpath *fp)
Too big for inlining.
> +{
> + struct host_status_block *fpsb = fp->status_blk;
> + u16 rc = 0;
> +
> + if (fp->fp_c_idx != fpsb->c_status_block.status_block_index) {
> + fp->fp_c_idx = fpsb->c_status_block.status_block_index;
> + rc |= 1;
> + }
> + if (fp->fp_u_idx != fpsb->u_status_block.status_block_index) {
> + fp->fp_u_idx = fpsb->u_status_block.status_block_index;
> + rc |= 2;
> + }
> + rmb(); /* TBD chack this */
> + return rc;
> +}
> +
> +static inline u32 bnx2x_tx_avail(struct bnx2x_fastpath *fp)
Too big for inlining.
> +{
> + u16 used;
> + u32 prod = fp->tx_bd_prod;
> + u32 cons = fp->tx_bd_cons;
> +
> + smp_mb();
This barrier needs a comment. Why is it there? And why SMP only?
> +
> + used = (NUM_TX_BD - NUM_TX_RINGS + prod - cons +
> + (cons / TX_DESC_CNT) - (prod / TX_DESC_CNT));
> +
> + if (prod >= cons) {
> + /* used = prod - cons - prod/size + cons/size */
> + used -= NUM_TX_BD - NUM_TX_RINGS;
> + }
> +
> + BUG_TRAP(used <= fp->bp->tx_ring_size);
> + BUG_TRAP((fp->bp->tx_ring_size - used) <= MAX_TX_AVAIL);
> + return(fp->bp->tx_ring_size - used);
> +}
> +/*========================================================================*/
> +
> +/* acquire split MCP access lock register */
> +static int bnx2x_lock_alr( struct bnx2x *bp)
> +{
> + int rc = 0;
> + u32 i, j, val;
> +
> + i = 100;
> + val = 1UL << 31;
> +
> + REG_WR(bp, GRCBASE_MCP, 0x9c, val);
> + for (j = 0; j < i*10; j++) {
> + val = REG_RD(bp, GRCBASE_MCP, 0x9c);
> + if (val & (1L << 31)) {
> + break;
> + }
> +
> + mdelay(5);
msleep(5)?
> + }
> +
> + if (!(val & (1L << 31))) {
> + BNX2X_ERR("Cannot acquire nvram interface.\n");
> +
> + rc = -EBUSY;
> + }
> +
> + return rc;
> +}
> +static void bnx2x_free_mem(struct bnx2x *bp)
> +{
> +
> +#define BNX2X_PCI_FREE(x, y, size) do { \
> + if (x) { \
> + pci_free_consistent(bp->pdev, size, x, y); \
> + x = NULL; \
> + } \
> +} while (0)
> +
> +#define BNX2X_KFREE(x) do { \
> + if (x) { \
> + vfree(x); \
> + x = NULL; \
> + } \
> +} while (0)
> +
> + int i;
> +
> + /* fastpath */
> + for_each_queue(bp, i) {
> +
> + /* Status blocks */
> + BNX2X_PCI_FREE(bnx2x_fp(bp, i, status_blk),
> + bnx2x_fp(bp, i, status_blk_mapping),
> + sizeof(struct host_status_block)
> + + sizeof(struct eth_tx_db_data));
> +
> + /* fast path rings: tx_buf tx_desc rx_buf rx_desc rx_comp */
> + BNX2X_KFREE(bnx2x_fp(bp, i, tx_buf_ring));
> + BNX2X_PCI_FREE(bnx2x_fp(bp, i, tx_desc_ring),
> + bnx2x_fp(bp, i, tx_desc_mapping),
> + sizeof(struct eth_tx_bd) * NUM_TX_BD);
> +
> + BNX2X_KFREE(bnx2x_fp(bp, i, rx_buf_ring));
> + BNX2X_PCI_FREE(bnx2x_fp(bp, i, rx_desc_ring),
> + bnx2x_fp(bp, i, rx_desc_mapping),
> + sizeof(struct eth_rx_bd) * NUM_RX_BD);
> +
> + BNX2X_PCI_FREE(bnx2x_fp(bp, i, rx_comp_ring),
> + bnx2x_fp(bp, i, rx_comp_mapping),
> + sizeof(struct eth_rx_bd) * NUM_RX_BD);
> + }
> +
> + BNX2X_KFREE( bp->fp);
> + /* end of fast path*/
> +
> + BNX2X_PCI_FREE(bp->def_status_blk, bp->def_status_blk_mapping,
> + (sizeof(struct host_def_status_block)));
> +
> + BNX2X_PCI_FREE(bp->slowpath, bp->slowpath_mapping,
> + (sizeof(struct bnx2x_slowpath)));
> +
> + if (iscsi_active) {
> + BNX2X_PCI_FREE(bp->t1, bp->t1_mapping, 64*1024);
> + BNX2X_PCI_FREE(bp->t2, bp->t2_mapping, 16*1024);
> + BNX2X_PCI_FREE(bp->timers, bp->timers_mapping, 8*1024);
> + BNX2X_PCI_FREE(bp->qm, bp->qm_mapping, 128*1024);
> + }
> +
> + BNX2X_PCI_FREE(bp->spq, bp->spq_mapping, PAGE_SIZE);
> +
> +#undef BNX2X_PCI_FREE
> +#undef BNX2X_KFREE
> +}
> +
> +static int bnx2x_alloc_mem(struct bnx2x *bp)
> +{
> + int i;
> +
> +#define BNX2X_PCI_ALLOC(x, y, size) do { \
> + x = pci_alloc_consistent(bp->pdev, size, y); \
> + if (x == NULL) \
> + goto alloc_mem_err; \
> + memset(x, 0, size); \
> +} while (0)
> +
> +#define BNX2X_ALLOC(x, size) do { \
> + x = vmalloc(size); \
Why not zalloc()?
> + if (x == NULL) \
> + goto alloc_mem_err; \
> + memset(x, 0, size); \
> +} while (0)
> +
> + /* fastpath */
> + BNX2X_ALLOC(bp->fp, sizeof(struct bnx2x_fastpath)*bp->num_queues);
> +
> + for_each_queue(bp, i) {
> +
> + bnx2x_fp(bp, i, bp) = bp;
> +
> + /* Status blocks */
> + BNX2X_PCI_ALLOC(bnx2x_fp(bp, i, status_blk),
> + &bnx2x_fp(bp, i, status_blk_mapping),
> + sizeof(struct host_status_block) +
> + sizeof(struct eth_tx_db_data));
> +
> + bnx2x_fp(bp, i, tx_prods_mapping) =
> + bnx2x_fp(bp, i, status_blk_mapping) +
> + sizeof(struct host_status_block);
> +
> + bnx2x_fp(bp, i, hw_tx_prods) =
> + (void *)(bnx2x_fp(bp, i, status_blk) + 1);
> +
> + /* fast path rings: tx_buf tx_desc rx_buf rx_desc rx_comp */
> + BNX2X_ALLOC(bnx2x_fp(bp, i, tx_buf_ring),
> + sizeof(struct sw_tx_bd) * NUM_TX_BD);
> + BNX2X_PCI_ALLOC(bnx2x_fp(bp, i, tx_desc_ring),
> + &bnx2x_fp(bp, i, tx_desc_mapping),
> + sizeof(struct eth_tx_bd) * NUM_TX_BD);
> +
> + BNX2X_ALLOC(bnx2x_fp(bp, i, rx_buf_ring),
> + sizeof(struct sw_rx_bd) * NUM_RX_BD);
> + BNX2X_PCI_ALLOC(bnx2x_fp(bp, i, rx_desc_ring),
> + &bnx2x_fp(bp, i, rx_desc_mapping),
> + sizeof(struct eth_rx_bd) * NUM_RX_BD);
> +
> + BNX2X_PCI_ALLOC(bnx2x_fp(bp, i, rx_comp_ring),
> + &bnx2x_fp(bp, i, rx_comp_mapping),
> + sizeof(struct eth_rx_bd) * NUM_RX_BD);
> + }
> + /* end of fast path*/
> +
> + BNX2X_PCI_ALLOC(bp->def_status_blk, &bp->def_status_blk_mapping,
> + sizeof(struct host_def_status_block));
> +
> + BNX2X_PCI_ALLOC(bp->slowpath, &bp->slowpath_mapping,
> + sizeof(struct bnx2x_slowpath));
> +
> + if (iscsi_active) {
> + BNX2X_PCI_ALLOC(bp->t1, &bp->t1_mapping, 64*1024);
> +
> + /* Initialize T1 */
> + for (i = 0; i < 64*1024; i += 64) {
> + *(u64 *)((char *)bp->t1 + i+56) = 0x0UL;
> + *(u64 *)((char *)bp->t1 + i+3) = 0x0UL;
> + }
> +
> + /* allocate searcher T2 table
> + we allocate 1/4 of alloc num for T2
> + (which is not entered into the ILT) */
> + BNX2X_PCI_ALLOC(bp->t2, &bp->t2_mapping, 16*1024);
> +
> + /* Initialize T2 */
> + for (i = 0; i < 16*1024; i += 64) {
> + *(u64 *)((char *)bp->t2 + i+56) =
> + bp->t2_mapping + i + 64;
> + }
> +
> + /* now sixup the last line in the block
> + * to point to the next block
> + */
> + *(u64 *)((char *)bp->t2 + 1024*16-8) = bp->t2_mapping;
> +
> + /* Timer block array (MAX_CONN*8)
> + * phys uncached for now 1024 conns
> + */
> + BNX2X_PCI_ALLOC(bp->timers, &bp->timers_mapping, 8*1024);
> +
> + /* QM queues (128*MAX_CONN) */
> + BNX2X_PCI_ALLOC(bp->qm, &bp->qm_mapping, 128*1024);
> + }
> +
> + /* Slow path ring */
> + BNX2X_PCI_ALLOC(bp->spq, &bp->spq_mapping, PAGE_SIZE);
> +
> + return 0;
> +
> +alloc_mem_err:
> + bnx2x_free_mem(bp);
> + return -ENOMEM;
> +
> +#undef BNX2X_PCI_ALLOC
> +#undef BNX2X_ALLOC
> +}
> +
> +
> +
> +static void bnx2x_set_mac_addr(struct bnx2x *bp)
> +{
> + struct mac_configuration_cmd *config = bnx2x_sp(bp, mac_config);
> +
> + /* CAM allocation
> + * unicasts 0-31:port0 32-63:port1
> + * multicast 64-127:port0 128-191:port1
> + */
> + config->hdr.length = 2;
> + config->hdr.offset = bp->port ? 31 : 0;
> + config->hdr.reserved0 = 0;
> + config->hdr.reserved1 = 0;
> +
> + /* primary MAC */
> + config->config_table[0].cam_entry.msb_mac_addr =
> + ntohl(*(u32 *)&bp->dev->dev_addr[0]);
> + config->config_table[0].cam_entry.lsb_mac_addr =
> + ntohs(*(u16 *)&bp->dev->dev_addr[4]);
Better use xx_to_cpu()
anyway, ntohX is defined as beX_to_cpu().
Are you sure this is correct? A byte array is little endian.
> +static inline int bnx2x_alloc_rx_skb(struct bnx2x *bp,
> + struct bnx2x_fastpath *fp, u16 index)
Too big for inlining.
If this is only called in one place, gcc will inline it automatically.
> +{
> + struct sk_buff *skb;
> + struct sw_rx_bd *rx_buf = &fp->rx_buf_ring[index];
> + struct eth_rx_bd *rxbd = &fp->rx_desc_ring[index];
> + dma_addr_t mapping;
> +
> + skb = netdev_alloc_skb(bp->dev, bp->rx_buf_size);
> +
> + if (skb == NULL) {
> + return -ENOMEM;
> + }
> +
> + mapping = pci_map_single(bp->pdev, skb->data, bp->rx_buf_use_size,
> + PCI_DMA_FROMDEVICE);
You need to check if the mapping succeed.
There's a macro for that: dma_mapping_error()
> + rx_buf->skb = skb;
> + pci_unmap_addr_set(rx_buf, mapping, mapping);
> +#ifdef BNX2X_STOP_ON_ERROR
> + rxbd->reserved = fp->last_alloc++;
> +#else
> + rxbd->reserved = 0;
> +#endif
> + rxbd->len = bp->rx_buf_use_size;
> + rxbd->addr_hi = U64_HI(mapping);
> + rxbd->addr_lo = U64_LO(mapping);
> + return 0;
> +}
> +static void bnx2x_tx_int(struct bnx2x_fastpath *fp, int work)
> +{
> + struct bnx2x *bp = fp->bp;
> + u16 hw_cons, sw_cons, bd_cons = fp->tx_bd_cons;
> + int done = 0;
> +
> + hw_cons = *fp->tx_cons_sb;
> + sw_cons = fp->tx_pkt_cons;
> +
> +#ifdef BNX2X_STOP_ON_ERROR
> + if (unlikely(bp->panic)) {
> + return;
> + }
> +#endif
> +
> + while (sw_cons != hw_cons) {
> + u16 pkt_cons;
> +
> + pkt_cons = TX_BD(sw_cons);
> +
> + /* prefetch(bp->tx_buf_ring[pkt_cons].skb); */
> +
> + DP(NETIF_MSG_TX_DONE, "sw_cons=%u hw_cons=%u pkt_cons=%d\n",
> + sw_cons, hw_cons, pkt_cons);
> +
> +/* if (NEXT_TX_IDX(sw_cons)!=hw_cons){
> + rmb();
> + prefetch(fp->tx_buf_ring[NEXT_TX_IDX(sw_cons)].skb);
> + }
> +*/
> +
> + bd_cons = bnx2x_free_tx_pkt(bp, fp, pkt_cons);
> + sw_cons++;
> + done++;
> +
> + if (done == work) {
> + break;
> + }
> + }
> +
> + fp->tx_pkt_cons = sw_cons;
> + fp->tx_bd_cons = bd_cons;
> +
> + smp_mb();
Please add a comment why we need a SMP MB here.
> +
> + /* TBD need a thresh? */
> + if (unlikely(netif_queue_stopped(bp->dev))) {
> + netif_tx_lock(bp->dev);
> + if ((netif_queue_stopped(bp->dev)) &&
> + (bnx2x_tx_avail(fp) >= MAX_SKB_FRAGS + 3)) {
> + netif_wake_queue(bp->dev);
> + }
> + netif_tx_unlock(bp->dev);
> + }
> +}
> +static inline void bnx2x_reuse_rx_skb(struct bnx2x_fastpath *fp,
> + struct sk_buff *skb, u16 cons, u16 prod)
Too big for inlining.
> +{
> + struct bnx2x *bp = fp->bp;
> + struct sw_rx_bd *cons_rx_buf = &fp->rx_buf_ring[cons];
> + struct sw_rx_bd *prod_rx_buf = &fp->rx_buf_ring[prod];
> + struct eth_rx_bd *cons_bd = &fp->rx_desc_ring[cons];
> + struct eth_rx_bd *prod_bd = &fp->rx_desc_ring[prod];
> +
> + pci_dma_sync_single_for_device(bp->pdev,
> + pci_unmap_addr(cons_rx_buf, mapping),
> + bp->rx_offset + RX_COPY_THRESH,
> + PCI_DMA_FROMDEVICE);
> +
> + prod_rx_buf->skb = cons_rx_buf->skb;
> + pci_unmap_addr_set(prod_rx_buf, mapping,
> + pci_unmap_addr(cons_rx_buf, mapping));
> + *prod_bd = *cons_bd;
> +#ifdef BNX2X_STOP_ON_ERROR
> + prod_bd->reserved = fp->last_alloc++;
> +#endif
> +}
> +static void bnx2x_attn_int(struct bnx2x *bp)
> +{
> + int port = bp->port;
> +
> + /* read local copy of bits */
> + u16 attn_bits = bp->def_status_blk->atten_status_block.attn_bits;
> + u16 attn_ack = bp->def_status_blk->atten_status_block.attn_bits_ack;
> + u16 attn_state = bp->attn_state;
> +
> + /* look for changed bits */
> + u16 asserted = attn_bits & ~attn_ack & ~attn_state;
> + u16 deasserted = ~attn_bits & attn_ack & attn_state;
Do you probably want | instead of & here?
> +
> + DP(NETIF_MSG_HW,
> + "attn_bits %x, attn_ack %x, asserted %x, deasserted %x \n",
> + attn_bits, attn_ack, asserted, deasserted);
> +
> + if (~(attn_bits ^ attn_ack) & (attn_bits ^ attn_state)) {
Do you want && or | here, instead of & ?
> + BNX2X_ERR("bad attention state\n");
> + }
> +}
> +
> +static inline int bnx2x_has_work(struct bnx2x_fastpath *fp)
Probably better not inline.
> +{
> + u16 rx_cons_sb = *fp->rx_cons_sb;
> +
> + if ((rx_cons_sb & MAX_RX_DESC_CNT) == MAX_RX_DESC_CNT)
> + rx_cons_sb++;
> +
> + if ((rx_cons_sb != fp->rx_comp_cons) ||
> + (*fp->tx_cons_sb != fp->tx_pkt_cons))
> + return 1;
> +
> + return 0;
> +}
> +static irqreturn_t bnx2x_msix_sp_int(int irq, void *dev_instance)
> +{
> + struct net_device *dev = dev_instance;
You need to check if dev==NULL and bail out.
Another driver sharing the IRQ with this might choose to pass the dev
pointer as NULL.
> + struct bnx2x *bp = netdev_priv(dev);
No check if the device actually _did_ generate the IRQ? Sharing...
> + /* Return here if interrupt is disabled. */
> + if (unlikely(atomic_read(&bp->intr_sem) != 0)) {
> + DP(NETIF_MSG_INTR, "called but int_sem not 0, returning. \n");
> + return IRQ_HANDLED;
> + }
> +
> +
> + bnx2x_ack_sb(bp, 16, XSTORM_ID, 0, IGU_INT_DISABLE, 0);
> +
> +#ifdef BNX2X_STOP_ON_ERROR
> + if (unlikely(bp->panic)) {
> + return IRQ_HANDLED;
> + }
> +#endif
> +
> + tasklet_schedule(&bp->sp_task);
> +
> + return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t bnx2x_msix_fp_int(int irq, void *fp_cookie)
> +{
> +
> + struct bnx2x_fastpath *fp = fp_cookie;
Check if fp==NULL
> + struct bnx2x *bp = fp->bp;
> + struct net_device *dev = bp->dev;
No share protection either?
> + DP(NETIF_MSG_INTR, "got an msix interrupt on [%d]\n", fp->index);
> + bnx2x_ack_sb(bp, fp->index, USTORM_ID, 0 , IGU_INT_DISABLE, 0);
> +
> +#ifdef BNX2X_STOP_ON_ERROR
> + if (unlikely(bp->panic)) {
> + return IRQ_HANDLED;
> + }
> +#endif
> +
> + prefetch(fp->rx_cons_sb);
> + prefetch(fp->tx_cons_sb);
> + prefetch(&fp->status_blk->c_status_block.status_block_index);
> + prefetch(&fp->status_blk->u_status_block.status_block_index);
> + netif_rx_schedule(dev);
> + return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t bnx2x_interrupt(int irq, void *dev_instance)
> +{
> + struct net_device *dev = dev_instance;
Check if dev==NULL
> + struct bnx2x *bp = netdev_priv(dev);
> + u16 status = bnx2x_ack_int(bp);
> +
> + if (unlikely(status == 0)) {
That's not unlikely.
> + DP(NETIF_MSG_INTR, "not our interrupt!\n");
> + return IRQ_NONE;
> + }
> +
> + DP(NETIF_MSG_INTR, "got an interrupt status is %u\n", status);
> +
> +#ifdef BNX2X_STOP_ON_ERROR
> + if (unlikely(bp->panic)) {
> + return IRQ_HANDLED;
> + }
> +#endif
> +
> + /* Return here if interrupt is shared and is disabled. */
> + if (unlikely(atomic_read(&bp->intr_sem) != 0)) {
> + DP(NETIF_MSG_INTR, "called but int_sem not 0, returning. \n");
> + return IRQ_HANDLED;
> + }
> +
> + if (status & 0x2) {
> +
> + struct bnx2x_fastpath *fp = &bp->fp[0];
> +
> + prefetch(fp->rx_cons_sb);
> + prefetch(fp->tx_cons_sb);
> + prefetch(&fp->status_blk->c_status_block.status_block_index);
> + prefetch(&fp->status_blk->u_status_block.status_block_index);
> +
> + netif_rx_schedule(dev);
> +
> + status &= ~0x2;
> +
> + if (!status) {
> + return IRQ_HANDLED;
> + }
> + }
> +
> + if (unlikely(status & 0x1)) {
> +
> + tasklet_schedule(&bp->sp_task);
> + status &= ~0x1;
> + if (!status) {
> + return IRQ_HANDLED;
> + }
> + }
> +
> + DP(NETIF_MSG_INTR, "got an unknown interrupt! (status is %u)\n",
> + status);
> +
> + return IRQ_HANDLED;
> +}
> +/* some of the internal memories are not directly readable from the driver
> + to test them we send debug packets
> + */
> +static int bnx2x_int_mem_test(struct bnx2x *bp)
> +{
> + u32 val;
> + u32 count;
> + u8 hw;
> + u8 port;
> + u8 i;
> + u32 factor;
> +
> + switch (CHIP_REV(bp)) {
> + case CHIP_REV_EMUL:
> + hw = INIT_EMULATION;
> + factor = 2000;
> + break;
> + case CHIP_REV_FPGA:
> + hw = INIT_FPGA;
> + factor = 120;
> + break;
> + default:
> + hw = INIT_ASIC;
> + factor = 1;
> + break;
> + }
> +
> + port = PORT0 << bp->port;
> +
> + DP(NETIF_MSG_HW, "mem_wrk start part1\n");
> +
> + /* Disable inputs of parser neighbor blocks */
> + REG_WR(bp, GRCBASE_TSDM, TSDM_REGISTERS_ENABLE_IN1, 0x0);
> + REG_WR(bp, GRCBASE_TCM, TCM_REGISTERS_PRS_IFEN, 0x0);
> + REG_WR(bp, GRCBASE_CFC, CFC_REGISTERS_DEBUG0, 0x1);
> + NIG_WR(NIG_REGISTERS_PRS_REQ_IN_EN, 0x0);
> +
> + /* Write 0 to parser credits for CFC search request */
> + REG_WR(bp, GRCBASE_PRS, PRS_REGISTERS_CFC_SEARCH_INITIAL_CREDIT, 0x0);
> +
> + /* send Ethernet packet */
> + bnx2x_lb_pckt(bp);
> +
> + /* TODO do i reset NIG statistic ?*/
> + /* Wait until NIG register shows 1 packet of size 0x10 */
> + count = 1000;
> + while (count) {
> + val = REG_RD(bp, GRCBASE_NIG, NIG_REGISTERS_STAT2_BRB_OCTET);
> + REG_RD(bp, GRCBASE_NIG, NIG_REGISTERS_STAT2_BRB_OCTET + 4);
> +
> + if (val == 0x10) {
> + break;
> + }
> + msleep(10 * factor);
That would be msleep(20000) for CHIP_REV_EMUL.
besides the sleep being pretty big (does it really need to wait
20 sec?), some arches don't support such a big msleep value.
use ssleep(). It would take 5 and a half hours for the
timeout count to hit in.
> + count--;
> + }
> + if (val != 0x10) {
> + BNX2X_ERR("NIG timeout val = 0x%x\n", val);
> + return -1;
> + }
> +
> + /* Wait until PRS register shows 1 packet */
> + count = 1000;
> + while (count) {
> + val = REG_RD(bp, GRCBASE_PRS, PRS_REGISTERS_NUM_OF_PACKETS);
> +
> + if (val == 0x1) {
> + break;
> + }
> + msleep(10 * factor);
Same here.
> + count--;
> + }
> + if (val != 0x1) {
> + BNX2X_ERR("PRS timeout val = 0x%x\n", val);
> + return -2;
> + }
> + /* Wait until NIG register shows 10 + 1
> + packets of size 11*0x10 = 0xb0 */
> + count = 1000;
> + while (count) {
> +
> + val = REG_RD(bp, GRCBASE_NIG, NIG_REGISTERS_STAT2_BRB_OCTET);
> + REG_RD(bp, GRCBASE_NIG, NIG_REGISTERS_STAT2_BRB_OCTET + 4);
> +
> + if (val == 0xb0) {
> + break;
> + }
> + msleep(10 * factor);
Same here.
> + count--;
> + }
> + if (val != 0xb0) {
> + BNX2X_ERR("NIG timeout val = 0x%x\n", val);
> + return -3;
> + }
> +
> + /* Wait until PRS register shows 2 packet */
> + val = REG_RD(bp, GRCBASE_PRS, PRS_REGISTERS_NUM_OF_PACKETS);
> +
> + if (val != 0x2) {
> + BNX2X_ERR("PRS timeout val = 0x%x\n", val);
> + }
> +
> + /* Write 1 to parser credits for CFC search request */
> + REG_WR(bp, GRCBASE_PRS, PRS_REGISTERS_CFC_SEARCH_INITIAL_CREDIT, 0x1);
> +
> + /* Wait until PRS register shows 3 packet */
> + msleep(10 * factor);
Same.
> + DP(NETIF_MSG_HW, "done\n");
> + return 0; /* OK */
> +}
> +static void bnx2x_stop_stats(struct bnx2x *bp)
> +{
> + atomic_set(&bp->stats_state, STATS_STATE_STOP);
> + DP(BNX2X_MSG_STATS, "stats_state - STOP\n");
> + while (atomic_read(&bp->stats_state) != STATS_STATE_DISABLE) {
This seems to need a memory barrier (on this side _and_ on the write side,
wherever that is (comment!)).
There are special mbs for atomic variables.
> + msleep(100);
> + }
> + DP(BNX2X_MSG_STATS, "stats_state - DISABLE\n");
> +}
> +static void bnx2x_update_stats(struct bnx2x *bp)
> +{
> +
> + if (!bnx2x_update_storm_stats(bp)) {
> +
> + if (bp->phy_flags & PHY_BMAC_FLAG) {
> + bnx2x_update_bmac_stats(bp);
> +
> + } else if (bp->phy_flags & PHY_EMAC_FLAG) {
> + bnx2x_update_emac_stats(bp);
> +
> + } else { /* unreached */
> + BNX2X_ERR("no MAC active\n");
> + return;
> + }
> +
> + bnx2x_update_net_stats(bp);
> + }
> +
> + if (bp->msglevel & NETIF_MSG_TIMER) {
> + printk(KERN_DEBUG "%s:\n"
> + KERN_DEBUG "tx avail(%x) rx usage(%x)\n"
> + KERN_DEBUG "tx pkt (%lx) rx pkt(%lx)\n"
> + KERN_DEBUG "tx hc idx(%x) rx hc index(%x)\n"
> + KERN_DEBUG "%s (Xoff events %u)\n"
> + KERN_DEBUG "brb drops %u\n"
> + KERN_DEBUG "tstats->no_buff_discard, %u\n"
> + KERN_DEBUG "tstats->errors_discard, %u\n"
> + KERN_DEBUG "tstats->mac_filter_discard, %u\n"
> + KERN_DEBUG "tstats->xxoverflow_discard, %u\n"
> + KERN_DEBUG "tstats->ttl0_discard, %u\n",
> + bp->dev->name,
> + bnx2x_tx_avail(bp->fp),
> + (u16)(*bp->fp->rx_cons_sb - bp->fp->rx_comp_cons),
> + bp->net_stats.tx_packets, bp->net_stats.rx_packets,
> + *bp->fp->tx_cons_sb, *bp->fp->rx_cons_sb,
> + netif_queue_stopped(bp->dev)? "Xoff" : "Xon ",
> + bp->slowpath->eth_stats.driver_xoff,
> + bp->slowpath->eth_stats.brb_discard,
> + bp->slowpath->eth_stats.no_buff_discard,
> + bp->slowpath->eth_stats.errors_discard,
> + bp->slowpath->eth_stats.mac_filter_discard,
> + bp->slowpath->eth_stats.xxoverflow_discard,
> + bp->slowpath->eth_stats.ttl0_discard);
> + }
> +
> + if (bp->state != BNX2X_STATE_OPEN) {
> + DP(BNX2X_MSG_STATS, "state is %x, returning\n", bp->state);
> + return;
> + }
> +
> +#ifdef BNX2X_STOP_ON_ERROR
> + if (unlikely(bp->panic)) {
> + return;
> + }
> +#endif
> +
> + if (bp->fw_mb) {
> + REG_WR32(bp, GRCBASE_DMAE,
> + (bp->port? DMAE_REGISTERS_GO_C13 :
> + DMAE_REGISTERS_GO_C12), 1);
> + }
> +
> + if (atomic_read(&bp->stats_state) != STATS_STATE_ENABLE) {
> + atomic_set(&bp->stats_state, STATS_STATE_DISABLE);
Is that the write side? If yes, you need a mb here, too.
And a comment.
> + return;
> + }
> +
> + if (bp->phy_flags & PHY_BMAC_FLAG) {
> + REG_WR32(bp, GRCBASE_DMAE,
> + (bp->port? DMAE_REGISTERS_GO_C6 :
> + DMAE_REGISTERS_GO_C0), 1);
> +
> + } else if (bp->phy_flags & PHY_EMAC_FLAG) {
> + REG_WR32(bp, GRCBASE_DMAE,
> + (bp->port? DMAE_REGISTERS_GO_C8 :
> + DMAE_REGISTERS_GO_C2), 1);
> + }
> +
> + if (bnx2x_sp_post(bp, RAMROD_CMD_ID_ETH_STAT_QUERY, 0, 0, 0, 0) == 0) {
> + /* stats ramrod has it's own slot on the spe */
> + bp->spq_left++;
> + bp->stat_pending = 1;
> + }
> +}
> +/* Called with netif_tx_lock.
> + * bnx2x_tx_int() runs without netif_tx_lock unless it needs to call
> + * netif_wake_queue().
> + */
> +static int bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev)
> +{
> + struct bnx2x *bp = netdev_priv(dev);
> + dma_addr_t mapping;
> + struct eth_tx_bd *txbd;
> + struct eth_tx_parse_bd *pbd = NULL;
> + struct sw_tx_bd *tx_buf;
> + u16 pkt_prod, bd_prod;
> + int nbd, fp_index = 0;
> + struct bnx2x_fastpath *fp;
> +
> + fp = &bp->fp[fp_index];
> +#ifdef BNX2X_STOP_ON_ERROR
> + if (unlikely(bp->panic)) {
> + return NETDEV_TX_BUSY;
> + }
> +#endif
> +
> + if (unlikely(bnx2x_tx_avail(bp->fp) <
> + (skb_shinfo(skb)->nr_frags + 3))) {
> + bp->slowpath->eth_stats.driver_xoff++,
> + netif_stop_queue(dev);
> + BNX2X_ERR("BUG! Tx ring full when queue awake!\n");
> + return NETDEV_TX_BUSY;
> + }
> +
> + /*
> + This is a bit ugly. First we use one BD which we mark as start,
> + then for TSO or xsum we have a parsing info BD,
> + and only then we have the rest of the TSO bds.
> + (don't forget to mark the last one as last,
> + and to unmap only AFTER you write to the BD ...)
> + I would like to thank Dov Hirshfeld for this mess.
> + */
> +
> + pkt_prod = fp->tx_pkt_prod++;
> + bd_prod = fp->tx_bd_prod;
> +
> + pkt_prod = TX_BD(pkt_prod);
> + bd_prod = TX_BD(bd_prod);
> +
> + /* get a tx_buff and first bd */
> + tx_buf = &fp->tx_buf_ring[pkt_prod];
> + txbd = &fp->tx_desc_ring[bd_prod];
> +
> + txbd->bd_flags.as_bitfield = ETH_TX_BD_FLAGS_START_BD;
> +
> + /* remeber the first bd of the packet */
> + tx_buf->first_bd = bd_prod;
> +
> + DP(NETIF_MSG_TX_QUEUED,
> + "sending pkt=%u @%p, next_idx=%u, bd=%u @%p\n",
> + pkt_prod, tx_buf, fp->tx_pkt_prod, bd_prod, txbd);
> +
> + if (skb->ip_summed == CHECKSUM_PARTIAL) {
> + u8 len;
> + struct iphdr *iph = ip_hdr(skb);
> +
> + txbd->bd_flags.as_bitfield |= ETH_TX_BD_FLAGS_IP_CSUM ;
> +
> + /* turn on parsing and get a bd */
> + bd_prod = TX_BD(NEXT_TX_IDX(bd_prod));
> + pbd = (void *)&fp->tx_desc_ring[bd_prod];
> +
> + len = ((u8 *)ip_hdr(skb) - (u8 *)skb->data)/2;
> +
> + /* for now NS flag is not used in Linux */
> + pbd->global_data =
> + len | ((skb->protocol == ETH_P_8021Q)
> + << ETH_TX_PARSE_BD_LLC_SNAP_EN_SHIFT);
> +
> + pbd->ip_hlen = ip_hdrlen(skb)/2;
> + pbd->total_hlen = len + pbd->ip_hlen;
> +
> + if (iph->protocol == IPPROTO_TCP) {
> + struct tcphdr *th = tcp_hdr(skb);
> + txbd->bd_flags.as_bitfield |=
> + ETH_TX_BD_FLAGS_TCP_CSUM;
> + pbd->tcp_flags = htonl(tcp_flag_word(skb))&0xFFFF;
> + pbd->total_hlen += tcp_hdrlen(skb)/2;
> +
> + pbd->tcp_pseudo_csum = ntohs(th->check);
> + } else if (iph->protocol == IPPROTO_UDP) {
> + struct udphdr *uh = udp_hdr(skb);
> + txbd->bd_flags.as_bitfield |=
> + ETH_TX_BD_FLAGS_TCP_CSUM;
> + pbd->total_hlen += 4;
> + pbd->global_data |= ETH_TX_PARSE_BD_UDP_FLG;
> +
> + /* HW bug: we need to subtract 10 bytes before the
> + * UDP header from the csum
> + */
> + uh->check = (u16) ~csum_fold(csum_sub(uh->check,
> + csum_partial(((u8 *)(uh)-10), 10, 0)));
> + }
> + }
> +
> + if (bp->vlgrp != 0 && vlan_tx_tag_present(skb)) {
> + txbd->vlan = vlan_tx_tag_get(skb);
> + txbd->bd_flags.as_bitfield |= ETH_TX_BD_FLAGS_VLAN_TAG;
> + }
> +#ifdef BNX2X_STOP_ON_ERROR
> + else {
> + txbd->vlan = pkt_prod;
> + }
> +#endif
> +
> + mapping = pci_map_single(bp->pdev, skb->data,
> + skb->len, PCI_DMA_TODEVICE);
dma_mapping_error()
> +
> + txbd->addr_hi = U64_HI(mapping);
> + txbd->addr_lo = U64_LO(mapping);
> + txbd->hdr_nbds = 1;
> + txbd->nbd = nbd = skb_shinfo(skb)->nr_frags + ((pbd == NULL)? 1 : 2);
> + txbd->nbytes = skb_headlen(skb);
> +
> + DP(NETIF_MSG_TX_QUEUED,
> + "first bd @%p, addr(%x:%x) hdr_nbds(%d) nbd(%d)"
> + " nbytes(%d) flags(%x) vlan(%u)\n",
> + txbd, txbd->addr_hi, txbd->addr_lo, txbd->hdr_nbds, txbd->nbd,
> + txbd->nbytes, txbd->bd_flags.as_bitfield, txbd->vlan);
> +
> + if ((skb_shinfo(skb)->gso_size) &&
> + (skb->len > (bp->dev->mtu + ETH_HLEN))) {
> +
> + int hlen = 2*pbd->total_hlen;
> + DP(NETIF_MSG_TX_QUEUED,
> + "TSO packet len %d, hlen %d, total len %d, tso size %d\n",
> + skb->len, hlen, skb_headlen(skb), skb_shinfo(skb)->gso_size);
> +
> + txbd->bd_flags.as_bitfield |= ETH_TX_BD_FLAGS_SW_LSO;
> +
> + if (txbd->nbytes > hlen) {
> + /* we split the first bd into headers and data bds
> + * to ease the pain of our fellow micocode engineers
> + * we use one mapping for both bds
> + * So far this has only been observed to happen
> + * in Other Operating Systems(TM)
> + */
> +
> + /* first fix first bd */
> + nbd++;
> + txbd->nbd++;
> + txbd->nbytes = hlen;
> +
> + /* we only print this as an error
> + * because we don't think this will ever happen.
> + */
> + BNX2X_ERR("TSO split headr size is %d (%x:%x)"
> + " nbd %d\n", txbd->nbytes, txbd->addr_hi,
> + txbd->addr_lo, txbd->nbd);
> +
> + /* now get a new data bd
> + * (after the pbd) and fill it */
> + bd_prod = TX_BD(NEXT_TX_IDX(bd_prod));
> + txbd = &fp->tx_desc_ring[bd_prod];
> +
> +#ifdef BNX2X_STOP_ON_ERROR
> + txbd->vlan = pkt_prod;
> +#endif
> + txbd->addr_hi = U64_HI(mapping);
> + txbd->addr_lo = U64_LO(mapping) + hlen;
> + txbd->nbytes = skb_headlen(skb) - hlen;
> + /* this marks the bd
> + * as one that has no individual mapping
> + * the FW ignors this flag in a bd not maked start
> + */
> + txbd->bd_flags.as_bitfield = ETH_TX_BD_FLAGS_SW_LSO;
> + DP(NETIF_MSG_TX_QUEUED,
> + "TSO split data size is %d (%x:%x)\n",
> + txbd->nbytes, txbd->addr_hi, txbd->addr_lo);
> + }
> +
> + if (!pbd) {
> + /* supposed to be unreached
> + * (and therefore not handled properly...)
> + */
> + BNX2X_ERR("LSO with no PBD\n");
> + BUG();
> + }
> +
> + pbd->lso_mss = skb_shinfo(skb)->gso_size;
> + pbd->tcp_send_seq = ntohl(tcp_hdr(skb)->seq);
> + pbd->ip_id = ntohs(ip_hdr(skb)->id);
> + pbd->tcp_pseudo_csum =
> + ntohs(~csum_tcpudp_magic(ip_hdr(skb)->saddr,
> + ip_hdr(skb)->daddr,
> + 0, IPPROTO_TCP, 0));
> + pbd->global_data |= ETH_TX_PARSE_BD_PSEUDO_CS_WITHOUT_LEN;
> + }
> +
> + {
> + int i;
> + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> + skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
> +
> + bd_prod = TX_BD(NEXT_TX_IDX(bd_prod));
> + txbd = &fp->tx_desc_ring[bd_prod];
> +
> + mapping = pci_map_page(bp->pdev, frag->page,
> + frag->page_offset,
> + frag->size, PCI_DMA_TODEVICE);
> +
> + txbd->addr_hi = U64_HI(mapping);
> + txbd->addr_lo = U64_LO(mapping);
> + txbd->nbytes = frag->size;
> + txbd->bd_flags.as_bitfield = 0;
> +#ifdef BNX2X_STOP_ON_ERROR
> + txbd->vlan = pkt_prod;
> +#endif
> + DP(NETIF_MSG_TX_QUEUED, "frag (%d) bd @%p,"
> + " addr(%x:%x) nbytes(%d) flags(%x)\n",
> + i, txbd, txbd->addr_hi, txbd->addr_lo,
> + txbd->nbytes, txbd->bd_flags.as_bitfield);
> +
> + } /* for */
> + }
> +
> + /* now at last mark the bd as the last bd */
> + txbd->bd_flags.as_bitfield |= ETH_TX_BD_FLAGS_END_BD;
> +
> + DP(NETIF_MSG_TX_QUEUED, "last bd @%p flags(%x)\n",
> + txbd, txbd->bd_flags.as_bitfield);
> +
> + tx_buf->skb = skb;
> +
> + bd_prod = TX_BD(NEXT_TX_IDX(bd_prod));
> +
> + /* now send a tx doorbell, counting the next bd
> + * if the packet contains or ends with it
> + */
> + if (TX_BD_POFF(bd_prod) < nbd)
> + nbd++;
> +
> + if (pbd) {
> + DP(NETIF_MSG_TX_QUEUED,
> + "PBD @%p, ip_data=%x , ip_hlen %u, ip_id %u, lso_mss %u,"
> + " tcp_flags %x, xsum %x, seq %u, hlen %u)\n",
> + pbd, pbd->global_data, pbd->ip_hlen, pbd->ip_id,
> + pbd->lso_mss, pbd->tcp_flags, pbd->tcp_pseudo_csum,
> + pbd->tcp_send_seq, pbd->total_hlen);
> + }
> +
> + DP(NETIF_MSG_TX_QUEUED, "doorbell: nbd=%u, bd=%d\n", nbd, bd_prod);
> +
> + fp->hw_tx_prods->bds_prod += nbd;
> + fp->hw_tx_prods->packets_prod++;
> + DOORBELL(bp, fp_index, 0);
> +
> + mmiowb();
> +
> + fp->tx_bd_prod = bd_prod;
> + dev->trans_start = jiffies;
> +
> + if (unlikely(bnx2x_tx_avail(fp) < MAX_SKB_FRAGS + 3)) {
> + netif_stop_queue(dev);
> + bp->slowpath->eth_stats.driver_xoff++;
> + if (bnx2x_tx_avail(fp) >= MAX_SKB_FRAGS + 3)
> + netif_wake_queue(dev);
> + }
> + fp->tx_pkt++;
> + return NETDEV_TX_OK;
> +}
> +static int bnx2x_nic_unload(struct bnx2x *bp, int fre_irq)
> +{
> + u32 reset_code = 0;
> + int rc /*, j */;
> + bp->state = BNX2X_STATE_CLOSING_WAIT4_HALT;
> +
> + /* Calling flush_scheduled_work() may deadlock because
> + * linkwatch_event() may be on the workqueue and it will try to get
> + * the rtnl_lock which we are holding.
> + */
> + while (bp->in_reset_task)
> + msleep(1);
Memory barrier?
> +
> + /* Delete the timer: do it before disabling interrupts, as it
> + may be stil STAT_QUERY ramrod pending after stopping the timer */
> + del_timer_sync(&bp->timer);
> +
> + /* Wait until stat ramrod returns and all SP tasks complete */
> + while (bp->stat_pending && (bp->spq_left != MAX_SPQ_PENDING)) {
Memory barrier?
> + msleep(1);
> + }
> +
> + /* Stop fast path, disable MAC, disable interrupts */
> + bnx2x_netif_stop(bp);
> + /* Remember cstorm producer in DSB to track it */
> + bp->dsb_prod_sp_idx = *bp->dsb_sp_prod;
> + /* Send CFC_DELETE ramrod */
> + bnx2x_sp_post(bp, RAMROD_CMD_ID_ETH_LEADING_CFC_DEL, 0, 0, 0, 1);
> + /* Wait for completion to arrive. Do nothing as we are going to reset
> + the chip a few lines later */
> + while ( bp->dsb_prod_sp_idx == *bp->dsb_sp_prod ) {
What's that doing? Poking some DMA?
Barrier needed? DMA sync needed?
> + msleep(1);
> + }
> +
> +static void bnx2x_reset_task(struct work_struct *work)
> +{
> + struct bnx2x *bp = container_of(work, struct bnx2x, reset_task);
> +
> +#ifdef BNX2X_STOP_ON_ERROR
> + BNX2X_ERR("reset taks called but STOP_ON_ERROR defined"
> + " so reset not done to allow debug dump,\n"
> + " you will need to reboot when done\n");
> + return;
> +#endif
> +
> + if (!netif_running(bp->dev))
> + return;
> +
> + bp->in_reset_task = 1;
> + bnx2x_netif_stop(bp);
> +
> + bnx2x_nic_unload(bp, 0);
> + bnx2x_nic_load(bp, 0);
> + bp->in_reset_task = 0;
smp_wmb()?
> +}
> +static int bnx2x_phys_id(struct net_device *dev, u32 data)
> +{
> + struct bnx2x *bp = netdev_priv(dev);
> + int i;
> +
> + if (data == 0)
> + data = 2;
> +
> + for (i = 0; i < (data * 2); i++) {
> + if ((i % 2) == 0) {
> + bnx2x_leds_set(bp, SPEED_1000);
> + } else {
> + bnx2x_leds_unset(bp);
> + }
> + msleep_interruptible(500);
Check the return value of msleep_interruptible and drop the
following check for signals.
> + if (signal_pending(current))
> + break;
> + }
> +
> + if (bp->link_up) {
> + bnx2x_leds_set(bp, bp->line_speed);
> + }
> + return 0;
> +}
> +#ifdef BNX2X_STOP_ON_ERROR
> +#warning stop on error defined
> +#define bnx2x_panic() \
> + do { bp->panic = 1; \
> + BNX2X_ERR("driver assert\n"); \
> + bnx2x_disable_int(bp); \
> + bnx2x_panic_dump(bp); \
> + } while (0);
Drop the last semicolon
> +#else
> +#define bnx2x_panic()
> +#endif
Puh, that was a lot :)
--
Greetings Michael.
^ permalink raw reply
* Re: [PATCH] SMSC LAN911x and LAN921x vendor driver
From: Peter Korsgaard @ 2007-08-01 22:27 UTC (permalink / raw)
To: Steve.Glendinning
Cc: Bahadir Balban, Bill Gatliff, Dustin Mcintire, ian.saturley,
netdev
In-Reply-To: <OF5F9C7C7D.04B8DBCD-ON80257328.0063528D-80257328.0065CB53@smsc.com>
>>>>> "Steve" == Steve Glendinning <Steve.Glendinning@smsc.com> writes:
Hi,
>> What's the problem with Dustin's driver? It seems to work fine here
>> with a lan9117. Why not just add lan921x support to the existing
>> driver?
Steve> I have heard Dustin's driver works very well on PXA, but on
Steve> others it doesn't even compile (hence why it depends on
Steve> ARCH_PXA).
I'm using it on PPC.
Steve> Dustin's driver was based on the smc91x code, but these two
Steve> ethernet devices share nothing other than a similar name.
Steve> smsc911x is a new, completely platform-independent driver with
Steve> workarounds for several hardware issues, and it doesn't suffer
Steve> from quite as much macro abuse :-)
The macros are not that bad as the hw people sligtly misconnected the
chip, so I need special access routines (enable the big endian mode,
not byteswap on normal registar access and byteswap on fifo access).
I'll give your driver a try and report back.
--
Bye, Peter Korsgaard
^ permalink raw reply
* Re: [RFC][BNX2X]: New driver for Broadcom 10Gb Ethernet.
From: Roland Dreier @ 2007-08-01 22:28 UTC (permalink / raw)
To: Michael Buesch
Cc: Michael Chan, davem, jeff, netdev, eliezert, lusinsky, eilong
In-Reply-To: <200708020006.13457.mb@bu3sch.de>
> > + { PCI_VENDOR_ID_BROADCOM, PCI_DEVICE_ID_NX2_5710,
> > + PCI_ANY_ID, PCI_ANY_ID, 0, 0, BCM5710 },
FWIW, this could be neater as
{ PCI_VDEVICE(BROADCOM, PCI_DEVICE_ID_NX2_5710), BCM5710 }
- R.
^ permalink raw reply
* Re: [RFC][BNX2X]: New driver for Broadcom 10Gb Ethernet.
From: Jeff Garzik @ 2007-08-01 22:50 UTC (permalink / raw)
To: Roland Dreier
Cc: Michael Buesch, Michael Chan, davem, netdev, eliezert, lusinsky,
eilong
In-Reply-To: <adaodhq6gal.fsf@cisco.com>
Roland Dreier wrote:
> > > + { PCI_VENDOR_ID_BROADCOM, PCI_DEVICE_ID_NX2_5710,
> > > + PCI_ANY_ID, PCI_ANY_ID, 0, 0, BCM5710 },
>
> FWIW, this could be neater as
>
> { PCI_VDEVICE(BROADCOM, PCI_DEVICE_ID_NX2_5710), BCM5710 }
Yes. And additionally, I prefer (but not require) that people directly
use a hexidecimal constant in the PCI ID table for device ID, if that is
the only place in the entire codebase referring to that PCI device ID.
Using a named constant for a single-use PCI device ID merely aggrevates
include/linux/pci_ids.h patching headache for what is ultimately an
arbitrary number [usually] picked out of thin air by the hw vendor.
Jeff
^ permalink raw reply
* Re: [PATCH 2/2] NET: fix memory leaks from security_secid_to_secctx()
From: James Morris @ 2007-08-02 0:05 UTC (permalink / raw)
To: Paul Moore; +Cc: netdev, selinux
In-Reply-To: <20070801151557.264877461@hp.com>
Both patches applied to:
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6.git#for-akpm
--
James Morris
<jmorris@namei.org>
^ permalink raw reply
* Re: NETPOLL=y , NETDEVICES=n compile error ( Re: 2.6.23-rc1-mm1 )
From: Matt Mackall @ 2007-08-02 2:02 UTC (permalink / raw)
To: Jarek Poplawski
Cc: Gabriel C, Andrew Morton, linux-kernel, netdev, jason.wessel,
amitkale, Sam Ravnborg
In-Reply-To: <20070801095920.GA1941@ff.dom.local>
On Wed, Aug 01, 2007 at 11:59:21AM +0200, Jarek Poplawski wrote:
> On Tue, Jul 31, 2007 at 05:05:00PM +0200, Gabriel C wrote:
> > Jarek Poplawski wrote:
> > > On Tue, Jul 31, 2007 at 12:14:36PM +0200, Gabriel C wrote:
> > >> Jarek Poplawski wrote:
> > >>> On 28-07-2007 20:42, Gabriel C wrote:
> > >>>> Andrew Morton wrote:
> > >>>>> On Sat, 28 Jul 2007 17:44:45 +0200 Gabriel C <nix.or.die@googlemail.com> wrote:
> > >>>>>
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> I got this compile error with a randconfig ( http://194.231.229.228/MM/randconfig-auto-82.broken.netpoll.c ).
> > >>>>>>
> > >>>>>> ...
> > >>>>>>
> > >>>>>> net/core/netpoll.c: In function 'netpoll_poll':
> > >>>>>> net/core/netpoll.c:155: error: 'struct net_device' has no member named 'poll_controller'
> > >>>>>> net/core/netpoll.c:159: error: 'struct net_device' has no member named 'poll_controller'
> > >>>>>> net/core/netpoll.c: In function 'netpoll_setup':
> > >>>>>> net/core/netpoll.c:670: error: 'struct net_device' has no member named 'poll_controller'
> > >>>>>> make[2]: *** [net/core/netpoll.o] Error 1
> > >>>>>> make[1]: *** [net/core] Error 2
> > >>>>>> make: *** [net] Error 2
> > >>>>>> make: *** Waiting for unfinished jobs....
> > >>>>>>
> > >>>>>> ...
> > >>>>>>
> > >>>>>>
> > >>>>>> I think is because KGDBOE selects just NETPOLL.
> > >>>>>>
> > >>>>> Looks like it.
> > >>>>>
> > >>>>> Select went and selected NETPOLL and NETPOLL_TRAP but things like
> > >>>>> CONFIG_NETDEVICES and CONFIG_NET_POLL_CONTROLLER remain unset. `select'
> > >>>>> remains evil.
> > >>> ...
> > >>>> I think there may be a logical issue ( again if I got it right ).
> > >>>> We need some ethernet card to work with kgdboe right ? but we don't have any if !NETDEVICES && !NET_ETHERNET.
> > >>>>
> > >>>> So maybe some ' depends on ... && NETDEVICES!=n && NET_ETHERNET!=n ' is needed too ?
> > >>> IMHO, the only logical issue here is netpoll.c mustn't use
> > >>> CONFIG_NET_POLL_CONTROLLER code without #ifdef if it doesn't
> > >>> add this dependency itself.
> > >>>
> > >> Well it does if NETDEVICES && if NET_ETHERNET which booth are N when !NETDEVICES is why KGDBOE uses select and not depends on.
> > >
> > > "does if XXX" means may "use if XXX".
> >
> > From what I know means only use "if xxx" on !xxx everything inside the "if xxx" is n and "depends on <something inside the if xxx>
> > does not work.
> >
> > ...
> >
> > menuconfig FOO
> > bool "FOO"
> > depends on BAR
> > default y
> > -- help --
> > something
> > if FOO
> >
> > config BAZ
> > depends on WHATEVR && !NOT_THIS
> >
> > menuconfig SOMETHING_ELSE
> > ....
> > if SOMETHING_ELSE
> >
> > config BLUBB
> > depends on PCI && WHATNOT
> >
> > endif # SOMETHING_ELSE
> >
> > config NETPOLL
> > def_bool NETCONSOLE
> >
> > config NETPOLL_TRAP
> > bool "Netpoll traffic trapping"
> > default n
> > depends on NETPOLL
> >
> > config NET_POLL_CONTROLLER
> > def_bool NETPOLL
> >
> > endif # FOO
> >
> > Now if you set FOO=n all is gone and your driver have to select whatever it needs from there.
>
> Probably not exactly so...
>
> >From drivers/net/Kconfig:
>
> > # All the following symbols are dependent on NETDEVICES - do not repeat
> > # that for each of the symbols.
> > if NETDEVICES
>
> So, according to this netpoll could presume NETDEVICES and
> NET_POLL_CONTROLLER are always on.
>
> But, as you've found, it's possible to select NETPOLL and !NETDEVICES,
> so this comment is at least not precise.
>
> On the other side something like this:
>
> ...
> endif # NETDEVICES
>
> config NETPOLL
> depends on NETDEVICES
> def_bool NETCONSOLE
>
> config NETPOLL_TRAP
> bool "Netpoll traffic trapping"
> default n
> depends on NETPOLL
>
> config NET_POLL_CONTROLLER
> def_bool NETPOLL
> depends on NETPOLL
>
>
> seems to select NET_POLL_CONTROLLER after selecting NETPOLL, but
> still doesn't check for NETDEVICES dependency.
That's odd. Adding Sam to the cc:.
> > >> Now KGDBOE just selects NETPOLL and NETPOLL_TRAP.
> > >> Adding 'select CONFIG_NET_POLL_CONTROLLER' let kgdboe compiles but the question is does it work without any ethernet card ?
> > >
> > > Why kgdboe should care what netpoll needs? So, I hope, you are adding
> > > this select under config NETPOLL. On the other hand, if NETPOLL should
> > > depend on NET_POLL_CONTROLLER there is probably no reason to have them
> > > both.
> >
> > NET_POLL_CONTROLLER has def_bool NETPOLL if NETDEVICES .
> >
> > Net peoples ping ?:)
How about cc:ing the netpoll maintainer?
> OK, I wasn't right here: there is no visible reason for both in the
> kernel code, but I can imagine there could be some external users of
> NET_POLL_CONTROLLER without NETPOLL.
I don't know of any. As far as I can tell at this point,
NET_POLL_CONTROLLER == NETPOLL.
--
Mathematics is the supreme nostalgia of our time.
^ permalink raw reply
* [PATCH 2.6.23-rc1][NETFILTER] nf_conntrack_reasm: adding icmpv6_send code(TIME EXCEEDED).
From: Masayuki Nakagawa @ 2007-08-02 2:53 UTC (permalink / raw)
To: netdev; +Cc: davem, yoshfuji, Masayuki Nakagawa
I ran the TAHI conformance test on a kernel, which CONFIG_NF_CONNTRACK_IPV6
is enabled. And then it showed a result including a couple of failure.
The all of failed items are related to TIME EXCEEDED.
The test procedure is here.
Tester Target
| |
|-------------------------->|
| Echo Request |
| (1st fragment) |
| |
| wait for 65 sec. |
| |
|<--------------------------|
| ICMPv6 Error |
(1) Tester sends a first fragment of ICMPv6 echo request to Target.
(2) Wait for over 60 sec.
(3) If target replies a ICMPv6 error message(Time Exceeded) to Tester,
then this test is success, otherwise it's failure.
The reason of the failure is very simple, it's because icmpv6_send code are
missing in nf_ct_frag6_expire function(nf_conntrack_reasm.c).
The change is to add the missing code.
In RFC2460, the specification regarding Time Exceeded is described,
but it's defined as "should". So, there is no specification violation here.
Therefore I'm not sure whether this change is appropriate or not.
I will appreciate any comments. Thanks.
Signed-off-by: Masayuki Nakagawa <nakagawa.msy@ncos.nec.co.jp>
Index: linux-2.6/net/ipv6/netfilter/nf_conntrack_reasm.c
===================================================================
--- linux-2.6.orig/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ linux-2.6/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -76,6 +76,7 @@ struct nf_ct_frag6_queue
struct sk_buff *fragments;
int len;
int meat;
+ int iif;
ktime_t stamp;
unsigned int csum;
__u8 last_in; /* has first/last segment arrived? */
@@ -279,6 +280,7 @@ static void nf_ct_frag6_evictor(void)
static void nf_ct_frag6_expire(unsigned long data)
{
struct nf_ct_frag6_queue *fq = (struct nf_ct_frag6_queue *) data;
+ struct net_device *dev = NULL;
spin_lock(&fq->lock);
@@ -287,7 +289,26 @@ static void nf_ct_frag6_expire(unsigned
fq_kill(fq);
+ dev = dev_get_by_index(fq->iif);
+ if (!dev)
+ goto out;
+
+ /* Don't send error if the first segment did not arrive. */
+ if (!(fq->last_in&FIRST_IN) || !fq->fragments)
+ goto out;
+
+ /*
+ But use as source device on which LAST ARRIVED
+ segment was received. And do not use fq->dev
+ pointer directly, device might already disappeared.
+ */
+ fq->fragments->dev = dev;
+ icmpv6_send(fq->fragments, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0, dev);
+
out:
+ if (dev)
+ dev_put(dev);
+
spin_unlock(&fq->lock);
fq_put(fq, NULL);
}
@@ -534,6 +555,9 @@ static int nf_ct_frag6_queue(struct nf_c
else
fq->fragments = skb;
+ if (skb->dev)
+ fq->iif = skb->dev->ifindex;
+
skb->dev = NULL;
fq->stamp = skb->tstamp;
fq->meat += skb->len;
^ permalink raw reply
* Re: [PATCH 2.6.23-rc1][NETFILTER] nf_conntrack_reasm: adding icmpv6_send code(TIME EXCEEDED).
From: David Miller @ 2007-08-02 4:26 UTC (permalink / raw)
To: nakagawa.msy; +Cc: netdev, yoshfuji
In-Reply-To: <46B14720.9090300@ncos.nec.co.jp>
Please make sure to CC: netfilter patches to
netfilter-devel@lists.netfilter.org and kaber@trash.net as is listed
in the linux/MAINTAINERS file.
Thank you.
^ permalink raw reply
* Re: [RFC][PATCH] Removal of duplicated include net/wanrouter/wanmain.c
From: David Miller @ 2007-08-02 4:50 UTC (permalink / raw)
To: michal.k.k.piotrowski; +Cc: akpm, netdev, linux-kernel
In-Reply-To: <46B0C9DD.8040701@googlemail.com>
From: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Date: Wed, 01 Aug 2007 19:58:53 +0200
> Hi,
>
> There is no need to include linux/init.h twice
...
> Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Patch applied, thanks.
^ permalink raw reply
* Re: [PATCH 2.6.23-rc1][NETFILTER] nf_conntrack_reasm: adding icmpv6_send code(TIME EXCEEDED).
From: Yasuyuki KOZAKAI @ 2007-08-02 4:43 UTC (permalink / raw)
To: nakagawa.msy; +Cc: netdev, davem, yoshfuji
In-Reply-To: <46B14720.9090300@ncos.nec.co.jp>
Hi,
From: Masayuki Nakagawa <nakagawa.msy@ncos.nec.co.jp>
Date: Wed, 01 Aug 2007 19:53:20 -0700
> I ran the TAHI conformance test on a kernel, which CONFIG_NF_CONNTRACK_IPV6
> is enabled. And then it showed a result including a couple of failure.
> The all of failed items are related to TIME EXCEEDED.
>
> The test procedure is here.
> Tester Target
> | |
> |-------------------------->|
> | Echo Request |
> | (1st fragment) |
> | |
> | wait for 65 sec. |
> | |
> |<--------------------------|
> | ICMPv6 Error |
>
> (1) Tester sends a first fragment of ICMPv6 echo request to Target.
> (2) Wait for over 60 sec.
> (3) If target replies a ICMPv6 error message(Time Exceeded) to Tester,
> then this test is success, otherwise it's failure.
>
> The reason of the failure is very simple, it's because icmpv6_send code are
> missing in nf_ct_frag6_expire function(nf_conntrack_reasm.c).
> The change is to add the missing code.
>
> In RFC2460, the specification regarding Time Exceeded is described,
> but it's defined as "should". So, there is no specification violation here.
> Therefore I'm not sure whether this change is appropriate or not.
>
> I will appreciate any comments. Thanks.
Main usage of nf_conntrack_ipv6 is for firewall today. I think that silently
sending error is not preferable. Moreover, RFC2460 says
4.5 Fragment Header
The Fragment header is used by an IPv6 source to send a packet larger
than would fit in the path MTU to its destination. (Note: unlike
IPv4, fragmentation in IPv6 is performed only by source nodes, not by
routers along a packet's delivery path -- see section 5.)
This means that IPv6 router doesn't send Too Big error for forwarded
packets.
At least it's better to be configurable if people want to do that.
Second idea is to pass such fragments to next network processing instead of
dropping them. But I'm not sure that it is possible.
-- Yasuyuki Kozakai
^ permalink raw reply
* Re: [PATCH 66] net/ipv4/raw.c: kmalloc + memset conversion to kzalloc
From: David Miller @ 2007-08-02 4:54 UTC (permalink / raw)
To: m.kozlowski; +Cc: linux-kernel, kernel-janitors, akpm, netdev
In-Reply-To: <200707312354.00650.m.kozlowski@tuxland.pl>
From: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Date: Tue, 31 Jul 2007 23:54:00 +0200
> Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Applied.
^ permalink raw reply
* Re: [PATCH 67] net/ipv4/route.c: mostly kmalloc + memset conversion to k[cz]alloc
From: David Miller @ 2007-08-02 4:54 UTC (permalink / raw)
To: m.kozlowski; +Cc: linux-kernel, kernel-janitors, akpm, netdev
In-Reply-To: <200707312355.03321.m.kozlowski@tuxland.pl>
From: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Date: Tue, 31 Jul 2007 23:55:02 +0200
> Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Applied.
^ permalink raw reply
* strange tcp behavior
From: john @ 2007-08-02 6:19 UTC (permalink / raw)
To: netdev
1186035057.207629 127.0.0.1 -> 127.0.0.1 TCP 50000 > smtp [SYN]
Seq=0 Len=0
1186035057.207632 127.0.0.1 -> 127.0.0.1 TCP smtp > 50000 [SYN, ACK]
Seq=0 Ack=1 Win=32792 Len=0 MSS=16396
1186035057.207666 127.0.0.1 -> 127.0.0.1 TCP 50000 > smtp [ACK]
Seq=1 Ack=1 Win=1500 Len=0
1186035057.207699 127.0.0.1 -> 127.0.0.1 SMTP Command: EHLO localhost
1186035057.207718 127.0.0.1 -> 127.0.0.1 TCP smtp > 50000 [ACK]
Seq=1 Ack=17 Win=32792 Len=0
1186035057.207736 127.0.0.1 -> 127.0.0.1 TCP 50000 > smtp [RST]
Seq=17 Len=0
1186035057.223934 127.0.0.1 -> 127.0.0.1 TCP 33787 > 50000 [RST,
ACK] Seq=0 Ack=0 Win=32792 Len=0
Can someone please comment as to why, tcp stack sends rst packet from the
wrong source port in this situation.
This is the same problem that was described in my first two posts, witch
unfortunately nobody seemed to notice.
Here is source code witch can reproduce the behavior described, the client
side code is a complete mess but with a little bit it works.
Server:
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <poll.h>
#include <fcntl.h>
void main(void) {
int ms;
int ss;
struct sockaddr_in sa;
char *str = "HELLO FRIEND";
struct pollfd fd;
int flags;
ms = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
flags = fcntl(ms, F_GETFL, 0);
fcntl(ms, F_SETFL, flags | O_NONBLOCK);
memset(&sa, 0, sizeof(sa));
sa.sin_family = AF_INET;
sa.sin_addr.s_addr = htonl(INADDR_ANY);
sa.sin_port = htons(25);
bind(ms, (struct sockaddr *) &sa, sizeof(sa));
listen(ms, 0);
fd.fd = ms;
fd.events = POLLIN;
while(poll(&fd, 1, -1)) {
ss = accept(ms, NULL, NULL);
usleep(10000);
send(ss, str, strlen(str), MSG_NOSIGNAL);
close(ss);
memset(&fd, 0, sizeof(fd));
fd.fd = ms;
fd.events = POLLIN;
}
}
Client:
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <linux/if_ether.h>
//#include <arpa/inet.h>
//#include <linux/if_ether.h>
struct sockaddr_in localaddr;
struct sockaddr_in remoteaddr;
struct sockaddr rawaddr;
int sdl, sdr;
struct tcphdr header;
struct pheader_t {
uint32_t saddr;
uint32_t daddr;
uint8_t r;
uint8_t protocol;
uint16_t length;
};
struct pheader_t pheader;
unsigned short tbuf[2048];
unsigned char buf[2048];
char *msg = "EHLO localhost\r\n";
unsigned char *p;
char *src_addr = "127.0.0.1";
char *dst_addr = "127.0.0.1";
unsigned short sprt = 50000;
unsigned short dprt = 25;
struct timeval tv;
unsigned seq, ack_seq;
int data;
void mysend(void) {
int i, sum;
int len;
if(data) {
len = strlen(msg);
memcpy((char *) tbuf + sizeof(pheader) + sizeof(header),
msg, len);
} else
len = 0;
bzero(&pheader, sizeof(pheader));
pheader.saddr = (in_addr_t) inet_addr(src_addr);
pheader.daddr = (in_addr_t) inet_addr(dst_addr);
pheader.protocol = 6;
pheader.length = htons(sizeof(header) + len);
memcpy(tbuf, &pheader, sizeof(pheader));
memcpy((char *) tbuf + sizeof(pheader), &header, sizeof(header));
sum = 0;
for(i = 0; i < (sizeof(pheader) + sizeof(header)) / 2 + len / 2;
i++) {
sum += tbuf[i];
sum = (sum & 0x0000ffff) + (sum >> 16);
}
header.check = ~sum;
memcpy((char *) tbuf + sizeof(pheader), &header, sizeof(header));
sendto(sdr, (char *) tbuf + sizeof(pheader), sizeof(header) +
len, 0, (struct sockaddr *) &remoteaddr, sizeof(remoteaddr));
}
void main(void)
{
gettimeofday(&tv, NULL);
srand(tv.tv_sec & tv.tv_usec);
remoteaddr.sin_family = AF_INET;
remoteaddr.sin_addr.s_addr = (in_addr_t) inet_addr(dst_addr);
sdl = socket(PF_INET, SOCK_PACKET, htons(ETH_P_ALL));
strcpy(rawaddr.sa_data, "lo");
bind(sdl, (struct sockaddr *) &rawaddr, sizeof(rawaddr));
sdr = socket(AF_INET, SOCK_RAW, IPPROTO_TCP);
bzero(&header, sizeof(header));
header.source = htons(sprt);
header.dest = htons(dprt);
seq = rand();
ack_seq = 0;
header.seq = htonl(seq);
header.ack_seq = htonl(ack_seq);
header.doff = sizeof(header) / 4;
header.syn = 1;
header.window = htons(1500);
mysend();
while(1) {
recvfrom(sdl, buf, sizeof(buf), 0, NULL, NULL);
// p = buf + (*buf & 0x0f) * 4;
p = (buf + 14) + (*(buf + 14) & 0x0f) * 4;
if(ntohs(((struct tcphdr *)p)->source) == dprt &&
ntohs(((struct tcphdr *)p)->dest) == sprt && ((struct
tcphdr *)p)->syn == 1 && ((struct tcphdr *)p)->ack == 1)
break;
}
bzero(&header, sizeof(header));
header.source = htons(sprt);
header.dest = htons(dprt);
seq = ntohl(((struct tcphdr *)p)->ack_seq);
ack_seq = ntohl(((struct tcphdr *)p)->seq) + 1;
header.seq = htonl(seq);
header.ack_seq = htonl(ack_seq);
header.doff = sizeof(header) / 4;
header.ack = 1;
header.window = htons(1500);
mysend();
bzero(&header, sizeof(header));
header.source = htons(sprt);
header.dest = htons(dprt);
header.seq = htonl(seq);
header.ack_seq = htonl(ack_seq);
header.doff = sizeof(header) / 4;
header.ack = 1;
header.psh = 1;
header.window = htons(1500);
data = 1;
mysend();
data = 0;
// usleep(300);
bzero(&header, sizeof(header));
header.source = htons(sprt);
header.dest = htons(dprt);
seq += strlen(msg);
header.seq = htonl(seq);
header.ack_seq = htonl(ack_seq);
header.doff = sizeof(header) / 4;
header.rst = 1;
header.window = htons(1500);
mysend();
}
I traced this behavior way back to 2.4.0-test9-pre3 kernel.
^ permalink raw reply
* Re: [REGRESSION] tg3 dead after s2ram
From: Joachim Deguara @ 2007-08-02 8:05 UTC (permalink / raw)
To: Michael Chan
Cc: Andrew Morton, lkml List, Michal Piotrowski, netdev, linux-acpi
In-Reply-To: <1186002023.18322.7.camel@dell>
On Wednesday 01 August 2007 23:00:23 Michael Chan wrote:
> On Wed, 2007-08-01 at 10:47 -0700, Michael Chan wrote:
> > You have 2 Broadcom devices in your system. 07:00.0 is a wireless
> > device, I think. 8:4.0 is the tg3 device.
> >
> > It's clear that the tg3 device is still in D3 state after resume and
> > that explains why all register accesses fail. tg3_resume() should put
> > the device back in D0 state in a very straight forward way and I don't
> > see how that can fail. It worked for me when I tested it last night.
> > Can you add some printk() to tg3_resume() to see what's happening? Let
> > me know if you want me to send you some debug patches to do that.
>
> I misread the PCI registers below. The power state was ok.
>
> The problem is that memory enable and bus master were not set in PCI
> register 4 after resume. This also explains the register access
> failures.
>
> In tg3_resume(), we call pci_restore_state() which should re-enable
> those 2 bits in PCI register 4. Can you add some printk() to see why
> those bits are not restored after pci_restore_state()?
Reading pci_restore_state() it looks already instrumented. Sorry about the
wrong pci device, looking at my BCM5788 it is pci device 08:04.0 and looking
at the log from my first post there is nothing restored in the first 64
bytes! Otherwise it would have said "PM: Writing back..."
Is this what you are looking for or should I do other printk instrumentation?
-Joachim
^ permalink raw reply
* Re: NETPOLL=y , NETDEVICES=n compile error ( Re: 2.6.23-rc1-mm1 )
From: Jarek Poplawski @ 2007-08-02 9:00 UTC (permalink / raw)
To: Matt Mackall
Cc: Gabriel C, Andrew Morton, linux-kernel, netdev, jason.wessel,
amitkale, Sam Ravnborg
In-Reply-To: <20070802020219.GM11166@waste.org>
On Wed, Aug 01, 2007 at 09:02:19PM -0500, Matt Mackall wrote:
> On Wed, Aug 01, 2007 at 11:59:21AM +0200, Jarek Poplawski wrote:
> > On Tue, Jul 31, 2007 at 05:05:00PM +0200, Gabriel C wrote:
> > > Jarek Poplawski wrote:
> > > > On Tue, Jul 31, 2007 at 12:14:36PM +0200, Gabriel C wrote:
> > > >> Jarek Poplawski wrote:
> > > >>> On 28-07-2007 20:42, Gabriel C wrote:
> > > >>>> Andrew Morton wrote:
> > > >>>>> On Sat, 28 Jul 2007 17:44:45 +0200 Gabriel C <nix.or.die@googlemail.com> wrote:
...
> > > >>>>>> net/core/netpoll.c:155: error: 'struct net_device' has no member named 'poll_controller'
...
> > > >>>>>> I think is because KGDBOE selects just NETPOLL.
> > > >>>>>>
> > > >>>>> Looks like it.
> > > >>>>>
> > > >>>>> Select went and selected NETPOLL and NETPOLL_TRAP but things like
> > > >>>>> CONFIG_NETDEVICES and CONFIG_NET_POLL_CONTROLLER remain unset. `select'
> > > >>>>> remains evil.
...
> > seems to select NET_POLL_CONTROLLER after selecting NETPOLL, but
> > still doesn't check for NETDEVICES dependency.
>
> That's odd. Adding Sam to the cc:.
Looks right, but after reading Andrew's opinion about select I'd be
astonished if he doesn't know this problem already.
>
> > > >> Now KGDBOE just selects NETPOLL and NETPOLL_TRAP.
> > > >> Adding 'select CONFIG_NET_POLL_CONTROLLER' let kgdboe compiles but the question is does it work without any ethernet card ?
> > > >
> > > > Why kgdboe should care what netpoll needs? So, I hope, you are adding
> > > > this select under config NETPOLL. On the other hand, if NETPOLL should
> > > > depend on NET_POLL_CONTROLLER there is probably no reason to have them
> > > > both.
> > >
> > > NET_POLL_CONTROLLER has def_bool NETPOLL if NETDEVICES .
> > >
> > > Net peoples ping ?:)
>
> How about cc:ing the netpoll maintainer?
Is there a new one or do you suggest possibility of abusing the
authority of the netpoll's author with such trifles...?!
BTW, I can't find any official meaning of def_bool, but it's name
suggests only default value, so logically it should be not enough
to assure NET_POLL_CONTROLLER=y, and netpoll should use "depends",
"require" or "select" (IMHO more readable too), but on the other
hand this could be practially wrong...
>
> > OK, I wasn't right here: there is no visible reason for both in the
> > kernel code, but I can imagine there could be some external users of
> > NET_POLL_CONTROLLER without NETPOLL.
>
> I don't know of any. As far as I can tell at this point,
> NET_POLL_CONTROLLER == NETPOLL.
There are some notions about "other diagnostic tools" in some
net drivers, eg. 3c509.c, so there would be a little bit of
work if, after changing this, they really exist (and even if
not - maybe it's reasonable to save such possibility for the
future?).
Best regards,
Jarek P.
^ permalink raw reply
* Re: [PATCH] SCTP: drop SACK if ctsn is not less than the next tsn of assoc
From: Wei Yongjun @ 2007-08-02 8:57 UTC (permalink / raw)
To: Vlad Yasevich; +Cc: netdev, Neil Horman, lksctp-developers, Sridhar Samudrala
In-Reply-To: <46B08545.8030707@hp.com>
Vlad Yasevich wrote:
> This is a little better.
>
> One suggestion. The new function you create is almost exactly like
> sctp_sf_violation_chunklen() with the exception of the error string.
> Can you extract the common parts into a single function so that
> we don't have duplication of code.
>
> Thanks
> -vlad
>
>
>
>>>
>>> This is an interesting case, but I am not sure that simply discarding
>>> the SACK is the right thing.
>>>
>>> The peer in this case is violating the protocol whereby he is trying to
>>> advance the cumulative tsn ack to a point beyond the max tsn currently
>>> sent. I would vote for terminating the association in this case since
>>> either the peer is a mis-behaved implementation, or the association is
>>> under attack.
>>>
Patch has been modified base on comment.
Thanks.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
--- net/sctp/sm_statefuns.c.orig 2007-07-29 18:11:01.000000000 -0400
+++ net/sctp/sm_statefuns.c 2007-07-31 17:49:22.000000000 -0400
@@ -97,6 +97,13 @@
const struct sctp_association *asoc,
struct sctp_transport *transport);
+static sctp_disposition_t sctp_sf_abort_violation(
+ const struct sctp_association *asoc,
+ void *arg,
+ sctp_cmd_seq_t *commands,
+ const __u8 *payload,
+ const size_t paylen);
+
static sctp_disposition_t sctp_sf_violation_chunklen(
const struct sctp_endpoint *ep,
const struct sctp_association *asoc,
@@ -104,6 +111,13 @@
void *arg,
sctp_cmd_seq_t *commands);
+static sctp_disposition_t sctp_sf_violation_ctsn(
+ const struct sctp_endpoint *ep,
+ const struct sctp_association *asoc,
+ const sctp_subtype_t type,
+ void *arg,
+ sctp_cmd_seq_t *commands);
+
/* Small helper function that checks if the chunk length
* is of the appropriate length. The 'required_length' argument
* is set to be the size of a specific chunk we are testing.
@@ -2880,6 +2894,13 @@
return SCTP_DISPOSITION_DISCARD;
}
+ /* If Cumulative TSN Ack beyond the max tsn currently
+ * send, terminating the association and respond to the
+ * sender with an ABORT.
+ */
+ if (!TSN_lt(ctsn, asoc->next_tsn))
+ return sctp_sf_violation_ctsn(ep, asoc, type, arg, commands);
+
/* Return this SACK for further processing. */
sctp_add_cmd_sf(commands, SCTP_CMD_PROCESS_SACK, SCTP_SACKH(sackh));
@@ -3691,40 +3712,21 @@
return SCTP_DISPOSITION_VIOLATION;
}
-
/*
- * Handle a protocol violation when the chunk length is invalid.
- * "Invalid" length is identified as smaller then the minimal length a
- * given chunk can be. For example, a SACK chunk has invalid length
- * if it's length is set to be smaller then the size of sctp_sack_chunk_t.
- *
- * We inform the other end by sending an ABORT with a Protocol Violation
- * error code.
- *
- * Section: Not specified
- * Verification Tag: Nothing to do
- * Inputs
- * (endpoint, asoc, chunk)
- *
- * Outputs
- * (reply_msg, msg_up, counters)
- *
- * Generate an ABORT chunk and terminate the association.
+ * Common function to handle a protocol violation.
*/
-static sctp_disposition_t sctp_sf_violation_chunklen(
- const struct sctp_endpoint *ep,
+static sctp_disposition_t sctp_sf_abort_violation(
const struct sctp_association *asoc,
- const sctp_subtype_t type,
void *arg,
- sctp_cmd_seq_t *commands)
+ sctp_cmd_seq_t *commands,
+ const __u8 *payload,
+ const size_t paylen)
{
struct sctp_chunk *chunk = arg;
struct sctp_chunk *abort = NULL;
- char err_str[]="The following chunk had invalid length:";
/* Make the abort chunk. */
- abort = sctp_make_abort_violation(asoc, chunk, err_str,
- sizeof(err_str));
+ abort = sctp_make_abort_violation(asoc, chunk, payload, paylen);
if (!abort)
goto nomem;
@@ -3756,6 +3758,57 @@
return SCTP_DISPOSITION_NOMEM;
}
+/*
+ * Handle a protocol violation when the chunk length is invalid.
+ * "Invalid" length is identified as smaller then the minimal length a
+ * given chunk can be. For example, a SACK chunk has invalid length
+ * if it's length is set to be smaller then the size of sctp_sack_chunk_t.
+ *
+ * We inform the other end by sending an ABORT with a Protocol Violation
+ * error code.
+ *
+ * Section: Not specified
+ * Verification Tag: Nothing to do
+ * Inputs
+ * (endpoint, asoc, chunk)
+ *
+ * Outputs
+ * (reply_msg, msg_up, counters)
+ *
+ * Generate an ABORT chunk and terminate the association.
+ */
+static sctp_disposition_t sctp_sf_violation_chunklen(
+ const struct sctp_endpoint *ep,
+ const struct sctp_association *asoc,
+ const sctp_subtype_t type,
+ void *arg,
+ sctp_cmd_seq_t *commands)
+{
+ char err_str[]="The following chunk had invalid length:";
+
+ return sctp_sf_abort_violation(asoc, arg, commands, err_str,
+ sizeof(err_str));
+}
+
+/* Handle a protocol violation when the peer trying to advance the
+ * cumulative tsn ack to a point beyond the max tsn currently sent.
+ *
+ * We inform the other end by sending an ABORT with a Protocol Violation
+ * error code.
+ */
+static sctp_disposition_t sctp_sf_violation_ctsn(
+ const struct sctp_endpoint *ep,
+ const struct sctp_association *asoc,
+ const sctp_subtype_t type,
+ void *arg,
+ sctp_cmd_seq_t *commands)
+{
+ char err_str[]="The cumulative tsn ack beyond the max tsn currently sent:";
+
+ return sctp_sf_abort_violation(asoc, arg, commands, err_str,
+ sizeof(err_str));
+}
+
/***************************************************************************
* These are the state functions for handling primitive (Section 10) events.
***************************************************************************/
^ permalink raw reply
* Re: [REGRESSION] tg3 dead after s2ram
From: Joachim Deguara @ 2007-08-02 9:15 UTC (permalink / raw)
To: Michael Chan
Cc: Andrew Morton, lkml List, Michal Piotrowski, netdev, linux-acpi
In-Reply-To: <200708021005.45773.joachim.deguara@amd.com>
On Thursday 02 August 2007 10:05:44 Joachim Deguara wrote:
> On Wednesday 01 August 2007 23:00:23 Michael Chan wrote:
> > On Wed, 2007-08-01 at 10:47 -0700, Michael Chan wrote:
> > The problem is that memory enable and bus master were not set in PCI
> > register 4 after resume. This also explains the register access
> > failures.
Found it! I did the instrumentation and found pci_restore was not being
called. Why, because this code failed:
printk(KERN_INFO "tg3_resume: entered\n");
if (!netif_running(dev))
return 0;
printk(KERN_INFO "tg3_resume: before restore\n");
Bad dmesg:
[ 0.581236] PM: Writing back config space on device 0000:07:00.0 at offset
4
(was 4, writing d0200004)
[ 0.581259] tg3_resume: entered
[ 0.581276] PM: Writing back config space on device 0000:08:09.0 at offset
f
(was c001ff, writing 1c0010b)
why is this, oh damn it is because I was calling openSUSE's "powersave -u" and
not s2ram directly. powersave is disabling the network and the tg3 will not
restore the pci device like this! right?
here is a good dmesg with by calling s2ram:
[ 0.577085] PM: Writing back config space on device 0000:07:00.0 at offset
4
(was 4, writing d0200004)
[ 0.577108] tg3_resume: entered
[ 0.577109] tg3_resume: before restore
[ 0.577140] PM: Writing back config space on device 0000:08:04.0 at offset
c
(was 0, writing ffff0000)
[ 0.577171] PM: Writing back config space on device 0000:08:04.0 at offset
1
(was 2b00000, writing 2b00006)
[ 0.577304] tg3_resume: after set_power_state
[ 0.579119] tg3: eth0: Link is down.
[ 0.786266] ata2: SATA link down (SStatus 0 SControl 310)
[ 0.848176] tg3_resume: after tg3_restart_hw
[ 0.848265] PM: Writing back config space on device 0000:08:09.0 at offset
f
(was c001ff, writing 1c0010b)
Seams like even if powersave shuts down the network that the device should
still work after a suspend to ram, so who is at fault here?
-Joachim
^ permalink raw reply
* Re: [REGRESSION] tg3 dead after s2ram
From: David Miller @ 2007-08-02 9:23 UTC (permalink / raw)
To: joachim.deguara
Cc: mchan, akpm, linux-kernel, michal.k.k.piotrowski, netdev,
linux-acpi
In-Reply-To: <200708021115.10812.joachim.deguara@amd.com>
From: "Joachim Deguara" <joachim.deguara@amd.com>
Date: Thu, 2 Aug 2007 11:15:05 +0200
> Seams like even if powersave shuts down the network that the device should
> still work after a suspend to ram, so who is at fault here?
It's a good question.
The pci_enable() is done on the PCI device at probe time, at least in
the tg3 driver, and with such a model restoring and saving of PCI
config space should not be dependant upon whether the netdev is
running or not.
^ permalink raw reply
* Re: TCP SACK issue, hung connection, tcpdump included
From: Ilpo Järvinen @ 2007-08-02 9:23 UTC (permalink / raw)
To: Darryl L. Miles; +Cc: LKML, Netdev
In-Reply-To: <46AEC286.2030302@netbauds.net>
On Tue, 31 Jul 2007, Darryl L. Miles wrote:
> I've been able to capture a tcpdump from both ends during the problem and its
> my belief there is a bug in 2.6.20.1 (at the client side) in that it issues a
> SACK option for an old sequence which the current window being advertised is
> beyond it. This is the most concerning issue as the integrity of the sequence
> numbers doesn't seem right (to my limited understanding anyhow).
You probably didn't check the reference I explicitly gave to those who
are not familiar how DSACK works, just in case you didn't pick it up last
time, here it is again for you: RFC2883... However, if DSACKs really
bother you still (though it shouldn't :-)), IIRC I also told you how
you're able to turn it off (tcp_dsack sysctl) but I assure you that it's
not a bug but feature called DSACK [RFC2883], there's _absolutely_ nothing
wrong with it, instead, it would be wrong to _not_ send the below snd_una
SACK in this scenario when tcp_dsack set to 1.
> There is another concern of why the SERVER performed a retransmission in the
> first place, when the tcpdump shows the ack covering it has been seen.
There are only three possible reasons to this thing:
1) The ACK didn't reach the SERVER (your logs prove this to not be the
case)
2) The ACK got discarded by the SERVER
3) The SERVER (not the client) is buggy and sends an incorrect
retransmission
...So we have just two options remaining...
> I have made available the full dumps at:
>
> http://darrylmiles.org/snippets/lkml/20070731/
Thanks about these... Based on a quick check, it is rather clear that the
SERVER is for some reason discarding the packets it's receiving:
04:11:26.833935 IP CLIENT.43726 > SERVER.ssh: P 4239:4287(48) ack 28176 win 501 <nop,nop,timestamp 819646456 16345815>
04:11:27.132425 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 4239 win 2728 <nop,nop,timestamp 17096579 819458864>
04:11:27.230081 IP CLIENT.43726 > SERVER.ssh: . ack 28176 win 501 <nop,nop,timestamp 819646555 16345815,nop,nop
Notice, (cumulative) ack field didn't advance though new data arrived, and
for the record, it's in advertised window too. There are no DSACK in here
so your theory about below snd_una SACK won't help to explain this one
at all... We'll just have to figure out why it's discarding it. And
there's even more to prove this...
> This sequence is interesting from the client side:
>
> 03:58:56.419034 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 4239
> win 2728 <nop,nop,timestamp 16345815 819458859> # S1
> 03:58:56.419100 IP CLIENT.43726 > SERVER.ssh: . ack 27464 win 501
> <nop,nop,timestamp 819458884 16345815> # C1
> 03:58:56.422019 IP SERVER.ssh > CLIENT.43726: P 27464:28176(712) ack 4239
> win 2728 <nop,nop,timestamp 16345815 819458859> # S2
> 03:58:56.422078 IP CLIENT.43726 > SERVER.ssh: . ack 28176 win 501
> <nop,nop,timestamp 819458884 16345815> # C2
>
> The above 4 packets look as expect to me. Then we suddenly see a
> retransmission of 26016:27464.
>
> 03:58:56.731597 IP SERVER.ssh > CLIENT.43726: . 26016:27464(1448) ack 4239
> win 2728 <nop,nop,timestamp 16346128 819458864> # S3
...Look at this on the retransmission:
... timestamp 16346128 819458864>
...it tells us what really got received by the TCP. The corresponding ACK
with matching timestamp is, surprise, surprise, this one:
> 03:58:56.340180 IP CLIENT.43726 > SERVER.ssh: . ack 26016 win 501
> <nop,nop,timestamp 819458864 16345734>
...thus the SERVER has _not_ received but discarded the subsequent
cumulative ACKs!!! Therefore it's retransmitting from 26016 onward but
never receives any reply as everything seems to get discarded...
There was one bad checksum btw:
> 03:58:56.365662 IP (tos 0x10, ttl 64, id 28685, offset 0, flags [DF],
> proto 6, length: 764) SERVER.ssh > CLIENT.43726: P [bad tcp cksum 6662
> (->ef2b)!] 617734888:617735600(712) ack 2634113543 win 2728
> <nop,nop,timestamp 16345815 819458859>
> There are some changes in 2.6.22 that appear to affect TCP SACK handling
> does this fix a known issue ?
There is no such "known issue" :-)... This issue has nothing to do with
TCP SACK handling, since that code _won't_ be reached... We could verify
that from the timestamps. But if you still insist that SACK under snd_una
is the issue, please turn tcp_dsack to 0 on the CLIENT, you will not get
them after that and you can be happy as your out-of-window SACK "issue"
is then fixed :-)...
...Seriously, somebody else than me is probably better in suggesting what
could cause the discarding at the SERVER in this case. SNMP stuff Dave was
asking could help, you can find them from /proc/net/{netstat,snmp}...
--
i.
^ permalink raw reply
* [RFC][PATCH] Removal of FASTROUTE definition include/linux/if_packet.h
From: Rami Rosen @ 2007-08-02 9:23 UTC (permalink / raw)
To: netdev, linux-kernel
Hi,
It seems that PACKET_FASTROUTE definition should be removed due to that
fastroute is no longer supported.
Regards,
Rami Rosen
--
Signed-off-by: Rami Rosen <ramirose@gmail.com>
--- linux-2.6.23-rc1-clean/include/linux/if_packet.h 2007-05-03
12:07:59.000000000 +0300
+++ linux-2.6.23-rc1/include/linux/if_packet.h 2007-08-02
11:37:16.000000000 +0300
@@ -30,7 +30,6 @@ struct sockaddr_ll
#define PACKET_OUTGOING 4 /* Outgoing of any type */
/* These ones are invisible by user level */
#define PACKET_LOOPBACK 5 /* MC/BRD frame looped back */
-#define PACKET_FASTROUTE 6 /* Fastrouted frame */
/* Packet socket options */
^ permalink raw reply
* Re: TCP SACK issue, hung connection, tcpdump included
From: David Miller @ 2007-08-02 9:26 UTC (permalink / raw)
To: ilpo.jarvinen; +Cc: darryl-mailinglists, linux-kernel, netdev
In-Reply-To: <Pine.LNX.4.64.0708021056560.8788@kivilampi-30.cs.helsinki.fi>
From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu, 2 Aug 2007 12:23:23 +0300 (EEST)
> ...Seriously, somebody else than me is probably better in suggesting what
> could cause the discarding at the SERVER in this case. SNMP stuff Dave was
> asking could help, you can find them from /proc/net/{netstat,snmp}...
That will also tell us if TCP discarded the packet due to
timestamps tests or similar.
^ permalink raw reply
* Re: [RFC][PATCH] Removal of FASTROUTE definition include/linux/if_packet.h
From: David Miller @ 2007-08-02 9:26 UTC (permalink / raw)
To: ramirose; +Cc: netdev, linux-kernel
In-Reply-To: <eb3ff54b0708020223o521e89b9v3200e06ab296df12@mail.gmail.com>
From: "Rami Rosen" <ramirose@gmail.com>
Date: Thu, 2 Aug 2007 12:23:41 +0300
> Hi,
> It seems that PACKET_FASTROUTE definition should be removed due to that
> fastroute is no longer supported.
It's a value exported to userland, so removing it could
break application compilation, so we can't remove it.
^ permalink raw reply
* Re: NETPOLL=y , NETDEVICES=n compile error ( Re: 2.6.23-rc1-mm1 )
From: Sam Ravnborg @ 2007-08-02 9:36 UTC (permalink / raw)
To: Matt Mackall
Cc: Jarek Poplawski, Gabriel C, Andrew Morton, linux-kernel, netdev,
jason.wessel, amitkale
In-Reply-To: <20070802020219.GM11166@waste.org>
> >
> > ...
> > endif # NETDEVICES
> >
> > config NETPOLL
> > depends on NETDEVICES
> > def_bool NETCONSOLE
> >
> > config NETPOLL_TRAP
> > bool "Netpoll traffic trapping"
> > default n
> > depends on NETPOLL
> >
> > config NET_POLL_CONTROLLER
> > def_bool NETPOLL
> > depends on NETPOLL
> >
> >
> > seems to select NET_POLL_CONTROLLER after selecting NETPOLL, but
> > still doesn't check for NETDEVICES dependency.
>
> That's odd. Adding Sam to the cc:.
select is evil....
select will by brute force set a symbol equal to 'y' without
visiting the dependencies.
So abusing select you are able to select a symbol FOO even
if FOO depends on BAR that is not set.
In general use select only for non-visible symbols (no promts anywhere)
and for symbols with no dependencies.
That will limit the suefullness but on the other hand avoid the illegal
configurations all over.
kconfig should one day warn about such things but I have not fel inclined
to dive into the matters hoping that Roman does one day.
Sam
^ permalink raw reply
* Re: strange tcp behavior
From: Evgeniy Polyakov @ 2007-08-02 9:55 UTC (permalink / raw)
To: john; +Cc: netdev
In-Reply-To: <56697.212.93.96.73.1186035546.squirrel@mail.screen.lv>
On Thu, Aug 02, 2007 at 09:19:06AM +0300, john@screen.lv (john@screen.lv) wrote:
> 1186035057.207629 127.0.0.1 -> 127.0.0.1 TCP 50000 > smtp [SYN]
> Seq=0 Len=0
> 1186035057.207632 127.0.0.1 -> 127.0.0.1 TCP smtp > 50000 [SYN, ACK]
> Seq=0 Ack=1 Win=32792 Len=0 MSS=16396
> 1186035057.207666 127.0.0.1 -> 127.0.0.1 TCP 50000 > smtp [ACK]
> Seq=1 Ack=1 Win=1500 Len=0
> 1186035057.207699 127.0.0.1 -> 127.0.0.1 SMTP Command: EHLO localhost
> 1186035057.207718 127.0.0.1 -> 127.0.0.1 TCP smtp > 50000 [ACK]
> Seq=1 Ack=17 Win=32792 Len=0
> 1186035057.207736 127.0.0.1 -> 127.0.0.1 TCP 50000 > smtp [RST]
> Seq=17 Len=0
> 1186035057.223934 127.0.0.1 -> 127.0.0.1 TCP 33787 > 50000 [RST,
> ACK] Seq=0 Ack=0 Win=32792 Len=0
>
> Can someone please comment as to why, tcp stack sends rst packet from the
> wrong source port in this situation.
Besides the fact, that test applications do not run if started not as
root, I got this:
13:51:12.180241 IP localhost.localdomain.50000 > localhost.localdomain.10250: S 906222067:906222067(0) win 1500
13:51:12.180279 IP localhost.localdomain.10250 > localhost.localdomain.50000: S 2011233747:2011233747(0) ack 906222068
win 32792 <mss 16396>
13:51:12.180293 IP localhost.localdomain.50000 > localhost.localdomain.10250: R 906222068:906222068(0) win 0
13:51:12.180320 IP localhost.localdomain.50000 > localhost.localdomain.10250: . ack 1 win 1500
13:51:12.180329 IP localhost.localdomain.10250 > localhost.localdomain.50000: R 2011233748:2011233748(0) win 0
13:51:12.180341 IP localhost.localdomain.50000 > localhost.localdomain.10250: P 1:17(16) ack 1 win 1500
13:51:12.180349 IP localhost.localdomain.10250 > localhost.localdomain.50000: R 2011233748:2011233748(0) win 0
13:51:12.180361 IP localhost.localdomain.50000 > localhost.localdomain.10250: R 906222084:906222084(0) win 1500
I.e. there is no bug in this session.
FC7 2.6.22.1-27.fc7 kernel.
Here is vanilla (with my patches, unrelated to the problem though)
2.6.22-rc5:
09:33:37.650279 IP localhost.50000 > localhost.10250: S 1326688203:1326688203(0) win 1500
09:33:37.664391 IP localhost.10250 > localhost.50000: S 3637551175:3637551175(0) ack 1326688204 win 32792 <mss 16396>
09:33:37.664417 IP localhost.50000 > localhost.10250: R 1326688204:1326688204(0) win 0
09:33:37.650451 IP localhost.50000 > localhost.10250: . ack 1 win 1500
09:33:37.650467 IP localhost.10250 > localhost.50000: R 3637551176:3637551176(0) win 0
09:33:37.650481 IP localhost.50000 > localhost.10250: P 1:17(16) ack 1 win 1500
09:33:37.650493 IP localhost.10250 > localhost.50000: R 3637551176:3637551176(0) win 0
09:33:37.650507 IP localhost.50000 > localhost.10250: R 1326688220:1326688220(0) win 1500
Is it possible that your tcpdump is screwed?
--
Evgeniy Polyakov
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox