* Re: [net-next PATCH v2 00/12] Flow API
From: Or Gerlitz @ 2015-01-14 15:00 UTC (permalink / raw)
To: John Fastabend
Cc: Thomas Graf, simon.horman, Scott Feldman, Linux Netdev List,
Jamal Hadi Salim, Andy Gospodarek, David Miller,
Alexei Starovoitov
In-Reply-To: <54B680EB.8040404@gmail.com>
On Wed, Jan 14, 2015 at 4:44 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
> On 01/13/2015 10:29 PM, Or Gerlitz wrote:
>>
>> On Tue, Jan 13, 2015 at 11:35 PM, John Fastabend
>> <john.fastabend@gmail.com> wrote:
>>>
>>> I tried to roll in all the feedback from v1 into this series annotated
>>> here,
>> Can you please drop V2 to your
>> https://github.com/jrfastab/rocker-net-next.git tree?
> Should be there now, but as Alexei noted I seem to have missed updating
> the patch commit messages so they still reference the old names.
Oh, I see it now.. yep, with the old names and dates to Dec 30th...
^ permalink raw reply
* Re: [REMINDER] iproute2 ss: Some thoughts about additional info output layout
From: Vadim Kochan @ 2015-01-14 14:57 UTC (permalink / raw)
To: netdev; +Cc: vadim4j
In-Reply-To: <20150108221240.GA23636@angus-think.lan>
On Fri, Jan 09, 2015 at 12:12:40AM +0200, Vadim Kochan wrote:
> Hi,
>
> I think that current output of ss utility visually looks little weird
> when additional info options were specified, so I feel that a lot of
> yours will say that it will break existing scripts of ss output parsing
> but I will try, so thats how I think ss output would looks better in
> case if additionally info options were specified (I am not sure how it
> would looks in the email) :
>
>
> u_str ESTAB 0 0 * 14875 * 14876
> users:("terminator",pid=30585,fd=8),
> ("terminator",pid=29450,fd=8),
> ("firefox",pid=21863,fd=8),
> ("terminator",pid=18927,fd=8),
> ("terminator",pid=17955,fd=8),
> ("terminator",pid=9620,fd=8),
> ("terminator",pid=7894,fd=8),
> ("terminator",pid=7245,fd=8),
> ("terminator",pid=2542,fd=8),
> ("qtile",pid=654,fd=8)
> skmem:(r0,rb212992,t0,tb212992,f0,w0,o0,bl0) <->
>
> u_str ESTAB 0 0 * 7550221 * 7550222
> users:("terminator",pid=9620,fd=12)
> skmem:(r0,rb4194304,t0,tb4194304,f0,w0,o0,bl0) <->
>
> udp ESTAB 0 0 10.3.5.18:ipproto-58744 192.168.1.1:swipe
> users:("firefox",pid=21863,fd=54)
> uid:1000 ino:7328962 sk:ffff880212f77bc0 <->
> skmem:(r0,rb4194304,t0,tb4194304,f0,w0,o0,bl0)
>
> udp ESTAB 0 0 fe80::6e88:14ff:feac:51e4:ipproto-33582 fd6b:a256:8c2e::1:swipe
> users:("firefox",pid=21863,fd=62)
> uid:1000 ino:7328974 sk:ffff880205a76e80 <->
> skmem:(r0,rb4194304,t0,tb4194304,f0,w0,o0,bl0)
>
> tcp ESTAB 0 0 192.168.1.198:57851 173.194.113.205:https
> users:("firefox",pid=21863,fd=67)
> uid:1000 ino:7601516 sk:ffff880102e94d00 <->
> skmem:(r0,rb372480,t0,tb87040,f0,w0,o0,bl0)
> ts sack cubic
> wscale:7,11
> rto:253.333 rtt:50.609/18.209 ato:40
> mss:1448 cwnd:10
> send 2.3Mbps lastsnd:46636 lastrcv:46570 lastack:46593 pacing_rate 4.6Mbps rcv_rtt:73.333 rcv_space:42340
>
> tcp ESTAB 0 0 192.168.1.198:47385 74.125.143.104:https
> users:("firefox",pid=21863,fd=63)
> uid:1000 ino:7602727 sk:ffff880007a1db00 <->
> skmem:(r0,rb372480,t0,tb87040,f0,w0,o0,bl0)
> ts sack cubic wscale:7,11 rto:296.666 rtt:95.496/22.756 ato:40
> mss:1418 cwnd:10
> send 1.2Mbps lastsnd:58640 lastrcv:58550 lastack:58550 pacing_rate 2.4Mbps rcv_rtt:96.666 rcv_space:42340
>
> Regards,
> Vadim Kochan
Jut reminder may be it was missed within a lots of emails.
Thanks,
^ permalink raw reply
* Re: [PATCH 1/2] net/macb: Adding comments to various #defs to make interpretation easier
From: Nicolas Ferre @ 2015-01-14 15:10 UTC (permalink / raw)
To: Xander Huff
Cc: jaeden.amero, rich.tollerton, ben.shelton, brad.mouring, netdev,
linux-kernel
In-Reply-To: <1421187351-27279-1-git-send-email-xander.huff@ni.com>
Le 13/01/2015 23:15, Xander Huff a écrit :
> This change is to help improve at-a-glace knowledge of the purpose of the
> various Cadence MACB/GEM registers. Comments are more helpful for human
> readability than short acronyms.
>
> Describe various #define varibles Cadence MACB/GEM registers as documented
> in Xilinix's "Zynq-7000 All Programmable SoC TechnicalReference Manual, v1.9.1
> (UG-585)"
>
> Signed-off-by: Xander Huff <xander.huff@ni.com>
For the record:
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
> ---
> drivers/net/ethernet/cadence/macb.h | 269 ++++++++++++++++++++++--------------
> 1 file changed, 162 insertions(+), 107 deletions(-)
>
> diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
> index 084191b..8e8c3c9 100644
> --- a/drivers/net/ethernet/cadence/macb.h
> +++ b/drivers/net/ethernet/cadence/macb.h
> @@ -15,20 +15,20 @@
> #define MACB_MAX_QUEUES 8
>
> /* MACB register offsets */
> -#define MACB_NCR 0x0000
> -#define MACB_NCFGR 0x0004
> -#define MACB_NSR 0x0008
> +#define MACB_NCR 0x0000 /* Network Control */
> +#define MACB_NCFGR 0x0004 /* Network Config */
> +#define MACB_NSR 0x0008 /* Network Status */
> #define MACB_TAR 0x000c /* AT91RM9200 only */
> #define MACB_TCR 0x0010 /* AT91RM9200 only */
> -#define MACB_TSR 0x0014
> -#define MACB_RBQP 0x0018
> -#define MACB_TBQP 0x001c
> -#define MACB_RSR 0x0020
> -#define MACB_ISR 0x0024
> -#define MACB_IER 0x0028
> -#define MACB_IDR 0x002c
> -#define MACB_IMR 0x0030
> -#define MACB_MAN 0x0034
> +#define MACB_TSR 0x0014 /* Transmit Status */
> +#define MACB_RBQP 0x0018 /* RX Q Base Address */
> +#define MACB_TBQP 0x001c /* TX Q Base Address */
> +#define MACB_RSR 0x0020 /* Receive Status */
> +#define MACB_ISR 0x0024 /* Interrupt Status */
> +#define MACB_IER 0x0028 /* Interrupt Enable */
> +#define MACB_IDR 0x002c /* Interrupt Disable */
> +#define MACB_IMR 0x0030 /* Interrupt Mask */
> +#define MACB_MAN 0x0034 /* PHY Maintenance */
> #define MACB_PTR 0x0038
> #define MACB_PFR 0x003c
> #define MACB_FTO 0x0040
> @@ -68,27 +68,27 @@
> #define MACB_MID 0x00fc
>
> /* GEM register offsets. */
> -#define GEM_NCFGR 0x0004
> -#define GEM_USRIO 0x000c
> -#define GEM_DMACFG 0x0010
> -#define GEM_HRB 0x0080
> -#define GEM_HRT 0x0084
> -#define GEM_SA1B 0x0088
> -#define GEM_SA1T 0x008C
> -#define GEM_SA2B 0x0090
> -#define GEM_SA2T 0x0094
> -#define GEM_SA3B 0x0098
> -#define GEM_SA3T 0x009C
> -#define GEM_SA4B 0x00A0
> -#define GEM_SA4T 0x00A4
> -#define GEM_OTX 0x0100
> -#define GEM_DCFG1 0x0280
> -#define GEM_DCFG2 0x0284
> -#define GEM_DCFG3 0x0288
> -#define GEM_DCFG4 0x028c
> -#define GEM_DCFG5 0x0290
> -#define GEM_DCFG6 0x0294
> -#define GEM_DCFG7 0x0298
> +#define GEM_NCFGR 0x0004 /* Network Config */
> +#define GEM_USRIO 0x000c /* User IO */
> +#define GEM_DMACFG 0x0010 /* DMA Configuration */
> +#define GEM_HRB 0x0080 /* Hash Bottom */
> +#define GEM_HRT 0x0084 /* Hash Top */
> +#define GEM_SA1B 0x0088 /* Specific1 Bottom */
> +#define GEM_SA1T 0x008C /* Specific1 Top */
> +#define GEM_SA2B 0x0090 /* Specific2 Bottom */
> +#define GEM_SA2T 0x0094 /* Specific2 Top */
> +#define GEM_SA3B 0x0098 /* Specific3 Bottom */
> +#define GEM_SA3T 0x009C /* Specific3 Top */
> +#define GEM_SA4B 0x00A0 /* Specific4 Bottom */
> +#define GEM_SA4T 0x00A4 /* Specific4 Top */
> +#define GEM_OTX 0x0100 /* Octets transmitted */
> +#define GEM_DCFG1 0x0280 /* Design Config 1 */
> +#define GEM_DCFG2 0x0284 /* Design Config 2 */
> +#define GEM_DCFG3 0x0288 /* Design Config 3 */
> +#define GEM_DCFG4 0x028c /* Design Config 4 */
> +#define GEM_DCFG5 0x0290 /* Design Config 5 */
> +#define GEM_DCFG6 0x0294 /* Design Config 6 */
> +#define GEM_DCFG7 0x0298 /* Design Config 7 */
>
> #define GEM_ISR(hw_q) (0x0400 + ((hw_q) << 2))
> #define GEM_TBQP(hw_q) (0x0440 + ((hw_q) << 2))
> @@ -98,67 +98,73 @@
> #define GEM_IMR(hw_q) (0x0640 + ((hw_q) << 2))
>
> /* Bitfields in NCR */
> -#define MACB_LB_OFFSET 0
> +#define MACB_LB_OFFSET 0 /* reserved */
> #define MACB_LB_SIZE 1
> -#define MACB_LLB_OFFSET 1
> +#define MACB_LLB_OFFSET 1 /* Loop back local */
> #define MACB_LLB_SIZE 1
> -#define MACB_RE_OFFSET 2
> +#define MACB_RE_OFFSET 2 /* Receive enable */
> #define MACB_RE_SIZE 1
> -#define MACB_TE_OFFSET 3
> +#define MACB_TE_OFFSET 3 /* Transmit enable */
> #define MACB_TE_SIZE 1
> -#define MACB_MPE_OFFSET 4
> +#define MACB_MPE_OFFSET 4 /* Management port enable */
> #define MACB_MPE_SIZE 1
> -#define MACB_CLRSTAT_OFFSET 5
> +#define MACB_CLRSTAT_OFFSET 5 /* Clear stats regs */
> #define MACB_CLRSTAT_SIZE 1
> -#define MACB_INCSTAT_OFFSET 6
> +#define MACB_INCSTAT_OFFSET 6 /* Incremental stats regs */
> #define MACB_INCSTAT_SIZE 1
> -#define MACB_WESTAT_OFFSET 7
> +#define MACB_WESTAT_OFFSET 7 /* Write enable stats regs */
> #define MACB_WESTAT_SIZE 1
> -#define MACB_BP_OFFSET 8
> +#define MACB_BP_OFFSET 8 /* Back pressure */
> #define MACB_BP_SIZE 1
> -#define MACB_TSTART_OFFSET 9
> +#define MACB_TSTART_OFFSET 9 /* Start transmission */
> #define MACB_TSTART_SIZE 1
> -#define MACB_THALT_OFFSET 10
> +#define MACB_THALT_OFFSET 10 /* Transmit halt */
> #define MACB_THALT_SIZE 1
> -#define MACB_NCR_TPF_OFFSET 11
> +#define MACB_NCR_TPF_OFFSET 11 /* Transmit pause frame */
> #define MACB_NCR_TPF_SIZE 1
> -#define MACB_TZQ_OFFSET 12
> +#define MACB_TZQ_OFFSET 12 /* Transmit zero quantum
> + * pause frame
> + */
> #define MACB_TZQ_SIZE 1
>
> /* Bitfields in NCFGR */
> -#define MACB_SPD_OFFSET 0
> +#define MACB_SPD_OFFSET 0 /* Speed */
> #define MACB_SPD_SIZE 1
> -#define MACB_FD_OFFSET 1
> +#define MACB_FD_OFFSET 1 /* Full duplex */
> #define MACB_FD_SIZE 1
> -#define MACB_BIT_RATE_OFFSET 2
> +#define MACB_BIT_RATE_OFFSET 2 /* Discard non-VLAN frames */
> #define MACB_BIT_RATE_SIZE 1
> -#define MACB_JFRAME_OFFSET 3
> +#define MACB_JFRAME_OFFSET 3 /* reserved */
> #define MACB_JFRAME_SIZE 1
> -#define MACB_CAF_OFFSET 4
> +#define MACB_CAF_OFFSET 4 /* Copy all frames */
> #define MACB_CAF_SIZE 1
> -#define MACB_NBC_OFFSET 5
> +#define MACB_NBC_OFFSET 5 /* No broadcast */
> #define MACB_NBC_SIZE 1
> -#define MACB_NCFGR_MTI_OFFSET 6
> +#define MACB_NCFGR_MTI_OFFSET 6 /* Multicast hash enable */
> #define MACB_NCFGR_MTI_SIZE 1
> -#define MACB_UNI_OFFSET 7
> +#define MACB_UNI_OFFSET 7 /* Unicast hash enable */
> #define MACB_UNI_SIZE 1
> -#define MACB_BIG_OFFSET 8
> +#define MACB_BIG_OFFSET 8 /* Receive 1536 byte frames */
> #define MACB_BIG_SIZE 1
> -#define MACB_EAE_OFFSET 9
> +#define MACB_EAE_OFFSET 9 /* External address match
> + * enable
> + */
> #define MACB_EAE_SIZE 1
> #define MACB_CLK_OFFSET 10
> #define MACB_CLK_SIZE 2
> -#define MACB_RTY_OFFSET 12
> +#define MACB_RTY_OFFSET 12 /* Retry test */
> #define MACB_RTY_SIZE 1
> -#define MACB_PAE_OFFSET 13
> +#define MACB_PAE_OFFSET 13 /* Pause enable */
> #define MACB_PAE_SIZE 1
> #define MACB_RM9200_RMII_OFFSET 13 /* AT91RM9200 only */
> #define MACB_RM9200_RMII_SIZE 1 /* AT91RM9200 only */
> -#define MACB_RBOF_OFFSET 14
> +#define MACB_RBOF_OFFSET 14 /* Receive buffer offset */
> #define MACB_RBOF_SIZE 2
> -#define MACB_RLCE_OFFSET 16
> +#define MACB_RLCE_OFFSET 16 /* Length field error frame
> + * discard
> + */
> #define MACB_RLCE_SIZE 1
> -#define MACB_DRFCS_OFFSET 17
> +#define MACB_DRFCS_OFFSET 17 /* FCS remove */
> #define MACB_DRFCS_SIZE 1
> #define MACB_EFRHD_OFFSET 18
> #define MACB_EFRHD_SIZE 1
> @@ -166,111 +172,160 @@
> #define MACB_IRXFCS_SIZE 1
>
> /* GEM specific NCFGR bitfields. */
> -#define GEM_GBE_OFFSET 10
> +#define GEM_GBE_OFFSET 10 /* Gigabit mode enable */
> #define GEM_GBE_SIZE 1
> -#define GEM_CLK_OFFSET 18
> +#define GEM_CLK_OFFSET 18 /* MDC clock division */
> #define GEM_CLK_SIZE 3
> -#define GEM_DBW_OFFSET 21
> +#define GEM_DBW_OFFSET 21 /* Data bus width */
> #define GEM_DBW_SIZE 2
> #define GEM_RXCOEN_OFFSET 24
> #define GEM_RXCOEN_SIZE 1
>
> /* Constants for data bus width. */
> -#define GEM_DBW32 0
> -#define GEM_DBW64 1
> -#define GEM_DBW128 2
> +#define GEM_DBW32 0 /* 32 bit AMBA AHB data bus
> + * width
> + */
> +#define GEM_DBW64 1 /* 64 bit AMBA AHB data bus
> + * width
> + */
> +#define GEM_DBW128 2 /* 128 bit AMBA AHB data bus
> + * width
> + */
>
> /* Bitfields in DMACFG. */
> -#define GEM_FBLDO_OFFSET 0
> +#define GEM_FBLDO_OFFSET 0 /* AHB fixed burst length for
> + * DMA data operations
> + */
> #define GEM_FBLDO_SIZE 5
> -#define GEM_ENDIA_OFFSET 7
> +#define GEM_ENDIA_OFFSET 7 /* AHB endian swap mode enable
> + * for packet data accesses
> + */
> #define GEM_ENDIA_SIZE 1
> -#define GEM_RXBMS_OFFSET 8
> +#define GEM_RXBMS_OFFSET 8 /* Receiver packet buffer
> + * memory size select
> + */
> #define GEM_RXBMS_SIZE 2
> -#define GEM_TXPBMS_OFFSET 10
> +#define GEM_TXPBMS_OFFSET 10 /* Transmitter packet buffer
> + * memory size select
> + */
> #define GEM_TXPBMS_SIZE 1
> -#define GEM_TXCOEN_OFFSET 11
> +#define GEM_TXCOEN_OFFSET 11 /* Transmitter IP, TCP and
> + * UDP checksum generation
> + * offload enable
> + */
> #define GEM_TXCOEN_SIZE 1
> -#define GEM_RXBS_OFFSET 16
> +#define GEM_RXBS_OFFSET 16 /* DMA receive buffer size in
> + * AHB system memory
> + */
> #define GEM_RXBS_SIZE 8
> -#define GEM_DDRP_OFFSET 24
> +#define GEM_DDRP_OFFSET 24 /* disc_when_no_ahb */
> #define GEM_DDRP_SIZE 1
>
>
> /* Bitfields in NSR */
> -#define MACB_NSR_LINK_OFFSET 0
> +#define MACB_NSR_LINK_OFFSET 0 /* pcs_link_state */
> #define MACB_NSR_LINK_SIZE 1
> -#define MACB_MDIO_OFFSET 1
> +#define MACB_MDIO_OFFSET 1 /* status of the mdio_in
> + * pin
> + */
> #define MACB_MDIO_SIZE 1
> -#define MACB_IDLE_OFFSET 2
> +#define MACB_IDLE_OFFSET 2 /* The PHY management logic is
> + * idle (i.e. has completed)
> + */
> #define MACB_IDLE_SIZE 1
>
> /* Bitfields in TSR */
> -#define MACB_UBR_OFFSET 0
> +#define MACB_UBR_OFFSET 0 /* Used bit read */
> #define MACB_UBR_SIZE 1
> -#define MACB_COL_OFFSET 1
> +#define MACB_COL_OFFSET 1 /* Collision occurred */
> #define MACB_COL_SIZE 1
> -#define MACB_TSR_RLE_OFFSET 2
> +#define MACB_TSR_RLE_OFFSET 2 /* Retry limit exceeded */
> #define MACB_TSR_RLE_SIZE 1
> -#define MACB_TGO_OFFSET 3
> +#define MACB_TGO_OFFSET 3 /* Transmit go */
> #define MACB_TGO_SIZE 1
> -#define MACB_BEX_OFFSET 4
> +#define MACB_BEX_OFFSET 4 /* Transmit frame corruption
> + * due to AHB error
> + */
> #define MACB_BEX_SIZE 1
> #define MACB_RM9200_BNQ_OFFSET 4 /* AT91RM9200 only */
> #define MACB_RM9200_BNQ_SIZE 1 /* AT91RM9200 only */
> -#define MACB_COMP_OFFSET 5
> +#define MACB_COMP_OFFSET 5 /* Trnasmit complete */
> #define MACB_COMP_SIZE 1
> -#define MACB_UND_OFFSET 6
> +#define MACB_UND_OFFSET 6 /* Trnasmit under run */
> #define MACB_UND_SIZE 1
>
> /* Bitfields in RSR */
> -#define MACB_BNA_OFFSET 0
> +#define MACB_BNA_OFFSET 0 /* Buffer not available */
> #define MACB_BNA_SIZE 1
> -#define MACB_REC_OFFSET 1
> +#define MACB_REC_OFFSET 1 /* Frame received */
> #define MACB_REC_SIZE 1
> -#define MACB_OVR_OFFSET 2
> +#define MACB_OVR_OFFSET 2 /* Receive overrun */
> #define MACB_OVR_SIZE 1
>
> /* Bitfields in ISR/IER/IDR/IMR */
> -#define MACB_MFD_OFFSET 0
> +#define MACB_MFD_OFFSET 0 /* Management frame sent */
> #define MACB_MFD_SIZE 1
> -#define MACB_RCOMP_OFFSET 1
> +#define MACB_RCOMP_OFFSET 1 /* Receive complete */
> #define MACB_RCOMP_SIZE 1
> -#define MACB_RXUBR_OFFSET 2
> +#define MACB_RXUBR_OFFSET 2 /* RX used bit read */
> #define MACB_RXUBR_SIZE 1
> -#define MACB_TXUBR_OFFSET 3
> +#define MACB_TXUBR_OFFSET 3 /* TX used bit read */
> #define MACB_TXUBR_SIZE 1
> -#define MACB_ISR_TUND_OFFSET 4
> +#define MACB_ISR_TUND_OFFSET 4 /* Enable trnasmit buffer
> + * under run interrupt
> + */
> #define MACB_ISR_TUND_SIZE 1
> -#define MACB_ISR_RLE_OFFSET 5
> +#define MACB_ISR_RLE_OFFSET 5 /* Enable retry limit exceeded
> + * or late collision interrupt
> + */
> #define MACB_ISR_RLE_SIZE 1
> -#define MACB_TXERR_OFFSET 6
> +#define MACB_TXERR_OFFSET 6 /* Enable transmit frame
> + * corruption due to AHB error
> + * interrupt
> + */
> #define MACB_TXERR_SIZE 1
> -#define MACB_TCOMP_OFFSET 7
> +#define MACB_TCOMP_OFFSET 7 /* Enable transmit complete
> + * interrupt
> + */
> #define MACB_TCOMP_SIZE 1
> -#define MACB_ISR_LINK_OFFSET 9
> +#define MACB_ISR_LINK_OFFSET 9 /* Enable link change
> + * interrupt
> + */
> #define MACB_ISR_LINK_SIZE 1
> -#define MACB_ISR_ROVR_OFFSET 10
> +#define MACB_ISR_ROVR_OFFSET 10 /* Enable receive overrun
> + * interrupt
> + */
> #define MACB_ISR_ROVR_SIZE 1
> -#define MACB_HRESP_OFFSET 11
> +#define MACB_HRESP_OFFSET 11 /* Enable hrsep not OK
> + * interrupt
> + */
> #define MACB_HRESP_SIZE 1
> -#define MACB_PFR_OFFSET 12
> +#define MACB_PFR_OFFSET 12 /* Enable pause frame with
> + * non-zero pause quantum
> + * interrupt
> + */
> #define MACB_PFR_SIZE 1
> -#define MACB_PTZ_OFFSET 13
> +#define MACB_PTZ_OFFSET 13 /* Enable pause time zero
> + * interrupt
> + */
> #define MACB_PTZ_SIZE 1
>
> /* Bitfields in MAN */
> -#define MACB_DATA_OFFSET 0
> +#define MACB_DATA_OFFSET 0 /* data */
> #define MACB_DATA_SIZE 16
> -#define MACB_CODE_OFFSET 16
> +#define MACB_CODE_OFFSET 16 /* Must be written to 10 */
> #define MACB_CODE_SIZE 2
> -#define MACB_REGA_OFFSET 18
> +#define MACB_REGA_OFFSET 18 /* Register address */
> #define MACB_REGA_SIZE 5
> -#define MACB_PHYA_OFFSET 23
> +#define MACB_PHYA_OFFSET 23 /* PHY address */
> #define MACB_PHYA_SIZE 5
> -#define MACB_RW_OFFSET 28
> +#define MACB_RW_OFFSET 28 /* Operation. 10 is read. 01
> + * is write.
> + */
> #define MACB_RW_SIZE 2
> -#define MACB_SOF_OFFSET 30
> +#define MACB_SOF_OFFSET 30 /* Must be written to 1 for
> + * Clause 22 operation
> + */
> #define MACB_SOF_SIZE 2
>
> /* Bitfields in USRIO (AVR32) */
> @@ -286,7 +341,7 @@
> /* Bitfields in USRIO (AT91) */
> #define MACB_RMII_OFFSET 0
> #define MACB_RMII_SIZE 1
> -#define GEM_RGMII_OFFSET 0 /* GEM gigabit mode */
> +#define GEM_RGMII_OFFSET 0 /* GEM gigabit mode */
> #define GEM_RGMII_SIZE 1
> #define MACB_CLKEN_OFFSET 1
> #define MACB_CLKEN_SIZE 1
>
--
Nicolas Ferre
^ permalink raw reply
* Re: [PATCH 1/2] net/macb: Adding comments to various #defs to make interpretation easier
From: Brad Mouring @ 2015-01-14 15:11 UTC (permalink / raw)
To: David Miller
Cc: xander.huff, nicolas.ferre, jaeden.amero, rich.tollerton,
ben.shelton, brad.mouring, netdev, linux-kernel
In-Reply-To: <20150114.002609.1903467089677912749.davem@davemloft.net>
On Wed, Jan 14, 2015 at 12:26:09AM -0500, David Miller wrote:
> From: Xander Huff <xander.huff@ni.com>
> Date: Tue, 13 Jan 2015 16:15:50 -0600
>
> > This change is to help improve at-a-glace knowledge of the purpose of the
> > various Cadence MACB/GEM registers. Comments are more helpful for human
> > readability than short acronyms.
> >
> > Describe various #define varibles Cadence MACB/GEM registers as documented
> > in Xilinix's "Zynq-7000 All Programmable SoC TechnicalReference Manual, v1.9.1
s/Xilinix/Xilinx/. Sorry for the previous html-spam. Didn't follow the rule to always get coffee in the system prior to responding.
> > (UG-585)"
> >
> > Signed-off-by: Xander Huff <xander.huff@ni.com>
>
> Applied.
^ permalink raw reply
* RE: [RFC PATCH v2 1/2] net: af_packet support for direct ring access in user space
From: Zhou, Danny @ 2015-01-14 15:26 UTC (permalink / raw)
To: Willem de Bruijn, John Fastabend
Cc: Network Development, Neil Horman, Daniel Borkmann, Ronciak, John,
Hannes Frederic Sowa, brouer@redhat.com
In-Reply-To: <CA+FuTSdCV8puZVe-6aWQt5mPk0i3_CBK7hecOgviLc0GpdUmNw@mail.gmail.com>
> -----Original Message-----
> From: Willem de Bruijn [mailto:willemb@google.com]
> Sent: Wednesday, January 14, 2015 2:53 AM
> To: John Fastabend
> Cc: Network Development; Zhou, Danny; Neil Horman; Daniel Borkmann; Ronciak, John; Hannes Frederic Sowa;
> brouer@redhat.com
> Subject: Re: [RFC PATCH v2 1/2] net: af_packet support for direct ring access in user space
>
> On Mon, Jan 12, 2015 at 11:35 PM, John Fastabend
> <john.fastabend@gmail.com> wrote:
> > This patch adds net_device ops to split off a set of driver queues
> > from the driver and map the queues into user space via mmap. This
> > allows the queues to be directly manipulated from user space. For
> > raw packet interface this removes any overhead from the kernel network
> > stack.
>
> Can you elaborate how packet payload mapping is handled?
> Processes are still responsible for translating from user virtual to
> physical (and bus) addresses, correct? The IOMMU is only there
> to restrict the physical address ranges that may be written.
>
User space processes have to use the IOVA returned from af_packet to fill
NIC's Rx (as well as Tx) descriptors. When a DMA request is trigged for transferring a
coming packet from the NIC to host memory, the device ID(specified by PCIe device' B:N:F)
field in the DMA request will be used by IOMMU to find the device address translation
structure for this domain/device. Then the IOMMU will use the IOVA field in the
DMA request as the match field to look up the per-device address translation structure
to get the corresponding physical address pointing to where packet should be transferred to.
If an invalid IOVA address (e.g. arbitrary address or physical address) is filled in NIC's descriptors,
IOMMU would prevent DMA from happening due to above lookup operation returns failure.
> >
> > With these operations we bypass the network stack and packet_type
> > handlers that would typically send traffic to an af_packet socket.
> > This means hardware must do the forwarding. To do this ew can use
> > the ETHTOOL_SRXCLSRLINS ops in the ethtool command set. It is
> > currently supported by multiple drivers including sfc, mlx4, niu,
> > ixgbe, and i40e. Supporting some way to steer traffic to a queue
> > is the _only_ hardware requirement to support this interface.
> >
> > A follow on patch adds support for ixgbe but we expect at least
> > the subset of drivers implementing ETHTOOL_SRXCLSRLINS can be
> > implemented later.
> >
> > The high level flow, leveraging the af_packet control path, looks
> > like:
> >
> > bind(fd, &sockaddr, sizeof(sockaddr));
> >
> > /* Get the device type and info */
> > getsockopt(fd, SOL_PACKET, PACKET_DEV_DESC_INFO, &def_info,
> > &optlen);
> >
> > /* With device info we can look up descriptor format */
> >
> > /* Get the layout of ring space offset, page_sz, cnt */
> > getsockopt(fd, SOL_PACKET, PACKET_DEV_QPAIR_MAP_REGION_INFO,
> > &info, &optlen);
> >
> > /* request some queues from the driver */
> > setsockopt(fd, SOL_PACKET, PACKET_RXTX_QPAIRS_SPLIT,
> > &qpairs_info, sizeof(qpairs_info));
> >
> > /* if we let the driver pick us queues learn which queues
> > * we were given
> > */
> > getsockopt(fd, SOL_PACKET, PACKET_RXTX_QPAIRS_SPLIT,
> > &qpairs_info, sizeof(qpairs_info));
> >
> > /* And mmap queue pairs to user space */
> > mmap(NULL, info.tp_dev_bar_sz, PROT_READ | PROT_WRITE,
> > MAP_SHARED, fd, 0);
> >
> > /* Now we have some user space queues to read/write to*/
> >
> > There is one critical difference when running with these interfaces
> > vs running without them. In the normal case the af_packet module
> > uses a standard descriptor format exported by the af_packet user
> > space headers. In this model because we are working directly with
> > driver queues the descriptor format maps to the descriptor format
> > used by the device. User space applications can learn device
> > information from the socket option PACKET_DEV_DESC_INFO. These
> > are described by giving the vendor/deviceid and a descriptor layout
> > in offset/length/width/alignment/byte_ordering.
>
> Raising the issue of exposed vs. virtualized interface just once
> more. I wonder if it is possible to keep the virtual to physical
> translation in the kernel while avoiding syscall latency, by doing
> the translation in a kernel thread on a coupled hyperthread that
> waits with mwait on the virtual queue producer index. The page
> table operations that Neil proposed in v1 of this patch may work
> even better.
>
This is one shot request during initialization, so should be ok from latency
prospective. The NIC requests physically contiguous host memory region
to be used as rx/tx packet buffer, so the physical address is provided for af_packet
or the NIC driver to do this check. Otherwise, it is hard to check it for given
virtual address and size of the memory regions.
> > To protect against arbitrary DMA writes IOMMU devices put memory
> > in a single domain to stop arbitrary DMA to memory. Note it would
> > be possible to dma into another sockets pages because most NIC
> > devices only support a single domain. This would require being
> > able to guess another sockets page layout. However the socket
> > operation does require CAP_NET_ADMIN privileges.
> >
> > Additionally we have a set of DPDK patches to enable DPDK with this
> > interface. DPDK can be downloaded @ dpdk.org although as I hope is
> > clear from above DPDK is just our paticular test environment we
> > expect other libraries could be built on this interface.
> >
> > Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> > ---
> > include/linux/netdevice.h | 79 ++++++++
> > include/uapi/linux/if_packet.h | 88 +++++++++
> > net/packet/af_packet.c | 397 ++++++++++++++++++++++++++++++++++++++++
> > net/packet/internal.h | 10 +
> > 4 files changed, 573 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 679e6e9..b71c97d 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -52,6 +52,8 @@
> > #include <linux/neighbour.h>
> > #include <uapi/linux/netdevice.h>
> >
> > +#include <linux/if_packet.h>
> > +
> > struct netpoll_info;
> > struct device;
> > struct phy_device;
> > @@ -1030,6 +1032,54 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
> > * int (*ndo_switch_port_stp_update)(struct net_device *dev, u8 state);
> > * Called to notify switch device port of bridge port STP
> > * state change.
> > + *
> > + * int (*ndo_split_queue_pairs) (struct net_device *dev,
> > + * unsigned int qpairs_start_from,
> > + * unsigned int qpairs_num,
> > + * struct sock *sk)
> > + * Called to request a set of queues from the driver to be handed to the
> > + * callee for management. After this returns the driver will not use the
> > + * queues.
> > + *
> > + * int (*ndo_get_split_queue_pairs) (struct net_device *dev,
> > + * unsigned int *qpairs_start_from,
> > + * unsigned int *qpairs_num,
> > + * struct sock *sk)
> > + * Called to get the location of queues that have been split for user
> > + * space to use. The socket must have previously requested the queues via
> > + * ndo_split_queue_pairs successfully.
> > + *
> > + * int (*ndo_return_queue_pairs) (struct net_device *dev,
> > + * struct sock *sk)
> > + * Called to return a set of queues identified by sock to the driver. The
> > + * socket must have previously requested the queues via
> > + * ndo_split_queue_pairs for this action to be performed.
> > + *
> > + * int (*ndo_get_device_qpair_map_region_info) (struct net_device *dev,
> > + * struct tpacket_dev_qpair_map_region_info *info)
> > + * Called to return mapping of queue memory region.
> > + *
> > + * int (*ndo_get_device_desc_info) (struct net_device *dev,
> > + * struct tpacket_dev_info *dev_info)
> > + * Called to get device specific information. This should uniquely identify
> > + * the hardware so that descriptor formats can be learned by the stack/user
> > + * space.
> > + *
> > + * int (*ndo_direct_qpair_page_map) (struct vm_area_struct *vma,
> > + * struct net_device *dev)
> > + * Called to map queue pair range from split_queue_pairs into mmap region.
> > + *
> > + * int (*ndo_direct_validate_dma_mem_region_map)
> > + * (struct net_device *dev,
> > + * struct tpacket_dma_mem_region *region,
> > + * struct sock *sk)
> > + * Called to validate DMA address remaping for userspace memory region
> > + *
> > + * int (*ndo_get_dma_region_info)
> > + * (struct net_device *dev,
> > + * struct tpacket_dma_mem_region *region,
> > + * struct sock *sk)
> > + * Called to get dma region' information such as iova.
> > */
> > struct net_device_ops {
> > int (*ndo_init)(struct net_device *dev);
> > @@ -1190,6 +1240,35 @@ struct net_device_ops {
> > int (*ndo_switch_port_stp_update)(struct net_device *dev,
> > u8 state);
> > #endif
> > + int (*ndo_split_queue_pairs)(struct net_device *dev,
> > + unsigned int qpairs_start_from,
> > + unsigned int qpairs_num,
> > + struct sock *sk);
> > + int (*ndo_get_split_queue_pairs)
> > + (struct net_device *dev,
> > + unsigned int *qpairs_start_from,
> > + unsigned int *qpairs_num,
> > + struct sock *sk);
> > + int (*ndo_return_queue_pairs)
> > + (struct net_device *dev,
> > + struct sock *sk);
> > + int (*ndo_get_device_qpair_map_region_info)
> > + (struct net_device *dev,
> > + struct tpacket_dev_qpair_map_region_info *info);
> > + int (*ndo_get_device_desc_info)
> > + (struct net_device *dev,
> > + struct tpacket_dev_info *dev_info);
> > + int (*ndo_direct_qpair_page_map)
> > + (struct vm_area_struct *vma,
> > + struct net_device *dev);
> > + int (*ndo_validate_dma_mem_region_map)
> > + (struct net_device *dev,
> > + struct tpacket_dma_mem_region *region,
> > + struct sock *sk);
> > + int (*ndo_get_dma_region_info)
> > + (struct net_device *dev,
> > + struct tpacket_dma_mem_region *region,
> > + struct sock *sk);
> > };
> >
> > /**
> > diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
> > index da2d668..eb7a727 100644
> > --- a/include/uapi/linux/if_packet.h
> > +++ b/include/uapi/linux/if_packet.h
> > @@ -54,6 +54,13 @@ struct sockaddr_ll {
> > #define PACKET_FANOUT 18
> > #define PACKET_TX_HAS_OFF 19
> > #define PACKET_QDISC_BYPASS 20
> > +#define PACKET_RXTX_QPAIRS_SPLIT 21
> > +#define PACKET_RXTX_QPAIRS_RETURN 22
> > +#define PACKET_DEV_QPAIR_MAP_REGION_INFO 23
> > +#define PACKET_DEV_DESC_INFO 24
> > +#define PACKET_DMA_MEM_REGION_MAP 25
> > +#define PACKET_DMA_MEM_REGION_RELEASE 26
> > +
> >
> > #define PACKET_FANOUT_HASH 0
> > #define PACKET_FANOUT_LB 1
> > @@ -64,6 +71,87 @@ struct sockaddr_ll {
> > #define PACKET_FANOUT_FLAG_ROLLOVER 0x1000
> > #define PACKET_FANOUT_FLAG_DEFRAG 0x8000
> >
> > +#define PACKET_MAX_NUM_MAP_MEMORY_REGIONS 64
> > +#define PACKET_MAX_NUM_DESC_FORMATS 8
> > +#define PACKET_MAX_NUM_DESC_FIELDS 64
> > +#define PACKET_NIC_DESC_FIELD(fseq, foffset, fwidth, falign, fbo) \
> > + .seqn = (__u8)fseq, \
> > + .offset = (__u8)foffset, \
> > + .width = (__u8)fwidth, \
> > + .align = (__u8)falign, \
> > + .byte_order = (__u8)fbo
> > +
> > +#define MAX_MAP_MEMORY_REGIONS 64
> > +
> > +/* setsockopt takes addr, size ,direction parametner, getsockopt takes
> > + * iova, size, direction.
> > + * */
> > +struct tpacket_dma_mem_region {
> > + void *addr; /* userspace virtual address */
> > + __u64 phys_addr; /* physical address */
> > + __u64 iova; /* IO virtual address used for DMA */
> > + unsigned long size; /* size of region */
> > + int direction; /* dma data direction */
> > +};
> > +
> > +struct tpacket_dev_qpair_map_region_info {
> > + unsigned int tp_dev_bar_sz; /* size of BAR */
> > + unsigned int tp_dev_sysm_sz; /* size of systerm memory */
> > + /* number of contiguous memory on BAR mapping to user space */
> > + unsigned int tp_num_map_regions;
> > + /* number of contiguous memory on system mapping to user apce */
> > + unsigned int tp_num_sysm_map_regions;
> > + struct map_page_region {
> > + unsigned page_offset; /* offset to start of region */
> > + unsigned page_sz; /* size of page */
> > + unsigned page_cnt; /* number of pages */
> > + } tp_regions[MAX_MAP_MEMORY_REGIONS];
> > +};
> > +
> > +struct tpacket_dev_qpairs_info {
> > + unsigned int tp_qpairs_start_from; /* qpairs index to start from */
> > + unsigned int tp_qpairs_num; /* number of qpairs */
> > +};
> > +
> > +enum tpack_desc_byte_order {
> > + BO_NATIVE = 0,
> > + BO_NETWORK,
> > + BO_BIG_ENDIAN,
> > + BO_LITTLE_ENDIAN,
> > +};
> > +
> > +struct tpacket_nic_desc_fld {
> > + __u8 seqn; /* Sequency index of descriptor field */
> > + __u8 offset; /* Offset to start */
> > + __u8 width; /* Width of field */
> > + __u8 align; /* Alignment in bits */
> > + enum tpack_desc_byte_order byte_order; /* Endian flag */
> > +};
> > +
> > +struct tpacket_nic_desc_expr {
> > + __u8 version; /* Version number */
> > + __u8 size; /* Descriptor size in bytes */
> > + enum tpack_desc_byte_order byte_order; /* Endian flag */
> > + __u8 num_of_fld; /* Number of valid fields */
> > + /* List of each descriptor field */
> > + struct tpacket_nic_desc_fld fields[PACKET_MAX_NUM_DESC_FIELDS];
> > +};
> > +
> > +struct tpacket_dev_info {
> > + __u16 tp_device_id;
> > + __u16 tp_vendor_id;
> > + __u16 tp_subsystem_device_id;
> > + __u16 tp_subsystem_vendor_id;
> > + __u32 tp_numa_node;
> > + __u32 tp_revision_id;
> > + __u32 tp_num_total_qpairs;
> > + __u32 tp_num_inuse_qpairs;
> > + __u32 tp_num_rx_desc_fmt;
> > + __u32 tp_num_tx_desc_fmt;
> > + struct tpacket_nic_desc_expr tp_rx_dexpr[PACKET_MAX_NUM_DESC_FORMATS];
> > + struct tpacket_nic_desc_expr tp_tx_dexpr[PACKET_MAX_NUM_DESC_FORMATS];
> > +};
> > +
> > struct tpacket_stats {
> > unsigned int tp_packets;
> > unsigned int tp_drops;
> > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> > index 6880f34..8cd17da 100644
> > --- a/net/packet/af_packet.c
> > +++ b/net/packet/af_packet.c
> > @@ -214,6 +214,9 @@ static void prb_clear_rxhash(struct tpacket_kbdq_core *,
> > static void prb_fill_vlan_info(struct tpacket_kbdq_core *,
> > struct tpacket3_hdr *);
> > static void packet_flush_mclist(struct sock *sk);
> > +static int umem_release(struct net_device *dev, struct packet_sock *po);
> > +static int get_umem_pages(struct tpacket_dma_mem_region *region,
> > + struct packet_umem_region *umem);
> >
> > struct packet_skb_cb {
> > unsigned int origlen;
> > @@ -2633,6 +2636,16 @@ static int packet_release(struct socket *sock)
> > sock_prot_inuse_add(net, sk->sk_prot, -1);
> > preempt_enable();
> >
> > + if (po->tp_owns_queue_pairs) {
> > + struct net_device *dev;
> > +
> > + dev = __dev_get_by_index(sock_net(sk), po->ifindex);
> > + if (dev) {
> > + dev->netdev_ops->ndo_return_queue_pairs(dev, sk);
> > + umem_release(dev, po);
> > + }
> > + }
> > +
> > spin_lock(&po->bind_lock);
> > unregister_prot_hook(sk, false);
> > packet_cached_dev_reset(po);
> > @@ -2829,6 +2842,8 @@ static int packet_create(struct net *net, struct socket *sock, int protocol,
> > po->num = proto;
> > po->xmit = dev_queue_xmit;
> >
> > + INIT_LIST_HEAD(&po->umem_list);
> > +
> > err = packet_alloc_pending(po);
> > if (err)
> > goto out2;
> > @@ -3226,6 +3241,88 @@ static void packet_flush_mclist(struct sock *sk)
> > }
> >
> > static int
> > +get_umem_pages(struct tpacket_dma_mem_region *region,
> > + struct packet_umem_region *umem)
> > +{
> > + struct page **page_list;
> > + unsigned long npages;
> > + unsigned long offset;
> > + unsigned long base;
> > + unsigned long i;
> > + int ret;
> > + dma_addr_t phys_base;
> > +
> > + phys_base = (region->phys_addr) & PAGE_MASK;
> > + base = ((unsigned long)region->addr) & PAGE_MASK;
> > + offset = ((unsigned long)region->addr) & (~PAGE_MASK);
> > + npages = PAGE_ALIGN(region->size + offset) >> PAGE_SHIFT;
> > +
> > + npages = min_t(unsigned long, npages, umem->nents);
> > + sg_init_table(umem->sglist, npages);
> > +
> > + umem->nmap = 0;
> > + page_list = (struct page **)__get_free_page(GFP_KERNEL);
> > + if (!page_list)
> > + return -ENOMEM;
> > +
> > + while (npages) {
> > + unsigned long min = min_t(unsigned long, npages,
> > + PAGE_SIZE / sizeof(struct page *));
> > +
> > + ret = get_user_pages(current, current->mm, base, min,
> > + 1, 0, page_list, NULL);
> > + if (ret < 0)
> > + break;
> > +
> > + base += ret * PAGE_SIZE;
> > + npages -= ret;
> > +
> > + /* validate if the memory region is physically contigenous */
> > + for (i = 0; i < ret; i++) {
> > + unsigned int page_index =
> > + (page_to_phys(page_list[i]) - phys_base) /
> > + PAGE_SIZE;
> > +
> > + if (page_index != umem->nmap + i) {
> > + int j;
> > +
> > + for (j = 0; j < (umem->nmap + i); j++)
> > + put_page(sg_page(&umem->sglist[j]));
> > +
> > + free_page((unsigned long)page_list);
> > + return -EFAULT;
> > + }
> > +
> > + sg_set_page(&umem->sglist[umem->nmap + i],
> > + page_list[i], PAGE_SIZE, 0);
> > + }
> > +
> > + umem->nmap += ret;
> > + }
> > +
> > + free_page((unsigned long)page_list);
> > + return 0;
> > +}
> > +
> > +static int
> > +umem_release(struct net_device *dev, struct packet_sock *po)
> > +{
> > + struct packet_umem_region *umem, *tmp;
> > + int i;
> > +
> > + list_for_each_entry_safe(umem, tmp, &po->umem_list, list) {
> > + dma_unmap_sg(dev->dev.parent, umem->sglist,
> > + umem->nmap, umem->direction);
> > + for (i = 0; i < umem->nmap; i++)
> > + put_page(sg_page(&umem->sglist[i]));
> > +
> > + vfree(umem);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int
> > packet_setsockopt(struct socket *sock, int level, int optname, char __user *optval, unsigned int optlen)
> > {
> > struct sock *sk = sock->sk;
> > @@ -3428,6 +3525,167 @@ packet_setsockopt(struct socket *sock, int level, int optname, char __user *optv
> > po->xmit = val ? packet_direct_xmit : dev_queue_xmit;
> > return 0;
> > }
> > + case PACKET_RXTX_QPAIRS_SPLIT:
> > + {
> > + struct tpacket_dev_qpairs_info qpairs;
> > + const struct net_device_ops *ops;
> > + struct net_device *dev;
> > + int err;
> > +
> > + if (optlen != sizeof(qpairs))
> > + return -EINVAL;
> > + if (copy_from_user(&qpairs, optval, sizeof(qpairs)))
> > + return -EFAULT;
> > +
> > + /* Only allow one set of queues to be owned by userspace */
> > + if (po->tp_owns_queue_pairs)
> > + return -EBUSY;
> > +
> > + /* This call only works after a bind call which calls a dev_hold
> > + * operation so we do not need to increment dev ref counter
> > + */
> > + dev = __dev_get_by_index(sock_net(sk), po->ifindex);
> > + if (!dev)
> > + return -EINVAL;
> > + ops = dev->netdev_ops;
> > + if (!ops->ndo_split_queue_pairs)
> > + return -EOPNOTSUPP;
> > +
> > + err = ops->ndo_split_queue_pairs(dev,
> > + qpairs.tp_qpairs_start_from,
> > + qpairs.tp_qpairs_num, sk);
> > + if (!err)
> > + po->tp_owns_queue_pairs = true;
> > +
> > + return err;
> > + }
> > + case PACKET_RXTX_QPAIRS_RETURN:
> > + {
> > + struct tpacket_dev_qpairs_info qpairs_info;
> > + const struct net_device_ops *ops;
> > + struct net_device *dev;
> > + int err;
> > +
> > + if (optlen != sizeof(qpairs_info))
> > + return -EINVAL;
> > + if (copy_from_user(&qpairs_info, optval, sizeof(qpairs_info)))
> > + return -EFAULT;
> > +
> > + if (!po->tp_owns_queue_pairs)
> > + return -EINVAL;
> > +
> > + /* This call only work after a bind call which calls a dev_hold
> > + * operation so we do not need to increment dev ref counter
> > + */
> > + dev = __dev_get_by_index(sock_net(sk), po->ifindex);
> > + if (!dev)
> > + return -EINVAL;
> > + ops = dev->netdev_ops;
> > + if (!ops->ndo_split_queue_pairs)
> > + return -EOPNOTSUPP;
> > +
> > + err = dev->netdev_ops->ndo_return_queue_pairs(dev, sk);
> > + if (!err)
> > + po->tp_owns_queue_pairs = false;
> > +
> > + return err;
> > + }
> > + case PACKET_DMA_MEM_REGION_MAP:
> > + {
> > + struct tpacket_dma_mem_region region;
> > + const struct net_device_ops *ops;
> > + struct net_device *dev;
> > + struct packet_umem_region *umem;
> > + unsigned long npages;
> > + unsigned long offset;
> > + unsigned long i;
> > + int err;
> > +
> > + if (optlen != sizeof(region))
> > + return -EINVAL;
> > + if (copy_from_user(®ion, optval, sizeof(region)))
> > + return -EFAULT;
> > + if ((region.direction != DMA_BIDIRECTIONAL) &&
> > + (region.direction != DMA_TO_DEVICE) &&
> > + (region.direction != DMA_FROM_DEVICE))
> > + return -EFAULT;
> > +
> > + if (!po->tp_owns_queue_pairs)
> > + return -EINVAL;
> > +
> > + /* This call only work after a bind call which calls a dev_hold
> > + * operation so we do not need to increment dev ref counter
> > + */
> > + dev = __dev_get_by_index(sock_net(sk), po->ifindex);
> > + if (!dev)
> > + return -EINVAL;
> > +
> > + offset = ((unsigned long)region.addr) & (~PAGE_MASK);
> > + npages = PAGE_ALIGN(region.size + offset) >> PAGE_SHIFT;
> > +
> > + umem = vzalloc(sizeof(*umem) +
> > + sizeof(struct scatterlist) * npages);
> > + if (!umem)
> > + return -ENOMEM;
> > +
> > + umem->nents = npages;
> > + umem->direction = region.direction;
> > +
> > + down_write(¤t->mm->mmap_sem);
> > + if (get_umem_pages(®ion, umem) < 0) {
> > + ret = -EFAULT;
> > + goto exit;
> > + }
> > +
> > + if ((umem->nmap == npages) &&
> > + (0 != dma_map_sg(dev->dev.parent, umem->sglist,
> > + umem->nmap, region.direction))) {
> > + region.iova = sg_dma_address(umem->sglist) + offset;
> > +
> > + ops = dev->netdev_ops;
> > + if (!ops->ndo_validate_dma_mem_region_map) {
> > + ret = -EOPNOTSUPP;
> > + goto unmap;
> > + }
> > +
> > + /* use driver to validate mapping of dma memory */
> > + err = ops->ndo_validate_dma_mem_region_map(dev,
> > + ®ion,
> > + sk);
> > + if (!err) {
> > + list_add_tail(&umem->list, &po->umem_list);
> > + ret = 0;
> > + goto exit;
> > + }
> > + }
> > +
> > +unmap:
> > + dma_unmap_sg(dev->dev.parent, umem->sglist,
> > + umem->nmap, umem->direction);
> > + for (i = 0; i < umem->nmap; i++)
> > + put_page(sg_page(&umem->sglist[i]));
> > +
> > + vfree(umem);
> > +exit:
> > + up_write(¤t->mm->mmap_sem);
> > +
> > + return ret;
> > + }
> > + case PACKET_DMA_MEM_REGION_RELEASE:
> > + {
> > + struct net_device *dev;
> > +
> > + dev = __dev_get_by_index(sock_net(sk), po->ifindex);
> > + if (!dev)
> > + return -EINVAL;
> > +
> > + down_write(¤t->mm->mmap_sem);
> > + ret = umem_release(dev, po);
> > + up_write(¤t->mm->mmap_sem);
> > +
> > + return ret;
> > + }
> > +
> > default:
> > return -ENOPROTOOPT;
> > }
> > @@ -3523,6 +3781,129 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
> > case PACKET_QDISC_BYPASS:
> > val = packet_use_direct_xmit(po);
> > break;
> > + case PACKET_RXTX_QPAIRS_SPLIT:
> > + {
> > + struct net_device *dev;
> > + struct tpacket_dev_qpairs_info qpairs_info;
> > + int err;
> > +
> > + if (len != sizeof(qpairs_info))
> > + return -EINVAL;
> > + if (copy_from_user(&qpairs_info, optval, sizeof(qpairs_info)))
> > + return -EFAULT;
> > +
> > + /* This call only work after a successful queue pairs split-off
> > + * operation via setsockopt()
> > + */
> > + if (!po->tp_owns_queue_pairs)
> > + return -EINVAL;
> > +
> > + /* This call only work after a bind call which calls a dev_hold
> > + * operation so we do not need to increment dev ref counter
> > + */
> > + dev = __dev_get_by_index(sock_net(sk), po->ifindex);
> > + if (!dev)
> > + return -EINVAL;
> > + if (!dev->netdev_ops->ndo_split_queue_pairs)
> > + return -EOPNOTSUPP;
> > +
> > + err = dev->netdev_ops->ndo_get_split_queue_pairs(dev,
> > + &qpairs_info.tp_qpairs_start_from,
> > + &qpairs_info.tp_qpairs_num, sk);
> > +
> > + lv = sizeof(qpairs_info);
> > + data = &qpairs_info;
> > + break;
> > + }
> > + case PACKET_DEV_QPAIR_MAP_REGION_INFO:
> > + {
> > + struct tpacket_dev_qpair_map_region_info info;
> > + const struct net_device_ops *ops;
> > + struct net_device *dev;
> > + int err;
> > +
> > + if (len != sizeof(info))
> > + return -EINVAL;
> > + if (copy_from_user(&info, optval, sizeof(info)))
> > + return -EFAULT;
> > +
> > + /* This call only work after a bind call which calls a dev_hold
> > + * operation so we do not need to increment dev ref counter
> > + */
> > + dev = __dev_get_by_index(sock_net(sk), po->ifindex);
> > + if (!dev)
> > + return -EINVAL;
> > +
> > + ops = dev->netdev_ops;
> > + if (!ops->ndo_get_device_qpair_map_region_info)
> > + return -EOPNOTSUPP;
> > +
> > + err = ops->ndo_get_device_qpair_map_region_info(dev, &info);
> > + if (err)
> > + return err;
> > +
> > + lv = sizeof(struct tpacket_dev_qpair_map_region_info);
> > + data = &info;
> > + break;
> > + }
> > + case PACKET_DEV_DESC_INFO:
> > + {
> > + struct net_device *dev;
> > + struct tpacket_dev_info info;
> > + int err;
> > +
> > + if (len != sizeof(info))
> > + return -EINVAL;
> > + if (copy_from_user(&info, optval, sizeof(info)))
> > + return -EFAULT;
> > +
> > + /* This call only work after a bind call which calls a dev_hold
> > + * operation so we do not need to increment dev ref counter
> > + */
> > + dev = __dev_get_by_index(sock_net(sk), po->ifindex);
> > + if (!dev)
> > + return -EINVAL;
> > + if (!dev->netdev_ops->ndo_get_device_desc_info)
> > + return -EOPNOTSUPP;
> > +
> > + err = dev->netdev_ops->ndo_get_device_desc_info(dev, &info);
> > + if (err)
> > + return err;
> > +
> > + lv = sizeof(struct tpacket_dev_info);
> > + data = &info;
> > + break;
> > + }
> > + case PACKET_DMA_MEM_REGION_MAP:
> > + {
> > + struct tpacket_dma_mem_region info;
> > + struct net_device *dev;
> > + int err;
> > +
> > + if (len != sizeof(info))
> > + return -EINVAL;
> > + if (copy_from_user(&info, optval, sizeof(info)))
> > + return -EFAULT;
> > +
> > + /* This call only work after a bind call which calls a dev_hold
> > + * operation so we do not need to increment dev ref counter
> > + */
> > + dev = __dev_get_by_index(sock_net(sk), po->ifindex);
> > + if (!dev)
> > + return -EINVAL;
> > +
> > + if (!dev->netdev_ops->ndo_get_dma_region_info)
> > + return -EOPNOTSUPP;
> > +
> > + err = dev->netdev_ops->ndo_get_dma_region_info(dev, &info, sk);
> > + if (err)
> > + return err;
> > +
> > + lv = sizeof(struct tpacket_dma_mem_region);
> > + data = &info;
> > + break;
> > + }
> > +
> > default:
> > return -ENOPROTOOPT;
> > }
> > @@ -3536,7 +3917,6 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
> > return 0;
> > }
> >
> > -
> > static int packet_notifier(struct notifier_block *this,
> > unsigned long msg, void *ptr)
> > {
> > @@ -3920,6 +4300,8 @@ static int packet_mmap(struct file *file, struct socket *sock,
> > struct packet_sock *po = pkt_sk(sk);
> > unsigned long size, expected_size;
> > struct packet_ring_buffer *rb;
> > + const struct net_device_ops *ops;
> > + struct net_device *dev;
> > unsigned long start;
> > int err = -EINVAL;
> > int i;
> > @@ -3927,8 +4309,20 @@ static int packet_mmap(struct file *file, struct socket *sock,
> > if (vma->vm_pgoff)
> > return -EINVAL;
> >
> > + dev = __dev_get_by_index(sock_net(sk), po->ifindex);
> > + if (!dev)
> > + return -EINVAL;
> > +
> > mutex_lock(&po->pg_vec_lock);
> >
> > + if (po->tp_owns_queue_pairs) {
> > + ops = dev->netdev_ops;
> > + err = ops->ndo_direct_qpair_page_map(vma, dev);
> > + if (err)
> > + goto out;
> > + goto done;
> > + }
> > +
> > expected_size = 0;
> > for (rb = &po->rx_ring; rb <= &po->tx_ring; rb++) {
> > if (rb->pg_vec) {
> > @@ -3966,6 +4360,7 @@ static int packet_mmap(struct file *file, struct socket *sock,
> > }
> > }
> >
> > +done:
> > atomic_inc(&po->mapped);
> > vma->vm_ops = &packet_mmap_ops;
> > err = 0;
> > diff --git a/net/packet/internal.h b/net/packet/internal.h
> > index cdddf6a..55d2fce 100644
> > --- a/net/packet/internal.h
> > +++ b/net/packet/internal.h
> > @@ -90,6 +90,14 @@ struct packet_fanout {
> > struct packet_type prot_hook ____cacheline_aligned_in_smp;
> > };
> >
> > +struct packet_umem_region {
> > + struct list_head list;
> > + int nents;
> > + int nmap;
> > + int direction;
> > + struct scatterlist sglist[0];
> > +};
> > +
> > struct packet_sock {
> > /* struct sock has to be the first member of packet_sock */
> > struct sock sk;
> > @@ -97,6 +105,7 @@ struct packet_sock {
> > union tpacket_stats_u stats;
> > struct packet_ring_buffer rx_ring;
> > struct packet_ring_buffer tx_ring;
> > + struct list_head umem_list;
> > int copy_thresh;
> > spinlock_t bind_lock;
> > struct mutex pg_vec_lock;
> > @@ -113,6 +122,7 @@ struct packet_sock {
> > unsigned int tp_reserve;
> > unsigned int tp_loss:1;
> > unsigned int tp_tx_has_off:1;
> > + unsigned int tp_owns_queue_pairs:1;
> > unsigned int tp_tstamp;
> > struct net_device __rcu *cached_dev;
> > int (*xmit)(struct sk_buff *skb);
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* RE: [RFC PATCH v2 1/2] net: af_packet support for direct ring access in user space
From: Zhou, Danny @ 2015-01-14 15:28 UTC (permalink / raw)
To: David Miller, David.Laight@ACULAB.COM
Cc: john.fastabend@gmail.com, dborkman@redhat.com,
hannes@stressinduktion.org, netdev@vger.kernel.org,
nhorman@tuxdriver.com, Ronciak, John, brouer@redhat.com
In-Reply-To: <20150113.122733.1459000731112321026.davem@davemloft.net>
> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Wednesday, January 14, 2015 1:28 AM
> To: David.Laight@ACULAB.COM
> Cc: john.fastabend@gmail.com; dborkman@redhat.com; hannes@stressinduktion.org; netdev@vger.kernel.org; Zhou, Danny;
> nhorman@tuxdriver.com; Ronciak, John; brouer@redhat.com
> Subject: Re: [RFC PATCH v2 1/2] net: af_packet support for direct ring access in user space
>
> From: David Laight <David.Laight@ACULAB.COM>
> Date: Tue, 13 Jan 2015 17:15:30 +0000
>
> > How about something like:
> >
> > struct tpacket_dma_mem_region {
> > __u64 addr; /* userspace virtual address */
> > __u64 phys_addr; /* physical address */
> > __u64 iova; /* IO virtual address used for DMA */
> > __u64 size; /* size of region */
> > int direction; /* dma data direction */
> > } aligned(8);
> >
> > So that it is independant of 32/64 bits.
> > It is a shame that gcc has no way of defining a 64bit 'void *' on 32bit systems.
> > You can use a union, but you still need to zero extend the value on LE (worse on BE).
>
> We have an __aligned_u64, please use that.
Thanks, will do.
^ permalink raw reply
* [bisected regression] e1000e: "Detected Hardware Unit Hang"
From: Thomas Jarosch @ 2015-01-14 15:32 UTC (permalink / raw)
To: 'Linux Netdev List'; +Cc: Eric Dumazet, Jeff Kirsher, e1000-devel
Hello,
after updating a good bunch of production level machines
from kernel 3.4.101 to kernel 3.14.25, a few of them started
to show serious trouble when there was a lot of network traffic.
---------------------------------------------------------------
Jan 14 11:14:57 intrartc kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
Jan 14 11:14:57 intrartc kernel: TDH <3b>
Jan 14 11:14:57 intrartc kernel: TDT <76>
Jan 14 11:14:57 intrartc kernel: next_to_use <76>
Jan 14 11:14:57 intrartc kernel: next_to_clean <31>
Jan 14 11:14:57 intrartc kernel: buffer_info[next_to_clean]:
Jan 14 11:14:57 intrartc kernel: time_stamp <ffff328c>
Jan 14 11:14:57 intrartc kernel: next_to_watch <3b>
Jan 14 11:14:57 intrartc kernel: jiffies <ffff33b9>
Jan 14 11:14:57 intrartc kernel: next_to_watch.status <0>
Jan 14 11:14:57 intrartc kernel: MAC Status <40080083>
Jan 14 11:14:57 intrartc kernel: PHY Status <796d>
Jan 14 11:14:57 intrartc kernel: PHY 1000BASE-T Status <3800>
Jan 14 11:14:57 intrartc kernel: PHY Extended Status <3000>
Jan 14 11:14:57 intrartc kernel: PCI Status <10>
Jan 14 11:14:59 intrartc kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
..
---------------------------------------------------------------
All of those troubled machines use an Intel DH61CR board and
are driven by the e1000e driver. Kernels 3.7.0 to 3.19-rc4 are affected.
The problem vanishes when you disable TSO. This is the
recommended "solution" on serverfault and others.
http://ehc.ac/p/e1000/bugs/378/
http://serverfault.com/questions/616485/e1000e-reset-adapter-unexpectedly-detected-hardware-unit-hang
I have a test setup that can trigger the problem within seconds
and bisected it down to this commit (hi Eric!):
---------------------------------------------------------------
commit 69b08f62e17439ee3d436faf0b9a7ca6fffb78db
Author: Eric Dumazet <edumazet@google.com>
Date: Wed Sep 26 06:46:57 2012 +0000
net: use bigger pages in __netdev_alloc_frag
We currently use percpu order-0 pages in __netdev_alloc_frag
to deliver fragments used by __netdev_alloc_skb()
Depending on NIC driver and arch being 32 or 64 bit, it allows a page to
be split in several fragments (between 1 and 8), assuming PAGE_SIZE=4096
Switching to bigger pages (32768 bytes for PAGE_SIZE=4096 case) allows :
- Better filling of space (the ending hole overhead is less an issue)
- Less calls to page allocator or accesses to page->_count
- Could allow struct skb_shared_info futures changes without major
performance impact.
This patch implements a transparent fallback to smaller
pages in case of memory pressure.
It also uses a standard "struct page_frag" instead of a custom one.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---------------------------------------------------------------
Reverting the commit f.e. in kernel 3.7.0 solves the issue.
I've done some more tests:
3.18.0 32bit + PAE: broken
3.6.0 32bit + PAE: works
3.7.0 32bit + PAE: broken
3.7.0 32bit + PAE + revert 69b08f62e17439ee3d436faf0b9a7ca6fffb78db -> works
3.7.0 32bit (without PAE) -> broken
3.7.0 32bit + "GFP_COMP" flag removed in __netdev_alloc_frag(): broken
3.7.0 32bit + "GFP_COMP" flag replaced with
"GFP_DMA" in __netdev_alloc_frag(): works!
3.7.0 32bit + "GFP_COMP" flag + "GFP_DMA" flag: broken
3.19-rc4 32bit: broken
The problem is triggered only when the traffic is forwarded to another client.
(this client is behind NAT). Generating traffic directly
on the system did not trigger the issue.
To me it looks like Eric's change uncovered a memory allocation
issue in the e1000e driver: It probably uses a memory address
unsuitable for DMA or so. This is just a guess though.
Funny fact: I have another Intel DH61CR board that does not show the problem.
I've borrowed (...) the mainboard from one affected box for my bisect test setup.
Please CC: comments. Thanks.
Best regards,
Thomas
^ permalink raw reply
* non-OVS based vxlan config broken on 3.19-rc ?!
From: Or Gerlitz @ 2015-01-14 15:18 UTC (permalink / raw)
To: tom Herbert, thomas Graf, Marcelo Leitner, Jesse Gross
Cc: netdev@vger.kernel.org
Guys, just realized that non-OVS based vxlan config is broken with
3.19-rc... I see that it works for me on 3.18.2 and breaks on 3.19-rc3
(Linus tree). Tested over mlx4 (both offloaded and non offloaded modes)
and igb, see below the simplest form I can see it breaks on 3.19-rcand
works on 3.18
Looking on tcpdump and stats, the arp reply arrives to the 3.19-rc host
NIC driver but is dropped along the stack beforehanded to the vxlan
driver, not sure where and why...
Or.
> $ tcpdump -nni vxlan42 -e
> 16:48:40.961852 8e:c2:13:4f:97:6e > ff:ff:ff:ff:ff:ff, ethertype ARP
> (0x0806), length 42: Request who-has 192.168.92.18 tell 192.168.92.17,
> length 28
> 16:48:41.961864 8e:c2:13:4f:97:6e > ff:ff:ff:ff:ff:ff, ethertype ARP
> (0x0806), length 42: Request who-has 192.168.92.18 tell 192.168.92.17,
> length 28
> 16:48:42.979948 8e:c2:13:4f:97:6e > ff:ff:ff:ff:ff:ff, ethertype ARP
> (0x0806), length 42: Request who-has 192.168.92.18 tell 192.168.92.17,
> length 28
>
> $ tcpdump -nni eth3 -e
> 16:48:46.993870 00:02:c9:e9:bf:32 > 01:00:5e:00:00:2a, ethertype IPv4
> (0x0800), length 92: 192.168.31.17.33434 > 239.0.0.42.4789: UDP, length 50
> 16:48:46.993905 f4:52:14:01:da:82 > 00:02:c9:e9:bf:32, ethertype IPv4
> (0x0800), length 92: 192.168.31.18.39155 > 192.168.31.17.4789: UDP,
> length 50
> 16:48:47.993855 00:02:c9:e9:bf:32 > 01:00:5e:00:00:2a, ethertype IPv4
> (0x0800), length 92: 192.168.31.17.33434 > 239.0.0.42.4789: UDP, length 50
> 16:48:47.993881 f4:52:14:01:da:82 > 00:02:c9:e9:bf:32, ethertype IPv4
> (0x0800), length 92: 192.168.31.18.39155 > 192.168.31.17.4789: UDP,
> length 50
>
> $ nstat
> Wed Jan 14 16:54:04 2015
>
> #kernel
> IpInReceives 2 0.0
> IpInDelivers 2 0.0
> IpOutRequests 4 0.0
> IcmpOutErrors 2 0.0
> IcmpOutEchoReps 2 0.0
> IcmpMsgOutType8 2 0.0
> TcpInSegs 1 0.0
> TcpOutSegs 1 0.0
> UdpInDatagrams 1 0.0
> TcpExtTCPPureAcks 1 0.0
> TcpExtTCPOrigDataSent 1 0.0
> IpExtOutMcastPkts 1 0.0
> IpExtInOctets 124 0.0
> IpExtOutOctets 670 0.0
> IpExtOutMcastOctets 78 0.0
> IpExtInNoECTPkts 2 0.0
>
> Every 1.0s: netstat -s -w
> Wed Jan 14 16:54:56 2015
>
> Ip:
> 1077125083 total packets received
> 182 with invalid addresses
> 0 forwarded
> 0 incoming packets discarded
> 1077124621 incoming packets delivered
> 545614475 requests sent out
> Icmp:
> 67936 ICMP messages received
> 60 input ICMP message failed.
> ICMP input histogram:
> destination unreachable: 3319
> echo requests: 528
> echo replies: 64089
> 71890 ICMP messages sent
> 0 ICMP messages failed
> ICMP output histogram:
> destination unreachable: 3542
> echo request: 67821
> echo replies: 527
> IcmpMsg:
> InType0: 64089
> InType3: 3319
> InType8: 528
> OutType0: 527
> OutType3: 3542
> OutType8: 67821
> UdpLite:
> IpExt:
> InMcastPkts: 827
> OutMcastPkts: 2993
> InBcastPkts: 1424
> InOctets: 2102729994314
> OutOctets: 35397269627
> InMcastOctets: 31854
> OutMcastOctets: 237154
> InBcastOctets: 453473
> InNoECTPkts: 1440876685
# host A with IP address 192.168.31.17
IP=ip
# mlx4
ETH=eth0
$IP link del vxlan42
$IP link add vxlan42 type vxlan id 42 group 239.0.0.42 ttl 10 dstport
4789 dev $ETH
$IP link set vxlan42 up
ifconfig vxlan42 192.168.92.17/24
ifconfig vxlan42 mtu 1450
# plain ping
ping 192.168.31.18 -c 2
# encaped ping <-- BREAKS
ping 192.168.92.18 -c 2
# host B with IP address 192.168.31.18
IP=ip
# mlx4
ETH=eth0
$IP link del vxlan42
$IP link add vxlan42 type vxlan id 42 group 239.0.0.42 ttl 10 dstport
4789 dev $ETH
$IP link set vxlan42 up
ifconfig vxlan42 192.168.92.18/24
ifconfig vxlan42 mtu 1450
ping 192.168.31.17 -c 2
ping 192.168.92.17 -c 2
^ permalink raw reply
* Re: [patch net-next 1/2 v3] tc: add BPF based action
From: Alexei Starovoitov @ 2015-01-14 15:39 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Jiri Pirko, Network Development, David S. Miller,
Jamal Hadi Salim, Hannes Frederic Sowa
In-Reply-To: <54B66F08.2010305@redhat.com>
On Wed, Jan 14, 2015 at 5:28 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
>
> I'm still wondering about the drop semantics ... wouldn't it be more
> intuitive to use 0 for drops in this context?
good point.
I think it must be 0 to match behavior of socket filters, etc.
If program tries to access beyond packet size or does divide
by zero if will be terminated and will return 0.
So zero should be the safest action from caller point of view.
^ permalink raw reply
* Re: Investment
From: Suklee Peck @ 2015-01-13 19:57 UTC (permalink / raw)
To: Recipients
I'm contacting you on behalf of an investment placed under management 5 years ago by Shui bian. He needs assistance in investing these funds. If you are interested, you can write to his private email ( saitt01@qq.com ) for further details.
gesalpe
Best Regards,
Suklee Peck
^ permalink raw reply
* Re: non-OVS based vxlan config broken on 3.19-rc ?!
From: thomas Graf @ 2015-01-14 15:52 UTC (permalink / raw)
To: Or Gerlitz
Cc: tom Herbert, Marcelo Leitner, Jesse Gross, netdev@vger.kernel.org
In-Reply-To: <54B688D9.8030101@mellanox.com>
On 01/14/15 at 05:18pm, Or Gerlitz wrote:
> Guys, just realized that non-OVS based vxlan config is broken with
> 3.19-rc... I see that it works for me on 3.18.2 and breaks on 3.19-rc3
> (Linus tree). Tested over mlx4 (both offloaded and non offloaded modes) and
> igb, see below the simplest form I can see it breaks on 3.19-rcand works on
> 3.18
>
> Looking on tcpdump and stats, the arp reply arrives to the 3.19-rc host NIC
> driver but is dropped along the stack beforehanded to the vxlan driver, not
> sure where and why...
As additional data point: I tested the VXLAN-GBP with iproute2 based tunnels
on net-next and that works fine. Driver used was a e1000 in KVM.
^ permalink raw reply
* Re: [PATCH 2/2] net/macb: improved ethtool statistics support
From: Nicolas Ferre @ 2015-01-14 15:53 UTC (permalink / raw)
To: Xander Huff, netdev, David Miller
Cc: jaeden.amero, rich.tollerton, ben.shelton, brad.mouring,
linux-kernel, Cyrille Pitchen
In-Reply-To: <1421187351-27279-2-git-send-email-xander.huff@ni.com>
Le 13/01/2015 23:15, Xander Huff a écrit :
> Currently `ethtool -S` simply returns "no stats available". It
> would be more useful to see what the various ethtool statistics
> registers' values are. This change implements get_ethtool_stats,
> get_strings, and get_sset_count functions to accomplish this.
>
> Read all GEM statistics registers and sum them into
> macb.ethtool_stats. Add the necessary infrastructure to make this
> accessible via `ethtool -S`.
>
> Update gem_update_stats to utilize ethtool_stats.
>
> Signed-off-by: Xander Huff <xander.huff@ni.com>
David,
I see some issues with this patch: can you hold it a little bit please
(aka NAK)?
Remarks enclosed:
> ---
> drivers/net/ethernet/cadence/macb.c | 55 +++++++-
> drivers/net/ethernet/cadence/macb.h | 256 ++++++++++++++++++++++++++++++++++++
> 2 files changed, 307 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
> index 3767271..dd8c202 100644
> --- a/drivers/net/ethernet/cadence/macb.c
> +++ b/drivers/net/ethernet/cadence/macb.c
> @@ -1827,12 +1827,23 @@ static int macb_close(struct net_device *dev)
>
> static void gem_update_stats(struct macb *bp)
> {
> - u32 __iomem *reg = bp->regs + GEM_OTX;
> + int i;
> u32 *p = &bp->hw_stats.gem.tx_octets_31_0;
> - u32 *end = &bp->hw_stats.gem.rx_udp_checksum_errors + 1;
>
> - for (; p < end; p++, reg++)
> - *p += __raw_readl(reg);
> + for (i = 0; i < GEM_STATS_LEN; ++i, ++p) {
> + u32 offset = gem_statistics[i].offset;
> + u64 val = __raw_readl(bp->regs+offset);
> +
> + bp->ethtool_stats[i] += val;
> + *p += val;
> +
> + if (offset == GEM_OCTTXL || offset == GEM_OCTRXL) {
> + /* Add GEM_OCTTXH, GEM_OCTRXH */
> + val = __raw_readl(bp->regs+offset+4);
style: whitespace around '+'
> + bp->ethtool_stats[i] += ((u64)val)<<32;
style: ditto
> + *(++p) += val;
> + }
> + }
> }
>
> static struct net_device_stats *gem_get_stats(struct macb *bp)
> @@ -1873,6 +1884,39 @@ static struct net_device_stats *gem_get_stats(struct macb *bp)
> return nstat;
> }
>
> +static void gem_get_ethtool_stats(struct net_device *dev,
> + struct ethtool_stats *stats, u64 *data)
> +{
> + struct macb *bp;
> +
> + bp = netdev_priv(dev);
> + gem_update_stats(bp);
> + memcpy(data, &bp->ethtool_stats, sizeof(u64)*GEM_STATS_LEN);
style: ditto
> +}
> +
> +static int gem_get_sset_count(struct net_device *dev, int sset)
> +{
> + switch (sset) {
> + case ETH_SS_STATS:
> + return GEM_STATS_LEN;
> + default:
> + return -EOPNOTSUPP;
> + }
> +}
> +
> +static void gem_get_ethtool_strings(struct net_device *dev, u32 sset, u8 *p)
> +{
> + int i;
> +
> + switch (sset) {
> + case ETH_SS_STATS:
> + for (i = 0; i < GEM_STATS_LEN; i++, p += ETH_GSTRING_LEN)
> + memcpy(p, gem_statistics[i].stat_string,
> + ETH_GSTRING_LEN);
> + break;
> + }
> +}
> +
> struct net_device_stats *macb_get_stats(struct net_device *dev)
> {
> struct macb *bp = netdev_priv(dev);
> @@ -1988,6 +2032,9 @@ const struct ethtool_ops macb_ethtool_ops = {
> .get_regs = macb_get_regs,
> .get_link = ethtool_op_get_link,
> .get_ts_info = ethtool_op_get_ts_info,
> + .get_ethtool_stats = gem_get_ethtool_stats,
> + .get_strings = gem_get_ethtool_strings,
> + .get_sset_count = gem_get_sset_count,
I think that the 10/100 macb version of this IP doesn't have the same
statistic possibilities: so you shouldn't register these functions for
all the variants of the IP.
Can you please verify this and only register these functions in the proper
if (macb_is_gem(bp)) alternative?
> };
> EXPORT_SYMBOL_GPL(macb_ethtool_ops);
>
> diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
> index 8e8c3c9..378b218 100644
> --- a/drivers/net/ethernet/cadence/macb.h
> +++ b/drivers/net/ethernet/cadence/macb.h
> @@ -82,6 +82,159 @@
> #define GEM_SA4B 0x00A0 /* Specific4 Bottom */
> #define GEM_SA4T 0x00A4 /* Specific4 Top */
> #define GEM_OTX 0x0100 /* Octets transmitted */
> +#define GEM_OCTTXL 0x0100 /* Octets transmitted
> + * [31:0]
> + */
Well please place the comment on a single line. Even if it exceeds 80
characters, it can be an exception for this.
However, check with David but personally, I feel that this formatting is
not good...
> +#define GEM_OCTTXH 0x0104 /* Octets transmitted
> + * [47:32]
> + */
> +#define GEM_TXCNT 0x0108 /* Error-free Frames
> + * Transmitted counter
> + */
> +#define GEM_TXBCCNT 0x010c /* Error-free Broadcast
> + * Frames counter
> + */
> +#define GEM_TXMCCNT 0x0110 /* Error-free Multicast
> + * Frames counter
> + */
> +#define GEM_TXPAUSECNT 0x0114 /* Pause Frames
> + * Transmitted Counter
> + */
> +#define GEM_TX64CNT 0x0118 /* Error-free 64 byte
> + * Frames Transmitted
> + * counter
> + */
... particularly when it comes to reach 3 lines like above ^^^^^^
> +#define GEM_TX65CNT 0x011c /* Error-free 65-127 byte
> + * Frames Transmitted
> + * counter
> + */
> +#define GEM_TX128CNT 0x0120 /* Error-free 128-255
> + * byte Frames
> + * Transmitted counter
> + */
> +#define GEM_TX256CNT 0x0124 /* Error-free 256-511
> + * byte Frames
> + * transmitted counter
> + */
> +#define GEM_TX512CNT 0x0128 /* Error-free 512-1023
> + * byte Frames
> + * transmitted counter
> + */
> +#define GEM_TX1024CNT 0x012c /* Error-free 1024-1518
> + * byte Frames
> + * transmitted counter
> + */
> +#define GEM_TX1519CNT 0x0130 /* Error-free larger than
> + * 1519 byte Frames
> + * tranmitted counter
> + */
> +#define GEM_TXURUNCNT 0x0134 /* TX under run error
> + * counter
> + */
> +#define GEM_SNGLCOLLCNT 0x0138 /* Single Collision Frame
> + * Counter
> + */
> +#define GEM_MULTICOLLCNT 0x013c /* Multiple Collision
> + * Frame Counter
> + */
> +#define GEM_EXCESSCOLLCNT 0x0140 /* Excessive Collision
> + * Frame Counter
> + */
> +#define GEM_LATECOLLCNT 0x0144 /* Late Collision Frame
> + * Counter
> + */
> +#define GEM_TXDEFERCNT 0x0148 /* Deferred Transmission
> + * Frame Counter
> + */
> +#define GEM_TXCSENSECNT 0x014c /* Carrier Sense Error
> + * Counter
> + */
> +#define GEM_ORX 0x0150 /* Octets received */
> +#define GEM_OCTRXL 0x0150 /* Octets received
> + * [31:0]
> + */
> +#define GEM_OCTRXH 0x0154 /* Octets received
> + * [47:32]
> + */
> +#define GEM_RXCNT 0x0158 /* Error-free Frames
> + * Received Counter
> + */
> +#define GEM_RXBROADCNT 0x015c /* Error-free Broadcast
> + * Frames Received
> + * Counter
> + */
> +#define GEM_RXMULTICNT 0x0160 /* Error-free Multicast
> + * Frames Received
> + * Counter
> + */
> +#define GEM_RXPAUSECNT 0x0164 /* Error-free Pause
> + * Frames Received
> + * Counter
> + */
> +#define GEM_RX64CNT 0x0168 /* Error-free 64 byte
> + * Frames Received
> + * Counter
> + */
> +#define GEM_RX65CNT 0x016c /* Error-free 65-127 byte
> + * Frames Received
> + * Counter
> + */
> +#define GEM_RX128CNT 0x0170 /* Error-free 128-255
> + * byte Frames Received
> + * Counter
> + */
> +#define GEM_RX256CNT 0x0174 /* Error-free 256-511
> + * byte Frames Received
> + * Counter
> + */
> +#define GEM_RX512CNT 0x0178 /* Error-free 512-1023
> + * byte Frames Received
> + * Counter
> + */
> +#define GEM_RX1024CNT 0x017c /* Error-free 1024-1518
> + * byte Frames Received
> + * Counter
> + */
> +#define GEM_RX1519CNT 0x0180 /* Error-free larger than
> + * 1519 Frames Received
> + * Counter
> + */
> +#define GEM_RXUNDRCNT 0x0184 /* Undersize Frames
> + * Received Counter
> + */
> +#define GEM_RXOVRCNT 0x0188 /* Oversize Frames
> + * Received Counter
> + */
> +#define GEM_RXJABCNT 0x018c /* Jabbers Received
> + * Counter
> + */
> +#define GEM_RXFCSCNT 0x0190 /* Frame Check Sequence
> + * Error Counter
> + */
> +#define GEM_RXLENGTHCNT 0x0194 /* Length Field Error
> + * Counter
> + */
> +#define GEM_RXSYMBCNT 0x0198 /* Symbol Error
> + * Counter
> + */
> +#define GEM_RXALIGNCNT 0x019c /* Alignment Error
> + * Counter
> + */
> +#define GEM_RXRESERRCNT 0x01a0 /* Receive Resource Error
> + * Counter
> + */
> +#define GEM_RXORCNT 0x01a4 /* Receive Overrun
> + * Counter
> + */
> +#define GEM_RXIPCCNT 0x01a8 /* IP header Checksum
> + * Error Counter
> + */
> +#define GEM_RXTCPCCNT 0x01ac /* TCP Checksum Error
> + * Counter
> + */
> +#define GEM_RXUDPCCNT 0x01b0 /* UDP Checksum Error
> + * Counter
> + */
> #define GEM_DCFG1 0x0280 /* Design Config 1 */
> #define GEM_DCFG2 0x0284 /* Design Config 2 */
> #define GEM_DCFG3 0x0288 /* Design Config 3 */
> @@ -650,6 +803,107 @@ struct gem_stats {
> u32 rx_udp_checksum_errors;
> };
>
> +/* Describes the name and offset of an individual statistic register, as
style: should be like this:
/*
* bla bla bla
*/
> + * returned by `ethtool -S`. Also describes which net_device_stats statistics
> + * this register should contribute to.
> + */
> +struct gem_statistic {
> + char stat_string[ETH_GSTRING_LEN];
> + int offset;
> + u32 stat_bits;
> +};
> +
> +/* Bitfield defs for net_device_stat statistics */
> +#define GEM_NDS_RXERR_OFFSET 0
> +#define GEM_NDS_RXLENERR_OFFSET 1
> +#define GEM_NDS_RXOVERERR_OFFSET 2
> +#define GEM_NDS_RXCRCERR_OFFSET 3
> +#define GEM_NDS_RXFRAMEERR_OFFSET 4
> +#define GEM_NDS_RXFIFOERR_OFFSET 5
> +#define GEM_NDS_TXERR_OFFSET 6
> +#define GEM_NDS_TXABORTEDERR_OFFSET 7
> +#define GEM_NDS_TXCARRIERERR_OFFSET 8
> +#define GEM_NDS_TXFIFOERR_OFFSET 9
> +#define GEM_NDS_COLLISIONS_OFFSET 10
> +
> +#define GEM_STAT_TITLE(name, title) GEM_STAT_TITLE_BITS(name, title, 0)
> +#define GEM_STAT_TITLE_BITS(name, title, bits) { \
> + .stat_string = title, \
> + .offset = GEM_##name, \
> + .stat_bits = bits \
> +}
> +
> +/* list of gem statistic registers. The names MUST match the
> + * corresponding GEM_* definitions.
> + */
> +static const struct gem_statistic gem_statistics[] = {
> + GEM_STAT_TITLE(OCTTXL, "tx_octets"), /* OCTTXH combined with OCTTXL */
> + GEM_STAT_TITLE(TXCNT, "tx_frames"),
> + GEM_STAT_TITLE(TXBCCNT, "tx_broadcast_frames"),
> + GEM_STAT_TITLE(TXMCCNT, "tx_multicast_frames"),
> + GEM_STAT_TITLE(TXPAUSECNT, "tx_pause_frames"),
> + GEM_STAT_TITLE(TX64CNT, "tx_64_byte_frames"),
> + GEM_STAT_TITLE(TX65CNT, "tx_65_127_byte_frames"),
> + GEM_STAT_TITLE(TX128CNT, "tx_128_255_byte_frames"),
> + GEM_STAT_TITLE(TX256CNT, "tx_256_511_byte_frames"),
> + GEM_STAT_TITLE(TX512CNT, "tx_512_1023_byte_frames"),
> + GEM_STAT_TITLE(TX1024CNT, "tx_1024_1518_byte_frames"),
> + GEM_STAT_TITLE(TX1519CNT, "tx_greater_than_1518_byte_frames"),
> + GEM_STAT_TITLE_BITS(TXURUNCNT, "tx_underrun",
> + GEM_BIT(NDS_TXERR)|GEM_BIT(NDS_TXFIFOERR)),
> + GEM_STAT_TITLE_BITS(SNGLCOLLCNT, "tx_single_collision_frames",
> + GEM_BIT(NDS_TXERR)|GEM_BIT(NDS_COLLISIONS)),
> + GEM_STAT_TITLE_BITS(MULTICOLLCNT, "tx_multiple_collision_frames",
> + GEM_BIT(NDS_TXERR)|GEM_BIT(NDS_COLLISIONS)),
> + GEM_STAT_TITLE_BITS(EXCESSCOLLCNT, "tx_excessive_collisions",
> + GEM_BIT(NDS_TXERR)|
> + GEM_BIT(NDS_TXABORTEDERR)|
> + GEM_BIT(NDS_COLLISIONS)),
> + GEM_STAT_TITLE_BITS(LATECOLLCNT, "tx_late_collisions",
> + GEM_BIT(NDS_TXERR)|GEM_BIT(NDS_COLLISIONS)),
> + GEM_STAT_TITLE(TXDEFERCNT, "tx_deferred_frames"),
> + GEM_STAT_TITLE_BITS(TXCSENSECNT, "tx_carrier_sense_errors",
> + GEM_BIT(NDS_TXERR)|GEM_BIT(NDS_COLLISIONS)),
> + GEM_STAT_TITLE(OCTRXL, "rx_octets"), /* OCTRXH combined with OCTRXL */
> + GEM_STAT_TITLE(RXCNT, "rx_frames"),
> + GEM_STAT_TITLE(RXBROADCNT, "rx_broadcast_frames"),
> + GEM_STAT_TITLE(RXMULTICNT, "rx_multicast_frames"),
> + GEM_STAT_TITLE(RXPAUSECNT, "rx_pause_frames"),
> + GEM_STAT_TITLE(RX64CNT, "rx_64_byte_frames"),
> + GEM_STAT_TITLE(RX65CNT, "rx_65_127_byte_frames"),
> + GEM_STAT_TITLE(RX128CNT, "rx_128_255_byte_frames"),
> + GEM_STAT_TITLE(RX256CNT, "rx_256_511_byte_frames"),
> + GEM_STAT_TITLE(RX512CNT, "rx_512_1023_byte_frames"),
> + GEM_STAT_TITLE(RX1024CNT, "rx_1024_1518_byte_frames"),
> + GEM_STAT_TITLE(RX1519CNT, "rx_greater_than_1518_byte_frames"),
> + GEM_STAT_TITLE_BITS(RXUNDRCNT, "rx_undersized_frames",
> + GEM_BIT(NDS_RXERR)|GEM_BIT(NDS_RXLENERR)),
> + GEM_STAT_TITLE_BITS(RXOVRCNT, "rx_oversize_frames",
> + GEM_BIT(NDS_RXERR)|GEM_BIT(NDS_RXLENERR)),
> + GEM_STAT_TITLE_BITS(RXJABCNT, "rx_jabbers",
> + GEM_BIT(NDS_RXERR)|GEM_BIT(NDS_RXLENERR)),
> + GEM_STAT_TITLE_BITS(RXFCSCNT, "rx_frame_check_sequence_errors",
> + GEM_BIT(NDS_RXERR)|GEM_BIT(NDS_RXCRCERR)),
> + GEM_STAT_TITLE_BITS(RXLENGTHCNT, "rx_length_field_frame_errors",
> + GEM_BIT(NDS_RXERR)),
> + GEM_STAT_TITLE_BITS(RXSYMBCNT, "rx_symbol_errors",
> + GEM_BIT(NDS_RXERR)|GEM_BIT(NDS_RXFRAMEERR)),
> + GEM_STAT_TITLE_BITS(RXALIGNCNT, "rx_alignment_errors",
> + GEM_BIT(NDS_RXERR)|GEM_BIT(NDS_RXOVERERR)),
> + GEM_STAT_TITLE_BITS(RXRESERRCNT, "rx_resource_errors",
> + GEM_BIT(NDS_RXERR)|GEM_BIT(NDS_RXOVERERR)),
> + GEM_STAT_TITLE_BITS(RXORCNT, "rx_overruns",
> + GEM_BIT(NDS_RXERR)|GEM_BIT(NDS_RXFIFOERR)),
> + GEM_STAT_TITLE_BITS(RXIPCCNT, "rx_ip_header_checksum_errors",
> + GEM_BIT(NDS_RXERR)),
> + GEM_STAT_TITLE_BITS(RXTCPCCNT, "rx_tcp_checksum_errors",
> + GEM_BIT(NDS_RXERR)),
> + GEM_STAT_TITLE_BITS(RXUDPCCNT, "rx_udp_checksum_errors",
> + GEM_BIT(NDS_RXERR)),
> +};
> +
> +#define GEM_STATS_LEN ARRAY_SIZE(gem_statistics)
> +
> struct macb;
>
> struct macb_or_gem_ops {
> @@ -728,6 +982,8 @@ struct macb {
> dma_addr_t skb_physaddr; /* phys addr from pci_map_single */
> int skb_length; /* saved skb length for pci_unmap_single */
> unsigned int max_tx_length;
> +
> + u64 ethtool_stats[GEM_STATS_LEN];
> };
>
> extern const struct ethtool_ops macb_ethtool_ops;
>
--
Nicolas Ferre
^ permalink raw reply
* Re: [patch net-next 1/2 v3] tc: add BPF based action
From: Jiri Pirko @ 2015-01-14 15:55 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Daniel Borkmann, Network Development, David S. Miller,
Jamal Hadi Salim, Hannes Frederic Sowa
In-Reply-To: <CAMEtUux+HOOegzyi82fYT_JMDX_+d0dZCQ=Zc5GWC-awQmJu1A@mail.gmail.com>
Wed, Jan 14, 2015 at 04:39:34PM CET, ast@plumgrid.com wrote:
>On Wed, Jan 14, 2015 at 5:28 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
>>
>> I'm still wondering about the drop semantics ... wouldn't it be more
>> intuitive to use 0 for drops in this context?
>
>good point.
>I think it must be 0 to match behavior of socket filters, etc.
>If program tries to access beyond packet size or does divide
>by zero if will be terminated and will return 0.
>So zero should be the safest action from caller point of view.
Will do. Thanks!
^ permalink raw reply
* Re: [PATCH net 0/3]tg3: synchronize_irq() should be called without taking locks
From: Peter Hurley @ 2015-01-14 15:57 UTC (permalink / raw)
To: Prashant Sreedharan, davem; +Cc: netdev, mchan
In-Reply-To: <1421217039-11689-1-git-send-email-prashant@broadcom.com>
On 01/14/2015 01:30 AM, Prashant Sreedharan wrote:
> Prashant Sreedharan (3):
> tg3_timer() should grab tp->lock before checking for tp->irq_sync
> tg3_reset_task() needs to use rtnl_lock to synchronize
> Release tp->lock before invoking synchronize_irq()
Thanks!
For series:
Reported-by: Peter Hurley <peter@hurleysoftware.com>
Tested-by: Peter Hurley <peter@hurleysoftware.com>
But maybe one of these patches should reference that this fixes
BUG: sleeping function... so that others can quickly find this
fix (if they're bisecting or whatever). For the same reason, it
might be useful for this series to be just one patch.
Regards,
Peter Hurley
^ permalink raw reply
* Re: [3.19-rc3] tg3: BUG: sleeping function called from invalid context
From: Peter Hurley @ 2015-01-14 16:06 UTC (permalink / raw)
To: Prashant Sreedharan; +Cc: Michael Chan, netdev, Linux kernel
In-Reply-To: <1421116255.16485.14.camel@prashant>
On 01/12/2015 09:30 PM, Prashant Sreedharan wrote:
> On Mon, 2015-01-12 at 19:59 -0500, Peter Hurley wrote:
>> On 3.19-rc3, I'm seeing this might_sleep() warning [1] from the tg3_open()
>> call stack. Let me know if I need to bisect this.
>>
>> Regards,
>> Peter Hurley
>>
>> [1]
>>
>> [ 17.203009] BUG: sleeping function called from invalid context at /home/peter/src/kernels/mainline/kernel/irq/manage.c:104
>> [ 17.203067] in_atomic(): 1, irqs_disabled(): 0, pid: 1106, name: ip
>> [ 17.203092] 2 locks held by ip/1106:
>> [ 17.205255] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff816adf1f>] rtnetlink_rcv+0x1f/0x40
>> [ 17.207445] #1: (&(&tp->lock)->rlock){+.....}, at: [<ffffffffa01073e6>] tg3_start+0xc06/0x11f0 [tg3]
>> [ 17.209725] CPU: 2 PID: 1106 Comm: ip Not tainted 3.19.0-rc3+wip-xeon+lockdep #rc3+wip
>> [ 17.211900] Hardware name: Dell Inc. Precision WorkStation T5400 /0RW203, BIOS A11 04/30/2012
>> [ 17.214086] 0000000000000068 ffff8802ac823498 ffffffff817af7e8 0000000000000005
>> [ 17.216265] ffffffff81a9be78 ffff8802ac8234a8 ffffffff810998a5 ffff8802ac8234d8
>> [ 17.218446] ffffffff8109991a ffff8802ac8234c8 ffff8802af0aae00 ffffffffa00ed000
>> [ 17.220636] Call Trace:
>> [ 17.222743] [<ffffffff817af7e8>] dump_stack+0x4f/0x7b
>> [ 17.224808] [<ffffffff810998a5>] ___might_sleep+0x105/0x140
>> [ 17.226842] [<ffffffff8109991a>] __might_sleep+0x3a/0xa0
>> [ 17.228869] [<ffffffffa00ed000>] ? 0xffffffffa00ed000
>> [ 17.230939] [<ffffffff810d7d78>] synchronize_irq+0x38/0xa0
>> [ 17.232967] [<ffffffffa00ed000>] ? 0xffffffffa00ed000
>> [ 17.234991] [<ffffffffa010105f>] tg3_chip_reset+0x13f/0x9c0 [tg3]
>> [ 17.236988] [<ffffffffa01020ae>] tg3_reset_hw+0x7e/0x2d20 [tg3]
>> [ 17.238996] [<ffffffff813bfaff>] ? __udelay+0x2f/0x40
>> [ 17.241007] [<ffffffffa00ef2f7>] ? _tw32_flush+0x47/0x80 [tg3]
>> [ 17.243066] [<ffffffffa0104dac>] tg3_init_hw+0x5c/0x70 [tg3]
>> [ 17.245438] [<ffffffffa010740b>] tg3_start+0xc2b/0x11f0 [tg3]
>> [ 17.247444] [<ffffffffa0107ad7>] ? tg3_open+0x107/0x2e0 [tg3]
>> [ 17.249556] [<ffffffff810c338d>] ? trace_hardirqs_on+0xd/0x10
>> [ 17.251581] [<ffffffff8107806f>] ? __local_bh_enable_ip+0x6f/0x100
>> [ 17.253710] [<ffffffffa0107af8>] tg3_open+0x128/0x2e0 [tg3]
>> [ 17.255758] [<ffffffff816ba3f5>] ? netpoll_poll_disable+0x5/0xa0
>> [ 17.257932] [<ffffffff816a14af>] __dev_open+0xbf/0x140
>> [ 17.260091] [<ffffffff816a17c1>] __dev_change_flags+0xa1/0x160
>> [ 17.262222] [<ffffffff816a18a9>] dev_change_flags+0x29/0x60
>> [ 17.264360] [<ffffffff816b0e02>] do_setlink+0x2f2/0xa30
>> [ 17.266431] [<ffffffff816b1b7f>] rtnl_newlink+0x51f/0x750
>> [ 17.268485] [<ffffffff816b1749>] ? rtnl_newlink+0xe9/0x750
>> [ 17.270483] [<ffffffff811869c2>] ? free_pages_prepare+0x1d2/0x270
>> [ 17.272507] [<ffffffff810c32bd>] ? trace_hardirqs_on_caller+0x11d/0x1e0
>> [ 17.274531] [<ffffffff813dd1b2>] ? nla_parse+0x32/0x120
>> [ 17.276531] [<ffffffff81021ab5>] ? native_sched_clock+0x35/0xa0
>> [ 17.278514] [<ffffffff816adfd5>] rtnetlink_rcv_msg+0x95/0x250
>> [ 17.280485] [<ffffffff8109f699>] ? preempt_count_sub+0x49/0x50
>> [ 17.282448] [<ffffffff817b4a02>] ? mutex_lock_nested+0x382/0x530
>> [ 17.284402] [<ffffffff816adf1f>] ? rtnetlink_rcv+0x1f/0x40
>> [ 17.286290] [<ffffffff816adf1f>] ? rtnetlink_rcv+0x1f/0x40
>> [ 17.288142] [<ffffffff816adf40>] ? rtnetlink_rcv+0x40/0x40
>> [ 17.290031] [<ffffffff816cedc1>] netlink_rcv_skb+0xc1/0xe0
>> [ 17.291836] [<ffffffff816adf2e>] rtnetlink_rcv+0x2e/0x40
>> [ 17.293615] [<ffffffff816ce473>] netlink_unicast+0xf3/0x1d0
>> [ 17.295420] [<ffffffff816ce863>] netlink_sendmsg+0x313/0x690
>> [ 17.297132] [<ffffffff811ada4f>] ? might_fault+0x5f/0xb0
>> [ 17.298799] [<ffffffff8168253c>] do_sock_sendmsg+0x8c/0x100
>> [ 17.300493] [<ffffffff81681e3e>] ? copy_msghdr_from_user+0x15e/0x1f0
>> [ 17.302173] [<ffffffff81682aeb>] ___sys_sendmsg+0x30b/0x320
>> [ 17.303798] [<ffffffff81021ab5>] ? native_sched_clock+0x35/0xa0
>> [ 17.305431] [<ffffffff810bdee0>] ? cpuacct_account_field+0x80/0xb0
>> [ 17.307085] [<ffffffff81021ab5>] ? native_sched_clock+0x35/0xa0
>> [ 17.308744] [<ffffffff810a4f35>] ? sched_clock_local+0x25/0x90
>> [ 17.310375] [<ffffffff810a5dc1>] ? vtime_account_user+0x91/0xa0
>> [ 17.311948] [<ffffffff810a5198>] ? sched_clock_cpu+0xb8/0xe0
>> [ 17.313509] [<ffffffff810bf8be>] ? put_lock_stats.isra.26+0xe/0x30
>> [ 17.315069] [<ffffffff810c007e>] ? lock_release_holdtime.part.27+0x12e/0x1b0
>> [ 17.316618] [<ffffffff810a5dc1>] ? vtime_account_user+0x91/0xa0
>> [ 17.318162] [<ffffffff8109f5d1>] ? get_parent_ip+0x11/0x50
>> [ 17.319703] [<ffffffff8109f699>] ? preempt_count_sub+0x49/0x50
>> [ 17.321235] [<ffffffff811807e5>] ? context_tracking_user_exit+0x55/0x130
>> [ 17.322732] [<ffffffff811807e5>] ? context_tracking_user_exit+0x55/0x130
>> [ 17.324197] [<ffffffff816834f2>] __sys_sendmsg+0x42/0x80
>> [ 17.325634] [<ffffffff81683542>] SyS_sendmsg+0x12/0x20
>> [ 17.327048] [<ffffffff817ba12d>] system_call_fastpath+0x16/0x1b
>
> Please bisect, there hasn't been tg3 code changes in this path that
> might cause this.
What triggers this is the new debugging code added to catch nested
sleeps; specifically e22b886 ("sched/wait: Add might_sleep() checks").
Regards,
Peter Hurley
^ permalink raw reply
* Re: [PATCH iproute2 0/3] ip netns: Run over all netns
From: Vadim Kochan @ 2015-01-14 16:13 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Vadim Kochan, netdev
In-Reply-To: <20150113172837.793f0969@urahara>
On Tue, Jan 13, 2015 at 05:28:37PM -0800, Stephen Hemminger wrote:
> On Wed, 7 Jan 2015 13:04:19 +0200
> Vadim Kochan <vadim4j@gmail.com> wrote:
>
> > From: Vadim Kochan <vadim4j@gmail.com>
> >
> > Allow 'ip netns del' and 'ip netns exec' run over each network namespace names.
> >
> > 'ip netns exec' executes command forcely on eacn nsname.
> >
> > Vadim Kochan (3):
> > lib: Exec func on each netns
> > ip netns: Allow exec on each netns
> > ip netns: Delete all netns
> >
> > include/namespace.h | 6 ++++
> > include/utils.h | 5 +++
> > ip/ipnetns.c | 96 ++++++++++++++++++++++++++++++++---------------------
> > lib/namespace.c | 22 ++++++++++++
> > lib/utils.c | 28 ++++++++++++++++
> > man/man8/ip-netns.8 | 26 ++++++++++++---
> > 6 files changed, 141 insertions(+), 42 deletions(-)
> >
>
> It is a useful concept but as others have pointed out the idea of reserving
> a keyword 'all' at this point is likely to cause somebody to break.
> So sorry.
OK, then what about additional option like:
$ ip -all netns exec ...
?
Thanks,
^ permalink raw reply
* Re: [PATCH v4 20/20] kbuild: add a new kselftest_install make target to install selftests
From: Shuah Khan @ 2015-01-14 16:32 UTC (permalink / raw)
To: mmarek, masami.hiramatsu.pt
Cc: gregkh, akpm, rostedt, mingo, davem, keescook, tranmanphong, mpe,
cov, dh.herrmann, hughd, bobby.prani, serge.hallyn, ebiederm,
tim.bird, josh, koct9i, linux-kbuild, linux-kernel, linux-api,
netdev
In-Reply-To: <2c5a28faaa79d9c2415854a08817ada509fcb943.1420571615.git.shuahkh@osg.samsung.com>
On 01/06/2015 12:43 PM, Shuah Khan wrote:
> Add a new make target to install to install kernel selftests.
> This new target will build and install selftests. kselftest
> target now depends on kselftest_install and runs the generated
> kselftest script to reduce duplicate work and for common look
> and feel when running tests.
>
> make kselftest_target:
> -- exports kselftest INSTALL_KSFT_PATH
> default $(INSTALL_MOD_PATH)/lib/kselftest/$(KERNELRELEASE)
> -- exports INSTALL_KSFT_PATH
> -- runs selftests make install target
>
> Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
> ---
> Makefile | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
Hi Marek,
Could you please Ack this patch, if this version looks good,
so I can take this through ksefltest tree.
thanks,
-- Shuah
--
Shuah Khan
Sr. Linux Kernel Developer
Open Source Innovation Group
Samsung Research America (Silicon Valley)
shuahkh@osg.samsung.com | (970) 217-8978
^ permalink raw reply
* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
From: Eric Dumazet @ 2015-01-14 16:34 UTC (permalink / raw)
To: Ding Tianhong
Cc: arnd, robh+dt, davem, grant.likely, agraf, sergei.shtylyov,
linux-arm-kernel, xuwei5, zhangfei.gao, netdev, devicetree, linux
In-Reply-To: <1421217254-12008-4-git-send-email-dingtianhong@huawei.com>
On Wed, 2015-01-14 at 14:34 +0800, Ding Tianhong wrote:
> Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
> The controller has no tx done interrupt, reclaim xmitted buffer in the poll.
>
> v13: Fix the problem of alignment parameters for function and checkpatch warming.
>
> v12: According Alex's suggestion, modify the changelog and add MODULE_DEVICE_TABLE
> for hip04 ethernet.
>
> v11: Add ethtool support for tx coalecse getting and setting, the xmit_more
> is not supported for this patch, but I think it could work for hip04,
> will support it later after some tests for performance better.
>
> Here are some performance test results by ping and iperf(add tx_coalesce_frames/users),
> it looks that the performance and latency is more better by tx_coalesce_frames/usecs.
>
> - Before:
> $ ping 192.168.1.1 ...
> === 192.168.1.1 ping statistics ===
> 24 packets transmitted, 24 received, 0% packet loss, time 22999ms
> rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms
>
> $ iperf -c 192.168.1.1 ...
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0- 1.0 sec 115 MBytes 945 Mbits/sec
>
> - After:
> $ ping 192.168.1.1 ...
> === 192.168.1.1 ping statistics ===
> 24 packets transmitted, 24 received, 0% packet loss, time 22999ms
> rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms
>
> $ iperf -c 192.168.1.1 ...
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0- 1.0 sec 115 MBytes 965 Mbits/sec
>
> v10: According David Miller and Arnd Bergmann's suggestion, add some modification
> for v9 version
> - drop the workqueue
> - batch cleanup based on tx_coalesce_frames/usecs for better throughput
> - use a reasonable default tx timeout (200us, could be shorted
> based on measurements) with a range timer
> - fix napi poll function return value
> - use a lockless queue for cleanup
>
> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> ---
> drivers/net/ethernet/hisilicon/Makefile | 2 +-
> drivers/net/ethernet/hisilicon/hip04_eth.c | 969 +++++++++++++++++++++++++++++
> 2 files changed, 970 insertions(+), 1 deletion(-)
> create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c
>
> diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
> index 40115a7..6c14540 100644
> --- a/drivers/net/ethernet/hisilicon/Makefile
> +++ b/drivers/net/ethernet/hisilicon/Makefile
> @@ -3,4 +3,4 @@
> #
>
> obj-$(CONFIG_HIX5HD2_GMAC) += hix5hd2_gmac.o
> -obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
> +obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o hip04_eth.o
> diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
> new file mode 100644
> index 0000000..525214e
> --- /dev/null
> +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
> @@ -0,0 +1,969 @@
> +
> +/* Copyright (c) 2014 Linaro Ltd.
> + * Copyright (c) 2014 Hisilicon Limited.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/etherdevice.h>
> +#include <linux/platform_device.h>
> +#include <linux/interrupt.h>
> +#include <linux/ktime.h>
> +#include <linux/of_address.h>
> +#include <linux/phy.h>
> +#include <linux/of_mdio.h>
> +#include <linux/of_net.h>
> +#include <linux/mfd/syscon.h>
> +#include <linux/regmap.h>
> +
> +#define PPE_CFG_RX_ADDR 0x100
> +#define PPE_CFG_POOL_GRP 0x300
> +#define PPE_CFG_RX_BUF_SIZE 0x400
> +#define PPE_CFG_RX_FIFO_SIZE 0x500
> +#define PPE_CURR_BUF_CNT 0xa200
> +
> +#define GE_DUPLEX_TYPE 0x08
> +#define GE_MAX_FRM_SIZE_REG 0x3c
> +#define GE_PORT_MODE 0x40
> +#define GE_PORT_EN 0x44
> +#define GE_SHORT_RUNTS_THR_REG 0x50
> +#define GE_TX_LOCAL_PAGE_REG 0x5c
> +#define GE_TRANSMIT_CONTROL_REG 0x60
> +#define GE_CF_CRC_STRIP_REG 0x1b0
> +#define GE_MODE_CHANGE_REG 0x1b4
> +#define GE_RECV_CONTROL_REG 0x1e0
> +#define GE_STATION_MAC_ADDRESS 0x210
> +#define PPE_CFG_CPU_ADD_ADDR 0x580
> +#define PPE_CFG_MAX_FRAME_LEN_REG 0x408
> +#define PPE_CFG_BUS_CTRL_REG 0x424
> +#define PPE_CFG_RX_CTRL_REG 0x428
> +#define PPE_CFG_RX_PKT_MODE_REG 0x438
> +#define PPE_CFG_QOS_VMID_GEN 0x500
> +#define PPE_CFG_RX_PKT_INT 0x538
> +#define PPE_INTEN 0x600
> +#define PPE_INTSTS 0x608
> +#define PPE_RINT 0x604
> +#define PPE_CFG_STS_MODE 0x700
> +#define PPE_HIS_RX_PKT_CNT 0x804
> +
> +/* REG_INTERRUPT */
> +#define RCV_INT BIT(10)
> +#define RCV_NOBUF BIT(8)
> +#define RCV_DROP BIT(7)
> +#define TX_DROP BIT(6)
> +#define DEF_INT_ERR (RCV_NOBUF | RCV_DROP | TX_DROP)
> +#define DEF_INT_MASK (RCV_INT | DEF_INT_ERR)
> +
> +/* TX descriptor config */
> +#define TX_FREE_MEM BIT(0)
> +#define TX_READ_ALLOC_L3 BIT(1)
> +#define TX_FINISH_CACHE_INV BIT(2)
> +#define TX_CLEAR_WB BIT(4)
> +#define TX_L3_CHECKSUM BIT(5)
> +#define TX_LOOP_BACK BIT(11)
> +
> +/* RX error */
> +#define RX_PKT_DROP BIT(0)
> +#define RX_L2_ERR BIT(1)
> +#define RX_PKT_ERR (RX_PKT_DROP | RX_L2_ERR)
> +
> +#define SGMII_SPEED_1000 0x08
> +#define SGMII_SPEED_100 0x07
> +#define SGMII_SPEED_10 0x06
> +#define MII_SPEED_100 0x01
> +#define MII_SPEED_10 0x00
> +
> +#define GE_DUPLEX_FULL BIT(0)
> +#define GE_DUPLEX_HALF 0x00
> +#define GE_MODE_CHANGE_EN BIT(0)
> +
> +#define GE_TX_AUTO_NEG BIT(5)
> +#define GE_TX_ADD_CRC BIT(6)
> +#define GE_TX_SHORT_PAD_THROUGH BIT(7)
> +
> +#define GE_RX_STRIP_CRC BIT(0)
> +#define GE_RX_STRIP_PAD BIT(3)
> +#define GE_RX_PAD_EN BIT(4)
> +
> +#define GE_AUTO_NEG_CTL BIT(0)
> +
> +#define GE_RX_INT_THRESHOLD BIT(6)
> +#define GE_RX_TIMEOUT 0x04
> +
> +#define GE_RX_PORT_EN BIT(1)
> +#define GE_TX_PORT_EN BIT(2)
> +
> +#define PPE_CFG_STS_RX_PKT_CNT_RC BIT(12)
> +
> +#define PPE_CFG_RX_PKT_ALIGN BIT(18)
> +#define PPE_CFG_QOS_VMID_MODE BIT(14)
> +#define PPE_CFG_QOS_VMID_GRP_SHIFT 8
> +
> +#define PPE_CFG_RX_FIFO_FSFU BIT(11)
> +#define PPE_CFG_RX_DEPTH_SHIFT 16
> +#define PPE_CFG_RX_START_SHIFT 0
> +#define PPE_CFG_RX_CTRL_ALIGN_SHIFT 11
> +
> +#define PPE_CFG_BUS_LOCAL_REL BIT(14)
> +#define PPE_CFG_BUS_BIG_ENDIEN BIT(0)
> +
> +#define RX_DESC_NUM 128
> +#define TX_DESC_NUM 256
> +#define TX_NEXT(N) (((N) + 1) & (TX_DESC_NUM-1))
> +#define RX_NEXT(N) (((N) + 1) & (RX_DESC_NUM-1))
> +
> +#define GMAC_PPE_RX_PKT_MAX_LEN 379
> +#define GMAC_MAX_PKT_LEN 1516
> +#define GMAC_MIN_PKT_LEN 31
> +#define RX_BUF_SIZE 1600
> +#define RESET_TIMEOUT 1000
> +#define TX_TIMEOUT (6 * HZ)
> +
> +#define DRV_NAME "hip04-ether"
> +#define DRV_VERSION "v1.0"
> +
> +#define HIP04_MAX_TX_COALESCE_USECS 200
> +#define HIP04_MIN_TX_COALESCE_USECS 100
> +#define HIP04_MAX_TX_COALESCE_FRAMES 200
> +#define HIP04_MIN_TX_COALESCE_FRAMES 100
> +
> +struct tx_desc {
> + u32 send_addr;
__be32 send_adddr; ?
> + u32 send_size;
__be32
> + u32 next_addr;
__be32
> + u32 cfg;
__be32
> + u32 wb_addr;
__be32 wb_addr ?
> +} __aligned(64);
> +
> +struct rx_desc {
> + u16 reserved_16;
> + u16 pkt_len;
> + u32 reserve1[3];
> + u32 pkt_err;
> + u32 reserve2[4];
> +};
> +
> +struct hip04_priv {
> + void __iomem *base;
> + int phy_mode;
> + int chan;
> + unsigned int port;
> + unsigned int speed;
> + unsigned int duplex;
> + unsigned int reg_inten;
> +
> + struct napi_struct napi;
> + struct net_device *ndev;
> +
> + struct tx_desc *tx_desc;
> + dma_addr_t tx_desc_dma;
> + struct sk_buff *tx_skb[TX_DESC_NUM];
> + dma_addr_t tx_phys[TX_DESC_NUM];
This is not an efficient way to store skb/phys, as for each skb, info
will be store in 2 separate cache lines.
It would be better to use a
struct hip04_tx_desc {
struct sk_buff *skb;
dma_addr_t phys;
}
> + unsigned int tx_head;
> +
> + int tx_coalesce_frames;
> + int tx_coalesce_usecs;
> + struct hrtimer tx_coalesce_timer;
> +
> + unsigned char *rx_buf[RX_DESC_NUM];
> + dma_addr_t rx_phys[RX_DESC_NUM];
Same thing here : Use a struct to get better data locality.
> + unsigned int rx_head;
> + unsigned int rx_buf_size;
> +
> + struct device_node *phy_node;
> + struct phy_device *phy;
> + struct regmap *map;
> + struct work_struct tx_timeout_task;
> +
> + /* written only by tx cleanup */
> + unsigned int tx_tail ____cacheline_aligned_in_smp;
> +};
> +
> +static inline unsigned int tx_count(unsigned int head, unsigned int tail)
> +{
> + return (head - tail) % (TX_DESC_NUM - 1);
> +}
> +
> +static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex)
> +{
> + struct hip04_priv *priv = netdev_priv(ndev);
> + u32 val;
> +
> + priv->speed = speed;
> + priv->duplex = duplex;
> +
> + switch (priv->phy_mode) {
> + case PHY_INTERFACE_MODE_SGMII:
> + if (speed == SPEED_1000)
> + val = SGMII_SPEED_1000;
> + else if (speed == SPEED_100)
> + val = SGMII_SPEED_100;
> + else
> + val = SGMII_SPEED_10;
> + break;
> + case PHY_INTERFACE_MODE_MII:
> + if (speed == SPEED_100)
> + val = MII_SPEED_100;
> + else
> + val = MII_SPEED_10;
> + break;
> + default:
> + netdev_warn(ndev, "not supported mode\n");
> + val = MII_SPEED_10;
> + break;
> + }
> + writel_relaxed(val, priv->base + GE_PORT_MODE);
> +
> + val = duplex ? GE_DUPLEX_FULL : GE_DUPLEX_HALF;
> + writel_relaxed(val, priv->base + GE_DUPLEX_TYPE);
> +
> + val = GE_MODE_CHANGE_EN;
> + writel_relaxed(val, priv->base + GE_MODE_CHANGE_REG);
> +}
> +
> +static void hip04_reset_ppe(struct hip04_priv *priv)
> +{
> + u32 val, tmp, timeout = 0;
> +
> + do {
> + regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
> + regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
> + if (timeout++ > RESET_TIMEOUT)
> + break;
> + } while (val & 0xfff);
> +}
> +
> +static void hip04_config_fifo(struct hip04_priv *priv)
> +{
> + u32 val;
> +
> + val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
> + val |= PPE_CFG_STS_RX_PKT_CNT_RC;
> + writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
> +
> + val = BIT(priv->port);
> + regmap_write(priv->map, priv->port * 4 + PPE_CFG_POOL_GRP, val);
> +
> + val = priv->port << PPE_CFG_QOS_VMID_GRP_SHIFT;
> + val |= PPE_CFG_QOS_VMID_MODE;
> + writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
> +
> + val = RX_BUF_SIZE;
> + regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_BUF_SIZE, val);
> +
> + val = RX_DESC_NUM << PPE_CFG_RX_DEPTH_SHIFT;
> + val |= PPE_CFG_RX_FIFO_FSFU;
> + val |= priv->chan << PPE_CFG_RX_START_SHIFT;
> + regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_FIFO_SIZE, val);
> +
> + val = NET_IP_ALIGN << PPE_CFG_RX_CTRL_ALIGN_SHIFT;
> + writel_relaxed(val, priv->base + PPE_CFG_RX_CTRL_REG);
> +
> + val = PPE_CFG_RX_PKT_ALIGN;
> + writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_MODE_REG);
> +
> + val = PPE_CFG_BUS_LOCAL_REL | PPE_CFG_BUS_BIG_ENDIEN;
> + writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
> +
> + val = GMAC_PPE_RX_PKT_MAX_LEN;
> + writel_relaxed(val, priv->base + PPE_CFG_MAX_FRAME_LEN_REG);
> +
> + val = GMAC_MAX_PKT_LEN;
> + writel_relaxed(val, priv->base + GE_MAX_FRM_SIZE_REG);
> +
> + val = GMAC_MIN_PKT_LEN;
> + writel_relaxed(val, priv->base + GE_SHORT_RUNTS_THR_REG);
> +
> + val = readl_relaxed(priv->base + GE_TRANSMIT_CONTROL_REG);
> + val |= GE_TX_AUTO_NEG | GE_TX_ADD_CRC | GE_TX_SHORT_PAD_THROUGH;
> + writel_relaxed(val, priv->base + GE_TRANSMIT_CONTROL_REG);
> +
> + val = GE_RX_STRIP_CRC;
> + writel_relaxed(val, priv->base + GE_CF_CRC_STRIP_REG);
> +
> + val = readl_relaxed(priv->base + GE_RECV_CONTROL_REG);
> + val |= GE_RX_STRIP_PAD | GE_RX_PAD_EN;
> + writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
> +
> + val = GE_AUTO_NEG_CTL;
> + writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
> +}
> +
> +static void hip04_mac_enable(struct net_device *ndev)
> +{
> + struct hip04_priv *priv = netdev_priv(ndev);
> + u32 val;
> +
> + /* enable tx & rx */
> + val = readl_relaxed(priv->base + GE_PORT_EN);
> + val |= GE_RX_PORT_EN | GE_TX_PORT_EN;
> + writel_relaxed(val, priv->base + GE_PORT_EN);
> +
> + /* clear rx int */
> + val = RCV_INT;
> + writel_relaxed(val, priv->base + PPE_RINT);
> +
> + /* config recv int */
> + val = GE_RX_INT_THRESHOLD | GE_RX_TIMEOUT;
> + writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
> +
> + /* enable interrupt */
> + priv->reg_inten = DEF_INT_MASK;
> + writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
> +}
> +
> +static void hip04_mac_disable(struct net_device *ndev)
> +{
> + struct hip04_priv *priv = netdev_priv(ndev);
> + u32 val;
> +
> + /* disable int */
> + priv->reg_inten &= ~(DEF_INT_MASK);
> + writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
> +
> + /* disable tx & rx */
> + val = readl_relaxed(priv->base + GE_PORT_EN);
> + val &= ~(GE_RX_PORT_EN | GE_TX_PORT_EN);
> + writel_relaxed(val, priv->base + GE_PORT_EN);
> +}
> +
> +static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
> +{
> + writel(phys, priv->base + PPE_CFG_CPU_ADD_ADDR);
> +}
> +
> +static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
> +{
> + regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, phys);
> +}
> +
> +static u32 hip04_recv_cnt(struct hip04_priv *priv)
> +{
> + return readl(priv->base + PPE_HIS_RX_PKT_CNT);
> +}
> +
> +static void hip04_update_mac_address(struct net_device *ndev)
> +{
> + struct hip04_priv *priv = netdev_priv(ndev);
> +
> + writel_relaxed(((ndev->dev_addr[0] << 8) | (ndev->dev_addr[1])),
> + priv->base + GE_STATION_MAC_ADDRESS);
> + writel_relaxed(((ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16) |
> + (ndev->dev_addr[4] << 8) | (ndev->dev_addr[5])),
> + priv->base + GE_STATION_MAC_ADDRESS + 4);
> +}
> +
> +static int hip04_set_mac_address(struct net_device *ndev, void *addr)
> +{
> + eth_mac_addr(ndev, addr);
> + hip04_update_mac_address(ndev);
> + return 0;
> +}
> +
> +static int hip04_tx_reclaim(struct net_device *ndev, bool force)
> +{
> + struct hip04_priv *priv = netdev_priv(ndev);
> + unsigned tx_tail = priv->tx_tail;
> + struct tx_desc *desc;
> + unsigned int bytes_compl = 0, pkts_compl = 0;
> + unsigned int count;
> +
> + smp_rmb();
> + count = tx_count(ACCESS_ONCE(priv->tx_head), tx_tail);
> + if (count == 0)
> + goto out;
> +
> + while (count) {
> + desc = &priv->tx_desc[tx_tail];
> + if (desc->send_addr != 0) {
> + if (force)
> + desc->send_addr = 0;
> + else
> + break;
> + }
> +
> + if (priv->tx_phys[tx_tail]) {
> + dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
> + priv->tx_skb[tx_tail]->len,
> + DMA_TO_DEVICE);
> + priv->tx_phys[tx_tail] = 0;
> + }
> + pkts_compl++;
> + bytes_compl += priv->tx_skb[tx_tail]->len;
> + dev_kfree_skb(priv->tx_skb[tx_tail]);
> + priv->tx_skb[tx_tail] = NULL;
> + tx_tail = TX_NEXT(tx_tail);
> + count--;
> + }
> +
> + priv->tx_tail = tx_tail;
> + smp_wmb(); /* Ensure tx_tail visible to xmit */
> +
> +out:
> + if (pkts_compl || bytes_compl)
Testing bytes_compl should be enough : There is no way pkt_compl could
be 0 if bytes_compl is not 0.
> + netdev_completed_queue(ndev, pkts_compl, bytes_compl);
> +
> + if (unlikely(netif_queue_stopped(ndev)) && (count < (TX_DESC_NUM - 1)))
> + netif_wake_queue(ndev);
> +
> + return count;
> +}
> +
> +static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
> +{
> + struct hip04_priv *priv = netdev_priv(ndev);
> + struct net_device_stats *stats = &ndev->stats;
> + unsigned int tx_head = priv->tx_head, count;
> + struct tx_desc *desc = &priv->tx_desc[tx_head];
> + dma_addr_t phys;
> +
> + smp_rmb();
> + count = tx_count(tx_head, ACCESS_ONCE(priv->tx_tail));
> + if (count == (TX_DESC_NUM - 1)) {
> + netif_stop_queue(ndev);
> + return NETDEV_TX_BUSY;
> + }
> +
> + phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
> + if (dma_mapping_error(&ndev->dev, phys)) {
> + dev_kfree_skb(skb);
> + return NETDEV_TX_OK;
> + }
> +
> + priv->tx_skb[tx_head] = skb;
> + priv->tx_phys[tx_head] = phys;
> + desc->send_addr = cpu_to_be32(phys);
> + desc->send_size = cpu_to_be32(skb->len);
> + desc->cfg = cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV);
> + phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
> + desc->wb_addr = cpu_to_be32(phys);
> + skb_tx_timestamp(skb);
> +
> + hip04_set_xmit_desc(priv, phys);
> + priv->tx_head = TX_NEXT(tx_head);
> + count++;
Starting from this point, skb might already have been freed by TX
completion.
Its racy to access skb->len
> + netdev_sent_queue(ndev, skb->len);
> +
> + stats->tx_bytes += skb->len;
> + stats->tx_packets++;
> +
> + /* Ensure tx_head update visible to tx reclaim */
> + smp_wmb();
> +
> + /* queue is getting full, better start cleaning up now */
> + if (count >= priv->tx_coalesce_frames) {
> + if (napi_schedule_prep(&priv->napi)) {
> + /* disable rx interrupt and timer */
> + priv->reg_inten &= ~(RCV_INT);
> + writel_relaxed(DEF_INT_MASK & ~RCV_INT,
> + priv->base + PPE_INTEN);
> + hrtimer_cancel(&priv->tx_coalesce_timer);
> + __napi_schedule(&priv->napi);
> + }
> + } else if (!hrtimer_is_queued(&priv->tx_coalesce_timer)) {
> + /* cleanup not pending yet, start a new timer */
> + hrtimer_start_expires(&priv->tx_coalesce_timer,
> + HRTIMER_MODE_REL);
> + }
> +
> + return NETDEV_TX_OK;
> +}
> +
> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
> +{
> + struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
> + struct net_device *ndev = priv->ndev;
> + struct net_device_stats *stats = &ndev->stats;
> + unsigned int cnt = hip04_recv_cnt(priv);
> + struct rx_desc *desc;
> + struct sk_buff *skb;
> + unsigned char *buf;
> + bool last = false;
> + dma_addr_t phys;
> + int rx = 0;
> + int tx_remaining;
> + u16 len;
> + u32 err;
> +
> + while (cnt && !last) {
> + buf = priv->rx_buf[priv->rx_head];
> + skb = build_skb(buf, priv->rx_buf_size);
> + if (unlikely(!skb))
> + net_dbg_ratelimited("build_skb failed\n");
Well, is skb is NULL, you're crashing later...
You really have to address a memory allocation error much better than
that !
> +
> + dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
> + RX_BUF_SIZE, DMA_FROM_DEVICE);
> + priv->rx_phys[priv->rx_head] = 0;
> +
> + desc = (struct rx_desc *)skb->data;
> + len = be16_to_cpu(desc->pkt_len);
> + err = be32_to_cpu(desc->pkt_err);
> +
> + if (0 == len) {
> + dev_kfree_skb_any(skb);
> + last = true;
> + } else if ((err & RX_PKT_ERR) || (len >= GMAC_MAX_PKT_LEN)) {
> + dev_kfree_skb_any(skb);
> + stats->rx_dropped++;
> + stats->rx_errors++;
> + } else {
> + skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
> + skb_put(skb, len);
> + skb->protocol = eth_type_trans(skb, ndev);
> + napi_gro_receive(&priv->napi, skb);
> + stats->rx_packets++;
> + stats->rx_bytes += len;
> + rx++;
> + }
> +
> + buf = netdev_alloc_frag(priv->rx_buf_size);
> + if (!buf)
> + goto done;
Same problem here : In case of memory allocation error, your driver is
totally screwed.
> + phys = dma_map_single(&ndev->dev, buf,
> + RX_BUF_SIZE, DMA_FROM_DEVICE);
> + if (dma_mapping_error(&ndev->dev, phys))
> + goto done;
Same problem here : You really have to recover properly.
> + priv->rx_buf[priv->rx_head] = buf;
> + priv->rx_phys[priv->rx_head] = phys;
> + hip04_set_recv_desc(priv, phys);
> +
> + priv->rx_head = RX_NEXT(priv->rx_head);
> + if (rx >= budget)
> + goto done;
> +
> + if (--cnt == 0)
> + cnt = hip04_recv_cnt(priv);
> + }
> +
> + if (!(priv->reg_inten & RCV_INT)) {
> + /* enable rx interrupt */
> + priv->reg_inten |= RCV_INT;
> + writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
> + }
> + napi_complete(napi);
> +done:
> + /* clean up tx descriptors and start a new timer if necessary */
> + tx_remaining = hip04_tx_reclaim(ndev, false);
> + if (rx < budget && tx_remaining)
> + hrtimer_start_expires(&priv->tx_coalesce_timer, HRTIMER_MODE_REL);
> +
> + return rx;
> +}
> +
^ permalink raw reply
* [patch-net-next v2 2/3] net: ethernet: cpsw: split out IRQ handler
From: Felipe Balbi @ 2015-01-14 16:58 UTC (permalink / raw)
To: davem
Cc: Tony Lindgren, Linux OMAP Mailing List, mugunthanvnm, netdev,
Felipe Balbi
In-Reply-To: <1421254729-10602-1-git-send-email-balbi@ti.com>
Now we can introduce dedicated IRQ handlers
for each of the IRQ events. This helps with
cleaning up a little bit of the clutter in
cpsw_interrupt() while also making sure that
TX IRQs will try to handle TX buffers while
RX IRQs will try to handle RX buffers.
Signed-off-by: Felipe Balbi <balbi@ti.com>
---
drivers/net/ethernet/ti/cpsw.c | 41 ++++++++++++++++++++++++++++++-----------
1 file changed, 30 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 8e1af51e4b76..c6c483f3e49f 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -754,18 +754,36 @@ requeue:
dev_kfree_skb_any(new_skb);
}
-static irqreturn_t cpsw_interrupt(int irq, void *dev_id)
+static irqreturn_t cpsw_dummy_interrupt(int irq, void *dev_id)
{
struct cpsw_priv *priv = dev_id;
int value = irq - priv->irqs_table[0];
- /* NOTICE: Ending IRQ here. The trick with the 'value' variable above
- * is to make sure we will always write the correct value to the EOI
- * register. Namely 0 for RX_THRESH Interrupt, 1 for RX Interrupt, 2
- * for TX Interrupt and 3 for MISC Interrupt.
- */
cpdma_ctlr_eoi(priv->dma, value);
+ return IRQ_HANDLED;
+}
+
+static irqreturn_t cpsw_tx_interrupt(int irq, void *dev_id)
+{
+ struct cpsw_priv *priv = dev_id;
+
+ cpdma_ctlr_eoi(priv->dma, CPDMA_EOI_TX);
+ cpdma_chan_process(priv->txch, 128);
+
+ priv = cpsw_get_slave_priv(priv, 1);
+ if (priv)
+ cpdma_chan_process(priv->txch, 128);
+
+ return IRQ_HANDLED;
+}
+
+static irqreturn_t cpsw_rx_interrupt(int irq, void *dev_id)
+{
+ struct cpsw_priv *priv = dev_id;
+
+ cpdma_ctlr_eoi(priv->dma, CPDMA_EOI_RX);
+
cpsw_intr_disable(priv);
if (priv->irq_enabled == true) {
cpsw_disable_irq(priv);
@@ -1617,7 +1635,8 @@ static void cpsw_ndo_poll_controller(struct net_device *ndev)
cpsw_intr_disable(priv);
cpdma_ctlr_int_ctrl(priv->dma, false);
- cpsw_interrupt(ndev->irq, priv);
+ cpsw_rx_interrupt(priv->irq[1], priv);
+ cpsw_tx_interrupt(priv->irq[2], priv);
cpdma_ctlr_int_ctrl(priv->dma, true);
cpsw_intr_enable(priv);
}
@@ -2351,7 +2370,7 @@ static int cpsw_probe(struct platform_device *pdev)
goto clean_ale_ret;
priv->irqs_table[0] = irq;
- ret = devm_request_irq(&pdev->dev, irq, cpsw_interrupt,
+ ret = devm_request_irq(&pdev->dev, irq, cpsw_dummy_interrupt,
0, dev_name(&pdev->dev), priv);
if (ret < 0) {
dev_err(priv->dev, "error attaching irq (%d)\n", ret);
@@ -2363,7 +2382,7 @@ static int cpsw_probe(struct platform_device *pdev)
goto clean_ale_ret;
priv->irqs_table[1] = irq;
- ret = devm_request_irq(&pdev->dev, irq, cpsw_interrupt,
+ ret = devm_request_irq(&pdev->dev, irq, cpsw_rx_interrupt,
0, dev_name(&pdev->dev), priv);
if (ret < 0) {
dev_err(priv->dev, "error attaching irq (%d)\n", ret);
@@ -2375,7 +2394,7 @@ static int cpsw_probe(struct platform_device *pdev)
goto clean_ale_ret;
priv->irqs_table[2] = irq;
- ret = devm_request_irq(&pdev->dev, irq, cpsw_interrupt,
+ ret = devm_request_irq(&pdev->dev, irq, cpsw_tx_interrupt,
0, dev_name(&pdev->dev), priv);
if (ret < 0) {
dev_err(priv->dev, "error attaching irq (%d)\n", ret);
@@ -2387,7 +2406,7 @@ static int cpsw_probe(struct platform_device *pdev)
goto clean_ale_ret;
priv->irqs_table[3] = irq;
- ret = devm_request_irq(&pdev->dev, irq, cpsw_interrupt,
+ ret = devm_request_irq(&pdev->dev, irq, cpsw_dummy_interrupt,
0, dev_name(&pdev->dev), priv);
if (ret < 0) {
dev_err(priv->dev, "error attaching irq (%d)\n", ret);
--
2.2.0
^ permalink raw reply related
* [patch-net-next v2 1/3] net: ethernet: cpsw: unroll IRQ request loop
From: Felipe Balbi @ 2015-01-14 16:58 UTC (permalink / raw)
To: davem
Cc: Tony Lindgren, Linux OMAP Mailing List, mugunthanvnm, netdev,
Felipe Balbi
This patch is in preparation for a nicer IRQ
handling scheme where we use different IRQ
handlers for each IRQ line (as it should be).
Later, we will also drop IRQs offset 0 and 3
because they are always disabled in this driver.
Signed-off-by: Felipe Balbi <balbi@ti.com>
---
drivers/net/ethernet/ti/cpsw.c | 62 ++++++++++++++++++++++++++++++++----------
1 file changed, 47 insertions(+), 15 deletions(-)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index e61ee8351272..8e1af51e4b76 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2156,7 +2156,8 @@ static int cpsw_probe(struct platform_device *pdev)
void __iomem *ss_regs;
struct resource *res, *ss_res;
u32 slave_offset, sliver_offset, slave_size;
- int ret = 0, i, k = 0;
+ int ret = 0, i;
+ int irq;
ndev = alloc_etherdev(sizeof(struct cpsw_priv));
if (!ndev) {
@@ -2345,24 +2346,55 @@ static int cpsw_probe(struct platform_device *pdev)
goto clean_ale_ret;
}
- while ((res = platform_get_resource(priv->pdev, IORESOURCE_IRQ, k))) {
- if (k >= ARRAY_SIZE(priv->irqs_table)) {
- ret = -EINVAL;
- goto clean_ale_ret;
- }
+ irq = platform_get_irq(pdev, 0);
+ if (irq < 0)
+ goto clean_ale_ret;
- ret = devm_request_irq(&pdev->dev, res->start, cpsw_interrupt,
- 0, dev_name(&pdev->dev), priv);
- if (ret < 0) {
- dev_err(priv->dev, "error attaching irq (%d)\n", ret);
- goto clean_ale_ret;
- }
+ priv->irqs_table[0] = irq;
+ ret = devm_request_irq(&pdev->dev, irq, cpsw_interrupt,
+ 0, dev_name(&pdev->dev), priv);
+ if (ret < 0) {
+ dev_err(priv->dev, "error attaching irq (%d)\n", ret);
+ goto clean_ale_ret;
+ }
- priv->irqs_table[k] = res->start;
- k++;
+ irq = platform_get_irq(pdev, 1);
+ if (irq < 0)
+ goto clean_ale_ret;
+
+ priv->irqs_table[1] = irq;
+ ret = devm_request_irq(&pdev->dev, irq, cpsw_interrupt,
+ 0, dev_name(&pdev->dev), priv);
+ if (ret < 0) {
+ dev_err(priv->dev, "error attaching irq (%d)\n", ret);
+ goto clean_ale_ret;
+ }
+
+ irq = platform_get_irq(pdev, 2);
+ if (irq < 0)
+ goto clean_ale_ret;
+
+ priv->irqs_table[2] = irq;
+ ret = devm_request_irq(&pdev->dev, irq, cpsw_interrupt,
+ 0, dev_name(&pdev->dev), priv);
+ if (ret < 0) {
+ dev_err(priv->dev, "error attaching irq (%d)\n", ret);
+ goto clean_ale_ret;
+ }
+
+ irq = platform_get_irq(pdev, 3);
+ if (irq < 0)
+ goto clean_ale_ret;
+
+ priv->irqs_table[3] = irq;
+ ret = devm_request_irq(&pdev->dev, irq, cpsw_interrupt,
+ 0, dev_name(&pdev->dev), priv);
+ if (ret < 0) {
+ dev_err(priv->dev, "error attaching irq (%d)\n", ret);
+ goto clean_ale_ret;
}
- priv->num_irqs = k;
+ priv->num_irqs = 4;
ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
--
2.2.0
^ permalink raw reply related
* [patch-net-next v2 3/3] net: ethernet: cpsw: don't requests IRQs we don't use
From: Felipe Balbi @ 2015-01-14 16:58 UTC (permalink / raw)
To: davem
Cc: Tony Lindgren, Linux OMAP Mailing List, mugunthanvnm, netdev,
Felipe Balbi
In-Reply-To: <1421254729-10602-1-git-send-email-balbi@ti.com>
CPSW never uses RX_THRESHOLD or MISC interrupts. In
fact, they are always kept masked in their appropriate
IRQ Enable register.
Instead of allocating an IRQ that never fires, it's best
to remove that code altogether and let future patches
implement it if anybody needs those.
Signed-off-by: Felipe Balbi <balbi@ti.com>
---
drivers/net/ethernet/ti/cpsw.c | 55 ++++++++++++------------------------------
1 file changed, 15 insertions(+), 40 deletions(-)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index c6c483f3e49f..ba09ff3c1695 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -754,16 +754,6 @@ requeue:
dev_kfree_skb_any(new_skb);
}
-static irqreturn_t cpsw_dummy_interrupt(int irq, void *dev_id)
-{
- struct cpsw_priv *priv = dev_id;
- int value = irq - priv->irqs_table[0];
-
- cpdma_ctlr_eoi(priv->dma, value);
-
- return IRQ_HANDLED;
-}
-
static irqreturn_t cpsw_tx_interrupt(int irq, void *dev_id)
{
struct cpsw_priv *priv = dev_id;
@@ -1635,8 +1625,8 @@ static void cpsw_ndo_poll_controller(struct net_device *ndev)
cpsw_intr_disable(priv);
cpdma_ctlr_int_ctrl(priv->dma, false);
- cpsw_rx_interrupt(priv->irq[1], priv);
- cpsw_tx_interrupt(priv->irq[2], priv);
+ cpsw_rx_interrupt(priv->irq[0], priv);
+ cpsw_tx_interrupt(priv->irq[1], priv);
cpdma_ctlr_int_ctrl(priv->dma, true);
cpsw_intr_enable(priv);
}
@@ -2358,30 +2348,27 @@ static int cpsw_probe(struct platform_device *pdev)
goto clean_dma_ret;
}
- ndev->irq = platform_get_irq(pdev, 0);
+ ndev->irq = platform_get_irq(pdev, 1);
if (ndev->irq < 0) {
dev_err(priv->dev, "error getting irq resource\n");
ret = -ENOENT;
goto clean_ale_ret;
}
- irq = platform_get_irq(pdev, 0);
- if (irq < 0)
- goto clean_ale_ret;
-
- priv->irqs_table[0] = irq;
- ret = devm_request_irq(&pdev->dev, irq, cpsw_dummy_interrupt,
- 0, dev_name(&pdev->dev), priv);
- if (ret < 0) {
- dev_err(priv->dev, "error attaching irq (%d)\n", ret);
- goto clean_ale_ret;
- }
+ /* Grab RX and TX IRQs. Note that we also have RX_THRESHOLD and
+ * MISC IRQs which are always kept disabled with this driver so
+ * we will not request them.
+ *
+ * If anyone wants to implement support for those, make sure to
+ * first request and append them to irqs_table array.
+ */
+ /* RX IRQ */
irq = platform_get_irq(pdev, 1);
if (irq < 0)
goto clean_ale_ret;
- priv->irqs_table[1] = irq;
+ priv->irqs_table[0] = irq;
ret = devm_request_irq(&pdev->dev, irq, cpsw_rx_interrupt,
0, dev_name(&pdev->dev), priv);
if (ret < 0) {
@@ -2389,31 +2376,19 @@ static int cpsw_probe(struct platform_device *pdev)
goto clean_ale_ret;
}
+ /* TX IRQ */
irq = platform_get_irq(pdev, 2);
if (irq < 0)
goto clean_ale_ret;
- priv->irqs_table[2] = irq;
+ priv->irqs_table[1] = irq;
ret = devm_request_irq(&pdev->dev, irq, cpsw_tx_interrupt,
0, dev_name(&pdev->dev), priv);
if (ret < 0) {
dev_err(priv->dev, "error attaching irq (%d)\n", ret);
goto clean_ale_ret;
}
-
- irq = platform_get_irq(pdev, 3);
- if (irq < 0)
- goto clean_ale_ret;
-
- priv->irqs_table[3] = irq;
- ret = devm_request_irq(&pdev->dev, irq, cpsw_dummy_interrupt,
- 0, dev_name(&pdev->dev), priv);
- if (ret < 0) {
- dev_err(priv->dev, "error attaching irq (%d)\n", ret);
- goto clean_ale_ret;
- }
-
- priv->num_irqs = 4;
+ priv->num_irqs = 2;
ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
--
2.2.0
^ permalink raw reply related
* [patch net] team: avoid possible underflow of count_pending value for notify_peers and mcast_rejoin
From: Jiri Pirko @ 2015-01-14 17:15 UTC (permalink / raw)
To: netdev; +Cc: davem, jbenc
This patch is fixing a race condition that may cause setting
count_pending to -1, which results in unwanted big bulk of arp messages
(in case of "notify peers").
Consider following scenario:
count_pending == 2
CPU0 CPU1
team_notify_peers_work
atomic_dec_and_test (dec count_pending to 1)
schedule_delayed_work
team_notify_peers
atomic_add (adding 1 to count_pending)
team_notify_peers_work
atomic_dec_and_test (dec count_pending to 1)
schedule_delayed_work
team_notify_peers_work
atomic_dec_and_test (dec count_pending to 0)
schedule_delayed_work
team_notify_peers_work
atomic_dec_and_test (dec count_pending to -1)
Fix this race by using atomic_dec_if_positive - that will prevent
count_pending running under 0.
Fixes: fc423ff00df3a1955441 ("team: add peer notification")
Fixes: 492b200efdd20b8fcfd ("team: add support for sending multicast rejoins")
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
drivers/net/team/team.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 43bcfff..4b2bfc5 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -622,6 +622,7 @@ static int team_change_mode(struct team *team, const char *kind)
static void team_notify_peers_work(struct work_struct *work)
{
struct team *team;
+ int val;
team = container_of(work, struct team, notify_peers.dw.work);
@@ -629,9 +630,14 @@ static void team_notify_peers_work(struct work_struct *work)
schedule_delayed_work(&team->notify_peers.dw, 0);
return;
}
+ val = atomic_dec_if_positive(&team->notify_peers.count_pending);
+ if (val < 0) {
+ rtnl_unlock();
+ return;
+ }
call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, team->dev);
rtnl_unlock();
- if (!atomic_dec_and_test(&team->notify_peers.count_pending))
+ if (val)
schedule_delayed_work(&team->notify_peers.dw,
msecs_to_jiffies(team->notify_peers.interval));
}
@@ -662,6 +668,7 @@ static void team_notify_peers_fini(struct team *team)
static void team_mcast_rejoin_work(struct work_struct *work)
{
struct team *team;
+ int val;
team = container_of(work, struct team, mcast_rejoin.dw.work);
@@ -669,9 +676,14 @@ static void team_mcast_rejoin_work(struct work_struct *work)
schedule_delayed_work(&team->mcast_rejoin.dw, 0);
return;
}
+ val = atomic_dec_if_positive(&team->mcast_rejoin.count_pending);
+ if (val < 0) {
+ rtnl_unlock();
+ return;
+ }
call_netdevice_notifiers(NETDEV_RESEND_IGMP, team->dev);
rtnl_unlock();
- if (!atomic_dec_and_test(&team->mcast_rejoin.count_pending))
+ if (val)
schedule_delayed_work(&team->mcast_rejoin.dw,
msecs_to_jiffies(team->mcast_rejoin.interval));
}
--
1.9.3
^ permalink raw reply related
* Re: [bisected regression] e1000e: "Detected Hardware Unit Hang"
From: Eric Dumazet @ 2015-01-14 17:20 UTC (permalink / raw)
To: Thomas Jarosch
Cc: 'Linux Netdev List', Eric Dumazet, Jeff Kirsher,
e1000-devel
In-Reply-To: <1719052.SGOfRAJhfQ@storm>
On Wed, 2015-01-14 at 16:32 +0100, Thomas Jarosch wrote:
> Hello,
>
> after updating a good bunch of production level machines
> from kernel 3.4.101 to kernel 3.14.25, a few of them started
> to show serious trouble when there was a lot of network traffic.
>
> ---------------------------------------------------------------
> Jan 14 11:14:57 intrartc kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
> Jan 14 11:14:57 intrartc kernel: TDH <3b>
> Jan 14 11:14:57 intrartc kernel: TDT <76>
> Jan 14 11:14:57 intrartc kernel: next_to_use <76>
> Jan 14 11:14:57 intrartc kernel: next_to_clean <31>
> Jan 14 11:14:57 intrartc kernel: buffer_info[next_to_clean]:
> Jan 14 11:14:57 intrartc kernel: time_stamp <ffff328c>
> Jan 14 11:14:57 intrartc kernel: next_to_watch <3b>
> Jan 14 11:14:57 intrartc kernel: jiffies <ffff33b9>
> Jan 14 11:14:57 intrartc kernel: next_to_watch.status <0>
> Jan 14 11:14:57 intrartc kernel: MAC Status <40080083>
> Jan 14 11:14:57 intrartc kernel: PHY Status <796d>
> Jan 14 11:14:57 intrartc kernel: PHY 1000BASE-T Status <3800>
> Jan 14 11:14:57 intrartc kernel: PHY Extended Status <3000>
> Jan 14 11:14:57 intrartc kernel: PCI Status <10>
> Jan 14 11:14:59 intrartc kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
> ..
> ---------------------------------------------------------------
>
> All of those troubled machines use an Intel DH61CR board and
> are driven by the e1000e driver. Kernels 3.7.0 to 3.19-rc4 are affected.
>
> The problem vanishes when you disable TSO. This is the
> recommended "solution" on serverfault and others.
> http://ehc.ac/p/e1000/bugs/378/
> http://serverfault.com/questions/616485/e1000e-reset-adapter-unexpectedly-detected-hardware-unit-hang
>
> I have a test setup that can trigger the problem within seconds
> and bisected it down to this commit (hi Eric!):
> ---------------------------------------------------------------
> commit 69b08f62e17439ee3d436faf0b9a7ca6fffb78db
> Author: Eric Dumazet <edumazet@google.com>
> Date: Wed Sep 26 06:46:57 2012 +0000
>
> net: use bigger pages in __netdev_alloc_frag
>
> We currently use percpu order-0 pages in __netdev_alloc_frag
> to deliver fragments used by __netdev_alloc_skb()
>
> Depending on NIC driver and arch being 32 or 64 bit, it allows a page to
> be split in several fragments (between 1 and 8), assuming PAGE_SIZE=4096
>
> Switching to bigger pages (32768 bytes for PAGE_SIZE=4096 case) allows :
>
> - Better filling of space (the ending hole overhead is less an issue)
>
> - Less calls to page allocator or accesses to page->_count
>
> - Could allow struct skb_shared_info futures changes without major
> performance impact.
>
> This patch implements a transparent fallback to smaller
> pages in case of memory pressure.
>
> It also uses a standard "struct page_frag" instead of a custom one.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
> Cc: Benjamin LaHaise <bcrl@kvack.org>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---------------------------------------------------------------
>
> Reverting the commit f.e. in kernel 3.7.0 solves the issue.
> I've done some more tests:
>
> 3.18.0 32bit + PAE: broken
> 3.6.0 32bit + PAE: works
> 3.7.0 32bit + PAE: broken
> 3.7.0 32bit + PAE + revert 69b08f62e17439ee3d436faf0b9a7ca6fffb78db -> works
>
> 3.7.0 32bit (without PAE) -> broken
> 3.7.0 32bit + "GFP_COMP" flag removed in __netdev_alloc_frag(): broken
> 3.7.0 32bit + "GFP_COMP" flag replaced with
> "GFP_DMA" in __netdev_alloc_frag(): works!
> 3.7.0 32bit + "GFP_COMP" flag + "GFP_DMA" flag: broken
> 3.19-rc4 32bit: broken
>
>
> The problem is triggered only when the traffic is forwarded to another client.
> (this client is behind NAT). Generating traffic directly
> on the system did not trigger the issue.
>
> To me it looks like Eric's change uncovered a memory allocation
> issue in the e1000e driver: It probably uses a memory address
> unsuitable for DMA or so. This is just a guess though.
>
> Funny fact: I have another Intel DH61CR board that does not show the problem.
> I've borrowed (...) the mainboard from one affected box for my bisect test setup.
>
> Please CC: comments. Thanks.
I would try to use lower data per txd. I am not sure 24KB is really
supported.
( check commit d821a4c4d11ad160925dab2bb009b8444beff484 for details)
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index e14fd85f64eb..8d973f7edfbd 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3897,7 +3897,7 @@ void e1000e_reset(struct e1000_adapter *adapter)
* limit of 24KB due to receive synchronization limitations.
*/
adapter->tx_fifo_limit = min_t(u32, ((er32(PBA) >> 16) << 10) - 96,
- 24 << 10);
+ 8 << 10);
/* Disable Adaptive Interrupt Moderation if 2 full packets cannot
* fit in receive buffer.
^ permalink raw reply related
* [PATCH v3 02/16] virtio/9p: verify device has config space
From: Michael S. Tsirkin @ 2015-01-14 17:27 UTC (permalink / raw)
To: linux-kernel, virtualization
Cc: Eric Van Hensbergen, netdev, v9fs-developer, Ron Minnich,
David S. Miller
In-Reply-To: <1421256142-11512-1-git-send-email-mst@redhat.com>
Some devices might not implement config space access
(e.g. remoteproc used not to - before 3.9).
virtio/9p needs config space access so make it
fail gracefully if not there.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
net/9p/trans_virtio.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index daa749c..d8e376a 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -524,6 +524,12 @@ static int p9_virtio_probe(struct virtio_device *vdev)
int err;
struct virtio_chan *chan;
+ if (!vdev->config->get) {
+ dev_err(&vdev->dev, "%s failure: config access disabled\n",
+ __func__);
+ return -EINVAL;
+ }
+
chan = kmalloc(sizeof(struct virtio_chan), GFP_KERNEL);
if (!chan) {
pr_err("Failed to allocate virtio 9P channel\n");
--
MST
^ permalink raw reply related
* [PATCH v3 05/16] virtio/net: verify device has config space
From: Michael S. Tsirkin @ 2015-01-14 17:27 UTC (permalink / raw)
To: linux-kernel, virtualization
Cc: Rusty Russell, cornelia.huck, virtualization, netdev
In-Reply-To: <1421256142-11512-1-git-send-email-mst@redhat.com>
Some devices might not implement config space access
(e.g. remoteproc used not to - before 3.9).
virtio/net needs config space access so make it
fail gracefully if not there.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
drivers/net/virtio_net.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 5ca9771..9bc1072 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1713,6 +1713,12 @@ static int virtnet_probe(struct virtio_device *vdev)
struct virtnet_info *vi;
u16 max_queue_pairs;
+ if (!vdev->config->get) {
+ dev_err(&vdev->dev, "%s failure: config access disabled\n",
+ __func__);
+ return -EINVAL;
+ }
+
if (!virtnet_validate_features(vdev))
return -EINVAL;
--
MST
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox