* Re: [PATCHv11 net-next 1/2] openvswitch: Refactor ovs_nla_fill_match().
From: Joe Stringer @ 2014-12-03 21:42 UTC (permalink / raw)
To: Pravin Shelar; +Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, netdev, LKML
In-Reply-To: <CALnjE+pNbwtGPsHfTJ0FXxdSpXT=AKgtfHPqT5matCx+PGcXbA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On 3 December 2014 at 11:37, Pravin Shelar <pshelar@nicira.com> wrote:
> On Tue, Dec 2, 2014 at 6:56 PM, Joe Stringer <joestringer@nicira.com> wrote:
>> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
>> index 332b5a0..b2a3796 100644
>> --- a/net/openvswitch/datapath.c
>> +++ b/net/openvswitch/datapath.c
>> @@ -462,10 +462,8 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
>> 0, upcall_info->cmd);
>> upcall->dp_ifindex = dp_ifindex;
>>
>> - nla = nla_nest_start(user_skb, OVS_PACKET_ATTR_KEY);
>> - err = ovs_nla_put_flow(key, key, user_skb);
>> + err = ovs_nla_put_flow(key, key, OVS_PACKET_ATTR_KEY, false, user_skb);
>
> We need different name here, since it does not operate on flow. maybe
> __ovs_nla_put_key(). we can move the function definition to
> flow_netlink.h
OK sure. I'll fix this up for the next version.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
^ permalink raw reply
* Re: [PATCH net-next] net: bcmgenet: add support for Rx priority queues
From: Petri Gynther @ 2014-12-03 21:31 UTC (permalink / raw)
To: Florian Fainelli; +Cc: netdev, David Miller
In-Reply-To: <547E3192.4080400@gmail.com>
Hi Florian,
On Tue, Dec 2, 2014 at 1:39 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> On 02/12/14 13:00, Petri Gynther wrote:
>> bcmgenet hardware supports 16 Rx priority queues + 1 Rx default queue.
>> Currently, the driver only supports the Rx default queue.
>> Add support for the Rx priority queues.
>
> You are doing many things in one patch here, I see at least 3 separate
> commits:
>
> - move TX completion to NAPI
> - introduce a RX ring change that just applies to RX ring 16
> - introduce support for RX rings 0 through 15
>
> Eventually a 4th one which caches the reads and writes to the INTRL2_0
> registers and uses int0_mask, which BTW, I had problems with on GENETv4,
> hence the reason why it is not currently adopted.
>
I'll break this patch into smaller pieces.
> Have you tried the following NAPI/queue partitioning:
>
> - one NAPI context per TX queue, except ring 16
> - one NAPI context per RX queue, except ring 16
> - one shared NAPI context for RX & TX queue 16 (today's scheme)
>
Since my initial goal was to get one Rx priority queue working
together with the Rx default queue, I just added one new NAPI context
for the priority queue Rx/Tx processing.
> The changes are looking good, but since there are many things that
> change, it is harder to review, which is why I would prefer separate
> individual patches.
>
> Thanks!
>
>>
>> Signed-off-by: Petri Gynther <pgynther@google.com>
>> ---
>> drivers/net/ethernet/broadcom/genet/bcmgenet.c | 432 +++++++++++++++----------
>> drivers/net/ethernet/broadcom/genet/bcmgenet.h | 27 +-
>> 2 files changed, 289 insertions(+), 170 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>> index f2fadb0..aced105 100644
>> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>> @@ -53,8 +53,10 @@
>> /* Default highest priority queue for multi queue support */
>> #define GENET_Q0_PRIORITY 0
>>
>> -#define GENET_DEFAULT_BD_CNT \
>> - (TOTAL_DESC - priv->hw_params->tx_queues * priv->hw_params->bds_cnt)
>> +#define GENET_Q16_RX_BD_CNT \
>> + (TOTAL_DESC - priv->hw_params->rx_queues * priv->hw_params->rx_bds_cnt)
>> +#define GENET_Q16_TX_BD_CNT \
>> + (TOTAL_DESC - priv->hw_params->tx_queues * priv->hw_params->tx_bds_cnt)
>>
>> #define RX_BUF_LENGTH 2048
>> #define SKB_ALIGNMENT 32
>> @@ -1313,7 +1315,8 @@ out:
>> }
>>
>>
>> -static int bcmgenet_rx_refill(struct bcmgenet_priv *priv, struct enet_cb *cb)
>> +static int bcmgenet_rx_refill(struct bcmgenet_priv *priv,
>> + struct bcmgenet_rx_ring *ring, struct enet_cb *cb)
>> {
>> struct device *kdev = &priv->pdev->dev;
>> struct sk_buff *skb;
>> @@ -1341,14 +1344,16 @@ static int bcmgenet_rx_refill(struct bcmgenet_priv *priv, struct enet_cb *cb)
>> dma_unmap_addr_set(cb, dma_addr, mapping);
>> /* assign packet, prepare descriptor, and advance pointer */
>>
>> - dmadesc_set_addr(priv, priv->rx_bd_assign_ptr, mapping);
>> + dmadesc_set_addr(priv, ring->bd_assign_ptr, mapping);
>>
>> /* turn on the newly assigned BD for DMA to use */
>> - priv->rx_bd_assign_index++;
>> - priv->rx_bd_assign_index &= (priv->num_rx_bds - 1);
>> + if (likely(ring->bd_assign_idx < ring->end_ptr))
>> + ring->bd_assign_idx++;
>> + else
>> + ring->bd_assign_idx = ring->cb_ptr;
>>
>> - priv->rx_bd_assign_ptr = priv->rx_bds +
>> - (priv->rx_bd_assign_index * DMA_DESC_SIZE);
>> + ring->bd_assign_ptr = priv->rx_bds +
>> + (ring->bd_assign_idx * DMA_DESC_SIZE);
>>
>> return 0;
>> }
>> @@ -1357,8 +1362,10 @@ static int bcmgenet_rx_refill(struct bcmgenet_priv *priv, struct enet_cb *cb)
>> * this could be called from bottom half, or from NAPI polling method.
>> */
>> static unsigned int bcmgenet_desc_rx(struct bcmgenet_priv *priv,
>> - unsigned int budget)
>> + unsigned int index,
>> + struct napi_struct *napi, int budget)
>> {
>> + struct bcmgenet_rx_ring *ring = &priv->rx_rings[index];
>> struct net_device *dev = priv->dev;
>> struct enet_cb *cb;
>> struct sk_buff *skb;
>> @@ -1369,21 +1376,21 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_priv *priv,
>> unsigned int p_index;
>> unsigned int chksum_ok = 0;
>>
>> - p_index = bcmgenet_rdma_ring_readl(priv, DESC_INDEX, RDMA_PROD_INDEX);
>> + p_index = bcmgenet_rdma_ring_readl(priv, index, RDMA_PROD_INDEX);
>> p_index &= DMA_P_INDEX_MASK;
>>
>> - if (p_index < priv->rx_c_index)
>> - rxpkttoprocess = (DMA_C_INDEX_MASK + 1) -
>> - priv->rx_c_index + p_index;
>> + if (likely(p_index >= ring->c_index))
>> + rxpkttoprocess = p_index - ring->c_index;
>> else
>> - rxpkttoprocess = p_index - priv->rx_c_index;
>> + rxpkttoprocess = (DMA_C_INDEX_MASK + 1) -
>> + ring->c_index + p_index;
>>
>> netif_dbg(priv, rx_status, dev,
>> "RDMA: rxpkttoprocess=%d\n", rxpkttoprocess);
>>
>> while ((rxpktprocessed < rxpkttoprocess) &&
>> (rxpktprocessed < budget)) {
>> - cb = &priv->rx_cbs[priv->rx_read_ptr];
>> + cb = &priv->rx_cbs[ring->read_ptr];
>> skb = cb->skb;
>>
>> /* We do not have a backing SKB, so we do not have a
>> @@ -1408,7 +1415,7 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_priv *priv,
>> dma_length_status =
>> dmadesc_get_length_status(priv,
>> priv->rx_bds +
>> - (priv->rx_read_ptr *
>> + (ring->read_ptr *
>> DMA_DESC_SIZE));
>> } else {
>> struct status_64 *status;
>> @@ -1425,8 +1432,8 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_priv *priv,
>>
>> netif_dbg(priv, rx_status, dev,
>> "%s:p_ind=%d c_ind=%d read_ptr=%d len_stat=0x%08x\n",
>> - __func__, p_index, priv->rx_c_index,
>> - priv->rx_read_ptr, dma_length_status);
>> + __func__, p_index, ring->c_index,
>> + ring->read_ptr, dma_length_status);
>>
>> if (unlikely(!(dma_flag & DMA_EOP) || !(dma_flag & DMA_SOP))) {
>> netif_err(priv, rx_status, dev,
>> @@ -1491,28 +1498,34 @@ static unsigned int bcmgenet_desc_rx(struct bcmgenet_priv *priv,
>> dev->stats.multicast++;
>>
>> /* Notify kernel */
>> - napi_gro_receive(&priv->napi, skb);
>> + napi_gro_receive(napi, skb);
>> cb->skb = NULL;
>> netif_dbg(priv, rx_status, dev, "pushed up to kernel\n");
>>
>> /* refill RX path on the current control block */
>> refill:
>> - err = bcmgenet_rx_refill(priv, cb);
>> + err = bcmgenet_rx_refill(priv, ring, cb);
>> if (err) {
>> priv->mib.alloc_rx_buff_failed++;
>> netif_err(priv, rx_err, dev, "Rx refill failed\n");
>> }
>>
>> rxpktprocessed++;
>> - priv->rx_read_ptr++;
>> - priv->rx_read_ptr &= (priv->num_rx_bds - 1);
>> + if (likely(ring->read_ptr < ring->end_ptr))
>> + ring->read_ptr++;
>> + else
>> + ring->read_ptr = ring->cb_ptr;
>> +
>> + ring->c_index = (ring->c_index + 1) & DMA_C_INDEX_MASK;
>> + bcmgenet_rdma_ring_writel(priv, index, ring->c_index, RDMA_CONS_INDEX);
>> }
>>
>> return rxpktprocessed;
>> }
>>
>> /* Assign skb to RX DMA descriptor. */
>> -static int bcmgenet_alloc_rx_buffers(struct bcmgenet_priv *priv)
>> +static int bcmgenet_alloc_rx_buffers(struct bcmgenet_priv *priv,
>> + struct bcmgenet_rx_ring *ring)
>> {
>> struct enet_cb *cb;
>> int ret = 0;
>> @@ -1521,12 +1534,12 @@ static int bcmgenet_alloc_rx_buffers(struct bcmgenet_priv *priv)
>> netif_dbg(priv, hw, priv->dev, "%s:\n", __func__);
>>
>> /* loop here for each buffer needing assign */
>> - for (i = 0; i < priv->num_rx_bds; i++) {
>> - cb = &priv->rx_cbs[priv->rx_bd_assign_index];
>> + for (i = 0; i < ring->size; i++) {
>> + cb = &priv->rx_cbs[ring->bd_assign_idx];
>> if (cb->skb)
>> - continue;
>> + bcmgenet_free_cb(cb);
>>
>> - ret = bcmgenet_rx_refill(priv, cb);
>> + ret = bcmgenet_rx_refill(priv, ring, cb);
>> if (ret)
>> break;
>> }
>> @@ -1607,9 +1620,11 @@ static int reset_umac(struct bcmgenet_priv *priv)
>> static void bcmgenet_intr_disable(struct bcmgenet_priv *priv)
>> {
>> /* Mask all interrupts.*/
>> + priv->int0_mask = 0xFFFFFFFF;
>> bcmgenet_intrl2_0_writel(priv, 0xFFFFFFFF, INTRL2_CPU_MASK_SET);
>> bcmgenet_intrl2_0_writel(priv, 0xFFFFFFFF, INTRL2_CPU_CLEAR);
>> bcmgenet_intrl2_0_writel(priv, 0, INTRL2_CPU_MASK_CLEAR);
>> + priv->int1_mask = 0xFFFFFFFF;
>> bcmgenet_intrl2_1_writel(priv, 0xFFFFFFFF, INTRL2_CPU_MASK_SET);
>> bcmgenet_intrl2_1_writel(priv, 0xFFFFFFFF, INTRL2_CPU_CLEAR);
>> bcmgenet_intrl2_1_writel(priv, 0, INTRL2_CPU_MASK_CLEAR);
>> @@ -1619,7 +1634,8 @@ static int init_umac(struct bcmgenet_priv *priv)
>> {
>> struct device *kdev = &priv->pdev->dev;
>> int ret;
>> - u32 reg, cpu_mask_clear;
>> + u32 reg;
>> + u32 i;
>>
>> dev_dbg(&priv->pdev->dev, "bcmgenet: init_umac\n");
>>
>> @@ -1646,15 +1662,15 @@ static int init_umac(struct bcmgenet_priv *priv)
>>
>> bcmgenet_intr_disable(priv);
>>
>> - cpu_mask_clear = UMAC_IRQ_RXDMA_BDONE;
>> -
>> - dev_dbg(kdev, "%s:Enabling RXDMA_BDONE interrupt\n", __func__);
>> + /* Enable Rx and Tx interrupts for the default queue 16 */
>> + priv->int0_mask &= ~(UMAC_IRQ_RXDMA_BDONE | UMAC_IRQ_RXDMA_PDONE |
>> + UMAC_IRQ_TXDMA_BDONE | UMAC_IRQ_TXDMA_PDONE);
>>
>> /* Monitor cable plug/unplugged event for internal PHY */
>> if (phy_is_internal(priv->phydev)) {
>> - cpu_mask_clear |= (UMAC_IRQ_LINK_DOWN | UMAC_IRQ_LINK_UP);
>> + priv->int0_mask &= ~(UMAC_IRQ_LINK_DOWN | UMAC_IRQ_LINK_UP);
>> } else if (priv->ext_phy) {
>> - cpu_mask_clear |= (UMAC_IRQ_LINK_DOWN | UMAC_IRQ_LINK_UP);
>> + priv->int0_mask &= ~(UMAC_IRQ_LINK_DOWN | UMAC_IRQ_LINK_UP);
>> } else if (priv->phy_interface == PHY_INTERFACE_MODE_MOCA) {
>> reg = bcmgenet_bp_mc_get(priv);
>> reg |= BIT(priv->hw_params->bp_in_en_shift);
>> @@ -1669,9 +1685,18 @@ static int init_umac(struct bcmgenet_priv *priv)
>>
>> /* Enable MDIO interrupts on GENET v3+ */
>> if (priv->hw_params->flags & GENET_HAS_MDIO_INTR)
>> - cpu_mask_clear |= UMAC_IRQ_MDIO_DONE | UMAC_IRQ_MDIO_ERROR;
>> + priv->int0_mask &= ~(UMAC_IRQ_MDIO_DONE | UMAC_IRQ_MDIO_ERROR);
>>
>> - bcmgenet_intrl2_0_writel(priv, cpu_mask_clear, INTRL2_CPU_MASK_CLEAR);
>> + /* Enable Tx priority queue interrupts */
>> + for (i = 0; i < priv->hw_params->tx_queues; i++)
>> + priv->int1_mask &= ~(1 << i);
>> +
>> + /* Enable Rx priority queue interrupts */
>> + for (i = 0; i < priv->hw_params->rx_queues; i++)
>> + priv->int1_mask &= ~(1 << (UMAC_IRQ1_RX_INTR_SHIFT + i));
>> +
>> + bcmgenet_intrl2_0_writel(priv, ~priv->int0_mask, INTRL2_CPU_MASK_CLEAR);
>> + bcmgenet_intrl2_1_writel(priv, ~priv->int1_mask, INTRL2_CPU_MASK_CLEAR);
>>
>> /* Enable rx/tx engine.*/
>> dev_dbg(kdev, "done init umac\n");
>> @@ -1684,12 +1709,11 @@ static int init_umac(struct bcmgenet_priv *priv)
>> */
>> static void bcmgenet_init_tx_ring(struct bcmgenet_priv *priv,
>> unsigned int index, unsigned int size,
>> - unsigned int write_ptr, unsigned int end_ptr)
>> + unsigned int start_ptr, unsigned int end_ptr)
>> {
>> struct bcmgenet_tx_ring *ring = &priv->tx_rings[index];
>> u32 words_per_bd = WORDS_PER_BD(priv);
>> u32 flow_period_val = 0;
>> - unsigned int first_bd;
>>
>> spin_lock_init(&ring->lock);
>> ring->index = index;
>> @@ -1702,12 +1726,12 @@ static void bcmgenet_init_tx_ring(struct bcmgenet_priv *priv,
>> ring->int_enable = bcmgenet_tx_ring_int_enable;
>> ring->int_disable = bcmgenet_tx_ring_int_disable;
>> }
>> - ring->cbs = priv->tx_cbs + write_ptr;
>> + ring->cbs = priv->tx_cbs + start_ptr;
>> ring->size = size;
>> ring->c_index = 0;
>> ring->free_bds = size;
>> - ring->write_ptr = write_ptr;
>> - ring->cb_ptr = write_ptr;
>> + ring->write_ptr = start_ptr;
>> + ring->cb_ptr = start_ptr;
>> ring->end_ptr = end_ptr - 1;
>> ring->prod_index = 0;
>>
>> @@ -1718,22 +1742,16 @@ static void bcmgenet_init_tx_ring(struct bcmgenet_priv *priv,
>> bcmgenet_tdma_ring_writel(priv, index, 0, TDMA_PROD_INDEX);
>> bcmgenet_tdma_ring_writel(priv, index, 0, TDMA_CONS_INDEX);
>> bcmgenet_tdma_ring_writel(priv, index, 1, DMA_MBUF_DONE_THRESH);
>> - /* Disable rate control for now */
>> bcmgenet_tdma_ring_writel(priv, index, flow_period_val,
>> TDMA_FLOW_PERIOD);
>> - /* Unclassified traffic goes to ring 16 */
>> bcmgenet_tdma_ring_writel(priv, index,
>> ((size << DMA_RING_SIZE_SHIFT) |
>> RX_BUF_LENGTH), DMA_RING_BUF_SIZE);
>> -
>> - first_bd = write_ptr;
>> -
>> - /* Set start and end address, read and write pointers */
>> - bcmgenet_tdma_ring_writel(priv, index, first_bd * words_per_bd,
>> + bcmgenet_tdma_ring_writel(priv, index, start_ptr * words_per_bd,
>> DMA_START_ADDR);
>> - bcmgenet_tdma_ring_writel(priv, index, first_bd * words_per_bd,
>> + bcmgenet_tdma_ring_writel(priv, index, start_ptr * words_per_bd,
>> TDMA_READ_PTR);
>> - bcmgenet_tdma_ring_writel(priv, index, first_bd,
>> + bcmgenet_tdma_ring_writel(priv, index, start_ptr * words_per_bd,
>> TDMA_WRITE_PTR);
>> bcmgenet_tdma_ring_writel(priv, index, end_ptr * words_per_bd - 1,
>> DMA_END_ADDR);
>> @@ -1741,42 +1759,44 @@ static void bcmgenet_init_tx_ring(struct bcmgenet_priv *priv,
>>
>> /* Initialize a RDMA ring */
>> static int bcmgenet_init_rx_ring(struct bcmgenet_priv *priv,
>> - unsigned int index, unsigned int size)
>> + unsigned int index, unsigned int size,
>> + unsigned int start_ptr, unsigned int end_ptr)
>> {
>> + struct bcmgenet_rx_ring *ring = &priv->rx_rings[index];
>> u32 words_per_bd = WORDS_PER_BD(priv);
>> int ret;
>>
>> - priv->num_rx_bds = TOTAL_DESC;
>> - priv->rx_bds = priv->base + priv->hw_params->rdma_offset;
>> - priv->rx_bd_assign_ptr = priv->rx_bds;
>> - priv->rx_bd_assign_index = 0;
>> - priv->rx_c_index = 0;
>> - priv->rx_read_ptr = 0;
>> - priv->rx_cbs = kcalloc(priv->num_rx_bds, sizeof(struct enet_cb),
>> - GFP_KERNEL);
>> - if (!priv->rx_cbs)
>> - return -ENOMEM;
>> + ring->index = index;
>> + ring->cbs = priv->rx_cbs + start_ptr;
>> + ring->size = size;
>> + ring->c_index = 0;
>> + ring->read_ptr = start_ptr;
>> + ring->cb_ptr = start_ptr;
>> + ring->end_ptr = end_ptr - 1;
>>
>> - ret = bcmgenet_alloc_rx_buffers(priv);
>> + ret = bcmgenet_alloc_rx_buffers(priv, ring);
>> if (ret) {
>> - kfree(priv->rx_cbs);
>> return ret;
>> }
>>
>> - bcmgenet_rdma_ring_writel(priv, index, 0, RDMA_WRITE_PTR);
>> bcmgenet_rdma_ring_writel(priv, index, 0, RDMA_PROD_INDEX);
>> bcmgenet_rdma_ring_writel(priv, index, 0, RDMA_CONS_INDEX);
>> + bcmgenet_rdma_ring_writel(priv, index, 1, DMA_MBUF_DONE_THRESH);
>> bcmgenet_rdma_ring_writel(priv, index,
>> ((size << DMA_RING_SIZE_SHIFT) |
>> RX_BUF_LENGTH), DMA_RING_BUF_SIZE);
>> - bcmgenet_rdma_ring_writel(priv, index, 0, DMA_START_ADDR);
>> - bcmgenet_rdma_ring_writel(priv, index,
>> - words_per_bd * size - 1, DMA_END_ADDR);
>> bcmgenet_rdma_ring_writel(priv, index,
>> (DMA_FC_THRESH_LO <<
>> DMA_XOFF_THRESHOLD_SHIFT) |
>> DMA_FC_THRESH_HI, RDMA_XON_XOFF_THRESH);
>> - bcmgenet_rdma_ring_writel(priv, index, 0, RDMA_READ_PTR);
>> + bcmgenet_rdma_ring_writel(priv, index, start_ptr * words_per_bd,
>> + DMA_START_ADDR);
>> + bcmgenet_rdma_ring_writel(priv, index, start_ptr * words_per_bd,
>> + RDMA_READ_PTR);
>> + bcmgenet_rdma_ring_writel(priv, index, start_ptr * words_per_bd,
>> + RDMA_WRITE_PTR);
>> + bcmgenet_rdma_ring_writel(priv, index, end_ptr * words_per_bd - 1,
>> + DMA_END_ADDR);
>>
>> return ret;
>> }
>> @@ -1784,75 +1804,113 @@ static int bcmgenet_init_rx_ring(struct bcmgenet_priv *priv,
>> /* init multi xmit queues, only available for GENET2+
>> * the queue is partitioned as follows:
>> *
>> - * queue 0 - 3 is priority based, each one has 32 descriptors,
>> + * queues 0-3 are priority based, each one has 32 descriptors,
>> * with queue 0 being the highest priority queue.
>> *
>> - * queue 16 is the default tx queue with GENET_DEFAULT_BD_CNT
>> + * queue 16 is the default tx queue with GENET_Q16_TX_BD_CNT
>> * descriptors: 256 - (number of tx queues * bds per queues) = 128
>> * descriptors.
>> *
>> * The transmit control block pool is then partitioned as following:
>> - * - tx_cbs[0...127] are for queue 16
>> - * - tx_ring_cbs[0] points to tx_cbs[128..159]
>> - * - tx_ring_cbs[1] points to tx_cbs[160..191]
>> - * - tx_ring_cbs[2] points to tx_cbs[192..223]
>> - * - tx_ring_cbs[3] points to tx_cbs[224..255]
>> + * - tx_ring_cbs[0] points to tx_cbs[0..31]
>> + * - tx_ring_cbs[1] points to tx_cbs[32..63]
>> + * - tx_ring_cbs[2] points to tx_cbs[64..95]
>> + * - tx_ring_cbs[3] points to tx_cbs[96..127]
>> + * - tx ring 16 uses tx_cbs[128..255]
>> */
>> -static void bcmgenet_init_multiq(struct net_device *dev)
>> +static void bcmgenet_init_tx_queues(struct net_device *dev)
>> {
>> struct bcmgenet_priv *priv = netdev_priv(dev);
>> unsigned int i, dma_enable;
>> - u32 reg, dma_ctrl, ring_cfg = 0;
>> + u32 dma_ctrl, ring_cfg;
>> u32 dma_priority[3] = {0, 0, 0};
>>
>> - if (!netif_is_multiqueue(dev)) {
>> - netdev_warn(dev, "called with non multi queue aware HW\n");
>> - return;
>> - }
>> -
>> dma_ctrl = bcmgenet_tdma_readl(priv, DMA_CTRL);
>> dma_enable = dma_ctrl & DMA_EN;
>> dma_ctrl &= ~DMA_EN;
>> bcmgenet_tdma_writel(priv, dma_ctrl, DMA_CTRL);
>>
>> + dma_ctrl = 0;
>> + ring_cfg = 0;
>> +
>> /* Enable strict priority arbiter mode */
>> bcmgenet_tdma_writel(priv, DMA_ARBITER_SP, DMA_ARB_CTRL);
>>
>> + /* Initialize Tx priority queues */
>> for (i = 0; i < priv->hw_params->tx_queues; i++) {
>> - /* first 64 tx_cbs are reserved for default tx queue
>> - * (ring 16)
>> - */
>> - bcmgenet_init_tx_ring(priv, i, priv->hw_params->bds_cnt,
>> - i * priv->hw_params->bds_cnt,
>> - (i + 1) * priv->hw_params->bds_cnt);
>> + bcmgenet_init_tx_ring(priv, i, priv->hw_params->tx_bds_cnt,
>> + i * priv->hw_params->tx_bds_cnt,
>> + (i + 1) * priv->hw_params->tx_bds_cnt);
>>
>> /* Configure ring as descriptor ring and setup priority */
>> - ring_cfg |= 1 << i;
>> - dma_ctrl |= 1 << (i + DMA_RING_BUF_EN_SHIFT);
>> + ring_cfg |= (1 << i);
>> + dma_ctrl |= (1 << (i + DMA_RING_BUF_EN_SHIFT));
>>
>> dma_priority[DMA_PRIO_REG_INDEX(i)] |=
>> ((GENET_Q0_PRIORITY + i) << DMA_PRIO_REG_SHIFT(i));
>> }
>>
>> - /* Set ring 16 priority and program the hardware registers */
>> + /* Initialize Tx default queue 16 */
>> + bcmgenet_init_tx_ring(priv, DESC_INDEX, GENET_Q16_TX_BD_CNT,
>> + priv->hw_params->tx_queues *
>> + priv->hw_params->tx_bds_cnt, TOTAL_DESC);
>> + ring_cfg |= (1 << DESC_INDEX);
>> + dma_ctrl |= (1 << (DESC_INDEX + DMA_RING_BUF_EN_SHIFT));
>> dma_priority[DMA_PRIO_REG_INDEX(DESC_INDEX)] |=
>> ((GENET_Q0_PRIORITY + priv->hw_params->tx_queues) <<
>> DMA_PRIO_REG_SHIFT(DESC_INDEX));
>> +
>> + /* Set Tx ring priorities */
>> bcmgenet_tdma_writel(priv, dma_priority[0], DMA_PRIORITY_0);
>> bcmgenet_tdma_writel(priv, dma_priority[1], DMA_PRIORITY_1);
>> bcmgenet_tdma_writel(priv, dma_priority[2], DMA_PRIORITY_2);
>>
>> /* Enable rings */
>> - reg = bcmgenet_tdma_readl(priv, DMA_RING_CFG);
>> - reg |= ring_cfg;
>> - bcmgenet_tdma_writel(priv, reg, DMA_RING_CFG);
>> + bcmgenet_tdma_writel(priv, ring_cfg, DMA_RING_CFG);
>>
>> /* Configure ring as descriptor ring and re-enable DMA if enabled */
>> - reg = bcmgenet_tdma_readl(priv, DMA_CTRL);
>> - reg |= dma_ctrl;
>> if (dma_enable)
>> - reg |= DMA_EN;
>> - bcmgenet_tdma_writel(priv, reg, DMA_CTRL);
>> + dma_ctrl |= DMA_EN;
>> + bcmgenet_tdma_writel(priv, dma_ctrl, DMA_CTRL);
>> +}
>> +
>> +static void bcmgenet_init_rx_queues(struct net_device *dev)
>> +{
>> + struct bcmgenet_priv *priv = netdev_priv(dev);
>> + unsigned int i, dma_enable;
>> + u32 dma_ctrl, ring_cfg;
>> +
>> + dma_ctrl = bcmgenet_rdma_readl(priv, DMA_CTRL);
>> + dma_enable = dma_ctrl & DMA_EN;
>> + dma_ctrl &= ~DMA_EN;
>> + bcmgenet_rdma_writel(priv, dma_ctrl, DMA_CTRL);
>> +
>> + dma_ctrl = 0;
>> + ring_cfg = 0;
>> +
>> + /* Initialize Rx priority queues */
>> + for (i = 0; i < priv->hw_params->rx_queues; i++) {
>> + bcmgenet_init_rx_ring(priv, i, priv->hw_params->rx_bds_cnt,
>> + i * priv->hw_params->rx_bds_cnt,
>> + (i + 1) * priv->hw_params->rx_bds_cnt);
>> + ring_cfg |= (1 << i);
>> + dma_ctrl |= (1 << (i + DMA_RING_BUF_EN_SHIFT));
>> + }
>> +
>> + /* Initialize Rx default queue 16 */
>> + bcmgenet_init_rx_ring(priv, DESC_INDEX, GENET_Q16_RX_BD_CNT,
>> + priv->hw_params->rx_queues *
>> + priv->hw_params->rx_bds_cnt, TOTAL_DESC);
>> + ring_cfg |= (1 << DESC_INDEX);
>> + dma_ctrl |= (1 << (DESC_INDEX + DMA_RING_BUF_EN_SHIFT));
>> +
>> + /* Enable rings */
>> + bcmgenet_rdma_writel(priv, ring_cfg, DMA_RING_CFG);
>> +
>> + /* Configure ring as descriptor ring and re-enable DMA if enabled */
>> + if (dma_enable)
>> + dma_ctrl |= DMA_EN;
>> + bcmgenet_rdma_writel(priv, dma_ctrl, DMA_CTRL);
>> }
>>
>> static int bcmgenet_dma_teardown(struct bcmgenet_priv *priv)
>> @@ -1928,24 +1986,28 @@ static void bcmgenet_fini_dma(struct bcmgenet_priv *priv)
>> /* init_edma: Initialize DMA control register */
>> static int bcmgenet_init_dma(struct bcmgenet_priv *priv)
>> {
>> - int ret;
>> + netif_dbg(priv, hw, priv->dev, "bcmgenet: init_dma\n");
>>
>> - netif_dbg(priv, hw, priv->dev, "bcmgenet: init_edma\n");
>> + /* init rDma */
>> + bcmgenet_rdma_writel(priv, DMA_MAX_BURST_LENGTH, DMA_SCB_BURST_SIZE);
>>
>> - /* by default, enable ring 16 (descriptor based) */
>> - ret = bcmgenet_init_rx_ring(priv, DESC_INDEX, TOTAL_DESC);
>> - if (ret) {
>> - netdev_err(priv->dev, "failed to initialize RX ring\n");
>> - return ret;
>> + /* init common Rx ring structures */
>> + priv->rx_bds = priv->base + priv->hw_params->rdma_offset;
>> + priv->num_rx_bds = TOTAL_DESC;
>> + priv->rx_cbs = kcalloc(priv->num_rx_bds, sizeof(struct enet_cb),
>> + GFP_KERNEL);
>> + if (!priv->rx_cbs) {
>> + bcmgenet_fini_dma(priv);
>> + return -ENOMEM;
>> }
>>
>> - /* init rDma */
>> - bcmgenet_rdma_writel(priv, DMA_MAX_BURST_LENGTH, DMA_SCB_BURST_SIZE);
>> + /* init Rx queues */
>> + bcmgenet_init_rx_queues(priv->dev);
>>
>> - /* Init tDma */
>> + /* init tDma */
>> bcmgenet_tdma_writel(priv, DMA_MAX_BURST_LENGTH, DMA_SCB_BURST_SIZE);
>>
>> - /* Initialize common TX ring structures */
>> + /* init common Tx ring structures */
>> priv->tx_bds = priv->base + priv->hw_params->tdma_offset;
>> priv->num_tx_bds = TOTAL_DESC;
>> priv->tx_cbs = kcalloc(priv->num_tx_bds, sizeof(struct enet_cb),
>> @@ -1955,38 +2017,75 @@ static int bcmgenet_init_dma(struct bcmgenet_priv *priv)
>> return -ENOMEM;
>> }
>>
>> - /* initialize multi xmit queue */
>> - bcmgenet_init_multiq(priv->dev);
>> -
>> - /* initialize special ring 16 */
>> - bcmgenet_init_tx_ring(priv, DESC_INDEX, GENET_DEFAULT_BD_CNT,
>> - priv->hw_params->tx_queues *
>> - priv->hw_params->bds_cnt,
>> - TOTAL_DESC);
>> + /* init Tx queues */
>> + bcmgenet_init_tx_queues(priv->dev);
>>
>> return 0;
>> }
>>
>> -/* NAPI polling method*/
>> +/* NAPI polling method for Rx and Tx default queues */
>> static int bcmgenet_poll(struct napi_struct *napi, int budget)
>> {
>> - struct bcmgenet_priv *priv = container_of(napi,
>> - struct bcmgenet_priv, napi);
>> - unsigned int work_done;
>> + struct bcmgenet_priv *priv =
>> + container_of(napi, struct bcmgenet_priv, napi);
>> + int work_done = 0;
>>
>> - /* tx reclaim */
>> + /* Tx default queue processing */
>> bcmgenet_tx_reclaim(priv->dev, &priv->tx_rings[DESC_INDEX]);
>>
>> - work_done = bcmgenet_desc_rx(priv, budget);
>> + /* Rx default queue processing */
>> + work_done += bcmgenet_desc_rx(priv, DESC_INDEX, napi, budget);
>> +
>> + if (work_done < budget) {
>> + napi_complete(napi);
>> + bcmgenet_intrl2_0_writel(priv,
>> + UMAC_IRQ_RXDMA_BDONE | UMAC_IRQ_RXDMA_PDONE |
>> + UMAC_IRQ_TXDMA_BDONE | UMAC_IRQ_TXDMA_PDONE,
>> + INTRL2_CPU_MASK_CLEAR);
>> + }
>> +
>> + return work_done;
>> +}
>> +
>> +/* NAPI polling method for Rx and Tx priority queues */
>> +static int bcmgenet_poll_priority(struct napi_struct *napi, int budget)
>> +{
>> + struct bcmgenet_priv *priv =
>> + container_of(napi, struct bcmgenet_priv, napi_priority);
>> + int work_done = 0;
>> + unsigned int index;
>> + unsigned int active_rings;
>> +
>> + priv->irq1_stat |= (bcmgenet_intrl2_1_readl(priv, INTRL2_CPU_STAT) &
>> + ~priv->int1_mask);
>> +
>> + /* Tx priority queue processing */
>> + index = 0;
>> + active_rings = priv->irq1_stat & UMAC_IRQ1_TX_INTR_MASK;
>> + while (active_rings) {
>> + if (active_rings & 0x1)
>> + bcmgenet_tx_reclaim(priv->dev, &priv->tx_rings[index]);
>> + active_rings >>= 1;
>> + index++;
>> + }
>> +
>> + /* Rx priority queue processing */
>> + index = 0;
>> + active_rings = (priv->irq1_stat >> UMAC_IRQ1_RX_INTR_SHIFT) &
>> + UMAC_IRQ1_RX_INTR_MASK;
>> + while (active_rings && work_done < budget) {
>> + if (active_rings & 0x1)
>> + work_done += bcmgenet_desc_rx(priv, index, napi,
>> + budget - work_done);
>> + active_rings >>= 1;
>> + index++;
>> + }
>> +
>> + priv->irq1_stat = 0;
>>
>> - /* Advancing our consumer index*/
>> - priv->rx_c_index += work_done;
>> - priv->rx_c_index &= DMA_C_INDEX_MASK;
>> - bcmgenet_rdma_ring_writel(priv, DESC_INDEX,
>> - priv->rx_c_index, RDMA_CONS_INDEX);
>> if (work_done < budget) {
>> napi_complete(napi);
>> - bcmgenet_intrl2_0_writel(priv, UMAC_IRQ_RXDMA_BDONE,
>> + bcmgenet_intrl2_1_writel(priv, ~priv->int1_mask,
>> INTRL2_CPU_MASK_CLEAR);
>> }
>>
>> @@ -2017,36 +2116,34 @@ static void bcmgenet_irq_task(struct work_struct *work)
>> }
>> }
>>
>> -/* bcmgenet_isr1: interrupt handler for ring buffer. */
>> +/* bcmgenet_isr1: handle Rx and Tx priority queues */
>> static irqreturn_t bcmgenet_isr1(int irq, void *dev_id)
>> {
>> struct bcmgenet_priv *priv = dev_id;
>> - unsigned int index;
>>
>> /* Save irq status for bottom-half processing. */
>> priv->irq1_stat =
>> bcmgenet_intrl2_1_readl(priv, INTRL2_CPU_STAT) &
>> ~priv->int1_mask;
>> +
>> /* clear interrupts */
>> bcmgenet_intrl2_1_writel(priv, priv->irq1_stat, INTRL2_CPU_CLEAR);
>>
>> netif_dbg(priv, intr, priv->dev,
>> "%s: IRQ=0x%x\n", __func__, priv->irq1_stat);
>> - /* Check the MBDONE interrupts.
>> - * packet is done, reclaim descriptors
>> - */
>> - if (priv->irq1_stat & 0x0000ffff) {
>> - index = 0;
>> - for (index = 0; index < 16; index++) {
>> - if (priv->irq1_stat & (1 << index))
>> - bcmgenet_tx_reclaim(priv->dev,
>> - &priv->tx_rings[index]);
>> +
>> + if (priv->irq1_stat) {
>> + if (likely(napi_schedule_prep(&priv->napi_priority))) {
>> + bcmgenet_intrl2_1_writel(priv, ~priv->int1_mask,
>> + INTRL2_CPU_MASK_SET);
>> + __napi_schedule(&priv->napi_priority);
>> }
>> }
>> +
>> return IRQ_HANDLED;
>> }
>>
>> -/* bcmgenet_isr0: Handle various interrupts. */
>> +/* bcmgenet_isr0: handle Rx and Tx default queues + other stuff */
>> static irqreturn_t bcmgenet_isr0(int irq, void *dev_id)
>> {
>> struct bcmgenet_priv *priv = dev_id;
>> @@ -2054,29 +2151,25 @@ static irqreturn_t bcmgenet_isr0(int irq, void *dev_id)
>> /* Save irq status for bottom-half processing. */
>> priv->irq0_stat =
>> bcmgenet_intrl2_0_readl(priv, INTRL2_CPU_STAT) &
>> - ~bcmgenet_intrl2_0_readl(priv, INTRL2_CPU_MASK_STATUS);
>> + ~priv->int0_mask;
>> +
>> /* clear interrupts */
>> bcmgenet_intrl2_0_writel(priv, priv->irq0_stat, INTRL2_CPU_CLEAR);
>>
>> netif_dbg(priv, intr, priv->dev,
>> "IRQ=0x%x\n", priv->irq0_stat);
>>
>> - if (priv->irq0_stat & (UMAC_IRQ_RXDMA_BDONE | UMAC_IRQ_RXDMA_PDONE)) {
>> - /* We use NAPI(software interrupt throttling, if
>> - * Rx Descriptor throttling is not used.
>> - * Disable interrupt, will be enabled in the poll method.
>> - */
>> + if (priv->irq0_stat & (UMAC_IRQ_RXDMA_BDONE | UMAC_IRQ_RXDMA_PDONE |
>> + UMAC_IRQ_TXDMA_BDONE | UMAC_IRQ_TXDMA_PDONE)) {
>> if (likely(napi_schedule_prep(&priv->napi))) {
>> - bcmgenet_intrl2_0_writel(priv, UMAC_IRQ_RXDMA_BDONE,
>> - INTRL2_CPU_MASK_SET);
>> + bcmgenet_intrl2_0_writel(priv,
>> + UMAC_IRQ_RXDMA_BDONE | UMAC_IRQ_RXDMA_PDONE |
>> + UMAC_IRQ_TXDMA_BDONE | UMAC_IRQ_TXDMA_PDONE,
>> + INTRL2_CPU_MASK_SET);
>> __napi_schedule(&priv->napi);
>> }
>> }
>> - if (priv->irq0_stat &
>> - (UMAC_IRQ_TXDMA_BDONE | UMAC_IRQ_TXDMA_PDONE)) {
>> - /* Tx reclaim */
>> - bcmgenet_tx_reclaim(priv->dev, &priv->tx_rings[DESC_INDEX]);
>> - }
>> +
>> if (priv->irq0_stat & (UMAC_IRQ_PHY_DET_R |
>> UMAC_IRQ_PHY_DET_F |
>> UMAC_IRQ_LINK_UP |
>> @@ -2170,6 +2263,7 @@ static void bcmgenet_netif_start(struct net_device *dev)
>>
>> /* Start the network engine */
>> napi_enable(&priv->napi);
>> + napi_enable(&priv->napi_priority);
>>
>> umac_enable_set(priv, CMD_TX_EN | CMD_RX_EN, true);
>>
>> @@ -2269,6 +2363,7 @@ static void bcmgenet_netif_stop(struct net_device *dev)
>>
>> netif_tx_stop_all_queues(dev);
>> napi_disable(&priv->napi);
>> + napi_disable(&priv->napi_priority);
>> phy_stop(priv->phydev);
>>
>> bcmgenet_intr_disable(priv);
>> @@ -2436,8 +2531,9 @@ static const struct net_device_ops bcmgenet_netdev_ops = {
>> static struct bcmgenet_hw_params bcmgenet_hw_params[] = {
>> [GENET_V1] = {
>> .tx_queues = 0,
>> + .tx_bds_cnt = 0,
>> .rx_queues = 0,
>> - .bds_cnt = 0,
>> + .rx_bds_cnt = 0,
>> .bp_in_en_shift = 16,
>> .bp_in_mask = 0xffff,
>> .hfb_filter_cnt = 16,
>> @@ -2449,8 +2545,9 @@ static struct bcmgenet_hw_params bcmgenet_hw_params[] = {
>> },
>> [GENET_V2] = {
>> .tx_queues = 4,
>> - .rx_queues = 4,
>> - .bds_cnt = 32,
>> + .tx_bds_cnt = 32,
>> + .rx_queues = 0,
>> + .rx_bds_cnt = 0,
>> .bp_in_en_shift = 16,
>> .bp_in_mask = 0xffff,
>> .hfb_filter_cnt = 16,
>> @@ -2465,8 +2562,9 @@ static struct bcmgenet_hw_params bcmgenet_hw_params[] = {
>> },
>> [GENET_V3] = {
>> .tx_queues = 4,
>> - .rx_queues = 4,
>> - .bds_cnt = 32,
>> + .tx_bds_cnt = 32,
>> + .rx_queues = 0,
>> + .rx_bds_cnt = 0,
>> .bp_in_en_shift = 17,
>> .bp_in_mask = 0x1ffff,
>> .hfb_filter_cnt = 48,
>> @@ -2481,8 +2579,9 @@ static struct bcmgenet_hw_params bcmgenet_hw_params[] = {
>> },
>> [GENET_V4] = {
>> .tx_queues = 4,
>> - .rx_queues = 4,
>> - .bds_cnt = 32,
>> + .tx_bds_cnt = 32,
>> + .rx_queues = 0,
>> + .rx_bds_cnt = 0,
>> .bp_in_en_shift = 17,
>> .bp_in_mask = 0x1ffff,
>> .hfb_filter_cnt = 48,
>> @@ -2560,14 +2659,15 @@ static void bcmgenet_set_hw_params(struct bcmgenet_priv *priv)
>> #endif
>>
>> pr_debug("Configuration for version: %d\n"
>> - "TXq: %1d, RXq: %1d, BDs: %1d\n"
>> + "TXq: %1d, TXBDs: %1d, RXq: %1d, RXBDs: %1d\n"
>> "BP << en: %2d, BP msk: 0x%05x\n"
>> "HFB count: %2d, QTAQ msk: 0x%05x\n"
>> "TBUF: 0x%04x, HFB: 0x%04x, HFBreg: 0x%04x\n"
>> "RDMA: 0x%05x, TDMA: 0x%05x\n"
>> "Words/BD: %d\n",
>> priv->version,
>> - params->tx_queues, params->rx_queues, params->bds_cnt,
>> + params->tx_queues, params->tx_bds_cnt,
>> + params->rx_queues, params->rx_bds_cnt,
>> params->bp_in_en_shift, params->bp_in_mask,
>> params->hfb_filter_cnt, params->qtag_mask,
>> params->tbuf_offset, params->hfb_offset,
>> @@ -2594,8 +2694,9 @@ static int bcmgenet_probe(struct platform_device *pdev)
>> struct resource *r;
>> int err = -EIO;
>>
>> - /* Up to GENET_MAX_MQ_CNT + 1 TX queues and a single RX queue */
>> - dev = alloc_etherdev_mqs(sizeof(*priv), GENET_MAX_MQ_CNT + 1, 1);
>> + /* Up to GENET_MAX_MQ_CNT + 1 TX queues and RX queues */
>> + dev = alloc_etherdev_mqs(sizeof(*priv), GENET_MAX_MQ_CNT + 1,
>> + GENET_MAX_MQ_CNT + 1);
>> if (!dev) {
>> dev_err(&pdev->dev, "can't allocate net device\n");
>> return -ENOMEM;
>> @@ -2635,7 +2736,8 @@ static int bcmgenet_probe(struct platform_device *pdev)
>> dev->watchdog_timeo = 2 * HZ;
>> dev->ethtool_ops = &bcmgenet_ethtool_ops;
>> dev->netdev_ops = &bcmgenet_netdev_ops;
>> - netif_napi_add(dev, &priv->napi, bcmgenet_poll, 64);
>> + netif_napi_add(dev, &priv->napi, bcmgenet_poll, 16);
>> + netif_napi_add(dev, &priv->napi_priority, bcmgenet_poll_priority, 64);
>>
>> priv->msg_enable = netif_msg_init(-1, GENET_MSG_DEFAULT);
>>
>> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
>> index b36ddec..80d9715 100644
>> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h
>> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h
>> @@ -310,6 +310,11 @@ struct bcmgenet_mib_counters {
>> #define UMAC_IRQ_MDIO_DONE (1 << 23)
>> #define UMAC_IRQ_MDIO_ERROR (1 << 24)
>>
>> +/* INTRL2 instance 1 definitions */
>> +#define UMAC_IRQ1_TX_INTR_MASK 0xFFFF
>> +#define UMAC_IRQ1_RX_INTR_MASK 0xFFFF
>> +#define UMAC_IRQ1_RX_INTR_SHIFT 16
>> +
>> /* Register block offsets */
>> #define GENET_SYS_OFF 0x0000
>> #define GENET_GR_BRIDGE_OFF 0x0040
>> @@ -503,8 +508,9 @@ enum bcmgenet_version {
>> */
>> struct bcmgenet_hw_params {
>> u8 tx_queues;
>> + u8 tx_bds_cnt;
>> u8 rx_queues;
>> - u8 bds_cnt;
>> + u8 rx_bds_cnt;
>> u8 bp_in_en_shift;
>> u32 bp_in_mask;
>> u8 hfb_filter_cnt;
>> @@ -536,6 +542,18 @@ struct bcmgenet_tx_ring {
>> struct bcmgenet_tx_ring *);
>> };
>>
>> +struct bcmgenet_rx_ring {
>> + unsigned int index; /* Rx ring index */
>> + struct enet_cb *cbs; /* Rx ring buffer control block */
>> + unsigned int size; /* Rx ring size */
>> + unsigned int c_index; /* Rx last consumer index */
>> + unsigned int read_ptr; /* Rx ring read pointer */
>> + unsigned int cb_ptr; /* Rx ring initial CB ptr */
>> + unsigned int end_ptr; /* Rx ring end CB ptr */
>> + void __iomem *bd_assign_ptr; /* Rx ring refill ptr */
>> + unsigned int bd_assign_idx; /* Rx ring refill index */
>> +};
>> +
>> /* device context */
>> struct bcmgenet_priv {
>> void __iomem *base;
>> @@ -546,6 +564,7 @@ struct bcmgenet_priv {
>>
>> /* NAPI for descriptor based rx */
>> struct napi_struct napi ____cacheline_aligned;
>> + struct napi_struct napi_priority ____cacheline_aligned;
>>
>> /* transmit variables */
>> void __iomem *tx_bds;
>> @@ -556,13 +575,11 @@ struct bcmgenet_priv {
>>
>> /* receive variables */
>> void __iomem *rx_bds;
>> - void __iomem *rx_bd_assign_ptr;
>> - int rx_bd_assign_index;
>> struct enet_cb *rx_cbs;
>> unsigned int num_rx_bds;
>> unsigned int rx_buf_len;
>> - unsigned int rx_read_ptr;
>> - unsigned int rx_c_index;
>> +
>> + struct bcmgenet_rx_ring rx_rings[DESC_INDEX + 1];
>>
>> /* other misc variables */
>> struct bcmgenet_hw_params *hw_params;
>>
>
^ permalink raw reply
* Re: How do I update Ericsson F5521gw firmware from Linux? / Ericsson F5521gw Random Disconnect Issue
From: Dan Williams @ 2014-12-03 21:29 UTC (permalink / raw)
To: Richard Yao; +Cc: netdev
In-Reply-To: <20141203105841.GF32286@woodpecker.gentoo.org>
On Wed, 2014-12-03 at 10:58 +0000, Richard Yao wrote:
> I purchased an Ericsson F5521gw (Lenovo part 60Y3279) so that my Lenovo T520
> could connect to China Unicom for internet access during a stay in China.
> Unfortunately, it tends to fail every 4 to 8 hours with the following printed
> to the system log:
Probably a better discussion for the ModemManager list (since that's
what you're using) but here goes...
As far as I know, there is no way to do this from Linux unless you
install Windows into a VM and use USB passthrough to allow the VM direct
access to the Ericsson device's USB interfaces.
I'm not sure what version of NetworkManager you're running, but NM
0.9.10+ have WWAN autoconnect support which will reconnect periodically
on failure. That coupled with setting your openvpn VPN connection as a
"secondary connection" to the WWAN should ensure that the VPN is always
up when the WWAN is up. That would be a good workaround if you cannot
find a way to update the firmware.
nmcli con mod "China Unicom" connection.autoconnect yes
nmcli con
<find your VPN connection's name or UUID>
nmcli con mod "China Unicom" +connection.secondaries <VPN name or UUID>
(this does require that your VPN passwords be stored
in /etc/NetworkManager/system-connections/, but rest assured they are
only accessible by root)
Dan
> Dec 3 04:28:27 t520 kernel: [85827.909187] cdc_ncm 2-1.4:1.6 wwan0: network connection: disconnected
> Dec 3 04:28:27 t520 NetworkManager[4134]: <info> (ttyACM1): modem state changed, 'connected' --> 'registered' (reason: user-requested)
> Dec 3 04:28:27 t520 NetworkManager[4134]: <info> (ttyACM1): device state change: activated -> failed (reason 'modem-no-carrier') [100 120 25]
> Dec 3 04:28:27 t520 NetworkManager[4134]: <info> NetworkManager state is now CONNECTED_LOCAL
> Dec 3 04:28:27 t520 NetworkManager[4134]: <info> NetworkManager state is now CONNECTED_GLOBAL
> Dec 3 04:28:27 t520 NetworkManager[4134]: <info> Policy set 'tun0' (tun0) as default for IPv4 routing and DNS.
> Dec 3 04:28:27 t520 dbus[3912]: [system] Activating service name='org.freedesktop.nm_dispatcher' (using servicehelper)
> Dec 3 04:28:27 t520 nm-openvpn[4200]: MANAGEMENT: Client disconnected
> Dec 3 04:28:27 t520 nm-openvpn[4200]: SIGTERM received, sending exit notification to peer
> Dec 3 04:28:27 t520 NetworkManager[4134]: <info> (tun0): link disconnected (deferring action for 4 seconds)
> Dec 3 04:28:27 t520 NetworkManager[4134]: <error> [1417598907.695056] [platform/nm-linux-platform.c:1714] add_object(): Netlink error adding 0.0.0.0/0 via 10.8.0.5 dev tun0 metric 1024 mss 0 src user: Unspecific failure
> Dec 3 04:28:27 t520 NetworkManager[4134]: <error> [1417598907.695124] [platform/nm-linux-platform.c:1714] add_object(): Netlink error adding 10.8.0.5/32 via 0.0.0.0 dev tun0 metric 1024 mss 0 src user: Unspecific failure
> Dec 3 04:28:27 t520 NetworkManager[4134]: <error> [1417598907.695165] [platform/nm-linux-platform.c:1714] add_object(): Netlink error adding 0.0.0.0/0 via 10.8.0.5 dev tun0 metric 1024 mss 0 src user: Unspecific failure
> tail: /var/log/messages: file truncated
> Dec 3 04:28:27 t520 NetworkManager[4134]: <error> [1417598907.695165] [platform/nm-linux-platform.c:1714] add_object(): Netlink error adding 0.0.0.0/0 via 10.8.0.5 dev tun0 metric 1024 mss 0 src user: Unspecific failure
> Dec 3 04:28:27 t520 NetworkManager[4134]: <error> [1417598907.695187] [nm-policy.c:693] update_ip4_routing(): Failed to set default route.
> Dec 3 04:28:27 t520 NetworkManager[4134]: <warn> Activation (ttyACM1) failed for connection 'China Unicom'
> Dec 3 04:28:27 t520 NetworkManager[4134]: <info> (ttyACM1): device state change: failed -> disconnected (reason 'none') [120 30 0]
> Dec 3 04:28:27 t520 NetworkManager[4134]: <info> (ttyACM1): deactivating device (reason 'none') [0]
> Dec 3 04:28:27 t520 dbus[3912]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
> Dec 3 04:28:27 t520 nm-dispatcher: Dispatching action 'vpn-down' for tun0
> Dec 3 04:28:27 t520 nm-dispatcher: Dispatching action 'down' for wwan0
>
> Here is the modem's description of itself:
>
> mmcli -m 0 --command='AT*EEVINFO=99'
> response: '*EEVINFO:
> Model.................... F5521gw
> IMEI Data................ <REDACTED>
> SVN...................... 05
> Serial Number............ <REDACTED>
> Product Number........... KRD 131 18/221
> Revision................. R1C
> FW Product............... CXP 901 7640/1
> FW Version............... R2A07
> FW Build Date/Time....... 2010-12-03/12:17
> Cust. Product............ CXC 173 0424/22
> Cust. Version............ R1B02
> Customization Descr...... Lenovo
> Format................... 1
> Base Product Number...... 1/KRD 131 18/1
> Base Product Revision.... R1N
> SIMLock Deployment....... 0.0
> SIMLock Description...... Unlocked
> SIMLock Product.......... CXC 173 0839/01
> SIMLock Revision......... R1F
> Model Description........ F5521gw Mobile Broadband Module
> Vendor Name.............. Lenovo
> Config. Set Product...... CXP 901 7629/1
> Config. Set Revision..... R3A02
> Network Customization.... Default;46001
> Customization State...... 0
> Configuration Product.... CXP 901 7640/1
> Configuration Revision... R2A07
> Protocol FW Product...... CXC 173 0063/1
> Protocol FW Version...... R2A07
> Application FW Product... CXC 173 0064/1
> Application FW Version... R2A07
> Network List Product..... CXC 173 1116/1
> Network List Revision.... R1A
> Individualization........ 189.191
> Domain................... 3.3
> Upgrade State............ 1
> Volume info.............. 66 MB total / 43.9% free'
>
> Posts on the Lenovo forums suggest that this can be resolved by updating the
> firmware:
>
> http://forums.lenovo.com/t5/X-Series-Tablet-ThinkPad-Laptops/Ericsson-F5521gw-WWAN-disconnects-intermittently-and-cannot/td-p/565597
>
> Unfortunately, the official firmware updater only runs on Windows and I am
> unable to find a way to update the firmware from Linux. I also cannot find any
> hardware documentation. Does anyone have any suggestions?
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [patch net-next 3/6] net_sched: cls_bpf: remove faulty use of list_for_each_entry_rcu
From: Jamal Hadi Salim @ 2014-12-03 21:15 UTC (permalink / raw)
To: Jiri Pirko; +Cc: netdev, davem
In-Reply-To: <20141203152051.GK1860@nanopsycho.orion>
On 12/03/14 10:20, Jiri Pirko wrote:
> Wed, Dec 03, 2014 at 03:37:46PM CET, jhs@mojatatu.com wrote:
>
> Yep, but this is updater, protected by rtnl. _rcu list travelsal variant
> should be used by reader only (classify callback in cls case).
>
> get op is only called from tc_ctl_tfilter which is always called with
> rtnl held.
>
I am not an rcu officionado. So if the control path is doing a non-rcu
get + rcu-del/change (update) then as long as the fastpath is (read)
rcu locking we are fine and nothing will actually happen until the
fastpath releases and rcu grace period ends, correct?
In which case please accept my:
ACKed-by: Jamal Hadi Salim <jhs@mojatatu.com>
cheers,
jamal
^ permalink raw reply
* Re: [PATCHv11 net-next 2/2] openvswitch: Add support for unique flow IDs.
From: Joe Stringer @ 2014-12-03 21:02 UTC (permalink / raw)
To: Pravin Shelar; +Cc: Linux Kernel, dev@openvswitch.org, Linux Netdev List
In-Reply-To: <CALnjE+qfdFu4Vv3RaUmOtXuvrY+_iCFLNDXhFu2+zDHBJx0L0A@mail.gmail.com>
On 3 December 2014 at 11:45, Pravin Shelar <pshelar@nicira.com> wrote:
> On Wed, Dec 3, 2014 at 10:47 AM, Joe Stringer <joestringer@nicira.com> wrote:
>> I forgot to mention that this is the first post based against net-next.
>>
>> On 2 December 2014 at 18:56, Joe Stringer <joestringer@nicira.com> wrote:
>>> <....snip...>
>>> diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
>>> index a8b30f3..7f31dbf 100644
>>> --- a/net/openvswitch/flow.h
>>> +++ b/net/openvswitch/flow.h
>>> @@ -197,6 +197,13 @@ struct sw_flow_match {
>>> struct sw_flow_mask *mask;
>>> };
>>>
>>> +#define MAX_UFID_LENGTH 256
>>> +
>>> +struct sw_flow_id {
>>> + u32 ufid_len;
>>> + u32 ufid[MAX_UFID_LENGTH / 4];
>>> +};
>>> +
>>> struct sw_flow_actions {
>>> struct rcu_head rcu;
>>> u32 actions_len;
>>
>> Pravin, I changed the 'struct sw_flow_id' to the above after feedback
>> from the previous round, but it doesn't seem quite right. Is this what
>> you meant? Given that current ovs-vswitchd userspace only generates
>> 128bit UFIDs, it seems wasteful to be allocating so much for this. Did
>> you have in mind for this to be united with the unmasked_key?
>
> I am fine with 128bits of ufid, we can extend it later if we need it.
> But what do you mean by united unmasked key? Can you define the struct
> here?
What I mean, is that we could define sw_flow_id as:
+struct sw_flow_id {
+ u32 ufid_len;
+ u32 ufid[];
+};
...and just allocate sizeof(struct sw_flow_id) + nla_len(ufid_nlattr),
rather than always allocating a 260B structure.
^ permalink raw reply
* Re: [PATCH v2 2/6] ethernet/intel: Use eth_skb_pad and skb_put_padto helpers
From: Jeff Kirsher @ 2014-12-03 19:25 UTC (permalink / raw)
To: Alexander Duyck; +Cc: netdev, davem
In-Reply-To: <20141203161739.9223.52060.stgit@ahduyck-vm-fedora20>
[-- Attachment #1: Type: text/plain, Size: 1276 bytes --]
On Wed, 2014-12-03 at 08:17 -0800, Alexander Duyck wrote:
> Update the Intel Ethernet drivers to use eth_skb_pad() and
> skb_put_padto
> instead of doing their own implementations of the function.
>
> Also this cleans up two other spots where skb_pad was called but the
> length
> and tail pointers were being manipulated directly instead of just
> having
> the padding length added via __skb_put.
>
> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
> ---
> drivers/net/ethernet/intel/e1000/e1000_main.c | 8 ++------
> drivers/net/ethernet/intel/e1000e/netdev.c | 8 ++------
> drivers/net/ethernet/intel/fm10k/fm10k_main.c | 11 +++--------
> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 8 ++------
> drivers/net/ethernet/intel/igb/igb_main.c | 19
> +++++--------------
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 19
> +++++--------------
> drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 11 +++--------
> 7 files changed, 22 insertions(+), 62 deletions(-)
Since this is a part of a series, no need to break it up, by having me
only pull in this one patch.
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* pull request: wireless 2014-12-03
From: John W. Linville @ 2014-12-03 20:34 UTC (permalink / raw)
To: davem; +Cc: linux-wireless, netdev, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 5864 bytes --]
Dave,
One last(?) batch of fixes hoping to make 3.18...
In this episode, we have another trio of rtlwifi fixes
repairing a little more damage from the major update of the
rtlwifi-family of drivers. These editing mistakes caused some
memory corruption and missed a flag critical to proper interrupt
handling. Together, these fix the kernel regression reported at
https://bugzilla.kernel.org/show_bug.cgi?id=88951 by Catalin Iacob.
Please let me know if there are problems!
Thanks,
John
---
The following changes since commit 7d63a5f9b25ba6b130da8eb2d32a72b1462d0249:
rtlwifi: Change order in device startup (2014-11-25 14:22:22 -0500)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless.git tags/master-2014-12-01
for you to fetch changes up to 87141db0848aa20c43d453f5545efc8f390d4372:
rtlwifi: rtl8192ce: Fix missing interrupt ready flag (2014-12-01 15:22:02 -0500)
----------------------------------------------------------------
Larry Finger (3):
rtlwifi: rtl8192ce: Fix editing error that causes silent memory corruption
rtlwifi: rtl8192ce: Fix kernel crashes due to missing callback entry
rtlwifi: rtl8192ce: Fix missing interrupt ready flag
drivers/net/wireless/rtlwifi/rtl8192ce/hw.c | 3 ++-
drivers/net/wireless/rtlwifi/rtl8192ce/sw.c | 3 +++
drivers/net/wireless/rtlwifi/rtl8192ce/trx.c | 24 ++++++++++++++++++++----
drivers/net/wireless/rtlwifi/rtl8192ce/trx.h | 2 ++
4 files changed, 27 insertions(+), 5 deletions(-)
diff --git a/drivers/net/wireless/rtlwifi/rtl8192ce/hw.c b/drivers/net/wireless/rtlwifi/rtl8192ce/hw.c
index 55357d69397a..d2ec5160bbf0 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192ce/hw.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192ce/hw.c
@@ -1287,6 +1287,7 @@ void rtl92ce_enable_interrupt(struct ieee80211_hw *hw)
rtl_write_dword(rtlpriv, REG_HIMR, rtlpci->irq_mask[0] & 0xFFFFFFFF);
rtl_write_dword(rtlpriv, REG_HIMRE, rtlpci->irq_mask[1] & 0xFFFFFFFF);
+ rtlpci->irq_enabled = true;
}
void rtl92ce_disable_interrupt(struct ieee80211_hw *hw)
@@ -1296,7 +1297,7 @@ void rtl92ce_disable_interrupt(struct ieee80211_hw *hw)
rtl_write_dword(rtlpriv, REG_HIMR, IMR8190_DISABLED);
rtl_write_dword(rtlpriv, REG_HIMRE, IMR8190_DISABLED);
- synchronize_irq(rtlpci->pdev->irq);
+ rtlpci->irq_enabled = false;
}
static void _rtl92ce_poweroff_adapter(struct ieee80211_hw *hw)
diff --git a/drivers/net/wireless/rtlwifi/rtl8192ce/sw.c b/drivers/net/wireless/rtlwifi/rtl8192ce/sw.c
index 46ea07605eb4..dd5aa089126a 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192ce/sw.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192ce/sw.c
@@ -228,6 +228,7 @@ static struct rtl_hal_ops rtl8192ce_hal_ops = {
.led_control = rtl92ce_led_control,
.set_desc = rtl92ce_set_desc,
.get_desc = rtl92ce_get_desc,
+ .is_tx_desc_closed = rtl92ce_is_tx_desc_closed,
.tx_polling = rtl92ce_tx_polling,
.enable_hw_sec = rtl92ce_enable_hw_security_config,
.set_key = rtl92ce_set_key,
@@ -271,6 +272,8 @@ static struct rtl_hal_cfg rtl92ce_hal_cfg = {
.maps[MAC_RCR_ACRC32] = ACRC32,
.maps[MAC_RCR_ACF] = ACF,
.maps[MAC_RCR_AAP] = AAP,
+ .maps[MAC_HIMR] = REG_HIMR,
+ .maps[MAC_HIMRE] = REG_HIMRE,
.maps[EFUSE_TEST] = REG_EFUSE_TEST,
.maps[EFUSE_CTRL] = REG_EFUSE_CTRL,
diff --git a/drivers/net/wireless/rtlwifi/rtl8192ce/trx.c b/drivers/net/wireless/rtlwifi/rtl8192ce/trx.c
index dc3d20b17a26..e88dcd0e0af1 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192ce/trx.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192ce/trx.c
@@ -720,16 +720,15 @@ u32 rtl92ce_get_desc(u8 *p_desc, bool istx, u8 desc_name)
break;
}
} else {
- struct rx_desc_92c *pdesc = (struct rx_desc_92c *)p_desc;
switch (desc_name) {
case HW_DESC_OWN:
- ret = GET_RX_DESC_OWN(pdesc);
+ ret = GET_RX_DESC_OWN(p_desc);
break;
case HW_DESC_RXPKT_LEN:
- ret = GET_RX_DESC_PKT_LEN(pdesc);
+ ret = GET_RX_DESC_PKT_LEN(p_desc);
break;
case HW_DESC_RXBUFF_ADDR:
- ret = GET_RX_STATUS_DESC_BUFF_ADDR(pdesc);
+ ret = GET_RX_DESC_BUFF_ADDR(p_desc);
break;
default:
RT_ASSERT(false, "ERR rxdesc :%d not process\n",
@@ -740,6 +739,23 @@ u32 rtl92ce_get_desc(u8 *p_desc, bool istx, u8 desc_name)
return ret;
}
+bool rtl92ce_is_tx_desc_closed(struct ieee80211_hw *hw,
+ u8 hw_queue, u16 index)
+{
+ struct rtl_pci *rtlpci = rtl_pcidev(rtl_pcipriv(hw));
+ struct rtl8192_tx_ring *ring = &rtlpci->tx_ring[hw_queue];
+ u8 *entry = (u8 *)(&ring->desc[ring->idx]);
+ u8 own = (u8)rtl92ce_get_desc(entry, true, HW_DESC_OWN);
+
+ /*beacon packet will only use the first
+ *descriptor defautly,and the own may not
+ *be cleared by the hardware
+ */
+ if (own)
+ return false;
+ return true;
+}
+
void rtl92ce_tx_polling(struct ieee80211_hw *hw, u8 hw_queue)
{
struct rtl_priv *rtlpriv = rtl_priv(hw);
diff --git a/drivers/net/wireless/rtlwifi/rtl8192ce/trx.h b/drivers/net/wireless/rtlwifi/rtl8192ce/trx.h
index 9a39ec4204dd..4bec4b07e3e0 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192ce/trx.h
+++ b/drivers/net/wireless/rtlwifi/rtl8192ce/trx.h
@@ -723,6 +723,8 @@ bool rtl92ce_rx_query_desc(struct ieee80211_hw *hw,
void rtl92ce_set_desc(struct ieee80211_hw *hw, u8 *pdesc, bool istx,
u8 desc_name, u8 *val);
u32 rtl92ce_get_desc(u8 *pdesc, bool istx, u8 desc_name);
+bool rtl92ce_is_tx_desc_closed(struct ieee80211_hw *hw,
+ u8 hw_queue, u16 index);
void rtl92ce_tx_polling(struct ieee80211_hw *hw, u8 hw_queue);
void rtl92ce_tx_fill_cmddesc(struct ieee80211_hw *hw, u8 *pdesc,
bool b_firstseg, bool b_lastseg,
--
John W. Linville Someday the world will need a hero, and you
linville@tuxdriver.com might be all we have. Be ready.
[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply related
* Re: [PATCHv11 net-next 2/2] openvswitch: Add support for unique flow IDs.
From: Pravin Shelar @ 2014-12-03 19:45 UTC (permalink / raw)
To: Joe Stringer; +Cc: Linux Kernel, dev@openvswitch.org, Linux Netdev List
In-Reply-To: <CANr6G5z-kokWkBs3bb=bXrBmBUhaqHKH1QfeXycFWxSNf00qFw@mail.gmail.com>
On Wed, Dec 3, 2014 at 10:47 AM, Joe Stringer <joestringer@nicira.com> wrote:
> I forgot to mention that this is the first post based against net-next.
>
> On 2 December 2014 at 18:56, Joe Stringer <joestringer@nicira.com> wrote:
>> <....snip...>
>> diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
>> index a8b30f3..7f31dbf 100644
>> --- a/net/openvswitch/flow.h
>> +++ b/net/openvswitch/flow.h
>> @@ -197,6 +197,13 @@ struct sw_flow_match {
>> struct sw_flow_mask *mask;
>> };
>>
>> +#define MAX_UFID_LENGTH 256
>> +
>> +struct sw_flow_id {
>> + u32 ufid_len;
>> + u32 ufid[MAX_UFID_LENGTH / 4];
>> +};
>> +
>> struct sw_flow_actions {
>> struct rcu_head rcu;
>> u32 actions_len;
>
> Pravin, I changed the 'struct sw_flow_id' to the above after feedback
> from the previous round, but it doesn't seem quite right. Is this what
> you meant? Given that current ovs-vswitchd userspace only generates
> 128bit UFIDs, it seems wasteful to be allocating so much for this. Did
> you have in mind for this to be united with the unmasked_key?
I am fine with 128bits of ufid, we can extend it later if we need it.
But what do you mean by united unmasked key? Can you define the struct
here?
^ permalink raw reply
* Re: [PATCH net-next] tc_act: export uapi header file
From: Jiri Pirko @ 2014-12-03 19:42 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Jamal Hadi Salim, David Miller, netdev
In-Reply-To: <20141203093317.3c482872@urahara>
Wed, Dec 03, 2014 at 06:33:17PM CET, stephen@networkplumber.org wrote:
>This file is used by iproute2 and should be exported.
>
>Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jiri Pirko <jiri@resnulli.us>
^ permalink raw reply
* Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
From: Jesse Gross @ 2014-12-03 19:38 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Thomas Graf, Du, Fan, Jason Wang, netdev@vger.kernel.org,
davem@davemloft.net, fw@strlen.de, dev@openvswitch.org,
Pravin Shelar
In-Reply-To: <20141203183859.GB16447@redhat.com>
On Wed, Dec 3, 2014 at 10:38 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Wed, Dec 03, 2014 at 10:07:42AM -0800, Jesse Gross wrote:.
>> ICMP can't be the complete solution in any case because it only works
>> for IP traffic.
>
> Let's be specific please. What protocols do you most care about? IPX?
>
>> I think there are only two full solutions: find a way
>> to adjust the guest MTU to the minimum MTU that its traffic could hit
>> in an L2 domain or fragmentation. ICMP could be a possible
>> optimization in the fragmentation case.
>
> Both approaches seem strange. You are sending 1 packet an hour to
> some destination behind 100 tunnels. Why would you want to
> cut down your MTU for all packets? On the other hand,
> doubling the amount of packets because your MTU is off
> by a couple of bytes will hurt performance significantly.
>
> Still, if you want to cut down the MTU within guest,
> that's only an ifconfig away.
> Most people would not want to bother, I think it's a good
> idea to make PMTU work properly for them.
I care about correctness first, which means that an Ethernet link
being exposed to the guest should behave like Ethernet. So, yes, IPX
should work if somebody chooses to do that.
Your comments are about performance optimization. That's fine but
without a correct base to start from it seems like putting the cart
before the horse and is hard to reason about.
^ permalink raw reply
* Re: [PATCHv11 net-next 1/2] openvswitch: Refactor ovs_nla_fill_match().
From: Pravin Shelar @ 2014-12-03 19:37 UTC (permalink / raw)
To: Joe Stringer; +Cc: netdev, LKML, dev@openvswitch.org
In-Reply-To: <1417575363-13770-1-git-send-email-joestringer@nicira.com>
On Tue, Dec 2, 2014 at 6:56 PM, Joe Stringer <joestringer@nicira.com> wrote:
> Refactor the ovs_nla_fill_match() function into separate netlink
> serialization functions ovs_nla_put_{unmasked_key,masked_key,mask}().
> Modify ovs_nla_put_flow() to handle attribute nesting and expose the
> 'is_mask' parameter - all callers need to nest the flow, and callers
> have better knowledge about whether it is serializing a mask or not.
> The next patch will be the first user of ovs_nla_put_masked_key().
>
> Signed-off-by: Joe Stringer <joestringer@nicira.com>
Thanks for the refactoring. code looks better now.
> ---
> v11: Shift netlink serialization of key/mask to flow_netlink.c
> Add put_{unmasked_key,key,mask} helpers.
> Perform nesting in ovs_nla_put_flow().
> v10: First post.
> ---
> net/openvswitch/datapath.c | 41 ++++++------------------------------
> net/openvswitch/flow_netlink.c | 45 +++++++++++++++++++++++++++++++++++++---
> net/openvswitch/flow_netlink.h | 8 +++++--
> 3 files changed, 54 insertions(+), 40 deletions(-)
>
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 332b5a0..b2a3796 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -462,10 +462,8 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
> 0, upcall_info->cmd);
> upcall->dp_ifindex = dp_ifindex;
>
> - nla = nla_nest_start(user_skb, OVS_PACKET_ATTR_KEY);
> - err = ovs_nla_put_flow(key, key, user_skb);
> + err = ovs_nla_put_flow(key, key, OVS_PACKET_ATTR_KEY, false, user_skb);
We need different name here, since it does not operate on flow. maybe
__ovs_nla_put_key(). we can move the function definition to
flow_netlink.h
> BUG_ON(err);
> - nla_nest_end(user_skb, nla);
>
> if (upcall_info->userdata)
> __nla_put(user_skb, OVS_PACKET_ATTR_USERDATA,
> @@ -676,37 +674,6 @@ static size_t ovs_flow_cmd_msg_size(const struct sw_flow_actions *acts)
> }
>
> /* Called with ovs_mutex or RCU read lock. */
> -static int ovs_flow_cmd_fill_match(const struct sw_flow *flow,
> - struct sk_buff *skb)
> -{
> - struct nlattr *nla;
> - int err;
> -
> - /* Fill flow key. */
> - nla = nla_nest_start(skb, OVS_FLOW_ATTR_KEY);
> - if (!nla)
> - return -EMSGSIZE;
> -
> - err = ovs_nla_put_flow(&flow->unmasked_key, &flow->unmasked_key, skb);
> - if (err)
> - return err;
> -
> - nla_nest_end(skb, nla);
> -
> - /* Fill flow mask. */
> - nla = nla_nest_start(skb, OVS_FLOW_ATTR_MASK);
> - if (!nla)
> - return -EMSGSIZE;
> -
> - err = ovs_nla_put_flow(&flow->key, &flow->mask->key, skb);
> - if (err)
> - return err;
> -
> - nla_nest_end(skb, nla);
> - return 0;
> -}
> -
> -/* Called with ovs_mutex or RCU read lock. */
> static int ovs_flow_cmd_fill_stats(const struct sw_flow *flow,
> struct sk_buff *skb)
> {
> @@ -787,7 +754,11 @@ static int ovs_flow_cmd_fill_info(const struct sw_flow *flow, int dp_ifindex,
>
> ovs_header->dp_ifindex = dp_ifindex;
>
> - err = ovs_flow_cmd_fill_match(flow, skb);
> + err = ovs_nla_put_unmasked_key(flow, skb);
> + if (err)
> + goto error;
> +
> + err = ovs_nla_put_mask(flow, skb);
> if (err)
> goto error;
>
> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index df3c7f2..7bb571f 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -1131,12 +1131,12 @@ int ovs_nla_get_flow_metadata(const struct nlattr *attr,
> return metadata_from_nlattrs(&match, &attrs, a, false, log);
> }
>
> -int ovs_nla_put_flow(const struct sw_flow_key *swkey,
> - const struct sw_flow_key *output, struct sk_buff *skb)
> +int __ovs_nla_put_flow(const struct sw_flow_key *swkey,
> + const struct sw_flow_key *output, bool is_mask,
> + struct sk_buff *skb)
> {
> struct ovs_key_ethernet *eth_key;
> struct nlattr *nla, *encap;
> - bool is_mask = (swkey != output);
>
> if (nla_put_u32(skb, OVS_KEY_ATTR_RECIRC_ID, output->recirc_id))
> goto nla_put_failure;
> @@ -1346,6 +1346,45 @@ nla_put_failure:
> return -EMSGSIZE;
> }
>
> +int ovs_nla_put_flow(const struct sw_flow_key *swkey,
> + const struct sw_flow_key *output, int attr, bool is_mask,
> + struct sk_buff *skb)
> +{
> + int err;
> + struct nlattr *nla;
> +
> + nla = nla_nest_start(skb, attr);
> + if (!nla)
> + return -EMSGSIZE;
> + err = __ovs_nla_put_flow(swkey, output, is_mask, skb);
> + if (err)
> + return err;
> + nla_nest_end(skb, nla);
> +
> + return 0;
> +}
> +
> +/* Called with ovs_mutex or RCU read lock. */
> +int ovs_nla_put_unmasked_key(const struct sw_flow *flow, struct sk_buff *skb)
> +{
> + return ovs_nla_put_flow(&flow->unmasked_key, &flow->unmasked_key,
> + OVS_FLOW_ATTR_KEY, false, skb);
> +}
> +
> +/* Called with ovs_mutex or RCU read lock. */
> +int ovs_nla_put_masked_key(const struct sw_flow *flow, struct sk_buff *skb)
> +{
> + return ovs_nla_put_flow(&flow->mask->key, &flow->key,
> + OVS_FLOW_ATTR_KEY, false, skb);
> +}
> +
> +/* Called with ovs_mutex or RCU read lock. */
> +int ovs_nla_put_mask(const struct sw_flow *flow, struct sk_buff *skb)
> +{
> + return ovs_nla_put_flow(&flow->key, &flow->mask->key,
> + OVS_FLOW_ATTR_MASK, true, skb);
> +}
> +
> #define MAX_ACTIONS_BUFSIZE (32 * 1024)
>
> static struct sw_flow_actions *nla_alloc_flow_actions(int size, bool log)
> diff --git a/net/openvswitch/flow_netlink.h b/net/openvswitch/flow_netlink.h
> index 577f12b..ea54564 100644
> --- a/net/openvswitch/flow_netlink.h
> +++ b/net/openvswitch/flow_netlink.h
> @@ -43,11 +43,15 @@ size_t ovs_key_attr_size(void);
> void ovs_match_init(struct sw_flow_match *match,
> struct sw_flow_key *key, struct sw_flow_mask *mask);
>
> -int ovs_nla_put_flow(const struct sw_flow_key *,
> - const struct sw_flow_key *, struct sk_buff *);
> +int ovs_nla_put_flow(const struct sw_flow_key *, const struct sw_flow_key *,
> + int attr, bool is_mask, struct sk_buff *);
> int ovs_nla_get_flow_metadata(const struct nlattr *, struct sw_flow_key *,
> bool log);
>
> +int ovs_nla_put_unmasked_key(const struct sw_flow *flow, struct sk_buff *skb);
> +int ovs_nla_put_masked_key(const struct sw_flow *flow, struct sk_buff *skb);
> +int ovs_nla_put_mask(const struct sw_flow *flow, struct sk_buff *skb);
> +
> int ovs_nla_get_match(struct sw_flow_match *, const struct nlattr *key,
> const struct nlattr *mask, bool log);
> int ovs_nla_put_egress_tunnel_key(struct sk_buff *,
> --
> 1.7.10.4
>
^ permalink raw reply
* Re: [PATCH net-next v3] rtnetlink: delay RTM_DELLINK notification until after ndo_uninit()
From: Mahesh Bandewar @ 2014-12-03 19:08 UTC (permalink / raw)
To: Eric Dumazet
Cc: netdev, David Miller, Eric Dumazet, Roopa Prabhu, Toshiaki Makita
In-Reply-To: <1417592357.5303.130.camel@edumazet-glaptop2.roam.corp.google.com>
On Tue, Dec 2, 2014 at 11:39 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2014-12-02 at 23:24 -0800, Mahesh Bandewar wrote:
>
>> +EXPORT_SYMBOL(rtmsg_ifinfo_build_skb);
>> +
>> +void rtmsg_ifinfo_send(struct sk_buff *skb, struct net_device *dev, gfp_t flags)
>> +{
>> + struct net *net = dev_net(dev);
>> +
>> + rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, flags);
>> +}
>> +EXPORT_SYMBOL(rtmsg_ifinfo_send);
>> +
>> +void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
>> + gfp_t flags)
>> +{
>> + struct sk_buff *skb;
>> +
>> + skb = rtmsg_ifinfo_build_skb(type, dev, change, flags);
>> + if (skb)
>> + rtmsg_ifinfo_send(skb, dev, flags);
>> }
>> EXPORT_SYMBOL(rtmsg_ifinfo);
>
> One last point :
>
> Only rtmsg_ifinfo() needs the EXPORT_SYMBOL(...)
>
> rtmsg_ifinfo_build_skb() and rtmsg_ifinfo_send() are used from core networking stack,
> not a module.
>
Thanks Eric. If we ever need that in a module in the future, we can
add that later. Will remove them.
>
>
^ permalink raw reply
* Re: [PATCH v2 5/6] myri10ge: use eth_skb_pad helper
From: Sergei Shtylyov @ 2014-12-03 18:56 UTC (permalink / raw)
To: Alexander Duyck, netdev; +Cc: Hyong-Youb Kim, davem
In-Reply-To: <20141203161758.9223.85476.stgit@ahduyck-vm-fedora20>
Hello.
On 12/03/2014 07:17 PM, Alexander Duyck wrote:
> Update myri10ge to use eth_skb_pad helper. This also corrects a minor
> issue as the driver was updating length without updating the tail pointer.
> Cc: Hyong-Youb Kim <hykim@myri.com>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
> ---
> drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 15 +++++----------
> 1 file changed, 5 insertions(+), 10 deletions(-)
> diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
> index 9e7e3f1..af09905 100644
> --- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
> +++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
> @@ -2913,16 +2913,11 @@ again:
> flags |= MXGEFW_FLAGS_SMALL;
>
> /* pad frames to at least ETH_ZLEN bytes */
> - if (unlikely(skb->len < ETH_ZLEN)) {
> - if (skb_padto(skb, ETH_ZLEN)) {
> - /* The packet is gone, so we must
> - * return 0 */
> - ss->stats.tx_dropped += 1;
> - return NETDEV_TX_OK;
> - }
> - /* adjust the len to account for the zero pad
> - * so that the nic can know how long it is */
> - skb->len = ETH_ZLEN;
> + if (eth_skb_pad(skb)) {
> + /* The packet is gone, so we must
> + * return 0 */
Time to fix the comment style, perhaps? The preferred one for the
networking code is:
/* bla
* bla
*/
> + ss->stats.tx_dropped += 1;
Hm, why not 'ss->stats.tx_dropped++'?
> + return NETDEV_TX_OK;
> }
> }
WBR, Sergei
^ permalink raw reply
* Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
From: Rick Jones @ 2014-12-03 18:56 UTC (permalink / raw)
To: Michael S. Tsirkin, Jesse Gross
Cc: Thomas Graf, Du, Fan, Jason Wang, netdev@vger.kernel.org,
davem@davemloft.net, fw@strlen.de, dev@openvswitch.org,
Pravin Shelar
In-Reply-To: <20141203183859.GB16447@redhat.com>
Trying to "fake-out" an ICMP message to paper-over "devices" in the
"middle" of same Layer2 network having different MTUs from the ends goes
back to at least the days when people started joining FDDI networks to
Ethernet networks with bridges rather than routers. Perhaps back even
further. It had problems them (including not all traffic being IP), I
doubt today would be any different.
All devices in a Layer2 network must have the same MTU. (*)
The only exception to that which does not lead one down a rathole is
that you can have the MTU at the "edges" of the Layer 2 network be
smaller than the MTU in the "middle." And then only if the middle
"never" tries to talk itself to the edges directly.
rick jones
(*) Or at least any communicating device must be kept ignorant of what
one does to have a smaller MTU in there somewhere.
^ permalink raw reply
* Re: [PATCHv11 net-next 2/2] openvswitch: Add support for unique flow IDs.
From: Joe Stringer @ 2014-12-03 18:47 UTC (permalink / raw)
To: Pravin Shelar; +Cc: Linux Kernel, dev@openvswitch.org, Linux Netdev List
In-Reply-To: <1417575363-13770-2-git-send-email-joestringer@nicira.com>
I forgot to mention that this is the first post based against net-next.
On 2 December 2014 at 18:56, Joe Stringer <joestringer@nicira.com> wrote:
> <....snip...>
> diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
> index a8b30f3..7f31dbf 100644
> --- a/net/openvswitch/flow.h
> +++ b/net/openvswitch/flow.h
> @@ -197,6 +197,13 @@ struct sw_flow_match {
> struct sw_flow_mask *mask;
> };
>
> +#define MAX_UFID_LENGTH 256
> +
> +struct sw_flow_id {
> + u32 ufid_len;
> + u32 ufid[MAX_UFID_LENGTH / 4];
> +};
> +
> struct sw_flow_actions {
> struct rcu_head rcu;
> u32 actions_len;
Pravin, I changed the 'struct sw_flow_id' to the above after feedback
from the previous round, but it doesn't seem quite right. Is this what
you meant? Given that current ovs-vswitchd userspace only generates
128bit UFIDs, it seems wasteful to be allocating so much for this. Did
you have in mind for this to be united with the unmasked_key?
^ permalink raw reply
* Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
From: Michael S. Tsirkin @ 2014-12-03 18:38 UTC (permalink / raw)
To: Jesse Gross
Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org,
netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Jason Wang,
Du, Fan, fw-HFFVJYpyMKqzQB+pC5nmwQ@public.gmane.org,
davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org
In-Reply-To: <CAEP_g=9C+D3gbjJ4n1t6xuyjqEAMYi4ZfqPoe92UAoQJH-UsKg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On Wed, Dec 03, 2014 at 10:07:42AM -0800, Jesse Gross wrote:
> On Wed, Dec 3, 2014 at 1:03 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Tue, Dec 02, 2014 at 10:12:04AM -0800, Jesse Gross wrote:
> >> On Tue, Dec 2, 2014 at 9:41 AM, Thomas Graf <tgraf@suug.ch> wrote:
> >> > On 12/02/14 at 07:34pm, Michael S. Tsirkin wrote:
> >> >> On Tue, Dec 02, 2014 at 05:09:27PM +0000, Thomas Graf wrote:
> >> >> > On 12/02/14 at 01:48pm, Flavio Leitner wrote:
> >> >> > > What about containers or any other virtualization environment that
> >> >> > > doesn't use Virtio?
> >> >> >
> >> >> > The host can dictate the MTU in that case for both veth or OVS
> >> >> > internal which would be primary container plumbing techniques.
> >> >>
> >> >> It typically can't do this easily for VMs with emulated devices:
> >> >> real ethernet uses a fixed MTU.
> >> >>
> >> >> IMHO it's confusing to suggest MTU as a fix for this bug, it's
> >> >> an unrelated optimization.
> >> >> ICMP_DEST_UNREACH/ICMP_FRAG_NEEDED is the right fix here.
> >> >
> >> > PMTU discovery only resolves the issue if an actual IP stack is
> >> > running inside the VM. This may not be the case at all.
> >>
> >> It's also only really a correct thing to do if the ICMP packet is
> >> coming from an L3 node. If you are doing straight bridging then you
> >> have to resort to hacks like OVS had before, which I agree are not
> >> particularly desirable.
> >
> > The issue seems to be that fundamentally, this is
> > bridging interfaces with variable MTUs (even if MTU values
> > on devices don't let us figure this out)-
> > that is already not straight bridging, and
> > I would argue sending ICMPs back is the right thing to do.
>
> How do you deal with the fact that there is no host IP stack inside
> the tunnel? And isn't this exactly the same as the former OVS
> implementation that you said you didn't like?
I was talking about the high level requirement, not the implementation
here. I agree it's not at all trivial, we need to propagate this across
tunnels.
But let's agree on what we are trying to accomplish first.
> >> > I agree that exposing an MTU towards the guest is not applicable
> >> > in all situations, in particular because it is difficult to decide
> >> > what MTU to expose. It is a relatively elegant solution in a lot
> >> > of virtualization host cases hooked up to an orchestration system
> >> > though.
> >>
> >> I also think this is the right thing to do as a common case
> >> optimization and I know other platforms (such as Hyper-V) do it. It's
> >> not a complete solution so we still need the original patch in this
> >> thread to handle things transparently.
> >
> > Well, as I believe David (and independently Jason) is saying, it looks like
> > the ICMPs we are sending back after applying the original patch have the
> > wrong MTU.
>
> The problem is actually that the ICMP messages won't even go to the
> sending VM because the host IP stack and the VM are isolated from each
> other and there is no route.
Exactly.
But all this is talking about implementation.
Let's agree on what we want to do first.
And in my mind, we typically want originator to adjust its PMTU,
just for a given destination.
Sending ICMP to originating VM will typically accomplish this.
> > And if I understand what David is saying here, IP is also the wrong place to
> > do it.
>
> ICMP can't be the complete solution in any case because it only works
> for IP traffic.
Let's be specific please. What protocols do you most care about? IPX?
> I think there are only two full solutions: find a way
> to adjust the guest MTU to the minimum MTU that its traffic could hit
> in an L2 domain or fragmentation. ICMP could be a possible
> optimization in the fragmentation case.
Both approaches seem strange. You are sending 1 packet an hour to
some destination behind 100 tunnels. Why would you want to
cut down your MTU for all packets? On the other hand,
doubling the amount of packets because your MTU is off
by a couple of bytes will hurt performance significantly.
Still, if you want to cut down the MTU within guest,
that's only an ifconfig away.
Most people would not want to bother, I think it's a good
idea to make PMTU work properly for them.
--
MST
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
^ permalink raw reply
* Re: [PATCH net-next 1/2] net: bcmgenet: add support for new GENET PHY revision scheme
From: Sergei Shtylyov @ 2014-12-03 18:12 UTC (permalink / raw)
To: Florian Fainelli, netdev; +Cc: davem
In-Reply-To: <547F4E4E.5030906@gmail.com>
Hello.
On 12/03/2014 08:54 PM, Florian Fainelli wrote:
>>> Starting with GPHY revision G0, the GENET register layout has changed to
>>> use the same numbering scheme as the Starfighter 2 switch. This means
>>> that GPHY major revision is in bits 15:12, minor in bits 11:8 and patch
>>> level is in bits 7:4.
>>> Introduce a small heuristic which checks for the old scheme first, tests
>>> for the new scheme and finally attempts to catch reserved values and
>>> aborts.
>>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>>> ---
>>> drivers/net/ethernet/broadcom/genet/bcmgenet.c | 24
>>> +++++++++++++++++++++++-
>>> 1 file changed, 23 insertions(+), 1 deletion(-)
>>> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>>> b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>>> index f2fadb053d52..23e283174c4e 100644
>>> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>>> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>> [...]
>>> @@ -2551,8 +2552,29 @@ static void bcmgenet_set_hw_params(struct
[...]
>>> + if ((gphy_rev & 0xf0) != 0)
>>> + priv->gphy_rev = gphy_rev << 8;
>>> +
>>> + /* This is the new scheme, GPHY major rolls over with 0x10 = rev
>>> G0 */
>>> + else if ((gphy_rev & 0xff00) != 0)
>>> + priv->gphy_rev = gphy_rev;
>>> +
>>> + /* This is reserved so should require special treatment */
>>> + else if (gphy_rev == 0 || gphy_rev == 0x01ff) {
>>> + pr_warn("Invalid GPHY revision detected: 0x%04x\n", gphy_rev);
>>> + return;
>>> + }
>>
>> Hm, {} are needed on all *if* branches.
> checkpatch.pl did not complain about that.
It probably still doesn't. Nevertheless, refer to
Documentation/CodingStyle, chapter 3.
>> [...]
WBR, Sergei
^ permalink raw reply
* Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
From: Jesse Gross @ 2014-12-03 18:07 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Thomas Graf, Du, Fan, Jason Wang, netdev@vger.kernel.org,
davem@davemloft.net, fw@strlen.de, dev@openvswitch.org,
Pravin Shelar
In-Reply-To: <20141203090339.GA9299@redhat.com>
On Wed, Dec 3, 2014 at 1:03 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Dec 02, 2014 at 10:12:04AM -0800, Jesse Gross wrote:
>> On Tue, Dec 2, 2014 at 9:41 AM, Thomas Graf <tgraf@suug.ch> wrote:
>> > On 12/02/14 at 07:34pm, Michael S. Tsirkin wrote:
>> >> On Tue, Dec 02, 2014 at 05:09:27PM +0000, Thomas Graf wrote:
>> >> > On 12/02/14 at 01:48pm, Flavio Leitner wrote:
>> >> > > What about containers or any other virtualization environment that
>> >> > > doesn't use Virtio?
>> >> >
>> >> > The host can dictate the MTU in that case for both veth or OVS
>> >> > internal which would be primary container plumbing techniques.
>> >>
>> >> It typically can't do this easily for VMs with emulated devices:
>> >> real ethernet uses a fixed MTU.
>> >>
>> >> IMHO it's confusing to suggest MTU as a fix for this bug, it's
>> >> an unrelated optimization.
>> >> ICMP_DEST_UNREACH/ICMP_FRAG_NEEDED is the right fix here.
>> >
>> > PMTU discovery only resolves the issue if an actual IP stack is
>> > running inside the VM. This may not be the case at all.
>>
>> It's also only really a correct thing to do if the ICMP packet is
>> coming from an L3 node. If you are doing straight bridging then you
>> have to resort to hacks like OVS had before, which I agree are not
>> particularly desirable.
>
> The issue seems to be that fundamentally, this is
> bridging interfaces with variable MTUs (even if MTU values
> on devices don't let us figure this out)-
> that is already not straight bridging, and
> I would argue sending ICMPs back is the right thing to do.
How do you deal with the fact that there is no host IP stack inside
the tunnel? And isn't this exactly the same as the former OVS
implementation that you said you didn't like?
>> > I agree that exposing an MTU towards the guest is not applicable
>> > in all situations, in particular because it is difficult to decide
>> > what MTU to expose. It is a relatively elegant solution in a lot
>> > of virtualization host cases hooked up to an orchestration system
>> > though.
>>
>> I also think this is the right thing to do as a common case
>> optimization and I know other platforms (such as Hyper-V) do it. It's
>> not a complete solution so we still need the original patch in this
>> thread to handle things transparently.
>
> Well, as I believe David (and independently Jason) is saying, it looks like
> the ICMPs we are sending back after applying the original patch have the
> wrong MTU.
The problem is actually that the ICMP messages won't even go to the
sending VM because the host IP stack and the VM are isolated from each
other and there is no route.
> And if I understand what David is saying here, IP is also the wrong place to
> do it.
ICMP can't be the complete solution in any case because it only works
for IP traffic. I think there are only two full solutions: find a way
to adjust the guest MTU to the minimum MTU that its traffic could hit
in an L2 domain or fragmentation. ICMP could be a possible
optimization in the fragmentation case.
^ permalink raw reply
* Re: [PATCH] bpf: arm64: lift restriction on last instruction
From: Will Deacon @ 2014-12-03 18:04 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Zi Shen Lim, David S. Miller, Catalin Marinas, Daniel Borkmann,
Network Development, linux-arm-kernel@lists.infradead.org, LKML
In-Reply-To: <CAMEtUuwJHC2eyOh3pD=qDXw1YbfJqx1a7W0o65n5Ur_B_TvNOA@mail.gmail.com>
On Wed, Dec 03, 2014 at 03:54:32PM +0000, Alexei Starovoitov wrote:
> On Wed, Dec 3, 2014 at 12:38 AM, Zi Shen Lim <zlim.lnx@gmail.com> wrote:
> > Earlier implementation assumed last instruction is BPF_EXIT.
> > Since this is no longer a restriction in eBPF, we remove this
> > limitation.
> >
> > Per Alexei Starovoitov [1]:
> >> classic BPF has a restriction that last insn is always BPF_RET.
> >> eBPF doesn't have BPF_RET instruction and this restriction.
> >> It has BPF_EXIT insn which can appear anywhere in the program
> >> one or more times and it doesn't have to be last insn.
> >
> > [1] https://lkml.org/lkml/2014/11/27/2
> >
> > Fixes: e54bcde3d69d ("arm64: eBPF JIT compiler")
> > Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
>
> yours is cleaner than my own attempt to fix it.
> Thanks!
> Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cheers, I've applied this for 3.19.
Will
^ permalink raw reply
* [PATCH net-next v2 2/2] net: phy: bcm7xxx: add an explicit version check for GPHY rev G0
From: Florian Fainelli @ 2014-12-03 17:57 UTC (permalink / raw)
To: netdev; +Cc: davem, Florian Fainelli
In-Reply-To: <1417629420-31146-1-git-send-email-f.fainelli@gmail.com>
GPHY revision G0 has its version rolled over to 0x10, introduce an
explicit check for that revision and invoke the proper workaround
function for it.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
drivers/net/phy/bcm7xxx.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/phy/bcm7xxx.c b/drivers/net/phy/bcm7xxx.c
index 7a53af4346e4..974ec4515269 100644
--- a/drivers/net/phy/bcm7xxx.c
+++ b/drivers/net/phy/bcm7xxx.c
@@ -252,6 +252,8 @@ static int bcm7xxx_28nm_config_init(struct phy_device *phydev)
break;
case 0xe0:
case 0xf0:
+ /* Rev G0 introduces a roll over */
+ case 0x10:
ret = bcm7xxx_28nm_e0_plus_afe_config_init(phydev);
break;
default:
--
2.1.0
^ permalink raw reply related
* [PATCH net-next v2 1/2] net: bcmgenet: add support for new GENET PHY revision scheme
From: Florian Fainelli @ 2014-12-03 17:56 UTC (permalink / raw)
To: netdev; +Cc: davem, Florian Fainelli
In-Reply-To: <1417629420-31146-1-git-send-email-f.fainelli@gmail.com>
Starting with GPHY revision G0, the GENET register layout has changed to
use the same numbering scheme as the Starfighter 2 switch. This means
that GPHY major revision is in bits 15:12, minor in bits 11:8 and patch
level is in bits 7:4.
Introduce a small heuristic which checks for the old scheme first, tests
for the new scheme and finally attempts to catch reserved values and
aborts.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
Changes in v2:
- remove unnecessary parenthesis
- add missing "is" in paragraph about the scheme
drivers/net/ethernet/broadcom/genet/bcmgenet.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index f2fadb053d52..410c22bc655a 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -2503,6 +2503,7 @@ static void bcmgenet_set_hw_params(struct bcmgenet_priv *priv)
struct bcmgenet_hw_params *params;
u32 reg;
u8 major;
+ u16 gphy_rev;
if (GENET_IS_V4(priv)) {
bcmgenet_dma_regs = bcmgenet_dma_regs_v3plus;
@@ -2551,8 +2552,29 @@ static void bcmgenet_set_hw_params(struct bcmgenet_priv *priv)
* to pass this information to the PHY driver. The PHY driver expects
* to find the PHY major revision in bits 15:8 while the GENET register
* stores that information in bits 7:0, account for that.
+ *
+ * On newer chips, starting with PHY revision G0, a new scheme is
+ * deployed similar to the Starfighter 2 switch with GPHY major
+ * revision in bits 15:8 and patch level in bits 7:0. Major revision 0
+ * is reserved as well as special value 0x01ff, we have a small
+ * heuristic to check for the new GPHY revision and re-arrange things
+ * so the GPHY driver is happy.
*/
- priv->gphy_rev = (reg & 0xffff) << 8;
+ gphy_rev = reg & 0xffff;
+
+ /* This is the good old scheme, just GPHY major, no minor nor patch */
+ if ((gphy_rev & 0xf0) != 0)
+ priv->gphy_rev = gphy_rev << 8;
+
+ /* This is the new scheme, GPHY major rolls over with 0x10 = rev G0 */
+ else if ((gphy_rev & 0xff00) != 0)
+ priv->gphy_rev = gphy_rev;
+
+ /* This is reserved so should require special treatment */
+ else if (gphy_rev == 0 || gphy_rev == 0x01ff) {
+ pr_warn("Invalid GPHY revision detected: 0x%04x\n", gphy_rev);
+ return;
+ }
#ifdef CONFIG_PHYS_ADDR_T_64BIT
if (!(params->flags & GENET_HAS_40BITS))
--
2.1.0
^ permalink raw reply related
* [PATCH net-next v2 0/2] net: bcmgenet: support for new GPHY revision scheme
From: Florian Fainelli @ 2014-12-03 17:56 UTC (permalink / raw)
To: netdev; +Cc: davem, Florian Fainelli
Hi David,
These two patches update the GENET GPHY revision logic to account for some
of our newer designs starting with GPHY rev G0.
Florian Fainelli (2):
net: bcmgenet: add support for new GENET PHY revision scheme
net: phy: bcm7xxx: add an explicit version check for GPHY rev G0
drivers/net/ethernet/broadcom/genet/bcmgenet.c | 24 +++++++++++++++++++++++-
drivers/net/phy/bcm7xxx.c | 2 ++
2 files changed, 25 insertions(+), 1 deletion(-)
--
2.1.0
^ permalink raw reply
* Re: [PATCH iproute2] ss: Use rtnl_dump_filter in handle_netlink_request
From: vadim4j @ 2014-12-03 17:45 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20141203095123.40429f4c@urahara>
On Wed, Dec 03, 2014 at 09:51:23AM -0800, Stephen Hemminger wrote:
> On Wed, 3 Dec 2014 19:20:10 +0200
> vadim4j@gmail.com wrote:
>
> > On Wed, Dec 03, 2014 at 09:21:29AM -0800, Stephen Hemminger wrote:
> > > On Tue, 2 Dec 2014 16:53:04 +0200
> > > Vadim Kochan <vadim4j@gmail.com> wrote:
> > >
> > > > Replaced handling netlink messages by rtnl_dump_filter
> > > > from lib/libnetlink.c, also:
> > > >
> > > > - removed unused dump_fp arg;
> > > > - added MAGIC_SEQ #define for 123456 seq id
> > > >
> > > > Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
> > >
> > > This doesn't work correctly.
> > >
> > > Simple test
> > > $ misc/ss >/dev/null
> > > RTNETLINK answers: No such file or directory
> > > RTNETLINK answers: No such file or directory
> > > RTNETLINK answers: No such file or directory
> >
> > Just tried, I did not get such errors.
>
> I have OpenVPN running.
Hm, it is reproduced only with this patch ?
If so I will try to setup OpenVPN ... can't imagine how it can be
related ...
^ permalink raw reply
* Re: [PATCH net-next 1/2] net: bcmgenet: add support for new GENET PHY revision scheme
From: Florian Fainelli @ 2014-12-03 17:54 UTC (permalink / raw)
To: Sergei Shtylyov, netdev; +Cc: davem
In-Reply-To: <547EF2BC.60504@cogentembedded.com>
On 03/12/14 03:23, Sergei Shtylyov wrote:
> Hello.
>
> On 12/3/2014 2:28 AM, Florian Fainelli wrote:
>
>> Starting with GPHY revision G0, the GENET register layout has changed to
>> use the same numbering scheme as the Starfighter 2 switch. This means
>> that GPHY major revision is in bits 15:12, minor in bits 11:8 and patch
>> level is in bits 7:4.
>
>> Introduce a small heuristic which checks for the old scheme first, tests
>> for the new scheme and finally attempts to catch reserved values and
>> aborts.
>
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>> ---
>> drivers/net/ethernet/broadcom/genet/bcmgenet.c | 24
>> +++++++++++++++++++++++-
>> 1 file changed, 23 insertions(+), 1 deletion(-)
>
>> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>> b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>> index f2fadb053d52..23e283174c4e 100644
>> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
>> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> [...]
>> @@ -2551,8 +2552,29 @@ static void bcmgenet_set_hw_params(struct
>> bcmgenet_priv *priv)
>> * to pass this information to the PHY driver. The PHY driver
>> expects
>> * to find the PHY major revision in bits 15:8 while the GENET
>> register
>> * stores that information in bits 7:0, account for that.
>> + *
>> + * On newer chips, starting with PHY revision G0, a new scheme is
>> + * deployed similar to the Starfighter 2 switch with GPHY major
>> + * revision in bits 15:8 and patch level in bits 7:0. Major
>> revision 0
>> + * is reserved as well as special value 0x01ff, we have a small
>> + * heuristic to check for the new GPHY revision and re-arrange
>> things
>> + * so the GPHY driver is happy.
>> */
>> - priv->gphy_rev = (reg & 0xffff) << 8;
>> + gphy_rev = (reg & 0xffff);
>
> Parens not needed anymore.
>
>> +
>> + /* This the good old scheme, just GPHY major, no minor nor patch */
>
> Missing "is" after "This"?
Alright, I will resubmit...
>
>> + if ((gphy_rev & 0xf0) != 0)
>> + priv->gphy_rev = gphy_rev << 8;
>> +
>> + /* This is the new scheme, GPHY major rolls over with 0x10 = rev
>> G0 */
>> + else if ((gphy_rev & 0xff00) != 0)
>> + priv->gphy_rev = gphy_rev;
>> +
>> + /* This is reserved so should require special treatment */
>> + else if (gphy_rev == 0 || gphy_rev == 0x01ff) {
>> + pr_warn("Invalid GPHY revision detected: 0x%04x\n", gphy_rev);
>> + return;
>> + }
>
> Hm, {} are needed on all *if* branches.
checkpatch.pl did not complain about that.
>
> [...]
>
> WBR, Sergei
>
^ permalink raw reply
* Re: [PATCH iproute2] ip monitor: Fixed printing timestamp few times
From: vadim4j @ 2014-12-03 17:41 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20141203094731.4d310fc5@urahara>
On Wed, Dec 03, 2014 at 09:47:31AM -0800, Stephen Hemminger wrote:
> On Wed, 3 Dec 2014 19:11:26 +0200
> vadim4j@gmail.com wrote:
>
> > I think it was caused by RTM_NEWNDUSEROPT which was sent by router.
> > So in this case the message will be skipped with printed timestamp.
>
> Can that be decoded? it might be useful.
Sorry, you mean how it can be reproduced by clean scenario?
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox