Netdev List

* Re: [net-next PATCH v3 2/3] e1000: add initial XDP support
From: Alexei Starovoitov @ 2016-09-13 17:13 UTC
  To: Tom Herbert
  Cc: Eric Dumazet, John Fastabend, Brenden Blanco, Jeff Kirsher,
	Jesper Dangaard Brouer, David S. Miller, Cong Wang,
	intel-wired-lan, William Tu, Linux Kernel Network Developers
In-Reply-To: <CALx6S36u+tQUEVhu0BaLZ6-_0=GSMwxd4m89-H+brVZ-rr-U2g@mail.gmail.com>

On Tue, Sep 13, 2016 at 09:21:47AM -0700, Tom Herbert wrote:
> On Mon, Sep 12, 2016 at 6:28 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Mon, Sep 12, 2016 at 05:03:25PM -0700, Tom Herbert wrote:
> >> On Mon, Sep 12, 2016 at 4:46 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> > On Mon, 2016-09-12 at 16:07 -0700, Alexei Starovoitov wrote:
> >> >
> >> >> yep. there are various ways to shoot yourself in the foot with xdp.
> >> >> The simplest program that drops all the packets will make the box unpingable.
> >> >
> >> > Well, my comment was about XDP_TX only, not about XDP_DROP or driving a
> >> > scooter on 101 highway ;)
> >> >
> >> > This XDP_TX thing was one of the XDP marketing stuff, but there is
> >> > absolutely no documentation on it, warning users about possible
> >> > limitations/outcomes.
> >> >
> >> > BTW, I am not sure mlx4 implementation even works, vs BQL :
> >> >
> >> > mlx4_en_xmit_frame() does not call netdev_tx_sent_queue(),
> >> > but tx completion will call netdev_tx_completed_queue() -> crash
> >> >
> >> > Do we have one test to validate that a XDP_TX implementation is actually
> >> > correct ?
> >> >
> >> Obviously not for e1000 :-(. We really need some real test and
> >> performance results and analysis on the interaction between the stack
> >> data path and XDP data path.
> >
> > no. we don't need it for e1k and we cannot really do it.
> > <broken record mode on> this patch is for debugging of xdp programs only.
> >
> You can say "only for debugging" a thousand times and that
> still won't justify putting bad code into the kernel. Material issues
> have been raised with these patches, I have proposed a fix for one
> core issue, and we have requested a lot more testing. So, please, if
> you really want to move these patches forward start addressing the
> concerns being raised by reviewers.

I'm afraid the point 'only for debugging' still didn't make it across.
xdp+e1k is for development (and debugging) of xdp-type of bpf
programs and _not_ for debugging of xdp itself, kernel or anything else.
The interfaces and behavior that e1k provides need to match exactly
what real hw nics (like mlx4, mlx5, ixgbe, i40e) will do.
Special hacks are not acceptable. Therefore your
'proposed fix' misses the mark, since:
1. ignoring bql/qdisc is not a bug, but the requirement
2. such 'fix' goes against the goal above since behaviors will be
different and xdp developer won't be able to build something like
xdp loadbalancer in the kvm.

If you have other concerns please raise them or if you have
suggestions on how to develop xdp programs without this e1k patch
I would love to hear them.
Alexander's review comments are discussed in separate thread.

* Re: [PATCH 3/3] net-next: dsa: add new driver for qca8xxx family
From: Vivien Didelot @ 2016-09-13 17:11 UTC
  To: Andrew Lunn, John Crispin
  Cc: David S. Miller, Florian Fainelli, netdev, linux-kernel,
	qsdk-review
In-Reply-To: <20160913131408.GE15332@lunn.ch>

Hi Andrew,

Andrew Lunn <andrew@lunn.ch> writes:

>> ok, i will simply subtract 1 from the phy_addr inside the mdio
>> callbacks. this would make the code more readable and make the DT
>> binding compliant with the ePAPR spec.
>
> It does however need well commenting. It is setting a trap for anybody
> who puts an external PHY on port 6. If they access that PHY via these
> functions, the address is off by one.
>
> This is the first silicon vendor who made their MDIO addresses for
> PHYs illogical. So i'm thinking we maybe should add a new function to
> dsa_switch_ops.
>
> 	/* Return the MDIO address for the PHY for this port. */
>         int     (*phy_port_map)(struct dsa_switch *ds, int port);
>
> This should return the MDIO address for integrated PHYs only, or
> -ENODEV if the port does not have an integrated PHY. For an external
> PHY, a phy-handle should be used. This phy_port_map() is used in
> dsa_slave_phy_setup(). But dsa_slave_phy_setup() is already too
> complex, so it needs doing with care.

Note that some switch drivers *have to* register their slave MDIO bus
themselves (e.g. bcm_sf2). This becomes confusing with the DSA
phy_{read,write} ops.

Since the former alternative is preferred, we may want to remove the
latter soon from DSA. If this phy_port_map is needed for that case, it'd
be preferable not to add it.
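
For reference, a minimal user-space sketch of the hook Andrew proposes
in the quoted text. The port range and the "port minus one" mapping are
assumptions made up for illustration (they are not taken from the qca8k
driver), and toy_phy_port_map() is a hypothetical name:

```c
#include <assert.h>
#include <errno.h>

/* Assumption: only ports 1..5 carry an integrated PHY. */
#define TOY_FIRST_PHY_PORT 1
#define TOY_LAST_PHY_PORT  5

/*
 * Return the MDIO address of the integrated PHY behind @port, or
 * -ENODEV when the port has no integrated PHY (CPU port, or a port
 * whose external PHY would be described by a phy-handle instead).
 */
static int toy_phy_port_map(int port)
{
	if (port < TOY_FIRST_PHY_PORT || port > TOY_LAST_PHY_PORT)
		return -ENODEV;

	/* Assumed layout: integrated PHY n-1 sits behind port n. */
	return port - 1;
}
```

With this shape the off-by-one lives in one place, and a port without
an integrated PHY gets an explicit error rather than a shifted address.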

Thanks,

        Vivien

* Re: [PATCH 3/3] net-next: dsa: add new driver for qca8xxx family
From: Florian Fainelli @ 2016-09-13 17:09 UTC
  To: Andrew Lunn, John Crispin
  Cc: David S. Miller, netdev, linux-kernel, qsdk-review
In-Reply-To: <20160913155926.GP11400@lunn.ch>

On 09/13/2016 08:59 AM, Andrew Lunn wrote:
>> Hi Andrew,
>>
>> this function does indeed duplicate the functionality of
>> phy_ethtool_get_eee() with the small difference, that e->eee_active is
>> also set which phy_ethtool_get_eee() does not set.
>>
>> dsa_slave_get_eee() will call phy_ethtool_get_eee() right after the
>> get_eee() op has been called. would it be ok to move the code setting
>> eee_active to  phy_ethtool_get_eee().

Humm, AFAIR, the reason why eee_active is set outside of
phy_ethtool_set_eee() is because this is a MAC + PHY thing, both need to
agree and support that, and so while the PHY may be configured to have
EEE advertised and enabled, you also need to take care of the MAC
portion and enable EEE in there as well. Is not there such a thing for
the qca8k switch where the PHY needs to be configured through the
standard phylib calls, but the switch's transmitter/receiver also needs
to have EEE enabled?
-- 
Florian

* Re: [Intel-wired-lan] [net-next PATCH v3 2/3] e1000: add initial XDP support
From: Alexei Starovoitov @ 2016-09-13 16:56 UTC
  To: Alexander Duyck
  Cc: John Fastabend, Brenden Blanco, Jeff Kirsher,
	Jesper Dangaard Brouer, David Miller, Cong Wang, intel-wired-lan,
	u9012063, Netdev
In-Reply-To: <CAKgT0UfzRehm_YNi2k5uyvdW_52mPSO3cEGfr6EtA+5ONM+cLQ@mail.gmail.com>

On Mon, Sep 12, 2016 at 08:42:41PM -0700, Alexander Duyck wrote:
> On Mon, Sep 12, 2016 at 3:13 PM, John Fastabend
> <john.fastabend@gmail.com> wrote:
> > From: Alexei Starovoitov <ast@fb.com>
> >
> > This patch adds initial support for XDP on e1000 driver. Note e1000
> > driver does not support page recycling in general which could be
> > added as a further improvement. However XDP_DROP case will recycle.
> > XDP_TX and XDP_PASS do not support recycling.
> >
> > e1000 only supports a single tx queue at this time so the queue
> > is shared between xdp program and Linux stack. It is possible for
> > an XDP program to starve the stack in this model.
> >
> > The XDP program will drop packets on XDP_TX errors. This can occur
> > when the tx descriptors are exhausted. This behavior is the same
> > for both shared queue models like e1000 and dedicated tx queue
> > models used in multiqueue devices. However if both the stack and
> > XDP are transmitting packets it is perhaps more likely to occur in
> > the shared queue model. Further refinement to the XDP model may be
> > possible in the future.
> >
> > I tested this patch running e1000 in a VM using KVM over a tap
> > device.
> >
> > CC: William Tu <u9012063@gmail.com>
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> > ---
> >  drivers/net/ethernet/intel/e1000/e1000.h      |    2
> >  drivers/net/ethernet/intel/e1000/e1000_main.c |  176 +++++++++++++++++++++++++
> >  2 files changed, 175 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/e1000/e1000.h b/drivers/net/ethernet/intel/e1000/e1000.h
> > index d7bdea7..5cf8a0a 100644
> > --- a/drivers/net/ethernet/intel/e1000/e1000.h
> > +++ b/drivers/net/ethernet/intel/e1000/e1000.h
> > @@ -150,6 +150,7 @@ struct e1000_adapter;
> >   */
> >  struct e1000_tx_buffer {
> >         struct sk_buff *skb;
> > +       struct page *page;
> >         dma_addr_t dma;
> >         unsigned long time_stamp;
> >         u16 length;
> 
> I'm not really a huge fan of adding yet another member to this
> structure.  Each e1000_tx_buffer is already pretty big at 40 bytes,
> pushing it to 48 just means we lose that much more memory.  If nothing
> else we may want to look at doing something like creating a union
> between the skb, page, and an unsigned long.  Then you could use the
> lowest bit of the address as a flag indicating if this is a skb or a
> page.

that's exactly what we did for mlx4_en_tx_info, since it's a real nic
where performance matters. For e1k I didn't want to complicate
the logic for no reason, since I don't see how 8 extra bytes matter here.
we will take a look if union is easy to do though.
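
A minimal user-space sketch of the low-bit tagging idea Alexander
describes, for readers who have not seen the trick. Everything here is
hypothetical: struct fake_skb and struct fake_page stand in for the
kernel types, and the sketch stores a single uintptr_t rather than a
literal union. It assumes both pointee types are at least 2-byte
aligned, so bit 0 of a valid pointer is always clear:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct sk_buff and struct page. */
struct fake_skb  { int len; };
struct fake_page { int order; };

#define TX_BUF_IS_PAGE ((uintptr_t)1)

/* One pointer-sized slot holding either an skb or a page. */
struct tx_buf_ref {
	uintptr_t raw;
};

static inline struct tx_buf_ref tx_buf_from_skb(struct fake_skb *skb)
{
	struct tx_buf_ref r = { (uintptr_t)skb };
	return r;
}

static inline struct tx_buf_ref tx_buf_from_page(struct fake_page *page)
{
	/* Set bit 0 to mark the slot as carrying a page. */
	struct tx_buf_ref r = { (uintptr_t)page | TX_BUF_IS_PAGE };
	return r;
}

static inline bool tx_buf_is_page(struct tx_buf_ref r)
{
	return (r.raw & TX_BUF_IS_PAGE) != 0;
}

/* Returns the page, or NULL when the slot holds an skb. */
static inline struct fake_page *tx_buf_page(struct tx_buf_ref r)
{
	if (!tx_buf_is_page(r))
		return NULL;
	return (struct fake_page *)(r.raw & ~TX_BUF_IS_PAGE);
}

/* Returns the skb, or NULL when the slot holds a page. */
static inline struct fake_skb *tx_buf_skb(struct tx_buf_ref r)
{
	if (tx_buf_is_page(r))
		return NULL;
	return (struct fake_skb *)r.raw;
}
```

The slot stays pointer-sized, so the structure does not grow; the cost
is a mask and a branch on every access in the cleanup path.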

> > @@ -279,6 +280,7 @@ struct e1000_adapter {
> >                              struct e1000_rx_ring *rx_ring,
> >                              int cleaned_count);
> >         struct e1000_rx_ring *rx_ring;      /* One per active queue */
> > +       struct bpf_prog *prog;
> >         struct napi_struct napi;
> >
> >         int num_tx_queues;
> > diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > index 62a7f8d..232b927 100644
> > --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> > +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > @@ -32,6 +32,7 @@
> >  #include <linux/prefetch.h>
> >  #include <linux/bitops.h>
> >  #include <linux/if_vlan.h>
> > +#include <linux/bpf.h>
> >
> >  char e1000_driver_name[] = "e1000";
> >  static char e1000_driver_string[] = "Intel(R) PRO/1000 Network Driver";
> > @@ -842,6 +843,44 @@ static int e1000_set_features(struct net_device *netdev,
> >         return 0;
> >  }
> >
> > +static int e1000_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
> > +{
> > +       struct e1000_adapter *adapter = netdev_priv(netdev);
> > +       struct bpf_prog *old_prog;
> > +
> > +       old_prog = xchg(&adapter->prog, prog);
> > +       if (old_prog) {
> > +               synchronize_net();
> > +               bpf_prog_put(old_prog);
> > +       }
> > +
> > +       if (netif_running(netdev))
> > +               e1000_reinit_locked(adapter);
> > +       else
> > +               e1000_reset(adapter);
> 
> What is the point of the reset?  If the interface isn't running is
> there anything in the hardware you actually need to cleanup?

The above is inspired by e1000_set_features().
I'm assuming it was done there for a reason and same applies here.

> > +       return 0;
> > +}
> > +
> > +static bool e1000_xdp_attached(struct net_device *dev)
> > +{
> > +       struct e1000_adapter *priv = netdev_priv(dev);
> > +
> > +       return !!priv->prog;
> > +}
> > +
> > +static int e1000_xdp(struct net_device *dev, struct netdev_xdp *xdp)
> > +{
> > +       switch (xdp->command) {
> > +       case XDP_SETUP_PROG:
> > +               return e1000_xdp_set(dev, xdp->prog);
> > +       case XDP_QUERY_PROG:
> > +               xdp->prog_attached = e1000_xdp_attached(dev);
> > +               return 0;
> > +       default:
> > +               return -EINVAL;
> > +       }
> > +}
> > +
> >  static const struct net_device_ops e1000_netdev_ops = {
> >         .ndo_open               = e1000_open,
> >         .ndo_stop               = e1000_close,
> > @@ -860,6 +899,7 @@ static const struct net_device_ops e1000_netdev_ops = {
> >  #endif
> >         .ndo_fix_features       = e1000_fix_features,
> >         .ndo_set_features       = e1000_set_features,
> > +       .ndo_xdp                = e1000_xdp,
> >  };
> >
> >  /**
> > @@ -1276,6 +1316,9 @@ static void e1000_remove(struct pci_dev *pdev)
> >         e1000_down_and_stop(adapter);
> >         e1000_release_manageability(adapter);
> >
> > +       if (adapter->prog)
> > +               bpf_prog_put(adapter->prog);
> > +
> >         unregister_netdev(netdev);
> >
> >         e1000_phy_hw_reset(hw);
> > @@ -1859,7 +1902,7 @@ static void e1000_configure_rx(struct e1000_adapter *adapter)
> >         struct e1000_hw *hw = &adapter->hw;
> >         u32 rdlen, rctl, rxcsum;
> >
> > -       if (adapter->netdev->mtu > ETH_DATA_LEN) {
> > +       if (adapter->netdev->mtu > ETH_DATA_LEN || adapter->prog) {
> >                 rdlen = adapter->rx_ring[0].count *
> >                         sizeof(struct e1000_rx_desc);
> >                 adapter->clean_rx = e1000_clean_jumbo_rx_irq;
> 
> If you are really serious about using the page based Rx path we should
> probably fix the fact that you take a pretty significant performance
> hit for turning this mode on.

Not sure I follow. KVM tests show that xdp_drop/tx is faster even with
full page alloc and no page recycling.
xdp is only operational in page-per-packet mode which dictates the above approach.

> > @@ -1973,6 +2016,11 @@ e1000_unmap_and_free_tx_resource(struct e1000_adapter *adapter,
> >                 dev_kfree_skb_any(buffer_info->skb);
> >                 buffer_info->skb = NULL;
> >         }
> > +       if (buffer_info->page) {
> > +               put_page(buffer_info->page);
> > +               buffer_info->page = NULL;
> > +       }
> > +
> >         buffer_info->time_stamp = 0;
> >         /* buffer_info must be completely set up in the transmit path */
> >  }
> > @@ -3298,6 +3346,69 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
> >         return NETDEV_TX_OK;
> >  }
> >
> > +static void e1000_tx_map_rxpage(struct e1000_tx_ring *tx_ring,
> > +                               struct e1000_rx_buffer *rx_buffer_info,
> > +                               unsigned int len)
> > +{
> > +       struct e1000_tx_buffer *buffer_info;
> > +       unsigned int i = tx_ring->next_to_use;
> > +
> > +       buffer_info = &tx_ring->buffer_info[i];
> > +
> > +       buffer_info->length = len;
> > +       buffer_info->time_stamp = jiffies;
> > +       buffer_info->mapped_as_page = false;
> > +       buffer_info->dma = rx_buffer_info->dma;
> > +       buffer_info->next_to_watch = i;
> > +       buffer_info->page = rx_buffer_info->rxbuf.page;
> > +
> > +       tx_ring->buffer_info[i].skb = NULL;
> > +       tx_ring->buffer_info[i].segs = 1;
> > +       tx_ring->buffer_info[i].bytecount = len;
> > +       tx_ring->buffer_info[i].next_to_watch = i;
> > +
> > +       rx_buffer_info->rxbuf.page = NULL;
> > +}
> > +
> > +static void e1000_xmit_raw_frame(struct e1000_rx_buffer *rx_buffer_info,
> > +                                u32 len,
> > +                                struct net_device *netdev,
> > +                                struct e1000_adapter *adapter)
> > +{
> > +       struct netdev_queue *txq = netdev_get_tx_queue(netdev, 0);
> > +       struct e1000_hw *hw = &adapter->hw;
> > +       struct e1000_tx_ring *tx_ring;
> > +
> > +       if (len > E1000_MAX_DATA_PER_TXD)
> > +               return;
> > +
> > +       /* e1000 only support a single txq at the moment so the queue is being
> > +        * shared with stack. To support this requires locking to ensure the
> > +        * stack and XDP are not running at the same time. Devices with
> > +        * multiple queues should allocate a separate queue space.
> > +        */
> > +       HARD_TX_LOCK(netdev, txq, smp_processor_id());
> > +
> > +       tx_ring = adapter->tx_ring;
> > +
> > +       if (E1000_DESC_UNUSED(tx_ring) < 2) {
> > +               HARD_TX_UNLOCK(netdev, txq);
> > +               return;
> > +       }
> > +
> > +       if (netif_xmit_frozen_or_stopped(txq))
> > +               return;
> > +
> > +       e1000_tx_map_rxpage(tx_ring, rx_buffer_info, len);
> > +       netdev_sent_queue(netdev, len);
> > +       e1000_tx_queue(adapter, tx_ring, 0/*tx_flags*/, 1);
> > +
> > +       writel(tx_ring->next_to_use, hw->hw_addr + tx_ring->tdt);
> > +       mmiowb();
> > +
> > +       HARD_TX_UNLOCK(netdev, txq);
> > +}
> > +
> >  #define NUM_REGS 38 /* 1 based count */
> >  static void e1000_regdump(struct e1000_adapter *adapter)
> >  {
> > @@ -4139,6 +4250,19 @@ static struct sk_buff *e1000_alloc_rx_skb(struct e1000_adapter *adapter,
> >         return skb;
> >  }
> >
> > +static inline int e1000_call_bpf(struct bpf_prog *prog, void *data,
> > +                                unsigned int length)
> > +{
> > +       struct xdp_buff xdp;
> > +       int ret;
> > +
> > +       xdp.data = data;
> > +       xdp.data_end = data + length;
> > +       ret = BPF_PROG_RUN(prog, (void *)&xdp);
> > +
> > +       return ret;
> > +}
> > +
> >  /**
> >   * e1000_clean_jumbo_rx_irq - Send received data up the network stack; legacy
> >   * @adapter: board private structure
> > @@ -4157,12 +4281,15 @@ static bool e1000_clean_jumbo_rx_irq(struct e1000_adapter *adapter,
> >         struct pci_dev *pdev = adapter->pdev;
> >         struct e1000_rx_desc *rx_desc, *next_rxd;
> >         struct e1000_rx_buffer *buffer_info, *next_buffer;
> > +       struct bpf_prog *prog;
> >         u32 length;
> >         unsigned int i;
> >         int cleaned_count = 0;
> >         bool cleaned = false;
> >         unsigned int total_rx_bytes = 0, total_rx_packets = 0;
> >
> > +       rcu_read_lock(); /* rcu lock needed here to protect xdp programs */
> > +       prog = READ_ONCE(adapter->prog);
> >         i = rx_ring->next_to_clean;
> >         rx_desc = E1000_RX_DESC(*rx_ring, i);
> >         buffer_info = &rx_ring->buffer_info[i];
> > @@ -4188,12 +4315,54 @@ static bool e1000_clean_jumbo_rx_irq(struct e1000_adapter *adapter,
> >
> >                 cleaned = true;
> >                 cleaned_count++;
> > +               length = le16_to_cpu(rx_desc->length);
> > +
> > +               if (prog) {
> > +                       struct page *p = buffer_info->rxbuf.page;
> > +                       dma_addr_t dma = buffer_info->dma;
> > +                       int act;
> > +
> > +                       if (unlikely(!(status & E1000_RXD_STAT_EOP))) {
> > +                               /* attached bpf disallows larger than page
> > +                                * packets, so this is hw error or corruption
> > +                                */
> > +                               pr_info_once("%s buggy !eop\n", netdev->name);
> > +                               break;
> > +                       }
> > +                       if (unlikely(rx_ring->rx_skb_top)) {
> > +                               pr_info_once("%s ring resizing bug\n",
> > +                                            netdev->name);
> > +                               break;
> > +                       }
> > +                       dma_sync_single_for_cpu(&pdev->dev, dma,
> > +                                               length, DMA_FROM_DEVICE);
> > +                       act = e1000_call_bpf(prog, page_address(p), length);
> > +                       switch (act) {
> > +                       case XDP_PASS:
> > +                               break;
> > +                       case XDP_TX:
> > +                               dma_sync_single_for_device(&pdev->dev,
> > +                                                          dma,
> > +                                                          length,
> > +                                                          DMA_TO_DEVICE);
> > +                               e1000_xmit_raw_frame(buffer_info, length,
> > +                                                    netdev, adapter);
> 
> Implementing a new xmit path and clean-up routines for just this is
> going to be a pain.  I'd say if we are going to do something like this
> then maybe we should look at coming up with a new ndo for the xmit and
> maybe push more of this into some sort of inline hook.  Duplicating
> this code in every driver is going to be really expensive.

we will have common xdp routines when more drivers implement it.
I would expect several pieces of mlx4/mlx5 can be made common.
ndo approach won't work here, since stack doesn't call into this part.
The xdp logic stays within the driver and dma/page things are driver specific too.
It's pretty much like trying to make common struct between e1000_tx_buffer
and mlx4_en_tx_info. Which is quite difficult.

> Also I just noticed there is no break statement from the xmit code
> above to the drop below.  I'd think you could overwrite the frame
> data in a case where the Rx exceeds the Tx due to things like flow
> control generating back pressure.

you mean pause frames on TX side?
Isn't 'if (E1000_DESC_UNUSED(tx_ring) < 2) {' enough in e1000_xmit_raw_frame() ?

Thanks for the review!
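
For readers following along, the free-descriptor guard referred to
above can be sketched in user space as follows. The arithmetic follows
what the E1000_DESC_UNUSED() macro is understood to compute (one slot
is always left empty so a full ring can be told apart from an empty
one); struct toy_ring and the helper names are hypothetical, and the
kernel's memory-ordering primitives are left out:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of a tx descriptor ring. */
struct toy_ring {
	unsigned int count;          /* total descriptors in the ring */
	unsigned int next_to_use;    /* producer index */
	unsigned int next_to_clean;  /* consumer index */
};

/* Number of descriptors the producer may still fill. */
static unsigned int toy_desc_unused(const struct toy_ring *r)
{
	/* Add a full lap only when the consumer has not passed the producer. */
	unsigned int lap = r->next_to_clean > r->next_to_use ? 0 : r->count;

	return lap + r->next_to_clean - r->next_to_use - 1;
}

/* The guard from the patch: refuse to queue with fewer than 2 free. */
static bool toy_can_xmit(const struct toy_ring *r)
{
	return toy_desc_unused(r) >= 2;
}
```

The guard only decides whether a frame is queued at that moment; it
says nothing about how fast completions free descriptors again.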

* Re: README: [PATCH RFC 11/11] net/mlx5e: XDP TX xmit more
From: Jesper Dangaard Brouer via iovisor-dev @ 2016-09-13 16:47 UTC
  To: Eric Dumazet
  Cc: Tom Herbert, iovisor-dev, Jamal Hadi Salim, Saeed Mahameed,
	Eric Dumazet, Linux Netdev List, Edward Cree
In-Reply-To: <1473782310.18970.138.camel-XN9IlZ5yJG9HTL0Zs8A6p+yfmBU6pStAUsxypvmhUTTZJqsBc5GL+g@public.gmane.org>

On Tue, 13 Sep 2016 08:58:30 -0700
Eric Dumazet <eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> We also care about icache pressure, and GRO/TSO already provides
> bundling where it is applicable, without adding insane complexity in
> the stacks.

Sorry, I cannot resist. The GRO code is really bad regarding icache
pressure/usage, due to how everything is function pointers calling
function pointers, even if the general case is calling the function
defined just next to it in the same C-file (which usually causes
inlining).  I can easily get 10% more performance for UDP use-cases by
simply disabling the GRO code, and I measure a significant drop in
icache-misses.

Edward's solution should lower icache pressure.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

* Re: [PATCH V3] net-next: dsa: add FIB support
From: Vivien Didelot @ 2016-09-13 16:34 UTC
  To: John Crispin, David S. Miller, Andrew Lunn, Florian Fainelli
  Cc: netdev, linux-kernel, Jiri Pirko
In-Reply-To: <4f5e1704-daab-b9d3-c865-fad834d9f80b@phrozen.org>

Hi John,

John Crispin <john@phrozen.org> writes:

> i sent an email to Jiri earlier today and he asked me to drop this
> until his notification series got merged.

That makes sense then. So David should ignore this for the moment.

Thanks,

        Vivien

* Re: [PATCH v3] net: ip, diag -- Add diag interface for raw sockets
From: Cyrill Gorcunov @ 2016-09-13 16:31 UTC
  To: David Miller
  Cc: dsa, netdev, linux-kernel, eric.dumazet, kuznet, jmorris,
	yoshfuji, kaber, avagin, stephen
In-Reply-To: <20160913.115735.1397234520899437294.davem@davemloft.net>

On Tue, Sep 13, 2016 at 11:57:35AM -0400, David Miller wrote:
> > 
> > Thanks for review, David. I updated against net-next.
> 
> Please do not post new versions of patches as replies to existing
> discussions.
> 
> Instead, make fresh patch postings to the list.

Oh, will do. Sorry for the inconvenience.

* Re: [PATCH RFC 03/11] net/mlx5e: Implement RX mapped page cache for page recycle
From: Jesper Dangaard Brouer via iovisor-dev @ 2016-09-13 16:28 UTC
  To: Tariq Toukan
  Cc: Tom Herbert, iovisor-dev, Jamal Hadi Salim, Saeed Mahameed,
	Eric Dumazet, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <549ee0e2-b76b-ec62-4287-e63c4320e7c6-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

On Tue, 13 Sep 2016 13:16:29 +0300
Tariq Toukan <tariqt-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:

> On 07/09/2016 9:45 PM, Jesper Dangaard Brouer wrote:
> > On Wed,  7 Sep 2016 15:42:24 +0300 Saeed Mahameed <saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> >  
> >> From: Tariq Toukan <tariqt-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >>
> >> Instead of reallocating and mapping pages for RX data-path,
> >> recycle already used pages in a per ring cache.
> >>
> >> We ran pktgen single-stream benchmarks, with iptables-raw-drop:
> >>
> >> Single stride, 64 bytes:
> >> * 4,739,057 - baseline
> >> * 4,749,550 - order0 no cache
> >> * 4,786,899 - order0 with cache
> >> 1% gain
> >>
> >> Larger packets, no page cross, 1024 bytes:
> >> * 3,982,361 - baseline
> >> * 3,845,682 - order0 no cache
> >> * 4,127,852 - order0 with cache
> >> 3.7% gain
> >>
> >> Larger packets, every 3rd packet crosses a page, 1500 bytes:
> >> * 3,731,189 - baseline
> >> * 3,579,414 - order0 no cache
> >> * 3,931,708 - order0 with cache
> >> 5.4% gain
> >>
> >> Signed-off-by: Tariq Toukan <tariqt-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >> Signed-off-by: Saeed Mahameed <saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >> ---
> >>   drivers/net/ethernet/mellanox/mlx5/core/en.h       | 16 ++++++
> >>   drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 15 ++++++
> >>   drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    | 57 ++++++++++++++++++++--
> >>   drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 16 ++++++
> >>   4 files changed, 99 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> >> index 075cdfc..afbdf70 100644
> >> --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
> >> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> >> @@ -287,6 +287,18 @@ struct mlx5e_rx_am { /* Adaptive Moderation */
> >>   	u8					tired;
> >>   };
> >>   
> >> +/* a single cache unit is capable to serve one napi call (for non-striding rq)
> >> + * or a MPWQE (for striding rq).
> >> + */
> >> +#define MLX5E_CACHE_UNIT	(MLX5_MPWRQ_PAGES_PER_WQE > NAPI_POLL_WEIGHT ? \
> >> +				 MLX5_MPWRQ_PAGES_PER_WQE : NAPI_POLL_WEIGHT)
> >> +#define MLX5E_CACHE_SIZE	(2 * roundup_pow_of_two(MLX5E_CACHE_UNIT))
> >> +struct mlx5e_page_cache {
> >> +	u32 head;
> >> +	u32 tail;
> >> +	struct mlx5e_dma_info page_cache[MLX5E_CACHE_SIZE];
> >> +};
> >> +  
> > [...]  
> >>   
> >> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> >> index c1cb510..8e02af3 100644
> >> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> >> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> >> @@ -305,11 +305,55 @@ static inline void mlx5e_post_umr_wqe(struct mlx5e_rq *rq, u16 ix)
> >>   	mlx5e_tx_notify_hw(sq, &wqe->ctrl, 0);
> >>   }
> >>   
> >> +static inline bool mlx5e_rx_cache_put(struct mlx5e_rq *rq,
> >> +				      struct mlx5e_dma_info *dma_info)
> >> +{
> >> +	struct mlx5e_page_cache *cache = &rq->page_cache;
> >> +	u32 tail_next = (cache->tail + 1) & (MLX5E_CACHE_SIZE - 1);
> >> +
> >> +	if (tail_next == cache->head) {
> >> +		rq->stats.cache_full++;
> >> +		return false;
> >> +	}
> >> +
> >> +	cache->page_cache[cache->tail] = *dma_info;
> >> +	cache->tail = tail_next;
> >> +	return true;
> >> +}
> >> +
> >> +static inline bool mlx5e_rx_cache_get(struct mlx5e_rq *rq,
> >> +				      struct mlx5e_dma_info *dma_info)
> >> +{
> >> +	struct mlx5e_page_cache *cache = &rq->page_cache;
> >> +
> >> +	if (unlikely(cache->head == cache->tail)) {
> >> +		rq->stats.cache_empty++;
> >> +		return false;
> >> +	}
> >> +
> >> +	if (page_ref_count(cache->page_cache[cache->head].page) != 1) {
> >> +		rq->stats.cache_busy++;
> >> +		return false;
> >> +	}  
> > Hmmm... doesn't this cause "blocking" of the page_cache recycle
> > facility until the page at the head of the queue gets (page) refcnt
> > decremented?  Real use-case could fairly easily block/cause this...  
> Hi Jesper,
> 
> That's right. We are aware of this issue.
> We considered ways of solving this, but decided to keep current 
> implementation for now.
> One way of solving this is to look deeper in the cache.
> Cons:
> - this will consume time, and the chance of finding an available page is 
> not that high: if the page in head of queue is busy then there's a good 
> chance that all the others are too (because of FIFO).
> in other words, you already checked all pages and anyway you're going to 
> allocate a new one (higher penalty for same decision).
> - this will make holes in the array causing complex accounting when 
> looking for an available page (this can easily be fixed by swapping 
> between the page in head and the available one).
> 
> Another way is sharing pages between different RQs.
> - For now we're not doing this for simplicity and to keep 
> synchronization away.
> 
> What do you think?
> 
> Anyway, we're looking forward to use your page-pool API which solves 
> these issues.

Yes, as you mention yourself, the page-pool API solves this problem.
Thus, I'm not sure it is worth investing more time in optimizing this
driver local page cache mechanism.
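
To make the head-of-line blocking concrete, here is a small user-space
model of the FIFO cache discussed in the quoted text. It follows the
shape of the quoted mlx5e_rx_cache_put()/mlx5e_rx_cache_get(), but the
types and names are hypothetical, and the tail end of the get path
(handing out the page and advancing head) is an assumption, since the
quoted patch is cut off before that point:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_SIZE 8u /* power of two, so the index can be masked */

/* Hypothetical stand-in for a page with a reference count. */
struct toy_page { int refcount; };

struct toy_cache {
	uint32_t head;  /* next slot to hand out */
	uint32_t tail;  /* next slot to fill */
	struct toy_page *slots[TOY_CACHE_SIZE];
	unsigned int full, empty, busy; /* counters, as in the patch */
};

static bool toy_cache_put(struct toy_cache *c, struct toy_page *p)
{
	uint32_t tail_next = (c->tail + 1) & (TOY_CACHE_SIZE - 1);

	if (tail_next == c->head) {
		c->full++;
		return false;
	}
	c->slots[c->tail] = p;
	c->tail = tail_next;
	return true;
}

static bool toy_cache_get(struct toy_cache *c, struct toy_page **out)
{
	if (c->head == c->tail) {
		c->empty++;
		return false;
	}
	/* Only the head is inspected: a busy head blocks every later entry. */
	if (c->slots[c->head]->refcount != 1) {
		c->busy++;
		return false;
	}
	/* Assumed completion of the truncated get path. */
	*out = c->slots[c->head];
	c->head = (c->head + 1) & (TOY_CACHE_SIZE - 1);
	return true;
}
```

If a page with an extra reference sits at the head, every get fails
and falls back to a fresh allocation, even when all pages behind it
are idle. That is the blocking case raised in the review.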

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

* Re: [PATCH V3] net-next: dsa: add FIB support
From: John Crispin @ 2016-09-13 16:27 UTC
  To: Vivien Didelot, David S. Miller, Andrew Lunn, Florian Fainelli
  Cc: netdev, linux-kernel, Jiri Pirko
In-Reply-To: <87intz22jy.fsf@ketchup.mtl.sfl>



On 13/09/2016 18:21, Vivien Didelot wrote:
> Hi John,
> 
> John Crispin <john@phrozen.org> writes:
> 
>> @@ -237,6 +237,7 @@ struct switchdev_obj;
>>  struct switchdev_obj_port_fdb;
>>  struct switchdev_obj_port_mdb;
>>  struct switchdev_obj_port_vlan;
>> +struct switchdev_obj_ipv4_fib;
> 
> Can you keep it ordered please (put obj_ipv4 above port_fdb).
> 
>>  
>>  struct dsa_switch_ops {
>>  	struct list_head	list;
>> @@ -386,6 +387,18 @@ struct dsa_switch_ops {
>>  	int	(*port_mdb_dump)(struct dsa_switch *ds, int port,
>>  				 struct switchdev_obj_port_mdb *mdb,
>>  				 int (*cb)(struct switchdev_obj *obj));
>> +
>> +	/*
>> +	 * IPV4 routing
>> +	 */
>> +	int	(*ipv4_fib_prepare)(struct dsa_switch *ds, int port,
>> +				    const struct switchdev_obj_ipv4_fib *fib4,
>> +				    struct switchdev_trans *trans);
>> +	int	(*ipv4_fib_add)(struct dsa_switch *ds, int port,
>> +				const struct switchdev_obj_ipv4_fib *fib4,
>> +				struct switchdev_trans *trans);
> 
> DSA *_add ops should return void, since no error is supposed to occur in
> the commit phase.
> 
> If they are port-based operations, please prefix them with "port_",
> otherwise, the int port parameter is not necessary.
> 
>> +	int	(*ipv4_fib_del)(struct dsa_switch *ds, int port,
>> +				const struct switchdev_obj_ipv4_fib *fib4);
>>  };
>>  
>>  void register_switch_driver(struct dsa_switch_ops *type);
>> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
>> index 9ecbe78..c974ac0 100644
>> --- a/net/dsa/slave.c
>> +++ b/net/dsa/slave.c
>> @@ -334,6 +334,38 @@ static int dsa_slave_port_mdb_dump(struct net_device *dev,
>>  	return -EOPNOTSUPP;
>>  }
>>  
>> +static int dsa_slave_ipv4_fib_add(struct net_device *dev,
>> +				  const struct switchdev_obj_ipv4_fib *fib4,
>> +				  struct switchdev_trans *trans)
>> +{
>> +	struct dsa_slave_priv *p = netdev_priv(dev);
>> +	struct dsa_switch *ds = p->parent;
>> +	int ret;
>> +
>> +	if (!ds->ops->ipv4_fib_prepare || !ds->ops->ipv4_fib_add)
>> +		return -EOPNOTSUPP;
>> +
>> +	if (switchdev_trans_ph_prepare(trans))
>> +		ret = ds->ops->ipv4_fib_prepare(ds, p->port, fib4, trans);
>> +	else
>> +		ret = ds->ops->ipv4_fib_add(ds, p->port, fib4, trans);
>> +
>> +	return ret;
>> +}
> 
> Please see dsa_slave_port_vlan_add for a better logic with the prepare
> phase and void add routine.
> 
>> +
>> +static int dsa_slave_ipv4_fib_del(struct net_device *dev,
>> +				  const struct switchdev_obj_ipv4_fib *fib4)
>> +{
>> +	struct dsa_slave_priv *p = netdev_priv(dev);
>> +	struct dsa_switch *ds = p->parent;
>> +	int ret = -EOPNOTSUPP;
>> +
>> +	if (ds->ops->ipv4_fib_del)
>> +		ret = ds->ops->ipv4_fib_del(ds, p->port, fib4);
>> +
>> +	return ret;
>> +}
> 
> Just curious, isn't there a dump operation for SWITCHDEV_OBJ_ID_IPV4_FIB?
> 
>> +
>>  static int dsa_slave_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
>>  {
>>  	struct dsa_slave_priv *p = netdev_priv(dev);
>> @@ -465,6 +497,11 @@ static int dsa_slave_port_obj_add(struct net_device *dev,
>>  					      SWITCHDEV_OBJ_PORT_VLAN(obj),
>>  					      trans);
>>  		break;
>> +	case SWITCHDEV_OBJ_ID_IPV4_FIB:
>> +		err = dsa_slave_ipv4_fib_add(dev,
>> +					     SWITCHDEV_OBJ_IPV4_FIB(obj),
>> +					     trans);
>> +		break;
>>  	default:
>>  		err = -EOPNOTSUPP;
>>  		break;
>> @@ -490,6 +527,10 @@ static int dsa_slave_port_obj_del(struct net_device *dev,
>>  		err = dsa_slave_port_vlan_del(dev,
>>  					      SWITCHDEV_OBJ_PORT_VLAN(obj));
>>  		break;
>> +	case SWITCHDEV_OBJ_ID_IPV4_FIB:
>> +		err = dsa_slave_ipv4_fib_del(dev,
>> +					     SWITCHDEV_OBJ_IPV4_FIB(obj));
>> +		break;
>>  	default:
>>  		err = -EOPNOTSUPP;
>>  		break;
> 
> Please keep the SWITCHDEV_OBJ_ID_IPV4_FIB case ordered with other cases
> as well.
> 
> I'm adding Jiri to the loop, since he started a thread on FIB
> notifications a few days ago; his feedback might be interesting. If I'm
> not mistaken, there is a plan to factorize FID routines (not sure).
> 
> Thanks,
> 
>         Vivien

Hi Vivien,

I sent an email to Jiri earlier today and he asked me to drop this until
his notification series gets merged.

	John

> 

* Re: [net-next PATCH v3 2/3] e1000: add initial XDP support
From: Tom Herbert @ 2016-09-13 16:21 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eric Dumazet, John Fastabend, Brenden Blanco, Jeff Kirsher,
	Jesper Dangaard Brouer, David S. Miller, Cong Wang,
	intel-wired-lan, William Tu, Linux Kernel Network Developers
In-Reply-To: <20160913012815.GB25756@ast-mbp.thefacebook.com>

On Mon, Sep 12, 2016 at 6:28 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Mon, Sep 12, 2016 at 05:03:25PM -0700, Tom Herbert wrote:
>> On Mon, Sep 12, 2016 at 4:46 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Mon, 2016-09-12 at 16:07 -0700, Alexei Starovoitov wrote:
>> >
>> >> yep. there are various ways to shoot yourself in the foot with xdp.
>> >> The simplest program that drops all the packets will make the box unpingable.
>> >
>> > Well, my comment was about XDP_TX only, not about XDP_DROP or driving a
>> > scooter on 101 highway ;)
>> >
>> > This XDP_TX thing was one of the XDP marketing stuff, but there is
>> > absolutely no documentation on it, warning users about possible
>> > limitations/outcomes.
>> >
>> > BTW, I am not sure mlx4 implementation even works, vs BQL :
>> >
>> > mlx4_en_xmit_frame() does not call netdev_tx_sent_queue(),
>> > but tx completion will call netdev_tx_completed_queue() -> crash
>> >
>> > Do we have one test to validate that a XDP_TX implementation is actually
>> > correct ?
>> >
>> Obviously not for e1000 :-(. We really need some real test and
>> performance results and analysis on the interaction between the stack
>> data path and XDP data path.
>
> no. we don't need it for e1k and we cannot really do it.
> <broken record mode on> this patch is for debugging of xdp programs only.
>
You can say "this is only for debugging" a thousand times and that
still won't justify putting bad code into the kernel. Material issues
have been raised with these patches, I have proposed a fix for one
core issue, and we have requested a lot more testing. So, please, if
you really want to move these patches forward start addressing the
concerns being raised by reviewers.

Tom

>> The fact that these changes are being
>> passed off as something only needed for KCM is irrelevant; e1000 is a
>> well-deployed NIC and there's no restriction that I see that would
>> prevent any users from enabling this feature on real devices.
>
> e1k is not even manufactured any more. Probably the only place
> where it can be found is computer history museum.
> e1000e fares slightly better, but it's a different nic and this
> patch is not about it.
>

* Re: [PATCH V3] net-next: dsa: add FIB support
From: Vivien Didelot @ 2016-09-13 16:21 UTC (permalink / raw)
  To: John Crispin, David S. Miller, Andrew Lunn, Florian Fainelli
  Cc: netdev, linux-kernel, John Crispin, Jiri Pirko
In-Reply-To: <1473746541-52314-1-git-send-email-john@phrozen.org>

Hi John,

John Crispin <john@phrozen.org> writes:

> @@ -237,6 +237,7 @@ struct switchdev_obj;
>  struct switchdev_obj_port_fdb;
>  struct switchdev_obj_port_mdb;
>  struct switchdev_obj_port_vlan;
> +struct switchdev_obj_ipv4_fib;

Can you keep it ordered please (put obj_ipv4 above port_fdb).

>  
>  struct dsa_switch_ops {
>  	struct list_head	list;
> @@ -386,6 +387,18 @@ struct dsa_switch_ops {
>  	int	(*port_mdb_dump)(struct dsa_switch *ds, int port,
>  				 struct switchdev_obj_port_mdb *mdb,
>  				 int (*cb)(struct switchdev_obj *obj));
> +
> +	/*
> +	 * IPV4 routing
> +	 */
> +	int	(*ipv4_fib_prepare)(struct dsa_switch *ds, int port,
> +				    const struct switchdev_obj_ipv4_fib *fib4,
> +				    struct switchdev_trans *trans);
> +	int	(*ipv4_fib_add)(struct dsa_switch *ds, int port,
> +				const struct switchdev_obj_ipv4_fib *fib4,
> +				struct switchdev_trans *trans);

DSA *_add ops should return void, since no error is supposed to occur in
the commit phase.

If they are port-based operations, please prefix them with "port_",
otherwise, the int port parameter is not necessary.

> +	int	(*ipv4_fib_del)(struct dsa_switch *ds, int port,
> +				const struct switchdev_obj_ipv4_fib *fib4);
>  };
>  
>  void register_switch_driver(struct dsa_switch_ops *type);
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index 9ecbe78..c974ac0 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -334,6 +334,38 @@ static int dsa_slave_port_mdb_dump(struct net_device *dev,
>  	return -EOPNOTSUPP;
>  }
>  
> +static int dsa_slave_ipv4_fib_add(struct net_device *dev,
> +				  const struct switchdev_obj_ipv4_fib *fib4,
> +				  struct switchdev_trans *trans)
> +{
> +	struct dsa_slave_priv *p = netdev_priv(dev);
> +	struct dsa_switch *ds = p->parent;
> +	int ret;
> +
> +	if (!ds->ops->ipv4_fib_prepare || !ds->ops->ipv4_fib_add)
> +		return -EOPNOTSUPP;
> +
> +	if (switchdev_trans_ph_prepare(trans))
> +		ret = ds->ops->ipv4_fib_prepare(ds, p->port, fib4, trans);
> +	else
> +		ret = ds->ops->ipv4_fib_add(ds, p->port, fib4, trans);
> +
> +	return ret;
> +}

Please see dsa_slave_port_vlan_add for a better logic with the prepare
phase and void add routine.
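
The two-phase pattern referred to here can be sketched in plain userspace C. This is only an illustration under stated assumptions: `demo_ops`, `demo_fib_add`, `demo_prepare`, `demo_add`, and the fixed-size table are hypothetical stand-ins, not the DSA or switchdev API. The point it shows is that every check that can fail lives in the prepare phase, so the commit-phase op can return void.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver ops: prepare may fail, add (commit) must not. */
struct demo_ops {
	int  (*fib_prepare)(int port, unsigned int dst);
	void (*fib_add)(int port, unsigned int dst);
};

/* Core-side wrapper: validation happens only in the prepare phase. */
static int demo_fib_add(const struct demo_ops *ops, int port,
			unsigned int dst, bool prepare_phase)
{
	if (prepare_phase) {
		if (!ops->fib_prepare || !ops->fib_add)
			return -EOPNOTSUPP;
		return ops->fib_prepare(port, dst);
	}

	/* Commit phase: resources were checked above, so no error path. */
	ops->fib_add(port, dst);
	return 0;
}

/* Toy driver with a two-entry table, used to exercise the wrapper. */
#define DEMO_TABLE_SIZE 2
static unsigned int demo_table[DEMO_TABLE_SIZE];
static int demo_used;

static int demo_prepare(int port, unsigned int dst)
{
	(void)port;
	(void)dst;
	/* The only place where "table full" is reported. */
	return demo_used < DEMO_TABLE_SIZE ? 0 : -ENOSPC;
}

static void demo_add(int port, unsigned int dst)
{
	(void)port;
	demo_table[demo_used++] = dst;
}

static const struct demo_ops demo_driver = { demo_prepare, demo_add };
```

A caller would invoke `demo_fib_add()` once with `prepare_phase` set and, only if that returns 0, once more with it cleared.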

> +
> +static int dsa_slave_ipv4_fib_del(struct net_device *dev,
> +				  const struct switchdev_obj_ipv4_fib *fib4)
> +{
> +	struct dsa_slave_priv *p = netdev_priv(dev);
> +	struct dsa_switch *ds = p->parent;
> +	int ret = -EOPNOTSUPP;
> +
> +	if (ds->ops->ipv4_fib_del)
> +		ret = ds->ops->ipv4_fib_del(ds, p->port, fib4);
> +
> +	return ret;
> +}

Just curious, isn't there a dump operation for SWITCHDEV_OBJ_ID_IPV4_FIB?

> +
>  static int dsa_slave_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
>  {
>  	struct dsa_slave_priv *p = netdev_priv(dev);
> @@ -465,6 +497,11 @@ static int dsa_slave_port_obj_add(struct net_device *dev,
>  					      SWITCHDEV_OBJ_PORT_VLAN(obj),
>  					      trans);
>  		break;
> +	case SWITCHDEV_OBJ_ID_IPV4_FIB:
> +		err = dsa_slave_ipv4_fib_add(dev,
> +					     SWITCHDEV_OBJ_IPV4_FIB(obj),
> +					     trans);
> +		break;
>  	default:
>  		err = -EOPNOTSUPP;
>  		break;
> @@ -490,6 +527,10 @@ static int dsa_slave_port_obj_del(struct net_device *dev,
>  		err = dsa_slave_port_vlan_del(dev,
>  					      SWITCHDEV_OBJ_PORT_VLAN(obj));
>  		break;
> +	case SWITCHDEV_OBJ_ID_IPV4_FIB:
> +		err = dsa_slave_ipv4_fib_del(dev,
> +					     SWITCHDEV_OBJ_IPV4_FIB(obj));
> +		break;
>  	default:
>  		err = -EOPNOTSUPP;
>  		break;

Please keep the SWITCHDEV_OBJ_ID_IPV4_FIB case ordered with other cases
as well.

I'm adding Jiri to the loop, since he started a thread on FIB
notifications a few days ago; his feedback might be interesting. If I'm
not mistaken, there is a plan to factorize FID routines (not sure).

Thanks,

        Vivien

* Re: [PATCH v3 net 1/1] net sched actions: fix GETing actions
From: Cong Wang @ 2016-09-13 16:20 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: David Miller, Linux Kernel Network Developers
In-Reply-To: <1473721658-6034-1-git-send-email-jhs@emojatatu.com>

On Mon, Sep 12, 2016 at 4:07 PM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> From: Jamal Hadi Salim <jhs@mojatatu.com>
>
> With the batch changes that translated transient actions into
> a temporary list, lost in the translation was the fact that
> tcf_action_destroy() will eventually delete the action from
> the permanent location if the refcount is zero.
>
> Example of what broke:
> ...add a gact action to drop
> sudo $TC actions add action drop index 10
> ...now retrieve it, looks good
> sudo $TC actions get action gact index 10
> ...retrieve it again and find it is gone!
> sudo $TC actions get action gact index 10
>
> Fixes:
> commit 22dc13c837c3 ("net_sched: convert tcf_exts from list to pointer array"),
> commit 824a7e8863b3 ("net_sched: remove an unnecessary list_del()")
> commit f07fed82ad79 ("net_sched: remove the leftover cleanup_a()")
>
> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
> ---
>  net/sched/act_api.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> index d09d068..50720b1 100644
> --- a/net/sched/act_api.c
> +++ b/net/sched/act_api.c
> @@ -592,6 +592,16 @@ err_out:
>         return ERR_PTR(err);
>  }
>
> +static void cleanup_a(struct list_head *actions, int ovr)
> +{
> +       struct tc_action *a;
> +
> +       list_for_each_entry(a, actions, list) {
> +               if (ovr)
> +                       a->tcfa_refcnt -= 1;
> +       }
> +}
> +
>  int tcf_action_init(struct net *net, struct nlattr *nla,
>                                   struct nlattr *est, char *name, int ovr,
>                                   int bind, struct list_head *actions)
> @@ -612,8 +622,15 @@ int tcf_action_init(struct net *net, struct nlattr *nla,
>                         goto err;
>                 }
>                 act->order = i;
> +               if (ovr)
> +                       act->tcfa_refcnt += 1;
>                 list_add_tail(&act->list, actions);
>         }
> +
> +       /* Remove the temp refcnt which was necessary to protect against
> +        * destroying an existing action which was being replaced
> +        */
> +       cleanup_a(actions, ovr);
>         return 0;

I am still trying to understand this piece. So here you hold the refcnt
for the same action used by the later iteration? Otherwise there is
almost no user in between the hold and the release...

The comment you added is not clear to me: we use RTNL/RCU to
sync destroy and replace, so how could that happen?

Thanks.

* Re: [patch net-next 0/5] mlxsw: ethtool enhancements
From: David Miller @ 2016-09-13 16:17 UTC (permalink / raw)
  To: jiri; +Cc: netdev, idosch, eladr, yotamg, nogahf, ogerlitz
In-Reply-To: <1473679587-9112-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@resnulli.us>
Date: Mon, 12 Sep 2016 13:26:22 +0200

> From: Jiri Pirko <jiri@mellanox.com>
> 
> Ido says:
> 
> Patches 1-4 do some minor cleanup in current ethtool ops. Patch 5
> replaces legacy {get,set}_settings callbacks with
> {get,set}_link_ksettings.

Series applied, thanks guys.

* Re: [PATCH net-next 0/7] cxgb4: add support for offloading TC u32 filters
From: David Miller @ 2016-09-13 16:12 UTC (permalink / raw)
  To: rahul.lakkireddy; +Cc: netdev, hariprasad, leedom, nirranjan, indranil
In-Reply-To: <cover.1473667613.git.rahul.lakkireddy@chelsio.com>

From: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Date: Mon, 12 Sep 2016 13:42:33 +0530

> This series of patches adds support to offload TC u32 filters onto
> Chelsio NICs.
> 
> Patch 1 moves current common filter code to separate files
> in order to provide a common api for performing packet classification
> and filtering in Chelsio NICs.
> 
> Patch 2 enables filters for normal NIC configuration and implements
> common api for setting and deleting filters.
> 
> Patch 3 provides a debugfs for dumping filter information.
> 
> Patches 4-7 add support for TC u32 offload via ndo_setup_tc.

Like Jiri, I'm getting increasingly disappointed by how liberally people
toss tons of things into debugfs.

This seems to be the way driver developers throw their hands into the
air when they can't find a quick and easy way to export some piece of
information.

Please work on the long term usability of the kernel and all
networking drivers by finding a bona fide piece of existing
infrastructure by which to export things, or build a new one if
needed.

Thanks.

* Re: [PATCH] drivers: net: phy: xgene: Fix 'remove' function
From: David Miller @ 2016-09-13 16:06 UTC (permalink / raw)
  To: christophe.jaillet
  Cc: isubramanian, kchudgar, f.fainelli, netdev, linux-kernel,
	kernel-janitors
In-Reply-To: <1473623014-7801-1-git-send-email-christophe.jaillet@wanadoo.fr>

From: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Date: Sun, 11 Sep 2016 21:43:34 +0200

> If 'IS_ERR(pdata->clk)' is true, then 'clk_disable_unprepare(pdata->clk)'
> will do nothing.
> 
> It is likely that 'if (!IS_ERR(pdata->clk))' was expected here.
> In fact, the test can even be removed because 'clk_disable_unprepare'
> already handles such cases.
> 
> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>

Applied.
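
The reasoning in the quoted commit message can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: `fake_clk`, `err_ptr()`, `is_err()`, and `fake_clk_disable_unprepare()` are stand-ins that only mimic the kernel's error-pointer convention, and the two `remove_*` functions are hypothetical, not the xgene driver code.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Hypothetical clock handle; the real struct clk is opaque. */
struct fake_clk {
	int enable_count;
};

/* Stand-ins for the kernel's ERR_PTR()/IS_ERR() helpers. */
static inline void *err_ptr(long error)
{
	return (void *)error;
}

static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Tolerates NULL and error pointers, as the commit message describes. */
static void fake_clk_disable_unprepare(struct fake_clk *clk)
{
	if (clk == NULL || is_err(clk))
		return;
	clk->enable_count--;
}

/* Inverted test: the call is only reached when it can do nothing. */
static void remove_buggy(struct fake_clk *clk)
{
	if (is_err(clk))
		fake_clk_disable_unprepare(clk);
}

/* No test needed: the callee already ignores invalid handles. */
static void remove_fixed(struct fake_clk *clk)
{
	fake_clk_disable_unprepare(clk);
}
```

With a valid handle, `remove_buggy()` leaves the clock enabled while `remove_fixed()` releases it; with an error pointer, both are no-ops.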

* Re: [PATCH 3/3] net-next: dsa: add new driver for qca8xxx family
From: Andrew Lunn @ 2016-09-13 15:59 UTC (permalink / raw)
  To: John Crispin, Florian Fainelli
  Cc: David S. Miller, Florian Fainelli, netdev, linux-kernel,
	qsdk-review
In-Reply-To: <7f9c1463-1cc0-e230-e7fe-dbb434427251@phrozen.org>

> Hi Andrew,
> 
> this function does indeed duplicate the functionality of
> phy_ethtool_get_eee() with the small difference, that e->eee_active is
> also set which phy_ethtool_get_eee() does not set.
> 
> dsa_slave_get_eee() will call phy_ethtool_get_eee() right after the
> get_eee() op has been called. Would it be OK to move the code setting
> eee_active to phy_ethtool_get_eee()?

Hi John

I think that is a question for Florian.

  Andrew

* Re: README: [PATCH RFC 11/11] net/mlx5e: XDP TX xmit more
From: Eric Dumazet via iovisor-dev @ 2016-09-13 15:58 UTC (permalink / raw)
  To: Edward Cree
  Cc: Tom Herbert, iovisor-dev, Jamal Hadi Salim, Saeed Mahameed,
	Eric Dumazet, Linux Netdev List
In-Reply-To: <d8a477c6-5394-ab33-443f-59d75a58f430-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>

On Tue, 2016-09-13 at 16:20 +0100, Edward Cree wrote:
> On 12/09/16 11:15, Jesper Dangaard Brouer wrote:
> > I'm reacting so loudly because this is a mental model switch that
> > needs to be applied to the full driver RX path. Also for normal stack
> > delivery of SKBs. As both Edward Cree[1] and I[2] have demonstrated,
> > there is between 10%-25% perf gain here.
> >
> > [1] http://lists.openwall.net/netdev/2016/04/19/89
> > [2] http://lists.openwall.net/netdev/2016/01/15/51
> BTW, I'd also still rather like to see that happen, I never really
> understood the objections people had to those patches when I posted them.  I
> still believe that dealing in skb-lists instead of skbs, and thus
> 'automatically' bulking similar packets, is better than trying to categorise
> packets into flows early on based on some set of keys.  The problem with the
> latter approach is that there are now two definitions of "similar":
> 1) the set of fields used to index the flow
> 2) what will actually cause the stack's behaviour to differ if not using the
> cached values.
> Quite apart from the possibility of bugs if one changes but not the other,
> this forces (1) to be conservative, only considering things "similar" if the
> entire stack will.  Whereas with bundling, the stack can keep packets
> together until they reach a layer at which they are no longer "similar"
> enough.  Thus, for instance, packets with the same IP 3-tuple but different
> port numbers can be grouped together for IP layer processing, then split
> apart for L4.
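
The bundling argument quoted above can be made concrete with a toy model. This is a sketch under assumptions, not the posted skb-list patches: `struct pkt`, `same_l3()`, `same_l4()`, and `count_bundles()` are hypothetical names, and "3-tuple" is taken to mean source address, destination address, and protocol. It counts how many bundles a burst forms at each layer, where a bundle is a run of consecutive packets that the layer treats as similar.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal packet descriptor for the illustration. */
struct pkt {
	uint32_t saddr, daddr;
	uint8_t  proto;
	uint16_t sport, dport;
};

/* "Similar" as far as the IP layer is concerned: same 3-tuple. */
static bool same_l3(const struct pkt *a, const struct pkt *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->proto == b->proto;
}

/* "Similar" for L4: same 3-tuple and same ports. */
static bool same_l4(const struct pkt *a, const struct pkt *b)
{
	return same_l3(a, b) && a->sport == b->sport && a->dport == b->dport;
}

/* Count runs of consecutive packets that `same` considers similar. */
static size_t count_bundles(const struct pkt *p, size_t n,
			    bool (*same)(const struct pkt *,
					 const struct pkt *))
{
	size_t bundles = 0;

	for (size_t i = 0; i < n; i++)
		if (i == 0 || !same(&p[i - 1], &p[i]))
			bundles++;
	return bundles;
}
```

For a burst of four packets that share a 3-tuple but use two different source ports, the IP layer sees one bundle and L4 sees two, which is the "group for IP, split for L4" behaviour described in the quoted text.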

To be fair you never showed us the numbers for DDOS traffic, and you did
not show us how typical TCP + netfilter modules kind of traffic would be
handled.

Show us real numbers, not synthetic ones, say when receiving traffic on
100,000 or more TCP sockets.

We also care about icache pressure, and GRO/TSO already provides
bundling where it is applicable, without adding insane complexity in the
stacks.

Just look at how complex the software fallbacks for GSO/checksumming
are, how many bugs we had to fix... And this is only at the edge of our
stack.

* Re: [PATCH v3] net: ip, diag -- Add diag interface for raw sockets
From: David Miller @ 2016-09-13 15:57 UTC (permalink / raw)
  To: gorcunov
  Cc: dsa, netdev, linux-kernel, eric.dumazet, kuznet, jmorris,
	yoshfuji, kaber, avagin, stephen
In-Reply-To: <20160911191714.GC2001@uranus.lan>

From: Cyrill Gorcunov <gorcunov@gmail.com>
Date: Sun, 11 Sep 2016 22:17:15 +0300

> On Sat, Sep 10, 2016 at 04:28:40PM -0600, David Ahern wrote:
>> On 9/10/16 4:05 PM, Cyrill Gorcunov wrote:
>> > On Sat, Sep 10, 2016 at 10:31:35AM -0600, David Ahern wrote:
>> >>
>> >> Would you mind adding the destroy capability as well? The udp version
>> >> should be close to what is needed for raw sockets. See udp_diag_destroy
>> >> and udp_abort.
>> > 
>> > Should be something like below. I haven't tested it yet, so it is for
>> > review only. Will do testing on Monday.
>> 
>> doesn't compile:
>> - raw_abort needs to be in a header for ipv6, and
>> - inet_sk_diag_fill args have changed due to a recent commit
> 
> Thanks for review, David. I updated against net-next.

Please do not post new versions of patches as replies to existing
discussions.

Instead, make fresh patch postings to the list.

Thanks.

* [net-next PATCH 11/11] libcxgb,iw_cxgb4,cxgbit: add cxgb_mk_rx_data_ack()
From: Varun Prakash @ 2016-09-13 15:54 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-rdma, target-devel, nab, dledford, swise,
	gerlitz.or, indranil, varun
In-Reply-To: <cover.1473781521.git.varun@chelsio.com>

Add cxgb_mk_rx_data_ack() to remove duplicate
code to form CPL_RX_DATA_ACK hardware command.

Signed-off-by: Varun Prakash <varun@chelsio.com>
---
 drivers/infiniband/hw/cxgb4/cm.c                  | 19 ++++++++-----------
 drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h | 15 +++++++++++++++
 drivers/target/iscsi/cxgbit/cxgbit_cm.c           | 16 ++++++----------
 3 files changed, 29 insertions(+), 21 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index a6d5fcb..3cbbfbe 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -1354,9 +1354,9 @@ static void established_upcall(struct c4iw_ep *ep)
 
 static int update_rx_credits(struct c4iw_ep *ep, u32 credits)
 {
-	struct cpl_rx_data_ack *req;
 	struct sk_buff *skb;
-	int wrlen = roundup(sizeof *req, 16);
+	u32 wrlen = roundup(sizeof(struct cpl_rx_data_ack), 16);
+	u32 credit_dack;
 
 	PDBG("%s ep %p tid %u credits %u\n", __func__, ep, ep->hwtid, credits);
 	skb = get_skb(NULL, wrlen, GFP_KERNEL);
@@ -1373,15 +1373,12 @@ static int update_rx_credits(struct c4iw_ep *ep, u32 credits)
 	if (ep->rcv_win > RCV_BUFSIZ_M * 1024)
 		credits += ep->rcv_win - RCV_BUFSIZ_M * 1024;
 
-	req = (struct cpl_rx_data_ack *) skb_put(skb, wrlen);
-	memset(req, 0, wrlen);
-	INIT_TP_WR(req, ep->hwtid);
-	OPCODE_TID(req) = cpu_to_be32(MK_OPCODE_TID(CPL_RX_DATA_ACK,
-						    ep->hwtid));
-	req->credit_dack = cpu_to_be32(credits | RX_FORCE_ACK_F |
-				       RX_DACK_CHANGE_F |
-				       RX_DACK_MODE_V(dack_mode));
-	set_wr_txq(skb, CPL_PRIORITY_ACK, ep->ctrlq_idx);
+	credit_dack = credits | RX_FORCE_ACK_F | RX_DACK_CHANGE_F |
+		      RX_DACK_MODE_V(dack_mode);
+
+	cxgb_mk_rx_data_ack(skb, wrlen, ep->hwtid, ep->ctrlq_idx,
+			    credit_dack);
+
 	c4iw_ofld_send(&ep->com.dev->rdev, skb);
 	return credits;
 }
diff --git a/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h
index 70999e8..515b94f 100644
--- a/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h
+++ b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h
@@ -142,4 +142,19 @@ cxgb_mk_abort_rpl(struct sk_buff *skb, u32 len, u32 tid, u16 chan)
 	rpl->cmd = CPL_ABORT_NO_RST;
 	set_wr_txq(skb, CPL_PRIORITY_DATA, chan);
 }
+
+static inline void
+cxgb_mk_rx_data_ack(struct sk_buff *skb, u32 len, u32 tid, u16 chan,
+		    u32 credit_dack)
+{
+	struct cpl_rx_data_ack *req;
+
+	req = (struct cpl_rx_data_ack *)__skb_put(skb, len);
+	memset(req, 0, len);
+
+	INIT_TP_WR(req, tid);
+	OPCODE_TID(req) = cpu_to_be32(MK_OPCODE_TID(CPL_RX_DATA_ACK, tid));
+	req->credit_dack = cpu_to_be32(credit_dack);
+	set_wr_txq(skb, CPL_PRIORITY_ACK, chan);
+}
 #endif
diff --git a/drivers/target/iscsi/cxgbit/cxgbit_cm.c b/drivers/target/iscsi/cxgbit/cxgbit_cm.c
index 9bdbe3b..2fb1bf1 100644
--- a/drivers/target/iscsi/cxgbit/cxgbit_cm.c
+++ b/drivers/target/iscsi/cxgbit/cxgbit_cm.c
@@ -994,22 +994,18 @@ static void cxgbit_send_rx_credits(struct cxgbit_sock *csk, struct sk_buff *skb)
 int cxgbit_rx_data_ack(struct cxgbit_sock *csk)
 {
 	struct sk_buff *skb;
-	struct cpl_rx_data_ack *req;
-	unsigned int len = roundup(sizeof(*req), 16);
+	u32 len = roundup(sizeof(struct cpl_rx_data_ack), 16);
+	u32 credit_dack;
 
 	skb = alloc_skb(len, GFP_KERNEL);
 	if (!skb)
 		return -1;
 
-	req = (struct cpl_rx_data_ack *)__skb_put(skb, len);
-	memset(req, 0, len);
+	credit_dack = RX_DACK_CHANGE_F | RX_DACK_MODE_V(1) |
+		      RX_CREDITS_V(csk->rx_credits);
 
-	set_wr_txq(skb, CPL_PRIORITY_ACK, csk->ctrlq_idx);
-	INIT_TP_WR(req, csk->tid);
-	OPCODE_TID(req) = cpu_to_be32(MK_OPCODE_TID(CPL_RX_DATA_ACK,
-						    csk->tid));
-	req->credit_dack = cpu_to_be32(RX_DACK_CHANGE_F | RX_DACK_MODE_V(1) |
-				       RX_CREDITS_V(csk->rx_credits));
+	cxgb_mk_rx_data_ack(skb, len, csk->tid, csk->ctrlq_idx,
+			    credit_dack);
 
 	csk->rx_credits = 0;
 
-- 
2.0.2

* [net-next PATCH 09/11] libcxgb,iw_cxgb4,cxgbit: add cxgb_mk_abort_req()
From: Varun Prakash @ 2016-09-13 15:54 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-rdma, target-devel, nab, dledford, swise,
	gerlitz.or, indranil, varun
In-Reply-To: <cover.1473781521.git.varun@chelsio.com>

Add cxgb_mk_abort_req() to remove duplicate code
to form CPL_ABORT_REQ hardware command.

Signed-off-by: Varun Prakash <varun@chelsio.com>
---
 drivers/infiniband/hw/cxgb4/cm.c                  | 13 ++++---------
 drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h | 16 ++++++++++++++++
 drivers/target/iscsi/cxgbit/cxgbit_cm.c           | 13 +++----------
 3 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 22bccd8..484196e 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -652,21 +652,16 @@ static int send_halfclose(struct c4iw_ep *ep)
 
 static int send_abort(struct c4iw_ep *ep)
 {
-	struct cpl_abort_req *req;
-	int wrlen = roundup(sizeof *req, 16);
+	u32 wrlen = roundup(sizeof(struct cpl_abort_req), 16);
 	struct sk_buff *req_skb = skb_dequeue(&ep->com.ep_skb_list);
 
 	PDBG("%s ep %p tid %u\n", __func__, ep, ep->hwtid);
 	if (WARN_ON(!req_skb))
 		return -ENOMEM;
 
-	set_wr_txq(req_skb, CPL_PRIORITY_DATA, ep->txq_idx);
-	t4_set_arp_err_handler(req_skb, ep, abort_arp_failure);
-	req = (struct cpl_abort_req *)skb_put(req_skb, wrlen);
-	memset(req, 0, wrlen);
-	INIT_TP_WR(req, ep->hwtid);
-	OPCODE_TID(req) = cpu_to_be32(MK_OPCODE_TID(CPL_ABORT_REQ, ep->hwtid));
-	req->cmd = CPL_ABORT_SEND_RST;
+	cxgb_mk_abort_req(req_skb, wrlen, ep->hwtid, ep->txq_idx,
+			  ep, abort_arp_failure);
+
 	return c4iw_l2t_send(&ep->com.dev->rdev, req_skb, ep->l2t);
 }
 
diff --git a/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h
index e77661d..2d3a3bf 100644
--- a/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h
+++ b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h
@@ -112,4 +112,20 @@ cxgb_mk_close_con_req(struct sk_buff *skb, u32 len, u32 tid, u16 chan,
 	set_wr_txq(skb, CPL_PRIORITY_DATA, chan);
 	t4_set_arp_err_handler(skb, handle, handler);
 }
+
+static inline void
+cxgb_mk_abort_req(struct sk_buff *skb, u32 len, u32 tid, u16 chan,
+		  void *handle, arp_err_handler_t handler)
+{
+	struct cpl_abort_req *req;
+
+	req = (struct cpl_abort_req *)__skb_put(skb, len);
+	memset(req, 0, len);
+
+	INIT_TP_WR(req, tid);
+	OPCODE_TID(req) = cpu_to_be32(MK_OPCODE_TID(CPL_ABORT_REQ, tid));
+	req->cmd = CPL_ABORT_SEND_RST;
+	set_wr_txq(skb, CPL_PRIORITY_DATA, chan);
+	t4_set_arp_err_handler(skb, handle, handler);
+}
 #endif
diff --git a/drivers/target/iscsi/cxgbit/cxgbit_cm.c b/drivers/target/iscsi/cxgbit/cxgbit_cm.c
index a8f5f36..f2b737e 100644
--- a/drivers/target/iscsi/cxgbit/cxgbit_cm.c
+++ b/drivers/target/iscsi/cxgbit/cxgbit_cm.c
@@ -647,9 +647,8 @@ static void cxgbit_abort_arp_failure(void *handle, struct sk_buff *skb)
 
 static int cxgbit_send_abort_req(struct cxgbit_sock *csk)
 {
-	struct cpl_abort_req *req;
-	unsigned int len = roundup(sizeof(*req), 16);
 	struct sk_buff *skb;
+	u32 len = roundup(sizeof(struct cpl_abort_req), 16);
 
 	pr_debug("%s: csk %p tid %u; state %d\n",
 		 __func__, csk, csk->tid, csk->com.state);
@@ -660,15 +659,9 @@ static int cxgbit_send_abort_req(struct cxgbit_sock *csk)
 		cxgbit_send_tx_flowc_wr(csk);
 
 	skb = __skb_dequeue(&csk->skbq);
-	req = (struct cpl_abort_req *)__skb_put(skb, len);
-	memset(req, 0, len);
+	cxgb_mk_abort_req(skb, len, csk->tid, csk->txq_idx,
+			  csk->com.cdev, cxgbit_abort_arp_failure);
 
-	set_wr_txq(skb, CPL_PRIORITY_DATA, csk->txq_idx);
-	t4_set_arp_err_handler(skb, csk->com.cdev, cxgbit_abort_arp_failure);
-	INIT_TP_WR(req, csk->tid);
-	OPCODE_TID(req) = cpu_to_be32(MK_OPCODE_TID(CPL_ABORT_REQ,
-						    csk->tid));
-	req->cmd = CPL_ABORT_SEND_RST;
 	return cxgbit_l2t_send(csk->com.cdev, skb, csk->l2t);
 }
 
-- 
2.0.2

* [net-next PATCH 06/11] libcxgb,iw_cxgb4,cxgbit: add cxgb_compute_wscale()
From: Varun Prakash @ 2016-09-13 15:54 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-rdma, target-devel, nab, dledford, swise,
	gerlitz.or, indranil, varun
In-Reply-To: <cover.1473781521.git.varun@chelsio.com>

Add cxgb_compute_wscale() in libcxgb_cm.h to remove
its duplicate definitions from cxgb4/cm.c and
cxgbit/cxgbit_cm.c.

Signed-off-by: Varun Prakash <varun@chelsio.com>
---
 drivers/infiniband/hw/cxgb4/cm.c                  | 12 ++++++------
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h            |  9 ---------
 drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h |  9 +++++++++
 drivers/target/iscsi/cxgbit/cxgbit_cm.c           | 11 +----------
 4 files changed, 16 insertions(+), 25 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index c3c678f..b9d77df 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -689,7 +689,7 @@ static int send_connect(struct c4iw_ep *ep)
 	u64 opt0;
 	u32 opt2;
 	unsigned int mtu_idx;
-	int wscale;
+	u32 wscale;
 	int win, sizev4, sizev6, wrlen;
 	struct sockaddr_in *la = (struct sockaddr_in *)
 				 &ep->com.local_addr;
@@ -739,7 +739,7 @@ static int send_connect(struct c4iw_ep *ep)
 	cxgb_best_mtu(ep->com.dev->rdev.lldi.mtus, ep->mtu, &mtu_idx,
 		      enable_tcp_timestamps,
 		      (ep->com.remote_addr.ss_family == AF_INET) ? 0 : 1);
-	wscale = compute_wscale(rcv_win);
+	wscale = cxgb_compute_wscale(rcv_win);
 
 	/*
 	 * Specify the largest window that will fit in opt0. The
@@ -1891,7 +1891,7 @@ static int send_fw_act_open_req(struct c4iw_ep *ep, unsigned int atid)
 	struct sk_buff *skb;
 	struct fw_ofld_connection_wr *req;
 	unsigned int mtu_idx;
-	int wscale;
+	u32 wscale;
 	struct sockaddr_in *sin;
 	int win;
 
@@ -1919,7 +1919,7 @@ static int send_fw_act_open_req(struct c4iw_ep *ep, unsigned int atid)
 	cxgb_best_mtu(ep->com.dev->rdev.lldi.mtus, ep->mtu, &mtu_idx,
 		      enable_tcp_timestamps,
 		      (ep->com.remote_addr.ss_family == AF_INET) ? 0 : 1);
-	wscale = compute_wscale(rcv_win);
+	wscale = cxgb_compute_wscale(rcv_win);
 
 	/*
 	 * Specify the largest window that will fit in opt0. The
@@ -2339,7 +2339,7 @@ static int accept_cr(struct c4iw_ep *ep, struct sk_buff *skb,
 	unsigned int mtu_idx;
 	u64 opt0;
 	u32 opt2;
-	int wscale;
+	u32 wscale;
 	struct cpl_t5_pass_accept_rpl *rpl5 = NULL;
 	int win;
 	enum chip_type adapter_type = ep->com.dev->rdev.lldi.adapter_type;
@@ -2363,7 +2363,7 @@ static int accept_cr(struct c4iw_ep *ep, struct sk_buff *skb,
 	cxgb_best_mtu(ep->com.dev->rdev.lldi.mtus, ep->mtu, &mtu_idx,
 		      enable_tcp_timestamps && req->tcpopt.tstamp,
 		      (ep->com.remote_addr.ss_family == AF_INET) ? 0 : 1);
-	wscale = compute_wscale(rcv_win);
+	wscale = cxgb_compute_wscale(rcv_win);
 
 	/*
 	 * Specify the largest window that will fit in opt0. The
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index aa47e0a..6a9bef1f 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -881,15 +881,6 @@ static inline struct c4iw_listen_ep *to_listen_ep(struct iw_cm_id *cm_id)
 	return cm_id->provider_data;
 }
 
-static inline int compute_wscale(int win)
-{
-	int wscale = 0;
-
-	while (wscale < 14 && (65535<<wscale) < win)
-		wscale++;
-	return wscale;
-}
-
 static inline int ocqp_supported(const struct cxgb4_lld_info *infop)
 {
 #if defined(__i386__) || defined(__x86_64__) || defined(CONFIG_PPC64)
diff --git a/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h
index 7fb4feb..ecf3baa 100644
--- a/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h
+++ b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h
@@ -74,4 +74,13 @@ cxgb_best_mtu(const unsigned short *mtus, unsigned short mtu,
 
 	cxgb4_best_aligned_mtu(mtus, hdr_size, data_size, 8, idx);
 }
+
+static inline u32 cxgb_compute_wscale(u32 win)
+{
+	u32 wscale = 0;
+
+	while (wscale < 14 && (65535 << wscale) < win)
+		wscale++;
+	return wscale;
+}
 #endif
diff --git a/drivers/target/iscsi/cxgbit/cxgbit_cm.c b/drivers/target/iscsi/cxgbit/cxgbit_cm.c
index b09c09b..cd29c91 100644
--- a/drivers/target/iscsi/cxgbit/cxgbit_cm.c
+++ b/drivers/target/iscsi/cxgbit/cxgbit_cm.c
@@ -1085,15 +1085,6 @@ out:
 	return -ENOMEM;
 }
 
-static u32 cxgbit_compute_wscale(u32 win)
-{
-	u32 wscale = 0;
-
-	while (wscale < 14 && (65535 << wscale) < win)
-		wscale++;
-	return wscale;
-}
-
 static void
 cxgbit_pass_accept_rpl(struct cxgbit_sock *csk, struct cpl_pass_accept_req *req)
 {
@@ -1124,7 +1115,7 @@ cxgbit_pass_accept_rpl(struct cxgbit_sock *csk, struct cpl_pass_accept_req *req)
 	cxgb_best_mtu(csk->com.cdev->lldi.mtus, csk->mtu, &mtu_idx,
 		      req->tcpopt.tstamp,
 		      (csk->com.remote_addr.ss_family == AF_INET) ? 0 : 1);
-	wscale = cxgbit_compute_wscale(csk->rcv_win);
+	wscale = cxgb_compute_wscale(csk->rcv_win);
 	/*
 	 * Specify the largest window that will fit in opt0. The
 	 * remainder will be specified in the rx_data_ack.
-- 
2.0.2
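
For readers following the helper being consolidated in this patch, here is a standalone userspace rendering of the same loop with a few worked values. `demo_compute_wscale` is a hypothetical name used only for this sketch; the logic mirrors the `cxgb_compute_wscale()` body in the diff above, using fixed-width types from `<stdint.h>` in place of the kernel's `u32`. The loop picks the smallest shift for which a 16-bit window field can cover `win`, capped at 14, the largest TCP window scale.

```c
#include <stdint.h>

/* Smallest shift s (0..14) such that (65535 << s) >= win, capped at 14. */
static inline uint32_t demo_compute_wscale(uint32_t win)
{
	uint32_t wscale = 0;

	/* The wscale < 14 test runs first, so the shift never exceeds 13. */
	while (wscale < 14 && (UINT32_C(65535) << wscale) < win)
		wscale++;
	return wscale;
}
```

As worked values: a 65535-byte window needs no scaling (0), 65536 bytes needs a shift of 1, and a 256 KiB window (262144 bytes) needs 3, because 65535 << 2 = 262140 falls just short.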

* [net-next PATCH 04/11] libcxgb,iw_cxgb4,cxgbit: add cxgb_is_neg_adv()
From: Varun Prakash @ 2016-09-13 15:53 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-rdma, target-devel, nab, dledford, swise,
	gerlitz.or, indranil, varun
In-Reply-To: <cover.1473781521.git.varun@chelsio.com>

Add cxgb_is_neg_adv() in libcxgb_cm.h to remove
its duplicate definitions from cxgb4/cm.c and
cxgbit/cxgbit_cm.c.

Signed-off-by: Varun Prakash <varun@chelsio.com>
---
 drivers/infiniband/hw/cxgb4/cm.c                  | 15 +++------------
 drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h |  9 +++++++++
 drivers/target/iscsi/cxgbit/cxgbit_cm.c           | 11 +----------
 3 files changed, 13 insertions(+), 22 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index a08a748..b35fdc0 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -1987,15 +1987,6 @@ static inline int act_open_has_tid(int status)
 		status != CPL_ERR_CONN_EXIST);
 }
 
-/* Returns whether a CPL status conveys negative advice.
- */
-static int is_neg_adv(unsigned int status)
-{
-	return status == CPL_ERR_RTX_NEG_ADVICE ||
-	       status == CPL_ERR_PERSIST_NEG_ADVICE ||
-	       status == CPL_ERR_KEEPALV_NEG_ADVICE;
-}
-
 static char *neg_adv_str(unsigned int status)
 {
 	switch (status) {
@@ -2235,7 +2226,7 @@ static int act_open_rpl(struct c4iw_dev *dev, struct sk_buff *skb)
 	PDBG("%s ep %p atid %u status %u errno %d\n", __func__, ep, atid,
 	     status, status2errno(status));
 
-	if (is_neg_adv(status)) {
+	if (cxgb_is_neg_adv(status)) {
 		PDBG("%s Connection problems for atid %u status %u (%s)\n",
 		     __func__, atid, status, neg_adv_str(status));
 		ep->stats.connect_neg_adv++;
@@ -2751,7 +2742,7 @@ static int peer_abort(struct c4iw_dev *dev, struct sk_buff *skb)
 	if (!ep)
 		return 0;
 
-	if (is_neg_adv(req->status)) {
+	if (cxgb_is_neg_adv(req->status)) {
 		PDBG("%s Negative advice on abort- tid %u status %d (%s)\n",
 		     __func__, ep->hwtid, req->status,
 		     neg_adv_str(req->status));
@@ -4227,7 +4218,7 @@ static int peer_abort_intr(struct c4iw_dev *dev, struct sk_buff *skb)
 		kfree_skb(skb);
 		return 0;
 	}
-	if (is_neg_adv(req->status)) {
+	if (cxgb_is_neg_adv(req->status)) {
 		PDBG("%s Negative advice on abort- tid %u status %d (%s)\n",
 		     __func__, ep->hwtid, req->status,
 		     neg_adv_str(req->status));
diff --git a/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h
index c4df04a..57fcc98 100644
--- a/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h
+++ b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h
@@ -47,4 +47,13 @@ struct dst_entry *
 cxgb_find_route6(struct cxgb4_lld_info *,
 		 struct net_device *(*)(struct net_device *),
 		 __u8 *, __u8 *, __be16, __be16, u8, __u32);
+
+/* Returns whether a CPL status conveys negative advice.
+ */
+static inline bool cxgb_is_neg_adv(unsigned int status)
+{
+	return status == CPL_ERR_RTX_NEG_ADVICE ||
+	       status == CPL_ERR_PERSIST_NEG_ADVICE ||
+	       status == CPL_ERR_KEEPALV_NEG_ADVICE;
+}
 #endif
diff --git a/drivers/target/iscsi/cxgbit/cxgbit_cm.c b/drivers/target/iscsi/cxgbit/cxgbit_cm.c
index e961ac4..c46bdd5 100644
--- a/drivers/target/iscsi/cxgbit/cxgbit_cm.c
+++ b/drivers/target/iscsi/cxgbit/cxgbit_cm.c
@@ -73,15 +73,6 @@ out:
 	return wr_waitp->ret;
 }
 
-/* Returns whether a CPL status conveys negative advice.
- */
-static int cxgbit_is_neg_adv(unsigned int status)
-{
-	return status == CPL_ERR_RTX_NEG_ADVICE ||
-		status == CPL_ERR_PERSIST_NEG_ADVICE ||
-		status == CPL_ERR_KEEPALV_NEG_ADVICE;
-}
-
 static int cxgbit_np_hashfn(const struct cxgbit_np *cnp)
 {
 	return ((unsigned long)cnp >> 10) & (NP_INFO_HASH_SIZE - 1);
@@ -1704,7 +1695,7 @@ static void cxgbit_abort_req_rss(struct cxgbit_sock *csk, struct sk_buff *skb)
 	pr_debug("%s: csk %p; tid %u; state %d\n",
 		 __func__, csk, tid, csk->com.state);
 
-	if (cxgbit_is_neg_adv(hdr->status)) {
+	if (cxgb_is_neg_adv(hdr->status)) {
 		pr_err("%s: got neg advise %d on tid %u\n",
 		       __func__, hdr->status, tid);
 		goto rel_skb;
-- 
2.0.2

^ permalink raw reply related
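
A note on the predicate shared by the patch above: callers test the CPL status first and treat the three "negative advice" codes as advisory, logging or counting them instead of running the abort path. The sketch below illustrates that pattern in plain C. The enum values are placeholders (the real CPL_ERR_* constants are defined in the driver headers), and demo_stats and demo_handle_status are hypothetical names introduced only for this example.

```c
#include <stdbool.h>

/* Placeholder status codes; NOT the driver's real CPL_ERR_* values. */
enum demo_cpl_err {
	DEMO_ERR_CONN_RESET = 1,
	DEMO_ERR_RTX_NEG_ADVICE,
	DEMO_ERR_PERSIST_NEG_ADVICE,
	DEMO_ERR_KEEPALV_NEG_ADVICE,
};

/* Same shape as cxgb_is_neg_adv(): true for the three advisory codes. */
static bool demo_is_neg_adv(unsigned int status)
{
	return status == DEMO_ERR_RTX_NEG_ADVICE ||
	       status == DEMO_ERR_PERSIST_NEG_ADVICE ||
	       status == DEMO_ERR_KEEPALV_NEG_ADVICE;
}

/* Hypothetical counters, loosely modelled on ep->stats in the diff. */
struct demo_stats {
	unsigned int neg_adv;
	unsigned int aborts;
};

/* Count advisory statuses separately and skip the abort path for them. */
static void demo_handle_status(struct demo_stats *st, unsigned int status)
{
	if (demo_is_neg_adv(status)) {
		st->neg_adv++;
		return;
	}
	st->aborts++;
}
```

Returning bool, as the shared helper does, makes the intent clearer than the int return type of the two removed copies.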

* [net-next PATCH 03/11] libcxgb,iw_cxgb4,cxgbit: add cxgb_find_route6()
From: Varun Prakash @ 2016-09-13 15:53 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-rdma, target-devel, nab, dledford, swise,
	gerlitz.or, indranil, varun
In-Reply-To: <cover.1473781521.git.varun@chelsio.com>

Add cxgb_find_route6() in libcxgb_cm.c to remove
its duplicate definitions from cxgb4/cm.c and
cxgbit/cxgbit_cm.c.

Signed-off-by: Varun Prakash <varun@chelsio.com>
---
 drivers/infiniband/hw/cxgb4/cm.c                  | 70 ++++++-----------------
 drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.c | 33 +++++++++++
 drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h |  4 ++
 drivers/target/iscsi/cxgbit/cxgbit_cm.c           | 51 ++---------------
 4 files changed, 61 insertions(+), 97 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 02f5e20..a08a748 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -465,46 +465,6 @@ static struct net_device *get_real_dev(struct net_device *egress_dev)
 	return rdma_vlan_dev_real_dev(egress_dev) ? : egress_dev;
 }
 
-static int our_interface(struct c4iw_dev *dev, struct net_device *egress_dev)
-{
-	int i;
-
-	egress_dev = get_real_dev(egress_dev);
-	for (i = 0; i < dev->rdev.lldi.nports; i++)
-		if (dev->rdev.lldi.ports[i] == egress_dev)
-			return 1;
-	return 0;
-}
-
-static struct dst_entry *find_route6(struct c4iw_dev *dev, __u8 *local_ip,
-				     __u8 *peer_ip, __be16 local_port,
-				     __be16 peer_port, u8 tos,
-				     __u32 sin6_scope_id)
-{
-	struct dst_entry *dst = NULL;
-
-	if (IS_ENABLED(CONFIG_IPV6)) {
-		struct flowi6 fl6;
-
-		memset(&fl6, 0, sizeof(fl6));
-		memcpy(&fl6.daddr, peer_ip, 16);
-		memcpy(&fl6.saddr, local_ip, 16);
-		if (ipv6_addr_type(&fl6.daddr) & IPV6_ADDR_LINKLOCAL)
-			fl6.flowi6_oif = sin6_scope_id;
-		dst = ip6_route_output(&init_net, NULL, &fl6);
-		if (!dst)
-			goto out;
-		if (!our_interface(dev, ip6_dst_idev(dst)->dev) &&
-		    !(ip6_dst_idev(dst)->dev->flags & IFF_LOOPBACK)) {
-			dst_release(dst);
-			dst = NULL;
-		}
-	}
-
-out:
-	return dst;
-}
-
 static void arp_failure_discard(void *handle, struct sk_buff *skb)
 {
 	pr_err(MOD "ARP failure\n");
@@ -2197,10 +2157,13 @@ static int c4iw_reconnect(struct c4iw_ep *ep)
 		iptype = 4;
 		ra = (__u8 *)&raddr->sin_addr;
 	} else {
-		ep->dst = find_route6(ep->com.dev, laddr6->sin6_addr.s6_addr,
-				      raddr6->sin6_addr.s6_addr,
-				      laddr6->sin6_port, raddr6->sin6_port, 0,
-				      raddr6->sin6_scope_id);
+		ep->dst = cxgb_find_route6(&ep->com.dev->rdev.lldi,
+					   get_real_dev,
+					   laddr6->sin6_addr.s6_addr,
+					   raddr6->sin6_addr.s6_addr,
+					   laddr6->sin6_port,
+					   raddr6->sin6_port, 0,
+					   raddr6->sin6_scope_id);
 		iptype = 6;
 		ra = (__u8 *)&raddr6->sin6_addr;
 	}
@@ -2540,10 +2503,11 @@ static int pass_accept_req(struct c4iw_dev *dev, struct sk_buff *skb)
 		     , __func__, parent_ep, hwtid,
 		     local_ip, peer_ip, ntohs(local_port),
 		     ntohs(peer_port), peer_mss);
-		dst = find_route6(dev, local_ip, peer_ip, local_port, peer_port,
-				  PASS_OPEN_TOS_G(ntohl(req->tos_stid)),
-				  ((struct sockaddr_in6 *)
-				  &parent_ep->com.local_addr)->sin6_scope_id);
+		dst = cxgb_find_route6(&dev->rdev.lldi, get_real_dev,
+				local_ip, peer_ip, local_port, peer_port,
+				PASS_OPEN_TOS_G(ntohl(req->tos_stid)),
+				((struct sockaddr_in6 *)
+				 &parent_ep->com.local_addr)->sin6_scope_id);
 	}
 	if (!dst) {
 		printk(KERN_ERR MOD "%s - failed to find dst entry!\n",
@@ -3339,10 +3303,12 @@ int c4iw_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
 		     __func__, laddr6->sin6_addr.s6_addr,
 		     ntohs(laddr6->sin6_port),
 		     raddr6->sin6_addr.s6_addr, ntohs(raddr6->sin6_port));
-		ep->dst = find_route6(dev, laddr6->sin6_addr.s6_addr,
-				      raddr6->sin6_addr.s6_addr,
-				      laddr6->sin6_port, raddr6->sin6_port, 0,
-				      raddr6->sin6_scope_id);
+		ep->dst = cxgb_find_route6(&dev->rdev.lldi, get_real_dev,
+					   laddr6->sin6_addr.s6_addr,
+					   raddr6->sin6_addr.s6_addr,
+					   laddr6->sin6_port,
+					   raddr6->sin6_port, 0,
+					   raddr6->sin6_scope_id);
 	}
 	if (!ep->dst) {
 		printk(KERN_ERR MOD "%s - cannot find route.\n", __func__);
diff --git a/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.c b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.c
index a318412..0f0de5b 100644
--- a/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.c
+++ b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.c
@@ -33,6 +33,7 @@
 #include <linux/tcp.h>
 #include <linux/ipv6.h>
 #include <net/route.h>
+#include <net/ip6_route.h>
 
 #include "libcxgb_cm.h"
 
@@ -114,3 +115,35 @@ cxgb_find_route(struct cxgb4_lld_info *lldi,
 	return &rt->dst;
 }
 EXPORT_SYMBOL(cxgb_find_route);
+
+struct dst_entry *
+cxgb_find_route6(struct cxgb4_lld_info *lldi,
+		 struct net_device *(*get_real_dev)(struct net_device *),
+		 __u8 *local_ip, __u8 *peer_ip, __be16 local_port,
+		 __be16 peer_port, u8 tos, __u32 sin6_scope_id)
+{
+	struct dst_entry *dst = NULL;
+
+	if (IS_ENABLED(CONFIG_IPV6)) {
+		struct flowi6 fl6;
+
+		memset(&fl6, 0, sizeof(fl6));
+		memcpy(&fl6.daddr, peer_ip, 16);
+		memcpy(&fl6.saddr, local_ip, 16);
+		if (ipv6_addr_type(&fl6.daddr) & IPV6_ADDR_LINKLOCAL)
+			fl6.flowi6_oif = sin6_scope_id;
+		dst = ip6_route_output(&init_net, NULL, &fl6);
+		if (!dst)
+			goto out;
+		if (!cxgb_our_interface(lldi, get_real_dev,
+					ip6_dst_idev(dst)->dev) &&
+		    !(ip6_dst_idev(dst)->dev->flags & IFF_LOOPBACK)) {
+			dst_release(dst);
+			dst = NULL;
+		}
+	}
+
+out:
+	return dst;
+}
+EXPORT_SYMBOL(cxgb_find_route6);
diff --git a/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h
index fe69161..c4df04a 100644
--- a/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h
+++ b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h
@@ -43,4 +43,8 @@ struct dst_entry *
 cxgb_find_route(struct cxgb4_lld_info *,
 		struct net_device *(*)(struct net_device *),
 		__be32, __be32, __be16,	__be16, u8);
+struct dst_entry *
+cxgb_find_route6(struct cxgb4_lld_info *,
+		 struct net_device *(*)(struct net_device *),
+		 __u8 *, __u8 *, __be16, __be16, u8, __u32);
 #endif
diff --git a/drivers/target/iscsi/cxgbit/cxgbit_cm.c b/drivers/target/iscsi/cxgbit/cxgbit_cm.c
index 49b24b9..e961ac4 100644
--- a/drivers/target/iscsi/cxgbit/cxgbit_cm.c
+++ b/drivers/target/iscsi/cxgbit/cxgbit_cm.c
@@ -790,46 +790,6 @@ void _cxgbit_free_csk(struct kref *kref)
 	kfree(csk);
 }
 
-static int
-cxgbit_our_interface(struct cxgbit_device *cdev, struct net_device *egress_dev)
-{
-	u8 i;
-
-	egress_dev = cxgbit_get_real_dev(egress_dev);
-	for (i = 0; i < cdev->lldi.nports; i++)
-		if (cdev->lldi.ports[i] == egress_dev)
-			return 1;
-	return 0;
-}
-
-static struct dst_entry *
-cxgbit_find_route6(struct cxgbit_device *cdev, __u8 *local_ip, __u8 *peer_ip,
-		   __be16 local_port, __be16 peer_port, u8 tos,
-		   __u32 sin6_scope_id)
-{
-	struct dst_entry *dst = NULL;
-
-	if (IS_ENABLED(CONFIG_IPV6)) {
-		struct flowi6 fl6;
-
-		memset(&fl6, 0, sizeof(fl6));
-		memcpy(&fl6.daddr, peer_ip, 16);
-		memcpy(&fl6.saddr, local_ip, 16);
-		if (ipv6_addr_type(&fl6.daddr) & IPV6_ADDR_LINKLOCAL)
-			fl6.flowi6_oif = sin6_scope_id;
-		dst = ip6_route_output(&init_net, NULL, &fl6);
-		if (!dst)
-			goto out;
-		if (!cxgbit_our_interface(cdev, ip6_dst_idev(dst)->dev) &&
-		    !(ip6_dst_idev(dst)->dev->flags & IFF_LOOPBACK)) {
-			dst_release(dst);
-			dst = NULL;
-		}
-	}
-out:
-	return dst;
-}
-
 static void cxgbit_set_tcp_window(struct cxgbit_sock *csk, struct port_info *pi)
 {
 	unsigned int linkspeed;
@@ -1299,11 +1259,12 @@ cxgbit_pass_accept_req(struct cxgbit_device *cdev, struct sk_buff *skb)
 			 , __func__, cnp, tid,
 			 local_ip, peer_ip, ntohs(local_port),
 			 ntohs(peer_port), peer_mss);
-		dst = cxgbit_find_route6(cdev, local_ip, peer_ip,
-					 local_port, peer_port,
-					 PASS_OPEN_TOS_G(ntohl(req->tos_stid)),
-					 ((struct sockaddr_in6 *)
-					 &cnp->com.local_addr)->sin6_scope_id);
+		dst = cxgb_find_route6(&cdev->lldi, cxgbit_get_real_dev,
+				       local_ip, peer_ip,
+				       local_port, peer_port,
+				       PASS_OPEN_TOS_G(ntohl(req->tos_stid)),
+				       ((struct sockaddr_in6 *)
+					&cnp->com.local_addr)->sin6_scope_id);
 	}
 	if (!dst) {
 		pr_err("%s - failed to find dst entry!\n",
-- 
2.0.2

^ permalink raw reply related
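
A note on the interface change in the patch above: the shared routine no longer takes a driver-specific device structure. It takes the common lldi plus a get_real_dev callback, so each caller supplies its own way of resolving an egress device to the underlying port. The sketch below shows that callback pattern with mock types; every demo_* name is hypothetical, and the route lookup itself (ip6_route_output() in the patch) is left out.

```c
#include <stddef.h>

/* Mock device: "lower" models a stacked (e.g. VLAN) device's real port. */
struct demo_netdev {
	struct demo_netdev *lower;
	int loopback;
};

/* Mock of the per-adapter info that both drivers share. */
struct demo_lld_info {
	struct demo_netdev *ports[4];
	unsigned char nports;
};

typedef struct demo_netdev *(*demo_real_dev_fn)(struct demo_netdev *);

/* Driver-supplied callback: resolve a stacked device to its real port. */
static struct demo_netdev *demo_get_real_dev(struct demo_netdev *dev)
{
	return dev->lower ? dev->lower : dev;
}

/* Returns 1 if the resolved egress device is one of the adapter's ports. */
static int demo_our_interface(const struct demo_lld_info *lldi,
			      demo_real_dev_fn get_real_dev,
			      struct demo_netdev *egress)
{
	unsigned char i;

	egress = get_real_dev(egress);
	for (i = 0; i < lldi->nports; i++)
		if (lldi->ports[i] == egress)
			return 1;
	return 0;
}

/* Route filter: keep routes that leave via our ports or via loopback. */
static int demo_route_ok(const struct demo_lld_info *lldi,
			 demo_real_dev_fn get_real_dev,
			 struct demo_netdev *egress)
{
	return demo_our_interface(lldi, get_real_dev, egress) ||
	       egress->loopback;
}
```

Passing the callback keeps the library free of driver-specific helpers such as rdma_vlan_dev_real_dev(), which only the iw_cxgb4 side uses in the diff.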

* [net-next PATCH 01/11] libcxgb,iw_cxgb4,cxgbit: add cxgb_get_4tuple()
From: Varun Prakash @ 2016-09-13 15:53 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-rdma, target-devel, nab, dledford, swise,
	gerlitz.or, indranil, varun
In-Reply-To: <cover.1473781521.git.varun@chelsio.com>

Add cxgb_get_4tuple() in libcxgb_cm.c to remove
its duplicate definitions from cxgb4/cm.c and
cxgbit/cxgbit_cm.c.

Signed-off-by: Varun Prakash <varun@chelsio.com>
---
 drivers/infiniband/hw/cxgb4/Kconfig               |  1 +
 drivers/infiniband/hw/cxgb4/Makefile              |  1 +
 drivers/infiniband/hw/cxgb4/cm.c                  | 41 +------------
 drivers/net/ethernet/chelsio/libcxgb/Makefile     |  4 +-
 drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.c | 72 +++++++++++++++++++++++
 drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h | 42 +++++++++++++
 drivers/target/iscsi/cxgbit/cxgbit_cm.c           | 41 +------------
 7 files changed, 125 insertions(+), 77 deletions(-)
 create mode 100644 drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.c
 create mode 100644 drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h

diff --git a/drivers/infiniband/hw/cxgb4/Kconfig b/drivers/infiniband/hw/cxgb4/Kconfig
index 23f38cf..afe8b28 100644
--- a/drivers/infiniband/hw/cxgb4/Kconfig
+++ b/drivers/infiniband/hw/cxgb4/Kconfig
@@ -1,6 +1,7 @@
 config INFINIBAND_CXGB4
 	tristate "Chelsio T4/T5 RDMA Driver"
 	depends on CHELSIO_T4 && INET && (IPV6 || IPV6=n)
+	select CHELSIO_LIB
 	select GENERIC_ALLOCATOR
 	---help---
 	  This is an iWARP/RDMA driver for the Chelsio T4 and T5
diff --git a/drivers/infiniband/hw/cxgb4/Makefile b/drivers/infiniband/hw/cxgb4/Makefile
index e11cf72..fa40b68 100644
--- a/drivers/infiniband/hw/cxgb4/Makefile
+++ b/drivers/infiniband/hw/cxgb4/Makefile
@@ -1,4 +1,5 @@
 ccflags-y := -Idrivers/net/ethernet/chelsio/cxgb4
+ccflags-y += -Idrivers/net/ethernet/chelsio/libcxgb
 
 obj-$(CONFIG_INFINIBAND_CXGB4) += iw_cxgb4.o
 
diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index b6a953a..e591f61 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -49,6 +49,7 @@
 
 #include <rdma/ib_addr.h>
 
+#include <libcxgb_cm.h>
 #include "iw_cxgb4.h"
 #include "clip_tbl.h"
 
@@ -2518,42 +2519,6 @@ static void reject_cr(struct c4iw_dev *dev, u32 hwtid, struct sk_buff *skb)
 	return;
 }
 
-static void get_4tuple(struct cpl_pass_accept_req *req, enum chip_type type,
-		       int *iptype, __u8 *local_ip, __u8 *peer_ip,
-		       __be16 *local_port, __be16 *peer_port)
-{
-	int eth_len = (CHELSIO_CHIP_VERSION(type) <= CHELSIO_T5) ?
-		      ETH_HDR_LEN_G(be32_to_cpu(req->hdr_len)) :
-		      T6_ETH_HDR_LEN_G(be32_to_cpu(req->hdr_len));
-	int ip_len = (CHELSIO_CHIP_VERSION(type) <= CHELSIO_T5) ?
-		     IP_HDR_LEN_G(be32_to_cpu(req->hdr_len)) :
-		     T6_IP_HDR_LEN_G(be32_to_cpu(req->hdr_len));
-	struct iphdr *ip = (struct iphdr *)((u8 *)(req + 1) + eth_len);
-	struct ipv6hdr *ip6 = (struct ipv6hdr *)((u8 *)(req + 1) + eth_len);
-	struct tcphdr *tcp = (struct tcphdr *)
-			     ((u8 *)(req + 1) + eth_len + ip_len);
-
-	if (ip->version == 4) {
-		PDBG("%s saddr 0x%x daddr 0x%x sport %u dport %u\n", __func__,
-		     ntohl(ip->saddr), ntohl(ip->daddr), ntohs(tcp->source),
-		     ntohs(tcp->dest));
-		*iptype = 4;
-		memcpy(peer_ip, &ip->saddr, 4);
-		memcpy(local_ip, &ip->daddr, 4);
-	} else {
-		PDBG("%s saddr %pI6 daddr %pI6 sport %u dport %u\n", __func__,
-		     ip6->saddr.s6_addr, ip6->daddr.s6_addr, ntohs(tcp->source),
-		     ntohs(tcp->dest));
-		*iptype = 6;
-		memcpy(peer_ip, ip6->saddr.s6_addr, 16);
-		memcpy(local_ip, ip6->daddr.s6_addr, 16);
-	}
-	*peer_port = tcp->source;
-	*local_port = tcp->dest;
-
-	return;
-}
-
 static int pass_accept_req(struct c4iw_dev *dev, struct sk_buff *skb)
 {
 	struct c4iw_ep *child_ep = NULL, *parent_ep;
@@ -2582,8 +2547,8 @@ static int pass_accept_req(struct c4iw_dev *dev, struct sk_buff *skb)
 		goto reject;
 	}
 
-	get_4tuple(req, parent_ep->com.dev->rdev.lldi.adapter_type, &iptype,
-		   local_ip, peer_ip, &local_port, &peer_port);
+	cxgb_get_4tuple(req, parent_ep->com.dev->rdev.lldi.adapter_type,
+			&iptype, local_ip, peer_ip, &local_port, &peer_port);
 
 	/* Find output route */
 	if (iptype == 4)  {
diff --git a/drivers/net/ethernet/chelsio/libcxgb/Makefile b/drivers/net/ethernet/chelsio/libcxgb/Makefile
index 2362230..2534e30 100644
--- a/drivers/net/ethernet/chelsio/libcxgb/Makefile
+++ b/drivers/net/ethernet/chelsio/libcxgb/Makefile
@@ -1,3 +1,5 @@
+ccflags-y := -Idrivers/net/ethernet/chelsio/cxgb4
+
 obj-$(CONFIG_CHELSIO_LIB) += libcxgb.o
 
-libcxgb-y := libcxgb_ppm.o
+libcxgb-y := libcxgb_ppm.o libcxgb_cm.o
diff --git a/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.c b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.c
new file mode 100644
index 0000000..d7342bb
--- /dev/null
+++ b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.c
@@ -0,0 +1,72 @@
+/*
+ * Copyright (c) 2016 Chelsio Communications, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/tcp.h>
+#include <linux/ipv6.h>
+
+#include "libcxgb_cm.h"
+
+void
+cxgb_get_4tuple(struct cpl_pass_accept_req *req, enum chip_type type,
+		int *iptype, __u8 *local_ip, __u8 *peer_ip,
+		__be16 *local_port, __be16 *peer_port)
+{
+	int eth_len = (CHELSIO_CHIP_VERSION(type) <= CHELSIO_T5) ?
+		      ETH_HDR_LEN_G(be32_to_cpu(req->hdr_len)) :
+		      T6_ETH_HDR_LEN_G(be32_to_cpu(req->hdr_len));
+	int ip_len = (CHELSIO_CHIP_VERSION(type) <= CHELSIO_T5) ?
+		     IP_HDR_LEN_G(be32_to_cpu(req->hdr_len)) :
+		     T6_IP_HDR_LEN_G(be32_to_cpu(req->hdr_len));
+	struct iphdr *ip = (struct iphdr *)((u8 *)(req + 1) + eth_len);
+	struct ipv6hdr *ip6 = (struct ipv6hdr *)((u8 *)(req + 1) + eth_len);
+	struct tcphdr *tcp = (struct tcphdr *)
+			     ((u8 *)(req + 1) + eth_len + ip_len);
+
+	if (ip->version == 4) {
+		pr_debug("%s saddr 0x%x daddr 0x%x sport %u dport %u\n",
+			 __func__, ntohl(ip->saddr), ntohl(ip->daddr),
+			 ntohs(tcp->source), ntohs(tcp->dest));
+		*iptype = 4;
+		memcpy(peer_ip, &ip->saddr, 4);
+		memcpy(local_ip, &ip->daddr, 4);
+	} else {
+		pr_debug("%s saddr %pI6 daddr %pI6 sport %u dport %u\n",
+			 __func__, ip6->saddr.s6_addr, ip6->daddr.s6_addr,
+			 ntohs(tcp->source), ntohs(tcp->dest));
+		*iptype = 6;
+		memcpy(peer_ip, ip6->saddr.s6_addr, 16);
+		memcpy(local_ip, ip6->daddr.s6_addr, 16);
+	}
+	*peer_port = tcp->source;
+	*local_port = tcp->dest;
+}
+EXPORT_SYMBOL(cxgb_get_4tuple);
diff --git a/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h
new file mode 100644
index 0000000..2ab8d9b
--- /dev/null
+++ b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h
@@ -0,0 +1,42 @@
+/*
+ * Copyright (c) 2016 Chelsio Communications, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef __LIBCXGB_CM_H__
+#define __LIBCXGB_CM_H__
+
+#include <cxgb4.h>
+#include <t4_msg.h>
+
+void
+cxgb_get_4tuple(struct cpl_pass_accept_req *, enum chip_type,
+		int *, __u8 *, __u8 *, __be16 *, __be16 *);
+#endif
diff --git a/drivers/target/iscsi/cxgbit/cxgbit_cm.c b/drivers/target/iscsi/cxgbit/cxgbit_cm.c
index 0ae0b13..8bb5a25 100644
--- a/drivers/target/iscsi/cxgbit/cxgbit_cm.c
+++ b/drivers/target/iscsi/cxgbit/cxgbit_cm.c
@@ -24,6 +24,7 @@
 #include <net/ip6_route.h>
 #include <net/addrconf.h>
 
+#include <libcxgb_cm.h>
 #include "cxgbit.h"
 #include "clip_tbl.h"
 
@@ -789,42 +790,6 @@ void _cxgbit_free_csk(struct kref *kref)
 	kfree(csk);
 }
 
-static void
-cxgbit_get_tuple_info(struct cpl_pass_accept_req *req, int *iptype,
-		      __u8 *local_ip, __u8 *peer_ip, __be16 *local_port,
-		      __be16 *peer_port)
-{
-	u32 eth_len = ETH_HDR_LEN_G(be32_to_cpu(req->hdr_len));
-	u32 ip_len = IP_HDR_LEN_G(be32_to_cpu(req->hdr_len));
-	struct iphdr *ip = (struct iphdr *)((u8 *)(req + 1) + eth_len);
-	struct ipv6hdr *ip6 = (struct ipv6hdr *)((u8 *)(req + 1) + eth_len);
-	struct tcphdr *tcp = (struct tcphdr *)
-			      ((u8 *)(req + 1) + eth_len + ip_len);
-
-	if (ip->version == 4) {
-		pr_debug("%s saddr 0x%x daddr 0x%x sport %u dport %u\n",
-			 __func__,
-			 ntohl(ip->saddr), ntohl(ip->daddr),
-			 ntohs(tcp->source),
-			 ntohs(tcp->dest));
-		*iptype = 4;
-		memcpy(peer_ip, &ip->saddr, 4);
-		memcpy(local_ip, &ip->daddr, 4);
-	} else {
-		pr_debug("%s saddr %pI6 daddr %pI6 sport %u dport %u\n",
-			 __func__,
-			 ip6->saddr.s6_addr, ip6->daddr.s6_addr,
-			 ntohs(tcp->source),
-			 ntohs(tcp->dest));
-		*iptype = 6;
-		memcpy(peer_ip, ip6->saddr.s6_addr, 16);
-		memcpy(local_ip, ip6->daddr.s6_addr, 16);
-	}
-
-	*peer_port = tcp->source;
-	*local_port = tcp->dest;
-}
-
 static int
 cxgbit_our_interface(struct cxgbit_device *cdev, struct net_device *egress_dev)
 {
@@ -1340,8 +1305,8 @@ cxgbit_pass_accept_req(struct cxgbit_device *cdev, struct sk_buff *skb)
 		goto rel_skb;
 	}
 
-	cxgbit_get_tuple_info(req, &iptype, local_ip, peer_ip,
-			      &local_port, &peer_port);
+	cxgb_get_4tuple(req, cdev->lldi.adapter_type, &iptype, local_ip,
+			peer_ip, &local_port, &peer_port);
 
 	/* Find output route */
 	if (iptype == 4)  {
-- 
2.0.2

^ permalink raw reply related
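
A note on the parsing done by the patch above: the packet headers follow the CPL request, the Ethernet and IP header lengths come from req->hdr_len, and the IP version nibble selects between 4-byte and 16-byte addresses. The sketch below reproduces that logic on a raw byte buffer in userspace. The demo_* names and the struct are hypothetical, the header lengths are passed in directly instead of being decoded from hdr_len, and ports are kept as raw network-order bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result holder; the patch uses separate out-parameters. */
struct demo_4tuple {
	int iptype;            /* 4 or 6 */
	uint8_t local_ip[16];  /* first 4 bytes used for IPv4 */
	uint8_t peer_ip[16];
	uint8_t local_port[2]; /* network byte order */
	uint8_t peer_port[2];  /* network byte order */
};

/*
 * pkt points at the Ethernet header; eth_len and ip_len are the header
 * lengths in bytes.
 */
static void demo_get_4tuple(const uint8_t *pkt, size_t eth_len,
			    size_t ip_len, struct demo_4tuple *t)
{
	const uint8_t *ip = pkt + eth_len;
	const uint8_t *tcp = ip + ip_len;

	memset(t, 0, sizeof(*t));
	if ((ip[0] >> 4) == 4) {
		t->iptype = 4;
		memcpy(t->peer_ip, ip + 12, 4);   /* IPv4 source address */
		memcpy(t->local_ip, ip + 16, 4);  /* IPv4 destination address */
	} else {
		t->iptype = 6;
		memcpy(t->peer_ip, ip + 8, 16);   /* IPv6 source address */
		memcpy(t->local_ip, ip + 24, 16); /* IPv6 destination address */
	}
	memcpy(t->peer_port, tcp, 2);       /* TCP source port */
	memcpy(t->local_port, tcp + 2, 2);  /* TCP destination port */
}
```

As in the patch, the sender of the incoming request is the peer, so the packet's source fields map to peer_ip and peer_port and its destination fields map to the local side.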

* [net-next PATCH 10/11] libcxgb,iw_cxgb4,cxgbit: add cxgb_mk_abort_rpl()
From: Varun Prakash @ 2016-09-13 15:54 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-rdma, target-devel, nab, dledford, swise,
	gerlitz.or, indranil, varun
In-Reply-To: <cover.1473781521.git.varun@chelsio.com>

Add cxgb_mk_abort_rpl() to remove duplicate
code that forms the CPL_ABORT_RPL hardware command.

Signed-off-by: Varun Prakash <varun@chelsio.com>
---
 drivers/infiniband/hw/cxgb4/cm.c                  | 10 ++++------
 drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h | 14 ++++++++++++++
 drivers/target/iscsi/cxgbit/cxgbit_cm.c           | 11 ++---------
 3 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 484196e..a6d5fcb 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -2705,12 +2705,12 @@ static int peer_abort(struct c4iw_dev *dev, struct sk_buff *skb)
 {
 	struct cpl_abort_req_rss *req = cplhdr(skb);
 	struct c4iw_ep *ep;
-	struct cpl_abort_rpl *rpl;
 	struct sk_buff *rpl_skb;
 	struct c4iw_qp_attributes attrs;
 	int ret;
 	int release = 0;
 	unsigned int tid = GET_TID(req);
+	u32 len = roundup(sizeof(struct cpl_abort_rpl), 16);
 
 	ep = get_ep_from_tid(dev, tid);
 	if (!ep)
@@ -2809,11 +2809,9 @@ static int peer_abort(struct c4iw_dev *dev, struct sk_buff *skb)
 		release = 1;
 		goto out;
 	}
-	set_wr_txq(skb, CPL_PRIORITY_DATA, ep->txq_idx);
-	rpl = (struct cpl_abort_rpl *) skb_put(rpl_skb, sizeof(*rpl));
-	INIT_TP_WR(rpl, ep->hwtid);
-	OPCODE_TID(rpl) = cpu_to_be32(MK_OPCODE_TID(CPL_ABORT_RPL, ep->hwtid));
-	rpl->cmd = CPL_ABORT_NO_RST;
+
+	cxgb_mk_abort_rpl(rpl_skb, len, ep->hwtid, ep->txq_idx);
+
 	c4iw_ofld_send(&ep->com.dev->rdev, rpl_skb);
 out:
 	if (release)
diff --git a/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h
index 2d3a3bf..70999e8 100644
--- a/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h
+++ b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.h
@@ -128,4 +128,18 @@ cxgb_mk_abort_req(struct sk_buff *skb, u32 len, u32 tid, u16 chan,
 	set_wr_txq(skb, CPL_PRIORITY_DATA, chan);
 	t4_set_arp_err_handler(skb, handle, handler);
 }
+
+static inline void
+cxgb_mk_abort_rpl(struct sk_buff *skb, u32 len, u32 tid, u16 chan)
+{
+	struct cpl_abort_rpl *rpl;
+
+	rpl = (struct cpl_abort_rpl *)__skb_put(skb, len);
+	memset(rpl, 0, len);
+
+	INIT_TP_WR(rpl, tid);
+	OPCODE_TID(rpl) = cpu_to_be32(MK_OPCODE_TID(CPL_ABORT_RPL, tid));
+	rpl->cmd = CPL_ABORT_NO_RST;
+	set_wr_txq(skb, CPL_PRIORITY_DATA, chan);
+}
 #endif
diff --git a/drivers/target/iscsi/cxgbit/cxgbit_cm.c b/drivers/target/iscsi/cxgbit/cxgbit_cm.c
index f2b737e..9bdbe3b 100644
--- a/drivers/target/iscsi/cxgbit/cxgbit_cm.c
+++ b/drivers/target/iscsi/cxgbit/cxgbit_cm.c
@@ -1642,11 +1642,10 @@ static void cxgbit_abort_req_rss(struct cxgbit_sock *csk, struct sk_buff *skb)
 {
 	struct cpl_abort_req_rss *hdr = cplhdr(skb);
 	unsigned int tid = GET_TID(hdr);
-	struct cpl_abort_rpl *rpl;
 	struct sk_buff *rpl_skb;
 	bool release = false;
 	bool wakeup_thread = false;
-	unsigned int len = roundup(sizeof(*rpl), 16);
+	u32 len = roundup(sizeof(struct cpl_abort_rpl), 16);
 
 	pr_debug("%s: csk %p; tid %u; state %d\n",
 		 __func__, csk, tid, csk->com.state);
@@ -1686,14 +1685,8 @@ static void cxgbit_abort_req_rss(struct cxgbit_sock *csk, struct sk_buff *skb)
 		cxgbit_send_tx_flowc_wr(csk);
 
 	rpl_skb = __skb_dequeue(&csk->skbq);
-	set_wr_txq(skb, CPL_PRIORITY_DATA, csk->txq_idx);
-
-	rpl = (struct cpl_abort_rpl *)__skb_put(rpl_skb, len);
-	memset(rpl, 0, len);
 
-	INIT_TP_WR(rpl, csk->tid);
-	OPCODE_TID(rpl) = cpu_to_be32(MK_OPCODE_TID(CPL_ABORT_RPL, tid));
-	rpl->cmd = CPL_ABORT_NO_RST;
+	cxgb_mk_abort_rpl(rpl_skb, len, csk->tid, csk->txq_idx);
 	cxgbit_ofld_send(csk->com.cdev, rpl_skb);
 
 	if (wakeup_thread) {
-- 
2.0.2

^ permalink raw reply related
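
A note on the length handling in the patch above: both callers now compute len as the reply size rounded up to a multiple of 16, and the shared helper reserves and zeroes that many bytes before filling in the header fields. The sketch below shows the round-up-and-zero step on a plain byte buffer. The struct layout, the opcode and command values, and the opcode/tid packing are placeholders assumed for illustration, not the real cpl_abort_rpl definition, and byte-order conversion (cpu_to_be32() in the patch) is omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round x up to the next multiple of y, for positive integers. */
#define DEMO_ROUNDUP(x, y) ((((x) + (y) - 1) / (y)) * (y))

#define DEMO_OPCODE_ABORT_RPL 0x0Bu /* placeholder opcode */
#define DEMO_ABORT_NO_RST     1u    /* placeholder command value */

/* Hypothetical reply layout; NOT the real struct cpl_abort_rpl. */
struct demo_abort_rpl {
	uint32_t opcode_tid;
	uint8_t cmd;
};

/*
 * Build a reply in buf. Returns the padded length written, or 0 if the
 * buffer is too small.
 */
static size_t demo_mk_abort_rpl(uint8_t *buf, size_t cap, uint32_t tid)
{
	size_t len = DEMO_ROUNDUP(sizeof(struct demo_abort_rpl), 16);
	struct demo_abort_rpl rpl;

	if (len > cap)
		return 0;

	/* Zero the whole padded region first, as the helper's memset does. */
	memset(buf, 0, len);

	memset(&rpl, 0, sizeof(rpl));
	rpl.opcode_tid = (DEMO_OPCODE_ABORT_RPL << 24) | (tid & 0xFFFFFFu);
	rpl.cmd = DEMO_ABORT_NO_RST;
	memcpy(buf, &rpl, sizeof(rpl));
	return len;
}
```

Zeroing the padded region means the bytes between the end of the structure and the 16-byte boundary never carry stale data.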
