Netdev List
 help / color / mirror / Atom feed
* [net-next PATCH v3 3/5] phonet: Replace calls to __skb_alloc_page with __dev_alloc_page
From: Alexander Duyck @ 2014-11-11 17:26 UTC (permalink / raw)
  To: netdev, linux-usb
  Cc: leedom, hariprasad, donald.c.skidmore, oliver, balbi,
	matthew.vick, mgorman, davem, jeffrey.t.kirsher
In-Reply-To: <20141111172523.16460.16845.stgit@ahduyck-vm-fedora20>

Replace the calls to __skb_alloc_page that are passed NULL with calls to
__dev_alloc_page.

In addition remove __GFP_COLD flag from allocations as we only want it for
the Rx buffer which is taken care of by __dev_alloc_skb, not for any
secondary allocations such as the queue element transmit descriptors.

Cc: Oliver Neukum <oliver@neukum.org>
Cc: Felipe Balbi <balbi@ti.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 drivers/net/usb/cdc-phonet.c           |    6 +++---
 drivers/usb/gadget/function/f_phonet.c |    6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/usb/cdc-phonet.c b/drivers/net/usb/cdc-phonet.c
index 2ec1500..415ce8b 100644
--- a/drivers/net/usb/cdc-phonet.c
+++ b/drivers/net/usb/cdc-phonet.c
@@ -130,7 +130,7 @@ static int rx_submit(struct usbpn_dev *pnd, struct urb *req, gfp_t gfp_flags)
 	struct page *page;
 	int err;
 
-	page = __skb_alloc_page(gfp_flags | __GFP_NOMEMALLOC, NULL);
+	page = __dev_alloc_page(gfp_flags | __GFP_NOMEMALLOC);
 	if (!page)
 		return -ENOMEM;
 
@@ -212,7 +212,7 @@ resubmit:
 	if (page)
 		put_page(page);
 	if (req)
-		rx_submit(pnd, req, GFP_ATOMIC | __GFP_COLD);
+		rx_submit(pnd, req, GFP_ATOMIC);
 }
 
 static int usbpn_close(struct net_device *dev);
@@ -231,7 +231,7 @@ static int usbpn_open(struct net_device *dev)
 	for (i = 0; i < rxq_size; i++) {
 		struct urb *req = usb_alloc_urb(0, GFP_KERNEL);
 
-		if (!req || rx_submit(pnd, req, GFP_KERNEL | __GFP_COLD)) {
+		if (!req || rx_submit(pnd, req, GFP_KERNEL)) {
 			usb_free_urb(req);
 			usbpn_close(dev);
 			return -ENOMEM;
diff --git a/drivers/usb/gadget/function/f_phonet.c b/drivers/usb/gadget/function/f_phonet.c
index b9cfc15..cde7397 100644
--- a/drivers/usb/gadget/function/f_phonet.c
+++ b/drivers/usb/gadget/function/f_phonet.c
@@ -303,7 +303,7 @@ pn_rx_submit(struct f_phonet *fp, struct usb_request *req, gfp_t gfp_flags)
 	struct page *page;
 	int err;
 
-	page = __skb_alloc_page(gfp_flags | __GFP_NOMEMALLOC, NULL);
+	page = __dev_alloc_page(gfp_flags | __GFP_NOMEMALLOC);
 	if (!page)
 		return -ENOMEM;
 
@@ -377,7 +377,7 @@ static void pn_rx_complete(struct usb_ep *ep, struct usb_request *req)
 	if (page)
 		put_page(page);
 	if (req)
-		pn_rx_submit(fp, req, GFP_ATOMIC | __GFP_COLD);
+		pn_rx_submit(fp, req, GFP_ATOMIC);
 }
 
 /*-------------------------------------------------------------------------*/
@@ -437,7 +437,7 @@ static int pn_set_alt(struct usb_function *f, unsigned intf, unsigned alt)
 
 			netif_carrier_on(dev);
 			for (i = 0; i < phonet_rxq_size; i++)
-				pn_rx_submit(fp, fp->out_reqv[i], GFP_ATOMIC | __GFP_COLD);
+				pn_rx_submit(fp, fp->out_reqv[i], GFP_ATOMIC);
 		}
 		spin_unlock(&port->lock);
 		return 0;

^ permalink raw reply related

* [net-next PATCH v3 1/5] net: Add device Rx page allocation function
From: Alexander Duyck @ 2014-11-11 17:26 UTC (permalink / raw)
  To: netdev, linux-usb
  Cc: leedom, hariprasad, donald.c.skidmore, oliver, balbi,
	matthew.vick, mgorman, davem, jeffrey.t.kirsher
In-Reply-To: <20141111172523.16460.16845.stgit@ahduyck-vm-fedora20>

This patch implements __dev_alloc_pages and __dev_alloc_page.  These are
meant to replace the __skb_alloc_pages and __skb_alloc_page functions.  The
reason for doing this is that it occurred to me that __skb_alloc_page is
supposed to be passed an sk_buff pointer, but it is NULL in all cases where
it is used.  Worse is that in the case of ixgbe it is passed NULL via the
sk_buff pointer in the rx_buffer info structure which means the compiler is
not correctly stripping it out.

The naming for these functions is based on dev_alloc_skb and __dev_alloc_skb.
There was originally a netdev_alloc_page, however that was passed a
net_device pointer and this function is not so I thought it best to follow
that naming scheme since that is the same difference between dev_alloc_skb
and netdev_alloc_skb.

In the case of anything greater than order 0 it is assumed that we want a
compound page so __GFP_COMP is set for all allocations as we expect a
compound page when assigning a page frag.

The other change in this patch is to exploit the behaviors of the page
allocator in how it handles flags.  So for example we can always set
__GFP_COMP and __GFP_MEMALLOC since they are ignored if they are not
applicable or are overridden by another flag.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 include/linux/skbuff.h |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 103fbe8..2e5221f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2185,6 +2185,54 @@ static inline struct sk_buff *netdev_alloc_skb_ip_align(struct net_device *dev,
 }
 
 /**
+ * __dev_alloc_pages - allocate page for network Rx
+ * @gfp_mask: allocation priority. Set __GFP_NOMEMALLOC if not for network Rx
+ * @order: size of the allocation
+ *
+ * Allocate a new page.
+ *
+ * %NULL is returned if there is no free memory.
+*/
+static inline struct page *__dev_alloc_pages(gfp_t gfp_mask,
+					     unsigned int order)
+{
+	/* This piece of code contains several assumptions.
+	 * 1.  This is for device Rx, therefor a cold page is preferred.
+	 * 2.  The expectation is the user wants a compound page.
+	 * 3.  If requesting a order 0 page it will not be compound
+	 *     due to the check to see if order has a value in prep_new_page
+	 * 4.  __GFP_MEMALLOC is ignored if __GFP_NOMEMALLOC is set due to
+	 *     code in gfp_to_alloc_flags that should be enforcing this.
+	 */
+	gfp_mask |= __GFP_COLD | __GFP_COMP | __GFP_MEMALLOC;
+
+	return alloc_pages_node(NUMA_NO_NODE, gfp_mask, order);
+}
+
+static inline struct page *dev_alloc_pages(unsigned int order)
+{
+	return __dev_alloc_pages(GFP_ATOMIC, order);
+}
+
+/**
+ * __dev_alloc_page - allocate a page for network Rx
+ * @gfp_mask: allocation priority. Set __GFP_NOMEMALLOC if not for network Rx
+ *
+ * Allocate a new page.
+ *
+ * %NULL is returned if there is no free memory.
+ */
+static inline struct page *__dev_alloc_page(gfp_t gfp_mask)
+{
+	return __dev_alloc_pages(gfp_mask, 0);
+}
+
+static inline struct page *dev_alloc_page(void)
+{
+	return __dev_alloc_page(GFP_ATOMIC);
+}
+
+/**
  *	__skb_alloc_pages - allocate pages for ps-rx on a skb and preserve pfmemalloc data
  *	@gfp_mask: alloc_pages_node mask. Set __GFP_NOMEMALLOC if not for network packet RX
  *	@skb: skb to set pfmemalloc on if __GFP_MEMALLOC is used

^ permalink raw reply related

* [net-next PATCH v3 0/5] Replace __skb_alloc_pages with simpler function
From: Alexander Duyck @ 2014-11-11 17:26 UTC (permalink / raw)
  To: netdev, linux-usb
  Cc: leedom, hariprasad, donald.c.skidmore, oliver, balbi,
	matthew.vick, mgorman, davem, jeffrey.t.kirsher

This patch series replaces __skb_alloc_pages with a much simpler function,
__dev_alloc_pages.  The main difference between the two is that
__skb_alloc_pages had an sk_buff pointer that was being passed as NULL in
call places where it was called.  In a couple of cases the NULL was passed
by variable and this led to unnecessary code being run.

As such in order to simplify things the __dev_alloc_pages call only takes a
mask and the page order being requested.  In addition it takes advantage of
several behaviors already built into the page allocator so that it can just
set GFP flags unconditionally.

v2: Renamed functions to dev_alloc_page(s) instead of netdev_alloc_page(s)
    Removed __GFP_COLD flag from usb code as it was redundant
v3: Update patch descriptions and subjects to match changes in v2

---

Alexander Duyck (5):
      net: Add device Rx page allocation function
      cxgb4/cxgb4vf: Replace __skb_alloc_page with __dev_alloc_page
      phonet: Replace calls to __skb_alloc_page with __dev_alloc_page
      fm10k/igb/ixgbe: Replace __skb_alloc_page with dev_alloc_page
      net: Remove __skb_alloc_page and __skb_alloc_pages


 drivers/net/ethernet/chelsio/cxgb4/sge.c      |    6 +-
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c    |    7 ++-
 drivers/net/ethernet/intel/fm10k/fm10k_main.c |    2 -
 drivers/net/ethernet/intel/igb/igb_main.c     |    2 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    3 -
 drivers/net/usb/cdc-phonet.c                  |    6 +-
 drivers/usb/gadget/function/f_phonet.c        |    6 +-
 include/linux/skbuff.h                        |   61 ++++++++++++++-----------
 8 files changed, 49 insertions(+), 44 deletions(-)

--

^ permalink raw reply

* Re: [net-next PATCH v2 4/5] fm10k/igb/ixgbe: Replace __skb_alloc_page with netdev_alloc_page
From: Alexander Duyck @ 2014-11-11 17:24 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	leedom-ut6Up61K2wZBDgjK7y7TUQ, hariprasad-ut6Up61K2wZBDgjK7y7TUQ,
	donald.c.skidmore-ral2JQCrhuEAvxtiuMwx3w,
	oliver-GvhC2dPhHPQdnm+yROfE0A, balbi-l0cyMroinI0,
	matthew.vick-ral2JQCrhuEAvxtiuMwx3w, mgorman-l3A5Bk7waGM,
	David Miller, jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w
In-Reply-To: <CAHA+R7P+GtA6xHxrPZOhxHM0LA8mGkmWugTqLeHtw5M6P3udvQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>


On 11/11/2014 09:15 AM, Cong Wang wrote:
> On Tue, Nov 11, 2014 at 9:11 AM, Alexander Duyck
> <alexander.h.duyck-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>> diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
>> index e645af4..73457ed 100644
>> --- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
>> +++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
>> @@ -83,7 +83,7 @@ static bool fm10k_alloc_mapped_page(struct fm10k_ring *rx_ring,
>>                  return true;
>>
>>          /* alloc new page for storage */
>> -       page = alloc_page(GFP_ATOMIC | __GFP_COLD);
>> +       page = dev_alloc_page();
> Doesn't match $subject.

Yeah, I just noticed that.  I missed the patch title and comments when I 
was doing the replacement.

v3 on the way.

Thanks,

Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [patch net-next 1/2] net: move vlan pop/push functions into common code
From: Pravin Shelar @ 2014-11-11 17:24 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, David Miller, Jamal Hadi Salim, Tom Herbert, Eric Dumazet,
	willemb, Daniel Borkmann, mst, fw, Paul.Durrant, Thomas Graf
In-Reply-To: <1415700789-9171-1-git-send-email-jiri@resnulli.us>

On Tue, Nov 11, 2014 at 2:13 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> So it can be used from out of openvswitch code.
> Did couple of cosmetic changes on the way, namely variable naming and
> adding support for 8021AD proto.
>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>  include/linux/skbuff.h    |  2 ++
>  net/core/skbuff.c         | 86 +++++++++++++++++++++++++++++++++++++++++++++++
>  net/openvswitch/actions.c | 76 ++---------------------------------------
>  3 files changed, 90 insertions(+), 74 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 103fbe8..3b0445c 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -2673,6 +2673,8 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet);
>  unsigned int skb_gso_transport_seglen(const struct sk_buff *skb);
>  struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features);
>  struct sk_buff *skb_vlan_untag(struct sk_buff *skb);
> +int skb_vlan_pop(struct sk_buff *skb);
> +int skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci);
>
>  struct skb_checksum_ops {
>         __wsum (*update)(const void *mem, int len, __wsum wsum);
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 7001896..75e53d4 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4151,6 +4151,92 @@ err_free:
>  }
>  EXPORT_SYMBOL(skb_vlan_untag);
>
> +/* remove VLAN header from packet and update csum accordingly. */
> +static int __skb_vlan_pop(struct sk_buff *skb, u16 *vlan_tci)
> +{
> +       struct vlan_hdr *vhdr;
> +       int err;
> +
> +       if (!pskb_may_pull(skb, VLAN_ETH_HLEN))
> +               return -ENOMEM;
> +
> +       if (!skb_cloned(skb) || skb_clone_writable(skb, VLAN_ETH_HLEN))
> +               return 0;
> +
> +       err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
> +       if (unlikely(err))
> +               return err;
> +
> +       if (skb->ip_summed == CHECKSUM_COMPLETE)
> +               skb->csum = csum_sub(skb->csum, csum_partial(skb->data
> +                                       + (2 * ETH_ALEN), VLAN_HLEN, 0));
> +
> +       vhdr = (struct vlan_hdr *)(skb->data + ETH_HLEN);
> +       *vlan_tci = ntohs(vhdr->h_vlan_TCI);
> +
> +       memmove(skb->data + VLAN_HLEN, skb->data, 2 * ETH_ALEN);
> +       __skb_pull(skb, VLAN_HLEN);
> +
> +       vlan_set_encap_proto(skb, vhdr);
> +       skb->mac_header += VLAN_HLEN;
> +       if (skb_network_offset(skb) < ETH_HLEN)
> +               skb_set_network_header(skb, ETH_HLEN);
> +       skb_reset_mac_len(skb);
> +
> +       return 0;
> +}
> +
> +int skb_vlan_pop(struct sk_buff *skb)
> +{
> +       u16 vlan_tci;
> +       __be16 vlan_proto;
> +       int err;
> +
> +       if (likely(vlan_tx_tag_present(skb))) {
> +               skb->vlan_tci = 0;
> +       } else {
> +               if (unlikely((skb->protocol != htons(ETH_P_8021Q) &&
> +                             skb->protocol != htons(ETH_P_8021AD)) ||
> +                            skb->len < VLAN_ETH_HLEN))
> +                       return 0;
> +
> +               err = __skb_vlan_pop(skb, &vlan_tci);
> +               if (err)
> +                       return err;
> +       }
> +       /* move next vlan tag to hw accel tag */
> +       if (likely((skb->protocol != htons(ETH_P_8021Q) &&
> +                   skb->protocol != htons(ETH_P_8021AD)) ||
> +                  skb->len < VLAN_ETH_HLEN))
> +               return 0;
> +
> +       vlan_proto = skb->protocol;
> +       err = __skb_vlan_pop(skb, &vlan_tci);
> +       if (unlikely(err))
> +               return err;
> +
> +       __vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci);
> +       return 0;
> +}
> +EXPORT_SYMBOL(skb_vlan_pop);
> +
> +int skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci)
> +{
> +       if (vlan_tx_tag_present(skb)) {
> +               /* push down current VLAN tag */
> +               if (!__vlan_put_tag(skb, skb->vlan_proto,
> +                                   vlan_tx_tag_get(skb)))
> +                       return -ENOMEM;
> +
Since you are restructuring this code, can you also change
__vlan_put_tag() to not free skb on error. So that these two new
functions can have common error handling code.

> +               if (skb->ip_summed == CHECKSUM_COMPLETE)
> +                       skb->csum = csum_add(skb->csum, csum_partial(skb->data
> +                                       + (2 * ETH_ALEN), VLAN_HLEN, 0));
> +       }
> +       __vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci);
> +       return 0;
> +}
> +EXPORT_SYMBOL(skb_vlan_push);
> +
>  /**
>   * alloc_skb_with_frags - allocate skb with page frags
>   *
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index f7e5891..1b28110 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -206,86 +206,14 @@ static int set_mpls(struct sk_buff *skb, const __be32 *mpls_lse)
>         return 0;
>  }
>
> -/* remove VLAN header from packet and update csum accordingly. */
> -static int __pop_vlan_tci(struct sk_buff *skb, __be16 *current_tci)
> -{
> -       struct vlan_hdr *vhdr;
> -       int err;
> -
> -       err = make_writable(skb, VLAN_ETH_HLEN);
> -       if (unlikely(err))
> -               return err;
> -
> -       if (skb->ip_summed == CHECKSUM_COMPLETE)
> -               skb->csum = csum_sub(skb->csum, csum_partial(skb->data
> -                                       + (2 * ETH_ALEN), VLAN_HLEN, 0));
> -
> -       vhdr = (struct vlan_hdr *)(skb->data + ETH_HLEN);
> -       *current_tci = vhdr->h_vlan_TCI;
> -
> -       memmove(skb->data + VLAN_HLEN, skb->data, 2 * ETH_ALEN);
> -       __skb_pull(skb, VLAN_HLEN);
> -
> -       vlan_set_encap_proto(skb, vhdr);
> -       skb->mac_header += VLAN_HLEN;
> -
> -       if (skb_network_offset(skb) < ETH_HLEN)
> -               skb_set_network_header(skb, ETH_HLEN);
> -
> -       /* Update mac_len for subsequent MPLS actions */
> -       skb_reset_mac_len(skb);
> -       return 0;
> -}
> -
>  static int pop_vlan(struct sk_buff *skb)
>  {
> -       __be16 tci;
> -       int err;
> -
> -       if (likely(vlan_tx_tag_present(skb))) {
> -               skb->vlan_tci = 0;
> -       } else {
> -               if (unlikely(skb->protocol != htons(ETH_P_8021Q) ||
> -                            skb->len < VLAN_ETH_HLEN))
> -                       return 0;
> -
> -               err = __pop_vlan_tci(skb, &tci);
> -               if (err)
> -                       return err;
> -       }
> -       /* move next vlan tag to hw accel tag */
> -       if (likely(skb->protocol != htons(ETH_P_8021Q) ||
> -                  skb->len < VLAN_ETH_HLEN))
> -               return 0;
> -
> -       err = __pop_vlan_tci(skb, &tci);
> -       if (unlikely(err))
> -               return err;
> -
> -       __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), ntohs(tci));
> -       return 0;
> +       return skb_vlan_pop(skb);
>  }
>
>  static int push_vlan(struct sk_buff *skb, const struct ovs_action_push_vlan *vlan)
>  {
> -       if (unlikely(vlan_tx_tag_present(skb))) {
> -               u16 current_tag;
> -
> -               /* push down current VLAN tag */
> -               current_tag = vlan_tx_tag_get(skb);
> -
> -               if (!__vlan_put_tag(skb, skb->vlan_proto, current_tag))
> -                       return -ENOMEM;
> -               /* Update mac_len for subsequent MPLS actions */
> -               skb->mac_len += VLAN_HLEN;
> -
> -               if (skb->ip_summed == CHECKSUM_COMPLETE)
> -                       skb->csum = csum_add(skb->csum, csum_partial(skb->data
> -                                       + (2 * ETH_ALEN), VLAN_HLEN, 0));
> -
> -       }
> -       __vlan_hwaccel_put_tag(skb, vlan->vlan_tpid, ntohs(vlan->vlan_tci) & ~VLAN_TAG_PRESENT);
> -       return 0;
> +       return skb_vlan_push(skb, vlan->vlan_tpid, ntohs(vlan->vlan_tci) & ~VLAN_TAG_PRESENT);
>  }
>
>  static int set_eth_addr(struct sk_buff *skb,
> --
> 1.9.3
>

^ permalink raw reply

* Re: [net-next PATCH v2 4/5] fm10k/igb/ixgbe: Replace __skb_alloc_page with netdev_alloc_page
From: Cong Wang @ 2014-11-11 17:15 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: netdev, linux-usb, leedom, hariprasad, donald.c.skidmore, oliver,
	balbi, matthew.vick, mgorman, David Miller, jeffrey.t.kirsher
In-Reply-To: <20141111171117.15976.41609.stgit@ahduyck-vm-fedora20>

On Tue, Nov 11, 2014 at 9:11 AM, Alexander Duyck
<alexander.h.duyck@redhat.com> wrote:
> diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
> index e645af4..73457ed 100644
> --- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
> +++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
> @@ -83,7 +83,7 @@ static bool fm10k_alloc_mapped_page(struct fm10k_ring *rx_ring,
>                 return true;
>
>         /* alloc new page for storage */
> -       page = alloc_page(GFP_ATOMIC | __GFP_COLD);
> +       page = dev_alloc_page();

Doesn't match $subject.

^ permalink raw reply

* [net-next PATCH v2 5/5] net: Remove __skb_alloc_page and __skb_alloc_pages
From: Alexander Duyck @ 2014-11-11 17:11 UTC (permalink / raw)
  To: netdev, linux-usb
  Cc: leedom, hariprasad, donald.c.skidmore, oliver, balbi,
	matthew.vick, mgorman, davem, jeffrey.t.kirsher
In-Reply-To: <20141111170711.15976.44062.stgit@ahduyck-vm-fedora20>

Remove the two functions which are now dead code.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 include/linux/skbuff.h |   43 -------------------------------------------
 1 file changed, 43 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 2e5221f..73c370e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2233,49 +2233,6 @@ static inline struct page *dev_alloc_page(void)
 }
 
 /**
- *	__skb_alloc_pages - allocate pages for ps-rx on a skb and preserve pfmemalloc data
- *	@gfp_mask: alloc_pages_node mask. Set __GFP_NOMEMALLOC if not for network packet RX
- *	@skb: skb to set pfmemalloc on if __GFP_MEMALLOC is used
- *	@order: size of the allocation
- *
- * 	Allocate a new page.
- *
- * 	%NULL is returned if there is no free memory.
-*/
-static inline struct page *__skb_alloc_pages(gfp_t gfp_mask,
-					      struct sk_buff *skb,
-					      unsigned int order)
-{
-	struct page *page;
-
-	gfp_mask |= __GFP_COLD;
-
-	if (!(gfp_mask & __GFP_NOMEMALLOC))
-		gfp_mask |= __GFP_MEMALLOC;
-
-	page = alloc_pages_node(NUMA_NO_NODE, gfp_mask, order);
-	if (skb && page && page->pfmemalloc)
-		skb->pfmemalloc = true;
-
-	return page;
-}
-
-/**
- *	__skb_alloc_page - allocate a page for ps-rx for a given skb and preserve pfmemalloc data
- *	@gfp_mask: alloc_pages_node mask. Set __GFP_NOMEMALLOC if not for network packet RX
- *	@skb: skb to set pfmemalloc on if __GFP_MEMALLOC is used
- *
- * 	Allocate a new page.
- *
- * 	%NULL is returned if there is no free memory.
- */
-static inline struct page *__skb_alloc_page(gfp_t gfp_mask,
-					     struct sk_buff *skb)
-{
-	return __skb_alloc_pages(gfp_mask, skb, 0);
-}
-
-/**
  *	skb_propagate_pfmemalloc - Propagate pfmemalloc if skb is allocated after RX page
  *	@page: The page that was allocated from skb_alloc_page
  *	@skb: The skb that may need pfmemalloc set

^ permalink raw reply related

* [net-next PATCH v2 4/5] fm10k/igb/ixgbe: Replace __skb_alloc_page with netdev_alloc_page
From: Alexander Duyck @ 2014-11-11 17:11 UTC (permalink / raw)
  To: netdev, linux-usb
  Cc: leedom, hariprasad, donald.c.skidmore, oliver, balbi,
	matthew.vick, mgorman, davem, jeffrey.t.kirsher
In-Reply-To: <20141111170711.15976.44062.stgit@ahduyck-vm-fedora20>

The Intel drivers were pretty much just using the plain vanilla GFP flags
in their calls to __skb_alloc_page so this change makes it so that they use
netdev_alloc_page which just uses GFP_ATOMIC for the gfp_flags value.

Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Matthew Vick <matthew.vick@intel.com>
Cc: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c |    2 +-
 drivers/net/ethernet/intel/igb/igb_main.c     |    2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    3 +--
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index e645af4..73457ed 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -83,7 +83,7 @@ static bool fm10k_alloc_mapped_page(struct fm10k_ring *rx_ring,
 		return true;
 
 	/* alloc new page for storage */
-	page = alloc_page(GFP_ATOMIC | __GFP_COLD);
+	page = dev_alloc_page();
 	if (unlikely(!page)) {
 		rx_ring->rx_stats.alloc_failed++;
 		return false;
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index a2d72a8..1e35fae 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6988,7 +6988,7 @@ static bool igb_alloc_mapped_page(struct igb_ring *rx_ring,
 		return true;
 
 	/* alloc new page for storage */
-	page = __skb_alloc_page(GFP_ATOMIC | __GFP_COLD, NULL);
+	page = dev_alloc_page();
 	if (unlikely(!page)) {
 		rx_ring->rx_stats.alloc_failed++;
 		return false;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index d2df4e3..7405478 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1440,8 +1440,7 @@ static bool ixgbe_alloc_mapped_page(struct ixgbe_ring *rx_ring,
 
 	/* alloc new page for storage */
 	if (likely(!page)) {
-		page = __skb_alloc_pages(GFP_ATOMIC | __GFP_COLD | __GFP_COMP,
-					 bi->skb, ixgbe_rx_pg_order(rx_ring));
+		page = dev_alloc_pages(ixgbe_rx_pg_order(rx_ring));
 		if (unlikely(!page)) {
 			rx_ring->rx_stats.alloc_rx_page_failed++;
 			return false;

^ permalink raw reply related

* [net-next PATCH v2 3/5] phonet: Replace calls to __skb_alloc_page with __netdev_alloc_page
From: Alexander Duyck @ 2014-11-11 17:11 UTC (permalink / raw)
  To: netdev, linux-usb
  Cc: leedom, hariprasad, donald.c.skidmore, oliver, balbi,
	matthew.vick, mgorman, davem, jeffrey.t.kirsher
In-Reply-To: <20141111170711.15976.44062.stgit@ahduyck-vm-fedora20>

Replace the calls to __skb_alloc_page that are passed NULL with calls to
__netdev_alloc_page.

In addition remove __GFP_COLD flag from allocations as we only want it for
the Rx buffer which is taken care of by __dev_alloc_skb, not for any
secondary allocations such as the queue element transmit descriptors.

Cc: Oliver Neukum <oliver@neukum.org>
Cc: Felipe Balbi <balbi@ti.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 drivers/net/usb/cdc-phonet.c           |    6 +++---
 drivers/usb/gadget/function/f_phonet.c |    6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/usb/cdc-phonet.c b/drivers/net/usb/cdc-phonet.c
index 2ec1500..415ce8b 100644
--- a/drivers/net/usb/cdc-phonet.c
+++ b/drivers/net/usb/cdc-phonet.c
@@ -130,7 +130,7 @@ static int rx_submit(struct usbpn_dev *pnd, struct urb *req, gfp_t gfp_flags)
 	struct page *page;
 	int err;
 
-	page = __skb_alloc_page(gfp_flags | __GFP_NOMEMALLOC, NULL);
+	page = __dev_alloc_page(gfp_flags | __GFP_NOMEMALLOC);
 	if (!page)
 		return -ENOMEM;
 
@@ -212,7 +212,7 @@ resubmit:
 	if (page)
 		put_page(page);
 	if (req)
-		rx_submit(pnd, req, GFP_ATOMIC | __GFP_COLD);
+		rx_submit(pnd, req, GFP_ATOMIC);
 }
 
 static int usbpn_close(struct net_device *dev);
@@ -231,7 +231,7 @@ static int usbpn_open(struct net_device *dev)
 	for (i = 0; i < rxq_size; i++) {
 		struct urb *req = usb_alloc_urb(0, GFP_KERNEL);
 
-		if (!req || rx_submit(pnd, req, GFP_KERNEL | __GFP_COLD)) {
+		if (!req || rx_submit(pnd, req, GFP_KERNEL)) {
 			usb_free_urb(req);
 			usbpn_close(dev);
 			return -ENOMEM;
diff --git a/drivers/usb/gadget/function/f_phonet.c b/drivers/usb/gadget/function/f_phonet.c
index b9cfc15..cde7397 100644
--- a/drivers/usb/gadget/function/f_phonet.c
+++ b/drivers/usb/gadget/function/f_phonet.c
@@ -303,7 +303,7 @@ pn_rx_submit(struct f_phonet *fp, struct usb_request *req, gfp_t gfp_flags)
 	struct page *page;
 	int err;
 
-	page = __skb_alloc_page(gfp_flags | __GFP_NOMEMALLOC, NULL);
+	page = __dev_alloc_page(gfp_flags | __GFP_NOMEMALLOC);
 	if (!page)
 		return -ENOMEM;
 
@@ -377,7 +377,7 @@ static void pn_rx_complete(struct usb_ep *ep, struct usb_request *req)
 	if (page)
 		put_page(page);
 	if (req)
-		pn_rx_submit(fp, req, GFP_ATOMIC | __GFP_COLD);
+		pn_rx_submit(fp, req, GFP_ATOMIC);
 }
 
 /*-------------------------------------------------------------------------*/
@@ -437,7 +437,7 @@ static int pn_set_alt(struct usb_function *f, unsigned intf, unsigned alt)
 
 			netif_carrier_on(dev);
 			for (i = 0; i < phonet_rxq_size; i++)
-				pn_rx_submit(fp, fp->out_reqv[i], GFP_ATOMIC | __GFP_COLD);
+				pn_rx_submit(fp, fp->out_reqv[i], GFP_ATOMIC);
 		}
 		spin_unlock(&port->lock);
 		return 0;

^ permalink raw reply related

* [net-next PATCH v2 2/5] cxgb4/cxgb4vf: Replace __skb_alloc_page with __netdev_alloc_page
From: Alexander Duyck @ 2014-11-11 17:11 UTC (permalink / raw)
  To: netdev, linux-usb
  Cc: leedom, hariprasad, donald.c.skidmore, oliver, balbi,
	matthew.vick, mgorman, davem, jeffrey.t.kirsher
In-Reply-To: <20141111170711.15976.44062.stgit@ahduyck-vm-fedora20>

Drop the bloated use of __skb_alloc_page and replace it with
__netdev_alloc_page.  In addition update the one other spot that is
allocating a page so that it allocates with the correct flags.

Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 drivers/net/ethernet/chelsio/cxgb4/sge.c   |    6 +++---
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c |    7 ++++---
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 5e1b314..20ee002 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -576,7 +576,7 @@ static unsigned int refill_fl(struct adapter *adap, struct sge_fl *q, int n,
 	__be64 *d = &q->desc[q->pidx];
 	struct rx_sw_desc *sd = &q->sdesc[q->pidx];
 
-	gfp |= __GFP_NOWARN | __GFP_COLD;
+	gfp |= __GFP_NOWARN;
 
 	if (s->fl_pg_order == 0)
 		goto alloc_small_pages;
@@ -585,7 +585,7 @@ static unsigned int refill_fl(struct adapter *adap, struct sge_fl *q, int n,
 	 * Prefer large buffers
 	 */
 	while (n) {
-		pg = alloc_pages(gfp | __GFP_COMP, s->fl_pg_order);
+		pg = __dev_alloc_pages(gfp, s->fl_pg_order);
 		if (unlikely(!pg)) {
 			q->large_alloc_failed++;
 			break;       /* fall back to single pages */
@@ -615,7 +615,7 @@ static unsigned int refill_fl(struct adapter *adap, struct sge_fl *q, int n,
 
 alloc_small_pages:
 	while (n--) {
-		pg = __skb_alloc_page(gfp, NULL);
+		pg = __dev_alloc_page(gfp);
 		if (unlikely(!pg)) {
 			q->alloc_failed++;
 			break;
diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
index 85036e6..9df40df 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/sge.c
@@ -602,6 +602,8 @@ static unsigned int refill_fl(struct adapter *adapter, struct sge_fl *fl,
 	 */
 	BUG_ON(fl->avail + n > fl->size - FL_PER_EQ_UNIT);
 
+	gfp |= __GFP_NOWARN;
+
 	/*
 	 * If we support large pages, prefer large buffers and fail over to
 	 * small pages if we can't allocate large pages to satisfy the refill.
@@ -612,8 +614,7 @@ static unsigned int refill_fl(struct adapter *adapter, struct sge_fl *fl,
 		goto alloc_small_pages;
 
 	while (n) {
-		page = alloc_pages(gfp | __GFP_COMP | __GFP_NOWARN,
-				   FL_PG_ORDER);
+		page = __dev_alloc_pages(gfp, FL_PG_ORDER);
 		if (unlikely(!page)) {
 			/*
 			 * We've failed inour attempt to allocate a "large
@@ -657,7 +658,7 @@ static unsigned int refill_fl(struct adapter *adapter, struct sge_fl *fl,
 
 alloc_small_pages:
 	while (n--) {
-		page = __skb_alloc_page(gfp | __GFP_NOWARN, NULL);
+		page = __dev_alloc_page(gfp);
 		if (unlikely(!page)) {
 			fl->alloc_failed++;
 			break;

^ permalink raw reply related

* [net-next PATCH v2 1/5] net: Add device Rx page allocation function
From: Alexander Duyck @ 2014-11-11 17:10 UTC (permalink / raw)
  To: netdev, linux-usb
  Cc: leedom, hariprasad, donald.c.skidmore, oliver, balbi,
	matthew.vick, mgorman, davem, jeffrey.t.kirsher
In-Reply-To: <20141111170711.15976.44062.stgit@ahduyck-vm-fedora20>

This patch implements __dev_alloc_pages and __dev_alloc_page.  These are
meant to replace the __skb_alloc_pages and __skb_alloc_page functions.  The
reason for doing this is that it occurred to me that __skb_alloc_page is
supposed to be passed an sk_buff pointer, but it is NULL in all cases where
it is used.  Worse is that in the case of ixgbe it is passed NULL via the
sk_buff pointer in the rx_buffer info structure which means the compiler is
not correctly stripping it out.

The naming for these functions is based on dev_alloc_skb and __dev_alloc_skb.
There was originally a netdev_alloc_page, however that was passed a
net_device pointer and this function is not so I thought it best to follow
that naming scheme since that is the same difference between dev_alloc_skb
and netdev_alloc_skb.

In the case of anything greater than order 0 it is assumed that we want a
compound page so __GFP_COMP is set for all allocations as we expect a
compound page when assigning a page frag.

The other change in this patch is to exploit the behaviors of the page
allocator in how it handles flags.  So for example we can always set
__GFP_COMP and __GFP_MEMALLOC since they are ignored if they are not
applicable or are overridden by another flag.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 include/linux/skbuff.h |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 103fbe8..2e5221f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2185,6 +2185,54 @@ static inline struct sk_buff *netdev_alloc_skb_ip_align(struct net_device *dev,
 }
 
 /**
+ * __dev_alloc_pages - allocate page for network Rx
+ * @gfp_mask: allocation priority. Set __GFP_NOMEMALLOC if not for network Rx
+ * @order: size of the allocation
+ *
+ * Allocate a new page.
+ *
+ * %NULL is returned if there is no free memory.
+*/
+static inline struct page *__dev_alloc_pages(gfp_t gfp_mask,
+					     unsigned int order)
+{
+	/* This piece of code contains several assumptions.
+	 * 1.  This is for device Rx, therefor a cold page is preferred.
+	 * 2.  The expectation is the user wants a compound page.
+	 * 3.  If requesting a order 0 page it will not be compound
+	 *     due to the check to see if order has a value in prep_new_page
+	 * 4.  __GFP_MEMALLOC is ignored if __GFP_NOMEMALLOC is set due to
+	 *     code in gfp_to_alloc_flags that should be enforcing this.
+	 */
+	gfp_mask |= __GFP_COLD | __GFP_COMP | __GFP_MEMALLOC;
+
+	return alloc_pages_node(NUMA_NO_NODE, gfp_mask, order);
+}
+
+static inline struct page *dev_alloc_pages(unsigned int order)
+{
+	return __dev_alloc_pages(GFP_ATOMIC, order);
+}
+
+/**
+ * __dev_alloc_page - allocate a page for network Rx
+ * @gfp_mask: allocation priority. Set __GFP_NOMEMALLOC if not for network Rx
+ *
+ * Allocate a new page.
+ *
+ * %NULL is returned if there is no free memory.
+ */
+static inline struct page *__dev_alloc_page(gfp_t gfp_mask)
+{
+	return __dev_alloc_pages(gfp_mask, 0);
+}
+
+static inline struct page *dev_alloc_page(void)
+{
+	return __dev_alloc_page(GFP_ATOMIC);
+}
+
+/**
  *	__skb_alloc_pages - allocate pages for ps-rx on a skb and preserve pfmemalloc data
  *	@gfp_mask: alloc_pages_node mask. Set __GFP_NOMEMALLOC if not for network packet RX
  *	@skb: skb to set pfmemalloc on if __GFP_MEMALLOC is used

^ permalink raw reply related

* [net-next PATCH v2 0/5] Replace __skb_alloc_pages with simpler function
From: Alexander Duyck @ 2014-11-11 17:10 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA
  Cc: leedom-ut6Up61K2wZBDgjK7y7TUQ, hariprasad-ut6Up61K2wZBDgjK7y7TUQ,
	donald.c.skidmore-ral2JQCrhuEAvxtiuMwx3w,
	oliver-GvhC2dPhHPQdnm+yROfE0A, balbi-l0cyMroinI0,
	matthew.vick-ral2JQCrhuEAvxtiuMwx3w, mgorman-l3A5Bk7waGM,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w

This patch series replaces __skb_alloc_pages with a much simpler function,
__dev_alloc_pages.  The main difference between the two is that
__skb_alloc_pages had an sk_buff pointer that was being passed as NULL in
call places where it was called.  In a couple of cases the NULL was passed
by variable and this led to unnecessary code being run.

As such in order to simplify things the __dev_alloc_pages call only takes a
mask and the page order being requested.  In addition it takes advantage of
several behaviors already built into the page allocator so that it can just
set GFP flags unconditionally.

v2: Renamed functions to dev_alloc_page(s) instead of netdev_alloc_page(s)
    Removed __GFP_COLD flag from usb code as it was redundant

---

Alexander Duyck (5):
      net: Add device Rx page allocation function
      cxgb4/cxgb4vf: Replace __skb_alloc_page with __netdev_alloc_page
      phonet: Replace calls to __skb_alloc_page with __netdev_alloc_page
      fm10k/igb/ixgbe: Replace __skb_alloc_page with netdev_alloc_page
      net: Remove __skb_alloc_page and __skb_alloc_pages


 drivers/net/ethernet/chelsio/cxgb4/sge.c      |    6 +-
 drivers/net/ethernet/chelsio/cxgb4vf/sge.c    |    7 ++-
 drivers/net/ethernet/intel/fm10k/fm10k_main.c |    2 -
 drivers/net/ethernet/intel/igb/igb_main.c     |    2 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    3 -
 drivers/net/usb/cdc-phonet.c                  |    6 +-
 drivers/usb/gadget/function/f_phonet.c        |    6 +-
 include/linux/skbuff.h                        |   61 ++++++++++++++-----------
 8 files changed, 49 insertions(+), 44 deletions(-)

--
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [Patch net-next] net: remove netif_set_real_num_rx_queues()
From: Cong Wang @ 2014-11-11 17:07 UTC (permalink / raw)
  To: Edward Cree
  Cc: Linux Kernel Network Developers, Eric Dumazet, David S. Miller
In-Reply-To: <54620AC0.9080509@solarflare.com>

On Tue, Nov 11, 2014 at 5:10 AM, Edward Cree <ecree@solarflare.com> wrote:
>> -static inline int netif_copy_real_num_queues(struct net_device *to_dev,
>> -                                          const struct net_device *from_dev)
> Patch title says you're removing _set_ but body only removes _copy_.
> Which one is right?

Oops, my copy-n-paste error...

^ permalink raw reply

* [PATCH] selftests/net: psock_fanout seg faults in sock_fanout_read_ring()
From: Shuah Khan @ 2014-11-11 17:04 UTC (permalink / raw)
  To: davem; +Cc: Shuah Khan, netdev, linux-api, linux-kernel

The while loop in sock_fanout_read_ring() checks mmap region
bounds after access, causing it to segfault. Fix it to check
count before accessing header->tp_status. This problem can be
reproduced consistently when the test in run as follows:

    make -C tools/testing/selftests TARGETS=net run_tests
    or
    make run_tests from tools/testing/selftests
    or
    make run_test from tools/testing/selftests/net

Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
---
 tools/testing/selftests/net/psock_fanout.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/psock_fanout.c b/tools/testing/selftests/net/psock_fanout.c
index 57b9c2b..6f67333 100644
--- a/tools/testing/selftests/net/psock_fanout.c
+++ b/tools/testing/selftests/net/psock_fanout.c
@@ -128,7 +128,7 @@ static int sock_fanout_read_ring(int fd, void *ring)
 	struct tpacket2_hdr *header = ring;
 	int count = 0;
 
-	while (header->tp_status & TP_STATUS_USER && count < RING_NUM_FRAMES) {
+	while (count < RING_NUM_FRAMES && header->tp_status & TP_STATUS_USER) {
 		count++;
 		header = ring + (count * getpagesize());
 	}
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH V1 net-next] net/fm10k: Avoid double setting of NETIF_F_SG for the HW encapsulation feature mask
From: Vick, Matthew @ 2014-11-11 16:37 UTC (permalink / raw)
  To: Or Gerlitz, Kirsher, Jeffrey T; +Cc: netdev@vger.kernel.org
In-Reply-To: <1415520822-11922-1-git-send-email-ogerlitz@mellanox.com>

On 11/9/14, 12:13 AM, "Or Gerlitz" <ogerlitz@mellanox.com> wrote:

>The networking core does it for the driver during registration time.
>
>Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
>---
>
>changes from v0:
> - set alignment to be per the request of Matthew Vick

Thanks Or! I'll add my ACK as this passes through Jeff's queue.

^ permalink raw reply

* Re: How to make stack send broadcast ARP request when entry is STALE?
From: Brian Haley @ 2014-11-11 16:36 UTC (permalink / raw)
  To: Ulf samuelsson; +Cc: Netdev
In-Reply-To: <C8A75CC5-18FA-4D2D-9259-74254D808107@emagii.com>

On 11/11/2014 05:08 AM, Ulf samuelsson wrote:
> If I set ucast_solicit to '0', then I always send broadcast, which is not desirable.

That seems to contradict your statement:

"The HP router is configured by a customer, and they intentionally limit replies
 to broadcast, and that is how they want it."

So I'm not understanding your question.

The best way forward would be for you to send a patch out that gets your desired
behaviour, and let others give feedback on it.

-Brian

> In the PROBE state of the ARP state machine, "probes" count from 0 .. ucast_probes.
> 
> I can see the following arguments for letting "probes" count from 
>   
>    0.. (ucast_probes + app_probes + mcast_probes)
>    
> 
> A: This is how the IPv6 is doing it. 
>      It is not standardized in IPv4, but why should the IPv4 have a different behaviour?
> 
> B: If you do not send out broadcast if unicast fails, then a broadcast will be sent out 
>      anyway, once the ARP entry is removed by the garbage collector.
>      You get an annoyingly long delay before that happens.
> 
> C: If a large data centre does not want broadcasts to be sent out, 
>      then they can set mcast_probes to 0, in which case no broadcasts will be sent
>      out in PROBE state.
> 
> D:  When in other states, the counter runs until it a reply is had, or the counter wraps around.
>       It is sending broadcast all the time.
> 
> 
> Best Regards
> Ulf Samuelsson
> ulf@emagii.com
> +46  (722) 427 437
> 
> 
>> 10 nov 2014 kl. 23:52 skrev Brian Haley <brian.haley@hp.com>:
>>
>>> On 11/07/2014 05:11 AM, Ulf samuelsson wrote:
>>> The HP router is configured by a customer, and they intentionally limit replies
>>> to broadcast, and that is how they want it.
>>
>> So this is the crux of the problem - the customer has configured the router so
>> that it doesn't play well with most modern network stacks that try and use
>> unicast so they don't send unnecessary broadcast packets.  I don't know why I
>> thought this was something wrong with the router software.
>>
>> Did you try this?
>>
>> $ sudo sysctl net.ipv4.neigh.eth0.ucast_solicit=0
>>
>> It works for me.
>>
>> And they really should re-think their decision on that configuration setting.
>>
>> -Brian
>>
>>
>>> In the previous version of the build system, the Interpeak stack was used
>>> and this would in PROBE state send unicast ARP request, and if that failed
>>> send broadcast ARP.
>>>
>>> The native linux stack, when in PROBE state sends only unicast until it decides
>>> that it should enter FAILED state.
>>>
>>> The 'mcast_probes' variable seems to be totally ignored, except the first  time,
>>> so I do not see why it is there.
>>>
>>> Best Regards
>>> Ulf Samuelsson
>>> ulf@emagii.com
>>> +46  (722) 427 437
>>>
>>>
>>>>> 7 nov 2014 kl. 10:54 skrev Brian Haley <brian.haley@hp.com>:
>>>>>
>>>>> On 11/05/2014 07:48 AM, Ulf samuelsson wrote:
>>>>> Have a problem with an HP router at a certain location, which
>>>>> is configured to only answer to broadcast ARP requests.
>>>>> That cannot be changed.
>>>>
>>>> Sorry to hear about the problem, but my only suggestions would be to try the latest firmware and/or put a call in to support.  I don't happen work in the division that makes routers...
>>>>
>>>>> The first ARP request the kernel sends out, is a broadcast request,
>>>>> which is fine, but after the reply, the kernel sends unicast requests,
>>>>> which will not get any replies.
>>>>
>>>> You might be able to hack this by inserting an ebtables rule - check the dnat target section of the man page - don't know the exact syntax but it would probably end in '-j dnat --to-destination ff:ff:ff:ff:ff:ff'
>>>>
>>>> -Brian
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
> 

^ permalink raw reply

* Re: [PATCH] net: s6gmac: remove driver
From: David Miller @ 2014-11-11 16:30 UTC (permalink / raw)
  To: pebolle; +Cc: valentinrothberg, jcmvbkbc, dg, netdev, linux-kernel
In-Reply-To: <1415694742.16021.11.camel@x220>

From: Paul Bolle <pebolle@tiscali.nl>
Date: Tue, 11 Nov 2014 09:32:22 +0100

> On Sun, 2014-10-19 at 13:28 -0400, David Miller wrote:
>> From: Max Filippov <jcmvbkbc@gmail.com>
>> Date: Sun, 19 Oct 2014 21:26:05 +0400
>> 
>> > On Sun, Oct 19, 2014 at 8:45 PM, David Miller <davem@davemloft.net> wrote:
>> >> From: Daniel Glöckner <dg@emlix.com>
>> >> Date: Sun, 19 Oct 2014 00:49:05 +0200
>> >>
>> >>> The s6000 Xtensa support is removed from the kernel. There are no
>> >>> other chips using this driver.
>> >>
>> >> That's not what I see in Linus's current tree:
>> > 
>> > David, s6000 removal is queued for 3.18.
>> 
>> Then there is zero reason for me to apply this to my networking
>> tree.
> 
> Commit 4006e565e150 ("xtensa: remove s6000 variant and s6105 platform")
> is included in linux-next as of next-20141111. So now we see:
>     $ git grep XTENSA_VARIANT_S6000 next-20141111
>     next-20141111:drivers/net/ethernet/Kconfig:     depends on XTENSA_VARIANT_S6000
> 
> Should Daniel resubmit, for net-next, perhaps with a commit explanation
> that references this commit?

When the s6000 removal shows up in Linus's tree, submit it.

^ permalink raw reply

* Re: [patch net-next v2 05/10] rocker: introduce rocker switch driver
From: Thomas Graf @ 2014-11-11 16:10 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: John Fastabend, netdev, davem, nhorman, andy, dborkman, ogerlitz,
	jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo
In-Reply-To: <20141111154017.GH1825@nanopsycho.lan>

On 11/11/14 at 04:40pm, Jiri Pirko wrote:
> Tue, Nov 11, 2014 at 04:32:32PM CET, tgraf@suug.ch wrote:
> >If the ID is only unique within a driver, then the user space cannot
> >rely on using the ID to group switch ports. Multiple drivers might
> >come up with the same ID.
> 
> Well, as I said, it is the same as for physical port id. But if needed,
> there can be added some simple mechanism for the id registration
> ensuring their uniqueness.
> 
> >
> >Even now, multiple rocker instances would have the same ID.
> 
> It depends on what hw returns to driver.

I think that falls within the responsibility of the API. I'll propose
a patch to address it after this gets in.

^ permalink raw reply

* Re: [patch net-next v2 05/10] rocker: introduce rocker switch driver
From: John Fastabend @ 2014-11-11 15:44 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Thomas Graf, Jiri Pirko, netdev, davem, nhorman, andy, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, jhs,
	sfeldma, f.fainelli, linville, jasowang, ebiederm,
	nicolas.dichtel, ryazanov.s.a, buytenh, aviadr, nbd,
	alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, gospo
In-Reply-To: <54622E2B.10900@cumulusnetworks.com>

On 11/11/2014 07:41 AM, Roopa Prabhu wrote:
> On 11/11/14, 7:32 AM, Thomas Graf wrote:
>> On 11/11/14 at 04:19pm, Jiri Pirko wrote:
>>> Tue, Nov 11, 2014 at 03:29:46PM CET, tgraf@suug.ch wrote:
>>>> On 11/10/14 at 02:04pm, John Fastabend wrote:
>>>>> On 11/09/2014 02:51 AM, Jiri Pirko wrote:
>>>>>> +static int rocker_port_sw_parent_id_get(struct net_device *dev,
>>>>>> +                    struct netdev_phys_item_id *psid)
>>>>>> +{
>>>>>> +    struct rocker_port *rocker_port = netdev_priv(dev);
>>>>>> +    struct rocker *rocker = rocker_port->rocker;
>>>>>> +
>>>>> hmm looks like you read this out of a magic switch register :) but
>>>>> my switch doesn't have this magic reg. I suposse the switch MAC
>>>>> address
>>>>> should work.
>>>> This needs more work afterwards. Either we define that the switch ID
>>>> is only unique in combination with the parent ifindex or we need to
>>>> introduce a notation of uniquness into the switch ID itself.
>>> This is something similar to physical port id. Each driver should take
>>> care of generating that id.
>> If the ID is only unique within a driver, then the user space cannot
>> rely on using the ID to group switch ports. Multiple drivers might
>> come up with the same ID.
>>
>> Even now, multiple rocker instances would have the same ID.
> yep, that was my concern on the switch id namespace handling (on the
> other thread).

At least the devices I work with have a burnt in MAC which could
be used for this.


-- 
John Fastabend         Intel Corporation

^ permalink raw reply

* Re: [patch net-next v2 05/10] rocker: introduce rocker switch driver
From: Roopa Prabhu @ 2014-11-11 15:41 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Jiri Pirko, John Fastabend, netdev, davem, nhorman, andy,
	dborkman, ogerlitz, jesse, pshelar, azhou, ben, stephen,
	jeffrey.t.kirsher, vyasevic, xiyou.wangcong, john.r.fastabend,
	edumazet, jhs, sfeldma, f.fainelli, linville, jasowang, ebiederm,
	nicolas.dichtel, ryazanov.s.a, buytenh, aviadr, nbd,
	alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, gos
In-Reply-To: <20141111153232.GH19157@casper.infradead.org>

On 11/11/14, 7:32 AM, Thomas Graf wrote:
> On 11/11/14 at 04:19pm, Jiri Pirko wrote:
>> Tue, Nov 11, 2014 at 03:29:46PM CET, tgraf@suug.ch wrote:
>>> On 11/10/14 at 02:04pm, John Fastabend wrote:
>>>> On 11/09/2014 02:51 AM, Jiri Pirko wrote:
>>>>> +static int rocker_port_sw_parent_id_get(struct net_device *dev,
>>>>> +					struct netdev_phys_item_id *psid)
>>>>> +{
>>>>> +	struct rocker_port *rocker_port = netdev_priv(dev);
>>>>> +	struct rocker *rocker = rocker_port->rocker;
>>>>> +
>>>> hmm looks like you read this out of a magic switch register :) but
>>>> my switch doesn't have this magic reg. I suposse the switch MAC address
>>>> should work.
>>> This needs more work afterwards. Either we define that the switch ID
>>> is only unique in combination with the parent ifindex or we need to
>>> introduce a notation of uniquness into the switch ID itself.
>> This is something similar to physical port id. Each driver should take
>> care of generating that id.
> If the ID is only unique within a driver, then the user space cannot
> rely on using the ID to group switch ports. Multiple drivers might
> come up with the same ID.
>
> Even now, multiple rocker instances would have the same ID.
yep, that was my concern on the switch id namespace handling (on the 
other thread).

^ permalink raw reply

* Re: [patch net-next v2 05/10] rocker: introduce rocker switch driver
From: Jiri Pirko @ 2014-11-11 15:40 UTC (permalink / raw)
  To: Thomas Graf
  Cc: John Fastabend, netdev, davem, nhorman, andy, dborkman, ogerlitz,
	jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo
In-Reply-To: <20141111153232.GH19157@casper.infradead.org>

Tue, Nov 11, 2014 at 04:32:32PM CET, tgraf@suug.ch wrote:
>On 11/11/14 at 04:19pm, Jiri Pirko wrote:
>> Tue, Nov 11, 2014 at 03:29:46PM CET, tgraf@suug.ch wrote:
>> >On 11/10/14 at 02:04pm, John Fastabend wrote:
>> >> On 11/09/2014 02:51 AM, Jiri Pirko wrote:
>> >> >+static int rocker_port_sw_parent_id_get(struct net_device *dev,
>> >> >+					struct netdev_phys_item_id *psid)
>> >> >+{
>> >> >+	struct rocker_port *rocker_port = netdev_priv(dev);
>> >> >+	struct rocker *rocker = rocker_port->rocker;
>> >> >+
>> >> 
>> >> hmm looks like you read this out of a magic switch register :) but
>> >> my switch doesn't have this magic reg. I suposse the switch MAC address
>> >> should work.
>> >
>> >This needs more work afterwards. Either we define that the switch ID
>> >is only unique in combination with the parent ifindex or we need to
>> >introduce a notation of uniquness into the switch ID itself.
>> 
>> This is something similar to physical port id. Each driver should take
>> care of generating that id.
>
>If the ID is only unique within a driver, then the user space cannot
>rely on using the ID to group switch ports. Multiple drivers might
>come up with the same ID.

Well, as I said, it is the same as for physical port id. But if needed,
there can be added some simple mechanism for the id registration
ensuring their uniqueness.

>
>Even now, multiple rocker instances would have the same ID.

It depends on what hw returns to driver.

^ permalink raw reply

* Re: [patch net-next v2 05/10] rocker: introduce rocker switch driver
From: Thomas Graf @ 2014-11-11 15:32 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: John Fastabend, netdev, davem, nhorman, andy, dborkman, ogerlitz,
	jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo
In-Reply-To: <20141111151928.GF1825@nanopsycho.lan>

On 11/11/14 at 04:19pm, Jiri Pirko wrote:
> Tue, Nov 11, 2014 at 03:29:46PM CET, tgraf@suug.ch wrote:
> >On 11/10/14 at 02:04pm, John Fastabend wrote:
> >> On 11/09/2014 02:51 AM, Jiri Pirko wrote:
> >> >+static int rocker_port_sw_parent_id_get(struct net_device *dev,
> >> >+					struct netdev_phys_item_id *psid)
> >> >+{
> >> >+	struct rocker_port *rocker_port = netdev_priv(dev);
> >> >+	struct rocker *rocker = rocker_port->rocker;
> >> >+
> >> 
> >> hmm looks like you read this out of a magic switch register :) but
> >> my switch doesn't have this magic reg. I suposse the switch MAC address
> >> should work.
> >
> >This needs more work afterwards. Either we define that the switch ID
> >is only unique in combination with the parent ifindex or we need to
> >introduce a notation of uniquness into the switch ID itself.
> 
> This is something similar to physical port id. Each driver should take
> care of generating that id.

If the ID is only unique within a driver, then the user space cannot
rely on using the ID to group switch ports. Multiple drivers might
come up with the same ID.

Even now, multiple rocker instances would have the same ID.

^ permalink raw reply

* Re: [patch net-next v2 05/10] rocker: introduce rocker switch driver
From: Jiri Pirko @ 2014-11-11 15:28 UTC (permalink / raw)
  To: John Fastabend
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl
In-Reply-To: <5461366A.9050900@gmail.com>

Mon, Nov 10, 2014 at 11:04:26PM CET, john.fastabend@gmail.com wrote:
>On 11/09/2014 02:51 AM, Jiri Pirko wrote:
>>This patch introduces the first driver to benefit from the switchdev
>>infrastructure and to implement newly introduced switch ndos. This is a
>>driver for emulated switch chip implemented in qemu:
>>https://github.com/sfeldma/qemu-rocker/
>>
>>This patch is a result of joint work with Scott Feldman.
>>
>>Signed-off-by: Scott Feldman <sfeldma@gmail.com>
>>Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>---
>>  MAINTAINERS                          |    7 +
>>  drivers/net/ethernet/Kconfig         |    1 +
>>  drivers/net/ethernet/Makefile        |    1 +
>>  drivers/net/ethernet/rocker/Kconfig  |   27 +
>>  drivers/net/ethernet/rocker/Makefile |    5 +
>>  drivers/net/ethernet/rocker/rocker.c | 2060 ++++++++++++++++++++++++++++++++++
>>  drivers/net/ethernet/rocker/rocker.h |  427 +++++++
>>  7 files changed, 2528 insertions(+)
>>  create mode 100644 drivers/net/ethernet/rocker/Kconfig
>>  create mode 100644 drivers/net/ethernet/rocker/Makefile
>>  create mode 100644 drivers/net/ethernet/rocker/rocker.c
>>  create mode 100644 drivers/net/ethernet/rocker/rocker.h
>>
>
>[...]
>
>>+
>>+static netdev_tx_t rocker_port_xmit(struct sk_buff *skb, struct net_device *dev)
>>+{
>>+	struct rocker_port *rocker_port = netdev_priv(dev);
>>+	struct rocker *rocker = rocker_port->rocker;
>>+	struct rocker_desc_info *desc_info;
>>+	struct rocker_tlv *frags;
>>+	int i;
>>+	int err;
>>+
>>+	desc_info = rocker_desc_head_get(&rocker_port->tx_ring);
>>+	if (unlikely(!desc_info)) {
>>+		if (net_ratelimit())
>
>Could you have a netif_stop_queue() here as well same as below? Not
>that optimizing the xmit routine is the interesting part of this patch.
>But I guess this is just some strange error path because I see you
>check this case below.

This code should never be reached.
If the ring is full, the queue is previously stopped by:

        desc_info = rocker_desc_head_get(&rocker_port->tx_ring);
        if (!desc_info)
                netif_stop_queue(dev);





>
>>+			netdev_err(dev, "tx ring full when queue awake\n");
>>+		return NETDEV_TX_BUSY;
>>+	}
>>+
>>+	rocker_desc_cookie_ptr_set(desc_info, skb);
>>+
>>+	frags = rocker_tlv_nest_start(desc_info, ROCKER_TLV_TX_FRAGS);
>>+	if (!frags)
>>+		goto out;
>>+	err = rocker_tx_desc_frag_map_put(rocker_port, desc_info,
>>+					  skb->data, skb_headlen(skb));
>>+	if (err)
>>+		goto nest_cancel;
>>+	if (skb_shinfo(skb)->nr_frags > ROCKER_TX_FRAGS_MAX)
>>+		goto nest_cancel;
>>+
>>+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
>>+		const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
>>+
>>+		err = rocker_tx_desc_frag_map_put(rocker_port, desc_info,
>>+						  skb_frag_address(frag),
>>+						  skb_frag_size(frag));
>>+		if (err)
>>+			goto unmap_frags;
>>+	}
>>+	rocker_tlv_nest_end(desc_info, frags);
>>+
>>+	rocker_desc_gen_clear(desc_info);
>>+	rocker_desc_head_set(rocker, &rocker_port->tx_ring, desc_info);
>>+
>>+	desc_info = rocker_desc_head_get(&rocker_port->tx_ring);
>>+	if (!desc_info)
>>+		netif_stop_queue(dev);
>
>I'm not entirely sure I followed the TLV usage here but OK. If it
>works...
>
>>+
>>+	return NETDEV_TX_OK;
>>+
>>+unmap_frags:
>>+	rocker_tx_desc_frags_unmap(rocker_port, desc_info);
>>+nest_cancel:
>>+	rocker_tlv_nest_cancel(desc_info, frags);
>>+out:
>>+	dev_kfree_skb(skb);
>>+	return NETDEV_TX_OK;
>>+}
>>+
>>+static int rocker_port_set_mac_address(struct net_device *dev, void *p)
>>+{
>>+	struct sockaddr *addr = p;
>>+	struct rocker_port *rocker_port = netdev_priv(dev);
>>+	int err;
>>+
>>+	if (!is_valid_ether_addr(addr->sa_data))
>>+		return -EADDRNOTAVAIL;
>>+
>>+	err = rocker_cmd_set_port_settings_macaddr(rocker_port, addr->sa_data);
>>+	if (err)
>>+		return err;
>>+	memcpy(dev->dev_addr, addr->sa_data, dev->addr_len);
>>+	return 0;
>>+}
>>+
>>+static int rocker_port_sw_parent_id_get(struct net_device *dev,
>>+					struct netdev_phys_item_id *psid)
>>+{
>>+	struct rocker_port *rocker_port = netdev_priv(dev);
>>+	struct rocker *rocker = rocker_port->rocker;
>>+
>
>hmm looks like you read this out of a magic switch register :) but
>my switch doesn't have this magic reg. I suposse the switch MAC address
>should work.
>
>>+	psid->id_len = sizeof(rocker->hw.id);
>>+	memcpy(&psid->id, &rocker->hw.id, psid->id_len);
>>+	return 0;
>>+}
>>+
>>+static const struct net_device_ops rocker_port_netdev_ops = {
>>+	.ndo_open			= rocker_port_open,
>>+	.ndo_stop			= rocker_port_stop,
>>+	.ndo_start_xmit			= rocker_port_xmit,
>>+	.ndo_set_mac_address		= rocker_port_set_mac_address,
>>+	.ndo_sw_parent_id_get		= rocker_port_sw_parent_id_get,
>>+};
>>+
>>+/********************
>>+ * ethtool interface
>>+ ********************/
>>+
>>+static int rocker_port_get_settings(struct net_device *dev,
>>+				    struct ethtool_cmd *ecmd)
>>+{
>>+	struct rocker_port *rocker_port = netdev_priv(dev);
>>+
>>+	return rocker_cmd_get_port_settings_ethtool(rocker_port, ecmd);
>>+}
>>+
>>+static int rocker_port_set_settings(struct net_device *dev,
>>+				    struct ethtool_cmd *ecmd)
>>+{
>>+	struct rocker_port *rocker_port = netdev_priv(dev);
>>+
>>+	return rocker_cmd_set_port_settings_ethtool(rocker_port, ecmd);
>>+}
>>+
>>+static void rocker_port_get_drvinfo(struct net_device *dev,
>>+				    struct ethtool_drvinfo *drvinfo)
>>+{
>>+	strlcpy(drvinfo->driver, rocker_driver_name, sizeof(drvinfo->driver));
>>+	strlcpy(drvinfo->version, UTS_RELEASE, sizeof(drvinfo->version));
>>+}
>>+
>>+static const struct ethtool_ops rocker_port_ethtool_ops = {
>>+	.get_settings		= rocker_port_get_settings,
>>+	.set_settings		= rocker_port_set_settings,
>>+	.get_drvinfo		= rocker_port_get_drvinfo,
>>+	.get_link		= ethtool_op_get_link,
>>+};
>>+
>
>[...]
>
>Looks reasonable to me, although I mostly scanned it and looked
>over the interface parts. My comments are just observations no
>need to change anything for them.
>
>Reviewed-by: John Fastabend <john.r.fastabend@intel.com>
>
>
>-- 
>John Fastabend         Intel Corporation

^ permalink raw reply

* Re: [patch net-next v2 05/10] rocker: introduce rocker switch driver
From: Jiri Pirko @ 2014-11-11 15:19 UTC (permalink / raw)
  To: Thomas Graf
  Cc: John Fastabend, netdev, davem, nhorman, andy, dborkman, ogerlitz,
	jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo
In-Reply-To: <20141111142946.GG19157@casper.infradead.org>

Tue, Nov 11, 2014 at 03:29:46PM CET, tgraf@suug.ch wrote:
>On 11/10/14 at 02:04pm, John Fastabend wrote:
>> On 11/09/2014 02:51 AM, Jiri Pirko wrote:
>> >+static int rocker_port_sw_parent_id_get(struct net_device *dev,
>> >+					struct netdev_phys_item_id *psid)
>> >+{
>> >+	struct rocker_port *rocker_port = netdev_priv(dev);
>> >+	struct rocker *rocker = rocker_port->rocker;
>> >+
>> 
>> hmm looks like you read this out of a magic switch register :) but
>> my switch doesn't have this magic reg. I suposse the switch MAC address
>> should work.
>
>This needs more work afterwards. Either we define that the switch ID
>is only unique in combination with the parent ifindex or we need to
>introduce a notation of uniquness into the switch ID itself.

This is something similar to physical port id. Each driver should take
care of generating that id.

>
>Is the goal to expose a hardware ID here to allow identification of
>the hardware chip?
>
>MAC is tempting but I'm pretty sure that we'll have pure L3 devices
>being handled by this API at some point.

^ permalink raw reply

* Re: [patch net-next v2 02/10] net: introduce generic switch devices support
From: Jiri Pirko @ 2014-11-11 15:11 UTC (permalink / raw)
  To: John Fastabend
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl
In-Reply-To: <5461354A.3020906@gmail.com>

Mon, Nov 10, 2014 at 10:59:38PM CET, john.fastabend@gmail.com wrote:
>On 11/09/2014 02:51 AM, Jiri Pirko wrote:
>>The goal of this is to provide a possibility to support various switch
>>chips. Drivers should implement relevant ndos to do so. Now there is
>>only one ndo defined:
>>- for getting physical switch id is in place.
>>
>>Note that user can use random port netdevice to access the switch.
>>
>>Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>---
>>  Documentation/networking/switchdev.txt | 59 ++++++++++++++++++++++++++++++++++
>>  MAINTAINERS                            |  7 ++++
>>  include/linux/netdevice.h              | 10 ++++++
>>  include/net/switchdev.h                | 30 +++++++++++++++++
>>  net/Kconfig                            |  1 +
>>  net/Makefile                           |  3 ++
>>  net/switchdev/Kconfig                  | 13 ++++++++
>>  net/switchdev/Makefile                 |  5 +++
>>  net/switchdev/switchdev.c              | 33 +++++++++++++++++++
>>  9 files changed, 161 insertions(+)
>>  create mode 100644 Documentation/networking/switchdev.txt
>>  create mode 100644 include/net/switchdev.h
>>  create mode 100644 net/switchdev/Kconfig
>>  create mode 100644 net/switchdev/Makefile
>>  create mode 100644 net/switchdev/switchdev.c
>>
>>diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
>>new file mode 100644
>>index 0000000..98be76c
>>--- /dev/null
>>+++ b/Documentation/networking/switchdev.txt
>>@@ -0,0 +1,59 @@
>>+Switch (and switch-ish) device drivers HOWTO
>>+===========================
>>+
>>+Please note that the word "switch" is here used in very generic meaning.
>>+This include devices supporting L2/L3 but also various flow offloading chips,
>>+including switches embedded into SR-IOV NICs.
>>+
>>+Lets describe a topology a bit. Imagine the following example:
>>+
>>+       +----------------------------+    +---------------+
>>+       |     SOME switch chip       |    |      CPU      |
>>+       +----------------------------+    +---------------+
>>+       port1 port2 port3 port4 MNGMNT    |     PCI-E     |
>>+         |     |     |     |     |       +---------------+
>>+        PHY   PHY    |     |     |         |  NIC0 NIC1
>>+                     |     |     |         |   |    |
>>+                     |     |     +- PCI-E -+   |    |
>>+                     |     +------- MII -------+    |
>>+                     +------------- MII ------------+
>>+
>>+In this example, there are two independent lines between the switch silicon
>>+and CPU. NIC0 and NIC1 drivers are not aware of a switch presence. They are
>>+separate from the switch driver. SOME switch chip is by managed by a driver
>>+via PCI-E device MNGMNT. Note that MNGMNT device, NIC0 and NIC1 may be
>>+connected to some other type of bus.
>>+
>>+Now, for the previous example show the representation in kernel:
>>+
>>+       +----------------------------+    +---------------+
>>+       |     SOME switch chip       |    |      CPU      |
>>+       +----------------------------+    +---------------+
>>+       sw0p0 sw0p1 sw0p2 sw0p3 MNGMNT    |     PCI-E     |
>>+         |     |     |     |     |       +---------------+
>>+        PHY   PHY    |     |     |         |  eth0 eth1
>>+                     |     |     |         |   |    |
>>+                     |     |     +- PCI-E -+   |    |
>>+                     |     +------- MII -------+    |
>>+                     +------------- MII ------------+
>>+
>>+Lets call the example switch driver for SOME switch chip "SOMEswitch". This
>>+driver takes care of PCI-E device MNGMNT. There is a netdevice instance sw0pX
>>+created for each port of a switch. These netdevices are instances
>>+of "SOMEswitch" driver. sw0pX netdevices serve as a "representation"
>>+of the switch chip. eth0 and eth1 are instances of some other existing driver.
>>+
>>+The only difference of the switch-port netdevice from the ordinary netdevice
>>+is that is implements couple more NDOs:
>>+
>>+	ndo_sw_parent_get_id - This returns the same ID for two port netdevices
>>+			       of the same physical switch chip. This is
>>+			       mandatory to be implemented by all switch drivers
>>+			       and serves the caller for recognition of a port
>>+			       netdevice.
>
>What is the connection between ndo_sw_parent_get_id and
>ndo_get_phys_port_id(). I'm having a bit of trouble teasing
>this out.
>
>For example here is my ascii art for a SR-IOV NIC,
>
>       eth0     eth1     eth2
>        |         |        |
>        |         |        |
>        PF        VF       VF
>   +----+---------+--------+----+
>   |       embedded bridge      |
>   +-------------+--------------+
>                 |
>                port
>
>that can do switching between the various uplinks and downlinks.
>In IEEE 802.1Q language the embedded bridge acts like an edge
>relay. At least that seems to be the current state of the art
>for SR-IOV. Edge relay just means it has a single uplink port
>to the network and multiple downlinks and also isn't required
>to do learning and run loop detection protocols STP, et. al.
>
>Also there are multi-function devices that look the same except
>replace the VFs with PFs. It seems to be a common mode for NICs
>that do the iSCSI offloads with storage functions.
>
>When something is an embedded bridge vs a SOME switch chip is
>not entirely clear.
>
>My understanding is use ndo_sw_parent_get_id() when you have
>multiple physical ports all connected to a single switch object.
>When you have a single port connected to multiple PCIE functions
>or queues representing a netdev (e.g. macvlan offload) use the
>ndo_get_phys_port_id(). Just want to be sure we are on the
>same page here.

Nod. You described that right.


>
>Otherwise patch looks good. I think we can clear the above up
>with an addition to the documentation. Could go in after the
>initial set and be OK with me.
>
>IMO this patch is needed otherwise user space is at a complete
>loss on trying to figure out how netdevs map to switch silicon.
>You could have reused ndo_get_phys_port_id() perhaps but then
>I think user space may get confused by SR-IOV/VMDQ/etc ports
>attached to a switch silicon. For .02$ having a new distinct
>identifier is cleaner.

It most definitelly is. Therefore I went that way.


>
>
>>+	ndo_sw_parent_* - Functions that serve for a manipulation of the switch
>>+			  chip itself (it can be though of as a "parent" of the
>>+			  port, therefore the name). They are not port-specific.
>>+			  Caller might use arbitrary port netdevice of the same
>>+			  switch and it will make no difference.
>>+	ndo_sw_port_* - Functions that serve for a port-specific manipulation.
>
>[...]
>
>Thanks,
>John
>
>
>-- 
>John Fastabend         Intel Corporation

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox