Netdev List
* Re: [PATCH] net: mvneta: fix race condition in mvneta_tx()
From: Eric Dumazet @ 2014-12-02 12:37 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: netdev, Maggie Mae Roxas, Thomas Petazzoni, Gregory CLEMENT,
	Ezequiel Garcia
In-Reply-To: <1417523459.5303.39.camel@edumazet-glaptop2.roam.corp.google.com>

On Tue, 2014-12-02 at 04:30 -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> mvneta_tx() dereferences skb to get skb->len too late,
> as hardware might have completed the transmit and TX completion
> could have freed the skb from another cpu.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>



For completeness, this bug was added in linux-3.14 and seems a stable
candidate.

Fixes: 71f6d1b31fb1 ("net: mvneta: replace Tx timer with a real interrupt")
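
For readers following the thread, the ordering problem can be sketched in user space as below. This is only a toy model under stated assumptions: fake_skb, fake_hw_submit() and fake_tx() are hypothetical stand-ins, not mvneta code, and the "hardware" frees the buffer immediately to mimic a TX completion that wins the race.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff; only the length matters here. */
struct fake_skb {
	unsigned int len;
};

struct fake_stats {
	unsigned long tx_packets;
	unsigned long tx_bytes;
};

/* Models the worst case: the buffer is completed and freed as soon as it
 * has been handed to the hardware.
 */
static void fake_hw_submit(struct fake_skb *skb)
{
	free(skb);
}

/* Read skb->len before the handoff, never after it. */
static void fake_tx(struct fake_skb *skb, struct fake_stats *stats)
{
	unsigned int len = skb->len;	/* skb is still owned by us here */

	fake_hw_submit(skb);		/* skb must not be touched from now on */

	stats->tx_packets++;
	stats->tx_bytes += len;
}
```

The only point is the ordering: the length is copied to a local while the transmit path still owns the buffer, so the statistics update never dereferences it afterwards.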

* Re: [PATCH] net: mvneta: fix Tx interrupt delay
From: Willy Tarreau @ 2014-12-02 12:39 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Maggie Mae Roxas, Thomas Petazzoni, Gregory CLEMENT,
	Ezequiel Garcia
In-Reply-To: <1417522688.5303.35.camel@edumazet-glaptop2.roam.corp.google.com>

Hi Eric,

On Tue, Dec 02, 2014 at 04:18:08AM -0800, Eric Dumazet wrote:
> > diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> > index 4762994..35bfba7 100644
> > --- a/drivers/net/ethernet/marvell/mvneta.c
> > +++ b/drivers/net/ethernet/marvell/mvneta.c
> > @@ -214,7 +214,7 @@
> >  /* Various constants */
> >  
> >  /* Coalescing */
> > -#define MVNETA_TXDONE_COAL_PKTS		16
> > +#define MVNETA_TXDONE_COAL_PKTS		1
> >  #define MVNETA_RX_COAL_PKTS		32
> >  #define MVNETA_RX_COAL_USEC		100
> >  
> 
> 
> I am surprised TCP even worked correctly with this problem.

There are several explanations for this:
  - we used to flush Tx queues upon Rx interrupt, which used to hide
    the problem.

  - we tend to have large socket buffers, which cover the issue. I've
    never had any issue at high data rates. After all, only a 23kB send
    buffer is needed to get it to work.

  - most often you have multiple parallel streams which hide the issue
    even more.
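
The 23kB figure is presumably the old coalescing threshold multiplied by the TCP payload of a full-sized frame. A quick check, assuming a 1448-byte payload per segment (1500-byte MTU with TCP timestamps; this value is an assumption, not stated in the thread):

```c
/* Value used by the driver before this fix. */
#define MVNETA_TXDONE_COAL_PKTS_OLD	16

/* Assumed TCP payload per segment: 1500 MTU - 20 IP - 20 TCP - 12 timestamps. */
#define ASSUMED_MSS			1448

/* Payload bytes that must be queued before the coalescing threshold is met. */
static unsigned int min_sndbuf_bytes(unsigned int coal_pkts, unsigned int mss)
{
	return coal_pkts * mss;
}
```

With these numbers, min_sndbuf_bytes(MVNETA_TXDONE_COAL_PKTS_OLD, ASSUMED_MSS) gives 23168 bytes, i.e. about 23kB.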

> I highly suggest BQL for this driver, now this issue is fixed.

How does that work ?

> I wonder if this high setting was not because of race conditions in the
> driver :
>
> mvneta_tx() seems to access skb->len too late, TX completion might have
> already freed skb :
> 
>                 u64_stats_update_begin(&stats->syncp);
>                 stats->tx_packets++;
>                 stats->tx_bytes  += skb->len;         // potential use after free
>                 u64_stats_update_end(&stats->syncp);

Good catch! But no, this is unrelated since it does not fix the race either.
Initially this driver used to implement a timer to flush the Tx queue after
10ms, resulting in abysmal Tx-only performance as you can easily imagine. In
my opinion there's a design flaw in the chip, and they did everything they
could to work around it (let's not say "hide it"), but that was not enough.
When I "fixed" the performance issue by enabling the Tx interrupt, I kept
the value 16 which gave pretty good results to me, without realizing that
there was this corner case :-/

> Thanks Willy !

Thanks for your review :-)

Willy

* Re: [PATCH] net: mvneta: fix race condition in mvneta_tx()
From: Willy Tarreau @ 2014-12-02 12:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Maggie Mae Roxas, Thomas Petazzoni, Gregory CLEMENT,
	Ezequiel Garcia
In-Reply-To: <1417523835.5303.41.camel@edumazet-glaptop2.roam.corp.google.com>

On Tue, Dec 02, 2014 at 04:37:15AM -0800, Eric Dumazet wrote:
> On Tue, 2014-12-02 at 04:30 -0800, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > mvneta_tx() dereferences skb to get skb->len too late,
> > as hardware might have completed the transmit and TX completion
> > could have freed the skb from another cpu.
> > 
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> 
> 
> 
> For completeness, this bug was added in linux-3.14 and seems a stable
> candidate.
> 
> Fixes: 71f6d1b31fb1 ("net: mvneta: replace Tx timer with a real interrupt")

Absolutely, we backported this one to 3.10.

Thanks Eric!
Willy

* Re: [PATCH] net: mvneta: fix Tx interrupt delay
From: Eric Dumazet @ 2014-12-02 13:04 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: netdev, Maggie Mae Roxas, Thomas Petazzoni, Gregory CLEMENT,
	Ezequiel Garcia
In-Reply-To: <20141202123951.GA16347@1wt.eu>

On Tue, 2014-12-02 at 13:39 +0100, Willy Tarreau wrote:
> Hi Eric,
> 
> On Tue, Dec 02, 2014 at 04:18:08AM -0800, Eric Dumazet wrote:
> 
> > I highly suggest BQL for this driver, now this issue is fixed.
> 
> How does that work ?

This works very well ;)

It's super easy to implement; take a look at commits
bdbc063129e811264cd6c311d8c2d9b95de01231 or
7070ce0a6419a118842298bc967061ad6cea40db
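
For readers unfamiliar with BQL (Byte Queue Limits): the driver reports the bytes it hands to the hardware and the bytes the hardware has completed, and the stack stops the queue while too many bytes are in flight, so the backlog stays in the qdisc rather than in the NIC ring. In the kernel this is done with netdev_tx_sent_queue() in the xmit path and netdev_tx_completed_queue() in the TX completion path. The user-space sketch below models only the accounting with a fixed limit; the real code adjusts the limit dynamically, and the toy_* names are hypothetical.

```c
#include <stdbool.h>

/* Toy model of byte-based queue limiting; not the kernel's dql code. */
struct toy_bql {
	unsigned int limit;	/* max bytes allowed in flight (fixed here) */
	unsigned int inflight;	/* bytes handed to hardware, not yet completed */
	bool stopped;		/* true when the stack must stop feeding the queue */
};

/* Called where a driver would call netdev_tx_sent_queue(). */
static void toy_sent(struct toy_bql *q, unsigned int bytes)
{
	q->inflight += bytes;
	if (q->inflight > q->limit)
		q->stopped = true;
}

/* Called where a driver would call netdev_tx_completed_queue(). */
static void toy_completed(struct toy_bql *q, unsigned int bytes)
{
	q->inflight -= bytes;
	if (q->inflight <= q->limit)
		q->stopped = false;
}
```

Note that this only works if TX completions arrive promptly, which is why it makes sense to look at it after the interrupt delay is fixed.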

* Re: Is this 32-bit NCM?
From: Enrico Mioso @ 2014-12-02 13:11 UTC (permalink / raw)
  To: Bjørn Mork
  Cc: Kevin Zhu, Eli Britstein, Alex Strizhevsky, Midge Shaojun Tan,
	youtux@gmail.com, linux-usb@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <87fvcyqoup.fsf@nemi.mork.no>

... but out of curiosity: do the NCM specs allow changing the order of things in
the packet or not?
This is not to start philosophical flames or anything, but to understand
better how things work. And if they do: how arbitrarily?


On Tue, 2 Dec 2014, Bjørn Mork wrote:

==Date: Tue, 2 Dec 2014 12:21:18
==From: Bjørn Mork <bjorn@mork.no>
==To: Enrico Mioso <mrkiko.rs@gmail.com>
==Cc: Kevin Zhu <Mingying.Zhu@audiocodes.com>,
==    Eli Britstein <Eli.Britstein@audiocodes.com>,
==    Alex Strizhevsky <alexxst@gmail.com>,
==    Midge Shaojun Tan <ShaojunMidge.Tan@audiocodes.com>,
==    "youtux@gmail.com" <youtux@gmail.com>,
==    "linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>,
==    "netdev@vger.kernel.org" <netdev@vger.kernel.org>
==Subject: Re: Is this 32-bit NCM?
==
==Enrico Mioso <mrkiko.rs@gmail.com> writes:
==
==> Kevin - it works! The key seems to be in the tx_fixup function; there is
==> effectively special handling for frames.
==> Please ... help me backport those changes to the standard Linux driver - it
==> will be a gain for us all, and in general you'll be more likely to get
==> maintenance than you would with the driver from Huawei.
==
==Very interesting.  The NCM code in the huawei driver has a different
==origin, so it is quite different and not too easy to merge into the
==existing Linux cdc_ncm driver.
==
==But this does pinpoint differences we should explore.  One is the
==placement of the NDP: The Huawei driver puts it at the end.  Another,
==which is much easier to test out quickly, is the sequence numbering: The
==Huawei driver doesn't use it.
==
==So I wonder if this makes any difference:
==
==diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
==index 80a844e0ae03..37f11770acb6 100644
==--- a/drivers/net/usb/cdc_ncm.c
==+++ b/drivers/net/usb/cdc_ncm.c
==@@ -1049,7 +1049,7 @@ cdc_ncm_fill_tx_frame(struct usbnet *dev, struct sk_buff *skb, __le32 sign)
== 		nth16 = (struct usb_cdc_ncm_nth16 *)memset(skb_put(skb_out, sizeof(struct usb_cdc_ncm_nth16)), 0, sizeof(struct usb_cdc_ncm_nth16));
== 		nth16->dwSignature = cpu_to_le32(USB_CDC_NCM_NTH16_SIGN);
== 		nth16->wHeaderLength = cpu_to_le16(sizeof(struct usb_cdc_ncm_nth16));
==-		nth16->wSequence = cpu_to_le16(ctx->tx_seq++);
==+//		nth16->wSequence = cpu_to_le16(ctx->tx_seq++);
== 
== 		/* count total number of frames in this NTB */
== 		ctx->tx_curr_frame_num = 0;
==
==
==
==Bjørn
==

* Re: [PATCH] net: mvneta: fix Tx interrupt delay
From: Willy Tarreau @ 2014-12-02 13:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Maggie Mae Roxas, Thomas Petazzoni, Gregory CLEMENT,
	Ezequiel Garcia
In-Reply-To: <1417525458.5303.48.camel@edumazet-glaptop2.roam.corp.google.com>

On Tue, Dec 02, 2014 at 05:04:18AM -0800, Eric Dumazet wrote:
> On Tue, 2014-12-02 at 13:39 +0100, Willy Tarreau wrote:
> > Hi Eric,
> > 
> > On Tue, Dec 02, 2014 at 04:18:08AM -0800, Eric Dumazet wrote:
> > 
> > > I highly suggest BQL for this driver, now this issue is fixed.
> > 
> > How does that work ?
> 
> This works very well ;)
> 
> It's super easy to implement; take a look at commits
> bdbc063129e811264cd6c311d8c2d9b95de01231 or
> 7070ce0a6419a118842298bc967061ad6cea40db

Thanks but I'm not sure I entirely understand the concept. Is it to
notify the sender that the packets were already queued for the NIC ?
And if so, how does that improve the situation ? I'm sorry if this
sounds like a stupid question, it's just that the concept by itself
is not clear to me.

Willy

* Re: Is this 32-bit NCM?
From: Bjørn Mork @ 2014-12-02 13:37 UTC (permalink / raw)
  To: Enrico Mioso
  Cc: Kevin Zhu, Eli Britstein, Alex Strizhevsky, Midge Shaojun Tan,
	youtux@gmail.com, linux-usb@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <alpine.LNX.2.03.1412021410270.29633-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Enrico Mioso <mrkiko.rs@gmail.com> writes:

> ... but out of curiosity: do the NCM specs allow changing the order of things in
> the packet or not?
> This is not to start philosophical flames or anything, but to understand
> better how things work. And if they do: how arbitrarily?

Only the NTB header has a fixed location. The rest can be anywhere and
in any order. Quoting from section 3 Data Transport:

  "Within any given NTB, the NTH always must be first; but the other
   items may occur in arbitrary order."
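
In practice this means a receiver has to locate the NDP through wNdpIndex in the NTH rather than assume it directly follows the header. A minimal user-space sketch for the 16-bit format, assuming a little-endian NTB buffer; the helper names are hypothetical and only basic bounds checks are done:

```c
#include <stddef.h>
#include <stdint.h>

#define NTH16_SIGN	0x484d434eU	/* "NCMH" */
#define NTH16_LEN	12		/* size of the 16-bit NTB header */

/* Read little-endian fields regardless of host byte order. */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return the offset of the first NDP inside the NTB, or -1 if the header
 * is malformed. The NTH is at offset 0; the NDP may be anywhere after it.
 */
static long ntb16_first_ndp(const uint8_t *ntb, size_t len)
{
	uint16_t ndp_index;

	if (len < NTH16_LEN || get_le32(ntb) != NTH16_SIGN)
		return -1;
	if (get_le16(ntb + 4) != NTH16_LEN)	/* wHeaderLength */
		return -1;

	ndp_index = get_le16(ntb + 10);		/* wNdpIndex */
	if (ndp_index < NTH16_LEN || ndp_index >= len)
		return -1;

	return ndp_index;
}
```

A parser written this way handles an NDP placed right after the NTH as well as one placed at the end of the NTB, as the Huawei driver does.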


Bjørn
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* re: net-caif: add CAIF core protocol stack
From: Dan Carpenter @ 2014-12-02 13:40 UTC (permalink / raw)
  To: sjur.brandeland; +Cc: netdev, Jörn Engel

Hello Sjur Braendeland,

The patch b482cd2053e3: "net-caif: add CAIF core protocol stack" from
Mar 30, 2010, leads to the following static checker warning:

	net/caif/cfctrl.c:440 cfctrl_recv()
	error: potentially using uninitialized 'tmp'.

net/caif/cfpkt_skbuff.c
   124  int cfpkt_extr_head(struct cfpkt *pkt, void *data, u16 len)
   125  {
   126          struct sk_buff *skb = pkt_to_skb(pkt);
   127          u8 *from;
   128          if (unlikely(is_erronous(pkt)))
   129                  return -EPROTO;
   130  
   131          if (unlikely(len > skb->len)) {
   132                  PKT_ERROR(pkt, "read beyond end of packet\n");
   133                  return -EPROTO;
   134          }
   135  
   136          if (unlikely(len > skb_headlen(skb))) {
                             ^^^^^^^^^^^^^^^^^^^^^
Assume we can hit this condition with "len == 1".  I don't know if
that's possible.

   137                  if (unlikely(skb_linearize(skb) != 0)) {
   138                          PKT_ERROR(pkt, "linearize failed\n");
   139                          return -EPROTO;
   140                  }
   141          }
   142          from = skb_pull(skb, len);
   143          from -= len;
   144          if (data)
   145                  memcpy(data, from, len);
   146          return 0;
   147  }
   148  EXPORT_SYMBOL(cfpkt_extr_head);

net/caif/cfctrl.c
   430                          case CFCTRL_SRV_RFM:
   431                                  /* Construct a frame, convert
   432                                   * DatagramConnectionID
   433                                   * to network format long and copy it out...
   434                                   */
   435                                  cfpkt_extr_head(pkt, &tmp32, 4);
   436                                  linkparam.u.rfm.connid =
   437                                    le32_to_cpu(tmp32);
   438                                  cp = (u8 *) linkparam.u.rfm.volume;
   439                                  for (cfpkt_extr_head(pkt, &tmp, 1);
   440                                       cfpkt_more(pkt) && tmp != '\0';
                                                                ^^^^^^^^^^
cfpkt_more() would be true and "tmp" is uninitialized, so it is an infinite
loop.

   441                                       cfpkt_extr_head(pkt, &tmp, 1))
   442                                          *cp++ = tmp;
   443                                  *cp = '\0';
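
One way to avoid reading "tmp" uninitialized is to check the return value of every extraction before testing the byte. The sketch below is a user-space model with hypothetical toy_* helpers, not the CAIF code; it only illustrates the loop shape.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a CAIF packet: a byte buffer with a read cursor. */
struct toy_pkt {
	const unsigned char *data;
	size_t len;
	size_t pos;
};

/* Like cfpkt_extr_head(): copy len bytes out, or fail without touching dst. */
static int toy_extr_head(struct toy_pkt *pkt, void *dst, size_t len)
{
	if (len > pkt->len - pkt->pos)
		return -1;
	memcpy(dst, pkt->data + pkt->pos, len);
	pkt->pos += len;
	return 0;
}

/* Copy a NUL-terminated string out of the packet into out[outsz].
 * Stops on the terminator, on extraction failure, or when out is full,
 * so the extracted byte is only read after a successful extraction.
 */
static int toy_read_string(struct toy_pkt *pkt, char *out, size_t outsz)
{
	unsigned char tmp;
	size_t n = 0;

	if (outsz == 0)
		return -1;

	while (n + 1 < outsz) {
		if (toy_extr_head(pkt, &tmp, 1) < 0) {
			out[n] = '\0';
			return -1;	/* truncated packet */
		}
		if (tmp == '\0')
			break;
		out[n++] = (char)tmp;
	}
	out[n] = '\0';
	return 0;
}
```

Bounding the loop by the destination size also guards the copy into the fixed-size volume field, which the original loop does not appear to do.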

regards,
dan carpenter

* Re: Is this 32-bit NCM?
From: Enrico Mioso @ 2014-12-02 13:53 UTC (permalink / raw)
  To: Bjørn Mork
  Cc: Kevin Zhu, Eli Britstein, Alex Strizhevsky, Midge Shaojun Tan,
	youtux@gmail.com, linux-usb@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <87lhmqp400.fsf-lbf33ChDnrE/G1V5fR+Y7Q@public.gmane.org>

Thank you very much Bjorn.


On Tue, 2 Dec 2014, Bjørn Mork wrote:

==Date: Tue, 2 Dec 2014 14:37:03
==From: Bjørn Mork <bjorn@mork.no>
==To: Enrico Mioso <mrkiko.rs@gmail.com>
==Cc: Kevin Zhu <Mingying.Zhu@audiocodes.com>,
==    Eli Britstein <Eli.Britstein@audiocodes.com>,
==    Alex Strizhevsky <alexxst@gmail.com>,
==    Midge Shaojun Tan <ShaojunMidge.Tan@audiocodes.com>,
==    "youtux@gmail.com" <youtux@gmail.com>,
==    "linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>,
==    "netdev@vger.kernel.org" <netdev@vger.kernel.org>
==Subject: Re: Is this 32-bit NCM?
==
==Enrico Mioso <mrkiko.rs@gmail.com> writes:
==
==> ... but out of curiosity: do the NCM specs allow changing the order of things in
==> the packet or not?
==> This is not to start philosophical flames or anything, but to understand
==> better how things work. And if they do: how arbitrarily?
==
==Only the NTB header has a fixed location. The rest can be anywhere and
==in any order. Quoting from section 3 Data Transport:
==
==  "Within any given NTB, the NTH always must be first; but the other
==   items may occur in arbitrary order."
==
==
==Bjørn
==

* Re: [PATCH net] cxgb4: Add a check for flashing FW using ethtool
From: Sergei Shtylyov @ 2014-12-02 14:18 UTC (permalink / raw)
  To: Hariprasad Shenai, netdev; +Cc: davem, leedom, anish, nirranjan, kumaras
In-Reply-To: <1417522177-14842-1-git-send-email-hariprasad@chelsio.com>

Hello.

On 12/2/2014 3:09 PM, Hariprasad Shenai wrote:

> Don't let T4 firmware flash on a T5 adapter and vice-versa
> using ethtool

> Based on original work by Casey Leedom <leedom@chelsio.com>

> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
> ---
>   drivers/net/ethernet/chelsio/cxgb4/t4_hw.c |   26 ++++++++++++++++++++++++++
>   1 files changed, 26 insertions(+), 0 deletions(-)

> diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
> index 163a2a1..fae205a 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
> +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
> @@ -1131,6 +1131,27 @@ unsigned int t4_flash_cfg_addr(struct adapter *adapter)
>   		return FLASH_CFG_START;
>   }
>
> +/* Return TRUE if the specified firmware matches the adapter.  I.e. T4
> + * firmware for T4 adapters, T5 firmware for T5 adapters, etc.  We go ahead
> + * and emit an error message for mismatched firmware to save our caller the
> + * effort ...
> + */
> +static int t4_fw_matches_chip(const struct adapter *adap,

    s/int/bool/?

> +			      const struct fw_hdr *hdr)
> +{
> +	/* The expression below will return FALSE for any unsupported adapter
> +	 * which will keep us "honest" in the future ...
> +	 */
> +	if ((is_t4(adap->params.chip) && hdr->chip == FW_HDR_CHIP_T4) ||
> +	    (is_t5(adap->params.chip) && hdr->chip == FW_HDR_CHIP_T5))
> +		return 1;

    s/1/true/?

> +
> +	dev_err(adap->pdev_dev,
> +		"FW image (%d) is not suitable for this adapter (%d)\n",
> +		hdr->chip, CHELSIO_CHIP_VERSION(adap->params.chip));
> +	return 0;

    s/0/false/?
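
With those three substitutions applied, the helper would look roughly like the following. This is a stand-alone sketch: the toy_chip enum is a hypothetical replacement for the driver's is_t4()/is_t5() and FW_HDR_CHIP_* checks, and the error message is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the adapter and firmware chip identifiers. */
enum toy_chip { TOY_CHIP_OTHER = 0, TOY_CHIP_T4 = 4, TOY_CHIP_T5 = 5 };

/* Return true only when firmware and adapter are the same supported chip. */
static bool toy_fw_matches_chip(enum toy_chip adapter, enum toy_chip fw)
{
	if ((adapter == TOY_CHIP_T4 && fw == TOY_CHIP_T4) ||
	    (adapter == TOY_CHIP_T5 && fw == TOY_CHIP_T5))
		return true;

	return false;	/* also covers any unsupported adapter */
}
```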

[...]

WBR, Sergei

* [PATCH net-next V2 0/2] ethtool, net/mlx4_en: RSS hash function selection
From: Amir Vadai @ 2014-12-02 14:20 UTC (permalink / raw)
  To: David S. Miller, Ben Hutchings
  Cc: netdev, Or Gerlitz, Eyal Perry, Yevgeny Petrilin, Amir Vadai

Hi,

This patchset by Eyal adds support for setting and getting the RSS hash
function. The currently supported functions are Toeplitz and XOR. The API is
designed to allow adding new hash functions without breaking backward
compatibility.
The userspace patch will be sent after the API is available in the kernel.


The patchset was applied and tested over commit cd4c910 ("netpoll: delete
defconfig references to obsolete NETPOLL_TRAP")

Amir

Changes from V1:
- Patch 1/2 - ethtool: Support for configurable RSS hash function:
   - Accept an hfunc the chip is using instead of only ETH_RSS_HASH_NO_CHANGE.
   - Provide an accurate hfunc value in get_rxfh() call.
   - Changed the behavior of get_rxfh() w.r.t. the validation of the
     arguments. The function will return 0 instead of -EOPNOTSUPP when all
     arguments are NULL.

Changes from V0:
- Patch 1/2 - ethtool: Support for configurable RSS hash function:
 - Add ETH prefix to RSS_HASH_* definitions
 - Moved the strings array to ethtool.c
 - Extend {get,set}_rxfh with additional arg instead of adding new
   ethtool_option and adopt the change into drivers implementations.
 - Moved indir_size and key_size validation into the driver implementations
 - Documented the hfunc field in the ethtool_rxfh struct
- Patch 2/2 - net/mlx4_en: Support for configurable RSS hash function
 - Remove redundant priv->rss_hash_fn_caps
 - Use the == operator instead of & when determining the requested hash function.

Eyal Perry (2):
  ethtool: Support for configurable RSS hash function
  net/mlx4_en: Support for configurable RSS hash function

 drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c       | 11 +++-
 .../net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c    | 20 ++++++-
 drivers/net/ethernet/broadcom/tg3.c                | 20 ++++++-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    | 18 +++++-
 drivers/net/ethernet/emulex/benet/be_ethtool.c     | 12 +++-
 drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c   | 12 +++-
 drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c | 17 +++++-
 drivers/net/ethernet/intel/igb/igb_ethtool.c       | 16 ++++-
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c    | 37 +++++++++++-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c     | 11 ++++
 drivers/net/ethernet/mellanox/mlx4/en_rx.c         | 14 ++++-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h       |  2 +-
 drivers/net/ethernet/sfc/ethtool.c                 | 18 ++++--
 drivers/net/vmxnet3/vmxnet3_ethtool.c              | 15 ++++-
 include/linux/ethtool.h                            | 42 +++++++++----
 include/uapi/linux/ethtool.h                       | 10 +++-
 net/core/ethtool.c                                 | 69 ++++++++++++----------
 17 files changed, 272 insertions(+), 72 deletions(-)

-- 
1.8.3.4

* [PATCH net-next V2 1/2] ethtool: Support for configurable RSS hash function
From: Amir Vadai @ 2014-12-02 14:20 UTC (permalink / raw)
  To: David S. Miller, Ben Hutchings
  Cc: netdev, Or Gerlitz, Eyal Perry, Yevgeny Petrilin, Tom Lendacky,
	Ariel Elior, Prashant Sreedharan, Michael Chan, Hariprasad S,
	Sathya Perla, Subbu Seetharaman, Ajit Khaparde, Jeff Kirsher,
	Jesse Brandeburg, Bruce Allan, Carolyn Wyborny, Don Skidmore,
	Greg Rose, Matthew Vick, John Ronciak, Mitch Williams, Amir Vadai
In-Reply-To: <1417530049-6943-1-git-send-email-amirv@mellanox.com>

From: Eyal Perry <eyalpe@mellanox.com>

This patch extends the set/get_rxfh ethtool-options for getting or
setting the RSS hash function.

It modifies the drivers' implementations of set/get_rxfh accordingly.

This change also delegates the responsibility of checking whether a
modification to a certain RX flow hash parameter is supported to the
driver implementation of set_rxfh.
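
The per-driver check follows one pattern throughout the patch: reject any parameter the hardware cannot change, and treat a request that changes nothing as success. A stand-alone sketch for a driver that supports only the indirection table follows; the TOY_* names and values are assumptions for illustration, not the definitions from the ethtool headers.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones live in the ethtool headers. */
#define TOY_RSS_HASH_NO_CHANGE	0
#define TOY_RSS_HASH_TOP	(1 << 0)

#define TOY_INDIR_SIZE		4

static uint32_t toy_indir_tbl[TOY_INDIR_SIZE];

/* Model of a set_rxfh() callback: key changes and non-Toeplitz hash
 * functions are unsupported; a NULL indir means "leave the table alone".
 */
static int toy_set_rxfh(const uint32_t *indir, const uint8_t *key, uint8_t hfunc)
{
	size_t i;

	if (key ||
	    (hfunc != TOY_RSS_HASH_NO_CHANGE && hfunc != TOY_RSS_HASH_TOP))
		return -EOPNOTSUPP;

	if (!indir)
		return 0;

	for (i = 0; i < TOY_INDIR_SIZE; i++)
		toy_indir_tbl[i] = indir[i];

	return 0;
}
```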

User-kernel API is done through the new hfunc bitmask field in the
ethtool_rxfh struct. A bit set in the hfunc field corresponds to an
index in the new string-set ETH_SS_RSS_HASH_FUNCS.
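
To illustrate the bit-to-string correspondence: bit N of hfunc names entry N of the string set. The sketch below assumes Toeplitz is bit 0 and XOR is bit 1 and uses made-up strings; neither the bit order nor the strings are shown in this part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed contents and order of the string set, for illustration only. */
static const char *const toy_rss_hash_names[] = {
	"toeplitz",	/* bit 0 (assumed) */
	"xor",		/* bit 1 (assumed) */
};

#define TOY_RSS_HASH_FUNCS_COUNT \
	(sizeof(toy_rss_hash_names) / sizeof(toy_rss_hash_names[0]))

/* Return the name for the lowest set bit in hfunc, or NULL if none is known. */
static const char *toy_hfunc_name(uint8_t hfunc)
{
	size_t bit;

	for (bit = 0; bit < TOY_RSS_HASH_FUNCS_COUNT; bit++)
		if (hfunc & (1u << bit))
			return toy_rss_hash_names[bit];

	return NULL;
}
```

New hash functions can then be added by appending a bit and a string, without changing the layout of the struct.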

Got approval from most of the relevant driver maintainers that their
drivers use Toeplitz; for the few that didn't answer, we assumed it is
Toeplitz as well.

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Ariel Elior <ariel.elior@qlogic.com>
Cc: Prashant Sreedharan <prashant@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Sathya Perla <sathya.perla@emulex.com>
Cc: Subbu Seetharaman <subbu.seetharaman@emulex.com>
Cc: Ajit Khaparde <ajit.khaparde@emulex.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Bruce Allan <bruce.w.allan@intel.com>
Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
Cc: Don Skidmore <donald.c.skidmore@intel.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Cc: Matthew Vick <matthew.vick@intel.com>
Cc: John Ronciak <john.ronciak@intel.com>
Cc: Mitch Williams <mitch.a.williams@intel.com>
Cc: Amir Vadai <amirv@mellanox.com>
Cc: Solarflare linux maintainers <linux-net-drivers@solarflare.com>
Cc: Shradha Shah <sshah@solarflare.com>
Cc: Shreyas Bhatewara <sbhatewara@vmware.com>
Cc: "VMware, Inc." <pv-drivers@vmware.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
 drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c       | 11 +++-
 .../net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c    | 20 ++++++-
 drivers/net/ethernet/broadcom/tg3.c                | 20 ++++++-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    | 18 +++++-
 drivers/net/ethernet/emulex/benet/be_ethtool.c     | 12 +++-
 drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c   | 12 +++-
 drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c | 17 +++++-
 drivers/net/ethernet/intel/igb/igb_ethtool.c       | 16 ++++-
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c    | 13 +++-
 drivers/net/ethernet/sfc/ethtool.c                 | 18 ++++--
 drivers/net/vmxnet3/vmxnet3_ethtool.c              | 15 ++++-
 include/linux/ethtool.h                            | 42 +++++++++----
 include/uapi/linux/ethtool.h                       | 10 +++-
 net/core/ethtool.c                                 | 69 ++++++++++++----------
 14 files changed, 223 insertions(+), 70 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c b/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
index 95d4453..ebf4893 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
@@ -511,7 +511,8 @@ static u32 xgbe_get_rxfh_indir_size(struct net_device *netdev)
 	return ARRAY_SIZE(pdata->rss_table);
 }
 
-static int xgbe_get_rxfh(struct net_device *netdev, u32 *indir, u8 *key)
+static int xgbe_get_rxfh(struct net_device *netdev, u32 *indir, u8 *key,
+			 u8 *hfunc)
 {
 	struct xgbe_prv_data *pdata = netdev_priv(netdev);
 	unsigned int i;
@@ -525,16 +526,22 @@ static int xgbe_get_rxfh(struct net_device *netdev, u32 *indir, u8 *key)
 	if (key)
 		memcpy(key, pdata->rss_key, sizeof(pdata->rss_key));
 
+	if (hfunc)
+		*hfunc = ETH_RSS_HASH_TOP;
+
 	return 0;
 }
 
 static int xgbe_set_rxfh(struct net_device *netdev, const u32 *indir,
-			 const u8 *key)
+			 const u8 *key, const u8 hfunc)
 {
 	struct xgbe_prv_data *pdata = netdev_priv(netdev);
 	struct xgbe_hw_if *hw_if = &pdata->hw_if;
 	unsigned int ret;
 
+	if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP)
+		return -EOPNOTSUPP;
+
 	if (indir) {
 		ret = hw_if->set_rss_lookup_table(pdata, indir);
 		if (ret)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
index 1edc931..ffe4e00 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
@@ -3358,12 +3358,18 @@ static u32 bnx2x_get_rxfh_indir_size(struct net_device *dev)
 	return T_ETH_INDIRECTION_TABLE_SIZE;
 }
 
-static int bnx2x_get_rxfh(struct net_device *dev, u32 *indir, u8 *key)
+static int bnx2x_get_rxfh(struct net_device *dev, u32 *indir, u8 *key,
+			  u8 *hfunc)
 {
 	struct bnx2x *bp = netdev_priv(dev);
 	u8 ind_table[T_ETH_INDIRECTION_TABLE_SIZE] = {0};
 	size_t i;
 
+	if (hfunc)
+		*hfunc = ETH_RSS_HASH_TOP;
+	if (!indir)
+		return 0;
+
 	/* Get the current configuration of the RSS indirection table */
 	bnx2x_get_rss_ind_table(&bp->rss_conf_obj, ind_table);
 
@@ -3383,11 +3389,21 @@ static int bnx2x_get_rxfh(struct net_device *dev, u32 *indir, u8 *key)
 }
 
 static int bnx2x_set_rxfh(struct net_device *dev, const u32 *indir,
-			  const u8 *key)
+			  const u8 *key, const u8 hfunc)
 {
 	struct bnx2x *bp = netdev_priv(dev);
 	size_t i;
 
+	/* We require at least one supported parameter to be changed and no
+	 * change in any of the unsupported parameters
+	 */
+	if (key ||
+	    (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP))
+		return -EOPNOTSUPP;
+
+	if (!indir)
+		return 0;
+
 	for (i = 0; i < T_ETH_INDIRECTION_TABLE_SIZE; i++) {
 		/*
 		 * The same as in bnx2x_get_rxfh: we can't use a memcpy()
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 43fd1b7..bb48a61 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -12561,22 +12561,38 @@ static u32 tg3_get_rxfh_indir_size(struct net_device *dev)
 	return size;
 }
 
-static int tg3_get_rxfh(struct net_device *dev, u32 *indir, u8 *key)
+static int tg3_get_rxfh(struct net_device *dev, u32 *indir, u8 *key, u8 *hfunc)
 {
 	struct tg3 *tp = netdev_priv(dev);
 	int i;
 
+	if (hfunc)
+		*hfunc = ETH_RSS_HASH_TOP;
+	if (!indir)
+		return 0;
+
 	for (i = 0; i < TG3_RSS_INDIR_TBL_SIZE; i++)
 		indir[i] = tp->rss_ind_tbl[i];
 
 	return 0;
 }
 
-static int tg3_set_rxfh(struct net_device *dev, const u32 *indir, const u8 *key)
+static int tg3_set_rxfh(struct net_device *dev, const u32 *indir, const u8 *key,
+			const u8 hfunc)
 {
 	struct tg3 *tp = netdev_priv(dev);
 	size_t i;
 
+	/* We require at least one supported parameter to be changed and no
+	 * change in any of the unsupported parameters
+	 */
+	if (key ||
+	    (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP))
+		return -EOPNOTSUPP;
+
+	if (!indir)
+		return 0;
+
 	for (i = 0; i < TG3_RSS_INDIR_TBL_SIZE; i++)
 		tp->rss_ind_tbl[i] = indir[i];
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 3aea82b..e7342bc 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -2923,21 +2923,35 @@ static u32 get_rss_table_size(struct net_device *dev)
 	return pi->rss_size;
 }
 
-static int get_rss_table(struct net_device *dev, u32 *p, u8 *key)
+static int get_rss_table(struct net_device *dev, u32 *p, u8 *key, u8 *hfunc)
 {
 	const struct port_info *pi = netdev_priv(dev);
 	unsigned int n = pi->rss_size;
 
+	if (hfunc)
+		*hfunc = ETH_RSS_HASH_TOP;
+	if (!p)
+		return 0;
 	while (n--)
 		p[n] = pi->rss[n];
 	return 0;
 }
 
-static int set_rss_table(struct net_device *dev, const u32 *p, const u8 *key)
+static int set_rss_table(struct net_device *dev, const u32 *p, const u8 *key,
+			 const u8 hfunc)
 {
 	unsigned int i;
 	struct port_info *pi = netdev_priv(dev);
 
+	/* We require at least one supported parameter to be changed and no
+	 * change in any of the unsupported parameters
+	 */
+	if (key ||
+	    (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP))
+		return -EOPNOTSUPP;
+	if (!p)
+		return 0;
+
 	for (i = 0; i < pi->rss_size; i++)
 		pi->rss[i] = p[i];
 	if (pi->adapter->flags & FULL_INIT_DONE)
diff --git a/drivers/net/ethernet/emulex/benet/be_ethtool.c b/drivers/net/ethernet/emulex/benet/be_ethtool.c
index e42a791..73a500c 100644
--- a/drivers/net/ethernet/emulex/benet/be_ethtool.c
+++ b/drivers/net/ethernet/emulex/benet/be_ethtool.c
@@ -1171,7 +1171,8 @@ static u32 be_get_rxfh_key_size(struct net_device *netdev)
 	return RSS_HASH_KEY_LEN;
 }
 
-static int be_get_rxfh(struct net_device *netdev, u32 *indir, u8 *hkey)
+static int be_get_rxfh(struct net_device *netdev, u32 *indir, u8 *hkey,
+		       u8 *hfunc)
 {
 	struct be_adapter *adapter = netdev_priv(netdev);
 	int i;
@@ -1185,16 +1186,23 @@ static int be_get_rxfh(struct net_device *netdev, u32 *indir, u8 *hkey)
 	if (hkey)
 		memcpy(hkey, rss->rss_hkey, RSS_HASH_KEY_LEN);
 
+	if (hfunc)
+		*hfunc = ETH_RSS_HASH_TOP;
+
 	return 0;
 }
 
 static int be_set_rxfh(struct net_device *netdev, const u32 *indir,
-		       const u8 *hkey)
+		       const u8 *hkey, const u8 hfunc)
 {
 	int rc = 0, i, j;
 	struct be_adapter *adapter = netdev_priv(netdev);
 	u8 rsstable[RSS_INDIR_TABLE_LEN];
 
+	/* We do not allow change in unsupported parameters */
+	if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP)
+		return -EOPNOTSUPP;
+
 	if (indir) {
 		struct be_rx_obj *rxo;
 
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
index 2d04464..651f53b 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
@@ -916,11 +916,15 @@ static u32 fm10k_get_rssrk_size(struct net_device *netdev)
 	return FM10K_RSSRK_SIZE * FM10K_RSSRK_ENTRIES_PER_REG;
 }
 
-static int fm10k_get_rssh(struct net_device *netdev, u32 *indir, u8 *key)
+static int fm10k_get_rssh(struct net_device *netdev, u32 *indir, u8 *key,
+			  u8 *hfunc)
 {
 	struct fm10k_intfc *interface = netdev_priv(netdev);
 	int i, err;
 
+	if (hfunc)
+		*hfunc = ETH_RSS_HASH_TOP;
+
 	err = fm10k_get_reta(netdev, indir);
 	if (err || !key)
 		return err;
@@ -932,12 +936,16 @@ static int fm10k_get_rssh(struct net_device *netdev, u32 *indir, u8 *key)
 }
 
 static int fm10k_set_rssh(struct net_device *netdev, const u32 *indir,
-			  const u8 *key)
+			  const u8 *key, const u8 hfunc)
 {
 	struct fm10k_intfc *interface = netdev_priv(netdev);
 	struct fm10k_hw *hw = &interface->hw;
 	int i, err;
 
+	/* We do not allow change in unsupported parameters */
+	if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP)
+		return -EOPNOTSUPP;
+
 	err = fm10k_set_reta(netdev, indir);
 	if (err || !key)
 		return err;
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
index 69a269b..69b97ba 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
@@ -627,13 +627,19 @@ static u32 i40evf_get_rxfh_indir_size(struct net_device *netdev)
  *
  * Reads the indirection table directly from the hardware. Always returns 0.
  **/
-static int i40evf_get_rxfh(struct net_device *netdev, u32 *indir, u8 *key)
+static int i40evf_get_rxfh(struct net_device *netdev, u32 *indir, u8 *key,
+			   u8 *hfunc)
 {
 	struct i40evf_adapter *adapter = netdev_priv(netdev);
 	struct i40e_hw *hw = &adapter->hw;
 	u32 hlut_val;
 	int i, j;
 
+	if (hfunc)
+		*hfunc = ETH_RSS_HASH_TOP;
+	if (!indir)
+		return 0;
+
 	for (i = 0, j = 0; i <= I40E_VFQF_HLUT_MAX_INDEX; i++) {
 		hlut_val = rd32(hw, I40E_VFQF_HLUT(i));
 		indir[j++] = hlut_val & 0xff;
@@ -654,13 +660,20 @@ static int i40evf_get_rxfh(struct net_device *netdev, u32 *indir, u8 *key)
  * returns 0 after programming the table.
  **/
 static int i40evf_set_rxfh(struct net_device *netdev, const u32 *indir,
-			   const u8 *key)
+			   const u8 *key, const u8 hfunc)
 {
 	struct i40evf_adapter *adapter = netdev_priv(netdev);
 	struct i40e_hw *hw = &adapter->hw;
 	u32 hlut_val;
 	int i, j;
 
+	/* We do not allow change in unsupported parameters */
+	if (key ||
+	    (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP))
+		return -EOPNOTSUPP;
+	if (!indir)
+		return 0;
+
 	for (i = 0, j = 0; i <= I40E_VFQF_HLUT_MAX_INDEX; i++) {
 		hlut_val = indir[j++];
 		hlut_val |= indir[j++] << 8;
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 02cfd3b..d5673eb 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -2842,11 +2842,16 @@ static u32 igb_get_rxfh_indir_size(struct net_device *netdev)
 	return IGB_RETA_SIZE;
 }
 
-static int igb_get_rxfh(struct net_device *netdev, u32 *indir, u8 *key)
+static int igb_get_rxfh(struct net_device *netdev, u32 *indir, u8 *key,
+			u8 *hfunc)
 {
 	struct igb_adapter *adapter = netdev_priv(netdev);
 	int i;
 
+	if (hfunc)
+		*hfunc = ETH_RSS_HASH_TOP;
+	if (!indir)
+		return 0;
 	for (i = 0; i < IGB_RETA_SIZE; i++)
 		indir[i] = adapter->rss_indir_tbl[i];
 
@@ -2889,13 +2894,20 @@ void igb_write_rss_indir_tbl(struct igb_adapter *adapter)
 }
 
 static int igb_set_rxfh(struct net_device *netdev, const u32 *indir,
-			const u8 *key)
+			const u8 *key, const u8 hfunc)
 {
 	struct igb_adapter *adapter = netdev_priv(netdev);
 	struct e1000_hw *hw = &adapter->hw;
 	int i;
 	u32 num_queues;
 
+	/* We do not allow change in unsupported parameters */
+	if (key ||
+	    (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP))
+		return -EOPNOTSUPP;
+	if (!indir)
+		return 0;
+
 	num_queues = adapter->rss_queues;
 
 	switch (hw->mac.type) {
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index c45e06a..28c3fc5 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -978,7 +978,8 @@ static u32 mlx4_en_get_rxfh_key_size(struct net_device *netdev)
 	return MLX4_EN_RSS_KEY_SIZE;
 }
 
-static int mlx4_en_get_rxfh(struct net_device *dev, u32 *ring_index, u8 *key)
+static int mlx4_en_get_rxfh(struct net_device *dev, u32 *ring_index, u8 *key,
+			    u8 *hfunc)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	struct mlx4_en_rss_map *rss_map = &priv->rss_map;
@@ -990,16 +991,20 @@ static int mlx4_en_get_rxfh(struct net_device *dev, u32 *ring_index, u8 *key)
 	rss_rings = 1 << ilog2(rss_rings);
 
 	while (n--) {
+		if (!ring_index)
+			break;
 		ring_index[n] = rss_map->qps[n % rss_rings].qpn -
 			rss_map->base_qpn;
 	}
 	if (key)
 		memcpy(key, priv->rss_key, MLX4_EN_RSS_KEY_SIZE);
+	if (hfunc)
+		*hfunc = ETH_RSS_HASH_TOP;
 	return err;
 }
 
 static int mlx4_en_set_rxfh(struct net_device *dev, const u32 *ring_index,
-			    const u8 *key)
+			    const u8 *key, const u8 hfunc)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	struct mlx4_en_dev *mdev = priv->mdev;
@@ -1008,6 +1013,10 @@ static int mlx4_en_set_rxfh(struct net_device *dev, const u32 *ring_index,
 	int i;
 	int rss_rings = 0;
 
+	/* We do not allow change in unsupported parameters */
+	if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP)
+		return -EOPNOTSUPP;
+
 	/* Calculate RSS table size and make sure flows are spread evenly
 	 * between rings
 	 */
diff --git a/drivers/net/ethernet/sfc/ethtool.c b/drivers/net/ethernet/sfc/ethtool.c
index cad258a..2ac07c9 100644
--- a/drivers/net/ethernet/sfc/ethtool.c
+++ b/drivers/net/ethernet/sfc/ethtool.c
@@ -1086,19 +1086,29 @@ static u32 efx_ethtool_get_rxfh_indir_size(struct net_device *net_dev)
 		0 : ARRAY_SIZE(efx->rx_indir_table));
 }
 
-static int efx_ethtool_get_rxfh(struct net_device *net_dev, u32 *indir, u8 *key)
+static int efx_ethtool_get_rxfh(struct net_device *net_dev, u32 *indir, u8 *key,
+				u8 *hfunc)
 {
 	struct efx_nic *efx = netdev_priv(net_dev);
 
-	memcpy(indir, efx->rx_indir_table, sizeof(efx->rx_indir_table));
+	if (hfunc)
+		*hfunc = ETH_RSS_HASH_UNKNOWN;
+	if (indir)
+		memcpy(indir, efx->rx_indir_table, sizeof(efx->rx_indir_table));
 	return 0;
 }
 
-static int efx_ethtool_set_rxfh(struct net_device *net_dev,
-				const u32 *indir, const u8 *key)
+static int efx_ethtool_set_rxfh(struct net_device *net_dev, const u32 *indir,
+				const u8 *key, const u8 hfunc)
 {
 	struct efx_nic *efx = netdev_priv(net_dev);
 
+	/* We do not allow change in unsupported parameters */
+	if (key ||
+	    (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP))
+		return -EOPNOTSUPP;
+	if (!indir)
+		return 0;
 	memcpy(efx->rx_indir_table, indir, sizeof(efx->rx_indir_table));
 	efx->type->rx_push_rss_config(efx);
 	return 0;
diff --git a/drivers/net/vmxnet3/vmxnet3_ethtool.c b/drivers/net/vmxnet3/vmxnet3_ethtool.c
index b725fd9..b7b5332 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethtool.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethtool.c
@@ -583,12 +583,16 @@ vmxnet3_get_rss_indir_size(struct net_device *netdev)
 }
 
 static int
-vmxnet3_get_rss(struct net_device *netdev, u32 *p, u8 *key)
+vmxnet3_get_rss(struct net_device *netdev, u32 *p, u8 *key, u8 *hfunc)
 {
 	struct vmxnet3_adapter *adapter = netdev_priv(netdev);
 	struct UPT1_RSSConf *rssConf = adapter->rss_conf;
 	unsigned int n = rssConf->indTableSize;
 
+	if (hfunc)
+		*hfunc = ETH_RSS_HASH_TOP;
+	if (!p)
+		return 0;
 	while (n--)
 		p[n] = rssConf->indTable[n];
 	return 0;
@@ -596,13 +600,20 @@ vmxnet3_get_rss(struct net_device *netdev, u32 *p, u8 *key)
 }
 
 static int
-vmxnet3_set_rss(struct net_device *netdev, const u32 *p, const u8 *key)
+vmxnet3_set_rss(struct net_device *netdev, const u32 *p, const u8 *key,
+		const u8 hfunc)
 {
 	unsigned int i;
 	unsigned long flags;
 	struct vmxnet3_adapter *adapter = netdev_priv(netdev);
 	struct UPT1_RSSConf *rssConf = adapter->rss_conf;
 
+	/* We do not allow change in unsupported parameters */
+	if (key ||
+	    (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP))
+		return -EOPNOTSUPP;
+	if (!p)
+		return 0;
 	for (i = 0; i < rssConf->indTableSize; i++)
 		rssConf->indTable[i] = p[i];
 
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index c1a2d60..653dc9c 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -59,6 +59,26 @@ enum ethtool_phys_id_state {
 	ETHTOOL_ID_OFF
 };
 
+enum {
+	ETH_RSS_HASH_TOP_BIT, /* Configurable RSS hash function - Toeplitz */
+	ETH_RSS_HASH_XOR_BIT, /* Configurable RSS hash function - Xor */
+
+	/*
+	 * Add your fresh new hash function bits above and remember to update
+	 * rss_hash_func_strings[] in ethtool.c
+	 */
+	ETH_RSS_HASH_FUNCS_COUNT
+};
+
+#define __ETH_RSS_HASH_BIT(bit)	((u32)1 << (bit))
+#define __ETH_RSS_HASH(name)	__ETH_RSS_HASH_BIT(ETH_RSS_HASH_##name##_BIT)
+
+#define ETH_RSS_HASH_TOP	__ETH_RSS_HASH(TOP)
+#define ETH_RSS_HASH_XOR	__ETH_RSS_HASH(XOR)
+
+#define ETH_RSS_HASH_UNKNOWN	0
+#define ETH_RSS_HASH_NO_CHANGE	0
+
 struct net_device;
 
 /* Some generic methods drivers may use in their ethtool_ops */
@@ -158,17 +178,14 @@ static inline u32 ethtool_rxfh_indir_default(u32 index, u32 n_rx_rings)
  *	Returns zero if not supported for this specific device.
  * @get_rxfh_indir_size: Get the size of the RX flow hash indirection table.
  *	Returns zero if not supported for this specific device.
- * @get_rxfh: Get the contents of the RX flow hash indirection table and hash
- *	key.
- *	Will only be called if one or both of @get_rxfh_indir_size and
- *	@get_rxfh_key_size are implemented and return non-zero.
- *	Returns a negative error code or zero.
- * @set_rxfh: Set the contents of the RX flow hash indirection table and/or
- *	hash key.  In case only the indirection table or hash key is to be
- *	changed, the other argument will be %NULL.
- *	Will only be called if one or both of @get_rxfh_indir_size and
- *	@get_rxfh_key_size are implemented and return non-zero.
+ * @get_rxfh: Get the contents of the RX flow hash indirection table, hash key
+ *	and/or hash function.
  *	Returns a negative error code or zero.
+ * @set_rxfh: Set the contents of the RX flow hash indirection table, hash
+ *	key, and/or hash function.  Arguments which are set to %NULL or zero
+ *	will remain unchanged.
+ *	Returns a negative error code or zero. An error code must be returned
+ *	if at least one unsupported change was requested.
  * @get_channels: Get number of channels.
  * @set_channels: Set number of channels.  Returns a negative error code or
  *	zero.
@@ -241,9 +258,10 @@ struct ethtool_ops {
 	int	(*reset)(struct net_device *, u32 *);
 	u32	(*get_rxfh_key_size)(struct net_device *);
 	u32	(*get_rxfh_indir_size)(struct net_device *);
-	int	(*get_rxfh)(struct net_device *, u32 *indir, u8 *key);
+	int	(*get_rxfh)(struct net_device *, u32 *indir, u8 *key,
+			    u8 *hfunc);
 	int	(*set_rxfh)(struct net_device *, const u32 *indir,
-			    const u8 *key);
+			    const u8 *key, const u8 hfunc);
 	void	(*get_channels)(struct net_device *, struct ethtool_channels *);
 	int	(*set_channels)(struct net_device *, struct ethtool_channels *);
 	int	(*get_dump_flag)(struct net_device *, struct ethtool_dump *);
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index eb2095b..5f66d9c 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -534,6 +534,7 @@ struct ethtool_pauseparam {
  * @ETH_SS_NTUPLE_FILTERS: Previously used with %ETHTOOL_GRXNTUPLE;
  *	now deprecated
  * @ETH_SS_FEATURES: Device feature names
+ * @ETH_SS_RSS_HASH_FUNCS: RSS hash function names
  */
 enum ethtool_stringset {
 	ETH_SS_TEST		= 0,
@@ -541,6 +542,7 @@ enum ethtool_stringset {
 	ETH_SS_PRIV_FLAGS,
 	ETH_SS_NTUPLE_FILTERS,
 	ETH_SS_FEATURES,
+	ETH_SS_RSS_HASH_FUNCS,
 };
 
 /**
@@ -884,6 +886,8 @@ struct ethtool_rxfh_indir {
  * @key_size: On entry, the array size of the user buffer for the hash key,
  *	which may be zero.  On return from %ETHTOOL_GRSSH, the size of the
  *	hardware hash key.
+ * @hfunc: Defines the current RSS hash function used by HW (or to be set to).
+ *	Valid values are one of the %ETH_RSS_HASH_*.
  * @rsvd:	Reserved for future extensions.
  * @rss_config: RX ring/queue index for each hash value i.e., indirection table
  *	of @indir_size __u32 elements, followed by hash key of @key_size
@@ -893,14 +897,16 @@ struct ethtool_rxfh_indir {
  * size should be returned.  For %ETHTOOL_SRSSH, an @indir_size of
  * %ETH_RXFH_INDIR_NO_CHANGE means that indir table setting is not requested
  * and a @indir_size of zero means the indir table should be reset to default
- * values.
+ * values. An hfunc of zero means that hash function setting is not requested.
  */
 struct ethtool_rxfh {
 	__u32   cmd;
 	__u32	rss_context;
 	__u32   indir_size;
 	__u32   key_size;
-	__u32	rsvd[2];
+	__u8	hfunc;
+	__u8	rsvd8[3];
+	__u32	rsvd32;
 	__u32   rss_config[0];
 };
 #define ETH_RXFH_INDIR_NO_CHANGE	0xffffffff
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 715f51f..550892c 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -100,6 +100,12 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
 	[NETIF_F_BUSY_POLL_BIT] =        "busy-poll",
 };
 
+static const char
+rss_hash_func_strings[ETH_RSS_HASH_FUNCS_COUNT][ETH_GSTRING_LEN] = {
+	[ETH_RSS_HASH_TOP_BIT] =	"toeplitz",
+	[ETH_RSS_HASH_XOR_BIT] =	"xor",
+};
+
 static int ethtool_get_features(struct net_device *dev, void __user *useraddr)
 {
 	struct ethtool_gfeatures cmd = {
@@ -185,6 +191,9 @@ static int __ethtool_get_sset_count(struct net_device *dev, int sset)
 	if (sset == ETH_SS_FEATURES)
 		return ARRAY_SIZE(netdev_features_strings);
 
+	if (sset == ETH_SS_RSS_HASH_FUNCS)
+		return ARRAY_SIZE(rss_hash_func_strings);
+
 	if (ops->get_sset_count && ops->get_strings)
 		return ops->get_sset_count(dev, sset);
 	else
@@ -199,6 +208,9 @@ static void __ethtool_get_strings(struct net_device *dev,
 	if (stringset == ETH_SS_FEATURES)
 		memcpy(data, netdev_features_strings,
 			sizeof(netdev_features_strings));
+	else if (stringset == ETH_SS_RSS_HASH_FUNCS)
+		memcpy(data, rss_hash_func_strings,
+		       sizeof(rss_hash_func_strings));
 	else
 		/* ops->get_strings is valid because checked earlier */
 		ops->get_strings(dev, stringset, data);
@@ -618,7 +630,7 @@ static noinline_for_stack int ethtool_get_rxfh_indir(struct net_device *dev,
 	if (!indir)
 		return -ENOMEM;
 
-	ret = dev->ethtool_ops->get_rxfh(dev, indir, NULL);
+	ret = dev->ethtool_ops->get_rxfh(dev, indir, NULL, NULL);
 	if (ret)
 		goto out;
 
@@ -679,7 +691,7 @@ static noinline_for_stack int ethtool_set_rxfh_indir(struct net_device *dev,
 			goto out;
 	}
 
-	ret = ops->set_rxfh(dev, indir, NULL);
+	ret = ops->set_rxfh(dev, indir, NULL, ETH_RSS_HASH_NO_CHANGE);
 
 out:
 	kfree(indir);
@@ -697,12 +709,11 @@ static noinline_for_stack int ethtool_get_rxfh(struct net_device *dev,
 	u32 total_size;
 	u32 indir_bytes;
 	u32 *indir = NULL;
+	u8 dev_hfunc = 0;
 	u8 *hkey = NULL;
 	u8 *rss_config;
 
-	if (!(dev->ethtool_ops->get_rxfh_indir_size ||
-	      dev->ethtool_ops->get_rxfh_key_size) ||
-	      !dev->ethtool_ops->get_rxfh)
+	if (!ops->get_rxfh)
 		return -EOPNOTSUPP;
 
 	if (ops->get_rxfh_indir_size)
@@ -710,16 +721,14 @@ static noinline_for_stack int ethtool_get_rxfh(struct net_device *dev,
 	if (ops->get_rxfh_key_size)
 		dev_key_size = ops->get_rxfh_key_size(dev);
 
-	if ((dev_key_size + dev_indir_size) == 0)
-		return -EOPNOTSUPP;
-
 	if (copy_from_user(&rxfh, useraddr, sizeof(rxfh)))
 		return -EFAULT;
 	user_indir_size = rxfh.indir_size;
 	user_key_size = rxfh.key_size;
 
 	/* Check that reserved fields are 0 for now */
-	if (rxfh.rss_context || rxfh.rsvd[0] || rxfh.rsvd[1])
+	if (rxfh.rss_context || rxfh.rsvd8[0] || rxfh.rsvd8[1] ||
+	    rxfh.rsvd8[2] || rxfh.rsvd32)
 		return -EINVAL;
 
 	rxfh.indir_size = dev_indir_size;
@@ -727,13 +736,6 @@ static noinline_for_stack int ethtool_get_rxfh(struct net_device *dev,
 	if (copy_to_user(useraddr, &rxfh, sizeof(rxfh)))
 		return -EFAULT;
 
-	/* If the user buffer size is 0, this is just a query for the
-	 * device table size and key size.  Otherwise, if the User size is
-	 * not equal to device table size or key size it's an error.
-	 */
-	if (!user_indir_size && !user_key_size)
-		return 0;
-
 	if ((user_indir_size && (user_indir_size != dev_indir_size)) ||
 	    (user_key_size && (user_key_size != dev_key_size)))
 		return -EINVAL;
@@ -750,14 +752,19 @@ static noinline_for_stack int ethtool_get_rxfh(struct net_device *dev,
 	if (user_key_size)
 		hkey = rss_config + indir_bytes;
 
-	ret = dev->ethtool_ops->get_rxfh(dev, indir, hkey);
-	if (!ret) {
-		if (copy_to_user(useraddr +
-				 offsetof(struct ethtool_rxfh, rss_config[0]),
-				 rss_config, total_size))
-			ret = -EFAULT;
-	}
+	ret = dev->ethtool_ops->get_rxfh(dev, indir, hkey, &dev_hfunc);
+	if (ret)
+		goto out;
 
+	if (copy_to_user(useraddr + offsetof(struct ethtool_rxfh, hfunc),
+			 &dev_hfunc, sizeof(rxfh.hfunc))) {
+		ret = -EFAULT;
+	} else if (copy_to_user(useraddr +
+			      offsetof(struct ethtool_rxfh, rss_config[0]),
+			      rss_config, total_size)) {
+		ret = -EFAULT;
+	}
+out:
 	kfree(rss_config);
 
 	return ret;
@@ -776,33 +783,31 @@ static noinline_for_stack int ethtool_set_rxfh(struct net_device *dev,
 	u8 *rss_config;
 	u32 rss_cfg_offset = offsetof(struct ethtool_rxfh, rss_config[0]);
 
-	if (!(ops->get_rxfh_indir_size || ops->get_rxfh_key_size) ||
-	    !ops->get_rxnfc || !ops->set_rxfh)
+	if (!ops->get_rxnfc || !ops->set_rxfh)
 		return -EOPNOTSUPP;
 
 	if (ops->get_rxfh_indir_size)
 		dev_indir_size = ops->get_rxfh_indir_size(dev);
 	if (ops->get_rxfh_key_size)
 		dev_key_size = dev->ethtool_ops->get_rxfh_key_size(dev);
-	if ((dev_key_size + dev_indir_size) == 0)
-		return -EOPNOTSUPP;
 
 	if (copy_from_user(&rxfh, useraddr, sizeof(rxfh)))
 		return -EFAULT;
 
 	/* Check that reserved fields are 0 for now */
-	if (rxfh.rss_context || rxfh.rsvd[0] || rxfh.rsvd[1])
+	if (rxfh.rss_context || rxfh.rsvd8[0] || rxfh.rsvd8[1] ||
+	    rxfh.rsvd8[2] || rxfh.rsvd32)
 		return -EINVAL;
 
-	/* If either indir or hash key is valid, proceed further.
-	 * It is not valid to request that both be unchanged.
+	/* If either indir, hash key or function is valid, proceed further.
+	 * Must request at least one change: indir size, hash key or function.
 	 */
 	if ((rxfh.indir_size &&
 	     rxfh.indir_size != ETH_RXFH_INDIR_NO_CHANGE &&
 	     rxfh.indir_size != dev_indir_size) ||
 	    (rxfh.key_size && (rxfh.key_size != dev_key_size)) ||
 	    (rxfh.indir_size == ETH_RXFH_INDIR_NO_CHANGE &&
-	     rxfh.key_size == 0))
+	     rxfh.key_size == 0 && rxfh.hfunc == ETH_RSS_HASH_NO_CHANGE))
 		return -EINVAL;
 
 	if (rxfh.indir_size != ETH_RXFH_INDIR_NO_CHANGE)
@@ -845,7 +850,7 @@ static noinline_for_stack int ethtool_set_rxfh(struct net_device *dev,
 		}
 	}
 
-	ret = ops->set_rxfh(dev, indir, hkey);
+	ret = ops->set_rxfh(dev, indir, hkey, rxfh.hfunc);
 
 out:
 	kfree(rss_config);
-- 
1.8.3.4

^ permalink raw reply related
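
The hash function bitmask, the "TOP only" driver check, and the split of the reserved words in struct ethtool_rxfh from the patch above can be sketched in user space as follows. This is a minimal sketch, not kernel code: the macros are re-declared locally with fixed-width types, the struct names rxfh_old and rxfh_new and the helper check_hfunc_top_only() are hypothetical, and the trailing rss_config[] member is left out because it does not affect the header layout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the bit numbers introduced by the patch. */
enum {
    ETH_RSS_HASH_TOP_BIT,
    ETH_RSS_HASH_XOR_BIT,
    ETH_RSS_HASH_FUNCS_COUNT
};

#define ETH_RSS_HASH_TOP        ((uint32_t)1 << ETH_RSS_HASH_TOP_BIT)
#define ETH_RSS_HASH_XOR        ((uint32_t)1 << ETH_RSS_HASH_XOR_BIT)
#define ETH_RSS_HASH_NO_CHANGE  0

/* Header layout before the patch (hypothetical name). */
struct rxfh_old {
    uint32_t cmd;
    uint32_t rss_context;
    uint32_t indir_size;
    uint32_t key_size;
    uint32_t rsvd[2];
};

/* Header layout after the patch (hypothetical name). */
struct rxfh_new {
    uint32_t cmd;
    uint32_t rss_context;
    uint32_t indir_size;
    uint32_t key_size;
    uint8_t  hfunc;
    uint8_t  rsvd8[3];
    uint32_t rsvd32;
};

/* hfunc reuses the first byte of the old reserved area, so the size and
 * the offsets of all other fields are unchanged.
 */
_Static_assert(sizeof(struct rxfh_old) == sizeof(struct rxfh_new),
               "header size must not change");
_Static_assert(offsetof(struct rxfh_new, hfunc) ==
               offsetof(struct rxfh_old, rsvd),
               "hfunc must sit where rsvd[0] started");

/* Same test the Toeplitz-only drivers in the patch perform in set_rxfh:
 * accept "no change" or Toeplitz, reject anything else.
 */
int check_hfunc_top_only(uint8_t hfunc)
{
    if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP)
        return -EOPNOTSUPP;
    return 0;
}
```

Since ETH_RSS_HASH_NO_CHANGE is zero, a caller that zeroes the old reserved words, as the pre-patch code required, is read as requesting no hash function change.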

* [PATCH net-next V2 2/2] net/mlx4_en: Support for configurable RSS hash function
From: Amir Vadai @ 2014-12-02 14:20 UTC (permalink / raw)
  To: David S. Miller, Ben Hutchings
  Cc: netdev, Or Gerlitz, Eyal Perry, Yevgeny Petrilin, Amir Vadai
In-Reply-To: <1417530049-6943-1-git-send-email-amirv@mellanox.com>

From: Eyal Perry <eyalpe@mellanox.com>

The ConnectX HW can use one of two RSS hash functions: Toeplitz or XOR.
This patch extends the mlx4_en driver's set_rxfh/get_rxfh callbacks to
support getting and setting the RSS hash function used by the device.

Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 34 +++++++++++++++++++++----
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  | 11 ++++++++
 drivers/net/ethernet/mellanox/mlx4/en_rx.c      | 14 +++++++++-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h    |  2 +-
 4 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index 28c3fc5..90e0f04 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -978,6 +978,27 @@ static u32 mlx4_en_get_rxfh_key_size(struct net_device *netdev)
 	return MLX4_EN_RSS_KEY_SIZE;
 }
 
+static int mlx4_en_check_rxfh_func(struct net_device *dev, u8 hfunc)
+{
+	struct mlx4_en_priv *priv = netdev_priv(dev);
+
+	/* check if requested function is supported by the device */
+	if ((hfunc == ETH_RSS_HASH_TOP &&
+	     !(priv->mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_RSS_TOP)) ||
+	    (hfunc == ETH_RSS_HASH_XOR &&
+	     !(priv->mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_RSS_XOR)))
+		return -EINVAL;
+
+	priv->rss_hash_fn = hfunc;
+	if (hfunc == ETH_RSS_HASH_TOP && !(dev->features & NETIF_F_RXHASH))
+		en_warn(priv,
+			"Toeplitz hash function should be used in conjunction with RX hashing for optimal performance\n");
+	if (hfunc == ETH_RSS_HASH_XOR && (dev->features & NETIF_F_RXHASH))
+		en_warn(priv,
+			"Enabling both XOR Hash function and RX Hashing can limit RPS functionality\n");
+	return 0;
+}
+
 static int mlx4_en_get_rxfh(struct net_device *dev, u32 *ring_index, u8 *key,
 			    u8 *hfunc)
 {
@@ -999,7 +1020,7 @@ static int mlx4_en_get_rxfh(struct net_device *dev, u32 *ring_index, u8 *key,
 	if (key)
 		memcpy(key, priv->rss_key, MLX4_EN_RSS_KEY_SIZE);
 	if (hfunc)
-		*hfunc = ETH_RSS_HASH_TOP;
+		*hfunc = priv->rss_hash_fn;
 	return err;
 }
 
@@ -1013,10 +1034,6 @@ static int mlx4_en_set_rxfh(struct net_device *dev, const u32 *ring_index,
 	int i;
 	int rss_rings = 0;
 
-	/* We do not allow change in unsupported parameters */
-	if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP)
-		return -EOPNOTSUPP;
-
 	/* Calculate RSS table size and make sure flows are spread evenly
 	 * between rings
 	 */
@@ -1037,6 +1054,12 @@ static int mlx4_en_set_rxfh(struct net_device *dev, const u32 *ring_index,
 	if (!is_power_of_2(rss_rings))
 		return -EINVAL;
 
+	if (hfunc != ETH_RSS_HASH_NO_CHANGE) {
+		err = mlx4_en_check_rxfh_func(dev, hfunc);
+		if (err)
+			return err;
+	}
+
 	mutex_lock(&mdev->state_lock);
 	if (priv->port_up) {
 		port_up = 1;
@@ -1047,6 +1070,7 @@ static int mlx4_en_set_rxfh(struct net_device *dev, const u32 *ring_index,
 		priv->prof->rss_rings = rss_rings;
 	if (key)
 		memcpy(priv->rss_key, key, MLX4_EN_RSS_KEY_SIZE);
+
 	if (port_up) {
 		err = mlx4_en_start_port(dev);
 		if (err)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index b7c9978..dcc0d3d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2608,6 +2608,17 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 	if (mdev->dev->caps.steering_mode != MLX4_STEERING_MODE_A0)
 		dev->priv_flags |= IFF_UNICAST_FLT;
 
+	/* Setting a default hash function value */
+	if (mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_RSS_TOP) {
+		priv->rss_hash_fn = ETH_RSS_HASH_TOP;
+	} else if (mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_RSS_XOR) {
+		priv->rss_hash_fn = ETH_RSS_HASH_XOR;
+	} else {
+		en_warn(priv,
+			"No RSS hash capabilities exposed, using Toeplitz\n");
+		priv->rss_hash_fn = ETH_RSS_HASH_TOP;
+	}
+
 	mdev->pndev[port] = dev;
 
 	netif_carrier_off(dev);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 946d352..4ca396e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -1223,7 +1223,19 @@ int mlx4_en_config_rss_steer(struct mlx4_en_priv *priv)
 
 	rss_context->flags = rss_mask;
 	rss_context->hash_fn = MLX4_RSS_HASH_TOP;
-	memcpy(rss_context->rss_key, priv->rss_key, MLX4_EN_RSS_KEY_SIZE);
+	if (priv->rss_hash_fn == ETH_RSS_HASH_XOR) {
+		rss_context->hash_fn = MLX4_RSS_HASH_XOR;
+	} else if (priv->rss_hash_fn == ETH_RSS_HASH_TOP) {
+		rss_context->hash_fn = MLX4_RSS_HASH_TOP;
+		memcpy(rss_context->rss_key, priv->rss_key,
+		       MLX4_EN_RSS_KEY_SIZE);
+		netdev_rss_key_fill(rss_context->rss_key,
+				    MLX4_EN_RSS_KEY_SIZE);
+	} else {
+		en_err(priv, "Unknown RSS hash function requested\n");
+		err = -EINVAL;
+		goto indir_err;
+	}
 	err = mlx4_qp_to_ready(mdev->dev, &priv->res.mtt, &context,
 			       &rss_map->indir_qp, &rss_map->indir_state);
 	if (err)
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index aaa7efb..ac48a8d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -376,7 +376,6 @@ struct mlx4_en_port_profile {
 };
 
 struct mlx4_en_profile {
-	int rss_xor;
 	int udp_rss;
 	u8 rss_mask;
 	u32 active_ports;
@@ -619,6 +618,7 @@ struct mlx4_en_priv {
 
 	u32 pflags;
 	u8 rss_key[MLX4_EN_RSS_KEY_SIZE];
+	u8 rss_hash_fn;
 };
 
 enum mlx4_en_wol {
-- 
1.8.3.4

^ permalink raw reply related
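
For readers unfamiliar with the Toeplitz function named in this patch: it slides a 32-bit window over the key, one key bit per input bit, and XORs the window into the result whenever the input bit is set. The following is a minimal user-space sketch of that generic algorithm. It is not the ConnectX implementation, the name toeplitz_hash() is hypothetical, and the XOR variant is omitted because the patch does not describe how the hardware computes it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic Toeplitz hash over data[0..data_len), MSB-first.
 * The key must be at least 4 bytes; key bits past the end are
 * treated as zero in this sketch.
 */
uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                       const uint8_t *data, size_t data_len)
{
    uint32_t result = 0;
    uint32_t window;
    size_t next_key_bit = 32;   /* index of the next key bit to shift in */

    assert(key_len >= 4);

    /* The window starts as the leftmost 32 bits of the key. */
    window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
             ((uint32_t)key[2] << 8) | (uint32_t)key[3];

    for (size_t i = 0; i < data_len; i++) {
        for (int b = 7; b >= 0; b--) {
            /* A set input bit contributes the current key window. */
            if (data[i] & (1u << b))
                result ^= window;

            /* Advance the window by one key bit. */
            window <<= 1;
            if (next_key_bit < key_len * 8 &&
                (key[next_key_bit / 8] & (0x80u >> (next_key_bit % 8))))
                window |= 1;
            next_key_bit++;
        }
    }
    return result;
}
```

Because the result is an XOR of key windows, the function is linear over GF(2): an all-zero input hashes to zero, and hashing a ^ b gives the same value as hash(a) ^ hash(b).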

* Re: [linux-nics] [PATCH] e1000: remove unused variables
From: Sudip Mukherjee @ 2014-12-02 14:24 UTC (permalink / raw)
  To: Fujinaka, Todd
  Cc: Ben Hutchings, Linux NICS, e1000-devel@lists.sourceforge.net,
	Hisashi T Fujinaka, Vick, Matthew, Greg@isotope.jf.intel.com,
	Kirsher, Jeffrey T, netdev@vger.kernel.org, Wyborny, Carolyn,
	John@isotope.jf.intel.com, linux-kernel@vger.kernel.org
In-Reply-To: <9B4A1B1917080E46B64F07F2989DADD653454C69@ORSMSX114.amr.corp.intel.com>

On Mon, Dec 01, 2014 at 06:56:46PM +0000, Fujinaka, Todd wrote:
> After discussing this locally, I'd like to NAK it because this could cause regressions to parts that are still in use but we don't have access to. Also, the assignment was necessary in the past for some versions of gcc and since this may be used in embedded systems using older compilers, we should leave it be.
> 
OK, I understand.
Just a thought:
maybe you could add a comment in the file saying that these assignments are there for a reason and should not be removed. Otherwise, you might receive the same kind of patch again from someone else.

thanks
sudip

> Thanks.
> 
> Todd Fujinaka
> Software Application Engineer
> Networking Division (ND)
> Intel Corporation
> todd.fujinaka@intel.com
> (503) 712-4565
> 
> -----Original Message-----
> From: linux-nics-bounces@isotope.jf.intel.com [mailto:linux-nics-bounces@isotope.jf.intel.com] On Behalf Of Sudip Mukherjee
> Sent: Sunday, November 30, 2014 8:55 PM
> To: Ben Hutchings
> Cc: Linux NICS; e1000-devel@lists.sourceforge.net; Hisashi T Fujinaka; Vick, Matthew; Greg@isotope.jf.intel.com; Kirsher, Jeffrey T; netdev@vger.kernel.org; Wyborny, Carolyn; John@isotope.jf.intel.com; linux-kernel@vger.kernel.org
> Subject: Re: [linux-nics] [PATCH] e1000: remove unused variables
> 
> On Sun, Nov 30, 2014 at 01:45:13AM +0000, Ben Hutchings wrote:
> > On Wed, 2014-11-26 at 21:59 -0800, Hisashi T Fujinaka wrote:
> > > I'm pretty sure those double reads are there for a reason, so most 
> > > of this I'm going to have to check on Monday. We have a long holiday 
> > > weekend here in the US.
> > [...]
> > 
> > If there were double register reads being replaced with single 
> > register reads, I'd agree this was likely to introduce a regression.  
> > But all I see is var = er32(REG) being changed to er32(REG).
> 
> no, double register reads are not modified. only the unused variables are removed.
> 
> thanks
> sudip
> 
> > 
> > Ben.
> > 
> > --
> > Ben Hutchings
> > The world is coming to an end.	Please log off.
> 
> 
> _______________________________________________
> Linux-nics mailing list
> Linux-nics@intel.com

^ permalink raw reply

* [PATCH iproute2] ss: Use rtnl_dump_filter in handle_netlink_request
From: Vadim Kochan @ 2014-12-02 14:53 UTC (permalink / raw)
  To: netdev; +Cc: Vadim Kochan

Replace the open-coded netlink message handling with rtnl_dump_filter
from lib/libnetlink.c. Also:

    - remove the unused dump_fp argument;
    - add a MAGIC_SEQ #define for the 123456 sequence id

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
---
 misc/ss.c | 128 ++++++++++++++++++--------------------------------------------
 1 file changed, 36 insertions(+), 92 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index a99294d..1c0ece2 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -41,6 +41,8 @@
 #include <linux/packet_diag.h>
 #include <linux/netlink_diag.h>
 
+#define MAGIC_SEQ 123456
+
 #define DIAG_REQUEST(_req, _r)						    \
 	struct {							    \
 		struct nlmsghdr nlh;					    \
@@ -49,7 +51,7 @@
 		.nlh = {						    \
 			.nlmsg_type = SOCK_DIAG_BY_FAMILY,		    \
 			.nlmsg_flags = NLM_F_ROOT|NLM_F_MATCH|NLM_F_REQUEST,\
-			.nlmsg_seq = 123456,				    \
+			.nlmsg_seq = MAGIC_SEQ,				    \
 			.nlmsg_len = sizeof(_req),			    \
 		},							    \
 	}
@@ -1777,7 +1779,7 @@ static int tcpdiag_send(int fd, int protocol, struct filter *f)
 		req.nlh.nlmsg_type = DCCPDIAG_GETSOCK;
 	req.nlh.nlmsg_flags = NLM_F_ROOT|NLM_F_MATCH|NLM_F_REQUEST;
 	req.nlh.nlmsg_pid = 0;
-	req.nlh.nlmsg_seq = 123456;
+	req.nlh.nlmsg_seq = MAGIC_SEQ;
 	memset(&req.r, 0, sizeof(req.r));
 	req.r.idiag_family = AF_INET;
 	req.r.idiag_states = f->states;
@@ -1937,7 +1939,7 @@ again:
 			struct inet_diag_msg *r = NLMSG_DATA(h);
 
 			if (/*h->nlmsg_pid != rth->local.nl_pid ||*/
-			    h->nlmsg_seq != 123456)
+			    h->nlmsg_seq != MAGIC_SEQ)
 				goto skip_it;
 
 			if (h->nlmsg_type == NLMSG_DONE)
@@ -2422,8 +2424,10 @@ static void unix_list_print(struct unixstat *list, struct filter *f)
 	}
 }
 
-static int unix_show_sock(struct nlmsghdr *nlh, struct filter *f)
+static int unix_show_sock(const struct sockaddr_nl *addr, struct nlmsghdr *nlh,
+		void *arg)
 {
+	struct filter *f = (struct filter *)arg;
 	struct unix_diag_msg *r = NLMSG_DATA(nlh);
 	struct rtattr *tb[UNIX_DIAG_MAX+1];
 	char name[128];
@@ -2512,90 +2516,30 @@ static int unix_show_sock(struct nlmsghdr *nlh, struct filter *f)
 	return 0;
 }
 
-static int handle_netlink_request(struct filter *f, FILE *dump_fp,
-				  struct nlmsghdr *req, size_t size,
-				  int (* show_one_sock)(struct nlmsghdr *nlh, struct filter *f))
+static int handle_netlink_request(struct filter *f, struct nlmsghdr *req,
+		size_t size, rtnl_filter_t show_one_sock)
 {
-	int fd;
-	char	buf[16384];
+	int ret = -1;
+	struct rtnl_handle rth;
 
-	if ((fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_INET_DIAG)) < 0)
+	if (rtnl_open_byproto(&rth, 0, NETLINK_INET_DIAG))
 		return -1;
 
-	if (send(fd, req, size, 0) < 0) {
-		close(fd);
-		return -1;
-	}
+	rth.dump = MAGIC_SEQ;
 
-	while (1) {
-		ssize_t status;
-		struct nlmsghdr *h;
-		struct sockaddr_nl nladdr;
-		socklen_t slen = sizeof(nladdr);
+	if (rtnl_send(&rth, req, size) < 0)
+		goto Exit;
 
-		status = recvfrom(fd, buf, sizeof(buf), 0,
-				  (struct sockaddr *) &nladdr, &slen);
-		if (status < 0) {
-			if (errno == EINTR)
-				continue;
-			perror("OVERRUN");
-			continue;
-		}
-		if (status == 0) {
-			fprintf(stderr, "EOF on netlink\n");
-			goto close_it;
-		}
-
-		if (dump_fp)
-			fwrite(buf, 1, NLMSG_ALIGN(status), dump_fp);
-
-		h = (struct nlmsghdr*)buf;
-		while (NLMSG_OK(h, status)) {
-			int err;
+	if (rtnl_dump_filter(&rth, show_one_sock, f))
+		goto Exit;
 
-			if (/*h->nlmsg_pid != rth->local.nl_pid ||*/
-			    h->nlmsg_seq != 123456)
-				goto skip_it;
-
-			if (h->nlmsg_type == NLMSG_DONE)
-				goto close_it;
-
-			if (h->nlmsg_type == NLMSG_ERROR) {
-				struct nlmsgerr *err = (struct nlmsgerr*)NLMSG_DATA(h);
-				if (h->nlmsg_len < NLMSG_LENGTH(sizeof(struct nlmsgerr))) {
-					fprintf(stderr, "ERROR truncated\n");
-				} else {
-					errno = -err->error;
-					if (errno != ENOENT)
-						fprintf(stderr, "DIAG answers %d\n", errno);
-				}
-				close(fd);
-				return -1;
-			}
-			if (!dump_fp) {
-				err = show_one_sock(h, f);
-				if (err < 0) {
-					close(fd);
-					return err;
-				}
-			}
-
-skip_it:
-			h = NLMSG_NEXT(h, status);
-		}
-
-		if (status) {
-			fprintf(stderr, "!!!Remnant of size %zd\n", status);
-			exit(1);
-		}
-	}
-
-close_it:
-	close(fd);
-	return 0;
+	ret = 0;
+Exit:
+	rtnl_close(&rth);
+	return ret;
 }
 
-static int unix_show_netlink(struct filter *f, FILE *dump_fp)
+static int unix_show_netlink(struct filter *f)
 {
 	DIAG_REQUEST(req, struct unix_diag_req r);
 
@@ -2605,8 +2549,7 @@ static int unix_show_netlink(struct filter *f, FILE *dump_fp)
 	if (show_mem)
 		req.r.udiag_show |= UDIAG_SHOW_MEMINFO;
 
-	return handle_netlink_request(f, dump_fp, &req.nlh,
-					sizeof(req), unix_show_sock);
+	return handle_netlink_request(f, &req.nlh, sizeof(req), unix_show_sock);
 }
 
 static int unix_show(struct filter *f)
@@ -2619,7 +2562,7 @@ static int unix_show(struct filter *f)
 	struct unixstat *list = NULL;
 
 	if (!getenv("PROC_NET_UNIX") && !getenv("PROC_ROOT")
-	    && unix_show_netlink(f, NULL) == 0)
+	    && unix_show_netlink(f) == 0)
 		return 0;
 
 	if ((fp = net_unix_open()) == NULL)
@@ -2693,7 +2636,8 @@ static int unix_show(struct filter *f)
 	return 0;
 }
 
-static int packet_show_sock(struct nlmsghdr *nlh, struct filter *f)
+static int packet_show_sock(const struct sockaddr_nl *addr,
+		struct nlmsghdr *nlh, void *arg)
 {
 	struct packet_diag_msg *r = NLMSG_DATA(nlh);
 	struct rtattr *tb[PACKET_DIAG_MAX+1];
@@ -2786,15 +2730,14 @@ static int packet_show_sock(struct nlmsghdr *nlh, struct filter *f)
 	return 0;
 }
 
-static int packet_show_netlink(struct filter *f, FILE *dump_fp)
+static int packet_show_netlink(struct filter *f)
 {
 	DIAG_REQUEST(req, struct packet_diag_req r);
 
 	req.r.sdiag_family = AF_PACKET;
 	req.r.pdiag_show = PACKET_SHOW_INFO | PACKET_SHOW_MEMINFO | PACKET_SHOW_FILTER;
 
-	return handle_netlink_request(f, dump_fp, &req.nlh, sizeof(req),
-			packet_show_sock);
+	return handle_netlink_request(f, &req.nlh, sizeof(req), packet_show_sock);
 }
 
 
@@ -2811,7 +2754,7 @@ static int packet_show(struct filter *f)
 	int ino;
 	unsigned long long sk;
 
-	if (packet_show_netlink(f, NULL) == 0)
+	if (packet_show_netlink(f) == 0)
 		return 0;
 
 	if ((fp = net_packet_open()) == NULL)
@@ -2982,8 +2925,10 @@ static void netlink_show_one(struct filter *f,
 	return;
 }
 
-static int netlink_show_sock(struct nlmsghdr *nlh, struct filter *f)
+static int netlink_show_sock(const struct sockaddr_nl *addr,
+		struct nlmsghdr *nlh, void *arg)
 {
+	struct filter *f = (struct filter *)arg;
 	struct netlink_diag_msg *r = NLMSG_DATA(nlh);
 	struct rtattr *tb[NETLINK_DIAG_MAX+1];
 	int rq = 0, wq = 0;
@@ -3016,7 +2961,7 @@ static int netlink_show_sock(struct nlmsghdr *nlh, struct filter *f)
 	return 0;
 }
 
-static int netlink_show_netlink(struct filter *f, FILE *dump_fp)
+static int netlink_show_netlink(struct filter *f)
 {
 	DIAG_REQUEST(req, struct netlink_diag_req r);
 
@@ -3024,8 +2969,7 @@ static int netlink_show_netlink(struct filter *f, FILE *dump_fp)
 	req.r.sdiag_protocol = NDIAG_PROTO_ALL;
 	req.r.ndiag_show = NDIAG_SHOW_GROUPS | NDIAG_SHOW_MEMINFO;
 
-	return handle_netlink_request(f, dump_fp, &req.nlh,
-					sizeof(req), netlink_show_sock);
+	return handle_netlink_request(f, &req.nlh, sizeof(req), netlink_show_sock);
 }
 
 static int netlink_show(struct filter *f)
@@ -3038,7 +2982,7 @@ static int netlink_show(struct filter *f)
 	unsigned long long sk, cb;
 
 	if (!getenv("PROC_NET_NETLINK") && !getenv("PROC_ROOT") &&
-		netlink_show_netlink(f, NULL) == 0)
+		netlink_show_netlink(f) == 0)
 		return 0;
 
 	if ((fp = net_netlink_open()) == NULL)
-- 
2.1.3

^ permalink raw reply related

* Re: Is this 32-bit NCM?
From: Kevin Zhu @ 2014-12-02 15:04 UTC (permalink / raw)
  To: Enrico Mioso, Bjørn Mork
  Cc: Eli Britstein, Alex Strizhevsky, Midge Shaojun Tan,
	youtux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	linux-usb-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <alpine.LNX.2.03.1412021453340.7488-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

I do not understand why the wSequence matters. By the way, I think I see some NDPs are right after NTH headers in the windows capture.
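
For reference, here is a minimal user-space sketch of how a receiver can locate the first NDP by following wNdpIndex in the 16-bit NTH instead of assuming a fixed position. The field offsets follow the NTH16 layout in the NCM specification; the function name ntb16_first_ndp_offset() and the helpers are made up for this illustration and are not taken from any driver.

```c
#include <stddef.h>
#include <stdint.h>

#define NTH16_SIGNATURE 0x484d434eu /* "NCMH" read as little-endian u32 */
#define NTH16_LENGTH    12u         /* size of the 16-bit NTH */
#define NDP16_MIN_LEN   8u          /* NDP16 header without datagram entries */

/* Read little-endian fields without relying on host byte order. */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: return the offset of the first NDP16 inside the
 * NTB, or -1 if the NTH16 looks invalid. Only the NTH position is fixed;
 * the NDP may sit directly after it or anywhere later in the block.
 */
int ntb16_first_ndp_offset(const uint8_t *ntb, size_t len)
{
	uint16_t ndp_index;

	if (len < NTH16_LENGTH)
		return -1;
	if (get_le32(ntb) != NTH16_SIGNATURE)
		return -1;
	if (get_le16(ntb + 4) != NTH16_LENGTH)	/* wHeaderLength */
		return -1;

	ndp_index = get_le16(ntb + 10);		/* wNdpIndex */
	if (ndp_index < NTH16_LENGTH || (ndp_index & 3))
		return -1;
	if ((size_t)ndp_index + NDP16_MIN_LEN > len)
		return -1;

	return ndp_index;
}
```

An NDP placed right after the NTH simply yields offset 12 here, so a parser written this way should not care where the sender puts it.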

________________________________________
From: Enrico Mioso <mrkiko.rs-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Sent: Tuesday, December 2, 2014 21:53
To: Bjørn Mork
Cc: Kevin Zhu; Eli Britstein; Alex Strizhevsky; Midge Shaojun  Tan; youtux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org; linux-usb-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: Is this 32-bit NCM?

Thank you very much Bjorn.


On Tue, 2 Dec 2014, Bjørn Mork wrote:

==Date: Tue, 2 Dec 2014 14:37:03
==From: Bjørn Mork <bjorn-yOkvZcmFvRU@public.gmane.org>
==To: Enrico Mioso <mrkiko.rs-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
==Cc: Kevin Zhu <Mingying.Zhu-6C2+4RG2qWF0ubjbjo6WXg@public.gmane.org>,
==    Eli Britstein <Eli.Britstein-6C2+4RG2qWF0ubjbjo6WXg@public.gmane.org>,
==    Alex Strizhevsky <alexxst-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
==    Midge Shaojun Tan <ShaojunMidge.Tan-6C2+4RG2qWF0ubjbjo6WXg@public.gmane.org>,
==    "youtux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org" <youtux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
==    "linux-usb-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" <linux-usb-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
==    "netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" <netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
==Subject: Re: Is this 32-bit NCM?
==
==Enrico Mioso <mrkiko.rs-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
==
==> ... but out of curiosity: are NCM specs allowing to change order of things in
==> the package or not?
==> This is not to start philosofical falames or something, but to understand
==> better how things work. And, if they do: how much arbitrarily?
==
==Only the NTB header has a fixed location. The rest can be anywhere and
==in any order. Quoting from section 3 Data Transport:
==
==  "Within any given NTB, the NTH always must be first; but the other
==   items may occur in arbitrary order."
==
==
==Bjørn
==

^ permalink raw reply

* Re: [PATCH net-next V2 1/2] ethtool: Support for configurable RSS hash function
From: Edward Cree @ 2014-12-02 15:14 UTC (permalink / raw)
  To: Amir Vadai
  Cc: David S. Miller, Ben Hutchings, netdev,
	Solarflare linux maintainers, Shradha Shah
In-Reply-To: <1417530049-6943-2-git-send-email-amirv@mellanox.com>

On 02/12/14 14:20, Amir Vadai wrote:
> diff --git a/drivers/net/ethernet/sfc/ethtool.c b/drivers/net/ethernet/sfc/ethtool.c
> index cad258a..2ac07c9 100644
> --- a/drivers/net/ethernet/sfc/ethtool.c
> +++ b/drivers/net/ethernet/sfc/ethtool.c
> @@ -1086,19 +1086,29 @@ static u32 efx_ethtool_get_rxfh_indir_size(struct net_device *net_dev)
>  		0 : ARRAY_SIZE(efx->rx_indir_table));
>  }
>  
> -static int efx_ethtool_get_rxfh(struct net_device *net_dev, u32 *indir, u8 *key)
> +static int efx_ethtool_get_rxfh(struct net_device *net_dev, u32 *indir, u8 *key,
> +				u8 *hfunc)
>  {
>  	struct efx_nic *efx = netdev_priv(net_dev);
>  
> -	memcpy(indir, efx->rx_indir_table, sizeof(efx->rx_indir_table));
> +	if (hfunc)
> +		*hfunc = ETH_RSS_HASH_UNKNOWN;
This should be ETH_RSS_HASH_TOP, especially as that's what you test
against in the _set_ function below.
(I don't know if we responded to your query.  If not, I can confirm
we're using Toeplitz.)
> +	if (indir)
> +		memcpy(indir, efx->rx_indir_table, sizeof(efx->rx_indir_table));
>  	return 0;
>  }
>  
> -static int efx_ethtool_set_rxfh(struct net_device *net_dev,
> -				const u32 *indir, const u8 *key)
> +static int efx_ethtool_set_rxfh(struct net_device *net_dev, const u32 *indir,
> +				const u8 *key, const u8 hfunc)
>  {
>  	struct efx_nic *efx = netdev_priv(net_dev);
>  
> +	/* We do not allow change in unsupported parameters */
> +	if (key ||
> +	    (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP))
> +		return -EOPNOTSUPP;
> +	if (!indir)
> +		return 0;
>  	memcpy(efx->rx_indir_table, indir, sizeof(efx->rx_indir_table));
>  	efx->type->rx_push_rss_config(efx);
>  	return 0;
-Edward
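
To illustrate the point about get and set agreeing, here is a small user-space sketch. The HASH_* values are placeholders picked for the example only (they are not the ETH_RSS_HASH_* definitions from the patch), and the sketch_* functions are hypothetical stand-ins for the driver callbacks.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values for this sketch only, not the real ethtool ones. */
enum {
	HASH_NO_CHANGE = 0,
	HASH_TOP       = 1,	/* Toeplitz */
	HASH_OTHER     = 2,
};

/* Report the hash function the hardware really uses. */
int sketch_get_rxfh(uint8_t *hfunc)
{
	if (hfunc)
		*hfunc = HASH_TOP;
	return 0;
}

/* Accept only "no change" or the one supported function; reject keys. */
int sketch_set_rxfh(const uint8_t *key, uint8_t hfunc)
{
	if (key || (hfunc != HASH_NO_CHANGE && hfunc != HASH_TOP))
		return -EOPNOTSUPP;
	return 0;
}
```

With this shape, whatever the get side reports can be handed straight back to the set side without an error, which is the property userspace would presumably expect.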

^ permalink raw reply

* Re: [PATCH] net: mvneta: fix Tx interrupt delay
From: Eric Dumazet @ 2014-12-02 15:17 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: netdev, Maggie Mae Roxas, Thomas Petazzoni, Gregory CLEMENT,
	Ezequiel Garcia
In-Reply-To: <20141202132412.GC16347@1wt.eu>

On Tue, 2014-12-02 at 14:24 +0100, Willy Tarreau wrote:

> Thanks but I'm not sure I entirely understand the concept. Is it to
> notify the sender that the packets were already queued for the NIC ?
> And if so, how does that improve the situation ? I'm sorry if this
> sounds like a stupid question, it's just that the concept by itself
> is not clear to me.

http://lwn.net/Articles/454390/

BQL is the first step in fighting so-called bufferbloat.

https://www.ietf.org/proceedings/86/slides/slides-86-iccrg-0.pdf

^ permalink raw reply

* Re: [PATCH net-next V2 1/2] ethtool: Support for configurable RSS hash function
From: Eyal Perry @ 2014-12-02 15:21 UTC (permalink / raw)
  To: Edward Cree, Amir Vadai
  Cc: David S. Miller, Ben Hutchings, netdev,
	Solarflare linux maintainers, Shradha Shah
In-Reply-To: <547DD74C.9070603@solarflare.com>

On 12/2/2014 17:14 PM, Edward Cree wrote:
> On 02/12/14 14:20, Amir Vadai wrote:
>> diff --git a/drivers/net/ethernet/sfc/ethtool.c b/drivers/net/ethernet/sfc/ethtool.c
>> index cad258a..2ac07c9 100644
>> --- a/drivers/net/ethernet/sfc/ethtool.c
>> +++ b/drivers/net/ethernet/sfc/ethtool.c
>> @@ -1086,19 +1086,29 @@ static u32 efx_ethtool_get_rxfh_indir_size(struct net_device *net_dev)
>>  		0 : ARRAY_SIZE(efx->rx_indir_table));
>>  }
>>  
>> -static int efx_ethtool_get_rxfh(struct net_device *net_dev, u32 *indir, u8 *key)
>> +static int efx_ethtool_get_rxfh(struct net_device *net_dev, u32 *indir, u8 *key,
>> +				u8 *hfunc)
>>  {
>>  	struct efx_nic *efx = netdev_priv(net_dev);
>>  
>> -	memcpy(indir, efx->rx_indir_table, sizeof(efx->rx_indir_table));
>> +	if (hfunc)
>> +		*hfunc = ETH_RSS_HASH_UNKNOWN;
> This should be ETH_RSS_HASH_TOP, especially as that's what you test
> against in the _set_ function below.
> (I don't know if we responded to your query.  If not, I can confirm
> we're using Toeplitz.)
Sending V3...
>> +	if (indir)
>> +		memcpy(indir, efx->rx_indir_table, sizeof(efx->rx_indir_table));
>>  	return 0;
>>  }
>>  
>> -static int efx_ethtool_set_rxfh(struct net_device *net_dev,
>> -				const u32 *indir, const u8 *key)
>> +static int efx_ethtool_set_rxfh(struct net_device *net_dev, const u32 *indir,
>> +				const u8 *key, const u8 hfunc)
>>  {
>>  	struct efx_nic *efx = netdev_priv(net_dev);
>>  
>> +	/* We do not allow change in unsupported parameters */
>> +	if (key ||
>> +	    (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP))
>> +		return -EOPNOTSUPP;
>> +	if (!indir)
>> +		return 0;
>>  	memcpy(efx->rx_indir_table, indir, sizeof(efx->rx_indir_table));
>>  	efx->type->rx_push_rss_config(efx);
>>  	return 0;
> -Edward

^ permalink raw reply

* Re: Is this 32-bit NCM?
From: Enrico Mioso @ 2014-12-02 15:28 UTC (permalink / raw)
  To: Kevin Zhu
  Cc: Bjørn Mork, Eli Britstein, Alex Strizhevsky,
	Midge Shaojun Tan, youtux@gmail.com, linux-usb@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <1417532733483.89987@audiocodes.com>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2923 bytes --]

... And what do you think about the source code of their ndis driver?
At least we now know the device works with it, so we have something to mimic :D
Thank you for your work and patience, Kevin.

On Tue, 2 Dec 2014, Kevin Zhu wrote:

==Date: Tue, 2 Dec 2014 16:04:25
==From: Kevin Zhu <Mingying.Zhu@audiocodes.com>
==To: Enrico Mioso <mrkiko.rs@gmail.com>, Bjørn Mork <bjorn@mork.no>
==Cc: Eli Britstein <Eli.Britstein@audiocodes.com>,
==    Alex Strizhevsky <alexxst@gmail.com>,
==    Midge Shaojun  Tan <ShaojunMidge.Tan@audiocodes.com>,
==    "youtux@gmail.com" <youtux@gmail.com>,
==    "linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>,
==    "netdev@vger.kernel.org" <netdev@vger.kernel.org>
==Subject: Re: Is this 32-bit NCM?
==
==I do not understand why the wSequence matters. By the way, I think I see some NDPs are right after NTH headers in the windows capture.
==
==________________________________________
==From: Enrico Mioso <mrkiko.rs@gmail.com>
==Sent: Tuesday, December 2, 2014 21:53
==To: Bjørn Mork
==Cc: Kevin Zhu; Eli Britstein; Alex Strizhevsky; Midge Shaojun  Tan; youtux@gmail.com; linux-usb@vger.kernel.org; netdev@vger.kernel.org
==Subject: Re: Is this 32-bit NCM?
==
==Thank you very much Bjorn.
==
==
==On Tue, 2 Dec 2014, Bjørn Mork wrote:
==
====Date: Tue, 2 Dec 2014 14:37:03
====From: Bjørn Mork <bjorn@mork.no>
====To: Enrico Mioso <mrkiko.rs@gmail.com>
====Cc: Kevin Zhu <Mingying.Zhu@audiocodes.com>,
====    Eli Britstein <Eli.Britstein@audiocodes.com>,
====    Alex Strizhevsky <alexxst@gmail.com>,
====    Midge Shaojun Tan <ShaojunMidge.Tan@audiocodes.com>,
====    "youtux@gmail.com" <youtux@gmail.com>,
====    "linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>,
====    "netdev@vger.kernel.org" <netdev@vger.kernel.org>
====Subject: Re: Is this 32-bit NCM?
====
====Enrico Mioso <mrkiko.rs@gmail.com> writes:
====
====> ... but out of curiosity: are NCM specs allowing to change order of things in
====> the package or not?
====> This is not to start philosofical falames or something, but to understand
====> better how things work. And, if they do: how much arbitrarily?
====
====Only the NTB header has a fixed location. The rest can be anywhere and
====in any order. Quoting from section 3 Data Transport:
====
====  "Within any given NTB, the NTH always must be first; but the other
====   items may occur in arbitrary order."
====
====
====Bjørn
====

^ permalink raw reply

* [PATCH 1/1] net: dsa: replacing the hard-coded sized array "dsa_switch" by dynamic one
From: Andrey Volkov @ 2014-12-02 14:50 UTC (permalink / raw)
  To: netdev; +Cc: Florian Fainelli

Hello,

While developing one of our devices (which has a large number of onboard
switches, more than 6), I ran into this ancient, I hope, restriction in the
'struct dsa_switch_tree' definition. This simple patch removes the restriction
and also makes dsa_switch_tree more scalable for the "usual" 1-2 switch
configuration.

P.S. I plan to fix the hardcoded number of ports too, but that is not as easy
as the number of switches. If anyone has objections or suggestions, I'll be
happy to discuss them.

Signed-off-by: Andrey Volkov <andrey.volkov@nexvision.fr>
---
 include/net/dsa.h |    3 +--
 net/dsa/dsa.c     |    7 +++----
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index ed3c34b..733db2e 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -28,7 +28,6 @@ enum dsa_tag_protocol {
        DSA_TAG_PROTO_BRCM,
 };
 
-#define DSA_MAX_SWITCHES       4
 #define DSA_MAX_PORTS          12
 
 struct dsa_chip_data {
@@ -117,7 +116,7 @@ struct dsa_switch_tree {
        /*
         * Data for the individual switch chips.
         */
-       struct dsa_switch       *ds[DSA_MAX_SWITCHES];
+       struct dsa_switch       *ds[];
 };
 
 struct dsa_switch {
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 322c778..c081a19 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -604,8 +604,6 @@ static int dsa_of_probe(struct platform_device *pdev)
        pdev->dev.platform_data = pd;
        pd->netdev = &ethernet_dev->dev;
        pd->nr_chips = of_get_child_count(np);
-       if (pd->nr_chips > DSA_MAX_SWITCHES)
-               pd->nr_chips = DSA_MAX_SWITCHES;
 
        pd->chip = kcalloc(pd->nr_chips, sizeof(struct dsa_chip_data),
                           GFP_KERNEL);
@@ -717,7 +715,7 @@ static int dsa_probe(struct platform_device *pdev)
                pd = pdev->dev.platform_data;
        }
 
-       if (pd == NULL || pd->netdev == NULL)
+       if (pd == NULL || pd->netdev == NULL || pd->nr_chips == 0)
                return -EINVAL;
 
        dev = dev_to_net_device(pd->netdev);
@@ -732,7 +730,8 @@ static int dsa_probe(struct platform_device *pdev)
                goto out;
        }
 
-       dst = kzalloc(sizeof(*dst), GFP_KERNEL);
+       dst = kzalloc(sizeof(*dst) +
+                       sizeof(struct dsa_switch *) * pd->nr_chips, GFP_KERNEL);
        if (dst == NULL) {
                dev_put(dev);
                ret = -ENOMEM;
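
The allocation pattern used in the patch (a struct ending in a flexible array member, sized at allocation time) can be sketched in user space roughly as follows. The struct and function names are hypothetical, calloc() stands in for kzalloc(), and the overflow check is an addition of this sketch rather than something the patch does.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct sketch_switch {
	int index;
};

/* Tree with a flexible array member instead of a fixed-size array. */
struct sketch_tree {
	size_t nr_chips;
	struct sketch_switch *ds[];	/* nr_chips entries follow */
};

/* Allocate a zeroed tree with room for nr_chips switch pointers. */
struct sketch_tree *sketch_tree_alloc(size_t nr_chips)
{
	struct sketch_tree *t;

	if (nr_chips == 0)
		return NULL;
	/* Guard the size computation against overflow. */
	if (nr_chips > (SIZE_MAX - sizeof(*t)) / sizeof(t->ds[0]))
		return NULL;

	t = calloc(1, sizeof(*t) + nr_chips * sizeof(t->ds[0]));
	if (t)
		t->nr_chips = nr_chips;
	return t;
}
```

Once the array is sized this way, every loop over ds[] has to be bounded by the stored chip count rather than by a compile-time maximum.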

^ permalink raw reply related

* Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
From: Flavio Leitner @ 2014-12-02 15:44 UTC (permalink / raw)
  To: Du, Fan
  Cc: 'Jason Wang', netdev@vger.kernel.org, davem@davemloft.net,
	fw@strlen.de
In-Reply-To: <5A90DA2E42F8AE43BC4A093BF0678848DED92B@SHSMSX104.ccr.corp.intel.com>

On Sun, Nov 30, 2014 at 10:08:32AM +0000, Du, Fan wrote:
> 
> 
> >-----Original Message-----
> >From: Jason Wang [mailto:jasowang@redhat.com]
> >Sent: Friday, November 28, 2014 3:02 PM
> >To: Du, Fan
> >Cc: netdev@vger.kernel.org; davem@davemloft.net; fw@strlen.de; Du, Fan
> >Subject: Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
> >
> >
> >
> >On Fri, Nov 28, 2014 at 2:33 PM, Fan Du <fan.du@intel.com> wrote:
> >> Test scenario: two KVM guests sitting in different hosts communicate
> >> to each other with a vxlan tunnel.
> >>
> >> All interface MTU is default 1500 Bytes, from guest point of view, its
> >> skb gso_size could be as bigger as 1448Bytes, however after guest skb
> >> goes through vxlan encapuslation, individual segments length of a gso
> >> packet could exceed physical NIC MTU 1500, which will be lost at
> >> recevier side.
> >>
> >> So it's possible in virtualized environment, locally created skb len
> >> after encapslation could be bigger than underlayer MTU. In such case,
> >> it's reasonable to do GSO first, then fragment any packet bigger than
> >> MTU as possible.
> >>
> >> +---------------+ TX     RX +---------------+
> >> |   KVM Guest   | -> ... -> |   KVM Guest   |
> >> +-+-----------+-+           +-+-----------+-+
> >>   |Qemu/VirtIO|               |Qemu/VirtIO|
> >>   +-----------+               +-----------+
> >>        |                            |
> >>        v tap0                  tap0 v
> >>   +-----------+               +-----------+
> >>   | ovs bridge|               | ovs bridge|
> >>   +-----------+               +-----------+
> >>        | vxlan                vxlan |
> >>        v                            v
> >>   +-----------+               +-----------+
> >>   |    NIC    |    <------>   |    NIC    |
> >>   +-----------+               +-----------+
> >>
> >> Steps to reproduce:
> >>  1. Using kernel builtin openvswitch module to setup ovs bridge.
> >>  2. Runing iperf without -M, communication will stuck.
> >
> >Is this issue specific to ovs or ipv4? Path MTU discovery should help in this case I
> >believe.
> 
> Problem here is host stack push local over-sized gso skb down to NIC, and perform GSO there
> without any further ip segmentation.
> 
> Reasonable behavior is do gso first at ip level, if gso-ed skb is bigger than MTU && df is set, 
> Then push ICMP_DEST_UNREACH/ICMP_FRAG_NEEDED message back to sender to adjust mtu.
> 
> For PMTU to work, that's another issue I will try to address later on.
> 
> >>
> >>
> >> Signed-off-by: Fan Du <fan.du@intel.com>
> >> ---
> >>  net/ipv4/ip_output.c |    7 ++++---
> >>  1 files changed, 4 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index
> >> bc6471d..558b5f8 100644
> >> --- a/net/ipv4/ip_output.c
> >> +++ b/net/ipv4/ip_output.c
> >> @@ -217,9 +217,10 @@ static int ip_finish_output_gso(struct sk_buff
> >> *skb)
> >>  	struct sk_buff *segs;
> >>  	int ret = 0;
> >>
> >> -	/* common case: locally created skb or seglen is <= mtu */
> >> -	if (((IPCB(skb)->flags & IPSKB_FORWARDED) == 0) ||
> >> -	      skb_gso_network_seglen(skb) <= ip_skb_dst_mtu(skb))
> >> +	/* Both locally created skb and forwarded skb could exceed
> >> +	 * MTU size, so make a unified rule for them all.
> >> +	 */
> >> +	if (skb_gso_network_seglen(skb) <= ip_skb_dst_mtu(skb))
> >>  		return ip_finish_output2(skb);


Are you using the kernel's vxlan device or openvswitch's vxlan device?

Because for the kernel's vxlan devices the MTU accounts for the header
overhead, so I believe your patch would work.  However, the MTU is
not visible for ovs's vxlan devices, so it wouldn't work there.

fbl
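
To put numbers on the header overhead, here is a rough sketch of the arithmetic. It assumes an IPv4 underlay with no IP options and no VLAN tags; the helper names are made up for the example.

```c
#include <stdbool.h>

/* Bytes added in front of the guest's IP packet by VXLAN over IPv4. */
enum {
	INNER_ETH_HLEN  = 14,	/* guest Ethernet header, carried as payload */
	VXLAN_HLEN      = 8,
	OUTER_UDP_HLEN  = 8,
	OUTER_IPV4_HLEN = 20,	/* assumes no IP options */
};

/* Length of the outer IP packet for a given inner IP packet length. */
unsigned int vxlan_outer_l3_len(unsigned int inner_l3_len)
{
	return inner_l3_len + INNER_ETH_HLEN + VXLAN_HLEN +
	       OUTER_UDP_HLEN + OUTER_IPV4_HLEN;
}

/* True if the encapsulated packet still fits the underlay MTU. */
bool vxlan_fits_mtu(unsigned int inner_l3_len, unsigned int underlay_mtu)
{
	return vxlan_outer_l3_len(inner_l3_len) <= underlay_mtu;
}
```

Under these assumptions a full 1500 byte guest packet becomes 1550 bytes on the wire side, so with a 1500 byte underlay MTU the guest would need to stay at or below 1450.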

^ permalink raw reply

* Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
From: Flavio Leitner @ 2014-12-02 15:48 UTC (permalink / raw)
  To: Thomas Graf
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, mst-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	'Jason Wang', Du, Fan,
	fw-HFFVJYpyMKqzQB+pC5nmwQ@public.gmane.org,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org
In-Reply-To: <20141201135225.GA16814-FZi0V3Vbi30CUdFEqe4BF2D2FQJk+8+b@public.gmane.org>

On Mon, Dec 01, 2014 at 01:52:25PM +0000, Thomas Graf wrote:
> On 11/30/14 at 10:08am, Du, Fan wrote:
> > >-----Original Message-----
> > >From: Jason Wang [mailto:jasowang@redhat.com]
> > >Sent: Friday, November 28, 2014 3:02 PM
> > >To: Du, Fan
> > >Cc: netdev@vger.kernel.org; davem@davemloft.net; fw@strlen.de; Du, Fan
> > >Subject: Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
> > >On Fri, Nov 28, 2014 at 2:33 PM, Fan Du <fan.du@intel.com> wrote:
> > >> Test scenario: two KVM guests sitting in different hosts communicate
> > >> to each other with a vxlan tunnel.
> > >>
> > >> All interface MTU is default 1500 Bytes, from guest point of view, its
> > >> skb gso_size could be as bigger as 1448Bytes, however after guest skb
> > >> goes through vxlan encapuslation, individual segments length of a gso
> > >> packet could exceed physical NIC MTU 1500, which will be lost at
> > >> recevier side.
> > >>
> > >> So it's possible in virtualized environment, locally created skb len
> > >> after encapslation could be bigger than underlayer MTU. In such case,
> > >> it's reasonable to do GSO first, then fragment any packet bigger than
> > >> MTU as possible.
> > >>
> > >> +---------------+ TX     RX +---------------+
> > >> |   KVM Guest   | -> ... -> |   KVM Guest   |
> > >> +-+-----------+-+           +-+-----------+-+
> > >>   |Qemu/VirtIO|               |Qemu/VirtIO|
> > >>   +-----------+               +-----------+
> > >>        |                            |
> > >>        v tap0                  tap0 v
> > >>   +-----------+               +-----------+
> > >>   | ovs bridge|               | ovs bridge|
> > >>   +-----------+               +-----------+
> > >>        | vxlan                vxlan |
> > >>        v                            v
> > >>   +-----------+               +-----------+
> > >>   |    NIC    |    <------>   |    NIC    |
> > >>   +-----------+               +-----------+
> > >>
> > >> Steps to reproduce:
> > >>  1. Using kernel builtin openvswitch module to setup ovs bridge.
> > >>  2. Runing iperf without -M, communication will stuck.
> > >
> > >Is this issue specific to ovs or ipv4? Path MTU discovery should help in this case I
> > >believe.
> > 
> > Problem here is host stack push local over-sized gso skb down to NIC, and perform GSO there
> > without any further ip segmentation.
> > 
> > Reasonable behavior is do gso first at ip level, if gso-ed skb is bigger than MTU && df is set, 
> > Then push ICMP_DEST_UNREACH/ICMP_FRAG_NEEDED message back to sender to adjust mtu.
> 
> Aside from this. I think Virtio should provide a MTU hint to the guest
> to adjust MTU in the vNIC to account for both overhead or support for
> jumbo frames in the underlay transparently without relying on PMTU or
> MSS hints. I remember we talked about this a while ago with at least
> Michael but haven't done actual code work on it yet.

What about containers or any other virtualization environment that
doesn't use Virtio?

fbl

^ permalink raw reply

* Re: [PATCH net-next] rtnetlink: delay RTM_DELLINK notification until after ndo_uninit()
From: Roopa Prabhu @ 2014-12-02 16:08 UTC (permalink / raw)
  To: Mahesh Bandewar; +Cc: netdev, David Miller, Eric Dumazet, Toshiaki Makita
In-Reply-To: <1417499650-29176-1-git-send-email-maheshb@google.com>

On 12/1/14, 9:54 PM, Mahesh Bandewar wrote:
> The commit 56bfa7ee7c ("unregister_netdevice : move RTM_DELLINK to
> until after ndo_uninit") tried to do this ealier but while doing so
> it created a problem. Unfortunately the delayed rtmsg_ifinfo() also
> delayed call to fill_info(). So this translated into asking driver
> to remove private state and then quert it's private state. This
> could have catastropic consequences.
>
> This change breaks the rtmsg_ifinfo() into two parts - one fills the
> skb by calling fill_info() prior to calling ndo_uninit() and the second
> part just sends the message using the skb filled earlier.
>
> It was brought to notice when last link is deleted from an ipvlan device
> when it has free-ed the port and the subsequent .fill_info() call is
> trying to get the info from the port.
>
> kernel: [  255.139429] ------------[ cut here ]------------
> kernel: [  255.139439] WARNING: CPU: 12 PID: 11173 at net/core/rtnetlink.c:2238 rtmsg_ifinfo+0x100/0x110()
> kernel: [  255.139493] Modules linked in: ipvlan bonding w1_therm ds2482 wire cdc_acm ehci_pci ehci_hcd i2c_dev i2c_i801 i2c_core msr cpuid bnx2x ptp pps_core mdio libcrc32c
> kernel: [  255.139513] CPU: 12 PID: 11173 Comm: ip Not tainted 3.18.0-smp-DEV #167
> kernel: [  255.139514] Hardware name: Intel RML,PCH/Ibis_QC_18, BIOS 1.0.10 05/15/2012
> kernel: [  255.139515]  0000000000000009 ffff880851b6b828 ffffffff815d87f4 00000000000000e0
> kernel: [  255.139516]  0000000000000000 ffff880851b6b868 ffffffff8109c29c 0000000000000000
> kernel: [  255.139518]  00000000ffffffa6 00000000000000d0 ffffffff81aaf580 0000000000000011
> kernel: [  255.139520] Call Trace:
> kernel: [  255.139527]  [<ffffffff815d87f4>] dump_stack+0x46/0x58
> kernel: [  255.139531]  [<ffffffff8109c29c>] warn_slowpath_common+0x8c/0xc0
> kernel: [  255.139540]  [<ffffffff8109c2ea>] warn_slowpath_null+0x1a/0x20
> kernel: [  255.139544]  [<ffffffff8150d570>] rtmsg_ifinfo+0x100/0x110
> kernel: [  255.139547]  [<ffffffff814f78b5>] rollback_registered_many+0x1d5/0x2d0
> kernel: [  255.139549]  [<ffffffff814f79cf>] unregister_netdevice_many+0x1f/0xb0
> kernel: [  255.139551]  [<ffffffff8150acab>] rtnl_dellink+0xbb/0x110
> kernel: [  255.139553]  [<ffffffff8150da90>] rtnetlink_rcv_msg+0xa0/0x240
> kernel: [  255.139557]  [<ffffffff81329283>] ? rhashtable_lookup_compare+0x43/0x80
> kernel: [  255.139558]  [<ffffffff8150d9f0>] ? __rtnl_unlock+0x20/0x20
> kernel: [  255.139562]  [<ffffffff8152cb11>] netlink_rcv_skb+0xb1/0xc0
> kernel: [  255.139563]  [<ffffffff8150a495>] rtnetlink_rcv+0x25/0x40
> kernel: [  255.139565]  [<ffffffff8152c398>] netlink_unicast+0x178/0x230
> kernel: [  255.139567]  [<ffffffff8152c75f>] netlink_sendmsg+0x30f/0x420
> kernel: [  255.139571]  [<ffffffff814e0b0c>] sock_sendmsg+0x9c/0xd0
> kernel: [  255.139575]  [<ffffffff811d1d7f>] ? rw_copy_check_uvector+0x6f/0x130
> kernel: [  255.139577]  [<ffffffff814e11c9>] ? copy_msghdr_from_user+0x139/0x1b0
> kernel: [  255.139578]  [<ffffffff814e1774>] ___sys_sendmsg+0x304/0x310
> kernel: [  255.139581]  [<ffffffff81198723>] ? handle_mm_fault+0xca3/0xde0
> kernel: [  255.139585]  [<ffffffff811ebc4c>] ? destroy_inode+0x3c/0x70
> kernel: [  255.139589]  [<ffffffff8108e6ec>] ? __do_page_fault+0x20c/0x500
> kernel: [  255.139597]  [<ffffffff811e8336>] ? dput+0xb6/0x190
> kernel: [  255.139606]  [<ffffffff811f05f6>] ? mntput+0x26/0x40
> kernel: [  255.139611]  [<ffffffff811d2b94>] ? __fput+0x174/0x1e0
> kernel: [  255.139613]  [<ffffffff814e2129>] __sys_sendmsg+0x49/0x90
> kernel: [  255.139615]  [<ffffffff814e2182>] SyS_sendmsg+0x12/0x20
> kernel: [  255.139617]  [<ffffffff815df092>] system_call_fastpath+0x12/0x17
> kernel: [  255.139619] ---[ end trace 5e6703e87d984f6b ]---

Interestingly, I have never seen this. We use this heavily with most
other logical devices, which tells me most logical devices do have
checks in their fill_info.
The patch idea is good. My only concern is stale information in the
DELLINK notification: ndo_uninit() does a lot of cleanup and sends
newlinks for some of these cleanup changes. With your patch, the
dellink notification skb probably contains information that has
already been deleted by ndo_uninit()?
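
To make the concern concrete, here is a tiny user-space model of the fill-early, send-late split. Everything in it (the sketch_* names, the nr_ports field) is hypothetical and only mimics the ordering, not the real rtnetlink code.

```c
#include <string.h>

struct sketch_dev {
	char name[16];
	int nr_ports;
};

/* Stands in for the pre-filled notification skb. */
struct sketch_msg {
	char name[16];
	int nr_ports;
};

/* Phase 1: snapshot device state before teardown. */
void sketch_fill(const struct sketch_dev *dev, struct sketch_msg *msg)
{
	memcpy(msg->name, dev->name, sizeof(msg->name));
	msg->nr_ports = dev->nr_ports;
}

/* Teardown changes state that the snapshot has already captured. */
void sketch_uninit(struct sketch_dev *dev)
{
	dev->nr_ports = 0;
}

/* Phase 2: "send" the snapshot; returns what a listener would see. */
int sketch_send(const struct sketch_msg *msg)
{
	return msg->nr_ports;
}
```

In this model a listener still sees the pre-teardown port count in the final message even though the device no longer has any ports, which is the kind of staleness I mean.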





>
> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
> Report-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
> Cc: David S. Miller <davem@davemloft.net>
> ---
>   drivers/net/bonding/bond_main.c |  4 ++--
>   include/linux/rtnetlink.h       |  6 +++++-
>   include/net/bonding.h           |  8 ++++----
>   net/core/dev.c                  | 26 ++++++++++++++++----------
>   net/core/rtnetlink.c            | 20 ++++++++++++++++----
>   5 files changed, 43 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 184c434ae305..06206a1439a4 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1135,7 +1135,7 @@ static int bond_master_upper_dev_link(struct net_device *bond_dev,
>   	if (err)
>   		return err;
>   	slave_dev->flags |= IFF_SLAVE;
> -	rtmsg_ifinfo(RTM_NEWLINK, slave_dev, IFF_SLAVE, GFP_KERNEL);
> +	rtmsg_ifinfo(RTM_NEWLINK, slave_dev, IFF_SLAVE, GFP_KERNEL, false);
>   	return 0;
>   }
>   
> @@ -1144,7 +1144,7 @@ static void bond_upper_dev_unlink(struct net_device *bond_dev,
>   {
>   	netdev_upper_dev_unlink(slave_dev, bond_dev);
>   	slave_dev->flags &= ~IFF_SLAVE;
> -	rtmsg_ifinfo(RTM_NEWLINK, slave_dev, IFF_SLAVE, GFP_KERNEL);
> +	rtmsg_ifinfo(RTM_NEWLINK, slave_dev, IFF_SLAVE, GFP_KERNEL, false);
>   }
>   
>   static struct slave *bond_alloc_slave(struct bonding *bond)
> diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
> index 6cacbce1a06c..545dd0b8c83d 100644
> --- a/include/linux/rtnetlink.h
> +++ b/include/linux/rtnetlink.h
> @@ -16,7 +16,11 @@ extern int rtnetlink_put_metrics(struct sk_buff *skb, u32 *metrics);
>   extern int rtnl_put_cacheinfo(struct sk_buff *skb, struct dst_entry *dst,
>   			      u32 id, long expires, u32 error);
>   
> -void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change, gfp_t flags);
> +struct sk_buff *rtmsg_ifinfo(int type, struct net_device *dev, unsigned change,
> +			     gfp_t flags, bool fill_only);
> +void rtmsg_ifinfo_send(struct sk_buff *skb, struct net_device *dev,
> +		       gfp_t flags);
> +
>   
>   /* RTNL is used as a global lock for all changes to network configuration  */
>   extern void rtnl_lock(void);
> diff --git a/include/net/bonding.h b/include/net/bonding.h
> index 983a94b86b95..ea09f6c5af51 100644
> --- a/include/net/bonding.h
> +++ b/include/net/bonding.h
> @@ -315,7 +315,7 @@ static inline void bond_set_active_slave(struct slave *slave)
>   {
>   	if (slave->backup) {
>   		slave->backup = 0;
> -		rtmsg_ifinfo(RTM_NEWLINK, slave->dev, 0, GFP_ATOMIC);
> +		rtmsg_ifinfo(RTM_NEWLINK, slave->dev, 0, GFP_ATOMIC, false);
>   	}
>   }
>   
> @@ -323,7 +323,7 @@ static inline void bond_set_backup_slave(struct slave *slave)
>   {
>   	if (!slave->backup) {
>   		slave->backup = 1;
> -		rtmsg_ifinfo(RTM_NEWLINK, slave->dev, 0, GFP_ATOMIC);
> +		rtmsg_ifinfo(RTM_NEWLINK, slave->dev, 0, GFP_ATOMIC, false);
>   	}
>   }
>   
> @@ -335,7 +335,7 @@ static inline void bond_set_slave_state(struct slave *slave,
>   
>   	slave->backup = slave_state;
>   	if (notify) {
> -		rtmsg_ifinfo(RTM_NEWLINK, slave->dev, 0, GFP_ATOMIC);
> +		rtmsg_ifinfo(RTM_NEWLINK, slave->dev, 0, GFP_ATOMIC, false);
>   		slave->should_notify = 0;
>   	} else {
>   		if (slave->should_notify)
> @@ -365,7 +365,7 @@ static inline void bond_slave_state_notify(struct bonding *bond)
>   
>   	bond_for_each_slave(bond, tmp, iter) {
>   		if (tmp->should_notify) {
> -			rtmsg_ifinfo(RTM_NEWLINK, tmp->dev, 0, GFP_ATOMIC);
> +			rtmsg_ifinfo(RTM_NEWLINK, tmp->dev, 0, GFP_ATOMIC, false);
>   			tmp->should_notify = 0;
>   		}
>   	}
> diff --git a/net/core/dev.c b/net/core/dev.c
> index ac4836241a96..29bc78d5e6cb 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1230,7 +1230,7 @@ void netdev_state_change(struct net_device *dev)
>   		change_info.flags_changed = 0;
>   		call_netdevice_notifiers_info(NETDEV_CHANGE, dev,
>   					      &change_info.info);
> -		rtmsg_ifinfo(RTM_NEWLINK, dev, 0, GFP_KERNEL);
> +		rtmsg_ifinfo(RTM_NEWLINK, dev, 0, GFP_KERNEL, false);
>   	}
>   }
>   EXPORT_SYMBOL(netdev_state_change);
> @@ -1319,7 +1319,7 @@ int dev_open(struct net_device *dev)
>   	if (ret < 0)
>   		return ret;
>   
> -	rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING, GFP_KERNEL);
> +	rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING, GFP_KERNEL, false);
>   	call_netdevice_notifiers(NETDEV_UP, dev);
>   
>   	return ret;
> @@ -1396,7 +1396,8 @@ static int dev_close_many(struct list_head *head)
>   	__dev_close_many(head);
>   
>   	list_for_each_entry_safe(dev, tmp, head, close_list) {
> -		rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING, GFP_KERNEL);
> +		rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING,
> +			     GFP_KERNEL, false);
>   		call_netdevice_notifiers(NETDEV_DOWN, dev);
>   		list_del_init(&dev->close_list);
>   	}
> @@ -5683,7 +5684,7 @@ void __dev_notify_flags(struct net_device *dev, unsigned int old_flags,
>   	unsigned int changes = dev->flags ^ old_flags;
>   
>   	if (gchanges)
> -		rtmsg_ifinfo(RTM_NEWLINK, dev, gchanges, GFP_ATOMIC);
> +		rtmsg_ifinfo(RTM_NEWLINK, dev, gchanges, GFP_ATOMIC, false);
>   
>   	if (changes & IFF_UP) {
>   		if (dev->flags & IFF_UP)
> @@ -5925,6 +5926,8 @@ static void rollback_registered_many(struct list_head *head)
>   	synchronize_net();
>   
>   	list_for_each_entry(dev, head, unreg_list) {
> +		struct sk_buff *skb = NULL;
> +
>   		/* Shutdown queueing discipline. */
>   		dev_shutdown(dev);
>   
> @@ -5934,6 +5937,10 @@ static void rollback_registered_many(struct list_head *head)
>   		*/
>   		call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
>   
> +		if (!dev->rtnl_link_ops ||
> +		    dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
> +			skb = rtmsg_ifinfo(RTM_DELLINK, dev, ~0U, GFP_KERNEL, true);
> +
>   		/*
>   		 *	Flush the unicast and multicast chains
>   		 */
> @@ -5943,9 +5950,8 @@ static void rollback_registered_many(struct list_head *head)
>   		if (dev->netdev_ops->ndo_uninit)
>   			dev->netdev_ops->ndo_uninit(dev);
>   
> -		if (!dev->rtnl_link_ops ||
> -		    dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
> -			rtmsg_ifinfo(RTM_DELLINK, dev, ~0U, GFP_KERNEL);
> +		if (skb)
> +			rtmsg_ifinfo_send(skb, dev, GFP_KERNEL);
>   
>   		/* Notifier chain MUST detach us all upper devices. */
>   		WARN_ON(netdev_has_any_upper_dev(dev));
> @@ -6334,7 +6340,7 @@ int register_netdevice(struct net_device *dev)
>   	 */
>   	if (!dev->rtnl_link_ops ||
>   	    dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
> -		rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
> +		rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL, false);
>   
>   out:
>   	return ret;
> @@ -6959,7 +6965,7 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
>   	call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
>   	rcu_barrier();
>   	call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev);
> -	rtmsg_ifinfo(RTM_DELLINK, dev, ~0U, GFP_KERNEL);
> +	rtmsg_ifinfo(RTM_DELLINK, dev, ~0U, GFP_KERNEL, false);
>   
>   	/*
>   	 *	Flush the unicast and multicast chains
> @@ -7000,7 +7006,7 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
>   	 *	Prevent userspace races by waiting until the network
>   	 *	device is fully setup before sending notifications.
>   	 */
> -	rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
> +	rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL, false);
>   
>   	synchronize_net();
>   	err = 0;
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index b9b7dfaf202b..1035d8cdbc08 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -2220,8 +2220,16 @@ static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
>   	return skb->len;
>   }
>   
> -void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
> -		  gfp_t flags)
> +void rtmsg_ifinfo_send(struct sk_buff *skb, struct net_device *dev, gfp_t flags)
> +{
> +	struct net *net = dev_net(dev);
> +
> +	rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, flags);
> +}
> +EXPORT_SYMBOL(rtmsg_ifinfo_send);
> +
> +struct sk_buff *rtmsg_ifinfo(int type, struct net_device *dev,
> +			     unsigned int change, gfp_t flags, bool fill_only)
>   {
>   	struct net *net = dev_net(dev);
>   	struct sk_buff *skb;
> @@ -2239,11 +2247,15 @@ void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
>   		kfree_skb(skb);
>   		goto errout;
>   	}
> +	if (fill_only)
> +		return skb;
> +
>   	rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, flags);
> -	return;
> +	return NULL;
>   errout:
>   	if (err < 0)
>   		rtnl_set_sk_err(net, RTNLGRP_LINK, err);
> +	return NULL;
>   }
>   EXPORT_SYMBOL(rtmsg_ifinfo);
>   
> @@ -3011,7 +3023,7 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi
>   	case NETDEV_JOIN:
>   		break;
>   	default:
> -		rtmsg_ifinfo(RTM_NEWLINK, dev, 0, GFP_KERNEL);
> +		rtmsg_ifinfo(RTM_NEWLINK, dev, 0, GFP_KERNEL, false);
>   		break;
>   	}
>   	return NOTIFY_DONE;

^ permalink raw reply

