Netdev List

* Re: critic on documentation of the network stack
From: Arkadiusz Miskiewicz @ 2014-01-26 21:33 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: netdev
In-Reply-To: <20140124032324.GO7269@order.stressinduktion.org>

On Friday 24 of January 2014, Hannes Frederic Sowa wrote:
> Hello!
> 
> After net-next is closed I wanted to put the following link here:
> 
>   <http://linux.slashdot.org/comments.pl?sid=4356053&cid=45184693>
> 
> I don't want to start a flamefest or come too close to someone but I
> fear some of the criticism is reasonable.  Maybe we can do better (I have
> to admit, I also hate writing documentation, e.g. have not yet sent the
> IP_PMTUDISC_INTERFACE man-page patches).
> 
> I try to start with some constructive discussion:
> 
> There are some great features in the network stack that some people miss
> because of a lack of documentation. One possible solution is documentation
> directly in the kernel, but mostly this is just written as a reference
> and the real wonderful stuff is only achieved by putting lots of those
> features correctly together.
> 
> Maybe this is the second or third time this was proposed but I'll try
> again: Would it make sense to just start slow and set up a wiki

Some other projects never merge patches if there is no documentation.

Maintainers could do the same, so that e.g. a kernel patch is merged only if
a man-pages patch (or other kind of documentation) is sent, too.

-- 
Arkadiusz Miśkiewicz, arekm / maven.pl

^ permalink raw reply

* Re: Freescale FEC packet loss
From: Marek Vasut @ 2014-01-26 19:12 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: netdev@vger.kernel.org, Frank Li, fugang.duan,
	fabio.estevam@freescale.com, Hector Palacios, linux-arm-kernel,
	Detlev Zundel, Eric Nelson, Matthew Garrett
In-Reply-To: <1390762590.2735.39.camel@deadeye.wl.decadent.org.uk>

On Sunday, January 26, 2014 at 07:56:30 PM, Ben Hutchings wrote:
> On Wed, 2014-01-22 at 22:55 +0100, Marek Vasut wrote:
> > Hi guys,
> > 
> > I am running stock Linux 3.13 on i.MX6Q SabreLite board. The CPU is
> > i.MX6Q TO 1.0 .
> > 
> > I am hitting a WARNING when I use the FEC ethernet to transfer data, thus
> > I started investigating this problem. TL;DR I am not able to figure this
> > problem out, so I am not attaching a patch :-(
> > 
> > Steps to reproduce:
> > -------------------
> > 1) Boot stock Linux 3.13 on i.MX6Q SabreLite board
> > 2) Plug an SD card into one of the SD slots (I use the full-size one)
> > 3) Plug a USB stick into one of the USB ports (I use the upper one)
> > 4) Plug an ethernet cable into the board
> > 
> >    -> Connect the other side into a gigabit-capable PC
> 
> [...]
> 
> I think there are known problems with 1000BASE-T on the Sabre Lite
> board.

This is actually an MX6-wide thing, not SabreLite-specific.

> Two possible workarounds are to limit the PHY to 100BASE-TX
> (should be doable with ethtool) or force it to be clock master for
> 1000BASE-T (requires a driver patch).

Can you please elaborate on the latter? I don't quite understand that.

> The vendor kernel apparently does both!

More like the vendor kernel papers over this bug.

> Matthew Garrett has been trying to implement a workaround in a
> clean way.

Do you have any pointers to this, please?

Best regards,
Marek Vasut

^ permalink raw reply

* Re: Freescale FEC packet loss
From: Ben Hutchings @ 2014-01-26 18:56 UTC (permalink / raw)
  To: Marek Vasut
  Cc: netdev@vger.kernel.org, Frank Li, fugang.duan,
	fabio.estevam@freescale.com, Hector Palacios, linux-arm-kernel,
	Detlev Zundel, Eric Nelson, Matthew Garrett
In-Reply-To: <201401222255.29467.marex@denx.de>

On Wed, 2014-01-22 at 22:55 +0100, Marek Vasut wrote:
> Hi guys,
> 
> I am running stock Linux 3.13 on i.MX6Q SabreLite board. The CPU is i.MX6Q TO 
> 1.0 .
> 
> I am hitting a WARNING when I use the FEC ethernet to transfer data, thus I 
> started investigating this problem. TL;DR I am not able to figure this problem 
> out, so I am not attaching a patch :-(
> 
> Steps to reproduce:
> -------------------
> 1) Boot stock Linux 3.13 on i.MX6Q SabreLite board
> 2) Plug an SD card into one of the SD slots (I use the full-size one)
> 3) Plug a USB stick into one of the USB ports (I use the upper one)
> 4) Plug an ethernet cable into the board
>    -> Connect the other side into a gigabit-capable PC
[...]

I think there are known problems with 1000BASE-T on the Sabre Lite
board.  Two possible workarounds are to limit the PHY to 100BASE-TX
(should be doable with ethtool) or force it to be clock master for
1000BASE-T (requires a driver patch).  The vendor kernel apparently does
both!  Matthew Garrett has been trying to implement a workaround in a
clean way.

Ben.

-- 
Ben Hutchings
If the facts do not conform to your theory, they must be disposed of.

^ permalink raw reply

* Re: [PATCH V2 1/2] net: add and use skb_gso_transport_seglen()
From: Eric Dumazet @ 2014-01-26 16:51 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev
In-Reply-To: <1390730297-10725-1-git-send-email-fw@strlen.de>

On Sun, 2014-01-26 at 10:58 +0100, Florian Westphal wrote:
> This moves part of Eric Dumazet's skb_gso_seglen helper from tbf sched to
> skbuff core so it may be reused by upcoming ip forwarding path patch.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  Changes since V1:
>  suggestions from Eric Dumazet:
>   - don't use uapi udp.h
>   - remove tcp.h include from tbf, it's not needed anymore
> 
>  include/linux/skbuff.h |  1 +
>  net/core/skbuff.c      | 25 +++++++++++++++++++++++++
>  net/sched/sch_tbf.c    | 13 +++----------
>  3 files changed, 29 insertions(+), 10 deletions(-)

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* ath9k ARM build error with v3.13-8330-g4ba9920
From: Josh Boyer @ 2014-01-26 15:59 UTC (permalink / raw)
  To: Sujith Manoharan, John W. Linville
  Cc: ath9k-devel, netdev, Linux-Kernel@Vger. Kernel. Org

Hi All,

This commit:

commit 4dc78c437a0a2ac152a2b2c5e91a814a6ef3599e
Author: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Date:   Wed Dec 18 09:53:26 2013 +0530

    ath9k: Fix RTC reset delay

    The delay that is required after issuing a RTC reset
    varies for each chip. Handle this properly.

    Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>

adds a udelay(10000) call to the ath9k driver.  This will cause a
build error on various ARM configs because the value passed to udelay
is too large:

ERROR: "__bad_udelay" [drivers/net/wireless/ath/ath9k/ath9k_hw.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2

Is the 10000 microsecond udelay really required?  I believe the limit
on ARM is 2000.  Perhaps something else could be done in this case?

josh
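
A minimal user-space sketch of one possible way around the limit mentioned above: split a long busy-wait into whole milliseconds plus a short remainder, so that no single microsecond-delay call exceeds the cap. The names MAX_UDELAY_US, fake_udelay, fake_mdelay and long_delay_us are hypothetical stand-ins, and the 2000 us cap is simply the figure quoted in this message. This is not the fix that was applied to ath9k.

```c
#include <assert.h>

/* Assumed cap on a single microsecond delay, taken from the message above. */
#define MAX_UDELAY_US 2000u

/* Counters standing in for real busy-wait primitives (hypothetical mocks). */
static unsigned int udelay_calls;
static unsigned int udelay_total_us;
static unsigned int mdelay_total_ms;

static void fake_udelay(unsigned int us)
{
	/* Models the rule that over-large values are rejected. */
	assert(us <= MAX_UDELAY_US);
	udelay_calls++;
	udelay_total_us += us;
}

static void fake_mdelay(unsigned int ms)
{
	mdelay_total_ms += ms;
}

/* Delay for 'us' microseconds without exceeding the per-call cap. */
static void long_delay_us(unsigned int us)
{
	if (us > MAX_UDELAY_US) {
		fake_mdelay(us / 1000u);	/* whole milliseconds */
		us %= 1000u;			/* leftover microseconds */
	}
	if (us)
		fake_udelay(us);
}
```

With this split, a request for 10000 us becomes a single 10 ms delay and never reaches the microsecond primitive at all.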

^ permalink raw reply

* [PATCH 2/2] net: stmmac: Log MAC address only once
From: Hans de Goede @ 2014-01-26 14:50 UTC (permalink / raw)
  To: Giuseppe Cavallaro; +Cc: netdev, Hans de Goede
In-Reply-To: <1390747844-25060-1-git-send-email-hdegoede@redhat.com>

Logging the MAC address on every if-up is not really useful, and is annoying
when there is no cable inserted and NetworkManager tries the ifup every 50
seconds.

Also change the log level from warning to info, as that is what it is.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 838d0b7..920b3c6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1523,9 +1523,9 @@ static void stmmac_check_ether_addr(struct stmmac_priv *priv)
 					     priv->dev->dev_addr, 0);
 		if (!is_valid_ether_addr(priv->dev->dev_addr))
 			eth_hw_addr_random(priv->dev);
+		pr_info("%s: device MAC address %pM\n", priv->dev->name,
+			priv->dev->dev_addr);
 	}
-	pr_warn("%s: device MAC address %pM\n", priv->dev->name,
-		priv->dev->dev_addr);
 }
 
 /**
-- 
1.8.5.3

^ permalink raw reply related

* [PATCH 1/2] net: stmmac: Silence PTP init errors on hw without PTP
From: Hans de Goede @ 2014-01-26 14:50 UTC (permalink / raw)
  To: Giuseppe Cavallaro; +Cc: netdev, Hans de Goede

Logging a PTP error on hw which simply does not support PTP is not very
useful. Moreover this message gets logged on every if-up, and if there is
no cable inserted NetworkManager will re-try the ifup every 50 seconds.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index ff514b5..838d0b7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1687,7 +1687,7 @@ static int stmmac_open(struct net_device *dev)
 	stmmac_mmc_setup(priv);
 
 	ret = stmmac_init_ptp(priv);
-	if (ret)
+	if (ret && ret != -EOPNOTSUPP)
 		pr_warn("%s: failed PTP initialisation\n", __func__);
 
 #ifdef CONFIG_STMMAC_DEBUG_FS
-- 
1.8.5.3
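
The check in the hunk above can be illustrated with a small user-space sketch: an init routine reports -EOPNOTSUPP when the feature is simply absent, and the caller warns only on other failures. mock_init_ptp and open_device are hypothetical names invented for this illustration, not stmmac functions.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a PTP init routine; has_ptp models the hardware. */
static int mock_init_ptp(int has_ptp, int fail)
{
	if (!has_ptp)
		return -EOPNOTSUPP;	/* feature absent, nothing went wrong */
	return fail ? -EIO : 0;
}

/* Returns 1 if a warning was logged, 0 otherwise. */
static int open_device(int has_ptp, int fail)
{
	int ret = mock_init_ptp(has_ptp, fail);

	/* Warn only on real failures, not on "not supported". */
	if (ret && ret != -EOPNOTSUPP) {
		fprintf(stderr, "%s: failed PTP initialisation (%d)\n",
			__func__, ret);
		return 1;
	}
	return 0;
}
```

Hardware without PTP then stays silent on every open, while a genuine initialisation error is still reported.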

^ permalink raw reply related

* Re: IPV6 routing problem
From: Rami Rosen @ 2014-01-26 13:57 UTC (permalink / raw)
  To: Cong Wang; +Cc: Sharat Masetty, Emmanuel Thierry, netdev
In-Reply-To: <CAHA+R7O5+0b9C=i7AY1Mtw9GghpaT3YUmz8AmGZSJKgGsBBVrA@mail.gmail.com>

Hi,

A shameless plug - the book is already published:

http://www.apress.com/9781430261964

And the slides are available in
"IPv6 in the Linux Kernel", under
http://ramirose.wix.com/ramirosen


Rami Rosen

On Fri, Jan 24, 2014 at 10:05 PM, Cong Wang <cwang@twopensource.com> wrote:
> On Fri, Jan 24, 2014 at 10:50 AM, Sharat Masetty <sharat04@gmail.com> wrote:
>> A general question: can you suggest good reference documentation to
>> understand the Linux kernel IPv6 routing and neighboring subsystem
>> better? The O'Reilly book does not have much about IPv6.
>
> Rami Rosen is writing a book on Linux networking which covers IPv6
> as well. You can also search the Internet for his slides on IPv6.

^ permalink raw reply

* 3.14-mw regression:  rtl8169 WARNING: DMA-API: exceeded 7 overlapping mappings of pfn 55ebe
From: Sander Eikelenboom @ 2014-01-26 10:55 UTC (permalink / raw)
  To: françois romieu; +Cc: netdev, linux-kernel@vger.kernel.org

Hi,

I have got a regression with a 3.14-mw kernel (last commit is 4ba9920e5e9c0e16b5ed24292d45322907bb9035):
It looks like it's related to the rtl8169 ...

--
Sander

Jan 26 11:36:26 serveerstertje kernel: [   89.105537] ------------[ cut here ]------------
Jan 26 11:36:26 serveerstertje kernel: [   89.116779] WARNING: CPU: 0 PID: 0 at lib/dma-debug.c:491 add_dma_entry+0x103/0x130()
Jan 26 11:36:26 serveerstertje kernel: [   89.128148] DMA-API: exceeded 7 overlapping mappings of pfn 55ebe
Jan 26 11:36:26 serveerstertje kernel: [   89.139397] Modules linked in:
Jan 26 11:36:26 serveerstertje kernel: [   89.150535] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-20140125-mw-pcireset+ #1
Jan 26 11:36:26 serveerstertje kernel: [   89.161784] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
Jan 26 11:36:26 serveerstertje kernel: [   89.172965]  0000000000000009 ffff88005f603838 ffffffff81acbcfa ffffffff822134e0
Jan 26 11:36:26 serveerstertje kernel: [   89.184156]  ffff88005f603888 ffff88005f603878 ffffffff810bdf62 ffff880000000000
Jan 26 11:36:26 serveerstertje kernel: [   89.195186]  0000000000055ebe 00000000ffffffef 0000000000000200 ffff8800592ea098
Jan 26 11:36:26 serveerstertje kernel: [   89.206227] Call Trace:
Jan 26 11:36:26 serveerstertje kernel: [   89.217027]  <IRQ>  [<ffffffff81acbcfa>] dump_stack+0x46/0x58
Jan 26 11:36:26 serveerstertje kernel: [   89.227907]  [<ffffffff810bdf62>] warn_slowpath_common+0x82/0xb0
Jan 26 11:36:26 serveerstertje kernel: [   89.238678]  [<ffffffff810be031>] warn_slowpath_fmt+0x41/0x50
Jan 26 11:36:26 serveerstertje kernel: [   89.249336]  [<ffffffff81471c5a>] ? active_pfn_read_overlap+0x3a/0x70
Jan 26 11:36:26 serveerstertje kernel: [   89.259904]  [<ffffffff814729e3>] add_dma_entry+0x103/0x130
Jan 26 11:36:26 serveerstertje kernel: [   89.270416]  [<ffffffff81472de6>] debug_dma_map_page+0x126/0x150
Jan 26 11:36:26 serveerstertje kernel: [   89.280840]  [<ffffffff81714686>] rtl8169_start_xmit+0x216/0xa20
Jan 26 11:36:26 serveerstertje kernel: [   89.291073]  [<ffffffff8194aaaa>] ? __kfree_skb+0x3a/0xb0
Jan 26 11:36:26 serveerstertje kernel: [   89.301252]  [<ffffffff81955a3f>] ? dev_queue_xmit_nit+0x1ef/0x260
Jan 26 11:36:26 serveerstertje kernel: [   89.311392]  [<ffffffff81955850>] ? dev_loopback_xmit+0x1e0/0x1e0
Jan 26 11:36:26 serveerstertje kernel: [   89.321418]  [<ffffffff81959b96>] dev_hard_start_xmit+0x2e6/0x4a0
Jan 26 11:36:26 serveerstertje kernel: [   89.331236]  [<ffffffff819778fe>] sch_direct_xmit+0xfe/0x280
Jan 26 11:36:26 serveerstertje kernel: [   89.341013]  [<ffffffff81959f8c>] __dev_queue_xmit+0x23c/0x630
Jan 26 11:36:26 serveerstertje kernel: [   89.350668]  [<ffffffff81959d50>] ? dev_hard_start_xmit+0x4a0/0x4a0
Jan 26 11:36:26 serveerstertje kernel: [   89.360264]  [<ffffffff81a00ce4>] ? ip_output+0x54/0xf0
Jan 26 11:36:26 serveerstertje kernel: [   89.369698]  [<ffffffff8195a39b>] dev_queue_xmit+0xb/0x10
Jan 26 11:36:26 serveerstertje kernel: [   89.379034]  [<ffffffff819ff2bb>] ip_finish_output+0x2cb/0x670
Jan 26 11:36:26 serveerstertje kernel: [   89.388373]  [<ffffffff81a00ce4>] ? ip_output+0x54/0xf0
Jan 26 11:36:26 serveerstertje kernel: [   89.397498]  [<ffffffff81a00ce4>] ip_output+0x54/0xf0
Jan 26 11:36:26 serveerstertje kernel: [   89.406584]  [<ffffffff819fc141>] ip_forward_finish+0x71/0x1a0
Jan 26 11:36:26 serveerstertje kernel: [   89.415534]  [<ffffffff819fc413>] ip_forward+0x1a3/0x440
Jan 26 11:36:26 serveerstertje kernel: [   89.424400]  [<ffffffff819f9f80>] ip_rcv_finish+0x150/0x650
Jan 26 11:36:26 serveerstertje kernel: [   89.433108]  [<ffffffff819faa1b>] ip_rcv+0x22b/0x370
Jan 26 11:36:26 serveerstertje kernel: [   89.441737]  [<ffffffff81a57322>] ? packet_rcv_spkt+0x42/0x190
Jan 26 11:36:26 serveerstertje kernel: [   89.450226]  [<ffffffff81957382>] __netif_receive_skb_core+0x6d2/0x8a0
Jan 26 11:36:26 serveerstertje kernel: [   89.458687]  [<ffffffff81956dc4>] ? __netif_receive_skb_core+0x114/0x8a0
Jan 26 11:36:26 serveerstertje kernel: [   89.467109]  [<ffffffff81008f50>] ? xen_clocksource_read+0x20/0x30
Jan 26 11:36:26 serveerstertje kernel: [   89.475362]  [<ffffffff81116e09>] ? getnstimeofday+0x9/0x30
Jan 26 11:36:26 serveerstertje kernel: [   89.483548]  [<ffffffff8195756c>] __netif_receive_skb+0x1c/0x70
Jan 26 11:36:26 serveerstertje kernel: [   89.491608]  [<ffffffff819575de>] netif_receive_skb_internal+0x1e/0xf0
Jan 26 11:36:26 serveerstertje kernel: [   89.499596]  [<ffffffff81958ac0>] napi_gro_receive+0x70/0xa0
Jan 26 11:36:26 serveerstertje kernel: [   89.507486]  [<ffffffff81711673>] rtl8169_poll+0x2d3/0x680
Jan 26 11:36:26 serveerstertje kernel: [   89.515222]  [<ffffffff81957a81>] net_rx_action+0x161/0x260
Jan 26 11:36:26 serveerstertje kernel: [   89.523097]  [<ffffffff810c28dd>] __do_softirq+0x11d/0x250
Jan 26 11:36:26 serveerstertje kernel: [   89.530973]  [<ffffffff810c2d72>] irq_exit+0xa2/0xd0
Jan 26 11:36:26 serveerstertje kernel: [   89.538915]  [<ffffffff814f94bf>] xen_evtchn_do_upcall+0x2f/0x40
Jan 26 11:36:26 serveerstertje kernel: [   89.546876]  [<ffffffff81ad83de>] xen_do_hypervisor_callback+0x1e/0x30
Jan 26 11:36:26 serveerstertje kernel: [   89.554591]  <EOI>  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
Jan 26 11:36:26 serveerstertje kernel: [   89.562139]  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
Jan 26 11:36:26 serveerstertje kernel: [   89.569503]  [<ffffffff81008c70>] ? xen_safe_halt+0x10/0x20
Jan 26 11:36:26 serveerstertje kernel: [   89.576788]  [<ffffffff81018748>] ? default_idle+0x18/0x20
Jan 26 11:36:26 serveerstertje kernel: [   89.583863]  [<ffffffff81018f5e>] ? arch_cpu_idle+0x2e/0x40
Jan 26 11:36:26 serveerstertje kernel: [   89.590627]  [<ffffffff8110b511>] ? cpu_startup_entry+0x91/0x1e0
Jan 26 11:36:26 serveerstertje kernel: [   89.597184]  [<ffffffff81ac0497>] ? rest_init+0xb7/0xc0
Jan 26 11:36:26 serveerstertje kernel: [   89.603507]  [<ffffffff81ac03e0>] ? csum_partial_copy_generic+0x170/0x170
Jan 26 11:36:26 serveerstertje kernel: [   89.609631]  [<ffffffff8230ef1c>] ? start_kernel+0x409/0x416
Jan 26 11:36:26 serveerstertje kernel: [   89.615490]  [<ffffffff8230e912>] ? repair_env_string+0x5e/0x5e
Jan 26 11:36:26 serveerstertje kernel: [   89.621197]  [<ffffffff8230e5f8>] ? x86_64_start_reservations+0x2a/0x2c
Jan 26 11:36:26 serveerstertje kernel: [   89.626592]  [<ffffffff82311e26>] ? xen_start_kernel+0x584/0x586
Jan 26 11:36:26 serveerstertje kernel: [   89.631933] ---[ end trace 206b59d1fe29b5a7 ]---

^ permalink raw reply

* [PATCH] ath9k: Fix uninitialized variable in ath9k_has_tx_pending()
From: Geert Uytterhoeven @ 2014-01-26 10:53 UTC (permalink / raw)
  To: Felix Fietkau, John W. Linville, QCA ath9k Development
  Cc: linux-wireless, ath9k-devel, netdev, linux-kernel,
	Geert Uytterhoeven

drivers/net/wireless/ath/ath9k/main.c: In function ‘ath9k_has_tx_pending’:
drivers/net/wireless/ath/ath9k/main.c:1869: warning: ‘npend’ may be used uninitialized in this function

Introduced by commit 10e2318103f5941aa70c318afe34bc41f1b98529 ("ath9k:
optimize ath9k_flush").

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
---
This doesn't look like a false positive to me

 drivers/net/wireless/ath/ath9k/main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 5924f72dd493..f08d5051c13f 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -1866,7 +1866,7 @@ static void ath9k_set_coverage_class(struct ieee80211_hw *hw, u8 coverage_class)
 
 static bool ath9k_has_tx_pending(struct ath_softc *sc)
 {
-	int i, npend;
+	int i, npend = 0;
 
 	for (i = 0; i < ATH9K_NUM_TX_QUEUES; i++) {
 		if (!ATH_TXQ_SETUP(sc, i))
-- 
1.7.9.5
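
A stand-alone sketch of why the warning is plausible, assuming the loop can skip every iteration (only the first "continue" is visible in the hunk above): if no queue is set up, the variable is never assigned before it is read. The struct mock_queue, NUM_QUEUES and has_tx_pending names are hypothetical and only mimic the shape of the driver function.

```c
#include <stdbool.h>

#define NUM_QUEUES 4	/* hypothetical queue count */

struct mock_queue {
	bool setup;	/* is this queue configured at all? */
	int pending;	/* frames waiting on the queue */
};

static bool has_tx_pending(const struct mock_queue *q)
{
	/* Without "= 0", npend would be read uninitialized whenever
	 * every queue is skipped by the "continue" below. */
	int i, npend = 0;

	for (i = 0; i < NUM_QUEUES; i++) {
		if (!q[i].setup)
			continue;

		npend = q[i].pending;
		if (npend)
			break;
	}

	return npend != 0;
}
```

With the initializer in place, an array of unconfigured queues reliably yields false instead of depending on whatever happened to be on the stack.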

^ permalink raw reply related

* [PATCH] net/apne: Remove unused variable ei_local
From: Geert Uytterhoeven @ 2014-01-26 10:44 UTC (permalink / raw)
  To: Matthew Whitehead, David S. Miller
  Cc: netdev, linux-kernel, Geert Uytterhoeven

drivers/net/ethernet/8390/apne.c: In function ‘apne_probe1’:
drivers/net/ethernet/8390/apne.c:215: warning: unused variable ‘ei_local’

Introduced by commit c45f812f0280c13f1b7992be5e0de512312a9e8f ("8390 :
Replace ei_debug with msg_enable/NETIF_MSG_* feature"), which added the
variable without using it.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
---
 drivers/net/ethernet/8390/apne.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/8390/apne.c b/drivers/net/ethernet/8390/apne.c
index 811fa5d5c697..30104b60da85 100644
--- a/drivers/net/ethernet/8390/apne.c
+++ b/drivers/net/ethernet/8390/apne.c
@@ -212,7 +212,6 @@ static int __init apne_probe1(struct net_device *dev, int ioaddr)
     int neX000, ctron;
 #endif
     static unsigned version_printed;
-    struct ei_device *ei_local = netdev_priv(dev);

     if ((apne_msg_enable & NETIF_MSG_DRV) && (version_printed++ == 0))
 		netdev_info(dev, version);
-- 
1.7.9.5

^ permalink raw reply related

* Re: critic on documentation of the network stack
From: Richard Weinberger @ 2014-01-26 10:35 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Stephen Hemminger, Hannes Frederic Sowa, netdev@vger.kernel.org
In-Reply-To: <52E47E82.4040906@gmail.com>

On Sun, Jan 26, 2014 at 4:18 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> Le 24/01/2014 15:58, Stephen Hemminger a écrit :
>
>> On Fri, 24 Jan 2014 04:23:24 +0100
>> Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
>>
>>> Hello!
>>>
>>> After net-next is closed I wanted to put the following link here:
>>>
>>>    <http://linux.slashdot.org/comments.pl?sid=4356053&cid=45184693>
>>
>>
>> The problem I have is more that there are more incorrect sources of
>> documentation and differing opinions on the Internet. Maybe the problem
>> is users, maybe the problem is lack of SEO, or developers not being paid
>> to write documentation, or old web sites not being updated. For example,
>> this commenter obviously never found http://www.lartc.org/
>
>
> (which unfortunately is rather outdated)

s/outdated/dead

> I do not buy the fact that some developers do not provide documentation of
> the features they are adding potentially on purpose, truth is probably much
> simpler, you worked on X, you have now moved on and work on Y.

IMHO one cause of the problem is that many nice features get
implemented by contractors.
The customer is only interested in the kernel interface and has its
own (proprietary) userland.
Don't get me wrong, this is perfectly fine.
But it has the side effect that a) the customer is not interested in
man-pages or other open docs/tools
(he does not want to reveal his magic sauce too early) and
b) the contractor is therefore not paid to provide docs.

> If nobody pays enough attention to what gets added through netdev, iproute2,
> ethtool, man-pages and enforces the need for documentation, then comes the
> current status quo where not all features are documented, until some
> benevolent person realizes this needs fixing. Considering the high volume of
> the list, this is all understandable.
>
> There could probably be some programmatic ways to enforce such
> documentation by only allowing patches coming with, say kernel-doc content
> along the code, and have man-pages and other projects scan for new
> kernel-doc entries they have no reference for.
> --
> Florian



-- 
Thanks,
//richard

^ permalink raw reply

* 3.13: e1000e triggers BUG in IRQ handling
From: Jiri Slaby @ 2014-01-26 10:34 UTC (permalink / raw)
  To: Kirsher, Jeffrey T, jesse.brandeburg, Bruce Allan,
	carolyn.wyborny, donald.c.skidmore, gregory.v.rose,
	alexander.h.duyck, john.ronciak, mitch.a.williams,
	e1000-devel@lists.sourceforge.net
  Cc: ML netdev, Linux kernel mailing list

Hi,

after some combination of the following:
  ip link set up dev eth0
  ip link set down dev eth0
  ip link set addr ... dev eth0
  rmmod e1000e
  modprobe e1000e
I got the BUG below. It looks like some path forgot to free_irq and the
next attempt to request_irq (genirq error) or to reset_irq (the BUG) failed.

I don't know whether this is new in 3.13. It happened for me the first
time while trying to convince NetworkManager to set device address as
demanded.

genirq: Flags mismatch irq 46. 00000000 (eth0) vs. 00000000 (eth0)
------------[ cut here ]------------
kernel BUG at /drivers/pci/msi.c:376!
invalid opcode: 0000 [#1] PREEMPT SMP
Modules linked in: e1000e ... [last unloaded: e1000e]
CPU: 2 PID: 27424 Comm: NetworkManager Not tainted 3.13.0-1-desktop #1
Hardware name: LENOVO 23252SG/23252SG, BIOS G2ET33WW (1.13 ) 07/24/2012
task: ffff8801004a2110 ti: ffff8800422a0000 task.ti: ffff8800422a0000
RIP: 0010:[<ffffffff8135328a>] [<ffffffff8135328a>] free_msi_irqs+0x13a/0x140
RSP: 0018:ffff8800422a1708  EFLAGS: 00010286
RAX: ffff880059dc3200 RBX: 0000000000000000 RCX: 00000000fffffffa
RDX: 0000000000000000 RSI: 000000000000002e RDI: 0000000000000000
RBP: ffff880117780c00 R08: ffff880059dc3200 R09: ffff880119800020
R10: 00000000000000c8 R11: 0000000000000001 R12: 0000000000000001
R13: ffff880117c28870 R14: 0000000000000001 R15: ffff880117c28000
FS:  00007ff57f08b840(0000) GS:ffff88011e280000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000003878818 CR3: 00000000cb4b7000 CR4: 00000000001407e0
Stack:
 ffff880117c28000 ffff880117c28000 ffff8800d55a4000 ffff880117c28098
 ffff8800d55a4d78 0000000000000001 ffff8800d55a4000 ffffffff81353d2d
 ffff8800d55a4880 ffffffffa04482ad ffff8800d55a4880 ffffffffa044858f
Call Trace:
 [<ffffffff81353d2d>] pci_disable_msi+0x2d/0x50
 [<ffffffffa04482ad>] e1000e_reset_interrupt_capability+0x4d/0x60 [e1000e]
 [<ffffffffa044858f>] e1000_request_irq+0x1bf/0x280 [e1000e]
 [<ffffffffa044d75f>] e1000_open+0xff/0x5c0 [e1000e]
 [<ffffffff81517a2f>] __dev_open+0xaf/0x120
 [<ffffffff81517d25>] __dev_change_flags+0xa5/0x190
 [<ffffffff81517e49>] dev_change_flags+0x29/0x70
 [<ffffffff81525dd2>] do_setlink+0x332/0x940
 [<ffffffff81526bce>] rtnl_newlink+0x35e/0x570
 [<ffffffff815266bf>] rtnetlink_rcv_msg+0x9f/0x250
 [<ffffffff81541c99>] netlink_rcv_skb+0xa9/0xc0
 [<ffffffff81522e68>] rtnetlink_rcv+0x18/0x20
 [<ffffffff81541200>] netlink_unicast+0x100/0x180
 [<ffffffff815415b7>] netlink_sendmsg+0x337/0x760
 [<ffffffff814fd083>] sock_sendmsg+0x93/0xe0
 [<ffffffff814fd849>] ___sys_sendmsg+0x3b9/0x3d0
 [<ffffffff814fe2f4>] __sys_sendmsg+0x44/0x80
 [<ffffffff8160e9bd>] system_call_fastpath+0x1a/0x1f
 [<00007ff57c60107d>] 0x7ff57c60107c
Code: 48 8b 55 18 48 8d 45 18 48 83 ea 18 49 39 c5 75 90 48 83 c4 08 5b
5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 7b 28 e8 b8 24 cf ff eb 84 <0f> 0b
0f 1f 40 00 53 31 db f6 07 10 8b 47 08 74 1e 89 f3 f7 d3
RIP  [<ffffffff8135328a>] free_msi_irqs+0x13a/0x140
 RSP <ffff8800422a1708>
---[ end trace 71a7e89db92577e7 ]---

thanks,
-- 
js
suse labs


^ permalink raw reply

* [Xen-devel] [PATCH net-next v5] xen-netfront: clean up code in xennet_release_rx_bufs
From: Annie Li @ 2014-01-26 10:12 UTC (permalink / raw)
  To: xen-devel, netdev
  Cc: konrad.wilk, ian.campbell, wei.liu2, david.vrabel, Annie Li

From: Annie Li <annie.li@oracle.com>

This patch removes grant transfer releasing code from netfront, and uses
gnttab_end_foreign_access to end grant access since
gnttab_end_foreign_access_ref may fail when the grant entry is
currently used for reading or writing.

* clean up grant transfer code kept from old netfront(2.6.18) which grants
pages for access/map and transfer. But grant transfer is deprecated in current
netfront, so remove corresponding release code for transfer.

* release grant access (through gnttab_end_foreign_access) and skb for tx/rx path,
use get_page to ensure page is released when grant access is completed successfully.

Xen-blkfront/xen-tpmfront/xen-pcifront also have a similar issue, but patches
for them will be created separately.

V5: Remove unnecessary change in xennet_end_access.

V4: Revert put_page in gnttab_end_foreign_access, and keep netfront change in
single patch.

V3: Changes as suggested by David Vrabel: ensure pages are not freed until
grant access is ended.

V2: Improve patch comments.

Signed-off-by: Annie Li <annie.li@oracle.com>
---
 drivers/net/xen-netfront.c |   88 +++++++++++++-------------------------------
 1 files changed, 26 insertions(+), 62 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index d7bee8a..6ddf1e6 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -117,6 +117,7 @@ struct netfront_info {
 	} tx_skbs[NET_TX_RING_SIZE];
 	grant_ref_t gref_tx_head;
 	grant_ref_t grant_tx_ref[NET_TX_RING_SIZE];
+	struct page *grant_tx_page[NET_TX_RING_SIZE];
 	unsigned tx_skb_freelist;
 
 	spinlock_t   rx_lock ____cacheline_aligned_in_smp;
@@ -396,6 +397,7 @@ static void xennet_tx_buf_gc(struct net_device *dev)
 			gnttab_release_grant_reference(
 				&np->gref_tx_head, np->grant_tx_ref[id]);
 			np->grant_tx_ref[id] = GRANT_INVALID_REF;
+			np->grant_tx_page[id] = NULL;
 			add_id_to_freelist(&np->tx_skb_freelist, np->tx_skbs, id);
 			dev_kfree_skb_irq(skb);
 		}
@@ -452,6 +454,7 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
 		gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
 						mfn, GNTMAP_readonly);
 
+		np->grant_tx_page[id] = virt_to_page(data);
 		tx->gref = np->grant_tx_ref[id] = ref;
 		tx->offset = offset;
 		tx->size = len;
@@ -497,6 +500,7 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
 							np->xbdev->otherend_id,
 							mfn, GNTMAP_readonly);
 
+			np->grant_tx_page[id] = page;
 			tx->gref = np->grant_tx_ref[id] = ref;
 			tx->offset = offset;
 			tx->size = bytes;
@@ -596,6 +600,7 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	mfn = virt_to_mfn(data);
 	gnttab_grant_foreign_access_ref(
 		ref, np->xbdev->otherend_id, mfn, GNTMAP_readonly);
+	np->grant_tx_page[id] = virt_to_page(data);
 	tx->gref = np->grant_tx_ref[id] = ref;
 	tx->offset = offset;
 	tx->size = len;
@@ -1085,10 +1090,11 @@ static void xennet_release_tx_bufs(struct netfront_info *np)
 			continue;
 
 		skb = np->tx_skbs[i].skb;
-		gnttab_end_foreign_access_ref(np->grant_tx_ref[i],
-					      GNTMAP_readonly);
-		gnttab_release_grant_reference(&np->gref_tx_head,
-					       np->grant_tx_ref[i]);
+		get_page(np->grant_tx_page[i]);
+		gnttab_end_foreign_access(np->grant_tx_ref[i],
+					  GNTMAP_readonly,
+					  (unsigned long)page_address(np->grant_tx_page[i]));
+		np->grant_tx_page[i] = NULL;
 		np->grant_tx_ref[i] = GRANT_INVALID_REF;
 		add_id_to_freelist(&np->tx_skb_freelist, np->tx_skbs, i);
 		dev_kfree_skb_irq(skb);
@@ -1097,78 +1103,35 @@ static void xennet_release_tx_bufs(struct netfront_info *np)
 
 static void xennet_release_rx_bufs(struct netfront_info *np)
 {
-	struct mmu_update      *mmu = np->rx_mmu;
-	struct multicall_entry *mcl = np->rx_mcl;
-	struct sk_buff_head free_list;
-	struct sk_buff *skb;
-	unsigned long mfn;
-	int xfer = 0, noxfer = 0, unused = 0;
 	int id, ref;
 
-	dev_warn(&np->netdev->dev, "%s: fix me for copying receiver.\n",
-			 __func__);
-	return;
-
-	skb_queue_head_init(&free_list);
-
 	spin_lock_bh(&np->rx_lock);
 
 	for (id = 0; id < NET_RX_RING_SIZE; id++) {
-		ref = np->grant_rx_ref[id];
-		if (ref == GRANT_INVALID_REF) {
-			unused++;
-			continue;
-		}
+		struct sk_buff *skb;
+		struct page *page;
 
 		skb = np->rx_skbs[id];
-		mfn = gnttab_end_foreign_transfer_ref(ref);
-		gnttab_release_grant_reference(&np->gref_rx_head, ref);
-		np->grant_rx_ref[id] = GRANT_INVALID_REF;
-
-		if (0 == mfn) {
-			skb_shinfo(skb)->nr_frags = 0;
-			dev_kfree_skb(skb);
-			noxfer++;
+		if (!skb)
 			continue;
-		}
 
-		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-			/* Remap the page. */
-			const struct page *page =
-				skb_frag_page(&skb_shinfo(skb)->frags[0]);
-			unsigned long pfn = page_to_pfn(page);
-			void *vaddr = page_address(page);
+		ref = np->grant_rx_ref[id];
+		if (ref == GRANT_INVALID_REF)
+			continue;
 
-			MULTI_update_va_mapping(mcl, (unsigned long)vaddr,
-						mfn_pte(mfn, PAGE_KERNEL),
-						0);
-			mcl++;
-			mmu->ptr = ((u64)mfn << PAGE_SHIFT)
-				| MMU_MACHPHYS_UPDATE;
-			mmu->val = pfn;
-			mmu++;
+		page = skb_frag_page(&skb_shinfo(skb)->frags[0]);
 
-			set_phys_to_machine(pfn, mfn);
-		}
-		__skb_queue_tail(&free_list, skb);
-		xfer++;
-	}
-
-	dev_info(&np->netdev->dev, "%s: %d xfer, %d noxfer, %d unused\n",
-		 __func__, xfer, noxfer, unused);
+		/* gnttab_end_foreign_access() needs a page ref until
+		 * foreign access is ended (which may be deferred).
+		 */
+		get_page(page);
+		gnttab_end_foreign_access(ref, 0,
+					  (unsigned long)page_address(page));
+		np->grant_rx_ref[id] = GRANT_INVALID_REF;
 
-	if (xfer) {
-		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-			/* Do all the remapping work and M2P updates. */
-			MULTI_mmu_update(mcl, np->rx_mmu, mmu - np->rx_mmu,
-					 NULL, DOMID_SELF);
-			mcl++;
-			HYPERVISOR_multicall(np->rx_mcl, mcl - np->rx_mcl);
-		}
+		kfree_skb(skb);
 	}
 
-	__skb_queue_purge(&free_list);
-
 	spin_unlock_bh(&np->rx_lock);
 }
 
@@ -1339,6 +1302,7 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
 	for (i = 0; i < NET_RX_RING_SIZE; i++) {
 		np->rx_skbs[i] = NULL;
 		np->grant_rx_ref[i] = GRANT_INVALID_REF;
+		np->grant_tx_page[i] = NULL;
 	}
 
 	/* A grant for every tx ring slot */
-- 
1.7.3.4

^ permalink raw reply related

* [PATCH V2 2/2] net: ip, ipv6: handle gso skbs in forwarding path
From: Florian Westphal @ 2014-01-26  9:58 UTC (permalink / raw)
  To: netdev; +Cc: eric.dumazet, Florian Westphal
In-Reply-To: <1390730297-10725-1-git-send-email-fw@strlen.de>

Marcelo Ricardo Leitner reported problems when the forwarding link
path has a lower mtu than the incoming link if the inbound interface
supports GRO.

Given:
Host <mtu1500> R1 <mtu1200> R2

Host sends tcp stream which is routed via R1 and R2.  R1 performs GRO.

In this case, the kernel fails to send ICMP fragmentation needed
messages (or packet too big for ipv6), as gso packets currently bypass
the dst mtu checks in the forward path. Instead, Linux tries to send out
packets exceeding the R1-R2 link mtu.

When locking the route MTU on Host (i.e., no ipv4 DF bit set), R1 does
not fragment the packets when forwarding, and again tries to send out
packets exceeding the R1-R2 link mtu.

This patch alters the forwarding dst mtu checks to take the individual
gso segment lengths into account.

For ipv6, we send out a packet too big error for gso skbs if the
individual segments are too big.

For ipv4, we either send icmp fragmentation needed, or, if the DF bit
is not set, perform software segmentation and let the output path
create fragments when the packet is leaving the machine.
The ICMP error is not 100% correct, as it will contain the headers of
the GRO skb instead of the original/segmented one, but it seems to
work fine in my (limited) tests.

Signed-off-by: Florian Westphal <fw@strlen.de>
---

Changes since V1:
 suggestions from Eric Dumazet:
  - skip more expensive computation for small packets in fwd path
  - use netif_skb_features() feature mask and remove GSO flags
    instead of using 0 feature set.
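
A minimal userspace sketch of the IPv4 decision described above may help when
reading the diff. It is an illustration only: `struct sketch_fwd_pkt`, the
`sketch_*` functions and the enum are hypothetical stand-ins, not kernel
symbols, and the sketch collapses into one function what the patch spreads
across ip_forward() and ip_forward_finish(). It assumes a packet may be
fragmented when DF is clear or local_df is set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the sk_buff state the checks look at. */
struct sketch_fwd_pkt {
	size_t len;          /* total length, like skb->len */
	bool is_gso;         /* like skb_is_gso(skb) */
	size_t net_hdr_len;  /* L3 header bytes */
	size_t l4_hdr_len;   /* L4 header bytes */
	size_t gso_size;     /* payload bytes per segment */
	bool df;             /* IPv4 DF bit set */
	bool local_df;       /* like skb->local_df */
};

enum sketch_fwd_action {
	SKETCH_FWD_SEND,             /* hand to the output path as-is */
	SKETCH_FWD_ICMP_FRAG_NEEDED, /* drop and report the mtu to the sender */
	SKETCH_FWD_SEGMENT,          /* software-segment, then output each piece */
};

/* Per-segment length including L3 and L4 headers, but not L2. */
static size_t sketch_network_seglen(const struct sketch_fwd_pkt *p)
{
	return p->net_hdr_len + p->l4_hdr_len + p->gso_size;
}

/* Cheap length test first; the per-segment test only runs for big skbs. */
static bool sketch_exceeds_mtu(const struct sketch_fwd_pkt *p, size_t mtu)
{
	if (p->len <= mtu || p->local_df)
		return false;
	if (p->is_gso && sketch_network_seglen(p) <= mtu)
		return false;
	return true;
}

static enum sketch_fwd_action
sketch_ipv4_forward(const struct sketch_fwd_pkt *p, size_t mtu)
{
	/* Assumption: fragmenting is allowed if DF is clear or local_df is set. */
	bool may_fragment = !p->df || p->local_df;

	if (!may_fragment && sketch_exceeds_mtu(p, mtu))
		return SKETCH_FWD_ICMP_FRAG_NEEDED;
	if (p->is_gso && !p->local_df && sketch_network_seglen(p) > mtu)
		return SKETCH_FWD_SEGMENT;
	return SKETCH_FWD_SEND;
}
```

With the topology from the description (R1-R2 mtu 1200), a GSO skb with 1448
byte segments and DF set maps to the ICMP case, and the same skb without DF
maps to software segmentation.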

 include/linux/skbuff.h | 17 ++++++++++++++
 net/ipv4/ip_forward.c  | 62 ++++++++++++++++++++++++++++++++++++++++++++++++--
 net/ipv6/ip6_output.c  | 19 ++++++++++++++--
 3 files changed, 94 insertions(+), 4 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 3d76066..37cb679 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2811,5 +2811,22 @@ static inline bool skb_head_is_locked(const struct sk_buff *skb)
 {
 	return !skb->head_frag || skb_cloned(skb);
 }
+
+/**
+ * skb_gso_network_seglen - Return length of individual segments of a gso packet
+ *
+ * @skb: GSO skb
+ *
+ * skb_gso_network_seglen is used to determine the real size of the
+ * individual segments, including Layer3 (IP, IPv6) and L4 headers (TCP/UDP).
+ *
+ * The MAC/L2 header is not accounted for.
+ */
+static inline unsigned int skb_gso_network_seglen(const struct sk_buff *skb)
+{
+	unsigned int hdr_len = skb_transport_header(skb) -
+			       skb_network_header(skb);
+	return hdr_len + skb_gso_transport_seglen(skb);
+}
 #endif	/* __KERNEL__ */
 #endif	/* _LINUX_SKBUFF_H */
diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index 694de3b..f5a4d2a 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -39,6 +39,62 @@
 #include <net/route.h>
 #include <net/xfrm.h>
 
+static bool ip_may_fragment(const struct sk_buff *skb)
+{
+	return unlikely((ip_hdr(skb)->frag_off & htons(IP_DF)) == 0) ||
+	       skb->local_df;
+}
+
+static bool ip_gso_exceeds_dst_mtu(const struct sk_buff *skb)
+{
+	if (skb->local_df || !skb_is_gso(skb))
+		return false;
+	return skb_gso_network_seglen(skb) > dst_mtu(skb_dst(skb));
+}
+
+static bool ip_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu)
+{
+	unsigned len;
+
+	if (skb->len <= mtu || skb->local_df)
+		return false;
+
+	if (skb_is_gso(skb) && skb_gso_network_seglen(skb) <= mtu)
+		return false;
+
+	return true;
+}
+
+/* called if GSO skb needs to be fragmented on forward.  */
+static int ip_forward_finish_gso(struct sk_buff *skb)
+{
+	netdev_features_t features = netif_skb_features(skb);
+	struct sk_buff *segs;
+	int ret = 0;
+
+	segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
+	if (IS_ERR(segs)) {
+		kfree_skb(skb);
+		return -ENOMEM;
+	}
+
+	consume_skb(skb);
+
+	do {
+		struct sk_buff *nskb = segs->next;
+		int err;
+
+		segs->next = NULL;
+		err = dst_output(segs);
+
+		if (err && ret == 0)
+			ret = err;
+		segs = nskb;
+	} while (segs);
+
+	return ret;
+}
+
 static int ip_forward_finish(struct sk_buff *skb)
 {
 	struct ip_options *opt	= &(IPCB(skb)->opt);
@@ -49,6 +105,9 @@ static int ip_forward_finish(struct sk_buff *skb)
 	if (unlikely(opt->optlen))
 		ip_forward_options(skb);
 
+	if (ip_gso_exceeds_dst_mtu(skb))
+		return ip_forward_finish_gso(skb);
+
 	return dst_output(skb);
 }
 
@@ -88,8 +147,7 @@ int ip_forward(struct sk_buff *skb)
 	if (opt->is_strictroute && rt->rt_uses_gateway)
 		goto sr_failed;
 
-	if (unlikely(skb->len > dst_mtu(&rt->dst) && !skb_is_gso(skb) &&
-		     (ip_hdr(skb)->frag_off & htons(IP_DF))) && !skb->local_df) {
+	if (!ip_may_fragment(skb) && ip_exceeds_mtu(skb, dst_mtu(&rt->dst))) {
 		IP_INC_STATS(dev_net(rt->dst.dev), IPSTATS_MIB_FRAGFAILS);
 		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
 			  htonl(dst_mtu(&rt->dst)));
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index e6f9319..6a92666 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -321,6 +321,22 @@ static inline int ip6_forward_finish(struct sk_buff *skb)
 	return dst_output(skb);
 }
 
+static bool ip6_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
+{
+	unsigned int len;
+
+	if (skb->len <= mtu || skb->local_df)
+		return false;
+
+	if (IP6CB(skb)->frag_max_size && IP6CB(skb)->frag_max_size > mtu)
+		return true;
+
+	if (skb_is_gso(skb) && skb_gso_network_seglen(skb) <= mtu)
+		return false;
+
+	return true;
+}
+
 int ip6_forward(struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
@@ -443,8 +459,7 @@ int ip6_forward(struct sk_buff *skb)
 	if (mtu < IPV6_MIN_MTU)
 		mtu = IPV6_MIN_MTU;
 
-	if ((!skb->local_df && skb->len > mtu && !skb_is_gso(skb)) ||
-	    (IP6CB(skb)->frag_max_size && IP6CB(skb)->frag_max_size > mtu)) {
+	if (ip6_pkt_too_big(skb, mtu)) {
 		/* Again, force OUTPUT device used as source address */
 		skb->dev = dst->dev;
 		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
-- 
1.8.1.5

^ permalink raw reply related

* [PATCH V2 1/2] net: add and use skb_gso_transport_seglen()
From: Florian Westphal @ 2014-01-26  9:58 UTC (permalink / raw)
  To: netdev; +Cc: eric.dumazet, Florian Westphal

This moves part of Eric Dumazet's skb_gso_seglen helper from tbf sched to
skbuff core so it may be reused by the upcoming ip forwarding path patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 Changes since V1:
 suggestions from Eric Dumazet:
  - don't use uapi udp.h
  - remove tcp.h include from tbf, it's not needed anymore
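
As a rough illustration of the split described above, the per-segment length
can be viewed as a transport part (L4 header plus gso_size) with the lower
headers added on top by the caller. The sketch below is userspace C under
stated assumptions: `struct sketch_gso_pkt` and the `sketch_*` functions are
hypothetical names, and header offsets are plain byte counts rather than
sk_buff pointers.

```c
#include <stddef.h>

/* Hypothetical stand-in for the few sk_buff fields involved.
 * Offsets are in bytes from the start of the packet buffer.
 */
struct sketch_gso_pkt {
	size_t mac_off;        /* start of the L2 header */
	size_t network_off;    /* start of the L3 header */
	size_t transport_off;  /* start of the L4 header */
	size_t l4_hdr_len;     /* TCP header incl. options, or 8 for UDP */
	size_t gso_size;       /* payload bytes per segment */
};

/* L4 header + payload: the part moved into the skbuff core. */
static size_t sketch_transport_seglen(const struct sketch_gso_pkt *p)
{
	return p->l4_hdr_len + p->gso_size;
}

/* tbf additionally counts everything from the MAC header onwards. */
static size_t sketch_mac_seglen(const struct sketch_gso_pkt *p)
{
	return (p->transport_off - p->mac_off) + sketch_transport_seglen(p);
}
```

For an Ethernet/IPv4/TCP packet without options and gso_size 1448, this gives
1468 bytes at the transport level and 1502 bytes including the MAC header.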

 include/linux/skbuff.h |  1 +
 net/core/skbuff.c      | 25 +++++++++++++++++++++++++
 net/sched/sch_tbf.c    | 13 +++----------
 3 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6f69b3f..3d76066 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2371,6 +2371,7 @@ void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to);
 void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len);
 int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen);
 void skb_scrub_packet(struct sk_buff *skb, bool xnet);
+unsigned int skb_gso_transport_seglen(const struct sk_buff *skb);
 struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features);
 
 struct skb_checksum_ops {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 06e72d3..ce04da0 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -47,6 +47,8 @@
 #include <linux/in.h>
 #include <linux/inet.h>
 #include <linux/slab.h>
+#include <linux/tcp.h>
+#include <linux/udp.h>
 #include <linux/netdevice.h>
 #ifdef CONFIG_NET_CLS_ACT
 #include <net/pkt_sched.h>
@@ -3592,3 +3594,26 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet)
 	nf_reset_trace(skb);
 }
 EXPORT_SYMBOL_GPL(skb_scrub_packet);
+
+/**
+ * skb_gso_transport_seglen - Return length of individual segments of a gso packet
+ *
+ * @skb: GSO skb
+ *
+ * skb_gso_transport_seglen is used to determine the real size of the
+ * individual segments, including Layer4 headers (TCP/UDP).
+ *
+ * The MAC/L2 or network (IP, IPv6) headers are not accounted for.
+ */
+unsigned int skb_gso_transport_seglen(const struct sk_buff *skb)
+{
+	const struct skb_shared_info *shinfo = skb_shinfo(skb);
+	unsigned int hdr_len;
+
+	if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6)))
+		hdr_len = tcp_hdrlen(skb);
+	else
+		hdr_len = sizeof(struct udphdr);
+	return hdr_len + shinfo->gso_size;
+}
+EXPORT_SYMBOL_GPL(skb_gso_transport_seglen);
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 887e672..758c356 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -21,7 +21,6 @@
 #include <net/netlink.h>
 #include <net/sch_generic.h>
 #include <net/pkt_sched.h>
-#include <net/tcp.h>
 
 
 /*	Simple Token Bucket Filter.
@@ -148,16 +147,10 @@ static u64 psched_ns_t2l(const struct psched_ratecfg *r,
  * Return length of individual segments of a gso packet,
  * including all headers (MAC, IP, TCP/UDP)
  */
-static unsigned int skb_gso_seglen(const struct sk_buff *skb)
+static unsigned int skb_gso_mac_seglen(const struct sk_buff *skb)
 {
 	unsigned int hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
-	const struct skb_shared_info *shinfo = skb_shinfo(skb);
-
-	if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6)))
-		hdr_len += tcp_hdrlen(skb);
-	else
-		hdr_len += sizeof(struct udphdr);
-	return hdr_len + shinfo->gso_size;
+	return hdr_len + skb_gso_transport_seglen(skb);
 }
 
 /* GSO packet is too big, segment it so that tbf can transmit
@@ -202,7 +195,7 @@ static int tbf_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	int ret;
 
 	if (qdisc_pkt_len(skb) > q->max_size) {
-		if (skb_is_gso(skb) && skb_gso_seglen(skb) <= q->max_size)
+		if (skb_is_gso(skb) && skb_gso_mac_seglen(skb) <= q->max_size)
 			return tbf_segment(skb, sch);
 		return qdisc_reshape_fail(skb, sch);
 	}
-- 
1.8.1.5

^ permalink raw reply related

* WARNING: at net/ipv4/devinet.c:1599
From: Geert Uytterhoeven @ 2014-01-26  9:56 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org

On m68k/ARAnyM:

WARNING: CPU: 0 PID: 407 at net/ipv4/devinet.c:1599 0x316a99()
Modules linked in:
CPU: 0 PID: 407 Comm: ifconfig Not tainted
3.13.0-atari-09263-g0c71d68014d1 #1378
Stack from 10c4fdf0:
        10c4fdf0 002ffabb 000243e8 00000000 008ced6c 00024416 00316a99 0000063f
        00316a99 00000009 00000000 002501b4 00316a99 0000063f c0a86117 00000080
        c0a86117 00ad0c90 00250a5a 00000014 00ad0c90 00000000 00000000 00000001
        00b02dd0 00356594 00000000 00356594 c0a86117 eff6c9e4 008ced6c 00000002
        008ced60 0024f9b4 00250b52 00ad0c90 00000000 00000000 00252390 00ad0c90
        eff6c9e4 0000004f 00000000 00000000 eff6c9e4 8000e25c eff6c9e4 80001020
Call Trace: [<000243e8>] warn_slowpath_common+0x52/0x6c
 [<00024416>] warn_slowpath_null+0x14/0x1a
 [<002501b4>] rtmsg_ifa+0xdc/0xf0
 [<00250a5a>] __inet_insert_ifa+0xd6/0x1c2
 [<0024f9b4>] inet_abc_len+0x0/0x42
 [<00250b52>] inet_insert_ifa+0xc/0x12
 [<00252390>] devinet_ioctl+0x2ae/0x5d6
 [<00020000>] _060_fpsp_effadd+0xc90c/0xd518
 [<00020000>] _060_fpsp_effadd+0xc90c/0xd518
 [<002530ec>] inet_ioctl+0x120/0x14e
 [<00008916>] atari_scc_console_write+0x42/0x5c
 [<00008916>] atari_scc_console_write+0x42/0x5c
 [<001f7666>] sock_ioctl+0x56/0x256
 [<00008916>] atari_scc_console_write+0x42/0x5c
 [<00095434>] vfs_ioctl+0x1c/0x30
 [<00008916>] atari_scc_console_write+0x42/0x5c
 [<0009559c>] do_vfs_ioctl+0x7a/0x3b8
 [<00008916>] atari_scc_console_write+0x42/0x5c
 [<00002884>] buserr+0x20/0x28
 [<00008916>] atari_scc_console_write+0x42/0x5c
 [<00095910>] SyS_ioctl+0x36/0x5a
 [<00008916>] atari_scc_console_write+0x42/0x5c
 [<00008916>] atari_scc_console_write+0x42/0x5c
 [<00002980>] syscall+0x8/0xc
 [<00008916>] atari_scc_console_write+0x42/0x5c
 [<0010c00b>] mext_leaf_block+0x443/0x81e

---[ end trace 44b14c97c2210758 ]---

Adding some debugging code reveals that inet_fill_ifaddr() fails in

    put_cacheinfo(skb, ifa->ifa_cstamp, ifa->ifa_tstamp,
                              preferred, valid))

nla_put complains:

lib/nlattr.c:454: skb_tailroom(skb) = 12, nla_total_size(attrlen) = 20
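
The 20 bytes can be reproduced with a small userspace sketch of the attribute
size arithmetic: a 4-byte attribute header plus the 16-byte cacheinfo payload,
rounded up to 4-byte alignment. The `sketch_*` names are hypothetical
stand-ins that only mirror the layout of struct nlattr and
struct ifa_cacheinfo.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as struct nlattr: 16-bit length and 16-bit type. */
struct sketch_nlattr {
	uint16_t nla_len;
	uint16_t nla_type;
};

/* Same layout as struct ifa_cacheinfo: four 32-bit fields. */
struct sketch_ifa_cacheinfo {
	uint32_t ifa_prefered;
	uint32_t ifa_valid;
	uint32_t cstamp;
	uint32_t tstamp;
};

/* Netlink attributes are padded to a multiple of 4 bytes. */
static size_t sketch_nla_align(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Aligned header + payload, padded again: what the attribute occupies. */
static size_t sketch_nla_total_size(size_t payload)
{
	size_t hdr = sketch_nla_align(sizeof(struct sketch_nlattr));

	return sketch_nla_align(hdr + payload);
}

/* The attribute only fits if the remaining tailroom covers all of it. */
static bool sketch_attr_fits(size_t tailroom, size_t payload)
{
	return tailroom >= sketch_nla_total_size(payload);
}
```

With 12 bytes of tailroom left, the cacheinfo attribute does not fit, which
matches the numbers in the debug output above.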

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* [RFC ipsec-next] xfrm: avoid creating temporary SA when there are no listeners
From: Horia Geanta @ 2014-01-26  9:50 UTC (permalink / raw)
  To: Steffen Klassert, David S. Miller; +Cc: netdev
In-Reply-To: <1390729852-7842-1-git-send-email-horia.geanta@freescale.com>

When KMs have no listeners, km_query() fails and temporary SAs are
garbage collected immediately after their allocation.
This puts a strain on memory allocation and can even lead to OOM, since
the temporary SA alloc/free cycle is performed for every packet
and garbage collection does not keep pace.

The sane thing to do is to make sure we have an audience before
allocating the temporary SA.

Signed-off-by: Horia Geanta <horia.geanta@freescale.com>
---
 include/net/xfrm.h    | 15 +++++++++++++++
 net/key/af_key.c      | 20 ++++++++++++++++++++
 net/xfrm/xfrm_state.c | 31 +++++++++++++++++++++++++++++++
 net/xfrm/xfrm_user.c  |  6 ++++++
 4 files changed, 72 insertions(+)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index cd7c46f..449a867 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -594,6 +594,7 @@ struct xfrm_mgr {
 					   const struct xfrm_migrate *m,
 					   int num_bundles,
 					   const struct xfrm_kmaddress *k);
+	bool			(*is_alive)(const struct km_event *c);
 };
 
 int xfrm_register_km(struct xfrm_mgr *km);
@@ -1646,6 +1647,20 @@ static inline int xfrm_aevent_is_on(struct net *net)
 	rcu_read_unlock();
 	return ret;
 }
+
+static inline int xfrm_acquire_is_on(struct net *net)
+{
+	struct sock *nlsk;
+	int ret = 0;
+
+	rcu_read_lock();
+	nlsk = rcu_dereference(net->xfrm.nlsk);
+	if (nlsk)
+		ret = netlink_has_listeners(nlsk, XFRMNLGRP_ACQUIRE);
+	rcu_read_unlock();
+
+	return ret;
+}
 #endif
 
 static inline int xfrm_alg_len(const struct xfrm_algo *alg)
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 1a04c13..12eb0ad 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -3059,6 +3059,25 @@ static u32 get_acqseq(void)
 	return res;
 }
 
+static bool pfkey_is_alive(const struct km_event *c)
+{
+	struct netns_pfkey *net_pfkey = net_generic(c->net, pfkey_net_id);
+	struct sock *sk;
+	struct hlist_node *node;
+	bool is_alive = false;
+
+	rcu_read_lock();
+	sk_for_each_rcu(sk, node, &net_pfkey->table) {
+		if (pfkey_sk(sk)->registered) {
+			is_alive = true;
+			break;
+		}
+	}
+	rcu_read_unlock();
+
+	return is_alive;
+}
+
 static int pfkey_send_acquire(struct xfrm_state *x, struct xfrm_tmpl *t, struct xfrm_policy *xp)
 {
 	struct sk_buff *skb;
@@ -3784,6 +3803,7 @@ static struct xfrm_mgr pfkeyv2_mgr =
 	.new_mapping	= pfkey_send_new_mapping,
 	.notify_policy	= pfkey_send_policy_notify,
 	.migrate	= pfkey_send_migrate,
+	.is_alive	= pfkey_is_alive,
 };
 
 static int __net_init pfkey_net_init(struct net *net)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 8d11d28..e79f376 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -161,6 +161,7 @@ static DEFINE_SPINLOCK(xfrm_state_gc_lock);
 int __xfrm_state_delete(struct xfrm_state *x);
 
 int km_query(struct xfrm_state *x, struct xfrm_tmpl *t, struct xfrm_policy *pol);
+bool km_is_alive(const struct km_event *c);
 void km_state_expired(struct xfrm_state *x, int hard, u32 portid);
 
 static DEFINE_SPINLOCK(xfrm_type_lock);
@@ -788,6 +789,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
 	struct xfrm_state *best = NULL;
 	u32 mark = pol->mark.v & pol->mark.m;
 	unsigned short encap_family = tmpl->encap_family;
+	struct km_event c;
 
 	to_put = NULL;
 
@@ -832,6 +834,17 @@ found:
 			error = -EEXIST;
 			goto out;
 		}
+
+		c.net = net;
+		/* If the KMs have no listeners (yet...), avoid allocating an SA
+		 * for each and every packet - garbage collection might not
+		 * handle the flood.
+		 */
+		if (!km_is_alive(&c)) {
+			error = -ESRCH;
+			goto out;
+		}
+
 		x = xfrm_state_alloc(net);
 		if (x == NULL) {
 			error = -ENOMEM;
@@ -1793,6 +1806,24 @@ int km_report(struct net *net, u8 proto, struct xfrm_selector *sel, xfrm_address
 }
 EXPORT_SYMBOL(km_report);
 
+bool km_is_alive(const struct km_event *c)
+{
+	struct xfrm_mgr *km;
+	bool is_alive = false;
+
+	read_lock(&xfrm_km_lock);
+	list_for_each_entry(km, &xfrm_km_list, list) {
+		if (km->is_alive && km->is_alive(c)) {
+			is_alive = true;
+			break;
+		}
+	}
+	read_unlock(&xfrm_km_lock);
+
+	return is_alive;
+}
+EXPORT_SYMBOL(km_is_alive);
+
 int xfrm_user_policy(struct sock *sk, int optname, u8 __user *optval, int optlen)
 {
 	int err;
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 3348566..b53a489 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2981,6 +2981,11 @@ static int xfrm_send_mapping(struct xfrm_state *x, xfrm_address_t *ipaddr,
 	return nlmsg_multicast(net->xfrm.nlsk, skb, 0, XFRMNLGRP_MAPPING, GFP_ATOMIC);
 }
 
+static bool xfrm_is_alive(const struct km_event *c)
+{
+	return (bool)xfrm_acquire_is_on(c->net);
+}
+
 static struct xfrm_mgr netlink_mgr = {
 	.id		= "netlink",
 	.notify		= xfrm_send_state_notify,
@@ -2990,6 +2995,7 @@ static struct xfrm_mgr netlink_mgr = {
 	.report		= xfrm_send_report,
 	.migrate	= xfrm_send_migrate,
 	.new_mapping	= xfrm_send_mapping,
+	.is_alive	= xfrm_is_alive,
 };
 
 static int __net_init xfrm_user_net_init(struct net *net)
-- 
1.8.3.1

^ permalink raw reply related

* [RFC ipsec-next] Temporary SA allocation and OOM
From: Horia Geanta @ 2014-01-26  9:50 UTC (permalink / raw)
  To: Steffen Klassert, David S. Miller; +Cc: netdev

Hi,

In the cases where:
- policies are in place
- there are no key managers registered to PF_KEY / NETLINK XFRM events
- xfrm_states are not available (no KM to negotiate them)

xfrm_state_find will be called for every IPsec packet entering the system.
A temporary SA is allocated; however, since there are no KMs, km_query()
fails to send an ACQUIRE notification and the temporary SA is
immediately garbage collected.

This leads to OOM, since SA alloc/free is performed for each packet
and garbage collection does not keep pace when the traffic rate is
high.

I am attaching as RFC a patch that checks whether there are
any KMs registered before allocating the temporary SA.

A new callback - is_alive - is added to the xfrm_mgr.
If is_alive returns true, there is a good chance that km_query() won't
fail, so the temporary SA won't be freed (at least not until it expires -
30s default).
This alleviates the strain caused by very frequent memory allocation.
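
The idea can be sketched in userspace C as follows. This is only an
illustration of the "check for an audience before allocating" pattern:
`struct sketch_km`, `sketch_any_km_alive()` and `sketch_state_find()` are
hypothetical names, the locking is left out, and the allocation is reduced to
a counter.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical key manager entry with an optional liveness callback. */
struct sketch_km {
	const char *id;
	bool (*is_alive)(void);
};

/* Example callbacks: one manager with listeners, one without. */
static bool sketch_alive(void)
{
	return true;
}

static bool sketch_dead(void)
{
	return false;
}

/* True as soon as one registered manager reports a listener. */
static bool sketch_any_km_alive(const struct sketch_km *kms, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (kms[i].is_alive && kms[i].is_alive())
			return true;
	return false;
}

/* Only "allocate" (count) a temporary SA when somebody can answer
 * the ACQUIRE; otherwise bail out before touching the allocator.
 */
static int sketch_state_find(const struct sketch_km *kms, size_t n,
			     unsigned int *allocs)
{
	if (!sketch_any_km_alive(kms, n))
		return -ESRCH;
	(*allocs)++;
	return 0;
}
```

A manager without an is_alive callback is treated as having no listeners
here, so a packet flood with no key manager attached costs one list walk per
packet instead of an alloc/free cycle.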

Thanks,
Horia

Horia Geanta (1):
  xfrm: avoid creating temporary SA when there are no listeners

 include/net/xfrm.h    | 15 +++++++++++++++
 net/key/af_key.c      | 20 ++++++++++++++++++++
 net/xfrm/xfrm_state.c | 31 +++++++++++++++++++++++++++++++
 net/xfrm/xfrm_user.c  |  6 ++++++
 4 files changed, 72 insertions(+)

-- 
1.8.3.1

^ permalink raw reply

* Re: [PATCH 2/2] net: ip, ipv6: handle gso skbs in forwarding path
From: Florian Westphal @ 2014-01-26  9:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Florian Westphal, netdev
In-Reply-To: <1390700277.27806.72.camel@edumazet-glaptop2.roam.corp.google.com>

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > +static bool ip_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu)
> > +{
> > +	unsigned len;
> > +
> > +	if (skb->local_df)
> > +		return false;
> > +	len = skb_is_gso(skb) ? skb_gso_network_seglen(skb) : skb->len;
> > +
> > +	return len > mtu;
> 
> The function should avoid extra computation/tests for small packets.
> 
> if (skb->len <= mtu || skb->local_df)
> 	return false;

Good idea!  Will change it as per your suggestion.

> if (skb_is_gso(skb) && skb_gso_network_seglen(skb) <= mtu)
> 	return false;
> 
> return true;
> 
> > +}
> > +
> > +/* called if GSO skb needs to be fragmented on forward.  */
> > +static int ip_forward_finish_gso(struct sk_buff *skb)
> > +{
> > +	struct sk_buff *segs = skb_gso_segment(skb, 0);
> 
> 0 is very pessimistic.
> 
> Have you tried :
> 
> netdev_features_t features = netif_skb_features(skb); 
> struct sk_buff *segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);

No.  I'll see if this works for me, then include it in V2.

Thanks Eric.

^ permalink raw reply

* Re: [PATCH 1/2] net: add and use skb_gso_transport_seglen()
From: Florian Westphal @ 2014-01-26  9:19 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Florian Westphal, netdev
In-Reply-To: <1390699688.27806.66.camel@edumazet-glaptop2.roam.corp.google.com>

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sat, 2014-01-25 at 23:48 +0100, Florian Westphal wrote:
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -45,6 +45,7 @@
> >  #include <linux/mm.h>
> >  #include <linux/interrupt.h>
> >  #include <linux/in.h>
> > +#include <linux/tcp.h>
> >  #include <linux/inet.h>
> >  #include <linux/slab.h>
> >  #include <linux/netdevice.h>
> > @@ -71,6 +72,8 @@
> >  #include <trace/events/skb.h>
> >  #include <linux/highmem.h>
> >  
> > +#include <uapi/linux/udp.h>
> 
> 
> Normally you should not use uapi/

I added this include to ensure sizeof(struct udphdr) works.
I'll change it to linux/udp.h

> > diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
> > index 887e672..837a61b 100644
> > --- a/net/sched/sch_tbf.c
> > +++ b/net/sched/sch_tbf.c
> 
> It seems you forgot to remove from this file this include :
> 
> #include <net/tcp.h>

Indeed, thanks for spotting this.

> Otherwise, this seems good, thanks !

Thanks for reviewing Eric!

^ permalink raw reply

* [PATCH net-next] 8139cp: remove a won't occurred BUG_ON
From: Wang Weidong @ 2014-01-26  8:33 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

When execution reaches the BUG_ON, the variable i is always equal to
CP_NUM_STATS, so the BUG_ON can never trigger. Remove it.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
---
 drivers/net/ethernet/realtek/8139cp.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/8139cp.c b/drivers/net/ethernet/realtek/8139cp.c
index 737c1a8..b70e184 100644
--- a/drivers/net/ethernet/realtek/8139cp.c
+++ b/drivers/net/ethernet/realtek/8139cp.c
@@ -1585,7 +1585,6 @@ static void cp_get_ethtool_stats (struct net_device *dev,
 	tmp_stats[i++] = le16_to_cpu(nic_stats->tx_abort);
 	tmp_stats[i++] = le16_to_cpu(nic_stats->tx_underrun);
 	tmp_stats[i++] = cp->cp_stats.rx_frags;
-	BUG_ON(i != CP_NUM_STATS);
 
 	dma_free_coherent(&cp->pdev->dev, sizeof(*nic_stats), nic_stats, dma);
 }
-- 
1.7.12

^ permalink raw reply related

* Re: critic on documentation of the network stack
From: Florian Fainelli @ 2014-01-26  3:18 UTC (permalink / raw)
  To: Stephen Hemminger, Hannes Frederic Sowa; +Cc: netdev
In-Reply-To: <20140124155835.467deca8@nehalam.linuxnetplumber.net>

On 24/01/2014 15:58, Stephen Hemminger wrote:
> On Fri, 24 Jan 2014 04:23:24 +0100
> Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
>
>> Hello!
>>
>> After net-next is closed I wanted to put the following link here:
>>
>>    <http://linux.slashdot.org/comments.pl?sid=4356053&cid=45184693>
>
> The problem I have is more that there are more incorrect sources of documentation
> and differing opinions on the Internet. Maybe the problem is users, maybe the
> problem is lack of SEO, or developers not being paid to write documentation, or
> old web sites not being updated. For example, this commenter obviously never
> found http://www.lartc.org/

(which unfortunately is rather outdated)

I do not buy the idea that some developers deliberately withhold
documentation for the features they add. The truth is probably much
simpler: you worked on X, and you have now moved on to work on Y.

If nobody pays enough attention to what gets added through netdev,
iproute2, ethtool and man-pages, and enforces the need for documentation,
we end up with the current status quo, where not all features are
documented until some benevolent person realizes this needs fixing.
Considering the high volume of the list, this is all understandable.

There could probably be some programmatic ways to enforce such
documentation, by only accepting patches that come with, say, kernel-doc
content alongside the code, and by having man-pages and other projects
scan for new kernel-doc entries they have no reference for.
--
Florian

^ permalink raw reply

* Re: [PATCH net-next 3/4] ethtool: Support for configurable RSS hash key.
From: Ben Hutchings @ 2014-01-26  2:37 UTC (permalink / raw)
  To: Venkata Duvvuru; +Cc: netdev@vger.kernel.org
In-Reply-To: <BF3270C86E8B1349A26C34E4EC1C44CB2C85E298@CMEXMB1.ad.emulex.com>

On Fri, 2014-01-24 at 12:00 +0000, Venkata Duvvuru wrote:
> 
> > -----Original Message-----
> > From: Ben Hutchings [mailto:ben@decadent.org.uk]
> > Sent: Thursday, January 23, 2014 8:39 PM
> > To: Venkata Duvvuru
> > Cc: netdev@vger.kernel.org
> > Subject: Re: [PATCH net-next 3/4] ethtool: Support for configurable RSS hash
> > key.
> > 
> > On Thu, 2014-01-23 at 13:47 +0000, Venkata Duvvuru wrote:
> > > > -----Original Message-----
> > > > From: Ben Hutchings [mailto:ben@decadent.org.uk]
> > > > Sent: Thursday, January 23, 2014 11:09 AM
> > > > To: Venkata Duvvuru
> > > > Cc: netdev@vger.kernel.org
> > > > Subject: Re: [PATCH net-next 3/4] ethtool: Support for configurable
> > > > RSS hash key.
> > > >
> > > > On Wed, 2014-01-22 at 12:12 +0000, Venkata Duvvuru wrote:
> > > > [...]
> > > > > > No, what I mean is:
> > > > > >
> > > > > > 1. An RX flow steering filter can specify use of RSS, in which
> > > > > > case the value looked up in the indirection is added to the
> > > > > > queue number specified in the filter.  This is not yet
> > > > > > controllable through RX NFC though there is room for extension
> > there.
> > > > > >
> > > > > > 2. Multi-function controllers need multiple RSS contexts (key +
> > > > > > indirection
> > > > > > table) to support independent use of RSS on each function.
> > > > > > But it may also be possible to allocate multiple contexts to a
> > > > > > single
> > > > function.
> > > > > > This could be useful in conjunction with 1.  But there would
> > > > > > need to be a way to allocate and configure extra contexts first.
> > > > > The proposed changes will be incremental so I think this can be
> > > > > done in a separate patch. Thoughts?
> > > >
> > > > The ethtool ABI (to userland) has to remain backward-compatible, and
> > > > it is preferable if we don't add lots of different structures for this.
> > > >
> > > > So please define the new command structure to include both the key
> > > > and indirection table, and some reserved space (documented as
> > > > 'userland must set to 0') for future extensions.
> > >
> > > I think it’s better to keep key and indirection table settings as
> > > different ethtool commands. We can probably add rss contexts (reserved
> > > space) to both the command structures.
> > > If we mix key and indirection table into one command structure then it
> > > will hamper the compatibility.
> > [...]
> > 
> > Right, there is no compatible way to extend struct ethtool_rxfh_indir.
> > I should have thought ahead when defining it!  But the new structure doesn't
> > need to have that problem.
> 
> If any one of the operations (key or indirection table) is not
> supported by the driver, should we silently ignore that operation and
> process the other supported operation or should we fail the command.

It should fail completely.

>  If we fail the command then we are mandating the drivers to implement
> both the operations.

So far as I know, all the multiqueue NICs that include a flow hash
indirection table do so as part of the Microsoft RSS specification,
which requires the hash key to be configurable as well.  Do you know of
any cases where only one of the two is configurable?
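
To make the "fail completely" semantics concrete, here is a minimal sketch of
how a combined command could be validated before anything is applied. It is
not the ethtool ABI: `struct sketch_rss_cmd`, `struct sketch_rss_caps` and
`sketch_rss_validate()` are hypothetical, and treating a size of 0 as "leave
unchanged" is an assumption made for the example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical combined command header; not the actual ethtool structure. */
struct sketch_rss_cmd {
	uint32_t indir_size;   /* indirection table entries, 0 = leave unchanged */
	uint32_t key_size;     /* hash key bytes, 0 = leave unchanged */
	uint32_t reserved[2];  /* userland must set to 0 */
};

/* Hypothetical description of what a driver can configure. */
struct sketch_rss_caps {
	bool can_set_indir;
	bool can_set_key;
};

/* Reject the whole command up front, so that a supported half is never
 * applied while the other half is silently ignored.
 */
static int sketch_rss_validate(const struct sketch_rss_cmd *cmd,
			       const struct sketch_rss_caps *caps)
{
	for (size_t i = 0; i < 2; i++)
		if (cmd->reserved[i])
			return -EINVAL;
	if (cmd->indir_size && !caps->can_set_indir)
		return -EOPNOTSUPP;
	if (cmd->key_size && !caps->can_set_key)
		return -EOPNOTSUPP;
	return 0;
}
```

Checking the reserved words for zero is what keeps the structure extensible
later without breaking older userland.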

Ben.

-- 
Ben Hutchings
If the facts do not conform to your theory, they must be disposed of.

^ permalink raw reply
