* Re: [PATCH net-next 1/1] ipvlan: Initial check-in of the IPVLAN driver.
From: Alexei Starovoitov @ 2014-11-13 23:25 UTC (permalink / raw)
To: Mahesh Bandewar
Cc: netdev, Eric Dumazet, Maciej Zenczykowski, Laurent Chavey,
Tim Hockin, David Miller, Brandon Philips, Pavel Emelianov
On Tue, Nov 11, 2014 at 2:29 PM, Mahesh Bandewar <maheshb@google.com> wrote:
> The device operates in two different modes and the difference
> in these two modes is primarily on the TX side.
>
> (a) L2 mode : In this mode, the device behaves as a L2 device.
> TX processing up to L2 happens on the stack of the virtual device
> associated with (namespace). Packets are switched after that
> into the main device (default-ns) and queued for xmit.
>
> RX processing is simple and all multicast, broadcast (if
> applicable), and unicast belonging to the address(es) are
> delivered to the virtual devices.
>
> (b) L3 mode : In this mode, the device behaves like a L3 device.
> TX processing up to L3 happens on the stack of the virtual device
> associated with (namespace). Packets are switched to the
> main-device (default-ns) for the L2 processing. Hence the routing
> table of the default-ns will be used in this mode.
>
> RX processing is somewhat similar to the L2 mode except that in
> this mode only Unicast packets are delivered to the virtual device
> while main-dev will handle all other packets.
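The RX rules quoted above can be written down as a small decision function. This is only a userspace sketch: the enum and function names are made up for illustration and do not come from the patch, and "unicast" is taken to mean unicast to an address owned by the virtual device.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, chosen for this sketch only. */
enum sketch_mode { SKETCH_L2, SKETCH_L3 };

enum sketch_dest {
	DEST_UNICAST_OWNED,	/* unicast to an address of the virtual device */
	DEST_UNICAST_OTHER,	/* unicast to some other address */
	DEST_MULTICAST,
	DEST_BROADCAST,
};

/* Returns true if an incoming packet is handed to the virtual device. */
bool rx_to_virtual(enum sketch_mode mode, enum sketch_dest dest)
{
	switch (dest) {
	case DEST_UNICAST_OWNED:
		return true;			/* delivered in both modes */
	case DEST_MULTICAST:
	case DEST_BROADCAST:
		return mode == SKETCH_L2;	/* L3 leaves these to main-dev */
	default:
		return false;
	}
}
```

Under these assumptions the two modes differ on RX only for multicast and broadcast traffic.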
Great stuff. It would be interesting to see a 'typical use'
scenario for L2 vs. L3 mode. Why would users pick one
or the other?
The only case I can think of is that a different default ip in a
different ns would force L2. Anything else?
A few comments:
> +++ b/drivers/net/ipvlan/ipvlan.h
...
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/errno.h>
> +#include <linux/slab.h>
> +#include <linux/string.h>
> +#include <linux/rculist.h>
> +#include <linux/notifier.h>
> +#include <linux/netdevice.h>
> +#include <linux/etherdevice.h>
> +#include <linux/ethtool.h>
> +#include <linux/if_arp.h>
> +#include <linux/if_link.h>
> +#include <linux/atomic.h>
> +#include <linux/if_vlan.h>
> +#include <linux/inet.h>
> +#include <linux/hash.h>
> +#include <linux/ip.h>
> +#include <linux/inetdevice.h>
> +#include <net/rtnetlink.h>
> +#include <net/gre.h>
> +#include <net/route.h>
> +#include <net/addrconf.h>
I don't think it's good style to put every header that the
.c files need into a common .h.
Rather, include them in the individual .c files that use them.
> +static void *ipvlan_get_L3_hdr(struct sk_buff *skb, int *type)
> +{
> + void *lyr3h = NULL;
> +
> + switch (skb->protocol) {
> + case htons(ETH_P_ARP): {
> + struct arphdr *arph;
> +
> + if (unlikely(!pskb_may_pull(skb, sizeof(struct arphdr))))
> + return NULL;
> +
> + arph = arp_hdr(skb);
> + *type = IPVL_ARP;
> + lyr3h = arph;
> + break;
> + }
...
> +static struct ipvl_addr *ipvlan_addr_lookup(struct ipvl_port *port,
> + void *lyr3h, int addr_type,
> + bool use_dest)
> +{
> + struct ipvl_addr *addr = NULL;
> +
> + if (addr_type == IPVL_IPV6) {
> + struct ipv6hdr *ip6h = NULL;
> + struct in6_addr *i6addr;
> +
> + ip6h = (struct ipv6hdr *)lyr3h;
> + i6addr = use_dest ? &ip6h->daddr : &ip6h->saddr;
> + addr = ipvlan_ht_addr_lookup(port, i6addr, true);
IMO it looks very artificial to split a logically single
lookup function into two: get(), which returns 'type' and
'void *lyr3h', and lookup(), which consumes them.
It feels error prone.
Also, lookup() immediately follows get() everywhere.
I think a single lookup() would be much cleaner.
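A rough userspace sketch of what a single lookup() could look like. All type and function names below are hypothetical stand-ins, not the ipvlan code, and a linear table scan replaces the driver's hash table. The point is only that the caller never handles a (void *, type) pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the driver's types. */
enum pkt_proto { PKT_IPV4, PKT_IPV6, PKT_OTHER };

struct pkt {
	enum pkt_proto proto;
	uint8_t saddr[16];	/* IPv4 uses the first 4 bytes */
	uint8_t daddr[16];
};

struct addr_entry {
	enum pkt_proto proto;
	uint8_t addr[16];
};

/*
 * One entry point: protocol detection and address matching happen
 * together, so the two steps cannot get out of sync.
 */
const struct addr_entry *addr_lookup(const struct addr_entry *tbl, size_t n,
				     const struct pkt *p, int use_dest)
{
	const uint8_t *key = use_dest ? p->daddr : p->saddr;
	size_t len, i;

	assert(tbl != NULL || n == 0);

	switch (p->proto) {
	case PKT_IPV4:
		len = 4;
		break;
	case PKT_IPV6:
		len = 16;
		break;
	default:
		return NULL;	/* unknown protocol: nothing to look up */
	}

	for (i = 0; i < n; i++)
		if (tbl[i].proto == p->proto &&
		    memcmp(tbl[i].addr, key, len) == 0)
			return &tbl[i];

	return NULL;
}
```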
^ permalink raw reply
* [PATCHv2 net 1/2] fm10k: Check tunnel header length in encap offload
From: Joe Stringer @ 2014-11-13 23:36 UTC (permalink / raw)
To: netdev
Cc: matthew.vick, jeffrey.t.kirsher, linux.nics, therbert, gerlitz.or,
alexander.duyck, linux-kernel
fm10k supports up to 184 bytes of inner+outer headers. Add an initial
check to fail encap offload if these are too large.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
---
Matthew, I didn't see the equivalent patch on netdev so I went ahead and
created it. If I've missed this somewhere, then please disregard.
v2: First post.
---
drivers/net/ethernet/intel/fm10k/fm10k_main.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index e645af4..3a85291 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -732,6 +732,12 @@ static __be16 fm10k_tx_encap_offload(struct sk_buff *skb)
struct ethhdr *eth_hdr;
u8 l4_hdr = 0;
+/* fm10k supports 184 octets of outer+inner headers. Minus 20 for inner L4. */
+#define FM10K_MAX_ENCAP_TRANSPORT_OFFSET 164
+ if (skb_inner_transport_header(skb) - skb_mac_header(skb) >
+ FM10K_MAX_ENCAP_TRANSPORT_OFFSET)
+ return 0;
+
switch (vlan_get_protocol(skb)) {
case htons(ETH_P_IP):
l4_hdr = ip_hdr(skb)->protocol;
--
1.7.10.4
^ permalink raw reply related
* [PATCHv2 net 2/2] fm10k: Implement ndo_gso_check()
From: Joe Stringer @ 2014-11-13 23:36 UTC (permalink / raw)
To: netdev
Cc: matthew.vick, jeffrey.t.kirsher, linux.nics, therbert, gerlitz.or,
alexander.duyck, linux-kernel
In-Reply-To: <1415921801-10452-1-git-send-email-joestringer@nicira.com>
ndo_gso_check() was recently introduced to allow NICs to report the
offloading support that they have on a per-skb basis. Add an
implementation for this driver which checks for something that looks
like VXLAN.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
---
v2: Reuse fm10k_tx_encap_offload().
---
drivers/net/ethernet/intel/fm10k/fm10k.h | 1 +
drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 +-
drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 8 ++++++++
3 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k.h b/drivers/net/ethernet/intel/fm10k/fm10k.h
index 42eb434..d38f088 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k.h
@@ -443,6 +443,7 @@ netdev_tx_t fm10k_xmit_frame_ring(struct sk_buff *skb,
struct fm10k_ring *tx_ring);
void fm10k_tx_timeout_reset(struct fm10k_intfc *interface);
bool fm10k_check_tx_hang(struct fm10k_ring *tx_ring);
+__be16 fm10k_tx_encap_offload(struct sk_buff *skb);
void fm10k_alloc_rx_buffers(struct fm10k_ring *rx_ring, u16 cleaned_count);
/* PCI */
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 3a85291..1144e14 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -727,7 +727,7 @@ static struct ethhdr *fm10k_gre_is_nvgre(struct sk_buff *skb)
return (struct ethhdr *)(&nvgre_hdr->tni);
}
-static __be16 fm10k_tx_encap_offload(struct sk_buff *skb)
+__be16 fm10k_tx_encap_offload(struct sk_buff *skb)
{
struct ethhdr *eth_hdr;
u8 l4_hdr = 0;
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 8811364..6e8630a 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -1350,6 +1350,13 @@ static void fm10k_dfwd_del_station(struct net_device *dev, void *priv)
}
}
+static bool fm10k_gso_check(struct sk_buff *skb, struct net_device *dev)
+{
+ return (!(skb_shinfo(skb)->gso_type &
+ (SKB_GSO_UDP_TUNNEL | SKB_GSO_GRE)) ||
+ fm10k_tx_encap_offload(skb));
+}
+
static const struct net_device_ops fm10k_netdev_ops = {
.ndo_open = fm10k_open,
.ndo_stop = fm10k_close,
@@ -1372,6 +1379,7 @@ static const struct net_device_ops fm10k_netdev_ops = {
.ndo_do_ioctl = fm10k_ioctl,
.ndo_dfwd_add_station = fm10k_dfwd_add_station,
.ndo_dfwd_del_station = fm10k_dfwd_del_station,
+ .ndo_gso_check = fm10k_gso_check,
};
#define DEFAULT_DEBUG_LEVEL_SHIFT 3
--
1.7.10.4
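For readers following the thread, the boolean in fm10k_gso_check() can be modelled in userspace as below. This is a sketch only: the flag values are illustrative and are not the kernel's SKB_GSO_* bits, and the result of fm10k_tx_encap_offload() is reduced to a hypothetical encap_ok parameter.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; the kernel's SKB_GSO_* bits differ. */
#define GSO_TCPV4	(1u << 0)
#define GSO_UDP_TUNNEL	(1u << 1)
#define GSO_GRE		(1u << 2)

/*
 * Non-tunnel GSO is always accepted. Tunnel GSO is accepted only when
 * the encap parser recognised the packet (modelled by encap_ok).
 */
bool gso_check(unsigned int gso_type, bool encap_ok)
{
	return !(gso_type & (GSO_UDP_TUNNEL | GSO_GRE)) || encap_ok;
}
```

In this model a plain TCP GSO packet never consults the encap parser, which matches the short-circuit order of the expression in the patch.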
^ permalink raw reply related
* Re: [PATCHv2 net 1/2] fm10k: Check tunnel header length in encap offload
From: Jeff Kirsher @ 2014-11-13 23:41 UTC (permalink / raw)
To: Joe Stringer
Cc: netdev, matthew.vick, linux.nics, therbert, gerlitz.or,
alexander.duyck, linux-kernel
In-Reply-To: <1415921801-10452-1-git-send-email-joestringer@nicira.com>
[-- Attachment #1: Type: text/plain, Size: 579 bytes --]
On Thu, 2014-11-13 at 15:36 -0800, Joe Stringer wrote:
> fm10k supports up to 184 bytes of inner+outer headers. Add an initial
> check to fail encap offload if these are too large.
>
> Signed-off-by: Joe Stringer <joestringer@nicira.com>
> ---
> Matthew, I didn't see the equivalent patch on netdev so I went ahead
> and
> created it. If I've missed this somewhere, then please disregard.
>
> v2: First post.
> ---
> drivers/net/ethernet/intel/fm10k/fm10k_main.c | 6 ++++++
> 1 file changed, 6 insertions(+)
Thanks Joe, I will add your patch to my queue.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* Re: [PATCHv2 net 2/2] fm10k: Implement ndo_gso_check()
From: Jeff Kirsher @ 2014-11-13 23:41 UTC (permalink / raw)
To: Joe Stringer
Cc: netdev, matthew.vick, linux.nics, therbert, gerlitz.or,
alexander.duyck, linux-kernel
In-Reply-To: <1415921801-10452-2-git-send-email-joestringer@nicira.com>
[-- Attachment #1: Type: text/plain, Size: 678 bytes --]
On Thu, 2014-11-13 at 15:36 -0800, Joe Stringer wrote:
> ndo_gso_check() was recently introduced to allow NICs to report the
> offloading support that they have on a per-skb basis. Add an
> implementation for this driver which checks for something that looks
> like VXLAN.
>
> Signed-off-by: Joe Stringer <joestringer@nicira.com>
> ---
> v2: Reuse fm10k_tx_encap_offload().
> ---
> drivers/net/ethernet/intel/fm10k/fm10k.h | 1 +
> drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 +-
> drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 8 ++++++++
> 3 files changed, 10 insertions(+), 1 deletion(-)
Same with this one as well, thanks Joe.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* Re: [PATCHv2 net 2/2] fm10k: Implement ndo_gso_check()
From: Joe Stringer @ 2014-11-13 23:52 UTC (permalink / raw)
To: Jeff Kirsher, Shannon Nelson, Jesse Brandeburg
Cc: netdev, matthew.vick, linux.nics, therbert, gerlitz.or,
alexander.duyck, linux-kernel
In-Reply-To: <1415922116.2454.34.camel@jtkirshe-mobl>
On Thu, Nov 13, 2014 at 03:41:56PM -0800, Jeff Kirsher wrote:
> On Thu, 2014-11-13 at 15:36 -0800, Joe Stringer wrote:
> > ndo_gso_check() was recently introduced to allow NICs to report the
> > offloading support that they have on a per-skb basis. Add an
> > implementation for this driver which checks for something that looks
> > like VXLAN.
> >
> > Signed-off-by: Joe Stringer <joestringer@nicira.com>
> > ---
> > v2: Reuse fm10k_tx_encap_offload().
> > ---
> > drivers/net/ethernet/intel/fm10k/fm10k.h | 1 +
> > drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 +-
> > drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 8 ++++++++
> > 3 files changed, 10 insertions(+), 1 deletion(-)
>
> Same with this one as well, thanks Joe.
Thanks Jeff.
Could you remind me, is the equivalent i40e patch on your queue or were
we still waiting on further feedback from Shannon/Jesse?
^ permalink raw reply
* Re: arm64 allmodconfig failures in nft_reject_bridge.c
From: Mark Brown @ 2014-11-13 23:51 UTC (permalink / raw)
To: David Miller
Cc: pablo, linux, kaber, kadlec, stephen, linaro-kernel,
kernel-build-reports, netfilter-devel, coreteam, bridge, netdev
In-Reply-To: <20141113.152353.548265176661091467.davem@davemloft.net>
[-- Attachment #1: Type: text/plain, Size: 579 bytes --]
On Thu, Nov 13, 2014 at 03:23:53PM -0500, David Miller wrote:
> Date: Thu, 13 Nov 2014 19:47:52 +0000
> > On Thu, Nov 13, 2014 at 02:35:13PM -0500, David Miller wrote:
> >> I hold changes in my tree for a week or more, because I want them to
> >> "cook" there before they go to Linus.
> > Hrm. Guess there must've been some other change in -next that pulled
> > the header in implicitly here :(
> -next pulls in my 'net' tree, so got the fix.
Right, but we didn't see the problem before you sent the tree with the
problematic patch to Linus - we didn't detect it in -next.
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]
^ permalink raw reply
* linux-next: ath9k: build failure, ath_cmn_process_fft() redefinition
From: Jeremiah Mahler @ 2014-11-14 0:07 UTC (permalink / raw)
To: Oleksij Rempel
Cc: Jeremiah Mahler, linux-kernel, ath9k-devel, linville,
linux-wireless, ath9k-devel, netdev
In version 20141113 of the linux-next kernel, if it is compiled with
CONFIG_ATH9K_DEBUGFS unset, an error about ath_cmn_process_fft() being
redefined will be produced.
make
...
LD [M] drivers/net/wireless/ath/ath9k/ath9k_hw.o
CC [M] drivers/net/wireless/ath/ath9k/common-spectral.o
CC lib/debug_locks.o
CC lib/random32.o
drivers/net/wireless/ath/ath9k/common-spectral.c:40:5: error:
redefinition of ‘ath_cmn_process_fft’
int ath_cmn_process_fft(struct ath_spec_scan_priv *spec_priv, struct
ieee80211_hdr *hdr,
^
In file included from drivers/net/wireless/ath/ath9k/common.h:27:0,
from drivers/net/wireless/ath/ath9k/ath9k.h:27,
from
drivers/net/wireless/ath/ath9k/common-spectral.c:18:
drivers/net/wireless/ath/ath9k/common-spectral.h:146:19: note: previous
definition of ‘ath_cmn_process_fft’ was here
static inline int ath_cmn_process_fft(struct ath_spec_scan_priv
*spec_priv,
^
scripts/Makefile.build:257: recipe for target
'drivers/net/wireless/ath/ath9k/common-spectral.o' failed
make[5]: *** [drivers/net/wireless/ath/ath9k/common-spectral.o] Error 1
scripts/Makefile.build:402: recipe for target
'drivers/net/wireless/ath/ath9k' failed
make[4]: *** [drivers/net/wireless/ath/ath9k] Error 2
scripts/Makefile.build:402: recipe for target 'drivers/net/wireless/ath'
failed
make[3]: *** [drivers/net/wireless/ath] Error 2
scripts/Makefile.build:402: recipe for target 'drivers/net/wireless'
failed
make[2]: *** [drivers/net/wireless] Error 2
scripts/Makefile.build:402: recipe for target 'drivers/net' failed
make[1]: *** [drivers/net] Error 2
Makefile:953: recipe for target 'drivers' failed
make: *** [drivers] Error 2
make: *** Waiting for unfinished jobs....
CC lib/bust_spinlocks.o
...
Bisecting the kernel found that the following patch was the cause.
commit 67dc74f15f147b9f88702de2952d2951e3e000ec
Author: Oleksij Rempel <linux@rempel-privat.de>
Date: Thu Nov 6 08:53:30 2014 +0100
ath9k: move spectral.* to common-spectral.*
and rename exports from ath9k_spectral_* to ath9k_cmn_spectral_*
Signed-off-by: Oleksij Rempel <linux@rempel-privat.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch mostly consists of renaming functions and moving code but
there was a functional change to the Makefile.
common-spectral.h uses CONFIG_ATH9K_DEBUGFS to conditionally provide a
prototype of ath_cmn_process_fft() when set or to define it as a noop
when it is unset. The Makefile was changed so that CONFIG_ATH9K_DEBUGFS
no longer applied to common-spectral and this will result in two
definitions of ath_cmn_process_fft().
> --- a/drivers/net/wireless/ath/ath9k/Makefile
> +++ b/drivers/net/wireless/ath/ath9k/Makefile
> @@ -16,8 +16,7 @@ ath9k-$(CONFIG_ATH9K_DFS_CERTIFIED) += dfs.o
> ath9k-$(CONFIG_ATH9K_TX99) += tx99.o
> ath9k-$(CONFIG_ATH9K_WOW) += wow.o
>
> -ath9k-$(CONFIG_ATH9K_DEBUGFS) += debug.o \
> - spectral.o
> +ath9k-$(CONFIG_ATH9K_DEBUGFS) += debug.o
>
> ath9k-$(CONFIG_ATH9K_STATION_STATISTICS) += debug_sta.o
>
> @@ -59,7 +58,8 @@ obj-$(CONFIG_ATH9K_COMMON) += ath9k_common.o
> ath9k_common-y:= common.o \
> common-init.o \
> common-beacon.o \
> - common-debug.o
> + common-debug.o \
> + common-spectral.o
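The header pattern described above can be reproduced in a single file. This is a simplified sketch with made-up names (FEATURE_DEBUGFS, process_fft), not the ath9k sources. The #ifdef around the real definition plays the role that the Makefile rule used to play. Without it, the stub and the real function would both be defined when the option is off.

```c
#include <assert.h>

/* Leave FEATURE_DEBUGFS undefined to model CONFIG_ATH9K_DEBUGFS=n. */

/* --- what the header provides --- */
#ifdef FEATURE_DEBUGFS
int process_fft(int sample);			/* real prototype */
#else
static inline int process_fft(int sample)	/* no-op stub */
{
	(void)sample;
	return 0;
}
#endif

/* --- what the .c file provides --- */
#ifdef FEATURE_DEBUGFS	/* drop this guard and the stub is redefined */
int process_fft(int sample)
{
	return sample * 2;	/* placeholder for the real work */
}
#endif
```

Whether the proper fix belongs in the header's #ifdef condition or in the Makefile is a separate question. The sketch only shows why the two must agree.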
Reverting the patch solves one error, but then a new one is produced.
make
...
MODPOST 185 modules
CC arch/x86/boot/edd.o
VOFFSET arch/x86/boot/voffset.h
ERROR: "ath9k_cmn_spectral_scan_trigger"
[drivers/net/wireless/ath/ath9k/ath9k.ko] undefined!
scripts/Makefile.modpost:90: recipe for target '__modpost' failed
make[1]: *** [__modpost] Error 1
...
This error is caused by the patch before it.
commit f00a422cc81ef665f5098c0bc43cb0c616e55a9b
Author: Oleksij Rempel <linux@rempel-privat.de>
Date: Thu Nov 6 08:53:29 2014 +0100
ath9k: move ath9k_spectral_scan_ from main.c to spectral.c
Now we should be ready to make this code common.
Signed-off-by: Oleksij Rempel <linux@rempel-privat.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Since the code was moved from main.c to spectral.c, it is now built
only when CONFIG_ATH9K_DEBUGFS is set, so the symbol is undefined when
the option is unset.
Reverting both the above patches resolves the build errors.
--
Jeremiah Mahler
jmmahler@gmail.com
http://github.com/jmahler
^ permalink raw reply
* Re: [PATCHv2 net 2/2] fm10k: Implement ndo_gso_check()
From: Jeff Kirsher @ 2014-11-14 0:12 UTC (permalink / raw)
To: Joe Stringer
Cc: Shannon Nelson, Jesse Brandeburg, netdev, matthew.vick,
linux.nics, therbert, gerlitz.or, alexander.duyck, linux-kernel
In-Reply-To: <20141113235219.GA35957@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 1156 bytes --]
On Thu, 2014-11-13 at 15:52 -0800, Joe Stringer wrote:
> On Thu, Nov 13, 2014 at 03:41:56PM -0800, Jeff Kirsher wrote:
> > On Thu, 2014-11-13 at 15:36 -0800, Joe Stringer wrote:
> > > ndo_gso_check() was recently introduced to allow NICs to report
> the
> > > offloading support that they have on a per-skb basis. Add an
> > > implementation for this driver which checks for something that
> looks
> > > like VXLAN.
> > >
> > > Signed-off-by: Joe Stringer <joestringer@nicira.com>
> > > ---
> > > v2: Reuse fm10k_tx_encap_offload().
> > > ---
> > > drivers/net/ethernet/intel/fm10k/fm10k.h | 1 +
> > > drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 +-
> > > drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 8 ++++++++
> > > 3 files changed, 10 insertions(+), 1 deletion(-)
> >
> > Same with this one as well, thanks Joe.
>
> Thanks Jeff.
>
> Could you remind me, is the equivalent i40e patch on your queue or
> were
> we still waiting on further feedback from Shannon/Jesse?
Actually, looks like I dropped the patch due to community feedback and
was expecting a v2. Was I incorrect in doing so?
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* Re: [PATCHv2 net 2/2] fm10k: Implement ndo_gso_check()
From: Joe Stringer @ 2014-11-14 0:29 UTC (permalink / raw)
To: Jeff Kirsher
Cc: Shannon Nelson, Jesse Brandeburg, netdev, matthew.vick,
linux.nics, therbert, gerlitz.or, alexander.duyck, linux-kernel,
jesse
In-Reply-To: <1415923959.2454.44.camel@jtkirshe-mobl>
On Thursday, November 13, 2014 16:12:39 Jeff Kirsher wrote:
> On Thu, 2014-11-13 at 15:52 -0800, Joe Stringer wrote:
> > On Thu, Nov 13, 2014 at 03:41:56PM -0800, Jeff Kirsher wrote:
> > > On Thu, 2014-11-13 at 15:36 -0800, Joe Stringer wrote:
> > > > ndo_gso_check() was recently introduced to allow NICs to report
> >
> > the
> >
> > > > offloading support that they have on a per-skb basis. Add an
> > > > implementation for this driver which checks for something that
> >
> > looks
> >
> > > > like VXLAN.
> > > >
> > > > Signed-off-by: Joe Stringer <joestringer@nicira.com>
> > > > ---
> > > > v2: Reuse fm10k_tx_encap_offload().
> > > > ---
> > > >
> > > > drivers/net/ethernet/intel/fm10k/fm10k.h | 1 +
> > > > drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 +-
> > > > drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 8 ++++++++
> > > > 3 files changed, 10 insertions(+), 1 deletion(-)
> > >
> > > Same with this one as well, thanks Joe.
> >
> > Thanks Jeff.
> >
> > Could you remind me, is the equivalent i40e patch on your queue or
> > were
> > we still waiting on further feedback from Shannon/Jesse?
>
> Actually, looks like I dropped the patch due to community feedback and
> was expecting a v2. Was I incorrect in doing so?
That's fine. There were some unresolved questions for what that version should
look like, but I can repost to start the discussion again.
Cheers,
Joe
^ permalink raw reply
* [PATCHv2 net 0/4] Implement ndo_gso_check() for vxlan nics
From: Joe Stringer @ 2014-11-14 0:38 UTC (permalink / raw)
To: netdev
Cc: sathya.perla, shahed.shaikh, amirv, Dept-GELinuxNICDev, therbert,
gerlitz.or, linux-kernel
Most NICs that report NETIF_F_GSO_UDP_TUNNEL support VXLAN, and not other
UDP-based encapsulation protocols where the format and size of the header may
differ. This patch series implements a generic ndo_gso_check() for detecting
VXLAN, then reuses it for these NICs.
Implementation shamelessly stolen from Tom Herbert (with minor fixups):
http://thread.gmane.org/gmane.linux.network/332428/focus=333111
v2: Drop i40e/fm10k patches (code diverged; handling separately).
Refactor common code into vxlan_gso_check() helper.
Minor style fixes.
Joe Stringer (4):
net: Add vxlan_gso_check() helper
be2net: Implement ndo_gso_check()
net/mlx4_en: Implement ndo_gso_check()
qlcnic: Implement ndo_gso_check()
drivers/net/ethernet/emulex/benet/be_main.c | 6 ++++++
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 6 ++++++
drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 6 ++++++
drivers/net/vxlan.c | 13 +++++++++++++
include/net/vxlan.h | 2 ++
5 files changed, 33 insertions(+)
--
1.7.10.4
^ permalink raw reply
* [PATCHv2 net 1/4] net: Add vxlan_gso_check() helper
From: Joe Stringer @ 2014-11-14 0:38 UTC (permalink / raw)
To: netdev
Cc: sathya.perla, shahed.shaikh, amirv, Dept-GELinuxNICDev, therbert,
gerlitz.or, alexander.duyck, linux-kernel
In-Reply-To: <1415925495-59312-1-git-send-email-joestringer@nicira.com>
Most NICs that report NETIF_F_GSO_UDP_TUNNEL support VXLAN, and not
other UDP-based encapsulation protocols where the format and size of the
header differs. This patch implements a generic ndo_gso_check() for
VXLAN which will only advertise GSO support when the skb looks like it
contains VXLAN (or no UDP tunnelling at all).
Implementation shamelessly stolen from Tom Herbert:
http://thread.gmane.org/gmane.linux.network/332428/focus=333111
Signed-off-by: Joe Stringer <joestringer@nicira.com>
---
v2: Merge helpers for be2net, mlx4, qlcnic
Use (sizeof(struct udphdr) + sizeof(struct vxlanhdr))
v1: Initial post
---
drivers/net/vxlan.c | 13 +++++++++++++
include/net/vxlan.h | 2 ++
2 files changed, 15 insertions(+)
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index fa9dc45..6b65863 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1571,6 +1571,19 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
return false;
}
+bool vxlan_gso_check(struct sk_buff *skb)
+{
+ if ((skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL) &&
+ (skb->inner_protocol_type != ENCAP_TYPE_ETHER ||
+ skb->inner_protocol != htons(ETH_P_TEB) ||
+ (skb_inner_mac_header(skb) - skb_transport_header(skb) !=
+ sizeof(struct udphdr) + sizeof(struct vxlanhdr))))
+ return false;
+
+ return true;
+}
+EXPORT_SYMBOL_GPL(vxlan_gso_check);
+
#if IS_ENABLED(CONFIG_IPV6)
static int vxlan6_xmit_skb(struct vxlan_sock *vs,
struct dst_entry *dst, struct sk_buff *skb,
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index d5f59f3..afadf8e 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -45,6 +45,8 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
__be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
__be16 src_port, __be16 dst_port, __be32 vni, bool xnet);
+bool vxlan_gso_check(struct sk_buff *skb);
+
/* IP header + UDP + VXLAN + Ethernet header */
#define VXLAN_HEADROOM (20 + 8 + 8 + 14)
/* IPv6 header + UDP + VXLAN + Ethernet header */
--
1.7.10.4
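The condition in vxlan_gso_check() can be restated in userspace as follows. This is a sketch under stated assumptions: struct pkt_meta and its fields are hypothetical, the two protocol tests are folded into one flag, and the header sizes are the 8-byte UDP and 8-byte VXLAN headers that VXLAN_HEADROOM also uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define UDP_HDR_LEN	8
#define VXLAN_HDR_LEN	8

/* Hypothetical summary of the skb fields the helper inspects. */
struct pkt_meta {
	bool udp_tunnel_gso;	/* SKB_GSO_UDP_TUNNEL is set */
	bool inner_is_ether;	/* ENCAP_TYPE_ETHER carrying ETH_P_TEB */
	size_t transport_off;	/* offset of the outer UDP header */
	size_t inner_mac_off;	/* offset of the inner Ethernet header */
};

/* True when GSO may be offloaded: no UDP tunnel at all, or VXLAN-shaped. */
bool looks_like_vxlan_or_plain(const struct pkt_meta *m)
{
	if (!m->udp_tunnel_gso)
		return true;

	return m->inner_is_ether &&
	       m->inner_mac_off - m->transport_off ==
	       UDP_HDR_LEN + VXLAN_HDR_LEN;
}
```

For an IPv4 outer packet, the outer UDP header starts at offset 14 + 20 = 34, so a VXLAN-shaped packet has its inner Ethernet header at 34 + 16 = 50.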
^ permalink raw reply related
* [PATCHv2 net 3/4] net/mlx4_en: Implement ndo_gso_check()
From: Joe Stringer @ 2014-11-14 0:38 UTC (permalink / raw)
To: netdev
Cc: sathya.perla, shahed.shaikh, amirv, Dept-GELinuxNICDev, therbert,
gerlitz.or, alexander.duyck, linux-kernel
In-Reply-To: <1415925495-59312-1-git-send-email-joestringer@nicira.com>
Use vxlan_gso_check() to advertise offload support for this NIC.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
---
v2: Refactor out vxlan helper.
---
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 02266e3..c5fcc56 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2355,6 +2355,11 @@ static void mlx4_en_del_vxlan_port(struct net_device *dev,
queue_work(priv->mdev->workqueue, &priv->vxlan_del_task);
}
+
+static bool mlx4_en_gso_check(struct sk_buff *skb, struct net_device *dev)
+{
+ return vxlan_gso_check(skb);
+}
#endif
static const struct net_device_ops mlx4_netdev_ops = {
@@ -2386,6 +2391,7 @@ static const struct net_device_ops mlx4_netdev_ops = {
#ifdef CONFIG_MLX4_EN_VXLAN
.ndo_add_vxlan_port = mlx4_en_add_vxlan_port,
.ndo_del_vxlan_port = mlx4_en_del_vxlan_port,
+ .ndo_gso_check = mlx4_en_gso_check,
#endif
};
--
1.7.10.4
^ permalink raw reply related
* [PATCHv2 net 4/4] qlcnic: Implement ndo_gso_check()
From: Joe Stringer @ 2014-11-14 0:38 UTC (permalink / raw)
To: netdev
Cc: sathya.perla, shahed.shaikh, amirv, Dept-GELinuxNICDev, therbert,
gerlitz.or, alexander.duyck, linux-kernel
In-Reply-To: <1415925495-59312-1-git-send-email-joestringer@nicira.com>
Use vxlan_gso_check() to advertise offload support for this NIC.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
---
v2: Refactor out vxlan helper.
---
drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
index f5e29f7..a913b3a 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
@@ -503,6 +503,11 @@ static void qlcnic_del_vxlan_port(struct net_device *netdev,
adapter->flags |= QLCNIC_DEL_VXLAN_PORT;
}
+
+static bool qlcnic_gso_check(struct sk_buff *skb, struct net_device *dev)
+{
+ return vxlan_gso_check(skb);
+}
#endif
static const struct net_device_ops qlcnic_netdev_ops = {
@@ -526,6 +531,7 @@ static const struct net_device_ops qlcnic_netdev_ops = {
#ifdef CONFIG_QLCNIC_VXLAN
.ndo_add_vxlan_port = qlcnic_add_vxlan_port,
.ndo_del_vxlan_port = qlcnic_del_vxlan_port,
+ .ndo_gso_check = qlcnic_gso_check,
#endif
#ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller = qlcnic_poll_controller,
--
1.7.10.4
^ permalink raw reply related
* [PATCHv2 net 2/4] be2net: Implement ndo_gso_check()
From: Joe Stringer @ 2014-11-14 0:38 UTC (permalink / raw)
To: netdev
Cc: sathya.perla, shahed.shaikh, amirv, Dept-GELinuxNICDev, therbert,
gerlitz.or, alexander.duyck, linux-kernel
In-Reply-To: <1415925495-59312-1-git-send-email-joestringer@nicira.com>
Use vxlan_gso_check() to advertise offload support for this NIC.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
---
v2: Refactor out vxlan helper.
---
drivers/net/ethernet/emulex/benet/be_main.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 9a18e79..3e8475c 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -4421,6 +4421,11 @@ static void be_del_vxlan_port(struct net_device *netdev, sa_family_t sa_family,
"Disabled VxLAN offloads for UDP port %d\n",
be16_to_cpu(port));
}
+
+static bool be_gso_check(struct sk_buff *skb, struct net_device *dev)
+{
+ return vxlan_gso_check(skb);
+}
#endif
static const struct net_device_ops be_netdev_ops = {
@@ -4450,6 +4455,7 @@ static const struct net_device_ops be_netdev_ops = {
#ifdef CONFIG_BE2NET_VXLAN
.ndo_add_vxlan_port = be_add_vxlan_port,
.ndo_del_vxlan_port = be_del_vxlan_port,
+ .ndo_gso_check = be_gso_check,
#endif
};
--
1.7.10.4
^ permalink raw reply related
* Re: [PATCHv2 net 1/2] fm10k: Check tunnel header length in encap offload
From: Vick, Matthew @ 2014-11-14 0:41 UTC (permalink / raw)
To: Joe Stringer, netdev@vger.kernel.org
Cc: Kirsher, Jeffrey T, Linux NICS, therbert@google.com,
gerlitz.or@gmail.com, alexander.duyck@gmail.com,
linux-kernel@vger.kernel.org
In-Reply-To: <1415921801-10452-1-git-send-email-joestringer@nicira.com>
On 11/13/14, 3:36 PM, "Joe Stringer" <joestringer@nicira.com> wrote:
>fm10k supports up to 184 bytes of inner+outer headers. Add an initial
>check to fail encap offload if these are too large.
>
>Signed-off-by: Joe Stringer <joestringer@nicira.com>
>---
>Matthew, I didn't see the equivalent patch on netdev so I went ahead and
>created it. If I've missed this somewhere, then please disregard.
>
>v2: First post.
You didn't miss it Joe--it just hasn't made it up yet. :) It's currently
in Jeff's tree for testing. You're on the CC for the patch, so you'll get
a notification once it goes up. It's basically the same as what you have,
except the #define I use is 184 and I use inner_tcp_hdrlen() to account
for the inner TCP header length.
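To make the difference concrete, here is a userspace sketch of the two checks. The function and macro names are invented for illustration, and the second function is only a guess at the shape of the check described above, not the code in Jeff's tree.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_HDR_BYTES		184	/* limit stated in the patch */
#define FIXED_INNER_L4_BYTES	20	/* TCP header without options */

/* Fixed-budget check: reserve 20 bytes for the inner L4 header up front. */
bool fits_fixed(unsigned int inner_transport_off)
{
	return inner_transport_off <= MAX_HDR_BYTES - FIXED_INNER_L4_BYTES;
}

/* Length-aware check: add the actual inner TCP header length. */
bool fits_actual(unsigned int inner_transport_off, unsigned int tcp_hdrlen)
{
	return inner_transport_off + tcp_hdrlen <= MAX_HDR_BYTES;
}
```

The two agree when the inner TCP header has no options. With a 32-byte TCP header at offset 160, the fixed check passes (160 <= 164) while the length-aware check fails (192 > 184).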
Since your second patch should apply cleanly on top of mine, what do you
think about dropping the first patch in this series and Jeff can send our
two patches up together once they've passed testing?
Cheers,
Matthew
^ permalink raw reply
* Re: [PATCH 1/3] sh_eth: Remove redundant alignment adjustment
From: Simon Horman @ 2014-11-14 0:43 UTC (permalink / raw)
To: Sergei Shtylyov
Cc: Yoshihiro Kaneko, netdev, David S. Miller, Magnus Damm, linux-sh
In-Reply-To: <546532A2.80509@cogentembedded.com>
On Fri, Nov 14, 2014 at 01:37:22AM +0300, Sergei Shtylyov wrote:
> On 11/13/2014 10:04 AM, Yoshihiro Kaneko wrote:
>
> >From: Mitsuhiro Kimura <mitsuhiro.kimura.kc@renesas.com>
>
> >PTR_ALIGN macro after skb_reserve is redundant, because skb_reserve
> >function adjusts the alignment of skb->data.
>
> OK, but where is the bug? There must be one if you base this patch on the
> 'net' tree...
I suppose this patch would be more appropriate for the net-next tree.
> >Signed-off-by: Mitsuhiro Kimura <mitsuhiro.kimura.kc@renesas.com>
> >Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com>
>
> WBR, Sergei
* Re: [PATCHv2 net 1/2] fm10k: Check tunnel header length in encap offload
From: Joe Stringer @ 2014-11-14 0:54 UTC (permalink / raw)
To: Vick, Matthew
Cc: netdev@vger.kernel.org, Kirsher, Jeffrey T, Linux NICS,
therbert@google.com, gerlitz.or@gmail.com,
alexander.duyck@gmail.com, linux-kernel@vger.kernel.org
In-Reply-To: <D08A8E37.5FC83%matthew.vick@intel.com>
On Thursday, November 13, 2014 16:41:03 Vick, Matthew wrote:
> On 11/13/14, 3:36 PM, "Joe Stringer" <joestringer@nicira.com> wrote:
> >fm10k supports up to 184 bytes of inner+outer headers. Add an initial
> >check to fail encap offload if these are too large.
> >
> >Signed-off-by: Joe Stringer <joestringer@nicira.com>
> >---
> >Matthew, I didn't see the equivalent patch on netdev so I went ahead and
> >created it. If I've missed this somewhere, then please disregard.
> >
> >v2: First post.
>
> You didn't miss it Joe--it just hasn't made it up yet. :) It's currently
> in Jeff's tree for testing. You're on the CC for the patch, so you'll get
> a notification once it goes up. It's basically the same as what you have,
> except the #define I use is 184 and I use inner_tcp_hdrlen() to account
> for the inner TCP header length.
>
> Since your second patch should apply cleanly on top of mine, what do you
> think about dropping the first patch in this series and Jeff can send our
> two patches up together once they've passed testing?
Ah, that sounds great.
Cheers,
Joe
* Re: Resume S4 - r8169 BROKEN
From: poma @ 2014-11-14 0:59 UTC (permalink / raw)
To: netdev
In-Reply-To: <54647EFA.5020707@gmail.com>
On 13.11.2014 10:50, poma wrote:
>
> Resume S4 - r8169 RealTek RTL-8169 Gigabit Ethernet driver - BROKEN
>
> Tested with:
> kernel-3.18.0-0.rc4.git0.2.fc22
> http://koji.fedoraproject.org/koji/buildinfo?buildID=592269
> Nov 11 2014
> &
> kernel-3.18.0-rc2.git-9dfa9a2-net-next+
> https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=9dfa9a2
> 2014-11-13
>
>
> # lspci -knn -s 03:00.0
> 03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 02)
> Subsystem: Gigabyte Technology Co., Ltd Motherboard [1458:e000]
> Kernel driver in use: r8169
> Kernel modules: r8169
>
>
> # modprobe -v r8169
> insmod /lib/modules/3.18.0-rc2.git-9dfa9a2-net-next+/kernel/drivers/net/mii.ko
> insmod /lib/modules/3.18.0-rc2.git-9dfa9a2-net-next+/kernel/drivers/net/ethernet/realtek/r8169.ko
>
>
> # dmesg
> [ 142.344223] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> [ 142.344229] r8169 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
> [ 142.344456] r8169 0000:03:00.0: irq 26 for MSI/MSI-X
> [ 142.344600] r8169 0000:03:00.0 eth1: RTL8168c/8111c at 0xffffc9000291a000, 00:12:34:56:78:30, XID 1c4000c0 IRQ 26
> [ 142.344602] r8169 0000:03:00.0 eth1: jumbo features [frames: 6128 bytes, tx checksumming: ko]
> [ 142.388933] r8169 0000:03:00.0 enp3s0: renamed from eth1
> [ 142.417502] r8169 0000:03:00.0 enp3s0: link down
> [ 142.417512] r8169 0000:03:00.0 enp3s0: link down
> [ 144.029918] r8169 0000:03:00.0 enp3s0: link up
>
>
> # ifconfig enp3s0
> enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
> inet 192.168.2.2 netmask 255.255.255.0 broadcast 192.168.2.255
> inet6 fe80::212:34ff:fe56:7830 prefixlen 64 scopeid 0x20<link>
> ether 00:12:34:56:78:30 txqueuelen 1000 (Ethernet)
> RX packets 642 bytes 382970 (373.9 KiB)
> RX errors 0 dropped 0 overruns 0 frame 0
> TX packets 709 bytes 72634 (70.9 KiB)
> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
>
>
> # ping kernel.org -c10
> PING kernel.org (199.204.44.194) 56(84) bytes of data.
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=1 ttl=49 time=134 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=2 ttl=49 time=133 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=3 ttl=49 time=134 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=4 ttl=49 time=133 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=5 ttl=49 time=134 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=6 ttl=49 time=134 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=7 ttl=49 time=134 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=8 ttl=49 time=133 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=9 ttl=49 time=133 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=10 ttl=49 time=134 ms
>
> --- kernel.org ping statistics ---
> 10 packets transmitted, 10 received, 0% packet loss, time 9011ms
> rtt min/avg/max/mdev = 133.008/134.163/134.987/0.683 ms
>
> ~~~~~~~~~~~~~~~~~~~
> # systemctl suspend
> & RESUME
> ~~~~~~~~
>
> # dmesg
> [ 673.601499] PM: Syncing filesystems ... done.
> [ 673.793930] PM: Preparing system for mem sleep
> [ 673.961198] Freezing user space processes ... (elapsed 0.002 seconds) done.
> [ 673.963420] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> [ 673.964664] PM: Entering mem sleep
> [ 676.457146] PM: suspend of devices complete after 2492.103 msecs
> [ 676.457826] PM: late suspend of devices complete after 0.673 msecs
> [ 676.469303] PM: noirq suspend of devices complete after 11.469 msecs
> [ 676.469889] ACPI: Preparing to enter system sleep state S3
> [ 676.470890] PM: Saving platform NVS memory
> [ 676.479540] ACPI: Low-level resume complete
> [ 676.479613] PM: Restoring platform NVS memory
> [ 676.525167] ACPI: Waking up from system sleep state S3
> [ 676.590780] r8169 0000:03:00.0 enp3s0: link down
> [ 678.060741] r8169 0000:03:00.0 enp3s0: link up
> [ 679.604068] PM: resume of devices complete after 3066.534 msecs
> [ 679.604657] PM: Finishing wakeup.
> [ 679.604662] Restarting tasks ... done.
>
>
> # ifconfig enp3s0
> enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
> inet 192.168.2.2 netmask 255.255.255.0 broadcast 192.168.2.255
> inet6 fe80::212:34ff:fe56:7830 prefixlen 64 scopeid 0x20<link>
> ether 00:12:34:56:78:30 txqueuelen 1000 (Ethernet)
> RX packets 782 bytes 464566 (453.6 KiB)
> RX errors 0 dropped 0 overruns 0 frame 0
> TX packets 860 bytes 86524 (84.4 KiB)
> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
>
>
> # ping kernel.org -c10
> PING kernel.org (199.204.44.194) 56(84) bytes of data.
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=1 ttl=49 time=134 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=2 ttl=49 time=134 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=3 ttl=49 time=133 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=4 ttl=49 time=132 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=5 ttl=49 time=134 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=6 ttl=49 time=133 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=7 ttl=49 time=133 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=8 ttl=49 time=133 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=9 ttl=49 time=133 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=10 ttl=49 time=134 ms
>
> --- kernel.org ping statistics ---
> 10 packets transmitted, 10 received, 0% packet loss, time 9011ms
> rtt min/avg/max/mdev = 132.798/133.770/134.919/0.682 ms
>
> ~~~~~~~~~~~~~~~~~~~~~
> # systemctl hibernate
> & THAW
> ~~~~~~
>
> # dmesg
> [ 1187.902839] PM: Hibernation mode set to 'platform'
> [ 1188.098296] PM: Syncing filesystems ... done.
> [ 1188.545125] Freezing user space processes ... (elapsed 0.002 seconds) done.
> [ 1188.548022] PM: Marking nosave pages: [mem 0x00000000-0x00000fff]
> [ 1188.548031] PM: Marking nosave pages: [mem 0x0009e000-0x000fffff]
> [ 1188.548042] PM: Marking nosave pages: [mem 0xcfef0000-0xffffffff]
> [ 1188.549729] PM: Basic memory bitmaps created
> [ 1188.549755] PM: Preallocating image memory... done (allocated 641929 pages)
> [ 1189.133132] PM: Allocated 2567716 kbytes in 0.58 seconds (4427.09 MB/s)
> [ 1189.133241] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> [ 1191.867738] PM: freeze of devices complete after 2732.078 msecs
> [ 1191.868597] PM: late freeze of devices complete after 0.855 msecs
> [ 1191.869951] PM: noirq freeze of devices complete after 1.344 msecs
> [ 1191.871294] ACPI: Preparing to enter system sleep state S4
> [ 1191.871956] PM: Saving platform NVS memory
> [ 1191.879106] PM: Creating hibernation image:
> [ 1192.100544] PM: Need to copy 359761 pages
> [ 1192.100558] PM: Normal pages needed: 359761 + 1024, available pages: 688358
> [ 1191.880196] PM: Restoring platform NVS memory
> [ 1191.925748] ACPI: Waking up from system sleep state S4
> [ 1191.936325] PM: noirq restore of devices complete after 10.428 msecs
> [ 1191.936796] PM: early restore of devices complete after 0.423 msecs
> [ 1192.043225] r8169 0000:03:00.0 enp3s0: link down
> [ 1192.689507] PM: restore of devices complete after 699.641 msecs
> [ 1192.689835] PM: Image restored successfully.
> [ 1192.689856] PM: Basic memory bitmaps freed
> [ 1192.689860] Restarting tasks ... done.
> [ 1193.672620] r8169 0000:03:00.0 enp3s0: link up
>
>
> # ifconfig enp3s0
> enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
> inet 192.168.2.2 netmask 255.255.255.0 broadcast 192.168.2.255
> inet6 fe80::212:34ff:fe56:7830 prefixlen 64 scopeid 0x20<link>
> ether 00:12:34:56:78:30 txqueuelen 1000 (Ethernet)
> RX packets 794 bytes 465496 (454.5 KiB)
> RX errors 0 dropped 0 overruns 0 frame 0
> TX packets 939 bytes 90700 (88.5 KiB)
> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
>
>
> # ping kernel.org -c10
> ping: unknown host kernel.org
>
> # ping 199.204.44.194 -c10
> PING 199.204.44.194 (199.204.44.194) 56(84) bytes of data.
> From 192.168.2.2 icmp_seq=1 Destination Host Unreachable
> From 192.168.2.2 icmp_seq=2 Destination Host Unreachable
> From 192.168.2.2 icmp_seq=3 Destination Host Unreachable
> From 192.168.2.2 icmp_seq=4 Destination Host Unreachable
> From 192.168.2.2 icmp_seq=5 Destination Host Unreachable
> From 192.168.2.2 icmp_seq=6 Destination Host Unreachable
> From 192.168.2.2 icmp_seq=7 Destination Host Unreachable
> From 192.168.2.2 icmp_seq=8 Destination Host Unreachable
> From 192.168.2.2 icmp_seq=9 Destination Host Unreachable
> From 192.168.2.2 icmp_seq=10 Destination Host Unreachable
>
> --- 199.204.44.194 ping statistics ---
> 10 packets transmitted, 0 received, +10 errors, 100% packet loss, time 9001ms
> pipe 4
>
> ~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~
> # modprobe -rv r8169 && modprobe -v r8169
> rmmod r8169
> rmmod mii
> insmod /lib/modules/3.18.0-rc2.git-9dfa9a2-net-next+/kernel/drivers/net/mii.ko
> insmod /lib/modules/3.18.0-rc2.git-9dfa9a2-net-next+/kernel/drivers/net/ethernet/realtek/r8169.ko
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> # dmesg
> [ 1566.163904] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> [ 1566.163932] r8169 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
> [ 1566.164443] r8169 0000:03:00.0: irq 26 for MSI/MSI-X
> [ 1566.164997] r8169 0000:03:00.0 eth1: RTL8168c/8111c at 0xffffc900027d0000, 00:12:34:56:78:30, XID 1c4000c0 IRQ 26
> [ 1566.165007] r8169 0000:03:00.0 eth1: jumbo features [frames: 6128 bytes, tx checksumming: ko]
> [ 1566.383002] r8169 0000:03:00.0 enp3s0: renamed from eth1
> [ 1566.401190] r8169 0000:03:00.0 enp3s0: link down
> [ 1567.909282] r8169 0000:03:00.0 enp3s0: link up
>
>
> # ifconfig enp3s0
> enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
> inet 192.168.2.2 netmask 255.255.255.0 broadcast 192.168.2.255
> inet6 fe80::212:34ff:fe56:7830 prefixlen 64 scopeid 0x20<link>
> ether 00:12:34:56:78:30 txqueuelen 1000 (Ethernet)
> RX packets 59 bytes 14200 (13.8 KiB)
> RX errors 0 dropped 0 overruns 0 frame 0
> TX packets 77 bytes 7441 (7.2 KiB)
> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
>
>
> # ping kernel.org -c10
> PING kernel.org (199.204.44.194) 56(84) bytes of data.
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=1 ttl=49 time=133 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=2 ttl=49 time=134 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=3 ttl=49 time=134 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=4 ttl=49 time=133 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=5 ttl=49 time=133 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=6 ttl=49 time=132 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=7 ttl=49 time=134 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=8 ttl=49 time=134 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=9 ttl=49 time=133 ms
> 64 bytes from yul-korg-pub.kernel.org (199.204.44.194): icmp_seq=10 ttl=49 time=133 ms
>
> --- kernel.org ping statistics ---
> 10 packets transmitted, 10 received, 0% packet loss, time 9012ms
> rtt min/avg/max/mdev = 132.657/133.636/134.708/0.728 ms
>
This is definitely a defect in the r8169 (RealTek RTL-8169 Gigabit Ethernet) driver.
I tested a variety of kernels, from 3.10.0-123.9.3.el7.x86_64 up to the latest 3.18 series.
By comparison, skge (SysKonnect Gigabit Ethernet driver) and rt2800usb (Ralink RT2800 USB Wireless LAN driver) do not suffer from this issue.
The problem arises when cloned-mac-address is used. To recover the connection after 'thaw' there is no need to reload the module,
i.e. 'modprobe -rv r8169 && modprobe -v r8169';
it is sufficient to re-apply the cloned-mac-address,
i.e. 'ifconfig enp3s0 hw ether 00:12:34:56:78:30',
or to use some other equivalent method.
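As one possible "equivalent method", the workaround above can be sketched in C with the SIOCSIFHWADDR ioctl. This is a minimal sketch, assuming Linux with glibc headers; fill_hwaddr_req() and reapply_hwaddr() are hypothetical helper names, and the interface name and address are just the ones from this report. It is not a fix for the driver.

```c
#define _DEFAULT_SOURCE /* expose struct ifreq under strict -std= modes */
#include <net/if.h>
#include <net/if_arp.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill an ifreq for SIOCSIFHWADDR from an "aa:bb:cc:dd:ee:ff" string.
 * Returns 0 on success, -1 if the interface name is too long or fewer
 * than six hex bytes could be parsed. */
int fill_hwaddr_req(struct ifreq *ifr, const char *ifname, const char *mac)
{
	unsigned int b[6];
	int i;

	if (strlen(ifname) >= IFNAMSIZ)
		return -1;
	if (sscanf(mac, "%2x:%2x:%2x:%2x:%2x:%2x",
		   &b[0], &b[1], &b[2], &b[3], &b[4], &b[5]) != 6)
		return -1;

	memset(ifr, 0, sizeof(*ifr));
	strcpy(ifr->ifr_name, ifname);
	ifr->ifr_hwaddr.sa_family = ARPHRD_ETHER;
	for (i = 0; i < 6; i++)
		ifr->ifr_hwaddr.sa_data[i] = (char)b[i];
	return 0;
}

/* Re-apply the hardware address to the interface. Needs CAP_NET_ADMIN.
 * Returns 0 on success, -1 on any failure. */
int reapply_hwaddr(const char *ifname, const char *mac)
{
	struct ifreq ifr;
	int fd, ret;

	if (fill_hwaddr_req(&ifr, ifname, mac) < 0)
		return -1;
	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;
	ret = ioctl(fd, SIOCSIFHWADDR, &ifr);
	close(fd);
	return ret;
}
```

Called as root after thaw, for example reapply_hwaddr("enp3s0", "00:12:34:56:78:30"), this would issue the same kind of request as the ifconfig command above. Some drivers refuse the ioctl while the interface is up, so the result may differ on other hardware.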
Nevertheless this is broken behavior.
If you have an idea how to solve this within the module, or if you have a patch, don't hesitate to ping.
poma
* Re: [PATCH net-next] icmp: Remove some spurious dropped packet profile hits from the ICMP path
From: David Miller @ 2014-11-14 1:32 UTC (permalink / raw)
To: raj; +Cc: netdev
In-Reply-To: <20141113225457.A3E502900805@tardy>
From: raj@tardy.usa.hp.com (Rick Jones)
Date: Thu, 13 Nov 2014 14:54:57 -0800 (PST)
> + /* until the v6 path can be better sorted we may still need
> + * to kfree_sbk() here but want to avoid a double free from
Typo "kfree_skb()".
* [PATCH v2 net-next 0/7] implementation of eBPF maps
From: Alexei Starovoitov @ 2014-11-14 1:36 UTC (permalink / raw)
To: David S. Miller
Cc: Ingo Molnar, Andy Lutomirski, Daniel Borkmann,
Hannes Frederic Sowa, Eric Dumazet, linux-api, netdev,
linux-kernel
Hi All,
v1->v2:
renamed flags for MAP_UPDATE_ELEM command to be more concise,
clarified commit logs and improved comments in patches 1,3,7
per discussions with Daniel
Old v1 cover:
this set of patches adds implementation of HASH and ARRAY types of eBPF maps
which were described in manpage in commit b4fc1a460f30("Merge branch 'bpf-next'")
The difference vs previous version of these patches from August:
- added 'flags' attribute to BPF_MAP_UPDATE_ELEM
- in HASH type implementation removed per-map kmem_cache.
I was doing kmem_cache_create() for every map to enable selective slub
debugging to check for overflows and leaks. Now it's not needed, so just
use normal kmalloc() for map elements.
- added ARRAY type which was mentioned in manpage, but wasn't public yet
- added map testsuite and removed temporary bits from test_stubs
Note, eBPF programs cannot be attached to events yet.
It will come in the next set.
Alexei Starovoitov (7):
bpf: add 'flags' attribute to BPF_MAP_UPDATE_ELEM command
bpf: add hashtable type of eBPF maps
bpf: add array type of eBPF maps
bpf: fix BPF_MAP_LOOKUP_ELEM command return code
bpf: add a testsuite for eBPF maps
bpf: allow eBPF programs to use maps
bpf: remove test map scaffolding and use proper types
include/linux/bpf.h | 7 +-
include/uapi/linux/bpf.h | 13 +-
kernel/bpf/Makefile | 2 +-
kernel/bpf/arraymap.c | 151 ++++++++++++++++++
kernel/bpf/hashtab.c | 362 +++++++++++++++++++++++++++++++++++++++++++
kernel/bpf/helpers.c | 89 +++++++++++
kernel/bpf/syscall.c | 6 +-
kernel/bpf/test_stub.c | 56 ++-----
samples/bpf/Makefile | 3 +-
samples/bpf/libbpf.c | 3 +-
samples/bpf/libbpf.h | 2 +-
samples/bpf/test_maps.c | 291 ++++++++++++++++++++++++++++++++++
samples/bpf/test_verifier.c | 14 +-
13 files changed, 936 insertions(+), 63 deletions(-)
create mode 100644 kernel/bpf/arraymap.c
create mode 100644 kernel/bpf/hashtab.c
create mode 100644 kernel/bpf/helpers.c
create mode 100644 samples/bpf/test_maps.c
--
1.7.9.5
* [PATCH v2 net-next 1/7] bpf: add 'flags' attribute to BPF_MAP_UPDATE_ELEM command
From: Alexei Starovoitov @ 2014-11-14 1:36 UTC (permalink / raw)
To: David S. Miller
Cc: Ingo Molnar, Andy Lutomirski, Daniel Borkmann,
Hannes Frederic Sowa, Eric Dumazet, linux-api, netdev,
linux-kernel
In-Reply-To: <1415929010-9361-1-git-send-email-ast@plumgrid.com>
the current meaning of BPF_MAP_UPDATE_ELEM syscall command is:
either update existing map element or create a new one.
Initially the plan was to add a new command to handle the case of
'create new element if it didn't exist', but 'flags' style looks
cleaner and overall diff is much smaller (more code reused), so add 'flags'
attribute to BPF_MAP_UPDATE_ELEM command with the following meaning:
#define BPF_ANY 0 /* create new element or update existing */
#define BPF_NOEXIST 1 /* create new element if it didn't exist */
#define BPF_EXIST 2 /* update existing element */
bpf_update_elem(fd, key, value, BPF_NOEXIST) call can fail with EEXIST
if element already exists.
bpf_update_elem(fd, key, value, BPF_EXIST) can fail with ENOENT
if element doesn't exist.
Userspace will call it as:
int bpf_update_elem(int fd, void *key, void *value, __u64 flags)
{
union bpf_attr attr = {
.map_fd = fd,
.key = ptr_to_u64(key),
.value = ptr_to_u64(value),
.flags = flags,
};
return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}
First two bits of 'flags' are used to encode style of bpf_update_elem() command.
Bits 2-63 are reserved for future use.
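The flag semantics described above can be sketched as a small standalone decision function. This is an illustrative userspace sketch, not code from the patch: check_update_flags() is a hypothetical name, the flag values are copied from the #defines above, and the -EINVAL case for unknown flags follows the check that the hashtable patch (2/7) in this series performs.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined in the patch. */
#define BPF_ANY     0 /* create new element or update existing */
#define BPF_NOEXIST 1 /* create new element if it didn't exist */
#define BPF_EXIST   2 /* update existing element */

/* Decide whether an update may proceed, given whether an element with
 * the requested key is already present. Returns 0 or a negative errno. */
int check_update_flags(bool elem_exists, uint64_t flags)
{
	if (flags > BPF_EXIST)
		return -EINVAL; /* unknown or reserved flag bits */
	if (elem_exists && flags == BPF_NOEXIST)
		return -EEXIST; /* caller asked for create-only */
	if (!elem_exists && flags == BPF_EXIST)
		return -ENOENT; /* caller asked for update-only */
	return 0;               /* BPF_ANY, or the flag matches the state */
}
```

Encoding the three cases as one small integer rather than independent bits keeps the contradictory combination (create-only and update-only at once) from being expressible as a valid request.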
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
patch 5 of this set includes tests of bpf_update_elem() with these flags
include/linux/bpf.h | 2 +-
include/uapi/linux/bpf.h | 8 +++++++-
kernel/bpf/syscall.c | 4 ++--
3 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 3cf91754a957..51e9242e4803 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -22,7 +22,7 @@ struct bpf_map_ops {
/* funcs callable from userspace and from eBPF programs */
void *(*map_lookup_elem)(struct bpf_map *map, void *key);
- int (*map_update_elem)(struct bpf_map *map, void *key, void *value);
+ int (*map_update_elem)(struct bpf_map *map, void *key, void *value, u64 flags);
int (*map_delete_elem)(struct bpf_map *map, void *key);
};
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d18316f9e9c4..3e9e1b77f29d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -82,7 +82,7 @@ enum bpf_cmd {
/* create or update key/value pair in a given map
* err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)
- * Using attr->map_fd, attr->key, attr->value
+ * Using attr->map_fd, attr->key, attr->value, attr->flags
* returns zero or negative error
*/
BPF_MAP_UPDATE_ELEM,
@@ -117,6 +117,11 @@ enum bpf_prog_type {
BPF_PROG_TYPE_UNSPEC,
};
+/* flags for BPF_MAP_UPDATE_ELEM command */
+#define BPF_ANY 0 /* create new element or update existing */
+#define BPF_NOEXIST 1 /* create new element if it didn't exist */
+#define BPF_EXIST 2 /* update existing element */
+
union bpf_attr {
struct { /* anonymous struct used by BPF_MAP_CREATE command */
__u32 map_type; /* one of enum bpf_map_type */
@@ -132,6 +137,7 @@ union bpf_attr {
__aligned_u64 value;
__aligned_u64 next_key;
};
+ __u64 flags;
};
struct { /* anonymous struct used by BPF_PROG_LOAD command */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ba61c8c16032..c0d03bf317a2 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -190,7 +190,7 @@ err_put:
return err;
}
-#define BPF_MAP_UPDATE_ELEM_LAST_FIELD value
+#define BPF_MAP_UPDATE_ELEM_LAST_FIELD flags
static int map_update_elem(union bpf_attr *attr)
{
@@ -231,7 +231,7 @@ static int map_update_elem(union bpf_attr *attr)
* therefore all map accessors rely on this fact, so do the same here
*/
rcu_read_lock();
- err = map->ops->map_update_elem(map, key, value);
+ err = map->ops->map_update_elem(map, key, value, attr->flags);
rcu_read_unlock();
free_value:
--
1.7.9.5
* [PATCH v2 net-next 2/7] bpf: add hashtable type of eBPF maps
From: Alexei Starovoitov @ 2014-11-14 1:36 UTC (permalink / raw)
To: David S. Miller
Cc: Ingo Molnar, Andy Lutomirski, Daniel Borkmann,
Hannes Frederic Sowa, Eric Dumazet, linux-api, netdev,
linux-kernel
In-Reply-To: <1415929010-9361-1-git-send-email-ast@plumgrid.com>
add new map type BPF_MAP_TYPE_HASH and its implementation
- maps are created/destroyed by userspace. Both userspace and eBPF programs
can lookup/update/delete elements from the map
- eBPF programs can be called in_irq(), so use spin_lock_irqsave() mechanism
for concurrent updates
- key/value are opaque range of bytes (aligned to 8 bytes)
- user space provides 3 configuration attributes via BPF syscall:
key_size, value_size, max_entries
- map takes care of allocating/freeing key/value pairs
- map_update_elem() must fail to insert new element when max_entries
limit is reached to make sure that eBPF programs cannot exhaust memory
- map_update_elem() replaces elements in an atomic way
- optimized for speed of lookup() which can be called multiple times from
eBPF program which itself is triggered by high volume of events
. in the future JIT compiler may recognize lookup() call and optimize it
further, since key_size is constant for life of eBPF program
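Two of the points in the list above, the 8-byte alignment of key/value storage and the fast lookup path, can be illustrated with a short userspace sketch. The helper names below are hypothetical stand-ins (roundup_pow2() plays the role of the kernel's roundup_pow_of_two()); the arithmetic is meant to mirror what the patch does, under the assumption that the bucket count is a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next power of two. Returns 0 when the result would
 * not fit in 32 bits. Userspace stand-in for roundup_pow_of_two(). */
uint32_t roundup_pow2(uint32_t n)
{
	uint32_t p = 1;

	if (n > 0x80000000u)
		return 0;
	while (p < n)
		p <<= 1;
	return p;
}

/* With a power-of-two bucket count, masking the hash gives the same
 * result as hash % n_buckets without a division on the lookup path. */
uint32_t bucket_index(uint32_t hash, uint32_t n_buckets)
{
	return hash & (n_buckets - 1);
}

/* Round a size up to the next multiple of 8. */
size_t round_up8(size_t n)
{
	return (n + 7) & ~(size_t)7;
}

/* The value is stored right after the key, at the key size rounded up
 * to 8 bytes, so the value itself starts on an 8-byte boundary. */
size_t value_offset(size_t key_size)
{
	return round_up8(key_size);
}
```

For example, a map created with max_entries = 1000 would get 1024 buckets, and a 5-byte key would be followed by its value at offset 8 within the element's key area.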
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
include/uapi/linux/bpf.h | 1 +
kernel/bpf/Makefile | 2 +-
kernel/bpf/hashtab.c | 362 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 364 insertions(+), 1 deletion(-)
create mode 100644 kernel/bpf/hashtab.c
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 3e9e1b77f29d..03a01fd609aa 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -111,6 +111,7 @@ enum bpf_cmd {
enum bpf_map_type {
BPF_MAP_TYPE_UNSPEC,
+ BPF_MAP_TYPE_HASH,
};
enum bpf_prog_type {
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 0daf7f6ae7df..2c0ec7f9da78 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -1,5 +1,5 @@
obj-y := core.o
-obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o
+obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o hashtab.o
ifdef CONFIG_TEST_BPF
obj-$(CONFIG_BPF_SYSCALL) += test_stub.o
endif
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
new file mode 100644
index 000000000000..d234a012f046
--- /dev/null
+++ b/kernel/bpf/hashtab.c
@@ -0,0 +1,362 @@
+/* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+#include <linux/bpf.h>
+#include <linux/jhash.h>
+#include <linux/filter.h>
+#include <linux/vmalloc.h>
+
+struct bpf_htab {
+ struct bpf_map map;
+ struct hlist_head *buckets;
+ spinlock_t lock;
+ u32 count; /* number of elements in this hashtable */
+ u32 n_buckets; /* number of hash buckets */
+ u32 elem_size; /* size of each element in bytes */
+};
+
+/* each htab element is struct htab_elem + key + value */
+struct htab_elem {
+ struct hlist_node hash_node;
+ struct rcu_head rcu;
+ u32 hash;
+ char key[0] __aligned(8);
+};
+
+/* Called from syscall */
+static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
+{
+ struct bpf_htab *htab;
+ int err, i;
+
+ htab = kzalloc(sizeof(*htab), GFP_USER);
+ if (!htab)
+ return ERR_PTR(-ENOMEM);
+
+ /* mandatory map attributes */
+ htab->map.key_size = attr->key_size;
+ htab->map.value_size = attr->value_size;
+ htab->map.max_entries = attr->max_entries;
+
+ /* check sanity of attributes.
+ * value_size == 0 may be allowed in the future to use map as a set
+ */
+ err = -EINVAL;
+ if (htab->map.max_entries == 0 || htab->map.key_size == 0 ||
+ htab->map.value_size == 0)
+ goto free_htab;
+
+ /* hash table size must be power of 2 */
+ htab->n_buckets = roundup_pow_of_two(htab->map.max_entries);
+
+ err = -E2BIG;
+ if (htab->map.key_size > MAX_BPF_STACK)
+ /* eBPF programs initialize keys on stack, so they cannot be
+ * larger than max stack size
+ */
+ goto free_htab;
+
+ err = -ENOMEM;
+ htab->buckets = kmalloc_array(htab->n_buckets, sizeof(struct hlist_head),
+ GFP_USER | __GFP_NOWARN);
+
+ if (!htab->buckets) {
+ htab->buckets = vmalloc(htab->n_buckets * sizeof(struct hlist_head));
+ if (!htab->buckets)
+ goto free_htab;
+ }
+
+ for (i = 0; i < htab->n_buckets; i++)
+ INIT_HLIST_HEAD(&htab->buckets[i]);
+
+ spin_lock_init(&htab->lock);
+ htab->count = 0;
+
+ htab->elem_size = sizeof(struct htab_elem) +
+ round_up(htab->map.key_size, 8) +
+ htab->map.value_size;
+ return &htab->map;
+
+free_htab:
+ kfree(htab);
+ return ERR_PTR(err);
+}
+
+static inline u32 htab_map_hash(const void *key, u32 key_len)
+{
+ return jhash(key, key_len, 0);
+}
+
+static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32 hash)
+{
+ return &htab->buckets[hash & (htab->n_buckets - 1)];
+}
+
+static struct htab_elem *lookup_elem_raw(struct hlist_head *head, u32 hash,
+ void *key, u32 key_size)
+{
+ struct htab_elem *l;
+
+ hlist_for_each_entry_rcu(l, head, hash_node)
+ if (l->hash == hash && !memcmp(&l->key, key, key_size))
+ return l;
+
+ return NULL;
+}
+
+/* Called from syscall or from eBPF program */
+static void *htab_map_lookup_elem(struct bpf_map *map, void *key)
+{
+ struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+ struct hlist_head *head;
+ struct htab_elem *l;
+ u32 hash, key_size;
+
+ /* Must be called with rcu_read_lock. */
+ WARN_ON_ONCE(!rcu_read_lock_held());
+
+ key_size = map->key_size;
+
+ hash = htab_map_hash(key, key_size);
+
+ head = select_bucket(htab, hash);
+
+ l = lookup_elem_raw(head, hash, key, key_size);
+
+ if (l)
+ return l->key + round_up(map->key_size, 8);
+
+ return NULL;
+}
+
+/* Called from syscall */
+static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
+{
+ struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+ struct hlist_head *head;
+ struct htab_elem *l, *next_l;
+ u32 hash, key_size;
+ int i;
+
+ WARN_ON_ONCE(!rcu_read_lock_held());
+
+ key_size = map->key_size;
+
+ hash = htab_map_hash(key, key_size);
+
+ head = select_bucket(htab, hash);
+
+ /* lookup the key */
+ l = lookup_elem_raw(head, hash, key, key_size);
+
+ if (!l) {
+ i = 0;
+ goto find_first_elem;
+ }
+
+ /* key was found, get next key in the same bucket */
+ next_l = hlist_entry_safe(rcu_dereference_raw(hlist_next_rcu(&l->hash_node)),
+ struct htab_elem, hash_node);
+
+ if (next_l) {
+ /* if next elem in this hash list is non-zero, just return it */
+ memcpy(next_key, next_l->key, key_size);
+ return 0;
+ }
+
+ /* no more elements in this hash list, go to the next bucket */
+ i = hash & (htab->n_buckets - 1);
+ i++;
+
+find_first_elem:
+ /* iterate over buckets */
+ for (; i < htab->n_buckets; i++) {
+ head = select_bucket(htab, i);
+
+ /* pick first element in the bucket */
+ next_l = hlist_entry_safe(rcu_dereference_raw(hlist_first_rcu(head)),
+ struct htab_elem, hash_node);
+ if (next_l) {
+ /* if it's not empty, just return it */
+ memcpy(next_key, next_l->key, key_size);
+ return 0;
+ }
+ }
+
+ /* iterated over all buckets and all elements */
+ return -ENOENT;
+}
+
+/* Called from syscall or from eBPF program */
+static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
+ u64 map_flags)
+{
+ struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+ struct htab_elem *l_new, *l_old;
+ struct hlist_head *head;
+ unsigned long flags;
+ u32 key_size;
+ int ret;
+
+ if (map_flags > BPF_EXIST)
+ /* unknown flags */
+ return -EINVAL;
+
+ WARN_ON_ONCE(!rcu_read_lock_held());
+
+ /* allocate new element outside of lock */
+ l_new = kmalloc(htab->elem_size, GFP_ATOMIC);
+ if (!l_new)
+ return -ENOMEM;
+
+ key_size = map->key_size;
+
+ memcpy(l_new->key, key, key_size);
+ memcpy(l_new->key + round_up(key_size, 8), value, map->value_size);
+
+ l_new->hash = htab_map_hash(l_new->key, key_size);
+
+ /* bpf_map_update_elem() can be called in_irq() */
+ spin_lock_irqsave(&htab->lock, flags);
+
+ head = select_bucket(htab, l_new->hash);
+
+ l_old = lookup_elem_raw(head, l_new->hash, key, key_size);
+
+ if (!l_old && unlikely(htab->count >= map->max_entries)) {
+ /* if elem with this 'key' doesn't exist and we've reached
+ * max_entries limit, fail insertion of new elem
+ */
+ ret = -E2BIG;
+ goto err;
+ }
+
+ if (l_old && map_flags == BPF_NOEXIST) {
+ /* elem already exists */
+ ret = -EEXIST;
+ goto err;
+ }
+
+ if (!l_old && map_flags == BPF_EXIST) {
+ /* elem doesn't exist, cannot update it */
+ ret = -ENOENT;
+ goto err;
+ }
+
+ /* add new element to the head of the list, so that concurrent
+ * search will find it before old elem
+ */
+ hlist_add_head_rcu(&l_new->hash_node, head);
+ if (l_old) {
+ hlist_del_rcu(&l_old->hash_node);
+ kfree_rcu(l_old, rcu);
+ } else {
+ htab->count++;
+ }
+ spin_unlock_irqrestore(&htab->lock, flags);
+
+ return 0;
+err:
+ spin_unlock_irqrestore(&htab->lock, flags);
+ kfree(l_new);
+ return ret;
+}
+
+/* Called from syscall or from eBPF program */
+static int htab_map_delete_elem(struct bpf_map *map, void *key)
+{
+ struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+ struct hlist_head *head;
+ struct htab_elem *l;
+ unsigned long flags;
+ u32 hash, key_size;
+ int ret = -ENOENT;
+
+ WARN_ON_ONCE(!rcu_read_lock_held());
+
+ key_size = map->key_size;
+
+ hash = htab_map_hash(key, key_size);
+
+ spin_lock_irqsave(&htab->lock, flags);
+
+ head = select_bucket(htab, hash);
+
+ l = lookup_elem_raw(head, hash, key, key_size);
+
+ if (l) {
+ hlist_del_rcu(&l->hash_node);
+ htab->count--;
+ kfree_rcu(l, rcu);
+ ret = 0;
+ }
+
+ spin_unlock_irqrestore(&htab->lock, flags);
+ return ret;
+}
+
+static void delete_all_elements(struct bpf_htab *htab)
+{
+ int i;
+
+ for (i = 0; i < htab->n_buckets; i++) {
+ struct hlist_head *head = select_bucket(htab, i);
+ struct hlist_node *n;
+ struct htab_elem *l;
+
+ hlist_for_each_entry_safe(l, n, head, hash_node) {
+ hlist_del_rcu(&l->hash_node);
+ htab->count--;
+ kfree(l);
+ }
+ }
+}
+
+/* Called when map->refcnt goes to zero, either from workqueue or from syscall */
+static void htab_map_free(struct bpf_map *map)
+{
+ struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+
+ /* at this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
+ * so the programs (can be more than one that used this map) were
+ * disconnected from events. Wait for outstanding critical sections in
+ * these programs to complete
+ */
+ synchronize_rcu();
+
+ /* some of kfree_rcu() callbacks for elements of this map may not have
+ * executed. It's ok. Proceed to free residual elements and map itself
+ */
+ delete_all_elements(htab);
+ kvfree(htab->buckets);
+ kfree(htab);
+}
+
+static struct bpf_map_ops htab_ops = {
+ .map_alloc = htab_map_alloc,
+ .map_free = htab_map_free,
+ .map_get_next_key = htab_map_get_next_key,
+ .map_lookup_elem = htab_map_lookup_elem,
+ .map_update_elem = htab_map_update_elem,
+ .map_delete_elem = htab_map_delete_elem,
+};
+
+static struct bpf_map_type_list tl = {
+ .ops = &htab_ops,
+ .type = BPF_MAP_TYPE_HASH,
+};
+
+static int __init register_htab_map(void)
+{
+ bpf_register_map_type(&tl);
+ return 0;
+}
+late_initcall(register_htab_map);
--
1.7.9.5