* Re: [PATCH net 1/3] gre: do not assign header_ops in collect metadata mode
From: pravin shelar @ 2016-04-22 21:04 UTC (permalink / raw)
To: Jiri Benc
Cc: Linux Kernel Network Developers, Pravin B Shelar, Thomas Graf,
Simon Horman
In-Reply-To: <7c34e7243a146fc5ccdf6349892355746741ff26.1461346798.git.jbenc@redhat.com>
On Fri, Apr 22, 2016 at 10:44 AM, Jiri Benc <jbenc@redhat.com> wrote:
> In ipgre mode (i.e. not gretap) with collect metadata flag set, the tunnel
> is incorrectly assumed to be mGRE in NBMA mode (see commit 6a5f44d7a048c).
> This is not the case, we're controlling the encapsulation addresses by
> lwtunnel metadata. And anyway, assigning dev->header_ops in collect metadata
> mode does not make sense.
>
> Fixes: 2e15ea390e6f4 ("ip_gre: Add support to collect tunnel metadata.")
> Signed-off-by: Jiri Benc <jbenc@redhat.com>
> ---
> net/ipv4/ip_gre.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> index af5d1f38217f..d0abde4236af 100644
> --- a/net/ipv4/ip_gre.c
> +++ b/net/ipv4/ip_gre.c
> @@ -893,7 +893,7 @@ static int ipgre_tunnel_init(struct net_device *dev)
> netif_keep_dst(dev);
> dev->addr_len = 4;
>
> - if (iph->daddr) {
> + if (iph->daddr && !tunnel->collect_md) {
> #ifdef CONFIG_NET_IPGRE_BROADCAST
> if (ipv4_is_multicast(iph->daddr)) {
> if (!iph->saddr)
> @@ -902,8 +902,9 @@ static int ipgre_tunnel_init(struct net_device *dev)
> dev->header_ops = &ipgre_header_ops;
> }
> #endif
I think we should we return error in case of such configuration rather
than silently ignoring it.
^ permalink raw reply
* Re: [PATCH net 2/3] gre: build header correctly for collect metadata tunnels
From: pravin shelar @ 2016-04-22 21:05 UTC (permalink / raw)
To: Jiri Benc
Cc: Linux Kernel Network Developers, Pravin B Shelar, Thomas Graf,
Simon Horman
In-Reply-To: <7778b1a1e2123d3c9bf541d671fe62caeb6e9fd5.1461346798.git.jbenc@redhat.com>
On Fri, Apr 22, 2016 at 10:44 AM, Jiri Benc <jbenc@redhat.com> wrote:
> In ipgre (i.e. not gretap) + collect metadata mode, the skb was assumed to
> contain Ethernet header and was encapsulated as ETH_P_TEB. This is not the
> case, the interface is ARPHRD_IPGRE and the protocol to be used for
> encapsulation is skb->protocol.
>
> Fixes: 2e15ea390e6f4 ("ip_gre: Add support to collect tunnel metadata.")
> Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
^ permalink raw reply
* Re: [ask for help and advice] fib6_del triggered BUG_ON, because rt->rt6i_ref counter is 2
From: Hongmei Li @ 2016-04-22 21:06 UTC (permalink / raw)
To: netdev
In-Reply-To: <56F635EC.6090009@huawei.com>
I encounter the same BUG_ON several times,
rt6i_ref is 2;
rt6i_dst is 0::0
Here is an example trace:
[278451.384635] Kernel BUG at ffffffc000b60238
[278451.384641] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[278451.384671] task: ffffffc035f30ac0
[278451.384683] PC is at fib6_purge_rt+0xd0/0xe4
[278451.384690] LR is at fib6_del+0x230/0x298
[278451.384695] pc : [<ffffffc000b60238>] lr : [<ffffffc000b613b4>]
[278451.384700] sp : ffffffc02b9d78a0
[278451.384704] x29: ffffffc02b9d78a0 x28: 0000000000000000
[278451.384711] x27: 0000000000000000 x26: 00000000000001c8
[278451.384718] x25: ffffffc02b9d7c80 x24: 0000000000000000
[278451.384726] x23: ffffffc00173ebc0 x22: ffffffc02b9d7a90
[278451.384734] x21: ffffffc000b5f690 x20: 0000000000000000
[278451.384741] x19: ffffffc02dc35c80 x18: 0000007f91a07000
[278451.384749] x17: 0000007f91adc33c x16: ffffffc000a76df0
[278451.384757] x15: 0000000000007330 x14: 0000000000007330
[278451.384764] x13: 0000000000000000 x12: 0000000000000000
[278451.384771] x11: ffffffc02dc35d80 x10: 0000000000000001
[278451.384779] x9 : 0000000000000000 x8 : 000000000000008c
[278451.384786] x7 : ffffffc00460e700 x6 : ffffffc00460e200
[278451.384793] x5 : ffffffc0017426d8 x4 : 0000000000000000
[278451.384800] x3 : ffffffc00174f300 x2 : ffffffc00173ebc0
[278451.384807] x1 : ffffffc02bfe5e98 x0 : 0000000000000002
...
[278451.385322] Process ip
[278451.385327] Call trace:
[278451.385336] [<ffffffc000b60238>] fib6_purge_rt+0xd0/0xe4
[278451.385343] [<ffffffc000b613b0>] fib6_del+0x22c/0x298
[278451.385350] [<ffffffc000b5c1a8>] __ip6_del_rt+0x50/0x7c
[278451.385356] [<ffffffc000b5c2a0>] ip6_route_del+0xcc/0x104
[278451.385363] [<ffffffc000b5e130>] inet6_rtm_delroute+0x44/0x68
[278451.385371] [<ffffffc000a99d7c>] rtnetlink_rcv_msg+0x180/0x1a0
[278451.385378] [<ffffffc000ab4998>] netlink_rcv_skb+0x64/0xc8
[278451.385387] [<ffffffc000a98590>] rtnetlink_rcv+0x1c/0x2c
[278451.385393] [<ffffffc000ab43e4>] netlink_unicast+0x108/0x1b8
[278451.385399] [<ffffffc000ab47d0>] netlink_sendmsg+0x27c/0x2d4
[278451.385407] [<ffffffc000a74ebc>] sock_sendmsg+0x8c/0xb0
[278451.385414] [<ffffffc000a76ebc>] SyS_sendto+0xcc/0x110
[278451.385421] Code: 17ffffdb b940ce60 7100041f 54000040 (e7f001f2)
[278451.385428] ---[ end trace abecf0c88c823f51 ]---
[278451.391859] Kernel panic - not syncing: Fatal exception in interrupt
^ permalink raw reply
* Re: [PATCH net 3/3] gre: receive also TEB packets for lwtunnels
From: pravin shelar @ 2016-04-22 21:07 UTC (permalink / raw)
To: Jiri Benc
Cc: Linux Kernel Network Developers, Pravin B Shelar, Thomas Graf,
Simon Horman
In-Reply-To: <017d8f2b1c00e8fdbdb92a5898b1b5f365b58d6d.1461346798.git.jbenc@redhat.com>
On Fri, Apr 22, 2016 at 10:44 AM, Jiri Benc <jbenc@redhat.com> wrote:
> For ipgre interfaces in collect metadata mode, receive also traffic with
> encapsulated Ethernet headers. The lwtunnel users are supposed to sort this
> out correctly. This allows to have mixed Ethernet + L3-only traffic on the
> same lwtunnel interface.
>
How user is suppose to sort out the type of packet? whether it is L2 or L3.
> To keep backwards compatibility and prevent any surprises, gretap interfaces
> have priority in receiving packets with Ethernet headers.
>
> Signed-off-by: Jiri Benc <jbenc@redhat.com>
> ---
> include/net/ip_tunnels.h | 1 +
> net/ipv4/ip_gre.c | 34 +++++++++++++++++++++++++---------
> 2 files changed, 26 insertions(+), 9 deletions(-)
>
...
> +
> +static int ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi)
> +{
> + struct net *net = dev_net(skb->dev);
> + struct ip_tunnel_net *itn;
> + int res;
> +
> + if (tpi->proto == htons(ETH_P_TEB))
> + itn = net_generic(net, gre_tap_net_id);
> + else
> + itn = net_generic(net, ipgre_net_id);
> +
> + res = __ipgre_rcv(skb, tpi, itn);
> + if (res == PACKET_NEXT && tpi->proto == htons(ETH_P_TEB)) {
> + /* ipgre tunnels in collect metadata mode should receive
> + * also ETH_P_TEB traffic
> + */
> + itn = net_generic(net, ipgre_net_id);
> + res = __ipgre_rcv(skb, tpi, int);
Is it fine to receive L2 packet over L3 device?
At least ip_tunnel_rcv() is not ready to handle such packet.
^ permalink raw reply
* Re: [PATCH net 1/3] gre: do not assign header_ops in collect metadata mode
From: Jiri Benc @ 2016-04-22 21:20 UTC (permalink / raw)
To: pravin shelar
Cc: Linux Kernel Network Developers, Pravin B Shelar, Thomas Graf,
Simon Horman
In-Reply-To: <CAOrHB_Dffj93ED0u+RQ89SneRn5YRGFB9piYYiTDFJ43ZGDpNQ@mail.gmail.com>
On Fri, 22 Apr 2016 14:04:48 -0700, pravin shelar wrote:
> I think we should we return error in case of such configuration rather
> than silently ignoring it.
I thought about it and I'm not sure. We're not returning an error
currently, starting returning it now might be perceived as uAPI
breakage.
But given it doesn't work at all currently, there are apparently no
users yet. I'll wait for more feedback.
Jiri
^ permalink raw reply
* Re: [PATCH net 3/3] gre: receive also TEB packets for lwtunnels
From: Jiri Benc @ 2016-04-22 21:27 UTC (permalink / raw)
To: pravin shelar
Cc: Linux Kernel Network Developers, Pravin B Shelar, Thomas Graf,
Simon Horman
In-Reply-To: <CAOrHB_DrFXg=nQAdscOgX8nbUoOC4fMsWiN6R4n907zB5rtD=g@mail.gmail.com>
On Fri, 22 Apr 2016 14:07:01 -0700, pravin shelar wrote:
> On Fri, Apr 22, 2016 at 10:44 AM, Jiri Benc <jbenc@redhat.com> wrote:
> > For ipgre interfaces in collect metadata mode, receive also traffic with
> > encapsulated Ethernet headers. The lwtunnel users are supposed to sort this
> > out correctly. This allows to have mixed Ethernet + L3-only traffic on the
> > same lwtunnel interface.
> >
> How user is suppose to sort out the type of packet? whether it is L2 or L3.
By skb->protocol, of course. Will be ETH_P_TEB if the inner packet was
Ethernet. There's no problem with this, for lwtunnels used with encap
routes, such packets are just discarded (which is the expected
behavior), as there's no protocol handler for ETH_P_TEB. More clever
lwtunnel users (such as openvswitch) can process the packets. This is
important in order to have a single tunnel interface for everything.
> Is it fine to receive L2 packet over L3 device?
> At least ip_tunnel_rcv() is not ready to handle such packet.
I don't see why. Could you please elaborate? Seems safe to me. The
desired result is skb->protocol == htons(ETH_P_TEB), network header(!)
pointing to the start of the Ethernet frame (because we're tunneling
Ethernet via ARPHRD_IPGRE interface, there's no L2 header to have), mac
header not being set.
Jiri
^ permalink raw reply
* Re: [PATCH] net: ipv6: Do not fix up linklocal and loopback addresses
From: David Ahern @ 2016-04-22 22:14 UTC (permalink / raw)
To: Mike Manning, netdev
In-Reply-To: <571A4DEB.3090204@brocade.com>
On 4/22/16 10:14 AM, Mike Manning wrote:
> commit f1705ec197e7 ("net: ipv6: Make address flushing on ifdown
> optional") added the option to retain user configured addresses on an
> admin down. A comment to one of the later revisions suggested using
> the IFA_F_PERMANENT flag rather than adding a user_managed boolean to
> the ifaddr struct. A side effect of this change is that link local and
> loopback addresses were also retained which was not part of the
> objective of the original changes. The commit 70af921db6f8 ("net: ipv6:
> Do not keep linklocal and loopback addresses") ensures that these are
> no longer kept. Similarly, the present fix ensures that these addresses
> are not incorrectly fixed up.
>
> Testing also with the recent patch in place to delete host routes on
> ifdown still shows that fixup of the LL & loopback addrs is incorrectly
> being attempted (and without that patch was causing a crash in fib6).
> Another approach to this is through code inspection by checking where
> the user_managed flag in the original dahern patch (which we have been
> using for nearly a year) has been replaced by checking IFA_F_PERMANENT.
>
> Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
> Signed-off-by: Mike Manning <mmanning@brocade.com>
> ---
> net/ipv6/addrconf.c | 13 +++++++------
> 1 file changed, 7 insertions(+), 6 deletions(-)
>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
^ permalink raw reply
* Re: linux-next: build failure after merge of the net-next tree
From: Jeff Kirsher @ 2016-04-22 23:20 UTC (permalink / raw)
To: Mark Brown, David Miller
Cc: sfr, netdev, linux-next, linux-kernel, mark.d.rustad,
andrewx.bowers, kernel-build-reports, linaro-kernel
In-Reply-To: <20160422092017.GY3217@sirena.org.uk>
[-- Attachment #1: Type: text/plain, Size: 1062 bytes --]
On Fri, 2016-04-22 at 10:20 +0100, Mark Brown wrote:
> On Wed, Apr 13, 2016 at 11:15:13AM -0400, David Miller wrote:
> > From: Stephen Rothwell <sfr@canb.auug.org.au>
>
> > > After merging the net-next tree, today's linux-next build (arm
> > > allmodconfig) failed like thisi (this has actually been failing
> for a
> > > few days, now):
>
> > > ERROR: "__bad_udelay" [drivers/net/ethernet/intel/ixgbe/ixgbe.ko]
> undefined!
>
> > > Caused by commit
>
> > > 49425dfc7451 ("ixgbe: Add support for x550em_a 10G MAC type")
>
> > > arm only allows udelay()s up to 2 milliseconds. This commit
> > > adds a 5 ms udelay in ixgbe_acquire_swfw_sync_x550em_a() in
> > > drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c.
>
> > Jeff, please have your folks look into this. Probably just a
> simple
> > conversion to mdelay().
>
> This is still present, it's been breaking ARM allmodconfig builds for
> about two weeks now.
Interesting that no one spoke up until just a week ago. I have a fix
and I ready to push it to David Miller.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* Re: [PATCH net-next 0/2] pskb_extract() helper function.
From: marcelo.leitner @ 2016-04-22 23:23 UTC (permalink / raw)
To: Sowmini Varadhan
Cc: netdev, rds-devel, santosh.shilimkar, davem, eric.dumazet
In-Reply-To: <cover.1461086306.git.sowmini.varadhan@oracle.com>
On Wed, Apr 20, 2016 at 03:17:40AM -0700, Sowmini Varadhan wrote:
> This patchset follows up on the discussion in
> https://www.mail-archive.com/netdev@vger.kernel.org/msg105090.html
>
> For RDS-TCP, we have to deal with the full gamut of
> nonlinear sk_buffs, including all the frag_list variants.
> Also, the parent skb has to remain unchanged, while the clone
> is queued for Rx on the PF_RDS socket.
>
> Patch 1 of this patchset adds a pskb_extract() function that
> does all this without the redundant memcpy's in pskb_expand_head()
> and __pskb_pull_tail().
I applied this patchset and updated SCTP to also use it for data chunks.
My tests results were very similar to what I had without it. Varying to
better or worse, tending worse. Thing is, SCTP always works on
linearized skbs as it can't crawl on fragments, so those clone/trim
operations are just offset adjusts regarding the data, and it's shared.
With pskb_extract, it implies in a new memory allocation and a copy,
even in this best case, so for SCTP, for now, it's actually a drawback
I'm afraid.
Marcelo
^ permalink raw reply
* Re: [PATCH net-next 1/2] skbuff: Add pskb_extract() helper function
From: marcelo.leitner @ 2016-04-22 23:27 UTC (permalink / raw)
To: Sowmini Varadhan
Cc: netdev, rds-devel, santosh.shilimkar, davem, eric.dumazet
In-Reply-To: <8d683318ede700f5f1b4d4333dfad8a1935f9ce6.1461086306.git.sowmini.varadhan@oracle.com>
On Wed, Apr 20, 2016 at 03:17:41AM -0700, Sowmini Varadhan wrote:
...
> +/* Extract to_copy bytes starting at off from skb, and return this in
> + * a new skb
> + */
> +struct sk_buff *pskb_extract(struct sk_buff *skb, int off,
> + int to_copy, gfp_t gfp)
> +{
> + struct sk_buff *clone = skb_clone(skb, gfp);
> +
> + if (!clone)
> + return NULL;
> +
> + if (pskb_carve(clone, off, gfp) < 0) {
> + pr_warn("pskb_carve failed\n");
You most likely don't want these pr_warn
> + kfree_skb(clone);
> + return NULL;
> + }
> +
> + if (pskb_trim(clone, to_copy)) {
> + pr_warn("pskb_trim failed\n");
> + kfree_skb(clone);
> + return NULL;
> + }
> + return clone;
Then these two blocks can be just:
if (pskb_carve(clone, off, gfp) < 0 ||
pskb_trim(clone, to_copy)) {
kfree_skb(clone);
clone = NULL;
}
return clone;
Marcelo
> +}
> +EXPORT_SYMBOL(pskb_extract);
> --
> 1.7.1
>
^ permalink raw reply
* Re: [PATCH net-next 0/2] pskb_extract() helper function.
From: Sowmini Varadhan @ 2016-04-22 23:41 UTC (permalink / raw)
To: marcelo.leitner; +Cc: netdev, rds-devel, santosh.shilimkar, davem, eric.dumazet
In-Reply-To: <20160422232301.GA1594@localhost.localdomain>
On (04/22/16 20:23), marcelo.leitner@gmail.com wrote:
> My tests results were very similar to what I had without it. Varying to
> better or worse, tending worse. Thing is, SCTP always works on
> linearized skbs as it can't crawl on fragments, so those clone/trim
sorry to hear that. For RDS-TCP, the rx side does show a noticeable
benefit with rds-stress. To an extent, this is also impacted
by the packet size, and the type of test (for our DB workloads,
we use request-response tests, and the packet size tends to
typically be 8K req, 256 byt responses), so I guess ymmv.
--Sowmini
^ permalink raw reply
* Re: [PATCH net 2/3] gre: build header correctly for collect metadata tunnels
From: Simon Horman @ 2016-04-23 0:03 UTC (permalink / raw)
To: pravin shelar
Cc: Jiri Benc, Linux Kernel Network Developers, Pravin B Shelar,
Thomas Graf
In-Reply-To: <CAOrHB_CbD7ycnSJ4ewkj1WOu-qnYmwAYtusj1M4cfAhtN5J20Q@mail.gmail.com>
On Fri, Apr 22, 2016 at 02:05:06PM -0700, pravin shelar wrote:
> On Fri, Apr 22, 2016 at 10:44 AM, Jiri Benc <jbenc@redhat.com> wrote:
> > In ipgre (i.e. not gretap) + collect metadata mode, the skb was assumed to
> > contain Ethernet header and was encapsulated as ETH_P_TEB. This is not the
> > case, the interface is ARPHRD_IPGRE and the protocol to be used for
> > encapsulation is skb->protocol.
> >
> > Fixes: 2e15ea390e6f4 ("ip_gre: Add support to collect tunnel metadata.")
> > Signed-off-by: Jiri Benc <jbenc@redhat.com>
>
> Acked-by: Pravin B Shelar <pshelar@ovn.org>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
^ permalink raw reply
* Re: [PATCH v3] prism54: isl_38xx: Replace 'struct timeval'
From: Luis R. Rodriguez @ 2016-04-23 1:30 UTC (permalink / raw)
To: Kalle Valo
Cc: Arnd Bergmann, y2038, netdev, linux-wireless, linux-kernel,
Tina Ruchandani, Johannes Berg
In-Reply-To: <87inz9y0qg.fsf@purkki.adurom.net>
On Apr 22, 2016 8:09 PM, "Kalle Valo" <kvalo@codeaurora.org> wrote:
>
> Johannes Berg <johannes@sipsolutions.net> writes:
>
> > On Mon, 2016-04-18 at 00:10 +0200, Arnd Bergmann wrote:
> >> On Sunday 17 April 2016 14:42:33 Johannes Berg wrote:
> >> >
> >> > I was thinking more restrictively of just the stuff that can't even
> >> > be built without modifying the sources - like the "#if VERBOSE"
> >> > thing.
> >>
> >> All the DEBUG() statements are inside of this kind of check, so if we
> >> remove the #ifdefs, it would be logical to remove the rest of the
> >> debugging infrastructure (DEBUG() macros, SHOW_*, pc_debug, maybe
> >> more) as well.
> >
> > Seems reasonable.
> >
> > Maybe we should Cc the maintainer, but I suspect that since the driver
> > is marked Obsolete anyway Luis won't care either :)
>
> I'm planning to apply this patch anyway, the debugging infrastructure
> removal can be a followup patch. But please let me know if I should drop
> this instead.
I'd say let's bury the driver now. We have a process to stage large chunks
of poo now through staging, let's use that to shaft prism54 there now and
give it a few cycles to let people call bloody murder before finally
removing. We need to do better at deleting ancient legacy shit. I'll stand
behind this one. Sue me if needed.
Luis
_______________________________________________
Y2038 mailing list
Y2038@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/y2038
^ permalink raw reply
* [PATCH v2 net-next 1/2] skbuff: Add pskb_extract() helper function
From: Sowmini Varadhan @ 2016-04-23 1:36 UTC (permalink / raw)
To: netdev, rds-devel, santosh.shilimkar, davem
Cc: sowmini.varadhan, eric.dumazet, marcelo.leitner
In-Reply-To: <cover.1461368732.git.sowmini.varadhan@oracle.com>
A pattern of skb usage seen in modules such as RDS-TCP is to
extract `to_copy' bytes from the received TCP segment, starting
at some offset `off' into a new skb `clone'. This is done in
the ->data_ready callback, where the clone skb is queued up for rx on
the PF_RDS socket, while the parent TCP segment is returned unchanged
back to the TCP engine.
The existing code uses the sequence
clone = skb_clone(..);
pskb_pull(clone, off, ..);
pskb_trim(clone, to_copy, ..);
with the intention of discarding the first `off' bytes. However,
skb_clone() + pskb_pull() implies pksb_expand_head(), which ends
up doing a redundant memcpy of bytes that will then get discarded
in __pskb_pull_tail().
To avoid this inefficiency, this commit adds pskb_extract() that
creates the clone, and memcpy's only the relevant header/frag/frag_list
to the start of `clone'. pskb_trim() is then invoked to trim clone
down to the requested to_copy bytes.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
v2: Marcelo Leitner review comments
include/linux/skbuff.h | 2 +
net/core/skbuff.c | 242 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 244 insertions(+), 0 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index da0ace3..a1ce639 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2986,6 +2986,8 @@ struct sk_buff *skb_vlan_untag(struct sk_buff *skb);
int skb_ensure_writable(struct sk_buff *skb, int write_len);
int skb_vlan_pop(struct sk_buff *skb);
int skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci);
+struct sk_buff *pskb_extract(struct sk_buff *skb, int off, int to_copy,
+ gfp_t gfp);
static inline int memcpy_from_msg(void *data, struct msghdr *msg, int len)
{
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 4cc594c..7b287a8 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4619,3 +4619,245 @@ struct sk_buff *alloc_skb_with_frags(unsigned long header_len,
return NULL;
}
EXPORT_SYMBOL(alloc_skb_with_frags);
+
+/* carve out the first off bytes from skb when off < headlen */
+static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
+ const int headlen, gfp_t gfp_mask)
+{
+ int i;
+ int size = skb_end_offset(skb);
+ int new_hlen = headlen - off;
+ u8 *data;
+ int doff = 0;
+
+ size = SKB_DATA_ALIGN(size);
+
+ if (skb_pfmemalloc(skb))
+ gfp_mask |= __GFP_MEMALLOC;
+ data = kmalloc_reserve(size +
+ SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
+ gfp_mask, NUMA_NO_NODE, NULL);
+ if (!data)
+ return -ENOMEM;
+
+ size = SKB_WITH_OVERHEAD(ksize(data));
+
+ /* Copy real data, and all frags */
+ skb_copy_from_linear_data_offset(skb, off, data, new_hlen);
+ skb->len -= off;
+
+ memcpy((struct skb_shared_info *)(data + size),
+ skb_shinfo(skb),
+ offsetof(struct skb_shared_info,
+ frags[skb_shinfo(skb)->nr_frags]));
+ if (skb_cloned(skb)) {
+ /* drop the old head gracefully */
+ if (skb_orphan_frags(skb, gfp_mask)) {
+ kfree(data);
+ return -ENOMEM;
+ }
+ for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
+ skb_frag_ref(skb, i);
+ if (skb_has_frag_list(skb))
+ skb_clone_fraglist(skb);
+ skb_release_data(skb);
+ } else {
+ /* we can reuse existing recount- all we did was
+ * relocate values
+ */
+ skb_free_head(skb);
+ }
+
+ doff = (data - skb->head);
+ skb->head = data;
+ skb->data = data;
+ skb->head_frag = 0;
+#ifdef NET_SKBUFF_DATA_USES_OFFSET
+ skb->end = size;
+ doff = 0;
+#else
+ skb->end = skb->head + size;
+#endif
+ skb_set_tail_pointer(skb, skb_headlen(skb));
+ skb_headers_offset_update(skb, 0);
+ skb->cloned = 0;
+ skb->hdr_len = 0;
+ skb->nohdr = 0;
+ atomic_set(&skb_shinfo(skb)->dataref, 1);
+
+ return 0;
+}
+
+static int pskb_carve(struct sk_buff *skb, const u32 off, gfp_t gfp);
+
+/* carve out the first eat bytes from skb's frag_list. May recurse into
+ * pskb_carve()
+ */
+static int pskb_carve_frag_list(struct sk_buff *skb,
+ struct skb_shared_info *shinfo, int eat,
+ gfp_t gfp_mask)
+{
+ struct sk_buff *list = shinfo->frag_list;
+ struct sk_buff *clone = NULL;
+ struct sk_buff *insp = NULL;
+
+ do {
+ if (!list) {
+ pr_err("Not enough bytes to eat. Want %d\n", eat);
+ return -EFAULT;
+ }
+ if (list->len <= eat) {
+ /* Eaten as whole. */
+ eat -= list->len;
+ list = list->next;
+ insp = list;
+ } else {
+ /* Eaten partially. */
+ if (skb_shared(list)) {
+ clone = skb_clone(list, gfp_mask);
+ if (!clone)
+ return -ENOMEM;
+ insp = list->next;
+ list = clone;
+ } else {
+ /* This may be pulled without problems. */
+ insp = list;
+ }
+ if (pskb_carve(list, eat, gfp_mask) < 0) {
+ kfree_skb(clone);
+ return -ENOMEM;
+ }
+ break;
+ }
+ } while (eat);
+
+ /* Free pulled out fragments. */
+ while ((list = shinfo->frag_list) != insp) {
+ shinfo->frag_list = list->next;
+ kfree_skb(list);
+ }
+ /* And insert new clone at head. */
+ if (clone) {
+ clone->next = list;
+ shinfo->frag_list = clone;
+ }
+ return 0;
+}
+
+/* carve off first len bytes from skb. Split line (off) is in the
+ * non-linear part of skb
+ */
+static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
+ int pos, gfp_t gfp_mask)
+{
+ int i, k = 0;
+ int size = skb_end_offset(skb);
+ u8 *data;
+ const int nfrags = skb_shinfo(skb)->nr_frags;
+ struct skb_shared_info *shinfo;
+ int doff = 0;
+
+ size = SKB_DATA_ALIGN(size);
+
+ if (skb_pfmemalloc(skb))
+ gfp_mask |= __GFP_MEMALLOC;
+ data = kmalloc_reserve(size +
+ SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
+ gfp_mask, NUMA_NO_NODE, NULL);
+ if (!data)
+ return -ENOMEM;
+
+ size = SKB_WITH_OVERHEAD(ksize(data));
+
+ memcpy((struct skb_shared_info *)(data + size),
+ skb_shinfo(skb), offsetof(struct skb_shared_info,
+ frags[skb_shinfo(skb)->nr_frags]));
+ if (skb_orphan_frags(skb, gfp_mask)) {
+ kfree(data);
+ return -ENOMEM;
+ }
+ shinfo = (struct skb_shared_info *)(data + size);
+ for (i = 0; i < nfrags; i++) {
+ int fsize = skb_frag_size(&skb_shinfo(skb)->frags[i]);
+
+ if (pos + fsize > off) {
+ shinfo->frags[k] = skb_shinfo(skb)->frags[i];
+
+ if (pos < off) {
+ /* Split frag.
+ * We have two variants in this case:
+ * 1. Move all the frag to the second
+ * part, if it is possible. F.e.
+ * this approach is mandatory for TUX,
+ * where splitting is expensive.
+ * 2. Split is accurately. We make this.
+ */
+ shinfo->frags[0].page_offset += off - pos;
+ skb_frag_size_sub(&shinfo->frags[0], off - pos);
+ }
+ skb_frag_ref(skb, i);
+ k++;
+ }
+ pos += fsize;
+ }
+ shinfo->nr_frags = k;
+ if (skb_has_frag_list(skb))
+ skb_clone_fraglist(skb);
+
+ if (k == 0) {
+ /* split line is in frag list */
+ pskb_carve_frag_list(skb, shinfo, off - pos, gfp_mask);
+ }
+ skb_release_data(skb);
+
+ doff = (data - skb->head);
+ skb->head = data;
+ skb->head_frag = 0;
+ skb->data = data;
+#ifdef NET_SKBUFF_DATA_USES_OFFSET
+ skb->end = size;
+ doff = 0;
+#else
+ skb->end = skb->head + size;
+#endif
+ skb_reset_tail_pointer(skb);
+ skb_headers_offset_update(skb, 0);
+ skb->cloned = 0;
+ skb->hdr_len = 0;
+ skb->nohdr = 0;
+ skb->len -= off;
+ skb->data_len = skb->len;
+ atomic_set(&skb_shinfo(skb)->dataref, 1);
+ return 0;
+}
+
+/* remove len bytes from the beginning of the skb */
+static int pskb_carve(struct sk_buff *skb, const u32 len, gfp_t gfp)
+{
+ int headlen = skb_headlen(skb);
+
+ if (len < headlen)
+ return pskb_carve_inside_header(skb, len, headlen, gfp);
+ else
+ return pskb_carve_inside_nonlinear(skb, len, headlen, gfp);
+}
+
+/* Extract to_copy bytes starting at off from skb, and return this in
+ * a new skb
+ */
+struct sk_buff *pskb_extract(struct sk_buff *skb, int off,
+ int to_copy, gfp_t gfp)
+{
+ struct sk_buff *clone = skb_clone(skb, gfp);
+
+ if (!clone)
+ return NULL;
+
+ if (pskb_carve(clone, off, gfp) < 0 ||
+ pskb_trim(clone, to_copy)) {
+ kfree_skb(clone);
+ return NULL;
+ }
+ return clone;
+}
+EXPORT_SYMBOL(pskb_extract);
--
1.7.1
^ permalink raw reply related
* [PATCH v2 net-next 2/2] RDS: TCP: Call pskb_extract() helper function
From: Sowmini Varadhan @ 2016-04-23 1:36 UTC (permalink / raw)
To: netdev, rds-devel, santosh.shilimkar, davem
Cc: sowmini.varadhan, eric.dumazet, marcelo.leitner
In-Reply-To: <cover.1461368732.git.sowmini.varadhan@oracle.com>
rds-stress experiments with request size 256 bytes, 8K acks,
using 16 threads show a 40% improvment when pskb_extract()
replaces the {skb_clone(..); pskb_pull(..); pskb_trim(..);}
pattern in the Rx path, so we leverage the perf gain with
this commit.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
net/rds/tcp_recv.c | 14 +++-----------
1 files changed, 3 insertions(+), 11 deletions(-)
diff --git a/net/rds/tcp_recv.c b/net/rds/tcp_recv.c
index 27a9921..d75d8b5 100644
--- a/net/rds/tcp_recv.c
+++ b/net/rds/tcp_recv.c
@@ -207,22 +207,14 @@ static int rds_tcp_data_recv(read_descriptor_t *desc, struct sk_buff *skb,
}
if (left && tc->t_tinc_data_rem) {
- clone = skb_clone(skb, arg->gfp);
+ to_copy = min(tc->t_tinc_data_rem, left);
+
+ clone = pskb_extract(skb, offset, to_copy, arg->gfp);
if (!clone) {
desc->error = -ENOMEM;
goto out;
}
- to_copy = min(tc->t_tinc_data_rem, left);
- if (!pskb_pull(clone, offset) ||
- pskb_trim(clone, to_copy)) {
- pr_warn("rds_tcp_data_recv: pull/trim failed "
- "left %zu data_rem %zu skb_len %d\n",
- left, tc->t_tinc_data_rem, skb->len);
- kfree_skb(clone);
- desc->error = -ENOMEM;
- goto out;
- }
skb_queue_tail(&tinc->ti_skb_list, clone);
rdsdebug("skb %p data %p len %d off %u to_copy %zu -> "
--
1.7.1
^ permalink raw reply related
* [PATCH v2 net-next 0/2] pskb_extract() helper function.
From: Sowmini Varadhan @ 2016-04-23 1:36 UTC (permalink / raw)
To: netdev, rds-devel, santosh.shilimkar, davem
Cc: sowmini.varadhan, eric.dumazet, marcelo.leitner
This patchset follows up on the discussion in
https://www.mail-archive.com/netdev@vger.kernel.org/msg105090.html
For RDS-TCP, we have to deal with the full gamut of
nonlinear sk_buffs, including all the frag_list variants.
Also, the parent skb has to remain unchanged, while the clone
is queued for Rx on the PF_RDS socket.
Patch 1 of this patchset adds a pskb_extract() function that
does all this without the redundant memcpy's in pskb_expand_head()
and __pskb_pull_tail().
v2: Marcelo Leitner review comments
Sowmini Varadhan (2):
Add pskb_extract() helper function
Call pskb_extract() helper function
include/linux/skbuff.h | 2 +
net/core/skbuff.c | 242 ++++++++++++++++++++++++++++++++++++++++++++++++
net/rds/tcp_recv.c | 14 +--
3 files changed, 247 insertions(+), 11 deletions(-)
^ permalink raw reply
* Re: [PATCH net-next 04/10] net: hns: add attribute reset-field-offset for dsaf node
From: Yisen Zhuang @ 2016-04-23 1:41 UTC (permalink / raw)
To: Rob Herring
Cc: devicetree-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
pawel.moll-5wv7dgnIgG8, mark.rutland-5wv7dgnIgG8,
ijc+devicetree-KcIKpvwj1kUDXYZnReoRVg,
galak-sgV2jX0FEOL9JmXXK+q4OQ, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
yankejian-hv44wF8Li93QT0dZR+AlfA,
huangdaode-C8/M+/jPZTeaMJb+Lgu22Q,
salil.mehta-hv44wF8Li93QT0dZR+AlfA,
lipeng321-hv44wF8Li93QT0dZR+AlfA, liguozhu-hv44wF8Li93QT0dZR+AlfA,
xieqianqian-hv44wF8Li93QT0dZR+AlfA,
linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <20160422204418.GA19071@rob-hp-laptop>
Hi Rob,
Thanks for your suggestion. I will put DT bindings in separate patches,
and modify related dts file in next version.
Thanks,
Yisen
在 2016/4/23 4:44, Rob Herring 写道:
> On Fri, Apr 22, 2016 at 03:20:13PM +0800, Yisen Zhuang wrote:
>> Add the subctrl reset offset for dsaf, this property is used to reset
>> xge/ge ports for different dsaf. If this attribute is not present,
>> default value 0 will be used.
>>
>> Signed-off-by: Daode Huang <huangdaode-C8/M+/jPZTeaMJb+Lgu22Q@public.gmane.org>
>> Signed-off-by: Yisen Zhuang <yisen.zhuang-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
>> ---
>> .../devicetree/bindings/net/hisilicon-hns-dsaf.txt | 2 ++
>
> Please put DT bindings in separate patches and don't modify the same
> binding in a series of patches. You are describing h/w and the h/w is
> not changing.
>
>> drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c | 8 +++++
>> drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h | 1 +
>> drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c | 40 +++++++++++++++-------
>> 4 files changed, 39 insertions(+), 12 deletions(-)
>
> .
>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH net 1/3] gre: do not assign header_ops in collect metadata mode
From: Thomas Graf @ 2016-04-23 1:41 UTC (permalink / raw)
To: Jiri Benc
Cc: pravin shelar, Linux Kernel Network Developers, Pravin B Shelar,
Simon Horman
In-Reply-To: <20160422232054.3f6b47da@griffin>
On 04/22/16 at 11:20pm, Jiri Benc wrote:
> On Fri, 22 Apr 2016 14:04:48 -0700, pravin shelar wrote:
> > I think we should we return error in case of such configuration rather
> > than silently ignoring it.
>
> I thought about it and I'm not sure. We're not returning an error
> currently, starting returning it now might be perceived as uAPI
> breakage.
>
> But given it doesn't work at all currently, there are apparently no
> users yet. I'll wait for more feedback.
As a user, I would probably favour receiving an error for a configuration
that can't possibly work and was not working before.
^ permalink raw reply
* Re: [PATCH net 3/3] gre: receive also TEB packets for lwtunnels
From: Thomas Graf @ 2016-04-23 1:49 UTC (permalink / raw)
To: Jiri Benc; +Cc: netdev, Pravin B Shelar, Simon Horman
In-Reply-To: <017d8f2b1c00e8fdbdb92a5898b1b5f365b58d6d.1461346798.git.jbenc@redhat.com>
On 04/22/16 at 07:44pm, Jiri Benc wrote:
> For ipgre interfaces in collect metadata mode, receive also traffic with
> encapsulated Ethernet headers. The lwtunnel users are supposed to sort this
> out correctly. This allows to have mixed Ethernet + L3-only traffic on the
> same lwtunnel interface.
>
> To keep backwards compatibility and prevent any surprises, gretap interfaces
> have priority in receiving packets with Ethernet headers.
I may be missing some context. Is anyone using this already or is this
preparing the stage for another user? It's not clear to me from the
commit message.
^ permalink raw reply
* Re: [PATCH net-next] macvlan: fix failure during registration v2
From: Eric W. Biederman @ 2016-04-23 2:52 UTC (permalink / raw)
To: Francesco Ruggeri; +Cc: netdev, David S. Miller, Mahesh Bandewar
In-Reply-To: <CA+HUmGhxBuCYcRcyDMctTxcw-p+Z3-kdx72aaWNLi-WjpEk2sQ@mail.gmail.com>
Francesco Ruggeri <fruggeri@arista.com> writes:
> On Thu, Apr 21, 2016 at 10:44 AM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
> <
>>> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>>> index 95394ed..e770221 100644
>>> --- a/drivers/net/macvtap.c
>>> +++ b/drivers/net/macvtap.c
>>> @@ -1303,6 +1303,8 @@ static int macvtap_device_event(struct notifier_block *unused,
>>> }
>>> break;
>>> case NETDEV_UNREGISTER:
>>> + if (vlan->minor == 0)
>>> + break;
>>
>> I don't understand this bit. A minor of 0 is never assigned. That is
>> clear from the code. On what code path can you get here without
>> assigning a minor?
>>
>
> You can have vlan->minor == 0 if macvtap_device_event(NETDEV_REGISTER)
> failed.
>
> macvtap_device_event is invoked in the context of macvtap_newlink and
> it can fail if for example a macvtap interface using the same ifindex
> already exists in a different namespace. That is how we originally ran
> into the port->count issue.
> In that case the sequence is
>
> macvtap_newlink
> macvlan_common_newlink
> register_netdevice
> call_netdevice_notifiers(NETDEV_REGISTER, dev)
> macvtap_device_event(NETDEV_REGISTER)
> <fail here, vlan->minor = 0>
> rollback_registered(dev);
> rollback_registered_many
> call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
> macvtap_device_event(NETDEV_UNREGISTER)
> <nothing to do here>
>
> Should this bit go into a separate patch?
> Would a comment like this help:
>
> /* We can have vlan->minor == 0 if NETDEV_REGISTER above failed */
>
> Marc Angel posted https://marc.info/?l=linux-netdev&m=146116146925511&w=2
> about conflicts if one tries to create macvtap interfaces with the same
> ifindex in different namespaces.
Just for clarity I would recommend splitting the two changes.
One change that fixes macvlan_init to do the right thing.
Another change that includes your backtrace above, adds the comment
and fixes macvlan to ignore that case. Arguably we should be fixing
register_netdevice to call call_netdeivce_notifiers(NETDEV_UNREIGSTER)
if call_netdevice_notifiers(NETDEV_REGISTER) failed. Having a separate
patch will at least allow us to look at this second issue all by itself.
Thank you for your patience in all of this.
Eric
^ permalink raw reply
* Re: [PATCH net 3/3] gre: receive also TEB packets for lwtunnels
From: pravin shelar @ 2016-04-23 3:40 UTC (permalink / raw)
To: Jiri Benc
Cc: Linux Kernel Network Developers, Pravin B Shelar, Thomas Graf,
Simon Horman
In-Reply-To: <20160422232732.7dfd0ec3@griffin>
On Fri, Apr 22, 2016 at 2:27 PM, Jiri Benc <jbenc@redhat.com> wrote:
> On Fri, 22 Apr 2016 14:07:01 -0700, pravin shelar wrote:
>> On Fri, Apr 22, 2016 at 10:44 AM, Jiri Benc <jbenc@redhat.com> wrote:
>> > For ipgre interfaces in collect metadata mode, receive also traffic with
>> > encapsulated Ethernet headers. The lwtunnel users are supposed to sort this
>> > out correctly. This allows to have mixed Ethernet + L3-only traffic on the
>> > same lwtunnel interface.
>> >
>> How user is suppose to sort out the type of packet? whether it is L2 or L3.
>
> By skb->protocol, of course. Will be ETH_P_TEB if the inner packet was
> Ethernet. There's no problem with this, for lwtunnels used with encap
> routes, such packets are just discarded (which is the expected
> behavior), as there's no protocol handler for ETH_P_TEB. More clever
> lwtunnel users (such as openvswitch) can process the packets. This is
> important in order to have a single tunnel interface for everything.
>
But skb->protocol is not set to ETH_P_TEB anywhere in ip-gre module.
Am I missing something?
>> Is it fine to receive L2 packet over L3 device?
>> At least ip_tunnel_rcv() is not ready to handle such packet.
>
> I don't see why. Could you please elaborate? Seems safe to me. The
> desired result is skb->protocol == htons(ETH_P_TEB), network header(!)
> pointing to the start of the Ethernet frame (because we're tunneling
> Ethernet via ARPHRD_IPGRE interface, there's no L2 header to have), mac
> header not being set.
>
ip_tunnel_rcv() checks device type (tunnel->dev->type) to perform
ethernet specific processing on packet. I think that should be changed
to check packet type.
^ permalink raw reply
* Warning triggered by lockdep checks for sock_owned_by_user on linux-next-20160420
From: Shi, Yang @ 2016-04-23 4:02 UTC (permalink / raw)
To: David S. Miller, hannes; +Cc: LKML, Network Development
Hi David,
When I ran some test on a nfs mounted rootfs, I got the below warning
with LOCKDEP enabled on linux-next-20160420:
WARNING: CPU: 9 PID: 0 at include/net/sock.h:1408
udp_queue_rcv_skb+0x3d0/0x660
Modules linked in:
CPU: 9 PID: 0 Comm: swapper/9 Tainted: G D
4.6.0-rc4-next-20160420-WR7.0.0.0_standard+ #6
Hardware name: Intel Corporation S5520HC/S5520HC, BIOS
S5500.86B.01.10.0025.030220091519 03/02/2009
0000000000000000 ffff88066fd03a70 ffffffff8155855f 0000000000000000
0000000000000000 ffff88066fd03ab0 ffffffff81062803 0000058061318ec8
ffff88065d1e39c0 ffff880661318e40 0000000000000000 ffff880661318ec8
Call Trace:
<IRQ> [<ffffffff8155855f>] dump_stack+0x67/0x98
Checking out fil [<ffffffff81062803>] __warn+0xd3/0xf0
[<ffffffff810628ed>] warn_slowpath_null+0x1d/0x20
[<ffffffff81aa48f0>] udp_queue_rcv_skb+0x3d0/0x660
[<ffffffff81aa505c>] __udp4_lib_rcv+0x4dc/0xc00
[<ffffffff81aa5b5a>] udp_rcv+0x1a/0x20
[<ffffffff81a728a1>] ip_local_deliver_finish+0xd1/0x2e0
es: 57% (30585/ [<ffffffff81a7280f>] ? ip_local_deliver_finish+0x3f/0x2e0
[<ffffffff81a73262>] ip_local_deliver+0xc2/0xd0
[<ffffffff81a72c92>] ip_rcv_finish+0x1e2/0x5a0
[<ffffffff81a7354c>] ip_rcv+0x2dc/0x410
[<ffffffff81a20a32>] ? __pskb_pull_tail+0x82/0x400
[<ffffffff81a2e188>] __netif_receive_skb_core+0x3a8/0xa80
[<ffffffff81a30b9b>] ? netif_receive_skb_internal+0x1b/0xf0
[<ffffffff81a30b3d>] __netif_receive_skb+0x1d/0x60
[<ffffffff81a30bd5>] netif_receive_skb_internal+0x55/0xf0
[<ffffffff81a30b9b>] ? netif_receive_skb_internal+0x1b/0xf0
[<ffffffff81a31b52>] napi_gro_receive+0xc2/0x180
[<ffffffff8187188a>] igb_poll+0x5ea/0xdf0
[<ffffffff81a32b9c>] net_rx_action+0x15c/0x3d0
[<ffffffff81c668c1>] __do_softirq+0x161/0x413
[<ffffffff810683a1>] irq_exit+0xd1/0x110
[<ffffffff81c664d2>] do_IRQ+0x62/0xf0
[<ffffffff81c6474e>] common_interrupt+0x8e/0x8e
<EOI> [<ffffffff8198d9c6>] ? cpuidle_enter_state+0xc6/0x290
[<ffffffff8198dbc7>] cpuidle_enter+0x17/0x20
[<ffffffff810aa963>] call_cpuidle+0x33/0x50
[<ffffffff810aace9>] cpu_startup_entry+0x229/0x3b0
[<ffffffff810407e4>] start_secondary+0x144/0x150
---[ end trace ba508c424f0d52bf ]---
The warning is triggered by commit
fafc4e1ea1a4c1eb13a30c9426fb799f5efacbc3 ("sock: tigthen lockdep checks
for sock_owned_by_user"), which checks if slock is held before locking
"owned".
It looks good to lock_sock which is just called lock_sock_nested. But,
bh_lock_sock is different, which just calls spin_lock so it doesn't
touch dep_map then the check will fail even though it is locked.
So, I'm wondering what a right fix for it should be:
1. Replace bh_lock_sock to bh_lock_sock_nested in the protocols
implementation, but there are a lot places calling it.
2. Just like lock_sock, just call bh_lock_sock_nested instead of spin_lock.
Or the both approach is wrong or not ideal?
Thanks,
Yang
^ permalink raw reply
* Re: Warning triggered by lockdep checks for sock_owned_by_user on linux-next-20160420
From: Eric Dumazet @ 2016-04-23 4:50 UTC (permalink / raw)
To: Shi, Yang; +Cc: David S. Miller, hannes, LKML, Network Development
In-Reply-To: <571AF3C2.3010509@linaro.org>
On Fri, 2016-04-22 at 21:02 -0700, Shi, Yang wrote:
> Hi David,
>
> When I ran some test on a nfs mounted rootfs, I got the below warning
> with LOCKDEP enabled on linux-next-20160420:
>
> WARNING: CPU: 9 PID: 0 at include/net/sock.h:1408
> udp_queue_rcv_skb+0x3d0/0x660
> Modules linked in:
> CPU: 9 PID: 0 Comm: swapper/9 Tainted: G D
> 4.6.0-rc4-next-20160420-WR7.0.0.0_standard+ #6
> Hardware name: Intel Corporation S5520HC/S5520HC, BIOS
> S5500.86B.01.10.0025.030220091519 03/02/2009
> 0000000000000000 ffff88066fd03a70 ffffffff8155855f 0000000000000000
> 0000000000000000 ffff88066fd03ab0 ffffffff81062803 0000058061318ec8
> ffff88065d1e39c0 ffff880661318e40 0000000000000000 ffff880661318ec8
> Call Trace:
> <IRQ> [<ffffffff8155855f>] dump_stack+0x67/0x98
> Checking out fil [<ffffffff81062803>] __warn+0xd3/0xf0
> [<ffffffff810628ed>] warn_slowpath_null+0x1d/0x20
> [<ffffffff81aa48f0>] udp_queue_rcv_skb+0x3d0/0x660
> [<ffffffff81aa505c>] __udp4_lib_rcv+0x4dc/0xc00
> [<ffffffff81aa5b5a>] udp_rcv+0x1a/0x20
> [<ffffffff81a728a1>] ip_local_deliver_finish+0xd1/0x2e0
> es: 57% (30585/ [<ffffffff81a7280f>] ? ip_local_deliver_finish+0x3f/0x2e0
> [<ffffffff81a73262>] ip_local_deliver+0xc2/0xd0
> [<ffffffff81a72c92>] ip_rcv_finish+0x1e2/0x5a0
> [<ffffffff81a7354c>] ip_rcv+0x2dc/0x410
> [<ffffffff81a20a32>] ? __pskb_pull_tail+0x82/0x400
> [<ffffffff81a2e188>] __netif_receive_skb_core+0x3a8/0xa80
> [<ffffffff81a30b9b>] ? netif_receive_skb_internal+0x1b/0xf0
> [<ffffffff81a30b3d>] __netif_receive_skb+0x1d/0x60
> [<ffffffff81a30bd5>] netif_receive_skb_internal+0x55/0xf0
> [<ffffffff81a30b9b>] ? netif_receive_skb_internal+0x1b/0xf0
> [<ffffffff81a31b52>] napi_gro_receive+0xc2/0x180
> [<ffffffff8187188a>] igb_poll+0x5ea/0xdf0
> [<ffffffff81a32b9c>] net_rx_action+0x15c/0x3d0
> [<ffffffff81c668c1>] __do_softirq+0x161/0x413
> [<ffffffff810683a1>] irq_exit+0xd1/0x110
> [<ffffffff81c664d2>] do_IRQ+0x62/0xf0
> [<ffffffff81c6474e>] common_interrupt+0x8e/0x8e
> <EOI> [<ffffffff8198d9c6>] ? cpuidle_enter_state+0xc6/0x290
> [<ffffffff8198dbc7>] cpuidle_enter+0x17/0x20
> [<ffffffff810aa963>] call_cpuidle+0x33/0x50
> [<ffffffff810aace9>] cpu_startup_entry+0x229/0x3b0
> [<ffffffff810407e4>] start_secondary+0x144/0x150
> ---[ end trace ba508c424f0d52bf ]---
>
>
> The warning is triggered by commit
> fafc4e1ea1a4c1eb13a30c9426fb799f5efacbc3 ("sock: tigthen lockdep checks
> for sock_owned_by_user"), which checks if slock is held before locking
> "owned".
>
> It looks good to lock_sock which is just called lock_sock_nested. But,
> bh_lock_sock is different, which just calls spin_lock so it doesn't
> touch dep_map then the check will fail even though it is locked.
?? spin_lock() definitely is lockdep friendly.
>
> So, I'm wondering what a right fix for it should be:
>
> 1. Replace bh_lock_sock to bh_lock_sock_nested in the protocols
> implementation, but there are a lot places calling it.
>
> 2. Just like lock_sock, just call bh_lock_sock_nested instead of spin_lock.
>
> Or the both approach is wrong or not ideal?
I sent a patch yesterday, I am not sure what the status is.
diff --git a/include/net/sock.h b/include/net/sock.h
index d997ec13a643..db8301c76d50 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct sock *csk)
{
struct sock *sk = (struct sock *)csk;
- return lockdep_is_held(&sk->sk_lock) ||
+ return !debug_locks ||
+ lockdep_is_held(&sk->sk_lock) ||
lockdep_is_held(&sk->sk_lock.slock);
}
#endif
^ permalink raw reply related
* Re: [PATCH net 3/3] gre: receive also TEB packets for lwtunnels
From: Simon Horman @ 2016-04-23 5:09 UTC (permalink / raw)
To: Thomas Graf; +Cc: Jiri Benc, netdev, Pravin B Shelar
In-Reply-To: <20160423014938.GD32327@pox.localdomain>
On Sat, Apr 23, 2016 at 03:49:38AM +0200, Thomas Graf wrote:
> On 04/22/16 at 07:44pm, Jiri Benc wrote:
> > For ipgre interfaces in collect metadata mode, receive also traffic with
> > encapsulated Ethernet headers. The lwtunnel users are supposed to sort this
> > out correctly. This allows to have mixed Ethernet + L3-only traffic on the
> > same lwtunnel interface.
> >
> > To keep backwards compatibility and prevent any surprises, gretap interfaces
> > have priority in receiving packets with Ethernet headers.
>
> I may be missing some context. Is anyone using this already or is this
> preparing the stage for another user? It's not clear to me from the
> commit message.
Hi Thomas,
I'm not sure what use Jiri may have in mind but I plan to use
this to allow OvS to support packets without an Ethernet header to
be received from and sent to a GRE tunnel.
I am reasonably sure there are no existing users.
^ permalink raw reply
* [PATCH v2 net-next 10/13] net: hns: add attribute port-mode-offset for dsaf port node
From: Yisen Zhuang @ 2016-04-23 9:05 UTC (permalink / raw)
To: devicetree, netdev, linux-arm-kernel
Cc: robh+dt, pawel.moll, mark.rutland, ijc+devicetree, galak, davem,
will.deacon, catalin.marinas, yankejian, huangdaode, salil.mehta,
lipeng321, liguozhu, xieqianqian, xuwei5, linuxarm
In-Reply-To: <1461402317-136499-1-git-send-email-Yisen.Zhuang@huawei.com>
Port mode offset for each dsaf port is different. The current code is not
so readability. This patch adds configuration named port-mode-offset to
make the code simple and more readability. If port-mode-offset isn't
exists, default value 0 will be used.
Signed-off-by: Daode Huang <huangdaode@hisilicon.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
---
change log:
PATCH v2:
- put DT bindings in separate patches.
PATCH v1:
- first submit
---
drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c | 10 +++++
drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h | 1 +
drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c | 44 ++++++++++------------
3 files changed, 30 insertions(+), 25 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
index 52d757d..1c8fdd3 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
@@ -665,6 +665,7 @@ static int hns_mac_get_info(struct hns_mac_cb *mac_cb)
mac_cb->max_frm = MAC_DEFAULT_MTU;
mac_cb->tx_pause_frm_time = MAC_DEFAULT_PAUSE_TIME;
mac_cb->port_rst_off = mac_cb->mac_id;
+ mac_cb->port_mode_off = 0;
/* if the dsaf node doesn't contain a port subnode, get phy-handle
* from dsaf node
@@ -703,6 +704,15 @@ static int hns_mac_get_info(struct hns_mac_cb *mac_cb)
mac_cb->mac_id);
}
+ ret = fwnode_property_read_u32(mac_cb->fw_port,
+ "port-mode-offset",
+ &mac_cb->port_mode_off);
+ if (ret) {
+ dev_dbg(mac_cb->dev,
+ "mac%d port-mode-offset not found, use default value.\n",
+ mac_cb->mac_id);
+ }
+
syscon = syscon_node_to_regmap(
of_parse_phandle(to_of_node(mac_cb->fw_port),
"cpld-syscon", 0));
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h
index 7be7104..97ce9a7 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h
@@ -319,6 +319,7 @@ struct hns_mac_cb {
struct regmap *cpld_ctrl;
u32 cpld_ctrl_reg;
u32 port_rst_off;
+ u32 port_mode_off;
struct mac_entry_idx addr_entry_idx[DSAF_MAX_VM_NUM];
u8 sfp_prsnt;
u8 cpld_led_value;
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c
index e549a11..a837bb9 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c
@@ -265,37 +265,31 @@ phy_interface_t hns_mac_get_phy_if(struct hns_mac_cb *mac_cb)
{
u32 mode;
u32 reg;
- u32 shift;
- u32 phy_offset;
bool is_ver1 = AE_IS_VER1(mac_cb->dsaf_dev->dsaf_ver);
int mac_id = mac_cb->mac_id;
- phy_interface_t phy_if = PHY_INTERFACE_MODE_NA;
+ phy_interface_t phy_if;
- if (is_ver1 && HNS_DSAF_IS_DEBUG(mac_cb->dsaf_dev)) {
- phy_if = PHY_INTERFACE_MODE_SGMII;
- } else if (mac_id >= 0 && mac_id <= 3 &&
- !HNS_DSAF_IS_DEBUG(mac_cb->dsaf_dev)) {
- reg = is_ver1 ? HNS_MAC_HILINK4_REG : HNS_MAC_HILINK4V2_REG;
- mode = dsaf_read_sub(mac_cb->dsaf_dev, reg);
- /* mac_id 0, 1, 2, 3 ---> hilink4 lane 0, 1, 2, 3 */
- shift = is_ver1 ? 0 : mac_id;
- if (dsaf_get_bit(mode, shift))
- phy_if = PHY_INTERFACE_MODE_XGMII;
+ if (is_ver1) {
+ if (HNS_DSAF_IS_DEBUG(mac_cb->dsaf_dev))
+ return PHY_INTERFACE_MODE_SGMII;
+
+ if (mac_id >= 0 && mac_id <= 3)
+ reg = HNS_MAC_HILINK4_REG;
else
- phy_if = PHY_INTERFACE_MODE_SGMII;
- } else {
- reg = is_ver1 ? HNS_MAC_HILINK3_REG : HNS_MAC_HILINK3V2_REG;
- mode = dsaf_read_sub(mac_cb->dsaf_dev, reg);
- /* mac_id 4, 5,---> hilink3 lane 2, 3
- * debug port 0(6), 1(7) ---> hilink3 lane 0, 1
- */
- phy_offset = mac_cb->dsaf_dev->reset_offset - 1;
- shift = is_ver1 ? 0 : mac_id >= 4 ? mac_id - 2 : phy_offset;
- if (dsaf_get_bit(mode, shift))
- phy_if = PHY_INTERFACE_MODE_XGMII;
+ reg = HNS_MAC_HILINK3_REG;
+ } else{
+ if (!HNS_DSAF_IS_DEBUG(mac_cb->dsaf_dev) && mac_id <= 3)
+ reg = HNS_MAC_HILINK4V2_REG;
else
- phy_if = PHY_INTERFACE_MODE_SGMII;
+ reg = HNS_MAC_HILINK3V2_REG;
}
+
+ mode = dsaf_read_sub(mac_cb->dsaf_dev, reg);
+ if (dsaf_get_bit(mode, mac_cb->port_mode_off))
+ phy_if = PHY_INTERFACE_MODE_XGMII;
+ else
+ phy_if = PHY_INTERFACE_MODE_SGMII;
+
return phy_if;
}
--
1.9.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox