Netdev List
* Re: [PATCH net] ip: frags: fix crash in ip_do_fragment()
From: Eric Dumazet @ 2018-09-06 18:23 UTC
  To: ap420073
  Cc: David Miller, Peter Oskolkov, netdev, Pablo Neira Ayuso,
	Florian Westphal
In-Reply-To: <CANn89iJENha1FiqA_pNpaXu-+vryHjw+6a-fR4Qc3-WmQsrXdQ@mail.gmail.com>

On Thu, Sep 6, 2018 at 11:06 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Sep 6, 2018 at 10:51 AM Taehee Yoo <ap420073@gmail.com> wrote:
> >
> > A kernel crash occurs when a defragmented packet is fragmented
> > in ip_do_fragment().
> > In the defragment routine, skb_orphan() is called and
> > skb->ip_defrag_offset is set. But skb->sk and
> > skb->ip_defrag_offset are members of the same union, so
> > frag->sk is not NULL.
> > Hence a crash occurs in the skb->sk check in ip_do_fragment() when
> > a defragmented packet is fragmented.
>
> Have you tested this patch?
>
> Moving ip_defrag_offset back conflicts with the rbnode!
>
> A more correct fix would be to properly clear skb->sk at reassembly.

Something like this:

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 88281fbce88ce8f1062b99594665766c2a5f5b74..e7227128df2c8fd54727c234f76043133809bd1e 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -599,6 +599,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
                        nextp = &fp->next;
                        fp->prev = NULL;
                        memset(&fp->rbnode, 0, sizeof(fp->rbnode));
+                       fp->sk = NULL;
                        head->data_len += fp->len;
                        head->len += fp->len;
                        if (head->ip_summed != fp->ip_summed)

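Eric's suggestion can be illustrated with a small userspace model. This is a sketch, not kernel code: struct fake_skb, queue_fragment() and reasm_one() are hypothetical stand-ins for struct sk_buff and the ip_frag_reasm() loop, and the only detail taken from the thread is that sk and ip_defrag_offset share storage. Once an offset has been written, sk reads back as a non-NULL value, so clearing it when a fragment is merged into the head keeps a later skb->sk check from tripping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the sk/ip_defrag_offset union in struct sk_buff. */
struct fake_sock;

struct fake_skb {
	union {
		struct fake_sock *sk;   /* owning socket, NULL after skb_orphan() */
		int ip_defrag_offset;   /* reused while the fragment is queued */
	};
	int len;
};

/* Queueing a fragment overwrites the union with its offset. */
static void queue_fragment(struct fake_skb *fp, int offset)
{
	fp->sk = NULL;                  /* skb_orphan() equivalent */
	fp->ip_defrag_offset = offset;  /* a non-zero offset makes fp->sk non-NULL */
}

/* Merging one fragment into the head, with the proposed fix applied. */
static void reasm_one(struct fake_skb *head, struct fake_skb *fp)
{
	head->len += fp->len;
	fp->sk = NULL;  /* clear the aliased member so later sk checks pass */
}
```

For example, queue_fragment(&frag, 1480) leaves frag.sk non-NULL, and reasm_one(&head, &frag) resets it to NULL.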
* Re: [PATCH bpf-next 0/4] tools/bpf: add bpftool net support
From: Alexei Starovoitov @ 2018-09-06 18:11 UTC
  To: Yonghong Song; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <20180905235806.1536396-1-yhs@fb.com>

On Wed, Sep 05, 2018 at 04:58:02PM -0700, Yonghong Song wrote:
> As bpf usage becomes more pervasive, people start to worry
> about its cpu and memory cost. On a particular host,
> people often want to know all running bpf programs
> and their attachment context, so that they can quickly relate
> a performance/memory anomaly to a particular bpf
> program or application.
> 
> bpftool already provides pretty good coverage of perf
> and cgroup related attachments. This patch set adds the ability
> to dump attachment info for xdp and tc bpf programs.
> 
> Currently, users can already use "ip link show <dev>" and
> "tc filter show dev <dev> ..." to dump bpf program attachment
> information for xdp and tc bpf programs. The main reason
> to implement such functionality in bpftool as well is for
> a better user experience. We want bpftool to be the
> ultimate tool for bpf introspection. The bpftool net
> implementation will only present necessary bpf attachment
> information to the user, ignoring most other ip/tc
> specific information.
> 
> For example, below is the pretty-printed JSON output for xdp
> and tc_filters.
> 
>   $ ./bpftool -jp net
>   [{
>         "xdp": [{
>                 "ifindex": 2,
>                 "devname": "eth0",
>                 "prog_id": 198
>             }
>         ],
>         "tc_filters": [{
>                 "ifindex": 2,
>                 "kind": "qdisc_htb",
>                 "name": "prefix_matcher.o:[cls_prefix_matcher_htb]",
>                 "prog_id": 111727,
>                 "tag": "d08fe3b4319bc2fd",
>                 "act": []
>             },{
>                 "ifindex": 2,
>                 "kind": "qdisc_clsact_ingress",
>                 "name": "fbflow_icmp",
>                 "prog_id": 130246,
>                 "tag": "3f265c7f26db62c9",
>                 "act": []
>             },{
>                 "ifindex": 2,
>                 "kind": "qdisc_clsact_egress",
>                 "name": "prefix_matcher.o:[cls_prefix_matcher_clsact]",
>                 "prog_id": 111726,
>                 "tag": "99a197826974c876"
>             },{
>                 "ifindex": 2,
>                 "kind": "qdisc_clsact_egress",
>                 "name": "cls_fg_dscp",
>                 "prog_id": 108619,
>                 "tag": "dc4630674fd72dcc",
>                 "act": []
>             },{
>                 "ifindex": 2,
>                 "kind": "qdisc_clsact_egress",
>                 "name": "fbflow_egress",
>                 "prog_id": 130245,
>                 "tag": "72d2d830d6888d2c"
>             }
>         ]
>     }
>   ]
> 
> Patch #1 synced kernel uapi header if_link.h to tools directory.
> Patch #2 moved tools/lib/bpf/bpf.c netlink related functions to
> a new file. Patch #3 implemented additional functions
> in libbpf which will be used in Patch #4.
> Patch #4 implemented bpftool net support to dump
> xdp and tc bpf program attachments.

Applied, Thanks

* Re: [PATCH net] ip: frags: fix crash in ip_do_fragment()
From: Eric Dumazet @ 2018-09-06 18:06 UTC
  To: ap420073
  Cc: David Miller, Peter Oskolkov, netdev, Pablo Neira Ayuso,
	Florian Westphal
In-Reply-To: <20180906175053.1906-1-ap420073@gmail.com>

On Thu, Sep 6, 2018 at 10:51 AM Taehee Yoo <ap420073@gmail.com> wrote:
>
> A kernel crash occurs when a defragmented packet is fragmented
> in ip_do_fragment().
> In the defragment routine, skb_orphan() is called and
> skb->ip_defrag_offset is set. But skb->sk and
> skb->ip_defrag_offset are members of the same union, so
> frag->sk is not NULL.
> Hence a crash occurs in the skb->sk check in ip_do_fragment() when
> a defragmented packet is fragmented.

Have you tested this patch?

Moving ip_defrag_offset back conflicts with the rbnode!

A more correct fix would be to properly clear skb->sk at reassembly.

* Re: [PATCH bpf] selftests/bpf: add missing executables to .gitignore
From: Alexei Starovoitov @ 2018-09-06 17:57 UTC
  To: Mauricio Vasquez B
  Cc: Alexei Starovoitov, Daniel Borkmann, Shuah Khan, netdev,
	linux-kernel, linux-kselftest
In-Reply-To: <1535990727-2778-1-git-send-email-mauricio.vasquez@polito.it>

On Mon, Sep 03, 2018 at 06:05:27PM +0200, Mauricio Vasquez B wrote:
> Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>
> ---
>  tools/testing/selftests/bpf/.gitignore | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
> index 49938d72cf63..4d789c1e5167 100644
> --- a/tools/testing/selftests/bpf/.gitignore
> +++ b/tools/testing/selftests/bpf/.gitignore
> @@ -19,3 +19,7 @@ test_btf
>  test_sockmap
>  test_lirc_mode2_user
>  get_cgroup_id_user
> +test_skb_cgroup_id_user
> +test_socket_cookie
> +test_cgroup_storage
> +test_select_reuseport

Applied, Thanks

* Re: [bpf-next V2 PATCH 0/3] XDP micro optimizations for redirect
From: Alexei Starovoitov @ 2018-09-06 17:55 UTC
  To: Jesper Dangaard Brouer; +Cc: netdev, Daniel Borkmann
In-Reply-To: <153596124879.4754.16274555878412007253.stgit@firesoul>

On Mon, Sep 03, 2018 at 09:54:52AM +0200, Jesper Dangaard Brouer wrote:
> This patchset contains XDP micro optimizations for the redirect core.
> These are not functional changes.  The optimizations revolve around
> getting the compiler to lay out the code in a way that reflects how XDP
> redirect is used.
> 
> Today the compiler chooses to inline and uninline (static C functions)
> in a suboptimal way, compared to how XDP redirect can be used. Perf
> top clearly shows that almost everything gets inlined into the
> function xdp_do_redirect.
> 
> The way the compiler chooses to inline does not reflect how XDP
> redirect is used, as the compiler cannot know this.

Applied, Thanks

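The cover letter does not quote the individual patches, so the following is only a generic sketch of the technique it describes: steering GCC's inlining decisions with function attributes so that the hot path stays compact and the rarely taken path is kept out of line. The function names are hypothetical and are not the ones touched by the series.

```c
#include <assert.h>
#include <errno.h>

/* Rarely taken error path: keep it out of line so it does not bloat the caller. */
static __attribute__((noinline, cold)) int handle_redirect_error(int err)
{
	return -err;
}

/* Hot helper: force inlining into the fast path. */
static inline __attribute__((always_inline)) int map_lookup(const int *map, int size, int idx)
{
	return (idx >= 0 && idx < size) ? map[idx] : -1;
}

/* Hypothetical redirect entry point: fast path inlined, slow path outlined. */
static int do_redirect(const int *map, int size, int idx)
{
	int target = map_lookup(map, size, idx);

	/* Tell the compiler the failure branch is unlikely. */
	if (__builtin_expect(target < 0, 0))
		return handle_redirect_error(EINVAL);
	return target;
}
```

The design choice is the one the cover letter argues for: the compiler cannot know which paths are hot in real XDP redirect usage, so the source states it explicitly.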
* [PATCH net] ip: frags: fix crash in ip_do_fragment()
From: Taehee Yoo @ 2018-09-06 17:50 UTC
  To: davem, posk, netdev; +Cc: ap420073, pablo, fw, edumazet

A kernel crash occurs when a defragmented packet is fragmented
in ip_do_fragment().
In the defragment routine, skb_orphan() is called and
skb->ip_defrag_offset is set. But skb->sk and
skb->ip_defrag_offset are members of the same union, so
frag->sk is not NULL.
Hence a crash occurs in the skb->sk check in ip_do_fragment() when
a defragmented packet is fragmented.

test commands:
   %iptables -t nat -I POSTROUTING -j MASQUERADE
   %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000

splat looks like:
[  261.069429] kernel BUG at net/ipv4/ip_output.c:636!
[  261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[  261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3
[  261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600
[  261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff <0f> 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c
[  261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202
[  261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004
[  261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8
[  261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395
[  261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4
[  261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000
[  261.174169] FS:  00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
[  261.183012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0
[  261.198158] Call Trace:
[  261.199018]  ? dst_output+0x180/0x180
[  261.205011]  ? save_trace+0x300/0x300
[  261.209018]  ? ip_copy_metadata+0xb00/0xb00
[  261.213034]  ? sched_clock_local+0xd4/0x140
[  261.218158]  ? kill_l4proto+0x120/0x120 [nf_conntrack]
[  261.223014]  ? rt_cpu_seq_stop+0x10/0x10
[  261.227014]  ? find_held_lock+0x39/0x1c0
[  261.233008]  ip_finish_output+0x51d/0xb50
[  261.237006]  ? ip_fragment.constprop.56+0x220/0x220
[  261.243011]  ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack]
[  261.250152]  ? rcu_is_watching+0x77/0x120
[  261.255010]  ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4]
[  261.261033]  ? nf_hook_slow+0xb1/0x160
[  261.265007]  ip_output+0x1c7/0x710
[  261.269005]  ? ip_mc_output+0x13f0/0x13f0
[  261.273002]  ? __local_bh_enable_ip+0xe9/0x1b0
[  261.278152]  ? ip_fragment.constprop.56+0x220/0x220
[  261.282996]  ? nf_hook_slow+0xb1/0x160
[  261.287007]  raw_sendmsg+0x21f9/0x4420
[  261.291008]  ? dst_output+0x180/0x180
[  261.297003]  ? sched_clock_cpu+0x126/0x170
[  261.301003]  ? find_held_lock+0x39/0x1c0
[  261.306155]  ? stop_critical_timings+0x420/0x420
[  261.311004]  ? check_flags.part.36+0x450/0x450
[  261.315005]  ? _raw_spin_unlock_irq+0x29/0x40
[  261.320995]  ? _raw_spin_unlock_irq+0x29/0x40
[  261.326142]  ? cyc2ns_read_end+0x10/0x10
[  261.330139]  ? raw_bind+0x280/0x280
[  261.334138]  ? sched_clock_cpu+0x126/0x170
[  261.338995]  ? check_flags.part.36+0x450/0x450
[  261.342991]  ? __lock_acquire+0x4500/0x4500
[  261.348994]  ? inet_sendmsg+0x11c/0x500
[  261.352989]  ? dst_output+0x180/0x180
[  261.357012]  inet_sendmsg+0x11c/0x500
[ ... ]

Fixes: fa0f527358bd ("ip: use rb trees for IP frag queue.")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
---
 include/linux/skbuff.h | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 17a13e4785fc..2eb115be5bf6 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -676,16 +676,13 @@ struct sk_buff {
 				 * UDP receive path is one user.
 				 */
 				unsigned long		dev_scratch;
+				int			ip_defrag_offset;
 			};
 		};
 		struct rb_node		rbnode; /* used in netem, ip4 defrag, and tcp stack */
 		struct list_head	list;
 	};
-
-	union {
-		struct sock		*sk;
-		int			ip_defrag_offset;
-	};
+	struct sock		*sk;
 
 	union {
 		ktime_t		tstamp;
-- 
2.17.1

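To make the aliasing described in this patch concrete, here is a hypothetical userspace model (struct fake_skb, defrag_enqueue() and check_fragment() are illustrations, not kernel APIs). It shows that both names live at the same offset, and that a fragment queued at a non-zero offset trips a check modelled on the skb->sk test in ip_do_fragment(), while a fragment queued at offset 0 does not, because in practice writing 0 leaves the pointer bytes zero.

```c
#include <assert.h>
#include <stddef.h>

struct fake_sock;

/* Hypothetical model of the union the patch description refers to. */
struct fake_skb {
	union {
		struct fake_sock *sk;
		int ip_defrag_offset;
	};
};

/* Both names refer to the same storage. */
static int members_alias(void)
{
	return offsetof(struct fake_skb, sk) ==
	       offsetof(struct fake_skb, ip_defrag_offset);
}

/* Models the sanity check in ip_do_fragment(): a fragment of an outgoing
 * packet must not already own a socket. Returns -1 where the kernel BUGs. */
static int check_fragment(const struct fake_skb *frag)
{
	return frag->sk != NULL ? -1 : 0;
}

/* What the defrag path does to each queued fragment. */
static void defrag_enqueue(struct fake_skb *frag, int offset)
{
	frag->sk = NULL;                 /* skb_orphan() */
	frag->ip_defrag_offset = offset; /* overwrites the bytes of sk */
}
```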
* Re: [PATCH] [RFC v2] Drop all 00-INDEX files from Documentation/
From: Daniel Vetter @ 2018-09-06 21:39 UTC
  To: Steven Rostedt
  Cc: Mark Rutland, Linux MIPS Mailing List,
	Linux Fbdev development list, Jan Kandziora,
	Radim Krčmář, kvm, Linux Doc Mailing List,
	Peter Zijlstra, James Hogan, Mark Brown, Henrik Austad,
	Will Deacon, dri-devel, Masahiro Yamada, devicetree,
	Paul Mackerras, Henrik Austad, Pavel Machek, H. Peter Anvin,
	Evgeniy Polyakov, Ian Kent, linux-s390, Paul Moore
In-Reply-To: <20180906120120.3dd1fc91@gandalf.local.home>

On Thu, Sep 6, 2018 at 6:01 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Thu, 6 Sep 2018 09:58:04 -0600
> Jonathan Corbet <corbet@lwn.net> wrote:
>
>> Thanks,
>>
>> jon  (who is increasingly inclined to apply this patch)
>
> As Colin Kaepernick now says... "Just do it!"
>
> ;-)

+1

But I'm biased, I'm part of the party that is responsible for the new
shiny documentation system ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

* Re: [RFC/PATCH] net: nixge: Add PHYLINK support
From: Moritz Fischer @ 2018-09-06 16:36 UTC
  To: Andrew Lunn
  Cc: Florian Fainelli, netdev, David S. Miller, Alex Williams,
	Linux Kernel Mailing List
In-Reply-To: <20180905123101.GA26739@lunn.ch>

Andrew,

On Wed, Sep 5, 2018 at 5:31 AM, Andrew Lunn <andrew@lunn.ch> wrote:
>> Let me check, it seems there is a register that indicates whether the MAC can
>> do either 1G or 10G. I might be able to use that for some of the above, but
>> there is not really much in terms of writable registers there.
>
> Can the MAC do 10 or 100? At the moment, you don't have anything
> stopping the PHY from auto-negotiating 10Half. If the MAC does not fully
> implement standard Ethernet, you need to tell the PHY driver about
> this. That is what the validate call is about. phylink and phylib
> know what the PHY supports. It passes that list to the validate
> call. You need to then remove all the modes the MAC does not support.

Makes sense, thanks for clarifying. I'll do some more research on this.
>
>> It's like a DMA engine with a bit of MDIO on the side. Let me see if
>> I can make it look less weird with that. If not I'll go with a
>> comment explaining that there isn't much to do for the MLO_AN_PHY
>> case and the MLO_FIXED cases?
>
> You again need to configure the MAC to the selected speed, duplex,
> etc. If the link is down, you want to disable the MAC. You need this
> for both MLO_AN_PHY and MLO_FIXED, because both specify speeds,
> duplex, etc.

I'll look into it.

Moritz

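A minimal sketch of the two duties Andrew describes, assuming a MAC that can only do 1G and 10G (as the register Moritz mentions might indicate). The mode bits, struct fake_mac and both functions are hypothetical; a real driver does this in its phylink MAC operations using ethtool link-mode bitmaps rather than a plain integer mask.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical link-mode bits; real code uses ethtool link-mode bitmaps. */
enum {
	MODE_10_HALF   = 1u << 0,
	MODE_10_FULL   = 1u << 1,
	MODE_100_FULL  = 1u << 2,
	MODE_1000_FULL = 1u << 3,
	MODE_10G_FULL  = 1u << 4,
};

/* validate: start from what the PHY supports, drop what the MAC can't do. */
static unsigned int validate_modes(unsigned int phy_supported, unsigned int mac_caps)
{
	return phy_supported & mac_caps;
}

struct fake_mac {
	bool enabled;
	int speed;          /* Mb/s */
	bool full_duplex;
};

/* Applies to PHY-negotiated and fixed links alike: program speed/duplex
 * when the link is up, disable the MAC when it is down. */
static void mac_link_change(struct fake_mac *mac, bool link_up, int speed, bool full_duplex)
{
	if (!link_up) {
		mac->enabled = false;
		return;
	}
	mac->speed = speed;
	mac->full_duplex = full_duplex;
	mac->enabled = true;
}
```

With a PHY advertising 10Half through 1000Full and a MAC limited to 1G/10G, only 1000Full survives validation, so the PHY can no longer negotiate 10Half.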
* Re: [PATCH mlx5-next v1 05/15] net/mlx5: Break encap/decap into two separated flow table creation flags
From: Or Gerlitz @ 2018-09-06 16:13 UTC
  To: Leon Romanovsky
  Cc: Jason Gunthorpe, Doug Ledford, RDMA mailing list, Ariel Levkovich,
	Mark Bloch, Or Gerlitz, Saeed Mahameed, linux-netdev
In-Reply-To: <20180906040950.GX2977@mtr-leonro.mtl.com>

On Thu, Sep 6, 2018 at 7:09 AM, Leon Romanovsky <leonro@mellanox.com> wrote:
> On Thu, Sep 06, 2018 at 12:37:17AM +0300, Or Gerlitz wrote:
>> On Wed, Sep 5, 2018 at 9:11 PM, Leon Romanovsky  wrote:
>> > On Wed, Sep 05, 2018 at 10:38:00AM -0600, Jason Gunthorpe wrote:
>> >> On Wed, Sep 05, 2018 at 08:10:25AM +0300, Leon Romanovsky wrote:
>> >> > > > -       int en_encap_decap = !!(flags & MLX5_FLOW_TABLE_TUNNEL_EN);
>> >> > > > +       int en_encap = !!(flags & MLX5_FLOW_TABLE_TUNNEL_EN_ENCAP);
>> >> > > > +       int en_decap = !!(flags & MLX5_FLOW_TABLE_TUNNEL_EN_DECAP);
>> >> > >
>> >> > > Yuk, please don't use !!.
>> >> > >
>> >> > >   bool en_decap = flags & MLX5_FLOW_TABLE_TUNNEL_EN_DECAP;
>> >> >
>> >> > We need to provide en_encap and en_decap as an input to MLX5_SET(...)
>> >> > which is passed to FW as 0 or 1.
>> >> >
>> >> > Boolean type is declared in C as int and treated as zero for false
>> >> > and any other value for true,
>> >>
>> >> No, that isn't right, the kernel uses C99's _Bool intrinsic type, which
>> >> is guaranteed to only hold 0 or 1 by the compiler.
>> >>
>> >> See types.h:
>> >>
>> >> typedef _Bool                   bool;
>> >
>> > Exciting, it took me a while to find the C99 standard and the relevant section, 6.3.1.2.
>> > Anyway, this patch didn't change previous functionality, which used "!!"
>> > convention.
>>
>> so? if we didn't do things properly prior to the patch, why not fix it along
>> with the patch? let's fix
>
> Or,
>
> What exactly "to fix"? Both code lines:
> 1. Have correct syntax
> 2. Implement proper C99
> 3. Generate the same compiled code
> 4. Have the same readability
>
> There is nothing to fix.
>
> And this patch is already merged, so if you truly care about this,
> please go ahead and prepare patch for whole driver, or better for
> whole kernel.

slow down, I was just supporting Jason's suggestion and said there's
no reason not to follow it.

If you don't agree with Jason, argue with him.

>  kernel git:(rdma-next) git grep "\!\!" |wc -l
> 8125

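Jason's point (C99 6.3.1.2: converting any scalar to _Bool yields 0 if the value compares equal to 0 and 1 otherwise) can be checked directly. The flag macros below are hypothetical stand-ins for the MLX5_FLOW_TABLE_TUNNEL_EN_* flags; the sketch only compares the three ways of turning a flag test into a value.

```c
#include <assert.h>
#include <stdbool.h>

#define TUNNEL_EN_ENCAP 0x1u  /* hypothetical flag values */
#define TUNNEL_EN_DECAP 0x2u

/* Plain int keeps the masked bit value: 0 or 2. */
static int as_int(unsigned int flags)
{
	return flags & TUNNEL_EN_DECAP;
}

/* Double negation normalises to 0 or 1. */
static int as_bang_bang(unsigned int flags)
{
	return !!(flags & TUNNEL_EN_DECAP);
}

/* Conversion to _Bool also normalises to 0 or 1. */
static bool as_bool(unsigned int flags)
{
	return flags & TUNNEL_EN_DECAP;
}
```

Either of the last two forms is safe to hand to something that expects exactly 0 or 1; only the plain int form is not.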
* [PATCH net-next, net v2] net/tls: Set count of SG entries if sk_alloc_sg returns -ENOSPC
From: Vakul Garg @ 2018-09-06 16:11 UTC
  To: netdev; +Cc: borisp, aviadye, davejwatson, davem, doronrk, Vakul Garg

tls_sw_sendmsg() allocates plaintext and encrypted SG entries using
sk_alloc_sg(). When the number of SG entries hits MAX_SKB_FRAGS,
sk_alloc_sg() returns -ENOSPC and sets the variable for the current
SG index to '0'. This leads to tls_push_record() being called with
'sg_encrypted_num_elem = 0', which later causes a kernel crash. To fix
this, set the number of SG elements to the number of elements in the
plaintext/encrypted SG arrays when sk_alloc_sg() returns -ENOSPC.

Fixes: 3c4d7559159b ("tls: kernel TLS support")
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
---
Changes since v1:
	- Added 'Fixes:' tag.
	- Marking that patch applies to both 'net-next' & 'net' branches
	- Resending after correcting system time.
	- No code changes as such

 net/tls/tls_sw.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index be4f2e990f9f..2dad3dc7be60 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -263,6 +263,9 @@ static int alloc_encrypted_sg(struct sock *sk, int len)
 			 &ctx->sg_encrypted_num_elem,
 			 &ctx->sg_encrypted_size, 0);
 
+	if (rc == -ENOSPC)
+		ctx->sg_encrypted_num_elem = ARRAY_SIZE(ctx->sg_encrypted_data);
+
 	return rc;
 }
 
@@ -276,6 +279,9 @@ static int alloc_plaintext_sg(struct sock *sk, int len)
 			 &ctx->sg_plaintext_num_elem, &ctx->sg_plaintext_size,
 			 tls_ctx->pending_open_record_frags);
 
+	if (rc == -ENOSPC)
+		ctx->sg_plaintext_num_elem = ARRAY_SIZE(ctx->sg_plaintext_data);
+
 	return rc;
 }
 
-- 
2.13.6

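The failure mode and fix described in this patch can be modelled in userspace. This is a hypothetical sketch: fake_alloc_sg() only reproduces the behaviour the commit message describes (report -ENOSPC and leave the index at 0 once the array is full), not the real sk_alloc_sg(), and MAX_FRAGS stands in for MAX_SKB_FRAGS.

```c
#include <assert.h>
#include <errno.h>

#define MAX_FRAGS 4  /* hypothetical stand-in for MAX_SKB_FRAGS */

struct fake_ctx {
	int sg[MAX_FRAGS];
	int num_elem;
};

/* Fill entries until 'need' are allocated; if the array fills first,
 * report -ENOSPC and leave the element count at 0, as described above. */
static int fake_alloc_sg(struct fake_ctx *ctx, int need)
{
	int i = ctx->num_elem;

	while (need > 0) {
		if (i == MAX_FRAGS) {
			ctx->num_elem = 0;
			return -ENOSPC;
		}
		ctx->sg[i++] = 1;
		need--;
	}
	ctx->num_elem = i;
	return 0;
}

/* Caller with the fix applied: -ENOSPC means the array is full, so say so. */
static int alloc_with_fix(struct fake_ctx *ctx, int need)
{
	int rc = fake_alloc_sg(ctx, need);

	if (rc == -ENOSPC)
		ctx->num_elem = (int)(sizeof(ctx->sg) / sizeof(ctx->sg[0]));
	return rc;
}
```

Without the fix, a later push would see zero elements even though every slot is populated; with it, the count matches the full array.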
* [PATCH net-next] net: sched: cls_flower: dump offload count value
From: Vlad Buslov @ 2018-09-06 15:37 UTC
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, Vlad Buslov

Change flower in_hw_count type to fixed-size u32 and dump it as
TCA_FLOWER_IN_HW_COUNT. This change is necessary to properly test shared
blocks and re-offload functionality.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h    | 2 +-
 include/uapi/linux/pkt_cls.h | 2 ++
 net/sched/cls_flower.c       | 5 ++++-
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a6d00093f35e..d68ac55539a5 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -362,7 +362,7 @@ static inline void tcf_block_offload_dec(struct tcf_block *block, u32 *flags)
 }
 
 static inline void
-tc_cls_offload_cnt_update(struct tcf_block *block, unsigned int *cnt,
+tc_cls_offload_cnt_update(struct tcf_block *block, u32 *cnt,
 			  u32 *flags, bool add)
 {
 	if (add) {
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index be382fb0592d..2824fb7ed1c9 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -483,6 +483,8 @@ enum {
 	TCA_FLOWER_KEY_ENC_OPTS,
 	TCA_FLOWER_KEY_ENC_OPTS_MASK,
 
+	TCA_FLOWER_IN_HW_COUNT,		/* be32 */
+
 	__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 6fd9bdd93796..4b8dd37dd4f8 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -98,7 +98,7 @@ struct cls_fl_filter {
 	struct list_head list;
 	u32 handle;
 	u32 flags;
-	unsigned int in_hw_count;
+	u32 in_hw_count;
 	struct rcu_work rwork;
 	struct net_device *hw_dev;
 };
@@ -1880,6 +1880,9 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, void *fh,
 	if (f->flags && nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags))
 		goto nla_put_failure;
 
+	if (nla_put_u32(skb, TCA_FLOWER_IN_HW_COUNT, f->in_hw_count))
+		goto nla_put_failure;
+
 	if (tcf_exts_dump(skb, &f->exts))
 		goto nla_put_failure;
 
-- 
2.7.5

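For readers unfamiliar with how fl_dump() output reaches userspace: each nla_put_u32() call appends a type-length-value attribute to the message. The sketch below encodes one u32 attribute into a byte buffer using the same header layout as struct nlattr (16-bit length, 16-bit type, payload padded to 4 bytes). put_u32_attr() and the attribute type number used in the usage example are hypothetical; real kernel code uses the nla_* helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct nlattr: 16-bit length, 16-bit type. */
struct attr_hdr {
	uint16_t len;   /* header + payload, excluding padding */
	uint16_t type;
};

#define ATTR_ALIGN(n) (((n) + 3u) & ~3u)  /* attributes are 4-byte aligned */

/* Hypothetical helper: append a u32 attribute at buf + off in native byte
 * order (as nla_put_u32() does). Returns the new offset, or 0 if it won't fit. */
static size_t put_u32_attr(uint8_t *buf, size_t cap, size_t off,
			   uint16_t type, uint32_t val)
{
	struct attr_hdr hdr = {
		.len = sizeof(struct attr_hdr) + sizeof(uint32_t),
		.type = type,
	};
	size_t total = ATTR_ALIGN(hdr.len);

	if (off + total > cap)
		return 0;
	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + sizeof(hdr), &val, sizeof(val));
	return off + total;
}
```

A u32 attribute always occupies 8 bytes: a 4-byte header followed by the 4-byte value, with no padding needed.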
* [PATCH iproute2 3/3] bridge: fix vlan show formatting
From: Stephen Hemminger @ 2018-09-06 15:30 UTC
  To: netdev; +Cc: Stephen Hemminger
In-Reply-To: <20180906153057.5379-1-stephen@networkplumber.org>

The output of vlan show was broken by the previous change to use json_print.
Clean up the code and return to the original format.

Note: the JSON syntax has changed to make the bridge vlan
show more like other outputs (e.g. ip -j li show).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 bridge/br_common.h |  2 +-
 bridge/link.c      |  6 ++---
 bridge/vlan.c      | 61 +++++++++++++++++++++++++++-------------------
 3 files changed, 40 insertions(+), 29 deletions(-)

diff --git a/bridge/br_common.h b/bridge/br_common.h
index 2f1cb8fd9f3d..69665fde32b6 100644
--- a/bridge/br_common.h
+++ b/bridge/br_common.h
@@ -6,7 +6,7 @@
 #define MDB_RTR_RTA(r) \
 		((struct rtattr *)(((char *)(r)) + RTA_ALIGN(sizeof(__u32))))
 
-extern void print_vlan_info(FILE *fp, struct rtattr *tb);
+extern void print_vlan_info(struct rtattr *tb, int ifindex);
 extern int print_linkinfo(const struct sockaddr_nl *who,
 			  struct nlmsghdr *n,
 			  void *arg);
diff --git a/bridge/link.c b/bridge/link.c
index 8d89aca2e638..a5ee9a5c58e6 100644
--- a/bridge/link.c
+++ b/bridge/link.c
@@ -161,7 +161,7 @@ static void print_protinfo(FILE *fp, struct rtattr *attr)
  * This is reported by HW devices that have some bridging
  * capabilities.
  */
-static void print_af_spec(FILE *fp, struct rtattr *attr)
+static void print_af_spec(struct rtattr *attr, int ifindex)
 {
 	struct rtattr *aftb[IFLA_BRIDGE_MAX+1];
 
@@ -174,7 +174,7 @@ static void print_af_spec(FILE *fp, struct rtattr *attr)
 		return;
 
 	if (aftb[IFLA_BRIDGE_VLAN_INFO])
-		print_vlan_info(fp, aftb[IFLA_BRIDGE_VLAN_INFO]);
+		print_vlan_info(aftb[IFLA_BRIDGE_VLAN_INFO], ifindex);
 }
 
 int print_linkinfo(const struct sockaddr_nl *who,
@@ -229,7 +229,7 @@ int print_linkinfo(const struct sockaddr_nl *who,
 		print_protinfo(fp, tb[IFLA_PROTINFO]);
 
 	if (tb[IFLA_AF_SPEC])
-		print_af_spec(fp, tb[IFLA_AF_SPEC]);
+		print_af_spec(tb[IFLA_AF_SPEC], ifi->ifi_index);
 
 	print_string(PRINT_FP, NULL, "%s", "\n");
 	close_json_object();
diff --git a/bridge/vlan.c b/bridge/vlan.c
index 19a36b804069..bdce55ae4e14 100644
--- a/bridge/vlan.c
+++ b/bridge/vlan.c
@@ -252,10 +252,18 @@ static int filter_vlan_check(__u16 vid, __u16 flags)
 	return 1;
 }
 
-static void print_vlan_port(FILE *fp, int ifi_index)
+static void open_vlan_port(int ifi_index)
 {
-	print_string(PRINT_ANY, NULL, "%s",
+	open_json_object(NULL);
+	print_string(PRINT_ANY, "ifname", "%s",
 		     ll_index_to_name(ifi_index));
+	open_json_array(PRINT_JSON, "vlans");
+}
+
+static void close_vlan_port(void)
+{
+	close_json_array(PRINT_JSON, NULL);
+	close_json_object();
 }
 
 static void print_range(const char *name, __u16 start, __u16 id)
@@ -278,7 +286,7 @@ static void print_vlan_tunnel_info(FILE *fp, struct rtattr *tb, int ifindex)
 	__u32 last_tunid_start = 0;
 
 	if (!filter_vlan)
-		print_vlan_port(fp, ifindex);
+		open_vlan_port(ifindex);
 
 	open_json_array(PRINT_JSON, "tunnel");
 	for (i = RTA_DATA(list); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
@@ -323,18 +331,20 @@ static void print_vlan_tunnel_info(FILE *fp, struct rtattr *tb, int ifindex)
 			continue;
 
 		if (filter_vlan)
-			print_vlan_port(fp, ifindex);
+			open_vlan_port(ifindex);
 
 		open_json_object(NULL);
 		print_range("vlan", last_vid_start, tunnel_vid);
 		print_range("tunid", last_tunid_start, tunnel_id);
 		close_json_object();
 
-		if (!is_json_context())
-			fprintf(fp, "\n");
-
+		print_string(PRINT_FP, NULL, "%s", _SL_);
+		if (filter_vlan)
+			close_vlan_port();
 	}
-	close_json_array(PRINT_JSON, NULL);
+
+	if (!filter_vlan)
+		close_vlan_port();
 }
 
 static int print_vlan_tunnel(const struct sockaddr_nl *who,
@@ -421,8 +431,8 @@ static int print_vlan(const struct sockaddr_nl *who,
 		return 0;
 	}
 
-	print_vlan_port(fp, ifm->ifi_index);
-	print_vlan_info(fp, tb[IFLA_AF_SPEC]);
+	print_vlan_info(tb[IFLA_AF_SPEC], ifm->ifi_index);
+	print_string(PRINT_FP, NULL, "%s", _SL_);
 
 	fflush(fp);
 	return 0;
@@ -430,11 +440,16 @@ static int print_vlan(const struct sockaddr_nl *who,
 
 static void print_vlan_flags(__u16 flags)
 {
+	if (flags == 0)
+		return;
+
+	open_json_array(PRINT_JSON, "flags");
 	if (flags & BRIDGE_VLAN_INFO_PVID)
-		print_null(PRINT_ANY, "pvid", " %s", "PVID");
+		print_string(PRINT_ANY, NULL, " %s", "PVID");
 
 	if (flags & BRIDGE_VLAN_INFO_UNTAGGED)
-		print_null(PRINT_ANY, "untagged", " %s", "untagged");
+		print_string(PRINT_ANY, NULL, " %s", "Egress Untagged");
+	close_json_array(PRINT_JSON, NULL);
 }
 
 static void print_one_vlan_stats(const struct bridge_vlan_xstats *vstats)
@@ -461,6 +476,7 @@ static void print_vlan_stats_attr(struct rtattr *attr, int ifindex)
 {
 	struct rtattr *brtb[LINK_XSTATS_TYPE_MAX+1];
 	struct rtattr *i, *list;
+	const char *ifname;
 	int rem;
 
 	parse_rtattr(brtb, LINK_XSTATS_TYPE_MAX, RTA_DATA(attr),
@@ -471,13 +487,12 @@ static void print_vlan_stats_attr(struct rtattr *attr, int ifindex)
 	list = brtb[LINK_XSTATS_TYPE_BRIDGE];
 	rem = RTA_PAYLOAD(list);
 
-	open_json_object(NULL);
+	ifname = ll_index_to_name(ifindex);
+	open_json_object(ifname);
 
-	print_color_string(PRINT_ANY, COLOR_IFNAME,
-			   "dev", "%-16s",
-			   ll_index_to_name(ifindex));
+	print_color_string(PRINT_FP, COLOR_IFNAME,
+			   NULL, "%-16s", ifname);
 
-	open_json_array(PRINT_JSON, "xstats");
 	for (i = RTA_DATA(list); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
 		const struct bridge_vlan_xstats *vstats = RTA_DATA(i);
 
@@ -494,7 +509,6 @@ static void print_vlan_stats_attr(struct rtattr *attr, int ifindex)
 
 		print_one_vlan_stats(vstats);
 	}
-	close_json_array(PRINT_ANY, "\n");
 	close_json_object();
 
 }
@@ -623,16 +637,13 @@ static int vlan_show(int argc, char **argv)
 	return 0;
 }
 
-void print_vlan_info(FILE *fp, struct rtattr *tb)
+void print_vlan_info(struct rtattr *tb, int ifindex)
 {
 	struct rtattr *i, *list = tb;
 	int rem = RTA_PAYLOAD(list);
 	__u16 last_vid_start = 0;
 
-	if (!is_json_context())
-		fprintf(fp, "%s", _SL_);
-
-	open_json_array(PRINT_JSON, "vlan");
+	open_vlan_port(ifindex);
 
 	for (i = RTA_DATA(list); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
 		struct bridge_vlan_info *vinfo;
@@ -656,9 +667,9 @@ void print_vlan_info(FILE *fp, struct rtattr *tb)
 
 		print_vlan_flags(vinfo->flags);
 		close_json_object();
+		print_string(PRINT_FP, NULL, "%s", _SL_);
 	}
-
-	close_json_array(PRINT_ANY, "\n");
+	close_vlan_port();
 }
 
 int do_vlan(int argc, char **argv)
-- 
2.17.1

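Patch 3/3 changes the JSON from flat output to one object per port with an "ifname" key and a "vlans" array, where each entry may carry an optional "flags" array. As a rough, self-contained illustration of that shape (not iproute2's json_print API), the sketch below formats one port by hand. The "vlan" key for each entry is an assumption based on print_range("vlan", ...) in the tunnel path, and format_port(), append() and the FLAG_* values are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FLAG_PVID     0x1  /* hypothetical stand-ins for BRIDGE_VLAN_INFO_* */
#define FLAG_UNTAGGED 0x2

/* Append s to out; returns -1 if it does not fit. */
static int append(char *out, size_t cap, size_t *used, const char *s)
{
	size_t len = strlen(s);

	if (*used + len + 1 > cap)
		return -1;
	memcpy(out + *used, s, len + 1);
	*used += len;
	return 0;
}

/* Emit {"ifname":...,"vlans":[{...},...]}; returns length or -1. */
static int format_port(char *out, size_t cap, const char *ifname,
		       const int *vids, const int *flags, int n)
{
	char tmp[64];
	size_t used = 0;

	snprintf(tmp, sizeof(tmp), "{\"ifname\":\"%s\",\"vlans\":[", ifname);
	if (append(out, cap, &used, tmp))
		return -1;
	for (int i = 0; i < n; i++) {
		snprintf(tmp, sizeof(tmp), "%s{\"vlan\":%d", i ? "," : "", vids[i]);
		if (append(out, cap, &used, tmp))
			return -1;
		/* Like print_vlan_flags(): omit "flags" entirely when none are set. */
		if (flags[i]) {
			snprintf(tmp, sizeof(tmp), ",\"flags\":[%s%s%s]",
				 (flags[i] & FLAG_PVID) ? "\"PVID\"" : "",
				 ((flags[i] & FLAG_PVID) && (flags[i] & FLAG_UNTAGGED)) ? "," : "",
				 (flags[i] & FLAG_UNTAGGED) ? "\"Egress Untagged\"" : "");
			if (append(out, cap, &used, tmp))
				return -1;
		}
		if (append(out, cap, &used, "}"))
			return -1;
	}
	if (append(out, cap, &used, "]}"))
		return -1;
	return (int)used;
}
```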
* [PATCH iproute2 2/3] bridge: use print_json for some outputs
From: Stephen Hemminger @ 2018-09-06 15:30 UTC
  To: netdev; +Cc: Stephen Hemminger
In-Reply-To: <20180906153057.5379-1-stephen@networkplumber.org>

Rather than using is_json_context(), use the print_string functions
which handle both cases.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 bridge/mdb.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/bridge/mdb.c b/bridge/mdb.c
index 9bdef0262c54..cc1b4547865c 100644
--- a/bridge/mdb.c
+++ b/bridge/mdb.c
@@ -131,15 +131,8 @@ static void print_mdb_entry(FILE *f, int ifindex, const struct br_mdb_entry *e,
 	if (n->nlmsg_type == RTM_DELMDB)
 		print_bool(PRINT_ANY, "deleted", "Deleted ", true);
 
-
-	if (is_json_context()) {
-		print_int(PRINT_JSON, "index", NULL, ifindex);
-		print_string(PRINT_JSON, "dev", NULL, dev);
-	} else {
-		fprintf(f, "%u: ", ifindex);
-		color_fprintf(f, COLOR_IFNAME, "%s ", dev);
-	}
-
+	print_int(PRINT_ANY, "index", "%u: ", ifindex);
+	print_color_string(PRINT_ANY, COLOR_IFNAME, "dev", "%s ", dev);
 	print_string(PRINT_ANY, "port", " %s ",
 		     ll_index_to_name(e->ifindex));
 
-- 
2.17.1

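The reason a PRINT_ANY call can replace the is_json_context() branch is that each call carries both a JSON key and a text format, and the choice between them is made at runtime. The following hypothetical sketch shows that dispatch in isolation; print_int_any() and enum out_mode are illustrations, not iproute2's actual json_print implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical output modes mirroring the JSON-vs-text decision. */
enum out_mode { OUT_TEXT, OUT_JSON };

/* One call site, two renderings: a JSON member or human-readable text. */
static int print_int_any(char *out, size_t cap, enum out_mode mode,
			 const char *key, const char *text_fmt, int val)
{
	if (mode == OUT_JSON)
		return snprintf(out, cap, "\"%s\":%d", key, val);
	return snprintf(out, cap, text_fmt, val);
}
```

With this shape the caller no longer needs its own if/else, which is what the patch achieves for the index and dev fields.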
* [PATCH iproute2 1/3] bridge: minor change to mdb print
From: Stephen Hemminger @ 2018-09-06 15:30 UTC
  To: netdev; +Cc: Stephen Hemminger

Get port ifname once rather than on both sides of if(is_json_context).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 bridge/mdb.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/bridge/mdb.c b/bridge/mdb.c
index f38dc67c849a..9bdef0262c54 100644
--- a/bridge/mdb.c
+++ b/bridge/mdb.c
@@ -208,19 +208,19 @@ static void print_router_entries(FILE *fp, struct nlmsghdr *n,
 	} else {
 		struct rtattr *i = RTA_DATA(router);
 		uint32_t *port_ifindex = RTA_DATA(i);
+		const char *port_name = ll_index_to_name(*port_ifindex);
 
 		if (is_json_context()) {
 			open_json_array(PRINT_JSON, brifname);
 			open_json_object(NULL);
 
 			print_string(PRINT_JSON, "port", NULL,
-				     ll_index_to_name(*port_ifindex));
+				     port_name);
 			close_json_object();
 			close_json_array(PRINT_JSON, NULL);
 		} else {
 			fprintf(fp, "router port dev %s master %s\n",
-				ll_index_to_name(*port_ifindex),
-				brifname);
+				port_name, brifname);
 		}
 	}
 	close_json_array(PRINT_JSON, NULL);
-- 
2.17.1

* Re: [PATCH net-next] net: dsa: b53: Fix build with B53_SRAB enabled and not B53_SERDES
From: Andrew Lunn @ 2018-09-06 19:55 UTC
  To: Florian Fainelli; +Cc: netdev, Vivien Didelot, David S. Miller, open list
In-Reply-To: <20180906184245.31442-1-f.fainelli@gmail.com>

On Thu, Sep 06, 2018 at 11:42:45AM -0700, Florian Fainelli wrote:
> In case B53_SRAB is enabled, but not B53_SERDES, we can get the
> following linking error:
> 
> ERROR: "b53_serdes_init" [drivers/net/dsa/b53/b53_srab.ko] undefined!
> 
> We also need to ifdef the body of b53_srab_serdes_map_lane() since it
> would not be used when B53_SERDES is disabled and that would produce a
> warning.
> 
> Fixes: 0e01491de646 ("net: dsa: b53: Add SerDes support")
> Reported-by: kbuild test robot <lkp@intel.com>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

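The link error quoted above is the usual symptom of calling a function that lives in an optional object without providing a fallback. The b53 patch itself is not quoted here, so the following is only a generic illustration of one common pattern, a stub guarded by the config symbol; CONFIG_FAKE_SERDES, fake_serdes_init() and srab_setup_port() are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical config switch; in the kernel this would come from Kconfig. */
/* #define CONFIG_FAKE_SERDES 1 */

#ifdef CONFIG_FAKE_SERDES
int fake_serdes_init(int port);  /* provided by the optional object file */
#else
/* Stub so callers still link when the optional code is not built. */
static inline int fake_serdes_init(int port)
{
	(void)port;
	return -ENODEV;
}
#endif

static int srab_setup_port(int port)
{
	int err = fake_serdes_init(port);

	/* No SerDes support compiled in: carry on without it. */
	return err == -ENODEV ? 0 : err;
}
```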
* Re: KASAN: use-after-free Read in bpf_tcp_close (2)
From: John Fastabend @ 2018-09-06 19:53 UTC
  To: syzbot, ast, daniel, linux-kernel, netdev, syzkaller-bugs
In-Reply-To: <000000000000f746f705752f95a6@google.com>

On 09/06/2018 01:22 AM, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    11f026b4e306 libbpf: Remove the duplicate checking of func..
> git tree:       bpf-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=153c55ca400000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=4c7e83258d6e0156
> dashboard link: https://syzkaller.appspot.com/bug?extid=579adfa56843da894bc5
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+579adfa56843da894bc5@syzkaller.appspotmail.com
> 
> ==================================================================
> BUG: KASAN: use-after-free in atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
> BUG: KASAN: use-after-free in virt_spin_lock arch/x86/include/asm/qspinlock.h:65 [inline]
> BUG: KASAN: use-after-free in native_queued_spin_lock_slowpath+0x189/0x1220 kernel/locking/qspinlock.c:305
> Read of size 4 at addr ffff8801bfd651a0 by task syz-executor1/9753
> 
> CPU: 1 PID: 9753 Comm: syz-executor1 Not tainted 4.19.0-rc2+ #89
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
>  print_address_description+0x6c/0x20b mm/kasan/report.c:256
>  kasan_report_error mm/kasan/report.c:354 [inline]
>  kasan_report.cold.7+0x242/0x30d mm/kasan/report.c:412
>  check_memory_region_inline mm/kasan/kasan.c:260 [inline]
>  check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
>  kasan_check_read+0x11/0x20 mm/kasan/kasan.c:272
>  atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
>  virt_spin_lock arch/x86/include/asm/qspinlock.h:65 [inline]
>  native_queued_spin_lock_slowpath+0x189/0x1220 kernel/locking/qspinlock.c:305
>  pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:679 [inline]
>  queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:32 [inline]
>  queued_spin_lock include/asm-generic/qspinlock.h:88 [inline]
>  do_raw_spin_lock+0x1a7/0x200 kernel/locking/spinlock_debug.c:113
>  __raw_spin_lock_bh include/linux/spinlock_api_smp.h:136 [inline]
>  _raw_spin_lock_bh+0x39/0x40 kernel/locking/spinlock.c:168
>  bpf_tcp_close+0x68e/0x10d0 kernel/bpf/sockmap.c:349
>  inet_release+0x104/0x1f0 net/ipv4/af_inet.c:428
>  inet6_release+0x50/0x70 net/ipv6/af_inet6.c:457
>  __sock_release+0xd7/0x250 net/socket.c:579
>  sock_close+0x19/0x20 net/socket.c:1139
>  __fput+0x38a/0xa40 fs/file_table.c:278
>  ____fput+0x15/0x20 fs/file_table.c:309
>  task_work_run+0x1e8/0x2a0 kernel/task_work.c:113
>  tracehook_notify_resume include/linux/tracehook.h:193 [inline]
>  exit_to_usermode_loop+0x318/0x380 arch/x86/entry/common.c:166
>  prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
>  syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
>  do_syscall_64+0x6be/0x820 arch/x86/entry/common.c:293
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x410c51
> Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 34 19 00 00 c3 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
> RSP: 002b:00007ffeacb37d60 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
> RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000410c51
> RDX: 0000000000000000 RSI: 0000000000731f70 RDI: 0000000000000007
> RBP: 0000000000000000 R08: ffffffffffffffff R09: ffffffffffffffff
> R10: 00007ffeacb37c90 R11: 0000000000000293 R12: 0000000000000010
> R13: 0000000000022aa9 R14: 000000000000004d R15: badc0ffeebadface
> 
> Allocated by task 9754:
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
>  set_track mm/kasan/kasan.c:460 [inline]
>  kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
>  kmem_cache_alloc_trace+0x152/0x730 mm/slab.c:3620
>  kmalloc include/linux/slab.h:513 [inline]
>  kzalloc include/linux/slab.h:707 [inline]
>  sock_map_alloc+0x209/0x430 kernel/bpf/sockmap.c:1653
>  find_and_alloc_map kernel/bpf/syscall.c:129 [inline]
>  map_create+0x3bd/0x1100 kernel/bpf/syscall.c:509
>  __do_sys_bpf kernel/bpf/syscall.c:2356 [inline]
>  __se_sys_bpf kernel/bpf/syscall.c:2333 [inline]
>  __x64_sys_bpf+0x303/0x510 kernel/bpf/syscall.c:2333
>  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> Freed by task 13:
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
>  set_track mm/kasan/kasan.c:460 [inline]
>  __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
>  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
>  __cache_free mm/slab.c:3498 [inline]
>  kfree+0xd9/0x210 mm/slab.c:3813
>  sock_map_remove_complete kernel/bpf/sockmap.c:1561 [inline]
>  sock_map_free+0x428/0x570 kernel/bpf/sockmap.c:1756
>  bpf_map_free_deferred+0xba/0xf0 kernel/bpf/syscall.c:290
>  process_one_work+0xc73/0x1aa0 kernel/workqueue.c:2153
>  worker_thread+0x189/0x13c0 kernel/workqueue.c:2296
>  kthread+0x35a/0x420 kernel/kthread.c:246
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413
> 
> The buggy address belongs to the object at ffff8801bfd65080
>  which belongs to the cache kmalloc-512 of size 512
> The buggy address is located 288 bytes inside of
>  512-byte region [ffff8801bfd65080, ffff8801bfd65280)
> The buggy address belongs to the page:
> page:ffffea0006ff5940 count:1 mapcount:0 mapping:ffff8801dac00940 index:0xffff8801bfd65300
> flags: 0x2fffc0000000100(slab)
> raw: 02fffc0000000100 ffffea000702a488 ffffea00075c4148 ffff8801dac00940
> raw: ffff8801bfd65300 ffff8801bfd65080 0000000100000001 0000000000000000
> page dumped because: kasan: bad access detected
> 
> Memory state around the buggy address:
>  ffff8801bfd65080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ffff8801bfd65100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>> ffff8801bfd65180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                                ^
>  ffff8801bfd65200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ffff8801bfd65280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ==================================================================
> 
> 
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
> 
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.

This was introduced by the commit below, which added a new reference to
the map from the TCP close context; a parallel free of the map does not
wait for an RCU grace period before freeing it:

commit 585f5a6252ee43ec8feeee07387e3fcc7e8bb292
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Thu Aug 16 21:49:10 2018 +0200

    bpf, sockmap: fix sock_map_ctx_update_elem race with exist/noexist

I'll have a fix shortly.

Thanks,
John
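
The race described above is a free of the map while the TCP close path can still reach it. Two common kernel remedies for a use-after-free of this shape are waiting for an RCU grace period before freeing (for example with `synchronize_rcu()` or `call_rcu()`) and reference counting. The sketch below shows only the reference-counting pattern, using C11 atomics in userspace; all `demo_*` names are hypothetical, and this is an illustration of the pattern, not the fix John posted.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical map object; in the report, the freed memory held the lock taken on close. */
struct demo_map {
	atomic_int refcnt;
	int lock_word;
};

struct demo_map *demo_map_alloc(void)
{
	struct demo_map *m = calloc(1, sizeof(*m));

	if (m)
		atomic_init(&m->refcnt, 1); /* reference held by the map's owner */
	return m;
}

/* Take an extra reference before using the map from another context (e.g. socket close). */
void demo_map_get(struct demo_map *m)
{
	atomic_fetch_add_explicit(&m->refcnt, 1, memory_order_relaxed);
}

/* Drop a reference; only whoever drops the last one frees the memory. Returns true if freed. */
bool demo_map_put(struct demo_map *m)
{
	if (atomic_fetch_sub_explicit(&m->refcnt, 1, memory_order_acq_rel) == 1) {
		free(m);
		return true;
	}
	return false;
}
```

Note that the `get` itself must happen while the object is still guaranteed to be alive, which is why kernel code commonly pairs a lookup under `rcu_read_lock()` with a try-get such as `refcount_inc_not_zero()`.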

* Adding path to the photos
From: Leo Young @ 2018-09-06 10:41 UTC (permalink / raw)
  To: netdev

We provide image editing services such as image cutting out, retouching,
masking, etc.

Here are the details of what we can provide for your photos:
Jewelry retouching for your photos
Fashion retouching for your photos
Cutting out for your photos
Clipping path for your photos
Deep etch process for your photos
Image masking for your photos
Portrait retouching for your photos

We provide test editing for your photos.
Let us know if you are interested.

Thanks,
Leo Young

* Re: [PATCH] net/sock: move memory_allocated over to percpu_counter variables
From: Eric Dumazet @ 2018-09-06 19:33 UTC (permalink / raw)
  To: Olof Johansson
  Cc: David Miller, Neil Horman, Marcelo Ricardo Leitner,
	Vladislav Yasevich, Herbert Xu, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, linux-crypto, LKML, linux-sctp, netdev,
	linux-decnet-user, kernel-team
In-Reply-To: <20180906192034.8467-1-olof@lixom.net>

On Thu, Sep 6, 2018 at 12:21 PM Olof Johansson <olof@lixom.net> wrote:
>
> Today these are all global shared variables per protocol, and in
> particular tcp_memory_allocated can get hot on a system with a
> large number of CPUs and a substantial number of connections.
>
> Moving it over to a per-cpu variable makes it significantly cheaper,
> and the added overhead when summing up the percpu copies is still smaller
> than the cost of having a hot cacheline bouncing around.

I am curious. We never noticed contention on this variable, at least for TCP.

Please share some numbers with us.
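
The trade-off Olof describes (cheap updates to a CPU-local slot, costlier reads that must sum every slot) can be sketched in userspace C as a sharded counter with one cache-line-aligned slot per CPU. This is a simplification: the kernel's `percpu_counter` also folds per-CPU deltas into a shared count in batches, which this sketch omits. `DEMO_NR_CPUS`, the 64-byte line size, and all `demo_*` names are assumptions.

```c
#include <stdalign.h>
#include <stdatomic.h>

#define DEMO_NR_CPUS   8   /* assumed CPU count */
#define DEMO_CACHELINE 64  /* assumed cache-line size in bytes */

/* One slot per CPU, each on its own cache line, so writers on different CPUs never share a line. */
struct demo_slot {
	alignas(DEMO_CACHELINE) atomic_long value;
};

struct demo_pcpu_counter {
	struct demo_slot slot[DEMO_NR_CPUS];
};

void demo_counter_init(struct demo_pcpu_counter *c)
{
	for (unsigned int i = 0; i < DEMO_NR_CPUS; i++)
		atomic_init(&c->slot[i].value, 0);
}

/* Fast path: touch only the caller's slot (the caller passes its CPU id). */
void demo_counter_add(struct demo_pcpu_counter *c, unsigned int cpu, long delta)
{
	atomic_fetch_add_explicit(&c->slot[cpu % DEMO_NR_CPUS].value, delta,
				  memory_order_relaxed);
}

/* Slow path: read every slot. The result may be slightly stale while writers are active. */
long demo_counter_sum(struct demo_pcpu_counter *c)
{
	long sum = 0;

	for (unsigned int i = 0; i < DEMO_NR_CPUS; i++)
		sum += atomic_load_explicit(&c->slot[i].value, memory_order_relaxed);
	return sum;
}
```

Whether this wins in practice depends on how often the sum is read relative to how often the counter is updated, which is exactly what measurements would have to show.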

* Re: [pull request][net 00/10] Mellanox, mlx5 fixes 2018-09-05
From: David Miller @ 2018-09-06 14:57 UTC (permalink / raw)
  To: saeedm; +Cc: netdev
In-Reply-To: <20180906040952.29684-1-saeedm@mellanox.com>

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Wed,  5 Sep 2018 21:09:42 -0700

> This pull request contains some fixes for the mlx5 Ethernet netdevice and
> core driver.

Pulled.

> 
> For -stable v4.9:
> ('net/mlx5: Fix debugfs cleanup in the device init/remove flow')
> 
> For -stable v4.12:
> ("net/mlx5: E-Switch, Fix memory leak when creating switchdev mode FDB tables")
> 
> For -stable v4.13:
> ("net/mlx5: Fix use-after-free in self-healing flow")
> 
> For -stable v4.14:
> ("net/mlx5: Check for error in mlx5_attach_interface")
> 
> For -stable v4.15:
> ("net/mlx5: Fix not releasing read lock when adding flow rules")
> 
> For -stable v4.17:
> ("net/mlx5: Fix possible deadlock from lockdep when adding fte to fg")
> 
> For -stable v4.18:
> ("net/mlx5: Use u16 for Work Queue buffer fragment size")

And will queue these up for -stable, thanks.

* Re: [PATCH net-next v3 0/5] net: dsa: b53: SerDes support
From: David Miller @ 2018-09-06 14:51 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, andrew, vivien.didelot
In-Reply-To: <20180905194215.29301-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Wed,  5 Sep 2018 12:42:10 -0700

> This patch series adds support for the SerDes found on NorthStar Plus
> (NSP) which allows us to use the SFP port on the BCM958625HR board (and
> other similar designs).
> 
> Changes in v3:
> 
> - properly hunk the request_threaded_irq() bits into patch #2
> 
> Changes in v2:
> 
> - migrate to threaded interrupt (Andrew)
> - fixed a case where MLO_AN_FIXED's mac_config would still call into
>   the serdes_config callback
> - added an additional check on the phylink interface in mac_config
> - default to ARCH_BCM_NSP instead of ARCH_BCM_IPROC which is really
>   the NSP Kconfig bit we want

Series applied, thanks Florian.

* [PATCH v4 2/3] IB/ipoib: Use dev_port to expose network interface port numbers
From: Arseny Maslennikov @ 2018-09-06 14:51 UTC (permalink / raw)
  To: linux-rdma; +Cc: Arseny Maslennikov, Doug Ledford, Jason Gunthorpe, netdev
In-Reply-To: <20180906145112.29245-1-ar@cs.msu.ru>

Some InfiniBand network devices have multiple ports on the same PCI
function. This initializes the `dev_port' sysfs field of those
network interfaces with their port number.

Prior to this, the kernel erroneously used the `dev_id' sysfs
field of those network interfaces to convey the port number to userspace.

The use of `dev_id' was considered correct until Linux 3.15,
when another field, `dev_port', was defined for this particular
purpose and `dev_id' was reserved for distinguishing stacked ifaces
(e.g. VLANs) with the same hardware address as their parent device.

Similar fixes to net/mlx4_en and many other drivers, which started
exporting this information through `dev_id' before 3.15, were accepted
into the kernel 4 years ago.
See 76a066f2a2a0 (`net/mlx4_en: Expose port number through sysfs').

Signed-off-by: Arseny Maslennikov <ar@cs.msu.ru>
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index e3d28f9ad9c0..30f840f874b3 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1880,6 +1880,8 @@ static int ipoib_parent_init(struct net_device *ndev)
 	       sizeof(union ib_gid));
 
 	SET_NETDEV_DEV(priv->dev, priv->ca->dev.parent);
+	priv->dev->dev_port = priv->port - 1;
+	/* Let's set this one too for backwards compatibility. */
 	priv->dev->dev_id = priv->port - 1;
 
 	return 0;
-- 
2.19.0.rc2

* [PATCH v4 1/3] Documentation/ABI: document /sys/class/net/*/dev_port
From: Arseny Maslennikov @ 2018-09-06 14:51 UTC (permalink / raw)
  To: linux-rdma; +Cc: Arseny Maslennikov, Doug Ledford, Jason Gunthorpe, netdev
In-Reply-To: <20180906145112.29245-1-ar@cs.msu.ru>

The sysfs field was introduced 4 years ago along with fixes to various
drivers that erroneously used `dev_id' to expose the port number, but it
was never properly documented anywhere.
See commit v3.14-rc3-739-g3f85944fe207.

Signed-off-by: Arseny Maslennikov <ar@cs.msu.ru>
---
 Documentation/ABI/testing/sysfs-class-net | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-class-net b/Documentation/ABI/testing/sysfs-class-net
index 2f1788111cd9..ec2232f6a949 100644
--- a/Documentation/ABI/testing/sysfs-class-net
+++ b/Documentation/ABI/testing/sysfs-class-net
@@ -91,6 +91,24 @@ Description:
 		stacked (e.g: VLAN interfaces) but still have the same MAC
 		address as their parent device.
 
+What:		/sys/class/net/<iface>/dev_port
+Date:		February 2014
+KernelVersion:	3.15
+Contact:	netdev@vger.kernel.org
+Description:
+		Indicates the port number of this network device, formatted
+		as a decimal value. Some NICs have multiple independent ports
+		on the same PCI bus, device and function. This attribute allows
+		userspace to distinguish the respective interfaces.
+
+		Note: some device drivers started to use 'dev_id' for this
+		purpose since long before 3.15 and have not adopted the new
+		attribute ever since. To query the port number, some tools look
+		exclusively at 'dev_port', while others only consult 'dev_id'.
+		If a network device has multiple client adapter ports as
+		described in the previous paragraph and does not set this
+		attribute to its port number, it's a kernel bug.
+
 What:		/sys/class/net/<iface>/dormant
 Date:		March 2006
 KernelVersion:	2.6.17
-- 
2.19.0.rc2
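
Per the new ABI text, `dev_port' holds the port number as a decimal value, whereas `dev_id' is printed in hex (patch 3/3 formats it with `%#x'). A userspace tool following this guidance might read the port as in the sketch below; `demo_parse_dev_port` and `demo_read_dev_port` are hypothetical helpers, and error handling is deliberately minimal.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the decimal contents of a dev_port file, e.g. "1\n". Returns -1 on malformed input. */
int demo_parse_dev_port(const char *text)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(text, &end, 10);
	if (errno != 0 || end == text || val < 0 || val > INT_MAX)
		return -1;
	if (*end != '\0' && *end != '\n') /* rejects trailing junk such as hex "0x1" */
		return -1;
	return (int)val;
}

/* Read /sys/class/net/<ifname>/dev_port; returns the port number or -1 on failure. */
int demo_read_dev_port(const char *ifname)
{
	char path[256], buf[32];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/class/net/%s/dev_port", ifname);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return demo_parse_dev_port(buf);
}
```

With patch 2/3 applied, an IPoIB interface on IB port 1 would report `dev_port' 0, since the driver stores `priv->port - 1'.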

* [PATCH v4 0/3] IB/ipoib: Use dev_port to disambiguate port numbers
From: Arseny Maslennikov @ 2018-09-06 14:51 UTC (permalink / raw)
  To: linux-rdma; +Cc: Arseny Maslennikov, Doug Ledford, Jason Gunthorpe, netdev

Pre-3.15 userspace had trouble distinguishing different ports
of a NIC on a single PCI bus/device/function. To solve this,
a sysfs field `dev_port' was introduced quite a while ago
(commit v3.14-rc3-739-g3f85944fe207), and some relevant device
drivers were fixed to use it, but not in case of IPoIB.

The convention for some reason never got documented in the kernel, but
was immediately adopted by userspace (notably udev[1][2] and biosdevname[3]).

Patch 1/3 documents the sysfs field; that's why I'm CC-ing netdev.

This series was tested on and applies to 4.19-rc2.

[1] https://lists.freedesktop.org/archives/systemd-devel/2014-June/020788.html
[2] https://lists.freedesktop.org/archives/systemd-devel/2014-July/020804.html
[3] https://github.com/CloudAutomationNTools/biosdevname/blob/c795d51dd93a5309652f0d635f12a3ecfabfaa72/src/eths.c#L38

v1->v2: replace a line instead of inserting and then removing.
v2->v3: restore both attributes, output a notice of deprecation to kmsg.
v3->v4: style adjustments, join the deprecation notice to single line.

Arseny Maslennikov (3):
  Documentation/ABI: document /sys/class/net/*/dev_port
  IB/ipoib: Use dev_port to expose network interface port numbers
  IB/ipoib: Log sysfs 'dev_id' accesses from userspace

 Documentation/ABI/testing/sysfs-class-net | 18 +++++++++++++
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 33 +++++++++++++++++++++++
 2 files changed, 51 insertions(+)

-- 
2.19.0.rc2

* [PATCH v4 3/3] IB/ipoib: Log sysfs 'dev_id' accesses from userspace
From: Arseny Maslennikov @ 2018-09-06 14:51 UTC (permalink / raw)
  To: linux-rdma; +Cc: Arseny Maslennikov, Doug Ledford, Jason Gunthorpe, netdev
In-Reply-To: <20180906145112.29245-1-ar@cs.msu.ru>

Some tools may currently be using only the deprecated attribute;
let's print an elaborate and clear deprecation notice to kmsg.

To do that, we have to replace the whole sysfs file, since we inherit
the original one from netdev.

Signed-off-by: Arseny Maslennikov <ar@cs.msu.ru>
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 31 +++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 30f840f874b3..74732726ec6f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -2386,6 +2386,35 @@ int ipoib_add_pkey_attr(struct net_device *dev)
 	return device_create_file(&dev->dev, &dev_attr_pkey);
 }
 
+/*
+ * We erroneously exposed the iface's port number in the dev_id
+ * sysfs field long after dev_port was introduced for that purpose[1],
+ * and we need to stop everyone from relying on that.
+ * Let's overload the shower routine for the dev_id file here
+ * to gently bring the issue up.
+ *
+ * [1] https://www.spinics.net/lists/netdev/msg272123.html
+ */
+static ssize_t dev_id_show(struct device *dev,
+			   struct device_attribute *attr, char *buf)
+{
+	struct net_device *ndev = to_net_dev(dev);
+
+	if (ndev->dev_id == ndev->dev_port)
+		netdev_info_once(ndev,
+			"\"%s\" wants to know my dev_id. Should it look at dev_port instead? See Documentation/ABI/testing/sysfs-class-net for more info.\n",
+			current->comm);
+
+	return sprintf(buf, "%#x\n", ndev->dev_id);
+}
+static DEVICE_ATTR_RO(dev_id);
+
+int ipoib_intercept_dev_id_attr(struct net_device *dev)
+{
+	device_remove_file(&dev->dev, &dev_attr_dev_id);
+	return device_create_file(&dev->dev, &dev_attr_dev_id);
+}
+
 static struct net_device *ipoib_add_port(const char *format,
 					 struct ib_device *hca, u8 port)
 {
@@ -2427,6 +2456,8 @@ static struct net_device *ipoib_add_port(const char *format,
 	 */
 	ndev->priv_destructor = ipoib_intf_free;
 
+	if (ipoib_intercept_dev_id_attr(ndev))
+		goto sysfs_failed;
 	if (ipoib_cm_add_mode_attr(ndev))
 		goto sysfs_failed;
 	if (ipoib_add_pkey_attr(ndev))
-- 
2.19.0.rc2
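
`dev_id_show()' in this patch combines two behaviours: it formats `dev_id' with `%#x', and it logs a notice at most once via `netdev_info_once()'. The userspace sketch below models both with a hypothetical `demo_dev_id_show()` and a static flag; it is not kernel code, and the kernel's `*_once' helpers have their own implementation. One detail worth noting: with the `#' flag, C's `%x' adds the `0x' prefix only to nonzero values, so a `dev_id' of 0 (IB port 1 after patch 2/3) is printed as plain "0".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Models the once-only notice: one flag for the call site, not one per device. */
static bool demo_notice_printed;

/* Hypothetical analogue of dev_id_show(): writes "%#x\n" into buf, as the patch does. */
int demo_dev_id_show(char *buf, size_t len, unsigned int dev_id,
		     unsigned int dev_port, const char *comm)
{
	if (dev_id == dev_port && !demo_notice_printed) {
		demo_notice_printed = true;
		fprintf(stderr,
			"\"%s\" wants to know my dev_id. Should it look at dev_port instead?\n",
			comm);
	}
	return snprintf(buf, len, "%#x\n", dev_id);
}
```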

* Re: [PATCH net-next] qed*: Utilize FW 8.37.7.0
From: David Miller @ 2018-09-06 14:44 UTC (permalink / raw)
  To: denis.bolotin; +Cc: netdev, ariel.elior
In-Reply-To: <20180905153555.2661-1-denis.bolotin@cavium.com>

From: Denis Bolotin <denis.bolotin@cavium.com>
Date: Wed, 5 Sep 2018 18:35:55 +0300

> This patch adds a new qed firmware with fixes and support for new features.
> 
> Fixes:
> - Fix a rare case of device crash with iWARP, iSCSI or FCoE offload.
> - Fix GRE tunneled traffic when iWARP offload is enabled.
> - Fix RoCE failure in ib_send_bw when using inline data.
> - Fix latency optimization flow for inline WQEs.
> - BigBear 100G fix
> 
> RDMA:
> - Reduce task context size.
> - Application page sizes above 2GB support.
> - Performance improvements.
> 
> ETH:
> - Tenant DCB support.
> - Replace RSS indirection table update interface.
> 
> Misc:
> - Debug Tools changes.
> 
> Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com>
> Signed-off-by: Ariel Elior <ariel.elior@cavium.com>

Applied, thanks.
