Netdev List
 help / color / mirror / Atom feed
* tc H/W offload issue with vxlan tunnels [was: nfp: flower vxlan tunnel offload]
From: Paolo Abeni @ 2017-09-27  8:29 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Or Gerlitz, Jiri Benc, Simon Horman, David Miller, Jakub Kicinski,
	Linux Netdev List, oss-drivers, John Hurley, Paul Blakey,
	Jiri Pirko, Roi Dayan

Hi,

Moving to a separate theread, since I think this is more related to the
flower core infrastructure than to the netrome patches.

On Wed, 2017-09-27 at 09:40 +0200, Jiri Pirko wrote:
> This kind of hooks are giving me nightmares. The code is screwed up as
> it is already. I'm currently working on conversion to callbacks. This
> part is handled in:
> https://github.com/jpirko/linux_mlxsw/commits/jiri_devel_egdevcb

Thanks for the pointer.

I skimmed quickly on the code and indeed it cleans this area a lot.
If I read it correctly the ('good') command:

tc filter add dev vxlan0 protocol ip parent ffff: flower enc_key_id 102 
   enc_dst_port 4789 src_ip 3.4.5.6 skip_sw action [...]

will generate a call to:

mlx5e_setup_tc(eth0, TC_SETUP_CLSFLOWER, &cls_flower) via:

fl_hw_replace_filter() ->
  tc_setup_cb_call() -> 
    tc_exts_setup_cb_egdev_call() ->
      tc_setup_cb_egdev_call() ->
        tcf_action_egdev_cb_call() ->
          mlx5e_rep_setup_tc_cb()

and the 'bad' command:

tc filter add dev eth0 protocol ip parent ffff: flower enc_key_id 102 \
   enc_dst_port 4789 src_ip 3.4.5.6 skip_sw action [...]

will also call:

mlx5e_setup_tc(eth0, TC_SETUP_CLSFLOWER, &cls_flower) via:

fl_hw_replace_filter() ->
  ndo_setup_tc()

So it looks like the H/W offload hook will still be called with the
same arguments in both case, and 'bad' rule will still be pushed to the
H/W as the driver itself has no way to distinct between the two
scenarios.

[ Note: I referred to the mlx hook just for convenience, should be the
same with any driver implementing the same APIs ]

Am I missing something?

Thanks,

Paolo

^ permalink raw reply

* RE: [PATCH net-next] liquidio: fix format truncation warning reported by gcc 7.1.1
From: David Laight @ 2017-09-27  9:04 UTC (permalink / raw)
  To: 'Felix Manlunas', davem@davemloft.net
  Cc: netdev@vger.kernel.org, raghu.vatsavayi@cavium.com,
	derek.chickles@cavium.com, satananda.burla@cavium.com
In-Reply-To: <20170926184827.GA3512@felix-thinkpad.cavium.com>

From: Felix Manlunas
> Sent: 26 September 2017 19:48
> gcc 7.1.1 with -Wformat-truncation reports these warnings:
> 
> drivers/net/ethernet/cavium/liquidio/lio_core.c: In function `octeon_setup_interrupt':
> drivers/net/ethernet/cavium/liquidio/lio_core.c:1003:41: warning: `%u' directive output may be
> truncated writing between 1 and 10 bytes into a region of size between 0 and 13 [-Wformat-truncation=]
>        INTRNAMSIZ, "LiquidIO%u-pf%u-rxtx-%u",
...
> Fix them by changing the type of the "i" local variable from int to short.

That probably adds pointless code bloat by forcing the compiler to
keep masking the value with 0xffff after every arithmetic operation.

About the only architecture that doesn't suffer the penalty is x86.

Until the compiler can correctly track the domain of values (and
be given hints about the domains) this warning is, IMHO, OTT.

	David

^ permalink raw reply

* RE: [PATCH net v2] net: dsa: mv88e6xxx: lock mutex when freeing IRQs
From: David Laight @ 2017-09-27  9:06 UTC (permalink / raw)
  To: 'Vivien Didelot', netdev@vger.kernel.org
  Cc: linux-kernel@vger.kernel.org, kernel@savoirfairelinux.com,
	David S. Miller, Florian Fainelli, Andrew Lunn
In-Reply-To: <20170926185721.12187-1-vivien.didelot@savoirfairelinux.com>

From: Vivien Didelot
> Sent: 26 September 2017 19:57
> mv88e6xxx_g2_irq_free locks the registers mutex, but not
> mv88e6xxx_g1_irq_free, which results in a stack trace from
> assert_reg_lock when unloading the mv88e6xxx module. Fix this.
> 
> Fixes: 3460a5770ce9 ("net: dsa: mv88e6xxx: Mask g1 interrupts and free interrupt")
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
> ---
>  drivers/net/dsa/mv88e6xxx/chip.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index c6678aa9b4ef..e7ff7483d2fb 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -3947,7 +3947,9 @@ static void mv88e6xxx_remove(struct mdio_device *mdiodev)
>  	if (chip->irq > 0) {
>  		if (chip->info->g2_irqs > 0)
>  			mv88e6xxx_g2_irq_free(chip);
> +		mutex_lock(&chip->reg_lock);
>  		mv88e6xxx_g1_irq_free(chip);
> +		mutex_unlock(&chip->reg_lock);

Isn't the irq_free code likely to have to sleep waiting for any
ISR to complete??

	David

^ permalink raw reply

* Re: [PATCH v2 net-next 2/2] net/sched: allow flower to match tunnel options
From: Jiri Pirko @ 2017-09-27  9:10 UTC (permalink / raw)
  To: Simon Horman
  Cc: David Miller, Jiri Pirko, Jamal Hadi Salim, Cong Wang, netdev,
	oss-drivers
In-Reply-To: <1506500194-17637-3-git-send-email-simon.horman@netronome.com>

Wed, Sep 27, 2017 at 10:16:34AM CEST, simon.horman@netronome.com wrote:
>Allow matching on options in tunnel headers.
>This makes use of existing tunnel metadata support.
>
>Options are a bytestring of up to 256 bytes.
>Tunnel implementations may support less or more options,
>or no options at all.
>
>e.g.
> # ip link add name geneve0 type geneve dstport 0 external
> # tc qdisc add dev geneve0 ingress
> # tc filter add dev geneve0 protocol ip parent ffff: \
>     flower \
>       enc_src_ip 10.0.99.192 \
>       enc_dst_ip 10.0.99.193 \
>       enc_key_id 11 \
>       enc_opts 0102800100800020/fffffffffffffff0 \
>       ip_proto udp \
>       action mirred egress redirect dev eth1
>
>Signed-off-by: Simon Horman <simon.horman@netronome.com>
>Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>
>---
>v2
>* Correct example which was incorrectly described setting rather
>  than matching tunnel options
>---
> include/net/flow_dissector.h | 13 +++++++++++++
> include/uapi/linux/pkt_cls.h |  3 +++
> net/sched/cls_flower.c       | 35 ++++++++++++++++++++++++++++++++++-
> 3 files changed, 50 insertions(+), 1 deletion(-)
>
>diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
>index fc3dce730a6b..43f98bf0b349 100644
>--- a/include/net/flow_dissector.h
>+++ b/include/net/flow_dissector.h
>@@ -183,6 +183,18 @@ struct flow_dissector_key_ip {
> 	__u8	ttl;
> };
> 
>+/**
>+ * struct flow_dissector_key_enc_opts:
>+ * @data: data
>+ * @len: len
>+ */
>+struct flow_dissector_key_enc_opts {
>+	u8 data[256];	/* Using IP_TUNNEL_OPTS_MAX is desired here
>+			 * but seems difficult to #include
>+			 */
>+	u8 len;
>+};
>+
> enum flow_dissector_key_id {
> 	FLOW_DISSECTOR_KEY_CONTROL, /* struct flow_dissector_key_control */
> 	FLOW_DISSECTOR_KEY_BASIC, /* struct flow_dissector_key_basic */
>@@ -205,6 +217,7 @@ enum flow_dissector_key_id {
> 	FLOW_DISSECTOR_KEY_MPLS, /* struct flow_dissector_key_mpls */
> 	FLOW_DISSECTOR_KEY_TCP, /* struct flow_dissector_key_tcp */
> 	FLOW_DISSECTOR_KEY_IP, /* struct flow_dissector_key_ip */
>+	FLOW_DISSECTOR_KEY_ENC_OPTS, /* struct flow_dissector_key_enc_opts */

I don't see the actual dissection implementation. Where is it?
Did you test the patchset?

^ permalink raw reply

* Re: tc H/W offload issue with vxlan tunnels [was: nfp: flower vxlan tunnel offload]
From: Jiri Pirko @ 2017-09-27  9:17 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Or Gerlitz, Jiri Benc, Simon Horman, David Miller, Jakub Kicinski,
	Linux Netdev List, oss-drivers, John Hurley, Paul Blakey,
	Jiri Pirko, Roi Dayan
In-Reply-To: <1506500975.2867.19.camel@redhat.com>

Wed, Sep 27, 2017 at 10:29:35AM CEST, pabeni@redhat.com wrote:
>Hi,
>
>Moving to a separate theread, since I think this is more related to the
>flower core infrastructure than to the netrome patches.
>
>On Wed, 2017-09-27 at 09:40 +0200, Jiri Pirko wrote:
>> This kind of hooks are giving me nightmares. The code is screwed up as
>> it is already. I'm currently working on conversion to callbacks. This
>> part is handled in:
>> https://github.com/jpirko/linux_mlxsw/commits/jiri_devel_egdevcb
>
>Thanks for the pointer.
>
>I skimmed quickly on the code and indeed it cleans this area a lot.
>If I read it correctly the ('good') command:
>
>tc filter add dev vxlan0 protocol ip parent ffff: flower enc_key_id 102 
>   enc_dst_port 4789 src_ip 3.4.5.6 skip_sw action [...]

I suppose "action mirred redirect eth0". Then yes, it will generate the
callpath you described below.

>
>will generate a call to:
>
>mlx5e_setup_tc(eth0, TC_SETUP_CLSFLOWER, &cls_flower) via:
>
>fl_hw_replace_filter() ->
>  tc_setup_cb_call() -> 
>    tc_exts_setup_cb_egdev_call() ->
>      tc_setup_cb_egdev_call() ->
>        tcf_action_egdev_cb_call() ->
>          mlx5e_rep_setup_tc_cb()
>
>and the 'bad' command:
>
>tc filter add dev eth0 protocol ip parent ffff: flower enc_key_id 102 \
>   enc_dst_port 4789 src_ip 3.4.5.6 skip_sw action [...]
>
>will also call:
>
>mlx5e_setup_tc(eth0, TC_SETUP_CLSFLOWER, &cls_flower) via:
>
>fl_hw_replace_filter() ->
>  ndo_setup_tc()

Sure. You are adding a rule to eth0, the call goes down to eth0 driver.
I'm missing why is it a problem? Why the call should not go down to the
eth0 driver?


>
>So it looks like the H/W offload hook will still be called with the
>same arguments in both case, and 'bad' rule will still be pushed to the
>H/W as the driver itself has no way to distinct between the two
>scenarios.

Why "bad"?

Regarding the distinction, driver knows if user add a rule directly to
the eth0, or if the eth0 is egress device in the action. Those are 2
separete driver entrypoints - of course, talking about code with my
changes.


>
>[ Note: I referred to the mlx hook just for convenience, should be the
>same with any driver implementing the same APIs ]
>
>Am I missing something?
>
>Thanks,
>
>Paolo

^ permalink raw reply

* Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access
From: Jesper Dangaard Brouer @ 2017-09-27  9:26 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: davem, alexei.starovoitov, john.fastabend, peter.waskiewicz.jr,
	jakub.kicinski, netdev, Andy Gospodarek, brouer
In-Reply-To: <59CAB17D.5090204@iogearbox.net>

On Tue, 26 Sep 2017 21:58:53 +0200
Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 09/26/2017 09:13 PM, Jesper Dangaard Brouer wrote:
> [...]
> > I'm currently implementing a cpumap type, that transfers raw XDP frames
> > to another CPU, and the SKB is allocated on the remote CPU.  (It
> > actually works extremely well).  
> 
> Meaning you let all the XDP_PASS packets get processed on a
> different CPU, so you can reserve the whole CPU just for
> prefiltering, right? 

Yes, exactly.  Except I use the XDP_REDIRECT action to steer packets.
The trick is using the map-flush point, to transfer packets in bulk to
the remote CPU (single call IPC is too slow), but at the same time
flush single packets if NAPI didn't see a bulk.

> Do you have some numbers to share at this point, just curious when
> you mention it works extremely well.

Sure... I've done a lot of benchmarking on this patchset ;-)
I have a benchmark program called xdp_redirect_cpu [1][2], that collect
stats via tracepoints (atm I'm limiting bulking 8 packets, and have
tracepoints at bulk spots, to amortize tracepoint cost 25ns/8=3.125ns)

 [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_redirect_cpu_kern.c
 [2] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_redirect_cpu_user.c

Here I'm installing a DDoS program that drops UDP port 9 (pktgen
packets) on RX CPU=0.  I'm forcing my netperf to hit the same CPU, that
the 11.9Mpps DDoS attack is hitting.

Running XDP/eBPF prog_num:4
XDP-cpumap      CPU:to  pps            drop-pps    extra-info
XDP-RX          0       12,030,471     11,966,982  0          
XDP-RX          total   12,030,471     11,966,982 
cpumap-enqueue    0:2   63,488         0           0          
cpumap-enqueue  sum:2   63,488         0           0          
cpumap_kthread  2       63,488         0           3          time_exceed
cpumap_kthread  total   63,488         0           0          
redirect_err    total   0              0          

$ netperf -H 172.16.0.2 -t TCP_CRR  -l 10 -D1 -T5,5 -- -r 1024,1024
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate         
bytes  Bytes  bytes    bytes   secs.    per sec   

16384  87380  1024     1024    10.00    12735.97   
16384  87380 

The netperf TCP_CRR performance is the same, without XDP loaded.


> Another test

I've previously shown (and optimized) in commit c0303efeab73 ("net:
reduce cycles spend on ICMP replies that gets rate limited"), that my
system can handle approx 2.7Mpps for UdpNoPorts, before the network
stack chokes.

Thus it is interesting to see, when I get UDP traffic that hits the
same CPU, if I can simply round-robin distribute it other CPUs.  This
evaluate if the cross-CPU transfer mechanism is fast-enough.

I do have to increase the ixgbe RX-ring size, else the ixgbe recycle
scheme breaks down, and we stall on the page spin_lock (as Tariq have
demonstrated before).

 # ethtool -G ixgbe1 rx 1024 tx 1024

Start RR program and add some CPUs:

 # ./xdp_redirect_cpu --dev ixgbe1 --prog 2 --cpu 1 --cpu 2 --cpu 3 --cpu 4

Running XDP/eBPF prog_num:2
XDP-cpumap      CPU:to  pps            drop-pps    extra-info
XDP-RX          0       11,006,992     0           0          
XDP-RX          total   11,006,992     0          
cpumap-enqueue    0:1   2,751,744      0           0          
cpumap-enqueue  sum:1   2,751,744      0           0          
cpumap-enqueue    0:2   2,751,748      0           0          
cpumap-enqueue  sum:2   2,751,748      0           0          
cpumap-enqueue    0:3   2,751,744      35          0          
cpumap-enqueue  sum:3   2,751,744      35          0          
cpumap-enqueue    0:4   2,751,748      0           0          
cpumap-enqueue  sum:4   2,751,748      0           0          
cpumap_kthread  1       2,751,745      0           156        time_exceed
cpumap_kthread  2       2,751,749      0           142        time_exceed
cpumap_kthread  3       2,751,713      0           131        time_exceed
cpumap_kthread  4       2,751,749      0           128        time_exceed
cpumap_kthread  total   11,006,957     0           0          
redirect_err    total   0              0          

$ nstat > /dev/null && sleep 1 && nstat | grep UdpNoPorts
UdpNoPorts                      11042282           0.0

The nstat show that the Linux network stack is actually now processing,
SKB alloc + free, 11Mpps. 

The generator was sending with 14Mpps, thus the XDP-RX program is
actually a bottleneck here. And I do see some drops on the HW level.
Thus, 1-CPU was not 100% fast-enough.

Thus, lets allocate two CPUs for XDP-RX:

Running XDP/eBPF prog_num:2
XDP-cpumap      CPU:to  pps            drop-pps    extra-info
XDP-RX          0       6,352,578      0           0          
XDP-RX          1       6,352,711      0           0          
XDP-RX          total   12,705,289     0          
cpumap-enqueue    0:2   1,588,156      1,351       0          
cpumap-enqueue    1:2   1,588,174      1,330       0          
cpumap-enqueue  sum:2   3,176,331      2,682       0          
cpumap-enqueue    0:3   1,588,157      994         0          
cpumap-enqueue    1:3   1,588,170      912         0          
cpumap-enqueue  sum:3   3,176,327      1,907       0          
cpumap-enqueue    0:4   1,588,157      529         0          
cpumap-enqueue    1:4   1,588,167      514         0          
cpumap-enqueue  sum:4   3,176,324      1,044       0          
cpumap-enqueue    0:5   1,588,159      625         0          
cpumap-enqueue    1:5   1,588,166      614         0          
cpumap-enqueue  sum:5   3,176,326      1,240       0          
cpumap_kthread  2       3,173,642      0           11257      time_exceed
cpumap_kthread  3       3,174,423      0           9779       time_exceed
cpumap_kthread  4       3,175,283      0           3938       time_exceed
cpumap_kthread  5       3,175,083      0           3120       time_exceed
cpumap_kthread  total   12,698,432     0           0          (null)
redirect_err    total   0              0          

Below, I'm using ./pktgen_sample04_many_flows.sh, and my generator
machine cannot generate more that 12,682,445 tx_packets /sec.
nstat says: UdpNoPorts 12,698,001 pps.  The XDP-RX CPUs actually have
30% idle CPU cycles, as the "only" handle 6.3Mpps each ;-)

Perf top on a CPU(3) that have to alloc and free SKBs etc.

# Overhead  CPU  Symbol                                 
# ........  ...  .......................................
#
    15.51%  003  [k] fib_table_lookup
     8.91%  003  [k] cpu_map_kthread_run
     8.04%  003  [k] build_skb
     7.88%  003  [k] page_frag_free
     5.13%  003  [k] kmem_cache_alloc
     4.76%  003  [k] ip_route_input_rcu
     4.59%  003  [k] kmem_cache_free
     4.02%  003  [k] __udp4_lib_rcv
     3.20%  003  [k] fib_validate_source
     3.02%  003  [k] __netif_receive_skb_core
     3.02%  003  [k] udp_v4_early_demux
     2.90%  003  [k] ip_rcv
     2.80%  003  [k] ip_rcv_finish
     2.26%  003  [k] eth_type_trans
     2.23%  003  [k] __build_skb
     2.00%  003  [k] icmp_send
     1.84%  003  [k] __rcu_read_unlock
     1.30%  003  [k] ip_local_deliver_finish
     1.26%  003  [k] netif_receive_skb_internal
     1.17%  003  [k] ip_route_input_noref
     1.11%  003  [k] make_kuid
     1.09%  003  [k] __udp4_lib_lookup
     1.07%  003  [k] skb_release_head_state
     1.04%  003  [k] __rcu_read_lock
     0.95%  003  [k] kfree_skb
     0.89%  003  [k] __local_bh_enable_ip
     0.88%  003  [k] skb_release_data
     0.71%  003  [k] ip_local_deliver
     0.58%  003  [k] netif_receive_skb

cmdline:
 perf report --sort cpu,symbol --kallsyms=/proc/kallsyms  --no-children  -C3 -g none --stdio

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH v2 net-next 2/2] net/sched: allow flower to match tunnel options
From: Simon Horman @ 2017-09-27  9:27 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David Miller, Jiri Pirko, Jamal Hadi Salim, Cong Wang, netdev,
	oss-drivers
In-Reply-To: <20170927091005.GB1944@nanopsycho.orion>

On Wed, Sep 27, 2017 at 11:10:05AM +0200, Jiri Pirko wrote:
> Wed, Sep 27, 2017 at 10:16:34AM CEST, simon.horman@netronome.com wrote:
> >Allow matching on options in tunnel headers.
> >This makes use of existing tunnel metadata support.
> >
> >Options are a bytestring of up to 256 bytes.
> >Tunnel implementations may support less or more options,
> >or no options at all.
> >
> >e.g.
> > # ip link add name geneve0 type geneve dstport 0 external
> > # tc qdisc add dev geneve0 ingress
> > # tc filter add dev geneve0 protocol ip parent ffff: \
> >     flower \
> >       enc_src_ip 10.0.99.192 \
> >       enc_dst_ip 10.0.99.193 \
> >       enc_key_id 11 \
> >       enc_opts 0102800100800020/fffffffffffffff0 \
> >       ip_proto udp \
> >       action mirred egress redirect dev eth1
> >
> >Signed-off-by: Simon Horman <simon.horman@netronome.com>
> >Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> >
> >---
> >v2
> >* Correct example which was incorrectly described setting rather
> >  than matching tunnel options
> >---
> > include/net/flow_dissector.h | 13 +++++++++++++
> > include/uapi/linux/pkt_cls.h |  3 +++
> > net/sched/cls_flower.c       | 35 ++++++++++++++++++++++++++++++++++-
> > 3 files changed, 50 insertions(+), 1 deletion(-)
> >
> >diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
> >index fc3dce730a6b..43f98bf0b349 100644
> >--- a/include/net/flow_dissector.h
> >+++ b/include/net/flow_dissector.h
> >@@ -183,6 +183,18 @@ struct flow_dissector_key_ip {
> > 	__u8	ttl;
> > };
> > 
> >+/**
> >+ * struct flow_dissector_key_enc_opts:
> >+ * @data: data
> >+ * @len: len
> >+ */
> >+struct flow_dissector_key_enc_opts {
> >+	u8 data[256];	/* Using IP_TUNNEL_OPTS_MAX is desired here
> >+			 * but seems difficult to #include
> >+			 */
> >+	u8 len;
> >+};
> >+
> > enum flow_dissector_key_id {
> > 	FLOW_DISSECTOR_KEY_CONTROL, /* struct flow_dissector_key_control */
> > 	FLOW_DISSECTOR_KEY_BASIC, /* struct flow_dissector_key_basic */
> >@@ -205,6 +217,7 @@ enum flow_dissector_key_id {
> > 	FLOW_DISSECTOR_KEY_MPLS, /* struct flow_dissector_key_mpls */
> > 	FLOW_DISSECTOR_KEY_TCP, /* struct flow_dissector_key_tcp */
> > 	FLOW_DISSECTOR_KEY_IP, /* struct flow_dissector_key_ip */
> >+	FLOW_DISSECTOR_KEY_ENC_OPTS, /* struct flow_dissector_key_enc_opts */
> 
> I don't see the actual dissection implementation. Where is it?
> Did you test the patchset?

Yes, I did test it. But it is also possible something went astray along the
way and I will retest.

I think that the code you are looking for is in
fl_classify() in this patch.

^ permalink raw reply

* [PATCH v3 0/1] Introduce MPLS over GRE
From: Amine Kherbouche @ 2017-09-27  9:37 UTC (permalink / raw)
  To: netdev, xeb, roopa; +Cc: amine.kherbouche, equinox

This series introduces the MPLS over GRE encapsulation (RFC 4023).

Various applications of MPLS make use of label stacks with multiple
entries.  In some cases, it is possible to replace the top label of
the stack with an IP-based encapsulation, thereby, it is possible for
two LSRs that are adjacent on an LSP to be separated by an IP network,
even if that IP network does not provide MPLS.

Changes in v3:
  - remove mpls_forward() function exportation patch.
  - wrap efficiently mpls iptunnel add/del functions and dependent
    function/structure.
  - move mpls_gre_rcv to af_mpls.c file and export it.
  - remove unnecessary functions.
 
Changes in v2:
  - wrap ip tunnel functions under ifdef in mpls file.
  - fix indentation.
  - check return code.

An example of configuration:


         node1                LER1                       LER2                node2
        +-----+             +------+                   +------+             +-----+
        |     |             |      |                   |      |             |     |
        |     |             |      |p3  GRE tunnel   p4|      |             |     |
        |     |p1         p2|      +-------------------+      |p5         p6|     |
        |     +-------------+      +-------------------+      +------------+|     |
        |     |10.100.0.0/24|      |                   |      |10.200.0.0/24|     |
        |     |fd00:100::/64|      |  10.125.0.0/24    |      |fd00:200::/64|     |
        |     |             |      |  fd00:125::/64    |      |             |     |
        |     |             |      |                   |      |             |     |
        |     |             |      |                   |      |             |     |
        |     |             |      |                   |      |             |     |
        |     |             |      |                   |      |             |     |
        +-----+             +------+                   +------+             +-----+


		###	node1	###

ip link set p1 up
ip addr add 10.100.0.1/24 dev p1

		###	LER1	###

ip link set p2 up
ip addr add 10.100.0.2/24 dev p2

ip link set p3 up
ip addr add 10.125.0.1/24 dev p3

modprobe mpls_router
sysctl -w net.mpls.conf.p2.input=1
sysctl -w net.mpls.conf.p3.input=1
sysctl -w net.mpls.platform_labels=1000

ip link add gre1 type gre ttl 64 local 10.125.0.1 remote 10.125.0.2 dev p3
ip link set dev gre1 up

ip -M route add 111 as 222 dev gre1
ip -M route add 555 as 666 via inet 10.100.0.1 dev p2

		###	LER2	###

ip link set p5 up
ip addr add 10.200.0.2/24 dev p5

ip link set p4 up
ip addr add 10.125.0.2/24 dev p4

modprobe mpls_router
sysctl -w net.mpls.conf.p4.input=1
sysctl -w net.mpls.conf.p5.input=1
sysctl -w net.mpls.platform_labels=1000

ip link add gre1 type gre ttl 64 local 10.125.0.2 remote 10.125.0.1 dev p4
ip link set dev gre1 up

ip -M route add 444 as 555 dev gre1
ip -M route add 222 as 333 via inet 10.200.0.1 dev p5

		###	node2	###

ip link set p6 up
ip addr add 10.200.0.1/24 dev p6


Now using this scapy to forge and send packets from the port p1 of node1:

p = Ether(src='de:ed:01:0c:41:09', dst='de:ed:01:2f:3b:ba')
p /= MPLS(s=1, ttl=64, label=111)/Raw(load='\xde')
sendp(p, iface="p1", count=20, inter=0.1)

Amine Kherbouche (1):
  ip_tunnel: add mpls over gre encapsulation

 include/linux/mpls.h           |  2 ++
 include/uapi/linux/if_tunnel.h |  1 +
 net/ipv4/ip_gre.c              | 11 +++++++++
 net/ipv6/ip6_gre.c             | 11 +++++++++
 net/mpls/af_mpls.c             | 52 ++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 77 insertions(+)

-- 
2.1.4

^ permalink raw reply

* [PATCH v3 1/1] ip_tunnel: add mpls over gre encapsulation
From: Amine Kherbouche @ 2017-09-27  9:37 UTC (permalink / raw)
  To: netdev, xeb, roopa; +Cc: amine.kherbouche, equinox
In-Reply-To: <cover.1506504229.git.amine.kherbouche@6wind.com>

This commit introduces the MPLSoGRE support (RFC 4023), using ip tunnel
API.

Encap:
  - Add a new iptunnel type mpls.
  - Share tx path: gre type mpls loaded from skb->protocol.

Decap:
  - pull gre hdr and call mpls_forward().

Signed-off-by: Amine Kherbouche <amine.kherbouche@6wind.com>
---
 include/linux/mpls.h           |  2 ++
 include/uapi/linux/if_tunnel.h |  1 +
 net/ipv4/ip_gre.c              | 11 +++++++++
 net/ipv6/ip6_gre.c             | 11 +++++++++
 net/mpls/af_mpls.c             | 52 ++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 77 insertions(+)

diff --git a/include/linux/mpls.h b/include/linux/mpls.h
index 384fb22..57203c1 100644
--- a/include/linux/mpls.h
+++ b/include/linux/mpls.h
@@ -8,4 +8,6 @@
 #define MPLS_TC_MASK		(MPLS_LS_TC_MASK >> MPLS_LS_TC_SHIFT)
 #define MPLS_LABEL_MASK		(MPLS_LS_LABEL_MASK >> MPLS_LS_LABEL_SHIFT)
 
+int mpls_gre_rcv(struct sk_buff *skb, int gre_hdr_len);
+
 #endif  /* _LINUX_MPLS_H */
diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index 2e52088..a2f48c0 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -84,6 +84,7 @@ enum tunnel_encap_types {
 	TUNNEL_ENCAP_NONE,
 	TUNNEL_ENCAP_FOU,
 	TUNNEL_ENCAP_GUE,
+	TUNNEL_ENCAP_MPLS,
 };
 
 #define TUNNEL_ENCAP_FLAG_CSUM		(1<<0)
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 9cee986..0a898f4 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -32,6 +32,9 @@
 #include <linux/netfilter_ipv4.h>
 #include <linux/etherdevice.h>
 #include <linux/if_ether.h>
+#if IS_ENABLED(CONFIG_MPLS)
+#include <linux/mpls.h>
+#endif
 
 #include <net/sock.h>
 #include <net/ip.h>
@@ -412,6 +415,14 @@ static int gre_rcv(struct sk_buff *skb)
 			return 0;
 	}
 
+	if (unlikely(tpi.proto == htons(ETH_P_MPLS_UC))) {
+#if IS_ENABLED(CONFIG_MPLS)
+		return mpls_gre_rcv(skb, hdr_len);
+#else
+		goto drop;
+#endif
+	}
+
 	if (ipgre_rcv(skb, &tpi, hdr_len) == PACKET_RCVD)
 		return 0;
 
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index c82d41e..5a0f5e1 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -34,6 +34,9 @@
 #include <linux/hash.h>
 #include <linux/if_tunnel.h>
 #include <linux/ip6_tunnel.h>
+#if IS_ENABLED(CONFIG_MPLS)
+#include <linux/mpls.h>
+#endif
 
 #include <net/sock.h>
 #include <net/ip.h>
@@ -476,6 +479,14 @@ static int gre_rcv(struct sk_buff *skb)
 	if (hdr_len < 0)
 		goto drop;
 
+	if (unlikely(tpi.proto == htons(ETH_P_MPLS_UC))) {
+#if IS_ENABLED(CONFIG_MPLS)
+		return mpls_gre_rcv(skb, hdr_len);
+#else
+		goto drop;
+#endif
+	}
+
 	if (iptunnel_pull_header(skb, hdr_len, tpi.proto, false))
 		goto drop;
 
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index c5b9ce4..53ec7c0 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -16,6 +16,7 @@
 #include <net/arp.h>
 #include <net/ip_fib.h>
 #include <net/netevent.h>
+#include <net/ip_tunnels.h>
 #include <net/netns/generic.h>
 #if IS_ENABLED(CONFIG_IPV6)
 #include <net/ipv6.h>
@@ -39,6 +40,36 @@ static int one = 1;
 static int label_limit = (1 << 20) - 1;
 static int ttl_max = 255;
 
+#if IS_ENABLED(CONFIG_NET_IP_TUNNEL)
+size_t ipgre_mpls_encap_hlen(struct ip_tunnel_encap *e)
+{
+	return sizeof(struct mpls_shim_hdr);
+}
+
+static const struct ip_tunnel_encap_ops mpls_iptun_ops = {
+	.encap_hlen	= ipgre_mpls_encap_hlen,
+};
+
+static int ipgre_tunnel_encap_add_mpls_ops(void)
+{
+	return ip_tunnel_encap_add_ops(&mpls_iptun_ops, TUNNEL_ENCAP_MPLS);
+}
+
+static void ipgre_tunnel_encap_del_mpls_ops(void)
+{
+	ip_tunnel_encap_del_ops(&mpls_iptun_ops, TUNNEL_ENCAP_MPLS);
+}
+#else
+static int ipgre_tunnel_encap_add_mpls_ops(void)
+{
+	return 0;
+}
+
+static void ipgre_tunnel_encap_del_mpls_ops(void)
+{
+}
+#endif
+
 static void rtmsg_lfib(int event, u32 label, struct mpls_route *rt,
 		       struct nlmsghdr *nlh, struct net *net, u32 portid,
 		       unsigned int nlm_flags);
@@ -443,6 +474,22 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 	return NET_RX_DROP;
 }
 
+int mpls_gre_rcv(struct sk_buff *skb, int gre_hdr_len)
+{
+	if (unlikely(!pskb_may_pull(skb, gre_hdr_len)))
+		goto drop;
+
+	/* Pop GRE hdr and reset the skb */
+	skb_pull(skb, gre_hdr_len);
+	skb_reset_network_header(skb);
+
+	return mpls_forward(skb, skb->dev, NULL, NULL);
+drop:
+	kfree_skb(skb);
+	return NET_RX_DROP;
+}
+EXPORT_SYMBOL(mpls_gre_rcv);
+
 static struct packet_type mpls_packet_type __read_mostly = {
 	.type = cpu_to_be16(ETH_P_MPLS_UC),
 	.func = mpls_forward,
@@ -2485,6 +2532,10 @@ static int __init mpls_init(void)
 		      0);
 	rtnl_register(PF_MPLS, RTM_GETNETCONF, mpls_netconf_get_devconf,
 		      mpls_netconf_dump_devconf, 0);
+	err = ipgre_tunnel_encap_add_mpls_ops();
+	if (err)
+		pr_err("Can't add mpls over gre tunnel ops\n");
+
 	err = 0;
 out:
 	return err;
@@ -2502,6 +2553,7 @@ static void __exit mpls_exit(void)
 	dev_remove_pack(&mpls_packet_type);
 	unregister_netdevice_notifier(&mpls_dev_notifier);
 	unregister_pernet_subsys(&mpls_net_ops);
+	ipgre_tunnel_encap_del_mpls_ops();
 }
 module_exit(mpls_exit);
 
-- 
2.1.4

^ permalink raw reply related

* Re: tc H/W offload issue with vxlan tunnels [was: nfp: flower vxlan tunnel offload]
From: Paolo Abeni @ 2017-09-27  9:46 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Or Gerlitz, Jiri Benc, Simon Horman, David Miller, Jakub Kicinski,
	Linux Netdev List, oss-drivers, John Hurley, Paul Blakey,
	Jiri Pirko, Roi Dayan
In-Reply-To: <20170927091700.GC1944@nanopsycho.orion>

On Wed, 2017-09-27 at 11:17 +0200, Jiri Pirko wrote:
> Wed, Sep 27, 2017 at 10:29:35AM CEST, pabeni@redhat.com wrote:
> > So it looks like the H/W offload hook will still be called with the
> > same arguments in both case, and 'bad' rule will still be pushed to the
> > H/W as the driver itself has no way to distinct between the two
> > scenarios.
> 
> Why "bad"?

Such rule is coped differently by the SW and the HW data path.

a rule like:

tc filter add dev eth0 protocol ip parent ffff: flower \
   enc_key_id 102 enc_dst_port 4789 src_ip 3.4.5.6 skip_hw \
   action action mirred redirect eth0_vf_1

will match 0 packets, while:

tc filter add dev eth0 protocol ip parent ffff: flower \
   enc_key_id 102 enc_dst_port 4789 src_ip 3.4.5.6 skip_sw \
   action action mirred redirect eth0_vf_1

[just flipped 'skip_sw' and 'skip_hw' ]
will match the vxlan-tunneled packets. I understand that one of the
design goal for the h/w offload path is being consistent with the sw
one, but that does not hold in the above scenario.

> Regarding the distinction, driver knows if user add a rule directly to
> the eth0, or if the eth0 is egress device in the action. Those are 2
> separete driver entrypoints - of course, talking about code with my
> changes.

ok, but than each driver should catch the scenario "rule with tunnel
match over non tunnel device" and cope with them properly - never match
it - why don't simply avoiding pushing such rules to the H/W ? 

Cheers,

Paolo

^ permalink raw reply

* Re: tg3 pxe weirdness
From: Berend De Schouwer @ 2017-09-27 10:05 UTC (permalink / raw)
  To: Siva Reddy Kallam; +Cc: Linux Netdev List
In-Reply-To: <CAMet4B6mZ8SyCT7W2j4OBEUDiy3ZupYKWXBFti0nXoxm51_6kg@mail.gmail.com>

On Mon, 2017-09-25 at 15:11 +0530, Siva Reddy Kallam wrote:
> On Fri, Sep 22, 2017 at 9:04 PM, Berend De Schouwer
> <berend.de.schouwer@gmail.com> wrote:
> > On Fri, 2017-09-22 at 11:51 +0530, Siva Reddy Kallam wrote:
> > > 
> > > 
> > > Can you please share below details?
> > > 1) Model and Manufacturer of the system
> > > 2) Linux distro/kernel used?
> > 
> > 4.13.3 gets a little further, but after some more data is
> > transferred
> > the tg3 driver still crashes.  This is unfortunately before I've
> > got a
> > writeable filesystem.
> > 
> > The last line is:
> > tg3 0000:01:00.0: tg3_stop_block timed out, ofs=4c00 enable_bit=2
> > 
> > I've got some ideas to get the full dmesg.
> > 
> > As with the other kernels it works OK on 1Gbps, but not slower
> > switches.
> 
> I am suspecting with link aware mode, the clock speed could be slow
> and boot code does not
> complete within the expected time with lower link speeds. So,
> Providing a patch to override clock.
> Can you please try with attached debug patch and provide us the
> feedback with 100M link?
> If it solves this issue, we will work on proper changes.

This does work on 4.13.3 and PXE for me.

I've tested on 1 Gbps, 100 Mbps and 10 Mbps.  I've done some
preliminary testing (eg. large file copies.)

^ permalink raw reply

* Re: [PATCH v6 05/11] dt-bindings: net: dwmac-sun8i: update documentation about integrated PHY
From: Maxime Ripard @ 2017-09-27 10:15 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: robh+dt-DgEjT+Ai2ygdnm+yROfE0A, mark.rutland-5wv7dgnIgG8,
	wens-jdAy2FN1RRM, linux-I+IVW8TIWO2tmTQ+vhA3Yw,
	catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	peppe.cavallaro-qxv4g6HH51o, alexandre.torgue-qxv4g6HH51o,
	andrew-g2DYL2Zd6BY, f.fainelli-Re5JQEeQqe8AvxtiuMwx3w,
	frowand.list-Re5JQEeQqe8AvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-sunxi-/JYPxA39Uh5TLH3MbocFFw
In-Reply-To: <20170927073414.17361-6-clabbe.montjoie-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 421 bytes --]

Hi,

On Wed, Sep 27, 2017 at 07:34:08AM +0000, Corentin Labbe wrote:
> This patch add documentation about the MDIO switch used on sun8i-h3-emac
> for integrated PHY.
> 
> Signed-off-by: Corentin Labbe <clabbe.montjoie-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

This should be squashed with patch 1.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 1/6] net: dsa: Fix SWITCHDEV_ATTR_ID_PORT_PARENT_ID
From: Sergei Shtylyov @ 2017-09-27 10:15 UTC (permalink / raw)
  To: Andrew Lunn, David Miller; +Cc: Vivien Didelot, netdev
In-Reply-To: <1506464764-12699-2-git-send-email-andrew@lunn.ch>

Hello!

On 9/27/2017 1:25 AM, Andrew Lunn wrote:

> SWITCHDEV_ATTR_ID_PORT_PARENT_ID is used by the software bridge when
> determining which ports to flood a packet out. If the packet
> originated from a switch, it assumes the switch has already flooded
> the packet out the switches ports, so the bridge should not flood the
> packet itself out switch ports. Ports on the same switch are expected
> to return the same parent ID when SWITCHDEV_ATTR_ID_PORT_PARENT_ID is
> called.
> 
> DSA gets this wrong with clusters of switches. As far as the software
> bridge is concerned, the cluster is all one switch. A packet from any
> switch in the cluster can be assumed to of been flooded as needed out

    s/of/have/?

> all ports of the cluster, not just the switch it originated
> from. Hence all ports of a cluster should return the same parent. The
> old implementation did not, each switch in the cluster had its own ID.
> 
> Also wrong was that the ID was not unique if multiple DSA instances
> are in operation.
> 
> Use the tree ID as the parent ID, which is the same for all switches
> in a cluster and unique across switch clusters.
> 
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> ---
> v2: Swap from MAC address to dst->tree
[...]

MBR, Sergei

^ permalink raw reply

* Re: [PATCH v6 06/11] ARM: dts: sunxi: h3/h5: represent the mdio switch used by sun8i-h3-emac
From: Maxime Ripard @ 2017-09-27 10:16 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: robh+dt-DgEjT+Ai2ygdnm+yROfE0A, mark.rutland-5wv7dgnIgG8,
	wens-jdAy2FN1RRM, linux-I+IVW8TIWO2tmTQ+vhA3Yw,
	catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	peppe.cavallaro-qxv4g6HH51o, alexandre.torgue-qxv4g6HH51o,
	andrew-g2DYL2Zd6BY, f.fainelli-Re5JQEeQqe8AvxtiuMwx3w,
	frowand.list-Re5JQEeQqe8AvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-sunxi-/JYPxA39Uh5TLH3MbocFFw
In-Reply-To: <20170927073414.17361-7-clabbe.montjoie-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1816 bytes --]

On Wed, Sep 27, 2017 at 07:34:09AM +0000, Corentin Labbe wrote:
> Since dwmac-sun8i could use either an integrated PHY or an external PHY
> (which could be at same MDIO address), we need to represent this selection
> by a MDIO switch.
> 
> Signed-off-by: Corentin Labbe <clabbe.montjoie-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
>  arch/arm/boot/dts/sunxi-h3-h5.dtsi | 31 +++++++++++++++++++++++++------
>  1 file changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm/boot/dts/sunxi-h3-h5.dtsi b/arch/arm/boot/dts/sunxi-h3-h5.dtsi
> index 3b7d953429a6..a8e9b8f378ba 100644
> --- a/arch/arm/boot/dts/sunxi-h3-h5.dtsi
> +++ b/arch/arm/boot/dts/sunxi-h3-h5.dtsi
> @@ -422,14 +422,33 @@
>  			#size-cells = <0>;
>  			status = "disabled";
>  
> -			mdio: mdio {
> +			mdio0: mdio {
>  				#address-cells = <1>;
>  				#size-cells = <0>;
> -				int_mii_phy: ethernet-phy@1 {
> -					compatible = "ethernet-phy-ieee802.3-c22";
> -					reg = <1>;
> -					clocks = <&ccu CLK_BUS_EPHY>;
> -					resets = <&ccu RST_BUS_EPHY>;
> +				compatible = "snps,dwmac-mdio";
> +
> +				mdio-mux {
> +					compatible = "mdio-mux";
> +					#address-cells = <1>;
> +					#size-cells = <0>;

Newline

> +					/* Only one MDIO is usable at the time */
> +					internal_mdio: mdio@1 {
> +						reg = <1>;
> +						#address-cells = <1>;
> +						#size-cells = <0>;

Newline

> +						int_mii_phy: ethernet-phy@1 {
> +							compatible = "ethernet-phy-ieee802.3-c22";
> +							reg = <1>;
> +							clocks = <&ccu CLK_BUS_EPHY>;
> +							resets = <&ccu RST_BUS_EPHY>;
> +							phy-is-integrated;
> +						};
> +					};

Newline

> +					mdio: mdio@2 {

This is quite confusing. Why not call the label external_mdio?

Thanks

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply

* Re: tg3 pxe weirdness
From: Siva Reddy Kallam @ 2017-09-27 10:31 UTC (permalink / raw)
  To: Berend De Schouwer; +Cc: Linux Netdev List

On Wed, Sep 27, 2017 at 3:35 PM, Berend De Schouwer
<berend.de.schouwer@gmail.com> wrote:
> On Mon, 2017-09-25 at 15:11 +0530, Siva Reddy Kallam wrote:
>> On Fri, Sep 22, 2017 at 9:04 PM, Berend De Schouwer
>> <berend.de.schouwer@gmail.com> wrote:
>> > On Fri, 2017-09-22 at 11:51 +0530, Siva Reddy Kallam wrote:
>> > >
>> > >
>> > > Can you please share below details?
>> > > 1) Model and Manufacturer of the system
>> > > 2) Linux distro/kernel used?
>> >
>> > 4.13.3 gets a little further, but after some more data is
>> > transferred
>> > the tg3 driver still crashes.  This is unfortunately before I've
>> > got a
>> > writeable filesystem.
>> >
>> > The last line is:
>> > tg3 0000:01:00.0: tg3_stop_block timed out, ofs=4c00 enable_bit=2
>> >
>> > I've got some ideas to get the full dmesg.
>> >
>> > As with the other kernels it works OK on 1Gbps, but not slower
>> > switches.
>>
>> I am suspecting with link aware mode, the clock speed could be slow
>> and boot code does not
>> complete within the expected time with lower link speeds. So,
>> Providing a patch to override clock.
>> Can you please try with attached debug patch and provide us the
>> feedback with 100M link?
>> If it solves this issue, we will work on proper changes.
>
> This does work on 4.13.3 and PXE for me.
>
> I've tested on 1 Gbps, 100 Mbps and 10 Mbps.  I've done some
> preliminary testing (eg. large file copies.)

Good. We will work on required changes and upstream proper patch after
sanity test with multiple speeds.

^ permalink raw reply

* Re: [PATCH net-next 2/2] tools: bpf: add bpftool
From: Jesper Dangaard Brouer @ 2017-09-27 10:45 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: brouer, Alexei Starovoitov, netdev, daniel, davem, hannes,
	dsahern, oss-drivers, linux-doc@vger.kernel.org
In-Reply-To: <20170927000208.4396dfb7@cakuba>

On Wed, 27 Sep 2017 00:02:08 +0100
Jakub Kicinski <jakub.kicinski@netronome.com> wrote:

> On Tue, 26 Sep 2017 15:24:06 -0700, Alexei Starovoitov wrote:
> > On Tue, Sep 26, 2017 at 08:35:22AM -0700, Jakub Kicinski wrote:  
> > > Add a simple tool for querying and updating BPF objects on the system.
> > > 
> > > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > > Reviewed-by: Simon Horman <simon.horman@netronome.com>
[...]
> > >  tools/bpf/Makefile             |  18 +-
> > >  tools/bpf/bpftool/Makefile     |  80 +++++
> > >  tools/bpf/bpftool/common.c     | 214 ++++++++++++
> > >  tools/bpf/bpftool/jit_disasm.c |  83 +++++
> > >  tools/bpf/bpftool/main.c       | 212 ++++++++++++
> > >  tools/bpf/bpftool/main.h       |  99 ++++++
> > >  tools/bpf/bpftool/map.c        | 742 +++++++++++++++++++++++++++++++++++++++++
> > >  tools/bpf/bpftool/prog.c       | 392 ++++++++++++++++++++++
> > >  8 files changed, 1837 insertions(+), 3 deletions(-)    
> > ...  
> > > +static int do_help(int argc, char **argv)
> > > +{
> > > +	fprintf(stderr,
> > > +		"Usage: %s %s show   [MAP]\n"
> > > +		"       %s %s dump    MAP\n"
> > > +		"       %s %s update  MAP  key BYTES value VALUE [UPDATE_FLAGS]\n"
> > > +		"       %s %s lookup  MAP  key BYTES\n"
> > > +		"       %s %s getnext MAP [key BYTES]\n"
> > > +		"       %s %s delete  MAP  key BYTES\n"
> > > +		"       %s %s pin     MAP  FILE\n"
> > > +		"       %s %s help\n"
> > > +		"\n"
> > > +		"       MAP := { id MAP_ID | pinned FILE }\n"
> > > +		"       " HELP_SPEC_PROGRAM "\n"
> > > +		"       VALUE := { BYTES | MAP | PROG }\n"
> > > +		"       UPDATE_FLAGS := { any | exist | noexist }\n"
> > > +		"",    
> > 
> > overall looks good to me, but still difficult to grasp how to use it.
> > Can you add README with example usage and expected output?  
> 
> I have a README on GitHub, but I was thinking about perhaps writing a
> proper man page?  Do you prefer one over the other?

I would prefer adding a README.rst file, in RST-format, as the rest of
the kernel documentation is moving in that direction[1] (your github
version is in README.md format).  A man page will always be
out-of-sync, and even out-of-sync on different distros.

 See[1]: https://www.kernel.org/doc/html/latest/

And then I would find some place in Documentation/admin-guide/ and
include the README.rst file, so it shows up at [1].

RST have an include method like:

.. include:: ../../tools/bpf/bpftool/README.rst

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* [PATCH 4/4] net: stmmac: fixing DMA reset sleep and timeout values
From: Emiliano Ingrassia @ 2017-09-27 10:46 UTC (permalink / raw)
  To: Giuseppe Cavallaro, Alexandre Torgue, netdev, linux-amlogic
In-Reply-To: <cover.1506507688.git.ingrassia@epigenesys.com>

This patch fixes the sleep and timeout values used during
DMA reset, which were inverted in a previous patch.

Fixes: 8a70aeca80c2 ("net: stmmac: Use readl_poll_timeout")

Signed-off-by: Emiliano Ingrassia <ingrassia@epigenesys.com>
---
 drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c b/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c
index 67af0bdd7f10..7516ca210855 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c
@@ -34,7 +34,7 @@ int dwmac_dma_reset(void __iomem *ioaddr)
 
 	err = readl_poll_timeout(ioaddr + DMA_BUS_MODE, value,
 				 !(value & DMA_BUS_MODE_SFT_RESET),
-				 100000, 10000);
+				 10000, 100000);
 	if (err)
 		return -EBUSY;
 
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH net-next 2/2] tools: bpf: add bpftool
From: Daniel Borkmann @ 2017-09-27 10:55 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Jakub Kicinski
  Cc: Alexei Starovoitov, netdev, davem, hannes, dsahern, oss-drivers,
	linux-doc@vger.kernel.org
In-Reply-To: <20170927124511.650870bc@redhat.com>

On 09/27/2017 12:45 PM, Jesper Dangaard Brouer wrote:
> On Wed, 27 Sep 2017 00:02:08 +0100
> Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
>> On Tue, 26 Sep 2017 15:24:06 -0700, Alexei Starovoitov wrote:
>>> On Tue, Sep 26, 2017 at 08:35:22AM -0700, Jakub Kicinski wrote:
>>>> Add a simple tool for querying and updating BPF objects on the system.
>>>>
>>>> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>>>> Reviewed-by: Simon Horman <simon.horman@netronome.com>
> [...]
>>>>   tools/bpf/Makefile             |  18 +-
>>>>   tools/bpf/bpftool/Makefile     |  80 +++++
>>>>   tools/bpf/bpftool/common.c     | 214 ++++++++++++
>>>>   tools/bpf/bpftool/jit_disasm.c |  83 +++++
>>>>   tools/bpf/bpftool/main.c       | 212 ++++++++++++
>>>>   tools/bpf/bpftool/main.h       |  99 ++++++
>>>>   tools/bpf/bpftool/map.c        | 742 +++++++++++++++++++++++++++++++++++++++++
>>>>   tools/bpf/bpftool/prog.c       | 392 ++++++++++++++++++++++
>>>>   8 files changed, 1837 insertions(+), 3 deletions(-)
>>> ...
>>>> +static int do_help(int argc, char **argv)
>>>> +{
>>>> +	fprintf(stderr,
>>>> +		"Usage: %s %s show   [MAP]\n"
>>>> +		"       %s %s dump    MAP\n"
>>>> +		"       %s %s update  MAP  key BYTES value VALUE [UPDATE_FLAGS]\n"
>>>> +		"       %s %s lookup  MAP  key BYTES\n"
>>>> +		"       %s %s getnext MAP [key BYTES]\n"
>>>> +		"       %s %s delete  MAP  key BYTES\n"
>>>> +		"       %s %s pin     MAP  FILE\n"
>>>> +		"       %s %s help\n"
>>>> +		"\n"
>>>> +		"       MAP := { id MAP_ID | pinned FILE }\n"
>>>> +		"       " HELP_SPEC_PROGRAM "\n"
>>>> +		"       VALUE := { BYTES | MAP | PROG }\n"
>>>> +		"       UPDATE_FLAGS := { any | exist | noexist }\n"
>>>> +		"",
>>>
>>> overall looks good to me, but still difficult to grasp how to use it.
>>> Can you add README with example usage and expected output?
>>
>> I have a README on GitHub, but I was thinking about perhaps writing a
>> proper man page?  Do you prefer one over the other?
>
> I would prefer adding a README.rst file, in RST-format, as the rest of
> the kernel documentation is moving in that direction[1] (your github
> version is in README.md format).  A man page will always be
> out-of-sync, and even out-of-sync on different distros.
>
>   See[1]: https://www.kernel.org/doc/html/latest/
>
> And then I would find some place in Documentation/admin-guide/ and
> include the README.rst file, so it shows up at [1].
>
> RST have an include method like:
>
> .. include:: ../../tools/bpf/bpftool/README.rst

Agree, to have a README.rst with a description/howto in bpftool/
seems like a good idea to me, too.

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH net-next 2/2] tools: bpf: add bpftool
From: Jakub Kicinski @ 2017-09-27 10:57 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Alexei Starovoitov, netdev, daniel, davem, hannes, dsahern,
	oss-drivers, linux-doc@vger.kernel.org
In-Reply-To: <20170927124511.650870bc@redhat.com>

On Wed, 27 Sep 2017 12:45:11 +0200, Jesper Dangaard Brouer wrote:
> On Wed, 27 Sep 2017 00:02:08 +0100
> Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> 
> > On Tue, 26 Sep 2017 15:24:06 -0700, Alexei Starovoitov wrote:  
> > > On Tue, Sep 26, 2017 at 08:35:22AM -0700, Jakub Kicinski wrote:    
> > > > Add a simple tool for querying and updating BPF objects on the system.
> > > > 
> > > > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > > > Reviewed-by: Simon Horman <simon.horman@netronome.com>  
> [...]
> > > >  tools/bpf/Makefile             |  18 +-
> > > >  tools/bpf/bpftool/Makefile     |  80 +++++
> > > >  tools/bpf/bpftool/common.c     | 214 ++++++++++++
> > > >  tools/bpf/bpftool/jit_disasm.c |  83 +++++
> > > >  tools/bpf/bpftool/main.c       | 212 ++++++++++++
> > > >  tools/bpf/bpftool/main.h       |  99 ++++++
> > > >  tools/bpf/bpftool/map.c        | 742 +++++++++++++++++++++++++++++++++++++++++
> > > >  tools/bpf/bpftool/prog.c       | 392 ++++++++++++++++++++++
> > > >  8 files changed, 1837 insertions(+), 3 deletions(-)      
> > > ...    
> > > > +static int do_help(int argc, char **argv)
> > > > +{
> > > > +	fprintf(stderr,
> > > > +		"Usage: %s %s show   [MAP]\n"
> > > > +		"       %s %s dump    MAP\n"
> > > > +		"       %s %s update  MAP  key BYTES value VALUE [UPDATE_FLAGS]\n"
> > > > +		"       %s %s lookup  MAP  key BYTES\n"
> > > > +		"       %s %s getnext MAP [key BYTES]\n"
> > > > +		"       %s %s delete  MAP  key BYTES\n"
> > > > +		"       %s %s pin     MAP  FILE\n"
> > > > +		"       %s %s help\n"
> > > > +		"\n"
> > > > +		"       MAP := { id MAP_ID | pinned FILE }\n"
> > > > +		"       " HELP_SPEC_PROGRAM "\n"
> > > > +		"       VALUE := { BYTES | MAP | PROG }\n"
> > > > +		"       UPDATE_FLAGS := { any | exist | noexist }\n"
> > > > +		"",      
> > > 
> > > overall looks good to me, but still difficult to grasp how to use it.
> > > Can you add README with example usage and expected output?    
> > 
> > I have a README on GitHub, but I was thinking about perhaps writing a
> > proper man page?  Do you prefer one over the other?  
> 
> I would prefer adding a README.rst file, in RST-format, as the rest of
> the kernel documentation is moving in that direction[1] (your github
> version is in README.md format).  A man page will always be
> out-of-sync, and even out-of-sync on different distros.
> 
>  See[1]: https://www.kernel.org/doc/html/latest/
> 
> And then I would find some place in Documentation/admin-guide/ and
> include the README.rst file, so it shows up at [1].
> 
> RST have an include method like:
> 
> .. include:: ../../tools/bpf/bpftool/README.rst

Can the docs in new format be rendered into a man page?  Call me old
fashioned but I think we should provide some form of a man page.. :)

^ permalink raw reply

* Re: [PATCH v2 net-next 2/2] net/sched: allow flower to match tunnel options
From: Jiri Pirko @ 2017-09-27 11:08 UTC (permalink / raw)
  To: Simon Horman
  Cc: David Miller, Jiri Pirko, Jamal Hadi Salim, Cong Wang, netdev,
	oss-drivers
In-Reply-To: <20170927092732.GC25449@vergenet.net>

Wed, Sep 27, 2017 at 11:27:33AM CEST, simon.horman@netronome.com wrote:
>On Wed, Sep 27, 2017 at 11:10:05AM +0200, Jiri Pirko wrote:
>> Wed, Sep 27, 2017 at 10:16:34AM CEST, simon.horman@netronome.com wrote:
>> >Allow matching on options in tunnel headers.
>> >This makes use of existing tunnel metadata support.
>> >
>> >Options are a bytestring of up to 256 bytes.
>> >Tunnel implementations may support less or more options,
>> >or no options at all.
>> >
>> >e.g.
>> > # ip link add name geneve0 type geneve dstport 0 external
>> > # tc qdisc add dev geneve0 ingress
>> > # tc filter add dev geneve0 protocol ip parent ffff: \
>> >     flower \
>> >       enc_src_ip 10.0.99.192 \
>> >       enc_dst_ip 10.0.99.193 \
>> >       enc_key_id 11 \
>> >       enc_opts 0102800100800020/fffffffffffffff0 \
>> >       ip_proto udp \
>> >       action mirred egress redirect dev eth1
>> >
>> >Signed-off-by: Simon Horman <simon.horman@netronome.com>
>> >Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>> >
>> >---
>> >v2
>> >* Correct example which was incorrectly described setting rather
>> >  than matching tunnel options
>> >---
>> > include/net/flow_dissector.h | 13 +++++++++++++
>> > include/uapi/linux/pkt_cls.h |  3 +++
>> > net/sched/cls_flower.c       | 35 ++++++++++++++++++++++++++++++++++-
>> > 3 files changed, 50 insertions(+), 1 deletion(-)
>> >
>> >diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
>> >index fc3dce730a6b..43f98bf0b349 100644
>> >--- a/include/net/flow_dissector.h
>> >+++ b/include/net/flow_dissector.h
>> >@@ -183,6 +183,18 @@ struct flow_dissector_key_ip {
>> > 	__u8	ttl;
>> > };
>> > 
>> >+/**
>> >+ * struct flow_dissector_key_enc_opts:
>> >+ * @data: data
>> >+ * @len: len
>> >+ */
>> >+struct flow_dissector_key_enc_opts {
>> >+	u8 data[256];	/* Using IP_TUNNEL_OPTS_MAX is desired here
>> >+			 * but seems difficult to #include
>> >+			 */
>> >+	u8 len;
>> >+};
>> >+
>> > enum flow_dissector_key_id {
>> > 	FLOW_DISSECTOR_KEY_CONTROL, /* struct flow_dissector_key_control */
>> > 	FLOW_DISSECTOR_KEY_BASIC, /* struct flow_dissector_key_basic */
>> >@@ -205,6 +217,7 @@ enum flow_dissector_key_id {
>> > 	FLOW_DISSECTOR_KEY_MPLS, /* struct flow_dissector_key_mpls */
>> > 	FLOW_DISSECTOR_KEY_TCP, /* struct flow_dissector_key_tcp */
>> > 	FLOW_DISSECTOR_KEY_IP, /* struct flow_dissector_key_ip */
>> >+	FLOW_DISSECTOR_KEY_ENC_OPTS, /* struct flow_dissector_key_enc_opts */
>> 
>> I don't see the actual dissection implementation. Where is it?
>> Did you test the patchset?
>
>Yes, I did test it. But it is also possible something went astray along the
>way and I will retest.
>
>I think that the code you are looking for is in
>fl_classify() in this patch.

The dissection should be done in the flow_dissector. That's the whole
point in having it generic. You should move it there.

^ permalink raw reply

* Re: tc H/W offload issue with vxlan tunnels [was: nfp: flower vxlan tunnel offload]
From: Jiri Pirko @ 2017-09-27 11:11 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Or Gerlitz, Jiri Benc, Simon Horman, David Miller, Jakub Kicinski,
	Linux Netdev List, oss-drivers, John Hurley, Paul Blakey,
	Jiri Pirko, Roi Dayan
In-Reply-To: <1506505618.2867.34.camel@redhat.com>

Wed, Sep 27, 2017 at 11:46:58AM CEST, pabeni@redhat.com wrote:
>On Wed, 2017-09-27 at 11:17 +0200, Jiri Pirko wrote:
>> Wed, Sep 27, 2017 at 10:29:35AM CEST, pabeni@redhat.com wrote:
>> > So it looks like the H/W offload hook will still be called with the
>> > same arguments in both case, and 'bad' rule will still be pushed to the
>> > H/W as the driver itself has no way to distinct between the two
>> > scenarios.
>> 
>> Why "bad"?
>
>Such rule is coped differently by the SW and the HW data path.
>
>a rule like:
>
>tc filter add dev eth0 protocol ip parent ffff: flower \
>   enc_key_id 102 enc_dst_port 4789 src_ip 3.4.5.6 skip_hw \
>   action action mirred redirect eth0_vf_1
>
>will match 0 packets, while:
>
>tc filter add dev eth0 protocol ip parent ffff: flower \
>   enc_key_id 102 enc_dst_port 4789 src_ip 3.4.5.6 skip_sw \
>   action action mirred redirect eth0_vf_1
>
>[just flipped 'skip_sw' and 'skip_hw' ]
>will match the vxlan-tunneled packets. I understand that one of the
>design goal for the h/w offload path is being consistent with the sw
>one, but that does not hold in the above scenario.

Sure, the consistency is important. Howcome "skip_hw" won't match and
"skip_sw" will match? What's different?


>
>> Regarding the distinction, driver knows if user add a rule directly to
>> the eth0, or if the eth0 is egress device in the action. Those are 2
>> separete driver entrypoints - of course, talking about code with my
>> changes.
>
>ok, but than each driver should catch the scenario "rule with tunnel
>match over non tunnel device" and cope with them properly - never match
>it - why don't simply avoiding pushing such rules to the H/W ? 
>
>Cheers,
>
>Paolo

^ permalink raw reply

* [RFT] lan78xx: FIX use-after-free in lan78xx_write_reg
From: Arvind Yadav @ 2017-09-27 11:18 UTC (permalink / raw)
  To: davem, woojung.huh, UNGLinuxDriver, netdev, andreyknvl, kcc,
	dvyukov, Nisar.Sayed
  Cc: linux-usb, linux-kernel, syzkaller

We are not releasing 'buf' memory on failure or disconnect a device.

Adding 'u8 *buf' as part of 'lan78xx_net' structure to make proper
handle for 'buf'.
Now releasing 'buf' memory on failure. It's allocate first in
lan78xx_probe() and it should be freed last in lan78xx_disconnect().

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
---
 drivers/net/usb/lan78xx.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index b99a7fb..e653982 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -402,6 +402,7 @@ struct lan78xx_net {
 	struct statstage	stats;
 
 	struct irq_domain_data	domain_data;
+	u8 *buf;
 };
 
 /* define external phy id */
@@ -3470,6 +3471,9 @@ static void lan78xx_disconnect(struct usb_interface *intf)
 
 	usb_scuttle_anchored_urbs(&dev->deferred);
 
+	kfree(dev->buf);
+	dev->buf = NULL;
+
 	lan78xx_unbind(dev, intf);
 
 	usb_kill_urb(dev->urb_intr);
@@ -3520,7 +3524,6 @@ static int lan78xx_probe(struct usb_interface *intf,
 	int ret;
 	unsigned maxp;
 	unsigned period;
-	u8 *buf = NULL;
 
 	udev = interface_to_usbdev(intf);
 	udev = usb_get_dev(udev);
@@ -3588,16 +3591,15 @@ static int lan78xx_probe(struct usb_interface *intf,
 	period = dev->ep_intr->desc.bInterval;
 
 	maxp = usb_maxpacket(dev->udev, dev->pipe_intr, 0);
-	buf = kmalloc(maxp, GFP_KERNEL);
-	if (buf) {
+	dev->buf = kmalloc(maxp, GFP_KERNEL);
+	if (dev->buf) {
 		dev->urb_intr = usb_alloc_urb(0, GFP_KERNEL);
 		if (!dev->urb_intr) {
 			ret = -ENOMEM;
-			kfree(buf);
 			goto out3;
 		} else {
 			usb_fill_int_urb(dev->urb_intr, dev->udev,
-					 dev->pipe_intr, buf, maxp,
+					 dev->pipe_intr, dev->buf, maxp,
 					 intr_complete, dev, period);
 		}
 	}
@@ -3626,6 +3628,8 @@ static int lan78xx_probe(struct usb_interface *intf,
 	return 0;
 
 out3:
+	kfree(dev->buf);
+	dev->buf = NULL;
 	lan78xx_unbind(dev, intf);
 out2:
 	free_netdev(netdev);
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH net-next 2/2] tools: bpf: add bpftool
From: Jesper Dangaard Brouer @ 2017-09-27 11:19 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, netdev, daniel, davem, hannes, dsahern,
	oss-drivers, linux-doc@vger.kernel.org, brouer
In-Reply-To: <20170927035742.5c9ee8c3@cakuba>

On Wed, 27 Sep 2017 03:57:42 -0700
Jakub Kicinski <jakub.kicinski@netronome.com> wrote:

> On Wed, 27 Sep 2017 12:45:11 +0200, Jesper Dangaard Brouer wrote:
> > On Wed, 27 Sep 2017 00:02:08 +0100
> > Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> >   
> > > On Tue, 26 Sep 2017 15:24:06 -0700, Alexei Starovoitov wrote:    
> > > > On Tue, Sep 26, 2017 at 08:35:22AM -0700, Jakub Kicinski wrote:      
> > > > > Add a simple tool for querying and updating BPF objects on the system.
> > > > > 
> > > > > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > > > > Reviewed-by: Simon Horman <simon.horman@netronome.com>    
> > [...]  
> > > > >  tools/bpf/Makefile             |  18 +-
> > > > >  tools/bpf/bpftool/Makefile     |  80 +++++
> > > > >  tools/bpf/bpftool/common.c     | 214 ++++++++++++
> > > > >  tools/bpf/bpftool/jit_disasm.c |  83 +++++
> > > > >  tools/bpf/bpftool/main.c       | 212 ++++++++++++
> > > > >  tools/bpf/bpftool/main.h       |  99 ++++++
> > > > >  tools/bpf/bpftool/map.c        | 742 +++++++++++++++++++++++++++++++++++++++++
> > > > >  tools/bpf/bpftool/prog.c       | 392 ++++++++++++++++++++++
> > > > >  8 files changed, 1837 insertions(+), 3 deletions(-)        
> > > > ...      
> > > > > +static int do_help(int argc, char **argv)
> > > > > +{
> > > > > +	fprintf(stderr,
> > > > > +		"Usage: %s %s show   [MAP]\n"
> > > > > +		"       %s %s dump    MAP\n"
> > > > > +		"       %s %s update  MAP  key BYTES value VALUE [UPDATE_FLAGS]\n"
> > > > > +		"       %s %s lookup  MAP  key BYTES\n"
> > > > > +		"       %s %s getnext MAP [key BYTES]\n"
> > > > > +		"       %s %s delete  MAP  key BYTES\n"
> > > > > +		"       %s %s pin     MAP  FILE\n"
> > > > > +		"       %s %s help\n"
> > > > > +		"\n"
> > > > > +		"       MAP := { id MAP_ID | pinned FILE }\n"
> > > > > +		"       " HELP_SPEC_PROGRAM "\n"
> > > > > +		"       VALUE := { BYTES | MAP | PROG }\n"
> > > > > +		"       UPDATE_FLAGS := { any | exist | noexist }\n"
> > > > > +		"",        
> > > > 
> > > > overall looks good to me, but still difficult to grasp how to use it.
> > > > Can you add README with example usage and expected output?      
> > > 
> > > I have a README on GitHub, but I was thinking about perhaps writing a
> > > proper man page?  Do you prefer one over the other?    
> > 
> > I would prefer adding a README.rst file, in RST-format, as the rest of
> > the kernel documentation is moving in that direction[1] (your github
> > version is in README.md format).  A man page will always be
> > out-of-sync, and even out-of-sync on different distros.
> > 
> >  See[1]: https://www.kernel.org/doc/html/latest/
> > 
> > And then I would find some place in Documentation/admin-guide/ and
> > include the README.rst file, so it shows up at [1].
> > 
> > RST have an include method like:
> > 
> > .. include:: ../../tools/bpf/bpftool/README.rst  
> 
> Can the docs in new format be rendered into a man page?  Call me old
> fashioned but I think we should provide some form of a man page.. :)

Yes, simply create the man page like:

 rst2man README.rst README.man

You can add this to your local makefile.

The standard sphinx build can also generate man-pages, but it have been
removed from the kernel makefile targets:

Documentation targets:
 Linux kernel internal documentation in different formats from ReST:
  htmldocs        - HTML
  latexdocs       - LaTeX
  pdfdocs         - PDF
  epubdocs        - EPUB
  xmldocs         - XML
  linkcheckdocs   - check for broken external links (will connect to external hosts)
  cleandocs       - clean all generated files

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH v2 02/16] thunderbolt: Add support for XDomain properties
From: Mika Westerberg @ 2017-09-27 11:32 UTC (permalink / raw)
  To: David Miller
  Cc: gregkh, andreas.noever, michael.jamet, yehezkel.bernat,
	amir.jer.levy, Mario.Limonciello, lukas, andriy.shevchenko,
	andrew, linux-kernel, netdev
In-Reply-To: <20170926.213354.305351416790211828.davem@davemloft.net>

On Tue, Sep 26, 2017 at 09:33:54PM -0700, David Miller wrote:
> From: Mika Westerberg <mika.westerberg@linux.intel.com>
> Date: Mon, 25 Sep 2017 14:07:24 +0300
> 
> > +struct tb_property_entry {
> > +	u32 key_hi;
> > +	u32 key_lo;
> > +	u16 length;
> > +	u8 reserved;
> > +	u8 type;
> > +	u32 value;
> > +} __packed;
> > +
> > +struct tb_property_rootdir_entry {
> > +	u32 magic;
> > +	u32 length;
> > +	struct tb_property_entry entries[];
> > +} __packed;
> > +
> > +struct tb_property_dir_entry {
> > +	u32 uuid[4];
> > +	struct tb_property_entry entries[];
> > +} __packed;
> 
> There is no apparent need for __packed here, and __packed should be
> avoided unless absolutely necessary as it pessimizes the code
> significantly on some architectures.
> 
> Please remove __packed from these datastructures unless you can
> prove it is absolutely needed and, in such case, please document
> in a comment why that requirement exists.  Because from the layout
> of these types, everything will be packed in just fine without
> __packed.

I will thanks.

Just for my education, is there some rule which tells when __packed is
to be used? For example the above structures are all 32-bit aligned but
how about something like:

struct foo {
	u32 value1;
	u8 value2;
};

If the on-wire format requires such structures I assume __packed
is needed here?

Thanks!

^ permalink raw reply

* Re: [PATCH net] i40e: Fix limit imprecise of the number of MAC/VLAN that can be added for VFs
From: Sergei Shtylyov @ 2017-09-27 11:33 UTC (permalink / raw)
  To: w00273186, davem, jeffrey.t.kirsher; +Cc: netdev, intel-wired-lan, caihe
In-Reply-To: <1506495497-10736-1-git-send-email-wangyunjian@huawei.com>

Hello!

On 9/27/2017 9:58 AM, w00273186 wrote:

> From: Yunjian Wang <wangyunjian@huawei.com>
> 
> Now it don't limit the number of MAC/VLAN strictly. When there is more

    Doesn't.

> elements in the virtchnl MAC/VLAN list, it can still add successfully.
> 
> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>

[...]

MBR, Sergei

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox