Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next] net: cache skb_shinfo() in skb_try_coalesce()
From: David Miller @ 2017-10-04 18:34 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1507139315.14419.5.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 04 Oct 2017 10:48:35 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> Compiler does not really know that skb_shinfo(to|from) are constants
> in skb_try_coalesce(), lets cache their values to shrink code.
> 
> We might even take care of skb_zcopy() calls later.
> 
> $ size net/core/skbuff.o.before net/core/skbuff.o
>    text	   data	    bss	    dec	    hex	filename
>   40727	   1298	      0	  42025	   a429	net/core/skbuff.o.before
>   40631	   1298	      0	  41929	   a3c9	net/core/skbuff.o
> 
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next v3 3/3] tools: bpftool: add documentation
From: Jesper Dangaard Brouer @ 2017-10-04 18:36 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: brouer, netdev, alexei.starovoitov, daniel, dsahern, oss-drivers,
	David Beckett, linux-doc@vger.kernel.org
In-Reply-To: <20171004154032.11413-4-jakub.kicinski@netronome.com>

On Wed,  4 Oct 2017 08:40:32 -0700
Jakub Kicinski <jakub.kicinski@netronome.com> wrote:

> Add documentation for bpftool.  Separate files for each subcommand.
> Use rst format.  Documentation is compiled into man pages using
> rst2man.
> 
> Signed-off-by: David Beckett <david.beckett@netronome.com>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Acked-by: Alexei Starovoitov <ast@kernel.org>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  tools/bpf/bpftool/Documentation/Makefile         |  34 +++++++
>  tools/bpf/bpftool/Documentation/bpftool-map.txt  | 110 +++++++++++++++++++++++
>  tools/bpf/bpftool/Documentation/bpftool-prog.txt |  79 ++++++++++++++++
>  tools/bpf/bpftool/Documentation/bpftool.txt      |  34 +++++++

RST-format files are usually called .rst and not .txt

This is useful when people happen to browse the code via github, then they get formatted nicely e.g.:
 https://github.com/torvalds/linux/blob/master/samples/bpf/README.rst
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH net-next 5/7] net: bonding: Add extack messages for some enslave failures
From: Jiri Pirko @ 2017-10-04 18:39 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, j.vosburgh, vfalico, andy, jiri, idosch, davem, bridge
In-Reply-To: <1d18c43d-0604-3d38-2b68-ca3c7d1ab754@gmail.com>

Wed, Oct 04, 2017 at 08:06:16PM CEST, dsahern@gmail.com wrote:
>On 10/4/17 11:04 AM, Jiri Pirko wrote:
>> Wed, Oct 04, 2017 at 05:35:46PM CEST, dsahern@gmail.com wrote:
>>> On 10/3/17 11:38 PM, Jiri Pirko wrote:
>>>> Wed, Oct 04, 2017 at 06:58:52AM CEST, dsahern@gmail.com wrote:
>>>>> A number of bond_enslave errors are logged using the netdev_err API.
>>>>> Return those messages to userspace via the extack facility.
>>>>>
>>>>> Signed-off-by: David Ahern <dsahern@gmail.com>
>>>>> ---
>>>>> drivers/net/bonding/bond_main.c | 12 ++++++++++++
>>>>> 1 file changed, 12 insertions(+)
>>>>>
>>>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>>>> index bc92307c2082..6688dc9154e0 100644
>>>>> --- a/drivers/net/bonding/bond_main.c
>>>>> +++ b/drivers/net/bonding/bond_main.c
>>>>> @@ -1348,12 +1348,15 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
>>>>>
>>>>> 	/* already in-use? */
>>>>> 	if (netdev_is_rx_handler_busy(slave_dev)) {
>>>>> +		NL_SET_ERR_MSG(extack,
>>>>> +			       "Device is in use and cannot be enslaved");
>>>>
>>>> Please don't do this kind of wrapping. Just let the string be on the
>>>> same line.
>>>>
>>>
>>> Ok, I will do that for bonding only since it is the existing style.
>> 
>> I don't believe you need to do this wrap for any code. Just don't wrap.
>> General code stype says no wrap for strings :)
>> 
>
>I do not break / wrap strings; they need to be searchable. I assumed you
>meant this is preferred for bonding:
>
>NL_SET_ERR_MSG(extack, "Device is in use and cannot be enslaved");

Yep.

>
>
>over what I have done:
>
>NL_SET_ERR_MSG(extack,
>	       "Device is in use and cannot be enslaved");

^ permalink raw reply

* Re: [PATCH] nfp: convert nfp_eth_set_bit_config() into a macro
From: Matthias Kaehlcke @ 2017-10-04 18:44 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Manoj Gupta, David S . Miller, Simon Horman, Dirk van der Merwe,
	oss-drivers, netdev, linux-kernel, Renato Golin, Guenter Roeck,
	Doug Anderson
In-Reply-To: <20171004111724.29a4d7ff@cakuba.netronome.com>

El Wed, Oct 04, 2017 at 11:17:24AM -0700 Jakub Kicinski ha dit:

> On Wed, 4 Oct 2017 10:42:42 -0700, Manoj Gupta wrote:
> > Hi Jakub,
> > 
> > I had discussed about supporting this code with some clang developers.
> > However, the consensus was this code relies on a specific GCC optimizer
> > behavior and Clang does not share the same behavior by design.
> 
> Hm.  I find surprising that constant propagation to inlined functions
> is considered something GCC-specific rather than obvious/basic.
> 
> > Note that even GCC can't compile this code at -O0.
> 
> Yes, that makes me feel uncomfortable...  Could you try this?

The kernel as a whole does not build with -O0, so I couldn't try
that. I know Manoj (who isn't a kernel dev) isolated the
compiletime_assert macros and encountered the same 'undefined
reference' errors with gcc -O0 that we are seeing with clang.
This is due to:

 "if you use it in an inlined function and pass an argument of the
 function as the argument to the built-in, GCC never returns 1 when
 you call the inline function with a string constant or compound
 literal (see Compound Literals) and does not return 1 when you pass a
 constant numeric value to the inline function unless you specify the
 -O option."

https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gcc/Other-Builtins.html

> diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
> index f6f7c085f8e0..47251396fcae 100644
> --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
> +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
> @@ -469,10 +469,10 @@ int nfp_eth_set_configured(struct nfp_cpp *cpp, unsigned int idx, bool configed)
>  	return nfp_eth_config_commit_end(nsp);
>  }
>  
> -/* Force inline, FIELD_* macroes require masks to be compilation-time known */
> -static __always_inline int
> +static int
>  nfp_eth_set_bit_config(struct nfp_nsp *nsp, unsigned int raw_idx,
> -		       const u64 mask, unsigned int val, const u64 ctrl_bit)
> +		       const u64 mask, const unsigned int shift,
> +		       unsigned int val, const u64 ctrl_bit)
>  {
>  	union eth_table_entry *entries = nfp_nsp_config_entries(nsp);
>  	unsigned int idx = nfp_nsp_config_idx(nsp);
> @@ -489,11 +489,11 @@ nfp_eth_set_bit_config(struct nfp_nsp *nsp, unsigned int raw_idx,
>  
>  	/* Check if we are already in requested state */
>  	reg = le64_to_cpu(entries[idx].raw[raw_idx]);
> -	if (val == FIELD_GET(mask, reg))
> +	if (val == (reg & mask) >> shift)
>  		return 0;
>  
>  	reg &= ~mask;
> -	reg |= FIELD_PREP(mask, val);
> +	reg |= (val << shift) & mask;
>  	entries[idx].raw[raw_idx] = cpu_to_le64(reg);
>  
>  	entries[idx].control |= cpu_to_le64(ctrl_bit);
> @@ -503,6 +503,13 @@ nfp_eth_set_bit_config(struct nfp_nsp *nsp, unsigned int raw_idx,
>  	return 0;
>  }
>  
> +#define NFP_ETH_SET_BIT_CONFIG(nsp, raw_idx, mask, val, ctrl_bit)	\
> +	({								\
> +		__BF_FIELD_CHECK(mask, 0ULL, val, "NFP_ETH_SET_BIT_CONFIG: "); \
> +		nfp_eth_set_bit_config(nsp, raw_idx, mask, __bf_shf(mask), \
> +				       val, ctrl_bit);			\
> +	})
> +
>  /**
>   * __nfp_eth_set_aneg() - set PHY autonegotiation control bit
>   * @nsp:	NFP NSP handle returned from nfp_eth_config_start()
> @@ -515,7 +522,7 @@ nfp_eth_set_bit_config(struct nfp_nsp *nsp, unsigned int raw_idx,
>   */
>  int __nfp_eth_set_aneg(struct nfp_nsp *nsp, enum nfp_eth_aneg mode)
>  {
> -	return nfp_eth_set_bit_config(nsp, NSP_ETH_RAW_STATE,
> +	return NFP_ETH_SET_BIT_CONFIG(nsp, NSP_ETH_RAW_STATE,
>  				      NSP_ETH_STATE_ANEG, mode,
>  				      NSP_ETH_CTRL_SET_ANEG);
>  }
> @@ -544,7 +551,7 @@ int __nfp_eth_set_speed(struct nfp_nsp *nsp, unsigned int speed)
>  		return -EINVAL;
>  	}
>  
> -	return nfp_eth_set_bit_config(nsp, NSP_ETH_RAW_STATE,
> +	return NFP_ETH_SET_BIT_CONFIG(nsp, NSP_ETH_RAW_STATE,
>  				      NSP_ETH_STATE_RATE, rate,
>  				      NSP_ETH_CTRL_SET_RATE);
>  }
> @@ -561,6 +568,6 @@ int __nfp_eth_set_speed(struct nfp_nsp *nsp, unsigned int speed)
>   */
>  int __nfp_eth_set_split(struct nfp_nsp *nsp, unsigned int lanes)
>  {
> -	return nfp_eth_set_bit_config(nsp, NSP_ETH_RAW_PORT, NSP_ETH_PORT_LANES,
> +	return NFP_ETH_SET_BIT_CONFIG(nsp, NSP_ETH_RAW_PORT, NSP_ETH_PORT_LANES,
>  				      lanes, NSP_ETH_CTRL_SET_LANES);
>  }

^ permalink raw reply

* Re: [PATCH] nfp: convert nfp_eth_set_bit_config() into a macro
From: Matthias Kaehlcke @ 2017-10-04 18:49 UTC (permalink / raw)
  To: Joe Perches
  Cc: Jakub Kicinski, David S . Miller, Simon Horman,
	Dirk van der Merwe, oss-drivers, netdev, linux-kernel,
	Renato Golin, Manoj Gupta, Guenter Roeck, Doug Anderson
In-Reply-To: <1507140439.4434.14.camel@perches.com>

Hi Joe,

El Wed, Oct 04, 2017 at 11:07:19AM -0700 Joe Perches ha dit:

> On Tue, 2017-10-03 at 13:05 -0700, Matthias Kaehlcke wrote:
> > nfp_eth_set_bit_config() is marked as __always_inline to allow gcc to
> > identify the 'mask' parameter as known to be constant at compile time,
> > which is required to use the FIELD_GET() macro.
> > 
> > The forced inlining does the trick for gcc, but for kernel builds with
> > clang it results in undefined symbols:
> 
> Can't you use local different FIELD_PREP/FIELD_GET macros
> with a different name without the BUILD_BUG tests?
> 
> i.e.:
> 
> #define NFP_FIELD_PREP(_mask, _val)				\
> ({								\
> 	((typeof(_mask))(_val) << __bf_shf(_mask)) & (_mask);	\
> })
> 
> #define NFP_FIELD_GET(_mask, _reg)				\
> ({								\
> 	(typeof(_mask))(((_reg) & (_mask)) >> __bf_shf(_mask));	\
> })
> 
> Then the __always_inline can be removed from
> nfp_eth_set_bit_config too.

Thanks for the suggestion. This seems a viable alternative if David
and the NFP owners can live without the extra checking provided by
__BF_FIELD_CHECK.

^ permalink raw reply

* Re: [PATCH v2 net-next] ravb: RX checksum offload
From: Sergei Shtylyov @ 2017-10-04 18:53 UTC (permalink / raw)
  To: Simon Horman, David Miller; +Cc: Magnus Damm, netdev, linux-renesas-soc
In-Reply-To: <1507103667-6084-1-git-send-email-horms+renesas@verge.net.au>

On 10/04/2017 10:54 AM, Simon Horman wrote:

> Add support for RX checksum offload. This is enabled by default and
> may be disabled and re-enabled using ethtool:
> 
>   # ethtool -K eth0 rx off
>   # ethtool -K eth0 rx on
> 
> The RAVB provides a simple checksumming scheme which appears to be
> completely compatible with CHECKSUM_COMPLETE: sum of all packet data after
> the L2 header is appended to packet data; this may be trivially read by the
> driver and used to update the skb accordingly.
> 
> In terms of performance throughput is close to gigabit line-rate both with
> and without RX checksum offload enabled. Perf output, however, appears to
> indicate that significantly less time is spent in do_csum(). This is as
> expected.
> 
> Test results with RX checksum offload enabled:
>   # /usr/bin/perf_3.16 record -o /run/perf.data -a netperf -t TCP_MAERTS -H 10.4.3.162
>   MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.4.3.162 () port 0 AF_INET : demo
>   enable_enobufs failed: getprotobyname
>   Recv   Send    Send
>   Socket Socket  Message  Elapsed
>   Size   Size    Size     Time     Throughput
>   bytes  bytes   bytes    secs.    10^6bits/sec
> 
>    87380  16384  16384    10.00     937.54
> 
>   Summary of output of perf report:
>      18.28%      ksoftirqd/0  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
>      10.34%      ksoftirqd/0  [kernel.kallsyms]  [k] __pi_memcpy
>       9.83%      ksoftirqd/0  [kernel.kallsyms]  [k] ravb_poll
>       7.89%      ksoftirqd/0  [kernel.kallsyms]  [k] skb_put
>       4.01%      ksoftirqd/0  [kernel.kallsyms]  [k] dev_gro_receive
>       3.37%          netperf  [kernel.kallsyms]  [k] __arch_copy_to_user
>       3.17%          swapper  [kernel.kallsyms]  [k] arch_cpu_idle
>       2.55%          swapper  [kernel.kallsyms]  [k] tick_nohz_idle_enter
>       2.04%      ksoftirqd/0  [kernel.kallsyms]  [k] __pi___inval_dcache_area
>       2.03%          swapper  [kernel.kallsyms]  [k] _raw_spin_unlock_irq
>       1.96%      ksoftirqd/0  [kernel.kallsyms]  [k] __netdev_alloc_skb
>       1.59%      ksoftirqd/0  [kernel.kallsyms]  [k] __slab_alloc.isra.83
> 
> Test results without RX checksum offload enabled:
>   # /usr/bin/perf_3.16 record -o /run/perf.data -a netperf -t TCP_MAERTS -H 10.4.3.162
>   MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.4.3.162 () port 0 AF_INET : demo
>   enable_enobufs failed: getprotobyname
>   Recv   Send    Send
>   Socket Socket  Message  Elapsed
>   Size   Size    Size     Time     Throughput
>   bytes  bytes   bytes    secs.    10^6bits/sec
> 
>    87380  16384  16384    10.00     940.20
> 
>   Summary of output of perf report:
>      17.10%    ksoftirqd/0  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
>      10.99%    ksoftirqd/0  [kernel.kallsyms]  [k] __pi_memcpy
>       8.87%    ksoftirqd/0  [kernel.kallsyms]  [k] ravb_poll
>       8.16%    ksoftirqd/0  [kernel.kallsyms]  [k] skb_put
>       7.42%    ksoftirqd/0  [kernel.kallsyms]  [k] do_csum
>       3.91%    ksoftirqd/0  [kernel.kallsyms]  [k] dev_gro_receive
>       2.31%        swapper  [kernel.kallsyms]  [k] arch_cpu_idle
>       2.16%    ksoftirqd/0  [kernel.kallsyms]  [k] __pi___inval_dcache_area
>       2.14%    ksoftirqd/0  [kernel.kallsyms]  [k] __netdev_alloc_skb
>       1.93%        netperf  [kernel.kallsyms]  [k] __arch_copy_to_user
>       1.79%        swapper  [kernel.kallsyms]  [k] tick_nohz_idle_enter
>       1.63%    ksoftirqd/0  [kernel.kallsyms]  [k] __slab_alloc.isra.83
> 
> Above results collected on an R-Car Gen 3 Salvator-X/r8a7796 ES1.0.
> Also tested on a R-Car Gen 3 Salvator-X/r8a7795 ES1.0.
> 
> By inspection this also appears to be compatible with the ravb found
> on R-Car Gen 2 SoCs, however, this patch is currently untested on such
> hardware.
> 
> Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
> 
> ---
> v2
> Address review of v1 by Sergei Shtylyov
> * set features rather than oring them with (zero) existing values:
> * Set/unset using a single call to ravb_modify()

   Already belated but still:

Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

MBR, Sergei

^ permalink raw reply

* Re: [PATCH net-next 0/3] A own subdirectory for shared TCP code
From: Richard Siegfried @ 2017-10-04 18:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20171003.160338.1370433279322133749.davem@davemloft.net>

On 04/10/17 01:03, David Miller wrote:
> As someone who has to do backports regularly to -stable, there is no way
> I am applying this.
> 
> Sorry.
Okay, I see.

Is grouping files into subdirectories something generally
unwanted/unlikely to be applied or is this specific to TCP / networking?

Because there are several other places in the source tree where I would
like to group things.

^ permalink raw reply

* Re: [PATCH v6 05/11] dt-bindings: net: dwmac-sun8i: update documentation about integrated PHY
From: Corentin Labbe @ 2017-10-04 19:00 UTC (permalink / raw)
  To: Andrew Lunn, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	maxime.ripard-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, wens-jdAy2FN1RRM
  Cc: mark.rutland-5wv7dgnIgG8, linux-I+IVW8TIWO2tmTQ+vhA3Yw,
	catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	peppe.cavallaro-qxv4g6HH51o, alexandre.torgue-qxv4g6HH51o,
	frowand.list-Re5JQEeQqe8AvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-sunxi-/JYPxA39Uh5TLH3MbocFFw
In-Reply-To: <20170928073708.GB32676@Red>

On Thu, Sep 28, 2017 at 09:37:08AM +0200, Corentin Labbe wrote:
> On Wed, Sep 27, 2017 at 04:02:10PM +0200, Andrew Lunn wrote:
> > Hi Corentin
> > 
> > > +Required properties for the mdio-mux node:
> > > +  - compatible = "mdio-mux"
> > 
> > This is too generic. Please add a more specific compatible for this
> > particular mux. You can keep "mdio-mux", since that is what the MDIO
> > subsystem will look for.
> > 
> 
> I will add allwinner,sun8i-h3-mdio-mux
> 
> > > +Required properties of the integrated phy node:
> > >  - clocks: a phandle to the reference clock for the EPHY
> > >  - resets: a phandle to the reset control for the EPHY
> > > +- phy-is-integrated
> > 
> > So the last thing you said is that the mux is not the problem
> > here. Something else is locking up. Did you discover what?
> > 
> > I really would like phy-is-integrated to go away.
> > 
> 
> I have found the problem: by enabling ephy clk/reset the timeout does not occur anymore.
> So we could remove phy-is-integrated by:
> Moving internal phy clk/reset handling in mdio_mux_syscon_switch_fn()
> But this means:
> - getting internalphy node always by manually get internal_mdio/internal_phy (and not by the given phyhandle)
> - doing some unnecessary tasks (enable/scan/disable) when external_phy is needed
> 

Hello

I have get rid of phy-is-integrated, but mdio_mux_syscon_switch_fn need to enable/disable ephy clk/reset.
And so access to internal PHY node.
But current DT made this ugly: (need to find mdio-mux then internalmdio then internal PHY)

Since MAC cannot reset/choose internal MDIO without ephy clk/rst, could we interpret this as thoses clk/rst must be set in emac node.
This will simplify a lot the code.

Regards

^ permalink raw reply

* Re: [net-next V4 PATCH 1/5] bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP
From: Alexei Starovoitov @ 2017-10-04 19:02 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, jakub.kicinski, Michael S. Tsirkin, pavel.odintsov,
	Jason Wang, mchan, John Fastabend, peter.waskiewicz.jr,
	Daniel Borkmann, Andy Gospodarek
In-Reply-To: <150711862505.9499.15042356194495111353.stgit@firesoul>

On Wed, Oct 04, 2017 at 02:03:45PM +0200, Jesper Dangaard Brouer wrote:
> The 'cpumap' is primary used as a backend map for XDP BPF helper
> call bpf_redirect_map() and XDP_REDIRECT action, like 'devmap'.
> 
> This patch implement the main part of the map.  It is not connected to
> the XDP redirect system yet, and no SKB allocation are done yet.
> 
> The main concern in this patch is to ensure the datapath can run
> without any locking.  This adds complexity to the setup and tear-down
> procedure, which assumptions are extra carefully documented in the
> code comments.
> 
> V2:
>  - make sure array isn't larger than NR_CPUS
>  - make sure CPUs added is a valid possible CPU
> 
> V3: fix nitpicks from Jakub Kicinski <kubakici@wp.pl>
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
...
> +static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
> +{
> +	struct bpf_cpu_map *cmap;
> +	u64 cost;
> +	int err;
> +
> +	/* check sanity of attributes */
> +	if (attr->max_entries == 0 || attr->key_size != 4 ||
> +	    attr->value_size != 4 || attr->map_flags & ~BPF_F_NUMA_NODE)
> +		return ERR_PTR(-EINVAL);
> +
> +	cmap = kzalloc(sizeof(*cmap), GFP_USER);
> +	if (!cmap)
> +		return ERR_PTR(-ENOMEM);

just noticed that there is nothing here nor in DEVMAP/SOCKMAP
that prevents unpriv user to create them.
I'm not sure it was intentional for DEVMAP/SOCKMAP.
For CPUMAP I'd suggest to restrict it to root, since it
suppose to operate with XDP only which is root anyway.
Note, lpm and lru maps are cap_sys_admin only already.

^ permalink raw reply

* Re: [PATCH net-next 2/2] flow_dissector: dissect tunnel info
From: Simon Horman @ 2017-10-04 19:07 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Tom Herbert, David Miller, Jiri Pirko, Jamal Hadi Salim,
	Cong Wang, Linux Kernel Network Developers, oss-drivers
In-Reply-To: <20171004180715.GH1895@nanopsycho>

On Wed, Oct 04, 2017 at 08:07:15PM +0200, Jiri Pirko wrote:
> Wed, Oct 04, 2017 at 05:52:54PM CEST, tom@herbertland.com wrote:
> >On Wed, Oct 4, 2017 at 1:15 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> >> Wed, Oct 04, 2017 at 10:08:57AM CEST, simon.horman@netronome.com wrote:
> >>>On Tue, Oct 03, 2017 at 11:17:46AM -0700, Tom Herbert wrote:
> >>>> On Tue, Oct 3, 2017 at 2:40 AM, Simon Horman <simon.horman@netronome.com> wrote:
> >>>> > On Mon, Oct 02, 2017 at 01:37:55PM -0700, Tom Herbert wrote:
> >>>> >> On Mon, Oct 2, 2017 at 1:41 AM, Simon Horman <simon.horman@netronome.com> wrote:
> >>>> >> > Move dissection of tunnel info from the flower classifier to the flow
> >>>> >> > dissector where all other dissection occurs.  This should not have any
> >>>> >> > behavioural affect on other users of the flow dissector.
> >>>> >
> >>>> > ...
> >>>>
> >>>> > I feel that we are circling back the perennial issue of flower using the
> >>>> > flow dissector in a somewhat broader/different way than many/all other
> >>>> > users of the flow dissector.
> >>>> >
> >>>> Simon,
> >>>>
> >>>> It's more like __skb_flow_dissect is already an incredibly complex
> >>>> function and because of that it's difficult to maintain. We need to
> >>>> measure changes against that fact. For this patch, there is precisely
> >>>> one user (cls_flower.c) and it's not at all clear to me if there will
> >>>> be ever any more (e.g. for hashing we don't need tunnel info). IMO, it
> >>>> should be just as easy and less convolution for everyone to have
> >>>> flower call __skb_flow_dissect_tunnel_info directly and not call if
> >>>> from __skb_flow_dissect.
> >>>
> >>>Hi Tom,
> >>>
> >>>my original suggestion was just that, but Jiri indicated a strong preference
> >>>for the approach taken by this patch. I think we need to widen the
> >>>participants in this discussion.
> >>
> >> I like the __skb_flow_dissect to be the function to call and it will do
> >> the job according to the configuration. I don't like to split in
> >> multiple calls.
> >
> >Those are not technical arguments. As I already mentioned, I don't
> >like it when we add stuff for the benefit of a 1% use case that
> >negatively impacts the rest of the 99% cases which is what I believe
> >is happening here.
> 
> Yeah. I just wanted the flow dissector to stay compact. But if needed,
> could be split. I just fear that it will become a mess that's all.
> 
> 
> >
> >> Does not make sense in the most of the cases as the
> >> dissection state would have to be carried in between calls.
> >
> >Please elaborate. This code is being moved into __skb_flow_dissect, so
> >the functionality was already there. I don't see any description in
> >this discussion that things were broken and that this patch is a
> >necessary fix.
> 
> Yeah, you are right.

Hi Tom, Hi Jiri,

I'm happy to make a patch to move the call to
__skb_flow_dissect_tunnel_info() from __skb_flow_dissect() to
fl_classify(). It seems that approach has been agreed on above.

^ permalink raw reply

* [PATCH net v2] RDS: IB: Limit the scope of has_fr/has_fmr variables
From: Avinash Repaka @ 2017-10-04 19:10 UTC (permalink / raw)
  To: Santosh Shilimkar, David S. Miller, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	rds-devel-N0ozoZBvEnrZJqsBc5GL+g,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Avinash Repaka

This patch fixes the scope of has_fr and has_fmr variables as they are
needed only in rds_ib_add_one().

Signed-off-by: Avinash Repaka <avinash.repaka-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Acked-by: Santosh Shilimkar <santosh.shilimkar-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

---
 net/rds/ib.c | 11 ++++++-----
 net/rds/ib.h |  2 --
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/net/rds/ib.c b/net/rds/ib.c
index a0954ac..36dd209 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -126,6 +126,7 @@ void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
 static void rds_ib_add_one(struct ib_device *device)
 {
 	struct rds_ib_device *rds_ibdev;
+	bool has_fr, has_fmr;
 
 	/* Only handle IB (no iWARP) devices */
 	if (device->node_type != RDMA_NODE_IB_CA)
@@ -143,11 +144,11 @@ static void rds_ib_add_one(struct ib_device *device)
 	rds_ibdev->max_wrs = device->attrs.max_qp_wr;
 	rds_ibdev->max_sge = min(device->attrs.max_sge, RDS_IB_MAX_SGE);
 
-	rds_ibdev->has_fr = (device->attrs.device_cap_flags &
-				  IB_DEVICE_MEM_MGT_EXTENSIONS);
-	rds_ibdev->has_fmr = (device->alloc_fmr && device->dealloc_fmr &&
-			    device->map_phys_fmr && device->unmap_fmr);
-	rds_ibdev->use_fastreg = (rds_ibdev->has_fr && !rds_ibdev->has_fmr);
+	has_fr = (device->attrs.device_cap_flags &
+		  IB_DEVICE_MEM_MGT_EXTENSIONS);
+	has_fmr = (device->alloc_fmr && device->dealloc_fmr &&
+		   device->map_phys_fmr && device->unmap_fmr);
+	rds_ibdev->use_fastreg = (has_fr && !has_fmr);
 
 	rds_ibdev->fmr_max_remaps = device->attrs.max_map_per_fmr?: 32;
 	rds_ibdev->max_1m_mrs = device->attrs.max_mr ?
diff --git a/net/rds/ib.h b/net/rds/ib.h
index bf48224..6ea6a27 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -215,8 +215,6 @@ struct rds_ib_device {
 	struct list_head	conn_list;
 	struct ib_device	*dev;
 	struct ib_pd		*pd;
-	bool                    has_fmr;
-	bool                    has_fr;
 	bool                    use_fastreg;
 
 	unsigned int		max_mrs;
-- 
2.4.11

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH net v2] RDS: IB: Initialize max_items based on underlying device attributes
From: Avinash Repaka @ 2017-10-04 19:11 UTC (permalink / raw)
  To: Santosh Shilimkar, David S. Miller, netdev, linux-rdma, rds-devel,
	linux-kernel
  Cc: Avinash Repaka

Use max_1m_mrs/max_8k_mrs while setting max_items, as the former
variables are set based on the underlying device attributes.

Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
v1->v2
Fixed a typo in commit description

 net/rds/ib_rdma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c
index 9a3c54e..e678699 100644
--- a/net/rds/ib_rdma.c
+++ b/net/rds/ib_rdma.c
@@ -601,11 +601,11 @@ struct rds_ib_mr_pool *rds_ib_create_mr_pool(struct rds_ib_device *rds_ibdev,
 	if (pool_type == RDS_IB_MR_1M_POOL) {
 		/* +1 allows for unaligned MRs */
 		pool->fmr_attr.max_pages = RDS_MR_1M_MSG_SIZE + 1;
-		pool->max_items = RDS_MR_1M_POOL_SIZE;
+		pool->max_items = rds_ibdev->max_1m_mrs;
 	} else {
 		/* pool_type == RDS_IB_MR_8K_POOL */
 		pool->fmr_attr.max_pages = RDS_MR_8K_MSG_SIZE + 1;
-		pool->max_items = RDS_MR_8K_POOL_SIZE;
+		pool->max_items = rds_ibdev->max_8k_mrs;
 	}
 
 	pool->max_free_pinned = pool->max_items * pool->fmr_attr.max_pages / 4;
-- 
2.4.11

^ permalink raw reply related

* Re: [RFC] bpf: remove global verifier state
From: Daniel Borkmann @ 2017-10-04 19:13 UTC (permalink / raw)
  To: Alexei Starovoitov, Eric Dumazet
  Cc: Jakub Kicinski, dsahern, netdev, oss-drivers, david.beckett
In-Reply-To: <20171004034311.t3uba7vqcvahly2q@ast-mbp>

On 10/04/2017 05:43 AM, Alexei Starovoitov wrote:
> On Tue, Oct 03, 2017 at 08:24:06PM -0700, Eric Dumazet wrote:
>> On Tue, 2017-10-03 at 19:52 -0700, Alexei Starovoitov wrote:
>>
>>> yep. looks great.
>>> Please test it and submit officially :)
>>> The commit aafe6ae9cee3 ("bpf: dynamically allocate digest scratch buffer")
>>> fixed the other case where we were relying on the above mutex.
>>> The only other spot to be adjusted is to add spin_lock/mutex or DO_ONCE() to
>>> bpf_get_skb_set_tunnel_proto() to protect md_dst init.
>>> imo that would be it.
>>> Daniel, anything else comes to mind?

Yes, this should be all. DO_ONCE() for the tunnel proto seems a
good choice.

>> 16 MB of log (unswappable kernel memory) per active checker.
>>
>> We might offer a way to oom hosts.
>
> right. good point!
> we need to switch to continuous copy_to_user() after a page or so.
> Can even do it after every vscnprintf()
> but page at a time is probably faster.

Also worst case upper limits on verification side for holding state
aside from the log would need to be checked in terms of how much mem
we end up holding that is not accounted against any process (and not
really "rate-limited" anymore once we drop the mutex).

^ permalink raw reply

* Re: [PATCH net-next 2/2] flow_dissector: dissect tunnel info
From: Jiri Pirko @ 2017-10-04 19:17 UTC (permalink / raw)
  To: Simon Horman
  Cc: Tom Herbert, David Miller, Jiri Pirko, Jamal Hadi Salim,
	Cong Wang, Linux Kernel Network Developers, oss-drivers
In-Reply-To: <20171004190716.GA22135@netronome.com>

Wed, Oct 04, 2017 at 09:07:17PM CEST, simon.horman@netronome.com wrote:
>On Wed, Oct 04, 2017 at 08:07:15PM +0200, Jiri Pirko wrote:
>> Wed, Oct 04, 2017 at 05:52:54PM CEST, tom@herbertland.com wrote:
>> >On Wed, Oct 4, 2017 at 1:15 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> >> Wed, Oct 04, 2017 at 10:08:57AM CEST, simon.horman@netronome.com wrote:
>> >>>On Tue, Oct 03, 2017 at 11:17:46AM -0700, Tom Herbert wrote:
>> >>>> On Tue, Oct 3, 2017 at 2:40 AM, Simon Horman <simon.horman@netronome.com> wrote:
>> >>>> > On Mon, Oct 02, 2017 at 01:37:55PM -0700, Tom Herbert wrote:
>> >>>> >> On Mon, Oct 2, 2017 at 1:41 AM, Simon Horman <simon.horman@netronome.com> wrote:
>> >>>> >> > Move dissection of tunnel info from the flower classifier to the flow
>> >>>> >> > dissector where all other dissection occurs.  This should not have any
>> >>>> >> > behavioural affect on other users of the flow dissector.
>> >>>> >
>> >>>> > ...
>> >>>>
>> >>>> > I feel that we are circling back the perennial issue of flower using the
>> >>>> > flow dissector in a somewhat broader/different way than many/all other
>> >>>> > users of the flow dissector.
>> >>>> >
>> >>>> Simon,
>> >>>>
>> >>>> It's more like __skb_flow_dissect is already an incredibly complex
>> >>>> function and because of that it's difficult to maintain. We need to
>> >>>> measure changes against that fact. For this patch, there is precisely
>> >>>> one user (cls_flower.c) and it's not at all clear to me if there will
>> >>>> be ever any more (e.g. for hashing we don't need tunnel info). IMO, it
>> >>>> should be just as easy and less convolution for everyone to have
>> >>>> flower call __skb_flow_dissect_tunnel_info directly and not call if
>> >>>> from __skb_flow_dissect.
>> >>>
>> >>>Hi Tom,
>> >>>
>> >>>my original suggestion was just that, but Jiri indicated a strong preference
>> >>>for the approach taken by this patch. I think we need to widen the
>> >>>participants in this discussion.
>> >>
>> >> I like the __skb_flow_dissect to be the function to call and it will do
>> >> the job according to the configuration. I don't like to split in
>> >> multiple calls.
>> >
>> >Those are not technical arguments. As I already mentioned, I don't
>> >like it when we add stuff for the benefit of a 1% use case that
>> >negatively impacts the rest of the 99% cases which is what I believe
>> >is happening here.
>> 
>> Yeah. I just wanted the flow dissector to stay compact. But if needed,
>> could be split. I just fear that it will become a mess that's all.
>> 
>> 
>> >
>> >> Does not make sense in the most of the cases as the
>> >> dissection state would have to be carried in between calls.
>> >
>> >Please elaborate. This code is being moved into __skb_flow_dissect, so
>> >the functionality was already there. I don't see any description in
>> >this discussion that things were broken and that this patch is a
>> >necessary fix.
>> 
>> Yeah, you are right.
>
>Hi Tom, Hi Jiri,
>
>I'm happy to make a patch to move the call to
>__skb_flow_dissect_tunnel_info() from __skb_flow_dissect() to
>fl_classify(). It seems that approach has been agreed on above.

If the consensus is that the right way is to cut-out flow dissector,
so be it. But first, I believe it is reasonable to request to see some
numbers that would indicate that it actually resolves anything.

^ permalink raw reply

* Re: [PATCH 1/3] wireless: iwlwifi: use bool instead of int
From: Luciano Coelho @ 2017-10-04 19:18 UTC (permalink / raw)
  To: Joe Perches, Christoph Böhmwalder, johannes.berg,
	emmanuel.grumbach, kvalo
  Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <1507139722.4434.12.camel@perches.com>

On Wed, 2017-10-04 at 10:55 -0700, Joe Perches wrote:
> On Wed, 2017-10-04 at 19:39 +0300, Luciano Coelho wrote:
> > On Wed, 2017-10-04 at 09:26 -0700, Joe Perches wrote:
> 
> []
> > > This might be more intelligble as separate tests
> > > 
> > > static bool is_valid_channel(u16 ch_id)
> > > {
> > > 	if (ch_id <= 14)
> > > 		return true;
> > > 
> > > 	if ((ch_id % 4 == 0) &&
> > > 	    ((ch_id >= 36 && ch_id <= 64) ||
> > > 	     (ch_id >= 100 && ch_id <= 140)))
> > > 		return true;
> > > 
> > > 	if ((ch_id % 4 == 1) &&
> > > 	    (chid >= 145 && ch_id <= 165))
> > > 		return true;
> > > 
> > > 	return false;
> > > }
> > > 
> > > The compiler should produce the same object code.
> > 
> > Yeah, it may be a bit easier to read, but I don't want to start
> > getting
> > "fixes" to working and reasonable code.  There's nothing wrong with
> > the
> > existing function (except maybe for the int vs. boolean) so let's
> > not
> > change it.
> > 
> > A good time to change this would be the next time someone adds yet
> > another range of valid channels here. ;)
> 
> <shrug>  Your choice.
> 
> I like code I can read and understand at a glance.

I do too, but I don't think the original is that hard to read, really. 
Each "if" you add is already corresponding to one separate line in the
original code...


> At case somebody needs to add channels, likely nobody
> would do the change suggested but would just add
> another test to the already odd looking block.

Yeah, that would most likely be the case, but if I saw that and thought
there was a better way to write it, believe me, I would definitely
nitpick the patch and ask the author to reorg the code so it would look
nicer.


> And constants should be on the right side of the tests.

Sure, in a new patch, we would definitely pay attention to that.  But
now, is it worth having one more patch go through the entire machinery
to change a relatively clear, extremely simple function just because
you could write it in a different way? My answer is a resounding no,
sorry.

--
Luca.

^ permalink raw reply

* Re: [PATCH net-next v2 1/2] libbpf: parse maps sections of varying size
From: Daniel Borkmann @ 2017-10-04 19:27 UTC (permalink / raw)
  To: Craig Gallek
  Cc: Alexei Starovoitov, Jesper Dangaard Brouer, David S . Miller,
	Chonggang Li, netdev
In-Reply-To: <CAEfhGixoRM_wRy_e9reCGq69XUM1w-1kKGErCUty=gEB8DbzfA@mail.gmail.com>

On 10/04/2017 03:59 PM, Craig Gallek wrote:
> On Tue, Oct 3, 2017 at 10:39 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 10/03/2017 01:07 AM, Alexei Starovoitov wrote:
>>> On 10/2/17 9:41 AM, Craig Gallek wrote:
>>>>
>>>> +    /* Assume equally sized map definitions */
>>>> +    map_def_sz = data->d_size / nr_maps;
>>>> +    if (!data->d_size || (data->d_size % nr_maps) != 0) {
>>>> +        pr_warning("unable to determine map definition size "
>>>> +               "section %s, %d maps in %zd bytes\n",
>>>> +               obj->path, nr_maps, data->d_size);
>>>> +        return -EINVAL;
>>>> +    }
>>>
>>> this approach is not as flexible as done by samples/bpf/bpf_load.c
>>> where it looks at every map independently by walking symtab,
>>> but I guess it's ok.
>>
>> Regarding different map spec structs in a single prog: unless
>> we have a good use case why we would need it (and I'm not aware
>> of anything in particular), I would just go with a fixed size.
>> I did kind of similar sanity checks in bpf_fetch_maps_end() in
>> iproute2 loader as well.
>
> Split vote?  I'm happy to try to make this work with varying sizes,
> but I agree the usefulness seems low.  It would imply building map
> sections with different definition structures and we would then need a
> way to differentiate them.  If the goal is to allow for different map
> definition structures, I don't think size alone is a sufficient
> differentiator.

They would still all need to go to maps section, but you would end
up going with different structs for map specs, where they all would
need to overlap for the members in order for this to work. E.g. you
would have maps of both structs sitting in map section of a single
prog ...

struct bpf_map_spec_v1 {
	__u32 type;
	__u32 size_key;
	__u32 size_value;
	__u32 max_elem;
};

struct bpf_map_spec_v2 {
	__u32 type;
	__u32 size_key;
	__u32 size_value;
	__u32 max_elem;
	__u32 flags;
};

... rather than just using single spec everywhere consistently, and
have the members zeroed that are unused or such, so if there's no
real use-case for different structs (cannot think of any right now),
lets not add complexity for it until it's really required.

^ permalink raw reply

* Re: [PATCH net-next v2 1/2] libbpf: parse maps sections of varying size
From: Alexei Starovoitov @ 2017-10-04 19:54 UTC (permalink / raw)
  To: Daniel Borkmann, Craig Gallek
  Cc: Jesper Dangaard Brouer, David S . Miller, Chonggang Li, netdev
In-Reply-To: <59D53611.5020907@iogearbox.net>

On 10/4/17 12:27 PM, Daniel Borkmann wrote:
> On 10/04/2017 03:59 PM, Craig Gallek wrote:
>> On Tue, Oct 3, 2017 at 10:39 AM, Daniel Borkmann
>> <daniel@iogearbox.net> wrote:
>>> On 10/03/2017 01:07 AM, Alexei Starovoitov wrote:
>>>> On 10/2/17 9:41 AM, Craig Gallek wrote:
>>>>>
>>>>> +    /* Assume equally sized map definitions */
>>>>> +    map_def_sz = data->d_size / nr_maps;
>>>>> +    if (!data->d_size || (data->d_size % nr_maps) != 0) {
>>>>> +        pr_warning("unable to determine map definition size "
>>>>> +               "section %s, %d maps in %zd bytes\n",
>>>>> +               obj->path, nr_maps, data->d_size);
>>>>> +        return -EINVAL;
>>>>> +    }
>>>>
>>>> this approach is not as flexible as done by samples/bpf/bpf_load.c
>>>> where it looks at every map independently by walking symtab,
>>>> but I guess it's ok.
>>>
>>> Regarding different map spec structs in a single prog: unless
>>> we have a good use case why we would need it (and I'm not aware
>>> of anything in particular), I would just go with a fixed size.
>>> I did kind of similar sanity checks in bpf_fetch_maps_end() in
>>> iproute2 loader as well.
>>
>> Split vote?  I'm happy to try to make this work with varying sizes,
>> but I agree the usefulness seems low.  It would imply building map
>> sections with different definition structures and we would then need a
>> way to differentiate them.  If the goal is to allow for different map
>> definition structures, I don't think size alone is a sufficient
>> differentiator.
>
> They would still all need to go to maps section, but you would end
> up going with different structs for map specs, where they all would
> need to overlap for the members in order for this to work. E.g. you
> would have maps of both structs sitting in map section of a single
> prog ...
>
> struct bpf_map_spec_v1 {
>     __u32 type;
>     __u32 size_key;
>     __u32 size_value;
>     __u32 max_elem;
> };
>
> struct bpf_map_spec_v2 {
>     __u32 type;
>     __u32 size_key;
>     __u32 size_value;
>     __u32 max_elem;
>     __u32 flags;
> };
>
> ... rather than just using single spec everywhere consistently, and
> have the members zeroed that are unused or such, so if there's no
> real use-case for different structs (cannot think of any right now),
> lets not add complexity for it until it's really required.

makes sense to me :)

^ permalink raw reply

* [PATCH] net/ipv4: Remove unused variable in route.c
From: Tim Hansen @ 2017-10-04 19:59 UTC (permalink / raw)
  To: davem; +Cc: kuznet, yoshfuji, netdev, linux-kernel, alexander.levin,
	devtimhansen

int rc is unmodified after initalization in net/ipv4/route.c, this patch simply cleans up that variable and returns 0.

This was found with coccicheck M=net/ipv4/ on linus' tree.

Signed-off-by: Tim Hansen <devtimhansen@gmail.com>
---
 net/ipv4/route.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 94d4cd2..02b6be9 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -3032,7 +3032,6 @@ struct ip_rt_acct __percpu *ip_rt_acct __read_mostly;
 
 int __init ip_rt_init(void)
 {
-	int rc = 0;
 	int cpu;
 
 	ip_idents = kmalloc(IP_IDENTS_SZ * sizeof(*ip_idents), GFP_KERNEL);
@@ -3089,7 +3088,7 @@ int __init ip_rt_init(void)
 #endif
 	register_pernet_subsys(&rt_genid_ops);
 	register_pernet_subsys(&ipv4_inetpeer_ops);
-	return rc;
+	return 0;
 }
 
 #ifdef CONFIG_SYSCTL
-- 
2.1.4

^ permalink raw reply related

* [PATCH net-next 0/3] tcp: improving RACK cpu performance
From: Yuchung Cheng @ 2017-10-04 19:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, Yuchung Cheng

This patch set improves the CPU consumption of the RACK TCP loss
recovery algorithm, in particular for high-speed networks. Currently,
for every ACK in recovery RACK can potentially iterate over all sent
packets in the write queue. On large BDP networks with non-trivial
losses the RACK write queue walk CPU usage becomes unreasonably high.

This patch introduces a new queue in TCP that keeps only skbs sent and
not yet (s)acked or marked lost, in time order instead of sequence
order.  With that, RACK can examine this time-sorted list and only
check packets that were sent recently, within the reordering window,
per ACK. This is the fastest way without any write queue walks. The
number of skbs examined per ACK is reduced by orders of magnitude.

Eric Dumazet (1):
  tcp: new list for sent but unacked skbs for RACK recovery

Yuchung Cheng (2):
  tcp: more efficient RACK loss detection
  tcp: a small refactor of RACK loss detection

 include/linux/skbuff.h   | 11 ++++++++--
 include/linux/tcp.h      |  1 +
 include/net/tcp.h        | 24 +++++++++++++++++++++-
 net/ipv4/tcp.c           |  2 ++
 net/ipv4/tcp_input.c     |  9 +++++++--
 net/ipv4/tcp_minisocks.c |  1 +
 net/ipv4/tcp_output.c    | 42 ++++++++++++++++++++++++++++----------
 net/ipv4/tcp_recovery.c  | 52 ++++++++++++++++++------------------------------
 8 files changed, 93 insertions(+), 49 deletions(-)

-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply

* [PATCH net-next 1/3] tcp: new list for sent but unacked skbs for RACK recovery
From: Yuchung Cheng @ 2017-10-04 19:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, Eric Dumazet, Yuchung Cheng, Neal Cardwell
In-Reply-To: <20171004200000.39257-1-ycheng@google.com>

From: Eric Dumazet <edumazet@google.com>

This patch adds a new queue (list) that tracks the sent but not yet
acked or SACKed skbs for a TCP connection. The list is chronologically
ordered by skb->skb_mstamp (the head is the oldest sent skb).

This list will be used to optimize TCP Rack recovery, which checks
an skb's timestamp to judge if it has been lost and needs to be
retransmitted. Since TCP write queue is ordered by sequence instead
of sent time, RACK has to scan over the write queue to catch all
eligible packets to detect lost retransmission, and iterates through
SACKed skbs repeatedly.

Special cares for rare events:
1. TCP repair fakes skb transmission so the send queue needs adjusted
2. SACK reneging would require re-inserting SACKed skbs into the
   send queue. For now I believe it's not worth the complexity to
   make RACK work perfectly on SACK reneging, so we do nothing here.
3. Fast Open: currently for non-TFO, send-queue correctly queues
   the pure SYN packet. For TFO which queues a pure SYN and
   then a data packet, send-queue only queues the data packet but
   not the pure SYN due to the structure of TFO code. This is okay
   because the SYN receiver would never respond with a SACK on a
   missing SYN (i.e. SYN is never fast-retransmitted by SACK/RACK).

In order to not grow sk_buff, we use an union for the new list and
_skb_refdst/destructor fields. This is a bit complicated because
we need to make sure _skb_refdst and destructor are properly zeroed
before skb is cloned/copied at transmit, and before being freed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 include/linux/skbuff.h   | 11 +++++++++--
 include/linux/tcp.h      |  1 +
 include/net/tcp.h        | 24 +++++++++++++++++++++++-
 net/ipv4/tcp.c           |  2 ++
 net/ipv4/tcp_input.c     |  9 +++++++--
 net/ipv4/tcp_minisocks.c |  1 +
 net/ipv4/tcp_output.c    | 42 +++++++++++++++++++++++++++++++-----------
 7 files changed, 74 insertions(+), 16 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ada821466e88..01a985937867 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -617,6 +617,7 @@ typedef unsigned char *sk_buff_data_t;
  *	@nf_trace: netfilter packet trace flag
  *	@protocol: Packet protocol from driver
  *	@destructor: Destruct function
+ *	@tcp_tsorted_anchor: list structure for TCP (tp->tsorted_sent_queue)
  *	@_nfct: Associated connection, if any (with nfctinfo bits)
  *	@nf_bridge: Saved data about a bridged frame - see br_netfilter.c
  *	@skb_iif: ifindex of device we arrived on
@@ -686,8 +687,14 @@ struct sk_buff {
 	 */
 	char			cb[48] __aligned(8);
 
-	unsigned long		_skb_refdst;
-	void			(*destructor)(struct sk_buff *skb);
+	union {
+		struct {
+			unsigned long	_skb_refdst;
+			void		(*destructor)(struct sk_buff *skb);
+		};
+		struct list_head	tcp_tsorted_anchor;
+	};
+
 #ifdef CONFIG_XFRM
 	struct	sec_path	*sp;
 #endif
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 4aa40ef02d32..1d2c44e09e31 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -191,6 +191,7 @@ struct tcp_sock {
 	u32	tsoffset;	/* timestamp offset */
 
 	struct list_head tsq_node; /* anchor in tsq_tasklet.head list */
+	struct list_head tsorted_sent_queue; /* time-sorted sent but un-SACKed skbs */
 
 	u32	snd_wl1;	/* Sequence for window update		*/
 	u32	snd_wnd;	/* The window we expect to receive	*/
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6d25d8305054..c39bcc222c9b 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1588,14 +1588,34 @@ enum tcp_chrono {
 void tcp_chrono_start(struct sock *sk, const enum tcp_chrono type);
 void tcp_chrono_stop(struct sock *sk, const enum tcp_chrono type);
 
+/* This helper is needed, because skb->tcp_tsorted_anchor uses
+ * the same memory storage than skb->destructor/_skb_refdst
+ */
+static inline void tcp_skb_tsorted_anchor_cleanup(struct sk_buff *skb)
+{
+	skb->destructor = NULL;
+	skb->_skb_refdst = 0UL;
+}
+
+#define tcp_skb_tsorted_save(skb) {		\
+	unsigned long _save = skb->_skb_refdst;	\
+	skb->_skb_refdst = 0UL;
+
+#define tcp_skb_tsorted_restore(skb)		\
+	skb->_skb_refdst = _save;		\
+}
+
 /* write queue abstraction */
 static inline void tcp_write_queue_purge(struct sock *sk)
 {
 	struct sk_buff *skb;
 
 	tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
-	while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
+	while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL) {
+		tcp_skb_tsorted_anchor_cleanup(skb);
 		sk_wmem_free_skb(sk, skb);
+	}
+	INIT_LIST_HEAD(&tcp_sk(sk)->tsorted_sent_queue);
 	sk_mem_reclaim(sk);
 	tcp_clear_all_retrans_hints(tcp_sk(sk));
 }
@@ -1710,6 +1730,8 @@ static inline void tcp_insert_write_queue_before(struct sk_buff *new,
 
 static inline void tcp_unlink_write_queue(struct sk_buff *skb, struct sock *sk)
 {
+	list_del(&skb->tcp_tsorted_anchor);
+	tcp_skb_tsorted_anchor_cleanup(skb);
 	__skb_unlink(skb, &sk->sk_write_queue);
 }
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 23225c98d287..6d25008be84b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -415,6 +415,7 @@ void tcp_init_sock(struct sock *sk)
 	tp->out_of_order_queue = RB_ROOT;
 	tcp_init_xmit_timers(sk);
 	INIT_LIST_HEAD(&tp->tsq_node);
+	INIT_LIST_HEAD(&tp->tsorted_sent_queue);
 
 	icsk->icsk_rto = TCP_TIMEOUT_INIT;
 	tp->mdev_us = jiffies_to_usecs(TCP_TIMEOUT_INIT);
@@ -869,6 +870,7 @@ struct sk_buff *sk_stream_alloc_skb(struct sock *sk, int size, gfp_t gfp,
 			 * available to the caller, no more, no less.
 			 */
 			skb->reserved_tailroom = skb->end - skb->tail - size;
+			INIT_LIST_HEAD(&skb->tcp_tsorted_anchor);
 			return skb;
 		}
 		__kfree_skb(skb);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index db9bb46b5776..f0402bd9fd7e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1593,6 +1593,8 @@ static struct sk_buff *tcp_sacktag_walk(struct sk_buff *skb, struct sock *sk,
 						tcp_skb_pcount(skb),
 						skb->skb_mstamp);
 			tcp_rate_skb_delivered(sk, skb, state->rate);
+			if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED)
+				list_del_init(&skb->tcp_tsorted_anchor);
 
 			if (!before(TCP_SKB_CB(skb)->seq,
 				    tcp_highest_sack_seq(tp)))
@@ -3054,8 +3056,11 @@ static void tcp_ack_tstamp(struct sock *sk, struct sk_buff *skb,
 
 	shinfo = skb_shinfo(skb);
 	if (!before(shinfo->tskey, prior_snd_una) &&
-	    before(shinfo->tskey, tcp_sk(sk)->snd_una))
-		__skb_tstamp_tx(skb, NULL, sk, SCM_TSTAMP_ACK);
+	    before(shinfo->tskey, tcp_sk(sk)->snd_una)) {
+		tcp_skb_tsorted_save(skb) {
+			__skb_tstamp_tx(skb, NULL, sk, SCM_TSTAMP_ACK);
+		} tcp_skb_tsorted_restore(skb);
+	}
 }
 
 /* Remove acknowledged frames from the retransmission queue. If our packet
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 188a6f31356d..2341b9f857b6 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -446,6 +446,7 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
 		newtp->snd_nxt = newtp->snd_up = treq->snt_isn + 1;
 
 		INIT_LIST_HEAD(&newtp->tsq_node);
+		INIT_LIST_HEAD(&newtp->tsorted_sent_queue);
 
 		tcp_init_wl(newtp, treq->rcv_isn);
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 0bc9e46a5369..8162e2880178 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -971,6 +971,12 @@ static void tcp_internal_pacing(struct sock *sk, const struct sk_buff *skb)
 		      HRTIMER_MODE_ABS_PINNED);
 }
 
+static void tcp_update_skb_after_send(struct tcp_sock *tp, struct sk_buff *skb)
+{
+	skb->skb_mstamp = tp->tcp_mstamp;
+	list_move_tail(&skb->tcp_tsorted_anchor, &tp->tsorted_sent_queue);
+}
+
 /* This routine actually transmits TCP packets queued in by
  * tcp_do_sendmsg().  This is used by both the initial
  * transmission and possible later retransmissions.
@@ -1003,10 +1009,14 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 		TCP_SKB_CB(skb)->tx.in_flight = TCP_SKB_CB(skb)->end_seq
 			- tp->snd_una;
 		oskb = skb;
-		if (unlikely(skb_cloned(skb)))
-			skb = pskb_copy(skb, gfp_mask);
-		else
-			skb = skb_clone(skb, gfp_mask);
+
+		tcp_skb_tsorted_save(oskb) {
+			if (unlikely(skb_cloned(oskb)))
+				skb = pskb_copy(oskb, gfp_mask);
+			else
+				skb = skb_clone(oskb, gfp_mask);
+		} tcp_skb_tsorted_restore(oskb);
+
 		if (unlikely(!skb))
 			return -ENOBUFS;
 	}
@@ -1127,7 +1137,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 		err = net_xmit_eval(err);
 	}
 	if (!err && oskb) {
-		oskb->skb_mstamp = tp->tcp_mstamp;
+		tcp_update_skb_after_send(tp, oskb);
 		tcp_rate_skb_sent(sk, oskb);
 	}
 	return err;
@@ -1328,6 +1338,7 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
 	/* Link BUFF into the send queue. */
 	__skb_header_release(buff);
 	tcp_insert_write_queue_after(skb, buff, sk);
+	list_add(&buff->tcp_tsorted_anchor, &skb->tcp_tsorted_anchor);
 
 	return 0;
 }
@@ -2260,7 +2271,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 
 		if (unlikely(tp->repair) && tp->repair_queue == TCP_SEND_QUEUE) {
 			/* "skb_mstamp" is used as a start point for the retransmit timer */
-			skb->skb_mstamp = tp->tcp_mstamp;
+			tcp_update_skb_after_send(tp, skb);
 			goto repair; /* Skip network transmission */
 		}
 
@@ -2838,11 +2849,14 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
 		     skb_headroom(skb) >= 0xFFFF)) {
 		struct sk_buff *nskb;
 
-		nskb = __pskb_copy(skb, MAX_TCP_HEADER, GFP_ATOMIC);
-		err = nskb ? tcp_transmit_skb(sk, nskb, 0, GFP_ATOMIC) :
-			     -ENOBUFS;
+		tcp_skb_tsorted_save(skb) {
+			nskb = __pskb_copy(skb, MAX_TCP_HEADER, GFP_ATOMIC);
+			err = nskb ? tcp_transmit_skb(sk, nskb, 0, GFP_ATOMIC) :
+				     -ENOBUFS;
+		} tcp_skb_tsorted_restore(skb);
+
 		if (!err)
-			skb->skb_mstamp = tp->tcp_mstamp;
+			tcp_update_skb_after_send(tp, skb);
 	} else {
 		err = tcp_transmit_skb(sk, skb, 1, GFP_ATOMIC);
 	}
@@ -3023,6 +3037,7 @@ void tcp_send_fin(struct sock *sk)
 				goto coalesce;
 			return;
 		}
+		INIT_LIST_HEAD(&skb->tcp_tsorted_anchor);
 		skb_reserve(skb, MAX_TCP_HEADER);
 		sk_forced_mem_schedule(sk, skb->truesize);
 		/* FIN eats a sequence byte, write_seq advanced by tcp_queue_skb(). */
@@ -3078,9 +3093,14 @@ int tcp_send_synack(struct sock *sk)
 	}
 	if (!(TCP_SKB_CB(skb)->tcp_flags & TCPHDR_ACK)) {
 		if (skb_cloned(skb)) {
-			struct sk_buff *nskb = skb_copy(skb, GFP_ATOMIC);
+			struct sk_buff *nskb;
+
+			tcp_skb_tsorted_save(skb) {
+				nskb = skb_copy(skb, GFP_ATOMIC);
+			} tcp_skb_tsorted_restore(skb);
 			if (!nskb)
 				return -ENOMEM;
+			INIT_LIST_HEAD(&nskb->tcp_tsorted_anchor);
 			tcp_unlink_write_queue(skb, sk);
 			__skb_header_release(nskb);
 			__tcp_add_write_queue_head(sk, nskb);
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related

* [PATCH net-next 2/3] tcp: more efficient RACK loss detection
From: Yuchung Cheng @ 2017-10-04 19:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, Yuchung Cheng, Neal Cardwell, Eric Dumazet
In-Reply-To: <20171004200000.39257-1-ycheng@google.com>

Use the new time-ordered list to speed up RACK. The detection
logic is identical. But since the list is chronologically ordered
by skb_mstamp and contains only skbs not yet acked or sacked,
RACK can abort the loop upon hitting skbs that were sent more
recently. On YouTube servers this patch reduces the iterations on
write queue by 40x. The improvement is even bigger with large
BDP networks.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_recovery.c | 20 +++++---------------
 1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/net/ipv4/tcp_recovery.c b/net/ipv4/tcp_recovery.c
index 449cd914d58e..8aa56caefde8 100644
--- a/net/ipv4/tcp_recovery.c
+++ b/net/ipv4/tcp_recovery.c
@@ -45,7 +45,7 @@ static bool tcp_rack_sent_after(u64 t1, u64 t2, u32 seq1, u32 seq2)
 static void tcp_rack_detect_loss(struct sock *sk, u32 *reo_timeout)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	struct sk_buff *skb;
+	struct sk_buff *skb, *n;
 	u32 reo_wnd;
 
 	*reo_timeout = 0;
@@ -58,17 +58,10 @@ static void tcp_rack_detect_loss(struct sock *sk, u32 *reo_timeout)
 	if ((tp->rack.reord || !tp->lost_out) && tcp_min_rtt(tp) != ~0U)
 		reo_wnd = max(tcp_min_rtt(tp) >> 2, reo_wnd);
 
-	tcp_for_write_queue(skb, sk) {
+	list_for_each_entry_safe(skb, n, &tp->tsorted_sent_queue,
+				 tcp_tsorted_anchor) {
 		struct tcp_skb_cb *scb = TCP_SKB_CB(skb);
 
-		if (skb == tcp_send_head(sk))
-			break;
-
-		/* Skip ones already (s)acked */
-		if (!after(scb->end_seq, tp->snd_una) ||
-		    scb->sacked & TCPCB_SACKED_ACKED)
-			continue;
-
 		if (tcp_rack_sent_after(tp->rack.mstamp, skb->skb_mstamp,
 					tp->rack.end_seq, scb->end_seq)) {
 			/* Step 3 in draft-cheng-tcpm-rack-00.txt:
@@ -81,6 +74,7 @@ static void tcp_rack_detect_loss(struct sock *sk, u32 *reo_timeout)
 
 			if (remaining < 0) {
 				tcp_rack_mark_skb_lost(sk, skb);
+				list_del_init(&skb->tcp_tsorted_anchor);
 				continue;
 			}
 
@@ -91,11 +85,7 @@ static void tcp_rack_detect_loss(struct sock *sk, u32 *reo_timeout)
 
 			/* Record maximum wait time (+1 to avoid 0) */
 			*reo_timeout = max_t(u32, *reo_timeout, 1 + remaining);
-
-		} else if (!(scb->sacked & TCPCB_RETRANS)) {
-			/* Original data are sent sequentially so stop early
-			 * b/c the rest are all sent after rack_sent
-			 */
+		} else {
 			break;
 		}
 	}
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related

* [PATCH net-next 3/3] tcp: a small refactor of RACK loss detection
From: Yuchung Cheng @ 2017-10-04 20:00 UTC (permalink / raw)
  To: davem; +Cc: netdev, Yuchung Cheng, Neal Cardwell, Eric Dumazet
In-Reply-To: <20171004200000.39257-1-ycheng@google.com>

Refactor the RACK loop to improve readability and speed up the checks.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_recovery.c | 40 ++++++++++++++++++----------------------
 1 file changed, 18 insertions(+), 22 deletions(-)

diff --git a/net/ipv4/tcp_recovery.c b/net/ipv4/tcp_recovery.c
index 8aa56caefde8..cda6074a429a 100644
--- a/net/ipv4/tcp_recovery.c
+++ b/net/ipv4/tcp_recovery.c
@@ -61,32 +61,28 @@ static void tcp_rack_detect_loss(struct sock *sk, u32 *reo_timeout)
 	list_for_each_entry_safe(skb, n, &tp->tsorted_sent_queue,
 				 tcp_tsorted_anchor) {
 		struct tcp_skb_cb *scb = TCP_SKB_CB(skb);
+		s32 remaining;
 
-		if (tcp_rack_sent_after(tp->rack.mstamp, skb->skb_mstamp,
-					tp->rack.end_seq, scb->end_seq)) {
-			/* Step 3 in draft-cheng-tcpm-rack-00.txt:
-			 * A packet is lost if its elapsed time is beyond
-			 * the recent RTT plus the reordering window.
-			 */
-			u32 elapsed = tcp_stamp_us_delta(tp->tcp_mstamp,
-							 skb->skb_mstamp);
-			s32 remaining = tp->rack.rtt_us + reo_wnd - elapsed;
-
-			if (remaining < 0) {
-				tcp_rack_mark_skb_lost(sk, skb);
-				list_del_init(&skb->tcp_tsorted_anchor);
-				continue;
-			}
-
-			/* Skip ones marked lost but not yet retransmitted */
-			if ((scb->sacked & TCPCB_LOST) &&
-			    !(scb->sacked & TCPCB_SACKED_RETRANS))
-				continue;
+		/* Skip ones marked lost but not yet retransmitted */
+		if ((scb->sacked & TCPCB_LOST) &&
+		    !(scb->sacked & TCPCB_SACKED_RETRANS))
+			continue;
 
+		if (!tcp_rack_sent_after(tp->rack.mstamp, skb->skb_mstamp,
+					 tp->rack.end_seq, scb->end_seq))
+			break;
+
+		/* A packet is lost if it has not been s/acked beyond
+		 * the recent RTT plus the reordering window.
+		 */
+		remaining = tp->rack.rtt_us + reo_wnd -
+			    tcp_stamp_us_delta(tp->tcp_mstamp, skb->skb_mstamp);
+		if (remaining < 0) {
+			tcp_rack_mark_skb_lost(sk, skb);
+			list_del_init(&skb->tcp_tsorted_anchor);
+		} else {
 			/* Record maximum wait time (+1 to avoid 0) */
 			*reo_timeout = max_t(u32, *reo_timeout, 1 + remaining);
-		} else {
-			break;
 		}
 	}
 }
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related

* Re: [PATCH net-next v3 3/3] tools: bpftool: add documentation
From: Jakub Kicinski @ 2017-10-04 20:02 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, alexei.starovoitov, daniel, dsahern, oss-drivers,
	David Beckett, linux-doc@vger.kernel.org
In-Reply-To: <20171004203642.699f5f1a@redhat.com>

On Wed, 4 Oct 2017 20:36:42 +0200, Jesper Dangaard Brouer wrote:
> On Wed,  4 Oct 2017 08:40:32 -0700
> Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> 
> > Add documentation for bpftool.  Separate files for each subcommand.
> > Use rst format.  Documentation is compiled into man pages using
> > rst2man.
> > 
> > Signed-off-by: David Beckett <david.beckett@netronome.com>
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Acked-by: Alexei Starovoitov <ast@kernel.org>
> > Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> > ---
> >  tools/bpf/bpftool/Documentation/Makefile         |  34 +++++++
> >  tools/bpf/bpftool/Documentation/bpftool-map.txt  | 110 +++++++++++++++++++++++
> >  tools/bpf/bpftool/Documentation/bpftool-prog.txt |  79 ++++++++++++++++
> >  tools/bpf/bpftool/Documentation/bpftool.txt      |  34 +++++++  
> 
> RST-format files are usually called .rst and not .txt
> 
> This is useful when people happen to browse the code via github, then they get formatted nicely e.g.:
>  https://github.com/torvalds/linux/blob/master/samples/bpf/README.rst

I was following perf's example.  Are perf's docs not RST?

^ permalink raw reply

* RE
From: Stefan E Persson @ 2017-10-04 20:05 UTC (permalink / raw)


Your email has been randonly selected for my foundation Donation reply  
to email: eriling.persson089@swedenmail.com for more details

^ permalink raw reply

* Re: [PATCH net-next 0/3] A own subdirectory for shared TCP code
From: Andrew Lunn @ 2017-10-04 20:27 UTC (permalink / raw)
  To: Richard Siegfried; +Cc: David Miller, netdev
In-Reply-To: <d7237d11-ec83-0ef6-d201-da8b99c94b88@systemli.org>

On Wed, Oct 04, 2017 at 08:54:17PM +0200, Richard Siegfried wrote:
> On 04/10/17 01:03, David Miller wrote:
> > As someone who has to do backports regularly to -stable, there is no way
> > I am applying this.
> > 
> > Sorry.
> Okay, I see.
> 
> Is grouping files into subdirectories something generally
> unwanted/unlikely to be applied or is this specific to TCP / networking?
> 
> Because there are several other places in the source tree where I would
> like to group things.

Hi Richard

It is generally unwanted.

Have you tried back porting patches when the directory structure has
changed? Files have moved around? It makes it a lot harder to
do. Meaning patches are going to be back ported less often. Fixes
which could be security relevant might not get back ported, etc.

Kernel 4.4 is going to be supported until 2022. So moving files around
is going to make Greg Kroah-Hartman life more difficult for the next 5
years.

	Andrew

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox