Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH net-next 00/21] net/ipv6: Separate data structures for FIB and data path
From: David Ahern @ 2018-04-16 15:22 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, roopa, eric.dumazet, weiwan, kafai, yoshfuji,
	David Ahern

IPv6 uses the same data struct for both control plane (FIB entries) and
data path (dst entries). This struct has elements needed for both paths
adding memory overhead and complexity (taking a dst hold in most places
but an additional reference on rt6i_ref in a few). Furthermore, because
of the dst_alloc tie, all FIB entries are allocated with GFP_ATOMIC.

This patch set separates FIB entries from dst entries, better aligning
IPv6 code with IPv4, simplifying the reference counting and allowing
FIB entries added by userspace (not autoconf) to use GFP_KERNEL. It is
first step to a number of performance and scalability changes.

The end result of this patch set:
  - FIB entries (fib6_info):
        /* size: 208, cachelines: 4, members: 25 */
        /* sum members: 207, holes: 1, sum holes: 1 */

  - dst entries (rt6_info)
       /* size: 240, cachelines: 4, members: 11 */

Versus the the single rt6_info struct today for both paths:
      /* size: 320, cachelines: 5, members: 28 */

This amounts to a 35% reduction in memory use for FIB entries and a
25% reduction for dst entries.

With respect to locking FIB entries use RCU and a single atomic
counter with fib6_info_hold and fib6_info_release helpers to manage
the reference counting. dst entries use only the traditional dst
refcounts with dst_hold and dst_release.

FIB entries for host routes are referenced by inet6_ifaddr and
ifacaddr6. In both cases, additional holds are taken -- similar to
what is done for devices.

This set is the first of many changes to improve the scalability of the
IPv6 code. Follow on changes include:
- consolidating duplicate fib6_info references like IPv4 does with
  duplicate fib_info

- moving fib6_info into a slab cache to avoid allocation roundups to
  power of 2 (the 208 size becomes a 256 actual allocation)

- Allow FIB lookups without generating a dst (e.g., most rt6_lookup
  users just want to verify the egress device). Means moving dst
  allocation to the other side of fib6_rule_lookup which again aligns
  with IPv4 behavior

- using separate standalone nexthop objects which have performance
  benefits beyond fib_info consolidation

At this point I am not seeing any refcount leaks or underflows, no
oops or bug_ons, or warnings from kasan, so I think it is ready for
others to beat up on it finding errors in code paths I have missed.

v1 changes
- rebased to top of tree
- fix memory leak of metrics as noted by Ido
- MTU fixes based on pmtu tests (thanks Stefano Brivio for writing)

RFC v2 changes
- improved commit messages
- move common metrics code from dst.c to net/ipv4/metrics.c (comment
  from DaveM)
- address comments from Wei Wang and Martin KaFai Lau (let me know if
  I missed something)
- fixes detected by kernel test robots
  + added fib6_metric_set to change metric on a FIB entry which could
    be pointing to read-only dst_default_metrics
  + 0day testing found a problem with an intermediate patch; added
    dst_hold_safe on rt->from. Code is removed 3 patches later
- allow cacheinfo to handle NULL dst; means only expires is pushed to
  userspace

David Ahern (21):
  net: Move fib_convert_metrics to metrics file
  net: Handle null dst in rtnl_put_cacheinfo
  vrf: Move fib6_table into net_vrf
  net/ipv6: Pass net to fib6_update_sernum
  net/ipv6: Pass net namespace to route functions
  net/ipv6: Move support functions up in route.c
  net/ipv6: Save route type in rt6_info
  net/ipv6: Move nexthop data to fib6_nh
  net/ipv6: Defer initialization of dst to data path
  net/ipv6: move metrics from dst to rt6_info
  net/ipv6: move expires into rt6_info
  net/ipv6: Add fib6_null_entry
  net/ipv6: Add rt6_info create function for ip6_pol_route_lookup
  net/ipv6: Move dst flags to booleans in fib entries
  net/ipv6: Create a neigh_lookup for FIB entries
  net/ipv6: Add gfp_flags to route add functions
  net/ipv6: Cleanup exception and cache route handling
  net/ipv6: introduce fib6_info struct and helpers
  net/ipv6: separate handling of FIB entries from dst based routes
  net/ipv6: Flip FIB entries to fib6_info
  net/ipv6: Remove unused code and variables for rt6_info

 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  |   96 +-
 drivers/net/vrf.c                                  |   25 +-
 include/net/if_inet6.h                             |    4 +-
 include/net/ip.h                                   |    3 +
 include/net/ip6_fib.h                              |  151 ++-
 include/net/ip6_route.h                            |   45 +-
 include/net/netns/ipv6.h                           |    3 +-
 net/core/dst.c                                     |    1 +
 net/core/rtnetlink.c                               |    8 +-
 net/ipv4/Makefile                                  |    3 +-
 net/ipv4/fib_semantics.c                           |   43 +-
 net/ipv4/metrics.c                                 |   53 +
 net/ipv6/addrconf.c                                |  118 +-
 net/ipv6/anycast.c                                 |   21 +-
 net/ipv6/ip6_fib.c                                 |  366 +++---
 net/ipv6/ip6_output.c                              |    3 +-
 net/ipv6/ndisc.c                                   |   40 +-
 net/ipv6/route.c                                   | 1369 ++++++++++----------
 net/ipv6/xfrm6_policy.c                            |    2 -
 19 files changed, 1218 insertions(+), 1136 deletions(-)
 create mode 100644 net/ipv4/metrics.c

-- 
2.11.0

^ permalink raw reply

* Re: [PATCH] dt-bindings: net: ravb: Add support for r8a77965 SoC
From: Rob Herring @ 2018-04-16 15:21 UTC (permalink / raw)
  To: Jacopo Mondi
  Cc: sergei.shtylyov, mark.rutland, netdev, devicetree,
	linux-renesas-soc, linux-kernel
In-Reply-To: <1523886917-14491-1-git-send-email-jacopo+renesas@jmondi.org>

On Mon, Apr 16, 2018 at 03:55:17PM +0200, Jacopo Mondi wrote:
> Add documentation for r8a77965 compatible string to renesas ravb device
> tree bindings documentation.
> 
> Signed-off-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
> Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
> ---
> 
> Renesas R-Car M3-N support has been merged for v4.17.
> Document the missing device tree bindings.
> 
> ---
>  Documentation/devicetree/bindings/net/renesas,ravb.txt | 1 +
>  1 file changed, 1 insertion(+)

Applied.

Rob

^ permalink raw reply

* bpf-next is OPEN
From: Daniel Borkmann @ 2018-04-16 15:10 UTC (permalink / raw)
  To: ast; +Cc: netdev

Merge window is over so new bpf-next development round begins.

^ permalink raw reply

* Re: [PATCH 0/3] Receive Side Coalescing for macb driver
From: David Miller @ 2018-04-16 15:08 UTC (permalink / raw)
  To: rafalo; +Cc: nicolas.ferre, netdev, linux-kernel
In-Reply-To: <1523739187-20077-1-git-send-email-rafalo@cadence.com>

From: Rafal Ozieblo <rafalo@cadence.com>
Date: Sat, 14 Apr 2018 21:53:07 +0100

> This patch series adds support for receive side coalescing
> for Cadence GEM driver. Receive segmentation coalescing
> is a mechanism to reduce CPU overhead. This is done by
> coalescing received TCP message segments together into
> a single large message. This means that when the message
> is complete the CPU only has to process the single header
> and act upon the one data payload.

You're going to have to think more deeply about enabling this
feature.

If you can't adjust the receive buffer offset, then the various
packet header fields will be unaligned.

On certain architectures this will result in unaligned traps
all over the networking stack as the packet is being processed.

So enabling this by default will hurt performance on such
systems a lot.

The whole "skb_reserve(skb, NET_IP_ALIGN)" is not just for fun,
it is absolutely essential.

^ permalink raw reply

* Re: [PATCH v3] net: davicom: dm9000: Avoid spinlock recursion during dm9000_timeout routine
From: David Miller @ 2018-04-16 15:05 UTC (permalink / raw)
  To: liu.xiang6; +Cc: netdev, linux-kernel, liuxiang_1999
In-Reply-To: <1523695834-6261-1-git-send-email-liu.xiang6@zte.com.cn>

From: Liu Xiang <liu.xiang6@zte.com.cn>
Date: Sat, 14 Apr 2018 16:50:34 +0800

> +static bool dm9000_current_in_timeout(struct board_info *db)
> +{
> +	bool ret = false;
> +
> +	preempt_disable();
> +	ret = (db->timeout_cpu == smp_processor_id());
> +	preempt_enable();

This doesn't work.

As soon as you do preempt_enable(), smp_processor_id() can change.

^ permalink raw reply

* Re: [PATCH net] team: avoid adding twice the same option to the event list
From: David Miller @ 2018-04-16 15:03 UTC (permalink / raw)
  To: pabeni; +Cc: netdev, jiri
In-Reply-To: <0e295e62358a68b22c646adece4272a9bd0473f8.1523620752.git.pabeni@redhat.com>

From: Paolo Abeni <pabeni@redhat.com>
Date: Fri, 13 Apr 2018 13:59:25 +0200

> When parsing the options provided by the user space,
> team_nl_cmd_options_set() insert them in a temporary list to send
> multiple events with a single message.
> While each option's attribute is correctly validated, the code does
> not check for duplicate entries before inserting into the event
> list.
> 
> Exploiting the above, the syzbot was able to trigger the following
> splat:
 ...
> This changeset addresses the avoiding list_add() if the current
> option is already present in the event list.
> 
> Reported-and-tested-by: syzbot+4d4af685432dc0e56c91@syzkaller.appspotmail.com
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> Fixes: 2fcdb2c9e659 ("team: allow to send multiple set events in one message")

Applied and queued up for -stable, thanks.

Please put the Fixes: tag as the first tag next time.

Thank you.

^ permalink raw reply

* net-next is OPEN...
From: David Miller @ 2018-04-16 15:00 UTC (permalink / raw)
  To: netdev


You all know the drill:

	http://vger.kernel.org/~davem/net-next.html

Now, let's see if you guys can avoid drowning me all at once
this time :-)

^ permalink raw reply

* Re: Re: [PATCH 3/5] net: stmmac: dwmac-sun8i: Allow getting syscon regmap from device
From: Chen-Yu Tsai @ 2018-04-16 14:51 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Icenowy Zheng, Rob Herring, Giuseppe Cavallaro, Corentin Labbe,
	netdev, devicetree, linux-arm-kernel, linux-kernel, linux-sunxi
In-Reply-To: <20180416143130.tls6xtcq3hsv7u7f@flea>

On Mon, Apr 16, 2018 at 10:31 PM, Maxime Ripard
<maxime.ripard-LDxbnhwyfcJBDgjK7y7TUQ@public.gmane.org> wrote:
> On Thu, Apr 12, 2018 at 11:23:30PM +0800, Chen-Yu Tsai wrote:
>> On Thu, Apr 12, 2018 at 11:11 PM, Icenowy Zheng <icenowy-h8G6r0blFSE@public.gmane.org> wrote:
>> > 于 2018年4月12日 GMT+08:00 下午10:56:28, Maxime Ripard <maxime.ripard-LDxbnhwyfcJBDgjK7y7TUQ@public.gmane.org> 写到:
>> >>On Wed, Apr 11, 2018 at 10:16:39PM +0800, Icenowy Zheng wrote:
>> >>> From: Chen-Yu Tsai <wens-jdAy2FN1RRM@public.gmane.org>
>> >>>
>> >>> On the Allwinner R40 SoC, the "GMAC clock" register is in the CCU
>> >>> address space; on the A64 SoC this register is in the SRAM controller
>> >>> address space, and with a different offset.
>> >>>
>> >>> To access the register from another device and hide the internal
>> >>> difference between the device, let it register a regmap named
>> >>> "emac-clock". We can then get the device from the phandle, and
>> >>> retrieve the regmap with dev_get_regmap(); in this situation the
>> >>> regmap_field will be set up to access the only register in the
>> >>regmap.
>> >>>
>> >>> Signed-off-by: Chen-Yu Tsai <wens-jdAy2FN1RRM@public.gmane.org>
>> >>> [Icenowy: change to use regmaps with single register, change commit
>> >>>  message]
>> >>> Signed-off-by: Icenowy Zheng <icenowy-h8G6r0blFSE@public.gmane.org>
>> >>> ---
>> >>>  drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 48
>> >>++++++++++++++++++++++-
>> >>>  1 file changed, 46 insertions(+), 2 deletions(-)
>> >>>
>> >>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
>> >>b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
>> >>> index 1037f6c78bca..b61210c0d415 100644
>> >>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
>> >>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
>> >>> @@ -85,6 +85,13 @@ const struct reg_field old_syscon_reg_field = {
>> >>>      .msb = 31,
>> >>>  };
>> >>>
>> >>> +/* Specially exported regmap which contains only EMAC register */
>> >>> +const struct reg_field single_reg_field = {
>> >>> +    .reg = 0,
>> >>> +    .lsb = 0,
>> >>> +    .msb = 31,
>> >>> +};
>> >>> +
>> >>
>> >>I'm not sure this would be wise. If we ever need some other register
>> >>exported through the regmap, will have to change all the calling sites
>> >>everywhere in the kernel, which will be a pain and will break
>> >>bisectability.
>> >
>> > In this situation the register can be exported as another
>> >  regmap. Currently the code will access a regmap with name
>> > "emac-clock" for this register.
>> >
>> >>
>> >>Chen-Yu's (or was it yours?) initial solution with a custom writeable
>> >>hook only allowing a single register seemed like a better one.
>> >
>> > But I remember you mentioned that you want it to hide the
>> > difference inside the device.
>>
>> The idea is that a device can export multiple regmaps. This one,
>> the one named "gmac" (in my soon to come v2) or "emac-clock" here,
>> is but one of many possible regmaps, and it only exports the register
>> needed by the GMAC/EMAC.
>
> I'm not sure this would be wise either. There's a single register map,
> and as far as I know we don't have a binding to express this in the
> DT. This means that the customer and provider would have to use the
> same name, but without anything actually enforcing it aside from
> "someone in the community knows it".
>
> This is not a really good design, and I was actually preferring your
> first option. We shouldn't rely on any undocumented rule. This will be
> easy to break and hard to maintain.

So, one regmap per device covering the whole register range, and the
consumer knows which register to poke by looking at its own compatible.

That sound right?

ChenYu

-- 
You received this message because you are subscribed to the Google Groups "linux-sunxi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to linux-sunxi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply

* Re: Re: [PATCH 3/5] net: stmmac: dwmac-sun8i: Allow getting syscon regmap from device
From: Icenowy Zheng @ 2018-04-16 14:34 UTC (permalink / raw)
  To: Maxime Ripard, Chen-Yu Tsai
  Cc: Rob Herring, Giuseppe Cavallaro, Corentin Labbe, netdev,
	devicetree, linux-arm-kernel, linux-kernel, linux-sunxi
In-Reply-To: <20180416143130.tls6xtcq3hsv7u7f@flea>



于 2018年4月16日 GMT+08:00 下午10:31:30, Maxime Ripard <maxime.ripard-LDxbnhwyfcJBDgjK7y7TUQ@public.gmane.org> 写到:
>On Thu, Apr 12, 2018 at 11:23:30PM +0800, Chen-Yu Tsai wrote:
>> On Thu, Apr 12, 2018 at 11:11 PM, Icenowy Zheng <icenowy-h8G6r0blFSE@public.gmane.org>
>wrote:
>> > 于 2018年4月12日 GMT+08:00 下午10:56:28, Maxime Ripard
><maxime.ripard-LDxbnhwyfcJBDgjK7y7TUQ@public.gmane.org> 写到:
>> >>On Wed, Apr 11, 2018 at 10:16:39PM +0800, Icenowy Zheng wrote:
>> >>> From: Chen-Yu Tsai <wens-jdAy2FN1RRM@public.gmane.org>
>> >>>
>> >>> On the Allwinner R40 SoC, the "GMAC clock" register is in the CCU
>> >>> address space; on the A64 SoC this register is in the SRAM
>controller
>> >>> address space, and with a different offset.
>> >>>
>> >>> To access the register from another device and hide the internal
>> >>> difference between the device, let it register a regmap named
>> >>> "emac-clock". We can then get the device from the phandle, and
>> >>> retrieve the regmap with dev_get_regmap(); in this situation the
>> >>> regmap_field will be set up to access the only register in the
>> >>regmap.
>> >>>
>> >>> Signed-off-by: Chen-Yu Tsai <wens-jdAy2FN1RRM@public.gmane.org>
>> >>> [Icenowy: change to use regmaps with single register, change
>commit
>> >>>  message]
>> >>> Signed-off-by: Icenowy Zheng <icenowy-h8G6r0blFSE@public.gmane.org>
>> >>> ---
>> >>>  drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 48
>> >>++++++++++++++++++++++-
>> >>>  1 file changed, 46 insertions(+), 2 deletions(-)
>> >>>
>> >>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
>> >>b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
>> >>> index 1037f6c78bca..b61210c0d415 100644
>> >>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
>> >>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
>> >>> @@ -85,6 +85,13 @@ const struct reg_field old_syscon_reg_field =
>{
>> >>>      .msb = 31,
>> >>>  };
>> >>>
>> >>> +/* Specially exported regmap which contains only EMAC register
>*/
>> >>> +const struct reg_field single_reg_field = {
>> >>> +    .reg = 0,
>> >>> +    .lsb = 0,
>> >>> +    .msb = 31,
>> >>> +};
>> >>> +
>> >>
>> >>I'm not sure this would be wise. If we ever need some other
>register
>> >>exported through the regmap, will have to change all the calling
>sites
>> >>everywhere in the kernel, which will be a pain and will break
>> >>bisectability.
>> >
>> > In this situation the register can be exported as another
>> >  regmap. Currently the code will access a regmap with name
>> > "emac-clock" for this register.
>> >
>> >>
>> >>Chen-Yu's (or was it yours?) initial solution with a custom
>writeable
>> >>hook only allowing a single register seemed like a better one.
>> >
>> > But I remember you mentioned that you want it to hide the
>> > difference inside the device.
>> 
>> The idea is that a device can export multiple regmaps. This one,
>> the one named "gmac" (in my soon to come v2) or "emac-clock" here,
>> is but one of many possible regmaps, and it only exports the register
>> needed by the GMAC/EMAC.
>
>I'm not sure this would be wise either. There's a single register map,
>and as far as I know we don't have a binding to express this in the
>DT. This means that the customer and provider would have to use the
>same name, but without anything actually enforcing it aside from
>"someone in the community knows it".
>
>This is not a really good design, and I was actually preferring your
>first option. We shouldn't rely on any undocumented rule. This will be
>easy to break and hard to maintain.

Okay. Then I will revert back to the original solution in the next version.

>
>Maxime

-- 
You received this message because you are subscribed to the Google Groups "linux-sunxi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to linux-sunxi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply

* Re: Re: [PATCH 3/5] net: stmmac: dwmac-sun8i: Allow getting syscon regmap from device
From: Maxime Ripard @ 2018-04-16 14:31 UTC (permalink / raw)
  To: Chen-Yu Tsai
  Cc: Icenowy Zheng, Rob Herring, Giuseppe Cavallaro, Corentin Labbe,
	netdev, devicetree, linux-arm-kernel, linux-kernel, linux-sunxi
In-Reply-To: <CAGb2v66Qi2D1qgF3-iVJFAiznMUzzen0nHM5vsPe7Cyi6QTYpw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 3899 bytes --]

On Thu, Apr 12, 2018 at 11:23:30PM +0800, Chen-Yu Tsai wrote:
> On Thu, Apr 12, 2018 at 11:11 PM, Icenowy Zheng <icenowy-h8G6r0blFSE@public.gmane.org> wrote:
> > 于 2018年4月12日 GMT+08:00 下午10:56:28, Maxime Ripard <maxime.ripard-LDxbnhwyfcJBDgjK7y7TUQ@public.gmane.org> 写到:
> >>On Wed, Apr 11, 2018 at 10:16:39PM +0800, Icenowy Zheng wrote:
> >>> From: Chen-Yu Tsai <wens-jdAy2FN1RRM@public.gmane.org>
> >>>
> >>> On the Allwinner R40 SoC, the "GMAC clock" register is in the CCU
> >>> address space; on the A64 SoC this register is in the SRAM controller
> >>> address space, and with a different offset.
> >>>
> >>> To access the register from another device and hide the internal
> >>> difference between the device, let it register a regmap named
> >>> "emac-clock". We can then get the device from the phandle, and
> >>> retrieve the regmap with dev_get_regmap(); in this situation the
> >>> regmap_field will be set up to access the only register in the
> >>regmap.
> >>>
> >>> Signed-off-by: Chen-Yu Tsai <wens-jdAy2FN1RRM@public.gmane.org>
> >>> [Icenowy: change to use regmaps with single register, change commit
> >>>  message]
> >>> Signed-off-by: Icenowy Zheng <icenowy-h8G6r0blFSE@public.gmane.org>
> >>> ---
> >>>  drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 48
> >>++++++++++++++++++++++-
> >>>  1 file changed, 46 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> >>b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> >>> index 1037f6c78bca..b61210c0d415 100644
> >>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> >>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
> >>> @@ -85,6 +85,13 @@ const struct reg_field old_syscon_reg_field = {
> >>>      .msb = 31,
> >>>  };
> >>>
> >>> +/* Specially exported regmap which contains only EMAC register */
> >>> +const struct reg_field single_reg_field = {
> >>> +    .reg = 0,
> >>> +    .lsb = 0,
> >>> +    .msb = 31,
> >>> +};
> >>> +
> >>
> >>I'm not sure this would be wise. If we ever need some other register
> >>exported through the regmap, will have to change all the calling sites
> >>everywhere in the kernel, which will be a pain and will break
> >>bisectability.
> >
> > In this situation the register can be exported as another
> >  regmap. Currently the code will access a regmap with name
> > "emac-clock" for this register.
> >
> >>
> >>Chen-Yu's (or was it yours?) initial solution with a custom writeable
> >>hook only allowing a single register seemed like a better one.
> >
> > But I remember you mentioned that you want it to hide the
> > difference inside the device.
> 
> The idea is that a device can export multiple regmaps. This one,
> the one named "gmac" (in my soon to come v2) or "emac-clock" here,
> is but one of many possible regmaps, and it only exports the register
> needed by the GMAC/EMAC.

I'm not sure this would be wise either. There's a single register map,
and as far as I know we don't have a binding to express this in the
DT. This means that the customer and provider would have to use the
same name, but without anything actually enforcing it aside from
"someone in the community knows it".

This is not a really good design, and I was actually preferring your
first option. We shouldn't rely on any undocumented rule. This will be
easy to break and hard to maintain.

Maxime

-- 
Maxime Ripard, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

-- 
You received this message because you are subscribed to the Google Groups "linux-sunxi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to linux-sunxi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit https://groups.google.com/d/optout.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH] dt-bindings: net: ravb: Add support for r8a77965 SoC
From: David Miller @ 2018-04-16 14:17 UTC (permalink / raw)
  To: jacopo+renesas
  Cc: sergei.shtylyov, robh+dt, mark.rutland, netdev, devicetree,
	linux-renesas-soc, linux-kernel
In-Reply-To: <1523886917-14491-1-git-send-email-jacopo+renesas@jmondi.org>

From: Jacopo Mondi <jacopo+renesas@jmondi.org>
Date: Mon, 16 Apr 2018 15:55:17 +0200

> Add documentation for r8a77965 compatible string to renesas ravb device
> tree bindings documentation.
> 
> Signed-off-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
> Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
> ---
> 
> Renesas R-Car M3-N support has been merged for v4.17.
> Document the missing device tree bindings.

Since this is purely a devicetree update, I'm assuming that it doesn't
go through my networking tree.

^ permalink raw reply

* Re: [PATCH net] net: mvpp2: Fix TCAM filter reserved range
From: David Miller @ 2018-04-16 14:04 UTC (permalink / raw)
  To: maxime.chevallier
  Cc: netdev, linux-kernel, antoine.tenart, thomas.petazzoni,
	gregory.clement, miquel.raynal, nadavh, stefanc, ymarkman, mw
In-Reply-To: <20180416080723.15593-1-maxime.chevallier@bootlin.com>

From: Maxime Chevallier <maxime.chevallier@bootlin.com>
Date: Mon, 16 Apr 2018 10:07:23 +0200

> Marvell's PPv2 controller has a Packet Header parser, which uses a
> fixed-size TCAM array of filter entries.
> 
> The mvpp2 driver reserves some ranges among the 256 TCAM entries to
> perform MAC and VID filtering. The rest of the TCAM ids are freely usable
> for other features, such as IPv4 proto matching.
> 
> This commit fixes the MVPP2_PE_LAST_FREE_TID define that sets the end of
> the "free range", which included the MAC range. This could therefore allow
> some other features to use entries dedicated to MAC filtering,
> lowering the number of unicast/multicast addresses that could be allowed
> before switching to promiscuous mode.
> 
> Fixes: 10fea26ce2aa ("net: mvpp2: Add support for unicast filtering")
> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net] Revert "macsec: missing dev_put() on error in macsec_newlink()"
From: David Miller @ 2018-04-16 14:02 UTC (permalink / raw)
  To: dan.carpenter; +Cc: labbott, sd, linux-kernel, netdev
In-Reply-To: <20180416101750.GA19613@mwanda>

From: Dan Carpenter <dan.carpenter@oracle.com>
Date: Mon, 16 Apr 2018 13:17:50 +0300

> This patch is just wrong, sorry.  I was trying to fix a static checker
> warning and misread the code.  The reference taken in macsec_newlink()
> is released in macsec_free_netdev() when the netdevice is destroyed.
> 
> This reverts commit 5dcd8400884cc4a043a6d4617e042489e5d566a9.
> 
> Reported-by: Laura Abbott <labbott@redhat.com>
> Fixes: 5dcd8400884c ("macsec: missing dev_put() on error in macsec_newlink()")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> Acked-by: Sabrina Dubroca <sd@queasysnail.net>
> ---
> I sent this earlier but I messed up the CC list.  I've updated the
> commit message as well.

Applied, thanks Dan.

^ permalink raw reply

* [PATCH] dt-bindings: net: ravb: Add support for r8a77965 SoC
From: Jacopo Mondi @ 2018-04-16 13:55 UTC (permalink / raw)
  To: sergei.shtylyov, robh+dt, mark.rutland
  Cc: Jacopo Mondi, netdev, devicetree, linux-renesas-soc, linux-kernel

Add documentation for r8a77965 compatible string to renesas ravb device
tree bindings documentation.

Signed-off-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
---

Renesas R-Car M3-N support has been merged for v4.17.
Document the missing device tree bindings.

---
 Documentation/devicetree/bindings/net/renesas,ravb.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/net/renesas,ravb.txt b/Documentation/devicetree/bindings/net/renesas,ravb.txt
index c306f55..890526d 100644
--- a/Documentation/devicetree/bindings/net/renesas,ravb.txt
+++ b/Documentation/devicetree/bindings/net/renesas,ravb.txt
@@ -18,6 +18,7 @@ Required properties:

       - "renesas,etheravb-r8a7795" for the R8A7795 SoC.
       - "renesas,etheravb-r8a7796" for the R8A7796 SoC.
+      - "renesas,etheravb-r8a77965" for the R8A77965 SoC.
       - "renesas,etheravb-r8a77970" for the R8A77970 SoC.
       - "renesas,etheravb-r8a77980" for the R8A77980 SoC.
       - "renesas,etheravb-r8a77995" for the R8A77995 SoC.

^ permalink raw reply related

* Re: [PATCH net-next 1/5] virtio: Add support for SCTP checksum offloading
From: Vlad Yasevich @ 2018-04-16 13:45 UTC (permalink / raw)
  To: Michael S. Tsirkin, Vladislav Yasevich
  Cc: netdev, linux-sctp, virtualization, jasowang, nhorman
In-Reply-To: <20180412014548-mutt-send-email-mst@kernel.org>

On 04/11/2018 06:49 PM, Michael S. Tsirkin wrote:
> On Mon, Apr 02, 2018 at 09:40:02AM -0400, Vladislav Yasevich wrote:
>> To support SCTP checksum offloading, we need to add a new feature
>> to virtio_net, so we can negotiate support between the hypervisor
>> and the guest.
>>
>> The signalling to the guest that an alternate checksum needs to
>> be used is done via a new flag in the virtio_net_hdr.  If the
>> flag is set, the host will know to perform an alternate checksum
>> calculation, which right now is only CRC32c.
>>
>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
>> ---
>>  drivers/net/virtio_net.c        | 11 ++++++++---
>>  include/linux/virtio_net.h      |  6 ++++++
>>  include/uapi/linux/virtio_net.h |  2 ++
>>  3 files changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 7b187ec..b601294 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -2724,9 +2724,14 @@ static int virtnet_probe(struct virtio_device *vdev)
>>  	/* Do we support "hardware" checksums? */
>>  	if (virtio_has_feature(vdev, VIRTIO_NET_F_CSUM)) {
>>  		/* This opens up the world of extra features. */
>> -		dev->hw_features |= NETIF_F_HW_CSUM | NETIF_F_SG;
>> +		netdev_features_t sctp = 0;
>> +
>> +		if (virtio_has_feature(vdev, VIRTIO_NET_F_SCTP_CSUM))
>> +			sctp |= NETIF_F_SCTP_CRC;
>> +
>> +		dev->hw_features |= NETIF_F_HW_CSUM | NETIF_F_SG | sctp;
>>  		if (csum)
>> -			dev->features |= NETIF_F_HW_CSUM | NETIF_F_SG;
>> +			dev->features |= NETIF_F_HW_CSUM | NETIF_F_SG | sctp;
>>  
>>  		if (virtio_has_feature(vdev, VIRTIO_NET_F_GSO)) {
>>  			dev->hw_features |= NETIF_F_TSO
>> @@ -2952,7 +2957,7 @@ static struct virtio_device_id id_table[] = {
>>  };
>>  
>>  #define VIRTNET_FEATURES \
>> -	VIRTIO_NET_F_CSUM, VIRTIO_NET_F_GUEST_CSUM, \
>> +	VIRTIO_NET_F_CSUM, VIRTIO_NET_F_GUEST_CSUM,  VIRTIO_NET_F_SCTP_CSUM, \
>>  	VIRTIO_NET_F_MAC, \
>>  	VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_UFO, VIRTIO_NET_F_HOST_TSO6, \
>>  	VIRTIO_NET_F_HOST_ECN, VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6, \
>> diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
>> index f144216..2e7a64a 100644
>> --- a/include/linux/virtio_net.h
>> +++ b/include/linux/virtio_net.h
>> @@ -39,6 +39,9 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
>>  
>>  		if (!skb_partial_csum_set(skb, start, off))
>>  			return -EINVAL;
>> +
>> +		if (hdr->flags & VIRTIO_NET_HDR_F_CSUM_NOT_INET)
>> +			skb->csum_not_inet = 1;
>>  	}
>>  
>>  	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
>> @@ -96,6 +99,9 @@ static inline int virtio_net_hdr_from_skb(const struct sk_buff *skb,
>>  		hdr->flags = VIRTIO_NET_HDR_F_DATA_VALID;
>>  	} /* else everything is zero */
>>  
>> +	if (skb->csum_not_inet)
>> +		hdr->flags &= VIRTIO_NET_HDR_F_CSUM_NOT_INET;
>> +
>>  	return 0;
>>  }
>>  
>> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
>> index 5de6ed3..3f279c8 100644
>> --- a/include/uapi/linux/virtio_net.h
>> +++ b/include/uapi/linux/virtio_net.h
>> @@ -36,6 +36,7 @@
>>  #define VIRTIO_NET_F_GUEST_CSUM	1	/* Guest handles pkts w/ partial csum */
>>  #define VIRTIO_NET_F_CTRL_GUEST_OFFLOADS 2 /* Dynamic offload configuration. */
>>  #define VIRTIO_NET_F_MTU	3	/* Initial MTU advice */
>> +#define VIRTIO_NET_F_SCTP_CSUM  4	/* SCTP checksum offload support */
>>  #define VIRTIO_NET_F_MAC	5	/* Host has given MAC address. */
>>  #define VIRTIO_NET_F_GUEST_TSO4	7	/* Guest can handle TSOv4 in. */
>>  #define VIRTIO_NET_F_GUEST_TSO6	8	/* Guest can handle TSOv6 in. */
> 
> Is this a guest or a host checksum? We should differenciate between the
> two.

I suppose this is HOST checksum, since it behaves like VIRTIO_NET_F_CSUM only for
SCTP.  I couldn't find the use for the GUEST side flag, since there technically
isn't any validations and there is no additional mappings from VIRTIO flag to a
NETIF flag.

If the feature is negotiated, the guest ends up generating partially check-summed
packets, and the host turns on appropriate flags on it's side.   The host will
also make sure the checksum if fixed up if the guest doesn't support it.
So 1 flag is currently all that is needed.

-vlad

> 
> 
>> @@ -101,6 +102,7 @@ struct virtio_net_config {
>>  struct virtio_net_hdr_v1 {
>>  #define VIRTIO_NET_HDR_F_NEEDS_CSUM	1	/* Use csum_start, csum_offset */
>>  #define VIRTIO_NET_HDR_F_DATA_VALID	2	/* Csum is valid */
>> +#define VIRTIO_NET_HDR_F_CSUM_NOT_INET  4       /* Checksum is not inet */
>>  	__u8 flags;
>>  #define VIRTIO_NET_HDR_GSO_NONE		0	/* Not a GSO frame */
>>  #define VIRTIO_NET_HDR_GSO_TCPV4	1	/* GSO frame, IPv4 TCP (TSO) */
>> -- 
>> 2.9.5

^ permalink raw reply

* Re: [PATCH v4 net-next 08/19] inet: frags: use rhashtables for reassembly units
From: Stefan Schmidt @ 2018-04-16 12:54 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller; +Cc: netdev, Eric Dumazet, Alexander Aring
In-Reply-To: <20180331195900.183604-9-edumazet@google.com>

Hello Eric.

On 03/31/2018 09:58 PM, Eric Dumazet wrote:
> Some applications still rely on IP fragmentation, and to be fair linux
> reassembly unit is not working under any serious load.

[...]

> ---
>  Documentation/networking/ip-sysctl.txt  |   7 +-
>  include/net/inet_frag.h                 |  81 +++---
>  include/net/ipv6.h                      |  16 +-
>  net/ieee802154/6lowpan/6lowpan_i.h      |  26 +-
>  net/ieee802154/6lowpan/reassembly.c     |  93 +++----
>  net/ipv4/inet_fragment.c                | 354 +++++-------------------
>  net/ipv4/ip_fragment.c                  | 112 ++++----
>  net/ipv6/netfilter/nf_conntrack_reasm.c |  51 +---
>  net/ipv6/reassembly.c                   | 110 ++++----
>  9 files changed, 271 insertions(+), 579 deletions(-)
>

Just a heads up to let you know that this patch broke the 6LoWPAN fragmentation/reassembly
for 802.15.4

Simply initiating a ssh session will fail. After a 1.5 days git bisect session it pointed me to this
commit. :-) After reverting it (and some other patches of this series due to inter dependencies)
it started working again.

No further debugging on this from my side yet, as I just found it and wanted to let you know in
case you have an idea what's going on.

I will try to make some room this week to help with debugging and fixing.

regards
Stefan Schmidt

^ permalink raw reply

* Re: SRIOV switchdev mode BoF minutes
From: Andy Gospodarek @ 2018-04-16 12:39 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Samudrala, Sridhar, David Miller, Anjali Singhai Jain,
	Michael Chan, Simon Horman, Jakub Kicinski, John Fastabend,
	Saeed Mahameed, Jiri Pirko, Rony Efraim, Linux Netdev List
In-Reply-To: <CAJ3xEMhWvNJ4fv+UK2YjcfL44txpW+4KWViWSejO9s+2yTnMOQ@mail.gmail.com>

On Sun, Apr 15, 2018 at 09:01:16AM +0300, Or Gerlitz wrote:
> On Sat, Apr 14, 2018 at 2:03 AM, Samudrala, Sridhar
> <sridhar.samudrala@intel.com> wrote:
> 
> > I meant between PFs on 2 compute nodes.
> 
> If the PF serves as uplink rep, it functions as  a switch port -- applications
> don't run on switch ports. One way to get apps to run on the host in switchdev
> mode is probe one of the VFs there.
> 
> 
> [...]
> 
> > By smartnic env, i guess you are referring to OVS control plane also running
> > on the NIC.
> 
> correct
> 

Not just OvS, but other applications running on the SmartNIC could use tc for
programming hardware can benefit from a design like this.

> > I will look forward to your patches.
> 
> FWIW, note that my patches don't bring any newz for you.. I am aligning
> mlx5 with what was agreed on netdev, e.g nfp does it (uplink rep and
> such) already.

Probably not major news from us either since this was discussed at the last
NetConf, but we are planning to have this option for SmartNICs or PCI-multihost
NICs, too.

^ permalink raw reply

* Re: XDP performance regression due to CONFIG_RETPOLINE Spectre V2
From: Christoph Hellwig @ 2018-04-16 12:27 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: xdp-newbies@vger.kernel.org, netdev@vger.kernel.org,
	Christoph Hellwig, David Woodhouse, William Tu,
	Björn Töpel, Karlsson, Magnus, Alexander Duyck,
	Arnaldo Carvalho de Melo
In-Reply-To: <20180412155029.0324fe58@redhat.com>

Can you try the following hack which avoids indirect calls entirely
for the fast path direct mapping case?

---
>From b256a008c1b305e6a1c2afe7c004c54ad2e96d4b Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Mon, 16 Apr 2018 14:18:14 +0200
Subject: dma-mapping: bypass dma_ops for direct mappings

Reportedly the retpoline mitigation for spectre causes huge penalties
for indirect function calls.  This hack bypasses the dma_ops mechanism
for simple direct mappings.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/device.h      |  1 +
 include/linux/dma-mapping.h | 53 +++++++++++++++++++++++++++----------
 lib/dma-direct.c            |  4 +--
 3 files changed, 42 insertions(+), 16 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 0059b99e1f25..725eec4c6653 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -990,6 +990,7 @@ struct device {
 	bool			offline_disabled:1;
 	bool			offline:1;
 	bool			of_node_reused:1;
+	bool			is_dma_direct:1;
 };
 
 static inline struct device *kobj_to_dev(struct kobject *kobj)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f8ab1c0f589e..c5d384ae25d6 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -223,6 +223,13 @@ static inline const struct dma_map_ops *get_dma_ops(struct device *dev)
 }
 #endif
 
+/* do not use directly! */
+dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
+		unsigned long offset, size_t size, enum dma_data_direction dir,
+		unsigned long attrs);
+int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl,
+		int nents, enum dma_data_direction dir, unsigned long attrs);
+
 static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr,
 					      size_t size,
 					      enum dma_data_direction dir,
@@ -232,9 +239,13 @@ static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr,
 	dma_addr_t addr;
 
 	BUG_ON(!valid_dma_direction(dir));
-	addr = ops->map_page(dev, virt_to_page(ptr),
-			     offset_in_page(ptr), size,
-			     dir, attrs);
+	if (dev->is_dma_direct) {
+		addr = dma_direct_map_page(dev, virt_to_page(ptr),
+				offset_in_page(ptr), size, dir, attrs);
+	} else {
+		addr = ops->map_page(dev, virt_to_page(ptr),
+				offset_in_page(ptr), size, dir, attrs);
+	}
 	debug_dma_map_page(dev, virt_to_page(ptr),
 			   offset_in_page(ptr), size,
 			   dir, addr, true);
@@ -249,7 +260,7 @@ static inline void dma_unmap_single_attrs(struct device *dev, dma_addr_t addr,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (ops->unmap_page)
+	if (!dev->is_dma_direct && ops->unmap_page)
 		ops->unmap_page(dev, addr, size, dir, attrs);
 	debug_dma_unmap_page(dev, addr, size, dir, true);
 }
@@ -266,7 +277,10 @@ static inline int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
 	int ents;
 
 	BUG_ON(!valid_dma_direction(dir));
-	ents = ops->map_sg(dev, sg, nents, dir, attrs);
+	if (dev->is_dma_direct)
+		ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
+	else
+		ents = ops->map_sg(dev, sg, nents, dir, attrs);
 	BUG_ON(ents < 0);
 	debug_dma_map_sg(dev, sg, nents, ents, dir);
 
@@ -281,7 +295,7 @@ static inline void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg
 
 	BUG_ON(!valid_dma_direction(dir));
 	debug_dma_unmap_sg(dev, sg, nents, dir);
-	if (ops->unmap_sg)
+	if (!dev->is_dma_direct && ops->unmap_sg)
 		ops->unmap_sg(dev, sg, nents, dir, attrs);
 }
 
@@ -295,7 +309,10 @@ static inline dma_addr_t dma_map_page_attrs(struct device *dev,
 	dma_addr_t addr;
 
 	BUG_ON(!valid_dma_direction(dir));
-	addr = ops->map_page(dev, page, offset, size, dir, attrs);
+	if (dev->is_dma_direct)
+		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
+	else
+		addr = ops->map_page(dev, page, offset, size, dir, attrs);
 	debug_dma_map_page(dev, page, offset, size, dir, addr, false);
 
 	return addr;
@@ -309,7 +326,7 @@ static inline void dma_unmap_page_attrs(struct device *dev,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (ops->unmap_page)
+	if (!dev->is_dma_direct && ops->unmap_page)
 		ops->unmap_page(dev, addr, size, dir, attrs);
 	debug_dma_unmap_page(dev, addr, size, dir, false);
 }
@@ -356,7 +373,7 @@ static inline void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (ops->sync_single_for_cpu)
+	if (!dev->is_dma_direct && ops->sync_single_for_cpu)
 		ops->sync_single_for_cpu(dev, addr, size, dir);
 	debug_dma_sync_single_for_cpu(dev, addr, size, dir);
 }
@@ -368,7 +385,7 @@ static inline void dma_sync_single_for_device(struct device *dev,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (ops->sync_single_for_device)
+	if (!dev->is_dma_direct && ops->sync_single_for_device)
 		ops->sync_single_for_device(dev, addr, size, dir);
 	debug_dma_sync_single_for_device(dev, addr, size, dir);
 }
@@ -382,7 +399,7 @@ static inline void dma_sync_single_range_for_cpu(struct device *dev,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (ops->sync_single_for_cpu)
+	if (!dev->is_dma_direct && ops->sync_single_for_cpu)
 		ops->sync_single_for_cpu(dev, addr + offset, size, dir);
 	debug_dma_sync_single_range_for_cpu(dev, addr, offset, size, dir);
 }
@@ -396,7 +413,7 @@ static inline void dma_sync_single_range_for_device(struct device *dev,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (ops->sync_single_for_device)
+	if (!dev->is_dma_direct && ops->sync_single_for_device)
 		ops->sync_single_for_device(dev, addr + offset, size, dir);
 	debug_dma_sync_single_range_for_device(dev, addr, offset, size, dir);
 }
@@ -408,7 +425,7 @@ dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (ops->sync_sg_for_cpu)
+	if (!dev->is_dma_direct && ops->sync_sg_for_cpu)
 		ops->sync_sg_for_cpu(dev, sg, nelems, dir);
 	debug_dma_sync_sg_for_cpu(dev, sg, nelems, dir);
 }
@@ -420,7 +437,7 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (ops->sync_sg_for_device)
+	if (!dev->is_dma_direct && ops->sync_sg_for_device)
 		ops->sync_sg_for_device(dev, sg, nelems, dir);
 	debug_dma_sync_sg_for_device(dev, sg, nelems, dir);
 
@@ -600,6 +617,8 @@ static inline int dma_supported(struct device *dev, u64 mask)
 	return ops->dma_supported(dev, mask);
 }
 
+extern const struct dma_map_ops swiotlb_dma_ops;
+
 #ifndef HAVE_ARCH_DMA_SET_MASK
 static inline int dma_set_mask(struct device *dev, u64 mask)
 {
@@ -609,6 +628,12 @@ static inline int dma_set_mask(struct device *dev, u64 mask)
 	dma_check_mask(dev, mask);
 
 	*dev->dma_mask = mask;
+	if (dev->dma_ops == &dma_direct_ops ||
+	    (dev->dma_ops == &swiotlb_dma_ops &&
+	     mask == DMA_BIT_MASK(64)))
+		dev->is_dma_direct = true;
+	else
+		dev->is_dma_direct = false;
 	return 0;
 }
 #endif
diff --git a/lib/dma-direct.c b/lib/dma-direct.c
index c0bba30fef0a..3deb8666974b 100644
--- a/lib/dma-direct.c
+++ b/lib/dma-direct.c
@@ -120,7 +120,7 @@ void dma_direct_free(struct device *dev, size_t size, void *cpu_addr,
 		free_pages((unsigned long)cpu_addr, page_order);
 }
 
-static dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
+dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
 		unsigned long offset, size_t size, enum dma_data_direction dir,
 		unsigned long attrs)
 {
@@ -131,7 +131,7 @@ static dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
 	return dma_addr;
 }
 
-static int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl,
+int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl,
 		int nents, enum dma_data_direction dir, unsigned long attrs)
 {
 	int i;
-- 
2.17.0

^ permalink raw reply related

* Re: tcp hang when socket fills up ?
From: Florian Westphal @ 2018-04-16 11:01 UTC (permalink / raw)
  To: Dominique Martinet; +Cc: Eric Dumazet, Michal Kubecek, netdev
In-Reply-To: <20180416035546.GA5388@nautica>

Dominique Martinet <asmadeus@codewreck.org> wrote:
> Eric Dumazet wrote on Sun, Apr 15, 2018:
> > Are you sure you do not have some iptables/netfilter stuff ?
> 
> I have a basic firewall setup with default rules e.g. starts with
> -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
> in the INPUT chain...
> That said, I just dropped it on the server to check and that seems to
> workaround the issue?!
> When logging everything dropped it appears to decide that the connection
> is no longer established at some point, but only if there is
> tcp_timestamp, just, err, how?
> 
> And certainly enough, if I restore the firewall while a connection is up
> that just hangs; conntrack doesn't consider it connected anymore at some
> point (but it worked for a while!)
> 
> Here's the kind of logs I get from iptables:
> IN=wlp1s0 OUT= MAC=00:c2:c6:b4:7e:c7:a4:12:42:b5:5d:fc:08:00 SRC=client DST=server LEN=52 TOS=0x00 PREC=0x00 TTL=52 ID=17038 DF PROTO=TCP SPT=41558 DPT=15609 WINDOW=1212 RES=0x00 ACK URGP=0 

You could do
echo 6 > /proc/sys/net/netfilter/nf_conntrack_log_invalid

to have conntrack log when/why it thinks packet is invalid.

You can also set
echo 1 > /proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal

which stops conntrack from marking packets with out-of-window
acks as invalid.

(Earlier email implies this is related to timestamps, but unfortunately
 to best of my knowledge conntrack doesn't look at tcp timestamps).

^ permalink raw reply

* Re: [PATCH 08/12] mmc: reduce use of block bounce buffers
From: Robin Murphy @ 2018-04-16 10:51 UTC (permalink / raw)
  To: Christoph Hellwig,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arch-u79uwXL29TY76Z2rM5mHXA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	linux-ide-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20180416085032.7367-9-hch-jcswGhMUV9g@public.gmane.org>

On 16/04/18 09:50, Christoph Hellwig wrote:
> We can rely on the dma-mapping code to handle any DMA limits that is
> bigger than the ISA DMA mask for us (either using an iommu or swiotlb),
> so remove setting the block layer bounce limit for anything but bouncing
> for highmem pages.
> 
> Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> ---
>   drivers/mmc/core/queue.c | 7 ++-----
>   1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index 56e9a803db21..60a02a763d01 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -351,17 +351,14 @@ static const struct blk_mq_ops mmc_mq_ops = {
>   static void mmc_setup_queue(struct mmc_queue *mq, struct mmc_card *card)
>   {
>   	struct mmc_host *host = card->host;
> -	u64 limit = BLK_BOUNCE_HIGH;
> -
> -	if (mmc_dev(host)->dma_mask && *mmc_dev(host)->dma_mask)
> -		limit = (u64)dma_max_pfn(mmc_dev(host)) << PAGE_SHIFT;
>   
>   	blk_queue_flag_set(QUEUE_FLAG_NONROT, mq->queue);
>   	blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, mq->queue);
>   	if (mmc_can_erase(card))
>   		mmc_queue_setup_discard(mq->queue, card);
>   
> -	blk_queue_bounce_limit(mq->queue, limit);
> +	if (!mmc_dev(host)->dma_mask || !mmc_dev(host)->dma_mask)

I'm almost surprised that GCC doesn't warn about "x || x", but 
nevertheless I think you've lost a "*" here...

Robin.

> +		blk_queue_bounce_limit(mq->queue, BLK_BOUNCE_HIGH);
>   	blk_queue_max_hw_sectors(mq->queue,
>   		min(host->max_blk_count, host->max_req_size / 512));
>   	blk_queue_max_segments(mq->queue, host->max_segs);
> 

^ permalink raw reply

* [PATCH net] Revert "macsec: missing dev_put() on error in macsec_newlink()"
From: Dan Carpenter @ 2018-04-16 10:17 UTC (permalink / raw)
  To: Laura Abbott, Sabrina Dubroca
  Cc: David S. Miller, Linux Kernel Mailing List, netdev
In-Reply-To: <20180414223121.GA7475@bistromath.localdomain>

This patch is just wrong, sorry.  I was trying to fix a static checker
warning and misread the code.  The reference taken in macsec_newlink()
is released in macsec_free_netdev() when the netdevice is destroyed.

This reverts commit 5dcd8400884cc4a043a6d4617e042489e5d566a9.

Reported-by: Laura Abbott <labbott@redhat.com>
Fixes: 5dcd8400884c ("macsec: missing dev_put() on error in macsec_newlink()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
---
I sent this earlier but I messed up the CC list.  I've updated the
commit message as well.

 drivers/net/macsec.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 9cbb0c8a896a..7de88b33d5b9 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -3277,7 +3277,7 @@ static int macsec_newlink(struct net *net, struct net_device *dev,

 	err = netdev_upper_dev_link(real_dev, dev, extack);
 	if (err < 0)
-		goto put_dev;
+		goto unregister;

 	/* need to be already registered so that ->init has run and
 	 * the MAC addr is set
@@ -3316,8 +3316,7 @@ static int macsec_newlink(struct net *net, struct net_device *dev,
 	macsec_del_dev(macsec);
 unlink:
 	netdev_upper_dev_unlink(real_dev, dev);
-put_dev:
-	dev_put(real_dev);
+unregister:
 	unregister_netdevice(dev);
 	return err;
 }
-- 
2.16.2

^ permalink raw reply related

* Re: linux-next: build failure after merge of the bpf tree
From: Daniel Borkmann @ 2018-04-16  9:04 UTC (permalink / raw)
  To: Stephen Rothwell, Alexei Starovoitov, Networking
  Cc: Linux-Next Mailing List, Linux Kernel Mailing List,
	John Fastabend
In-Reply-To: <20180416123000.30fd8a04@canb.auug.org.au>

Hi Stephen, hi John,

On 04/16/2018 04:30 AM, Stephen Rothwell wrote:
> Hi all,
> 
> After merging the bpf tree, today's linux-next build (arm
> multi_v7_defconfig) failed like this:
> 
> kernel/bpf/core.o: In function `sock_map_release':
> core.c:(.text+0xd04): multiple definition of `sock_map_release'
> kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
> kernel/events/core.o: In function `sock_map_release':
> core.c:(.text+0x85cc): multiple definition of `sock_map_release'
> kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
> block/blk-core.o: In function `sock_map_release':
> blk-core.c:(.text+0x58e8): multiple definition of `sock_map_release'
> kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
> drivers/net/virtio_net.o: In function `sock_map_release':
> virtio_net.c:(.text+0x53ec): multiple definition of `sock_map_release'
> kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
> net/core/dev.o: In function `sock_map_release':
> dev.c:(.text+0x6c68): multiple definition of `sock_map_release'
> kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
> net/core/rtnetlink.o: In function `sock_map_release':
> rtnetlink.c:(.text+0x63e0): multiple definition of `sock_map_release'
> kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
> net/core/filter.o: In function `sock_map_release':
> filter.c:(.text+0x8c8c): multiple definition of `sock_map_release'
> kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
> net/core/sock_reuseport.o: In function `sock_map_release':
> sock_reuseport.c:(.text+0x398): multiple definition of `sock_map_release'
> kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
> net/bpf/test_run.o: In function `sock_map_release':
> test_run.c:(.text+0x3dc): multiple definition of `sock_map_release'
> kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
> net/packet/af_packet.o: In function `sock_map_release':
> af_packet.c:(.text+0x6958): multiple definition of `sock_map_release'
> kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
> 
> Caused by commit
> 
>   9b2e8bbc4e7a ("bpf: sockmap, map_release does not hold refcnt for pinned maps")
> 
> I applied the following patch for today:
> 
> From: Stephen Rothwell <sfr@canb.auug.org.au>
> Date: Mon, 16 Apr 2018 12:27:24 +1000
> Subject: [PATCH] fix for "bpf: sockmap, map_release does not hold refcnt for
>  pinned maps"
> 
> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
> ---
>  include/linux/bpf.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index f46561de5154..3b6c2b66f414 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -660,7 +660,7 @@ static inline int sock_map_prog(struct bpf_map *map,
>  	return -EOPNOTSUPP;
>  }
>  
> -void sock_map_release(struct bpf_map *map) {}
> +static inline void sock_map_release(struct bpf_map *map) {}
>  #endif
>  
>  /* verifier prototypes for helper functions called from eBPF programs */

Sigh, yeah, that's buggy. Thanks for catching it!

John, given bpf tree hasn't been pushed out, I'm considering to drop the
series from the bpf tree so we can have a clean respin of it with this fixed
initially and not as follow-up churn.

While you're at it, can you just make this a bpf callback, so we have the
'clear' handlers private in prog array and sockmap respectively? This would
also have avoided the bug in the first place.

Thanks,
Daniel

^ permalink raw reply

* [PATCH 12/12] PCI: remove PCI_DMA_BUS_IS_PHYS
From: Christoph Hellwig @ 2018-04-16  8:50 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arch-u79uwXL29TY76Z2rM5mHXA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	linux-ide-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20180416085032.7367-1-hch-jcswGhMUV9g@public.gmane.org>

This was used by the ide, scsi and networking code in the past to
determine if they should bounce payloads.  Now that the dma mapping
always have to support dma to all physical memory (thanks to swiotlb
for non-iommu systems) there is no need to this crude hack any more.

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 arch/alpha/include/asm/pci.h      |  5 -----
 arch/arc/include/asm/pci.h        |  6 ------
 arch/arm/include/asm/pci.h        |  7 -------
 arch/arm64/include/asm/pci.h      |  5 -----
 arch/h8300/include/asm/pci.h      |  2 --
 arch/hexagon/kernel/dma.c         |  1 -
 arch/ia64/hp/common/sba_iommu.c   |  3 ---
 arch/ia64/include/asm/pci.h       | 17 -----------------
 arch/ia64/kernel/setup.c          | 12 ------------
 arch/ia64/sn/kernel/io_common.c   |  5 -----
 arch/m68k/include/asm/pci.h       |  6 ------
 arch/microblaze/include/asm/pci.h |  6 ------
 arch/mips/include/asm/pci.h       |  7 -------
 arch/parisc/include/asm/pci.h     | 23 -----------------------
 arch/parisc/kernel/setup.c        |  5 -----
 arch/powerpc/include/asm/pci.h    | 18 ------------------
 arch/riscv/include/asm/pci.h      |  3 ---
 arch/s390/include/asm/pci.h       |  2 --
 arch/s390/pci/pci_dma.c           |  2 --
 arch/sh/include/asm/pci.h         |  6 ------
 arch/sh/kernel/dma-nommu.c        |  1 -
 arch/sparc/include/asm/pci_32.h   |  4 ----
 arch/sparc/include/asm/pci_64.h   |  6 ------
 arch/x86/include/asm/pci.h        |  3 ---
 arch/x86/kernel/pci-nommu.c       |  1 -
 arch/xtensa/include/asm/pci.h     |  2 --
 drivers/parisc/ccio-dma.c         |  2 --
 drivers/parisc/sba_iommu.c        |  2 --
 include/asm-generic/pci.h         |  8 --------
 include/linux/dma-mapping.h       |  1 -
 lib/dma-direct.c                  |  1 -
 tools/virtio/linux/dma-mapping.h  |  2 --
 32 files changed, 174 deletions(-)

diff --git a/arch/alpha/include/asm/pci.h b/arch/alpha/include/asm/pci.h
index b9ec55351924..cf6bc1e64d66 100644
--- a/arch/alpha/include/asm/pci.h
+++ b/arch/alpha/include/asm/pci.h
@@ -56,11 +56,6 @@ struct pci_controller {
 
 /* IOMMU controls.  */
 
-/* The PCI address space does not equal the physical memory address space.
-   The networking and block device layers use this boolean for bounce buffer
-   decisions.  */
-#define PCI_DMA_BUS_IS_PHYS  0
-
 /* TODO: integrate with include/asm-generic/pci.h ? */
 static inline int pci_get_legacy_ide_irq(struct pci_dev *dev, int channel)
 {
diff --git a/arch/arc/include/asm/pci.h b/arch/arc/include/asm/pci.h
index ba56c23c1b20..4ff53c041c64 100644
--- a/arch/arc/include/asm/pci.h
+++ b/arch/arc/include/asm/pci.h
@@ -16,12 +16,6 @@
 #define PCIBIOS_MIN_MEM 0x100000
 
 #define pcibios_assign_all_busses()	1
-/*
- * The PCI address space does equal the physical memory address space.
- * The networking and block device layers use this boolean for bounce
- * buffer decisions.
- */
-#define PCI_DMA_BUS_IS_PHYS	1
 
 #endif /* __KERNEL__ */
 
diff --git a/arch/arm/include/asm/pci.h b/arch/arm/include/asm/pci.h
index 1f0de808d111..0abd389cf0ec 100644
--- a/arch/arm/include/asm/pci.h
+++ b/arch/arm/include/asm/pci.h
@@ -19,13 +19,6 @@ static inline int pci_proc_domain(struct pci_bus *bus)
 }
 #endif /* CONFIG_PCI_DOMAINS */
 
-/*
- * The PCI address space does equal the physical memory address space.
- * The networking and block device layers use this boolean for bounce
- * buffer decisions.
- */
-#define PCI_DMA_BUS_IS_PHYS     (1)
-
 #define HAVE_PCI_MMAP
 #define ARCH_GENERIC_PCI_MMAP_RESOURCE
 
diff --git a/arch/arm64/include/asm/pci.h b/arch/arm64/include/asm/pci.h
index 8747f7c5e0e7..9e690686e8aa 100644
--- a/arch/arm64/include/asm/pci.h
+++ b/arch/arm64/include/asm/pci.h
@@ -18,11 +18,6 @@
 #define pcibios_assign_all_busses() \
 	(pci_has_flag(PCI_REASSIGN_ALL_BUS))
 
-/*
- * PCI address space differs from physical memory address space
- */
-#define PCI_DMA_BUS_IS_PHYS	(0)
-
 #define ARCH_GENERIC_PCI_MMAP_RESOURCE	1
 
 extern int isa_dma_bridge_buggy;
diff --git a/arch/h8300/include/asm/pci.h b/arch/h8300/include/asm/pci.h
index 7c9e55d62215..d4d345a52092 100644
--- a/arch/h8300/include/asm/pci.h
+++ b/arch/h8300/include/asm/pci.h
@@ -15,6 +15,4 @@ static inline void pcibios_penalize_isa_irq(int irq, int active)
 	/* We don't do dynamic PCI IRQ allocation */
 }
 
-#define PCI_DMA_BUS_IS_PHYS	(1)
-
 #endif /* _ASM_H8300_PCI_H */
diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c
index ad8347c29dcf..77459df34e2e 100644
--- a/arch/hexagon/kernel/dma.c
+++ b/arch/hexagon/kernel/dma.c
@@ -208,7 +208,6 @@ const struct dma_map_ops hexagon_dma_ops = {
 	.sync_single_for_cpu = hexagon_sync_single_for_cpu,
 	.sync_single_for_device = hexagon_sync_single_for_device,
 	.mapping_error	= hexagon_mapping_error,
-	.is_phys	= 1,
 };
 
 void __init hexagon_dma_init(void)
diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index aec4a3354abe..6f05aba9012f 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -1845,9 +1845,6 @@ static void ioc_init(unsigned long hpa, struct ioc *ioc)
 	ioc_resource_init(ioc);
 	ioc_sac_init(ioc);
 
-	if ((long) ~iovp_mask > (long) ia64_max_iommu_merge_mask)
-		ia64_max_iommu_merge_mask = ~iovp_mask;
-
 	printk(KERN_INFO PFX
 		"%s %d.%d HPA 0x%lx IOVA space %dMb at 0x%lx\n",
 		ioc->name, (ioc->rev >> 4) & 0xF, ioc->rev & 0xF,
diff --git a/arch/ia64/include/asm/pci.h b/arch/ia64/include/asm/pci.h
index b1d04e8bafc8..780e8744ba85 100644
--- a/arch/ia64/include/asm/pci.h
+++ b/arch/ia64/include/asm/pci.h
@@ -30,23 +30,6 @@ struct pci_vector_struct {
 #define PCIBIOS_MIN_IO		0x1000
 #define PCIBIOS_MIN_MEM		0x10000000
 
-/*
- * PCI_DMA_BUS_IS_PHYS should be set to 1 if there is _necessarily_ a direct
- * correspondence between device bus addresses and CPU physical addresses.
- * Platforms with a hardware I/O MMU _must_ turn this off to suppress the
- * bounce buffer handling code in the block and network device layers.
- * Platforms with separate bus address spaces _must_ turn this off and provide
- * a device DMA mapping implementation that takes care of the necessary
- * address translation.
- *
- * For now, the ia64 platforms which may have separate/multiple bus address
- * spaces all have I/O MMUs which support the merging of physically
- * discontiguous buffers, so we can use that as the sole factor to determine
- * the setting of PCI_DMA_BUS_IS_PHYS.
- */
-extern unsigned long ia64_max_iommu_merge_mask;
-#define PCI_DMA_BUS_IS_PHYS	(ia64_max_iommu_merge_mask == ~0UL)
-
 #define HAVE_PCI_MMAP
 #define ARCH_GENERIC_PCI_MMAP_RESOURCE
 #define arch_can_pci_mmap_wc()	1
diff --git a/arch/ia64/kernel/setup.c b/arch/ia64/kernel/setup.c
index dee56bcb993d..ad43cbf70628 100644
--- a/arch/ia64/kernel/setup.c
+++ b/arch/ia64/kernel/setup.c
@@ -123,18 +123,6 @@ unsigned long ia64_i_cache_stride_shift = ~0;
 #define	CACHE_STRIDE_SHIFT	5
 unsigned long ia64_cache_stride_shift = ~0;
 
-/*
- * The merge_mask variable needs to be set to (max(iommu_page_size(iommu)) - 1).  This
- * mask specifies a mask of address bits that must be 0 in order for two buffers to be
- * mergeable by the I/O MMU (i.e., the end address of the first buffer and the start
- * address of the second buffer must be aligned to (merge_mask+1) in order to be
- * mergeable).  By default, we assume there is no I/O MMU which can merge physically
- * discontiguous buffers, so we set the merge_mask to ~0UL, which corresponds to a iommu
- * page-size of 2^64.
- */
-unsigned long ia64_max_iommu_merge_mask = ~0UL;
-EXPORT_SYMBOL(ia64_max_iommu_merge_mask);
-
 /*
  * We use a special marker for the end of memory and it uses the extra (+1) slot
  */
diff --git a/arch/ia64/sn/kernel/io_common.c b/arch/ia64/sn/kernel/io_common.c
index 11f2275570fb..8479e9a7ce16 100644
--- a/arch/ia64/sn/kernel/io_common.c
+++ b/arch/ia64/sn/kernel/io_common.c
@@ -480,11 +480,6 @@ sn_io_early_init(void)
 	tioca_init_provider();
 	tioce_init_provider();
 
-	/*
-	 * This is needed to avoid bounce limit checks in the blk layer
-	 */
-	ia64_max_iommu_merge_mask = ~PAGE_MASK;
-
 	sn_irq_lh_init();
 	INIT_LIST_HEAD(&sn_sysdata_list);
 	sn_init_cpei_timer();
diff --git a/arch/m68k/include/asm/pci.h b/arch/m68k/include/asm/pci.h
index ef26fae8cf0b..5a4bc223743b 100644
--- a/arch/m68k/include/asm/pci.h
+++ b/arch/m68k/include/asm/pci.h
@@ -4,12 +4,6 @@
 
 #include <asm-generic/pci.h>
 
-/* The PCI address space does equal the physical memory
- * address space.  The networking and block device layers use
- * this boolean for bounce buffer decisions.
- */
-#define PCI_DMA_BUS_IS_PHYS	(1)
-
 #define	pcibios_assign_all_busses()	1
 
 #define	PCIBIOS_MIN_IO		0x00000100
diff --git a/arch/microblaze/include/asm/pci.h b/arch/microblaze/include/asm/pci.h
index 114b93488193..00478965f932 100644
--- a/arch/microblaze/include/asm/pci.h
+++ b/arch/microblaze/include/asm/pci.h
@@ -61,12 +61,6 @@ extern int pci_mmap_legacy_page_range(struct pci_bus *bus,
 
 #define HAVE_PCI_LEGACY	1
 
-/* The PCI address space does equal the physical memory
- * address space (no IOMMU).  The IDE and SCSI device layers use
- * this boolean for bounce buffer decisions.
- */
-#define PCI_DMA_BUS_IS_PHYS     (1)
-
 extern void pcibios_claim_one_bus(struct pci_bus *b);
 
 extern void pcibios_finish_adding_to_bus(struct pci_bus *bus);
diff --git a/arch/mips/include/asm/pci.h b/arch/mips/include/asm/pci.h
index 2339f42f047a..436099883022 100644
--- a/arch/mips/include/asm/pci.h
+++ b/arch/mips/include/asm/pci.h
@@ -121,13 +121,6 @@ extern unsigned long PCIBIOS_MIN_MEM;
 #include <linux/string.h>
 #include <asm/io.h>
 
-/*
- * The PCI address space does equal the physical memory address space.
- * The networking and block device layers use this boolean for bounce
- * buffer decisions.
- */
-#define PCI_DMA_BUS_IS_PHYS     (1)
-
 #ifdef CONFIG_PCI_DOMAINS_GENERIC
 static inline int pci_proc_domain(struct pci_bus *bus)
 {
diff --git a/arch/parisc/include/asm/pci.h b/arch/parisc/include/asm/pci.h
index 96b7deec512d..3328fd17c19d 100644
--- a/arch/parisc/include/asm/pci.h
+++ b/arch/parisc/include/asm/pci.h
@@ -87,29 +87,6 @@ struct pci_hba_data {
 #define PCI_F_EXTEND		0UL
 #endif /* !CONFIG_64BIT */
 
-/*
- * If the PCI device's view of memory is the same as the CPU's view of memory,
- * PCI_DMA_BUS_IS_PHYS is true.  The networking and block device layers use
- * this boolean for bounce buffer decisions.
- */
-#ifdef CONFIG_PA20
-/* All PA-2.0 machines have an IOMMU. */
-#define PCI_DMA_BUS_IS_PHYS	0
-#define parisc_has_iommu()	do { } while (0)
-#else
-
-#if defined(CONFIG_IOMMU_CCIO) || defined(CONFIG_IOMMU_SBA)
-extern int parisc_bus_is_phys; 	/* in arch/parisc/kernel/setup.c */
-#define PCI_DMA_BUS_IS_PHYS	parisc_bus_is_phys
-#define parisc_has_iommu()	do { parisc_bus_is_phys = 0; } while (0)
-#else
-#define PCI_DMA_BUS_IS_PHYS	1
-#define parisc_has_iommu()	do { } while (0)
-#endif
-
-#endif	/* !CONFIG_PA20 */
-
-
 /*
 ** Most PCI devices (eg Tulip, NCR720) also export the same registers
 ** to both MMIO and I/O port space.  Due to poor performance of I/O Port
diff --git a/arch/parisc/kernel/setup.c b/arch/parisc/kernel/setup.c
index 0e9675f857a5..8d3a7b80ac42 100644
--- a/arch/parisc/kernel/setup.c
+++ b/arch/parisc/kernel/setup.c
@@ -58,11 +58,6 @@ struct proc_dir_entry * proc_runway_root __read_mostly = NULL;
 struct proc_dir_entry * proc_gsc_root __read_mostly = NULL;
 struct proc_dir_entry * proc_mckinley_root __read_mostly = NULL;
 
-#if !defined(CONFIG_PA20) && (defined(CONFIG_IOMMU_CCIO) || defined(CONFIG_IOMMU_SBA))
-int parisc_bus_is_phys __read_mostly = 1;	/* Assume no IOMMU is present */
-EXPORT_SYMBOL(parisc_bus_is_phys);
-#endif
-
 void __init setup_cmdline(char **cmdline_p)
 {
 	extern unsigned int boot_args[];
diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index 401c62aad5e4..2af9ded80540 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -92,24 +92,6 @@ extern int pci_mmap_legacy_page_range(struct pci_bus *bus,
 
 #define HAVE_PCI_LEGACY	1
 
-#ifdef CONFIG_PPC64
-
-/* The PCI address space does not equal the physical memory address
- * space (we have an IOMMU).  The IDE and SCSI device layers use
- * this boolean for bounce buffer decisions.
- */
-#define PCI_DMA_BUS_IS_PHYS	(0)
-
-#else /* 32-bit */
-
-/* The PCI address space does equal the physical memory
- * address space (no IOMMU).  The IDE and SCSI device layers use
- * this boolean for bounce buffer decisions.
- */
-#define PCI_DMA_BUS_IS_PHYS     (1)
-
-#endif /* CONFIG_PPC64 */
-
 extern void pcibios_claim_one_bus(struct pci_bus *b);
 
 extern void pcibios_finish_adding_to_bus(struct pci_bus *bus);
diff --git a/arch/riscv/include/asm/pci.h b/arch/riscv/include/asm/pci.h
index 0f2fc9ef20fc..b3638c505728 100644
--- a/arch/riscv/include/asm/pci.h
+++ b/arch/riscv/include/asm/pci.h
@@ -26,9 +26,6 @@
 /* RISC-V shim does not initialize PCI bus */
 #define pcibios_assign_all_busses() 1
 
-/* We do not have an IOMMU */
-#define PCI_DMA_BUS_IS_PHYS 1
-
 extern int isa_dma_bridge_buggy;
 
 #ifdef CONFIG_PCI
diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 12fe3591034f..94f8db468c9b 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -2,8 +2,6 @@
 #ifndef __ASM_S390_PCI_H
 #define __ASM_S390_PCI_H
 
-/* must be set before including asm-generic/pci.h */
-#define PCI_DMA_BUS_IS_PHYS (0)
 /* must be set before including pci_clp.h */
 #define PCI_BAR_COUNT	6
 
diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index 2d15d84c20ed..10abf5ed6187 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -685,8 +685,6 @@ const struct dma_map_ops s390_pci_dma_ops = {
 	.map_page	= s390_dma_map_pages,
 	.unmap_page	= s390_dma_unmap_pages,
 	.mapping_error	= s390_mapping_error,
-	/* if we support direct DMA this must be conditional */
-	.is_phys	= 0,
 	/* dma_supported is unconditionally true without a callback */
 };
 EXPORT_SYMBOL_GPL(s390_pci_dma_ops);
diff --git a/arch/sh/include/asm/pci.h b/arch/sh/include/asm/pci.h
index 0033f0df2b3b..10a36b1cf2ea 100644
--- a/arch/sh/include/asm/pci.h
+++ b/arch/sh/include/asm/pci.h
@@ -71,12 +71,6 @@ extern unsigned long PCIBIOS_MIN_IO, PCIBIOS_MIN_MEM;
  * SuperH has everything mapped statically like x86.
  */
 
-/* The PCI address space does equal the physical memory
- * address space.  The networking and block device layers use
- * this boolean for bounce buffer decisions.
- */
-#define PCI_DMA_BUS_IS_PHYS	(dma_ops->is_phys)
-
 #ifdef CONFIG_PCI
 /*
  * None of the SH PCI controllers support MWI, it is always treated as a
diff --git a/arch/sh/kernel/dma-nommu.c b/arch/sh/kernel/dma-nommu.c
index 62b485107eae..2077cfe73cc6 100644
--- a/arch/sh/kernel/dma-nommu.c
+++ b/arch/sh/kernel/dma-nommu.c
@@ -75,7 +75,6 @@ const struct dma_map_ops nommu_dma_ops = {
 	.sync_single_for_device	= nommu_sync_single_for_device,
 	.sync_sg_for_device	= nommu_sync_sg_for_device,
 #endif
-	.is_phys		= 1,
 };
 
 void __init no_iommu_init(void)
diff --git a/arch/sparc/include/asm/pci_32.h b/arch/sparc/include/asm/pci_32.h
index 98917e48727d..cfc0ee9476c6 100644
--- a/arch/sparc/include/asm/pci_32.h
+++ b/arch/sparc/include/asm/pci_32.h
@@ -17,10 +17,6 @@
 
 #define PCI_IRQ_NONE		0xffffffff
 
-/* Dynamic DMA mapping stuff.
- */
-#define PCI_DMA_BUS_IS_PHYS	(0)
-
 #endif /* __KERNEL__ */
 
 #ifndef CONFIG_LEON_PCI
diff --git a/arch/sparc/include/asm/pci_64.h b/arch/sparc/include/asm/pci_64.h
index 671274e36cfa..fac77813402c 100644
--- a/arch/sparc/include/asm/pci_64.h
+++ b/arch/sparc/include/asm/pci_64.h
@@ -17,12 +17,6 @@
 
 #define PCI_IRQ_NONE		0xffffffff
 
-/* The PCI address space does not equal the physical memory
- * address space.  The networking and block device layers use
- * this boolean for bounce buffer decisions.
- */
-#define PCI_DMA_BUS_IS_PHYS	(0)
-
 /* PCI IOMMU mapping bypass support. */
 
 /* PCI 64-bit addressing works for all slots on all controller
diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
index d32175e30259..662963681ea6 100644
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -117,9 +117,6 @@ void native_restore_msi_irqs(struct pci_dev *dev);
 #define native_setup_msi_irqs		NULL
 #define native_teardown_msi_irq		NULL
 #endif
-
-#define PCI_DMA_BUS_IS_PHYS (dma_ops->is_phys)
-
 #endif  /* __KERNEL__ */
 
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
index ac7ea3a8242f..67efcebbb125 100644
--- a/arch/x86/kernel/pci-nommu.c
+++ b/arch/x86/kernel/pci-nommu.c
@@ -84,7 +84,6 @@ const struct dma_map_ops nommu_dma_ops = {
 	.free			= dma_generic_free_coherent,
 	.map_sg			= nommu_map_sg,
 	.map_page		= nommu_map_page,
-	.is_phys		= 1,
 	.mapping_error		= nommu_mapping_error,
 	.dma_supported		= x86_dma_supported,
 };
diff --git a/arch/xtensa/include/asm/pci.h b/arch/xtensa/include/asm/pci.h
index d5a82153a7c5..6ddf0a30c60d 100644
--- a/arch/xtensa/include/asm/pci.h
+++ b/arch/xtensa/include/asm/pci.h
@@ -42,8 +42,6 @@ extern struct pci_controller* pcibios_alloc_controller(void);
  * decisions.
  */
 
-#define PCI_DMA_BUS_IS_PHYS	(1)
-
 /* Tell PCI code what kind of PCI resource mappings we support */
 #define HAVE_PCI_MMAP			1
 #define ARCH_GENERIC_PCI_MMAP_RESOURCE	1
diff --git a/drivers/parisc/ccio-dma.c b/drivers/parisc/ccio-dma.c
index acba1f56af3e..2b129d8525d5 100644
--- a/drivers/parisc/ccio-dma.c
+++ b/drivers/parisc/ccio-dma.c
@@ -1596,8 +1596,6 @@ static int __init ccio_probe(struct parisc_device *dev)
 	}
 #endif
 	ioc_count++;
-
-	parisc_has_iommu();
 	return 0;
 }
 
diff --git a/drivers/parisc/sba_iommu.c b/drivers/parisc/sba_iommu.c
index 0a9c762a70fa..a58c586ebd81 100644
--- a/drivers/parisc/sba_iommu.c
+++ b/drivers/parisc/sba_iommu.c
@@ -2017,8 +2017,6 @@ static int __init sba_driver_callback(struct parisc_device *dev)
 	proc_create("sba_iommu", 0, root, &sba_proc_fops);
 	proc_create("sba_iommu-bitmap", 0, root, &sba_proc_bitmap_fops);
 #endif
-
-	parisc_has_iommu();
 	return 0;
 }
 
diff --git a/include/asm-generic/pci.h b/include/asm-generic/pci.h
index 830d7659289b..6bb3cd3d695a 100644
--- a/include/asm-generic/pci.h
+++ b/include/asm-generic/pci.h
@@ -14,12 +14,4 @@ static inline int pci_get_legacy_ide_irq(struct pci_dev *dev, int channel)
 }
 #endif /* HAVE_ARCH_PCI_GET_LEGACY_IDE_IRQ */
 
-/*
- * By default, assume that no iommu is in use and that the PCI
- * space is mapped to address physical 0.
- */
-#ifndef PCI_DMA_BUS_IS_PHYS
-#define PCI_DMA_BUS_IS_PHYS	(1)
-#endif
-
 #endif /* _ASM_GENERIC_PCI_H */
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 14269d25498b..25a9a2b04f78 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -133,7 +133,6 @@ struct dma_map_ops {
 #ifdef ARCH_HAS_DMA_GET_REQUIRED_MASK
 	u64 (*get_required_mask)(struct device *dev);
 #endif
-	int is_phys;
 };
 
 extern const struct dma_map_ops dma_direct_ops;
diff --git a/lib/dma-direct.c b/lib/dma-direct.c
index c0bba30fef0a..199ae4cdd28f 100644
--- a/lib/dma-direct.c
+++ b/lib/dma-direct.c
@@ -179,6 +179,5 @@ const struct dma_map_ops dma_direct_ops = {
 	.map_sg			= dma_direct_map_sg,
 	.dma_supported		= dma_direct_supported,
 	.mapping_error		= dma_direct_mapping_error,
-	.is_phys		= 1,
 };
 EXPORT_SYMBOL(dma_direct_ops);
diff --git a/tools/virtio/linux/dma-mapping.h b/tools/virtio/linux/dma-mapping.h
index 1571e24e9494..f91aeb5fe571 100644
--- a/tools/virtio/linux/dma-mapping.h
+++ b/tools/virtio/linux/dma-mapping.h
@@ -6,8 +6,6 @@
 # error Virtio userspace code does not support CONFIG_HAS_DMA
 #endif
 
-#define PCI_DMA_BUS_IS_PHYS 1
-
 enum dma_data_direction {
 	DMA_BIDIRECTIONAL = 0,
 	DMA_TO_DEVICE = 1,
-- 
2.17.0

^ permalink raw reply related

* [PATCH 11/12] net: remove the PCI_DMA_BUS_IS_PHYS check in illegal_highdma
From: Christoph Hellwig @ 2018-04-16  8:50 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arch-u79uwXL29TY76Z2rM5mHXA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	linux-ide-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20180416085032.7367-1-hch-jcswGhMUV9g@public.gmane.org>

These days the dma mapping routines must be able to handle any address
supported by the device, be that using an iommu, or swiotlb if none is
supported.  With that the PCI_DMA_BUS_IS_PHYS check in illegal_highdma
is not needed and can be removed.

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 net/core/dev.c | 20 +-------------------
 1 file changed, 1 insertion(+), 19 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 969462ebb296..5802cf177b07 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2884,11 +2884,7 @@ void netdev_rx_csum_fault(struct net_device *dev)
 EXPORT_SYMBOL(netdev_rx_csum_fault);
 #endif
 
-/* Actually, we should eliminate this check as soon as we know, that:
- * 1. IOMMU is present and allows to map all the memory.
- * 2. No high memory really exists on this machine.
- */
-
+/* XXX: check that highmem exists at all on the given machine. */
 static int illegal_highdma(struct net_device *dev, struct sk_buff *skb)
 {
 #ifdef CONFIG_HIGHMEM
@@ -2902,20 +2898,6 @@ static int illegal_highdma(struct net_device *dev, struct sk_buff *skb)
 				return 1;
 		}
 	}
-
-	if (PCI_DMA_BUS_IS_PHYS) {
-		struct device *pdev = dev->dev.parent;
-
-		if (!pdev)
-			return 0;
-		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
-			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
-			dma_addr_t addr = page_to_phys(skb_frag_page(frag));
-
-			if (!pdev->dma_mask || addr + PAGE_SIZE - 1 > *pdev->dma_mask)
-				return 1;
-		}
-	}
 #endif
 	return 0;
 }
-- 
2.17.0

^ permalink raw reply related

* [PATCH 10/12] ide: remove the PCI_DMA_BUS_IS_PHYS check
From: Christoph Hellwig @ 2018-04-16  8:50 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arch-u79uwXL29TY76Z2rM5mHXA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	linux-ide-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20180416085032.7367-1-hch-jcswGhMUV9g@public.gmane.org>

We now have ways to deal with drainage in the block layer, and libata has
been using it for ages.  We also want to get rid of PCI_DMA_BUS_IS_PHYS
now, so just reduce the PCI transfer size for ide - anyone who cares for
performance on PCI controllers should have switched to libata long ago.

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 drivers/ide/ide-probe.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c
index 8d8ed036ca0a..56d7bc228cb3 100644
--- a/drivers/ide/ide-probe.c
+++ b/drivers/ide/ide-probe.c
@@ -796,8 +796,7 @@ static int ide_init_queue(ide_drive_t *drive)
 	 * This will be fixed once we teach pci_map_sg() about our boundary
 	 * requirements, hopefully soon. *FIXME*
 	 */
-	if (!PCI_DMA_BUS_IS_PHYS)
-		max_sg_entries >>= 1;
+	max_sg_entries >>= 1;
 #endif /* CONFIG_PCI */
 
 	blk_queue_max_segments(q, max_sg_entries);
-- 
2.17.0

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox