Netdev List
* Re: [patch iproute2 5/6] link: add missing IFLA_BRPORT_PROXYARP
From: Stephen Hemminger @ 2014-12-04 18:53 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, f.fainelli, roopa,
	linville, jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a,
	buytenh, aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye,
	simon.horman, alexander.h.duyck, john.ronciak, mleitner, shrijeet,
	gospo, bcrl, hemal
In-Reply-To: <1417683438-10935-6-git-send-email-jiri@resnulli.us>

On Thu,  4 Dec 2014 09:57:17 +0100
Jiri Pirko <jiri@resnulli.us> wrote:

> From: Scott Feldman <sfeldma@gmail.com>
> 
> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>  include/linux/if_link.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> index a6e2594..06efa2d 100644
> --- a/include/linux/if_link.h
> +++ b/include/linux/if_link.h
> @@ -242,6 +242,7 @@ enum {
>  	IFLA_BRPORT_FAST_LEAVE,	/* multicast fast leave    */
>  	IFLA_BRPORT_LEARNING,	/* mac learning */
>  	IFLA_BRPORT_UNICAST_FLOOD, /* flood unicast traffic */
> +	IFLA_BRPORT_PROXYARP,   /* proxy ARP */
>  	__IFLA_BRPORT_MAX
>  };
>  #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)

Unnecessary patch since I pick up headers from upstream.
Already on net-next branch.


* Re: [PATCHv3 net] i40e: Implement ndo_gso_check()
From: Jeff Kirsher @ 2014-12-04 18:46 UTC (permalink / raw)
  To: Joe Stringer
  Cc: netdev, linux-kernel, jesse, shannon.nelson, jesse.brandeburg,
	therbert, linux.nics
In-Reply-To: <1417718366-14310-1-git-send-email-joestringer@nicira.com>


On Thu, 2014-12-04 at 10:39 -0800, Joe Stringer wrote:
> ndo_gso_check() was recently introduced to allow NICs to report the
> offloading support that they have on a per-skb basis. Add an
> implementation for this driver which checks for IPIP, GRE, UDP
> tunnels.
> 
> Signed-off-by: Joe Stringer <joestringer@nicira.com>
> ---
> v3: Drop IPIP and GRE (no driver support even though hw supports it).
>     Check for UDP outer protocol for UDP tunnels.
> v2: Expand to include IP in IP and IPv4/IPv6 inside GRE/UDP tunnels.
>     Add MAX_INNER_LENGTH (as 80).
> ---
>  drivers/net/ethernet/intel/i40e/i40e_main.c |   26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)

Thanks Joe, I will add your patch to my queue.

Jesse Gross/Tom- If you guys are OK with the latest patch, I will move
forward with adding this patch to my queue.



* Re: [PATCHv2 net] i40e: Implement ndo_gso_check()
From: Joe Stringer @ 2014-12-04 18:41 UTC (permalink / raw)
  To: Jesse Gross
  Cc: Tom Herbert, netdev, Shannon Nelson, Brandeburg, Jesse,
	Jeff Kirsher, linux.nics, Linux Kernel Mailing List
In-Reply-To: <CAEP_g=9Yg-pZf9-Wb4qrZhAMSB=edqDxBXSRskWCturt-nnxTg@mail.gmail.com>

On 2 December 2014 at 10:26, Jesse Gross <jesse@nicira.com> wrote:
> On Mon, Dec 1, 2014 at 4:09 PM, Tom Herbert <therbert@google.com> wrote:
>> On Mon, Dec 1, 2014 at 3:53 PM, Jesse Gross <jesse@nicira.com> wrote:
>>> On Mon, Dec 1, 2014 at 3:47 PM, Tom Herbert <therbert@google.com> wrote:
>>>> On Mon, Dec 1, 2014 at 3:35 PM, Joe Stringer <joestringer@nicira.com> wrote:
>>>>> On 21 November 2014 at 09:59, Joe Stringer <joestringer@nicira.com> wrote:
>>>>>> On 20 November 2014 16:19, Jesse Gross <jesse@nicira.com> wrote:
>>>>>>> I don't know if we need to have the check at all for IPIP though -
>>>>>>> after all the driver doesn't expose support for it all (actually it
>>>>>>> doesn't expose GRE either). This raises kind of an interesting
>>>>>>> question about the checks though - it's pretty easy to add support to
>>>>>>> the driver for a new GSO type (and I imagine that people will be
>>>>>>> adding GRE soon) and forget to update the check.
>>>>>>
>>>>>> If the check is more conservative, then testing would show that it's
>>>>>> not working and lead people to figure out why (and update the check).
>>>>>
>>>>> More concretely, one suggestion would be something like following at
>>>>> the start of each gso_check():
>>>>>
>>>>> +       const int supported = SKB_GSO_TCPV4 | SKB_GSO_TCPV6 | SKB_GSO_FCOE |
>>>>> +                             SKB_GSO_UDP | SKB_GSO_UDP_TUNNEL;
>>>>> +
>>>>> +       if (skb_shinfo(skb)->gso_type & ~supported)
>>>>> +               return false;
>>>>
>>>> This should already be handled by net_gso_ok.
>>>
>>> My original point wasn't so much that this isn't handled at the moment
>>> but that it's easy to add a supported GSO type but then forget to
>>> update this check - i.e. if a driver already supports UDP_TUNNEL and
>>> adds support for GRE with the same constraints. It seems not entirely
>>> ideal that this function is acting as a blacklist rather than a
>>> whitelist.
>>
>> Agreed, it would be nice to have all the checking logic in one place.
>> If all the drivers end up implementing ndo_gso_check then we could
>> potentially get rid of the GSO types as features. This probably
>> wouldn't be a bad thing since we already know that the features
>> mechanism doesn't scale (for instance there's no way to indicate that
>> certain combinations of GSO types are supported by a device).
>
> This crossed my mind and I agree that it's pretty clear that the
> features mechanism isn't scaling very well. Presumably, the logical
> extension of this is that each driver would have a function that looks
> at a packet and returns a set of offload operations that it can
> support rather than exposing a set of protocols. However, it seems
> like it would probably result in a bunch of duplicate code in each
> driver.

Given the discussion is still pretty open-ended, I've made the basic
feedback changes for v3 and haven't tried to address the concern about
forgetting to update this check when a driver adds support.
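The blacklist-versus-whitelist concern discussed above can be sketched in plain C outside the kernel. This is only an illustration under stated assumptions: the DEMO_GSO_* flags are stand-ins with made-up values (the real SKB_GSO_* constants live in the kernel headers), and demo_check_blacklist() and demo_check_whitelist() are hypothetical helpers, not driver code.

```c
#include <stdbool.h>

/* Stand-in flag values for illustration only; they are NOT the kernel's
 * SKB_GSO_* values. */
enum {
	DEMO_GSO_TCPV4      = 1u << 0,
	DEMO_GSO_UDP        = 1u << 1,
	DEMO_GSO_TCPV6      = 1u << 2,
	DEMO_GSO_FCOE       = 1u << 3,
	DEMO_GSO_GRE        = 1u << 4,
	DEMO_GSO_UDP_TUNNEL = 1u << 5,
};

/* Blacklist style: only the one tunnel type the author thought of is
 * vetted; every other type, including one added later, passes. */
static bool demo_check_blacklist(unsigned int gso_type, bool tunnel_ok)
{
	if (gso_type & DEMO_GSO_UDP_TUNNEL)
		return tunnel_ok;
	return true;
}

/* Whitelist style: any type outside the known-good mask is rejected
 * before the per-type checks run. */
static bool demo_check_whitelist(unsigned int gso_type, bool tunnel_ok)
{
	const unsigned int supported = DEMO_GSO_TCPV4 | DEMO_GSO_TCPV6 |
				       DEMO_GSO_FCOE | DEMO_GSO_UDP |
				       DEMO_GSO_UDP_TUNNEL;

	if (gso_type & ~supported)
		return false;
	if (gso_type & DEMO_GSO_UDP_TUNNEL)
		return tunnel_ok;
	return true;
}
```

With these stand-ins, a type that includes DEMO_GSO_GRE passes the blacklist variant unexamined but is rejected by the whitelist variant until someone adds it to the mask, which is the "conservative check" behaviour argued for in the thread.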


* [PATCHv3 net] i40e: Implement ndo_gso_check()
From: Joe Stringer @ 2014-12-04 18:39 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, jesse, shannon.nelson, jesse.brandeburg,
	jeffrey.t.kirsher, therbert, linux.nics

ndo_gso_check() was recently introduced to allow NICs to report the
offloading support that they have on a per-skb basis. Add an
implementation for this driver which checks UDP tunnels.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
---
v3: Drop IPIP and GRE (no driver support even though hw supports it).
    Check for UDP outer protocol for UDP tunnels.
v2: Expand to include IP in IP and IPv4/IPv6 inside GRE/UDP tunnels.
    Add MAX_INNER_LENGTH (as 80).
---
 drivers/net/ethernet/intel/i40e/i40e_main.c |   26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index c3a7f4a..0d6493a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7447,6 +7447,31 @@ static int i40e_ndo_fdb_dump(struct sk_buff *skb,
 
 #endif /* USE_DEFAULT_FDB_DEL_DUMP */
 #endif /* HAVE_FDB_OPS */
+static bool i40e_gso_check(struct sk_buff *skb, struct net_device *dev)
+{
+	if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL) {
+		unsigned char *ihdr;
+
+		if (skb->protocol != IPPROTO_UDP ||
+		    skb->inner_protocol_type != ENCAP_TYPE_ETHER)
+			return false;
+
+		if (skb->inner_protocol == htons(ETH_P_TEB))
+			ihdr = skb_inner_mac_header(skb);
+		else if (skb->inner_protocol == htons(ETH_P_IP) ||
+			 skb->inner_protocol == htons(ETH_P_IPV6))
+			ihdr = skb_inner_network_header(skb);
+		else
+			return false;
+
+#define MAX_TUNNEL_HDR_LEN	80
+		if (ihdr - skb_transport_header(skb) > MAX_TUNNEL_HDR_LEN)
+			return false;
+	}
+
+	return true;
+}
+
 static const struct net_device_ops i40e_netdev_ops = {
 	.ndo_open		= i40e_open,
 	.ndo_stop		= i40e_close,
@@ -7487,6 +7512,7 @@ static const struct net_device_ops i40e_netdev_ops = {
 	.ndo_fdb_dump		= i40e_ndo_fdb_dump,
 #endif
 #endif
+	.ndo_gso_check		= i40e_gso_check,
 };
 
 /**
-- 
1.7.10.4


* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
From: Jiri Pirko @ 2014-12-04 18:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl, hemal
In-Reply-To: <87388v9ua6.fsf@x220.int.ebiederm.org>

Thu, Dec 04, 2014 at 06:52:49PM CET, ebiederm@xmission.com wrote:
>Jiri Pirko <jiri@resnulli.us> writes:
>
>> Thu, Dec 04, 2014 at 05:15:04PM CET, ebiederm@xmission.com wrote:
>>>Jiri Pirko <jiri@resnulli.us> writes:
>>>
>>>Would someone please explain to me what a switch id is?
>>>
>>>I looked in the kernel source, and I looked here and while I know
>>>switches I don't have a clue what a switch id is.
>>>
>>>My primary concern at this point is that you have introduced a global
>>>identifier that is isn't a hardware property (it certainly does not look
>>>like a mac address) and that is unique across network namespaces and
>>>thus breaks checkpoint/restart (aka CRIU).
>>
>> IFLA_PHYS_SWITCH_ID is very similar to IFLA_PHYS_PORT_ID. It is
>> generated by the driver and ensures that there is the same switch id for
>> all ports belonging to the same switch chip/asic. It is up to the driver
>> how to implement the id. I would like to point you to driver
>> implementing ndo_get_phys_port_id
>
>Looking at ndo_get_phys_port_id it is just the per port mac address.  Or
>guid in the case of infiniband.  Which really makes me wonder why we
>didn't use the same abstractions in the code for address types that we
>do for hardware addresses.
>
>Using mac address or other hardware addresses that are used for layer 2
>addressing makes sense to me.  There is a long tradition of that and as
>I recall protocols like STP actually requiring having a different mac
>address per port.
>
>When I asked the question I thought the switch id was going to be
>something like the ifindex, the software index of a network device.
>
>
>Finally having tracked down the rocker implementation of 
>rocker_port_switch_parent_id_get I see it you are reading some 64bit
>hardware register.
>
>Which leads me to ask what are the semantics of switch_id?
>
>Is the switch id an identifier with a prefix from IEEE and assigned by
>the manufacture so that it is guaranteed to the tolerances of the
>manufacturing process to be globally unique?

It is up to the driver what to use. It can use a mac addr. This is the
same as for phys port id.


>
>Is the switch id a random number that is statistically likely to be
>globally unique because you have enough bits?   As I recall you need
>at least 128 bits to have a reasonable chance of a random number
>avoiding the birthday paradox.
>
>Do we need some kind of manufacturer id to tell one switch id from
>another?
>
>Is the switch id persistent across reboots?

Yes it is (as for phys port id).

>
>>>Also what in the world does PHYS mean in IFLA_PHYS_SWITCH_ID?  Does that
>>>mean we can't have a purely software implementation of this interface?
>>>Given that we will want a software implementation at some point
>>>including PHYS in the name seems completely wrong.
>>
>> We can remove the "PHYS", no problem. I do not understand what you say
>> about "software implementation". The point is to allow hw switch/ish
>> chips to be supported.
>
>If we are talking about something typically stored in a eeprom like a
>mac address phys seems appropriate.

Yes, we are.

>
>Still having a definition of this switch id clean clear enough that
>net/bridge and drivers/net/macvlan can implement it seems important.

I don't understand why net/bridge or drivers/net/macvlan would want to
implement this. The purpose is to group ports/netdevs which are part of
the same hw switch.

>
>Even more important is having a definition of switch id clear enough
>that userspace can use the switch id to do something useful.

Userspace treats this the same way it treats phys port id.

If two ports/netdevs have the same switch id, they belong to the same hw
switch. That's all.
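
As a rough illustration of that rule, a userspace tool could compare the raw id bytes of two ports. This is a minimal sketch, assuming the id arrives as an opaque byte string with a length (as the hexstring_n2a() call in the patch suggests). struct demo_port, DEMO_MAX_ID_LEN and demo_same_switch() are hypothetical names, not iproute2 or kernel API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_ID_LEN 32	/* assumed upper bound, for the sketch only */

struct demo_port {
	const char *name;
	unsigned char switch_id[DEMO_MAX_ID_LEN];
	size_t switch_id_len;	/* 0 means the port reported no switch id */
};

/* Two ports sit on the same hw switch only if both reported an id and
 * the ids match byte for byte. The bytes are treated as opaque. */
static bool demo_same_switch(const struct demo_port *a,
			     const struct demo_port *b)
{
	if (a->switch_id_len == 0 || b->switch_id_len == 0)
		return false;
	if (a->switch_id_len != b->switch_id_len)
		return false;
	return memcmp(a->switch_id, b->switch_id, a->switch_id_len) == 0;
}
```

The comparison deliberately attaches no meaning to the bytes themselves, which matches the statement that it is up to the driver what to use as the id.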


>
>Right now switch id looks like one of those weird one manufacturer
>properties that is fine to expose as a driver specific property
>but I don't yet see it being a generic property I that can be used
>usefully in userspace.
>
>So can we please get some clear semantics or failing that can we please
>not expose this to userspace as generic property.

I thought that the semantics were clear. Looks like I will have to update
Documentation/networking/switchdev.txt, adding some more info about this.

>
>Thanks,
>Eric
>
>
>
>>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>>> ---
>>>>  include/linux/if_link.h | 1 +
>>>>  ip/ipaddress.c          | 8 ++++++++
>>>>  2 files changed, 9 insertions(+)
>>>>
>>>> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>>>> index 4732063..a6e2594 100644
>>>> --- a/include/linux/if_link.h
>>>> +++ b/include/linux/if_link.h
>>>> @@ -145,6 +145,7 @@ enum {
>>>>  	IFLA_CARRIER,
>>>>  	IFLA_PHYS_PORT_ID,
>>>>  	IFLA_CARRIER_CHANGES,
>>>> +	IFLA_PHYS_SWITCH_ID,
>>>>  	__IFLA_MAX
>>>>  };
>>>>  
>>>> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>>>> index 4d99324..bd36a07 100644
>>>> --- a/ip/ipaddress.c
>>>> +++ b/ip/ipaddress.c
>>>> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>>>>  				      b1, sizeof(b1)));
>>>>  	}
>>>>  
>>>> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
>>>> +		SPRINT_BUF(b1);
>>>> +		fprintf(fp, "switchid %s ",
>>>> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
>>>> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
>>>> +				      b1, sizeof(b1)));
>>>> +	}
>>>> +
>>>>  	if (tb[IFLA_OPERSTATE])
>>>>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));


* Re: [PATCH] net: ethernet: rocker: Add select to CONFIG_BRIDGE in Kconfig
From: Jiri Pirko @ 2014-12-04 18:15 UTC (permalink / raw)
  To: Andreas Ruprecht
  Cc: Jim Davis, Stephen Rothwell, linux-next, linux-kernel, sfeldma,
	netdev
In-Reply-To: <54809BA5.7030901@rupran.de>

Thu, Dec 04, 2014 at 06:36:37PM CET, mail@rupran.de wrote:
>On 04.12.2014 17:34, Jim Davis wrote:
>> Building with the attached random configuration file,
>> 
>> drivers/built-in.o: In function `rocker_port_fdb_learn_work':
>> /home/jim/linux/drivers/net/ethernet/rocker/rocker.c:3014: undefined
>> reference to `br_fdb_external_learn_del'
>> /home/jim/linux/drivers/net/ethernet/rocker/rocker.c:3016: undefined
>> reference to `br_fdb_external_learn_add'
>> 
>
>Hi,
>
>the problem here is that CONFIG_BRIDGE is set to 'm' (leading to
>inclusion of the two functions above in the kernel module) while
>CONFIG_ROCKER is set to 'y', requiring the functions at link time.
>
>Is the attached patch sufficient to fix this?
>
>Regards,
>
>Andreas

>From 0529c3cbe381338dc3337e07a71e15b3d22a3255 Mon Sep 17 00:00:00 2001
>From: Andreas Ruprecht <rupran@einserver.de>
>Date: Thu, 4 Dec 2014 18:28:09 +0100
>Subject: [PATCH] net: ethernet: rocker: Add select to CONFIG_BRIDGE in Kconfig
>
>In a configuration with CONFIG_BRIDGE set to 'm' and CONFIG_ROCKER
>set to 'y', undefined references occur at link time:
>
>> drivers/built-in.o: In function `rocker_port_fdb_learn_work':
>> /home/jim/linux/drivers/net/ethernet/rocker/rocker.c:3014: undefined
>> reference to `br_fdb_external_learn_del'
>> /home/jim/linux/drivers/net/ethernet/rocker/rocker.c:3016: undefined
>> reference to `br_fdb_external_learn_add'
>
>This patch fixes these by selecting CONFIG_BRIDGE from CONFIG_ROCKER.
>
>Reported-by: Jim Davis <jim.epost@gmail.com>
>Signed-off-by: Andreas Ruprecht <rupran@einserver.de>
Acked-by: Jiri Pirko <jiri@resnulli.us>

This is ok for now. There is a plan to replace
br_fdb_external_learn_add/del by a notifier, which will fix this as well.

Thanks.

>---
> drivers/net/ethernet/rocker/Kconfig | 1 +
> 1 file changed, 1 insertion(+)
>
>diff --git a/drivers/net/ethernet/rocker/Kconfig b/drivers/net/ethernet/rocker/Kconfig
>index 11a850eab628..ade10ec4c78d 100644
>--- a/drivers/net/ethernet/rocker/Kconfig
>+++ b/drivers/net/ethernet/rocker/Kconfig
>@@ -18,6 +18,7 @@ if NET_VENDOR_ROCKER
> config ROCKER
> 	tristate "Rocker switch driver (EXPERIMENTAL)"
> 	depends on PCI && NET_SWITCHDEV
>+	select BRIDGE
> 	---help---
> 	  This driver supports Rocker switch device.
> 
>-- 
>1.9.1
>


* Re: [PATCH] x86: bpf_jit_comp: simplify trivial boolean return
From: Joe Perches @ 2014-12-04 18:05 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Laight, Quentin Lambert, David S. Miller, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86@kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <CAADnVQJawht6+sARaP=s9PyPERjrVOBKiJLrCK0OwnLtzT2CAA@mail.gmail.com>

On Thu, 2014-12-04 at 07:56 -0800, Alexei Starovoitov wrote:
> On Thu, Dec 4, 2014 at 1:26 AM, Joe Perches <joe@perches.com> wrote:
> > On Thu, 2014-11-27 at 10:49 -0800, Joe Perches wrote:
> >> On Thu, 2014-11-27 at 12:25 +0000, David Laight wrote:
> >> > Why the change in data?
> >>
> >> btw: without gcov and using -O2
> >>
> >> $ size arch/x86/net/bpf_jit_comp.o*
> >>    text          data     bss     dec     hex filename
> >>    9671             4       0    9675    25cb arch/x86/net/bpf_jit_comp.o.new
> >>   10679             4       0   10683    29bb arch/x86/net/bpf_jit_comp.o.old
> >
> > Alexei?
> >
> > Is this 10% reduction in size a good reason to change the code?
> 
> yes.
> I believe you're seeing it with gcc 4.9. I wanted to double
> check what 4.6 and 4.7 are doing. If they're not suddenly
> increase code size then resubmit it for inclusion please.

I get these sizes for these compilers
(x86-64, -O2, without profiling)

$ size arch/x86/net/bpf_jit_comp.o*
   text	   data	    bss	    dec	    hex	filename
   9266	      4	      0	   9270	   2436	arch/x86/net/bpf_jit_comp.o.4.4.new
  10042	      4	      0	  10046	   273e	arch/x86/net/bpf_jit_comp.o.4.4.old
   9109	      4	      0	   9113	   2399	arch/x86/net/bpf_jit_comp.o.4.6.new
   9717	      4	      0	   9721	   25f9	arch/x86/net/bpf_jit_comp.o.4.6.old
   8789	      4	      0	   8793	   2259	arch/x86/net/bpf_jit_comp.o.4.7.new
  10245	      4	      0	  10249	   2809	arch/x86/net/bpf_jit_comp.o.4.7.old
   9671	      4	      0	   9675	   25cb	arch/x86/net/bpf_jit_comp.o.4.9.new
  10679	      4	      0	  10683	   29bb	arch/x86/net/bpf_jit_comp.o.4.9.old

I am a bit surprised by the size variations


* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
From: Roopa Prabhu @ 2014-12-04 17:59 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, tgraf, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, jhs,
	sfeldma, f.fainelli, linville, jasowang, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl, hemal
In-Reply-To: <87388v9ua6.fsf@x220.int.ebiederm.org>

On 12/4/14, 9:52 AM, Eric W. Biederman wrote:
> Jiri Pirko <jiri@resnulli.us> writes:
>
>> Thu, Dec 04, 2014 at 05:15:04PM CET, ebiederm@xmission.com wrote:
>>> Jiri Pirko <jiri@resnulli.us> writes:
>>>
>>> Would someone please explain to me what a switch id is?
>>>
>>> I looked in the kernel source, and I looked here and while I know
>>> switches I don't have a clue what a switch id is.
>>>
>>> My primary concern at this point is that you have introduced a global
>>> identifier that is isn't a hardware property (it certainly does not look
>>> like a mac address) and that is unique across network namespaces and
>>> thus breaks checkpoint/restart (aka CRIU).
>> IFLA_PHYS_SWITCH_ID is very similar to IFLA_PHYS_PORT_ID. It is
>> generated by the driver and ensures that there is the same switch id for
>> all ports belonging to the same switch chip/asic. It is up to the driver
>> how to implement the id. I would like to point you to driver
>> implementing ndo_get_phys_port_id
> Looking at ndo_get_phys_port_id it is just the per port mac address.  Or
> guid in the case of infiniband.  Which really makes me wonder why we
> didn't use the same abstractions in the code for address types that we
> do for hardware addresses.
>
> Using mac address or other hardware addresses that are used for layer 2
> addressing makes sense to me.  There is a long tradition of that and as
> I recall protocols like STP actually requiring having a different mac
> address per port.
>
> When I asked the question I thought the switch id was going to be
> something like the ifindex, the software index of a network device.
>
>
> Finally having tracked down the rocker implementation of
> rocker_port_switch_parent_id_get I see it you are reading some 64bit
> hardware register.
>
> Which leads me to ask what are the semantics of switch_id?
>
> Is the switch id an identifier with a prefix from IEEE and assigned by
> the manufacture so that it is guaranteed to the tolerances of the
> manufacturing process to be globally unique?
>
> Is the switch id a random number that is statistically likely to be
> globally unique because you have enough bits?   As I recall you need
> at least 128 bits to have a reasonable chance of a random number
> avoiding the birthday paradox.
>
> Do we need some kind of manufacturer id to tell one switch id from
> another?
>
> Is the switch id persistent across reboots?
>
>>> Also what in the world does PHYS mean in IFLA_PHYS_SWITCH_ID?  Does that
>>> mean we can't have a purely software implementation of this interface?
>>> Given that we will want a software implementation at some point
>>> including PHYS in the name seems completely wrong.
>> We can remove the "PHYS", no problem. I do not understand what you say
>> about "software implementation". The point is to allow hw switch/ish
>> chips to be supported.
> If we are talking about something typically stored in a eeprom like a
> mac address phys seems appropriate.
>
> Still having a definition of this switch id clean clear enough that
> net/bridge and drivers/net/macvlan can implement it seems important.
>
> Even more important is having a definition of switch id clear enough
> that userspace can use the switch id to do something useful.
>
> Right now switch id looks like one of those weird one manufacturer
> properties that is fine to expose as a driver specific property
> but I don't yet see it being a generic property I that can be used
> usefully in userspace.
>
> So can we please get some clear semantics or failing that can we please
> not expose this to userspace as generic property.

Agreed, 100%. This was my original concern as well, and I raised it
earlier.
Don't expose it to userspace if the semantics are still not clear.

Thanks.

>>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>>> ---
>>>>   include/linux/if_link.h | 1 +
>>>>   ip/ipaddress.c          | 8 ++++++++
>>>>   2 files changed, 9 insertions(+)
>>>>
>>>> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>>>> index 4732063..a6e2594 100644
>>>> --- a/include/linux/if_link.h
>>>> +++ b/include/linux/if_link.h
>>>> @@ -145,6 +145,7 @@ enum {
>>>>   	IFLA_CARRIER,
>>>>   	IFLA_PHYS_PORT_ID,
>>>>   	IFLA_CARRIER_CHANGES,
>>>> +	IFLA_PHYS_SWITCH_ID,
>>>>   	__IFLA_MAX
>>>>   };
>>>>   
>>>> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>>>> index 4d99324..bd36a07 100644
>>>> --- a/ip/ipaddress.c
>>>> +++ b/ip/ipaddress.c
>>>> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>>>>   				      b1, sizeof(b1)));
>>>>   	}
>>>>   
>>>> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
>>>> +		SPRINT_BUF(b1);
>>>> +		fprintf(fp, "switchid %s ",
>>>> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
>>>> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
>>>> +				      b1, sizeof(b1)));
>>>> +	}
>>>> +
>>>>   	if (tb[IFLA_OPERSTATE])
>>>>   		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));


* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
From: Eric W. Biederman @ 2014-12-04 17:52 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl, hemal
In-Reply-To: <20141204163024.GG1861@nanopsycho.orion>

Jiri Pirko <jiri@resnulli.us> writes:

> Thu, Dec 04, 2014 at 05:15:04PM CET, ebiederm@xmission.com wrote:
>>Jiri Pirko <jiri@resnulli.us> writes:
>>
>>Would someone please explain to me what a switch id is?
>>
>>I looked in the kernel source, and I looked here and while I know
>>switches I don't have a clue what a switch id is.
>>
>>My primary concern at this point is that you have introduced a global
>>identifier that is isn't a hardware property (it certainly does not look
>>like a mac address) and that is unique across network namespaces and
>>thus breaks checkpoint/restart (aka CRIU).
>
> IFLA_PHYS_SWITCH_ID is very similar to IFLA_PHYS_PORT_ID. It is
> generated by the driver and ensures that there is the same switch id for
> all ports belonging to the same switch chip/asic. It is up to the driver
> how to implement the id. I would like to point you to driver
> implementing ndo_get_phys_port_id

Looking at ndo_get_phys_port_id it is just the per port mac address.  Or
guid in the case of infiniband.  Which really makes me wonder why we
didn't use the same abstractions in the code for address types that we
do for hardware addresses.

Using mac address or other hardware addresses that are used for layer 2
addressing makes sense to me.  There is a long tradition of that and as
I recall protocols like STP actually requiring having a different mac
address per port.

When I asked the question I thought the switch id was going to be
something like the ifindex, the software index of a network device.


Finally having tracked down the rocker implementation of
rocker_port_switch_parent_id_get I see you are reading some 64bit
hardware register.

Which leads me to ask what are the semantics of switch_id?

Is the switch id an identifier with a prefix from IEEE and assigned by
the manufacturer so that it is guaranteed to the tolerances of the
manufacturing process to be globally unique?

Is the switch id a random number that is statistically likely to be
globally unique because you have enough bits?   As I recall you need
at least 128 bits to have a reasonable chance of a random number
avoiding the birthday paradox.
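
The birthday estimate behind this question can be worked through numerically. The sketch below uses the usual first-order bound: with n ids drawn uniformly at random from 2^bits values, the chance that any two collide is at most n*(n-1)/2 divided by 2^bits, and is close to that while the result is small. demo_collision_bound() is a hypothetical helper written for this illustration, and uniform random ids are an assumption, not something the thread establishes about any driver.

```c
/* Upper bound on the probability that at least two of n uniformly random
 * ids collide in a space of 2^bits values (first-order birthday bound). */
static double demo_collision_bound(double n, unsigned int bits)
{
	double space = 1.0;
	unsigned int i;

	/* Build 2^bits by repeated doubling; exact in a double for the
	 * bit widths of interest here (64, 128). */
	for (i = 0; i < bits; i++)
		space *= 2.0;

	return n * (n - 1.0) / 2.0 / space;
}
```

Under these assumptions, a million random 64-bit ids give a bound of roughly 2.7e-8, and the bound only reaches about one half near 2^32 ids. How many bits are "enough" therefore depends on how many switches are expected to be compared against each other.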

Do we need some kind of manufacturer id to tell one switch id from
another?

Is the switch id persistent across reboots?

>>Also what in the world does PHYS mean in IFLA_PHYS_SWITCH_ID?  Does that
>>mean we can't have a purely software implementation of this interface?
>>Given that we will want a software implementation at some point
>>including PHYS in the name seems completely wrong.
>
> We can remove the "PHYS", no problem. I do not understand what you say
> about "software implementation". The point is to allow hw switch/ish
> chips to be supported.

If we are talking about something typically stored in a eeprom like a
mac address phys seems appropriate.

Still, having a definition of this switch id clear enough that
net/bridge and drivers/net/macvlan can implement it seems important.

Even more important is having a definition of switch id clear enough
that userspace can use the switch id to do something useful.

Right now switch id looks like one of those weird one-manufacturer
properties that is fine to expose as a driver specific property
but I don't yet see it being a generic property that can be used
usefully in userspace.

So can we please get some clear semantics or failing that can we please
not expose this to userspace as a generic property.

Thanks,
Eric



>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>> ---
>>>  include/linux/if_link.h | 1 +
>>>  ip/ipaddress.c          | 8 ++++++++
>>>  2 files changed, 9 insertions(+)
>>>
>>> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>>> index 4732063..a6e2594 100644
>>> --- a/include/linux/if_link.h
>>> +++ b/include/linux/if_link.h
>>> @@ -145,6 +145,7 @@ enum {
>>>  	IFLA_CARRIER,
>>>  	IFLA_PHYS_PORT_ID,
>>>  	IFLA_CARRIER_CHANGES,
>>> +	IFLA_PHYS_SWITCH_ID,
>>>  	__IFLA_MAX
>>>  };
>>>  
>>> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>>> index 4d99324..bd36a07 100644
>>> --- a/ip/ipaddress.c
>>> +++ b/ip/ipaddress.c
>>> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>>>  				      b1, sizeof(b1)));
>>>  	}
>>>  
>>> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
>>> +		SPRINT_BUF(b1);
>>> +		fprintf(fp, "switchid %s ",
>>> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
>>> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
>>> +				      b1, sizeof(b1)));
>>> +	}
>>> +
>>>  	if (tb[IFLA_OPERSTATE])
>>>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));


* [PATCH net] amd-xgbe: Prevent Tx cleanup stall
From: Tom Lendacky @ 2014-12-04 17:52 UTC (permalink / raw)
  To: netdev; +Cc: David Miller

When performing Tx cleanup, the dirty index counter is compared to the
current index counter as one of the tests used to determine when to stop
cleanup. The "less than" test will fail when the current index counter
rolls over to zero causing cleanup to never occur again. Update the test
to a "not equal" to avoid this situation.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
index 2349ea9..d0e3530 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
@@ -1554,7 +1554,7 @@ static int xgbe_tx_poll(struct xgbe_channel *channel)
 	spin_lock_irqsave(&ring->lock, flags);
 
 	while ((processed < XGBE_TX_DESC_MAX_PROC) &&
-	       (ring->dirty < ring->cur)) {
+	       (ring->dirty != ring->cur)) {
 		rdata = XGBE_GET_DESC_DATA(ring, ring->dirty);
 		rdesc = rdata->rdesc;
 

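
The stall described in the amd-xgbe commit message can be reproduced in a few lines of userspace C. This is a sketch under an assumption: the dirty and cur indexes are modelled as free-running unsigned counters that wrap at UINT_MAX, which is how the commit message reads but is not confirmed by the hunk shown. demo_cleanup() and the two predicate helpers are hypothetical and do not exist in the driver.

```c
#include <limits.h>
#include <stdbool.h>

/* Old loop condition: false as soon as cur has wrapped past zero while
 * dirty has not, so pending descriptors are never cleaned. */
static bool demo_work_pending_lt(unsigned int dirty, unsigned int cur)
{
	return dirty < cur;
}

/* New loop condition: work is pending whenever the counters differ. */
static bool demo_work_pending_ne(unsigned int dirty, unsigned int cur)
{
	return dirty != cur;
}

/* Count how many descriptors a cleanup pass would process, bounded by a
 * per-pass budget, using the given loop condition. */
static unsigned int demo_cleanup(unsigned int dirty, unsigned int cur,
				 unsigned int budget,
				 bool (*pending)(unsigned int, unsigned int))
{
	unsigned int processed = 0;

	while (processed < budget && pending(dirty, cur)) {
		dirty++;	/* unsigned arithmetic wraps modulo UINT_MAX + 1 */
		processed++;
	}
	return processed;
}
```

Calling demo_cleanup(UINT_MAX - 1, 2, 16, ...) models four outstanding descriptors that straddle the wrap point: the "less than" predicate processes none of them, while the "not equal" predicate processes all four. Before the wrap, both predicates behave the same.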

* [PATCH v6 7/7] fs/splice: full support for compiling out splice
From: Pieter Smith @ 2014-12-04 17:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Josh Triplett, Pieter Smith, Alexander Duyck, Alexander Viro,
	Alexei Starovoitov, Andrew Morton, Bertrand Jacquin,
	Catalina Mocanu, Daniel Borkmann, David S. Miller, Eric Dumazet,
	Eric W. Biederman, Fabian Frederick,
	open list:FUSE: FILESYSTEM..., Geert Uytterhoeven, Hugh Dickins,
	Iulia Manda, Jan Beulich, J. Bruce Fields, Jeff Layton,
	open list:ABI/API, linux-fsd
In-Reply-To: <1417715473-24110-1-git-send-email-pieter@boesman.nl>

Entirely compile out splice translation unit when the system is configured
without splice family of syscalls (i.e. CONFIG_SYSCALL_SPLICE is undefined).

Exported fs/splice functions are transparently mocked out with static inlines.
Because userspace support for splice has already been removed by this
patch-set, the exported functions cannot be called anyway. Mocking them out
prevents a maintenance burden on file system drivers.
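
A minimal userspace sketch of the mocking pattern, assuming a hypothetical
DEMO_SYSCALL_SPLICE macro in place of CONFIG_SYSCALL_SPLICE and made-up
demo_* function names:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * DEMO_SYSCALL_SPLICE is a hypothetical stand-in for CONFIG_SYSCALL_SPLICE.
 * It is left undefined here to model a build without splice.
 */
#ifdef DEMO_SYSCALL_SPLICE
long demo_splice_read(const char *src, char *dst, size_t len);
#else
/* Stub with the same signature; the real implementation is not built. */
static inline long demo_splice_read(const char *src, char *dst, size_t len)
{
	(void)src;
	(void)dst;
	(void)len;
	return -EPERM;
}
#endif

/* A caller (think: a file system driver) needs no #ifdef of its own. */
long demo_fs_read(const char *src, char *dst, size_t len)
{
	return demo_splice_read(src, dst, len);
}

void demo_stub_check(void)
{
	char buf[4];

	assert(demo_fs_read("abc", buf, sizeof(buf)) == -EPERM);
}
```

The caller compiles unchanged in both configurations, which is the point of
mocking out the exports instead of patching every user.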

The bloat score resulting from this patch given a tinyconfig is:
add/remove: 0/25 grow/shrink: 0/5 up/down: 0/-4845 (-4845)
function                                     old     new   delta
pipe_to_null                                   4       -      -4
generic_pipe_buf_nosteal                       6       -      -6
spd_release_page                              10       -     -10
PageUptodate                                  22      11     -11
lock_page                                     36      24     -12
page_cache_pipe_buf_release                   16       -     -16
splice_write_null                             24       4     -20
page_cache_pipe_buf_ops                       20       -     -20
nosteal_pipe_buf_ops                          20       -     -20
default_pipe_buf_ops                          20       -     -20
generic_splice_sendpage                       24       -     -24
splice_shrink_spd                             27       -     -27
direct_splice_actor                           47       -     -47
default_file_splice_write                     49       -     -49
wakeup_pipe_writers                           54       -     -54
write_pipe_buf                                71       -     -71
page_cache_pipe_buf_confirm                   80       -     -80
splice_grow_spd                               87       -     -87
splice_from_pipe                              93       -     -93
splice_from_pipe_next                        106       -    -106
pipe_to_sendpage                             109       -    -109
page_cache_pipe_buf_steal                    114       -    -114
generic_file_splice_read                     131       8    -123
do_splice_direct                             148       -    -148
__splice_from_pipe                           246       -    -246
splice_direct_to_actor                       416       -    -416
splice_to_pipe                               417       -    -417
default_file_splice_read                     688       -    -688
iter_file_splice_write                       702       4    -698
__generic_file_splice_read                  1109       -   -1109

The bloat score for the entire CONFIG_SYSCALL_SPLICE patch-set is:
add/remove: 0/41 grow/shrink: 5/7 up/down: 23/-8422 (-8399)
function                                     old     new   delta
sys_pwritev                                  115     122      +7
sys_preadv                                   115     122      +7
fdput_pos                                     29      36      +7
sys_pwrite64                                 115     116      +1
sys_pread64                                  115     116      +1
pipe_to_null                                   4       -      -4
generic_pipe_buf_nosteal                       6       -      -6
spd_release_page                              10       -     -10
fdput                                         11       -     -11
PageUptodate                                  22      11     -11
lock_page                                     36      24     -12
signal_pending                                39      26     -13
fdget                                         56      42     -14
page_cache_pipe_buf_release                   16       -     -16
user_page_pipe_buf_ops                        20       -     -20
splice_write_null                             24       4     -20
page_cache_pipe_buf_ops                       20       -     -20
nosteal_pipe_buf_ops                          20       -     -20
default_pipe_buf_ops                          20       -     -20
generic_splice_sendpage                       24       -     -24
user_page_pipe_buf_steal                      25       -     -25
splice_shrink_spd                             27       -     -27
pipe_to_user                                  43       -     -43
direct_splice_actor                           47       -     -47
default_file_splice_write                     49       -     -49
wakeup_pipe_writers                           54       -     -54
wakeup_pipe_readers                           54       -     -54
write_pipe_buf                                71       -     -71
page_cache_pipe_buf_confirm                   80       -     -80
splice_grow_spd                               87       -     -87
do_splice_to                                  87       -     -87
ipipe_prep.part                               92       -     -92
splice_from_pipe                              93       -     -93
splice_from_pipe_next                        107       -    -107
pipe_to_sendpage                             109       -    -109
page_cache_pipe_buf_steal                    114       -    -114
opipe_prep.part                              119       -    -119
sys_sendfile                                 122       -    -122
generic_file_splice_read                     131       8    -123
sys_sendfile64                               126       -    -126
sys_vmsplice                                 137       -    -137
do_splice_direct                             148       -    -148
vmsplice_to_user                             205       -    -205
__splice_from_pipe                           246       -    -246
splice_direct_to_actor                       348       -    -348
splice_to_pipe                               371       -    -371
do_sendfile                                  492       -    -492
sys_tee                                      497       -    -497
vmsplice_to_pipe                             558       -    -558
default_file_splice_read                     688       -    -688
iter_file_splice_write                       702       4    -698
sys_splice                                  1075       -   -1075
__generic_file_splice_read                  1109       -   -1109

Signed-off-by: Pieter Smith <pieter@boesman.nl>
---
 fs/Makefile            |  3 ++-
 fs/splice.c            |  2 --
 include/linux/fs.h     | 26 ++++++++++++++++++++++++++
 include/linux/splice.h | 42 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 70 insertions(+), 3 deletions(-)

diff --git a/fs/Makefile b/fs/Makefile
index fb7646e..9395622 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -10,7 +10,7 @@ obj-y :=	open.o read_write.o file_table.o super.o \
 		ioctl.o readdir.o select.o dcache.o inode.o \
 		attr.o bad_inode.o file.o filesystems.o namespace.o \
 		seq_file.o xattr.o libfs.o fs-writeback.o \
-		pnode.o splice.o sync.o utimes.o \
+		pnode.o sync.o utimes.o \
 		stack.o fs_struct.o statfs.o fs_pin.o
 
 ifeq ($(CONFIG_BLOCK),y)
@@ -22,6 +22,7 @@ endif
 obj-$(CONFIG_PROC_FS) += proc_namespace.o
 
 obj-$(CONFIG_FSNOTIFY)		+= notify/
+obj-$(CONFIG_SYSCALL_SPLICE)	+= splice.o
 obj-$(CONFIG_EPOLL)		+= eventpoll.o
 obj-$(CONFIG_ANON_INODES)	+= anon_inodes.o
 obj-$(CONFIG_SIGNALFD)		+= signalfd.o
diff --git a/fs/splice.c b/fs/splice.c
index 7c4c695..44b201b 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1316,7 +1316,6 @@ long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
 	return ret;
 }
 
-#ifdef CONFIG_SYSCALL_SPLICE
 static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
 			       struct pipe_inode_info *opipe,
 			       size_t len, unsigned int flags);
@@ -2201,5 +2200,4 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd,
 	return do_sendfile(out_fd, in_fd, NULL, count, 0);
 }
 #endif
-#endif
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a957d43..138107e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2444,6 +2444,7 @@ extern int blkdev_fsync(struct file *filp, loff_t start, loff_t end,
 extern void block_sync_page(struct page *page);
 
 /* fs/splice.c */
+#ifdef CONFIG_SYSCALL_SPLICE
 extern ssize_t generic_file_splice_read(struct file *, loff_t *,
 		struct pipe_inode_info *, size_t, unsigned int);
 extern ssize_t default_file_splice_read(struct file *, loff_t *,
@@ -2452,6 +2453,31 @@ extern ssize_t iter_file_splice_write(struct pipe_inode_info *,
 		struct file *, loff_t *, size_t, unsigned int);
 extern ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe,
 		struct file *out, loff_t *, size_t len, unsigned int flags);
+#else
+static inline ssize_t generic_file_splice_read(struct file *in, loff_t *ppos,
+		struct pipe_inode_info *pipe, size_t len, unsigned int flags)
+{
+	return -EPERM;
+}
+
+static inline ssize_t default_file_splice_read(struct file *in, loff_t *ppos,
+		struct pipe_inode_info *pipe, size_t len, unsigned int flags)
+{
+	return -EPERM;
+}
+
+static inline ssize_t iter_file_splice_write(struct pipe_inode_info *pipe,
+		struct file *out, loff_t *ppos, size_t len, unsigned int flags)
+{
+	return -EPERM;
+}
+
+static inline ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe,
+		struct file *out, loff_t *ppos, size_t len, unsigned int flags)
+{
+	return -EPERM;
+}
+#endif
 
 extern void
 file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping);
diff --git a/include/linux/splice.h b/include/linux/splice.h
index da2751d..34570d8 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -65,6 +65,7 @@ typedef int (splice_actor)(struct pipe_inode_info *, struct pipe_buffer *,
 typedef int (splice_direct_actor)(struct pipe_inode_info *,
 				  struct splice_desc *);
 
+#ifdef CONFIG_SYSCALL_SPLICE
 extern ssize_t splice_from_pipe(struct pipe_inode_info *, struct file *,
 				loff_t *, size_t, unsigned int,
 				splice_actor *);
@@ -74,13 +75,54 @@ extern ssize_t splice_to_pipe(struct pipe_inode_info *,
 			      struct splice_pipe_desc *);
 extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *,
 				      splice_direct_actor *);
+#else
+static inline ssize_t splice_from_pipe(struct pipe_inode_info *pipe, struct file *out,
+			 loff_t *ppos, size_t len, unsigned int flags,
+			 splice_actor *actor)
+{
+	return -EPERM;
+}
+
+static inline ssize_t __splice_from_pipe(struct pipe_inode_info *pipe, struct splice_desc *sd,
+			   splice_actor *actor)
+{
+	return -EPERM;
+}
+
+static inline ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
+		       struct splice_pipe_desc *spd)
+{
+	return -EPERM;
+}
+
+static inline ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
+			       splice_direct_actor *actor)
+{
+	return -EPERM;
+}
+#endif
 
 /*
  * for dynamic pipe sizing
  */
+#ifdef CONFIG_SYSCALL_SPLICE
 extern int splice_grow_spd(const struct pipe_inode_info *, struct splice_pipe_desc *);
 extern void splice_shrink_spd(struct splice_pipe_desc *);
 extern void spd_release_page(struct splice_pipe_desc *, unsigned int);
+#else
+static inline int splice_grow_spd(const struct pipe_inode_info *pipe, struct splice_pipe_desc *spd)
+{
+	return -EPERM;
+}
+
+static inline void splice_shrink_spd(struct splice_pipe_desc *spd)
+{
+}
+
+static inline void spd_release_page(struct splice_pipe_desc *spd, unsigned int i)
+{
+}
+#endif
 
 extern const struct pipe_buf_operations page_cache_pipe_buf_ops;
 #endif
-- 
2.1.0

^ permalink raw reply related

* [PATCH v6 6/7] fs/nfsd: support compiling out splice
From: Pieter Smith @ 2014-12-04 17:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Josh Triplett, Pieter Smith, Alexander Duyck, Alexander Viro,
	Alexei Starovoitov, Andrew Morton, Bertrand Jacquin,
	Catalina Mocanu, Daniel Borkmann, David S. Miller, Eric Dumazet,
	Eric W. Biederman, Fabian Frederick,
	open list:FUSE: FILESYSTEM..., Geert Uytterhoeven, Hugh Dickins,
	Iulia Manda, Jan Beulich, J. Bruce Fields, Jeff Layton,
	open list:ABI/API, linux-fsd
In-Reply-To: <1417715473-24110-1-git-send-email-pieter@boesman.nl>

The goal of the larger patch set is to completely compile out fs/splice, and
as a result, splice support for all file-systems. This patch ensures that
fs/nfsd falls back to non-splice fs support when CONFIG_SYSCALL_SPLICE is
undefined.
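
The diff below tests the option with IS_ENABLED(). The following is a
simplified sketch of the preprocessor trick behind that kind of test, not the
kernel's actual definition; every demo_* and DEMO_* name is hypothetical. It
shows that an option name that is not defined, for example because it is
spelled differently from the Kconfig symbol, silently evaluates to 0.

```c
#include <assert.h>

/* Simplified sketch; the real macros live in include/linux/kconfig.h. */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define demo_take_second_arg(ignored, val, ...) val
#define demo_is_defined(x) demo_is_defined_1(x)
#define demo_is_defined_1(val) demo_is_defined_2(DEMO_ARG_PLACEHOLDER_##val)
#define demo_is_defined_2(arg1_or_junk) \
	demo_take_second_arg(arg1_or_junk 1, 0, 0)

/* Hypothetical option, defined to 1 the way a "=y" bool option would be. */
#define DEMO_CONFIG_SYSCALL_SPLICE 1

struct demo_rqst {
	int splice_ok;	/* set from the correctly spelled option */
	int typo_ok;	/* set from a name that is never defined */
};

void demo_rqst_init(struct demo_rqst *rqst)
{
	rqst->splice_ok = demo_is_defined(DEMO_CONFIG_SYSCALL_SPLICE);
	/* No compiler diagnostic here: the unknown name just yields 0. */
	rqst->typo_ok = demo_is_defined(DEMO_CONFIG_SPLICE_SYSCALL);
}

void demo_is_defined_check(void)
{
	struct demo_rqst rqst = { -1, -1 };

	demo_rqst_init(&rqst);
	assert(rqst.splice_ok == 1);
	assert(rqst.typo_ok == 0);
}
```

Because the result is a compile-time constant, the compiler can drop the
disabled path without any #ifdef at the call site.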

Signed-off-by: Pieter Smith <pieter@boesman.nl>
---
 net/sunrpc/svc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index ca8a795..6cacc37 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1084,7 +1084,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
 		goto err_short_len;
 
 	/* Will be turned off only in gss privacy case: */
-	rqstp->rq_splice_ok = true;
+	rqstp->rq_splice_ok = IS_ENABLED(CONFIG_SYSCALL_SPLICE);
 	/* Will be turned off only when NFSv4 Sessions are used */
 	rqstp->rq_usedeferral = true;
 	rqstp->rq_dropme = false;
-- 
2.1.0


^ permalink raw reply related

* [PATCH v6 5/7] net/core: support compiling out splice
From: Pieter Smith @ 2014-12-04 17:50 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Michael S. Tsirkin, Trond Myklebust, Bertrand Jacquin,
	J. Bruce Fields, Eric Dumazet, Willem de Bruijn,
	蔡正龙, Jeff Layton, Tom Herbert,
	Alexei Starovoitov, Miklos Szeredi, Peter Foley, Hugh Dickins,
	Xiao Guangrong, Geert Uytterhoeven, Mel Gorman, Matt Turner,
	Paul E. McKenney, Alexander Duyck, Pieter Smith,
	open list:FUSE: FILESYSTEM...
In-Reply-To: <1417715473-24110-1-git-send-email-pieter-qeJ+1H9vRZbz+pZb47iToQ@public.gmane.org>

To implement splice support, net/core makes use of nosteal_pipe_buf_ops. This
struct is exported by fs/splice. The goal of the larger patch set is to
completely compile out fs/splice, so uses of the exported struct need to be
compiled out along with fs/splice.

This patch therefore compiles out splice support in net/core when
CONFIG_SYSCALL_SPLICE is undefined. The compiled out function skb_splice_bits
is transparently mocked out with a static inline. The greater patch set removes
userspace splice support so it cannot be called anyway.
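
A rough sketch of why the static helpers gain __maybe_unused: once their only
caller is compiled out they would otherwise trigger unused-function warnings.
The demo_* names and DEMO_SYSCALL_SPLICE are hypothetical, and the attribute
spelling assumes GCC or Clang.

```c
#include <assert.h>
#include <errno.h>

/* Assumed GCC/Clang spelling of an "unused is fine" annotation. */
#define demo_maybe_unused __attribute__((unused))

/* Helper that is only called when the feature is built in. */
static int demo_maybe_unused demo_segment_len(int offset, int len)
{
	return len - offset;
}

#ifdef DEMO_SYSCALL_SPLICE
static inline int demo_splice_bits(int offset, int len)
{
	return demo_segment_len(offset, len);
}
#else
/* Feature compiled out: same signature, fixed error return. */
static inline int demo_splice_bits(int offset, int len)
{
	(void)offset;
	(void)len;
	return -EPERM;
}
#endif

void demo_splice_bits_check(void)
{
	/* DEMO_SYSCALL_SPLICE is left undefined, so the stub is used. */
	assert(demo_splice_bits(0, 10) == -EPERM);
}
```

Annotating the helpers keeps them outside the #ifdef, so only the exported
entry point needs a guard.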

Signed-off-by: Pieter Smith <pieter-qeJ+1H9vRZbz+pZb47iToQ@public.gmane.org>
---
 include/linux/skbuff.h | 10 ++++++++++
 net/core/skbuff.c      | 11 +++++++----
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a59d934..5cd636b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2640,9 +2640,19 @@ int skb_copy_bits(const struct sk_buff *skb, int offset, void *to, int len);
 int skb_store_bits(struct sk_buff *skb, int offset, const void *from, int len);
 __wsum skb_copy_and_csum_bits(const struct sk_buff *skb, int offset, u8 *to,
 			      int len, __wsum csum);
+#ifdef CONFIG_SYSCALL_SPLICE
 int skb_splice_bits(struct sk_buff *skb, unsigned int offset,
 		    struct pipe_inode_info *pipe, unsigned int len,
 		    unsigned int flags);
+#else
+static inline int
+skb_splice_bits(struct sk_buff *skb, unsigned int offset,
+		struct pipe_inode_info *pipe, unsigned int len,
+		unsigned int flags)
+{
+	return -EPERM;
+}
+#endif
 void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to);
 unsigned int skb_zerocopy_headlen(const struct sk_buff *from);
 int skb_zerocopy(struct sk_buff *to, struct sk_buff *from,
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 61059a0..bb426d9 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1678,7 +1678,8 @@ EXPORT_SYMBOL(skb_copy_bits);
  * Callback from splice_to_pipe(), if we need to release some pages
  * at the end of the spd in case we error'ed out in filling the pipe.
  */
-static void sock_spd_release(struct splice_pipe_desc *spd, unsigned int i)
+static void __maybe_unused sock_spd_release(struct splice_pipe_desc *spd,
+					    unsigned int i)
 {
 	put_page(spd->pages[i]);
 }
@@ -1781,9 +1782,9 @@ static bool __splice_segment(struct page *page, unsigned int poff,
  * Map linear and fragment data from the skb to spd. It reports true if the
  * pipe is full or if we already spliced the requested length.
  */
-static bool __skb_splice_bits(struct sk_buff *skb, struct pipe_inode_info *pipe,
-			      unsigned int *offset, unsigned int *len,
-			      struct splice_pipe_desc *spd, struct sock *sk)
+static bool __maybe_unused __skb_splice_bits(struct sk_buff *skb, struct pipe_inode_info *pipe,
+					     unsigned int *offset, unsigned int *len,
+					     struct splice_pipe_desc *spd, struct sock *sk)
 {
 	int seg;
 
@@ -1821,6 +1822,7 @@ static bool __skb_splice_bits(struct sk_buff *skb, struct pipe_inode_info *pipe,
  * the frag list, if such a thing exists. We'd probably need to recurse to
  * handle that cleanly.
  */
+#ifdef CONFIG_SYSCALL_SPLICE
 int skb_splice_bits(struct sk_buff *skb, unsigned int offset,
 		    struct pipe_inode_info *pipe, unsigned int tlen,
 		    unsigned int flags)
@@ -1876,6 +1878,7 @@ done:
 
 	return ret;
 }
+#endif /* CONFIG_SYSCALL_SPLICE */
 
 /**
  *	skb_store_bits - store bits from kernel buffer to skb
-- 
2.1.0


^ permalink raw reply related

* [PATCH v6 4/7] fs/fuse: support compiling out splice
From: Pieter Smith @ 2014-12-04 17:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Josh Triplett, Pieter Smith, Alexander Duyck, Alexander Viro,
	Alexei Starovoitov, Andrew Morton, Bertrand Jacquin,
	Catalina Mocanu, Daniel Borkmann, David S. Miller, Eric Dumazet,
	Eric W. Biederman, Fabian Frederick,
	open list:FUSE: FILESYSTEM..., Geert Uytterhoeven, Hugh Dickins,
	Iulia Manda, Jan Beulich, J. Bruce Fields, Jeff Layton,
	open list:ABI/API, linux-fsd
In-Reply-To: <1417715473-24110-1-git-send-email-pieter@boesman.nl>

To implement splice support, fs/fuse makes use of nosteal_pipe_buf_ops. This
struct is exported by fs/splice. The goal of the larger patch set is to
completely compile out fs/splice, so uses of the exported struct need to be
compiled out along with fs/splice.

This patch therefore compiles out splice support in fs/fuse when
CONFIG_SYSCALL_SPLICE is undefined.
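
The diff achieves this by defining the handler name to NULL so that the
operations table initializer needs no #ifdef. A small sketch of that idiom
follows; the demo_* types and the -EINVAL fallback are assumptions made for
illustration, not the fuse or VFS code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical operations table with an optional splice handler. */
struct demo_file_ops {
	long (*read)(char *buf, size_t len);
	long (*splice_read)(char *buf, size_t len);	/* may be NULL */
};

static long demo_dev_read(char *buf, size_t len)
{
	(void)buf;
	return (long)len;
}

#ifdef DEMO_SYSCALL_SPLICE
static long demo_dev_splice_read(char *buf, size_t len)
{
	(void)buf;
	return (long)len;
}
#else
/* Compiled out: the name still exists, but expands to NULL. */
#define demo_dev_splice_read NULL
#endif

/* The initializer is identical in both configurations. */
const struct demo_file_ops demo_dev_ops = {
	.read		= demo_dev_read,
	.splice_read	= demo_dev_splice_read,
};

/* Hypothetical caller that treats a missing handler as "not supported". */
long demo_do_splice_read(const struct demo_file_ops *ops, char *buf, size_t len)
{
	if (!ops->splice_read)
		return -EINVAL;
	return ops->splice_read(buf, len);
}

void demo_ops_check(void)
{
	char buf[4];

	assert(demo_dev_ops.splice_read == NULL);
	assert(demo_dev_ops.read(buf, sizeof(buf)) == 4);
	assert(demo_do_splice_read(&demo_dev_ops, buf, sizeof(buf)) == -EINVAL);
}
```

How a real caller reacts to a NULL handler is up to that caller; the error
code above is only a placeholder.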

Signed-off-by: Pieter Smith <pieter@boesman.nl>
---
 fs/fuse/dev.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index ca88731..99f1ff4 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1291,6 +1291,7 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, const struct iovec *iov,
 	return fuse_dev_do_read(fc, file, &cs, iov_length(iov, nr_segs));
 }
 
+#ifdef CONFIG_SYSCALL_SPLICE
 static ssize_t fuse_dev_splice_read(struct file *in, loff_t *ppos,
 				    struct pipe_inode_info *pipe,
 				    size_t len, unsigned int flags)
@@ -1368,6 +1369,9 @@ out:
 	kfree(bufs);
 	return ret;
 }
+#else /* CONFIG_SYSCALL_SPLICE */
+#define fuse_dev_splice_read NULL
+#endif
 
 static int fuse_notify_poll(struct fuse_conn *fc, unsigned int size,
 			    struct fuse_copy_state *cs)
-- 
2.1.0


^ permalink raw reply related

* [PATCH v6 3/7] fs/splice: support compiling out splice-family syscalls
From: Pieter Smith @ 2014-12-04 17:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Josh Triplett, Pieter Smith, Alexander Duyck, Alexander Viro,
	Alexei Starovoitov, Andrew Morton, Bertrand Jacquin,
	Catalina Mocanu, Daniel Borkmann, David S. Miller, Eric Dumazet,
	Eric W. Biederman, Fabian Frederick,
	open list:FUSE: FILESYSTEM..., Geert Uytterhoeven, Hugh Dickins,
	Iulia Manda, Jan Beulich, J. Bruce Fields, Jeff Layton,
	open list:ABI/API, linux-fsd
In-Reply-To: <1417715473-24110-1-git-send-email-pieter@boesman.nl>

Many embedded systems will not need the splice-family syscalls (splice,
vmsplice, tee and sendfile). Omitting them saves space.  This adds a new EXPERT
config option CONFIG_SYSCALL_SPLICE (default y) to support compiling them out.

The goal is to completely compile out fs/splice along with the syscalls. To
achieve this, the remaining patch-set will deal with fs/splice exports. As far
as possible, the impact on other device drivers will be minimized so as to
reduce the overall maintenance burden of CONFIG_SYSCALL_SPLICE.

The use of exported functions will be solved by transparently mocking them out
with static inlines. Uses of the exported pipe_buf_operations struct however
require direct modification in fs/fuse and net/core. The next two patches will
deal with this.

The last change required before fs/splice can be compiled out is making fs/nfsd
aware of the lacking splice support in file-systems when CONFIG_SYSCALL_SPLICE
is undefined.

The bloat benefit of this patch given a tinyconfig is:

add/remove: 0/16 grow/shrink: 2/5 up/down: 114/-3693 (-3579)
function                                     old     new   delta
splice_direct_to_actor                       348     416     +68
splice_to_pipe                               371     417     +46
splice_from_pipe_next                        107     106      -1
fdput                                         11       -     -11
signal_pending                                39      26     -13
fdget                                         56      42     -14
user_page_pipe_buf_ops                        20       -     -20
user_page_pipe_buf_steal                      25       -     -25
file_end_write                                58      29     -29
file_start_write                              68      34     -34
pipe_to_user                                  43       -     -43
wakeup_pipe_readers                           54       -     -54
do_splice_to                                  87       -     -87
ipipe_prep.part                               92       -     -92
opipe_prep.part                              119       -    -119
sys_sendfile                                 122       -    -122
sys_sendfile64                               126       -    -126
sys_vmsplice                                 137       -    -137
vmsplice_to_user                             205       -    -205
sys_tee                                      491       -    -491
do_sendfile                                  492       -    -492
vmsplice_to_pipe                             558       -    -558
sys_splice                                  1020       -   -1020

Signed-off-by: Pieter Smith <pieter@boesman.nl>
---
 fs/splice.c     |  2 ++
 init/Kconfig    | 10 ++++++++++
 kernel/sys_ni.c |  8 ++++++++
 3 files changed, 20 insertions(+)

diff --git a/fs/splice.c b/fs/splice.c
index 44b201b..7c4c695 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1316,6 +1316,7 @@ long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
 	return ret;
 }
 
+#ifdef CONFIG_SYSCALL_SPLICE
 static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
 			       struct pipe_inode_info *opipe,
 			       size_t len, unsigned int flags);
@@ -2200,4 +2201,5 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd,
 	return do_sendfile(out_fd, in_fd, NULL, count, 0);
 }
 #endif
+#endif
 
diff --git a/init/Kconfig b/init/Kconfig
index d811d5f..dec9819 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1571,6 +1571,16 @@ config NTP
 	  system clock to an NTP server, you can disable this option to save
 	  space.
 
+config SYSCALL_SPLICE
+	bool "Enable splice/vmsplice/tee/sendfile syscalls" if EXPERT
+	default y
+	help
+	  This option enables the splice, vmsplice, tee and sendfile syscalls. These
+	  are used by applications to: move data between buffers and arbitrary file
+	  descriptors; "copy" data between buffers; or copy data from userspace into
+	  buffers. If building an embedded system where no applications use these
+	  syscalls, you can disable this option to save space.
+
 config PCI_QUIRKS
 	default y
 	bool "Enable PCI quirk workarounds" if EXPERT
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index d2f5b00..25d5551 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -170,6 +170,14 @@ cond_syscall(sys_fstat);
 cond_syscall(sys_stat);
 cond_syscall(sys_uname);
 cond_syscall(sys_olduname);
+cond_syscall(sys_vmsplice);
+cond_syscall(sys_splice);
+cond_syscall(sys_tee);
+cond_syscall(sys_sendfile);
+cond_syscall(sys_sendfile64);
+cond_syscall(compat_sys_vmsplice);
+cond_syscall(compat_sys_sendfile);
+cond_syscall(compat_sys_sendfile64);
 
 /* arch-specific weak syscall entries */
 cond_syscall(sys_pciconfig_read);
-- 
2.1.0

^ permalink raw reply related

* [PATCH v6 2/7] fs: moved kernel_write to fs/read_write
From: Pieter Smith @ 2014-12-04 17:50 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Michael S. Tsirkin, Trond Myklebust, Bertrand Jacquin,
	J. Bruce Fields, Eric Dumazet, Willem de Bruijn,
	蔡正龙, Jeff Layton, Tom Herbert,
	Alexei Starovoitov, Miklos Szeredi, Peter Foley, Hugh Dickins,
	Xiao Guangrong, Geert Uytterhoeven, Mel Gorman, Matt Turner,
	Paul E. McKenney, Alexander Duyck, Pieter Smith,
	open list:FUSE: FILESYSTEM...
In-Reply-To: <1417715473-24110-1-git-send-email-pieter-qeJ+1H9vRZbz+pZb47iToQ@public.gmane.org>

kernel_write shares infrastructure with the read_write translation unit but not
with the splice translation unit. Grouping kernel_write with the read_write
translation unit is more logical. It also paves the way to compiling out the
splice group of syscalls for embedded systems that do not need them.

Signed-off-by: Pieter Smith <pieter-qeJ+1H9vRZbz+pZb47iToQ@public.gmane.org>
---
 fs/read_write.c | 16 ++++++++++++++++
 fs/splice.c     | 16 ----------------
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index d9451ba..f4c8d8b 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1191,3 +1191,19 @@ COMPAT_SYSCALL_DEFINE5(pwritev, compat_ulong_t, fd,
 }
 #endif
 
+ssize_t kernel_write(struct file *file, const char *buf, size_t count,
+			    loff_t pos)
+{
+	mm_segment_t old_fs;
+	ssize_t res;
+
+	old_fs = get_fs();
+	set_fs(get_ds());
+	/* The cast to a user pointer is valid due to the set_fs() */
+	res = vfs_write(file, (__force const char __user *)buf, count, &pos);
+	set_fs(old_fs);
+
+	return res;
+}
+EXPORT_SYMBOL(kernel_write);
+
diff --git a/fs/splice.c b/fs/splice.c
index c1a2861..44b201b 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -583,22 +583,6 @@ static ssize_t kernel_readv(struct file *file, const struct iovec *vec,
 	return res;
 }
 
-ssize_t kernel_write(struct file *file, const char *buf, size_t count,
-			    loff_t pos)
-{
-	mm_segment_t old_fs;
-	ssize_t res;
-
-	old_fs = get_fs();
-	set_fs(get_ds());
-	/* The cast to a user pointer is valid due to the set_fs() */
-	res = vfs_write(file, (__force const char __user *)buf, count, &pos);
-	set_fs(old_fs);
-
-	return res;
-}
-EXPORT_SYMBOL(kernel_write);
-
 ssize_t default_file_splice_read(struct file *in, loff_t *ppos,
 				 struct pipe_inode_info *pipe, size_t len,
 				 unsigned int flags)
-- 
2.1.0


^ permalink raw reply related

* [PATCH v6 1/7] fs: move sendfile syscall into fs/splice
From: Pieter Smith @ 2014-12-04 17:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Josh Triplett, Pieter Smith, Alexander Duyck, Alexander Viro,
	Alexei Starovoitov, Andrew Morton, Bertrand Jacquin,
	Catalina Mocanu, Daniel Borkmann, David S. Miller, Eric Dumazet,
	Eric W. Biederman, Fabian Frederick,
	open list:FUSE: FILESYSTEM..., Geert Uytterhoeven, Hugh Dickins,
	Iulia Manda, Jan Beulich, J. Bruce Fields, Jeff Layton,
	open list:ABI/API, linux-fsd
In-Reply-To: <1417715473-24110-1-git-send-email-pieter@boesman.nl>

sendfile functionally forms part of the splice group of syscalls (splice,
vmsplice and tee). Grouping sendfile with splice paves the way to compiling out
the splice group of syscalls for embedded systems that do not need these.

add/remove: 0/0 grow/shrink: 7/2 up/down: 86/-61 (25)
function                                     old     new   delta
file_start_write                              34      68     +34
file_end_write                                29      58     +29
sys_pwritev                                  115     122      +7
sys_preadv                                   115     122      +7
fdput_pos                                     29      36      +7
sys_pwrite64                                 115     116      +1
sys_pread64                                  115     116      +1
sys_tee                                      497     491      -6
sys_splice                                  1075    1020     -55

Signed-off-by: Pieter Smith <pieter@boesman.nl>
---
 fs/read_write.c | 175 -------------------------------------------------------
 fs/splice.c     | 178 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 178 insertions(+), 175 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 7d9318c..d9451ba 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1191,178 +1191,3 @@ COMPAT_SYSCALL_DEFINE5(pwritev, compat_ulong_t, fd,
 }
 #endif
 
-static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
-		  	   size_t count, loff_t max)
-{
-	struct fd in, out;
-	struct inode *in_inode, *out_inode;
-	loff_t pos;
-	loff_t out_pos;
-	ssize_t retval;
-	int fl;
-
-	/*
-	 * Get input file, and verify that it is ok..
-	 */
-	retval = -EBADF;
-	in = fdget(in_fd);
-	if (!in.file)
-		goto out;
-	if (!(in.file->f_mode & FMODE_READ))
-		goto fput_in;
-	retval = -ESPIPE;
-	if (!ppos) {
-		pos = in.file->f_pos;
-	} else {
-		pos = *ppos;
-		if (!(in.file->f_mode & FMODE_PREAD))
-			goto fput_in;
-	}
-	retval = rw_verify_area(READ, in.file, &pos, count);
-	if (retval < 0)
-		goto fput_in;
-	count = retval;
-
-	/*
-	 * Get output file, and verify that it is ok..
-	 */
-	retval = -EBADF;
-	out = fdget(out_fd);
-	if (!out.file)
-		goto fput_in;
-	if (!(out.file->f_mode & FMODE_WRITE))
-		goto fput_out;
-	retval = -EINVAL;
-	in_inode = file_inode(in.file);
-	out_inode = file_inode(out.file);
-	out_pos = out.file->f_pos;
-	retval = rw_verify_area(WRITE, out.file, &out_pos, count);
-	if (retval < 0)
-		goto fput_out;
-	count = retval;
-
-	if (!max)
-		max = min(in_inode->i_sb->s_maxbytes, out_inode->i_sb->s_maxbytes);
-
-	if (unlikely(pos + count > max)) {
-		retval = -EOVERFLOW;
-		if (pos >= max)
-			goto fput_out;
-		count = max - pos;
-	}
-
-	fl = 0;
-#if 0
-	/*
-	 * We need to debate whether we can enable this or not. The
-	 * man page documents EAGAIN return for the output at least,
-	 * and the application is arguably buggy if it doesn't expect
-	 * EAGAIN on a non-blocking file descriptor.
-	 */
-	if (in.file->f_flags & O_NONBLOCK)
-		fl = SPLICE_F_NONBLOCK;
-#endif
-	file_start_write(out.file);
-	retval = do_splice_direct(in.file, &pos, out.file, &out_pos, count, fl);
-	file_end_write(out.file);
-
-	if (retval > 0) {
-		add_rchar(current, retval);
-		add_wchar(current, retval);
-		fsnotify_access(in.file);
-		fsnotify_modify(out.file);
-		out.file->f_pos = out_pos;
-		if (ppos)
-			*ppos = pos;
-		else
-			in.file->f_pos = pos;
-	}
-
-	inc_syscr(current);
-	inc_syscw(current);
-	if (pos > max)
-		retval = -EOVERFLOW;
-
-fput_out:
-	fdput(out);
-fput_in:
-	fdput(in);
-out:
-	return retval;
-}
-
-SYSCALL_DEFINE4(sendfile, int, out_fd, int, in_fd, off_t __user *, offset, size_t, count)
-{
-	loff_t pos;
-	off_t off;
-	ssize_t ret;
-
-	if (offset) {
-		if (unlikely(get_user(off, offset)))
-			return -EFAULT;
-		pos = off;
-		ret = do_sendfile(out_fd, in_fd, &pos, count, MAX_NON_LFS);
-		if (unlikely(put_user(pos, offset)))
-			return -EFAULT;
-		return ret;
-	}
-
-	return do_sendfile(out_fd, in_fd, NULL, count, 0);
-}
-
-SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd, loff_t __user *, offset, size_t, count)
-{
-	loff_t pos;
-	ssize_t ret;
-
-	if (offset) {
-		if (unlikely(copy_from_user(&pos, offset, sizeof(loff_t))))
-			return -EFAULT;
-		ret = do_sendfile(out_fd, in_fd, &pos, count, 0);
-		if (unlikely(put_user(pos, offset)))
-			return -EFAULT;
-		return ret;
-	}
-
-	return do_sendfile(out_fd, in_fd, NULL, count, 0);
-}
-
-#ifdef CONFIG_COMPAT
-COMPAT_SYSCALL_DEFINE4(sendfile, int, out_fd, int, in_fd,
-		compat_off_t __user *, offset, compat_size_t, count)
-{
-	loff_t pos;
-	off_t off;
-	ssize_t ret;
-
-	if (offset) {
-		if (unlikely(get_user(off, offset)))
-			return -EFAULT;
-		pos = off;
-		ret = do_sendfile(out_fd, in_fd, &pos, count, MAX_NON_LFS);
-		if (unlikely(put_user(pos, offset)))
-			return -EFAULT;
-		return ret;
-	}
-
-	return do_sendfile(out_fd, in_fd, NULL, count, 0);
-}
-
-COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd,
-		compat_loff_t __user *, offset, compat_size_t, count)
-{
-	loff_t pos;
-	ssize_t ret;
-
-	if (offset) {
-		if (unlikely(copy_from_user(&pos, offset, sizeof(loff_t))))
-			return -EFAULT;
-		ret = do_sendfile(out_fd, in_fd, &pos, count, 0);
-		if (unlikely(put_user(pos, offset)))
-			return -EFAULT;
-		return ret;
-	}
-
-	return do_sendfile(out_fd, in_fd, NULL, count, 0);
-}
-#endif
diff --git a/fs/splice.c b/fs/splice.c
index f5cb9ba..c1a2861 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -28,6 +28,7 @@
 #include <linux/export.h>
 #include <linux/syscalls.h>
 #include <linux/uio.h>
+#include <linux/fsnotify.h>
 #include <linux/security.h>
 #include <linux/gfp.h>
 #include <linux/socket.h>
@@ -2039,3 +2040,180 @@ SYSCALL_DEFINE4(tee, int, fdin, int, fdout, size_t, len, unsigned int, flags)
 
 	return error;
 }
+
+static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
+			   size_t count, loff_t max)
+{
+	struct fd in, out;
+	struct inode *in_inode, *out_inode;
+	loff_t pos;
+	loff_t out_pos;
+	ssize_t retval;
+	int fl;
+
+	/*
+	 * Get input file, and verify that it is ok..
+	 */
+	retval = -EBADF;
+	in = fdget(in_fd);
+	if (!in.file)
+		goto out;
+	if (!(in.file->f_mode & FMODE_READ))
+		goto fput_in;
+	retval = -ESPIPE;
+	if (!ppos) {
+		pos = in.file->f_pos;
+	} else {
+		pos = *ppos;
+		if (!(in.file->f_mode & FMODE_PREAD))
+			goto fput_in;
+	}
+	retval = rw_verify_area(READ, in.file, &pos, count);
+	if (retval < 0)
+		goto fput_in;
+	count = retval;
+
+	/*
+	 * Get output file, and verify that it is ok..
+	 */
+	retval = -EBADF;
+	out = fdget(out_fd);
+	if (!out.file)
+		goto fput_in;
+	if (!(out.file->f_mode & FMODE_WRITE))
+		goto fput_out;
+	retval = -EINVAL;
+	in_inode = file_inode(in.file);
+	out_inode = file_inode(out.file);
+	out_pos = out.file->f_pos;
+	retval = rw_verify_area(WRITE, out.file, &out_pos, count);
+	if (retval < 0)
+		goto fput_out;
+	count = retval;
+
+	if (!max)
+		max = min(in_inode->i_sb->s_maxbytes, out_inode->i_sb->s_maxbytes);
+
+	if (unlikely(pos + count > max)) {
+		retval = -EOVERFLOW;
+		if (pos >= max)
+			goto fput_out;
+		count = max - pos;
+	}
+
+	fl = 0;
+#if 0
+	/*
+	 * We need to debate whether we can enable this or not. The
+	 * man page documents EAGAIN return for the output at least,
+	 * and the application is arguably buggy if it doesn't expect
+	 * EAGAIN on a non-blocking file descriptor.
+	 */
+	if (in.file->f_flags & O_NONBLOCK)
+		fl = SPLICE_F_NONBLOCK;
+#endif
+	file_start_write(out.file);
+	retval = do_splice_direct(in.file, &pos, out.file, &out_pos, count, fl);
+	file_end_write(out.file);
+
+	if (retval > 0) {
+		add_rchar(current, retval);
+		add_wchar(current, retval);
+		fsnotify_access(in.file);
+		fsnotify_modify(out.file);
+		out.file->f_pos = out_pos;
+		if (ppos)
+			*ppos = pos;
+		else
+			in.file->f_pos = pos;
+	}
+
+	inc_syscr(current);
+	inc_syscw(current);
+	if (pos > max)
+		retval = -EOVERFLOW;
+
+fput_out:
+	fdput(out);
+fput_in:
+	fdput(in);
+out:
+	return retval;
+}
+
+SYSCALL_DEFINE4(sendfile, int, out_fd, int, in_fd, off_t __user *, offset, size_t, count)
+{
+	loff_t pos;
+	off_t off;
+	ssize_t ret;
+
+	if (offset) {
+		if (unlikely(get_user(off, offset)))
+			return -EFAULT;
+		pos = off;
+		ret = do_sendfile(out_fd, in_fd, &pos, count, MAX_NON_LFS);
+		if (unlikely(put_user(pos, offset)))
+			return -EFAULT;
+		return ret;
+	}
+
+	return do_sendfile(out_fd, in_fd, NULL, count, 0);
+}
+
+SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd, loff_t __user *, offset, size_t, count)
+{
+	loff_t pos;
+	ssize_t ret;
+
+	if (offset) {
+		if (unlikely(copy_from_user(&pos, offset, sizeof(loff_t))))
+			return -EFAULT;
+		ret = do_sendfile(out_fd, in_fd, &pos, count, 0);
+		if (unlikely(put_user(pos, offset)))
+			return -EFAULT;
+		return ret;
+	}
+
+	return do_sendfile(out_fd, in_fd, NULL, count, 0);
+}
+
+#ifdef CONFIG_COMPAT
+COMPAT_SYSCALL_DEFINE4(sendfile, int, out_fd, int, in_fd,
+		compat_off_t __user *, offset, compat_size_t, count)
+{
+	loff_t pos;
+	off_t off;
+	ssize_t ret;
+
+	if (offset) {
+		if (unlikely(get_user(off, offset)))
+			return -EFAULT;
+		pos = off;
+		ret = do_sendfile(out_fd, in_fd, &pos, count, MAX_NON_LFS);
+		if (unlikely(put_user(pos, offset)))
+			return -EFAULT;
+		return ret;
+	}
+
+	return do_sendfile(out_fd, in_fd, NULL, count, 0);
+}
+
+COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd,
+		compat_loff_t __user *, offset, compat_size_t, count)
+{
+	loff_t pos;
+	ssize_t ret;
+
+	if (offset) {
+		if (unlikely(copy_from_user(&pos, offset, sizeof(loff_t))))
+			return -EFAULT;
+		ret = do_sendfile(out_fd, in_fd, &pos, count, 0);
+		if (unlikely(put_user(pos, offset)))
+			return -EFAULT;
+		return ret;
+	}
+
+	return do_sendfile(out_fd, in_fd, NULL, count, 0);
+}
+#endif
+
-- 
2.1.0

^ permalink raw reply related

* [PATCH v6 0/7] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)
From: Pieter Smith @ 2014-12-04 17:50 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Josh Triplett, Pieter Smith, Alexander Duyck, Alexander Viro,
	Alexei Starovoitov, Andrew Morton, Bertrand Jacquin,
	Catalina Mocanu, Daniel Borkmann, David S. Miller, Eric Dumazet,
	Eric W. Biederman, Fabian Frederick,
	open list:FUSE: FILESYSTEM..., Geert Uytterhoeven, Hugh Dickins,
	Iulia Manda, Jan Beulich, J. Bruce Fields, Jeff Layton,
	open list:ABI/API, linux-fsd

REPO: https://github.com/smipi1/linux-tinification.git

BRANCH: tiny/config-syscall-splice

BACKGROUND: This patch-set forms part of the Linux Kernel Tinification effort (
  https://tiny.wiki.kernel.org/).

GOAL: Support compiling out the splice family of syscalls (splice, vmsplice,
  tee and sendfile) along with all supporting infrastructure if not needed.
  Many embedded systems will not need the splice-family syscalls. Omitting them
  saves space.

HISTORY:
  PATCH v6:
    - Removed unnecessary addition of __maybe_unused in fs/fuse

  PATCH v5:
    - Fix up commit log still referring to dropped __splice_p()

  PATCH v4:
    - Drops __splice_p()
    - Let nfsd fall back to non-splice support when splice is compiled out
    - Style fixes
  
  PATCH v3:
    - Fixup commit logs so that they are consistent with patch strategy
    - Style fixes
  
  PATCH v2:
    - Avoid the ifdef mess introduced in PATCH v1 by mocking out exported splice
      functions.

STRATEGY:
a. With the goal of eventually compiling out fs/splice.c, several functions
   that are only used in support of the splice family of syscalls are moved
   from fs/read_write.c into fs/splice.c. The kernel_write function, which is
   not used to support the splice syscalls, is moved to fs/read_write.c.

b. Introduce an EXPERT kernel configuration option, CONFIG_SYSCALL_SPLICE, to
   compile out the splice family of syscalls. This removes all userspace uses
   of the splice infrastructure.

c. Splice exports an operations struct, nosteal_pipe_buf_ops. Eliminate the 
   uses of this struct when CONFIG_SYSCALL_SPLICE is undefined, so that splice
   can later be compiled out.

d. Let nfsd fall back to non-splice support when splice is compiled out.

e. Compile out fs/splice.c. Functions exported by fs/splice are mocked out with
   failing static inlines. This is done so as to all but eliminate the
   maintenance burden on file-system drivers.
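
As an illustration of step (e), here is a minimal userspace sketch of the
"failing static inline" pattern. It is not taken from the patches: the names
demo_splice_direct, demo_copy_fallback and demo_copy are hypothetical, and the
-EINVAL return value is an assumption about what a mocked-out function might
return. CONFIG_SYSCALL_SPLICE is deliberately left undefined so the stub is
the variant that gets compiled.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

struct file;	/* opaque in this sketch */

#ifdef CONFIG_SYSCALL_SPLICE
/* Real implementation would live in a separate object file. */
ssize_t demo_splice_direct(struct file *in, struct file *out, size_t len);
#else
/* Splice compiled out: callers still build, the call just fails. */
static inline ssize_t demo_splice_direct(struct file *in, struct file *out,
					 size_t len)
{
	(void)in;
	(void)out;
	(void)len;
	return -EINVAL;	/* assumed error code for the mock */
}
#endif

/* Hypothetical non-splice path; pretends every byte was copied. */
static inline ssize_t demo_copy_fallback(struct file *in, struct file *out,
					 size_t len)
{
	(void)in;
	(void)out;
	return (ssize_t)len;
}

/* A caller needs no #ifdef: it tries splice and falls back on failure. */
static inline ssize_t demo_copy(struct file *in, struct file *out, size_t len)
{
	ssize_t ret = demo_splice_direct(in, out, len);

	if (ret == -EINVAL)
		ret = demo_copy_fallback(in, out, len);
	return ret;
}
```

The point of the pattern is that callers such as file-system drivers keep a
single code path; only the header decides whether the real function or the
failing stub is visible.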

RESULTS: A tinyconfig bloat-o-meter score for the entire patch-set:

add/remove: 0/41 grow/shrink: 5/7 up/down: 23/-8422 (-8399)
function                                     old     new   delta
sys_pwritev                                  115     122      +7
sys_preadv                                   115     122      +7
fdput_pos                                     29      36      +7
sys_pwrite64                                 115     116      +1
sys_pread64                                  115     116      +1
pipe_to_null                                   4       -      -4
generic_pipe_buf_nosteal                       6       -      -6
spd_release_page                              10       -     -10
fdput                                         11       -     -11
PageUptodate                                  22      11     -11
lock_page                                     36      24     -12
signal_pending                                39      26     -13
fdget                                         56      42     -14
page_cache_pipe_buf_release                   16       -     -16
user_page_pipe_buf_ops                        20       -     -20
splice_write_null                             24       4     -20
page_cache_pipe_buf_ops                       20       -     -20
nosteal_pipe_buf_ops                          20       -     -20
default_pipe_buf_ops                          20       -     -20
generic_splice_sendpage                       24       -     -24
user_page_pipe_buf_steal                      25       -     -25
splice_shrink_spd                             27       -     -27
pipe_to_user                                  43       -     -43
direct_splice_actor                           47       -     -47
default_file_splice_write                     49       -     -49
wakeup_pipe_writers                           54       -     -54
wakeup_pipe_readers                           54       -     -54
write_pipe_buf                                71       -     -71
page_cache_pipe_buf_confirm                   80       -     -80
splice_grow_spd                               87       -     -87
do_splice_to                                  87       -     -87
ipipe_prep.part                               92       -     -92
splice_from_pipe                              93       -     -93
splice_from_pipe_next                        107       -    -107
pipe_to_sendpage                             109       -    -109
page_cache_pipe_buf_steal                    114       -    -114
opipe_prep.part                              119       -    -119
sys_sendfile                                 122       -    -122
generic_file_splice_read                     131       8    -123
sys_sendfile64                               126       -    -126
sys_vmsplice                                 137       -    -137
do_splice_direct                             148       -    -148
vmsplice_to_user                             205       -    -205
__splice_from_pipe                           246       -    -246
splice_direct_to_actor                       348       -    -348
splice_to_pipe                               371       -    -371
do_sendfile                                  492       -    -492
sys_tee                                      497       -    -497
vmsplice_to_pipe                             558       -    -558
default_file_splice_read                     688       -    -688
iter_file_splice_write                       702       4    -698
sys_splice                                  1075       -   -1075
__generic_file_splice_read                  1109       -   -1109


Pieter Smith (7):
  fs: move sendfile syscall into fs/splice
  fs: moved kernel_write to fs/read_write
  fs/splice: support compiling out splice-family syscalls
  fs/fuse: support compiling out splice
  net/core: support compiling out splice
  fs/nfsd: support compiling out splice
  fs/splice: full support for compiling out splice

 fs/Makefile            |   3 +-
 fs/fuse/dev.c          |   4 +
 fs/read_write.c        | 181 +++------------------------------------------
 fs/splice.c            | 194 +++++++++++++++++++++++++++++++++++++++++++++----
 include/linux/fs.h     |  26 +++++++
 include/linux/skbuff.h |  10 +++
 include/linux/splice.h |  42 +++++++++++
 init/Kconfig           |  10 +++
 kernel/sys_ni.c        |   8 ++
 net/core/skbuff.c      |  11 ++-
 net/sunrpc/svc.c       |   2 +-
 11 files changed, 299 insertions(+), 192 deletions(-)

-- 
2.1.0

^ permalink raw reply

* Re: [PATCH 1/1] net: dsa: replacing the hard-coded sized array "dsa_switch" by dynamic one
From: Andrey Volkov @ 2014-12-04 17:29 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev
In-Reply-To: <547E23C1.2040007@gmail.com>


Le 02/12/2014 21:40, Florian Fainelli a écrit :
> On 02/12/14 06:50, Andrey Volkov wrote:
>> Hello,
>>
>> While developing one of our devices (with a large number of onboard switches, more than 6),
>> I ran into this ancient, I hope, restriction in the 'struct dsa_switch_tree' definition.
>> This simple patch removes the restriction and makes dsa_switch_tree more scalable for
>> the "usual" 1-2 switch configurations too.
> Sounds reasonable to me, you probably want to resubmit and trim the
> "Hello" from your commit message.
>
>> P.S. I plan to fix the hardcoded number of ports too, but that is not as easy as with the number of switches.
>> So if anyone has any objections/suggestions, I'll be happy to discuss them.
> I think the number of ports in a switch is something that should come
> from the switch driver, and eventually intersected with what the
> platform configuration has provided.
Yes, that's exactly what I have in mind. In our project I also need to handle a currently unsupported case:
when switches are combined into stacks by more than one hardwired link, i.e. some ports must be
configured as part of 'hard-coded' trunks.

> The difficulty is in case of sparse port number allocation because you
> still want to allocate e.g: 6 ports even though Port 0 and 5 are used, I
> don't think we want to introduce a logical to physical mapping, that
> would be too error prone.
Agree.


>
>> Signed-off-by: Andrey Volkov <andrey.volkov@nexvision.fr>
>> ---
>>  include/net/dsa.h |    3 +--
>>  net/dsa/dsa.c     |    7 +++----
>>  2 files changed, 4 insertions(+), 6 deletions(-)
>>
>> diff --git a/include/net/dsa.h b/include/net/dsa.h
>> index ed3c34b..733db2e 100644
>> --- a/include/net/dsa.h
>> +++ b/include/net/dsa.h
>> @@ -28,7 +28,6 @@ enum dsa_tag_protocol {
>>         DSA_TAG_PROTO_BRCM,
>>  };
>>  
>> -#define DSA_MAX_SWITCHES       4
>>  #define DSA_MAX_PORTS          12
>>  
>>  struct dsa_chip_data {
>> @@ -117,7 +116,7 @@ struct dsa_switch_tree {
>>         /*
>>          * Data for the individual switch chips.
>>          */
>> -       struct dsa_switch       *ds[DSA_MAX_SWITCHES];
>> +       struct dsa_switch       *ds[];
>>  };
>>  
>>  struct dsa_switch {
>> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
>> index 322c778..c081a19 100644
>> --- a/net/dsa/dsa.c
>> +++ b/net/dsa/dsa.c
>> @@ -604,8 +604,6 @@ static int dsa_of_probe(struct platform_device *pdev)
>>         pdev->dev.platform_data = pd;
>>         pd->netdev = &ethernet_dev->dev;
>>         pd->nr_chips = of_get_child_count(np);
>> -       if (pd->nr_chips > DSA_MAX_SWITCHES)
>> -               pd->nr_chips = DSA_MAX_SWITCHES;
>>  
>>         pd->chip = kcalloc(pd->nr_chips, sizeof(struct dsa_chip_data),
>>                            GFP_KERNEL);
>> @@ -717,7 +715,7 @@ static int dsa_probe(struct platform_device *pdev)
>>                 pd = pdev->dev.platform_data;
>>         }
>>  
>> -       if (pd == NULL || pd->netdev == NULL)
>> +       if (pd == NULL || pd->netdev == NULL || pd->nr_chips == 0)
>>                 return -EINVAL;
>>  
>>         dev = dev_to_net_device(pd->netdev);
>> @@ -732,7 +730,8 @@ static int dsa_probe(struct platform_device *pdev)
>>                 goto out;
>>         }
>>  
>> -       dst = kzalloc(sizeof(*dst), GFP_KERNEL);
>> +       dst = kzalloc(sizeof(*dst) +
>> +                       sizeof(struct dsa_switch *) * pd->nr_chips, GFP_KERNEL);
>>         if (dst == NULL) {
>>                 dev_put(dev);
>>                 ret = -ENOMEM;
>>

^ permalink raw reply

* Re: [PATCH] ixgbe, ixgbevf: Add new mbox API to enable MC promiscuous mode
From: Alexander Duyck @ 2014-12-04 17:43 UTC (permalink / raw)
  To: Hiroshi Shimamoto, e1000-devel@lists.sourceforge.net
  Cc: netdev@vger.kernel.org, Choi, Sy Jong, Hayato Momma,
	linux-kernel@vger.kernel.org
In-Reply-To: <7F861DC0615E0C47A872E6F3C5FCDDBD05DBFDAD@BPXM14GP.gisp.nec.co.jp>

On 11/27/2014 02:39 AM, Hiroshi Shimamoto wrote:
> From: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
>
> The limit on the number of multicast addresses for a VF is not enough
> for large-scale servers using the SR-IOV feature.
> IPv6 requires a multicast MAC address for each IP address to handle
> Neighbor Solicitation messages.
> We could not assign more than 30 IPv6 addresses to a single VF interface.
>
> The easy way to solve this is to enable multicast promiscuous mode.
> It would be good to be able to enable multicast promiscuous mode
> for each VF from the VF driver.
>
> This patch introduces a new mbox API, IXGBE_VF_SET_MC_PROMISC, to
> enable/disable multicast promiscuous mode in a VF. If multicast promiscuous
> mode is enabled, the VF can receive all multicast packets.
>
> With this patch, the ixgbevf driver automatically enables multicast
> promiscuous mode, if possible, when the number of multicast addresses
> exceeds 30.
>
> This also bumps the API version up to 1.2 to check whether the API,
> IXGBE_VF_SET_MC_PROMISC, is available.
>
> Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
> CC: Choi, Sy Jong <sy.jong.choi@intel.com>
> Reviewed-by: Hayato Momma <h-momma@ce.jp.nec.com>

This is a REALLY bad idea unless you plan to limit this to privileged VFs.

I would recommend looking at adding an ndo operation to control this
feature so that it could be disabled by default in the PF and only
enabled on the host side if specifically requested.  Otherwise, I can
easily see this leading to security issues, as the VFs might begin
getting access to messages that they aren't supposed to see.

- Alex


^ permalink raw reply

* [PATCH] net: ethernet: rocker: Add select to CONFIG_BRIDGE in Kconfig
From: Andreas Ruprecht @ 2014-12-04 17:36 UTC (permalink / raw)
  To: Jim Davis, Stephen Rothwell, linux-next, linux-kernel, jiri,
	sfeldma, netdev
In-Reply-To: <CA+r1ZhhJQuwXbs+Et-ihsGP3QXDUWZ1nJ-5_hcjioFt5s8zMrA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 661 bytes --]

On 04.12.2014 17:34, Jim Davis wrote:
> Building with the attached random configuration file,
> 
> drivers/built-in.o: In function `rocker_port_fdb_learn_work':
> /home/jim/linux/drivers/net/ethernet/rocker/rocker.c:3014: undefined
> reference to `br_fdb_external_learn_del'
> /home/jim/linux/drivers/net/ethernet/rocker/rocker.c:3016: undefined
> reference to `br_fdb_external_learn_add'
> 

Hi,

the problem here is that CONFIG_BRIDGE is set to 'm' (leading to
inclusion of the two functions above in the kernel module) while
CONFIG_ROCKER is set to 'y', requiring the functions at link time.

Is the attached patch sufficient to fix this?

Regards,

Andreas

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-net-ethernet-rocker-Add-select-to-CONFIG_BRIDGE-in-K.patch --]
[-- Type: text/x-patch; name="0001-net-ethernet-rocker-Add-select-to-CONFIG_BRIDGE-in-K.patch", Size: 1371 bytes --]

From 0529c3cbe381338dc3337e07a71e15b3d22a3255 Mon Sep 17 00:00:00 2001
From: Andreas Ruprecht <rupran@einserver.de>
Date: Thu, 4 Dec 2014 18:28:09 +0100
Subject: [PATCH] net: ethernet: rocker: Add select to CONFIG_BRIDGE in Kconfig

In a configuration with CONFIG_BRIDGE set to 'm' and CONFIG_ROCKER
set to 'y', undefined references occur at link time:

> drivers/built-in.o: In function `rocker_port_fdb_learn_work':
> /home/jim/linux/drivers/net/ethernet/rocker/rocker.c:3014: undefined
> reference to `br_fdb_external_learn_del'
> /home/jim/linux/drivers/net/ethernet/rocker/rocker.c:3016: undefined
> reference to `br_fdb_external_learn_add'

This patch fixes these by selecting CONFIG_BRIDGE from CONFIG_ROCKER.

Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Andreas Ruprecht <rupran@einserver.de>
---
 drivers/net/ethernet/rocker/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/rocker/Kconfig b/drivers/net/ethernet/rocker/Kconfig
index 11a850eab628..ade10ec4c78d 100644
--- a/drivers/net/ethernet/rocker/Kconfig
+++ b/drivers/net/ethernet/rocker/Kconfig
@@ -18,6 +18,7 @@ if NET_VENDOR_ROCKER
 config ROCKER
 	tristate "Rocker switch driver (EXPERIMENTAL)"
 	depends on PCI && NET_SWITCHDEV
+	select BRIDGE
 	---help---
 	  This driver supports Rocker switch device.
 
-- 
1.9.1


^ permalink raw reply related

* [PATCH v2 1/1] net: dsa: replacing the hard-coded sized array "dsa_switch" by dynamic one
From: Andrey Volkov @ 2014-12-04 15:54 UTC (permalink / raw)
  To: Florian Fainelli, netdev
In-Reply-To: <547E23C1.2040007@gmail.com>

Replace the hard-coded fixed-size array "dsa_switch" with a dynamic one
and remove the now useless DSA_MAX_SWITCHES define.
This patch also makes dsa_switch_tree scale from the "usual"
1-2 switch configurations up to configurations with 10 switches.

Signed-off-by: Andrey Volkov <andrey.volkov@nexvision.fr>
---
 include/net/dsa.h |    3 +--
 net/dsa/dsa.c     |    7 +++----
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index ed3c34b..733db2e 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -28,7 +28,6 @@ enum dsa_tag_protocol {
     DSA_TAG_PROTO_BRCM,
 };
 
-#define DSA_MAX_SWITCHES    4
 #define DSA_MAX_PORTS        12
 
 struct dsa_chip_data {
@@ -117,7 +116,7 @@ struct dsa_switch_tree {
     /*
      * Data for the individual switch chips.
      */
-    struct dsa_switch    *ds[DSA_MAX_SWITCHES];
+    struct dsa_switch    *ds[];
 };
 
 struct dsa_switch {
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 322c778..c081a19 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -604,8 +604,6 @@ static int dsa_of_probe(struct platform_device *pdev)
     pdev->dev.platform_data = pd;
     pd->netdev = &ethernet_dev->dev;
     pd->nr_chips = of_get_child_count(np);
-    if (pd->nr_chips > DSA_MAX_SWITCHES)
-        pd->nr_chips = DSA_MAX_SWITCHES;
 
     pd->chip = kcalloc(pd->nr_chips, sizeof(struct dsa_chip_data),
                GFP_KERNEL);
@@ -717,7 +715,7 @@ static int dsa_probe(struct platform_device *pdev)
         pd = pdev->dev.platform_data;
     }
 
-    if (pd == NULL || pd->netdev == NULL)
+    if (pd == NULL || pd->netdev == NULL || pd->nr_chips == 0)
         return -EINVAL;
 
     dev = dev_to_net_device(pd->netdev);
@@ -732,7 +730,8 @@ static int dsa_probe(struct platform_device *pdev)
         goto out;
     }
 
-    dst = kzalloc(sizeof(*dst), GFP_KERNEL);
+    dst = kzalloc(sizeof(*dst) +
+            sizeof(struct dsa_switch *) * pd->nr_chips, GFP_KERNEL);
     if (dst == NULL) {
         dev_put(dev);
         ret = -ENOMEM;
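
For readers unfamiliar with the idiom used in the last hunk, the following is
a small userspace sketch of allocating a struct that ends in a flexible array
member. It is an illustration only, not part of the patch: struct demo_switch,
struct demo_switch_tree, the nr_chips field and demo_tree_alloc() are
hypothetical names, calloc() stands in for kzalloc(), and the overflow check
is an addition that the patch itself does not contain.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_switch {
	int index;
};

struct demo_switch_tree {
	size_t nr_chips;
	/* Flexible array member: must be last, contributes no size itself. */
	struct demo_switch *ds[];
};

/*
 * Allocate the tree header plus room for nr_chips switch pointers in a
 * single zeroed block. Returns NULL for zero chips, on size overflow, or
 * if the allocation fails.
 */
static struct demo_switch_tree *demo_tree_alloc(size_t nr_chips)
{
	struct demo_switch_tree *dst;

	if (nr_chips == 0)
		return NULL;
	/* Guard the size computation against wrap-around. */
	if (nr_chips > (SIZE_MAX - sizeof(*dst)) / sizeof(dst->ds[0]))
		return NULL;

	dst = calloc(1, sizeof(*dst) + nr_chips * sizeof(dst->ds[0]));
	if (dst)
		dst->nr_chips = nr_chips;
	return dst;
}
```

Because the pointer array lives in the same allocation as the header, a single
free() releases both, and the number of switches is bounded only by what the
platform data reports rather than by a compile-time constant.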

^ permalink raw reply related

* Re: [patch net-next v4 8/9] net: move vlan pop/push functions into common code
From: Jiri Benc @ 2014-12-04 16:57 UTC (permalink / raw)
  To: Pravin Shelar
  Cc: Jiri Pirko, netdev, David Miller, Jamal Hadi Salim, Tom Herbert,
	Eric Dumazet, Willem de Bruijn, Daniel Borkmann, mst, fw,
	Paul.Durrant, Thomas Graf, Cong Wang
In-Reply-To: <CALnjE+r0U9-cTZYZh=qzWoztfV_TNFoB762CBcDjUj1Z5oDYag@mail.gmail.com>

On Wed, 3 Dec 2014 16:33:57 -0800, Pravin Shelar wrote:
> OVS correctly sets mac header length in case of vlan header. Can you
> give me OVS test case to reproduce this issue?

Set up ovs bridge with two ports, one of them tagged. Receive a packet
with two vlan headers (the first vlan tag corresponding to the second
port) on the untagged port.

Tried it just now with the latest net-next with printks added to
__skb_vlan_pop, __skb_vlan_pop is called twice for each packet, the
second invocation has the pointers set in the way I described.

 Jiri

-- 
Jiri Benc

^ permalink raw reply

* Re: [patch iproute2 0/6] iproute2: add changes for switchdev
From: Roopa Prabhu @ 2014-12-04 16:55 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl, hemal
In-Reply-To: <20141204160444.GF1861@nanopsycho.orion>

On 12/4/14, 8:04 AM, Jiri Pirko wrote:
> Thu, Dec 04, 2014 at 03:45:44PM CET, roopa@cumulusnetworks.com wrote:
>> On 12/4/14, 6:34 AM, Jiri Pirko wrote:
>>> Thu, Dec 04, 2014 at 03:26:50PM CET, roopa@cumulusnetworks.com wrote:
>>>> On 12/4/14, 12:57 AM, Jiri Pirko wrote:
>>>>> Jiri Pirko (1):
>>>>>    iproute2: ipa: show switch id
>>>>>
>>>>> Scott Feldman (5):
>>>>>    bridge/fdb: fix statistics output spacing
>>>>>    bridge/fdb: add flag/indication for FDB entry synced from offload
>>>>>      device
>>>>>    bridge/link: add new offload hwmode swdev
>>>> Ack to most patches but nack on this one. The todo list still has a note to
>>>> revisit the flag to indicate switchdev offloads.
>>>> Exposing this to userspace does not help that.
>>> Hmm, note that this is already exposed to userspace, this patchset is
>>> for iproute2 (userspace tool).
>> hmmm, all feedback on the switchdev patches seemed to indicate we can change
>> this later.
>> I don't see swdev mode being used in the kernel anywhere today.
> Well, it is, in rocker:
> $ git grep BRIDGE_MODE_SWDEV
> drivers/net/ethernet/rocker/rocker.c:                   if (mode != BRIDGE_MODE_SWDEV)
> drivers/net/ethernet/rocker/rocker.c:   u16 mode = BRIDGE_MODE_SWDEV;
> include/uapi/linux/if_bridge.h:#define BRIDGE_MODE_SWDEV        2       /* Full switch device offload */

The problem is that rocker is not the only driver that is going to be using
this, so we need something that fits everybody.
And I am not going to make my users set a mode just to enable offload
to hw.

>
>> I will send a patch to remove it. Its still in net-next and so can be changed
>> ?.
>> I was going to resend my patch to introduce a common offload flag for all
>> link objects.
>> It would be nice if all of them had a consistent flag to indicate hw offload
>> and iproute2 could display the same flag for all.
>> Including bonds and vxlan's.
> I do not understand the connection with BRIDGE_MODE_SWDEV. We discussed
> this already. BRIDGE_MODE_SWDEV is a bridge mode, similar to for example
> BRIDGE_MODE_VEPA and makes perfect sense to have it.
I don't think everybody acked it. But it went in with a note saying that
it can be changed.
>
> How vxlan and bonds come into the mixture, that is a puzzler for me.
> Maybe I have to see patches.

I had posted a version of the patch previously:
http://www.spinics.net/lists/netdev/msg305472.html

I have a v2 patch in my stack which does not touch the netlink header.
But in the past hour, I have been thinking about it some more. Do we
really need this to be set by the user? In my use case I don't need it.

We do need a feature flag (or net_device_flags), but it does not need to
be set by the user explicitly.
This flag can be set by the switch port driver on the switch ports, and
the logical device (bridge/bond/vxlan)
can inherit it from the port. There was a need for a flag in some
use cases to control offloading of specific bridge port flags
to hw/sw (for example, learning in hw or sw). Example patch:
https://patchwork.ozlabs.org/patch/413211/

I will post something today.

>
>>>>>    link: add missing IFLA_BRPORT_PROXYARP
>>>>>    bridge/link: add learning_sync policy flag
>>>>>
>>>>>   bridge/fdb.c              |  4 +++-
>>>>>   bridge/link.c             | 17 +++++++++++++++--
>>>>>   include/linux/if_bridge.h |  1 +
>>>>>   include/linux/if_link.h   |  3 +++
>>>>>   include/linux/neighbour.h |  1 +
>>>>>   ip/ipaddress.c            |  8 ++++++++
>>>>>   man/man8/bridge.8         | 19 ++++++++++++++-----
>>>>>   7 files changed, 45 insertions(+), 8 deletions(-)
>>>>>

^ permalink raw reply

* Re: am335x: cpsw: interrupt failure
From: Felipe Balbi @ 2014-12-04 16:56 UTC (permalink / raw)
  To: Yegor Yefremov; +Cc: netdev, N, Mugunthan V, Felipe Balbi
In-Reply-To: <CAGm1_ksPkFXV_S6541cK1TQb_eRTHbspZy4zWxY5GMrrjGD2tA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2112 bytes --]

Hi,

On Thu, Dec 04, 2014 at 05:41:38PM +0100, Yegor Yefremov wrote:
> I have the following problem. My system reboots at high network load
> after this commit (found via git bisect):
> 
> commit 55601c9f24670ba926ebdd4d712ac3b177232330
> Author: Felipe Balbi <balbi@ti.com>
> Date:   Mon Sep 8 17:54:58 2014 -0700
> 
>     arm: omap: intc: switch over to linear irq domain
> 
>     now that we don't need to support legacy board-files,
>     we can completely switch over to a linear irq domain
>     and make use of irq_alloc_domain_generic_chips() to
>     allocate all generic irq chips for us.
> 
>     Signed-off-by: Felipe Balbi <balbi@ti.com>
>     Signed-off-by: Tony Lindgren <tony@atomide.com>
> 
> and I get the following error messages:
> 
> irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0

irq 0 ? Weird, that's not a valid IRQ.

> ->handle_irq():  c0087fc0, handle_bad_irq+0x0/0x258
> ->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
> ->action():   (null)
>    IRQ_NOPROBE set
>  IRQ_NOREQUEST set
> irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
> ->handle_irq():  c0087fc0, handle_bad_irq+0x0/0x258
> ->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
> ->action():   (null)
>    IRQ_NOPROBE set
>  IRQ_NOREQUEST set
> irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
> ->handle_irq():  c0087fc0, handle_bad_irq+0x0/0x258
> ->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
> ->action():   (null)
>    IRQ_NOPROBE set
>  IRQ_NOREQUEST set
> irq 0, desc: cf004000, depth: 1, count: 0, unhandled: 0
> ->handle_irq():  c0087fc0, handle_bad_irq+0x0/0x258
> ->irq_data.chip(): c08e7174, no_irq_chip+0x0/0x68
> ->action():   (null)
> 
> My system: am335x with Fast Ethernet on the first slave and Gigabit
> Ethernet on the second CPSW slave. This issue occurs when I run nuttcp
> with default settings.
>
> With the commit above I can at least see these messages, but 3.18-rc7, for
> example, reboots without any messages.
> 
> Any idea?

if you take v3.18-rc7 and just revert that commit, does the problem go
away ?

-- 
balbi

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

