Netdev List
* Re: Permissions for eBPF objects
From: Casey Schaufler @ 2017-08-25 20:04 UTC (permalink / raw)
  To: Jeffrey Vander Stoep, Chenbo Feng, netdev, SELinux, LSM
In-Reply-To: <CABXk95AiYO7D8o79TBdt0_0g1TXfULSpL5i7KzHF3R4i-WhwHw@mail.gmail.com>

Adding the LSM list to the thread.

On 8/25/2017 11:01 AM, Jeffrey Vander Stoep via Selinux wrote:
> I’d like to get your thoughts on adding LSM permission checks on BPF objects.

Aside from the use of these objects requiring privilege,
what sort of controls do you think might be reasonable?
Who "owns" these objects? Can you have a coherent system
if one entity changes maps and another changes programs?
Why would "finer granularity" be better?

While I understand the issues with CAP_SYS_ADMIN being
uncomfortably general, I am not the advocate of fine-grained
controls that many of my peers and betters are.
Would the increased complexity add value? How?

> By default, the ability to create and use eBPF maps/programs requires
> CAP_SYS_ADMIN [1]. Alternatively, all processes can be granted access
> to bpf() functions. This seems like poor granularity. [2]

You could put mode bits on your maps, programs, functions.
Do you otherwise treat these as objects, or are they more
like process state?


> Like files and sockets, eBPF maps and programs can be passed between
> processes by FD and have a number of functions that map cleanly to
> permissions.
>
> Let me know what you think. Are there simpler alternative approaches
> that we haven’t considered?
>
> Thanks!
> Jeff
>
> [1] http://man7.org/linux/man-pages/man2/bpf.2.html NOTES section
> [2] We are considering eBPF for network filtering by netd. Giving netd
> CAP_SYS_ADMIN would considerably increase netd’s privileges.
> Alternatively allowing all processes permission to use bpf() goes
> against the principle of least privilege exposing a lot of kernel
> attack surface to processes that do not actually need it.

Just thinking out loud here, but if there is ownership on your
"objects" (objects have names, owners and access controls)
you could let the owner decide who gets to use them, just like
you do with user-space programs. This is kind of iffy for
programs that execute in the kernel, but you're already putting
a lot of trust in the eBPF implementation.

The big thing you need to do is define a security model, with
a list of subjects, objects and accesses. Once you have that
coming up with a basic access control policy is a matter of
creating something Linux-ish. The security modules will follow
on with their own interpretations of how to make it even better
in due course.
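
A minimal sketch of what such a model could look like, assuming the
subject is reduced to a uid, the object is a map or program with an
owner and two sets of mode bits, and the accesses are read, write and
use. Every name below is hypothetical; nothing like this exists in the
kernel today.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical access bits for an eBPF object. */
enum ebpf_obj_access {
	EBPF_OBJ_READ  = 1u << 0,
	EBPF_OBJ_WRITE = 1u << 1,
	EBPF_OBJ_USE   = 1u << 2,
};

/* Hypothetical object: a map or program with an owner and mode bits. */
struct ebpf_obj {
	uid_t owner;
	unsigned int owner_mode;	/* accesses granted to the owner */
	unsigned int other_mode;	/* accesses granted to everyone else */
};

/* The subject is only a uid here; a real model would carry more state. */
static bool ebpf_obj_may_access(uid_t subject, const struct ebpf_obj *obj,
				unsigned int requested)
{
	unsigned int granted = (subject == obj->owner) ? obj->owner_mode
						       : obj->other_mode;

	/* Every requested bit must be granted. */
	return (requested & ~granted) == 0;
}
```

With the owner holding read and write and everyone else holding read,
a non-owner can look at the object but cannot change it, and the owner
decides what goes into other_mode.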



* Re: Permissions for eBPF objects
From: Chenbo Feng @ 2017-08-25 19:52 UTC (permalink / raw)
  To: Jeffrey Vander Stoep; +Cc: Stephen Smalley, netdev, SELinux
In-Reply-To: <CABXk95BkyU5MBLry0PEp+QwhtY7rM4DCmQq3CeQi6=TQtQQPwA@mail.gmail.com>

On Fri, Aug 25, 2017 at 12:45 PM, Jeffrey Vander Stoep <jeffv@google.com> wrote:
> On Fri, Aug 25, 2017 at 12:26 PM, Stephen Smalley <sds@tycho.nsa.gov> wrote:
>> On Fri, 2017-08-25 at 11:01 -0700, Jeffrey Vander Stoep via Selinux
>> wrote:
>>> I’d like to get your thoughts on adding LSM permission checks on BPF
>>> objects.
>>>
>>> By default, the ability to create and use eBPF maps/programs requires
>>> CAP_SYS_ADMIN [1]. Alternatively, all processes can be granted access
>>> to bpf() functions. This seems like poor granularity. [2]
>>>
>>> Like files and sockets, eBPF maps and programs can be passed between
>>> processes by FD and have a number of functions that map cleanly to
>>> permissions.
>>>
>>> Let me know what you think. Are there simpler alternative approaches
>>> that we haven’t considered?
>>
>> Is it possible to create the map/program in one process (with
>> CAP_SYS_ADMIN), pass the resulting fd to netd, and then use it there
>> (without requiring CAP_SYS_ADMIN in netd itself)?
>
> That might work. Any use of bpf() requires CAP_SYS_ADMIN but netd
> could potentially just apply the prog_fd to a socket:
>
>            setsockopt(sockfd, SOL_SOCKET, SO_ATTACH_BPF,
>                       &prog_fd, sizeof(prog_fd));
>

This specific case might work. But other map and program related operations can
only be done through the bpf() syscall, and that syscall can be set either to
allow only CAP_SYS_ADMIN processes or to be open to all processes. So when the
CAP_SYS_ADMIN limitation is enforced, netd will not be able to use any of the
commands such as map_look_up, map_update, map_delete even if a
CAP_SYS_ADMIN process passed the fd to it. Here is how this enforcement is
implemented:
http://elixir.free-electrons.com/linux/latest/source/kernel/bpf/syscall.c#L1005
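
A user-space model of that gate, as a sketch only. It assumes the check
at the top of the syscall has the shape
"!capable(CAP_SYS_ADMIN) && sysctl_unprivileged_bpf_disabled"; the
function name and the two boolean parameters are stand-ins invented for
this example.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * has_cap_sys_admin stands in for capable(CAP_SYS_ADMIN).
 * unprivileged_bpf_disabled stands in for the
 * kernel.unprivileged_bpf_disabled sysctl.
 * Returns 0 if the call may proceed, -EPERM otherwise.
 */
static int bpf_syscall_gate(bool has_cap_sys_admin,
			    bool unprivileged_bpf_disabled)
{
	if (!has_cap_sys_admin && unprivileged_bpf_disabled)
		return -EPERM;
	return 0;
}
```

The gate takes no fd argument at all, which is the point: it runs
before the command is looked at, so an fd received from a privileged
process does not change the result.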

>>
>> What level of granularity would be useful?  Would it go beyond just
>> being able to use bpf() at all?
>
> "use" might be sufficient. At least initially.
>
> I could see some others coming in handy. For example, a simple mapping
> of functionality to permissions gives:
> map_create, map_update, map_delete, map_read, prog_load, prog_use.
>
> Of course there's no sense in breaking "use" into multiple permissions if
> we expect the entire set to always be granted together.
>
>>
>>>
>>> Thanks!
>>> Jeff
>>>
>>> [1] http://man7.org/linux/man-pages/man2/bpf.2.html NOTES section
>>> [2] We are considering eBPF for network filtering by netd. Giving
>>> netd
>>> CAP_SYS_ADMIN would considerably increase netd’s privileges.
>>> Alternatively allowing all processes permission to use bpf() goes
>>> against the principle of least privilege exposing a lot of kernel
>>> attack surface to processes that do not actually need it.
>>>


* Re: [patch net-next 11/12] mlxsw: spectrum_dpipe: Add support for IPv4 host table dump
From: David Ahern @ 2017-08-25 19:51 UTC (permalink / raw)
  To: Arkadi Sharshevsky, Jiri Pirko, netdev; +Cc: davem, idosch, mlxsw
In-Reply-To: <4d5b031e-3d0a-624f-1285-9540a9dc4716@mellanox.com>

On 8/25/17 2:26 AM, Arkadi Sharshevsky wrote:
> 
> 
> On 08/24/2017 10:26 PM, David Ahern wrote:
>> On 8/23/17 11:40 PM, Jiri Pirko wrote:
>>> +static int
>>> +mlxsw_sp_dpipe_table_host_entries_get(struct mlxsw_sp *mlxsw_sp,
>>> +				      struct devlink_dpipe_entry *entry,
>>> +				      bool counters_enabled,
>>> +				      struct devlink_dpipe_dump_ctx *dump_ctx,
>>> +				      int type)
>>> +{
>>> +	int rif_neigh_count = 0;
>>> +	int rif_neigh_skip = 0;
>>> +	int neigh_count = 0;
>>> +	int rif_count;
>>> +	int i, j;
>>> +	int err;
>>> +
>>> +	rtnl_lock();
>>
>> Why does a h/w driver dumping its tables need the rtnl lock?
>>
> 
> This table represents the hw IPv4 arp table, and the
> driver depends on rtnl to be held.
> 

Meaning mlxsw does not have its own locks protecting data structures --
e.g., rif adds and deletes, so it is relying on rtnl?

Also, this dpipe capability seems to be just dumping data structures
maintained by the driver, i.e., you can compare the mlxsw view of
networking state to IPv4 and IPv6 level tables. Any plans to offer a
command that reads data from the h/w and passes that back to the user?
i.e., a command to compare kernel tables to h/w state?


* Re: Permissions for eBPF objects
From: Jeffrey Vander Stoep @ 2017-08-25 19:45 UTC (permalink / raw)
  To: Stephen Smalley; +Cc: Chenbo Feng, netdev, SELinux
In-Reply-To: <1503689199.9977.4.camel@tycho.nsa.gov>

On Fri, Aug 25, 2017 at 12:26 PM, Stephen Smalley <sds@tycho.nsa.gov> wrote:
> On Fri, 2017-08-25 at 11:01 -0700, Jeffrey Vander Stoep via Selinux
> wrote:
>> I’d like to get your thoughts on adding LSM permission checks on BPF
>> objects.
>>
>> By default, the ability to create and use eBPF maps/programs requires
>> CAP_SYS_ADMIN [1]. Alternatively, all processes can be granted access
>> to bpf() functions. This seems like poor granularity. [2]
>>
>> Like files and sockets, eBPF maps and programs can be passed between
>> processes by FD and have a number of functions that map cleanly to
>> permissions.
>>
>> Let me know what you think. Are there simpler alternative approaches
>> that we haven’t considered?
>
> Is it possible to create the map/program in one process (with
> CAP_SYS_ADMIN), pass the resulting fd to netd, and then use it there
> (without requiring CAP_SYS_ADMIN in netd itself)?

That might work. Any use of bpf() requires CAP_SYS_ADMIN but netd
could potentially just apply the prog_fd to a socket:

           setsockopt(sockfd, SOL_SOCKET, SO_ATTACH_BPF,
                      &prog_fd, sizeof(prog_fd));

>
> What level of granularity would be useful?  Would it go beyond just
> being able to use bpf() at all?

"use" might be sufficient. At least initially.

I could see some others coming in handy. For example, a simple mapping
of functionality to permissions gives:
map_create, map_update, map_delete, map_read, prog_load, prog_use.

Of course there's no sense in breaking "use" into multiple permissions if
we expect the entire set to always be granted together.
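
A rough sketch of how that mapping could look, assuming one bit per
permission and a lookup keyed on the bpf() command. The command
constants come from <linux/bpf.h>; the PERM_* names and both functions
are hypothetical. prog_use has no bpf() command in this sketch because
it would presumably be checked where a program is attached, e.g. at the
setsockopt() call above.

```c
#include <stdbool.h>
#include <linux/bpf.h>	/* enum bpf_cmd: BPF_MAP_CREATE, BPF_PROG_LOAD, ... */

/* Hypothetical permission bits, named after the list above. */
enum bpf_perm {
	PERM_NONE       = 0,
	PERM_MAP_CREATE = 1u << 0,
	PERM_MAP_UPDATE = 1u << 1,
	PERM_MAP_DELETE = 1u << 2,
	PERM_MAP_READ   = 1u << 3,
	PERM_PROG_LOAD  = 1u << 4,
	PERM_PROG_USE   = 1u << 5,	/* checked at attach time, not here */
};

/* Map a bpf() command to the single permission it would need. */
static unsigned int perm_for_cmd(int cmd)
{
	switch (cmd) {
	case BPF_MAP_CREATE:
		return PERM_MAP_CREATE;
	case BPF_MAP_UPDATE_ELEM:
		return PERM_MAP_UPDATE;
	case BPF_MAP_DELETE_ELEM:
		return PERM_MAP_DELETE;
	case BPF_MAP_LOOKUP_ELEM:
	case BPF_MAP_GET_NEXT_KEY:
		return PERM_MAP_READ;
	case BPF_PROG_LOAD:
		return PERM_PROG_LOAD;
	default:
		return PERM_NONE;	/* unknown commands are never allowed */
	}
}

/* True if the granted set covers the permission the command needs. */
static bool cmd_allowed(unsigned int granted, int cmd)
{
	unsigned int need = perm_for_cmd(cmd);

	return need != PERM_NONE && (granted & need) == need;
}
```

A single "use" permission would simply be the OR of all six bits.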

>
>>
>> Thanks!
>> Jeff
>>
>> [1] http://man7.org/linux/man-pages/man2/bpf.2.html NOTES section
>> [2] We are considering eBPF for network filtering by netd. Giving
>> netd
>> CAP_SYS_ADMIN would considerably increase netd’s privileges.
>> Alternatively allowing all processes permission to use bpf() goes
>> against the principle of least privilege exposing a lot of kernel
>> attack surface to processes that do not actually need it.
>>


* Re: [RFC PATCH] net: limit maximum number of packets to mark with xmit_more
From: Jakub Kicinski @ 2017-08-25 19:34 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev
In-Reply-To: <20170825152449.29790-1-jacob.e.keller@intel.com>

On Fri, 25 Aug 2017 08:24:49 -0700, Jacob Keller wrote:
> Under some circumstances, such as with many stacked devices, it is
> possible that dev_hard_start_xmit will bundle many packets together, and
> mark them all with xmit_more.

Excuse my ignorance but what are those stacked devices?  Could they
perhaps be fixed somehow?  My intuition was that long xmit_more
sequences can only happen if NIC and/or BQL are back pressuring, and
therefore we shouldn't be seeing a long xmit_more "train" arriving at
an empty device ring...


* Re: Permissions for eBPF objects
From: Stephen Smalley @ 2017-08-25 19:26 UTC (permalink / raw)
  To: Jeffrey Vander Stoep, Chenbo Feng, netdev, SELinux
In-Reply-To: <CABXk95AiYO7D8o79TBdt0_0g1TXfULSpL5i7KzHF3R4i-WhwHw@mail.gmail.com>

On Fri, 2017-08-25 at 11:01 -0700, Jeffrey Vander Stoep via Selinux
wrote:
> I’d like to get your thoughts on adding LSM permission checks on BPF
> objects.
> 
> By default, the ability to create and use eBPF maps/programs requires
> CAP_SYS_ADMIN [1]. Alternatively, all processes can be granted access
> to bpf() functions. This seems like poor granularity. [2]
> 
> Like files and sockets, eBPF maps and programs can be passed between
> processes by FD and have a number of functions that map cleanly to
> permissions.
> 
> Let me know what you think. Are there simpler alternative approaches
> that we haven’t considered?

Is it possible to create the map/program in one process (with
CAP_SYS_ADMIN), pass the resulting fd to netd, and then use it there
(without requiring CAP_SYS_ADMIN in netd itself)?
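
For reference, the fd hand-off itself would be ordinary SCM_RIGHTS
passing over an AF_UNIX socket. A minimal sketch, assuming a connected
socket between the privileged process and netd; send_fd() and recv_fd()
are names made up for this example, and the fd passed would be the map
or program fd. Whether the receiver may then do anything useful with it
is a separate question.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one fd over a connected AF_UNIX socket. Returns 0 on success. */
static int send_fd(int sock, int fd)
{
	char byte = 0;	/* at least one byte of real data must be sent */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment */
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd. Returns the new fd in this process, or -1. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int fd;

	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The received fd refers to the same kernel object as the sender's, so
no privilege is needed for the transfer itself.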

What level of granularity would be useful?  Would it go beyond just
being able to use bpf() at all?

> 
> Thanks!
> Jeff
> 
> [1] http://man7.org/linux/man-pages/man2/bpf.2.html NOTES section
> [2] We are considering eBPF for network filtering by netd. Giving
> netd
> CAP_SYS_ADMIN would considerably increase netd’s privileges.
> Alternatively allowing all processes permission to use bpf() goes
> against the principle of least privilege exposing a lot of kernel
> attack surface to processes that do not actually need it.
> 


* Re: [PATCH net] ptr_ring: use kmalloc_array()
From: Michael S. Tsirkin @ 2017-08-25 19:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Jason Wang
In-Reply-To: <1503687439.11498.5.camel@edumazet-glaptop3.roam.corp.google.com>

On Fri, Aug 25, 2017 at 11:57:19AM -0700, Eric Dumazet wrote:
> On Fri, 2017-08-25 at 21:03 +0300, Michael S. Tsirkin wrote:
> > On Wed, Aug 16, 2017 at 10:36:47AM -0700, Eric Dumazet wrote:
> > > From: Eric Dumazet <edumazet@google.com>
> > > 
> > > As found by syzkaller, malicious users can set whatever tx_queue_len
> > > on a tun device and eventually crash the kernel.
> > > 
> > > Let's remove the ALIGN(XXX, SMP_CACHE_BYTES) thing since a small
> > > ring buffer is not fast anyway.
> > 
> > I'm not sure it's worth changing for small rings.
> > 
> > Does kmalloc_array guarantee cache line alignment for big buffers
> > then? If the ring is misaligned it will likely cause false sharing
> > as it's designed to be accessed from two CPUs.
> 
> I specifically said that in the changelog :
> 
> "since a small ring buffer is not fast anyway."
> 
> If one user sets up a pathological small ring buffer, kernel should not
> try to fix it.

Yes, I got that point. My question is about big buffers.
Does kmalloc_array give us an aligned array in that case?

E.g. imagine a 100 slot array. Will 800 bytes be allocated?
In that case it uses up 12.5 cache lines. It looks like the
last cache line can become false shared with something else,
causing cache line bounces on each wrap around.
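
To spell out the arithmetic, a small sketch assuming 8-byte pointers
and 64-byte cache lines (both are assumptions, as are the helper
names), and saying nothing about where the allocator places the start
of the array:

```c
#include <stddef.h>

#define CACHE_LINE_BYTES 64u	/* assumption: 64-byte cache lines */
#define SLOT_BYTES        8u	/* assumption: 8-byte pointers */

/* Bytes needed for a ring of the given number of slots. */
static size_t ring_bytes(size_t slots)
{
	return slots * SLOT_BYTES;
}

/* Bytes used in the last cache line; 0 means the ring ends on a boundary. */
static size_t ring_tail_bytes(size_t slots)
{
	return ring_bytes(slots) % CACHE_LINE_BYTES;
}

/* Size after rounding up to whole cache lines, as ALIGN() would do. */
static size_t ring_bytes_aligned(size_t slots)
{
	return (ring_bytes(slots) + CACHE_LINE_BYTES - 1) /
	       CACHE_LINE_BYTES * CACHE_LINE_BYTES;
}
```

For 100 slots this gives 800 bytes, 32 bytes used in the last line
(the half line in "12.5"), and 832 bytes once rounded up. An 8 slot
ring is exactly one line with no tail.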


> In this case, you would have to set up a ring of 2 or 4 slots to
> eventually hit false sharing.
> 

I don't think many people set up such tiny rings so I do not really
think we care what happens in that case. But you need 8 slots to avoid
false sharing I think.

-- 
MST


* [PATCH 4/4] net: stmmac: sun8i: Remove the compatibles
From: Maxime Ripard @ 2017-08-25 19:12 UTC (permalink / raw)
  To: arm, davem, Chen-Yu Tsai, Maxime Ripard
  Cc: linux-arm-kernel, netdev, f.fainelli, clabbe.montjoie, andrew,
	linux-kernel
In-Reply-To: <20170825191217.10278-1-maxime.ripard@free-electrons.com>

Since the bindings have been controversial, and we follow the DT stable ABI
rule, we shouldn't let a driver with a DT binding that might change slip
through in a stable release.

Remove the compatibles to make sure the driver will not probe and no-one
will start using the binding currently implemented. This commit will
obviously need to be reverted in due time.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
index fffd6d5fc907..39c2122a4f26 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
@@ -979,14 +979,6 @@ static int sun8i_dwmac_probe(struct platform_device *pdev)
 }
 
 static const struct of_device_id sun8i_dwmac_match[] = {
-	{ .compatible = "allwinner,sun8i-h3-emac",
-		.data = &emac_variant_h3 },
-	{ .compatible = "allwinner,sun8i-v3s-emac",
-		.data = &emac_variant_v3s },
-	{ .compatible = "allwinner,sun8i-a83t-emac",
-		.data = &emac_variant_a83t },
-	{ .compatible = "allwinner,sun50i-a64-emac",
-		.data = &emac_variant_a64 },
 	{ }
 };
 MODULE_DEVICE_TABLE(of, sun8i_dwmac_match);
-- 
2.13.5


* [PATCH 3/4] arm: dts: sunxi: Revert EMAC changes
From: Maxime Ripard @ 2017-08-25 19:12 UTC (permalink / raw)
  To: arm, davem, Chen-Yu Tsai, Maxime Ripard
  Cc: linux-arm-kernel, netdev, f.fainelli, clabbe.montjoie, andrew,
	linux-kernel
In-Reply-To: <20170825191217.10278-1-maxime.ripard@free-electrons.com>

Since the discussion is not settled yet for the EMAC, and the release
is getting really close, let's revert the changes for now, and we'll
reintroduce them later.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
---
 arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts |  9 --------
 arch/arm/boot/dts/sun8i-h3-bananapi-m2-plus.dts   | 19 -----------------
 arch/arm/boot/dts/sun8i-h3-beelink-x2.dts         |  8 -------
 arch/arm/boot/dts/sun8i-h3-nanopi-neo.dts         |  7 ------
 arch/arm/boot/dts/sun8i-h3-orangepi-2.dts         |  8 -------
 arch/arm/boot/dts/sun8i-h3-orangepi-one.dts       |  8 -------
 arch/arm/boot/dts/sun8i-h3-orangepi-pc-plus.dts   |  5 -----
 arch/arm/boot/dts/sun8i-h3-orangepi-pc.dts        |  8 -------
 arch/arm/boot/dts/sun8i-h3-orangepi-plus.dts      | 22 -------------------
 arch/arm/boot/dts/sun8i-h3-orangepi-plus2e.dts    | 16 --------------
 arch/arm/boot/dts/sunxi-h3-h5.dtsi                | 26 -----------------------
 11 files changed, 136 deletions(-)

diff --git a/arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts b/arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts
index 6713d0f2b3f4..b1502df7b509 100644
--- a/arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts
+++ b/arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts
@@ -56,8 +56,6 @@
 
 	aliases {
 		serial0 = &uart0;
-		/* ethernet0 is the H3 emac, defined in sun8i-h3.dtsi */
-		ethernet0 = &emac;
 		ethernet1 = &xr819;
 	};
 
@@ -104,13 +102,6 @@
 	status = "okay";
 };
 
-&emac {
-	phy-handle = <&int_mii_phy>;
-	phy-mode = "mii";
-	allwinner,leds-active-low;
-	status = "okay";
-};
-
 &mmc0 {
 	pinctrl-names = "default";
 	pinctrl-0 = <&mmc0_pins_a>;
diff --git a/arch/arm/boot/dts/sun8i-h3-bananapi-m2-plus.dts b/arch/arm/boot/dts/sun8i-h3-bananapi-m2-plus.dts
index d756ff825116..a337af1de322 100644
--- a/arch/arm/boot/dts/sun8i-h3-bananapi-m2-plus.dts
+++ b/arch/arm/boot/dts/sun8i-h3-bananapi-m2-plus.dts
@@ -52,7 +52,6 @@
 	compatible = "sinovoip,bpi-m2-plus", "allwinner,sun8i-h3";
 
 	aliases {
-		ethernet0 = &emac;
 		serial0 = &uart0;
 		serial1 = &uart1;
 	};
@@ -115,30 +114,12 @@
 	status = "okay";
 };
 
-&emac {
-	pinctrl-names = "default";
-	pinctrl-0 = <&emac_rgmii_pins>;
-	phy-supply = <&reg_gmac_3v3>;
-	phy-handle = <&ext_rgmii_phy>;
-	phy-mode = "rgmii";
-
-	allwinner,leds-active-low;
-	status = "okay";
-};
-
 &ir {
 	pinctrl-names = "default";
 	pinctrl-0 = <&ir_pins_a>;
 	status = "okay";
 };
 
-&mdio {
-	ext_rgmii_phy: ethernet-phy@1 {
-		compatible = "ethernet-phy-ieee802.3-c22";
-		reg = <0>;
-	};
-};
-
 &mmc0 {
 	pinctrl-names = "default";
 	pinctrl-0 = <&mmc0_pins_a>, <&mmc0_cd_pin>;
diff --git a/arch/arm/boot/dts/sun8i-h3-beelink-x2.dts b/arch/arm/boot/dts/sun8i-h3-beelink-x2.dts
index 546837ccd8af..5cd3a391bfd9 100644
--- a/arch/arm/boot/dts/sun8i-h3-beelink-x2.dts
+++ b/arch/arm/boot/dts/sun8i-h3-beelink-x2.dts
@@ -53,7 +53,6 @@
 
 	aliases {
 		serial0 = &uart0;
-		ethernet0 = &emac;
 		ethernet1 = &sdiowifi;
 	};
 
@@ -108,13 +107,6 @@
 	status = "okay";
 };
 
-&emac {
-	phy-handle = <&int_mii_phy>;
-	phy-mode = "mii";
-	allwinner,leds-active-low;
-	status = "okay";
-};
-
 &ir {
 	pinctrl-names = "default";
 	pinctrl-0 = <&ir_pins_a>;
diff --git a/arch/arm/boot/dts/sun8i-h3-nanopi-neo.dts b/arch/arm/boot/dts/sun8i-h3-nanopi-neo.dts
index 78f6c24952dd..8d2cc6e9a03f 100644
--- a/arch/arm/boot/dts/sun8i-h3-nanopi-neo.dts
+++ b/arch/arm/boot/dts/sun8i-h3-nanopi-neo.dts
@@ -46,10 +46,3 @@
 	model = "FriendlyARM NanoPi NEO";
 	compatible = "friendlyarm,nanopi-neo", "allwinner,sun8i-h3";
 };
-
-&emac {
-	phy-handle = <&int_mii_phy>;
-	phy-mode = "mii";
-	allwinner,leds-active-low;
-	status = "okay";
-};
diff --git a/arch/arm/boot/dts/sun8i-h3-orangepi-2.dts b/arch/arm/boot/dts/sun8i-h3-orangepi-2.dts
index 17cdeae19c6f..8ff71b1bb45b 100644
--- a/arch/arm/boot/dts/sun8i-h3-orangepi-2.dts
+++ b/arch/arm/boot/dts/sun8i-h3-orangepi-2.dts
@@ -54,7 +54,6 @@
 	aliases {
 		serial0 = &uart0;
 		/* ethernet0 is the H3 emac, defined in sun8i-h3.dtsi */
-		ethernet0 = &emac;
 		ethernet1 = &rtl8189;
 	};
 
@@ -118,13 +117,6 @@
 	status = "okay";
 };
 
-&emac {
-	phy-handle = <&int_mii_phy>;
-	phy-mode = "mii";
-	allwinner,leds-active-low;
-	status = "okay";
-};
-
 &ir {
 	pinctrl-names = "default";
 	pinctrl-0 = <&ir_pins_a>;
diff --git a/arch/arm/boot/dts/sun8i-h3-orangepi-one.dts b/arch/arm/boot/dts/sun8i-h3-orangepi-one.dts
index 6880268e8b87..5fea430e0eb1 100644
--- a/arch/arm/boot/dts/sun8i-h3-orangepi-one.dts
+++ b/arch/arm/boot/dts/sun8i-h3-orangepi-one.dts
@@ -52,7 +52,6 @@
 	compatible = "xunlong,orangepi-one", "allwinner,sun8i-h3";
 
 	aliases {
-		ethernet0 = &emac;
 		serial0 = &uart0;
 	};
 
@@ -98,13 +97,6 @@
 	status = "okay";
 };
 
-&emac {
-	phy-handle = <&int_mii_phy>;
-	phy-mode = "mii";
-	allwinner,leds-active-low;
-	status = "okay";
-};
-
 &mmc0 {
 	pinctrl-names = "default";
 	pinctrl-0 = <&mmc0_pins_a>, <&mmc0_cd_pin>;
diff --git a/arch/arm/boot/dts/sun8i-h3-orangepi-pc-plus.dts b/arch/arm/boot/dts/sun8i-h3-orangepi-pc-plus.dts
index a10281b455f5..8b93f5c781a7 100644
--- a/arch/arm/boot/dts/sun8i-h3-orangepi-pc-plus.dts
+++ b/arch/arm/boot/dts/sun8i-h3-orangepi-pc-plus.dts
@@ -53,11 +53,6 @@
 	};
 };
 
-&emac {
-	/* LEDs changed to active high on the plus */
-	/delete-property/ allwinner,leds-active-low;
-};
-
 &mmc1 {
 	pinctrl-names = "default";
 	pinctrl-0 = <&mmc1_pins_a>;
diff --git a/arch/arm/boot/dts/sun8i-h3-orangepi-pc.dts b/arch/arm/boot/dts/sun8i-h3-orangepi-pc.dts
index 998b60f8d295..1a044b17d6c6 100644
--- a/arch/arm/boot/dts/sun8i-h3-orangepi-pc.dts
+++ b/arch/arm/boot/dts/sun8i-h3-orangepi-pc.dts
@@ -52,7 +52,6 @@
 	compatible = "xunlong,orangepi-pc", "allwinner,sun8i-h3";
 
 	aliases {
-		ethernet0 = &emac;
 		serial0 = &uart0;
 	};
 
@@ -114,13 +113,6 @@
 	status = "okay";
 };
 
-&emac {
-	phy-handle = <&int_mii_phy>;
-	phy-mode = "mii";
-	allwinner,leds-active-low;
-	status = "okay";
-};
-
 &ir {
 	pinctrl-names = "default";
 	pinctrl-0 = <&ir_pins_a>;
diff --git a/arch/arm/boot/dts/sun8i-h3-orangepi-plus.dts b/arch/arm/boot/dts/sun8i-h3-orangepi-plus.dts
index 331ed683ac62..828ae7a526d9 100644
--- a/arch/arm/boot/dts/sun8i-h3-orangepi-plus.dts
+++ b/arch/arm/boot/dts/sun8i-h3-orangepi-plus.dts
@@ -47,10 +47,6 @@
 	model = "Xunlong Orange Pi Plus / Plus 2";
 	compatible = "xunlong,orangepi-plus", "allwinner,sun8i-h3";
 
-	aliases {
-		ethernet0 = &emac;
-	};
-
 	reg_gmac_3v3: gmac-3v3 {
 		compatible = "regulator-fixed";
 		regulator-name = "gmac-3v3";
@@ -78,24 +74,6 @@
 	status = "okay";
 };
 
-&emac {
-	pinctrl-names = "default";
-	pinctrl-0 = <&emac_rgmii_pins>;
-	phy-supply = <&reg_gmac_3v3>;
-	phy-handle = <&ext_rgmii_phy>;
-	phy-mode = "rgmii";
-
-	allwinner,leds-active-low;
-	status = "okay";
-};
-
-&mdio {
-	ext_rgmii_phy: ethernet-phy@1 {
-		compatible = "ethernet-phy-ieee802.3-c22";
-		reg = <0>;
-	};
-};
-
 &mmc2 {
 	pinctrl-names = "default";
 	pinctrl-0 = <&mmc2_8bit_pins>;
diff --git a/arch/arm/boot/dts/sun8i-h3-orangepi-plus2e.dts b/arch/arm/boot/dts/sun8i-h3-orangepi-plus2e.dts
index 80026f3caafc..97920b12a944 100644
--- a/arch/arm/boot/dts/sun8i-h3-orangepi-plus2e.dts
+++ b/arch/arm/boot/dts/sun8i-h3-orangepi-plus2e.dts
@@ -61,19 +61,3 @@
 		gpio = <&pio 3 6 GPIO_ACTIVE_HIGH>; /* PD6 */
 	};
 };
-
-&emac {
-	pinctrl-names = "default";
-	pinctrl-0 = <&emac_rgmii_pins>;
-	phy-supply = <&reg_gmac_3v3>;
-	phy-handle = <&ext_rgmii_phy>;
-	phy-mode = "rgmii";
-	status = "okay";
-};
-
-&mdio {
-	ext_rgmii_phy: ethernet-phy@1 {
-		compatible = "ethernet-phy-ieee802.3-c22";
-		reg = <1>;
-	};
-};
diff --git a/arch/arm/boot/dts/sunxi-h3-h5.dtsi b/arch/arm/boot/dts/sunxi-h3-h5.dtsi
index d38282b9e5d4..11240a8313c2 100644
--- a/arch/arm/boot/dts/sunxi-h3-h5.dtsi
+++ b/arch/arm/boot/dts/sunxi-h3-h5.dtsi
@@ -391,32 +391,6 @@
 			clocks = <&osc24M>;
 		};
 
-		emac: ethernet@1c30000 {
-			compatible = "allwinner,sun8i-h3-emac";
-			syscon = <&syscon>;
-			reg = <0x01c30000 0x10000>;
-			interrupts = <GIC_SPI 82 IRQ_TYPE_LEVEL_HIGH>;
-			interrupt-names = "macirq";
-			resets = <&ccu RST_BUS_EMAC>;
-			reset-names = "stmmaceth";
-			clocks = <&ccu CLK_BUS_EMAC>;
-			clock-names = "stmmaceth";
-			#address-cells = <1>;
-			#size-cells = <0>;
-			status = "disabled";
-
-			mdio: mdio {
-				#address-cells = <1>;
-				#size-cells = <0>;
-				int_mii_phy: ethernet-phy@1 {
-					compatible = "ethernet-phy-ieee802.3-c22";
-					reg = <1>;
-					clocks = <&ccu CLK_BUS_EPHY>;
-					resets = <&ccu RST_BUS_EPHY>;
-				};
-			};
-		};
-
 		spi0: spi@01c68000 {
 			compatible = "allwinner,sun8i-h3-spi";
 			reg = <0x01c68000 0x1000>;
-- 
2.13.5


* [PATCH 2/4] arm64: dts: allwinner: Revert EMAC changes
From: Maxime Ripard @ 2017-08-25 19:12 UTC (permalink / raw)
  To: arm, davem, Chen-Yu Tsai, Maxime Ripard
  Cc: linux-arm-kernel, netdev, f.fainelli, clabbe.montjoie, andrew,
	linux-kernel
In-Reply-To: <20170825191217.10278-1-maxime.ripard@free-electrons.com>

Since the discussion is not settled yet for the EMAC, and the release
is getting really close, let's revert the changes for now, and we'll
reintroduce them later.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
---
 .../boot/dts/allwinner/sun50i-a64-bananapi-m64.dts   | 17 -----------------
 .../boot/dts/allwinner/sun50i-a64-pine64-plus.dts    | 15 ---------------
 arch/arm64/boot/dts/allwinner/sun50i-a64-pine64.dts  | 18 ------------------
 .../dts/allwinner/sun50i-a64-sopine-baseboard.dts    | 17 -----------------
 arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi        | 20 --------------------
 .../boot/dts/allwinner/sun50i-h5-nanopi-neo2.dts     | 17 -----------------
 .../boot/dts/allwinner/sun50i-h5-orangepi-pc2.dts    | 17 -----------------
 .../boot/dts/allwinner/sun50i-h5-orangepi-prime.dts  | 17 -----------------
 8 files changed, 138 deletions(-)

diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64-bananapi-m64.dts b/arch/arm64/boot/dts/allwinner/sun50i-a64-bananapi-m64.dts
index 4a8d3f83a36e..d347f52e27f6 100644
--- a/arch/arm64/boot/dts/allwinner/sun50i-a64-bananapi-m64.dts
+++ b/arch/arm64/boot/dts/allwinner/sun50i-a64-bananapi-m64.dts
@@ -51,7 +51,6 @@
 	compatible = "sinovoip,bananapi-m64", "allwinner,sun50i-a64";
 
 	aliases {
-		ethernet0 = &emac;
 		serial0 = &uart0;
 		serial1 = &uart1;
 	};
@@ -70,15 +69,6 @@
 	status = "okay";
 };
 
-&emac {
-	pinctrl-names = "default";
-	pinctrl-0 = <&rgmii_pins>;
-	phy-mode = "rgmii";
-	phy-handle = <&ext_rgmii_phy>;
-	phy-supply = <&reg_dc1sw>;
-	status = "okay";
-};
-
 &i2c1 {
 	pinctrl-names = "default";
 	pinctrl-0 = <&i2c1_pins>;
@@ -89,13 +79,6 @@
 	bias-pull-up;
 };
 
-&mdio {
-	ext_rgmii_phy: ethernet-phy@1 {
-		compatible = "ethernet-phy-ieee802.3-c22";
-		reg = <1>;
-	};
-};
-
 &mmc0 {
 	pinctrl-names = "default";
 	pinctrl-0 = <&mmc0_pins>;
diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64-pine64-plus.dts b/arch/arm64/boot/dts/allwinner/sun50i-a64-pine64-plus.dts
index 24f1aac366d6..f82ccf332c0f 100644
--- a/arch/arm64/boot/dts/allwinner/sun50i-a64-pine64-plus.dts
+++ b/arch/arm64/boot/dts/allwinner/sun50i-a64-pine64-plus.dts
@@ -48,18 +48,3 @@
 
 	/* TODO: Camera, touchscreen, etc. */
 };
-
-&emac {
-	pinctrl-names = "default";
-	pinctrl-0 = <&rgmii_pins>;
-	phy-mode = "rgmii";
-	phy-handle = <&ext_rgmii_phy>;
-	status = "okay";
-};
-
-&mdio {
-	ext_rgmii_phy: ethernet-phy@1 {
-		compatible = "ethernet-phy-ieee802.3-c22";
-		reg = <1>;
-	};
-};
diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64-pine64.dts b/arch/arm64/boot/dts/allwinner/sun50i-a64-pine64.dts
index 122b5d8e5438..caf8b6fbe5e3 100644
--- a/arch/arm64/boot/dts/allwinner/sun50i-a64-pine64.dts
+++ b/arch/arm64/boot/dts/allwinner/sun50i-a64-pine64.dts
@@ -51,7 +51,6 @@
 	compatible = "pine64,pine64", "allwinner,sun50i-a64";
 
 	aliases {
-		ethernet0 = &emac;
 		serial0 = &uart0;
 		serial1 = &uart1;
 		serial2 = &uart2;
@@ -79,16 +78,6 @@
 	status = "okay";
 };
 
-&emac {
-	pinctrl-names = "default";
-	pinctrl-0 = <&rmii_pins>;
-	phy-mode = "rmii";
-	phy-handle = <&ext_rmii_phy1>;
-	phy-supply = <&reg_dc1sw>;
-	status = "okay";
-
-};
-
 &i2c1 {
 	pinctrl-names = "default";
 	pinctrl-0 = <&i2c1_pins>;
@@ -99,13 +88,6 @@
 	bias-pull-up;
 };
 
-&mdio {
-	ext_rmii_phy1: ethernet-phy@1 {
-		compatible = "ethernet-phy-ieee802.3-c22";
-		reg = <1>;
-	};
-};
-
 &mmc0 {
 	pinctrl-names = "default";
 	pinctrl-0 = <&mmc0_pins>;
diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64-sopine-baseboard.dts b/arch/arm64/boot/dts/allwinner/sun50i-a64-sopine-baseboard.dts
index a053a6ac5267..17ccc12b58df 100644
--- a/arch/arm64/boot/dts/allwinner/sun50i-a64-sopine-baseboard.dts
+++ b/arch/arm64/boot/dts/allwinner/sun50i-a64-sopine-baseboard.dts
@@ -53,7 +53,6 @@
 		     "allwinner,sun50i-a64";
 
 	aliases {
-		ethernet0 = &emac;
 		serial0 = &uart0;
 	};
 
@@ -77,22 +76,6 @@
 	status = "okay";
 };
 
-&emac {
-	pinctrl-names = "default";
-	pinctrl-0 = <&rgmii_pins>;
-	phy-mode = "rgmii";
-	phy-handle = <&ext_rgmii_phy>;
-	phy-supply = <&reg_dc1sw>;
-	status = "okay";
-};
-
-&mdio {
-	ext_rgmii_phy: ethernet-phy@1 {
-		compatible = "ethernet-phy-ieee802.3-c22";
-		reg = <1>;
-	};
-};
-
 &mmc2 {
 	pinctrl-names = "default";
 	pinctrl-0 = <&mmc2_pins>;
diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
index 50f17bab0c07..8c8db1b057df 100644
--- a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
+++ b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
@@ -449,26 +449,6 @@
 			#size-cells = <0>;
 		};
 
-		emac: ethernet@1c30000 {
-			compatible = "allwinner,sun50i-a64-emac";
-			syscon = <&syscon>;
-			reg = <0x01c30000 0x10000>;
-			interrupts = <GIC_SPI 82 IRQ_TYPE_LEVEL_HIGH>;
-			interrupt-names = "macirq";
-			resets = <&ccu RST_BUS_EMAC>;
-			reset-names = "stmmaceth";
-			clocks = <&ccu CLK_BUS_EMAC>;
-			clock-names = "stmmaceth";
-			status = "disabled";
-			#address-cells = <1>;
-			#size-cells = <0>;
-
-			mdio: mdio {
-				#address-cells = <1>;
-				#size-cells = <0>;
-			};
-		};
-
 		gic: interrupt-controller@1c81000 {
 			compatible = "arm,gic-400";
 			reg = <0x01c81000 0x1000>,
diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h5-nanopi-neo2.dts b/arch/arm64/boot/dts/allwinner/sun50i-h5-nanopi-neo2.dts
index 968908761194..1c2387bd5df6 100644
--- a/arch/arm64/boot/dts/allwinner/sun50i-h5-nanopi-neo2.dts
+++ b/arch/arm64/boot/dts/allwinner/sun50i-h5-nanopi-neo2.dts
@@ -50,7 +50,6 @@
 	compatible = "friendlyarm,nanopi-neo2", "allwinner,sun50i-h5";
 
 	aliases {
-		ethernet0 = &emac;
 		serial0 = &uart0;
 	};
 
@@ -109,22 +108,6 @@
 	status = "okay";
 };
 
-&emac {
-	pinctrl-names = "default";
-	pinctrl-0 = <&emac_rgmii_pins>;
-	phy-supply = <&reg_gmac_3v3>;
-	phy-handle = <&ext_rgmii_phy>;
-	phy-mode = "rgmii";
-	status = "okay";
-};
-
-&mdio {
-	ext_rgmii_phy: ethernet-phy@7 {
-		compatible = "ethernet-phy-ieee802.3-c22";
-		reg = <7>;
-	};
-};
-
 &mmc0 {
 	pinctrl-names = "default";
 	pinctrl-0 = <&mmc0_pins_a>, <&mmc0_cd_pin>;
diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h5-orangepi-pc2.dts b/arch/arm64/boot/dts/allwinner/sun50i-h5-orangepi-pc2.dts
index a8296feee884..4f77c8470f6c 100644
--- a/arch/arm64/boot/dts/allwinner/sun50i-h5-orangepi-pc2.dts
+++ b/arch/arm64/boot/dts/allwinner/sun50i-h5-orangepi-pc2.dts
@@ -59,7 +59,6 @@
 	};
 
 	aliases {
-		ethernet0 = &emac;
 		serial0 = &uart0;
 	};
 
@@ -137,28 +136,12 @@
 	status = "okay";
 };
 
-&emac {
-	pinctrl-names = "default";
-	pinctrl-0 = <&emac_rgmii_pins>;
-	phy-supply = <&reg_gmac_3v3>;
-	phy-handle = <&ext_rgmii_phy>;
-	phy-mode = "rgmii";
-	status = "okay";
-};
-
 &ir {
 	pinctrl-names = "default";
 	pinctrl-0 = <&ir_pins_a>;
 	status = "okay";
 };
 
-&mdio {
-	ext_rgmii_phy: ethernet-phy@1 {
-		compatible = "ethernet-phy-ieee802.3-c22";
-		reg = <1>;
-	};
-};
-
 &mmc0 {
 	pinctrl-names = "default";
 	pinctrl-0 = <&mmc0_pins_a>, <&mmc0_cd_pin>;
diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h5-orangepi-prime.dts b/arch/arm64/boot/dts/allwinner/sun50i-h5-orangepi-prime.dts
index d906b302cbcd..6be06873e5af 100644
--- a/arch/arm64/boot/dts/allwinner/sun50i-h5-orangepi-prime.dts
+++ b/arch/arm64/boot/dts/allwinner/sun50i-h5-orangepi-prime.dts
@@ -54,7 +54,6 @@
 	compatible = "xunlong,orangepi-prime", "allwinner,sun50i-h5";
 
 	aliases {
-		ethernet0 = &emac;
 		serial0 = &uart0;
 	};
 
@@ -144,28 +143,12 @@
 	status = "okay";
 };
 
-&emac {
-	pinctrl-names = "default";
-	pinctrl-0 = <&emac_rgmii_pins>;
-	phy-supply = <&reg_gmac_3v3>;
-	phy-handle = <&ext_rgmii_phy>;
-	phy-mode = "rgmii";
-	status = "okay";
-};
-
 &ir {
 	pinctrl-names = "default";
 	pinctrl-0 = <&ir_pins_a>;
 	status = "okay";
 };
 
-&mdio {
-	ext_rgmii_phy: ethernet-phy@1 {
-		compatible = "ethernet-phy-ieee802.3-c22";
-		reg = <1>;
-	};
-};
-
 &mmc0 {
 	pinctrl-names = "default";
 	pinctrl-0 = <&mmc0_pins_a>, <&mmc0_cd_pin>;
-- 
2.13.5

* [PATCH 1/4] dt-bindings: net: Revert sun8i dwmac binding
From: Maxime Ripard @ 2017-08-25 19:12 UTC (permalink / raw)
  To: arm, davem, Chen-Yu Tsai, Maxime Ripard
  Cc: linux-arm-kernel, netdev, f.fainelli, clabbe.montjoie, andrew,
	linux-kernel
In-Reply-To: <20170825191217.10278-1-maxime.ripard@free-electrons.com>

This binding still doesn't please everyone, and we're getting far too
close to the release to allow it to reach a stable version.

Let's remove it until the discussion settles down.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
---
 .../devicetree/bindings/net/dwmac-sun8i.txt        | 84 ----------------------
 1 file changed, 84 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/net/dwmac-sun8i.txt

diff --git a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt b/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
deleted file mode 100644
index 725f3b187886..000000000000
--- a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
+++ /dev/null
@@ -1,84 +0,0 @@
-* Allwinner sun8i GMAC ethernet controller
-
-This device is a platform glue layer for stmmac.
-Please see stmmac.txt for the other unchanged properties.
-
-Required properties:
-- compatible: should be one of the following string:
-		"allwinner,sun8i-a83t-emac"
-		"allwinner,sun8i-h3-emac"
-		"allwinner,sun8i-v3s-emac"
-		"allwinner,sun50i-a64-emac"
-- reg: address and length of the register for the device.
-- interrupts: interrupt for the device
-- interrupt-names: should be "macirq"
-- clocks: A phandle to the reference clock for this device
-- clock-names: should be "stmmaceth"
-- resets: A phandle to the reset control for this device
-- reset-names: should be "stmmaceth"
-- phy-mode: See ethernet.txt
-- phy-handle: See ethernet.txt
-- #address-cells: shall be 1
-- #size-cells: shall be 0
-- syscon: A phandle to the syscon of the SoC with one of the following
- compatible string:
-  - allwinner,sun8i-h3-system-controller
-  - allwinner,sun8i-v3s-system-controller
-  - allwinner,sun50i-a64-system-controller
-  - allwinner,sun8i-a83t-system-controller
-
-Optional properties:
-- allwinner,tx-delay-ps: TX clock delay chain value in ps. Range value is 0-700. Default is 0)
-- allwinner,rx-delay-ps: RX clock delay chain value in ps. Range value is 0-3100. Default is 0)
-Both delay properties need to be a multiple of 100. They control the delay for
-external PHY.
-
-Optional properties for the following compatibles:
-  - "allwinner,sun8i-h3-emac",
-  - "allwinner,sun8i-v3s-emac":
-- allwinner,leds-active-low: EPHY LEDs are active low
-
-Required child node of emac:
-- mdio bus node: should be named mdio
-
-Required properties of the mdio node:
-- #address-cells: shall be 1
-- #size-cells: shall be 0
-
-The device node referenced by "phy" or "phy-handle" should be a child node
-of the mdio node. See phy.txt for the generic PHY bindings.
-
-Required properties of the phy node with the following compatibles:
-  - "allwinner,sun8i-h3-emac",
-  - "allwinner,sun8i-v3s-emac":
-- clocks: a phandle to the reference clock for the EPHY
-- resets: a phandle to the reset control for the EPHY
-
-Example:
-
-emac: ethernet@1c0b000 {
-	compatible = "allwinner,sun8i-h3-emac";
-	syscon = <&syscon>;
-	reg = <0x01c0b000 0x104>;
-	interrupts = <GIC_SPI 82 IRQ_TYPE_LEVEL_HIGH>;
-	interrupt-names = "macirq";
-	resets = <&ccu RST_BUS_EMAC>;
-	reset-names = "stmmaceth";
-	clocks = <&ccu CLK_BUS_EMAC>;
-	clock-names = "stmmaceth";
-	#address-cells = <1>;
-	#size-cells = <0>;
-
-	phy-handle = <&int_mii_phy>;
-	phy-mode = "mii";
-	allwinner,leds-active-low;
-	mdio: mdio {
-		#address-cells = <1>;
-		#size-cells = <0>;
-		int_mii_phy: ethernet-phy@1 {
-			reg = <1>;
-			clocks = <&ccu CLK_BUS_EPHY>;
-			resets = <&ccu RST_BUS_EPHY>;
-		};
-	};
-};
-- 
2.13.5

* [PATCH 0/4] net: stmmac: revert the EMAC bindings
From: Maxime Ripard @ 2017-08-25 19:12 UTC (permalink / raw)
  To: arm, davem, Chen-Yu Tsai, Maxime Ripard
  Cc: linux-arm-kernel, netdev, f.fainelli, clabbe.montjoie, andrew,
	linux-kernel

Hi,

The bindings of the stmmac glue for the new Allwinner EMAC controller
are still controversial and being discussed, even though they've been
merged in 4.13.

In order not to introduce any binding we do not really want to commit
to in a stable release, especially since that would mean we would have
to support both the old and the final bindings, let's revert them.

We will reintroduce them in due time, once the discussion has settled
down.

The first three patches should go through the arm-soc tree, the last
one through the net tree. All of them must be treated as fixes.

Thanks!
Maxime

Maxime Ripard (4):
  dt-bindings: net: Revert sun8i dwmac binding
  arm64: dts: allwinner: Revert EMAC changes
  arm: dts: sunxi: Revert EMAC changes
  net: stmmac: sun8i: Remove the compatibles

 .../devicetree/bindings/net/dwmac-sun8i.txt        | 84 ----------------------
 arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts  |  9 ---
 arch/arm/boot/dts/sun8i-h3-bananapi-m2-plus.dts    | 19 -----
 arch/arm/boot/dts/sun8i-h3-beelink-x2.dts          |  8 ---
 arch/arm/boot/dts/sun8i-h3-nanopi-neo.dts          |  7 --
 arch/arm/boot/dts/sun8i-h3-orangepi-2.dts          |  8 ---
 arch/arm/boot/dts/sun8i-h3-orangepi-one.dts        |  8 ---
 arch/arm/boot/dts/sun8i-h3-orangepi-pc-plus.dts    |  5 --
 arch/arm/boot/dts/sun8i-h3-orangepi-pc.dts         |  8 ---
 arch/arm/boot/dts/sun8i-h3-orangepi-plus.dts       | 22 ------
 arch/arm/boot/dts/sun8i-h3-orangepi-plus2e.dts     | 16 -----
 arch/arm/boot/dts/sunxi-h3-h5.dtsi                 | 26 -------
 .../boot/dts/allwinner/sun50i-a64-bananapi-m64.dts | 17 -----
 .../boot/dts/allwinner/sun50i-a64-pine64-plus.dts  | 15 ----
 .../arm64/boot/dts/allwinner/sun50i-a64-pine64.dts | 18 -----
 .../dts/allwinner/sun50i-a64-sopine-baseboard.dts  | 17 -----
 arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi      | 20 ------
 .../boot/dts/allwinner/sun50i-h5-nanopi-neo2.dts   | 17 -----
 .../boot/dts/allwinner/sun50i-h5-orangepi-pc2.dts  | 17 -----
 .../dts/allwinner/sun50i-h5-orangepi-prime.dts     | 17 -----
 drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c  |  8 ---
 21 files changed, 366 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/net/dwmac-sun8i.txt

-- 
2.13.5

* [PATCH v2 net-next 8/8] samples/bpf: Update cgroup socket examples to use uid gid helper
From: David Ahern @ 2017-08-25 19:05 UTC (permalink / raw)
  To: netdev, daniel, ast, tj, davem; +Cc: David Ahern
In-Reply-To: <1503687941-626-1-git-send-email-dsahern@gmail.com>

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 samples/bpf/sock_flags_kern.c |  5 +++++
 samples/bpf/test_cgrp2_sock.c | 12 +++++++++++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/samples/bpf/sock_flags_kern.c b/samples/bpf/sock_flags_kern.c
index 533dd11a6baa..05dcdf8a4baa 100644
--- a/samples/bpf/sock_flags_kern.c
+++ b/samples/bpf/sock_flags_kern.c
@@ -9,8 +9,13 @@ SEC("cgroup/sock1")
 int bpf_prog1(struct bpf_sock *sk)
 {
 	char fmt[] = "socket: family %d type %d protocol %d\n";
+	char fmt2[] = "socket: uid %u gid %u\n";
+	__u64 gid_uid = bpf_get_current_uid_gid();
+	__u32 uid = gid_uid & 0xffffffff;
+	__u32 gid = gid_uid >> 32;
 
 	bpf_trace_printk(fmt, sizeof(fmt), sk->family, sk->type, sk->protocol);
+	bpf_trace_printk(fmt2, sizeof(fmt2), uid, gid);
 
 	/* block PF_INET6, SOCK_RAW, IPPROTO_ICMPV6 sockets
 	 * ie., make ping6 fail
diff --git a/samples/bpf/test_cgrp2_sock.c b/samples/bpf/test_cgrp2_sock.c
index eabf530a5223..e9eeaaf52219 100644
--- a/samples/bpf/test_cgrp2_sock.c
+++ b/samples/bpf/test_cgrp2_sock.c
@@ -46,8 +46,18 @@ static int prog_load(__u32 idx, __u32 mark, __u32 prio)
 
 	/* set mark on socket */
 	struct bpf_insn prog_mark[] = {
-		BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+		/* get uid of process */
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+			     BPF_FUNC_get_current_uid_gid),
+		BPF_ALU64_IMM(BPF_AND, BPF_REG_0, 0xffffffff),
+
+		/* if uid is 0, use given mark, else use the uid as the mark */
+		BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
 		BPF_MOV64_IMM(BPF_REG_3, mark),
+
+		/* set the mark on the new socket */
+		BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
 		BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, mark)),
 		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, mark)),
 	};
-- 
2.1.4

* [PATCH v2 net-next 7/8] samples/bpf: Add test case for nested socket options
From: David Ahern @ 2017-08-25 19:05 UTC (permalink / raw)
  To: netdev, daniel, ast, tj, davem; +Cc: David Ahern
In-Reply-To: <1503687941-626-1-git-send-email-dsahern@gmail.com>

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 samples/bpf/test_cgrp2_sock3.sh | 162 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 162 insertions(+)
 create mode 100755 samples/bpf/test_cgrp2_sock3.sh

diff --git a/samples/bpf/test_cgrp2_sock3.sh b/samples/bpf/test_cgrp2_sock3.sh
new file mode 100755
index 000000000000..9bfed035963f
--- /dev/null
+++ b/samples/bpf/test_cgrp2_sock3.sh
@@ -0,0 +1,162 @@
+#!/bin/sh
+
+# Verify socket options inherited by bpf programs attached
+# to a cgroup.
+
+CGRP_MNT="/tmp/cgroupv2-test_cgrp2_sock"
+
+################################################################################
+#
+print_result()
+{
+	printf "%-50s    [%4s]\n" "$1" "$2"
+}
+
+check_sock()
+{
+	out=$(test_cgrp2_sock)
+	echo $out | grep -q "$1"
+	if [ $? -ne 0 ]; then
+		print_result "IPv4: $2" "FAIL"
+		echo "    expected: $1"
+		echo "        have: $out"
+		rc=1
+	else
+		print_result "IPv4: $2" " OK "
+	fi
+}
+
+check_sock6()
+{
+	out=$(test_cgrp2_sock -6)
+	echo $out | grep -q "$1"
+	if [ $? -ne 0 ]; then
+		print_result "IPv6: $2" "FAIL"
+		echo "    expected: $1"
+		echo "        have: $out"
+		rc=1
+	else
+		print_result "IPv6: $2" " OK "
+	fi
+}
+
+################################################################################
+#
+setup()
+{
+	cleanup 2>/dev/null
+
+	mkdir -p ${CGRP_MNT}/cgrp_sock_test/prio/mark/dev
+	[ $? -ne 0 ] && cleanup_and_exit 1 "Failed to create cgroup hierarchy"
+
+	test_cgrp2_sock -p 123 ${CGRP_MNT}/cgrp_sock_test/prio
+	[ $? -ne 0 ] && cleanup_and_exit 1 "Failed to install program to set priority"
+
+	test_cgrp2_sock -m 666 -r ${CGRP_MNT}/cgrp_sock_test/prio/mark
+	[ $? -ne 0 ] && cleanup_and_exit 1 "Failed to install program to set mark"
+
+	test_cgrp2_sock -b cgrp2_sock -r ${CGRP_MNT}/cgrp_sock_test/prio/mark/dev
+	[ $? -ne 0 ] && cleanup_and_exit 1 "Failed to install program to set device"
+}
+
+cleanup()
+{
+	echo $$ >> ${CGRP_MNT}/cgroup.procs
+	rmdir ${CGRP_MNT}/cgrp_sock_test/prio/mark/dev
+	rmdir ${CGRP_MNT}/cgrp_sock_test/prio/mark
+	rmdir ${CGRP_MNT}/cgrp_sock_test/prio
+	rmdir ${CGRP_MNT}/cgrp_sock_test
+}
+
+cleanup_and_exit()
+{
+	local rc=$1
+	local msg="$2"
+
+	[ -n "$msg" ] && echo "ERROR: $msg"
+
+	ip li del cgrp2_sock
+	umount ${CGRP_MNT}
+
+	exit $rc
+}
+
+################################################################################
+#
+
+run_tests()
+{
+	# set pid into first cgroup. socket should show it
+	# has a priority but not a mark or device bind
+	echo $$ > ${CGRP_MNT}/cgrp_sock_test/prio/cgroup.procs
+	check_sock "dev , mark 0, priority 123" "Priority only"
+
+	# set pid into second group. socket should show it
+	# has a priority and mark but not a device bind
+	echo $$ > ${CGRP_MNT}/cgrp_sock_test/prio/mark/cgroup.procs
+	check_sock "dev , mark 666, priority 123" "Priority + mark"
+
+	# set pid into inner group. socket should show it
+	# has a priority, mark and a device bind
+	echo $$ > ${CGRP_MNT}/cgrp_sock_test/prio/mark/dev/cgroup.procs
+	check_sock "dev cgrp2_sock, mark 666, priority 123" "Priority + mark + dev"
+
+	echo
+
+	# set pid into first cgroup. socket should show it
+	# has a priority but not a mark or device bind
+	echo $$ > ${CGRP_MNT}/cgrp_sock_test/prio/cgroup.procs
+	check_sock6 "dev , mark 0, priority 123" "Priority only"
+
+	# set pid into second group. socket should show it
+	# has a priority and mark but not a device bind
+	echo $$ > ${CGRP_MNT}/cgrp_sock_test/prio/mark/cgroup.procs
+	check_sock6 "dev , mark 666, priority 123" "Priority + mark"
+
+	# set pid into inner group. socket should show it
+	# has a priority, mark and a device bind
+	echo $$ > ${CGRP_MNT}/cgrp_sock_test/prio/mark/dev/cgroup.procs
+	check_sock6 "dev cgrp2_sock, mark 666, priority 123" "Priority + mark + dev"
+}
+
+################################################################################
+# verify expected invalid setups are invalid
+
+invalid_setup()
+{
+	echo
+
+	mkdir -p ${CGRP_MNT}/cgrp_sock_test/prio/mark/dev
+	[ $? -ne 0 ] && cleanup_and_exit 1 "Failed to create cgroup hierarchy"
+
+	test_cgrp2_sock -p 123 -r ${CGRP_MNT}/cgrp_sock_test/prio
+	[ $? -ne 0 ] && cleanup_and_exit 1 "Failed to install program to set priority"
+
+	# recursive - followed by non-recursive is not allowed
+	test_cgrp2_sock -m 666 ${CGRP_MNT}/cgrp_sock_test/prio/mark >/dev/null 2>&1
+	if [ $? -eq 0 ]; then
+		print_result "recursive setting followed by non-recursive" "FAIL"
+	else
+		print_result "recursive setting followed by non-recursive" " OK "
+	fi
+}
+
+################################################################################
+# main
+
+rc=0
+
+ip li add cgrp2_sock type dummy 2>/dev/null
+
+set -e
+mkdir -p ${CGRP_MNT}
+mount -t cgroup2 none ${CGRP_MNT}
+set +e
+
+setup
+run_tests
+cleanup
+
+invalid_setup
+
+cleanup_and_exit $rc
-- 
2.1.4

* [PATCH v2 net-next 6/8] samples/bpf: Add option to dump socket settings
From: David Ahern @ 2017-08-25 19:05 UTC (permalink / raw)
  To: netdev, daniel, ast, tj, davem; +Cc: David Ahern
In-Reply-To: <1503687941-626-1-git-send-email-dsahern@gmail.com>

Add an option to dump socket settings. It will be used in the next patch
to verify that BPF programs correctly set mark, priority and device
based on the cgroup the program is attached to.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 samples/bpf/test_cgrp2_sock.c | 75 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 73 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/test_cgrp2_sock.c b/samples/bpf/test_cgrp2_sock.c
index a1ef7b8bd3f9..eabf530a5223 100644
--- a/samples/bpf/test_cgrp2_sock.c
+++ b/samples/bpf/test_cgrp2_sock.c
@@ -112,6 +112,70 @@ static int prog_load(__u32 idx, __u32 mark, __u32 prio)
 	return ret;
 }
 
+static int get_bind_to_device(int sd, char *name, size_t len)
+{
+	socklen_t optlen = len;
+	int rc;
+
+	name[0] = '\0';
+	rc = getsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, name, &optlen);
+	if (rc < 0)
+		perror("getsockopt(SO_BINDTODEVICE)");
+
+	return rc;
+}
+
+static unsigned int get_somark(int sd)
+{
+	unsigned int mark = 0;
+	socklen_t optlen = sizeof(mark);
+	int rc;
+
+	rc = getsockopt(sd, SOL_SOCKET, SO_MARK, &mark, &optlen);
+	if (rc < 0)
+		perror("getsockopt(SO_MARK)");
+
+	return mark;
+}
+
+static unsigned int get_priority(int sd)
+{
+	unsigned int prio = 0;
+	socklen_t optlen = sizeof(prio);
+	int rc;
+
+	rc = getsockopt(sd, SOL_SOCKET, SO_PRIORITY, &prio, &optlen);
+	if (rc < 0)
+		perror("getsockopt(SO_PRIORITY)");
+
+	return prio;
+}
+
+static int show_sockopts(int family)
+{
+	unsigned int mark, prio;
+	char name[16];
+	int sd;
+
+	sd = socket(family, SOCK_DGRAM, 17);
+	if (sd < 0) {
+		perror("socket");
+		return 1;
+	}
+
+	if (get_bind_to_device(sd, name, sizeof(name)) < 0)
+		return 1;
+
+	mark = get_somark(sd);
+	prio = get_priority(sd);
+
+	close(sd);
+
+	printf("sd %d: dev %s, mark %u, priority %u\n", sd, name, mark, prio);
+
+	return 0;
+}
+
 static int usage(const char *argv0)
 {
 	printf("Usage:\n");
@@ -120,6 +184,9 @@ static int usage(const char *argv0)
 	printf("\n");
 	printf("  Detach a program\n");
 	printf("  %s -d cg-path\n", argv0);
+	printf("\n");
+	printf("  Show inherited socket settings (mark, priority, and device)\n");
+	printf("  %s [-6]\n", argv0);
 	return EXIT_FAILURE;
 }
 
@@ -129,10 +196,11 @@ int main(int argc, char **argv)
 	__u32 idx = 0, mark = 0, prio = 0;
 	const char *cgrp_path = NULL;
 	int cg_fd, prog_fd, ret;
+	int family = PF_INET;
 	int do_attach = 1;
 	int rc;
 
-	while ((rc = getopt(argc, argv, "db:m:p:r")) != -1) {
+	while ((rc = getopt(argc, argv, "db:m:p:r6")) != -1) {
 		switch (rc) {
 		case 'd':
 			do_attach = 0;
@@ -156,13 +224,16 @@ int main(int argc, char **argv)
 		case 'r':
 			attach_flags |= BPF_F_RECURSIVE;
 			break;
+		case '6':
+			family = PF_INET6;
+			break;
 		default:
 			return usage(argv[0]);
 		}
 	}
 
 	if (optind == argc)
-		return usage(argv[0]);
+		return show_sockopts(family);
 
 	cgrp_path = argv[optind];
 	if (!cgrp_path) {
-- 
2.1.4

* [PATCH v2 net-next 5/8] samples/bpf: Add detach option to test_cgrp2_sock
From: David Ahern @ 2017-08-25 19:05 UTC (permalink / raw)
  To: netdev, daniel, ast, tj, davem; +Cc: David Ahern
In-Reply-To: <1503687941-626-1-git-send-email-dsahern@gmail.com>

Add option to detach programs from a cgroup.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 samples/bpf/test_cgrp2_sock.c | 50 ++++++++++++++++++++++++++++++-------------
 1 file changed, 35 insertions(+), 15 deletions(-)

diff --git a/samples/bpf/test_cgrp2_sock.c b/samples/bpf/test_cgrp2_sock.c
index b018bf948933..a1ef7b8bd3f9 100644
--- a/samples/bpf/test_cgrp2_sock.c
+++ b/samples/bpf/test_cgrp2_sock.c
@@ -114,7 +114,12 @@ static int prog_load(__u32 idx, __u32 mark, __u32 prio)
 
 static int usage(const char *argv0)
 {
-	printf("Usage: %s -b bind-to-dev -m mark -p prio -r cg-path\n", argv0);
+	printf("Usage:\n");
+	printf("  Attach a program\n");
+	printf("  %s -b bind-to-dev -m mark -p prio -r cg-path\n", argv0);
+	printf("\n");
+	printf("  Detach a program\n");
+	printf("  %s -d cg-path\n", argv0);
 	return EXIT_FAILURE;
 }
 
@@ -124,10 +129,14 @@ int main(int argc, char **argv)
 	__u32 idx = 0, mark = 0, prio = 0;
 	const char *cgrp_path = NULL;
 	int cg_fd, prog_fd, ret;
+	int do_attach = 1;
 	int rc;
 
-	while ((rc = getopt(argc, argv, "b:m:p:r")) != -1) {
+	while ((rc = getopt(argc, argv, "db:m:p:r")) != -1) {
 		switch (rc) {
+		case 'd':
+			do_attach = 0;
+			break;
 		case 'b':
 			idx = if_nametoindex(optarg);
 			if (!idx) {
@@ -161,7 +170,7 @@ int main(int argc, char **argv)
 		return EXIT_FAILURE;
 	}
 
-	if (!idx && !mark && !prio) {
+	if (do_attach && !idx && !mark && !prio) {
 		fprintf(stderr, "One of device, mark or priority must be given\n");
 		return EXIT_FAILURE;
 	}
@@ -172,20 +181,31 @@ int main(int argc, char **argv)
 		return EXIT_FAILURE;
 	}
 
-	prog_fd = prog_load(idx, mark, prio);
-	if (prog_fd < 0) {
-		printf("Failed to load prog: '%s'\n", strerror(errno));
-		printf("Output from kernel verifier:\n%s\n-------\n", bpf_log_buf);
-		return EXIT_FAILURE;
-	}
+	if (do_attach) {
+		prog_fd = prog_load(idx, mark, prio);
+		if (prog_fd < 0) {
+			printf("Failed to load prog: '%s'\n", strerror(errno));
+			printf("Output from kernel verifier:\n%s\n-------\n",
+			       bpf_log_buf);
+			return EXIT_FAILURE;
+		}
 
-	ret = bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK_CREATE,
-			      attach_flags);
-	if (ret < 0) {
-		printf("Failed to attach prog to cgroup: '%s'\n",
-		       strerror(errno));
-		return EXIT_FAILURE;
+		ret = bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK_CREATE,
+				      attach_flags);
+		if (ret < 0) {
+			printf("Failed to attach prog to cgroup: '%s'\n",
+			       strerror(errno));
+			return EXIT_FAILURE;
+		}
+	} else {
+		ret = bpf_prog_detach(cg_fd, BPF_CGROUP_INET_SOCK_CREATE);
+		if (ret < 0) {
+			printf("Failed to detach prog from cgroup: '%s'\n",
+			       strerror(errno));
+			return EXIT_FAILURE;
+		}
 	}
 
+	close(cg_fd);
 	return EXIT_SUCCESS;
 }
-- 
2.1.4

* [PATCH v2 net-next 4/8] samples/bpf: Update sock test to allow setting mark and priority
From: David Ahern @ 2017-08-25 19:05 UTC (permalink / raw)
  To: netdev, daniel, ast, tj, davem; +Cc: David Ahern
In-Reply-To: <1503687941-626-1-git-send-email-dsahern@gmail.com>

Update sock test to set mark and priority on socket create.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 samples/bpf/test_cgrp2_sock.c  | 139 ++++++++++++++++++++++++++++++++++++-----
 samples/bpf/test_cgrp2_sock.sh |   2 +-
 2 files changed, 123 insertions(+), 18 deletions(-)
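
The reworked prog_load() builds the program by concatenating a fixed prologue, up to three optional fragments, and a fixed epilogue. A rough sketch of that pattern follows, under stated assumptions: struct toy_insn, struct toy_frag and toy_concat() are hypothetical names used only for illustration, and the sketch counts instructions rather than bytes so that the copy cursor can be a typed pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct bpf_insn, for illustration only. */
struct toy_insn {
	uint8_t code;
	int32_t imm;
};

/* One optional run of instructions. */
struct toy_frag {
	const struct toy_insn *insns;
	size_t cnt;		/* number of instructions, not bytes */
	bool enabled;
};

/*
 * Concatenate the enabled fragments in order.  Returns a malloc()ed
 * array that the caller must free(), or NULL on allocation failure or
 * when nothing is enabled; *out_cnt receives the instruction count.
 */
static struct toy_insn *toy_concat(const struct toy_frag *frags,
				   size_t nr_frags, size_t *out_cnt)
{
	struct toy_insn *prog, *p;
	size_t total = 0, i;

	assert(frags != NULL && out_cnt != NULL);

	for (i = 0; i < nr_frags; i++)
		if (frags[i].enabled)
			total += frags[i].cnt;

	*out_cnt = 0;
	if (!total)
		return NULL;

	p = prog = malloc(total * sizeof(*prog));
	if (!prog)
		return NULL;

	for (i = 0; i < nr_frags; i++) {
		if (!frags[i].enabled || !frags[i].cnt)
			continue;
		memcpy(p, frags[i].insns, frags[i].cnt * sizeof(*p));
		p += frags[i].cnt;	/* advances by whole instructions */
	}

	*out_cnt = total;
	return prog;
}
```

Keeping the cursor typed avoids arithmetic on a void pointer, which is a GNU extension rather than standard C.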

diff --git a/samples/bpf/test_cgrp2_sock.c b/samples/bpf/test_cgrp2_sock.c
index c3cfb23e23b5..b018bf948933 100644
--- a/samples/bpf/test_cgrp2_sock.c
+++ b/samples/bpf/test_cgrp2_sock.c
@@ -19,63 +19,168 @@
 #include <errno.h>
 #include <fcntl.h>
 #include <net/if.h>
+#include <inttypes.h>
 #include <linux/bpf.h>
 
 #include "libbpf.h"
 
 char bpf_log_buf[BPF_LOG_BUF_SIZE];
 
-static int prog_load(int idx)
+static int prog_load(__u32 idx, __u32 mark, __u32 prio)
 {
-	struct bpf_insn prog[] = {
+	/* save pointer to context */
+	struct bpf_insn prog_start[] = {
 		BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+	};
+	struct bpf_insn prog_end[] = {
+		BPF_MOV64_IMM(BPF_REG_0, 1), /* r0 = verdict */
+		BPF_EXIT_INSN(),
+	};
+
+	/* set sk_bound_dev_if on socket */
+	struct bpf_insn prog_dev[] = {
 		BPF_MOV64_IMM(BPF_REG_3, idx),
 		BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, bound_dev_if)),
 		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, bound_dev_if)),
-		BPF_MOV64_IMM(BPF_REG_0, 1), /* r0 = verdict */
-		BPF_EXIT_INSN(),
 	};
-	size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
 
-	return bpf_load_program(BPF_PROG_TYPE_CGROUP_SOCK, prog, insns_cnt,
+	/* set mark on socket */
+	struct bpf_insn prog_mark[] = {
+		BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+		BPF_MOV64_IMM(BPF_REG_3, mark),
+		BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, mark)),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, mark)),
+	};
+
+	/* set priority on socket */
+	struct bpf_insn prog_prio[] = {
+		BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+		BPF_MOV64_IMM(BPF_REG_3, prio),
+		BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, priority)),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, priority)),
+	};
+
+	struct bpf_insn *prog;
+	size_t insns_cnt;
+	void *p;
+	int ret;
+
+	insns_cnt = sizeof(prog_start) + sizeof(prog_end);
+	if (idx)
+		insns_cnt += sizeof(prog_dev);
+
+	if (mark)
+		insns_cnt += sizeof(prog_mark);
+
+	if (prio)
+		insns_cnt += sizeof(prog_prio);
+
+	p = prog = malloc(insns_cnt);
+	if (!prog) {
+		fprintf(stderr, "Failed to allocate memory for instructions\n");
+		return -1;
+	}
+
+	memcpy(p, prog_start, sizeof(prog_start));
+	p += sizeof(prog_start);
+
+	if (idx) {
+		memcpy(p, prog_dev, sizeof(prog_dev));
+		p += sizeof(prog_dev);
+	}
+
+	if (mark) {
+		memcpy(p, prog_mark, sizeof(prog_mark));
+		p += sizeof(prog_mark);
+	}
+
+	if (prio) {
+		memcpy(p, prog_prio, sizeof(prog_prio));
+		p += sizeof(prog_prio);
+	}
+
+	memcpy(p, prog_end, sizeof(prog_end));
+	p += sizeof(prog_end);
+
+	insns_cnt /= sizeof(struct bpf_insn);
+
+	ret = bpf_load_program(BPF_PROG_TYPE_CGROUP_SOCK, prog, insns_cnt,
 				"GPL", 0, bpf_log_buf, BPF_LOG_BUF_SIZE);
+
+	free(prog);
+
+	return ret;
 }
 
 static int usage(const char *argv0)
 {
-	printf("Usage: %s cg-path device-index\n", argv0);
+	printf("Usage: %s -b bind-to-dev -m mark -p prio -r cg-path\n", argv0);
 	return EXIT_FAILURE;
 }
 
 int main(int argc, char **argv)
 {
+	__u32 attach_flags = BPF_F_ALLOW_OVERRIDE;
+	__u32 idx = 0, mark = 0, prio = 0;
+	const char *cgrp_path = NULL;
 	int cg_fd, prog_fd, ret;
-	unsigned int idx;
+	int rc;
+
+	while ((rc = getopt(argc, argv, "b:m:p:r")) != -1) {
+		switch (rc) {
+		case 'b':
+			idx = if_nametoindex(optarg);
+			if (!idx) {
+				idx = strtoumax(optarg, NULL, 0);
+				if (!idx) {
+					printf("Invalid device name\n");
+					return EXIT_FAILURE;
+				}
+			}
+			break;
+		case 'm':
+			mark = strtoumax(optarg, NULL, 0);
+			break;
+		case 'p':
+			prio = strtoumax(optarg, NULL, 0);
+			break;
+		case 'r':
+			attach_flags |= BPF_F_RECURSIVE;
+			break;
+		default:
+			return usage(argv[0]);
+		}
+	}
 
-	if (argc < 2)
+	if (optind == argc)
 		return usage(argv[0]);
 
-	idx = if_nametoindex(argv[2]);
-	if (!idx) {
-		printf("Invalid device name\n");
+	cgrp_path = argv[optind];
+	if (!cgrp_path) {
+		fprintf(stderr, "cgroup path not given\n");
 		return EXIT_FAILURE;
 	}
 
-	cg_fd = open(argv[1], O_DIRECTORY | O_RDONLY);
+	if (!idx && !mark && !prio) {
+		fprintf(stderr, "One of device, mark or priority must be given\n");
+		return EXIT_FAILURE;
+	}
+
+	cg_fd = open(cgrp_path, O_DIRECTORY | O_RDONLY);
 	if (cg_fd < 0) {
 		printf("Failed to open cgroup path: '%s'\n", strerror(errno));
 		return EXIT_FAILURE;
 	}
 
-	prog_fd = prog_load(idx);
-	printf("Output from kernel verifier:\n%s\n-------\n", bpf_log_buf);
-
+	prog_fd = prog_load(idx, mark, prio);
 	if (prog_fd < 0) {
 		printf("Failed to load prog: '%s'\n", strerror(errno));
+		printf("Output from kernel verifier:\n%s\n-------\n", bpf_log_buf);
 		return EXIT_FAILURE;
 	}
 
-	ret = bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK_CREATE, 0);
+	ret = bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK_CREATE,
+			      attach_flags);
 	if (ret < 0) {
 		printf("Failed to attach prog to cgroup: '%s'\n",
 		       strerror(errno));
diff --git a/samples/bpf/test_cgrp2_sock.sh b/samples/bpf/test_cgrp2_sock.sh
index 925fd467c7cc..1153c33e8964 100755
--- a/samples/bpf/test_cgrp2_sock.sh
+++ b/samples/bpf/test_cgrp2_sock.sh
@@ -20,7 +20,7 @@ function attach_bpf {
 	mkdir -p /tmp/cgroupv2
 	mount -t cgroup2 none /tmp/cgroupv2
 	mkdir -p /tmp/cgroupv2/foo
-	test_cgrp2_sock /tmp/cgroupv2/foo foo
+	test_cgrp2_sock -b foo /tmp/cgroupv2/foo
 	echo $$ >> /tmp/cgroupv2/foo/cgroup.procs
 }
 
-- 
2.1.4

* [PATCH v2 net-next 3/8] bpf: Allow cgroup sock filters to use get_current_uid_gid helper
From: David Ahern @ 2017-08-25 19:05 UTC (permalink / raw)
  To: netdev, daniel, ast, tj, davem; +Cc: David Ahern
In-Reply-To: <1503687941-626-1-git-send-email-dsahern@gmail.com>

Allow BPF programs run on sock create to use the get_current_uid_gid
helper. IPv4 and IPv6 sockets are created in a process context, so
there is always a valid uid/gid.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 net/core/filter.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)
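
As a reminder of what the helper hands back: it returns a single 64-bit value with the gid in the upper 32 bits and the uid in the lower 32 bits. The sketch below shows the split in plain userspace C; struct toy_ids and toy_split_uid_gid() are hypothetical names for illustration and are not part of this series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair type, for illustration only. */
struct toy_ids {
	uint32_t uid;
	uint32_t gid;
};

static_assert(sizeof(struct toy_ids) == 8, "two 32-bit ids expected");

/* Split a packed (gid << 32 | uid) value into its two halves. */
static struct toy_ids toy_split_uid_gid(uint64_t gid_uid)
{
	struct toy_ids ids = {
		.uid = (uint32_t)(gid_uid & 0xffffffff),
		.gid = (uint32_t)(gid_uid >> 32),
	};

	return ids;
}
```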

diff --git a/net/core/filter.c b/net/core/filter.c
index d582d1b1e533..eb505842a77e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3139,6 +3139,20 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 }
 
 static const struct bpf_func_proto *
+sock_filter_func_proto(enum bpf_func_id func_id)
+{
+	switch (func_id) {
+	/* inet and inet6 sockets are created in a process
+	 * context so there is always a valid uid/gid
+	 */
+	case BPF_FUNC_get_current_uid_gid:
+		return &bpf_get_current_uid_gid_proto;
+	default:
+		return bpf_base_func_proto(func_id);
+	}
+}
+
+static const struct bpf_func_proto *
 sk_filter_func_proto(enum bpf_func_id func_id)
 {
 	switch (func_id) {
@@ -4222,7 +4236,7 @@ const struct bpf_verifier_ops lwt_xmit_prog_ops = {
 };
 
 const struct bpf_verifier_ops cg_sock_prog_ops = {
-	.get_func_proto		= bpf_base_func_proto,
+	.get_func_proto		= sock_filter_func_proto,
 	.is_valid_access	= sock_filter_is_valid_access,
 	.convert_ctx_access	= sock_filter_convert_ctx_access,
 };
-- 
2.1.4

* [PATCH v2 net-next 2/8] bpf: Add mark and priority to sock options that can be set
From: David Ahern @ 2017-08-25 19:05 UTC (permalink / raw)
  To: netdev, daniel, ast, tj, davem; +Cc: David Ahern
In-Reply-To: <1503687941-626-1-git-send-email-dsahern@gmail.com>

Add socket mark and priority to the fields that can be set by
an eBPF program when a socket is created.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 include/uapi/linux/bpf.h |  2 ++
 net/core/filter.c        | 26 ++++++++++++++++++++++++++
 2 files changed, 28 insertions(+)
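
The access check boils down to a whitelist of context offsets that a program may write. A simplified sketch of that idea, assuming a 4-byte access width for every writable field: struct toy_bpf_sock is a hypothetical mirror of the layout in this patch, and toy_is_writable() is not the kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the struct bpf_sock layout in this patch. */
struct toy_bpf_sock {
	uint32_t bound_dev_if;
	uint32_t family;
	uint32_t type;
	uint32_t protocol;
	uint32_t mark;
	uint32_t priority;
};

static_assert(sizeof(struct toy_bpf_sock) == 24, "six 32-bit fields");

/* Return true if a write of @size bytes at offset @off is allowed. */
static bool toy_is_writable(size_t off, size_t size)
{
	if (size != sizeof(uint32_t))
		return false;

	switch (off) {
	case offsetof(struct toy_bpf_sock, bound_dev_if):
	case offsetof(struct toy_bpf_sock, mark):
	case offsetof(struct toy_bpf_sock, priority):
		return true;
	default:
		return false;
	}
}
```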

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 595e31b30f23..f72b957580cd 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -773,6 +773,8 @@ struct bpf_sock {
 	__u32 family;
 	__u32 type;
 	__u32 protocol;
+	__u32 mark;
+	__u32 priority;
 };
 
 #define XDP_PACKET_HEADROOM 256
diff --git a/net/core/filter.c b/net/core/filter.c
index 4bcd6baa80c9..d582d1b1e533 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3444,6 +3444,10 @@ static bool sock_filter_is_valid_access(int off, int size,
 		switch (off) {
 		case offsetof(struct bpf_sock, bound_dev_if):
 			break;
+		case offsetof(struct bpf_sock, mark):
+			break;
+		case offsetof(struct bpf_sock, priority):
+			break;
 		default:
 			return false;
 		}
@@ -3947,6 +3951,28 @@ static u32 sock_filter_convert_ctx_access(enum bpf_access_type type,
 				      offsetof(struct sock, sk_bound_dev_if));
 		break;
 
+	case offsetof(struct bpf_sock, mark):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_mark) != 4);
+
+		if (type == BPF_WRITE)
+			*insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg,
+					offsetof(struct sock, sk_mark));
+		else
+			*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
+				      offsetof(struct sock, sk_mark));
+		break;
+
+	case offsetof(struct bpf_sock, priority):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_priority) != 4);
+
+		if (type == BPF_WRITE)
+			*insn++ = BPF_STX_MEM(BPF_W, si->dst_reg, si->src_reg,
+					offsetof(struct sock, sk_priority));
+		else
+			*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
+				      offsetof(struct sock, sk_priority));
+		break;
+
 	case offsetof(struct bpf_sock, family):
 		BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_family) != 2);
 
-- 
2.1.4

* [PATCH v2 net-next 1/8] bpf: Add support for recursively running cgroup sock filters
From: David Ahern @ 2017-08-25 19:05 UTC (permalink / raw)
  To: netdev, daniel, ast, tj, davem; +Cc: David Ahern
In-Reply-To: <1503687941-626-1-git-send-email-dsahern@gmail.com>

Add support for recursively applying sock filters attached to a cgroup.
For now, start with the inner cgroup attached to the socket and work back
to the root or the first cgroup without the recursive flag set. Once the
recursive flag is set for a cgroup, all descendant cgroups must have the
flag as well.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/linux/bpf-cgroup.h | 10 ++++++----
 include/uapi/linux/bpf.h   |  9 +++++++++
 kernel/bpf/cgroup.c        | 29 ++++++++++++++++++++++-------
 kernel/bpf/syscall.c       |  6 +++---
 kernel/cgroup/cgroup.c     | 25 +++++++++++++++++++++++--
 5 files changed, 63 insertions(+), 16 deletions(-)
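
To illustrate the walk described in the commit message, here is a toy userspace model. It is only a sketch under stated assumptions: struct toy_cgroup and toy_count_progs_run() are hypothetical and do not correspond to kernel structures, and the model assumes the first cgroup without the recursive flag still runs its own program before the walk stops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup; not the kernel's struct cgroup. */
struct toy_cgroup {
	struct toy_cgroup *parent;	/* NULL for the root */
	bool has_prog;			/* a program is attached here */
	bool is_recursive;		/* attached with the recursive flag */
};

/*
 * Count the programs that would run for a socket created in @cgrp:
 * start at the innermost cgroup and walk towards the root, stopping
 * after the first cgroup that does not have the recursive flag set.
 */
static int toy_count_progs_run(const struct toy_cgroup *cgrp)
{
	int nr = 0;

	assert(cgrp != NULL);

	while (cgrp) {
		if (cgrp->has_prog)
			nr++;
		if (!cgrp->is_recursive)
			break;
		cgrp = cgrp->parent;
	}

	return nr;
}
```

With a prio cgroup attached non-recursively and mark and dev cgroups nested below it with the flag set, a socket created in dev would run all three programs in this model.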

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index d41d40ac3efd..2d02187f242f 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -23,6 +23,7 @@ struct cgroup_bpf {
 	struct bpf_prog *prog[MAX_BPF_ATTACH_TYPE];
 	struct bpf_prog __rcu *effective[MAX_BPF_ATTACH_TYPE];
 	bool disallow_override[MAX_BPF_ATTACH_TYPE];
+	bool is_recursive[MAX_BPF_ATTACH_TYPE];
 };
 
 void cgroup_bpf_put(struct cgroup *cgrp);
@@ -30,18 +31,19 @@ void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent);
 
 int __cgroup_bpf_update(struct cgroup *cgrp, struct cgroup *parent,
 			struct bpf_prog *prog, enum bpf_attach_type type,
-			bool overridable);
+			u32 flags);
 
 /* Wrapper for __cgroup_bpf_update() protected by cgroup_mutex */
 int cgroup_bpf_update(struct cgroup *cgrp, struct bpf_prog *prog,
-		      enum bpf_attach_type type, bool overridable);
+		      enum bpf_attach_type type, u32 flags);
 
 int __cgroup_bpf_run_filter_skb(struct sock *sk,
 				struct sk_buff *skb,
 				enum bpf_attach_type type);
 
-int __cgroup_bpf_run_filter_sk(struct sock *sk,
+int __cgroup_bpf_run_filter_sk(struct cgroup *cgrp, struct sock *sk,
 			       enum bpf_attach_type type);
+int cgroup_bpf_run_filter_sk(struct sock *sk, enum bpf_attach_type type);
 
 int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
 				     struct bpf_sock_ops_kern *sock_ops,
@@ -74,7 +76,7 @@ int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
 ({									       \
 	int __ret = 0;							       \
 	if (cgroup_bpf_enabled && sk) {					       \
-		__ret = __cgroup_bpf_run_filter_sk(sk,			       \
+		__ret = cgroup_bpf_run_filter_sk(sk,			       \
 						 BPF_CGROUP_INET_SOCK_CREATE); \
 	}								       \
 	__ret;								       \
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f71f5e07d82d..595e31b30f23 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -151,6 +151,15 @@ enum bpf_attach_type {
  */
 #define BPF_F_ALLOW_OVERRIDE	(1U << 0)
 
+/* If BPF_F_RECURSIVE flag is used in BPF_PROG_ATTACH command
+ * cgroups are walked recursively back to the root cgroup or the
+ * first cgroup without the flag set running any program attached.
+ * Once the flag is set, it MUST be set for all descendant cgroups.
+ */
+#define BPF_F_RECURSIVE		(1U << 1)
+
+#define BPF_F_ALL_ATTACH_FLAGS  (BPF_F_ALLOW_OVERRIDE | BPF_F_RECURSIVE)
+
 /* If BPF_F_STRICT_ALIGNMENT is used in BPF_PROG_LOAD command, the
  * verifier will perform strict alignment checking as if the kernel
  * has been built with CONFIG_EFFICIENT_UNALIGNED_ACCESS not set,
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 546113430049..eb1f436c18fb 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -47,10 +47,16 @@ void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent)
 	unsigned int type;
 
 	for (type = 0; type < ARRAY_SIZE(cgrp->bpf.effective); type++) {
-		struct bpf_prog *e;
+		struct bpf_prog *e = NULL;
+
+		/* do not need to set effective program if cgroups are
+		 * walked recursively
+		 */
+		cgrp->bpf.is_recursive[type] = parent->bpf.is_recursive[type];
+		if (!cgrp->bpf.is_recursive[type])
+			e = rcu_dereference_protected(parent->bpf.effective[type],
+						      lockdep_is_held(&cgroup_mutex));
 
-		e = rcu_dereference_protected(parent->bpf.effective[type],
-					      lockdep_is_held(&cgroup_mutex));
 		rcu_assign_pointer(cgrp->bpf.effective[type], e);
 		cgrp->bpf.disallow_override[type] = parent->bpf.disallow_override[type];
 	}
@@ -85,8 +91,12 @@ void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent)
  */
 int __cgroup_bpf_update(struct cgroup *cgrp, struct cgroup *parent,
 			struct bpf_prog *prog, enum bpf_attach_type type,
-			bool new_overridable)
+			u32 flags)
 {
+	bool new_overridable = flags & BPF_F_ALLOW_OVERRIDE;
+	/* initial state inherited from parent */
+	bool curr_recursive = cgrp->bpf.is_recursive[type];
+	bool new_recursive = flags & BPF_F_RECURSIVE;
 	struct bpf_prog *old_prog, *effective = NULL;
 	struct cgroup_subsys_state *pos;
 	bool overridable = true;
@@ -109,6 +119,12 @@ int __cgroup_bpf_update(struct cgroup *cgrp, struct cgroup *parent,
 		 */
 		return -EPERM;
 
+	if (prog && curr_recursive && !new_recursive)
+		/* if a parent has recursive prog attached, only
+		 * allow recursive programs in descendent cgroup
+		 */
+		return -EINVAL;
+
 	old_prog = cgrp->bpf.prog[type];
 
 	if (prog) {
@@ -139,6 +155,7 @@ int __cgroup_bpf_update(struct cgroup *cgrp, struct cgroup *parent,
 			rcu_assign_pointer(desc->bpf.effective[type],
 					   effective);
 			desc->bpf.disallow_override[type] = !overridable;
+			desc->bpf.is_recursive[type] = new_recursive;
 		}
 	}
 
@@ -217,14 +234,12 @@ EXPORT_SYMBOL(__cgroup_bpf_run_filter_skb);
  * This function will return %-EPERM if any if an attached program was found
  * and if it returned != 1 during execution. In all other cases, 0 is returned.
  */
-int __cgroup_bpf_run_filter_sk(struct sock *sk,
+int __cgroup_bpf_run_filter_sk(struct cgroup *cgrp, struct sock *sk,
 			       enum bpf_attach_type type)
 {
-	struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
 	struct bpf_prog *prog;
 	int ret = 0;
 
-
 	rcu_read_lock();
 
 	prog = rcu_dereference(cgrp->bpf.effective[type]);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index d5774a6851f1..a1ab5dbaae89 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1187,7 +1187,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	if (CHECK_ATTR(BPF_PROG_ATTACH))
 		return -EINVAL;
 
-	if (attr->attach_flags & ~BPF_F_ALLOW_OVERRIDE)
+	if (attr->attach_flags & ~BPF_F_ALL_ATTACH_FLAGS)
 		return -EINVAL;
 
 	switch (attr->attach_type) {
@@ -1222,7 +1222,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	}
 
 	ret = cgroup_bpf_update(cgrp, prog, attr->attach_type,
-				attr->attach_flags & BPF_F_ALLOW_OVERRIDE);
+				attr->attach_flags);
 	if (ret)
 		bpf_prog_put(prog);
 	cgroup_put(cgrp);
@@ -1252,7 +1252,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 		if (IS_ERR(cgrp))
 			return PTR_ERR(cgrp);
 
-		ret = cgroup_bpf_update(cgrp, NULL, attr->attach_type, false);
+		ret = cgroup_bpf_update(cgrp, NULL, attr->attach_type, 0);
 		cgroup_put(cgrp);
 		break;
 
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index df2e0f14a95d..27a4f14435a3 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5176,14 +5176,35 @@ void cgroup_sk_free(struct sock_cgroup_data *skcd)
 
 #ifdef CONFIG_CGROUP_BPF
 int cgroup_bpf_update(struct cgroup *cgrp, struct bpf_prog *prog,
-		      enum bpf_attach_type type, bool overridable)
+		      enum bpf_attach_type type, u32 flags)
 {
 	struct cgroup *parent = cgroup_parent(cgrp);
 	int ret;
 
 	mutex_lock(&cgroup_mutex);
-	ret = __cgroup_bpf_update(cgrp, parent, prog, type, overridable);
+	ret = __cgroup_bpf_update(cgrp, parent, prog, type, flags);
 	mutex_unlock(&cgroup_mutex);
 	return ret;
 }
+
+int cgroup_bpf_run_filter_sk(struct sock *sk,
+			     enum bpf_attach_type type)
+{
+	struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
+	int ret = 0;
+
+	while (cgrp) {
+		ret = __cgroup_bpf_run_filter_sk(cgrp, sk, type);
+		if (ret)
+			break;
+
+		if (!cgrp->bpf.is_recursive[type])
+			break;
+
+		cgrp = cgroup_parent(cgrp);
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(cgroup_bpf_run_filter_sk);
 #endif /* CONFIG_CGROUP_BPF */
-- 
2.1.4

^ permalink raw reply related

* [PATCH v2 net-next 0/8] bpf: Add option to set mark and priority in cgroup sock programs
From: David Ahern @ 2017-08-25 19:05 UTC (permalink / raw)
  To: netdev, daniel, ast, tj, davem; +Cc: David Ahern

Add option to set mark and priority in addition to bound device for newly
created sockets. Also, allow the bpf programs to use the get_current_uid_gid
helper, meaning socket marks, priority and device can be set based on the
uid/gid of the running process.

For flexibility in deploying these programs, an option is added to allow
cgroups to be walked from current to root, running any program attached.
This allows one cgroup level to control the device a socket is bound to
(e.g., a VRF) while other cgroups can be used to set socket marks and priority.

Sample programs are updated to demonstrate the new options.

v2
- added flag to control recursive behavior as requested by Alexei
- added comment to sock_filter_func_proto regarding use of
  get_current_uid_gid helper
- updated test programs for recursive option

David Ahern (8):
  bpf: Add support for recursively running cgroup sock filters
  bpf: Add mark and priority to sock options that can be set
  bpf: Allow cgroup sock filters to use get_current_uid_gid helper
  samples/bpf: Update sock test to allow setting mark and priority
  samples/bpf: Add detach option to test_cgrp2_sock
  samples/bpf: Add option to dump socket settings
  samples/bpf: Add test case for nested socket options
  samples/bpf: Update cgroup socket examples to use uid gid helper

 include/linux/bpf-cgroup.h      |  10 +-
 include/uapi/linux/bpf.h        |  11 ++
 kernel/bpf/cgroup.c             |  29 +++--
 kernel/bpf/syscall.c            |   6 +-
 kernel/cgroup/cgroup.c          |  25 +++-
 net/core/filter.c               |  42 ++++++-
 samples/bpf/sock_flags_kern.c   |   5 +
 samples/bpf/test_cgrp2_sock.c   | 258 ++++++++++++++++++++++++++++++++++++----
 samples/bpf/test_cgrp2_sock.sh  |   2 +-
 samples/bpf/test_cgrp2_sock3.sh | 162 +++++++++++++++++++++++++
 10 files changed, 506 insertions(+), 44 deletions(-)
 create mode 100755 samples/bpf/test_cgrp2_sock3.sh

-- 
2.1.4

^ permalink raw reply

* Re: [PATCH net] ptr_ring: use kmalloc_array()
From: Eric Dumazet @ 2017-08-25 18:57 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: David Miller, netdev, Jason Wang
In-Reply-To: <20170825205653-mutt-send-email-mst@kernel.org>

On Fri, 2017-08-25 at 21:03 +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 16, 2017 at 10:36:47AM -0700, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > As found by syzkaller, malicious users can set whatever tx_queue_len
> > on a tun device and eventually crash the kernel.
> > 
> > Lets remove the ALIGN(XXX, SMP_CACHE_BYTES) thing since a small
> > ring buffer is not fast anyway.
> 
> I'm not sure it's worth changing for small rings.
> 
> Does kmalloc_array guarantee cache line alignment for big buffers
> then? If the ring is misaligned it will likely cause false sharing
> as it's designed to be accessed from two CPUs.

I specifically said that in the changelog:

"since a small ring buffer is not fast anyway."

If one user sets up a pathologically small ring buffer, the kernel should
not try to fix it.

In this case, you would have to set up a ring of 2 or 4 slots to
eventually hit false sharing.
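
To make the "2 or 4 slots" arithmetic concrete, here is a small sketch.
It assumes 64-byte cache lines (the kernel's SMP_CACHE_BYTES varies by
architecture) and takes the pointer size as a parameter. The names
ring_bytes() and ring_smaller_than_line() are made up for illustration.
The sketch only compares sizes and says nothing about how the allocator
actually places the buffer.

```c
#include <stddef.h>

/* Assumed cache line size for this example only. */
#define ASSUMED_CACHE_LINE 64u

/* Bytes occupied by a ring of nslots pointer-sized entries. */
static size_t ring_bytes(size_t nslots, size_t ptr_size)
{
	return nslots * ptr_size;
}

/* Nonzero when the whole ring is smaller than one cache line, i.e. when
 * an unpadded allocation could end up sharing its line with other data.
 */
static int ring_smaller_than_line(size_t nslots, size_t ptr_size)
{
	return ring_bytes(nslots, ptr_size) < ASSUMED_CACHE_LINE;
}
```

With 8-byte pointers, rings of 2 and 4 slots take 16 and 32 bytes and are
smaller than the assumed line, while a ring of 8 slots already fills it.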

^ permalink raw reply

* [PULL] vhost: cleanups and fixes
From: Michael S. Tsirkin @ 2017-08-25 18:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: kvm, mst, netdev, linux-kernel, stable, virtualization, stefanha,
	yasu.isimatu, hch

The following changes since commit 14ccee78fc82f5512908f4424f541549a5705b89:

  Linux 4.13-rc6 (2017-08-20 14:13:52 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to ba74b6f7fcc07355d087af6939712eed4a454821:

  virtio_pci: fix cpu affinity support (2017-08-25 21:38:26 +0300)

----------------------------------------------------------------
virtio: bugfix

Fixes two obvious bugs in virtio pci.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

----------------------------------------------------------------
Christoph Hellwig (1):
      virtio_pci: fix cpu affinity support

Stefan Hajnoczi (1):
      virtio_blk: fix incorrect message when disk is resized

 drivers/block/virtio_blk.c         | 16 ++++++++++------
 drivers/virtio/virtio_pci_common.c | 10 +++++++---
 2 files changed, 17 insertions(+), 9 deletions(-)

^ permalink raw reply

* Re: [PATCH net] ptr_ring: use kmalloc_array()
From: Michael S. Tsirkin @ 2017-08-25 18:03 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Jason Wang
In-Reply-To: <1502905007.4936.133.camel@edumazet-glaptop3.roam.corp.google.com>

On Wed, Aug 16, 2017 at 10:36:47AM -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> As found by syzkaller, malicious users can set whatever tx_queue_len
> on a tun device and eventually crash the kernel.
> 
> Lets remove the ALIGN(XXX, SMP_CACHE_BYTES) thing since a small
> ring buffer is not fast anyway.

I'm not sure it's worth changing for small rings.

Does kmalloc_array guarantee cache line alignment for big buffers
then? If the ring is misaligned it will likely cause false sharing
as it's designed to be accessed from two CPUs.

> Fixes: 2e0ab8ca83c1 ("ptr_ring: array based FIFO for pointers")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> ---
>  include/linux/ptr_ring.h  |    9 +++++----
>  include/linux/skb_array.h |    3 ++-
>  2 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> index d8c97ec8a8e6..37b4bb2545b3 100644
> --- a/include/linux/ptr_ring.h
> +++ b/include/linux/ptr_ring.h
> @@ -436,9 +436,9 @@ static inline int ptr_ring_consume_batched_bh(struct ptr_ring *r,
>  	__PTR_RING_PEEK_CALL_v; \
>  })
>  
> -static inline void **__ptr_ring_init_queue_alloc(int size, gfp_t gfp)
> +static inline void **__ptr_ring_init_queue_alloc(unsigned int size, gfp_t gfp)
>  {
> -	return kzalloc(ALIGN(size * sizeof(void *), SMP_CACHE_BYTES), gfp);
> +	return kcalloc(size, sizeof(void *), gfp);
>  }
>  
>  static inline void __ptr_ring_set_size(struct ptr_ring *r, int size)
> @@ -582,7 +582,8 @@ static inline int ptr_ring_resize(struct ptr_ring *r, int size, gfp_t gfp,
>   * In particular if you consume ring in interrupt or BH context, you must
>   * disable interrupts/BH when doing so.
>   */
> -static inline int ptr_ring_resize_multiple(struct ptr_ring **rings, int nrings,
> +static inline int ptr_ring_resize_multiple(struct ptr_ring **rings,
> +					   unsigned int nrings,
>  					   int size,
>  					   gfp_t gfp, void (*destroy)(void *))
>  {
> @@ -590,7 +591,7 @@ static inline int ptr_ring_resize_multiple(struct ptr_ring **rings, int nrings,
>  	void ***queues;
>  	int i;
>  
> -	queues = kmalloc(nrings * sizeof *queues, gfp);
> +	queues = kmalloc_array(nrings, sizeof(*queues), gfp);
>  	if (!queues)
>  		goto noqueues;
>  
> diff --git a/include/linux/skb_array.h b/include/linux/skb_array.h
> index 35226cd4efb0..8621ffdeecbf 100644
> --- a/include/linux/skb_array.h
> +++ b/include/linux/skb_array.h
> @@ -193,7 +193,8 @@ static inline int skb_array_resize(struct skb_array *a, int size, gfp_t gfp)
>  }
>  
>  static inline int skb_array_resize_multiple(struct skb_array **rings,
> -					    int nrings, int size, gfp_t gfp)
> +					    int nrings, unsigned int size,
> +					    gfp_t gfp)
>  {
>  	BUILD_BUG_ON(offsetof(struct skb_array, ring));
>  	return ptr_ring_resize_multiple((struct ptr_ring **)rings,
> 

^ permalink raw reply

* Re: Permissions for eBPF objects
From: Jeffrey Vander Stoep via Selinux @ 2017-08-25 18:03 UTC (permalink / raw)
  To: Chenbo Feng, SELinux, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <CABXk95ATb_AFk+4GX9Xw+HEU6No8irb0mOoLE9O4EBuLAgA-1w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1207 bytes --]

Disregard this email. Re-sending in plain-text mode to prevent rejection by
the netdev list.

On Fri, Aug 25, 2017 at 10:56 AM Jeffrey Vander Stoep <jeffv-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
wrote:

> I’d like to get your thoughts on adding LSM permission checks on BPF
> objects.
>
> By default, the ability to create and use eBPF maps/programs requires
> CAP_SYS_ADMIN [1]. Alternatively, all processes can be granted access to
> bpf() functions. This seems like poor granularity. [2]
>
> Like files and sockets, eBPF maps and programs can be passed between
> processes by FD and have a number of functions that map cleanly to
> permissions.
>
> Let me know what you think. Are there simpler alternative approaches that
> we haven’t considered?
>
> Thanks!
> Jeff
>
> [1] http://man7.org/linux/man-pages/man2/bpf.2.html NOTES section
> [2] We are considering eBPF for network filtering by netd. Giving netd
> CAP_SYS_ADMIN would considerably increase netd’s privileges. Alternatively
> allowing all processes permission to use bpf() goes against the principle
> of least privilege exposing a lot of kernel attack surface to processes
> that do not actually need it.
>

[-- Attachment #2: Type: text/html, Size: 1660 bytes --]

^ permalink raw reply

