Netdev List
* Re: [net-next V4 PATCH 1/5] bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP
From: John Fastabend @ 2017-10-05 18:01 UTC (permalink / raw)
  To: Alexei Starovoitov, Jesper Dangaard Brouer
  Cc: netdev, jakub.kicinski, Michael S. Tsirkin, pavel.odintsov,
	Jason Wang, mchan, peter.waskiewicz.jr, Daniel Borkmann,
	Andy Gospodarek
In-Reply-To: <20171004190201.5no5mrmkko43cvv2@ast-mbp>

On 10/04/2017 12:02 PM, Alexei Starovoitov wrote:
> On Wed, Oct 04, 2017 at 02:03:45PM +0200, Jesper Dangaard Brouer wrote:
>> The 'cpumap' is primarily used as a backend map for the XDP BPF helper
>> call bpf_redirect_map() and the XDP_REDIRECT action, like 'devmap'.
>>
>> This patch implements the main part of the map.  It is not connected to
>> the XDP redirect system yet, and no SKB allocations are done yet.
>>
>> The main concern in this patch is to ensure the datapath can run
>> without any locking.  This adds complexity to the setup and tear-down
>> procedure, whose assumptions are documented extra carefully in the
>> code comments.
>>
>> V2:
>>  - make sure array isn't larger than NR_CPUS
>>  - make sure each CPU added is a valid possible CPU
>>
>> V3: fix nitpicks from Jakub Kicinski <kubakici@wp.pl>
>>
>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ...
>> +static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
>> +{
>> +	struct bpf_cpu_map *cmap;
>> +	u64 cost;
>> +	int err;
>> +
>> +	/* check sanity of attributes */
>> +	if (attr->max_entries == 0 || attr->key_size != 4 ||
>> +	    attr->value_size != 4 || attr->map_flags & ~BPF_F_NUMA_NODE)
>> +		return ERR_PTR(-EINVAL);
>> +
>> +	cmap = kzalloc(sizeof(*cmap), GFP_USER);
>> +	if (!cmap)
>> +		return ERR_PTR(-ENOMEM);
> 
> I just noticed that nothing here, nor in DEVMAP/SOCKMAP,
> prevents an unprivileged user from creating them.
> I'm not sure it was intentional for DEVMAP/SOCKMAP.
> For CPUMAP I'd suggest restricting it to root, since it is
> supposed to operate with XDP only, which is root-only anyway.
> Note, lpm and lru maps are cap_sys_admin only already.
> 

For DEVMAP I think the same argument applies. DEVMAP is supposed
to operate with XDP only, which is CAP_NET_ADMIN restricted, so
we should restrict DEVMAP as well.

In the SOCKMAP case, although the map can be created, programs
cannot be attached. So I'll restrict it to CAP_NET_ADMIN as well
until someone has a clear use case for allowing it. I don't have
a use case for non-CAP_NET_ADMIN usage, and it's easier to relax
restrictions later than to add them.
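
As a rough illustration of the restriction discussed above, here is a
userspace model of a capability gate in a map allocation path. This is
only a sketch under stated assumptions: struct fake_cred, fake_capable()
and map_alloc_check() are hypothetical stand-ins invented for this
example, and the capability bit values are local to it. In the kernel,
the equivalent test would presumably be a capable() call early in the
map's alloc callback.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical capability bits for this sketch only; the real
 * definitions live in <linux/capability.h>. */
enum fake_cap {
	FAKE_CAP_NET_ADMIN = 1 << 0,
	FAKE_CAP_SYS_ADMIN = 1 << 1,
};

/* Hypothetical stand-in for the caller's credentials. */
struct fake_cred {
	unsigned int caps;
};

static bool fake_capable(const struct fake_cred *cred, enum fake_cap cap)
{
	return (cred->caps & (unsigned int)cap) != 0;
}

/* Returns 0 if map creation may proceed, -EPERM otherwise. */
static int map_alloc_check(const struct fake_cred *cred, enum fake_cap needed)
{
	if (!fake_capable(cred, needed))
		return -EPERM;
	return 0;
}
```

Starting with the check in place and dropping it later only widens what
callers may do, which is why relaxing is the easier direction.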

I have a couple of fixes for sockmap under test, so I'll add these
patches as well. I should have the set ready shortly, in a few days.

Thanks,
John


* Re: [PATCH 00/18] use ARRAY_SIZE macro
From: J. Bruce Fields @ 2017-10-05 17:57 UTC (permalink / raw)
  To: Jérémy Lefaure
  Cc: Greg KH, Tobin C. Harding, alsa-devel, nouveau, dri-devel,
	dm-devel, brcm80211-dev-list, devel, linux-scsi, linux-rdma,
	amd-gfx, Jason Gunthorpe, linux-acpi, linux-video,
	intel-wired-lan, linux-media, intel-gfx, ecryptfs, linux-nfs,
	linux-raid, openipmi-developer, intel-gvt-dev, devel,
	brcm80211-dev-list.pdl, netdev, linux-usb
In-Reply-To: <20171002213312.3f904290@blatinox-laptop.localdomain>

On Mon, Oct 02, 2017 at 09:33:12PM -0400, Jérémy Lefaure wrote:
> On Mon, 2 Oct 2017 15:22:24 -0400
> bfields@fieldses.org (J. Bruce Fields) wrote:
> 
> > Mainly I'd just like to know which you're asking for.  Do you want me to
> > apply this, or to ACK it so someone else can?  If it's sent as a series
> > I tend to assume the latter.
> > 
> > But in this case I'm assuming it's the former, so I'll pick up the nfsd
> > one....
> Could you apply the NFSD patch ("nfsd: use ARRAY_SIZE") with the
> Reviewed-by: Jeff Layton <jlayton@redhat.com> tag, please?
> 
> This patch is an individual patch and it should not have been sent in a
> series (sorry about that).

Applying for 4.15, thanks.

--b.


* Re: [PATCH net-next v3 1/2] libbpf: parse maps sections of varying size
From: Jesper Dangaard Brouer @ 2017-10-05 17:52 UTC (permalink / raw)
  To: Craig Gallek
  Cc: Alexei Starovoitov, Daniel Borkmann, David S . Miller,
	Chonggang Li, netdev, brouer
In-Reply-To: <20171005144158.14860-2-kraigatgoog@gmail.com>


On Thu,  5 Oct 2017 10:41:57 -0400 Craig Gallek <kraigatgoog@gmail.com> wrote:

> From: Craig Gallek <kraig@google.com>
> 
> This library previously assumed a fixed-size map options structure.
> Any new options were ignored.  In order to allow the options structure
> to grow and to support parsing older programs, this patch updates
> the maps section parsing to handle varying sizes.
> 
> Object files with maps sections smaller than expected will have the new
> fields initialized to zero.  Object files which have larger than expected
> maps sections will be rejected unless all of the unrecognized data is zero.
> 
> This change still assumes that each map definition in the maps section
> is the same size.
> 
> Signed-off-by: Craig Gallek <kraig@google.com>
> ---
>  tools/lib/bpf/libbpf.c | 70 +++++++++++++++++++++++++++++---------------------
>  1 file changed, 41 insertions(+), 29 deletions(-)

Thank you for working on this! :-)

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* [PATCH v2] net/mac80211/mesh_plink: Convert timers to use timer_setup()
From: Kees Cook @ 2017-10-05 17:39 UTC (permalink / raw)
  To: Johannes Berg
  Cc: David S. Miller, linux-wireless, netdev, Thomas Gleixner,
	linux-kernel

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly. This requires adding a pointer back
to the sta_info since container_of() can't resolve the sta_info.

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
This requires commit 686fef928bba ("timer: Prepare to change timer
callback argument type") in v4.14-rc3, but should be otherwise
stand-alone.

v2:
- make mesh_plink_timer non-static and use it in timer_setup() call directly.
---
 net/mac80211/mesh.h       |  1 +
 net/mac80211/mesh_plink.c | 10 ++++------
 net/mac80211/sta_info.c   |  4 +++-
 net/mac80211/sta_info.h   |  2 ++
 4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/net/mac80211/mesh.h b/net/mac80211/mesh.h
index 7e5f271e3c30..465b7853edc0 100644
--- a/net/mac80211/mesh.h
+++ b/net/mac80211/mesh.h
@@ -275,6 +275,7 @@ void mesh_neighbour_update(struct ieee80211_sub_if_data *sdata,
 			   u8 *hw_addr, struct ieee802_11_elems *ie);
 bool mesh_peer_accepts_plinks(struct ieee802_11_elems *ie);
 u32 mesh_accept_plinks_update(struct ieee80211_sub_if_data *sdata);
+void mesh_plink_timer(struct timer_list *t);
 void mesh_plink_broken(struct sta_info *sta);
 u32 mesh_plink_deactivate(struct sta_info *sta);
 u32 mesh_plink_open(struct sta_info *sta);
diff --git a/net/mac80211/mesh_plink.c b/net/mac80211/mesh_plink.c
index f69c6c38ca43..e79adb4164f3 100644
--- a/net/mac80211/mesh_plink.c
+++ b/net/mac80211/mesh_plink.c
@@ -604,8 +604,9 @@ void mesh_neighbour_update(struct ieee80211_sub_if_data *sdata,
 	ieee80211_mbss_info_change_notify(sdata, changed);
 }
 
-static void mesh_plink_timer(unsigned long data)
+void mesh_plink_timer(struct timer_list *t)
 {
+	struct mesh_sta *mesh = from_timer(mesh, t, plink_timer);
 	struct sta_info *sta;
 	u16 reason = 0;
 	struct ieee80211_sub_if_data *sdata;
@@ -617,7 +618,7 @@ static void mesh_plink_timer(unsigned long data)
 	 * del_timer_sync() this timer after having made sure
 	 * it cannot be readded (by deleting the plink.)
 	 */
-	sta = (struct sta_info *) data;
+	sta = mesh->plink_sta;
 
 	if (sta->sdata->local->quiescing)
 		return;
@@ -697,11 +698,8 @@ static void mesh_plink_timer(unsigned long data)
 
 static inline void mesh_plink_timer_set(struct sta_info *sta, u32 timeout)
 {
-	sta->mesh->plink_timer.expires = jiffies + msecs_to_jiffies(timeout);
-	sta->mesh->plink_timer.data = (unsigned long) sta;
-	sta->mesh->plink_timer.function = mesh_plink_timer;
 	sta->mesh->plink_timeout = timeout;
-	add_timer(&sta->mesh->plink_timer);
+	mod_timer(&sta->mesh->plink_timer, jiffies + msecs_to_jiffies(timeout));
 }
 
 static bool llid_in_use(struct ieee80211_sub_if_data *sdata,
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 69615016d5bf..6c254a9d5f11 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -329,10 +329,12 @@ struct sta_info *sta_info_alloc(struct ieee80211_sub_if_data *sdata,
 		sta->mesh = kzalloc(sizeof(*sta->mesh), gfp);
 		if (!sta->mesh)
 			goto free;
+		sta->mesh->plink_sta = sta;
 		spin_lock_init(&sta->mesh->plink_lock);
 		if (ieee80211_vif_is_mesh(&sdata->vif) &&
 		    !sdata->u.mesh.user_mpm)
-			init_timer(&sta->mesh->plink_timer);
+			timer_setup(&sta->mesh->plink_timer, mesh_plink_timer,
+				    0);
 		sta->mesh->nonpeer_pm = NL80211_MESH_POWER_ACTIVE;
 	}
 #endif
diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
index 3acbdfa9f649..21d9760ce5c3 100644
--- a/net/mac80211/sta_info.h
+++ b/net/mac80211/sta_info.h
@@ -344,6 +344,7 @@ DECLARE_EWMA(mesh_fail_avg, 20, 8)
  * @plink_state: peer link state
  * @plink_timeout: timeout of peer link
  * @plink_timer: peer link watch timer
+ * @plink_sta: peer link watch timer's sta_info
  * @t_offset: timing offset relative to this host
  * @t_offset_setpoint: reference timing offset of this sta to be used when
  * 	calculating clockdrift
@@ -356,6 +357,7 @@ DECLARE_EWMA(mesh_fail_avg, 20, 8)
  */
 struct mesh_sta {
 	struct timer_list plink_timer;
+	struct sta_info *plink_sta;
 
 	s64 t_offset;
 	s64 t_offset_setpoint;
-- 
2.7.4


-- 
Kees Cook
Pixel Security


* Re: [PATCH] net/mac80211/mesh_plink: Convert timers to use
From: Kees Cook @ 2017-10-05 17:27 UTC (permalink / raw)
  To: Johannes Berg
  Cc: LKML, David S. Miller, linux-wireless, Network Development,
	Thomas Gleixner
In-Reply-To: <1507186052.2387.11.camel@sipsolutions.net>

On Wed, Oct 4, 2017 at 11:47 PM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> On Wed, 2017-10-04 at 17:49 -0700, Kees Cook wrote:
>> In preparation for unconditionally passing the struct timer_list
>> pointer to all timer callbacks, switch to using the new timer_setup()
>> and from_timer() to pass the timer pointer explicitly. This requires
>> adding a pointer back to the sta_info since container_of() can't
>> resolve the sta_info.
>
> The subject seems to be lacking something ... :-)

Oh wonderful, all the subjects are cut off. *sigh* I wonder which
piece of my workflow broke that...

>> This requires commit 686fef928bba ("timer: Prepare to change timer
>> callback argument type") in v4.14-rc3, but should be otherwise
>> stand-alone.
>
> I still can't apply that because that's not in net-next right now.

Okay, I'll see if Dave brings that into net-next and resend after that.

>>  static inline void mesh_plink_timer_set(struct sta_info *sta, u32
>> timeout)
>>  {
>>       sta->mesh->plink_timer.expires = jiffies +
>> msecs_to_jiffies(timeout);
>> -     sta->mesh->plink_timer.data = (unsigned long) sta;
>> -     sta->mesh->plink_timer.function = mesh_plink_timer;
>> +     sta->mesh->plink_sta = sta;
>> +     sta->mesh->plink_timer.function =
>> (TIMER_FUNC_TYPE)mesh_plink_timer;
>>       sta->mesh->plink_timeout = timeout;
>>       add_timer(&sta->mesh->plink_timer);
>
> Wouldn't it be better to convert this to timer_setup() now?

The problem is that plink_timer is used in several places, and it's
originally initialized in net/mac80211/sta_info.c. The call to
mesh_plink_timer_set() does an update of the function field, so it
didn't look like it could get merged with the timer_setup(), but in
looking again, it seems that this is the _only_ update to
plink_timer.function, so it could probably get collapsed into the
timer_setup() call.

> That add_timer() should probably also be mod_timer() anyway?

Agreed. I'd avoided making those changes in most places, but I can do it here.

>> diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
>> index 69615016d5bf..5e5de9455e4e 100644
>> --- a/net/mac80211/sta_info.c
>> +++ b/net/mac80211/sta_info.c
>> @@ -332,7 +332,7 @@ struct sta_info *sta_info_alloc(struct
>> ieee80211_sub_if_data *sdata,
>>               spin_lock_init(&sta->mesh->plink_lock);
>>               if (ieee80211_vif_is_mesh(&sdata->vif) &&
>>                   !sdata->u.mesh.user_mpm)
>> -                     init_timer(&sta->mesh->plink_timer);
>> +                     timer_setup(&sta->mesh->plink_timer, NULL,
>> 0);
>>               sta->mesh->nonpeer_pm = NL80211_MESH_POWER_ACTIVE;
>>       }
>
> You just have to make mesh_plink_timer() non-static, put a prototype
> into mesh.h and then you can use the proper timer_setup() here with the
> function?
>
> Also, the sta->mesh->plink_sta assignment should be here I'd say, no
> point rewriting it all the time.

Sounds good. I'll get it cleaned up.
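
For readers following along, the back-pointer arrangement discussed
above can be sketched in userspace C. The structs below are heavily
simplified stand-ins that keep only the fields named in this thread,
and sta_from_timer() is a hypothetical helper name used only here. The
sketch assumes that from_timer() amounts to container_of() on the
embedded timer field.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in; the real struct has more fields. */
struct timer_list {
	unsigned long expires;
};

struct sta_info;

struct mesh_sta {
	struct timer_list plink_timer;	/* embedded, so container_of() works */
	struct sta_info *plink_sta;	/* back-pointer, set once at allocation */
};

struct sta_info {
	struct mesh_sta *mesh;		/* allocated separately, not embedded */
};

/* Timer pointer -> enclosing mesh_sta -> owning sta_info. */
static struct sta_info *sta_from_timer(struct timer_list *t)
{
	struct mesh_sta *mesh = container_of(t, struct mesh_sta, plink_timer);

	return mesh->plink_sta;
}
```

Because sta_info only holds a pointer to mesh_sta, pointer arithmetic
from the timer can reach mesh_sta but not sta_info, hence the extra
field.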

-Kees

-- 
Kees Cook
Pixel Security


* [PATCH] ipv6: fix net.ipv6.conf.all.accept_dad behaviour for real
From: Matteo Croce @ 2017-10-05 17:03 UTC (permalink / raw)
  To: netdev, Erik Kline

Commit 35e015e1f577 ("ipv6: fix net.ipv6.conf.all interface DAD handlers")
was intended to affect accept_dad flag handling in such a way that
DAD operation and mode on a given interface would be selected
according to the maximum value of conf/{all,interface}/accept_dad.

However, addrconf_dad_begin() checks for particular cases in which we
need to skip DAD, and this check was modified in the wrong way.

Namely, it was modified so that, if the accept_dad flag is 0 for the
given interface *or* for all interfaces, DAD would be skipped.

We have instead to skip DAD if accept_dad is 0 for the given interface
*and* for all interfaces.
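
The difference between the two conditions can be checked with a small
standalone sketch. It models only the accept_dad terms of the test in
addrconf_dad_begin(); the other terms (device flags, IFA_F_TENTATIVE,
IFA_F_NODAD) are left out, and the function names are made up for this
illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition introduced by 35e015e1f577: skip if either value is < 1. */
static bool skip_dad_before(int all_accept_dad, int if_accept_dad)
{
	return all_accept_dad < 1 || if_accept_dad < 1;
}

/* Condition after this patch: skip only if both values are < 1. */
static bool skip_dad_after(int all_accept_dad, int if_accept_dad)
{
	return all_accept_dad < 1 && if_accept_dad < 1;
}
```

With conf/all/accept_dad = 1 and the interface value at 0, the old
condition skips DAD while the new one performs it, which matches the
"maximum value" semantics described above.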

Fixes: 35e015e1f577 ("ipv6: fix net.ipv6.conf.all interface DAD handlers")
Acked-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Matteo Croce <mcroce@redhat.com>
---
 net/ipv6/addrconf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 96861c702c06..4a96ebbf8eda 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3820,8 +3820,8 @@ static void addrconf_dad_begin(struct inet6_ifaddr *ifp)
 		goto out;
 
 	if (dev->flags&(IFF_NOARP|IFF_LOOPBACK) ||
-	    dev_net(dev)->ipv6.devconf_all->accept_dad < 1 ||
-	    idev->cnf.accept_dad < 1 ||
+	    (dev_net(dev)->ipv6.devconf_all->accept_dad < 1 &&
+	     idev->cnf.accept_dad < 1) ||
 	    !(ifp->flags&IFA_F_TENTATIVE) ||
 	    ifp->flags & IFA_F_NODAD) {
 		bump_id = ifp->flags & IFA_F_TENTATIVE;
-- 
2.13.6


* [PATCH] ipv6: gso: fix payload length when gso_size is zero
From: Alexey Kodanev @ 2017-10-05 17:06 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, Alexander Duyck, David Miller, Alexey Kodanev

When gso_size is reset to zero for the tail segment in skb_segment(),
ipv6_gso_segment() later computes an incorrect payload_len for that segment.
inet_gso_segment() already checks gso_size before calculating the
payload length, so only the IPv6 part needs fixing.
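
A simplified userspace model of the per-segment calculation may help
show the effect. Everything here is an assumption made for
illustration: struct seg and its field names are not the sk_buff
layout, hdrs_after_ipv6 stands for the header bytes the real code
derives from data_offset and pointer arithmetic, and the fallback
branch (segment length minus the IPv6 header) is assumed rather than
shown in this patch.

```c
#include <assert.h>
#include <stdbool.h>

#define IPV6_HDR_LEN 40u

/* Illustrative per-segment state, not the kernel's sk_buff. */
struct seg {
	unsigned int len;		/* bytes from the IPv6 header to the end */
	unsigned int gso_size;		/* 0 when the segment is not GSO */
	unsigned int hdrs_after_ipv6;	/* header bytes after the IPv6 header */
};

/* check_gso_size = false models the old test, true the patched one. */
static unsigned int payload_len(const struct seg *s, bool gso_partial,
				bool check_gso_size)
{
	bool is_gso = s->gso_size != 0;

	if (gso_partial && (!check_gso_size || is_gso))
		return s->gso_size + s->hdrs_after_ipv6;
	return s->len - IPV6_HDR_LEN;	/* assumed non-partial fallback */
}
```

For a tail segment with gso_size reset to zero, the old test yields
only the header bytes, while the patched test falls through to the
length-based value.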

The issue was found with LTP vxlan & gre tests over ixgbe NIC.

Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
---
 net/ipv6/ip6_offload.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index cdb3728..4a87f94 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -105,7 +105,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 
 	for (skb = segs; skb; skb = skb->next) {
 		ipv6h = (struct ipv6hdr *)(skb_mac_header(skb) + nhoff);
-		if (gso_partial)
+		if (gso_partial && skb_is_gso(skb))
 			payload_len = skb_shinfo(skb)->gso_size +
 				      SKB_GSO_CB(skb)->data_offset +
 				      skb->head - (unsigned char *)(ipv6h + 1);
-- 
1.8.3.1


* Re: [PATCH] isdn/gigaset: Convert timers to use timer_setup()
From: David Miller @ 2017-10-05 16:53 UTC (permalink / raw)
  To: pebolle
  Cc: keescook, isdn, johan, gigaset307x-common, netdev, tglx,
	linux-kernel
In-Reply-To: <1507190336.2167.5.camel@tiscali.nl>

From: Paul Bolle <pebolle@tiscali.nl>
Date: Thu, 05 Oct 2017 09:58:56 +0200

> Hi Kees,
> 
> On Wed, 2017-10-04 at 17:52 -0700, Kees Cook wrote:
>> Also uses kzalloc to replace open-coded field assignments to NULL and zero.
> 
> If I'm allowed to whine (chances that I'm allowed to do that are not so great
> as Dave tends to apply gigaset patches before I even have a chance to look at
> them properly!): I'd prefer it if that was done separately in a preceding
> patch. Would that bother you? 

Agreed, these timer transformations are already exhausting to review without
unrelated modifications sneaking in.


* Re: [PATCH 3/3] ARM: dts: gr-peach: Add ETHER pin group
From: Andrew Lunn @ 2017-10-05 16:48 UTC (permalink / raw)
  To: jacopo mondi; +Cc: Geert Uytterhoeven, Chris Brandt, f.fainelli, netdev
In-Reply-To: <20171005154239.GB19008@w540>

On Thu, Oct 05, 2017 at 05:42:39PM +0200, jacopo mondi wrote:
> Hi Andrew,
> 
> On Thu, Oct 05, 2017 at 03:43:39PM +0200, Andrew Lunn wrote:
> > On Thu, Oct 05, 2017 at 11:39:15AM +0200, jacopo mondi wrote:
> > > Hi Geert
> > >
> > > On Thu, Oct 05, 2017 at 11:09:40AM +0200, Geert Uytterhoeven wrote:
> > > > Hi Jacopo,
> > > >
> > > > On Thu, Oct 5, 2017 at 10:58 AM, Jacopo Mondi <jacopo+renesas@jmondi.org> wrote:
> > > > > Add pin configuration subnode for ETHER pin group and enable the interface.
> > > > >
> > > > > Signed-off-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
> > > >
> > > > Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > > >
> > > > > --- a/arch/arm/boot/dts/r7s72100-gr-peach.dts
> > > > > +++ b/arch/arm/boot/dts/r7s72100-gr-peach.dts
> > > >
> > > > > @@ -88,3 +110,19 @@
> > > > >
> > > > >         status = "okay";
> > > > >  };
> > > > > +
> > > > > +&ether {
> > > > > +       pinctrl-names = "default";
> > > > > +       pinctrl-0 = <&ether_pins>;
> > > > > +
> > > > > +       status = "okay";
> > > > > +
> > > > > +       reset-gpios = <&port4 2 GPIO_ACTIVE_LOW>;
> > > > > +       reset-delay-us = <5>;
> > > >
> > > > I'm afraid the PHY people (not CCed ;-) will want you to move these reset
> > > > properties to the phy subnode these days, despite
> > > > Documentation/devicetree/bindings/net/mdio.txt...
> >
> > Hi Jacopo
> >
> > So what is this reset resetting?
> >
> > The MAC?
> > The PHY?
> 
> The reset line goes from our SoC to LAN8710A PHY chip external reset pin.

So yes, this is a PHY property, and should be in the PHY node.

Documentation/devicetree/bindings/net/mdio.txt does not apply here
anyway. That is for an MDIO binding. This node is an ethernet MAC.

So your binding wants to look something like

        ether: ethernet@e8203000 {
                compatible = "renesas,ether-r7s72100";
                reg = <0xe8203000 0x800>,
                      <0xe8204800 0x200>;
                interrupts = <GIC_SPI 327 IRQ_TYPE_LEVEL_HIGH>;
                clocks = <&mstp7_clks R7S72100_CLK_ETHER>;
                power-domains = <&cpg_clocks>;
                phy-mode = "mii";
		phy-handle = <&phy0>;
                #address-cells = <1>;
                #size-cells = <0>;

                mdio: bus-bus {
                      #address-cells = <1>;
                      #size-cells = <0>;

                      phy0: ethernet-phy@1 {
                            reg = <1>;
                            reset-gpios = <&port4 2 GPIO_ACTIVE_LOW>;
                            reset-delay-us = <5>;
                      };
                };
        };

	Andrew


* [PATCH v3] netfilter: SYNPROXY: fix process non tcp packet bug in {ipv4,ipv6}_synproxy_hook
From: Lin Zhang @ 2017-10-05 16:44 UTC (permalink / raw)
  To: pablo, kadlec, fw, davem, kuznet, yoshfuji
  Cc: netfilter-devel, coreteam, netdev, Lin Zhang

In function {ipv4,ipv6}_synproxy_hook we expect a normal TCP packet,
but the real server may reply with an ICMP error packet related to the
existing TCP conntrack entry, in which case we access the wrong TCP data.

To fix this, check the protocol field and only process TCP traffic.

Signed-off-by: Lin Zhang <xiaolou4617@gmail.com>
---
changelog:

v1 -> v2:
	* we only deal with TCP traffic, per Pablo Neira.
v2 -> v3:
	* fix L4 protocol field parsing error in ipv6
---
 net/ipv4/netfilter/ipt_SYNPROXY.c  | 3 ++-
 net/ipv6/netfilter/ip6t_SYNPROXY.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_SYNPROXY.c b/net/ipv4/netfilter/ipt_SYNPROXY.c
index 811689e..f75fc6b 100644
--- a/net/ipv4/netfilter/ipt_SYNPROXY.c
+++ b/net/ipv4/netfilter/ipt_SYNPROXY.c
@@ -330,7 +330,8 @@ static unsigned int ipv4_synproxy_hook(void *priv,
 	if (synproxy == NULL)
 		return NF_ACCEPT;
 
-	if (nf_is_loopback_packet(skb))
+	if (nf_is_loopback_packet(skb) ||
+	    ip_hdr(skb)->protocol != IPPROTO_TCP)
 		return NF_ACCEPT;
 
 	thoff = ip_hdrlen(skb);
diff --git a/net/ipv6/netfilter/ip6t_SYNPROXY.c b/net/ipv6/netfilter/ip6t_SYNPROXY.c
index a5cd43d..437af8c 100644
--- a/net/ipv6/netfilter/ip6t_SYNPROXY.c
+++ b/net/ipv6/netfilter/ip6t_SYNPROXY.c
@@ -353,7 +353,7 @@ static unsigned int ipv6_synproxy_hook(void *priv,
 	nexthdr = ipv6_hdr(skb)->nexthdr;
 	thoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &nexthdr,
 				 &frag_off);
-	if (thoff < 0)
+	if (thoff < 0 || nexthdr != IPPROTO_TCP)
 		return NF_ACCEPT;
 
 	th = skb_header_pointer(skb, thoff, sizeof(_th), &_th);
-- 
1.8.3.1



* [PATCH net-next v7 1/5] bpf: perf event change needed for subsequent bpf helpers
From: Yonghong Song @ 2017-10-05 16:19 UTC (permalink / raw)
  To: peterz, rostedt, ast, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20171005161923.332790-1-yhs@fb.com>

This patch does not impact existing functionality.
It contains the changes in the perf event area needed for the
subsequent bpf_perf_event_read_value and
bpf_perf_prog_read_value helpers.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/perf_event.h |  7 +++++--
 kernel/bpf/arraymap.c      |  2 +-
 kernel/events/core.c       | 15 +++++++++++++--
 kernel/trace/bpf_trace.c   |  2 +-
 4 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8e22f24..79b18a2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -806,6 +806,7 @@ struct perf_output_handle {
 struct bpf_perf_event_data_kern {
 	struct pt_regs *regs;
 	struct perf_sample_data *data;
+	struct perf_event *event;
 };
 
 #ifdef CONFIG_CGROUP_PERF
@@ -884,7 +885,8 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr,
 				void *context);
 extern void perf_pmu_migrate_context(struct pmu *pmu,
 				int src_cpu, int dst_cpu);
-int perf_event_read_local(struct perf_event *event, u64 *value);
+int perf_event_read_local(struct perf_event *event, u64 *value,
+			  u64 *enabled, u64 *running);
 extern u64 perf_event_read_value(struct perf_event *event,
 				 u64 *enabled, u64 *running);
 
@@ -1286,7 +1288,8 @@ static inline const struct perf_event_attr *perf_event_attrs(struct perf_event *
 {
 	return ERR_PTR(-EINVAL);
 }
-static inline int perf_event_read_local(struct perf_event *event, u64 *value)
+static inline int perf_event_read_local(struct perf_event *event, u64 *value,
+					u64 *enabled, u64 *running)
 {
 	return -EINVAL;
 }
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 98c0f00..68d8666 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -492,7 +492,7 @@ static void *perf_event_fd_array_get_ptr(struct bpf_map *map,
 
 	ee = ERR_PTR(-EOPNOTSUPP);
 	event = perf_file->private_data;
-	if (perf_event_read_local(event, &value) == -EOPNOTSUPP)
+	if (perf_event_read_local(event, &value, NULL, NULL) == -EOPNOTSUPP)
 		goto err_out;
 
 	ee = bpf_event_entry_gen(perf_file, map_file);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6bc21e2..902149f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3684,10 +3684,12 @@ static inline u64 perf_event_count(struct perf_event *event)
  *     will not be local and we cannot read them atomically
  *   - must not have a pmu::count method
  */
-int perf_event_read_local(struct perf_event *event, u64 *value)
+int perf_event_read_local(struct perf_event *event, u64 *value,
+			  u64 *enabled, u64 *running)
 {
 	unsigned long flags;
 	int ret = 0;
+	u64 now;
 
 	/*
 	 * Disabling interrupts avoids all counter scheduling (context
@@ -3718,13 +3720,21 @@ int perf_event_read_local(struct perf_event *event, u64 *value)
 		goto out;
 	}
 
+	now = event->shadow_ctx_time + perf_clock();
+	if (enabled)
+		*enabled = now - event->tstamp_enabled;
 	/*
 	 * If the event is currently on this CPU, its either a per-task event,
 	 * or local to this CPU. Furthermore it means its ACTIVE (otherwise
 	 * oncpu == -1).
 	 */
-	if (event->oncpu == smp_processor_id())
+	if (event->oncpu == smp_processor_id()) {
 		event->pmu->read(event);
+		if (running)
+			*running = now - event->tstamp_running;
+	} else if (running) {
+		*running = event->total_time_running;
+	}
 
 	*value = local64_read(&event->count);
 out:
@@ -8072,6 +8082,7 @@ static void bpf_overflow_handler(struct perf_event *event,
 	struct bpf_perf_event_data_kern ctx = {
 		.data = data,
 		.regs = regs,
+		.event = event,
 	};
 	int ret = 0;
 
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index dc498b6..95888ae 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -275,7 +275,7 @@ BPF_CALL_2(bpf_perf_event_read, struct bpf_map *, map, u64, flags)
 	if (!ee)
 		return -ENOENT;
 
-	err = perf_event_read_local(ee->event, &value);
+	err = perf_event_read_local(ee->event, &value, NULL, NULL);
 	/*
 	 * this api is ugly since we miss [-22..-2] range of valid
 	 * counter values, but that's uapi
-- 
2.9.5


* [PATCH net-next v7 2/5] bpf: add helper bpf_perf_event_read_value for perf event array map
From: Yonghong Song @ 2017-10-05 16:19 UTC (permalink / raw)
  To: peterz, rostedt, ast, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20171005161923.332790-1-yhs@fb.com>

Hardware pmu counters are limited resources. When there are more
pmu based perf events opened than available counters, the kernel will
multiplex these events so each event gets a certain percentage
(but not 100%) of the pmu time. When multiplexing happens,
the number of samples or the counter value will not match what
would be observed without multiplexing. This makes comparison between
different runs difficult.

Typically, the number of samples or counter value should be
normalized before comparing to other experiments. The typical
normalization is done like:
  normalized_num_samples = num_samples * time_enabled / time_running
  normalized_counter_value = counter_value * time_enabled / time_running
where time_enabled is the time enabled for event and time_running is
the time running for event since last normalization.

This patch adds the helper bpf_perf_event_read_value for the kprobe-based
perf event array map, to read the perf counter and enabled/running time.
The enabled/running time is accumulated since the perf event open.
To obtain the scaling factor between two bpf invocations, users
can use cpu_id as the key (which is typical for the perf array usage model)
to remember the previous value and do the calculation inside the
bpf program.
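
The normalization between two readings can be sketched in plain C as
follows. This is a userspace illustration only: struct perf_value
simply redeclares the three fields of the bpf_perf_event_value struct
from this patch, normalized_delta() is a hypothetical name, and
floating point is used for simplicity, which a bpf program itself
could not do.

```c
#include <assert.h>
#include <stdint.h>

/* Same three fields as bpf_perf_event_value in this patch,
 * redeclared so the sketch builds in userspace. */
struct perf_value {
	uint64_t counter;
	uint64_t enabled;
	uint64_t running;
};

/* Scale the counter delta between two readings by enabled/running.
 * Returns 0 if the event did not run at all in the interval. */
static double normalized_delta(const struct perf_value *prev,
			       const struct perf_value *cur)
{
	uint64_t d_counter = cur->counter - prev->counter;
	uint64_t d_enabled = cur->enabled - prev->enabled;
	uint64_t d_running = cur->running - prev->running;

	if (d_running == 0)
		return 0.0;
	return (double)d_counter * (double)d_enabled / (double)d_running;
}
```

When the event ran for the whole interval (enabled and running deltas
equal), the result is just the raw counter delta.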

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
 include/uapi/linux/bpf.h | 21 +++++++++++++++++++--
 kernel/bpf/verifier.c    |  4 +++-
 kernel/trace/bpf_trace.c | 45 +++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 63 insertions(+), 7 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index cb2b9f9..93f3d1f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -641,6 +641,14 @@ union bpf_attr {
  *     @xdp_md: pointer to xdp_md
  *     @delta: An positive/negative integer to be added to xdp_md.data_meta
  *     Return: 0 on success or negative on error
+ *
+ * int bpf_perf_event_read_value(map, flags, buf, buf_size)
+ *     read perf event counter value and perf event enabled/running time
+ *     @map: pointer to perf_event_array map
+ *     @flags: index of event in the map or bitmask flags
+ *     @buf: buf to fill
+ *     @buf_size: size of the buf
+ *     Return: 0 on success or negative error code
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -697,7 +705,8 @@ union bpf_attr {
 	FN(redirect_map),		\
 	FN(sk_redirect_map),		\
 	FN(sock_map_update),		\
-	FN(xdp_adjust_meta),
+	FN(xdp_adjust_meta),		\
+	FN(perf_event_read_value),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -741,7 +750,9 @@ enum bpf_func_id {
 #define BPF_F_ZERO_CSUM_TX		(1ULL << 1)
 #define BPF_F_DONT_FRAGMENT		(1ULL << 2)
 
-/* BPF_FUNC_perf_event_output and BPF_FUNC_perf_event_read flags. */
+/* BPF_FUNC_perf_event_output, BPF_FUNC_perf_event_read and
+ * BPF_FUNC_perf_event_read_value flags.
+ */
 #define BPF_F_INDEX_MASK		0xffffffffULL
 #define BPF_F_CURRENT_CPU		BPF_F_INDEX_MASK
 /* BPF_FUNC_perf_event_output for sk_buff input context. */
@@ -934,4 +945,10 @@ enum {
 #define TCP_BPF_IW		1001	/* Set TCP initial congestion window */
 #define TCP_BPF_SNDCWND_CLAMP	1002	/* Set sndcwnd_clamp */
 
+struct bpf_perf_event_value {
+	__u64 counter;
+	__u64 enabled;
+	__u64 running;
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 52b0223..590125e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1552,7 +1552,8 @@ static int check_map_func_compatibility(struct bpf_map *map, int func_id)
 		break;
 	case BPF_MAP_TYPE_PERF_EVENT_ARRAY:
 		if (func_id != BPF_FUNC_perf_event_read &&
-		    func_id != BPF_FUNC_perf_event_output)
+		    func_id != BPF_FUNC_perf_event_output &&
+		    func_id != BPF_FUNC_perf_event_read_value)
 			goto error;
 		break;
 	case BPF_MAP_TYPE_STACK_TRACE:
@@ -1595,6 +1596,7 @@ static int check_map_func_compatibility(struct bpf_map *map, int func_id)
 		break;
 	case BPF_FUNC_perf_event_read:
 	case BPF_FUNC_perf_event_output:
+	case BPF_FUNC_perf_event_read_value:
 		if (map->map_type != BPF_MAP_TYPE_PERF_EVENT_ARRAY)
 			goto error;
 		break;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 95888ae..0be86cc 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -255,14 +255,14 @@ const struct bpf_func_proto *bpf_get_trace_printk_proto(void)
 	return &bpf_trace_printk_proto;
 }
 
-BPF_CALL_2(bpf_perf_event_read, struct bpf_map *, map, u64, flags)
+static __always_inline int
+get_map_perf_counter(struct bpf_map *map, u64 flags,
+		     u64 *value, u64 *enabled, u64 *running)
 {
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
 	unsigned int cpu = smp_processor_id();
 	u64 index = flags & BPF_F_INDEX_MASK;
 	struct bpf_event_entry *ee;
-	u64 value = 0;
-	int err;
 
 	if (unlikely(flags & ~(BPF_F_INDEX_MASK)))
 		return -EINVAL;
@@ -275,7 +275,15 @@ BPF_CALL_2(bpf_perf_event_read, struct bpf_map *, map, u64, flags)
 	if (!ee)
 		return -ENOENT;
 
-	err = perf_event_read_local(ee->event, &value, NULL, NULL);
+	return perf_event_read_local(ee->event, value, enabled, running);
+}
+
+BPF_CALL_2(bpf_perf_event_read, struct bpf_map *, map, u64, flags)
+{
+	u64 value = 0;
+	int err;
+
+	err = get_map_perf_counter(map, flags, &value, NULL, NULL);
 	/*
 	 * this api is ugly since we miss [-22..-2] range of valid
 	 * counter values, but that's uapi
@@ -293,6 +301,33 @@ static const struct bpf_func_proto bpf_perf_event_read_proto = {
 	.arg2_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_4(bpf_perf_event_read_value, struct bpf_map *, map, u64, flags,
+	   struct bpf_perf_event_value *, buf, u32, size)
+{
+	int err = -EINVAL;
+
+	if (unlikely(size != sizeof(struct bpf_perf_event_value)))
+		goto clear;
+	err = get_map_perf_counter(map, flags, &buf->counter, &buf->enabled,
+				   &buf->running);
+	if (unlikely(err))
+		goto clear;
+	return 0;
+clear:
+	memset(buf, 0, size);
+	return err;
+}
+
+static const struct bpf_func_proto bpf_perf_event_read_value_proto = {
+	.func		= bpf_perf_event_read_value,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg4_type	= ARG_CONST_SIZE,
+};
+
 static DEFINE_PER_CPU(struct perf_sample_data, bpf_sd);
 
 static __always_inline u64
@@ -499,6 +534,8 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
 		return &bpf_perf_event_output_proto;
 	case BPF_FUNC_get_stackid:
 		return &bpf_get_stackid_proto;
+	case BPF_FUNC_perf_event_read_value:
+		return &bpf_perf_event_read_value_proto;
 	default:
 		return tracing_func_proto(func_id);
 	}
-- 
2.9.5


* [PATCH net-next v7 5/5] bpf: add a test case for helper bpf_perf_prog_read_value
From: Yonghong Song @ 2017-10-05 16:19 UTC (permalink / raw)
  To: peterz, rostedt, ast, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20171005161923.332790-1-yhs@fb.com>

The bpf sample program trace_event is enhanced to use the new
helper to print out enabled/running time.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
 samples/bpf/trace_event_kern.c            | 10 ++++++++++
 samples/bpf/trace_event_user.c            | 13 ++++++++-----
 tools/include/uapi/linux/bpf.h            |  3 ++-
 tools/testing/selftests/bpf/bpf_helpers.h |  3 +++
 4 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/samples/bpf/trace_event_kern.c b/samples/bpf/trace_event_kern.c
index 41b6115..a77a583d 100644
--- a/samples/bpf/trace_event_kern.c
+++ b/samples/bpf/trace_event_kern.c
@@ -37,10 +37,14 @@ struct bpf_map_def SEC("maps") stackmap = {
 SEC("perf_event")
 int bpf_prog1(struct bpf_perf_event_data *ctx)
 {
+	char time_fmt1[] = "Time Enabled: %llu, Time Running: %llu";
+	char time_fmt2[] = "Get Time Failed, ErrCode: %d";
 	char fmt[] = "CPU-%d period %lld ip %llx";
 	u32 cpu = bpf_get_smp_processor_id();
+	struct bpf_perf_event_value value_buf;
 	struct key_t key;
 	u64 *val, one = 1;
+	int ret;
 
 	if (ctx->sample_period < 10000)
 		/* ignore warmup */
@@ -54,6 +58,12 @@ int bpf_prog1(struct bpf_perf_event_data *ctx)
 		return 0;
 	}
 
+	ret = bpf_perf_prog_read_value(ctx, (void *)&value_buf, sizeof(struct bpf_perf_event_value));
+	if (!ret)
+		bpf_trace_printk(time_fmt1, sizeof(time_fmt1), value_buf.enabled, value_buf.running);
+	else
+		bpf_trace_printk(time_fmt2, sizeof(time_fmt2), ret);
+
 	val = bpf_map_lookup_elem(&counts, &key);
 	if (val)
 		(*val)++;
diff --git a/samples/bpf/trace_event_user.c b/samples/bpf/trace_event_user.c
index 7bd827b..bf4f1b6 100644
--- a/samples/bpf/trace_event_user.c
+++ b/samples/bpf/trace_event_user.c
@@ -127,6 +127,9 @@ static void test_perf_event_all_cpu(struct perf_event_attr *attr)
 	int *pmu_fd = malloc(nr_cpus * sizeof(int));
 	int i, error = 0;
 
+	/* system wide perf event, no need to inherit */
+	attr->inherit = 0;
+
 	/* open perf_event on all cpus */
 	for (i = 0; i < nr_cpus; i++) {
 		pmu_fd[i] = sys_perf_event_open(attr, -1, i, -1, 0);
@@ -154,6 +157,11 @@ static void test_perf_event_task(struct perf_event_attr *attr)
 {
 	int pmu_fd;
 
+	/* per task perf event, enable inherit so the "dd ..." command can be traced properly.
+	 * Enabling inherit will cause the bpf_perf_prog_read_value helper to fail.
+	 */
+	attr->inherit = 1;
+
 	/* open task bound event */
 	pmu_fd = sys_perf_event_open(attr, 0, -1, -1, 0);
 	if (pmu_fd < 0) {
@@ -175,14 +183,12 @@ static void test_bpf_perf_event(void)
 		.freq = 1,
 		.type = PERF_TYPE_HARDWARE,
 		.config = PERF_COUNT_HW_CPU_CYCLES,
-		.inherit = 1,
 	};
 	struct perf_event_attr attr_type_sw = {
 		.sample_freq = SAMPLE_FREQ,
 		.freq = 1,
 		.type = PERF_TYPE_SOFTWARE,
 		.config = PERF_COUNT_SW_CPU_CLOCK,
-		.inherit = 1,
 	};
 	struct perf_event_attr attr_hw_cache_l1d = {
 		.sample_freq = SAMPLE_FREQ,
@@ -192,7 +198,6 @@ static void test_bpf_perf_event(void)
 			PERF_COUNT_HW_CACHE_L1D |
 			(PERF_COUNT_HW_CACHE_OP_READ << 8) |
 			(PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16),
-		.inherit = 1,
 	};
 	struct perf_event_attr attr_hw_cache_branch_miss = {
 		.sample_freq = SAMPLE_FREQ,
@@ -202,7 +207,6 @@ static void test_bpf_perf_event(void)
 			PERF_COUNT_HW_CACHE_BPU |
 			(PERF_COUNT_HW_CACHE_OP_READ << 8) |
 			(PERF_COUNT_HW_CACHE_RESULT_MISS << 16),
-		.inherit = 1,
 	};
 	struct perf_event_attr attr_type_raw = {
 		.sample_freq = SAMPLE_FREQ,
@@ -210,7 +214,6 @@ static void test_bpf_perf_event(void)
 		.type = PERF_TYPE_RAW,
 		/* Intel Instruction Retired */
 		.config = 0xc0,
-		.inherit = 1,
 	};
 
 	printf("Test HW_CPU_CYCLES\n");
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index cdf6c4f..0894fd2 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -698,7 +698,8 @@ union bpf_attr {
 	FN(sk_redirect_map),		\
 	FN(sock_map_update),		\
 	FN(xdp_adjust_meta),		\
-	FN(perf_event_read_value),
+	FN(perf_event_read_value),	\
+	FN(perf_prog_read_value),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index c15ca83..e25dbf6 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -75,6 +75,9 @@ static int (*bpf_sock_map_update)(void *map, void *key, void *value,
 static int (*bpf_perf_event_read_value)(void *map, unsigned long long flags,
 					void *buf, unsigned int buf_size) =
 	(void *) BPF_FUNC_perf_event_read_value;
+static int (*bpf_perf_prog_read_value)(void *ctx, void *buf,
+				       unsigned int buf_size) =
+	(void *) BPF_FUNC_perf_prog_read_value;
 
 
 /* llvm builtin functions that eBPF C program may use to
-- 
2.9.5


* [PATCH net-next v7 4/5] bpf: add helper bpf_perf_prog_read_value
From: Yonghong Song @ 2017-10-05 16:19 UTC (permalink / raw)
  To: peterz, rostedt, ast, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20171005161923.332790-1-yhs@fb.com>

This patch adds the helper bpf_perf_prog_read_value for perf event based bpf
programs, to read the event counter and enabled/running time.
The enabled/running time is accumulated since the perf event was opened.

The typical use case for a perf event based bpf program is to attach itself
to a single event. In such cases, if the scaling factor between two bpf
invocations is needed, users can save the time values in a map, and use
the saved values together with the current values to calculate
the scaling factor.
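
As a rough illustration of the map-based approach described above, here is a
minimal userspace sketch in plain C. It assumes a snapshot struct with the same
three fields as struct bpf_perf_event_value; the names perf_value_snapshot and
scaled_counter_delta are made up for this example and are not part of the patch.
The sketch scales the counter delta between two reads by the ratio of the
enabled and running time deltas.

```c
#include <stdint.h>

/* Hypothetical snapshot type; the fields mirror struct bpf_perf_event_value. */
struct perf_value_snapshot {
	uint64_t counter;
	uint64_t enabled;
	uint64_t running;
};

/*
 * Scale the counter delta between two snapshots by
 * (enabled delta / running delta). Returns 0 when the event did not
 * run at all in the interval, because no estimate is possible then.
 * Assumption: d_counter * d_enabled fits in 64 bits.
 */
static uint64_t scaled_counter_delta(const struct perf_value_snapshot *prev,
				     const struct perf_value_snapshot *cur)
{
	uint64_t d_counter = cur->counter - prev->counter;
	uint64_t d_enabled = cur->enabled - prev->enabled;
	uint64_t d_running = cur->running - prev->running;

	if (d_running == 0)
		return 0;
	return d_counter * d_enabled / d_running;
}
```

In a bpf program, prev would presumably come from a map lookup and cur from
the helper, with cur written back to the map afterwards.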

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
 include/uapi/linux/bpf.h | 10 +++++++++-
 kernel/trace/bpf_trace.c | 28 ++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 93f3d1f..5dc44d1 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -649,6 +649,13 @@ union bpf_attr {
  *     @buf: buf to fill
  *     @buf_size: size of the buf
  *     Return: 0 on success or negative error code
+ *
+ * int bpf_perf_prog_read_value(ctx, buf, buf_size)
+ *     read perf prog attached perf event counter and enabled/running time
+ *     @ctx: pointer to ctx
+ *     @buf: buf to fill
+ *     @buf_size: size of the buf
+ *     Return: 0 on success or negative error code
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -706,7 +713,8 @@ union bpf_attr {
 	FN(sk_redirect_map),		\
 	FN(sock_map_update),		\
 	FN(xdp_adjust_meta),		\
-	FN(perf_event_read_value),
+	FN(perf_event_read_value),	\
+	FN(perf_prog_read_value),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 0be86cc..04ea531 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -613,6 +613,32 @@ static const struct bpf_func_proto bpf_get_stackid_proto_tp = {
 	.arg3_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_3(bpf_perf_prog_read_value_tp, struct bpf_perf_event_data_kern *, ctx,
+	   struct bpf_perf_event_value *, buf, u32, size)
+{
+	int err = -EINVAL;
+
+	if (unlikely(size != sizeof(struct bpf_perf_event_value)))
+		goto clear;
+	err = perf_event_read_local(ctx->event, &buf->counter, &buf->enabled,
+				    &buf->running);
+	if (unlikely(err))
+		goto clear;
+	return 0;
+clear:
+	memset(buf, 0, size);
+	return err;
+}
+
+static const struct bpf_func_proto bpf_perf_prog_read_value_proto_tp = {
+	.func		= bpf_perf_prog_read_value_tp,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_CONST_SIZE,
+};
+
 static const struct bpf_func_proto *tp_prog_func_proto(enum bpf_func_id func_id)
 {
 	switch (func_id) {
@@ -620,6 +646,8 @@ static const struct bpf_func_proto *tp_prog_func_proto(enum bpf_func_id func_id)
 		return &bpf_perf_event_output_proto_tp;
 	case BPF_FUNC_get_stackid:
 		return &bpf_get_stackid_proto_tp;
+	case BPF_FUNC_perf_prog_read_value:
+		return &bpf_perf_prog_read_value_proto_tp;
 	default:
 		return tracing_func_proto(func_id);
 	}
-- 
2.9.5


* [PATCH net-next v7 3/5] bpf: add a test case for helper bpf_perf_event_read_value
From: Yonghong Song @ 2017-10-05 16:19 UTC (permalink / raw)
  To: peterz, rostedt, ast, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20171005161923.332790-1-yhs@fb.com>

The bpf sample program tracex6 is enhanced to use the new
helper to read enabled/running time as well.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
 samples/bpf/tracex6_kern.c                | 26 ++++++++++++++++++++++++++
 samples/bpf/tracex6_user.c                | 13 ++++++++++++-
 tools/include/uapi/linux/bpf.h            |  3 ++-
 tools/testing/selftests/bpf/bpf_helpers.h |  3 +++
 4 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/tracex6_kern.c b/samples/bpf/tracex6_kern.c
index e7d1803..46c557a 100644
--- a/samples/bpf/tracex6_kern.c
+++ b/samples/bpf/tracex6_kern.c
@@ -15,6 +15,12 @@ struct bpf_map_def SEC("maps") values = {
 	.value_size = sizeof(u64),
 	.max_entries = 64,
 };
+struct bpf_map_def SEC("maps") values2 = {
+	.type = BPF_MAP_TYPE_HASH,
+	.key_size = sizeof(int),
+	.value_size = sizeof(struct bpf_perf_event_value),
+	.max_entries = 64,
+};
 
 SEC("kprobe/htab_map_get_next_key")
 int bpf_prog1(struct pt_regs *ctx)
@@ -37,5 +43,25 @@ int bpf_prog1(struct pt_regs *ctx)
 	return 0;
 }
 
+SEC("kprobe/htab_map_lookup_elem")
+int bpf_prog2(struct pt_regs *ctx)
+{
+	u32 key = bpf_get_smp_processor_id();
+	struct bpf_perf_event_value *val, buf;
+	int error;
+
+	error = bpf_perf_event_read_value(&counters, key, &buf, sizeof(buf));
+	if (error)
+		return 0;
+
+	val = bpf_map_lookup_elem(&values2, &key);
+	if (val)
+		*val = buf;
+	else
+		bpf_map_update_elem(&values2, &key, &buf, BPF_NOEXIST);
+
+	return 0;
+}
+
 char _license[] SEC("license") = "GPL";
 u32 _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/samples/bpf/tracex6_user.c b/samples/bpf/tracex6_user.c
index a05a99a..3341a96 100644
--- a/samples/bpf/tracex6_user.c
+++ b/samples/bpf/tracex6_user.c
@@ -22,6 +22,7 @@
 
 static void check_on_cpu(int cpu, struct perf_event_attr *attr)
 {
+	struct bpf_perf_event_value value2;
 	int pmu_fd, error = 0;
 	cpu_set_t set;
 	__u64 value;
@@ -46,8 +47,18 @@ static void check_on_cpu(int cpu, struct perf_event_attr *attr)
 		fprintf(stderr, "Value missing for CPU %d\n", cpu);
 		error = 1;
 		goto on_exit;
+	} else {
+		fprintf(stderr, "CPU %d: %llu\n", cpu, value);
+	}
+	/* The above bpf_map_lookup_elem should trigger the second kprobe */
+	if (bpf_map_lookup_elem(map_fd[2], &cpu, &value2)) {
+		fprintf(stderr, "Value2 missing for CPU %d\n", cpu);
+		error = 1;
+		goto on_exit;
+	} else {
+		fprintf(stderr, "CPU %d: counter: %llu, enabled: %llu, running: %llu\n", cpu,
+			value2.counter, value2.enabled, value2.running);
 	}
-	fprintf(stderr, "CPU %d: %llu\n", cpu, value);
 
 on_exit:
 	assert(bpf_map_delete_elem(map_fd[0], &cpu) == 0 || error);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index cb2b9f9..cdf6c4f 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -697,7 +697,8 @@ union bpf_attr {
 	FN(redirect_map),		\
 	FN(sk_redirect_map),		\
 	FN(sock_map_update),		\
-	FN(xdp_adjust_meta),
+	FN(xdp_adjust_meta),		\
+	FN(perf_event_read_value),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index a56053d..c15ca83 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -72,6 +72,9 @@ static int (*bpf_sk_redirect_map)(void *map, int key, int flags) =
 static int (*bpf_sock_map_update)(void *map, void *key, void *value,
 				  unsigned long long flags) =
 	(void *) BPF_FUNC_sock_map_update;
+static int (*bpf_perf_event_read_value)(void *map, unsigned long long flags,
+					void *buf, unsigned int buf_size) =
+	(void *) BPF_FUNC_perf_event_read_value;
 
 
 /* llvm builtin functions that eBPF C program may use to
-- 
2.9.5


* [PATCH net-next v7 0/5] bpf: add two helpers to read perf event enabled/running time
From: Yonghong Song @ 2017-10-05 16:19 UTC (permalink / raw)
  To: peterz, rostedt, ast, daniel, netdev; +Cc: kernel-team

Hardware pmu counters are limited resources. When more pmu based perf
events are opened than there are available counters, the kernel will
multiplex these events so that each event gets a certain percentage
(but not 100%) of the pmu time. When multiplexing happens,
the number of samples or the counter value will not match what
would be observed without multiplexing. This makes comparison between
different runs difficult.

Typically, the number of samples or counter value should be
normalized before comparing to other experiments. The typical
normalization is done like:
  normalized_num_samples = num_samples * time_enabled / time_running
  normalized_counter_value = counter_value * time_enabled / time_running
where time_enabled is the time the event has been enabled and time_running
is the time the event has actually been running since the last normalization.
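
The normalization above can be sketched in plain C roughly as follows. This is
only an illustration: the function name normalize_counter is made up here, and
the sketch assumes that counter * time_enabled does not overflow 64 bits.

```c
#include <stdint.h>

/*
 * Scale a raw counter (or sample count) to compensate for multiplexing:
 *   normalized = counter * time_enabled / time_running
 * Returns 0 when the event never ran, since nothing can be estimated.
 * Assumption: the intermediate product fits in 64 bits.
 */
static uint64_t normalize_counter(uint64_t counter, uint64_t time_enabled,
				  uint64_t time_running)
{
	if (time_running == 0)
		return 0;
	return counter * time_enabled / time_running;
}
```

For example, an event that was enabled for 200 time units but only ran for
100 of them had half of the pmu time, so a raw count of 1000 is scaled to 2000.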

This patch set implements two helper functions.
The helper bpf_perf_event_read_value reads counter/time_enabled/time_running
for a perf event array map. The helper bpf_perf_prog_read_value reads
counter/time_enabled/time_running for a bpf prog with type BPF_PROG_TYPE_PERF_EVENT.

[Dave, Peter,

 Patch set has been restructured such that all perf (non-bpf) related changes
 are in the first commit. The whole patch set should be merged into net-next
 and the first commit cherry-picked to tip to avoid conflicts
 during the merge window.

 Thanks!
]

Changelogs:
v6->v7:
  . restructure the patch set so that all perf related changes are
    in the first one to make cherry-pick easier,
  . address Daniel's comments to clear user buffer in all error cases
    for both helpers.
v5->v6:
  . rebase on top of net-next.
v4->v5:
  . fix some coding style issues,
  . memset the input buffer in case of error for ARG_PTR_TO_UNINIT_MEM
    type of argument.
v3->v4:
  . fix a build failure.
v2->v3:
  . the enabled/running time is now read only as part of reading the
    counters. This prevents the counters and the enabled/running time
    from being read separately.
v1->v2:
  . reading enabled/running time should be together with reading counters
    which contains the logic to ensure the result is valid.
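
The buffer clearing mentioned in the v4->v5 and v6->v7 entries follows a simple
pattern: since the helper receives memory that may be uninitialized, every
error path zeroes the buffer before returning. A minimal userspace sketch of
that pattern, assuming a stand-in read function (read_counter_stub, value_buf
and read_value are names invented for this example, not kernel symbols):

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical buffer layout; mirrors counter/enabled/running. */
struct value_buf {
	uint64_t counter;
	uint64_t enabled;
	uint64_t running;
};

/* Stand-in for the real counter read; fails on request. */
static int read_counter_stub(int should_fail, struct value_buf *out)
{
	if (should_fail)
		return -ENOENT;
	out->counter = 42;
	out->enabled = 10;
	out->running = 5;
	return 0;
}

/*
 * Fill buf on success. On any error, including a size mismatch,
 * zero the caller's buffer so that it never holds stale data.
 */
static int read_value(int should_fail, void *buf, uint32_t size)
{
	int err = -EINVAL;

	if (size != sizeof(struct value_buf))
		goto clear;
	err = read_counter_stub(should_fail, buf);
	if (err)
		goto clear;
	return 0;
clear:
	memset(buf, 0, size);
	return err;
}
```

The single clear label keeps all failure paths in one place, which is the same
shape as the two helpers in this series.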

Yonghong Song (5):
  bpf: perf event change needed for subsequent bpf helpers
  bpf: add helper bpf_perf_event_read_value for perf event array map
  bpf: add a test case for helper bpf_perf_event_read_value
  bpf: add helper bpf_perf_prog_read_value
  bpf: add a test case for helper bpf_perf_prog_read_value

 include/linux/perf_event.h                |  7 ++-
 include/uapi/linux/bpf.h                  | 29 +++++++++++-
 kernel/bpf/arraymap.c                     |  2 +-
 kernel/bpf/verifier.c                     |  4 +-
 kernel/events/core.c                      | 15 ++++++-
 kernel/trace/bpf_trace.c                  | 73 +++++++++++++++++++++++++++++--
 samples/bpf/trace_event_kern.c            | 10 +++++
 samples/bpf/trace_event_user.c            | 13 +++---
 samples/bpf/tracex6_kern.c                | 26 +++++++++++
 samples/bpf/tracex6_user.c                | 13 +++++-
 tools/include/uapi/linux/bpf.h            |  4 +-
 tools/testing/selftests/bpf/bpf_helpers.h |  6 +++
 12 files changed, 183 insertions(+), 19 deletions(-)

-- 
2.9.5


* Re: [PATCH RFC 1/2] virtio_net: implement VIRTIO_CONFIG_S_NEEDS_RESET
From: Willem de Bruijn @ 2017-10-05 16:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Network Development, Jason Wang, virtualization, Willem de Bruijn
In-Reply-To: <20170829231739-mutt-send-email-mst@kernel.org>

On Tue, Aug 29, 2017 at 4:20 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Aug 29, 2017 at 04:07:58PM -0400, Willem de Bruijn wrote:
>> From: Willem de Bruijn <willemb@google.com>
>>
>> Implement the reset communication request defined in the VIRTIO 1.0
>> specification and introduced in Linux in commit c00bbcf862896 ("virtio:
>> add VIRTIO_CONFIG_S_NEEDS_RESET device status bit").
>>
>> Since that patch, the virtio-net driver has added a virtnet_reset
>> function that implements the requested behavior through calls to the
>> power management freeze and restore functions.
>>
>> That function has recently been reverted when its sole caller was
>> updated. Bring it back and listen for the request from the host on
>> the config channel.
>>
>> Implement the feature analogous to other config requests. In
>> particular, acknowledge the request in the same manner as the
>> NET_S_ANNOUNCE link announce request, by responding with a new
>> VIRTIO_NET_CTRL_${TYPE} command. On reception, the host must check
>> the config status register for success or failure.
>
>
> Pls make it clearer why do you need these interface extensions.
>
>> The existing freeze handler verifies that no config changes are
>> running concurrently. Elide this check for reset. The request is
>> always handled from the config workqueue. No other config requests
>> can be active or scheduled concurrently on vi->config.
>
> You need to prevent packet TX from being in progress.

I had another look at this.

As of commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
virtnet_freeze_down calls synchronize_net() after stopping the queues
to quiesce the device before any further actions. This should suffice for
virtnet_reset, too.

The control path can indeed be much simpler than in my initial patchset.
The host sets the VIRTIO_CONFIG_S_NEEDS_RESET flag to request the reset,
and the flag is cleared by the reset operation itself. The host can then
read vdev->status to observe when the reset went through.
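
To make the simpler control path concrete, here is a small userspace simulation
of the status handshake. It is only a sketch: the S_* values mirror the status
bits from the VIRTIO 1.0 specification, but struct sim_device and the three
functions are invented for this example and do not correspond to driver code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Device status bits, with the values used by the VIRTIO 1.0 spec. */
#define S_ACKNOWLEDGE	0x01
#define S_DRIVER	0x02
#define S_DRIVER_OK	0x04
#define S_FEATURES_OK	0x08
#define S_NEEDS_RESET	0x40
#define S_FAILED	0x80

/* Hypothetical stand-in for the shared device status register. */
struct sim_device {
	uint8_t status;
};

/* Host side: ask the driver to reset the device. */
static void host_request_reset(struct sim_device *d)
{
	d->status |= S_NEEDS_RESET;
}

/* Driver side: runs from the config change handler. */
static void driver_handle_config_change(struct sim_device *d)
{
	if (!(d->status & S_NEEDS_RESET))
		return;

	/* Resetting the device clears every bit, NEEDS_RESET included. */
	d->status = 0;

	/* Walk through the usual initialization sequence again. */
	d->status |= S_ACKNOWLEDGE;
	d->status |= S_DRIVER;
	d->status |= S_FEATURES_OK;
	d->status |= S_DRIVER_OK;
}

/* Host side: the reset went through once the flag is gone again. */
static bool host_reset_done(const struct sim_device *d)
{
	return !(d->status & S_NEEDS_RESET) && (d->status & S_DRIVER_OK);
}
```

With this scheme no extra control command is needed for the acknowledgement;
the host only has to poll the status register.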

>
>>
>> Signed-off-by: Willem de Bruijn <willemb@google.com>
>> ---
>>  drivers/net/virtio_net.c        | 69 +++++++++++++++++++++++++++++++++++------
>>  include/uapi/linux/virtio_net.h |  4 +++
>
> virtio dev or another virtio TC list must be Cc'd on any proposed API changes.
>
>
>>  2 files changed, 64 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 52ae78ca3d38..5e349226f7c1 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -1458,12 +1458,11 @@ static void virtnet_netpoll(struct net_device *dev)
>>  }
>>  #endif
>>
>> -static void virtnet_ack_link_announce(struct virtnet_info *vi)
>> +static void virtnet_ack(struct virtnet_info *vi, u8 class, u8 cmd)
>>  {
>>       rtnl_lock();
>> -     if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_ANNOUNCE,
>> -                               VIRTIO_NET_CTRL_ANNOUNCE_ACK, NULL))
>> -             dev_warn(&vi->dev->dev, "Failed to ack link announce.\n");
>> +     if (!virtnet_send_command(vi, class, cmd, NULL))
>> +             dev_warn(&vi->dev->dev, "Failed to ack %u.%u\n", class, cmd);
>>       rtnl_unlock();
>>  }
>>
>> @@ -1857,13 +1856,16 @@ static const struct ethtool_ops virtnet_ethtool_ops = {
>>       .set_link_ksettings = virtnet_set_link_ksettings,
>>  };
>>
>> -static void virtnet_freeze_down(struct virtio_device *vdev)
>> +static void virtnet_freeze_down(struct virtio_device *vdev, bool in_reset)
>>  {
>>       struct virtnet_info *vi = vdev->priv;
>>       int i;
>>
>> -     /* Make sure no work handler is accessing the device */
>> -     flush_work(&vi->config_work);
>> +     /* Make sure no work handler is accessing the device,
>> +      * unless this call is made from the reset work handler itself.
>> +      */
>> +     if (!in_reset)
>> +             flush_work(&vi->config_work);
>>
>>       netif_device_detach(vi->dev);
>>       netif_tx_disable(vi->dev);
>> @@ -1878,6 +1880,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
>>  }
>>
>>  static int init_vqs(struct virtnet_info *vi);
>> +static void remove_vq_common(struct virtnet_info *vi);
>>
>>  static int virtnet_restore_up(struct virtio_device *vdev)
>>  {
>> @@ -1906,6 +1909,45 @@ static int virtnet_restore_up(struct virtio_device *vdev)
>>       return err;
>>  }
>>
>> +static int virtnet_reset(struct virtnet_info *vi)
>> +{
>> +     struct virtio_device *dev = vi->vdev;
>> +     bool failed;
>> +     int ret;
>> +
>> +     virtio_config_disable(dev);
>> +     failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
>> +     virtnet_freeze_down(dev, true);
>> +     remove_vq_common(vi);
>> +
>> +     virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
>> +     virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
>> +
>> +     /* Restore the failed status (see virtio_device_restore). */
>> +     if (failed)
>> +             virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
>> +
>> +     ret = virtio_finalize_features(dev);
>> +     if (ret)
>> +             goto err;
>> +
>> +     ret = virtnet_restore_up(dev);
>> +     if (ret)
>> +             goto err;
>> +
>> +     ret = virtnet_set_queues(vi, vi->curr_queue_pairs);
>> +     if (ret)
>> +             goto err;
>> +
>> +     virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>> +     virtio_config_enable(dev);
>> +     return 0;
>> +
>> +err:
>> +     virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
>> +     return ret;
>> +}
>> +
>>  static int virtnet_set_guest_offloads(struct virtnet_info *vi, u64 offloads)
>>  {
>>       struct scatterlist sg;
>> @@ -2085,7 +2127,16 @@ static void virtnet_config_changed_work(struct work_struct *work)
>>
>>       if (v & VIRTIO_NET_S_ANNOUNCE) {
>>               netdev_notify_peers(vi->dev);
>> -             virtnet_ack_link_announce(vi);
>> +             virtnet_ack(vi, VIRTIO_NET_CTRL_ANNOUNCE,
>> +                         VIRTIO_NET_CTRL_ANNOUNCE_ACK);
>> +     }
>> +
>> +     if (vi->vdev->config->get_status(vi->vdev) &
>> +         VIRTIO_CONFIG_S_NEEDS_RESET) {
>> +             virtnet_reset(vi);
>> +             virtnet_ack(vi, VIRTIO_NET_CTRL_RESET,
>> +                         VIRTIO_NET_CTRL_RESET_ACK);
>> +
>>       }
>>
>>       /* Ignore unknown (future) status bits */
>> @@ -2708,7 +2759,7 @@ static __maybe_unused int virtnet_freeze(struct virtio_device *vdev)
>>       struct virtnet_info *vi = vdev->priv;
>>
>>       virtnet_cpu_notif_remove(vi);
>> -     virtnet_freeze_down(vdev);
>> +     virtnet_freeze_down(vdev, false);
>>       remove_vq_common(vi);
>>
>>       return 0;
>> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
>> index fc353b518288..188fdc528f13 100644
>> --- a/include/uapi/linux/virtio_net.h
>> +++ b/include/uapi/linux/virtio_net.h
>> @@ -245,4 +245,8 @@ struct virtio_net_ctrl_mq {
>>  #define VIRTIO_NET_CTRL_GUEST_OFFLOADS   5
>>  #define VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET        0
>>
>> +/* Signal that the driver received and executed the reset command. */
>> +#define VIRTIO_NET_CTRL_RESET                        6
>> +#define VIRTIO_NET_CTRL_RESET_ACK            0
>> +
>>  #endif /* _UAPI_LINUX_VIRTIO_NET_H */
>> --
>> 2.14.1.342.g6490525c54-goog


* Re: [PATCH net-next v3 0/2] libbpf: support more map options
From: Alexei Starovoitov @ 2017-10-05 16:17 UTC (permalink / raw)
  To: Craig Gallek, Daniel Borkmann, Jesper Dangaard Brouer,
	David S . Miller
  Cc: Chonggang Li, netdev
In-Reply-To: <20171005144158.14860-1-kraigatgoog@gmail.com>

On 10/5/17 7:41 AM, Craig Gallek wrote:
> From: Craig Gallek <kraig@google.com>
>
> The functional change to this series is the ability to use flags when
> creating maps from object files loaded by libbpf.  In order to do this,
> the first patch updates the library to handle map definitions that
> differ in size from libbpf's struct bpf_map_def.
>
> For object files with a larger map definition, libbpf will continue to load
> if the unknown fields are all zero, otherwise the map is rejected.  If the
> map definition in the object file is smaller than expected, libbpf will use
> zero as a default value in the missing fields.
>
> Craig Gallek (2):
>   libbpf: parse maps sections of varying size
>   libbpf: use map_flags when creating maps
>
>  tools/lib/bpf/libbpf.c | 72 +++++++++++++++++++++++++++++---------------------
>  tools/lib/bpf/libbpf.h |  1 +
>  2 files changed, 43 insertions(+), 30 deletions(-)

lgtm
Acked-by: Alexei Starovoitov <ast@kernel.org>
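
For readers skimming the thread, the size handling described in the quoted
cover letter can be sketched in plain C as below. This is not the libbpf code:
struct known_map_def and copy_map_def are names made up for this example, and
the field list is only an assumption about what a map definition contains.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical map definition as known to the loader. */
struct known_map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
	unsigned int map_flags;
};

/*
 * Copy a map definition of src_size bytes found in an object file.
 * Smaller definition: the missing fields default to zero.
 * Larger definition: accepted only if all unknown trailing bytes
 * are zero, otherwise rejected with -1.
 */
static int copy_map_def(struct known_map_def *dst, const void *src,
			size_t src_size)
{
	const unsigned char *bytes = src;
	size_t i;

	memset(dst, 0, sizeof(*dst));
	if (src_size <= sizeof(*dst)) {
		memcpy(dst, src, src_size);
		return 0;
	}

	for (i = sizeof(*dst); i < src_size; i++) {
		if (bytes[i] != 0)
			return -1;
	}
	memcpy(dst, src, sizeof(*dst));
	return 0;
}
```

Zeroing the destination first is what gives the smaller case its defaults, so
both directions share the same copy path.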


* Re: [PATCH] net: phy: DP83822 initial driver submission (fwd)
From: Dan Murphy @ 2017-10-05 16:14 UTC (permalink / raw)
  To: Julia Lawall; +Cc: andrew, f.fainelli, netdev, kbuild-all
In-Reply-To: <alpine.DEB.2.20.1710051809480.3168@hadrien>

Julia

On 10/05/2017 11:10 AM, Julia Lawall wrote:
> DP83822_WOL_CLR_INDICATION appears twice on line 136.  Perhaps this is not
> what is wanted.
> 

That line is wrong; it should include DP83822_WOL_INDICATION_SEL as well.

Onto v3.

Dan

> julia
> 
> ---------- Forwarded message ----------
> Date: Thu, 5 Oct 2017 21:38:28 +0800
> From: kbuild test robot <fengguang.wu@intel.com>
> To: kbuild@01.org
> Cc: Julia Lawall <julia.lawall@lip6.fr>
> Subject: Re: [PATCH] net: phy: DP83822 initial driver submission
> 
> CC: kbuild-all@01.org
> In-Reply-To: <20171003155316.12312-1-dmurphy@ti.com>
> TO: Dan Murphy <dmurphy@ti.com>
> CC: andrew@lunn.ch, f.fainelli@gmail.com, netdev@vger.kernel.org, Dan Murphy <dmurphy@ti.com>
> CC: netdev@vger.kernel.org, Dan Murphy <dmurphy@ti.com>
> 
> Hi Dan,
> 
> [auto build test WARNING on net-next/master]
> [also build test WARNING on v4.14-rc3 next-20170929]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> 
> url:    https://github.com/0day-ci/linux/commits/Dan-Murphy/net-phy-DP83822-initial-driver-submission/20171005-165547
> :::::: branch date: 5 hours ago
> :::::: commit date: 5 hours ago
> 
>>> drivers/net/phy/dp83822.c:136:29-55: duplicated argument to & or |
> 
> # https://github.com/0day-ci/linux/commit/49190df6a2304f031dc2a6ac63710447db36bc23
> git remote add linux-review https://github.com/0day-ci/linux
> git remote update linux-review
> git checkout 49190df6a2304f031dc2a6ac63710447db36bc23
> vim +136 drivers/net/phy/dp83822.c
> 
> 49190df6 Dan Murphy 2017-10-03   90
> 49190df6 Dan Murphy 2017-10-03   91  static int dp83822_set_wol(struct phy_device *phydev,
> 49190df6 Dan Murphy 2017-10-03   92  			   struct ethtool_wolinfo *wol)
> 49190df6 Dan Murphy 2017-10-03   93  {
> 49190df6 Dan Murphy 2017-10-03   94  	struct net_device *ndev = phydev->attached_dev;
> 49190df6 Dan Murphy 2017-10-03   95  	u16 value;
> 49190df6 Dan Murphy 2017-10-03   96  	const u8 *mac;
> 49190df6 Dan Murphy 2017-10-03   97
> 49190df6 Dan Murphy 2017-10-03   98  	if (wol->wolopts & (WAKE_MAGIC | WAKE_MAGICSECURE)) {
> 49190df6 Dan Murphy 2017-10-03   99  		mac = (const u8 *)ndev->dev_addr;
> 49190df6 Dan Murphy 2017-10-03  100
> 49190df6 Dan Murphy 2017-10-03  101  		if (!is_valid_ether_addr(mac))
> 49190df6 Dan Murphy 2017-10-03  102  			return -EFAULT;
> 49190df6 Dan Murphy 2017-10-03  103
> 49190df6 Dan Murphy 2017-10-03  104  		/* MAC addresses start with byte 5, but stored in mac[0].
> 49190df6 Dan Murphy 2017-10-03  105  		 * 822 PHYs store bytes 4|5, 2|3, 0|1
> 49190df6 Dan Murphy 2017-10-03  106  		 */
> 49190df6 Dan Murphy 2017-10-03  107  		phy_write_mmd(phydev, DP83822_DEVADDR,
> 49190df6 Dan Murphy 2017-10-03  108  			      MII_DP83822_WOL_DA1, (mac[1] << 8) | mac[0]);
> 49190df6 Dan Murphy 2017-10-03  109  		phy_write_mmd(phydev, DP83822_DEVADDR,
> 49190df6 Dan Murphy 2017-10-03  110  			      MII_DP83822_WOL_DA2, (mac[3] << 8) | mac[2]);
> 49190df6 Dan Murphy 2017-10-03  111  		phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_DA3,
> 49190df6 Dan Murphy 2017-10-03  112  			      (mac[5] << 8) | mac[4]);
> 49190df6 Dan Murphy 2017-10-03  113
> 49190df6 Dan Murphy 2017-10-03  114  		value = phy_read_mmd(phydev, DP83822_DEVADDR,
> 49190df6 Dan Murphy 2017-10-03  115  				     MII_DP83822_WOL_CFG);
> 49190df6 Dan Murphy 2017-10-03  116  		if (wol->wolopts & WAKE_MAGIC)
> 49190df6 Dan Murphy 2017-10-03  117  			value |= DP83822_WOL_MAGIC_EN;
> 49190df6 Dan Murphy 2017-10-03  118  		else
> 49190df6 Dan Murphy 2017-10-03  119  			value &= ~DP83822_WOL_MAGIC_EN;
> 49190df6 Dan Murphy 2017-10-03  120
> 49190df6 Dan Murphy 2017-10-03  121  		if (wol->wolopts & WAKE_MAGICSECURE) {
> 49190df6 Dan Murphy 2017-10-03  122  			value |= DP83822_WOL_SECURE_ON;
> 49190df6 Dan Murphy 2017-10-03  123  			phy_write_mmd(phydev, DP83822_DEVADDR,
> 49190df6 Dan Murphy 2017-10-03  124  				      MII_DP83822_RXSOP1,
> 49190df6 Dan Murphy 2017-10-03  125  				      (wol->sopass[1] << 8) | wol->sopass[0]);
> 49190df6 Dan Murphy 2017-10-03  126  			phy_write_mmd(phydev, DP83822_DEVADDR,
> 49190df6 Dan Murphy 2017-10-03  127  				      MII_DP83822_RXSOP2,
> 49190df6 Dan Murphy 2017-10-03  128  				      (wol->sopass[3] << 8) | wol->sopass[2]);
> 49190df6 Dan Murphy 2017-10-03  129  			phy_write_mmd(phydev, DP83822_DEVADDR,
> 49190df6 Dan Murphy 2017-10-03  130  				      MII_DP83822_RXSOP3,
> 49190df6 Dan Murphy 2017-10-03  131  				      (wol->sopass[5] << 8) | wol->sopass[4]);
> 49190df6 Dan Murphy 2017-10-03  132  		} else {
> 49190df6 Dan Murphy 2017-10-03  133  			value &= ~DP83822_WOL_SECURE_ON;
> 49190df6 Dan Murphy 2017-10-03  134  		}
> 49190df6 Dan Murphy 2017-10-03  135
> 49190df6 Dan Murphy 2017-10-03 @136  		value |= (DP83822_WOL_EN | DP83822_WOL_CLR_INDICATION |
> 49190df6 Dan Murphy 2017-10-03  137  			  DP83822_WOL_CLR_INDICATION);
> 49190df6 Dan Murphy 2017-10-03  138  		phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG,
> 49190df6 Dan Murphy 2017-10-03  139  			      value);
> 49190df6 Dan Murphy 2017-10-03  140  	} else {
> 49190df6 Dan Murphy 2017-10-03  141  		value =
> 49190df6 Dan Murphy 2017-10-03  142  		    phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG);
> 49190df6 Dan Murphy 2017-10-03  143  		value &= (~DP83822_WOL_EN);
> 49190df6 Dan Murphy 2017-10-03  144  		phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG,
> 49190df6 Dan Murphy 2017-10-03  145  			      value);
> 49190df6 Dan Murphy 2017-10-03  146  	}
> 49190df6 Dan Murphy 2017-10-03  147
> 49190df6 Dan Murphy 2017-10-03  148  	return 0;
> 49190df6 Dan Murphy 2017-10-03  149  }
> 49190df6 Dan Murphy 2017-10-03  150
> 
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
> 


-- 
------------------
Dan Murphy

* Re: [PATCH net-next 3/4] selinux: bpf: Add selinux check for eBPF syscall operations
From: Daniel Borkmann @ 2017-10-05 16:12 UTC (permalink / raw)
  To: Stephen Smalley, Chenbo Feng, netdev, SELinux,
	linux-security-module
  Cc: Chenbo Feng, Alexei Starovoitov, Lorenzo Colitti
In-Reply-To: <1507210095.27146.5.camel@tycho.nsa.gov>

On 10/05/2017 03:28 PM, Stephen Smalley wrote:
[...]
>> +static int selinux_bpf_prog(struct bpf_prog *prog)
>> +{
>> +	u32 sid = current_sid();
>> +	struct bpf_security_struct *bpfsec;
>> +
>> +	bpfsec = prog->aux->security;
>
> I haven't looked closely at the bpf code, but is it guaranteed that
> prog->aux cannot be NULL here?  What's the difference in lifecycle for
> bpf_prog vs bpf_prog_aux?  Could the aux field be shared across progs
> created by different processes?

prog->aux cannot be NULL here. Its lifetime is 1:1 with prog; it holds
slow-path auxiliary data, unlike prog itself, which is additionally
set to read-only after initial setup until teardown. aux is also never
shared across BPF progs, so it always has a 1:1 relation to prog
itself.

Things that can be shared across multiple BPF programs and user
space processes are the BPF maps themselves. From the user space side
they can be passed via fd or pinned/retrieved from bpf fs. So, as
currently implemented, the void *security member you'd hold in
struct bpf_map would refer to the initial creator process of the
map, which doesn't strictly need to be alive anymore. Similarly for
prog itself: when it's shared between multiple user space processes,
the *security ctx would point to the one that installed it into the
kernel initially.
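
To make the ownership difference concrete, here is a rough userspace
sketch. All demo_* names are hypothetical stand-ins invented for
illustration, not the kernel's definitions: aux is allocated and freed
together with its prog, while a map is reference counted and keeps the
security blob of whoever created it, no matter who holds it later.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; NOT the kernel's structures. */
struct demo_aux {
	void *security;		/* blob set once when the prog is loaded */
};

struct demo_prog {
	struct demo_aux *aux;	/* allocated with prog, freed with prog */
};

struct demo_map {
	int refcnt;		/* one reference per holder (fd, pin, prog) */
	void *security;		/* blob of the process that created the map */
};

/* prog and aux are created together, so prog->aux is never NULL. */
static struct demo_prog *demo_prog_alloc(void *creator_sec)
{
	struct demo_prog *prog = calloc(1, sizeof(*prog));

	if (!prog)
		return NULL;
	prog->aux = calloc(1, sizeof(*prog->aux));
	if (!prog->aux) {
		free(prog);
		return NULL;
	}
	prog->aux->security = creator_sec;
	return prog;
}

/* aux goes away with its prog; it is never handed to another prog. */
static void demo_prog_free(struct demo_prog *prog)
{
	free(prog->aux);
	free(prog);
}

static struct demo_map *demo_map_create(void *creator_sec)
{
	struct demo_map *map = calloc(1, sizeof(*map));

	if (!map)
		return NULL;
	map->refcnt = 1;
	map->security = creator_sec;	/* stays the creator's blob */
	return map;
}

/* A second holder (another process or prog) takes a reference. */
static void demo_map_get(struct demo_map *map)
{
	map->refcnt++;
}

/* Returns 1 if this dropped the last reference and freed the map. */
static int demo_map_put(struct demo_map *map)
{
	if (--map->refcnt == 0) {
		free(map);
		return 1;
	}
	return 0;
}
```

In this sketch the creator can drop its reference and the map lives on
with map->security still pointing at the creator's blob, which is the
situation described above.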

* Re: [PATCH] rsi: fix integer overflow warning
From: Joe Perches @ 2017-10-05 16:11 UTC (permalink / raw)
  To: David Laight, Arnd Bergmann, Kalle Valo, Prameela Rani Garnepudi,
	Amitkumar Karwar
  Cc: Pavani Muthyala, Karun Eagalapati, linux-wireless@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DD008ACEE@AcuExch.aculab.com>

On Thu, 2017-10-05 at 15:12 +0000, David Laight wrote:
> From: Joe Perches
> > Sent: 05 October 2017 13:19
> > On Thu, 2017-10-05 at 14:05 +0200, Arnd Bergmann wrote:
> > > gcc produces a harmless warning about a recently introduced
> > > signed integer overflow:
> > > 
> > > drivers/net/wireless/rsi/rsi_91x_hal.c: In function 'rsi_prepare_mgmt_desc':
> > > include/uapi/linux/swab.h:13:15: error: integer overflow in expression [-Werror=overflow]
> > >   (((__u16)(x) & (__u16)0x00ffU) << 8) |   \
> > >    ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
> > > include/uapi/linux/swab.h:104:2: note: in expansion of macro '___constant_swab16'
> > >   ___constant_swab16(x) :   \
> > >   ^~~~~~~~~~~~~~~~~~
> > > include/uapi/linux/byteorder/big_endian.h:34:43: note: in expansion of macro '__swab16'
> > >  #define __cpu_to_le16(x) ((__force __le16)__swab16((x)))
> > 
> > []
> > 
> > > The problem is that the 'mask' value is a signed integer that gets
> > > turned into a negative number when truncated to 16 bits. Making it
> > > an unsigned constant avoids this.
> > 
> > I would expect there are more of these.
> > 
> > Perhaps this define in include/uapi/linux/swab.h:
> > 
> > #define __swab16(x)				\
> > 	(__builtin_constant_p((__u16)(x)) ?	\
> > 	___constant_swab16(x) :			\
> > 	__fswab16(x))
> > 
> > should be
> > 
> > #define __swab16(x)				\
> > 	(__builtin_constant_p((__u16)(x)) ?	\
> > 	___constant_swab16((__u16)(x)) :	\
> > 	__fswab16((__u16)(x)))
> 
> You probably don't want the cast in the call to __fswab16() since
> that is likely to generate an explicit AND with 0xffff.
> You will likely also get one if the argument is __u16 (not unsigned int).

It would just be an explicit vs implicit cast, as __fswab16 is
a static inline with a __u16 argument.
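
A small userspace sketch of that point, using hypothetical demo_*
names rather than the real swab.h definitions: when the callee takes a
16-bit parameter, the argument is converted to 16 bits at the call
either way, so both spellings return the same value. Whether a given
compiler emits identical code for the two is not something this shows.

```c
#include <stdint.h>

/* Illustrative stand-in for __fswab16(): the parameter is 16 bits wide. */
static inline uint16_t demo_fswab16(uint16_t val)
{
	return (uint16_t)((val << 8) | (val >> 8));
}

/* No cast: the conversion to uint16_t happens implicitly at the call. */
#define DEMO_SWAB16_IMPLICIT(x)	demo_fswab16(x)

/* Cast spelled out: same conversion, written explicitly. */
#define DEMO_SWAB16_EXPLICIT(x)	demo_fswab16((uint16_t)(x))
```

For a wider argument such as 0x12345678 both macros see only the low
16 bits (0x5678) and return 0x7856.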

* Re: [PATCH] net: phy: DP83822 initial driver submission (fwd)
From: Julia Lawall @ 2017-10-05 16:10 UTC (permalink / raw)
  To: Dan Murphy; +Cc: andrew, f.fainelli, netdev, Dan Murphy, netdev, kbuild-all

DP83822_WOL_CLR_INDICATION appears twice on line 136.  Perhaps this is not
what is wanted.
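
For reference, a minimal sketch of why the duplicate has no effect as
written. The DEMO_* bit positions below are made up for illustration
and are not the DP83822 register layout; whether a different flag was
meant in place of the second one cannot be told from the report.

```c
#include <stdint.h>

/* Hypothetical bit positions, NOT taken from the driver or datasheet. */
#define DEMO_WOL_EN		(1u << 7)
#define DEMO_WOL_CLR_INDICATION	(1u << 11)

/* Same shape as line 136: the second operand is OR-ed in twice. */
static uint16_t demo_wol_cfg_as_posted(uint16_t value)
{
	value |= (DEMO_WOL_EN | DEMO_WOL_CLR_INDICATION |
		  DEMO_WOL_CLR_INDICATION);
	return value;
}

/* Duplicate dropped: OR is idempotent, so the result is identical. */
static uint16_t demo_wol_cfg_deduplicated(uint16_t value)
{
	value |= (DEMO_WOL_EN | DEMO_WOL_CLR_INDICATION);
	return value;
}
```

Both helpers return the same value for any input, so the extra operand
is either redundant or a sign that another bit was intended.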

julia

---------- Forwarded message ----------
Date: Thu, 5 Oct 2017 21:38:28 +0800
From: kbuild test robot <fengguang.wu@intel.com>
To: kbuild@01.org
Cc: Julia Lawall <julia.lawall@lip6.fr>
Subject: Re: [PATCH] net: phy: DP83822 initial driver submission

CC: kbuild-all@01.org
In-Reply-To: <20171003155316.12312-1-dmurphy@ti.com>
TO: Dan Murphy <dmurphy@ti.com>
CC: andrew@lunn.ch, f.fainelli@gmail.com, netdev@vger.kernel.org, Dan Murphy <dmurphy@ti.com>
CC: netdev@vger.kernel.org, Dan Murphy <dmurphy@ti.com>

Hi Dan,

[auto build test WARNING on net-next/master]
[also build test WARNING on v4.14-rc3 next-20170929]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Dan-Murphy/net-phy-DP83822-initial-driver-submission/20171005-165547
:::::: branch date: 5 hours ago
:::::: commit date: 5 hours ago

>> drivers/net/phy/dp83822.c:136:29-55: duplicated argument to & or |

# https://github.com/0day-ci/linux/commit/49190df6a2304f031dc2a6ac63710447db36bc23
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 49190df6a2304f031dc2a6ac63710447db36bc23
vim +136 drivers/net/phy/dp83822.c

49190df6 Dan Murphy 2017-10-03   90
49190df6 Dan Murphy 2017-10-03   91  static int dp83822_set_wol(struct phy_device *phydev,
49190df6 Dan Murphy 2017-10-03   92  			   struct ethtool_wolinfo *wol)
49190df6 Dan Murphy 2017-10-03   93  {
49190df6 Dan Murphy 2017-10-03   94  	struct net_device *ndev = phydev->attached_dev;
49190df6 Dan Murphy 2017-10-03   95  	u16 value;
49190df6 Dan Murphy 2017-10-03   96  	const u8 *mac;
49190df6 Dan Murphy 2017-10-03   97
49190df6 Dan Murphy 2017-10-03   98  	if (wol->wolopts & (WAKE_MAGIC | WAKE_MAGICSECURE)) {
49190df6 Dan Murphy 2017-10-03   99  		mac = (const u8 *)ndev->dev_addr;
49190df6 Dan Murphy 2017-10-03  100
49190df6 Dan Murphy 2017-10-03  101  		if (!is_valid_ether_addr(mac))
49190df6 Dan Murphy 2017-10-03  102  			return -EFAULT;
49190df6 Dan Murphy 2017-10-03  103
49190df6 Dan Murphy 2017-10-03  104  		/* MAC addresses start with byte 5, but stored in mac[0].
49190df6 Dan Murphy 2017-10-03  105  		 * 822 PHYs store bytes 4|5, 2|3, 0|1
49190df6 Dan Murphy 2017-10-03  106  		 */
49190df6 Dan Murphy 2017-10-03  107  		phy_write_mmd(phydev, DP83822_DEVADDR,
49190df6 Dan Murphy 2017-10-03  108  			      MII_DP83822_WOL_DA1, (mac[1] << 8) | mac[0]);
49190df6 Dan Murphy 2017-10-03  109  		phy_write_mmd(phydev, DP83822_DEVADDR,
49190df6 Dan Murphy 2017-10-03  110  			      MII_DP83822_WOL_DA2, (mac[3] << 8) | mac[2]);
49190df6 Dan Murphy 2017-10-03  111  		phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_DA3,
49190df6 Dan Murphy 2017-10-03  112  			      (mac[5] << 8) | mac[4]);
49190df6 Dan Murphy 2017-10-03  113
49190df6 Dan Murphy 2017-10-03  114  		value = phy_read_mmd(phydev, DP83822_DEVADDR,
49190df6 Dan Murphy 2017-10-03  115  				     MII_DP83822_WOL_CFG);
49190df6 Dan Murphy 2017-10-03  116  		if (wol->wolopts & WAKE_MAGIC)
49190df6 Dan Murphy 2017-10-03  117  			value |= DP83822_WOL_MAGIC_EN;
49190df6 Dan Murphy 2017-10-03  118  		else
49190df6 Dan Murphy 2017-10-03  119  			value &= ~DP83822_WOL_MAGIC_EN;
49190df6 Dan Murphy 2017-10-03  120
49190df6 Dan Murphy 2017-10-03  121  		if (wol->wolopts & WAKE_MAGICSECURE) {
49190df6 Dan Murphy 2017-10-03  122  			value |= DP83822_WOL_SECURE_ON;
49190df6 Dan Murphy 2017-10-03  123  			phy_write_mmd(phydev, DP83822_DEVADDR,
49190df6 Dan Murphy 2017-10-03  124  				      MII_DP83822_RXSOP1,
49190df6 Dan Murphy 2017-10-03  125  				      (wol->sopass[1] << 8) | wol->sopass[0]);
49190df6 Dan Murphy 2017-10-03  126  			phy_write_mmd(phydev, DP83822_DEVADDR,
49190df6 Dan Murphy 2017-10-03  127  				      MII_DP83822_RXSOP2,
49190df6 Dan Murphy 2017-10-03  128  				      (wol->sopass[3] << 8) | wol->sopass[2]);
49190df6 Dan Murphy 2017-10-03  129  			phy_write_mmd(phydev, DP83822_DEVADDR,
49190df6 Dan Murphy 2017-10-03  130  				      MII_DP83822_RXSOP3,
49190df6 Dan Murphy 2017-10-03  131  				      (wol->sopass[5] << 8) | wol->sopass[4]);
49190df6 Dan Murphy 2017-10-03  132  		} else {
49190df6 Dan Murphy 2017-10-03  133  			value &= ~DP83822_WOL_SECURE_ON;
49190df6 Dan Murphy 2017-10-03  134  		}
49190df6 Dan Murphy 2017-10-03  135
49190df6 Dan Murphy 2017-10-03 @136  		value |= (DP83822_WOL_EN | DP83822_WOL_CLR_INDICATION |
49190df6 Dan Murphy 2017-10-03  137  			  DP83822_WOL_CLR_INDICATION);
49190df6 Dan Murphy 2017-10-03  138  		phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG,
49190df6 Dan Murphy 2017-10-03  139  			      value);
49190df6 Dan Murphy 2017-10-03  140  	} else {
49190df6 Dan Murphy 2017-10-03  141  		value =
49190df6 Dan Murphy 2017-10-03  142  		    phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG);
49190df6 Dan Murphy 2017-10-03  143  		value &= (~DP83822_WOL_EN);
49190df6 Dan Murphy 2017-10-03  144  		phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG,
49190df6 Dan Murphy 2017-10-03  145  			      value);
49190df6 Dan Murphy 2017-10-03  146  	}
49190df6 Dan Murphy 2017-10-03  147
49190df6 Dan Murphy 2017-10-03  148  	return 0;
49190df6 Dan Murphy 2017-10-03  149  }
49190df6 Dan Murphy 2017-10-03  150

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

* Re: [PATCH net-next v4 0/3] tools: add bpftool
From: Jakub Kicinski @ 2017-10-05 16:02 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, alexei.starovoitov, daniel, dsahern, oss-drivers, brouer
In-Reply-To: <20171004.214647.434421323387588461.davem@davemloft.net>

On Wed, 04 Oct 2017 21:46:47 -0700 (PDT), David Miller wrote:
> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> Date: Wed,  4 Oct 2017 20:10:02 -0700
> 
> > Hi!
> > 
> > This set adds bpftool to the tools/ directory.  The first 
> > patch renames tools/net to tools/bpf, the second one adds 
> > the new code, while the third adds simple documentation.
> > 
> > v4:
> >  - rename docs *.txt -> *.rst (Jesper).
> > v3:
> >  - address Alexei's comments about output and docs.
> > v2:
> >  - report names, map ids, load time, uid;
> >  - add docs/man pages;
> >  - general cleanups & fixes.  
> 
> Series applied, although there was some trailing whitespace I had to fix
> up in patch #3.

I was under the impression that the whitespace was required by the RST
format, but I double-checked and the man page looks correct.

Thank you!

* Re: [PATCH 3/3] ARM: dts: gr-peach: Add ETHER pin group
From: jacopo mondi @ 2017-10-05 15:42 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Geert Uytterhoeven, Chris Brandt, f.fainelli, netdev
In-Reply-To: <20171005134339.GJ13247@lunn.ch>

Hi Andrew,

On Thu, Oct 05, 2017 at 03:43:39PM +0200, Andrew Lunn wrote:
> On Thu, Oct 05, 2017 at 11:39:15AM +0200, jacopo mondi wrote:
> > Hi Geert
> >
> > On Thu, Oct 05, 2017 at 11:09:40AM +0200, Geert Uytterhoeven wrote:
> > > Hi Jacopo,
> > >
> > > On Thu, Oct 5, 2017 at 10:58 AM, Jacopo Mondi <jacopo+renesas@jmondi.org> wrote:
> > > > Add pin configuration subnode for ETHER pin group and enable the interface.
> > > >
> > > > Signed-off-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
> > >
> > > Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > >
> > > > --- a/arch/arm/boot/dts/r7s72100-gr-peach.dts
> > > > +++ b/arch/arm/boot/dts/r7s72100-gr-peach.dts
> > >
> > > > @@ -88,3 +110,19 @@
> > > >
> > > >         status = "okay";
> > > >  };
> > > > +
> > > > +&ether {
> > > > +       pinctrl-names = "default";
> > > > +       pinctrl-0 = <&ether_pins>;
> > > > +
> > > > +       status = "okay";
> > > > +
> > > > +       reset-gpios = <&port4 2 GPIO_ACTIVE_LOW>;
> > > > +       reset-delay-us = <5>;
> > >
> > > I'm afraid the PHY people (not CCed ;-) will want you to move these reset
> > > properties to the phy subnode these days, despite
> > > Documentation/devicetree/bindings/net/mdio.txt...
>
> Hi Jacopo
>
> So what is this reset resetting?
>
> The MAC?
> The PHY?

The reset line goes from our SoC to the external reset pin of the
LAN8710A PHY chip.

Thanks
   j

>
>     Andrew

