Netdev List
 help / color / mirror / Atom feed
* [PATCH v2] packet: uses kfree_skb() for drops or errors.
From: Weongyo Jeong @ 2016-04-06 21:14 UTC (permalink / raw)
  To: netdev; +Cc: Weongyo Jeong, David S. Miller, Willem de Bruijn

consume_skb() isn't for drop or error cases that  kfree_skb() is more proper
one.  At this patch, it fixed tpacket_rcv() and packet_rcv() to be
consistent for error or non-error cases letting perf trace its event
properly.

Signed-off-by: Weongyo Jeong <weongyo.linux@gmail.com>
---
 net/packet/af_packet.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 1ecfa71..cd100cf 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2040,7 +2040,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev,
 	struct sockaddr_ll *sll;
 	struct packet_sock *po;
 	u8 *skb_head = skb->data;
-	int skb_len = skb->len;
+	int err = 0, skb_len = skb->len;
 	unsigned int snaplen, res;
 
 	if (skb->pkt_type == PACKET_LOOPBACK)
@@ -2130,6 +2130,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev,
 	return 0;
 
 drop_n_acct:
+	err = -1;
 	spin_lock(&sk->sk_receive_queue.lock);
 	po->stats.stats1.tp_drops++;
 	atomic_inc(&sk->sk_drops);
@@ -2141,7 +2142,10 @@ drop_n_restore:
 		skb->len = skb_len;
 	}
 drop:
-	consume_skb(skb);
+	if (!err)
+		consume_skb(skb);
+	else
+		kfree_skb(skb);
 	return 0;
 }
 
@@ -2153,7 +2157,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 	struct sockaddr_ll *sll;
 	union tpacket_uhdr h;
 	u8 *skb_head = skb->data;
-	int skb_len = skb->len;
+	int err = 0, skb_len = skb->len;
 	unsigned int snaplen, res;
 	unsigned long status = TP_STATUS_USER;
 	unsigned short macoff, netoff, hdrlen;
@@ -2367,10 +2371,14 @@ drop_n_restore:
 		skb->len = skb_len;
 	}
 drop:
-	kfree_skb(skb);
+	if (!err)
+		consume_skb(skb);
+	else
+		kfree_skb(skb);
 	return 0;
 
 drop_n_account:
+	err = -1;
 	po->stats.stats1.tp_drops++;
 	spin_unlock(&sk->sk_receive_queue.lock);
 
-- 
2.1.3

^ permalink raw reply related

* Re: [PATCH net-next V3 05/16] net: fec: reduce interrupts
From: David Miller @ 2016-04-06 21:20 UTC (permalink / raw)
  To: troy.kisky
  Cc: netdev, fugang.duan, lznuaa, fabio.estevam, l.stach, andrew,
	tremyfr, gerg, linux-arm-kernel, johannes, stillcompiling,
	sergei.shtylyov, arnd
In-Reply-To: <1459909562-22865-6-git-send-email-troy.kisky@boundarydevices.com>

From: Troy Kisky <troy.kisky@boundarydevices.com>
Date: Tue,  5 Apr 2016 19:25:51 -0700

> By clearing the NAPI interrupts in the NAPI routine
> and not in the interrupt handler, we can reduce the
> number of interrupts. We also don't need any status
> variables as the registers are still valid.
> 
> Also, notice that if budget pkts are received, the
> next call to fec_enet_rx_napi will now continue to
> receive the previously pending packets.
> 
> To test that this actually reduces interrupts, try
> this command before/after patch
> 
> cat /proc/interrupts |grep ether; \
> ping -s2800 192.168.0.201 -f -c1000 ; \
> cat /proc/interrupts |grep ether
> 
> For me, before this patch is 2996 interrupts.
> After patch is 2010 interrupts.
> 
> Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com>

I really don't think this is a good idea at all.

I would instead really rather see you stash away the
status register values into some piece of software state,
and then re-read them before you are about to finish a
NAPI poll cycle.

^ permalink raw reply

* Re: [PATCH net-next V3 00/16] net: fec: cleanup and fixes
From: David Miller @ 2016-04-06 21:20 UTC (permalink / raw)
  To: troy.kisky
  Cc: netdev, fugang.duan, lznuaa, fabio.estevam, l.stach, andrew,
	tremyfr, gerg, linux-arm-kernel, johannes, stillcompiling,
	sergei.shtylyov, arnd
In-Reply-To: <1459909562-22865-1-git-send-email-troy.kisky@boundarydevices.com>


This is a way too large patch series.

Please split it up into smaller, more logical, pieces.

Thanks.

^ permalink raw reply

* Re: [PATCH v2 1/2] sctp: compress bit-wide flags to a bitfield on sctp_sock
From: marcelo.leitner @ 2016-04-06 21:21 UTC (permalink / raw)
  To: David Miller; +Cc: joe, netdev, nhorman, vyasevich, linux-sctp
In-Reply-To: <20160406.155735.1431363453832315669.davem@davemloft.net>

On Wed, Apr 06, 2016 at 03:57:35PM -0400, David Miller wrote:
> From: Joe Perches <joe@perches.com>
> Date: Wed, 06 Apr 2016 12:53:24 -0700
> 
> > On Wed, 2016-04-06 at 14:53 -0300, Marcelo Ricardo Leitner wrote:
> >> It wastes space and gets worse as we add new flags, so convert bit-wide
> >> flags to a bitfield.
> >> 
> >> Currently it already saves 4 bytes in sctp_sock, which are left as holes
> >> in it for now. The whole struct needs packing, which should be done in
> >> another patch.
> >> 
> >> Note that do_auto_asconf cannot be merged, as explained in the comment
> >> before it.
> >> 
> >> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> >> ---
> > []
> >> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> > []
> >> @@ -210,14 +210,14 @@ struct sctp_sock {
> >>  	int user_frag;
> >>  
> >>  	__u32 autoclose;
> >> -	__u8 nodelay;
> >> -	__u8 disable_fragments;
> >> -	__u8 v4mapped;
> >> -	__u8 frag_interleave;
> >>  	__u32 adaptation_ind;
> >>  	__u32 pd_point;
> >> -	__u8 recvrcvinfo;
> >> -	__u8 recvnxtinfo;
> >> +	__u16	nodelay:1,
> >> +		disable_fragments:1,
> >> +		v4mapped:1,
> >> +		frag_interleave:1,
> >> +		recvrcvinfo:1,
> >> +		recvnxtinfo:1;
> > 
> > Might as well make this __u32 as the next field would be
> > aligned on an atomic_t

On changelog I meant that this (re-)ordering will happen in another
patch. The hole is already there today and there are others to be
considered/fixed too. Please consider this patch as a preparation for
the other one.

After this patch, pahole gives for this struct:

        /* size: 1272, cachelines: 20, members: 40 */
        /* sum members: 1250, holes: 7, sum holes: 18 */
        /* bit holes: 1, sum bit holes: 9 bits */
        /* padding: 4 */
        /* paddings: 1, sum paddings: 2 */
        /* last cacheline: 56 bytes */

Quite some work todo :-)

> > It might be better if these fields didn't use the __ prefix.
> 
> Indeed, this isn't in a UAPI file so __ prefixed types really aren't
> appropriate.

The whole file is using __ prefixed types. I can remove them in another
patch instead if that's okay with you.
It's also present in other files not under UAPI:
$ git grep -wl '\(__u32\|__u16\)' include/net/sctp/
include/net/sctp/auth.h
include/net/sctp/checksum.h
include/net/sctp/command.h
include/net/sctp/constants.h
include/net/sctp/sctp.h
include/net/sctp/sm.h
include/net/sctp/structs.h
include/net/sctp/tsnmap.h
include/net/sctp/ulpevent.h
include/net/sctp/ulpqueue.h

We can start changing it progressively but I think it would be better if
we replace them all at once.

  Marcelo

^ permalink raw reply

* Re: [patch net-next 00/17] mlxsw: Introduce support for Data Center Bridging
From: David Miller @ 2016-04-06 21:24 UTC (permalink / raw)
  To: jiri; +Cc: netdev, idosch, eladr, yotamg, ogerlitz, roopa, gospo
In-Reply-To: <1459955416-23786-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@resnulli.us>
Date: Wed,  6 Apr 2016 17:09:59 +0200

> From: Jiri Pirko <jiri@mellanox.com>
> 
> Ido says:
> 
> This patchset introduces support for Quality of Service (QoS) as part of the
> IEEE Data Center Bridiging (DCB) standards.
> 
> Patches 1-9 do the required device initialization. Specifically, patches 1-6
> initialize the ports' headroom buffers, which are used at ingress to store
> incoming packets while they go through the switch's pipeline. Patches 7-9
> complete them by initializing the egress scheduling.
> 
> The pipeline mentioned above determines the packet's egress port(s) and
> traffic class. Ideally, once out of the pipeline the packet moves to the
> switch's shared buffer (to be introduced in Jiri's patchset, currently
> default values are used) and scheduled for transmission according to its
> traffic class. The egress scheduling is configured according to the 802.1Qaz
> standard, which is part of the DCB infrastructure supported by Linux. This
> is introduced in patches 10-12.
> 
> Even after going through the pipeline packets are not always eligible to
> enter the shared buffer. This is determined by the amount of available space
> and the quotas associated with the packet. However, if flow control is
> enabled and the packet is associated with the lossless flow, then it will
> stay in the headroom and won't be discarded. This is introduced in patches
> 13-17.
> 
> Please check individual commit messages for more info, as I tried to keep
> them pretty detailed.

Series applied, thanks guys.

^ permalink raw reply

* Re: [PATCH] macvlan: Support interface operstate properly
From: Nikolay Aleksandrov @ 2016-04-06 21:27 UTC (permalink / raw)
  To: Debabrata Banerjee, Patrick McHardy, netdev
In-Reply-To: <570579AA.6050400@cumulusnetworks.com>

On 04/06/2016 11:03 PM, Nikolay Aleksandrov wrote:
> On 04/06/2016 10:30 PM, Debabrata Banerjee wrote:
>> Set appropriate macvlan interface status based on lower device and our
>> status. Can be up, down, or lowerlayerdown.
>>
>> Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
>>
> 
> May I ask what is exactly that you're fixing here ? I recently had to make macvlan's
> operstates more accurate and I haven't experienced any wrong behaviour since commit
> de7d244d0a35 ("macvlan: make operstate and carrier more accurate").
> Also it's the linkwatch's job to take care for the proper operstate, we can use
> netif_stacked_transfer_operstate to help it, but I don't think directly setting
> operstate is a good idea.

Misread part of the change, got it now. My comment below still stands though,

> One more thing - you cannot use netdev_state_change() under the write_lock as it
> may sleep.
> 

^ permalink raw reply

* Re: [PATCH] macvlan: Support interface operstate properly
From: Banerjee, Debabrata @ 2016-04-06 21:26 UTC (permalink / raw)
  To: Nikolay Aleksandrov, Patrick McHardy, netdev@vger.kernel.org
In-Reply-To: <570579AA.6050400@cumulusnetworks.com>

On 4/6/16, 5:03 PM, "Nikolay Aleksandrov" <nikolay@cumulusnetworks.com> wrote:


>On 04/06/2016 10:30 PM, Debabrata Banerjee wrote:
>> Set appropriate macvlan interface status based on lower device and our
>> status. Can be up, down, or lowerlayerdown.
>> 
>> Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
>> 
>
>May I ask what is exactly that you're fixing here ? I recently had to make macvlan's
>operstates more accurate and I haven't experienced any wrong behaviour since commit
>de7d244d0a35 ("macvlan: make operstate and carrier more accurate").

Yes I saw the other patch, it's an improvement from when I started working on this. 


>Also it's the linkwatch's job to take care for the proper operstate, we can use
>netif_stacked_transfer_operstate to help it, but I don't think directly setting
>operstate is a good idea.

This patch was modeled after __hsr_set_operstate(). But I agree there's probably
better ways to do it. I'm not sure why netif_stacked_transfer_operstate() doesn't do
it itself, although in the case of a layered device, my patch actually uses the other
possible state, which is lowerlayerdown. Without the patch operstate goes directly to
down.

>
>One more thing - you cannot use netdev_state_change() under the write_lock as it
>may sleep.

You're right, I can resubmit moving the call out of the critical section, if the patch
will be taken.


^ permalink raw reply

* Re: [RFC PATCH 0/2] selinux: avoid nf hooks overhead when not needed
From: Casey Schaufler @ 2016-04-06 21:37 UTC (permalink / raw)
  To: Paolo Abeni, linux-security-module
  Cc: David S. Miller, James Morris, Paul Moore, Andreas Gruenbacher,
	Stephen Smalley, Florian Westphal, netdev
In-Reply-To: <cover.1459934322.git.pabeni@redhat.com>

On 4/6/2016 2:51 AM, Paolo Abeni wrote:
> Currently, selinux always registers iptables POSTROUTING hooks regarless of
> the running policy needs for any action to be performed by them.
>
> Even the socket_sock_rcv_skb() is always registered, but it can result in a no-op
> depending on the current policy configuration.
>
> The above invocations in the kernel datapath are cause of measurable
> overhead in networking performance test.
>
> This patch series adds explicit notification for netlabel status change 
> (other relevant status change, like xfrm and secmark, are already notified to
> LSM) and use this information in selinux to register the above hooks only when
> the current status makes them relevant, deregistering them when no-op
>
> Avoiding the LSM hooks overhead, in netperf UDP_STREAM test with small packets,
> gives about 5% performance improvement on rx and about 8% on tx.
>
> Paolo Abeni (2):
>   security: add hook for netlabel status change notification
>   selinux: implement support for dynamic net hook [de-]registration
>
>  include/linux/lsm_hooks.h           |  6 ++++
>  include/linux/security.h            |  5 +++
>  net/netlabel/netlabel_cipso_v4.c    |  8 +++--
>  net/netlabel/netlabel_unlabeled.c   |  5 ++-
>  security/security.c                 |  7 ++++
>  security/selinux/hooks.c            | 72 +++++++++++++++++++++++++++++++------
>  security/selinux/include/security.h |  1 +
>  security/selinux/ss/services.c      |  1 +
>  security/selinux/xfrm.c             |  4 +++
>  9 files changed, 96 insertions(+), 13 deletions(-)
>
Is there a patch 1/2?


^ permalink raw reply

* Re: [PATCH] macvlan: Support interface operstate properly
From: Nikolay Aleksandrov @ 2016-04-06 21:42 UTC (permalink / raw)
  To: Banerjee, Debabrata, Patrick McHardy, netdev@vger.kernel.org
In-Reply-To: <53A6C2CA-9C49-4F85-97F6-DD60276AFCC2@akamai.com>

On 04/06/2016 11:26 PM, Banerjee, Debabrata wrote:
> On 4/6/16, 5:03 PM, "Nikolay Aleksandrov" <nikolay@cumulusnetworks.com> wrote:
> 
> 
>> On 04/06/2016 10:30 PM, Debabrata Banerjee wrote:
>>> Set appropriate macvlan interface status based on lower device and our
>>> status. Can be up, down, or lowerlayerdown.
>>>
>>> Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
>>>
>>
>> May I ask what is exactly that you're fixing here ? I recently had to make macvlan's
>> operstates more accurate and I haven't experienced any wrong behaviour since commit
>> de7d244d0a35 ("macvlan: make operstate and carrier more accurate").
> 
> Yes I saw the other patch, it's an improvement from when I started working on this. 
> 
> 
>> Also it's the linkwatch's job to take care for the proper operstate, we can use
>> netif_stacked_transfer_operstate to help it, but I don't think directly setting
>> operstate is a good idea.
> 
> This patch was modeled after __hsr_set_operstate(). But I agree there's probably
> better ways to do it. I'm not sure why netif_stacked_transfer_operstate() doesn't do
> it itself, although in the case of a layered device, my patch actually uses the other
> possible state, which is lowerlayerdown. Without the patch operstate goes directly to
> down.
> 
>>
>> One more thing - you cannot use netdev_state_change() under the write_lock as it
>> may sleep.
> 
> You're right, I can resubmit moving the call out of the critical section, if the patch
> will be taken.
> 

I don't know if it'll be taken, but you can submit v2 for review. I'll review and
test it tomorrow as it's late here and I'm tired. :-)
Since this is not a bug fix, I'd suggest to target net-next and you
don't have to CC linux-kernel@vger.kernel.org.

Thanks for the explanation, I misread part of the patch at first and was confused,
but I got the idea now.

Cheers,
 Nik

^ permalink raw reply

* Re: [PATCH net-next 0/7] sctp: support sctp_diag in kernel
From: marcelo.leitner @ 2016-04-06 21:42 UTC (permalink / raw)
  To: David Miller; +Cc: lucien.xin, netdev, linux-sctp, vyasevich, daniel
In-Reply-To: <20160406.161312.1132930117025273561.davem@davemloft.net>

On Wed, Apr 06, 2016 at 04:13:12PM -0400, David Miller wrote:
> From: Xin Long <lucien.xin@gmail.com>
> Date: Tue,  5 Apr 2016 12:06:25 +0800
> 
> > This patchset will add sctp_diag module to implement diag interface on
> > sctp in kernel.
>  ...
> 
> This series looks generally fine to me, but I'd like to see some review from
> SCTP experts before I apply this series.
> 
> Thanks.

Will finish reviewing it tomorrow. Thanks for reviewing it already

^ permalink raw reply

* Re: [RFC PATCH 0/2] selinux: avoid nf hooks overhead when not needed
From: Casey Schaufler @ 2016-04-06 21:43 UTC (permalink / raw)
  To: Paolo Abeni, linux-security-module
  Cc: David S. Miller, James Morris, Paul Moore, Andreas Gruenbacher,
	Stephen Smalley, Florian Westphal, netdev
In-Reply-To: <cover.1459934322.git.pabeni@redhat.com>

On 4/6/2016 2:51 AM, Paolo Abeni wrote:
> Currently, selinux always registers iptables POSTROUTING hooks regarless of
> the running policy needs for any action to be performed by them.
>
> Even the socket_sock_rcv_skb() is always registered, but it can result in a no-op
> depending on the current policy configuration.
>
> The above invocations in the kernel datapath are cause of measurable
> overhead in networking performance test.
>
> This patch series adds explicit notification for netlabel status change 
> (other relevant status change, like xfrm and secmark, are already notified to
> LSM) and use this information in selinux to register the above hooks only when
> the current status makes them relevant, deregistering them when no-op
>
> Avoiding the LSM hooks overhead, in netperf UDP_STREAM test with small packets,
> gives about 5% performance improvement on rx and about 8% on tx.
>
> Paolo Abeni (2):
>   security: add hook for netlabel status change notification
>   selinux: implement support for dynamic net hook [de-]registration

Did you consider the fact that netlabel and the LSM socket
hooks are used by Smack as well as SELinux? Did you measure the
impact that your changes have on Smack? Does Smack even work
with your changes?

>
>  include/linux/lsm_hooks.h           |  6 ++++
>  include/linux/security.h            |  5 +++
>  net/netlabel/netlabel_cipso_v4.c    |  8 +++--
>  net/netlabel/netlabel_unlabeled.c   |  5 ++-
>  security/security.c                 |  7 ++++
>  security/selinux/hooks.c            | 72 +++++++++++++++++++++++++++++++------
>  security/selinux/include/security.h |  1 +
>  security/selinux/ss/services.c      |  1 +
>  security/selinux/xfrm.c             |  4 +++
>  9 files changed, 96 insertions(+), 13 deletions(-)
>

^ permalink raw reply

* Re: [RFC PATCH 0/2] selinux: avoid nf hooks overhead when not needed
From: Paul Moore @ 2016-04-06 21:43 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Paolo Abeni, linux-security-module, David S. Miller, James Morris,
	Andreas Gruenbacher, Stephen Smalley, Florian Westphal, netdev
In-Reply-To: <5705819A.3030809@schaufler-ca.com>

On Wed, Apr 6, 2016 at 5:37 PM, Casey Schaufler <casey@schaufler-ca.com> wrote:
> On 4/6/2016 2:51 AM, Paolo Abeni wrote:
>> Currently, selinux always registers iptables POSTROUTING hooks regarless of
>> the running policy needs for any action to be performed by them.
>>
>> Even the socket_sock_rcv_skb() is always registered, but it can result in a no-op
>> depending on the current policy configuration.
>>
>> The above invocations in the kernel datapath are cause of measurable
>> overhead in networking performance test.
>>
>> This patch series adds explicit notification for netlabel status change
>> (other relevant status change, like xfrm and secmark, are already notified to
>> LSM) and use this information in selinux to register the above hooks only when
>> the current status makes them relevant, deregistering them when no-op
>>
>> Avoiding the LSM hooks overhead, in netperf UDP_STREAM test with small packets,
>> gives about 5% performance improvement on rx and about 8% on tx.
>>
>> Paolo Abeni (2):
>>   security: add hook for netlabel status change notification
>>   selinux: implement support for dynamic net hook [de-]registration
>>
>>  include/linux/lsm_hooks.h           |  6 ++++
>>  include/linux/security.h            |  5 +++
>>  net/netlabel/netlabel_cipso_v4.c    |  8 +++--
>>  net/netlabel/netlabel_unlabeled.c   |  5 ++-
>>  security/security.c                 |  7 ++++
>>  security/selinux/hooks.c            | 72 +++++++++++++++++++++++++++++++------
>>  security/selinux/include/security.h |  1 +
>>  security/selinux/ss/services.c      |  1 +
>>  security/selinux/xfrm.c             |  4 +++
>>  9 files changed, 96 insertions(+), 13 deletions(-)
>>
>
> Is there a patch 1/2?

Yes, there was (it was the "security: add hook ..." patch), but for
some reason it hasn't hit the archive that I normally use.  Odd.

I'll fwd the patch to you off-list so as not to spam everyone again.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH net-next] net: intel: remove dead links
From: Jeff Kirsher @ 2016-04-06 22:08 UTC (permalink / raw)
  To: David Miller, jbenc
  Cc: netdev, jesse.brandeburg, shannon.nelson, carolyn.wyborny,
	donald.c.skidmore, bruce.w.allan, john.ronciak, mitch.a.williams,
	intel-wired-lan
In-Reply-To: <20160406.165451.42694079252458681.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 1040 bytes --]

On Wed, 2016-04-06 at 16:54 -0400, David Miller wrote:
> From: Jiri Benc <jbenc@redhat.com>
> Date: Tue,  5 Apr 2016 16:25:07 +0200
> 
> > 
> > The Kconfig for Intel NICs references two different URLs for the
> > "Adapter
> > & Driver ID Guide". Neither of those two links works. The current
> > URL seems
> > to be
> > http://www.intel.com/content/www/us/en/support/network-and-i-o/ethe
> > rnet-products/000005584.html
> > but given it's apparently constantly changing, there's no point in
> > having it
> > in the help text.
> > 
> > Just keep a generic pointer to http://support.intel.com. Hopefully,
> > this one
> > will have a longer live. It still works, at least.
> > 
> > Futhermore, remove a link to "the latest Intel PRO/100 network
> > driver for
> > Linux", this has no place in the mainline kernel and the latest
> > Linux driver
> > it offers is from 2006, anyway.
> > 
> > Signed-off-by: Jiri Benc <jbenc@redhat.com>
> I expect Jeff to pull this into his tree, thanks.

Yep, got it queued up.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [RFC PATCH 0/2] selinux: avoid nf hooks overhead when not needed
From: Florian Westphal @ 2016-04-06 22:14 UTC (permalink / raw)
  To: Paul Moore
  Cc: Paolo Abeni, linux-security-module, David S. Miller, James Morris,
	Andreas Gruenbacher, Stephen Smalley, Florian Westphal, netdev,
	selinux
In-Reply-To: <CAHC9VhS6aJdPR6xTH2-ehikS5qvj6jFbZAUtzoBXp+WC9Ugi=Q@mail.gmail.com>

Paul Moore <paul@paul-moore.com> wrote:
> On Wed, Apr 6, 2016 at 5:51 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> > Currently, selinux always registers iptables POSTROUTING hooks regarless of
> > the running policy needs for any action to be performed by them.
> >
> > Even the socket_sock_rcv_skb() is always registered, but it can result in a no-op
> > depending on the current policy configuration.
> >
> > The above invocations in the kernel datapath are cause of measurable
> > overhead in networking performance test.
> >
> > This patch series adds explicit notification for netlabel status change
> > (other relevant status change, like xfrm and secmark, are already notified to
> > LSM) and use this information in selinux to register the above hooks only when
> > the current status makes them relevant, deregistering them when no-op
> >
> > Avoiding the LSM hooks overhead, in netperf UDP_STREAM test with small packets,
> > gives about 5% performance improvement on rx and about 8% on tx.
> 
> [NOTE: added the SELinux mailing list to the CC line, please include
> when submitting SELinux patches]
> 
> While I appreciate the patch and the work that went into development
> and testing, I'm going to reject this patch on the grounds that it
> conflicts with work we've just started thinking about which should
> bring some tangible security benefit.
> 
> The recent addition of post-init read only memory opens up some
> interesting possibilities for SELinux and LSMs in general, the thing
> which we've just started looking at is marking the LSM hook structure
> read only after init.  There are some complicating factors for
> SELinux, but I'm confident those can be resolved, and from what I can
> tell marking the hooks read only will have no effect on other LSMs.
> While marking the LSM hook structure doesn't directly affect the
> SELinux netfilter hooks, once we remove the ability to deregister the
> LSM hooks we will have no need to support deregistering netfilter
> hooks and I expect we will drop that functionality as well to help
> decrease the risk of tampering.

netfilter hooks are per namespace -- so there is hook unregister when
netns is destroyed.

Do you think it makes sense to rework the patch to delay registering
of the netfiler hooks until the system is in a state where they're
needed, without the 'unregister' aspect?

Ideally this would even be per netns -- in perfect world we would
be able to make it so that a new netns are created with an empty
hook list.

Thanks.

^ permalink raw reply

* [PATCH net-next v2] macvlan: Support interface operstate properly
From: Debabrata Banerjee @ 2016-04-06 22:36 UTC (permalink / raw)
  To: Nikolay Aleksandrov, Patrick McHardy, netdev; +Cc: Debabrata Banerjee

Set appropriate macvlan interface status based on lower device and our
status. Can be up, down, or lowerlayerdown.

de7d244d0 improved operstate by setting it from unknown to up, however
it did not handle transferring down or lowerlayerdown.

Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
---
v2: Fix locking and update commit message

 drivers/net/macvlan.c | 47 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 2bcf1f3..306124ba 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -91,6 +91,7 @@ static struct macvlan_port *macvlan_port_get_rtnl(const struct net_device *dev)
 }
 
 #define macvlan_port_exists(dev) (dev->priv_flags & IFF_MACVLAN_PORT)
+#define is_macvlan(dev) (dev->priv_flags & IFF_MACVLAN)
 
 static struct macvlan_dev *macvlan_hash_lookup(const struct macvlan_port *port,
 					       const unsigned char *addr)
@@ -1242,6 +1243,28 @@ static int macvlan_changelink_sources(struct macvlan_dev *vlan, u32 mode,
 	return 0;
 }
 
+static void macvlan_set_operstate(struct net_device *lowerdev,
+				  struct net_device *dev)
+{
+	unsigned char newstate = dev->operstate;
+
+	if (!(dev->flags & IFF_UP))
+		newstate = IF_OPER_DOWN;
+	else if ((lowerdev->flags & IFF_UP) && netif_oper_up(lowerdev))
+		newstate = IF_OPER_UP;
+	else
+		newstate = IF_OPER_LOWERLAYERDOWN;
+
+	write_lock_bh(&dev_base_lock);
+	if (dev->operstate != newstate) {
+		dev->operstate = newstate;
+		write_unlock_bh(&dev_base_lock);
+		netdev_state_change(dev);
+	} else {
+		write_unlock_bh(&dev_base_lock);
+	}
+}
+
 int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
 			   struct nlattr *tb[], struct nlattr *data[])
 {
@@ -1324,6 +1347,7 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
 
 	list_add_tail_rcu(&vlan->list, &port->vlans);
 	netif_stacked_transfer_operstate(lowerdev, dev);
+	macvlan_set_operstate(lowerdev, dev);
 	linkwatch_fire_event(dev);
 
 	return 0;
@@ -1518,17 +1542,36 @@ static int macvlan_device_event(struct notifier_block *unused,
 	struct macvlan_port *port;
 	LIST_HEAD(list_kill);
 
-	if (!macvlan_port_exists(dev))
+	if (!macvlan_port_exists(dev) && !is_macvlan(dev))
+		return NOTIFY_DONE;
+
+	if (is_macvlan(dev)) {
+		vlan = netdev_priv(dev);
+
+		switch (event) {
+		case NETDEV_UP:
+		case NETDEV_DOWN:
+		case NETDEV_CHANGE:
+			netif_stacked_transfer_operstate(vlan->lowerdev,
+							 vlan->dev);
+			macvlan_set_operstate(vlan->lowerdev, vlan->dev);
+			break;
+		}
+
 		return NOTIFY_DONE;
+	}
 
 	port = macvlan_port_get_rtnl(dev);
 
 	switch (event) {
 	case NETDEV_UP:
+	case NETDEV_DOWN:
 	case NETDEV_CHANGE:
-		list_for_each_entry(vlan, &port->vlans, list)
+		list_for_each_entry(vlan, &port->vlans, list) {
 			netif_stacked_transfer_operstate(vlan->lowerdev,
 							 vlan->dev);
+			macvlan_set_operstate(vlan->lowerdev, vlan->dev);
+		}
 		break;
 	case NETDEV_FEAT_CHANGE:
 		list_for_each_entry(vlan, &port->vlans, list) {
-- 
2.8.0

^ permalink raw reply related

* Re: [RFC PATCH 2/2] selinux: implement support for dynamic net hook [de-]registration
From: Casey Schaufler @ 2016-04-06 22:32 UTC (permalink / raw)
  To: Paolo Abeni, linux-security-module
  Cc: David S. Miller, James Morris, Paul Moore, Andreas Gruenbacher,
	Stephen Smalley, Florian Westphal, netdev
In-Reply-To: <b7f1a4578e0617c7361dd392282e03556aa37b46.1459934322.git.pabeni@redhat.com>

On 4/6/2016 2:51 AM, Paolo Abeni wrote:
> This patch leverage the netlbl_changed() hook to perform on demand
> registration and deregistration of the netfilter hooks and the
> socket_sock_rcv_skb hook.
>
> With default policy and empty netfilter/netlabel configuration, the
> above hooks are not registered and this allows avoiding nf_hook_slow
> in the xmit path and socket_sock_rcv_skb() in the rx path.

There is no reason to assume that there is a relationship between
a netlabel configuration and a netfilter configuration. Smack always
has a netlabel configuration. Security modules (e.g. AppArmor) may
well use netfilter without netlabel.

Please stop assuming that security == SELinux. 

>
> This gives measurable network performance improvement in both
> directions.

In the case where SELinux is enabled and netfilter is not.

> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  security/selinux/hooks.c            | 72 +++++++++++++++++++++++++++++++------
>  security/selinux/include/security.h |  1 +
>  security/selinux/ss/services.c      |  1 +
>  security/selinux/xfrm.c             |  4 +++
>  4 files changed, 68 insertions(+), 10 deletions(-)
>
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 912deee..a3baa69 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -4745,11 +4745,13 @@ static int selinux_secmark_relabel_packet(u32 sid)
>  static void selinux_secmark_refcount_inc(void)
>  {
>  	atomic_inc(&selinux_secmark_refcount);
> +	selinux_net_update();
>  }
>  
>  static void selinux_secmark_refcount_dec(void)
>  {
>  	atomic_dec(&selinux_secmark_refcount);
> +	selinux_net_update();
>  }
>  
>  static void selinux_req_classify_flow(const struct request_sock *req,
> @@ -4836,6 +4838,11 @@ static int selinux_tun_dev_open(void *security)
>  	return 0;
>  }
>  
> +static void selinux_netlbl_changed(void)
> +{
> +	selinux_net_update();
> +}
> +
>  static int selinux_nlmsg_perm(struct sock *sk, struct sk_buff *skb)
>  {
>  	int err = 0;
> @@ -6091,7 +6098,6 @@ static struct security_hook_list selinux_hooks[] = {
>  	LSM_HOOK_INIT(socket_getsockopt, selinux_socket_getsockopt),
>  	LSM_HOOK_INIT(socket_setsockopt, selinux_socket_setsockopt),
>  	LSM_HOOK_INIT(socket_shutdown, selinux_socket_shutdown),
> -	LSM_HOOK_INIT(socket_sock_rcv_skb, selinux_socket_sock_rcv_skb),
>  	LSM_HOOK_INIT(socket_getpeersec_stream,
>  			selinux_socket_getpeersec_stream),
>  	LSM_HOOK_INIT(socket_getpeersec_dgram, selinux_socket_getpeersec_dgram),
> @@ -6113,6 +6119,7 @@ static struct security_hook_list selinux_hooks[] = {
>  	LSM_HOOK_INIT(tun_dev_attach_queue, selinux_tun_dev_attach_queue),
>  	LSM_HOOK_INIT(tun_dev_attach, selinux_tun_dev_attach),
>  	LSM_HOOK_INIT(tun_dev_open, selinux_tun_dev_open),
> +	LSM_HOOK_INIT(netlbl_changed, selinux_netlbl_changed),
>  
>  #ifdef CONFIG_SECURITY_NETWORK_XFRM
>  	LSM_HOOK_INIT(xfrm_policy_alloc_security, selinux_xfrm_policy_alloc),
> @@ -6145,6 +6152,11 @@ static struct security_hook_list selinux_hooks[] = {
>  #endif
>  };
>  
> +/* dynamically registered/unregisterd */
> +static struct security_hook_list selinux_sock_hooks[] = {
> +	LSM_HOOK_INIT(socket_sock_rcv_skb, selinux_socket_sock_rcv_skb),
> +};
> +
>  static __init int selinux_init(void)
>  {
>  	if (!security_module_enable("selinux")) {
> @@ -6240,7 +6252,9 @@ static struct nf_hook_ops selinux_nf_ops[] = {
>  #endif	/* IPV6 */
>  };
>  
> -static int __init selinux_nf_ip_init(void)
> +static bool nf_hooks_registered;
> +
> +static int selinux_nf_ip_init(void)
>  {
>  	int err;
>  
> @@ -6253,25 +6267,21 @@ static int __init selinux_nf_ip_init(void)
>  	if (err)
>  		panic("SELinux: nf_register_hooks: error %d\n", err);
>  
> +	nf_hooks_registered = true;
>  	return 0;
>  }
>  
> -__initcall(selinux_nf_ip_init);
> -
> -#ifdef CONFIG_SECURITY_SELINUX_DISABLE
>  static void selinux_nf_ip_exit(void)
>  {
>  	printk(KERN_DEBUG "SELinux:  Unregistering netfilter hooks\n");
>  
>  	nf_unregister_hooks(selinux_nf_ops, ARRAY_SIZE(selinux_nf_ops));
> +	nf_hooks_registered = false;
>  }
> -#endif
>  
>  #else /* CONFIG_NETFILTER */
>  
> -#ifdef CONFIG_SECURITY_SELINUX_DISABLE
>  #define selinux_nf_ip_exit()
> -#endif
>  
>  #endif /* CONFIG_NETFILTER */
>  
> @@ -6300,8 +6310,8 @@ int selinux_disable(void)
>  	/* Try to destroy the avc node cache */
>  	avc_disable();
>  
> -	/* Unregister netfilter hooks. */
> -	selinux_nf_ip_exit();
> +	/* Unregister net hooks. */
> +	selinux_net_update();
>  
>  	/* Unregister selinuxfs. */
>  	exit_sel_fs();
> @@ -6309,3 +6319,45 @@ int selinux_disable(void)
>  	return 0;
>  }
>  #endif
> +
> +DEFINE_MUTEX(selinux_net_mutex);
> +
> +static bool nf_hooks_required(void)
> +{
> +	return (selinux_secmark_enabled() || selinux_peerlbl_enabled() ||
> +		!selinux_policycap_netpeer) && selinux_enabled;
> +}
> +
> +static bool sock_hooks_required(void)
> +{
> +	return (!selinux_policycap_netpeer || selinux_secmark_enabled() ||
> +		selinux_peerlbl_enabled()) && selinux_enabled;
> +}
> +
> +static bool sock_hooks_registered;
> +
> +void selinux_net_update(void)
> +{
> +	if ((nf_hooks_required() == nf_hooks_registered) &&
> +	    (sock_hooks_required() == sock_hooks_registered))
> +		return;
> +
> +	mutex_lock(&selinux_net_mutex);
> +	if (nf_hooks_required() != nf_hooks_registered) {
> +		if (!nf_hooks_registered)
> +			selinux_nf_ip_init();
> +		else
> +			selinux_nf_ip_exit();
> +	}
> +
> +	if (sock_hooks_required() != sock_hooks_registered) {
> +		if (!sock_hooks_registered)
> +			security_add_hooks(selinux_sock_hooks,
> +					   ARRAY_SIZE(selinux_sock_hooks));
> +		else
> +			security_delete_hooks(selinux_sock_hooks,
> +					      ARRAY_SIZE(selinux_sock_hooks));
> +		sock_hooks_registered = !sock_hooks_registered;
> +	}
> +	mutex_unlock(&selinux_net_mutex);
> +}
> diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h
> index 38feb55..0428ab4 100644
> --- a/security/selinux/include/security.h
> +++ b/security/selinux/include/security.h
> @@ -261,6 +261,7 @@ extern void selinux_status_update_setenforce(int enforcing);
>  extern void selinux_status_update_policyload(int seqno);
>  extern void selinux_complete_init(void);
>  extern int selinux_disable(void);
> +extern void selinux_net_update(void);
>  extern void exit_sel_fs(void);
>  extern struct path selinux_null;
>  extern struct vfsmount *selinuxfs_mount;
> diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
> index ebda973..c509018 100644
> --- a/security/selinux/ss/services.c
> +++ b/security/selinux/ss/services.c
> @@ -2016,6 +2016,7 @@ static void security_load_policycaps(void)
>  						  POLICYDB_CAPABILITY_OPENPERM);
>  	selinux_policycap_alwaysnetwork = ebitmap_get_bit(&policydb.policycaps,
>  						  POLICYDB_CAPABILITY_ALWAYSNETWORK);
> +	selinux_net_update();
>  }
>  
>  static int security_preserve_bools(struct policydb *p);
> diff --git a/security/selinux/xfrm.c b/security/selinux/xfrm.c
> index 56e354f..cc2b0d4 100644
> --- a/security/selinux/xfrm.c
> +++ b/security/selinux/xfrm.c
> @@ -112,6 +112,7 @@ static int selinux_xfrm_alloc_user(struct xfrm_sec_ctx **ctxp,
>  
>  	*ctxp = ctx;
>  	atomic_inc(&selinux_xfrm_refcount);
> +	selinux_net_update();
>  	return 0;
>  
>  err:
> @@ -128,6 +129,7 @@ static void selinux_xfrm_free(struct xfrm_sec_ctx *ctx)
>  		return;
>  
>  	atomic_dec(&selinux_xfrm_refcount);
> +	selinux_net_update();
>  	kfree(ctx);
>  }
>  
> @@ -303,6 +305,7 @@ int selinux_xfrm_policy_clone(struct xfrm_sec_ctx *old_ctx,
>  	if (!new_ctx)
>  		return -ENOMEM;
>  	atomic_inc(&selinux_xfrm_refcount);
> +	selinux_net_update();
>  	*new_ctxp = new_ctx;
>  
>  	return 0;
> @@ -370,6 +373,7 @@ int selinux_xfrm_state_alloc_acquire(struct xfrm_state *x,
>  
>  	x->security = ctx;
>  	atomic_inc(&selinux_xfrm_refcount);
> +	selinux_net_update();
>  out:
>  	kfree(ctx_str);
>  	return rc;

^ permalink raw reply

* Re: [RFC PATCH 5/5] Add sample for adding simple drop program to link
From: Alexei Starovoitov @ 2016-04-06 23:11 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Brenden Blanco, davem, netdev, tom, Or Gerlitz, daniel,
	john.fastabend
In-Reply-To: <20160406220100.0df04925@redhat.com>

On Wed, Apr 06, 2016 at 10:01:00PM +0200, Jesper Dangaard Brouer wrote:
> On Wed, 6 Apr 2016 21:48:48 +0200
> Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> > If I do multiple flows, via ./pktgen_sample05_flow_per_thread.sh
> > then I hit this strange 14.5Mpps limit (proto 17:   14505558 drops/s).
> > And the RX 4x CPUs are starting to NOT use 100% in softirq, they have
> > some cycles attributed to %idle. (I verified generator is sending at
> > 24Mpps).
...
> > If I change the program to not touch packet data (don't call
> > load_byte()) then the performance increase to 14.6Mpps (single
> > flow/cpu).  And the RX CPU is mostly idle... mlx4_en_process_rx_cq()
> > and page alloc/free functions taking the time.

Please try it with module param log_num_mgm_entry_size=-1
It should get to 20Mpps when bpf doesn't touch the packet.

> Before someone else point out the obvious... I forgot to enable JIT.
> Enable it::
> 
>  # echo 1 > /proc/sys/net/core/bpf_jit_enable
> 
> Performance increased to: 10.8Mpps (proto 17:   10819446 drops/s)
> 
>  Samples: 51K of event 'cycles', Event count (approx.): 56775706510
>    Overhead  Command      Shared Object     Symbol
>  +   55.90%  ksoftirqd/7  [kernel.vmlinux]  [k] sk_load_byte_positive_offset
>  +   10.71%  ksoftirqd/7  [mlx4_en]         [k] mlx4_en_alloc_frags
... 
> It is a very likely cache-miss in sk_load_byte_positive_offset().

yes, likely due to missing ddio as you said.

^ permalink raw reply

* Boot failure when using NFS on OMAP based evms
From: Franklin S Cooper Jr. @ 2016-04-06 23:12 UTC (permalink / raw)
  To: samanthakumar, willemb, edumazet, tony, mugunthanvnm
  Cc: netdev, linux-kernel, davem, kuznet, jmorris, yoshfuji, kaber,
	nsekhar, linux-omap

Hi All,

Currently linux-next is failing to boot via NFS on my AM335x GP evm,
AM437x GP evm and Beagle X15. I bisected the problem down to the commit
"udp: remove headers from UDP packets before queueing".

I had to revert the following three commits to get things working again:

e6afc8ace6dd5cef5e812f26c72579da8806f5ac
udp: remove headers from UDP packets before queueing

627d2d6b550094d88f9e518e15967e7bf906ebbf
udp: enable MSG_PEEK at non-zero offset

b9bb53f3836f4eb2bdeb3447be11042bd29c2408
sock: convert sk_peek_offset functions to WRITE_ONCE


I'm using omap2plus_defconfig for my config.

Below bootlogs are from my AM335x GP evm:

Working
http://pastebin.ubuntu.com/15661989/ (Linux-next 3 patches reverted)

Failing
http://pastebin.ubuntu.com/15661318/ (Linux-next)

^ permalink raw reply

* Re: [RFC PATCH 0/2] selinux: avoid nf hooks overhead when not needed
From: Paul Moore @ 2016-04-06 23:15 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Paolo Abeni, linux-security-module, David S. Miller, James Morris,
	Andreas Gruenbacher, Stephen Smalley, netdev, selinux
In-Reply-To: <20160406221445.GA26807@breakpoint.cc>

On Wed, Apr 6, 2016 at 6:14 PM, Florian Westphal <fw@strlen.de> wrote:
> netfilter hooks are per namespace -- so there is hook unregister when
> netns is destroyed.

Looking around, I see the global and per-namespace registration
functions (nf_register_hook and nf_register_net_hook, respectively),
but I'm looking to see if/how newly created namespace inherit
netfilter hooks from the init network namespace ... if you can create
a network namespace and dodge the SELinux hooks, that isn't a good
thing from a SELinux point of view, although it might be a plus
depending on where you view Paolo's original patches ;)

> Do you think it makes sense to rework the patch to delay registering
> of the netfiler hooks until the system is in a state where they're
> needed, without the 'unregister' aspect?

I would need to see the patch to say for certain, but in principle
that seems perfectly reasonable and I think would satisfy both the
netdev and SELinux camps - good suggestion.  My main goal is to drop
the selinux_nf_ip_init() entirely so it can't be used as a ROP gadget.

We might even be able to trim the secmark_active and peerlbl_active
checks in the SELinux netfilter hooks (an earlier attempt at
optimization; contrary to popular belief, I do care about SELinux
performance), although that would mean that enabling the network
access controls would be one way ... I guess you can disregard that
last bit, I'm thinking aloud again.

> Ideally this would even be per netns -- in perfect world we would
> be able to make it so that a new netns are created with an empty
> hook list.

In general SELinux doesn't care about namespaces, for reasons that are
sorta beyond the scope of this conversation, so I would like to stick
to a all or nothing approach to enabling the SELinux netfilter hooks
across namespaces.  Perhaps we can revisit this at a later time, but
let's keep it simple right now.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [net-next 03/16] fm10k: Avoid crashing the kernel
From: Keller, Jacob E @ 2016-04-06 23:24 UTC (permalink / raw)
  To: davem@davemloft.net, Kirsher, Jeffrey T
  Cc: nhorman@redhat.com, netdev@vger.kernel.org, Allan, Bruce W,
	sassmann@redhat.com, jogreene@redhat.com
In-Reply-To: <20160405.121234.207383895896842448.davem@davemloft.net>

On Tue, 2016-04-05 at 12:12 -0400, David Miller wrote:
> As Joe suggested, it is not reasonable to expect all compilers to be
> able to figure
> out the result of all of the index increments in this function lead
> to a specific
> constant value.
> 
> Your only option is to either keep the code as-is, or add proper
> error reporting to
> this function and to all callers, in order to handle the situation at
> run time which
> I realize is exactly what you are trying to avoid.
> 
> If this crashes at run time with the BUG_ON(), it's going to happen
> really quickly
> when you bring the interface up.  So I don't see 
> the run time check as so tragic.

So we're ok with just not changing this then, and living with a crash?
That's fine with me, I suppose.

Should the second WARN_ONCE be changed to a BUG_ON as well to get a
crash instead of just a warning? 

Thanks,
Jake

^ permalink raw reply

* Re: [RFC PATCH 0/2] selinux: avoid nf hooks overhead when not needed
From: Florian Westphal @ 2016-04-06 23:45 UTC (permalink / raw)
  To: Paul Moore
  Cc: Florian Westphal, Paolo Abeni, linux-security-module,
	David S. Miller, James Morris, Andreas Gruenbacher,
	Stephen Smalley, netdev, selinux
In-Reply-To: <CAHC9VhSZP1QxT-pQeHAW9m1xmyDsbFeA=9b8SdeWgS0RiDTOBA@mail.gmail.com>

Paul Moore <paul@paul-moore.com> wrote:
> On Wed, Apr 6, 2016 at 6:14 PM, Florian Westphal <fw@strlen.de> wrote:
> > netfilter hooks are per namespace -- so there is hook unregister when
> > netns is destroyed.
> 
> Looking around, I see the global and per-namespace registration
> functions (nf_register_hook and nf_register_net_hook, respectively),
> but I'm looking to see if/how newly created namespace inherit
> netfilter hooks from the init network namespace ... if you can create
> a network namespace and dodge the SELinux hooks, that isn't a good
> thing from a SELinux point of view, although it might be a plus
> depending on where you view Paolo's original patches ;)

Heh :-)

If you use nf_register_net_hook, the hook is only registered in the
namespace.

If you use nf_register_hook, the hook is put on a global list and
registed in all existing namespaces.

New namespaces will have the hook added as well (see
netfilter_net_init -> nf_register_hook_list in netfilter/core.c )

Since nf_register_hook is used it should be impossible to get a netns
that doesn't call these hooks.

> > Do you think it makes sense to rework the patch to delay registering
> > of the netfiler hooks until the system is in a state where they're
> > needed, without the 'unregister' aspect?
> 
> I would need to see the patch to say for certain, but in principle
> that seems perfectly reasonable and I think would satisfy both the
> netdev and SELinux camps - good suggestion.  My main goal is to drop
> the selinux_nf_ip_init() entirely so it can't be used as a ROP gadget.
> 
> We might even be able to trim the secmark_active and peerlbl_active
> checks in the SELinux netfilter hooks (an earlier attempt at
> optimization; contrary to popular belief, I do care about SELinux
> performance), although that would mean that enabling the network
> access controls would be one way ... I guess you can disregard that
> last bit, I'm thinking aloud again.

One way is fine I think.

> > Ideally this would even be per netns -- in perfect world we would
> > be able to make it so that a new netns are created with an empty
> > hook list.
> 
> In general SELinux doesn't care about namespaces, for reasons that are
> sorta beyond the scope of this conversation, so I would like to stick
> to a all or nothing approach to enabling the SELinux netfilter hooks
> across namespaces.  Perhaps we can revisit this at a later time, but
> let's keep it simple right now.

Okay, I'd prefer to stick to your recommendation anyway wrt. to selinux
(Casey, I read your comment regarding smack. Noted, we don't want to
break smack either...)

I think that in this case the entire question is:

In your experience, how likely is a config where
selinux is enabled BUT the hooks are not needed (i.e., where we hit the

if (!selinux_policycap_netpeer)
    return NF_ACCEPT;

if (!secmark_active && !peerlbl_active)
   return NF_ACCEPT;

tests inside the hooks)?  If such setups are uncommon we should just
drop this idea or at least put it on the back burner until the more expensive
netfilter hooks (conntrack, cough) are out of the way.

Thanks,
Florian

^ permalink raw reply

* panic in __inet_hash
From: David Ahern @ 2016-04-07  0:02 UTC (permalink / raw)
  To: netdev@vger.kernel.org, edumazet

Rebased to top of tree a few minutes ago to test v6 of the multipath 
patch. I hitting a panic in __inet_hash:

[   17.264247] BUG: unable to handle kernel paging request at 
ffffffffffffffa8
[   17.265015] IP: [<ffffffff816e2b3c>] __inet_hash+0x11c/0x1c0
[   17.265015] PGD 1e07067 PUD 1e09067 PMD 0
[   17.265015] Oops: 0000 [#1] SMP
[   17.265015] Modules linked in:
[   17.265015] CPU: 0 PID: 1488 Comm: zebra Not tainted 4.6.0-rc1+ #5
[   17.265015] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 
04/01/2014
[   17.265015] task: ffff880011010800 ti: ffff880017bb0000 task.ti: 
ffff880017bb0000
[   17.265015] RIP: 0010:[<ffffffff816e2b3c>]  [<ffffffff816e2b3c>] 
__inet_hash+0x11c/0x1c0
[   17.265015] RSP: 0018:ffff880017bb3e48  EFLAGS: 00010293
[   17.265015] RAX: 0000000000000002 RBX: 0000000000000004 RCX: 
0000000000000002
[   17.265015] RDX: 0000000000000000 RSI: ffff880011010fa0 RDI: 
ffff880011010800
[   17.265015] RBP: ffff880017bb3e88 R08: ffffffff8282b080 R09: 
ffffffffffffff98
[   17.265015] R10: 0000000000000000 R11: 0000000000000000 R12: 
ffffffff82ea80c0
[   17.265015] R13: ffffffff82ea7f80 R14: 0000000000000000 R15: 
ffff88001b618000
[   17.265015] FS:  00007f947df9d740(0000) GS:ffff88001fc00000(0000) 
knlGS:0000000000000000
[   17.265015] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   17.265015] CR2: ffffffffffffffa8 CR3: 00000000179d7000 CR4: 
00000000000006f0
[   17.265015] Stack:
[   17.265015]  ffffffff82e95150 0000006e00000005 ffffffff8170b540 
ffff88001b618000
[   17.265015]  0000000000000000 0000000000000003 0000000000000000 
00000000ffffffff
[   17.265015]  ffff880017bb3ea0 ffffffff816e2c19 ffff88001b618000 
ffff880017bb3ec0
[   17.265015] Call Trace:
[   17.265015]  [<ffffffff8170b540>] ? udp4_seq_show+0x150/0x150
[   17.265015]  [<ffffffff816e2c19>] inet_hash+0x39/0x60
[   17.265015]  [<ffffffff816e44dd>] inet_csk_listen_start+0x9d/0xc0
[   17.265015]  [<ffffffff81719c2c>] inet_listen+0x9c/0xe0
[   17.265015]  [<ffffffff8166b5ae>] SyS_listen+0x4e/0x80
[   17.265015]  [<ffffffff81002eac>] do_syscall_64+0x5c/0x2b0
[   17.265015]  [<ffffffff8181829a>] entry_SYSCALL64_slow_path+0x25/0x25
[   17.265015] Code: 4b 54 9e ff 41 f6 c6 01 74 13 e9 88 00 00 00 4d 8b 
36 e8 38 54 9e ff 41 f6 c6 01 75 7a 4d 8d 4e 98 4d 39 cf 74 e9 41 0f b7 
47 10 <66> 41 39 46 a8 75 dd 41 0f b6 46 ab 89 c2 41 32 57 13 83 e2 20
[   17.265015] RIP  [<ffffffff816e2b3c>] __inet_hash+0x11c/0x1c0
[   17.265015]  RSP <ffff880017bb3e48>
[   17.265015] CR2: ffffffffffffffa8
[   17.265015] ---[ end trace d7340320507851f4 ]---



git bisect points to:
dsa@kenny:~/kernel-2.git$ git bisect bad
3b24d854cb35383c30642116e5992fd619bdc9bc is the first bad commit
commit 3b24d854cb35383c30642116e5992fd619bdc9bc
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri Apr 1 08:52:17 2016 -0700

     tcp/dccp: do not touch listener sk_refcnt under synflood

     When a SYNFLOOD targets a non SO_REUSEPORT listener, multiple
     cpus contend on sk->sk_refcnt and sk->sk_wmem_alloc changes.

     By letting listeners use SOCK_RCU_FREE infrastructure,
     we can relax TCP_LISTEN lookup rules and avoid touching sk_refcnt

     Note that we still use SLAB_DESTROY_BY_RCU rules for other sockets,
     only listeners are impacted by this change.

     Peak performance under SYNFLOOD is increased by ~33% :

     On my test machine, I could process 3.2 Mpps instead of 2.4 Mpps

     Most consuming functions are now skb_set_owner_w() and sock_wfree()
     contending on sk->sk_wmem_alloc when cooking SYNACK and freeing them.

     Signed-off-by: Eric Dumazet <edumazet@google.com>
     Signed-off-by: David S. Miller <davem@davemloft.net>

:040000 040000 77d3f9ba3c3ec4c3443416f7536fe6d3ee9ec1a1 
b797845f7bbad79b5875bfa969ebfe9759c0b8b9 M	include
:040000 040000 0325378c93011f87012e5a6c064b09c2d59e052b 
32f90a3ec258f4625765c33d4718b2370e69e862 M	net

^ permalink raw reply

* Re: [PATCH net-next V3 05/16] net: fec: reduce interrupts
From: Troy Kisky @ 2016-04-07  0:42 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, fugang.duan, lznuaa, fabio.estevam, l.stach, andrew,
	tremyfr, gerg, linux-arm-kernel, johannes, stillcompiling,
	sergei.shtylyov, arnd
In-Reply-To: <20160406.172008.266926707628676037.davem@davemloft.net>

On 4/6/2016 2:20 PM, David Miller wrote:
> From: Troy Kisky <troy.kisky@boundarydevices.com>
> Date: Tue,  5 Apr 2016 19:25:51 -0700
> 
>> By clearing the NAPI interrupts in the NAPI routine
>> and not in the interrupt handler, we can reduce the
>> number of interrupts. We also don't need any status
>> variables as the registers are still valid.
>>
>> Also, notice that if budget pkts are received, the
>> next call to fec_enet_rx_napi will now continue to
>> receive the previously pending packets.
>>
>> To test that this actually reduces interrupts, try
>> this command before/after patch
>>
>> cat /proc/interrupts |grep ether; \
>> ping -s2800 192.168.0.201 -f -c1000 ; \
>> cat /proc/interrupts |grep ether
>>
>> For me, before this patch is 2996 interrupts.
>> After patch is 2010 interrupts.
>>
>> Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com>
> 
> I really don't think this is a good idea at all.
> 
> I would instead really rather see you stash away the
> status register values into some piece of software state,
> and then re-read them before you are about to finish a
> NAPI poll cycle.
> 
> 

Sure, that's an easy change. But if a TX int is what caused the interrupt
and masks them, and then a RX packet happens before napi runs, do you want
the TX serviced 1st, or RX ?

^ permalink raw reply

* Re: [PATCH net-next V3 00/16] net: fec: cleanup and fixes
From: Troy Kisky @ 2016-04-07  1:09 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, fugang.duan, lznuaa, fabio.estevam, l.stach, andrew,
	tremyfr, gerg, linux-arm-kernel, johannes, stillcompiling,
	sergei.shtylyov, arnd
In-Reply-To: <20160406.172029.2094647648319882709.davem@davemloft.net>

On 4/6/2016 2:20 PM, David Miller wrote:
> 
> This is a way too large patch series.
> 
> Please split it up into smaller, more logical, pieces.
> 
> Thanks.
> 

If you apply the 1st 3 that have been acked, I'll be at 13.

Would I then send the next 5 for  V4, and when that is applied
send another V4 with the next 8 that have been already been acked?

Thanks
Troy

^ permalink raw reply

* RE: [PATCH net-next V3 00/16] net: fec: cleanup and fixes
From: Fugang Duan @ 2016-04-07  1:23 UTC (permalink / raw)
  To: Troy Kisky, netdev@vger.kernel.org, davem@davemloft.net,
	lznuaa@gmail.com
  Cc: andrew@lunn.ch, stillcompiling@gmail.com, arnd@arndb.de,
	sergei.shtylyov@cogentembedded.com, gerg@uclinux.org,
	Fabio Estevam, johannes@sipsolutions.net, l.stach@pengutronix.de,
	linux-arm-kernel@lists.infradead.org, tremyfr@gmail.com
In-Reply-To: <57053C94.2000009@boundarydevices.com>

From: Troy Kisky <troy.kisky@boundarydevices.com> Sent: Thursday, April 07, 2016 12:43 AM
> To: Fugang Duan <fugang.duan@nxp.com>; netdev@vger.kernel.org;
> davem@davemloft.net; lznuaa@gmail.com
> Cc: Fabio Estevam <fabio.estevam@nxp.com>; l.stach@pengutronix.de;
> andrew@lunn.ch; tremyfr@gmail.com; gerg@uclinux.org; linux-arm-
> kernel@lists.infradead.org; johannes@sipsolutions.net;
> stillcompiling@gmail.com; sergei.shtylyov@cogentembedded.com;
> arnd@arndb.de
> Subject: Re: [PATCH net-next V3 00/16] net: fec: cleanup and fixes
> 
> On 4/6/2016 1:51 AM, Fugang Duan wrote:
> > From: Troy Kisky <troy.kisky@boundarydevices.com> Sent: Wednesday,
> > April 06, 2016 10:26 AM
> >> To: netdev@vger.kernel.org; davem@davemloft.net; Fugang Duan
> >> <fugang.duan@nxp.com>; lznuaa@gmail.com
> >> Cc: Fabio Estevam <fabio.estevam@nxp.com>; l.stach@pengutronix.de;
> >> andrew@lunn.ch; tremyfr@gmail.com; gerg@uclinux.org; linux-arm-
> >> kernel@lists.infradead.org; johannes@sipsolutions.net;
> >> stillcompiling@gmail.com; sergei.shtylyov@cogentembedded.com;
> >> arnd@arndb.de; Troy Kisky <troy.kisky@boundarydevices.com>
> >> Subject: [PATCH net-next V3 00/16] net: fec: cleanup and fixes
> >>
> >> V3 has
> >>
> >> 1 dropped patch "net: fec: print more debug info in fec_timeout"
> >> 2 new patches
> >> 0002-net-fec-remove-unused-interrupt-FEC_ENET_TS_TIMER.patch
> >> 0003-net-fec-return-IRQ_HANDLED-if-fec_ptp_check_pps_even.patch
> >>
> >> 1 combined patch
> >> 0004-net-fec-pass-rxq-txq-to-fec_enet_rx-tx_queue-instead.patch
> >>
> >> The changes are noted on individual patches
> >>
> >> My measured performance of this series is
> >>
> >> before patch set
> >> 365 Mbits/sec Tx/407 RX
> >>
> >> after patch set
> >> 374 Tx/427 Rx
> >>
> >
> > I doubt the performance data,  I validate it on i.MX6q sabresd board on the
> latest commit(4da46cebbd3b) in net tree.
> 
> 
> 
> I was doing UDP tests, as outlined in my V2 cover letter. Also, my cpu is 1G. Is
> yours 1.2G?
> 
It is also 1GHz.

I will double test UDP performance.

> 
> 
> 
> > root@imx6qdlsolo:~# uname -a
> > Linux imx6qdlsolo 4.6.0-rc1-00318-g4da46ce #180 SMP Wed Apr 6 16:24:09
> > CST 2016 armv7l GNU/Linux
> 
> 
> This is the V2 patch that I dropped.
> 
> I will force update my local net-next_master branch, to make testing this series
> easier.
> Note that my local net-next_master branch has about 19 patches on top of this
> series.
> so,
> 
> tkisky@office-server2:~/linux-imx6$ git reset --hard HEAD~19 HEAD is now at
> a125da7 net: fec: don't set cbd_bufaddr unless no mapping error
> 
I see.

> 
> >
> > TCP RX performance is 602Mbps, TX is only 325Mbps,   TX path has some
> performance issue in net tree.
> > I will dig out it.
> >
> >
> 
> 
> More testing is always better. Thanks
> 
> 
> >>
> >> Troy Kisky (16):
> >>   net: fec: only check queue 0 if RXF_0/TXF_0 interrupt is set
> >>   net: fec: remove unused interrupt FEC_ENET_TS_TIMER
> >>   net: fec: return IRQ_HANDLED if fec_ptp_check_pps_event handled it
> >>   net: fec: pass rxq/txq to fec_enet_rx/tx_queue instead of queue_id
> >>   net: fec: reduce interrupts
> >>   net: fec: split off napi routine with 3 queues
> >>   net: fec: don't clear all rx queue bits when just one is being checked
> >>   net: fec: set cbd_sc without relying on previous value
> >>   net: fec: eliminate calls to fec_enet_get_prevdesc
> >>   net: fec: move restart test for efficiency
> >>   net: fec: clear cbd_sc after transmission to help with debugging
> >>   net: fec: dump all tx queues in fec_dump
> >>   net: fec: detect tx int lost
> >>   net: fec: create subroutine reset_tx_queue
> >>   net: fec: call dma_unmap_single on mapped tx buffers at restart
> >>   net: fec: don't set cbd_bufaddr unless no mapping error
> >>
> >>  drivers/net/ethernet/freescale/fec.h      |  10 +-
> >>  drivers/net/ethernet/freescale/fec_main.c | 410
> >> ++++++++++++++++------------
> >> --
> >>  2 files changed, 218 insertions(+), 202 deletions(-)
> >>
> >> --
> >> 2.5.0
> >

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox