Netdev List
* Re: [net-next 6/8] bna: Add Stats Clear Counter
From: Ben Hutchings @ 2012-12-06  2:19 UTC
  To: Rasesh Mody; +Cc: davem, netdev, adapter_linux_open_src_team
In-Reply-To: <1354748470-26293-7-git-send-email-rmody@brocade.com>

On Wed, 2012-12-05 at 15:01 -0800, Rasesh Mody wrote:
> Added Stats clear counter to the bfi_enet_stats_mac structure and ethtool stats
>  
> Signed-off-by: Rasesh Mody <rmody@brocade.com>

Since this structure appears to be part of the firmware interface, you
should combine this (and any other interface changes) with the change to
the requested firmware version (7/8).

Ben.

> ---
>  drivers/net/ethernet/brocade/bna/bfi_enet.h     |    1 +
>  drivers/net/ethernet/brocade/bna/bnad_ethtool.c |    1 +
>  2 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/ethernet/brocade/bna/bfi_enet.h b/drivers/net/ethernet/brocade/bna/bfi_enet.h
> index eef6e1f..7d10e33 100644
> --- a/drivers/net/ethernet/brocade/bna/bfi_enet.h
> +++ b/drivers/net/ethernet/brocade/bna/bfi_enet.h
> @@ -787,6 +787,7 @@ struct bfi_enet_stats_bpc {
>  
>  /* MAC Rx Statistics */
>  struct bfi_enet_stats_mac {
> +	u64 stats_clr_cnt;	/* times this stats cleared */
>  	u64 frame_64;		/* both rx and tx counter */
>  	u64 frame_65_127;		/* both rx and tx counter */
>  	u64 frame_128_255;		/* both rx and tx counter */
> diff --git a/drivers/net/ethernet/brocade/bna/bnad_ethtool.c b/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
> index 40e1e84..455b5a2 100644
> --- a/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
> +++ b/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
> @@ -102,6 +102,7 @@ static const char *bnad_net_stats_strings[BNAD_ETHTOOL_STATS_NUM] = {
>  	"rx_unmap_q_alloc_failed",
>  	"rxbuf_alloc_failed",
>  
> +	"mac_stats_clr_cnt",
>  	"mac_frame_64",
>  	"mac_frame_65_127",
>  	"mac_frame_128_255",

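A simplified sketch of the coupling in this patch, assuming (as the two paired hunks suggest) that the driver reports the firmware's MAC statistics block as a flat array of u64 counters matched to ethtool names purely by position. The struct, names and helper below are hypothetical and cut down to three counters. Because the firmware fills in the block, inserting stats_clr_cnt at offset 0 shifts every later counter, so the driver only reports correct values when paired with firmware that uses the same layout, which is why Ben asks for it to go together with the firmware version change.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down mirror of the firmware's MAC stats block. */
struct demo_stats_mac {
	uint64_t stats_clr_cnt;	/* times these stats were cleared */
	uint64_t frame_64;	/* both rx and tx counter */
	uint64_t frame_65_127;	/* both rx and tx counter */
};

/* ethtool names, in exactly the same order as the struct fields. */
static const char *const demo_stats_strings[] = {
	"mac_stats_clr_cnt",
	"mac_frame_64",
	"mac_frame_65_127",
};

#define DEMO_NUM_STATS (sizeof(struct demo_stats_mac) / sizeof(uint64_t))

/* Compile-time check that the name table and the counters stay in sync. */
_Static_assert(sizeof(demo_stats_strings) / sizeof(demo_stats_strings[0]) ==
	       DEMO_NUM_STATS, "stats strings out of sync with struct");

/* Copy the firmware block into a flat u64 array, the way an ethtool
 * stats handler would before pairing values with names by index. */
static void demo_fill_stats(const struct demo_stats_mac *mac, uint64_t *data)
{
	memcpy(data, mac, sizeof(*mac));
}
```

The _Static_assert catches a name added without a field (or the reverse), but it cannot detect a mismatch between the driver's layout and the layout the running firmware actually writes.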
-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH net-next] net: fix some compiler warning in net/core/neighbour.c
From: Shan Wei @ 2012-12-06  2:39 UTC
  To: Cong Wang; +Cc: netdev, Shan Wei, David S. Miller
In-Reply-To: <1354759444-4937-1-git-send-email-amwang@redhat.com>

Cong Wang said, at 2012/12/6 10:04:
> From: Cong Wang <amwang@redhat.com>
> 
> net/core/neighbour.c:65:12: warning: 'zero' defined but not used [-Wunused-variable]
> net/core/neighbour.c:66:12: warning: 'unres_qlen_max' defined but not used [-Wunused-variable]
> 
> These variables are only used when CONFIG_SYSCTL is defined,
> so move them under #ifdef CONFIG_SYSCTL.
> 
> Reported-by: Fengguang Wu <fengguang.wu@intel.com>
> Cc: Shan Wei <davidshan@tencent.com>
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Cong Wang <amwang@redhat.com>

Rapid response~~~ :-)
The same patch is in my tree, and I saw yours just as I was preparing to submit it.

Acked-by: Shan Wei <davidshan@tencent.com>


Best Regards
Shan Wei

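A minimal user-space sketch of the pattern in Cong's patch, with hypothetical names: values that exist only to bound a sysctl table sit under the same #ifdef CONFIG_SYSCTL as the table, so a build without it has no unused statics left to warn about.

```c
#include <limits.h>

/* Build with -DCONFIG_SYSCTL to get the sysctl-only objects. */
#ifdef CONFIG_SYSCTL
/* Bounds used only by the sysctl table below; outside CONFIG_SYSCTL they
 * would be "defined but not used" (-Wunused-variable). */
static int zero;
static int demo_qlen_max = INT_MAX / 2048;	/* hypothetical bound */

struct demo_ctl {
	const char *name;
	int *min;
	int *max;
};

static struct demo_ctl demo_table[] = {
	{ "unres_qlen", &zero, &demo_qlen_max },
};

int demo_sysctl_entries(void)
{
	return (int)(sizeof(demo_table) / sizeof(demo_table[0]));
}
#else
/* Without sysctl support there is no table and no bounds to declare. */
int demo_sysctl_entries(void)
{
	return 0;
}
#endif
```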

* Re: [GIT PULL] Remove __dev* markings from the networking drivers
From: Greg KH @ 2012-12-06  2:48 UTC
  To: David Miller; +Cc: netdev, wfp5p
In-Reply-To: <20121204.160257.680931337109544105.davem@davemloft.net>

On Tue, Dec 04, 2012 at 04:02:57PM -0500, David Miller wrote:
> From: Greg KH <gregkh@linuxfoundation.org>
> Date: Tue, 4 Dec 2012 12:30:52 -0800
> 
> > On Tue, Dec 04, 2012 at 01:17:26PM -0500, David Miller wrote:
> >> It seems the function declarations were not properly reformatted
> >> after the __dev* tags were removed.  You can't just search and replace
> >> this kind of stuff.  The result looks terrible.
> >> 
> >> Greg please check for things like this next time you send me changes
> >> written by someone else.
> > 
> > Ick, sorry about that.  Want me to fix them all back up?  It's the least
> > I could do.
> 
> If you could do that, I'd really appreciate it.

I don't see these patches in your net-next branch yet, so should I just
make this against this branch and do a new pull request?  Or am I not
looking at net-next properly?

thanks,

greg k-h


* Re: [GIT PULL] Remove __dev* markings from the networking drivers
From: David Miller @ 2012-12-06  2:51 UTC
  To: gregkh; +Cc: netdev, wfp5p
In-Reply-To: <20121206024833.GA8469@kroah.com>

From: Greg KH <gregkh@linuxfoundation.org>
Date: Wed, 5 Dec 2012 18:48:33 -0800

> I don't see these patches in your net-next branch yet, so should I just
> make this against this branch and do a new pull request?  Or am I not
> looking at net-next properly?

The drivers/net changes definitely are in my net-next tree.  I only use
the 'master' branch.


* Re: [PATCH net-next] net: fix some compiler warning in net/core/neighbour.c
From: David Miller @ 2012-12-06  2:51 UTC
  To: shanwei88; +Cc: amwang, netdev, davidshan
In-Reply-To: <50C00544.7060307@gmail.com>

From: Shan Wei <shanwei88@gmail.com>
Date: Thu, 06 Dec 2012 10:39:00 +0800

> Rapid response~~~ :-)
> The same patch is in my tree, and I saw yours just as I was preparing to submit it.
> 
> Acked-by: Shan Wei <davidshan@tencent.com>

Applied.


* Re: [GIT PULL] Remove __dev* markings from the networking drivers
From: David Miller @ 2012-12-06  2:52 UTC
  To: gregkh; +Cc: netdev, wfp5p
In-Reply-To: <20121205.215121.1118396738034568895.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Wed, 05 Dec 2012 21:51:21 -0500 (EST)

> The drivers/net changes definitely are in my net-next tree.  I only use
> the 'master' branch.

The merge commit is 682d7978aee072f411fc747d32954a8371dd7b1b


* Re: [GIT PULL] Remove __dev* markings from the networking drivers
From: Greg KH @ 2012-12-06  2:56 UTC
  To: David Miller; +Cc: netdev, wfp5p
In-Reply-To: <20121205.215246.789211879539058594.davem@davemloft.net>

On Wed, Dec 05, 2012 at 09:52:46PM -0500, David Miller wrote:
> The merge commit is 682d7978aee072f411fc747d32954a8371dd7b1b

Ugh, sorry about that. I forgot I had pointed my repo at my copy, not
yours. My fault, you are right. Sorry for the noise.

greg "time to knock off for the night, I'm making stupid mistakes" k-h


* Re: [PATCH net-next 2/2] net: doc: add default value for neighbour parameters
From: Shan Wei @ 2012-12-06  2:58 UTC
  To: Ben Hutchings; +Cc: David Miller, Eric Dumazet, NetDev
In-Reply-To: <1354758235.17107.131.camel@deadeye.wl.decadent.org.uk>

Ben Hutchings said, at 2012/12/6 9:43:
>>  neigh/default/unres_qlen - INTEGER
>>  	The maximum number of packets which may be queued for each
>>  	unresolved address by other network layers.
>>  	(deprecated in linux 3.3) : use unres_qlen_bytes instead.
>> +	Prior to linux 3.3, the default value is 3 which may cause
>> +	secluded packet loss. The current default value is calculated
>           ^^^^^^^^
> I think the proper word here is 'silent'?
 
The number of lost packets is recorded in the unresolved_discards
column of /proc/net/stat/arp_cache. Although arp_cache is not easy
to understand (I still don't know why we need so many rows), we can
confirm drop events from unresolved_discards in the last column.
So the drops are not really "silent": they are counted.

But a general user who sends packets with TCP/UDP or ping cannot
easily find out that they are dropped because the destination IP is
unresolved; they just suspect TCP/UDP or ping. So I used 'secluded',
meaning hidden from general view.

My English is not good enough; if I am missing something, please
point it out to me.
Thanks.

Best Regards
Shan Wei

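The unresolved_discards counter mentioned above can be read with a small parser. This is a sketch, not a tool from the thread; it assumes the /proc/net/stat/arp_cache layout of one header row of column names followed by one row of hexadecimal counters per CPU, and it looks the column up by name rather than by position so extra columns do not break it.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/*
 * Sum one named column of a /proc/net/stat table such as arp_cache.
 * Assumed format: a header line of column names, then one line per CPU
 * with the counters in hexadecimal. Returns -1 if the column is missing
 * or the input does not fit in the local copy.
 */
static long long demo_sum_stat_column(const char *text, const char *column)
{
	char buf[8192];
	char *line, *tok, *save_line, *save_tok;
	int idx = -1, i;
	long long total = 0;

	if (strlen(text) >= sizeof(buf))
		return -1;
	strcpy(buf, text);

	/* Header line: find the position of the requested column. */
	line = strtok_r(buf, "\n", &save_line);
	if (!line)
		return -1;
	for (i = 0, tok = strtok_r(line, " \t", &save_tok); tok;
	     i++, tok = strtok_r(NULL, " \t", &save_tok))
		if (strcmp(tok, column) == 0)
			idx = i;
	if (idx < 0)
		return -1;

	/* Per-CPU lines: add up the hex field at that position. */
	while ((line = strtok_r(NULL, "\n", &save_line)) != NULL)
		for (i = 0, tok = strtok_r(line, " \t", &save_tok); tok;
		     i++, tok = strtok_r(NULL, " \t", &save_tok))
			if (i == idx)
				total += strtoll(tok, NULL, 16);
	return total;
}
```

On a live system the file would first be read into a string. The entries column repeats the table-wide count on every per-CPU line, so summing it is not meaningful; per-CPU counters such as unresolved_discards do add up.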

* Re: [RFC PATCH 1/2] tun: correctly report an error in tun_flow_init()
From: Jason Wang @ 2012-12-06  3:35 UTC
  To: Paul Moore; +Cc: netdev, linux-security-module, selinux
In-Reply-To: <2057175.PIsargdSHu@sifl>

On Wednesday, December 05, 2012 11:02:04 AM Paul Moore wrote:
> On Thursday, November 29, 2012 05:06:29 PM Paul Moore wrote:
> > On error, the error code from tun_flow_init() is lost inside
> > tun_set_iff(); this patch fixes this by assigning the tun_flow_init()
> > error code to the "err" variable, which is returned by
> > tun_set_iff() on error.
> > 
> > Signed-off-by: Paul Moore <pmoore@redhat.com>
> 
> Jason, we've had some good discussion around patch 2/2 but nothing on this
> fix; can I assume you are okay with this patch?  If so I think we should go
> ahead and apply this ...

Yes, it looks good. Maybe we can separate this patch from this RFC series and
tag it as "net-next" so that David can apply it soon.

Thanks
> 
> > ---
> > 
> >  drivers/net/tun.c |    3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index 607a3a5..877ffe2 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -1605,7 +1605,8 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
> > 
> >  		tun_net_init(dev);
> > 
> > -		if (tun_flow_init(tun))
> > +		err = tun_flow_init(tun);
> > +		if (err < 0)
> >  			goto err_free_dev;
> >  		
> >  		dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |


* Re: [PATCH rfc] netfilter: two xtables matches
From: Pablo Neira Ayuso @ 2012-12-06  5:22 UTC
  To: Jan Engelhardt
  Cc: Willem de Bruijn, netfilter-devel, netdev, Eric Dumazet,
	David Miller, kaber
In-Reply-To: <alpine.LNX.2.01.1212052100190.30908@nerf07.vanv.qr>

On Wed, Dec 05, 2012 at 09:00:36PM +0100, Jan Engelhardt wrote:
> On Wednesday 2012-12-05 20:28, Willem de Bruijn wrote:
> 
> >Somehow, the first part of this email went missing. Not critical,
> >but for completeness:
> >
> >These two patches each add an xtables match.
> >
> >The xt_priority match is a straightforward addition in the style of
> >xt_mark, adding the option to filter on one more sk_buff field. I
> >have an immediate application for this. The amount of code (in
> >kernel + userspace) to add a single check proved quite large.
> 
> Hm so yeah, can't we just place this in xt_mark.c?

I don't feel this belongs to xt_mark at all.

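For reference, the test a priority match needs has the same shape as xt_mark's, which matches when (skb->mark & mask) == mark, optionally inverted. The sketch below applies that test to skb->priority; the struct layout and names are hypothetical and not taken from Willem's patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical match info, modelled on xt_mark's (value, mask, invert). */
struct demo_priority_info {
	uint32_t priority;
	uint32_t mask;
	uint8_t invert;
};

/* True if the masked priority equals the configured value, flipped when
 * invert is set: the same expression xt_mark applies to skb->mark. */
static bool demo_priority_match(uint32_t skb_priority,
				const struct demo_priority_info *info)
{
	return ((skb_priority & info->mask) == info->priority) ^ info->invert;
}
```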

* Re: [PATCH] net: fixup tx time stamping for uml vde driver.
From: Richard Cochran @ 2012-12-06  6:55 UTC
  To: Paul Chavent; +Cc: jdike, richard, user-mode-linux-devel, netdev
In-Reply-To: <1354717253-8737-1-git-send-email-paul.chavent@onera.fr>

On Wed, Dec 05, 2012 at 03:20:53PM +0100, Paul Chavent wrote:
> Call skb_tx_timestamp after write completion.
> 
> Signed-off-by: Paul Chavent <paul.chavent@onera.fr>

The subject line would better describe the changes if it read,
"enable tx time stamping in the uml vde driver."

Can you please also add support for ethtool get_ts_info?

Thanks,
Richard


* Re: [PATCH net 1/1] r8169: workaround for missing extended GigaMAC registers
From: Wang YanQing @ 2012-12-06  7:38 UTC
  To: Francois Romieu; +Cc: netdev, David Miller, Lee Chun-Yi, Hayes Wang
In-Reply-To: <20121205223452.GA24164@electric-eye.fr.zoreil.com>

On Wed, Dec 05, 2012 at 11:34:52PM +0100, Francois Romieu wrote:
> GigaMAC registers have been reported left uninitialized in several
> situations:
> - after cold boot from power-off state
> - after S3 resume
> 
> Tweaking rtl_hw_phy_config takes care of both.

Hi Francois.
Are you sure we will lose the GigaMAC registers' contents
after the NIC goes into the PCI_D3hot state?

Thanks


* Re: [PATCHv5] virtio-spec: virtio network device RFS support
From: Michael S. Tsirkin @ 2012-12-06  8:13 UTC
  To: Ben Hutchings; +Cc: netdev, kvm, virtualization
In-Reply-To: <1354739966.2655.25.camel@bwh-desktop.uk.solarflarecom.com>

On Wed, Dec 05, 2012 at 08:39:26PM +0000, Ben Hutchings wrote:
> On Mon, 2012-12-03 at 12:58 +0200, Michael S. Tsirkin wrote:
> > Add RFS support to virtio network device.
> > Add a new feature flag VIRTIO_NET_F_RFS for this feature, a new
> > configuration field max_virtqueue_pairs to detect supported number of
> > virtqueues as well as a new command VIRTIO_NET_CTRL_RFS to program
> > packet steering for unidirectional protocols.
> [...]
> > +Programming of the receive flow classifier is implicit.
> > + Transmitting a packet of a specific flow on transmitqX will cause incoming
> > + packets for this flow to be steered to receiveqX.
> > + For uni-directional protocols, or where no packets have been transmitted
> > + yet, device will steer a packet to a random queue out of the specified
> > + receiveq0..receiveqn.
> [...]
> 
> It doesn't seem like this is usable to implement accelerated RFS in the
> guest, though perhaps that doesn't matter.

What is the issue? Could you be more explicit please?

It seems to work pretty well: if we have
# of queues >= # of cpus, incoming TCP_STREAM into
guest scales very nicely without manual tweaks in guest.

The way it works is: when the guest sends a packet, the driver
selects the rx queue that we want to use for incoming
packets of this flow, and transmits on the matching tx queue.
This is exactly what the text above suggests, no?

>  On the host side, presumably
> you'll want vhost_net to do the equivalent of sock_rps_record_flow() -
> only without a socket?  But in any case, that requires an rxhash, so I
> don't see how this is supposed to work.
> 
> Ben.

Host should just do what guest tells it to.
On the host side we build up the steering table as we get packets
to transmit. See the code in drivers/net/tun.c in recent
kernels.

Again this actually works fine - what are the problems that you see?
Could you give an example please?


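A sketch of the host-side steering table described above, with hypothetical names and sizes: each transmitted packet records its flow's queue index, and received packets of a known flow go to the matching receive queue. It assumes a per-flow hash is available for packets in both directions (the rxhash point Ben raises), and it uses the hash modulo the queue count as a deterministic stand-in for the spec's random choice for flows not seen yet.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_FLOW_ENTRIES 256	/* hypothetical table size */

struct demo_rfs_table {
	uint32_t hash[DEMO_FLOW_ENTRIES];	/* flow hash owning the slot */
	uint16_t queue[DEMO_FLOW_ENTRIES];	/* queue pair used by that flow */
	uint8_t used[DEMO_FLOW_ENTRIES];
	uint16_t nqueues;
};

static void demo_rfs_init(struct demo_rfs_table *t, uint16_t nqueues)
{
	memset(t, 0, sizeof(*t));
	t->nqueues = nqueues;
}

/* Transmit path: sending on transmitqX means "steer this flow to X". */
static void demo_rfs_record_tx(struct demo_rfs_table *t, uint32_t hash,
			       uint16_t txq)
{
	unsigned int slot = hash % DEMO_FLOW_ENTRIES;

	t->hash[slot] = hash;	/* a colliding flow simply replaces it */
	t->queue[slot] = txq;
	t->used[slot] = 1;
}

/* Receive path: known flow -> its queue; unknown flow -> any queue. */
static uint16_t demo_rfs_select_rx(const struct demo_rfs_table *t,
				   uint32_t hash)
{
	unsigned int slot = hash % DEMO_FLOW_ENTRIES;

	if (t->used[slot] && t->hash[slot] == hash)
		return t->queue[slot];
	return (uint16_t)(hash % t->nqueues);
}
```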

* Re: [PATCH net-next 0/7] Allow to monitor multicast cache event via rtnetlink
From: Nicolas Dichtel @ 2012-12-06  8:43 UTC
  To: David Miller; +Cc: David.Laight, netdev
In-Reply-To: <20121205.125453.1457654258131828976.davem@davemloft.net>

On 05/12/2012 18:54, David Miller wrote:
> From: "David Laight" <David.Laight@ACULAB.COM>
> Date: Wed, 5 Dec 2012 11:41:33 -0000
>
>> Probably worth commenting that the 64bit items might only be 32bit aligned.
>> Just to stop anyone trying to read/write them with pointer casts.
>
> Rather, let's not create this situation at all.
>
> It's totally inappropriate to have special code to handle every single
> time we want to put 64-bit values into netlink messages.
>
> We need a real solution to this issue.
>
The easiest way is to update the *_ALIGNTO values (maybe we can keep NLMSG_ALIGNTO
at 4). But I think many userland apps have these values hardcoded and, most
importantly, this may increase the size of many netlink messages. Hence we
probably need to find something better.


diff --git a/include/uapi/linux/netfilter/nfnetlink_compat.h b/include/uapi/linux/netfilter/nfnetlink_compat.h
index ffb9503..121e62a 100644
--- a/include/uapi/linux/netfilter/nfnetlink_compat.h
+++ b/include/uapi/linux/netfilter/nfnetlink_compat.h
@@ -33,7 +33,7 @@ struct nfattr {
  #define NFNL_NFA_NEST	0x8000
  #define NFA_TYPE(attr) 	((attr)->nfa_type & 0x7fff)

-#define NFA_ALIGNTO     4
+#define NFA_ALIGNTO     8
  #define NFA_ALIGN(len)	(((len) + NFA_ALIGNTO - 1) & ~(NFA_ALIGNTO - 1))
  #define NFA_OK(nfa,len)	((len) > 0 && (nfa)->nfa_len >= sizeof(struct nfattr) \
  	&& (nfa)->nfa_len <= (len))
diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h
index 78d5b8a..66d2a26 100644
--- a/include/uapi/linux/netlink.h
+++ b/include/uapi/linux/netlink.h
@@ -75,7 +75,7 @@ struct nlmsghdr {
     Check		NLM_F_EXCL
   */

-#define NLMSG_ALIGNTO	4U
+#define NLMSG_ALIGNTO	8U
  #define NLMSG_ALIGN(len) ( ((len)+NLMSG_ALIGNTO-1) & ~(NLMSG_ALIGNTO-1) )
  #define NLMSG_HDRLEN	 ((int) NLMSG_ALIGN(sizeof(struct nlmsghdr)))
  #define NLMSG_LENGTH(len) ((len)+NLMSG_ALIGN(NLMSG_HDRLEN))
@@ -145,7 +145,7 @@ struct nlattr {
  #define NLA_F_NET_BYTEORDER	(1 << 14)
  #define NLA_TYPE_MASK		~(NLA_F_NESTED | NLA_F_NET_BYTEORDER)

-#define NLA_ALIGNTO		4
+#define NLA_ALIGNTO		8
  #define NLA_ALIGN(len)		(((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
  #define NLA_HDRLEN		((int) NLA_ALIGN(sizeof(struct nlattr)))

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 33d29ce..ee898c1 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -146,7 +146,7 @@ struct rtattr {

  /* Macros to handle rtattributes */

-#define RTA_ALIGNTO	4
+#define RTA_ALIGNTO	8
  #define RTA_ALIGN(len) ( ((len)+RTA_ALIGNTO-1) & ~(RTA_ALIGNTO-1) )
  #define RTA_OK(rta,len) ((len) >= (int)sizeof(struct rtattr) && \
  			 (rta)->rta_len >= sizeof(struct rtattr) && \
@@ -322,7 +322,7 @@ struct rtnexthop {

  /* Macros to handle hexthops */

-#define RTNH_ALIGNTO	4
+#define RTNH_ALIGNTO	8
  #define RTNH_ALIGN(len) ( ((len)+RTNH_ALIGNTO-1) & ~(RTNH_ALIGNTO-1) )
  #define RTNH_OK(rtnh,len) ((rtnh)->rtnh_len >= sizeof(struct rtnexthop) && \
  			   ((int)(rtnh)->rtnh_len) <= (len))
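
The arithmetic behind both concerns in this exchange can be checked with a small sketch. DEMO_ALIGN() repeats the NLA_ALIGN() rounding with the alignment as a parameter; the other names are hypothetical. Because NLA_HDRLEN is itself NLA_ALIGN(sizeof(struct nlattr)), moving NLA_ALIGNTO to 8 also grows the attribute header from 4 to 8 bytes and shifts every payload, which is one reason parsers with hardcoded values would break. The u64 reader copies with memcpy() instead of dereferencing a cast pointer, the safe way to handle a 64-bit payload that is only 32-bit aligned.

```c
#include <stdint.h>
#include <string.h>

/* Same rounding as NLA_ALIGN(); alignto must be a power of two. */
#define DEMO_ALIGN(len, alignto) (((len) + (alignto) - 1) & ~((alignto) - 1))

struct demo_nlattr {		/* same layout as struct nlattr */
	uint16_t nla_len;
	uint16_t nla_type;
};

#define DEMO_NLA_HDR ((int)sizeof(struct demo_nlattr))	/* 4 bytes */

/* Bytes one attribute occupies, like nla_total_size(): the header is
 * rounded up as well, so a larger alignment also moves the payload. */
static int demo_attr_space(int payload, int alignto)
{
	return DEMO_ALIGN(DEMO_ALIGN(DEMO_NLA_HDR, alignto) + payload, alignto);
}

/* Read a u64 payload with today's 4-byte layout. The payload may be only
 * 32-bit aligned, so copy it out rather than casting the pointer. */
static uint64_t demo_get_u64(const void *attr)
{
	uint64_t v;

	memcpy(&v, (const char *)attr + DEMO_NLA_HDR, sizeof(v));
	return v;
}
```

With 4-byte alignment a u32 attribute takes 8 bytes and a u64 attribute 12; with 8-byte alignment both take 16.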


* [PATCH net-next 1/1] bnx2x: Prevent link flaps when booting from SAN.
From: Yuval Mintz @ 2012-12-06  9:04 UTC
  To: davem, netdev; +Cc: ariele, Barak Witkowski, Yuval Mintz, Eilon Greenstein

From: Barak Witkowski <barak@broadcom.com>

It is possible that the driver is configured to operate with a certain
link configuration which differs from the link's configuration during
boot from SAN - this would cause the driver to flap the link.

Some switches may miss such a flap, causing DCBX convergence to take
too long and the boot sequence to fail. Convergence takes longer because
the switch ignores new DCBX packets due to a counter mismatch, as only
the host side resets its counters on the link flap.

This patch makes the driver ignore the user's initial link configuration
when loading after boot from SAN, and continue with the existing link
configuration.

Signed-off-by: Barak Witkowski <barak@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
Hi Dave,

Boot from SAN via some switches fails without this fix.

As this fix is built on top of the Link Flap Avoidance feature, which was
submitted only to `net-next', please consider applying it to `net-next'.
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h      |    1 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  |    1 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c |    6 ++++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h |    1 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |   28 +++++++++++++++++++--
 5 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index 02ea644..c79a584 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -1186,6 +1186,7 @@ struct bnx2x_prev_path_list {
 	u8 slot;
 	u8 path;
 	struct list_head list;
+	u8 undi;
 };
 
 struct bnx2x_sp_objs {
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 8ab1492..67baddd 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -2396,6 +2396,7 @@ int bnx2x_nic_load(struct bnx2x *bp, int load_mode)
 
 	if (bp->port.pmf)
 		bnx2x_initial_phy_init(bp, load_mode);
+	bp->link_params.feature_config_flags &= ~FEATURE_CONFIG_BOOT_FROM_SAN;
 
 	/* Start fast path */
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
index 3e7d824..09096b4 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
@@ -254,6 +254,12 @@ static int bnx2x_check_lfa(struct link_params *params)
 	if (!(link_status & LINK_STATUS_LINK_UP))
 		return LFA_LINK_DOWN;
 
+	/* if loaded after BOOT from SAN, don't flap the link in any case and
+	 * rely on link set by preboot driver
+	 */
+	if (params->feature_config_flags & FEATURE_CONFIG_BOOT_FROM_SAN)
+		return 0;
+
 	/* Verify that loopback mode is not set */
 	if (params->loopback_mode)
 		return LFA_LOOPBACK_ENABLED;
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h
index 181c5ce..ee6e7ec 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h
@@ -267,6 +267,7 @@ struct link_params {
 #define FEATURE_CONFIG_BC_SUPPORTS_SFP_TX_DISABLED		(1<<10)
 #define FEATURE_CONFIG_DISABLE_REMOTE_FAULT_DET		(1<<11)
 #define FEATURE_CONFIG_MT_SUPPORT			(1<<13)
+#define FEATURE_CONFIG_BOOT_FROM_SAN			(1<<14)
 
 	/* Will be populated during common init */
 	struct bnx2x_phy phy[MAX_PHYS];
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 75aea83..7145b37 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -9515,6 +9515,20 @@ static int bnx2x_prev_mcp_done(struct bnx2x *bp)
 	return 0;
 }
 
+static struct bnx2x_prev_path_list *
+		bnx2x_prev_path_get_entry(struct bnx2x *bp)
+{
+	struct bnx2x_prev_path_list *tmp_list;
+
+	list_for_each_entry(tmp_list, &bnx2x_prev_list, list)
+		if (PCI_SLOT(bp->pdev->devfn) == tmp_list->slot &&
+		    bp->pdev->bus->number == tmp_list->bus &&
+		    BP_PATH(bp) == tmp_list->path)
+			return tmp_list;
+
+	return NULL;
+}
+
 static bool bnx2x_prev_is_path_marked(struct bnx2x *bp)
 {
 	struct bnx2x_prev_path_list *tmp_list;
@@ -9539,7 +9553,7 @@ static bool bnx2x_prev_is_path_marked(struct bnx2x *bp)
 	return rc;
 }
 
-static int bnx2x_prev_mark_path(struct bnx2x *bp)
+static int bnx2x_prev_mark_path(struct bnx2x *bp, bool after_undi)
 {
 	struct bnx2x_prev_path_list *tmp_list;
 	int rc;
@@ -9553,6 +9567,7 @@ static int bnx2x_prev_mark_path(struct bnx2x *bp)
 	tmp_list->bus = bp->pdev->bus->number;
 	tmp_list->slot = PCI_SLOT(bp->pdev->devfn);
 	tmp_list->path = BP_PATH(bp);
+	tmp_list->undi = after_undi ? (1 << BP_PORT(bp)) : 0;
 
 	rc = down_interruptible(&bnx2x_prev_sem);
 	if (rc) {
@@ -9649,6 +9664,7 @@ static int bnx2x_prev_unload_uncommon(struct bnx2x *bp)
 static int bnx2x_prev_unload_common(struct bnx2x *bp)
 {
 	u32 reset_reg, tmp_reg = 0, rc;
+	bool prev_undi = false;
 	/* It is possible a previous function received 'common' answer,
 	 * but hasn't loaded yet, therefore creating a scenario of
 	 * multiple functions receiving 'common' on the same path.
@@ -9663,7 +9679,6 @@ static int bnx2x_prev_unload_common(struct bnx2x *bp)
 	/* Reset should be performed after BRB is emptied */
 	if (reset_reg & MISC_REGISTERS_RESET_REG_1_RST_BRB1) {
 		u32 timer_count = 1000;
-		bool prev_undi = false;
 
 		/* Close the MAC Rx to prevent BRB from filling up */
 		bnx2x_prev_unload_close_mac(bp);
@@ -9713,7 +9728,7 @@ static int bnx2x_prev_unload_common(struct bnx2x *bp)
 	/* No packets are in the pipeline, path is ready for reset */
 	bnx2x_reset_common(bp);
 
-	rc = bnx2x_prev_mark_path(bp);
+	rc = bnx2x_prev_mark_path(bp, prev_undi);
 	if (rc) {
 		bnx2x_prev_mcp_done(bp);
 		return rc;
@@ -9745,6 +9760,7 @@ static int bnx2x_prev_unload(struct bnx2x *bp)
 {
 	int time_counter = 10;
 	u32 rc, fw, hw_lock_reg, hw_lock_val;
+	struct bnx2x_prev_path_list *prev_list;
 	BNX2X_DEV_INFO("Entering Previous Unload Flow\n");
 
 	/* clear hw from errors which may have resulted from an interrupted
@@ -9803,6 +9819,12 @@ static int bnx2x_prev_unload(struct bnx2x *bp)
 		rc = -EBUSY;
 	}
 
+	/* Mark function if its port was used to boot from SAN */
+	prev_list = bnx2x_prev_path_get_entry(bp);
+	if (prev_list && (prev_list->undi & (1 << BP_PORT(bp))))
+		bp->link_params.feature_config_flags |=
+			FEATURE_CONFIG_BOOT_FROM_SAN;
+
 	BNX2X_DEV_INFO("Finished Previous Unload Flow [%d]\n", rc);
 
 	return rc;
-- 
1.7.1

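A user-space model of the flag's lifecycle in this patch, with hypothetical names: the common unload records a per-port 'undi' bit, the previous-unload flow turns it into the BOOT_FROM_SAN feature flag for the matching port, the Link Flap Avoidance check then reports no reason to reset the link, and the flag is dropped after the initial PHY init so later loads follow the user's configuration again. Only the bit position (1 << 14) is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_BOOT_FROM_SAN (1u << 14)	/* same bit as FEATURE_CONFIG_BOOT_FROM_SAN */

/* Hypothetical per-path record, like struct bnx2x_prev_path_list:
 * one 'undi' bit per port that the preboot (UNDI) driver had in use. */
struct demo_prev_path {
	uint8_t undi;
};

/* Hypothetical per-function state: its port and link feature flags. */
struct demo_func {
	int port;
	uint32_t feature_flags;
};

/* Common unload: record whether the preboot driver was using this port. */
static void demo_mark_path(struct demo_prev_path *p, const struct demo_func *f,
			   bool after_undi)
{
	p->undi = after_undi ? (uint8_t)(1u << f->port) : 0;
}

/* Previous-unload flow: flag the function if its port booted from SAN. */
static void demo_prev_unload(const struct demo_prev_path *p,
			     struct demo_func *f)
{
	if (p && (p->undi & (1u << f->port)))
		f->feature_flags |= DEMO_BOOT_FROM_SAN;
}

/* LFA check: 0 means "no reason to reset the link". other_reason stands
 * in for the remaining checks (the patch still reports a down link first). */
static int demo_check_lfa(const struct demo_func *f, int other_reason)
{
	if (f->feature_flags & DEMO_BOOT_FROM_SAN)
		return 0;
	return other_reason;
}

/* After the initial PHY init, drop the flag so later loads apply the
 * user's link configuration again. */
static void demo_after_initial_phy_init(struct demo_func *f)
{
	f->feature_flags &= ~DEMO_BOOT_FROM_SAN;
}
```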

* Re: [Suggestion] net/atm : for sprintf, need check the total write length whether larger than a page.
From: Chen Gang @ 2012-12-06  9:05 UTC
  To: chas williams - CONTRACTOR; +Cc: David Miller, netdev
In-Reply-To: <50BFF19E.1040405@asianux.com>

Hi Chas Williams:

  All of my original replies are just ideas (or suggestions), not bug reports.

  If you do not need them, please go ahead and send the patch.

  I have tried to check it; at least, I did not find any issues (and I
also learned from it).

  I hope the patch passes the reviewers' checks!

  Good luck!

  :-)

gchen.

On 2012-12-06 09:15, Chen Gang wrote:
> On 2012-12-05 22:55, chas williams - CONTRACTOR wrote:
>> On Wed, 05 Dec 2012 13:59:26 +0800
>>
>> it doesn't seem like optimizing for this corner case is a huge
>> concern.  the list cannot be infinitely long.
>>
> 
>   ok.
> 
> 
>>>>>
>>>>> By the way:
>>>>>   will it be better that always let "\n" at the end ?
>>>>>   (if count == PAGE_SIZE in a loop, we can not let "\n" at the end).
>>>>
>>>>    oh, sorry ! count will never >= PAGE_SIZE.
>>>>
>>>>    I think let "PAGE_SIZE - 2" instead of "PAGE_SIZE" in the loop, so we
>>>> can make the room for the end of "\n".
>>>>
>>>>
>>>>
>>>    sorry, "PAGE_SIZE - 1" is enough, not need "PAGE_SIZE - 2".
>>
>> did you mean '\0' instead of '\n'?  scnprintf() considers the trailing
>> '\0' when formatting.
> 
>   No, originally the output ends with "\n\0".
> 
>   I would prefer that we still keep the trailing "\n" when the contents are very large.
>   If count already == (PAGE_SIZE - 1), we have no room left to append "\n".
> 
> -		pos += sprintf(pos, "\n");
> +		count += scnprintf(buf + count, PAGE_SIZE - count, "\n");
> 
> 
> 
> 


-- 
Chen Gang

Asianux Corporation

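A user-space sketch of the idea discussed in this exchange: stop the formatting loop one byte early so the trailing "\n" always fits. demo_scnprintf() is a hypothetical stand-in built on vsnprintf() that follows the kernel scnprintf() convention of returning the number of characters actually stored (excluding the '\0'); BUF_SIZE stands in for PAGE_SIZE.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 32	/* stand-in for PAGE_SIZE, small so truncation shows */

/* Hypothetical stand-in for the kernel's scnprintf(): returns the number
 * of characters actually stored in buf, not counting the trailing '\0'. */
static int demo_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int i;

	if (size == 0)
		return 0;
	va_start(args, fmt);
	i = vsnprintf(buf, size, fmt, args);
	va_end(args);
	if (i < 0)
		return 0;
	return (size_t)i < size ? i : (int)(size - 1);
}

/* Print the items, but limit them to BUF_SIZE - 1 instead of BUF_SIZE,
 * so the final "\n" always fits before the '\0'. */
static int demo_show_list(char *buf, const int *items, int n)
{
	int count = 0;
	int i;

	for (i = 0; i < n && count < BUF_SIZE - 2; i++)
		count += demo_scnprintf(buf + count, BUF_SIZE - 1 - count,
					" %d", items[i]);
	count += demo_scnprintf(buf + count, BUF_SIZE - count, "\n");
	return count;
}
```

With 20 three-digit items the list is cut off after 30 characters and the result is 31 characters ending in "\n"; the last number may be truncated, as in the original loop.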

* Re: [PATCH] net: ICMPv6 packets transmitted on wrong interface if nfmark is mangled
From: Dries De Winter @ 2012-12-06  9:11 UTC
  To: David Miller; +Cc: pablo, kaber, netdev, netfilter-devel
In-Reply-To: <20121205.125700.2246243377198648534.davem@davemloft.net>

2012/12/5 David Miller <davem@davemloft.net>:
> From: Dries De Winter <dries.dewinter@gmail.com>
> Date: Wed, 5 Dec 2012 14:41:59 +0100
>
>> My "noreroute" patch will not fix this. Therefore it's indeed maybe
>> better to add a simple check to ip6_route_me_harder(): not a check for
>> ICMPv6, but a check for (ipv6_addr_type(&iph->daddr) &
>> IPV6_ADDR_LINKLOCAL) instead. What do you think?
>
> What if a packet is rewritten from a non-link-local destination address
> into a link-local one?  Or vice versa?
>
> Your test will fail in those cases.

You are saying that the decision should be based on the original
destination address rather than the modified one? I would say the opposite:

- If a non-link-local destination is changed into a link-local one, it
should certainly not be rerouted because routing doesn't make much
sense for link-local destinations.

- If a link-local destination is changed into a non-link-local one,
why not reroute it according to the new destination?

If you do not agree, we can also put the check in
ip6t_local_out_hook() where the original destination is still
available.

Dries.

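A user-space sketch of the check Dries proposes, assuming it runs on the destination as it is after mangling. IN6_IS_ADDR_LINKLOCAL() from <netinet/in.h> tests for fe80::/10, the same range the kernel's IPV6_ADDR_LINKLOCAL type covers; demo_should_reroute() is a hypothetical name, not a kernel function.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>

/* Decide on the destination as it is now (after any mangling): rerouting
 * a link-local (fe80::/10) destination makes little sense, so skip it. */
static bool demo_should_reroute(const struct in6_addr *daddr)
{
	return !IN6_IS_ADDR_LINKLOCAL(daddr);
}
```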

* [PATCH] chelsio: remove get_clock and use ktime_get
From: Jan Glauber @ 2012-12-06  9:20 UTC
  To: netdev; +Cc: Jan Glauber

The chelsio driver's get_clock() clashes with the s390 one.
The chelsio helper reads the time into a timespec via ktime_get_ts()
just to convert it back to a ktime_t. I can see no difference in
outcome from calling ktime_get() directly.

Remove the get_clock and use ktime_get directly.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
---
 drivers/net/ethernet/chelsio/cxgb/sge.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb/sge.c b/drivers/net/ethernet/chelsio/cxgb/sge.c
index 47a8435..31de20a 100644
--- a/drivers/net/ethernet/chelsio/cxgb/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb/sge.c
@@ -367,18 +367,6 @@ void t1_sched_set_drain_bits_per_us(struct sge *sge, unsigned int port,
 
 #endif  /*  0  */
 
-
-/*
- * get_clock() implements a ns clock (see ktime_get)
- */
-static inline ktime_t get_clock(void)
-{
-	struct timespec ts;
-
-	ktime_get_ts(&ts);
-	return timespec_to_ktime(ts);
-}
-
 /*
  * tx_sched_init() allocates resources and does basic initialization.
  */
@@ -411,7 +399,7 @@ static int tx_sched_init(struct sge *sge)
 static inline int sched_update_avail(struct sge *sge)
 {
 	struct sched *s = sge->tx_sched;
-	ktime_t now = get_clock();
+	ktime_t now = ktime_get();
 	unsigned int i;
 	long long delta_time_ns;
 
-- 
1.7.12.4

^ permalink raw reply related

* Re: [RFC PATCH v2 3/3] tun: fix LSM/SELinux labeling of tun/tap devices
From: Jason Wang @ 2012-12-06 10:29 UTC (permalink / raw)
  To: Paul Moore; +Cc: netdev, linux-security-module, selinux, mst
In-Reply-To: <20121205202619.18626.98778.stgit@localhost>

On Wednesday, December 05, 2012 03:26:19 PM Paul Moore wrote:
> This patch corrects some problems with LSM/SELinux that were introduced
> with the multiqueue patchset.  The problem stems from the fact that the
> multiqueue work changed the relationship between the tun device and its
> associated socket; before the socket persisted for the life of the
> device, however after the multiqueue changes the socket only persisted
> for the life of the userspace connection (fd open).  For non-persistent
> devices this is not an issue, but for persistent devices this can cause
> the tun device to lose its SELinux label.
> 
> We correct this problem by adding an opaque LSM security blob to the
> tun device struct which allows us to have the LSM security state, e.g.
> SELinux labeling information, persist for the lifetime of the tun
> device.  In the process we tweak the LSM hooks to work with this new
> approach to TUN device/socket labeling and introduce a new LSM hook,
> security_tun_dev_create_queue(), to approve requests to create a new
> TUN queue via TUNSETQUEUE.
> 
> The SELinux code has been adjusted to match the new LSM hooks, the
> other LSMs do not make use of the LSM TUN controls.  This patch makes
> use of the recently added "tun_socket:create_queue" permission to
> restrict access to the TUNSETQUEUE operation.  On older SELinux
> policies which do not define the "tun_socket:create_queue" permission
> the access control decision for TUNSETQUEUE will be handled according
> to the SELinux policy's unknown permission setting.
> 
> Signed-off-by: Paul Moore <pmoore@redhat.com>
> ---
>  drivers/net/tun.c                 |   26 +++++++++++++---
>  include/linux/security.h          |   59 +++++++++++++++++++++++++++++--------
>  security/capability.c             |   24 +++++++++++++--
>  security/security.c               |   28 ++++++++++++++----
>  security/selinux/hooks.c          |   50 ++++++++++++++++++++++++-------
>  security/selinux/include/objsec.h |    4 +++
>  6 files changed, 153 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 14a0454..fb8148b 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -182,6 +182,7 @@ struct tun_struct {
>  	struct hlist_head flows[TUN_NUM_FLOW_ENTRIES];
>  	struct timer_list flow_gc_timer;
>  	unsigned long ageing_time;
> +	void *security;
>  };
> 
>  static inline u32 tun_hashfn(u32 rxhash)
> @@ -465,6 +466,10 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
>  	struct tun_file *tfile = file->private_data;
>  	int err;
> 
> +	err = security_tun_dev_attach(tfile->socket.sk, tun->security);
> +	if (err < 0)
> +		goto out;
> +
>  	err = -EINVAL;
>  	if (rcu_dereference_protected(tfile->tun, lockdep_rtnl_is_held()))
>  		goto out;
> @@ -1348,6 +1353,7 @@ static void tun_free_netdev(struct net_device *dev)
>  	struct tun_struct *tun = netdev_priv(dev);
> 
>  	tun_flow_uninit(tun);
> +	security_tun_dev_free_security(tun->security);
>  	free_netdev(dev);
>  }
> 
> @@ -1534,7 +1540,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
> 
>  		if (tun_not_capable(tun))
>  			return -EPERM;
> -		err = security_tun_dev_attach(tfile->socket.sk);
> +		err = security_tun_dev_open(tun->security);
>  		if (err < 0)
>  			return err;
> 
> @@ -1587,7 +1593,9 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
> 
>  		spin_lock_init(&tun->lock);
> 
> -		security_tun_dev_post_create(&tfile->sk);
> +		err = security_tun_dev_alloc_security(&tun->security);
> +		if (err < 0)
> +			goto err_free_dev;
> 
>  		tun_net_init(dev);
> 
> @@ -1767,12 +1775,18 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
> 
>  		tun = netdev_priv(dev);
>  		if (dev->netdev_ops != &tap_netdev_ops &&
> -			dev->netdev_ops != &tun_netdev_ops)
> +			dev->netdev_ops != &tun_netdev_ops) {
>  			ret = -EINVAL;
> -		else if (tun_not_capable(tun))
> +			goto unlock;
> +		}
> +		if (tun_not_capable(tun)) {
>  			ret = -EPERM;
> -		else
> -			ret = tun_attach(tun, file);
> +			goto unlock;
> +		}
> +		ret = security_tun_dev_create_queue(tun->security);
> +		if (ret < 0)
> +			goto unlock;
> +		ret = tun_attach(tun, file);
>  	} else if (ifr->ifr_flags & IFF_DETACH_QUEUE)
>  		__tun_detach(tfile, false);
>  	else
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 05e88bd..8ea923b 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -983,17 +983,29 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
>   *	tells the LSM to decrement the number of secmark labeling rules loaded
>   * @req_classify_flow:
>   *	Sets the flow's sid to the openreq sid.
> + * @tun_dev_alloc_security:
> + *	This hook allows a module to allocate a security structure for a TUN
> + *	device.
> + *	@security pointer to a security structure pointer.
> + *	Returns a zero on success, negative values on failure.
> + * @tun_dev_free_security:
> + *	This hook allows a module to free the security structure for a TUN
> + *	device.
> + *	@security pointer to the TUN device's security structure
>   * @tun_dev_create:
>   *	Check permissions prior to creating a new TUN device.
> - * @tun_dev_post_create:
> - *	This hook allows a module to update or allocate a per-socket security
> - *	structure.
> - *	@sk contains the newly created sock structure.
> + * @tun_dev_create_queue:
> + *	Check permissions prior to creating a new TUN device queue.
> + *	@security pointer to the TUN device's security structure.
>   * @tun_dev_attach:
> - *	Check permissions prior to attaching to a persistent TUN device.  This
> - *	hook can also be used by the module to update any security state
> + *	This hook can be used by the module to update any security state
>   *	associated with the TUN device's sock structure.
>   *	@sk contains the existing sock structure.
> + *	@security pointer to the TUN device's security structure.
> + * @tun_dev_open:
> + *	This hook can be used by the module to update any security state
> + *	associated with the TUN device's security structure.
> + *	@security pointer to the TUN device's security structure.
>   *
>   * Security hooks for XFRM operations.
>   *
> @@ -1613,9 +1625,12 @@ struct security_operations {
>  	void (*secmark_refcount_inc) (void);
>  	void (*secmark_refcount_dec) (void);
>  	void (*req_classify_flow) (const struct request_sock *req, struct flowi *fl);
> -	int (*tun_dev_create)(void);
> -	void (*tun_dev_post_create)(struct sock *sk);
> -	int (*tun_dev_attach)(struct sock *sk);
> +	int (*tun_dev_alloc_security) (void **security);
> +	void (*tun_dev_free_security) (void *security);
> +	int (*tun_dev_create) (void);
> +	int (*tun_dev_create_queue) (void *security);
> +	int (*tun_dev_attach) (struct sock *sk, void *security);
> +	int (*tun_dev_open) (void *security);
>  #endif	/* CONFIG_SECURITY_NETWORK */
> 
>  #ifdef CONFIG_SECURITY_NETWORK_XFRM
> @@ -2553,9 +2568,12 @@ void security_inet_conn_established(struct sock *sk,
>  int security_secmark_relabel_packet(u32 secid);
>  void security_secmark_refcount_inc(void);
>  void security_secmark_refcount_dec(void);
> +int security_tun_dev_alloc_security(void **security);
> +void security_tun_dev_free_security(void *security);
>  int security_tun_dev_create(void);
> -void security_tun_dev_post_create(struct sock *sk);
> -int security_tun_dev_attach(struct sock *sk);
> +int security_tun_dev_create_queue(void *security);
> +int security_tun_dev_attach(struct sock *sk, void *security);
> +int security_tun_dev_open(void *security);
> 
>  #else	/* CONFIG_SECURITY_NETWORK */
>  static inline int security_unix_stream_connect(struct sock *sock,
> @@ -2720,16 +2738,31 @@ static inline void security_secmark_refcount_dec(void)
>  {
>  }
> 
> +static inline int security_tun_dev_alloc_security(void **security)
> +{
> +	return 0;
> +}
> +
> +static inline void security_tun_dev_free_security(void *security)
> +{
> +}
> +
>  static inline int security_tun_dev_create(void)
>  {
>  	return 0;
>  }
> 
> -static inline void security_tun_dev_post_create(struct sock *sk)
> +static inline int security_tun_dev_create_queue(void *security)
> +{
> +	return 0;
> +}
> +
> +static inline int security_tun_dev_attach(struct sock *sk, void *security)
>  {
> +	return 0;
>  }
> 
> -static inline int security_tun_dev_attach(struct sock *sk)
> +static inline int security_tun_dev_open(void *security)
>  {
>  	return 0;
>  }
> diff --git a/security/capability.c b/security/capability.c
> index b14a30c..bf4cbf2 100644
> --- a/security/capability.c
> +++ b/security/capability.c
> @@ -704,16 +704,31 @@ static void cap_req_classify_flow(const struct request_sock *req,
>  {
>  }
> 
> +static int cap_tun_dev_alloc_security(void **security)
> +{
> +	return 0;
> +}
> +
> +static void cap_tun_dev_free_security(void *security)
> +{
> +}
> +
>  static int cap_tun_dev_create(void)
>  {
>  	return 0;
>  }
> 
> -static void cap_tun_dev_post_create(struct sock *sk)
> +static int cap_tun_dev_create_queue(void *security)
> +{
> +	return 0;
> +}
> +
> +static int cap_tun_dev_attach(struct sock *sk, void *security)
>  {
> +	return 0;
>  }
> 
> -static int cap_tun_dev_attach(struct sock *sk)
> +static int cap_tun_dev_open(void *security)
>  {
>  	return 0;
>  }
> @@ -1044,8 +1059,11 @@ void __init security_fixup_ops(struct security_operations *ops)
>  	set_to_cap_if_null(ops, secmark_refcount_inc);
>  	set_to_cap_if_null(ops, secmark_refcount_dec);
>  	set_to_cap_if_null(ops, req_classify_flow);
> +	set_to_cap_if_null(ops, tun_dev_alloc_security);
> +	set_to_cap_if_null(ops, tun_dev_free_security);
>  	set_to_cap_if_null(ops, tun_dev_create);
> -	set_to_cap_if_null(ops, tun_dev_post_create);
> +	set_to_cap_if_null(ops, tun_dev_create_queue);
> +	set_to_cap_if_null(ops, tun_dev_open);
>  	set_to_cap_if_null(ops, tun_dev_attach);
>  #endif	/* CONFIG_SECURITY_NETWORK */
>  #ifdef CONFIG_SECURITY_NETWORK_XFRM
> diff --git a/security/security.c b/security/security.c
> index 8dcd4ae..4d82654 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -1244,24 +1244,42 @@ void security_secmark_refcount_dec(void)
>  }
>  EXPORT_SYMBOL(security_secmark_refcount_dec);
> 
> +int security_tun_dev_alloc_security(void **security)
> +{
> +	return security_ops->tun_dev_alloc_security(security);
> +}
> +EXPORT_SYMBOL(security_tun_dev_alloc_security);
> +
> +void security_tun_dev_free_security(void *security)
> +{
> +	security_ops->tun_dev_free_security(security);
> +}
> +EXPORT_SYMBOL(security_tun_dev_free_security);
> +
>  int security_tun_dev_create(void)
>  {
>  	return security_ops->tun_dev_create();
>  }
>  EXPORT_SYMBOL(security_tun_dev_create);
> 
> -void security_tun_dev_post_create(struct sock *sk)
> +int security_tun_dev_create_queue(void *security)
>  {
> -	return security_ops->tun_dev_post_create(sk);
> +	return security_ops->tun_dev_create_queue(security);
>  }
> -EXPORT_SYMBOL(security_tun_dev_post_create);
> +EXPORT_SYMBOL(security_tun_dev_create_queue);
> 
> -int security_tun_dev_attach(struct sock *sk)
> +int security_tun_dev_attach(struct sock *sk, void *security)
>  {
> -	return security_ops->tun_dev_attach(sk);
> +	return security_ops->tun_dev_attach(sk, security);
>  }
>  EXPORT_SYMBOL(security_tun_dev_attach);
> 
> +int security_tun_dev_open(void *security)
> +{
> +	return security_ops->tun_dev_open(security);
> +}
> +EXPORT_SYMBOL(security_tun_dev_open);
> +
>  #endif	/* CONFIG_SECURITY_NETWORK */
> 
>  #ifdef CONFIG_SECURITY_NETWORK_XFRM
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 61a5336..f1efb08 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -4399,6 +4399,24 @@ static void selinux_req_classify_flow(const struct request_sock *req,
>  	fl->flowi_secid = req->secid;
>  }
> 
> +static int selinux_tun_dev_alloc_security(void **security)
> +{
> +	struct tun_security_struct *tunsec;
> +
> +	tunsec = kzalloc(sizeof(*tunsec), GFP_KERNEL);
> +	if (!tunsec)
> +		return -ENOMEM;
> +	tunsec->sid = current_sid();
> +
> +	*security = tunsec;
> +	return 0;
> +}
> +
> +static void selinux_tun_dev_free_security(void *security)
> +{
> +	kfree(security);
> +}
> +
>  static int selinux_tun_dev_create(void)
>  {
>  	u32 sid = current_sid();
> @@ -4414,8 +4432,17 @@ static int selinux_tun_dev_create(void)
>  			    NULL);
>  }
> 
> -static void selinux_tun_dev_post_create(struct sock *sk)
> +static int selinux_tun_dev_create_queue(void *security)
>  {
> +	struct tun_security_struct *tunsec = security;
> +
> +	return avc_has_perm(current_sid(), tunsec->sid, SECCLASS_TUN_SOCKET,
> +			    TUN_SOCKET__CREATE_QUEUE, NULL);
> +}
> +
> +static int selinux_tun_dev_attach(struct sock *sk, void *security)
> +{
> +	struct tun_security_struct *tunsec = security;
>  	struct sk_security_struct *sksec = sk->sk_security;
> 
>  	/* we don't currently perform any NetLabel based labeling here and it
> @@ -4425,20 +4452,19 @@ static void selinux_tun_dev_post_create(struct sock *sk)
>  	 * cause confusion to the TUN user that had no idea network labeling
>  	 * protocols were being used */
> 
> -	/* see the comments in selinux_tun_dev_create() about why we don't use
> -	 * the sockcreate SID here */
> -
> -	sksec->sid = current_sid();
> +	sksec->sid = tunsec->sid;

Since both tun_set_iff() and tun_set_queue() call this, I wonder whether,
when it is called from tun_set_queue(), we need some checking like what we
did in v1; otherwise the relabel is unconditional for TUNSETQUEUE. Or could
we add those checks in selinux_tun_dev_create_queue()?
>  	sksec->sclass = SECCLASS_TUN_SOCKET;
> +
> +	return 0;
>  }
> 
> -static int selinux_tun_dev_attach(struct sock *sk)
> +static int selinux_tun_dev_open(void *security)
>  {
> -	struct sk_security_struct *sksec = sk->sk_security;
> +	struct tun_security_struct *tunsec = security;
>  	u32 sid = current_sid();
>  	int err;
> 
> -	err = avc_has_perm(sid, sksec->sid, SECCLASS_TUN_SOCKET,
> +	err = avc_has_perm(sid, tunsec->sid, SECCLASS_TUN_SOCKET,
>  			   TUN_SOCKET__RELABELFROM, NULL);
>  	if (err)
>  		return err;
> @@ -4446,8 +4472,7 @@ static int selinux_tun_dev_attach(struct sock *sk)
>  			   TUN_SOCKET__RELABELTO, NULL);
>  	if (err)
>  		return err;
> -
> -	sksec->sid = sid;
> +	tunsec->sid = sid;
> 
>  	return 0;
>  }
> @@ -5642,9 +5667,12 @@ static struct security_operations selinux_ops = {
>  	.secmark_refcount_inc =		selinux_secmark_refcount_inc,
>  	.secmark_refcount_dec =		selinux_secmark_refcount_dec,
>  	.req_classify_flow =		selinux_req_classify_flow,
> +	.tun_dev_alloc_security =	selinux_tun_dev_alloc_security,
> +	.tun_dev_free_security =	selinux_tun_dev_free_security,
>  	.tun_dev_create =		selinux_tun_dev_create,
> -	.tun_dev_post_create = 		selinux_tun_dev_post_create,
> +	.tun_dev_create_queue =		selinux_tun_dev_create_queue,
>  	.tun_dev_attach =		selinux_tun_dev_attach,
> +	.tun_dev_open =			selinux_tun_dev_open,
> 
>  #ifdef CONFIG_SECURITY_NETWORK_XFRM
>  	.xfrm_policy_alloc_security =	selinux_xfrm_policy_alloc,
> diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h
> index 26c7eee..aa47bca 100644
> --- a/security/selinux/include/objsec.h
> +++ b/security/selinux/include/objsec.h
> @@ -110,6 +110,10 @@ struct sk_security_struct {
>  	u16 sclass;			/* sock security class */
>  };
> 
> +struct tun_security_struct {
> +	u32 sid;			/* SID for the tun device sockets */
> +};
> +
>  struct key_security_struct {
>  	u32 sid;	/* SID of key */
>  };
> 

^ permalink raw reply

* Re: [RFC PATCH v2 1/3] tun: correctly report an error in tun_flow_init()
From: Jason Wang @ 2012-12-06 10:31 UTC (permalink / raw)
  To: Paul Moore; +Cc: netdev, linux-security-module, selinux, mst
In-Reply-To: <20121205202604.18626.71229.stgit@localhost>

On Wednesday, December 05, 2012 03:26:04 PM Paul Moore wrote:
> On error, the error code from tun_flow_init() is lost inside
> tun_set_iff(). This patch fixes that by assigning the tun_flow_init()
> error code to the "err" variable, which is returned by tun_set_iff()
> on error.
> 
> Signed-off-by: Paul Moore <pmoore@redhat.com>
> ---
>  drivers/net/tun.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index a1b2389..14a0454 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1591,7 +1591,8 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>                 tun_net_init(dev);
>  
> -               if (tun_flow_init(tun))
> +               err = tun_flow_init(tun);
> +               if (err < 0)
>                         goto err_free_dev;
>  
>                 dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
> 
> --

Looks fine. We can split this out of the series and resubmit it for
net-next instead of as an RFC, so David can apply it soon.

Thanks Paul.

^ permalink raw reply

* Re: [RFC PATCH v2 3/3] tun: fix LSM/SELinux labeling of tun/tap devices
From: Michael S. Tsirkin @ 2012-12-06 10:33 UTC (permalink / raw)
  To: Paul Moore; +Cc: netdev, linux-security-module, selinux, jasowang
In-Reply-To: <20121205202619.18626.98778.stgit@localhost>

On Wed, Dec 05, 2012 at 03:26:19PM -0500, Paul Moore wrote:
> This patch corrects some problems with LSM/SELinux that were introduced
> with the multiqueue patchset.  The problem stems from the fact that the
> multiqueue work changed the relationship between the tun device and its
> associated socket; before the socket persisted for the life of the
> device, however after the multiqueue changes the socket only persisted
> for the life of the userspace connection (fd open).  For non-persistent
> devices this is not an issue, but for persistent devices this can cause
> the tun device to lose its SELinux label.
> 
> We correct this problem by adding an opaque LSM security blob to the
> tun device struct which allows us to have the LSM security state, e.g.
> SELinux labeling information, persist for the lifetime of the tun
> device.  In the process we tweak the LSM hooks to work with this new
> approach to TUN device/socket labeling and introduce a new LSM hook,
> security_tun_dev_create_queue(), to approve requests to create a new
> TUN queue via TUNSETQUEUE.
> 
> The SELinux code has been adjusted to match the new LSM hooks, the
> other LSMs do not make use of the LSM TUN controls.  This patch makes
> use of the recently added "tun_socket:create_queue" permission to
> restrict access to the TUNSETQUEUE operation.  On older SELinux
> policies which do not define the "tun_socket:create_queue" permission
> the access control decision for TUNSETQUEUE will be handled according
> to the SELinux policy's unknown permission setting.
> 
> Signed-off-by: Paul Moore <pmoore@redhat.com>

OK, so just to verify: this can be used to ensure that a qemu
process that has the queue fd can only attach it to
a specific device, right?

> ---
>  drivers/net/tun.c                 |   26 +++++++++++++---
>  include/linux/security.h          |   59 +++++++++++++++++++++++++++++--------
>  security/capability.c             |   24 +++++++++++++--
>  security/security.c               |   28 ++++++++++++++----
>  security/selinux/hooks.c          |   50 ++++++++++++++++++++++++-------
>  security/selinux/include/objsec.h |    4 +++
>  6 files changed, 153 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 14a0454..fb8148b 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -182,6 +182,7 @@ struct tun_struct {
>  	struct hlist_head flows[TUN_NUM_FLOW_ENTRIES];
>  	struct timer_list flow_gc_timer;
>  	unsigned long ageing_time;
> +	void *security;
>  };
>  
>  static inline u32 tun_hashfn(u32 rxhash)
> @@ -465,6 +466,10 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
>  	struct tun_file *tfile = file->private_data;
>  	int err;
>  
> +	err = security_tun_dev_attach(tfile->socket.sk, tun->security);
> +	if (err < 0)
> +		goto out;
> +
>  	err = -EINVAL;
>  	if (rcu_dereference_protected(tfile->tun, lockdep_rtnl_is_held()))
>  		goto out;
> @@ -1348,6 +1353,7 @@ static void tun_free_netdev(struct net_device *dev)
>  	struct tun_struct *tun = netdev_priv(dev);
>  
>  	tun_flow_uninit(tun);
> +	security_tun_dev_free_security(tun->security);
>  	free_netdev(dev);
>  }
>  
> @@ -1534,7 +1540,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>  
>  		if (tun_not_capable(tun))
>  			return -EPERM;
> -		err = security_tun_dev_attach(tfile->socket.sk);
> +		err = security_tun_dev_open(tun->security);
>  		if (err < 0)
>  			return err;
>  
> @@ -1587,7 +1593,9 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>  
>  		spin_lock_init(&tun->lock);
>  
> -		security_tun_dev_post_create(&tfile->sk);
> +		err = security_tun_dev_alloc_security(&tun->security);
> +		if (err < 0)
> +			goto err_free_dev;
>  
>  		tun_net_init(dev);
>  
> @@ -1767,12 +1775,18 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
>  
>  		tun = netdev_priv(dev);
>  		if (dev->netdev_ops != &tap_netdev_ops &&
> -			dev->netdev_ops != &tun_netdev_ops)
> +			dev->netdev_ops != &tun_netdev_ops) {
>  			ret = -EINVAL;
> -		else if (tun_not_capable(tun))
> +			goto unlock;
> +		}
> +		if (tun_not_capable(tun)) {
>  			ret = -EPERM;
> -		else
> -			ret = tun_attach(tun, file);
> +			goto unlock;
> +		}
> +		ret = security_tun_dev_create_queue(tun->security);
> +		if (ret < 0)
> +			goto unlock;
> +		ret = tun_attach(tun, file);
>  	} else if (ifr->ifr_flags & IFF_DETACH_QUEUE)
>  		__tun_detach(tfile, false);
>  	else
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 05e88bd..8ea923b 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -983,17 +983,29 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
>   *	tells the LSM to decrement the number of secmark labeling rules loaded
>   * @req_classify_flow:
>   *	Sets the flow's sid to the openreq sid.
> + * @tun_dev_alloc_security:
> + *	This hook allows a module to allocate a security structure for a TUN
> + *	device.
> + *	@security pointer to a security structure pointer.
> + *	Returns a zero on success, negative values on failure.
> + * @tun_dev_free_security:
> + *	This hook allows a module to free the security structure for a TUN
> + *	device.
> + *	@security pointer to the TUN device's security structure
>   * @tun_dev_create:
>   *	Check permissions prior to creating a new TUN device.
> - * @tun_dev_post_create:
> - *	This hook allows a module to update or allocate a per-socket security
> - *	structure.
> - *	@sk contains the newly created sock structure.
> + * @tun_dev_create_queue:
> + *	Check permissions prior to creating a new TUN device queue.
> + *	@security pointer to the TUN device's security structure.
>   * @tun_dev_attach:
> - *	Check permissions prior to attaching to a persistent TUN device.  This
> - *	hook can also be used by the module to update any security state
> + *	This hook can be used by the module to update any security state
>   *	associated with the TUN device's sock structure.
>   *	@sk contains the existing sock structure.
> + *	@security pointer to the TUN device's security structure.
> + * @tun_dev_open:
> + *	This hook can be used by the module to update any security state
> + *	associated with the TUN device's security structure.
> + *	@security pointer to the TUN device's security structure.
>   *
>   * Security hooks for XFRM operations.
>   *
> @@ -1613,9 +1625,12 @@ struct security_operations {
>  	void (*secmark_refcount_inc) (void);
>  	void (*secmark_refcount_dec) (void);
>  	void (*req_classify_flow) (const struct request_sock *req, struct flowi *fl);
> -	int (*tun_dev_create)(void);
> -	void (*tun_dev_post_create)(struct sock *sk);
> -	int (*tun_dev_attach)(struct sock *sk);
> +	int (*tun_dev_alloc_security) (void **security);
> +	void (*tun_dev_free_security) (void *security);
> +	int (*tun_dev_create) (void);
> +	int (*tun_dev_create_queue) (void *security);
> +	int (*tun_dev_attach) (struct sock *sk, void *security);
> +	int (*tun_dev_open) (void *security);
>  #endif	/* CONFIG_SECURITY_NETWORK */
>  
>  #ifdef CONFIG_SECURITY_NETWORK_XFRM
> @@ -2553,9 +2568,12 @@ void security_inet_conn_established(struct sock *sk,
>  int security_secmark_relabel_packet(u32 secid);
>  void security_secmark_refcount_inc(void);
>  void security_secmark_refcount_dec(void);
> +int security_tun_dev_alloc_security(void **security);
> +void security_tun_dev_free_security(void *security);
>  int security_tun_dev_create(void);
> -void security_tun_dev_post_create(struct sock *sk);
> -int security_tun_dev_attach(struct sock *sk);
> +int security_tun_dev_create_queue(void *security);
> +int security_tun_dev_attach(struct sock *sk, void *security);
> +int security_tun_dev_open(void *security);
>  
>  #else	/* CONFIG_SECURITY_NETWORK */
>  static inline int security_unix_stream_connect(struct sock *sock,
> @@ -2720,16 +2738,31 @@ static inline void security_secmark_refcount_dec(void)
>  {
>  }
>  
> +static inline int security_tun_dev_alloc_security(void **security)
> +{
> +	return 0;
> +}
> +
> +static inline void security_tun_dev_free_security(void *security)
> +{
> +}
> +
>  static inline int security_tun_dev_create(void)
>  {
>  	return 0;
>  }
>  
> -static inline void security_tun_dev_post_create(struct sock *sk)
> +static inline int security_tun_dev_create_queue(void *security)
> +{
> +	return 0;
> +}
> +
> +static inline int security_tun_dev_attach(struct sock *sk, void *security)
>  {
> +	return 0;
>  }
>  
> -static inline int security_tun_dev_attach(struct sock *sk)
> +static inline int security_tun_dev_open(void *security)
>  {
>  	return 0;
>  }
> diff --git a/security/capability.c b/security/capability.c
> index b14a30c..bf4cbf2 100644
> --- a/security/capability.c
> +++ b/security/capability.c
> @@ -704,16 +704,31 @@ static void cap_req_classify_flow(const struct request_sock *req,
>  {
>  }
>  
> +static int cap_tun_dev_alloc_security(void **security)
> +{
> +	return 0;
> +}
> +
> +static void cap_tun_dev_free_security(void *security)
> +{
> +}
> +
>  static int cap_tun_dev_create(void)
>  {
>  	return 0;
>  }
>  
> -static void cap_tun_dev_post_create(struct sock *sk)
> +static int cap_tun_dev_create_queue(void *security)
> +{
> +	return 0;
> +}
> +
> +static int cap_tun_dev_attach(struct sock *sk, void *security)
>  {
> +	return 0;
>  }
>  
> -static int cap_tun_dev_attach(struct sock *sk)
> +static int cap_tun_dev_open(void *security)
>  {
>  	return 0;
>  }
> @@ -1044,8 +1059,11 @@ void __init security_fixup_ops(struct security_operations *ops)
>  	set_to_cap_if_null(ops, secmark_refcount_inc);
>  	set_to_cap_if_null(ops, secmark_refcount_dec);
>  	set_to_cap_if_null(ops, req_classify_flow);
> +	set_to_cap_if_null(ops, tun_dev_alloc_security);
> +	set_to_cap_if_null(ops, tun_dev_free_security);
>  	set_to_cap_if_null(ops, tun_dev_create);
> -	set_to_cap_if_null(ops, tun_dev_post_create);
> +	set_to_cap_if_null(ops, tun_dev_create_queue);
> +	set_to_cap_if_null(ops, tun_dev_open);
>  	set_to_cap_if_null(ops, tun_dev_attach);
>  #endif	/* CONFIG_SECURITY_NETWORK */
>  #ifdef CONFIG_SECURITY_NETWORK_XFRM
> diff --git a/security/security.c b/security/security.c
> index 8dcd4ae..4d82654 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -1244,24 +1244,42 @@ void security_secmark_refcount_dec(void)
>  }
>  EXPORT_SYMBOL(security_secmark_refcount_dec);
>  
> +int security_tun_dev_alloc_security(void **security)
> +{
> +	return security_ops->tun_dev_alloc_security(security);
> +}
> +EXPORT_SYMBOL(security_tun_dev_alloc_security);
> +
> +void security_tun_dev_free_security(void *security)
> +{
> +	security_ops->tun_dev_free_security(security);
> +}
> +EXPORT_SYMBOL(security_tun_dev_free_security);
> +
>  int security_tun_dev_create(void)
>  {
>  	return security_ops->tun_dev_create();
>  }
>  EXPORT_SYMBOL(security_tun_dev_create);
>  
> -void security_tun_dev_post_create(struct sock *sk)
> +int security_tun_dev_create_queue(void *security)
>  {
> -	return security_ops->tun_dev_post_create(sk);
> +	return security_ops->tun_dev_create_queue(security);
>  }
> -EXPORT_SYMBOL(security_tun_dev_post_create);
> +EXPORT_SYMBOL(security_tun_dev_create_queue);
>  
> -int security_tun_dev_attach(struct sock *sk)
> +int security_tun_dev_attach(struct sock *sk, void *security)
>  {
> -	return security_ops->tun_dev_attach(sk);
> +	return security_ops->tun_dev_attach(sk, security);
>  }
>  EXPORT_SYMBOL(security_tun_dev_attach);
>  
> +int security_tun_dev_open(void *security)
> +{
> +	return security_ops->tun_dev_open(security);
> +}
> +EXPORT_SYMBOL(security_tun_dev_open);
> +
>  #endif	/* CONFIG_SECURITY_NETWORK */
>  
>  #ifdef CONFIG_SECURITY_NETWORK_XFRM
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 61a5336..f1efb08 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -4399,6 +4399,24 @@ static void selinux_req_classify_flow(const struct request_sock *req,
>  	fl->flowi_secid = req->secid;
>  }
>  
> +static int selinux_tun_dev_alloc_security(void **security)
> +{
> +	struct tun_security_struct *tunsec;
> +
> +	tunsec = kzalloc(sizeof(*tunsec), GFP_KERNEL);
> +	if (!tunsec)
> +		return -ENOMEM;
> +	tunsec->sid = current_sid();
> +
> +	*security = tunsec;
> +	return 0;
> +}
> +
> +static void selinux_tun_dev_free_security(void *security)
> +{
> +	kfree(security);
> +}
> +
>  static int selinux_tun_dev_create(void)
>  {
>  	u32 sid = current_sid();
> @@ -4414,8 +4432,17 @@ static int selinux_tun_dev_create(void)
>  			    NULL);
>  }
>  
> -static void selinux_tun_dev_post_create(struct sock *sk)
> +static int selinux_tun_dev_create_queue(void *security)
>  {
> +	struct tun_security_struct *tunsec = security;
> +
> +	return avc_has_perm(current_sid(), tunsec->sid, SECCLASS_TUN_SOCKET,
> +			    TUN_SOCKET__CREATE_QUEUE, NULL);
> +}
> +
> +static int selinux_tun_dev_attach(struct sock *sk, void *security)
> +{
> +	struct tun_security_struct *tunsec = security;
>  	struct sk_security_struct *sksec = sk->sk_security;
>  
>  	/* we don't currently perform any NetLabel based labeling here and it
> @@ -4425,20 +4452,19 @@ static void selinux_tun_dev_post_create(struct sock *sk)
>  	 * cause confusion to the TUN user that had no idea network labeling
>  	 * protocols were being used */
>  
> -	/* see the comments in selinux_tun_dev_create() about why we don't use
> -	 * the sockcreate SID here */
> -
> -	sksec->sid = current_sid();
> +	sksec->sid = tunsec->sid;
>  	sksec->sclass = SECCLASS_TUN_SOCKET;
> +
> +	return 0;
>  }
>  
> -static int selinux_tun_dev_attach(struct sock *sk)
> +static int selinux_tun_dev_open(void *security)
>  {
> -	struct sk_security_struct *sksec = sk->sk_security;
> +	struct tun_security_struct *tunsec = security;
>  	u32 sid = current_sid();
>  	int err;
>  
> -	err = avc_has_perm(sid, sksec->sid, SECCLASS_TUN_SOCKET,
> +	err = avc_has_perm(sid, tunsec->sid, SECCLASS_TUN_SOCKET,
>  			   TUN_SOCKET__RELABELFROM, NULL);
>  	if (err)
>  		return err;
> @@ -4446,8 +4472,7 @@ static int selinux_tun_dev_attach(struct sock *sk)
>  			   TUN_SOCKET__RELABELTO, NULL);
>  	if (err)
>  		return err;
> -
> -	sksec->sid = sid;
> +	tunsec->sid = sid;
>  
>  	return 0;
>  }
> @@ -5642,9 +5667,12 @@ static struct security_operations selinux_ops = {
>  	.secmark_refcount_inc =		selinux_secmark_refcount_inc,
>  	.secmark_refcount_dec =		selinux_secmark_refcount_dec,
>  	.req_classify_flow =		selinux_req_classify_flow,
> +	.tun_dev_alloc_security =	selinux_tun_dev_alloc_security,
> +	.tun_dev_free_security =	selinux_tun_dev_free_security,
>  	.tun_dev_create =		selinux_tun_dev_create,
> -	.tun_dev_post_create = 		selinux_tun_dev_post_create,
> +	.tun_dev_create_queue =		selinux_tun_dev_create_queue,
>  	.tun_dev_attach =		selinux_tun_dev_attach,
> +	.tun_dev_open =			selinux_tun_dev_open,
>  
>  #ifdef CONFIG_SECURITY_NETWORK_XFRM
>  	.xfrm_policy_alloc_security =	selinux_xfrm_policy_alloc,
> diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h
> index 26c7eee..aa47bca 100644
> --- a/security/selinux/include/objsec.h
> +++ b/security/selinux/include/objsec.h
> @@ -110,6 +110,10 @@ struct sk_security_struct {
>  	u16 sclass;			/* sock security class */
>  };
>  
> +struct tun_security_struct {
> +	u32 sid;			/* SID for the tun device sockets */
> +};
> +
>  struct key_security_struct {
>  	u32 sid;	/* SID of key */
>  };
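
The hook sequence in this patch can be modeled outside the kernel to show how the label flows. The device blob takes its SID from the creator. Each attached queue socket copies the blob SID. Reopening the device relabels the blob after relabelfrom/relabelto checks. This is a user-space sketch only. `current_sid()`, `check_perm()` (standing in for `avc_has_perm()` plus real policy), `fake_current_sid` and `denied_sid` are hypothetical stand-ins, and the structs are trimmed. It is not the SELinux code.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t u32;

/* Hypothetical stand-ins for kernel state: the caller's SID, and one
 * object SID that the fake "policy" refuses (0 = nothing refused). */
static u32 fake_current_sid = 100;
static u32 denied_sid = 0;

static u32 current_sid(void) { return fake_current_sid; }

/* Hypothetical stub for avc_has_perm(): denies access to denied_sid only. */
static int check_perm(u32 subject, u32 object)
{
	(void)subject;
	return object == denied_sid ? -EACCES : 0;
}

struct tun_security_struct { u32 sid; };	/* per-device blob */
struct sk_security_struct { u32 sid; };		/* per-socket label (trimmed) */

/* Device creation: the blob is labeled with the creator's SID. */
static int tun_alloc_security(void **security)
{
	struct tun_security_struct *tunsec = calloc(1, sizeof(*tunsec));

	if (!tunsec)
		return -ENOMEM;
	tunsec->sid = current_sid();
	*security = tunsec;
	return 0;
}

static void tun_free_security(void *security) { free(security); }

/* Adding a queue is checked against the device label. */
static int tun_create_queue(void *security)
{
	const struct tun_security_struct *tunsec = security;

	return check_perm(current_sid(), tunsec->sid);
}

/* Every queue socket attached to the device inherits the device label. */
static void tun_attach(struct sk_security_struct *sksec, void *security)
{
	const struct tun_security_struct *tunsec = security;

	sksec->sid = tunsec->sid;
}

/* Reopening relabels the device to the opener, but only after the
 * "relabelfrom" (old SID) and "relabelto" (new SID) checks pass. */
static int tun_open(void *security)
{
	struct tun_security_struct *tunsec = security;
	u32 sid = current_sid();
	int err;

	err = check_perm(sid, tunsec->sid);	/* relabelfrom */
	if (err)
		return err;
	err = check_perm(sid, sid);		/* relabelto */
	if (err)
		return err;
	tunsec->sid = sid;
	return 0;
}
```

In this model a socket attached before a reopen keeps the old label, while sockets attached afterwards get the opener's label, and a failed permission check leaves the blob unchanged.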

* Re: [net-next PATCH V3-evictor] net: frag evictor, avoid killing warm frag queues
From: Jesper Dangaard Brouer @ 2012-12-06 12:26 UTC
  To: Eric Dumazet
  Cc: David S. Miller, Florian Westphal, netdev, Thomas Graf,
	Paul E. McKenney, Cong Wang, Herbert Xu
In-Reply-To: <1354699462.20888.207.camel@localhost>

On Wed, 2012-12-05 at 10:24 +0100, Jesper Dangaard Brouer wrote:
> 
> The previous evictor patch, of letting new fragments enter, worked
> amazingly well.  But I suspect this might also be related to a
> bug/problem in the evictor loop (which was being hidden by that
> patch).

The evictor loop does not contain a bug, just an SMP scalability issue
(which is fixed by later patches).  The first evictor patch, which
does not let new fragments enter, only worked amazingly well because
it was hiding this (and other) scalability issues, and implicitly allowing
frags already "in" to exceed the memory limit for 1 jiffy.  This
invalidates the patch, as the improvement was only a side effect.


> My new *theory* is that the evictor loop will loop too much if
> it finds a fragment which is INET_FRAG_COMPLETE ... in that case, we
> don't advance the LRU list, and thus will pick up the exact same
> inet_frag_queue again in the loop... to get out of the loop we need
> another CPU or packet to change the LRU list for us... I'll test that
> theory... (it could also be CPUs fighting over the same LRU head
> element that cause this) ... more to come...

The above theory does happen, but it does not cause excessive looping.
The CPUs are just fighting over who gets to free the inet_frag_queue
and who gets to unlink it from its data structures (resulting, I guess,
in cache bouncing between CPUs).

CPUs are fighting for the same LRU head (inet_frag_queue) element,
which is bad for scalability.  We could fix this by unlinking the
element once a CPU grabs it, but that would require changing a
read_lock to a write_lock, so we might not gain much performance.

I already fix this implicitly in a later patch, where I move the
LRU lists to be per CPU.  So I don't know if it's worth fixing.


(And yes, I'm now using thresholds of 4Mb/3Mb as my default setting, but
I'm also experimenting with other threshold sizes.)

P.S. Thank you, Eric, for being so persistent, so that I realized this
patch was not good.  Hopefully we can now move on to the other patches,
which fix the real scalability issues.

--Jesper

* Re: [net-next PATCH V3-evictor] net: frag evictor, avoid killing warm frag queues
From: Florian Westphal @ 2012-12-06 12:32 UTC
  To: Jesper Dangaard Brouer
  Cc: Eric Dumazet, David S. Miller, Florian Westphal, netdev,
	Thomas Graf, Paul E. McKenney, Cong Wang, Herbert Xu
In-Reply-To: <1354796760.20888.217.camel@localhost>

Jesper Dangaard Brouer <jbrouer@redhat.com> wrote:
> CPUs are fighting for the same LRU head (inet_frag_queue) element,
> which is bad for scalability.  We could fix this by unlinking the
> element once a CPU grabs it, but that would require changing a
> read_lock to a write_lock, so we might not gain much performance.
> 
> I already fix this implicitly in a later patch, where I move the
> LRU lists to be per CPU.  So I don't know if it's worth fixing.

Do you think it's worth trying to remove the LRU list altogether and
just evict from the hash in a round-robin fashion instead?
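
A rough sketch of what round-robin eviction from the hash could look like follows. There is no LRU list, only a cursor that walks the hash buckets and evicts chain heads. It runs once usage is above a high threshold and stops when usage drops to a low threshold; the 4Mb/3Mb pair mentioned earlier would be one such setting. All names here (`frag_table`, `fq`, `evict_round_robin`, `NBUCKETS`) are hypothetical and locking is omitted, so treat it as an illustration of the idea rather than a proposed patch. Note that it gives up strict oldest-first ordering in exchange for having no shared LRU head.

```c
#include <stddef.h>

#define NBUCKETS 64	/* hypothetical hash table size */

struct fq {
	struct fq *next;	/* hash chain linkage */
	size_t mem;		/* bytes accounted to this queue */
};

struct frag_table {
	struct fq *bucket[NBUCKETS];
	size_t mem;		/* total bytes accounted */
	unsigned int cursor;	/* bucket where the next eviction looks first */
};

/* If usage exceeds high_thresh, evict chain heads bucket by bucket,
 * resuming where the previous call stopped, until usage is at or below
 * low_thresh or a full pass finds every bucket empty.
 * Returns the number of queues evicted. */
static int evict_round_robin(struct frag_table *t, size_t high_thresh,
			     size_t low_thresh)
{
	int evicted = 0;
	unsigned int empty_in_a_row = 0;

	if (t->mem <= high_thresh)
		return 0;

	while (t->mem > low_thresh && empty_in_a_row < NBUCKETS) {
		struct fq *q = t->bucket[t->cursor];

		if (q) {
			t->bucket[t->cursor] = q->next;	/* unlink chain head */
			t->mem -= q->mem;
			evicted++;	/* a real implementation would free q here */
			empty_in_a_row = 0;
		} else {
			empty_in_a_row++;
		}
		t->cursor = (t->cursor + 1) % NBUCKETS;
	}
	return evicted;
}
```

Evicting one queue per bucket and then moving on spreads the damage across the table instead of repeatedly hitting the same chain.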

* Re: [PATCH net 1/1] r8169: workaround for missing extended GigaMAC registers
From: Francois Romieu @ 2012-12-06 12:25 UTC
  To: Wang YanQing; +Cc: netdev, David Miller, Lee Chun-Yi, Hayes Wang
In-Reply-To: <20121206073842.GA3847@udknight>

Wang YanQing:
[...]
> Are you sure we will lose the GigaMAC registers' content
> after the NIC enters the PCI_D3hot state?

I have not made such a claim.

Only some BIOSes f*ck things up on resume; see:

http://marc.info/?l=linux-netdev&m=132195832624117&w=2

My test computer and its 8168evl (courtesy of Realtek) showed no
difference without the patch: it always works.

-- 
Ueimor

* Re: [PATCH] vhost-blk: Add vhost-blk support v6
From: Michael S. Tsirkin @ 2012-12-06 13:00 UTC
  To: Asias He
  Cc: Jens Axboe, kvm, netdev, linux-kernel, virtualization,
	Christoph Hellwig, David S. Miller
In-Reply-To: <1354412033-32372-1-git-send-email-asias@redhat.com>

On Sun, Dec 02, 2012 at 09:33:53AM +0800, Asias He wrote:
> diff --git a/drivers/vhost/Kconfig.blk b/drivers/vhost/Kconfig.blk
> new file mode 100644
> index 0000000..ff8ab76
> --- /dev/null
> +++ b/drivers/vhost/Kconfig.blk
> @@ -0,0 +1,10 @@
> +config VHOST_BLK
> +	tristate "Host kernel accelerator for virtio blk (EXPERIMENTAL)"
> +	depends on BLOCK &&  EXPERIMENTAL && m


It should depend on EVENTFD as well.
