Netdev List


* Re: [PATCH] vhost-blk: Add vhost-blk support v6
From: Michael S. Tsirkin @ 2012-12-06 13:00 UTC (permalink / raw)
  To: Asias He
  Cc: Jens Axboe, kvm, netdev, linux-kernel, virtualization,
	Christoph Hellwig, David S. Miller
In-Reply-To: <1354412033-32372-1-git-send-email-asias@redhat.com>

On Sun, Dec 02, 2012 at 09:33:53AM +0800, Asias He wrote:
> diff --git a/drivers/vhost/Kconfig.blk b/drivers/vhost/Kconfig.blk
> new file mode 100644
> index 0000000..ff8ab76
> --- /dev/null
> +++ b/drivers/vhost/Kconfig.blk
> @@ -0,0 +1,10 @@
> +config VHOST_BLK
> +	tristate "Host kernel accelerator for virtio blk (EXPERIMENTAL)"
> +	depends on BLOCK &&  EXPERIMENTAL && m


should depend on eventfd as well.

^ permalink raw reply

* [PATCH] sctp: Fix compiler warning when CONFIG_DEBUG_SECTION_MISMATCH=y
From: Christoph Paasch @ 2012-12-06 13:03 UTC (permalink / raw)
  To: vyasevich, linux-sctp, davem; +Cc: sri, nhorman, netdev

WARNING: net/sctp/sctp.o(.text+0x72f1): Section mismatch in reference
from the function sctp_net_init() to the function
.init.text:sctp_proc_init()
The function sctp_net_init() references
the function __init sctp_proc_init().
This is often because sctp_net_init lacks a __init
annotation or the annotation of sctp_proc_init is wrong.

And put __net_init after 'int' for sctp_proc_init - as it is done
everywhere else in the sctp-stack.

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
---
 net/sctp/protocol.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 456bc3d..2c7785b 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -86,7 +86,7 @@ int sysctl_sctp_rmem[3];
 int sysctl_sctp_wmem[3];
 
 /* Set up the proc fs entry for the SCTP protocol. */
-static __net_init int sctp_proc_init(struct net *net)
+static int __net_init sctp_proc_init(struct net *net)
 {
 #ifdef CONFIG_PROC_FS
 	net->sctp.proc_net_sctp = proc_net_mkdir(net, "sctp", net->proc_net);
@@ -1165,7 +1165,7 @@ static void sctp_v4_del_protocol(void)
 	unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
 }
 
-static int sctp_net_init(struct net *net)
+static int __net_init sctp_net_init(struct net *net)
 {
 	int status;
 
@@ -1290,7 +1290,7 @@ err_sysctl_register:
 	return status;
 }
 
-static void sctp_net_exit(struct net *net)
+static void __net_exit sctp_net_exit(struct net *net)
 {
 	/* Free the local address list */
 	sctp_free_addr_wq(net);
-- 
1.7.10.4

^ permalink raw reply related
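
The warning above comes from a function in regular text calling a function that is annotated as init-only. A minimal userspace sketch of the fixed pattern follows. It assumes the annotation macros can be stubbed out as empty so the file builds outside the kernel (in a kernel build they may expand to section attributes, depending on configuration), and every demo_ name is hypothetical rather than taken from net/sctp.

```c
#include <stddef.h>

/* Stubs: in the kernel these macros may place code in a discardable
 * init/exit section; here they expand to nothing so the sketch builds
 * in userspace. */
#define __net_init
#define __net_exit

/* Hypothetical stand-in for struct net. */
struct demo_net {
	int proc_ready;
};

/* Callee annotated as init-only code. */
static int __net_init demo_proc_init(struct demo_net *net)
{
	net->proc_ready = 1;
	return 0;
}

/* The caller carries the same annotation, so its reference to
 * demo_proc_init() no longer crosses from regular text into init text. */
static int __net_init demo_net_init(struct demo_net *net)
{
	return demo_proc_init(net);
}

/* The teardown path gets the matching exit annotation. */
static void __net_exit demo_net_exit(struct demo_net *net)
{
	net->proc_ready = 0;
}
```

The point of the sketch is only the pairing: once the callee is init-only, every caller needs a compatible annotation, which is what the three changed lines in the patch do.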

* [PATCH net next] team: remove team parameter from port_enter callback.
From: Rami Rosen @ 2012-12-06 13:09 UTC (permalink / raw)
  To: davem; +Cc: jiri, netdev, Rami Rosen

This patch removes the unused team parameter from the port_enter callback in
team_mode_ops and updates its three mode implementations and its caller in
team.c accordingly.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
---
 drivers/net/team/team.c                  | 2 +-
 drivers/net/team/team_mode_broadcast.c   | 2 +-
 drivers/net/team/team_mode_loadbalance.c | 2 +-
 drivers/net/team/team_mode_roundrobin.c  | 2 +-
 include/linux/if_team.h                  | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index ad86660..4203808 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -887,7 +887,7 @@ static int team_port_enter(struct team *team, struct team_port *port)
 	dev_hold(team->dev);
 	port->dev->priv_flags |= IFF_TEAM_PORT;
 	if (team->ops.port_enter) {
-		err = team->ops.port_enter(team, port);
+		err = team->ops.port_enter(port);
 		if (err) {
 			netdev_err(team->dev, "Device %s failed to enter team mode\n",
 				   port->dev->name);
diff --git a/drivers/net/team/team_mode_broadcast.c b/drivers/net/team/team_mode_broadcast.c
index c5db428..c3840bc 100644
--- a/drivers/net/team/team_mode_broadcast.c
+++ b/drivers/net/team/team_mode_broadcast.c
@@ -46,7 +46,7 @@ static bool bc_transmit(struct team *team, struct sk_buff *skb)
 	return sum_ret;
 }
 
-static int bc_port_enter(struct team *team, struct team_port *port)
+static int bc_port_enter(struct team_port *port)
 {
 	return team_port_set_team_dev_addr(port);
 }
diff --git a/drivers/net/team/team_mode_loadbalance.c b/drivers/net/team/team_mode_loadbalance.c
index cdc31b5..1e84c77 100644
--- a/drivers/net/team/team_mode_loadbalance.c
+++ b/drivers/net/team/team_mode_loadbalance.c
@@ -614,7 +614,7 @@ static void lb_exit(struct team *team)
 	kfree(lb_priv->ex);
 }
 
-static int lb_port_enter(struct team *team, struct team_port *port)
+static int lb_port_enter(struct team_port *port)
 {
 	struct lb_port_priv *lb_port_priv = get_lb_port_priv(port);
 
diff --git a/drivers/net/team/team_mode_roundrobin.c b/drivers/net/team/team_mode_roundrobin.c
index 105135a..abe889e 100644
--- a/drivers/net/team/team_mode_roundrobin.c
+++ b/drivers/net/team/team_mode_roundrobin.c
@@ -64,7 +64,7 @@ drop:
 	return false;
 }
 
-static int rr_port_enter(struct team *team, struct team_port *port)
+static int rr_port_enter(struct team_port *port)
 {
 	return team_port_set_team_dev_addr(port);
 }
diff --git a/include/linux/if_team.h b/include/linux/if_team.h
index 0245def..3366453 100644
--- a/include/linux/if_team.h
+++ b/include/linux/if_team.h
@@ -105,7 +105,7 @@ struct team_mode_ops {
 				       struct team_port *port,
 				       struct sk_buff *skb);
 	bool (*transmit)(struct team *team, struct sk_buff *skb);
-	int (*port_enter)(struct team *team, struct team_port *port);
+	int (*port_enter)(struct team_port *port);
 	void (*port_leave)(struct team *team, struct team_port *port);
 	void (*port_change_dev_addr)(struct team *team, struct team_port *port);
 	void (*port_enabled)(struct team *team, struct team_port *port);
-- 
1.7.11.7

^ permalink raw reply related
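
The shape of the callback change can be sketched in isolation. This is a simplified userspace model, not the driver code: the demo_ structures are hypothetical, and the back-pointer from port to team is an assumption of the sketch that shows why a mode callback can get by with the port alone.

```c
#include <stddef.h>

struct demo_team;

/* Hypothetical, trimmed-down port. The back-pointer to the team is what
 * would let a callback reach the team without an explicit argument. */
struct demo_team_port {
	struct demo_team *team;
	int addr_synced;
};

struct demo_mode_ops {
	/* New, one-argument form of the callback. */
	int (*port_enter)(struct demo_team_port *port);
};

struct demo_team {
	struct demo_mode_ops ops;
	int nports;
};

/* A mode callback that only needs the port. */
static int demo_rr_port_enter(struct demo_team_port *port)
{
	port->addr_synced = 1;
	return 0;
}

/* Core code still knows the team, but passes only the port down. */
static int demo_team_port_enter(struct demo_team *team,
				struct demo_team_port *port)
{
	int err = 0;

	port->team = team;
	if (team->ops.port_enter)
		err = team->ops.port_enter(port);
	if (!err)
		team->nports++;
	return err;
}
```

A caller would fill in ops.port_enter with demo_rr_port_enter and call demo_team_port_enter(&team, &port); the callback never sees the team directly.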

* [PATCH] tulip: Fix compiler warning when CONFIG_DEBUG_SECTION_MISMATCH=y
From: Christoph Paasch @ 2012-12-06 13:04 UTC (permalink / raw)
  To: Grant Grundler, David Miller; +Cc: netdev

WARNING: drivers/net/ethernet/dec/tulip/tulip.o(.text+0x4057): Section
mismatch in reference from the function tulip_init_one() to the variable
.devinit.rodata:early_486_chipsets
The function tulip_init_one() references
the variable __devinitconst early_486_chipsets.
This is often because tulip_init_one lacks a __devinitconst
annotation or the annotation of early_486_chipsets is wrong.

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
---
 drivers/net/ethernet/dec/tulip/tulip_core.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/dec/tulip/tulip_core.c b/drivers/net/ethernet/dec/tulip/tulip_core.c
index 157c8e6..6baec9c 100644
--- a/drivers/net/ethernet/dec/tulip/tulip_core.c
+++ b/drivers/net/ethernet/dec/tulip/tulip_core.c
@@ -1301,7 +1301,7 @@ DEFINE_PCI_DEVICE_TABLE(early_486_chipsets) = {
 	{ },
 };
 
-static int tulip_init_one(struct pci_dev *pdev,
+static int __devinit tulip_init_one(struct pci_dev *pdev,
 			  const struct pci_device_id *ent)
 {
 	struct tulip_private *tp;
@@ -1970,7 +1970,7 @@ static void poll_tulip (struct net_device *dev)
 }
 #endif
 
-static struct pci_driver tulip_driver = {
+static struct __devinitdata pci_driver tulip_driver = {
 	.name		= DRV_NAME,
 	.id_table	= tulip_pci_tbl,
 	.probe		= tulip_init_one,
-- 
1.7.10.4

^ permalink raw reply related

* Re: at91sam9260 MACB problem with IP fragmentation
From: Nicolas Ferre @ 2012-12-06 13:27 UTC (permalink / raw)
  To: Erwin Rol
  Cc: linux-kernel, Havard Skinnemoen, linux-arm-kernel, matteo.fortini,
	netdev
In-Reply-To: <50C08233.9030905@erwinrol.com>

Erwin,

On 12/06/2012 12:32 PM, Erwin Rol wrote:
> Hello Nicolas, Havard, all,
> 
> I have a very obscure problem with an at91sam9260 board (almost 1 to 1
> copy of the Atmel EK).
> 
> The MACB seems to stall when I use large (>2 * MTU) UDP datagrams. The
> test case is that a udp echo client (PC) sends datagrams with increasing
> length to the AT91 until the max length of the UDP datagram is reached.
> When there is no IP fragmentation everything is fine, but when the
> datagrams are starting to get fragmented the AT91 will not reply
> anymore. But as soon as some network traffic happens it goes on again,
> and none of the data is lost.
> 
> With wireshark the effect can be easily seen (192.168.1.4 is the PC echo
> client, and 192.168.1.133 is the at91 echo server). After the first
> request no reply comes back. After a 5 second timeout the second
> request is sent, and then both replies are returned.
> 
> When I enabled debugging output it all started to work. So I tried some
> udelays in the driver instead of printk and with a 1ms delay in the irq
> handler it started working. Of course that is an unacceptable fix, but
> it looks like there is some weird race condition that causes the sending
> to stall. The only difference from normal MTU sized datagrams I can
> think of is that the fragmented packets can be passed very quickly to
> the macb tx function, because the kernel has all 5 skb's ready.
> 
> I would be very interested to hear if someone else could reproduce this
> problem. Or even better, has seen this problem and has a fix for it.
> 
> I tried several kernels including the test version from Nicolas that he
> posted on LKML in October. They all show the same effect.

[..]

It seems that Matteo sees the same behavior; check here:
http://www.spinics.net/lists/netdev/msg218951.html

I am working on the macb driver right now, so I will try to reproduce
and track this issue on my side.

Best regards,
-- 
Nicolas Ferre

^ permalink raw reply

* RE: [net-next PATCH V3-evictor] net: frag evictor, avoid killing warm frag queues
From: David Laight @ 2012-12-06 13:29 UTC (permalink / raw)
  To: Florian Westphal, Jesper Dangaard Brouer
  Cc: Eric Dumazet, David S. Miller, netdev, Thomas Graf,
	Paul E. McKenney, Cong Wang, Herbert Xu
In-Reply-To: <20121206123248.GA24493@breakpoint.cc>

> Jesper Dangaard Brouer <jbrouer@redhat.com> wrote:
> > CPUs are fighting for the same LRU head (inet_frag_queue) element,
> > which is bad for scalability.  We could fix this by unlinking the
> > element once a CPU grabs it, but it would require us to change a
> > read_lock to a write_lock, thus we might not gain much performance.
> >
> > I already (implicitly) fix this in a later patch, where I'm moving the
> > LRU lists to be per CPU.  So, I don't know if it's worth fixing.
> 
> Do you think it's worth trying to remove the lru list altogether and
> just evict from the hash in a round-robin fashion instead?

Round-robin will be the same as LRU under overload, so it will have the
same issues.
Random might be better - especially if IP datagrams for which
more than one in-sequence packet has been received are moved
to a second structure.
But you still need something to control the total memory use.

NFS/UDP is about the only thing that generates very large
IP datagrams - and no one in their right mind runs that
over non-local links.

For SMP you might hash to a small array of pointers (to fragments)
each having its own lock. Only evict items with the same hash.
Put the id in the array and you probably won't need to look at
the actual fragment (saving a cache miss) unless it is the one
you want.

	David

^ permalink raw reply
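
The per-slot locking idea in the message above can be sketched roughly as follows. This is a userspace illustration under stated assumptions, not the inet_frag code: pthread mutexes stand in for kernel locks, the table size and all demo_ names are hypothetical, and reference counting of the returned queue is left out.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SLOTS 8	/* hypothetical size; must be a power of two */

struct demo_frag_queue {
	uint32_t id;
	int nfrags;
};

struct demo_slot {
	pthread_mutex_t lock;		/* one lock per slot, no global lock */
	uint32_t id;			/* copy of q->id, valid only if q != NULL */
	struct demo_frag_queue *q;
};

static struct demo_slot demo_table[DEMO_SLOTS];

static void demo_table_init(void)
{
	for (size_t i = 0; i < DEMO_SLOTS; i++) {
		pthread_mutex_init(&demo_table[i].lock, NULL);
		demo_table[i].q = NULL;
	}
}

/* Look up by id. The id stored in the slot is compared first, so the
 * queue itself is only touched when it is the one being looked for.
 * A real implementation would take a reference before unlocking. */
static struct demo_frag_queue *demo_lookup(uint32_t id)
{
	struct demo_slot *s = &demo_table[id & (DEMO_SLOTS - 1)];
	struct demo_frag_queue *q = NULL;

	pthread_mutex_lock(&s->lock);
	if (s->q && s->id == id)
		q = s->q;
	pthread_mutex_unlock(&s->lock);
	return q;
}

/* Install q and return whatever it evicted from the same slot (or NULL).
 * Eviction is therefore limited to entries with the same hash. */
static struct demo_frag_queue *demo_insert(struct demo_frag_queue *q)
{
	struct demo_slot *s = &demo_table[q->id & (DEMO_SLOTS - 1)];
	struct demo_frag_queue *victim;

	pthread_mutex_lock(&s->lock);
	victim = s->q;
	s->q = q;
	s->id = q->id;
	pthread_mutex_unlock(&s->lock);
	return victim;
}
```

With this layout two CPUs only contend when their datagrams hash to the same slot. As the message notes, something else would still have to bound total memory use; the sketch does not attempt that.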

* Re: [RFC PATCH v2 3/3] tun: fix LSM/SELinux labeling of tun/tap devices
From: Jason Wang @ 2012-12-06 13:51 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Paul Moore, netdev, linux-security-module, selinux
In-Reply-To: <20121206103325.GG10837@redhat.com>

On Thursday, December 06, 2012 12:33:25 PM Michael S. Tsirkin wrote:
> On Wed, Dec 05, 2012 at 03:26:19PM -0500, Paul Moore wrote:
> > This patch corrects some problems with LSM/SELinux that were introduced
> > with the multiqueue patchset.  The problem stems from the fact that the
> > multiqueue work changed the relationship between the tun device and its
> > associated socket; before the socket persisted for the life of the
> > device, however after the multiqueue changes the socket only persisted
> > for the life of the userspace connection (fd open).  For non-persistent
> > devices this is not an issue, but for persistent devices this can cause
> > the tun device to lose its SELinux label.
> > 
> > We correct this problem by adding an opaque LSM security blob to the
> > tun device struct which allows us to have the LSM security state, e.g.
> > SELinux labeling information, persist for the lifetime of the tun
> > device.  In the process we tweak the LSM hooks to work with this new
> > approach to TUN device/socket labeling and introduce a new LSM hook,
> > security_tun_dev_create_queue(), to approve requests to create a new
> > TUN queue via TUNSETQUEUE.
> > 
> > The SELinux code has been adjusted to match the new LSM hooks, the
> > other LSMs do not make use of the LSM TUN controls.  This patch makes
> > use of the recently added "tun_socket:create_queue" permission to
> > restrict access to the TUNSETQUEUE operation.  On older SELinux
> > policies which do not define the "tun_socket:create_queue" permission
> > the access control decision for TUNSETQUEUE will be handled according
> > to the SELinux policy's unknown permission setting.
> > 
> > Signed-off-by: Paul Moore <pmoore@redhat.com>
> 
> OK so just to verify: this can be used to ensure that qemu
> process that has the queue fd can only attach it to
> a specific device, right?

I think it can't. And I'm not sure whether we need SELinux's help to do this.
It looks like we can do this without SELinux through:

1. Don't assign a NULL pointer to tfile->tun during file detaching
2. Compare the ifr_name and the name of tfile->tun; if not equal, return -EINVAL
3. Set a special flag in tun_detach_all() to mark the fd as unusable, so that it
can't be used for future attaching.

After this, only the device that the fd was first attached to (through TUNSETIFF
or TUNSETQUEUE) is allowed to be attached again.
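
The three steps can be sketched as a small userspace model. All demo_ names are hypothetical, the structures are heavily simplified stand-ins for tun_struct and tun_file, and the device is passed in directly instead of being looked up by ifr_name as the driver would do.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_IFNAMSIZ 16

struct demo_tun {
	char name[DEMO_IFNAMSIZ];
};

struct demo_tfile {
	struct demo_tun *tun;	/* step 1: kept across a plain detach */
	bool attached;
	bool dead;		/* step 3: set when the device goes away */
};

/* Plain detach: tfile->tun is deliberately left set. */
static void demo_detach(struct demo_tfile *tfile)
{
	tfile->attached = false;
}

/* Device teardown: the fd must never be attached again. */
static void demo_detach_all(struct demo_tfile *tfile)
{
	tfile->attached = false;
	tfile->dead = true;
}

static int demo_attach(struct demo_tfile *tfile, struct demo_tun *tun)
{
	if (tfile->dead || tfile->attached)
		return -EINVAL;
	/* step 2: a previously attached fd may only return to its device */
	if (tfile->tun &&
	    strncmp(tfile->tun->name, tun->name, DEMO_IFNAMSIZ) != 0)
		return -EINVAL;
	tfile->tun = tun;
	tfile->attached = true;
	return 0;
}
```

In this model an fd first attached to "tap0" can detach and re-attach to "tap0", is refused for "tap1", and is refused for everything once demo_detach_all() has run.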

> 
> > ---
> > 
> >  drivers/net/tun.c                 |   26 +++++++++++++---
> >  include/linux/security.h          |   59 +++++++++++++++++++++++++++++--------
> >  security/capability.c             |   24 +++++++++++++--
> >  security/security.c               |   28 ++++++++++++++----
> >  security/selinux/hooks.c          |   50 ++++++++++++++++++++++++-------
> >  security/selinux/include/objsec.h |    4 +++
> >  6 files changed, 153 insertions(+), 38 deletions(-)
> > 
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index 14a0454..fb8148b 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -182,6 +182,7 @@ struct tun_struct {
> > 
> >  	struct hlist_head flows[TUN_NUM_FLOW_ENTRIES];
> >  	struct timer_list flow_gc_timer;
> >  	unsigned long ageing_time;
> > 
> > +	void *security;
> > 
> >  };
> >  
> >  static inline u32 tun_hashfn(u32 rxhash)
> > 
> > @@ -465,6 +466,10 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
> >  	struct tun_file *tfile = file->private_data;
> >  	int err;
> > 
> > +	err = security_tun_dev_attach(tfile->socket.sk, tun->security);
> > +	if (err < 0)
> > +		goto out;
> > +
> > 
> >  	err = -EINVAL;
> >  	if (rcu_dereference_protected(tfile->tun, lockdep_rtnl_is_held()))
> >  	
> >  		goto out;
> > 
> > @@ -1348,6 +1353,7 @@ static void tun_free_netdev(struct net_device *dev)
> > 
> >  	struct tun_struct *tun = netdev_priv(dev);
> >  	
> >  	tun_flow_uninit(tun);
> > 
> > +	security_tun_dev_free_security(tun->security);
> > 
> >  	free_netdev(dev);
> >  
> >  }
> > 
> > @@ -1534,7 +1540,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
> >  		if (tun_not_capable(tun))
> >  		
> >  			return -EPERM;
> > 
> > -		err = security_tun_dev_attach(tfile->socket.sk);
> > +		err = security_tun_dev_open(tun->security);
> > 
> >  		if (err < 0)
> >  		
> >  			return err;
> > 
> > @@ -1587,7 +1593,9 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
> >  		spin_lock_init(&tun->lock);
> > 
> > -		security_tun_dev_post_create(&tfile->sk);
> > +		err = security_tun_dev_alloc_security(&tun->security);
> > +		if (err < 0)
> > +			goto err_free_dev;
> > 
> >  		tun_net_init(dev);
> > 
> > @@ -1767,12 +1775,18 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
> >  		tun = netdev_priv(dev);
> >  		if (dev->netdev_ops != &tap_netdev_ops &&
> > 
> > -			dev->netdev_ops != &tun_netdev_ops)
> > +			dev->netdev_ops != &tun_netdev_ops) {
> > 
> >  			ret = -EINVAL;
> > 
> > -		else if (tun_not_capable(tun))
> > +			goto unlock;
> > +		}
> > +		if (tun_not_capable(tun)) {
> > 
> >  			ret = -EPERM;
> > 
> > -		else
> > -			ret = tun_attach(tun, file);
> > +			goto unlock;
> > +		}
> > +		ret = security_tun_dev_create_queue(tun->security);
> > +		if (ret < 0)
> > +			goto unlock;
> > +		ret = tun_attach(tun, file);
> > 
> >  	} else if (ifr->ifr_flags & IFF_DETACH_QUEUE)
> >  	
> >  		__tun_detach(tfile, false);
> >  	
> >  	else
> > 
> > diff --git a/include/linux/security.h b/include/linux/security.h
> > index 05e88bd..8ea923b 100644
> > --- a/include/linux/security.h
> > +++ b/include/linux/security.h
> > @@ -983,17 +983,29 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
> >   *	tells the LSM to decrement the number of secmark labeling rules loaded
> >   * @req_classify_flow:
> >   *	Sets the flow's sid to the openreq sid.
> > 
> > + * @tun_dev_alloc_security:
> > + *	This hook allows a module to allocate a security structure for a TUN
> > + *	device.
> > + *	@security pointer to a security structure pointer.
> > + *	Returns a zero on success, negative values on failure.
> > + * @tun_dev_free_security:
> > + *	This hook allows a module to free the security structure for a TUN
> > + *	device.
> > + *	@security pointer to the TUN device's security structure
> > 
> >   * @tun_dev_create:
> >   *	Check permissions prior to creating a new TUN device.
> > 
> > - * @tun_dev_post_create:
> > - *	This hook allows a module to update or allocate a per-socket security
> > - *	structure.
> > - *	@sk contains the newly created sock structure.
> > + * @tun_dev_create_queue:
> > + *	Check permissions prior to creating a new TUN device queue.
> > + *	@security pointer to the TUN device's security structure.
> > 
> >   * @tun_dev_attach:
> > - *	Check permissions prior to attaching to a persistent TUN device.  This
> > - *	hook can also be used by the module to update any security state
> > + *	This hook can be used by the module to update any security state
> > 
> >   *	associated with the TUN device's sock structure.
> >   *	@sk contains the existing sock structure.
> > 
> > + *	@security pointer to the TUN device's security structure.
> > + * @tun_dev_open:
> > + *	This hook can be used by the module to update any security state
> > + *	associated with the TUN device's security structure.
> > + *	@security pointer to the TUN devices's security structure.
> > 
> >   *
> >   * Security hooks for XFRM operations.
> >   *
> > 
> > @@ -1613,9 +1625,12 @@ struct security_operations {
> > 
> >  	void (*secmark_refcount_inc) (void);
> >  	void (*secmark_refcount_dec) (void);
> >  	void (*req_classify_flow) (const struct request_sock *req, struct flowi *fl);
> > -	int (*tun_dev_create)(void);
> > -	void (*tun_dev_post_create)(struct sock *sk);
> > -	int (*tun_dev_attach)(struct sock *sk);
> > +	int (*tun_dev_alloc_security) (void **security);
> > +	void (*tun_dev_free_security) (void *security);
> > +	int (*tun_dev_create) (void);
> > +	int (*tun_dev_create_queue) (void *security);
> > +	int (*tun_dev_attach) (struct sock *sk, void *security);
> > +	int (*tun_dev_open) (void *security);
> > 
> >  #endif	/* CONFIG_SECURITY_NETWORK */
> >  
> >  #ifdef CONFIG_SECURITY_NETWORK_XFRM
> > 
> > @@ -2553,9 +2568,12 @@ void security_inet_conn_established(struct sock *sk,
> > 
> >  int security_secmark_relabel_packet(u32 secid);
> >  void security_secmark_refcount_inc(void);
> >  void security_secmark_refcount_dec(void);
> > 
> > +int security_tun_dev_alloc_security(void **security);
> > +void security_tun_dev_free_security(void *security);
> > 
> >  int security_tun_dev_create(void);
> > 
> > -void security_tun_dev_post_create(struct sock *sk);
> > -int security_tun_dev_attach(struct sock *sk);
> > +int security_tun_dev_create_queue(void *security);
> > +int security_tun_dev_attach(struct sock *sk, void *security);
> > +int security_tun_dev_open(void *security);
> > 
> >  #else	/* CONFIG_SECURITY_NETWORK */
> >  static inline int security_unix_stream_connect(struct sock *sock,
> > 
> > @@ -2720,16 +2738,31 @@ static inline void security_secmark_refcount_dec(void)
> >  {
> >  }
> > 
> > +static inline int security_tun_dev_alloc_security(void **security)
> > +{
> > +	return 0;
> > +}
> > +
> > +static inline void security_tun_dev_free_security(void *security)
> > +{
> > +}
> > +
> > 
> >  static inline int security_tun_dev_create(void)
> >  {
> >  
> >  	return 0;
> >  
> >  }
> > 
> > -static inline void security_tun_dev_post_create(struct sock *sk)
> > +static inline int security_tun_dev_create_queue(void *security)
> > +{
> > +	return 0;
> > +}
> > +
> > +static inline int security_tun_dev_attach(struct sock *sk, void *security)
> > 
> >  {
> > 
> > +	return 0;
> > 
> >  }
> > 
> > -static inline int security_tun_dev_attach(struct sock *sk)
> > +static inline int security_tun_dev_open(void *security)
> > 
> >  {
> >  
> >  	return 0;
> >  
> >  }
> > 
> > diff --git a/security/capability.c b/security/capability.c
> > index b14a30c..bf4cbf2 100644
> > --- a/security/capability.c
> > +++ b/security/capability.c
> > @@ -704,16 +704,31 @@ static void cap_req_classify_flow(const struct request_sock *req,
> >  {
> >  }
> > 
> > +static int cap_tun_dev_alloc_security(void **security)
> > +{
> > +	return 0;
> > +}
> > +
> > +static void cap_tun_dev_free_security(void *security)
> > +{
> > +}
> > +
> > 
> >  static int cap_tun_dev_create(void)
> >  {
> >  
> >  	return 0;
> >  
> >  }
> > 
> > -static void cap_tun_dev_post_create(struct sock *sk)
> > +static int cap_tun_dev_create_queue(void *security)
> > +{
> > +	return 0;
> > +}
> > +
> > +static int cap_tun_dev_attach(struct sock *sk, void *security)
> > 
> >  {
> > 
> > +	return 0;
> > 
> >  }
> > 
> > -static int cap_tun_dev_attach(struct sock *sk)
> > +static int cap_tun_dev_open(void *security)
> > 
> >  {
> >  
> >  	return 0;
> >  
> >  }
> > 
> > @@ -1044,8 +1059,11 @@ void __init security_fixup_ops(struct security_operations *ops)
> >  	set_to_cap_if_null(ops, secmark_refcount_inc);
> >  	set_to_cap_if_null(ops, secmark_refcount_dec);
> >  	set_to_cap_if_null(ops, req_classify_flow);
> > 
> > +	set_to_cap_if_null(ops, tun_dev_alloc_security);
> > +	set_to_cap_if_null(ops, tun_dev_free_security);
> > 
> >  	set_to_cap_if_null(ops, tun_dev_create);
> > 
> > -	set_to_cap_if_null(ops, tun_dev_post_create);
> > +	set_to_cap_if_null(ops, tun_dev_create_queue);
> > +	set_to_cap_if_null(ops, tun_dev_open);
> > 
> >  	set_to_cap_if_null(ops, tun_dev_attach);
> >  
> >  #endif	/* CONFIG_SECURITY_NETWORK */
> >  #ifdef CONFIG_SECURITY_NETWORK_XFRM
> > 
> > diff --git a/security/security.c b/security/security.c
> > index 8dcd4ae..4d82654 100644
> > --- a/security/security.c
> > +++ b/security/security.c
> > @@ -1244,24 +1244,42 @@ void security_secmark_refcount_dec(void)
> > 
> >  }
> >  EXPORT_SYMBOL(security_secmark_refcount_dec);
> > 
> > +int security_tun_dev_alloc_security(void **security)
> > +{
> > +	return security_ops->tun_dev_alloc_security(security);
> > +}
> > +EXPORT_SYMBOL(security_tun_dev_alloc_security);
> > +
> > +void security_tun_dev_free_security(void *security)
> > +{
> > +	security_ops->tun_dev_free_security(security);
> > +}
> > +EXPORT_SYMBOL(security_tun_dev_free_security);
> > +
> > 
> >  int security_tun_dev_create(void)
> >  {
> >  
> >  	return security_ops->tun_dev_create();
> >  
> >  }
> >  EXPORT_SYMBOL(security_tun_dev_create);
> > 
> > -void security_tun_dev_post_create(struct sock *sk)
> > +int security_tun_dev_create_queue(void *security)
> > 
> >  {
> > 
> > -	return security_ops->tun_dev_post_create(sk);
> > +	return security_ops->tun_dev_create_queue(security);
> > 
> >  }
> > 
> > -EXPORT_SYMBOL(security_tun_dev_post_create);
> > +EXPORT_SYMBOL(security_tun_dev_create_queue);
> > 
> > -int security_tun_dev_attach(struct sock *sk)
> > +int security_tun_dev_attach(struct sock *sk, void *security)
> > 
> >  {
> > 
> > -	return security_ops->tun_dev_attach(sk);
> > +	return security_ops->tun_dev_attach(sk, security);
> > 
> >  }
> >  EXPORT_SYMBOL(security_tun_dev_attach);
> > 
> > +int security_tun_dev_open(void *security)
> > +{
> > +	return security_ops->tun_dev_open(security);
> > +}
> > +EXPORT_SYMBOL(security_tun_dev_open);
> > +
> > 
> >  #endif	/* CONFIG_SECURITY_NETWORK */
> >  
> >  #ifdef CONFIG_SECURITY_NETWORK_XFRM
> > 
> > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > index 61a5336..f1efb08 100644
> > --- a/security/selinux/hooks.c
> > +++ b/security/selinux/hooks.c
> > @@ -4399,6 +4399,24 @@ static void selinux_req_classify_flow(const struct request_sock *req,
> >  	fl->flowi_secid = req->secid;
> >  
> >  }
> > 
> > +static int selinux_tun_dev_alloc_security(void **security)
> > +{
> > +	struct tun_security_struct *tunsec;
> > +
> > +	tunsec = kzalloc(sizeof(*tunsec), GFP_KERNEL);
> > +	if (!tunsec)
> > +		return -ENOMEM;
> > +	tunsec->sid = current_sid();
> > +
> > +	*security = tunsec;
> > +	return 0;
> > +}
> > +
> > +static void selinux_tun_dev_free_security(void *security)
> > +{
> > +	kfree(security);
> > +}
> > +
> > 
> >  static int selinux_tun_dev_create(void)
> >  {
> >  
> >  	u32 sid = current_sid();
> > 
> > @@ -4414,8 +4432,17 @@ static int selinux_tun_dev_create(void)
> > 
> >  			    NULL);
> >  
> >  }
> > 
> > -static void selinux_tun_dev_post_create(struct sock *sk)
> > +static int selinux_tun_dev_create_queue(void *security)
> > 
> >  {
> > 
> > +	struct tun_security_struct *tunsec = security;
> > +
> > +	return avc_has_perm(current_sid(), tunsec->sid, SECCLASS_TUN_SOCKET,
> > +			    TUN_SOCKET__CREATE_QUEUE, NULL);
> > +}
> > +
> > +static int selinux_tun_dev_attach(struct sock *sk, void *security)
> > +{
> > +	struct tun_security_struct *tunsec = security;
> > 
> >  	struct sk_security_struct *sksec = sk->sk_security;
> >  	
> >  	/* we don't currently perform any NetLabel based labeling here and it
> > 
> > @@ -4425,20 +4452,19 @@ static void selinux_tun_dev_post_create(struct sock *sk)
> >  	 * cause confusion to the TUN user that had no idea network labeling
> >  	 * protocols were being used */
> > 
> > -	/* see the comments in selinux_tun_dev_create() about why we don't use
> > -	 * the sockcreate SID here */
> > -
> > -	sksec->sid = current_sid();
> > +	sksec->sid = tunsec->sid;
> > 
> >  	sksec->sclass = SECCLASS_TUN_SOCKET;
> > 
> > +
> > +	return 0;
> > 
> >  }
> > 
> > -static int selinux_tun_dev_attach(struct sock *sk)
> > +static int selinux_tun_dev_open(void *security)
> > 
> >  {
> > 
> > -	struct sk_security_struct *sksec = sk->sk_security;
> > +	struct tun_security_struct *tunsec = security;
> > 
> >  	u32 sid = current_sid();
> >  	int err;
> > 
> > -	err = avc_has_perm(sid, sksec->sid, SECCLASS_TUN_SOCKET,
> > +	err = avc_has_perm(sid, tunsec->sid, SECCLASS_TUN_SOCKET,
> > 
> >  			   TUN_SOCKET__RELABELFROM, NULL);
> >  	
> >  	if (err)
> >  	
> >  		return err;
> > 
> > @@ -4446,8 +4472,7 @@ static int selinux_tun_dev_attach(struct sock *sk)
> > 
> >  			   TUN_SOCKET__RELABELTO, NULL);
> >  	
> >  	if (err)
> >  	
> >  		return err;
> > 
> > -
> > -	sksec->sid = sid;
> > +	tunsec->sid = sid;
> > 
> >  	return 0;
> >  
> >  }
> > 
> > @@ -5642,9 +5667,12 @@ static struct security_operations selinux_ops = {
> > 
> >  	.secmark_refcount_inc =		selinux_secmark_refcount_inc,
> >  	.secmark_refcount_dec =		selinux_secmark_refcount_dec,
> >  	.req_classify_flow =		selinux_req_classify_flow,
> > 
> > +	.tun_dev_alloc_security =	selinux_tun_dev_alloc_security,
> > +	.tun_dev_free_security =	selinux_tun_dev_free_security,
> > 
> >  	.tun_dev_create =		selinux_tun_dev_create,
> > 
> > -	.tun_dev_post_create = 		selinux_tun_dev_post_create,
> > +	.tun_dev_create_queue =		selinux_tun_dev_create_queue,
> > 
> >  	.tun_dev_attach =		selinux_tun_dev_attach,
> > 
> > +	.tun_dev_open =			selinux_tun_dev_open,
> > 
> >  #ifdef CONFIG_SECURITY_NETWORK_XFRM
> >  
> >  	.xfrm_policy_alloc_security =	selinux_xfrm_policy_alloc,
> > 
> > diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h
> > index 26c7eee..aa47bca 100644
> > --- a/security/selinux/include/objsec.h
> > +++ b/security/selinux/include/objsec.h
> > @@ -110,6 +110,10 @@ struct sk_security_struct {
> > 
> >  	u16 sclass;			/* sock security class */
> >  
> >  };
> > 
> > +struct tun_security_struct {
> > +	u32 sid;			/* SID for the tun device sockets */
> > +};
> > +
> > 
> >  struct key_security_struct {
> >  
> >  	u32 sid;	/* SID of key */
> >  
> >  };

^ permalink raw reply

* Re: [net-next PATCH V3-evictor] net: frag evictor, avoid killing warm frag queues
From: Jesper Dangaard Brouer @ 2012-12-06 13:55 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Eric Dumazet, David S. Miller, netdev, Thomas Graf,
	Paul E. McKenney, Cong Wang, Herbert Xu
In-Reply-To: <20121206123248.GA24493@breakpoint.cc>

On Thu, 2012-12-06 at 13:32 +0100, Florian Westphal wrote:
> Jesper Dangaard Brouer <jbrouer@redhat.com> wrote:
> > CPUs are fighting for the same LRU head (inet_frag_queue) element,
> > which is bad for scalability.  We could fix this by unlinking the
> > element once a CPU grabs it, but it would require us to change a
> > read_lock to a write_lock, thus we might not gain much performance.
> > 
> > I already (implicitly) fix this in a later patch, where I'm moving the
> > LRU lists to be per CPU.  So, I don't know if it's worth fixing.
> 
> Do you think it's worth trying to remove the lru list altogether and
> just evict from the hash in a round-robin fashion instead?

Perhaps.  But do note my bashing of the LRU list was wrong.  I planned
to explain that in a separate mail, but basically I was causing a DoS
attack with incomplete fragments on myself, because I had disabled
Ethernet flow-control.  Which led me to some false assumptions on the
LRU list behavior (sorry).

The LRU might be the correct solution after all.  If I enable Ethernet
flow-control again, then I have a hard time "activating" the evictor
code (with thresh 4M/3M).  I'll need a separate DoS program, which can
send incomplete fragments (in back-to-back bursts) to provoke the
evictor and LRU.

My cheap DoS reproducer-hack is to disable Ethernet flow-control on only
one interface (out of 3), to cause packet drops and the incomplete
fragments. The current preliminary result is that the two other
interfaces still get packets through; we don't get the zero throughput
situation.
 Two interfaces and no DoS: 15342 Mbit/s
 Three interfaces and DoS:   7355 Mbit/s

The reduction might look big, but you have to take into account, that
"activating" the evictor code, is also causing scalability issues of its
own (which could account for the performance drop itself).

--Jesper

^ permalink raw reply

* Re: [Suggestion] net/atm : for sprintf, need check the total write length whether larger than a page.
From: chas williams - CONTRACTOR @ 2012-12-06 14:08 UTC (permalink / raw)
  To: Chen Gang; +Cc: David Miller, netdev
In-Reply-To: <50BFF19E.1040405@asianux.com>

On Thu, 06 Dec 2012 09:15:10 +0800
Chen Gang <gang.chen@asianux.com> wrote:

> On 2012-12-05 22:55, chas williams - CONTRACTOR wrote:

> > did you mean '\0' instead of '\n'?  scnprintf() considers the trailing
> > '\0' when formatting.
> 
>   no, originally, the end is "\n\0".
> 
>   I prefer that we stay compatible with the trailing "\n" even when the contents are very large.
>   if count is already == (PAGE_SIZE - 1), we have no chance to append "\n" to the end.
> 
> -		pos += sprintf(pos, "\n");
> +		count += scnprintf(buf + count, PAGE_SIZE - count, "\n");

it would make the code a bit messy to do this for not much gain.  again,
it isn't likely you would run into this in a normal situation.
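
For reference, the truncation behaviour under discussion can be sketched in
userspace C.  This is only an illustration under stated assumptions:
demo_scnprintf() and demo_fill() are hypothetical stand-ins built on
vsnprintf(), not the kernel's scnprintf() itself, and a small buffer plays
the role of the page.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for scnprintf(): returns the number of
 * characters actually stored in buf, not counting the trailing '\0'. */
static int demo_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	if (size == 0)
		return 0;

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (n < 0)
		return 0;
	/* vsnprintf() reports the untruncated length, so clamp it. */
	return (size_t)n < size ? n : (int)(size - 1);
}

/* Append str and then "\n" to a buffer of buf_size bytes, accumulating
 * count the same way the quoted hunk does.  Returns the final count. */
static size_t demo_fill(char *buf, size_t buf_size, const char *str)
{
	size_t count = 0;

	count += demo_scnprintf(buf + count, buf_size - count, "%s", str);
	count += demo_scnprintf(buf + count, buf_size - count, "\n");
	return count;
}
```

With a 16-byte buffer, demo_fill(buf, 16, "abc") stores "abc\n" and returns
4.  With an 8-byte buffer and a 10-character string, count already reaches
size - 1 after the first call, so the second call stores nothing and the
trailing "\n" is lost, which is the corner case raised above.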

^ permalink raw reply

* Re: [PATCH] sctp: Fix compiler warning when CONFIG_DEBUG_SECTION_MISMATCH=y
From: Neil Horman @ 2012-12-06 14:11 UTC (permalink / raw)
  To: Christoph Paasch; +Cc: vyasevich, linux-sctp, davem, sri, netdev
In-Reply-To: <1354798992-25622-1-git-send-email-christoph.paasch@uclouvain.be>

On Thu, Dec 06, 2012 at 02:03:12PM +0100, Christoph Paasch wrote:
> WARNING: net/sctp/sctp.o(.text+0x72f1): Section mismatch in reference
> from the function sctp_net_init() to the function
> .init.text:sctp_proc_init()
> The function sctp_net_init() references
> the function __init sctp_proc_init().
> This is often because sctp_net_init lacks a __init
> annotation or the annotation of sctp_proc_init is wrong.
> 
> And put __net_init after 'int' for sctp_proc_init - as it is done
> everywhere else in the sctp-stack.
> 
> Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
> ---
>  net/sctp/protocol.c |    6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
> index 456bc3d..2c7785b 100644
> --- a/net/sctp/protocol.c
> +++ b/net/sctp/protocol.c
> @@ -86,7 +86,7 @@ int sysctl_sctp_rmem[3];
>  int sysctl_sctp_wmem[3];
>  
>  /* Set up the proc fs entry for the SCTP protocol. */
> -static __net_init int sctp_proc_init(struct net *net)
> +static int __net_init sctp_proc_init(struct net *net)
>  {
>  #ifdef CONFIG_PROC_FS
>  	net->sctp.proc_net_sctp = proc_net_mkdir(net, "sctp", net->proc_net);
> @@ -1165,7 +1165,7 @@ static void sctp_v4_del_protocol(void)
>  	unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
>  }
>  
> -static int sctp_net_init(struct net *net)
> +static int __net_init sctp_net_init(struct net *net)
>  {
>  	int status;
>  
> @@ -1290,7 +1290,7 @@ err_sysctl_register:
>  	return status;
>  }
>  
> -static void sctp_net_exit(struct net *net)
> +static void __net_exit sctp_net_exit(struct net *net)
>  {
>  	/* Free the local address list */
>  	sctp_free_addr_wq(net);
> -- 
> 1.7.10.4
> 
> 
Acked-by: Neil Horman <nhorman@tuxdriver.com>

^ permalink raw reply

* Re: [RFC PATCH v2 3/3] tun: fix LSM/SELinux labeling of tun/tap devices
From: Michael S. Tsirkin @ 2012-12-06 14:12 UTC (permalink / raw)
  To: Jason Wang; +Cc: Paul Moore, netdev, linux-security-module, selinux
In-Reply-To: <1505323.yRRvBmo94H@jason-thinkpad-t430s>

On Thu, Dec 06, 2012 at 09:51:13PM +0800, Jason Wang wrote:
> On Thursday, December 06, 2012 12:33:25 PM Michael S. Tsirkin wrote:
> > On Wed, Dec 05, 2012 at 03:26:19PM -0500, Paul Moore wrote:
> > > This patch corrects some problems with LSM/SELinux that were introduced
> > > with the multiqueue patchset.  The problem stems from the fact that the
> > > multiqueue work changed the relationship between the tun device and its
> > > associated socket; before the socket persisted for the life of the
> > > device, however after the multiqueue changes the socket only persisted
> > > for the life of the userspace connection (fd open).  For non-persistent
> > > devices this is not an issue, but for persistent devices this can cause
> > > the tun device to lose its SELinux label.
> > > 
> > > We correct this problem by adding an opaque LSM security blob to the
> > > tun device struct which allows us to have the LSM security state, e.g.
> > > SELinux labeling information, persist for the lifetime of the tun
> > > device.  In the process we tweak the LSM hooks to work with this new
> > > approach to TUN device/socket labeling and introduce a new LSM hook,
> > > security_tun_dev_create_queue(), to approve requests to create a new
> > > TUN queue via TUNSETQUEUE.
> > > 
> > > The SELinux code has been adjusted to match the new LSM hooks, the
> > > other LSMs do not make use of the LSM TUN controls.  This patch makes
> > > use of the recently added "tun_socket:create_queue" permission to
> > > restrict access to the TUNSETQUEUE operation.  On older SELinux
> > > policies which do not define the "tun_socket:create_queue" permission
> > > the access control decision for TUNSETQUEUE will be handled according
> > > to the SELinux policy's unknown permission setting.
> > > 
> > > Signed-off-by: Paul Moore <pmoore@redhat.com>
> > 
> > OK so just to verify: this can be used to ensure that qemu
> > process that has the queue fd can only attach it to
> > a specific device, right?
> 
> I think it can't.
> And I'm not sure whether we need selinux help to do this. 

Well, without selinux I do not see a problem.
If you can do SETQUEUE you can do SETIFF too and then
you can attach to tap.

> Looks like we can do this without selinux through:
> 
> 1. Don't assign a NULL pointer to tfile->tun during file detaching

So you detach from tun but keep a pointer to it? Not good.

> 2. Compare the ifr_name and the name of tfile->tun, if not equal, return -EINVAL
> 3. Set a special flag in tun_detach_all() to notify the fd is not usable, and 
> can't be used for future attaching.
> 
> After this, only the device that the fd was first attached through (TUNSETIFF or
> TUNSETQUEUE) is allowed to be attached again.

This looks like a hard-coded security policy.
The problem is not detach; the problem is attach,
and we should solve it there.

> > 
> > > ---
> > > 
> > >  drivers/net/tun.c                 |   26 +++++++++++++---
> > > >  include/linux/security.h          |   59 +++++++++++++++++++++++++++++--------
> > > >  security/capability.c             |   24 +++++++++++++--
> > >  security/security.c               |   28 ++++++++++++++----
> > >  security/selinux/hooks.c          |   50 ++++++++++++++++++++++++-------
> > >  security/selinux/include/objsec.h |    4 +++
> > >  6 files changed, 153 insertions(+), 38 deletions(-)
> > > 
> > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > > index 14a0454..fb8148b 100644
> > > --- a/drivers/net/tun.c
> > > +++ b/drivers/net/tun.c
> > > @@ -182,6 +182,7 @@ struct tun_struct {
> > > 
> > >  	struct hlist_head flows[TUN_NUM_FLOW_ENTRIES];
> > >  	struct timer_list flow_gc_timer;
> > >  	unsigned long ageing_time;
> > > 
> > > +	void *security;
> > > 
> > >  };
> > >  
> > >  static inline u32 tun_hashfn(u32 rxhash)
> > > 
> > > @@ -465,6 +466,10 @@ static int tun_attach(struct tun_struct *tun, struct
> > > file *file)> 
> > >  	struct tun_file *tfile = file->private_data;
> > >  	int err;
> > > 
> > > +	err = security_tun_dev_attach(tfile->socket.sk, tun->security);
> > > +	if (err < 0)
> > > +		goto out;
> > > +
> > > 
> > >  	err = -EINVAL;
> > >  	if (rcu_dereference_protected(tfile->tun, lockdep_rtnl_is_held()))
> > >  	
> > >  		goto out;
> > > 
> > > @@ -1348,6 +1353,7 @@ static void tun_free_netdev(struct net_device *dev)
> > > 
> > >  	struct tun_struct *tun = netdev_priv(dev);
> > >  	
> > >  	tun_flow_uninit(tun);
> > > 
> > > +	security_tun_dev_free_security(tun->security);
> > > 
> > >  	free_netdev(dev);
> > >  
> > >  }
> > > 
> > > @@ -1534,7 +1540,7 @@ static int tun_set_iff(struct net *net, struct file
> > > *file, struct ifreq *ifr)> 
> > >  		if (tun_not_capable(tun))
> > >  		
> > >  			return -EPERM;
> > > 
> > > -		err = security_tun_dev_attach(tfile->socket.sk);
> > > +		err = security_tun_dev_open(tun->security);
> > > 
> > >  		if (err < 0)
> > >  		
> > >  			return err;
> > > 
> > > @@ -1587,7 +1593,9 @@ static int tun_set_iff(struct net *net, struct file
> > > *file, struct ifreq *ifr)> 
> > >  		spin_lock_init(&tun->lock);
> > > 
> > > -		security_tun_dev_post_create(&tfile->sk);
> > > +		err = security_tun_dev_alloc_security(&tun->security);
> > > +		if (err < 0)
> > > +			goto err_free_dev;
> > > 
> > >  		tun_net_init(dev);
> > > 
> > > @@ -1767,12 +1775,18 @@ static int tun_set_queue(struct file *file, struct
> > > ifreq *ifr)> 
> > >  		tun = netdev_priv(dev);
> > >  		if (dev->netdev_ops != &tap_netdev_ops &&
> > > 
> > > -			dev->netdev_ops != &tun_netdev_ops)
> > > +			dev->netdev_ops != &tun_netdev_ops) {
> > > 
> > >  			ret = -EINVAL;
> > > 
> > > -		else if (tun_not_capable(tun))
> > > +			goto unlock;
> > > +		}
> > > +		if (tun_not_capable(tun)) {
> > > 
> > >  			ret = -EPERM;
> > > 
> > > -		else
> > > -			ret = tun_attach(tun, file);
> > > +			goto unlock;
> > > +		}
> > > +		ret = security_tun_dev_create_queue(tun->security);
> > > +		if (ret < 0)
> > > +			goto unlock;
> > > +		ret = tun_attach(tun, file);
> > > 
> > >  	} else if (ifr->ifr_flags & IFF_DETACH_QUEUE)
> > >  	
> > >  		__tun_detach(tfile, false);
> > >  	
> > >  	else
> > > 
> > > diff --git a/include/linux/security.h b/include/linux/security.h
> > > index 05e88bd..8ea923b 100644
> > > --- a/include/linux/security.h
> > > +++ b/include/linux/security.h
> > > @@ -983,17 +983,29 @@ static inline void security_free_mnt_opts(struct
> > > security_mnt_opts *opts)> 
> > >   *	tells the LSM to decrement the number of secmark labeling rules loaded
> > >   * @req_classify_flow:
> > >   *	Sets the flow's sid to the openreq sid.
> > > 
> > > + * @tun_dev_alloc_security:
> > > + *	This hook allows a module to allocate a security structure for a TUN
> > > + *	device.
> > > + *	@security pointer to a security structure pointer.
> > > + *	Returns a zero on success, negative values on failure.
> > > + * @tun_dev_free_security:
> > > + *	This hook allows a module to free the security structure for a TUN
> > > + *	device.
> > > + *	@security pointer to the TUN device's security structure
> > > 
> > >   * @tun_dev_create:
> > >   *	Check permissions prior to creating a new TUN device.
> > > 
> > > - * @tun_dev_post_create:
> > > - *	This hook allows a module to update or allocate a per-socket security
> > > - *	structure.
> > > - *	@sk contains the newly created sock structure.
> > > + * @tun_dev_create_queue:
> > > + *	Check permissions prior to creating a new TUN device queue.
> > > + *	@security pointer to the TUN device's security structure.
> > > 
> > >   * @tun_dev_attach:
> > > - *	Check permissions prior to attaching to a persistent TUN device.  This
> > > - *	hook can also be used by the module to update any security state
> > > + *	This hook can be used by the module to update any security state
> > > 
> > >   *	associated with the TUN device's sock structure.
> > >   *	@sk contains the existing sock structure.
> > > 
> > > + *	@security pointer to the TUN device's security structure.
> > > + * @tun_dev_open:
> > > + *	This hook can be used by the module to update any security state
> > > + *	associated with the TUN device's security structure.
> > > + *	@security pointer to the TUN devices's security structure.
> > > 
> > >   *
> > >   * Security hooks for XFRM operations.
> > >   *
> > > 
> > > @@ -1613,9 +1625,12 @@ struct security_operations {
> > > 
> > >  	void (*secmark_refcount_inc) (void);
> > >  	void (*secmark_refcount_dec) (void);
> > >  	void (*req_classify_flow) (const struct request_sock *req, struct flowi
> > >  	*fl);> 
> > > -	int (*tun_dev_create)(void);
> > > -	void (*tun_dev_post_create)(struct sock *sk);
> > > -	int (*tun_dev_attach)(struct sock *sk);
> > > +	int (*tun_dev_alloc_security) (void **security);
> > > +	void (*tun_dev_free_security) (void *security);
> > > +	int (*tun_dev_create) (void);
> > > +	int (*tun_dev_create_queue) (void *security);
> > > +	int (*tun_dev_attach) (struct sock *sk, void *security);
> > > +	int (*tun_dev_open) (void *security);
> > > 
> > >  #endif	/* CONFIG_SECURITY_NETWORK */
> > >  
> > >  #ifdef CONFIG_SECURITY_NETWORK_XFRM
> > > 
> > > @@ -2553,9 +2568,12 @@ void security_inet_conn_established(struct sock
> > > *sk,
> > > 
> > >  int security_secmark_relabel_packet(u32 secid);
> > >  void security_secmark_refcount_inc(void);
> > >  void security_secmark_refcount_dec(void);
> > > 
> > > +int security_tun_dev_alloc_security(void **security);
> > > +void security_tun_dev_free_security(void *security);
> > > 
> > >  int security_tun_dev_create(void);
> > > 
> > > -void security_tun_dev_post_create(struct sock *sk);
> > > -int security_tun_dev_attach(struct sock *sk);
> > > +int security_tun_dev_create_queue(void *security);
> > > +int security_tun_dev_attach(struct sock *sk, void *security);
> > > +int security_tun_dev_open(void *security);
> > > 
> > >  #else	/* CONFIG_SECURITY_NETWORK */
> > >  static inline int security_unix_stream_connect(struct sock *sock,
> > > 
> > > @@ -2720,16 +2738,31 @@ static inline void
> > > security_secmark_refcount_dec(void)> 
> > >  {
> > >  }
> > > 
> > > +static inline int security_tun_dev_alloc_security(void **security)
> > > +{
> > > +	return 0;
> > > +}
> > > +
> > > +static inline void security_tun_dev_free_security(void *security)
> > > +{
> > > +}
> > > +
> > > 
> > >  static inline int security_tun_dev_create(void)
> > >  {
> > >  
> > >  	return 0;
> > >  
> > >  }
> > > 
> > > -static inline void security_tun_dev_post_create(struct sock *sk)
> > > +static inline int security_tun_dev_create_queue(void *security)
> > > +{
> > > +	return 0;
> > > +}
> > > +
> > > > +static inline int security_tun_dev_attach(struct sock *sk, void *security)
> > > 
> > >  {
> > > 
> > > +	return 0;
> > > 
> > >  }
> > > 
> > > -static inline int security_tun_dev_attach(struct sock *sk)
> > > +static inline int security_tun_dev_open(void *security)
> > > 
> > >  {
> > >  
> > >  	return 0;
> > >  
> > >  }
> > > 
> > > diff --git a/security/capability.c b/security/capability.c
> > > index b14a30c..bf4cbf2 100644
> > > --- a/security/capability.c
> > > +++ b/security/capability.c
> > > @@ -704,16 +704,31 @@ static void cap_req_classify_flow(const struct
> > > request_sock *req,> 
> > >  {
> > >  }
> > > 
> > > +static int cap_tun_dev_alloc_security(void **security)
> > > +{
> > > +	return 0;
> > > +}
> > > +
> > > +static void cap_tun_dev_free_security(void *security)
> > > +{
> > > +}
> > > +
> > > 
> > >  static int cap_tun_dev_create(void)
> > >  {
> > >  
> > >  	return 0;
> > >  
> > >  }
> > > 
> > > -static void cap_tun_dev_post_create(struct sock *sk)
> > > +static int cap_tun_dev_create_queue(void *security)
> > > +{
> > > +	return 0;
> > > +}
> > > +
> > > +static int cap_tun_dev_attach(struct sock *sk, void *security)
> > > 
> > >  {
> > > 
> > > +	return 0;
> > > 
> > >  }
> > > 
> > > -static int cap_tun_dev_attach(struct sock *sk)
> > > +static int cap_tun_dev_open(void *security)
> > > 
> > >  {
> > >  
> > >  	return 0;
> > >  
> > >  }
> > > 
> > > @@ -1044,8 +1059,11 @@ void __init security_fixup_ops(struct
> > > security_operations *ops)> 
> > >  	set_to_cap_if_null(ops, secmark_refcount_inc);
> > >  	set_to_cap_if_null(ops, secmark_refcount_dec);
> > >  	set_to_cap_if_null(ops, req_classify_flow);
> > > 
> > > +	set_to_cap_if_null(ops, tun_dev_alloc_security);
> > > +	set_to_cap_if_null(ops, tun_dev_free_security);
> > > 
> > >  	set_to_cap_if_null(ops, tun_dev_create);
> > > 
> > > -	set_to_cap_if_null(ops, tun_dev_post_create);
> > > +	set_to_cap_if_null(ops, tun_dev_create_queue);
> > > +	set_to_cap_if_null(ops, tun_dev_open);
> > > 
> > >  	set_to_cap_if_null(ops, tun_dev_attach);
> > >  
> > >  #endif	/* CONFIG_SECURITY_NETWORK */
> > >  #ifdef CONFIG_SECURITY_NETWORK_XFRM
> > > 
> > > diff --git a/security/security.c b/security/security.c
> > > index 8dcd4ae..4d82654 100644
> > > --- a/security/security.c
> > > +++ b/security/security.c
> > > @@ -1244,24 +1244,42 @@ void security_secmark_refcount_dec(void)
> > > 
> > >  }
> > >  EXPORT_SYMBOL(security_secmark_refcount_dec);
> > > 
> > > +int security_tun_dev_alloc_security(void **security)
> > > +{
> > > +	return security_ops->tun_dev_alloc_security(security);
> > > +}
> > > +EXPORT_SYMBOL(security_tun_dev_alloc_security);
> > > +
> > > +void security_tun_dev_free_security(void *security)
> > > +{
> > > +	security_ops->tun_dev_free_security(security);
> > > +}
> > > +EXPORT_SYMBOL(security_tun_dev_free_security);
> > > +
> > > 
> > >  int security_tun_dev_create(void)
> > >  {
> > >  
> > >  	return security_ops->tun_dev_create();
> > >  
> > >  }
> > >  EXPORT_SYMBOL(security_tun_dev_create);
> > > 
> > > -void security_tun_dev_post_create(struct sock *sk)
> > > +int security_tun_dev_create_queue(void *security)
> > > 
> > >  {
> > > 
> > > -	return security_ops->tun_dev_post_create(sk);
> > > +	return security_ops->tun_dev_create_queue(security);
> > > 
> > >  }
> > > 
> > > -EXPORT_SYMBOL(security_tun_dev_post_create);
> > > +EXPORT_SYMBOL(security_tun_dev_create_queue);
> > > 
> > > -int security_tun_dev_attach(struct sock *sk)
> > > +int security_tun_dev_attach(struct sock *sk, void *security)
> > > 
> > >  {
> > > 
> > > -	return security_ops->tun_dev_attach(sk);
> > > +	return security_ops->tun_dev_attach(sk, security);
> > > 
> > >  }
> > >  EXPORT_SYMBOL(security_tun_dev_attach);
> > > 
> > > +int security_tun_dev_open(void *security)
> > > +{
> > > +	return security_ops->tun_dev_open(security);
> > > +}
> > > +EXPORT_SYMBOL(security_tun_dev_open);
> > > +
> > > 
> > >  #endif	/* CONFIG_SECURITY_NETWORK */
> > >  
> > >  #ifdef CONFIG_SECURITY_NETWORK_XFRM
> > > 
> > > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > > index 61a5336..f1efb08 100644
> > > --- a/security/selinux/hooks.c
> > > +++ b/security/selinux/hooks.c
> > > @@ -4399,6 +4399,24 @@ static void selinux_req_classify_flow(const struct
> > > request_sock *req,> 
> > >  	fl->flowi_secid = req->secid;
> > >  
> > >  }
> > > 
> > > +static int selinux_tun_dev_alloc_security(void **security)
> > > +{
> > > +	struct tun_security_struct *tunsec;
> > > +
> > > +	tunsec = kzalloc(sizeof(*tunsec), GFP_KERNEL);
> > > +	if (!tunsec)
> > > +		return -ENOMEM;
> > > +	tunsec->sid = current_sid();
> > > +
> > > +	*security = tunsec;
> > > +	return 0;
> > > +}
> > > +
> > > +static void selinux_tun_dev_free_security(void *security)
> > > +{
> > > +	kfree(security);
> > > +}
> > > +
> > > 
> > >  static int selinux_tun_dev_create(void)
> > >  {
> > >  
> > >  	u32 sid = current_sid();
> > > 
> > > @@ -4414,8 +4432,17 @@ static int selinux_tun_dev_create(void)
> > > 
> > >  			    NULL);
> > >  
> > >  }
> > > 
> > > -static void selinux_tun_dev_post_create(struct sock *sk)
> > > +static int selinux_tun_dev_create_queue(void *security)
> > > 
> > >  {
> > > 
> > > +	struct tun_security_struct *tunsec = security;
> > > +
> > > +	return avc_has_perm(current_sid(), tunsec->sid, SECCLASS_TUN_SOCKET,
> > > +			    TUN_SOCKET__CREATE_QUEUE, NULL);
> > > +}
> > > +
> > > +static int selinux_tun_dev_attach(struct sock *sk, void *security)
> > > +{
> > > +	struct tun_security_struct *tunsec = security;
> > > 
> > >  	struct sk_security_struct *sksec = sk->sk_security;
> > >  	
> > >  	/* we don't currently perform any NetLabel based labeling here and it
> > > 
> > > @@ -4425,20 +4452,19 @@ static void selinux_tun_dev_post_create(struct
> > > sock *sk)> 
> > >  	 * cause confusion to the TUN user that had no idea network labeling
> > >  	 * protocols were being used */
> > > 
> > > -	/* see the comments in selinux_tun_dev_create() about why we don't use
> > > -	 * the sockcreate SID here */
> > > -
> > > -	sksec->sid = current_sid();
> > > +	sksec->sid = tunsec->sid;
> > > 
> > >  	sksec->sclass = SECCLASS_TUN_SOCKET;
> > > 
> > > +
> > > +	return 0;
> > > 
> > >  }
> > > 
> > > -static int selinux_tun_dev_attach(struct sock *sk)
> > > +static int selinux_tun_dev_open(void *security)
> > > 
> > >  {
> > > 
> > > -	struct sk_security_struct *sksec = sk->sk_security;
> > > +	struct tun_security_struct *tunsec = security;
> > > 
> > >  	u32 sid = current_sid();
> > >  	int err;
> > > 
> > > -	err = avc_has_perm(sid, sksec->sid, SECCLASS_TUN_SOCKET,
> > > +	err = avc_has_perm(sid, tunsec->sid, SECCLASS_TUN_SOCKET,
> > > 
> > >  			   TUN_SOCKET__RELABELFROM, NULL);
> > >  	
> > >  	if (err)
> > >  	
> > >  		return err;
> > > 
> > > @@ -4446,8 +4472,7 @@ static int selinux_tun_dev_attach(struct sock *sk)
> > > 
> > >  			   TUN_SOCKET__RELABELTO, NULL);
> > >  	
> > >  	if (err)
> > >  	
> > >  		return err;
> > > 
> > > -
> > > -	sksec->sid = sid;
> > > +	tunsec->sid = sid;
> > > 
> > >  	return 0;
> > >  
> > >  }
> > > 
> > > @@ -5642,9 +5667,12 @@ static struct security_operations selinux_ops = {
> > > 
> > >  	.secmark_refcount_inc =		selinux_secmark_refcount_inc,
> > >  	.secmark_refcount_dec =		selinux_secmark_refcount_dec,
> > >  	.req_classify_flow =		selinux_req_classify_flow,
> > > 
> > > +	.tun_dev_alloc_security =	selinux_tun_dev_alloc_security,
> > > +	.tun_dev_free_security =	selinux_tun_dev_free_security,
> > > 
> > >  	.tun_dev_create =		selinux_tun_dev_create,
> > > 
> > > -	.tun_dev_post_create = 		selinux_tun_dev_post_create,
> > > +	.tun_dev_create_queue =		selinux_tun_dev_create_queue,
> > > 
> > >  	.tun_dev_attach =		selinux_tun_dev_attach,
> > > 
> > > +	.tun_dev_open =			selinux_tun_dev_open,
> > > 
> > >  #ifdef CONFIG_SECURITY_NETWORK_XFRM
> > >  
> > >  	.xfrm_policy_alloc_security =	selinux_xfrm_policy_alloc,
> > > 
> > > > diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h
> > > > index 26c7eee..aa47bca 100644
> > > --- a/security/selinux/include/objsec.h
> > > +++ b/security/selinux/include/objsec.h
> > > @@ -110,6 +110,10 @@ struct sk_security_struct {
> > > 
> > >  	u16 sclass;			/* sock security class */
> > >  
> > >  };
> > > 
> > > +struct tun_security_struct {
> > > +	u32 sid;			/* SID for the tun device sockets */
> > > +};
> > > +
> > > 
> > >  struct key_security_struct {
> > >  
> > >  	u32 sid;	/* SID of key */
> > >  
> > >  };

^ permalink raw reply

* Re: [PATCH] tulip: Fix compiler warning when CONFIG_DEBUG_SECTION_MISMATCH=y
From: Ben Hutchings @ 2012-12-06 14:20 UTC (permalink / raw)
  To: Christoph Paasch; +Cc: Grant Grundler, David Miller, netdev
In-Reply-To: <1354799067-25680-1-git-send-email-christoph.paasch@uclouvain.be>

On Thu, 2012-12-06 at 14:04 +0100, Christoph Paasch wrote:
> WARNING: drivers/net/ethernet/dec/tulip/tulip.o(.text+0x4057): Section
> mismatch in reference from the function tulip_init_one() to the variable
> .devinit.rodata:early_486_chipsets
> The function tulip_init_one() references
> the variable __devinitconst early_486_chipsets.
> This is often because tulip_init_one lacks a __devinitconst
> annotation or the annotation of early_486_chipsets is wrong.
> 
> Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
[...]

The section mismatch will be resolved shortly as hotplug is being made
non-optional and all the __devinit and similar section qualifiers will
go away.  There's no need to make local fixes like this now.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH] sctp: Fix compiler warning when CONFIG_DEBUG_SECTION_MISMATCH=y
From: Vlad Yasevich @ 2012-12-06 14:40 UTC (permalink / raw)
  To: Christoph Paasch; +Cc: linux-sctp, davem, sri, nhorman, netdev
In-Reply-To: <1354798992-25622-1-git-send-email-christoph.paasch@uclouvain.be>

On 12/06/2012 08:03 AM, Christoph Paasch wrote:
> WARNING: net/sctp/sctp.o(.text+0x72f1): Section mismatch in reference
> from the function sctp_net_init() to the function
> .init.text:sctp_proc_init()
> The function sctp_net_init() references
> the function __init sctp_proc_init().
> This is often because sctp_net_init lacks a __init
> annotation or the annotation of sctp_proc_init is wrong.
>
> And put __net_init after 'int' for sctp_proc_init - as it is done
> everywhere else in the sctp-stack.
>
> Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>

Acked-by: Vlad Yasevich <vyasevich@gmail.com>

-vlad

> ---
>   net/sctp/protocol.c |    6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
> index 456bc3d..2c7785b 100644
> --- a/net/sctp/protocol.c
> +++ b/net/sctp/protocol.c
> @@ -86,7 +86,7 @@ int sysctl_sctp_rmem[3];
>   int sysctl_sctp_wmem[3];
>
>   /* Set up the proc fs entry for the SCTP protocol. */
> -static __net_init int sctp_proc_init(struct net *net)
> +static int __net_init sctp_proc_init(struct net *net)
>   {
>   #ifdef CONFIG_PROC_FS
>   	net->sctp.proc_net_sctp = proc_net_mkdir(net, "sctp", net->proc_net);
> @@ -1165,7 +1165,7 @@ static void sctp_v4_del_protocol(void)
>   	unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
>   }
>
> -static int sctp_net_init(struct net *net)
> +static int __net_init sctp_net_init(struct net *net)
>   {
>   	int status;
>
> @@ -1290,7 +1290,7 @@ err_sysctl_register:
>   	return status;
>   }
>
> -static void sctp_net_exit(struct net *net)
> +static void __net_exit sctp_net_exit(struct net *net)
>   {
>   	/* Free the local address list */
>   	sctp_free_addr_wq(net);
>

^ permalink raw reply

* Re: [net-next PATCH V3-evictor] net: frag evictor, avoid killing warm frag queues
From: Eric Dumazet @ 2012-12-06 14:47 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Florian Westphal, David S. Miller, netdev, Thomas Graf,
	Paul E. McKenney, Cong Wang, Herbert Xu
In-Reply-To: <1354802100.20888.242.camel@localhost>

On Thu, 2012-12-06 at 14:55 +0100, Jesper Dangaard Brouer wrote:

> Perhaps.  But do note that my bashing of the LRU list was wrong.  I planned
> to explain that in a separate mail, but basically I was causing a DoS
> attack with incomplete fragments on myself, because I had disabled
> Ethernet flow-control.  That led me to some false assumptions about the
> LRU list behavior (sorry).
> 
> The LRU might be the correct solution after all.  If I enable Ethernet
> flow-control again, then I have a hard time "activating" the evictor
> code (with thresh 4M/3M).  I'll need a separate DoS program, which can
> send incomplete fragments (in back-to-back bursts) to provoke the
> evictor and LRU.
> 
> My cheap DoS reproducer-hack is to disable Ethernet flow-control on only
> one interface (out of 3), to cause packet drops and the incomplete
> fragments. The current preliminary result is that the two other
> interfaces still get packets through; we don't get the zero-throughput
> situation.
>  Two interfaces and no DoS: 15342 Mbit/s
>  Three interfaces and DoS:   7355 Mbit/s
> 
> The reduction might look big, but you have to take into account that
> "activating" the evictor code also causes scalability issues of its
> own (which could account for the performance drop by itself).

I would try removing the LRU, but keeping the age information (jiffie of
last valid frag received on one inet_frag_queue)

The eviction would be a function of the current memory used for the
frags (percpu_counter for good SMP scalability), divided by the max
allowed size, and ipfrag_time.

Under load, we would evict inet_frag_queue before the ipfrag_time timer,
without necessarily having to scan whole frags, only the ones we find in
the bucket we need to parse anyway (and lock)

The whole idea of a full garbage collect under softirq is not scalable,
as it locks a CPU in a non preemptible section for too long.
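
One possible reading of the eviction rule proposed above, sketched in
userspace C.  The linear scaling and the demo_* names are assumptions made
for illustration; the mail only says the decision would depend on memory
used, the maximum allowed size, and ipfrag_time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Allowed idle time shrinks linearly from ipfrag_time (no memory in use)
 * down to 0 (limit reached).  All times are in the same tick unit. */
static uint64_t demo_allowed_age(uint64_t mem_used, uint64_t mem_max,
				 uint64_t ipfrag_time)
{
	if (mem_max == 0 || mem_used >= mem_max)
		return 0;
	return ipfrag_time * (mem_max - mem_used) / mem_max;
}

/* Decide for one queue met while walking a hash bucket: evict it when
 * the time since its last valid fragment exceeds the allowed age. */
static bool demo_should_evict(uint64_t now, uint64_t last_frag,
			      uint64_t mem_used, uint64_t mem_max,
			      uint64_t ipfrag_time)
{
	uint64_t age = now - last_frag;

	return age >= demo_allowed_age(mem_used, mem_max, ipfrag_time);
}
```

With no memory in use, a queue is only dropped once it reaches ipfrag_time,
matching the existing timer.  As usage approaches the limit, ever younger
queues become eligible, and the check needs only the per-queue timestamp
plus a global counter, with no LRU list to maintain.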

^ permalink raw reply

* Re: at91sam9260 MACB problem with IP fragmentation
From: Erwin Rol @ 2012-12-06 15:15 UTC (permalink / raw)
  To: Nicolas Ferre
  Cc: linux-kernel, Havard Skinnemoen, linux-arm-kernel, matteo.fortini,
	netdev
In-Reply-To: <50C09D2E.8050608@atmel.com>

Hey Nicolas,

On 6-12-2012 14:27, Nicolas Ferre wrote:
> Erwin,
> 
> On 12/06/2012 12:32 PM, Erwin Rol :
>> Hello Nicolas, Havard, all,
>>
>> I have a very obscure problem with a at91sam9260 board (almost 1 to 1
>> copy of the Atmel EK).
>>
>>  <snip>
>>
> [..]
> 
> It seems that Matteo has the same behavior: check here:
> http://www.spinics.net/lists/netdev/msg218951.html

The difference seems to be that in Matteo's case the receiving stalls.
In my case it is the sending that stalls. I see the UDP datagram in
userspace and the sendto call also returns without error, but the data
does not end up on the network (until the next packet is sent).

> I am working on the macb driver right now, so I will try to reproduce
> and track this issue on my side.

That would be really great, and thank you for the quick reply. If you
have anything that I should try or test on my hardware just let me know.

BTW: A quick check on an at91sam9263 board did not show the problem. I
will try to verify whether it really does work on a 9263, because maybe
it is just rarer on a 9263.

- Erwin

^ permalink raw reply

* [PULL net-next] vhost: changes for 3.8
From: Michael S. Tsirkin @ 2012-12-06 15:18 UTC (permalink / raw)
  To: David Miller
  Cc: kvm, virtualization, netdev, linux-kernel, dinggnu, mst,
	yongjun_wei

The following changes since commit b93196dc5af7729ff7cc50d3d322ab1a364aa14f:

  net: fix some compiler warning in net/core/neighbour.c (2012-12-05 21:50:37 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net-next

for you to fetch changes up to 405d55c99da7a3045275fdb1a30614293a53c6e7:

  tcm_vhost: remove unused variable in vhost_scsi_allocate_cmd() (2012-12-06 17:09:19 +0200)

----------------------------------------------------------------
Cong Ding (1):
      tools:virtio: fix compilation warning

Michael S. Tsirkin (4):
      vhost: avoid backend flush on vring ops
      vhost-net: flush outstanding DMAs on memory change
      vhost-net: skip head management if no outstanding
      vhost-net: enable zerocopy tx by default

Wei Yongjun (1):
      tcm_vhost: remove unused variable in vhost_scsi_allocate_cmd()

 drivers/vhost/net.c        | 51 ++++++++++++++++++++++++++++++----------------
 drivers/vhost/tcm_vhost.c  |  7 ++++---
 drivers/vhost/vhost.c      |  7 +++----
 drivers/vhost/vhost.h      |  3 ++-
 tools/virtio/virtio_test.c |  2 +-
 5 files changed, 44 insertions(+), 26 deletions(-)

^ permalink raw reply

* Re: [net-next PATCH V3-evictor] net: frag evictor, avoid killing warm frag queues
From: Jesper Dangaard Brouer @ 2012-12-06 15:23 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Florian Westphal, David S. Miller, netdev, Thomas Graf,
	Paul E. McKenney, Cong Wang, Herbert Xu
In-Reply-To: <1354805276.31222.803.camel@edumazet-glaptop>

On Thu, 2012-12-06 at 06:47 -0800, Eric Dumazet wrote:
> On Thu, 2012-12-06 at 14:55 +0100, Jesper Dangaard Brouer wrote:
> 
> > The LRU might be the correct solution after all.  If I enable Ethernet
> > flow-control again, then I have a hard time "activating" the evictor
> > code (with thresh 4M/3M).  I'll need a separate DoS program, which can
> > send incomplete fragments (in back-to-back bursts) to provoke the
> > evictor and LRU.
> > 
> > My cheap DoS reproducer-hack is to disable Ethernet flow-control on only
> > one interface (out of 3), to cause packet drops and the incomplete
> > fragments. The current preliminary result is that the two other
> > interfaces still get packets through; we don't get the zero-throughput
> > situation.
> >  Two interfaces and no DoS: 15342 Mbit/s
> >  Three interfaces and DoS:   7355 Mbit/s
> > 
> > The reduction might look big, but you have to take into account that
> > "activating" the evictor code also causes scalability issues of its
> > own (which could account for the performance drop by itself).
> 
> I would try removing the LRU, but keeping the age information (jiffie of
> last valid frag received on one inet_frag_queue)

I don't think it's worth optimizing further, atm.

Because the test above is without any of my SMP scalability fixes.
With my SMP fixes the result is full scalability:

 Three interfaces:  (9601+6723+9432) = 25756 Mbit/s

And the 6723 Mbit/s number is because the old 10G NIC cannot generate
any more...

And I basically cannot use the cheap DoS reproducer-hack, as the
machine/code-path is now too fast...

Running with 4 interfaces, and starting 6 netperf's (to cause more
interleaving and higher mem usage):

 4716+8042+8765+6204+2475+4568 = 34770 Mbit/s

I could only just manage to get IpReasmFails = 14.

[jbrouer@dragon ~]$ nstat > /dev/null && sleep 1 && nstat
#kernel
IpInReceives                    2980048            0.0
IpInDelivers                    66217              0.0
IpReasmReqds                    2980040            0.0
IpReasmOKs                      66218              0.0
IpReasmFails                    14                 0.0
UdpInDatagrams                  66218              0.0
IpExtInOctets                   4397976885         0.0

So, after the SMP fixes, it's very hard to "activate" the evictor.  We
would need to find a slower (e.g. embedded) box and tune the evictor on
that, as a multi-CPU machine basically will scale "too well" now ;-)
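
For reading the nstat counters above: IpReasmReqds / IpReasmOKs is
2980040 / 66218, i.e. roughly 45 fragments per reassembled datagram.
The sketch below shows how such a count would arise. The message size
and MTU are not stated in this mail, so the 65507-byte UDP payload and
1500-byte MTU used in the usage note are assumptions, as are the
function and constant names.

```c
#include <assert.h>

/* Assumed header sizes: 20-byte IPv4 header (no options), 8-byte UDP header. */
enum { IPV4_HDR_LEN = 20, UDP_HDR_LEN = 8 };

/*
 * Number of IPv4 packets needed to carry one UDP datagram holding
 * udp_payload bytes of data over a link with the given MTU.
 * Returns 1 when the datagram fits without fragmentation.
 */
static unsigned int frags_per_datagram(unsigned int udp_payload,
				       unsigned int mtu)
{
	unsigned int per_frag, total;

	assert(mtu >= IPV4_HDR_LEN + 8);

	total = udp_payload + UDP_HDR_LEN;	/* bytes carried behind the IP header */
	if (total <= mtu - IPV4_HDR_LEN)
		return 1;

	/* Every fragment except the last carries a multiple of 8 bytes. */
	per_frag = (mtu - IPV4_HDR_LEN) & ~7u;

	return (total + per_frag - 1) / per_frag;
}
```

Under those assumptions, frags_per_datagram(65507, 1500) gives 45,
which is in line with the ratio seen in the counters.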

--Jesper

^ permalink raw reply

* Re: [Pv-drivers] [PATCH 0/6] VSOCK for Linux upstreaming
From: Andy King @ 2012-12-06 15:28 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Benjamin Herrenschmidt, pv-drivers, netdev, linux-kernel,
	virtualization, gregkh, David Miller, georgezhang
In-Reply-To: <50A55F57.7080804@us.ibm.com>

Hi Anthony,

> This was already done in a hypervisor neutral way using virtio:
> 
> http://lists.openwall.net/netdev/2008/12/14/8
> 
> The concept was Nacked and that led to the abomination of
> virtio-serial.  If an address family for virtualization is on the
> table, we should reconsider AF_VMCHANNEL.

I finally had a look at your patch.  Please correct me if I'm wrong,
but it seems that the peer of an AF_VMCHANNEL connection is a virtio
channel inside the hypervisor, i.e., the other end is _not_ sockets,
right?

That's quite a bit different from vSockets, where both sides use the
socket interface, even within the VMX process in our hypervisor.  We
also intend for arbitrary host processes -- those outside the
hypervisor -- to use it via the sockets interface.  We have shipping
applications that do just that, where communication is between a guest
process and a host service, with both sides using the standard socket
API but with the vSockets address family.  We also encourage people to
write such VM-to-host applications, and we've been shipping the
vSockets header in our host-side products to allow people to do just
that.
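
As an illustration of "the standard socket API with the vSockets
address family", here is a minimal sketch of filling in a peer address.
It assumes the AF_VSOCK family and struct sockaddr_vm as they appear in
the <linux/vm_sockets.h> header of later mainline kernels, not
necessarily the header shipped with the products mentioned above; the
helper name is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/*
 * Fill in a vSockets address that names a service on the host.
 * VMADDR_CID_HOST is the well-known context ID of the host side.
 */
static void vsock_host_addr(struct sockaddr_vm *addr, unsigned int port)
{
	assert(addr != NULL);

	memset(addr, 0, sizeof(*addr));
	addr->svm_family = AF_VSOCK;
	addr->svm_cid = VMADDR_CID_HOST;
	addr->svm_port = port;
}
```

A guest process would then create a socket with
socket(AF_VSOCK, SOCK_STREAM, 0) and pass the address to connect(),
just as it would with a struct sockaddr_in for TCP.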

So I think in that sense vSockets is somewhat more general, and we'd
obviously prefer to go with our socket family and address structure
if LKML is open to something like this.

Thanks!
- Andy

PS I realize we still owe LKML a spec for the vSockets protocol.

^ permalink raw reply

* [PATCH] net : enable tx time stamping in the vde driver.
From: Paul Chavent @ 2012-12-06 15:25 UTC (permalink / raw)
  To: jdike, richard, user-mode-linux-devel, netdev, richardcochran
  Cc: Paul Chavent

This new version moves the skb_tx_timestamp call into the main uml
driver. This should avoid the need to call this function in each
transport (vde, slirp, tuntap, ...). It also adds support for ethtool
get_ts_info.
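
For context, a minimal user-space sketch of how an application would
ask for the software transmit timestamps that a driver calling
skb_tx_timestamp() can provide. It uses the SO_TIMESTAMPING socket
option with flags from <linux/net_tstamp.h>; the helper name is
hypothetical and the sketch is not part of this patch.

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/net_tstamp.h>

/*
 * Request software TX timestamps on an already-open socket.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int enable_sw_tx_timestamps(int fd)
{
	/* Generate timestamps at transmit time and report software timestamps. */
	int flags = SOF_TIMESTAMPING_TX_SOFTWARE | SOF_TIMESTAMPING_SOFTWARE;

	assert(fd >= 0);

	return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
			  &flags, sizeof(flags));
}
```

The timestamps themselves are then read back from the socket's error
queue with recvmsg() and MSG_ERRQUEUE.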

Signed-off-by: Paul Chavent <paul.chavent@onera.fr>
---
 arch/um/drivers/net_kern.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/um/drivers/net_kern.c b/arch/um/drivers/net_kern.c
index b1314eb..5aa8696 100644
--- a/arch/um/drivers/net_kern.c
+++ b/arch/um/drivers/net_kern.c
@@ -218,6 +218,7 @@ static int uml_net_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	spin_lock_irqsave(&lp->lock, flags);

 	len = (*lp->write)(lp->fd, skb, lp);
+	skb_tx_timestamp(skb);

 	if (len == skb->len) {
 		dev->stats.tx_packets++;
@@ -281,6 +282,7 @@ static void uml_net_get_drvinfo(struct net_device *dev,
 static const struct ethtool_ops uml_net_ethtool_ops = {
 	.get_drvinfo	= uml_net_get_drvinfo,
 	.get_link	= ethtool_op_get_link,
+	.get_ts_info	= ethtool_op_get_ts_info,
 };

 static void uml_net_user_timer_expire(unsigned long _conn)
-- 
1.7.12.1

^ permalink raw reply related

* Re: [RFC PATCH v2 3/3] tun: fix LSM/SELinux labeling of tun/tap devices
From: Paul Moore @ 2012-12-06 15:36 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-security-module, selinux, mst
In-Reply-To: <9040763.QsllgCP7TP@jason-thinkpad-t430s>

On Thursday, December 06, 2012 06:29:54 PM Jason Wang wrote:
> On Wednesday, December 05, 2012 03:26:19 PM Paul Moore wrote:
> > This patch corrects some problems with LSM/SELinux that were introduced
> > with the multiqueue patchset.  The problem stems from the fact that the
> > multiqueue work changed the relationship between the tun device and its
> > associated socket; before the socket persisted for the life of the
> > device, however after the multiqueue changes the socket only persisted
> > for the life of the userspace connection (fd open).  For non-persistent
> > devices this is not an issue, but for persistent devices this can cause
> > the tun device to lose its SELinux label.
> > 
> > We correct this problem by adding an opaque LSM security blob to the
> > tun device struct which allows us to have the LSM security state, e.g.
> > SELinux labeling information, persist for the lifetime of the tun
> > device.  In the process we tweak the LSM hooks to work with this new
> > approach to TUN device/socket labeling and introduce a new LSM hook,
> > security_tun_dev_create_queue(), to approve requests to create a new
> > TUN queue via TUNSETQUEUE.
> > 
> > The SELinux code has been adjusted to match the new LSM hooks, the
> > other LSMs do not make use of the LSM TUN controls.  This patch makes
> > use of the recently added "tun_socket:create_queue" permission to
> > restrict access to the TUNSETQUEUE operation.  On older SELinux
> > policies which do not define the "tun_socket:create_queue" permission
> > the access control decision for TUNSETQUEUE will be handled according
> > to the SELinux policy's unknown permission setting.
> > 
> > Signed-off-by: Paul Moore <pmoore@redhat.com>

...

> > @@ -4425,20 +4452,19 @@ static void selinux_tun_dev_post_create(struct sock *sk)
> >  	 * cause confusion to the TUN user that had no idea network labeling
> >  	 * protocols were being used */
> > 
> > -	/* see the comments in selinux_tun_dev_create() about why we ...
> > -
> > -	sksec->sid = current_sid();
> > +	sksec->sid = tunsec->sid;
> 
> Since both tun_set_iff() and tun_set_queue() would call this, I wonder
> whether, when it is called by tun_set_queue(), we need some checking just
> like what we did in v1; otherwise it's unconditional in TUNSETQUEUE. Or can
> we add the checks in selinux_tun_dev_create_queue()?

In all the cases that call tun_attach() we have a new socket which needs to be 
labeled based on the tun->security label, yes?  That is what the 
security_tun_dev_attach() code does, there is no need for access control at 
this point as the operation has already been authorized by either 
security_tun_dev_create() (new device), security_tun_dev_create_queue() (new 
queue), or security_tun_dev_open() (opening persistent device).

I think we are all set, or am I missing something?
 
> >  	sksec->sclass = SECCLASS_TUN_SOCKET;
> > 
> > +
> > +	return 0;
> > 
> >  }

-- 
paul moore
security and virtualization @ redhat

^ permalink raw reply

* [PATCH net] inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV state
From: Neal Cardwell @ 2012-12-06 15:42 UTC (permalink / raw)
  To: David Miller; +Cc: edumazet, netdev, Neal Cardwell

Fix inet_diag to be aware of the fact that AF_INET6 TCP connections
instantiated for IPv4 traffic and in the SYN-RECV state were actually
created with inet_reqsk_alloc(), instead of inet6_reqsk_alloc(). This
means that for such connections inet6_rsk(req) returns a pointer to a
random spot in memory up to roughly 64KB beyond the end of the
request_sock.

With this bug, for a server using AF_INET6 TCP sockets and serving
IPv4 traffic, an inet_diag user like `ss state SYN-RECV` would lead to
inet_diag_fill_req() causing an oops or the export to user space of 16
bytes of kernel memory as a garbage IPv6 address, depending on where
the garbage inet6_rsk(req) pointed.
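
For readers unfamiliar with IPv4-mapped IPv6 addresses, the mapping
that the fix relies on can be sketched in user space as follows. This
is only an analogue of what the kernel's ipv6_addr_set_v4mapped() is
used for in the patch, not the kernel code itself; the function name
below is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * Build the IPv4-mapped IPv6 address ::ffff:a.b.c.d from an IPv4
 * address: 80 zero bits, 16 one bits, then the 32-bit IPv4 address.
 */
static void v4_to_v4mapped(struct in_addr v4, struct in6_addr *out)
{
	assert(out != NULL);

	memset(out, 0, sizeof(*out));
	out->s6_addr[10] = 0xff;
	out->s6_addr[11] = 0xff;
	/* s_addr is already in network byte order, so copy it as-is. */
	memcpy(&out->s6_addr[12], &v4.s_addr, sizeof(v4.s_addr));
}
```

An IPv4 peer such as 192.0.2.1 is thus reported as ::ffff:192.0.2.1
on an AF_INET6 socket.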

Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 net/ipv4/inet_diag.c |   53 ++++++++++++++++++++++++++++++++++++-------------
 1 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 0c34bfa..35c3de4 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -44,6 +44,10 @@ struct inet_diag_entry {
 	u16 dport;
 	u16 family;
 	u16 userlocks;
+#if IS_ENABLED(CONFIG_IPV6)
+	struct in6_addr saddr_storage;	/* for IPv4-mapped-IPv6 addresses */
+	struct in6_addr daddr_storage;	/* for IPv4-mapped-IPv6 addresses */
+#endif
 };
 
 static DEFINE_MUTEX(inet_diag_table_mutex);
@@ -67,6 +71,36 @@ static inline void inet_diag_unlock_handler(
 	mutex_unlock(&inet_diag_table_mutex);
 }
 
+
+/* Get the IPv4, IPv6, or IPv4-mapped-IPv6 local and remote addresses
+ * from a request_sock. For IPv4-mapped-IPv6 we must map IPv4 to IPv6.
+ */
+static inline void inet_diag_req_addrs(const struct sock *sk,
+				       const struct request_sock *req,
+				       struct inet_diag_entry *entry)
+{
+	struct inet_request_sock *ireq = inet_rsk(req);
+
+#if IS_ENABLED(CONFIG_IPV6)
+	if (sk->sk_family == AF_INET6) {
+		if (req->rsk_ops->family == AF_INET6) {
+			entry->saddr = inet6_rsk(req)->loc_addr.s6_addr32;
+			entry->daddr = inet6_rsk(req)->rmt_addr.s6_addr32;
+		} else if (req->rsk_ops->family == AF_INET) {
+			ipv6_addr_set_v4mapped(ireq->loc_addr,
+					       &entry->saddr_storage);
+			ipv6_addr_set_v4mapped(ireq->rmt_addr,
+					       &entry->daddr_storage);
+			entry->saddr = entry->saddr_storage.s6_addr32;
+			entry->daddr = entry->daddr_storage.s6_addr32;
+		}
+		return;
+	}
+#endif
+	entry->saddr = &ireq->loc_addr;
+	entry->daddr = &ireq->rmt_addr;
+}
+
 int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
 			      struct sk_buff *skb, struct inet_diag_req_v2 *req,
 			      struct user_namespace *user_ns,		      	
@@ -637,8 +671,10 @@ static int inet_diag_fill_req(struct sk_buff *skb, struct sock *sk,
 	r->idiag_inode = 0;
 #if IS_ENABLED(CONFIG_IPV6)
 	if (r->idiag_family == AF_INET6) {
-		*(struct in6_addr *)r->id.idiag_src = inet6_rsk(req)->loc_addr;
-		*(struct in6_addr *)r->id.idiag_dst = inet6_rsk(req)->rmt_addr;
+		struct inet_diag_entry entry;
+		inet_diag_req_addrs(sk, req, &entry);
+		memcpy(r->id.idiag_src, entry.saddr, sizeof(struct in6_addr));
+		memcpy(r->id.idiag_dst, entry.daddr, sizeof(struct in6_addr));
 	}
 #endif
 
@@ -691,18 +727,7 @@ static int inet_diag_dump_reqs(struct sk_buff *skb, struct sock *sk,
 				continue;
 
 			if (bc) {
-				entry.saddr =
-#if IS_ENABLED(CONFIG_IPV6)
-					(entry.family == AF_INET6) ?
-					inet6_rsk(req)->loc_addr.s6_addr32 :
-#endif
-					&ireq->loc_addr;
-				entry.daddr =
-#if IS_ENABLED(CONFIG_IPV6)
-					(entry.family == AF_INET6) ?
-					inet6_rsk(req)->rmt_addr.s6_addr32 :
-#endif
-					&ireq->rmt_addr;
+				inet_diag_req_addrs(sk, req, &entry);
 				entry.dport = ntohs(ireq->rmt_port);
 
 				if (!inet_diag_bc_run(bc, &entry))
-- 
1.7.7.3

^ permalink raw reply related

* Re: [RFC PATCH v2 3/3] tun: fix LSM/SELinux labeling of tun/tap devices
From: Paul Moore @ 2012-12-06 15:46 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-security-module, selinux, jasowang
In-Reply-To: <20121206103325.GG10837@redhat.com>

On Thursday, December 06, 2012 12:33:25 PM Michael S. Tsirkin wrote:
> On Wed, Dec 05, 2012 at 03:26:19PM -0500, Paul Moore wrote:
> > This patch corrects some problems with LSM/SELinux that were introduced
> > with the multiqueue patchset.  The problem stems from the fact that the
> > multiqueue work changed the relationship between the tun device and its
> > associated socket; before the socket persisted for the life of the
> > device, however after the multiqueue changes the socket only persisted
> > for the life of the userspace connection (fd open).  For non-persistent
> > devices this is not an issue, but for persistent devices this can cause
> > the tun device to lose its SELinux label.
> > 
> > We correct this problem by adding an opaque LSM security blob to the
> > tun device struct which allows us to have the LSM security state, e.g.
> > SELinux labeling information, persist for the lifetime of the tun
> > device.  In the process we tweak the LSM hooks to work with this new
> > approach to TUN device/socket labeling and introduce a new LSM hook,
> > security_tun_dev_create_queue(), to approve requests to create a new
> > TUN queue via TUNSETQUEUE.
> > 
> > The SELinux code has been adjusted to match the new LSM hooks, the
> > other LSMs do not make use of the LSM TUN controls.  This patch makes
> > use of the recently added "tun_socket:create_queue" permission to
> > restrict access to the TUNSETQUEUE operation.  On older SELinux
> > policies which do not define the "tun_socket:create_queue" permission
> > the access control decision for TUNSETQUEUE will be handled according
> > to the SELinux policy's unknown permission setting.
> > 
> > Signed-off-by: Paul Moore <pmoore@redhat.com>
> 
> OK so just to verify: this can be used to ensure that qemu
> process that has the queue fd can only attach it to
> a specific device, right?

Whenever a new queue is created via TUNSETQUEUE/tun_set_queue() the 
security_tun_dev_create_queue() LSM hook is called.  When SELinux is enabled 
this hook ends up calling selinux_tun_dev_create_queue() which checks that the 
calling process (process_t) is allowed to create a new queue on the specified 
device (tundev_t).  If you are familiar with SELinux security policy, the
allow rule would look like this:

  allow process_t tundev_t:tun_socket create_queue;

In practice, if we assume libvirt is creating the TUN device and running with 
a SELinux label of virtd_t and that QEMU instances are running with a SELinux 
label of svirt_t then the allow rule would look like this:

  allow svirt_t virtd_t:tun_socket create_queue;

There is also the matter of the MLS/MCS constraints providing additional 
separation but that is another level of detail which I don't believe is 
important for our discussion.

-- 
paul moore
security and virtualization @ redhat


^ permalink raw reply

* Re: [RFC PATCH v2 1/3] tun: correctly report an error in tun_flow_init()
From: Paul Moore @ 2012-12-06 15:46 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-security-module, selinux, mst
In-Reply-To: <2215330.qi9iHRh0XG@jason-thinkpad-t430s>

On Thursday, December 06, 2012 06:31:29 PM Jason Wang wrote:
> On Wednesday, December 05, 2012 03:26:04 PM Paul Moore wrote:
> > On error, the error code from tun_flow_init() is lost inside
> > tun_set_iff(), this patch fixes this by assigning the tun_flow_init()
> > error code to the "err" variable which is returned by
> > the tun_set_iff() function on error.
> > 
> > Signed-off-by: Paul Moore <pmoore@redhat.com>
> > ---
> > 
> >  drivers/net/tun.c |    3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index a1b2389..14a0454 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -1591,7 +1591,8 @@ static int tun_set_iff(struct net *net, struct file
> > *file, struct ifreq *ifr)
> > 
> >                 tun_net_init(dev);
> > 
> > -               if (tun_flow_init(tun))
> > +               err = tun_flow_init(tun);
> > +               if (err < 0)
> > 
> >                         goto err_free_dev;
> >                 
> >                 dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
> > 
> > --
> 
> Looks fine, we can separate this out of this series and replace the RFC with
> net-next to let David apply it soon.

Will do shortly.

-- 
paul moore
security and virtualization @ redhat


^ permalink raw reply

* [net-next PATCH] tun: correctly report an error in tun_flow_init()
From: Paul Moore @ 2012-12-06 15:48 UTC (permalink / raw)
  To: netdev; +Cc: jasowang

On error, the error code from tun_flow_init() is lost inside
tun_set_iff(). This patch fixes this by assigning the tun_flow_init()
error code to the "err" variable, which is returned by
the tun_set_iff() function on error.
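
The pattern being fixed can be sketched in isolation as below. All
names are hypothetical stand-ins rather than the tun driver's code, and
the initial value of "err" (0 here) is an assumption about what an
earlier successful step would have left behind.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for an init helper: returns 0 or a negative errno. */
static int flow_init(int fail)
{
	assert(fail == 0 || fail == 1);	/* the sketch only takes a boolean switch */
	return fail ? -ENOMEM : 0;
}

/* Before: the failure is detected, but its error code is discarded. */
static int setup_lossy(int fail)
{
	int err = 0;	/* left over from earlier, successful steps */

	if (flow_init(fail))
		goto out;
	/* ... further setup would go here ... */
out:
	return err;
}

/* After: the helper's error code is stored and reaches the caller. */
static int setup_fixed(int fail)
{
	int err = 0;

	err = flow_init(fail);
	if (err < 0)
		goto out;
	/* ... further setup would go here ... */
out:
	return err;
}
```

With a failing helper, setup_lossy() returns the stale value while
setup_fixed() returns -ENOMEM.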

Signed-off-by: Paul Moore <pmoore@redhat.com>
---
 drivers/net/tun.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index a1b2389..14a0454 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1591,7 +1591,8 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 
 		tun_net_init(dev);
 
-		if (tun_flow_init(tun))
+		err = tun_flow_init(tun);
+		if (err < 0)
 			goto err_free_dev;
 
 		dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |

^ permalink raw reply related
