Netdev List
 help / color / mirror / Atom feed
* [PATCH v2 03/10] net: smc91x: use io{read,write}*_rep accessors instead of string functions
From: Will Deacon @ 2012-12-10 19:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, benh, arnd, james.hogan, Will Deacon, Nicolas Pitre,
	netdev
In-Reply-To: <1355166762-15133-1-git-send-email-will.deacon@arm.com>

The {read,write}s{b,w,l} operations are not defined by all architectures
and are being removed from the asm-generic/io.h interface.

This patch replaces the usage of these string functions in the default
SMC accessors with io{read,write}{8,16,32}_rep calls instead, which are
defined for all architectures.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 drivers/net/ethernet/smsc/smc91x.h | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/smsc/smc91x.h b/drivers/net/ethernet/smsc/smc91x.h
index 5f53fbb..370e13d 100644
--- a/drivers/net/ethernet/smsc/smc91x.h
+++ b/drivers/net/ethernet/smsc/smc91x.h
@@ -286,16 +286,16 @@ static inline void mcf_outsw(void *a, unsigned char *p, int l)
 
 #define SMC_IO_SHIFT		(lp->io_shift)
 
-#define SMC_inb(a, r)		readb((a) + (r))
-#define SMC_inw(a, r)		readw((a) + (r))
-#define SMC_inl(a, r)		readl((a) + (r))
-#define SMC_outb(v, a, r)	writeb(v, (a) + (r))
-#define SMC_outw(v, a, r)	writew(v, (a) + (r))
-#define SMC_outl(v, a, r)	writel(v, (a) + (r))
-#define SMC_insw(a, r, p, l)	readsw((a) + (r), p, l)
-#define SMC_outsw(a, r, p, l)	writesw((a) + (r), p, l)
-#define SMC_insl(a, r, p, l)	readsl((a) + (r), p, l)
-#define SMC_outsl(a, r, p, l)	writesl((a) + (r), p, l)
+#define SMC_inb(a, r)		ioread8((a) + (r))
+#define SMC_inw(a, r)		ioread16((a) + (r))
+#define SMC_inl(a, r)		ioread32((a) + (r))
+#define SMC_outb(v, a, r)	iowrite8(v, (a) + (r))
+#define SMC_outw(v, a, r)	iowrite16(v, (a) + (r))
+#define SMC_outl(v, a, r)	iowrite32(v, (a) + (r))
+#define SMC_insw(a, r, p, l)	ioread16_rep((a) + (r), p, l)
+#define SMC_outsw(a, r, p, l)	iowrite16_rep((a) + (r), p, l)
+#define SMC_insl(a, r, p, l)	ioread32_rep((a) + (r), p, l)
+#define SMC_outsl(a, r, p, l)	iowrite32_rep((a) + (r), p, l)
 
 #define RPC_LSA_DEFAULT		RPC_LED_100_10
 #define RPC_LSB_DEFAULT		RPC_LED_TX_RX
-- 
1.8.0

^ permalink raw reply related

* Re: [PATCH 2/2] smsc95xx: fix async register writes on big endian platforms
From: David Miller @ 2012-12-10 19:10 UTC (permalink / raw)
  To: steve.glendinning; +Cc: netdev
In-Reply-To: <1355137388-2938-2-git-send-email-steve.glendinning@shawell.net>

From: Steve Glendinning <steve.glendinning@shawell.net>
Date: Mon, 10 Dec 2012 11:03:08 +0000

> This patch fixes a missing endian conversion which results in the
> interface failing to come up on BE platforms.
> 
> It also removes an unnecessary pointer dereference from this
> function.
> 
> Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>

Applied.

^ permalink raw reply

* Re: [PATCH 1/2] smsc95xx: fix register dump of last register
From: David Miller @ 2012-12-10 19:10 UTC (permalink / raw)
  To: steve.glendinning; +Cc: netdev
In-Reply-To: <1355137388-2938-1-git-send-email-steve.glendinning@shawell.net>

From: Steve Glendinning <steve.glendinning@shawell.net>
Date: Mon, 10 Dec 2012 11:03:07 +0000

> This patch fixes the ethtool register dump for smsc95xx to dump
> all 4 bytes of the final register (COE_CR) instead of just the
> first byte.
> 
> Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>

Applied.

^ permalink raw reply

* Re: [PATCH] smsc75xx: only set mac address once on bind
From: David Miller @ 2012-12-10 19:10 UTC (permalink / raw)
  To: steve.glendinning; +Cc: netdev, bjorn, dcbw
In-Reply-To: <1355137279-2695-1-git-send-email-steve.glendinning@shawell.net>

From: Steve Glendinning <steve.glendinning@shawell.net>
Date: Mon, 10 Dec 2012 11:01:19 +0000

> This patch changes when we decide what the device's MAC address
> is from per ifconfig up to once when the device is connected.
> 
> Without this patch, a manually forced device MAC is overwritten
> on ifconfig down/up.  Also devices that have no EEPROM are
> assigned a new random address on ifconfig down/up instead of
> persisting the same one.
> 
> Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
> Reported-by: Robert Cunningham <rcunningham@nsmsurveillance.com>

Applied.

^ permalink raw reply

* Re: [PATCH RESEND] net: remove obsolete simple_strto<foo>
From: David Miller @ 2012-12-10 19:10 UTC (permalink / raw)
  To: abhi.c.pawar
  Cc: pablo, kaber, kuznet, jmorris, yoshfuji, linville, johannes,
	amwang, edumazet, nhorman, joe, netdev, linux-kernel,
	netfilter-devel, netfilter, coreteam, linux-wireless
In-Reply-To: <1355130748-7828-1-git-send-email-abhi.c.pawar@gmail.com>

From: Abhijit Pawar <abhi.c.pawar@gmail.com>
Date: Mon, 10 Dec 2012 14:42:28 +0530

> This patch replace the obsolete simple_strto<foo> with kstrto<foo>
> 
> Signed-off-by: Abhijit Pawar <abhi.c.pawar@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] net: Allow DCBnl to use other namespaces besides init_net
From: David Miller @ 2012-12-10 19:09 UTC (permalink / raw)
  To: john.fastabend; +Cc: netdev, ebiederm
In-Reply-To: <20121210064813.10883.88194.stgit@nitbit.x32>

From: John Fastabend <john.fastabend@gmail.com>
Date: Sun, 09 Dec 2012 22:48:13 -0800

> Allow DCB and net namespace to work together. This is useful if you
> have containers that are bound to 'phys' interfaces that want to
> also manage their DCB attributes.
> 
> The net namespace is taken from sock_net(skb->sk) of the netlink skb.
> 
> CC: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

Applied.

^ permalink raw reply

* Re: ipgre rss is broken since gro
From: David Miller @ 2012-12-10 19:02 UTC (permalink / raw)
  To: eric.dumazet; +Cc: dmitry, edumazet, netdev, therbert
In-Reply-To: <1355158471.27891.44.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 10 Dec 2012 08:54:31 -0800

> I believe performance problem might come from the
> skb_set_queue_mapping(skb, 0); in __skb_tunnel_rx()
> 
> So all packets are queued into a single GRO queue, instead of being
> split as intended in multiple queues.
> 
> I cant find why we must clear queue_mapping, so could you try :

Tom says:

commit 693019e90ca45d881109d32c0c6d29adf03f6447
Author: Tom Herbert <therbert@google.com>
Date:   Thu Sep 23 11:19:54 2010 +0000

    net: reset skb queue mapping when rx'ing over tunnel
    
    Reset queue mapping when an skb is reentering the stack via a tunnel.
    On second pass, the queue mapping from the original device is no
    longer valid.
    
    Signed-off-by: Tom Herbert <therbert@google.com>
    Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/include/net/dst.h b/include/net/dst.h
index 81d1413..0238650 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -242,6 +242,7 @@ static inline void skb_tunnel_rx(struct sk_buff *skb, struct net_device *dev)
 	dev->stats.rx_packets++;
 	dev->stats.rx_bytes += skb->len;
 	skb->rxhash = 0;
+	skb_set_queue_mapping(skb, 0);
 	skb_dst_drop(skb);
 	nf_reset(skb);
 }

^ permalink raw reply related

* Re: [PATCH net-next v3 01/22] bnx2x: Support probing and removing of VF device
From: Joe Perches @ 2012-12-10 18:50 UTC (permalink / raw)
  To: David Miller; +Cc: ariele, netdev, eilong
In-Reply-To: <20121210.133519.2048543022289027034.davem@davemloft.net>

On Mon, 2012-12-10 at 13:35 -0500, David Miller wrote:
> From: "Ariel Elior" <ariele@broadcom.com>
> Date: Mon, 10 Dec 2012 17:46:25 +0200
[]
> > @@ -12023,85 +12057,115 @@ static int bnx2x_get_num_non_def_sbs(struct pci_dev *pdev,
> >        * If MSI-X is not supported - return number of SBs needed to support
> >        * one fast path queue: one FP queue + SB for CNIC
> >        */
> > -     if (!pos)
> > +     if (!pos) {
> > +             pr_info("no msix capability found");
[]
> Use dev_info(), netdev_info(), or similar, rather than plain
> pr_info().

Also, please add terminating newlines to avoid
dmesg message interleaving.

^ permalink raw reply

* Re: [PATCH] ipv4: ip_check_defrag must not modify skb before unsharing
From: David Miller @ 2012-12-10 18:50 UTC (permalink / raw)
  To: johannes-cdvu00un1VgdHxzADdlk8Q
  Cc: eric-EVVnsjFE0OfYtjvyW6yDsg, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	linville-2XuSBdqkA4R54TAoqtyWWQ,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w
In-Reply-To: <1355165152.8083.4.camel-8Nb76shvtaUJvtFkdXX2HixXY32XiHfO@public.gmane.org>

From: Johannes Berg <johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
Date: Mon, 10 Dec 2012 19:45:52 +0100

> On Mon, 2012-12-10 at 13:41 -0500, David Miller wrote:
>> So the bug is that ip_check_defrag() has a precondition which is met
>> properly by all callers except AF_PACKET.
>> 
>> If this is the case, remind me why are we changing ip_check_defrag()
>> rather than the violator of the precondition?
> 
> I don't think this is the case.
> 
> If you're referring to my note about af_packet: the kernels where this
> goes into af_packet.c are the kernels that don't even have
> ip_check_defrag() because macvlan didn't exist/didn't have ip defrag
> support and af_packet had this code there -- see commit bc416d9768a.
> 
> If you're not referring to my note about af_packet: both callers (there
> are only two) of ip_check_defrag() have this bug as far as I can tell
> because they're both in the part of the RX path where shared SKBs might
> happen.

You're right, I misinterpreted what's happening here.

My misunderstanding was that this was a situation where normal IPV4
input processing makes sure the SKB is unshared, and we had special
code paths that didn't make sure that was the case.

Rather, here, we have a special entrypoint for macvlan and AF_PACKET
which is supposed to take care of such issues since it is known to
execute in a different kind of environment.

I'm pretty sure I'll apply this, after I check a few more things,
thanks Johannes!
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [Patch net-next] virtio_net: fix a typo in virtnet_alloc_queues()
From: David Miller @ 2012-12-10 18:46 UTC (permalink / raw)
  To: amwang; +Cc: netdev, jasowang
In-Reply-To: <1355142248-19987-1-git-send-email-amwang@redhat.com>

From: Cong Wang <amwang@redhat.com>
Date: Mon, 10 Dec 2012 20:24:08 +0800

> Obviously it should check !vi->rq.
> 
> Reported-by: Fengguang Wu <fengguang.wu@intel.com> 
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Cong Wang <amwang@redhat.com>

Applied, thanks.

^ permalink raw reply

* Re: [Patch net-next] bridge: fix seq check in br_mdb_dump()
From: David Miller @ 2012-12-10 18:46 UTC (permalink / raw)
  To: amwang; +Cc: netdev, herbert, shemminger, tgraf, brouer
In-Reply-To: <1355141735-19576-1-git-send-email-amwang@redhat.com>

From: Cong Wang <amwang@redhat.com>
Date: Mon, 10 Dec 2012 20:15:35 +0800

> From: Cong Wang <amwang@redhat.com>
> 
> In case of rehashing, introduce a global variable 'br_mdb_rehash_seq'
> which gets increased every time when rehashing, and assign
> net->dev_base_seq + br_mdb_rehash_seq to cb->seq.
> 
> In theory cb->seq could be wrapped to zero, but this is not
> easy to fix, as net->dev_base_seq is not visible inside
> br_mdb_rehash(). In practice, this is rare.
> 
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Stephen Hemminger <shemminger@vyatta.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Thomas Graf <tgraf@suug.ch>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Signed-off-by: Cong Wang <amwang@redhat.com>

No synchronization at all is applied to this variable, I can't
see how this is OK.

^ permalink raw reply

* Re: [PATCH] ipv4: ip_check_defrag must not modify skb before unsharing
From: Johannes Berg @ 2012-12-10 18:45 UTC (permalink / raw)
  To: David Miller; +Cc: eric, netdev, linux-wireless, linville, eric.dumazet
In-Reply-To: <20121210.134146.1583909966821253233.davem@davemloft.net>

On Mon, 2012-12-10 at 13:41 -0500, David Miller wrote:
> From: Johannes Berg <johannes@sipsolutions.net>
> Date: Mon, 10 Dec 2012 10:41:06 +0100
> 
> > From: Johannes Berg <johannes.berg@intel.com>
> > 
> > ip_check_defrag() might be called from af_packet within the
> > RX path where shared SKBs are used, so it must not modify
> > the input SKB before it has unshared it for defragmentation.
> > Use skb_copy_bits() to get the IP header and only pull in
> > everything later.
> > 
> > The same is true for the other caller in macvlan as it is
> > called from dev->rx_handler which can also get a shared SKB.
> > 
> > Reported-by: Eric Leblond <eric@regit.org>
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> > ---
> > For some versions of the kernel, this code goes into af_packet.c
> 
> So the bug is that ip_check_defrag() has a precondition which is met
> properly by all callers except AF_PACKET.
> 
> If this is the case, remind me why are we changing ip_check_defrag()
> rather than the violator of the precondition?

I don't think this is the case.

If you're referring to my note about af_packet: the kernels where this
goes into af_packet.c are the kernels that don't even have
ip_check_defrag() because macvlan didn't exist/didn't have ip defrag
support and af_packet had this code there -- see commit bc416d9768a.

If you're not referring to my note about af_packet: both callers (there
are only two) of ip_check_defrag() have this bug as far as I can tell
because they're both in the part of the RX path where shared SKBs might
happen.

johannes

^ permalink raw reply

* Re: [RFC PATCH v2 3/3] tun: fix LSM/SELinux labeling of tun/tap devices
From: Eric Paris @ 2012-12-10 18:42 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Paul Moore, Linux Netdev List, LSM List, SE-Linux, jasowang
In-Reply-To: <20121210175035.GA31856@redhat.com>

Let me abstract a little here Paul.  Lets say user A starts an
unclassified process and a top secret process.  SELinux policy darn
well better be able to enforce that they can not attach to the same
tun.

Am I missing something here?

On Mon, Dec 10, 2012 at 12:50 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Mon, Dec 10, 2012 at 12:33:49PM -0500, Paul Moore wrote:
>> On Monday, December 10, 2012 07:26:56 PM Michael S. Tsirkin wrote:
>> > On Mon, Dec 10, 2012 at 12:04:35PM -0500, Paul Moore wrote:
>> > > On Friday, December 07, 2012 02:25:16 PM Michael S. Tsirkin wrote:
>> > > > On Thu, Dec 06, 2012 at 04:09:51PM -0500, Paul Moore wrote:
>> > > > > On Thursday, December 06, 2012 10:57:16 PM Michael S. Tsirkin wrote:
>> > > > > > On Thu, Dec 06, 2012 at 11:56:45AM -0500, Paul Moore wrote:
>> > > > > > > The SETQUEUE/tun_socket:create_queue permissions do not yet exist
>> > > > > > > in any released SELinux policy as we are just now adding them with
>> > > > > > > this patchset. With current policies loaded into a kernel with
>> > > > > > > this patchset applied the SETQUEUE/tun_socket:create_queue
>> > > > > > > permission would be treated according to the policy's unknown
>> > > > > > > permission setting.
>> > > > > >
>> > > > > > OK I think we need to rethink what we are doing here: what you sent
>> > > > > > addresses the problem as stated but I think we mis-stated it.  Let
>> > > > > > me try to restate the problem: it is not just selinux problem. Let's
>> > > > > > assume qemu wants to use tun, I (libvirt) don't want to run it as
>> > > > > > root.
>> > > > > >
>> > > > > > 1. TUNSETIFF: I can open tun, attach an fd and pass it to qemu.
>> > > > > > Now, qemu does not invoke TUNSETIFF so it can run without
>> > > > > > kernel priveledges.
>> > > > >
>> > > > > Correct me if I'm wrong, but I believe libvirt does this while running
>> > > > > as root.  Assuming that is the case, why not simply setuid()/setgid()
>> > > > > to the same credentials as the QEMU instance before creating the TUN
>> > > > > device? You can always (re)configure the device afterwards while
>> > > > > running as root/CAP_NET_ADMIN.
>> > > >
>> > > > We want isolation between qemu instances.
>> > >
>> > > Understood, I agree.
>> > >
>> > > Achieving separation via SELinux is easily done, with libvirt/sVirt
>> > > already doing this for us automatically in most cases; the only thing we
>> > > will want to do is make sure the SELinux policy is aware of the new
>> > > permission.
>> > >
>> > > Achieving separation via DAC should also be easily done, simply run each
>> > > QEMU instance with a separate UID and/or GID.
>> > >
>> > > > Giving qemu right to open tun and SETIFF would give it rights
>> > > > to access any tun device.
>> > >
>> > > I'm quickly looked at tun_chr_open() again and I don't see any special
>> > > rights/privileges required, the same for tun_chr_ioctl() and
>> > > __tun_chr_ioctl().  Looking at tun_set_queue() I see we call
>> > > tun_not_capable() which does a simple DAC check; it must have the same
>> > > UID/GID or have CAP_NET_ADMIN.
>> > >
>> > > I'm having a hard time seeing the problem you are describing; help me
>> > > understand.
>> >
>> > The issue is guest controls the number of queues in use.
>> > So qemu would be required to be allowed to call tun_set_queue.
>> > If we allow this we have a problem as one qemu will be
>> > able to access any tun.
>>
>> QEMU can call tun_set_queue() as long as it satisfies tun_not_capable(), which
>> from a practical point of view means that the TUN device was created with the
>> same UID/GID as the QEMU instance.  If you want TUN device separation between
>> QEMU instances using DAC you need to run each QEMU instance with a different
>> UID/GID (which you should be doing anyway if you want DAC enforced general
>> separation).
>>
>> I believe I've stated this point several times now and I don't feel you've
>> addressed it properly.
>
> Look at how it works at the moment:
> a priveledged libvirt server calls tun_set_iff
> and passes the fd to qemu which is not priveledged.
>
> The result is isolation between qemu instances without
> need to create uid per qemu instance.
>
> How do we create multiple queues? It makes sense to
> follow this model and pass in fds for individual queues.
> However they need to be disabled initially
> so libvirt can not do tun_set_queue for us.
> When qemu later calls tun_set_queue it will fail which means we
> can't utilize multiqueue.
>
> My solution is an unpriveledged variant
> of tun_set_queue that only enables/disables
> a queue without attach/detach.
>
>
>> --
>> paul moore
>> security and virtualization @ redhat
> --
> To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] ipv4: ip_check_defrag must not modify skb before unsharing
From: David Miller @ 2012-12-10 18:41 UTC (permalink / raw)
  To: johannes; +Cc: eric, netdev, linux-wireless, linville, eric.dumazet
In-Reply-To: <1355132466.9857.6.camel@jlt4.sipsolutions.net>

From: Johannes Berg <johannes@sipsolutions.net>
Date: Mon, 10 Dec 2012 10:41:06 +0100

> From: Johannes Berg <johannes.berg@intel.com>
> 
> ip_check_defrag() might be called from af_packet within the
> RX path where shared SKBs are used, so it must not modify
> the input SKB before it has unshared it for defragmentation.
> Use skb_copy_bits() to get the IP header and only pull in
> everything later.
> 
> The same is true for the other caller in macvlan as it is
> called from dev->rx_handler which can also get a shared SKB.
> 
> Reported-by: Eric Leblond <eric@regit.org>
> Cc: stable@vger.kernel.org
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> ---
> For some versions of the kernel, this code goes into af_packet.c

So the bug is that ip_check_defrag() has a precondition which is met
properly by all callers except AF_PACKET.

If this is the case, remind me why are we changing ip_check_defrag()
rather than the violator of the precondition?

^ permalink raw reply

* Re: [PATCH net-next v3 03/22] bnx2x: Add to VF <-> PF channel the release request
From: David Miller @ 2012-12-10 18:39 UTC (permalink / raw)
  To: ariele; +Cc: netdev, eilong
In-Reply-To: <1355154406-10855-4-git-send-email-ariele@broadcom.com>

From: "Ariel Elior" <ariele@broadcom.com>
Date: Mon, 10 Dec 2012 17:46:27 +0200

> +	/* PF released us */
> +	if (resp->hdr.status == PFVF_STATUS_SUCCESS) {
> +		DP(BNX2X_MSG_SP, "vf released\n");
> +
> +	/* PF reports error */
> +	} else {

Seriously?  Do I even need to tell you how awful this looks?

Remove the empty line and put that comment before the "else"
somewhere where it doesn't have to have indentation which does
not match the basic block it is in.

I think I've reviewed enough of this series, please audit the
rest of coding style problems yourself.

^ permalink raw reply

* Re: [PATCH net-next v3 02/22] bnx2x: VF <-> PF channel 'acquire' at vf probe
From: David Miller @ 2012-12-10 18:37 UTC (permalink / raw)
  To: ariele; +Cc: netdev, eilong
In-Reply-To: <1355154406-10855-3-git-send-email-ariele@broadcom.com>

From: "Ariel Elior" <ariele@broadcom.com>
Date: Mon, 10 Dec 2012 17:46:26 +0200

> +/* VF MBX (aka vf-pf channel) */
> +#include "bnx2x.h"
> +#include "bnx2x_sriov.h"
> +/* place a given tlv on the tlv buffer at a given offset */
> +void bnx2x_add_tlv(struct bnx2x *bp, void *tlvs_list, u16 offset, u16 type,
> +		   u16 length)

Put an empty line between include directives and the rest of the
file.

> +	while (tlv->type != CHANNEL_TLV_LIST_END) {
> +
> +		/* output tlv */
> +		DP(BNX2X_MSG_IOV, "TLV number %d: type %d, length %d\n", i,
> +		   tlv->type, tlv->length);

But this is a place where an empty line is not appropriate, please remove
the empty line after the one with the while().

> +/* acquire response tlv - carries the allocated resources */
> +struct pfvf_acquire_resp_tlv {
> +	struct pfvf_tlv hdr;
> +	struct pf_vf_pfdev_info {
> +		u32 chip_num;
> +		u32 pf_cap;
> +		#define PFVF_CAP_RSS		0x00000001
> +		#define PFVF_CAP_DHC		0x00000002
> +		#define PFVF_CAP_TPA		0x00000004

Please don't indent CPP defines excessively like this, they should be
at or near the initial column so that CPP directives can be easily
spotted by someone reading this code.

^ permalink raw reply

* Re: [PATCH net-next v3 01/22] bnx2x: Support probing and removing of VF device
From: David Miller @ 2012-12-10 18:35 UTC (permalink / raw)
  To: ariele; +Cc: netdev, eilong
In-Reply-To: <1355154406-10855-2-git-send-email-ariele@broadcom.com>

From: "Ariel Elior" <ariele@broadcom.com>
Date: Mon, 10 Dec 2012 17:46:25 +0200

>  static void bnx2x_get_pcie_width_speed(struct bnx2x *bp, int *width, int *speed)
>  {
> -	u32 val = REG_RD(bp, PCICFG_OFFSET + PCICFG_LINK_CONTROL);
> +	u32 val = 0;
> +	pci_read_config_dword(bp->pdev, PCICFG_LINK_CONTROL, &val);

Please put an empty line between function local variable declarations
and actual code.

> @@ -12023,85 +12057,115 @@ static int bnx2x_get_num_non_def_sbs(struct pci_dev *pdev,
>  	 * If MSI-X is not supported - return number of SBs needed to support
>  	 * one fast path queue: one FP queue + SB for CNIC
>  	 */
> -	if (!pos)
> +	if (!pos) {
> +		pr_info("no msix capability found");
>  		return 1 + cnic_cnt;
> +	}
> +
> +	pr_info("msix capability found");
>  

Use dev_info(), netdev_info(), or similar, rather than plain
pr_info().

^ permalink raw reply

* RE: ixgbe:  pci_get_device() call without counterpart call of pci_dev_put()
From: Rose, Gregory V @ 2012-12-10 18:24 UTC (permalink / raw)
  To: Kirsher, Jeffrey T, Elena Gurevich
  Cc: netdev, e1000-devel@lists.sourceforge.net, Skidmore, Donald C
In-Reply-To: <50C58C98.2030509@gmail.com>

> -----Original Message-----
> From: Jeff Kirsher [mailto:tarbal@gmail.com]
> Sent: Sunday, December 09, 2012 11:18 PM
> To: Elena Gurevich
> Cc: netdev; e1000-devel@lists.sourceforge.net; Rose, Gregory V; Skidmore,
> Donald C
> Subject: Re: ixgbe: pci_get_device() call without counterpart call of
> pci_dev_put()
> 
> On 12/09/2012 01:47 AM, Elena Gurevich wrote:
> > Hi all,
> > I am pioneer in linux device drivers here and using Intel 82599 NIC as
> > reference model, During investigation to drivers sources I found the
> > suspicious code:
> >
> > Is code sequence (1)   and (2) the possible device reference count
> leakage
> > ???
> >
> >
> > Thanks a lot in advance
> > Lena
> >
> > --------snipped from ixgbe_main.c  file function
> > ixgbe_io_error_detected()
> > -----------
> >
> > . . .
> > 		/* Find the pci device of the offending VF */
> > 		vfdev = pci_get_device(PCI_VENDOR_ID_INTEL, device_id, NULL);
> > 		while (vfdev) {
> > 			if (vfdev->devfn == (req_id & 0xFF))
> > 				break;
> > <------------------------------      (1) leaves the loop with successful
> get
> > call  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> >
> > 			vfdev = pci_get_device(PCI_VENDOR_ID_INTEL,
> > 					       device_id, vfdev);
> > 		}
> > 		/*
> > 		 * There's a slim chance the VF could have been hot plugged,
> > 		 * so if it is no longer present we don't need to issue the
> > 		 * VFLR.  Just clean up the AER in that case.
> > 		 */
> > 		if (vfdev) {
> > 			e_dev_err("Issuing VFLR to VF %d\n", vf);
> > 			pci_write_config_dword(vfdev, 0xA8, 0x00008000);
> > 		}
> >
> > 		pci_cleanup_aer_uncorrect_error_status(pdev);
> > 	}
> >
> > 	/*
> > 	 * Even though the error may have occurred on the other port
> > 	 * we still need to increment the vf error reference count for
> > 	 * both ports because the I/O resume function will be called
> > 	 * for both of them.
> > 	 */
> > 	adapter->vferr_refcount++;
> >
> > 	return PCI_ERS_RESULT_RECOVERED;
> > <--------------------------------------------  (2) leaves the function
> > without put call !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> > ----------------------------------  snipped
> > -----------------------------------
> >
> Added Greg Rose (ixgbevf maintainer) & Don Skidmore (ixgbe maintainer)
> 
> Since the code you have questions about is dealing with the vf, Greg will
> most likely be able to shed some light to your questions.

That's a bug.  I'll generate a patch to add the necessary pci_dev_put().

- Greg


^ permalink raw reply

* Re: [PATCH net-next v3] tipc: sk_recv_queue size check only for connectionless sockets
From: Neil Horman @ 2012-12-10 18:22 UTC (permalink / raw)
  To: Jon Maloy; +Cc: Ying Xue, Paul.Gortmaker, erik.hugne, netdev, tipc-discussion
In-Reply-To: <50C60496.5040800@ericsson.com>

On Mon, Dec 10, 2012 at 10:49:42AM -0500, Jon Maloy wrote:
> On 12/10/2012 05:13 AM, Jon Maloy wrote:
> > On 12/10/2012 04:23 AM, Ying Xue wrote:
> >> The sk_receive_queue limit control is currently performed for all
> >> arriving messages, disregarding socket and message type. But for
> >> connectionless sockets this check is redundant, since the protocol
> >> flow already makes queue overflow impossible.
> >>
> >> We move the sk_receive_queue limit control so that it's only performed
> >> for connectionless sockets, i.e. SOCK_RDM and SOCK_DGRAM type sockets.
> >>
> >> However, as Neil Horman specified, we cannot simply force the socket
> >> receive queue limit against connectionless sockets as it may create a
> >> DoS vulnerability. For example, if a sender floods a receiver with
> >> messages containing an invalid set of message importance bits or
> >> CRITICAL importance, we will queue messages indefinitely.
> >>
> >> To avoid DoS attack, socket receive queue will be marked as overflow
> >> if we receive messages with invalid message importances, meanwhile,
> >> we also set one new threshold for CRITICAL importance messages.
> >>
> >> Signed-off-by: Ying Xue <ying.xue@windriver.com>
> >> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
> >> Cc: Neil Horman <nhorman@tuxdriver.com>
> >> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> >> ---
> >> v3 changes:
> >>  - set new threshold for CRITICAL message
> >>  - defined an importance factor table to avoid multiplication and
> >>    division operations in rx_queue_full().
> >>  - changed return value of rx_queue_full() from integer to boolean.
> >>
> >>  net/tipc/socket.c |   44 +++++++++++++++++++-------------------------
> >>  1 files changed, 19 insertions(+), 25 deletions(-)
> >>
> >> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> >> index 9b4e483..a18a757 100644
> >> --- a/net/tipc/socket.c
> >> +++ b/net/tipc/socket.c
> >> @@ -43,7 +43,7 @@
> >>  #define SS_LISTENING	-1	/* socket is listening */
> >>  #define SS_READY	-2	/* socket is connectionless */
> >>  
> >> -#define OVERLOAD_LIMIT_BASE	10000
> >> +#define OVERLOAD_LIMIT_BASE	5000
> >>  #define CONN_TIMEOUT_DEFAULT	8000	/* default connect timeout = 8s */
> >>  
> >>  struct tipc_sock {
> >> @@ -73,6 +73,13 @@ static struct proto tipc_proto;
> >>  
> >>  static int sockets_enabled;
> >>  
> >> +static const u32 msg_importance_factor[] = {
> >> +	OVERLOAD_LIMIT_BASE,		/* TIPC_LOW_IMPORTANCE limit */
> >> +	OVERLOAD_LIMIT_BASE * 2,	/* TIPC_MEDIUM_IMPORTANCE limit */
> >> +	OVERLOAD_LIMIT_BASE * 100,	/* TIPC_HIGH_IMPORTANCE limit */
> >> +	OVERLOAD_LIMIT_BASE * 200	/* TIPC_CRITICAL_IMPORTANCE limit */
> >> +	};
> >> +
> >>  /*
> >>   * Revised TIPC socket locking policy:
> >>   *
> >> @@ -1158,28 +1165,17 @@ static void tipc_data_ready(struct sock *sk, int len)
> >>   * rx_queue_full - determine if receive queue can accept another message
> >>   * @msg: message to be added to queue
> >>   * @queue_size: current size of queue
> >> - * @base: nominal maximum size of queue
> >>   *
> >> - * Returns 1 if queue is unable to accept message, 0 otherwise
> >> + * Returns true if queue is unable to accept message, false otherwise
> >>   */
> >> -static int rx_queue_full(struct tipc_msg *msg, u32 queue_size, u32 base)
> >> +static bool rx_queue_full(struct tipc_msg *msg, u32 queue_size)
> >>  {
> >> -	u32 threshold;
> >>  	u32 imp = msg_importance(msg);
> >>  
> >> -	if (imp == TIPC_LOW_IMPORTANCE)
> >> -		threshold = base;
> >> -	else if (imp == TIPC_MEDIUM_IMPORTANCE)
> >> -		threshold = base * 2;
> >> -	else if (imp == TIPC_HIGH_IMPORTANCE)
> >> -		threshold = base * 100;
> >> -	else
> >> -		return 0;
> >> +	if (unlikely(imp > TIPC_CRITICAL_IMPORTANCE))
> >> +		return true;
> > 
> > This test is not necessary. Such messages have already been filtered out
> > in tipc_recv_msg() at link level.
> > The test msg_isdata(), which determines if a message should be sent up to
> > the port/socket level, is  also an implicit test that 
> > importance < TIPC_CRITICAL_IMPORTANCE.
> 
> (importance <= TIPC_CRITICAL_IMPORTANCE), of course.
> This is an effect of the co-location of the user and importance fields in the
> TIPC header. I.e., the importance is in reality coded into the value of
> the user field.
> 
> To clarify (and improve) my previous suggestion, what I had in mind
> was something like this:
> 
> 
>                 recv_q_len = skb_queue_len(&sk->sk_receive_queue);
> -               if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {
> -                       if (rx_queue_full(msg, recv_q_len,
> -                           OVERLOAD_LIMIT_BASE / 2))
> -                               return TIPC_ERR_OVERLOAD;
> -               }
> +               imp = msg_importance(msg);
> +               if (unlikely(recv_q_len > (OVERLOAD_LIMIT_BASE << (imp << 1))))
> +                       return TIPC_ERR_OVERLOAD;                 
>         } else {
>                 if (msg_mcast(msg))
>                         return TIPC_ERR_NO_PORT;
> 
> No need for any separate function at all, and guaranteed inline.
> This will probably translate to one single shift-instruction extra,
> and no memor access, since the compiler merge (imp << 1) with the 
> shift operation used to read imp from the header.
> 
> 
> The limits for message rejection become the following,
> given an OVERLOAD_LIMIT_BASE of 10000:
> 
> TIP_LOW_IMPORTANCE:       10000    (previously 10000)
> TIPC_MEDIUM_IMPORTANCE    40000    (previously 20000)
> TIPC_MEDIUM_IMPORTANCE    1600000  (previously 1000000)
> TIPC_CRITICAL_IMPORTANCE  25600000 (previously no limit)
> 
> ///jon
> 
Sure, this will work nicely as well.
Neil

^ permalink raw reply

* netconsole fun
From: Peter Hurley @ 2012-12-10 18:00 UTC (permalink / raw)
  To: netdev

Is there a recommended method of running netconsole on a machine that
has a slaved device?

I ask because it seems pretty difficult to get netconsole running even
on a machine that has multiple physical interfaces, only one of which is
slaved.

I scoured the documentation but didn't find anything relevant (does
documentation/networking/netconsole.txt need a patch?). AFAICT, the last
discussion ended here back in June '11
http://lkml.indiana.edu/hypermail/linux/kernel/1106.1/03185.html 

Any help or pointers provided would be much appreciated.

Regards,
Peter Hurley

^ permalink raw reply

* Re: [RFC PATCH v2 3/3] tun: fix LSM/SELinux labeling of tun/tap devices
From: Michael S. Tsirkin @ 2012-12-10 17:50 UTC (permalink / raw)
  To: Paul Moore; +Cc: netdev, linux-security-module, selinux, jasowang
In-Reply-To: <3124654.2UMIXvF0vN@sifl>

On Mon, Dec 10, 2012 at 12:33:49PM -0500, Paul Moore wrote:
> On Monday, December 10, 2012 07:26:56 PM Michael S. Tsirkin wrote:
> > On Mon, Dec 10, 2012 at 12:04:35PM -0500, Paul Moore wrote:
> > > On Friday, December 07, 2012 02:25:16 PM Michael S. Tsirkin wrote:
> > > > On Thu, Dec 06, 2012 at 04:09:51PM -0500, Paul Moore wrote:
> > > > > On Thursday, December 06, 2012 10:57:16 PM Michael S. Tsirkin wrote:
> > > > > > On Thu, Dec 06, 2012 at 11:56:45AM -0500, Paul Moore wrote:
> > > > > > > The SETQUEUE/tun_socket:create_queue permissions do not yet exist
> > > > > > > in any released SELinux policy as we are just now adding them with
> > > > > > > this patchset. With current policies loaded into a kernel with
> > > > > > > this patchset applied the SETQUEUE/tun_socket:create_queue
> > > > > > > permission would be treated according to the policy's unknown
> > > > > > > permission setting.
> > > > > > 
> > > > > > OK I think we need to rethink what we are doing here: what you sent
> > > > > > addresses the problem as stated but I think we mis-stated it.  Let
> > > > > > me try to restate the problem: it is not just selinux problem. Let's
> > > > > > assume qemu wants to use tun, I (libvirt) don't want to run it as
> > > > > > root.
> > > > > > 
> > > > > > 1. TUNSETIFF: I can open tun, attach an fd and pass it to qemu.
> > > > > > Now, qemu does not invoke TUNSETIFF so it can run without
> > > > > > kernel priveledges.
> > > > > 
> > > > > Correct me if I'm wrong, but I believe libvirt does this while running
> > > > > as root.  Assuming that is the case, why not simply setuid()/setgid()
> > > > > to the same credentials as the QEMU instance before creating the TUN
> > > > > device? You can always (re)configure the device afterwards while
> > > > > running as root/CAP_NET_ADMIN.
> > > > 
> > > > We want isolation between qemu instances.
> > > 
> > > Understood, I agree.
> > > 
> > > Achieving separation via SELinux is easily done, with libvirt/sVirt
> > > already doing this for us automatically in most cases; the only thing we
> > > will want to do is make sure the SELinux policy is aware of the new
> > > permission.
> > > 
> > > Achieving separation via DAC should also be easily done, simply run each
> > > QEMU instance with a separate UID and/or GID.
> > > 
> > > > Giving qemu right to open tun and SETIFF would give it rights
> > > > to access any tun device.
> > > 
> > > I'm quickly looked at tun_chr_open() again and I don't see any special
> > > rights/privileges required, the same for tun_chr_ioctl() and
> > > __tun_chr_ioctl().  Looking at tun_set_queue() I see we call
> > > tun_not_capable() which does a simple DAC check; it must have the same
> > > UID/GID or have CAP_NET_ADMIN.
> > > 
> > > I'm having a hard time seeing the problem you are describing; help me
> > > understand.
> > 
> > The issue is guest controls the number of queues in use.
> > So qemu would be required to be allowed to call tun_set_queue.
> > If we allow this we have a problem as one qemu will be
> > able to access any tun.
> 
> QEMU can call tun_set_queue() as long as it satisfies tun_not_capable(), which 
> from a practical point of view means that the TUN device was created with the 
> same UID/GID as the QEMU instance.  If you want TUN device separation between 
> QEMU instances using DAC you need to run each QEMU instance with a different 
> UID/GID (which you should be doing anyway if you want DAC enforced general 
> separation).
> 
> I believe I've stated this point several times now and I don't feel you've 
> addressed it properly.

Look at how it works at the moment:
a priveledged libvirt server calls tun_set_iff
and passes the fd to qemu which is not priveledged.

The result is isolation between qemu instances without
need to create uid per qemu instance.

How do we create multiple queues? It makes sense to
follow this model and pass in fds for individual queues.
However they need to be disabled initially
so libvirt can not do tun_set_queue for us.
When qemu later calls tun_set_queue it will fail which means we
can't utilize multiqueue.

My solution is an unpriveledged variant
of tun_set_queue that only enables/disables
a queue without attach/detach.


> -- 
> paul moore
> security and virtualization @ redhat

^ permalink raw reply

* Re: [RFC PATCH v2 3/3] tun: fix LSM/SELinux labeling of tun/tap devices
From: Paul Moore @ 2012-12-10 17:33 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-security-module, selinux, jasowang
In-Reply-To: <20121210172656.GA30775@redhat.com>

On Monday, December 10, 2012 07:26:56 PM Michael S. Tsirkin wrote:
> On Mon, Dec 10, 2012 at 12:04:35PM -0500, Paul Moore wrote:
> > On Friday, December 07, 2012 02:25:16 PM Michael S. Tsirkin wrote:
> > > On Thu, Dec 06, 2012 at 04:09:51PM -0500, Paul Moore wrote:
> > > > On Thursday, December 06, 2012 10:57:16 PM Michael S. Tsirkin wrote:
> > > > > On Thu, Dec 06, 2012 at 11:56:45AM -0500, Paul Moore wrote:
> > > > > > The SETQUEUE/tun_socket:create_queue permissions do not yet exist
> > > > > > in any released SELinux policy as we are just now adding them with
> > > > > > this patchset. With current policies loaded into a kernel with
> > > > > > this patchset applied the SETQUEUE/tun_socket:create_queue
> > > > > > permission would be treated according to the policy's unknown
> > > > > > permission setting.
> > > > > 
> > > > > OK I think we need to rethink what we are doing here: what you sent
> > > > > addresses the problem as stated but I think we mis-stated it.  Let
> > > > > me try to restate the problem: it is not just selinux problem. Let's
> > > > > assume qemu wants to use tun, I (libvirt) don't want to run it as
> > > > > root.
> > > > > 
> > > > > 1. TUNSETIFF: I can open tun, attach an fd and pass it to qemu.
> > > > > Now, qemu does not invoke TUNSETIFF so it can run without
> > > > > kernel priveledges.
> > > > 
> > > > Correct me if I'm wrong, but I believe libvirt does this while running
> > > > as root.  Assuming that is the case, why not simply setuid()/setgid()
> > > > to the same credentials as the QEMU instance before creating the TUN
> > > > device? You can always (re)configure the device afterwards while
> > > > running as root/CAP_NET_ADMIN.
> > > 
> > > We want isolation between qemu instances.
> > 
> > Understood, I agree.
> > 
> > Achieving separation via SELinux is easily done, with libvirt/sVirt
> > already doing this for us automatically in most cases; the only thing we
> > will want to do is make sure the SELinux policy is aware of the new
> > permission.
> > 
> > Achieving separation via DAC should also be easily done, simply run each
> > QEMU instance with a separate UID and/or GID.
> > 
> > > Giving qemu right to open tun and SETIFF would give it rights
> > > to access any tun device.
> > 
> > I'm quickly looked at tun_chr_open() again and I don't see any special
> > rights/privileges required, the same for tun_chr_ioctl() and
> > __tun_chr_ioctl().  Looking at tun_set_queue() I see we call
> > tun_not_capable() which does a simple DAC check; it must have the same
> > UID/GID or have CAP_NET_ADMIN.
> > 
> > I'm having a hard time seeing the problem you are describing; help me
> > understand.
> 
> The issue is guest controls the number of queues in use.
> So qemu would be required to be allowed to call tun_set_queue.
> If we allow this we have a problem as one qemu will be
> able to access any tun.

QEMU can call tun_set_queue() as long as it satisfies tun_not_capable(), which 
from a practical point of view means that the TUN device was created with the 
same UID/GID as the QEMU instance.  If you want TUN device separation between 
QEMU instances using DAC you need to run each QEMU instance with a different 
UID/GID (which you should be doing anyway if you want DAC enforced general 
separation).

I believe I've stated this point several times now and I don't feel you've 
addressed it properly.

-- 
paul moore
security and virtualization @ redhat

^ permalink raw reply

* Re: [RFC PATCH v2 3/3] tun: fix LSM/SELinux labeling of tun/tap devices
From: Michael S. Tsirkin @ 2012-12-10 17:26 UTC (permalink / raw)
  To: Paul Moore; +Cc: netdev, linux-security-module, selinux, jasowang
In-Reply-To: <5998443.squEvSxCG9@sifl>

On Mon, Dec 10, 2012 at 12:04:35PM -0500, Paul Moore wrote:
> On Friday, December 07, 2012 02:25:16 PM Michael S. Tsirkin wrote:
> > On Thu, Dec 06, 2012 at 04:09:51PM -0500, Paul Moore wrote:
> > > On Thursday, December 06, 2012 10:57:16 PM Michael S. Tsirkin wrote:
> > > > On Thu, Dec 06, 2012 at 11:56:45AM -0500, Paul Moore wrote:
> > > > > The SETQUEUE/tun_socket:create_queue permissions do not yet exist in
> > > > > any released SELinux policy as we are just now adding them with this
> > > > > patchset. With current policies loaded into a kernel with this
> > > > > patchset applied the SETQUEUE/tun_socket:create_queue permission would
> > > > > be treated according to the policy's unknown permission setting.
> > > > 
> > > > OK I think we need to rethink what we are doing here: what you sent
> > > > addresses the problem as stated but I think we mis-stated it.  Let me
> > > > try to restate the problem: it is not just selinux problem. Let's assume
> > > > qemu wants to use tun, I (libvirt) don't want to run it as root.
> > > > 
> > > > 1. TUNSETIFF: I can open tun, attach an fd and pass it to qemu.
> > > > Now, qemu does not invoke TUNSETIFF so it can run without
> > > > kernel priveledges.
> > > 
> > > Correct me if I'm wrong, but I believe libvirt does this while running as
> > > root.  Assuming that is the case, why not simply setuid()/setgid() to the
> > > same credentials as the QEMU instance before creating the TUN device? 
> > > You can always (re)configure the device afterwards while running as
> > > root/CAP_NET_ADMIN.
> > 
> > We want isolation between qemu instances.
> 
> Understood, I agree.
> 
> Achieving separation via SELinux is easily done, with libvirt/sVirt already 
> doing this for us automatically in most cases; the only thing we will want to 
> do is make sure the SELinux policy is aware of the new permission.
> 
> Achieving separation via DAC should also be easily done, simply run each QEMU 
> instance with a separate UID and/or GID.
> 
> > Giving qemu right to open tun and SETIFF would give it rights
> > to access any tun device.
> 
> I'm quickly looked at tun_chr_open() again and I don't see any special 
> rights/privileges required, the same for tun_chr_ioctl() and 
> __tun_chr_ioctl().  Looking at tun_set_queue() I see we call tun_not_capable() 
> which does a simple DAC check; it must have the same UID/GID or have 
> CAP_NET_ADMIN.
> 
> I'm having a hard time seeing the problem you are describing; help me 
> understand.

The issue is guest controls the number of queues in use.
So qemu would be required to be allowed to call tun_set_queue.
If we allow this we have a problem as one qemu will be
able to access any tun.

At least with DAC, looks like there's a problem. SELinux I think
can address this.

> > There could also be user tun users we want them isolated from qemu.
> 
> Once again, should be possible using either SELinux, DAC, or both.
> 
> -- 
> paul moore
> security and virtualization @ redhat

^ permalink raw reply

* Re: [RFC PATCH v2 3/3] tun: fix LSM/SELinux labeling of tun/tap devices
From: Paul Moore @ 2012-12-10 17:04 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-security-module, selinux, jasowang
In-Reply-To: <20121207122516.GA16577@redhat.com>

On Friday, December 07, 2012 02:25:16 PM Michael S. Tsirkin wrote:
> On Thu, Dec 06, 2012 at 04:09:51PM -0500, Paul Moore wrote:
> > On Thursday, December 06, 2012 10:57:16 PM Michael S. Tsirkin wrote:
> > > On Thu, Dec 06, 2012 at 11:56:45AM -0500, Paul Moore wrote:
> > > > The SETQUEUE/tun_socket:create_queue permissions do not yet exist in
> > > > any released SELinux policy as we are just now adding them with this
> > > > patchset. With current policies loaded into a kernel with this
> > > > patchset applied the SETQUEUE/tun_socket:create_queue permission would
> > > > be treated according to the policy's unknown permission setting.
> > > 
> > > OK I think we need to rethink what we are doing here: what you sent
> > > addresses the problem as stated but I think we mis-stated it.  Let me
> > > try to restate the problem: it is not just selinux problem. Let's assume
> > > qemu wants to use tun, I (libvirt) don't want to run it as root.
> > > 
> > > 1. TUNSETIFF: I can open tun, attach an fd and pass it to qemu.
> > > Now, qemu does not invoke TUNSETIFF so it can run without
> > > kernel priveledges.
> > 
> > Correct me if I'm wrong, but I believe libvirt does this while running as
> > root.  Assuming that is the case, why not simply setuid()/setgid() to the
> > same credentials as the QEMU instance before creating the TUN device? 
> > You can always (re)configure the device afterwards while running as
> > root/CAP_NET_ADMIN.
> 
> We want isolation between qemu instances.

Understood, I agree.

Achieving separation via SELinux is easily done, with libvirt/sVirt already 
doing this for us automatically in most cases; the only thing we will want to 
do is make sure the SELinux policy is aware of the new permission.

Achieving separation via DAC should also be easily done, simply run each QEMU 
instance with a separate UID and/or GID.

> Giving qemu right to open tun and SETIFF would give it rights
> to access any tun device.

I'm quickly looked at tun_chr_open() again and I don't see any special 
rights/privileges required, the same for tun_chr_ioctl() and 
__tun_chr_ioctl().  Looking at tun_set_queue() I see we call tun_not_capable() 
which does a simple DAC check; it must have the same UID/GID or have 
CAP_NET_ADMIN.

I'm having a hard time seeing the problem you are describing; help me 
understand.

> There could also be user tun users we want them isolated from qemu.

Once again, should be possible using either SELinux, DAC, or both.

-- 
paul moore
security and virtualization @ redhat


^ permalink raw reply

* RE: ipgre rss is broken since gro
From: Eric Dumazet @ 2012-12-10 16:54 UTC (permalink / raw)
  To: Dmitry Kravkov; +Cc: Eric Dumazet, netdev@vger.kernel.org
In-Reply-To: <504C9EFCA2D0054393414C9CB605C37F1BFC104B@SJEXCHMB06.corp.ad.broadcom.com>

On Mon, 2012-12-10 at 11:32 +0000, Dmitry Kravkov wrote:

> CPU is not loaded at all 
> 
> > cat /proc/net/softnet_stat
> Please find attached.
> 
> For gre interface RX and DROP statistics are advancing simultaneously (by one each ICMP request):
> 
> [root@ ~]# ifconfig gre
> gre       Link encap:UNSPEC  HWaddr C0-A8-0A-40-73-72-83-D2-00-00-00-00-00-00-00-00
>           inet addr:8.0.0.1  P-t-P:8.0.0.1  Mask:255.255.255.0
>           inet6 addr: fe80::5efe:c0a8:a40/64 Scope:Link
>           UP POINTOPOINT RUNNING NOARP  MTU:1476  Metric:1
>           RX packets:1646824 errors:0 dropped:51610 overruns:0 frame:0
>           TX packets:140519 errors:1 dropped:0 overruns:0 carrier:1
>           collisions:0 txqueuelen:0
>           RX bytes:2357650904 (2.1 GiB)  TX bytes:7309072 (6.9 MiB)


dropped:51610  so obviously  one cpu is fully loaded.

I believe performance problem might come from the
skb_set_queue_mapping(skb, 0); in __skb_tunnel_rx()

So all packets are queued into a single GRO queue, instead of being
split as intended in multiple queues.

I cant find why we must clear queue_mapping, so could you try :


diff --git a/include/net/dst.h b/include/net/dst.h
index 9a78810..4cb27df 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -329,7 +329,6 @@ static inline void __skb_tunnel_rx(struct sk_buff *skb, struct net_device *dev)
 	 */
 	if (!skb->l4_rxhash)
 		skb->rxhash = 0;
-	skb_set_queue_mapping(skb, 0);
 	skb_dst_drop(skb);
 	nf_reset(skb);
 }

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox