Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [RFC PATCH v2 3/3] tun: fix LSM/SELinux labeling of tun/tap devices
From: Michael S. Tsirkin @ 2012-12-10 17:50 UTC (permalink / raw)
  To: Paul Moore; +Cc: netdev, linux-security-module, selinux, jasowang
In-Reply-To: <3124654.2UMIXvF0vN@sifl>

On Mon, Dec 10, 2012 at 12:33:49PM -0500, Paul Moore wrote:
> On Monday, December 10, 2012 07:26:56 PM Michael S. Tsirkin wrote:
> > On Mon, Dec 10, 2012 at 12:04:35PM -0500, Paul Moore wrote:
> > > On Friday, December 07, 2012 02:25:16 PM Michael S. Tsirkin wrote:
> > > > On Thu, Dec 06, 2012 at 04:09:51PM -0500, Paul Moore wrote:
> > > > > On Thursday, December 06, 2012 10:57:16 PM Michael S. Tsirkin wrote:
> > > > > > On Thu, Dec 06, 2012 at 11:56:45AM -0500, Paul Moore wrote:
> > > > > > > The SETQUEUE/tun_socket:create_queue permissions do not yet exist
> > > > > > > in any released SELinux policy as we are just now adding them with
> > > > > > > this patchset. With current policies loaded into a kernel with
> > > > > > > this patchset applied the SETQUEUE/tun_socket:create_queue
> > > > > > > permission would be treated according to the policy's unknown
> > > > > > > permission setting.
> > > > > > 
> > > > > > OK I think we need to rethink what we are doing here: what you sent
> > > > > > addresses the problem as stated but I think we mis-stated it.  Let
> > > > > > me try to restate the problem: it is not just selinux problem. Let's
> > > > > > assume qemu wants to use tun, I (libvirt) don't want to run it as
> > > > > > root.
> > > > > > 
> > > > > > 1. TUNSETIFF: I can open tun, attach an fd and pass it to qemu.
> > > > > > Now, qemu does not invoke TUNSETIFF so it can run without
> > > > > > kernel priveledges.
> > > > > 
> > > > > Correct me if I'm wrong, but I believe libvirt does this while running
> > > > > as root.  Assuming that is the case, why not simply setuid()/setgid()
> > > > > to the same credentials as the QEMU instance before creating the TUN
> > > > > device? You can always (re)configure the device afterwards while
> > > > > running as root/CAP_NET_ADMIN.
> > > > 
> > > > We want isolation between qemu instances.
> > > 
> > > Understood, I agree.
> > > 
> > > Achieving separation via SELinux is easily done, with libvirt/sVirt
> > > already doing this for us automatically in most cases; the only thing we
> > > will want to do is make sure the SELinux policy is aware of the new
> > > permission.
> > > 
> > > Achieving separation via DAC should also be easily done, simply run each
> > > QEMU instance with a separate UID and/or GID.
> > > 
> > > > Giving qemu right to open tun and SETIFF would give it rights
> > > > to access any tun device.
> > > 
> > > I'm quickly looked at tun_chr_open() again and I don't see any special
> > > rights/privileges required, the same for tun_chr_ioctl() and
> > > __tun_chr_ioctl().  Looking at tun_set_queue() I see we call
> > > tun_not_capable() which does a simple DAC check; it must have the same
> > > UID/GID or have CAP_NET_ADMIN.
> > > 
> > > I'm having a hard time seeing the problem you are describing; help me
> > > understand.
> > 
> > The issue is guest controls the number of queues in use.
> > So qemu would be required to be allowed to call tun_set_queue.
> > If we allow this we have a problem as one qemu will be
> > able to access any tun.
> 
> QEMU can call tun_set_queue() as long as it satisfies tun_not_capable(), which 
> from a practical point of view means that the TUN device was created with the 
> same UID/GID as the QEMU instance.  If you want TUN device separation between 
> QEMU instances using DAC you need to run each QEMU instance with a different 
> UID/GID (which you should be doing anyway if you want DAC enforced general 
> separation).
> 
> I believe I've stated this point several times now and I don't feel you've 
> addressed it properly.

Look at how it works at the moment:
a priveledged libvirt server calls tun_set_iff
and passes the fd to qemu which is not priveledged.

The result is isolation between qemu instances without
need to create uid per qemu instance.

How do we create multiple queues? It makes sense to
follow this model and pass in fds for individual queues.
However they need to be disabled initially
so libvirt can not do tun_set_queue for us.
When qemu later calls tun_set_queue it will fail which means we
can't utilize multiqueue.

My solution is an unpriveledged variant
of tun_set_queue that only enables/disables
a queue without attach/detach.


> -- 
> paul moore
> security and virtualization @ redhat

^ permalink raw reply

* netconsole fun
From: Peter Hurley @ 2012-12-10 18:00 UTC (permalink / raw)
  To: netdev

Is there a recommended method of running netconsole on a machine that
has a slaved device?

I ask because it seems pretty difficult to get netconsole running even
on a machine that has multiple physical interfaces, only one of which is
slaved.

I scoured the documentation but didn't find anything relevant (does
documentation/networking/netconsole.txt need a patch?). AFAICT, the last
discussion ended here back in June '11
http://lkml.indiana.edu/hypermail/linux/kernel/1106.1/03185.html 

Any help or pointers provided would be much appreciated.

Regards,
Peter Hurley

^ permalink raw reply

* Re: [PATCH net-next v3] tipc: sk_recv_queue size check only for connectionless sockets
From: Neil Horman @ 2012-12-10 18:22 UTC (permalink / raw)
  To: Jon Maloy; +Cc: Ying Xue, Paul.Gortmaker, erik.hugne, netdev, tipc-discussion
In-Reply-To: <50C60496.5040800@ericsson.com>

On Mon, Dec 10, 2012 at 10:49:42AM -0500, Jon Maloy wrote:
> On 12/10/2012 05:13 AM, Jon Maloy wrote:
> > On 12/10/2012 04:23 AM, Ying Xue wrote:
> >> The sk_receive_queue limit control is currently performed for all
> >> arriving messages, disregarding socket and message type. But for
> >> connectionless sockets this check is redundant, since the protocol
> >> flow already makes queue overflow impossible.
> >>
> >> We move the sk_receive_queue limit control so that it's only performed
> >> for connectionless sockets, i.e. SOCK_RDM and SOCK_DGRAM type sockets.
> >>
> >> However, as Neil Horman specified, we cannot simply force the socket
> >> receive queue limit against connectionless sockets as it may create a
> >> DoS vulnerability. For example, if a sender floods a receiver with
> >> messages containing an invalid set of message importance bits or
> >> CRITICAL importance, we will queue messages indefinitely.
> >>
> >> To avoid DoS attack, socket receive queue will be marked as overflow
> >> if we receive messages with invalid message importances, meanwhile,
> >> we also set one new threshold for CRITICAL importance messages.
> >>
> >> Signed-off-by: Ying Xue <ying.xue@windriver.com>
> >> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
> >> Cc: Neil Horman <nhorman@tuxdriver.com>
> >> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> >> ---
> >> v3 changes:
> >>  - set new threshold for CRITICAL message
> >>  - defined an importance factor table to avoid multiplication and
> >>    division operations in rx_queue_full().
> >>  - changed return value of rx_queue_full() from integer to boolean.
> >>
> >>  net/tipc/socket.c |   44 +++++++++++++++++++-------------------------
> >>  1 files changed, 19 insertions(+), 25 deletions(-)
> >>
> >> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> >> index 9b4e483..a18a757 100644
> >> --- a/net/tipc/socket.c
> >> +++ b/net/tipc/socket.c
> >> @@ -43,7 +43,7 @@
> >>  #define SS_LISTENING	-1	/* socket is listening */
> >>  #define SS_READY	-2	/* socket is connectionless */
> >>  
> >> -#define OVERLOAD_LIMIT_BASE	10000
> >> +#define OVERLOAD_LIMIT_BASE	5000
> >>  #define CONN_TIMEOUT_DEFAULT	8000	/* default connect timeout = 8s */
> >>  
> >>  struct tipc_sock {
> >> @@ -73,6 +73,13 @@ static struct proto tipc_proto;
> >>  
> >>  static int sockets_enabled;
> >>  
> >> +static const u32 msg_importance_factor[] = {
> >> +	OVERLOAD_LIMIT_BASE,		/* TIPC_LOW_IMPORTANCE limit */
> >> +	OVERLOAD_LIMIT_BASE * 2,	/* TIPC_MEDIUM_IMPORTANCE limit */
> >> +	OVERLOAD_LIMIT_BASE * 100,	/* TIPC_HIGH_IMPORTANCE limit */
> >> +	OVERLOAD_LIMIT_BASE * 200	/* TIPC_CRITICAL_IMPORTANCE limit */
> >> +	};
> >> +
> >>  /*
> >>   * Revised TIPC socket locking policy:
> >>   *
> >> @@ -1158,28 +1165,17 @@ static void tipc_data_ready(struct sock *sk, int len)
> >>   * rx_queue_full - determine if receive queue can accept another message
> >>   * @msg: message to be added to queue
> >>   * @queue_size: current size of queue
> >> - * @base: nominal maximum size of queue
> >>   *
> >> - * Returns 1 if queue is unable to accept message, 0 otherwise
> >> + * Returns true if queue is unable to accept message, false otherwise
> >>   */
> >> -static int rx_queue_full(struct tipc_msg *msg, u32 queue_size, u32 base)
> >> +static bool rx_queue_full(struct tipc_msg *msg, u32 queue_size)
> >>  {
> >> -	u32 threshold;
> >>  	u32 imp = msg_importance(msg);
> >>  
> >> -	if (imp == TIPC_LOW_IMPORTANCE)
> >> -		threshold = base;
> >> -	else if (imp == TIPC_MEDIUM_IMPORTANCE)
> >> -		threshold = base * 2;
> >> -	else if (imp == TIPC_HIGH_IMPORTANCE)
> >> -		threshold = base * 100;
> >> -	else
> >> -		return 0;
> >> +	if (unlikely(imp > TIPC_CRITICAL_IMPORTANCE))
> >> +		return true;
> > 
> > This test is not necessary. Such messages have already been filtered out
> > in tipc_recv_msg() at link level.
> > The test msg_isdata(), which determines if a message should be sent up to
> > the port/socket level, is  also an implicit test that 
> > importance < TIPC_CRITICAL_IMPORTANCE.
> 
> (importance <= TIPC_CRITICAL_IMPORTANCE), of course.
> This is an effect of the co-location of the user and importance fields in the
> TIPC header. I.e., the importance is in reality coded into the value of
> the user field.
> 
> To clarify (and improve) my previous suggestion, what I had in mind
> was something like this:
> 
> 
>                 recv_q_len = skb_queue_len(&sk->sk_receive_queue);
> -               if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {
> -                       if (rx_queue_full(msg, recv_q_len,
> -                           OVERLOAD_LIMIT_BASE / 2))
> -                               return TIPC_ERR_OVERLOAD;
> -               }
> +               imp = msg_importance(msg);
> +               if (unlikely(recv_q_len > (OVERLOAD_LIMIT_BASE << (imp << 1))))
> +                       return TIPC_ERR_OVERLOAD;                 
>         } else {
>                 if (msg_mcast(msg))
>                         return TIPC_ERR_NO_PORT;
> 
> No need for any separate function at all, and guaranteed inline.
> This will probably translate to one single shift-instruction extra,
> and no memor access, since the compiler merge (imp << 1) with the 
> shift operation used to read imp from the header.
> 
> 
> The limits for message rejection become the following,
> given an OVERLOAD_LIMIT_BASE of 10000:
> 
> TIP_LOW_IMPORTANCE:       10000    (previously 10000)
> TIPC_MEDIUM_IMPORTANCE    40000    (previously 20000)
> TIPC_MEDIUM_IMPORTANCE    1600000  (previously 1000000)
> TIPC_CRITICAL_IMPORTANCE  25600000 (previously no limit)
> 
> ///jon
> 
Sure, this will work nicely as well.
Neil

^ permalink raw reply

* RE: ixgbe:  pci_get_device() call without counterpart call of pci_dev_put()
From: Rose, Gregory V @ 2012-12-10 18:24 UTC (permalink / raw)
  To: Kirsher, Jeffrey T, Elena Gurevich
  Cc: netdev, e1000-devel@lists.sourceforge.net, Skidmore, Donald C
In-Reply-To: <50C58C98.2030509@gmail.com>

> -----Original Message-----
> From: Jeff Kirsher [mailto:tarbal@gmail.com]
> Sent: Sunday, December 09, 2012 11:18 PM
> To: Elena Gurevich
> Cc: netdev; e1000-devel@lists.sourceforge.net; Rose, Gregory V; Skidmore,
> Donald C
> Subject: Re: ixgbe: pci_get_device() call without counterpart call of
> pci_dev_put()
> 
> On 12/09/2012 01:47 AM, Elena Gurevich wrote:
> > Hi all,
> > I am pioneer in linux device drivers here and using Intel 82599 NIC as
> > reference model, During investigation to drivers sources I found the
> > suspicious code:
> >
> > Is code sequence (1)   and (2) the possible device reference count
> leakage
> > ???
> >
> >
> > Thanks a lot in advance
> > Lena
> >
> > --------snipped from ixgbe_main.c  file function
> > ixgbe_io_error_detected()
> > -----------
> >
> > . . .
> > 		/* Find the pci device of the offending VF */
> > 		vfdev = pci_get_device(PCI_VENDOR_ID_INTEL, device_id, NULL);
> > 		while (vfdev) {
> > 			if (vfdev->devfn == (req_id & 0xFF))
> > 				break;
> > <------------------------------      (1) leaves the loop with successful
> get
> > call  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> >
> > 			vfdev = pci_get_device(PCI_VENDOR_ID_INTEL,
> > 					       device_id, vfdev);
> > 		}
> > 		/*
> > 		 * There's a slim chance the VF could have been hot plugged,
> > 		 * so if it is no longer present we don't need to issue the
> > 		 * VFLR.  Just clean up the AER in that case.
> > 		 */
> > 		if (vfdev) {
> > 			e_dev_err("Issuing VFLR to VF %d\n", vf);
> > 			pci_write_config_dword(vfdev, 0xA8, 0x00008000);
> > 		}
> >
> > 		pci_cleanup_aer_uncorrect_error_status(pdev);
> > 	}
> >
> > 	/*
> > 	 * Even though the error may have occurred on the other port
> > 	 * we still need to increment the vf error reference count for
> > 	 * both ports because the I/O resume function will be called
> > 	 * for both of them.
> > 	 */
> > 	adapter->vferr_refcount++;
> >
> > 	return PCI_ERS_RESULT_RECOVERED;
> > <--------------------------------------------  (2) leaves the function
> > without put call !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> > ----------------------------------  snipped
> > -----------------------------------
> >
> Added Greg Rose (ixgbevf maintainer) & Don Skidmore (ixgbe maintainer)
> 
> Since the code you have questions about is dealing with the vf, Greg will
> most likely be able to shed some light to your questions.

That's a bug.  I'll generate a patch to add the necessary pci_dev_put().

- Greg


^ permalink raw reply

* Re: [PATCH net-next v3 01/22] bnx2x: Support probing and removing of VF device
From: David Miller @ 2012-12-10 18:35 UTC (permalink / raw)
  To: ariele; +Cc: netdev, eilong
In-Reply-To: <1355154406-10855-2-git-send-email-ariele@broadcom.com>

From: "Ariel Elior" <ariele@broadcom.com>
Date: Mon, 10 Dec 2012 17:46:25 +0200

>  static void bnx2x_get_pcie_width_speed(struct bnx2x *bp, int *width, int *speed)
>  {
> -	u32 val = REG_RD(bp, PCICFG_OFFSET + PCICFG_LINK_CONTROL);
> +	u32 val = 0;
> +	pci_read_config_dword(bp->pdev, PCICFG_LINK_CONTROL, &val);

Please put an empty line between function local variable declarations
and actual code.

> @@ -12023,85 +12057,115 @@ static int bnx2x_get_num_non_def_sbs(struct pci_dev *pdev,
>  	 * If MSI-X is not supported - return number of SBs needed to support
>  	 * one fast path queue: one FP queue + SB for CNIC
>  	 */
> -	if (!pos)
> +	if (!pos) {
> +		pr_info("no msix capability found");
>  		return 1 + cnic_cnt;
> +	}
> +
> +	pr_info("msix capability found");
>  

Use dev_info(), netdev_info(), or similar, rather than plain
pr_info().

^ permalink raw reply

* Re: [PATCH net-next v3 02/22] bnx2x: VF <-> PF channel 'acquire' at vf probe
From: David Miller @ 2012-12-10 18:37 UTC (permalink / raw)
  To: ariele; +Cc: netdev, eilong
In-Reply-To: <1355154406-10855-3-git-send-email-ariele@broadcom.com>

From: "Ariel Elior" <ariele@broadcom.com>
Date: Mon, 10 Dec 2012 17:46:26 +0200

> +/* VF MBX (aka vf-pf channel) */
> +#include "bnx2x.h"
> +#include "bnx2x_sriov.h"
> +/* place a given tlv on the tlv buffer at a given offset */
> +void bnx2x_add_tlv(struct bnx2x *bp, void *tlvs_list, u16 offset, u16 type,
> +		   u16 length)

Put an empty line between include directives and the rest of the
file.

> +	while (tlv->type != CHANNEL_TLV_LIST_END) {
> +
> +		/* output tlv */
> +		DP(BNX2X_MSG_IOV, "TLV number %d: type %d, length %d\n", i,
> +		   tlv->type, tlv->length);

But this is a place where an empty line is not appropriate, please remove
the empty line after the one with the while().

> +/* acquire response tlv - carries the allocated resources */
> +struct pfvf_acquire_resp_tlv {
> +	struct pfvf_tlv hdr;
> +	struct pf_vf_pfdev_info {
> +		u32 chip_num;
> +		u32 pf_cap;
> +		#define PFVF_CAP_RSS		0x00000001
> +		#define PFVF_CAP_DHC		0x00000002
> +		#define PFVF_CAP_TPA		0x00000004

Please don't indent CPP defines excessively like this, they should be
at or near the initial column so that CPP directives can be easily
spotted by someone reading this code.

^ permalink raw reply

* Re: [PATCH net-next v3 03/22] bnx2x: Add to VF <-> PF channel the release request
From: David Miller @ 2012-12-10 18:39 UTC (permalink / raw)
  To: ariele; +Cc: netdev, eilong
In-Reply-To: <1355154406-10855-4-git-send-email-ariele@broadcom.com>

From: "Ariel Elior" <ariele@broadcom.com>
Date: Mon, 10 Dec 2012 17:46:27 +0200

> +	/* PF released us */
> +	if (resp->hdr.status == PFVF_STATUS_SUCCESS) {
> +		DP(BNX2X_MSG_SP, "vf released\n");
> +
> +	/* PF reports error */
> +	} else {

Seriously?  Do I even need to tell you how awful this looks?

Remove the empty line and put that comment before the "else"
somewhere where it doesn't have to have indentation which does
not match the basic block it is in.

I think I've reviewed enough of this series, please audit the
rest of coding style problems yourself.

^ permalink raw reply

* Re: [PATCH] ipv4: ip_check_defrag must not modify skb before unsharing
From: David Miller @ 2012-12-10 18:41 UTC (permalink / raw)
  To: johannes; +Cc: eric, netdev, linux-wireless, linville, eric.dumazet
In-Reply-To: <1355132466.9857.6.camel@jlt4.sipsolutions.net>

From: Johannes Berg <johannes@sipsolutions.net>
Date: Mon, 10 Dec 2012 10:41:06 +0100

> From: Johannes Berg <johannes.berg@intel.com>
> 
> ip_check_defrag() might be called from af_packet within the
> RX path where shared SKBs are used, so it must not modify
> the input SKB before it has unshared it for defragmentation.
> Use skb_copy_bits() to get the IP header and only pull in
> everything later.
> 
> The same is true for the other caller in macvlan as it is
> called from dev->rx_handler which can also get a shared SKB.
> 
> Reported-by: Eric Leblond <eric@regit.org>
> Cc: stable@vger.kernel.org
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> ---
> For some versions of the kernel, this code goes into af_packet.c

So the bug is that ip_check_defrag() has a precondition which is met
properly by all callers except AF_PACKET.

If this is the case, remind me why are we changing ip_check_defrag()
rather than the violator of the precondition?

^ permalink raw reply

* Re: [RFC PATCH v2 3/3] tun: fix LSM/SELinux labeling of tun/tap devices
From: Eric Paris @ 2012-12-10 18:42 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Paul Moore, Linux Netdev List, LSM List, SE-Linux, jasowang
In-Reply-To: <20121210175035.GA31856@redhat.com>

Let me abstract a little here Paul.  Lets say user A starts an
unclassified process and a top secret process.  SELinux policy darn
well better be able to enforce that they can not attach to the same
tun.

Am I missing something here?

On Mon, Dec 10, 2012 at 12:50 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Mon, Dec 10, 2012 at 12:33:49PM -0500, Paul Moore wrote:
>> On Monday, December 10, 2012 07:26:56 PM Michael S. Tsirkin wrote:
>> > On Mon, Dec 10, 2012 at 12:04:35PM -0500, Paul Moore wrote:
>> > > On Friday, December 07, 2012 02:25:16 PM Michael S. Tsirkin wrote:
>> > > > On Thu, Dec 06, 2012 at 04:09:51PM -0500, Paul Moore wrote:
>> > > > > On Thursday, December 06, 2012 10:57:16 PM Michael S. Tsirkin wrote:
>> > > > > > On Thu, Dec 06, 2012 at 11:56:45AM -0500, Paul Moore wrote:
>> > > > > > > The SETQUEUE/tun_socket:create_queue permissions do not yet exist
>> > > > > > > in any released SELinux policy as we are just now adding them with
>> > > > > > > this patchset. With current policies loaded into a kernel with
>> > > > > > > this patchset applied the SETQUEUE/tun_socket:create_queue
>> > > > > > > permission would be treated according to the policy's unknown
>> > > > > > > permission setting.
>> > > > > >
>> > > > > > OK I think we need to rethink what we are doing here: what you sent
>> > > > > > addresses the problem as stated but I think we mis-stated it.  Let
>> > > > > > me try to restate the problem: it is not just selinux problem. Let's
>> > > > > > assume qemu wants to use tun, I (libvirt) don't want to run it as
>> > > > > > root.
>> > > > > >
>> > > > > > 1. TUNSETIFF: I can open tun, attach an fd and pass it to qemu.
>> > > > > > Now, qemu does not invoke TUNSETIFF so it can run without
>> > > > > > kernel priveledges.
>> > > > >
>> > > > > Correct me if I'm wrong, but I believe libvirt does this while running
>> > > > > as root.  Assuming that is the case, why not simply setuid()/setgid()
>> > > > > to the same credentials as the QEMU instance before creating the TUN
>> > > > > device? You can always (re)configure the device afterwards while
>> > > > > running as root/CAP_NET_ADMIN.
>> > > >
>> > > > We want isolation between qemu instances.
>> > >
>> > > Understood, I agree.
>> > >
>> > > Achieving separation via SELinux is easily done, with libvirt/sVirt
>> > > already doing this for us automatically in most cases; the only thing we
>> > > will want to do is make sure the SELinux policy is aware of the new
>> > > permission.
>> > >
>> > > Achieving separation via DAC should also be easily done, simply run each
>> > > QEMU instance with a separate UID and/or GID.
>> > >
>> > > > Giving qemu right to open tun and SETIFF would give it rights
>> > > > to access any tun device.
>> > >
>> > > I'm quickly looked at tun_chr_open() again and I don't see any special
>> > > rights/privileges required, the same for tun_chr_ioctl() and
>> > > __tun_chr_ioctl().  Looking at tun_set_queue() I see we call
>> > > tun_not_capable() which does a simple DAC check; it must have the same
>> > > UID/GID or have CAP_NET_ADMIN.
>> > >
>> > > I'm having a hard time seeing the problem you are describing; help me
>> > > understand.
>> >
>> > The issue is guest controls the number of queues in use.
>> > So qemu would be required to be allowed to call tun_set_queue.
>> > If we allow this we have a problem as one qemu will be
>> > able to access any tun.
>>
>> QEMU can call tun_set_queue() as long as it satisfies tun_not_capable(), which
>> from a practical point of view means that the TUN device was created with the
>> same UID/GID as the QEMU instance.  If you want TUN device separation between
>> QEMU instances using DAC you need to run each QEMU instance with a different
>> UID/GID (which you should be doing anyway if you want DAC enforced general
>> separation).
>>
>> I believe I've stated this point several times now and I don't feel you've
>> addressed it properly.
>
> Look at how it works at the moment:
> a priveledged libvirt server calls tun_set_iff
> and passes the fd to qemu which is not priveledged.
>
> The result is isolation between qemu instances without
> need to create uid per qemu instance.
>
> How do we create multiple queues? It makes sense to
> follow this model and pass in fds for individual queues.
> However they need to be disabled initially
> so libvirt can not do tun_set_queue for us.
> When qemu later calls tun_set_queue it will fail which means we
> can't utilize multiqueue.
>
> My solution is an unpriveledged variant
> of tun_set_queue that only enables/disables
> a queue without attach/detach.
>
>
>> --
>> paul moore
>> security and virtualization @ redhat
> --
> To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] ipv4: ip_check_defrag must not modify skb before unsharing
From: Johannes Berg @ 2012-12-10 18:45 UTC (permalink / raw)
  To: David Miller; +Cc: eric, netdev, linux-wireless, linville, eric.dumazet
In-Reply-To: <20121210.134146.1583909966821253233.davem@davemloft.net>

On Mon, 2012-12-10 at 13:41 -0500, David Miller wrote:
> From: Johannes Berg <johannes@sipsolutions.net>
> Date: Mon, 10 Dec 2012 10:41:06 +0100
> 
> > From: Johannes Berg <johannes.berg@intel.com>
> > 
> > ip_check_defrag() might be called from af_packet within the
> > RX path where shared SKBs are used, so it must not modify
> > the input SKB before it has unshared it for defragmentation.
> > Use skb_copy_bits() to get the IP header and only pull in
> > everything later.
> > 
> > The same is true for the other caller in macvlan as it is
> > called from dev->rx_handler which can also get a shared SKB.
> > 
> > Reported-by: Eric Leblond <eric@regit.org>
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> > ---
> > For some versions of the kernel, this code goes into af_packet.c
> 
> So the bug is that ip_check_defrag() has a precondition which is met
> properly by all callers except AF_PACKET.
> 
> If this is the case, remind me why are we changing ip_check_defrag()
> rather than the violator of the precondition?

I don't think this is the case.

If you're referring to my note about af_packet: the kernels where this
goes into af_packet.c are the kernels that don't even have
ip_check_defrag() because macvlan didn't exist/didn't have ip defrag
support and af_packet had this code there -- see commit bc416d9768a.

If you're not referring to my note about af_packet: both callers (there
are only two) of ip_check_defrag() have this bug as far as I can tell
because they're both in the part of the RX path where shared SKBs might
happen.

johannes

^ permalink raw reply

* Re: [Patch net-next] bridge: fix seq check in br_mdb_dump()
From: David Miller @ 2012-12-10 18:46 UTC (permalink / raw)
  To: amwang; +Cc: netdev, herbert, shemminger, tgraf, brouer
In-Reply-To: <1355141735-19576-1-git-send-email-amwang@redhat.com>

From: Cong Wang <amwang@redhat.com>
Date: Mon, 10 Dec 2012 20:15:35 +0800

> From: Cong Wang <amwang@redhat.com>
> 
> In case of rehashing, introduce a global variable 'br_mdb_rehash_seq'
> which gets increased every time when rehashing, and assign
> net->dev_base_seq + br_mdb_rehash_seq to cb->seq.
> 
> In theory cb->seq could be wrapped to zero, but this is not
> easy to fix, as net->dev_base_seq is not visible inside
> br_mdb_rehash(). In practice, this is rare.
> 
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Stephen Hemminger <shemminger@vyatta.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Thomas Graf <tgraf@suug.ch>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Signed-off-by: Cong Wang <amwang@redhat.com>

No synchronization at all is applied to this variable, I can't
see how this is OK.

^ permalink raw reply

* Re: [Patch net-next] virtio_net: fix a typo in virtnet_alloc_queues()
From: David Miller @ 2012-12-10 18:46 UTC (permalink / raw)
  To: amwang; +Cc: netdev, jasowang
In-Reply-To: <1355142248-19987-1-git-send-email-amwang@redhat.com>

From: Cong Wang <amwang@redhat.com>
Date: Mon, 10 Dec 2012 20:24:08 +0800

> Obviously it should check !vi->rq.
> 
> Reported-by: Fengguang Wu <fengguang.wu@intel.com> 
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Cong Wang <amwang@redhat.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] ipv4: ip_check_defrag must not modify skb before unsharing
From: David Miller @ 2012-12-10 18:50 UTC (permalink / raw)
  To: johannes-cdvu00un1VgdHxzADdlk8Q
  Cc: eric-EVVnsjFE0OfYtjvyW6yDsg, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	linville-2XuSBdqkA4R54TAoqtyWWQ,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w
In-Reply-To: <1355165152.8083.4.camel-8Nb76shvtaUJvtFkdXX2HixXY32XiHfO@public.gmane.org>

From: Johannes Berg <johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
Date: Mon, 10 Dec 2012 19:45:52 +0100

> On Mon, 2012-12-10 at 13:41 -0500, David Miller wrote:
>> So the bug is that ip_check_defrag() has a precondition which is met
>> properly by all callers except AF_PACKET.
>> 
>> If this is the case, remind me why are we changing ip_check_defrag()
>> rather than the violator of the precondition?
> 
> I don't think this is the case.
> 
> If you're referring to my note about af_packet: the kernels where this
> goes into af_packet.c are the kernels that don't even have
> ip_check_defrag() because macvlan didn't exist/didn't have ip defrag
> support and af_packet had this code there -- see commit bc416d9768a.
> 
> If you're not referring to my note about af_packet: both callers (there
> are only two) of ip_check_defrag() have this bug as far as I can tell
> because they're both in the part of the RX path where shared SKBs might
> happen.

You're right, I misinterpreted what's happening here.

My misunderstanding was that this was a situation where normal IPV4
input processing makes sure the SKB is unshared, and we had special
code paths that didn't make sure that was the case.

Rather, here, we have a special entrypoint for macvlan and AF_PACKET
which is supposed to take care of such issues since it is known to
execute in a different kind of environment.

I'm pretty sure I'll apply this, after I check a few more things,
thanks Johannes!
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next v3 01/22] bnx2x: Support probing and removing of VF device
From: Joe Perches @ 2012-12-10 18:50 UTC (permalink / raw)
  To: David Miller; +Cc: ariele, netdev, eilong
In-Reply-To: <20121210.133519.2048543022289027034.davem@davemloft.net>

On Mon, 2012-12-10 at 13:35 -0500, David Miller wrote:
> From: "Ariel Elior" <ariele@broadcom.com>
> Date: Mon, 10 Dec 2012 17:46:25 +0200
[]
> > @@ -12023,85 +12057,115 @@ static int bnx2x_get_num_non_def_sbs(struct pci_dev *pdev,
> >        * If MSI-X is not supported - return number of SBs needed to support
> >        * one fast path queue: one FP queue + SB for CNIC
> >        */
> > -     if (!pos)
> > +     if (!pos) {
> > +             pr_info("no msix capability found");
[]
> Use dev_info(), netdev_info(), or similar, rather than plain
> pr_info().

Also, please add terminating newlines to avoid
dmesg message interleaving.

^ permalink raw reply

* Re: ipgre rss is broken since gro
From: David Miller @ 2012-12-10 19:02 UTC (permalink / raw)
  To: eric.dumazet; +Cc: dmitry, edumazet, netdev, therbert
In-Reply-To: <1355158471.27891.44.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 10 Dec 2012 08:54:31 -0800

> I believe performance problem might come from the
> skb_set_queue_mapping(skb, 0); in __skb_tunnel_rx()
> 
> So all packets are queued into a single GRO queue, instead of being
> split as intended in multiple queues.
> 
> I cant find why we must clear queue_mapping, so could you try :

Tom says:

commit 693019e90ca45d881109d32c0c6d29adf03f6447
Author: Tom Herbert <therbert@google.com>
Date:   Thu Sep 23 11:19:54 2010 +0000

    net: reset skb queue mapping when rx'ing over tunnel
    
    Reset queue mapping when an skb is reentering the stack via a tunnel.
    On second pass, the queue mapping from the original device is no
    longer valid.
    
    Signed-off-by: Tom Herbert <therbert@google.com>
    Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/include/net/dst.h b/include/net/dst.h
index 81d1413..0238650 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -242,6 +242,7 @@ static inline void skb_tunnel_rx(struct sk_buff *skb, struct net_device *dev)
 	dev->stats.rx_packets++;
 	dev->stats.rx_bytes += skb->len;
 	skb->rxhash = 0;
+	skb_set_queue_mapping(skb, 0);
 	skb_dst_drop(skb);
 	nf_reset(skb);
 }

^ permalink raw reply related

* Re: [PATCH] net: Allow DCBnl to use other namespaces besides init_net
From: David Miller @ 2012-12-10 19:09 UTC (permalink / raw)
  To: john.fastabend; +Cc: netdev, ebiederm
In-Reply-To: <20121210064813.10883.88194.stgit@nitbit.x32>

From: John Fastabend <john.fastabend@gmail.com>
Date: Sun, 09 Dec 2012 22:48:13 -0800

> Allow DCB and net namespace to work together. This is useful if you
> have containers that are bound to 'phys' interfaces that want to
> also manage their DCB attributes.
> 
> The net namespace is taken from sock_net(skb->sk) of the netlink skb.
> 
> CC: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

Applied.

^ permalink raw reply

* Re: [PATCH RESEND] net: remove obsolete simple_strto<foo>
From: David Miller @ 2012-12-10 19:10 UTC (permalink / raw)
  To: abhi.c.pawar
  Cc: pablo, kaber, kuznet, jmorris, yoshfuji, linville, johannes,
	amwang, edumazet, nhorman, joe, netdev, linux-kernel,
	netfilter-devel, netfilter, coreteam, linux-wireless
In-Reply-To: <1355130748-7828-1-git-send-email-abhi.c.pawar@gmail.com>

From: Abhijit Pawar <abhi.c.pawar@gmail.com>
Date: Mon, 10 Dec 2012 14:42:28 +0530

> This patch replace the obsolete simple_strto<foo> with kstrto<foo>
> 
> Signed-off-by: Abhijit Pawar <abhi.c.pawar@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] smsc75xx: only set mac address once on bind
From: David Miller @ 2012-12-10 19:10 UTC (permalink / raw)
  To: steve.glendinning; +Cc: netdev, bjorn, dcbw
In-Reply-To: <1355137279-2695-1-git-send-email-steve.glendinning@shawell.net>

From: Steve Glendinning <steve.glendinning@shawell.net>
Date: Mon, 10 Dec 2012 11:01:19 +0000

> This patch changes when we decide what the device's MAC address
> is from per ifconfig up to once when the device is connected.
> 
> Without this patch, a manually forced device MAC is overwritten
> on ifconfig down/up.  Also devices that have no EEPROM are
> assigned a new random address on ifconfig down/up instead of
> persisting the same one.
> 
> Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
> Reported-by: Robert Cunningham <rcunningham@nsmsurveillance.com>

Applied.

^ permalink raw reply

* Re: [PATCH 1/2] smsc95xx: fix register dump of last register
From: David Miller @ 2012-12-10 19:10 UTC (permalink / raw)
  To: steve.glendinning; +Cc: netdev
In-Reply-To: <1355137388-2938-1-git-send-email-steve.glendinning@shawell.net>

From: Steve Glendinning <steve.glendinning@shawell.net>
Date: Mon, 10 Dec 2012 11:03:07 +0000

> This patch fixes the ethtool register dump for smsc95xx to dump
> all 4 bytes of the final register (COE_CR) instead of just the
> first byte.
> 
> Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>

Applied.

^ permalink raw reply

* Re: [PATCH 2/2] smsc95xx: fix async register writes on big endian platforms
From: David Miller @ 2012-12-10 19:10 UTC (permalink / raw)
  To: steve.glendinning; +Cc: netdev
In-Reply-To: <1355137388-2938-2-git-send-email-steve.glendinning@shawell.net>

From: Steve Glendinning <steve.glendinning@shawell.net>
Date: Mon, 10 Dec 2012 11:03:08 +0000

> This patch fixes a missing endian conversion which results in the
> interface failing to come up on BE platforms.
> 
> It also removes an unnecessary pointer dereference from this
> function.
> 
> Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>

Applied.

^ permalink raw reply

* [PATCH v2 03/10] net: smc91x: use io{read,write}*_rep accessors instead of string functions
From: Will Deacon @ 2012-12-10 19:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, benh, arnd, james.hogan, Will Deacon, Nicolas Pitre,
	netdev
In-Reply-To: <1355166762-15133-1-git-send-email-will.deacon@arm.com>

The {read,write}s{b,w,l} operations are not defined by all architectures
and are being removed from the asm-generic/io.h interface.

This patch replaces the usage of these string functions in the default
SMC accessors with io{read,write}{8,16,32}_rep calls instead, which are
defined for all architectures.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 drivers/net/ethernet/smsc/smc91x.h | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/smsc/smc91x.h b/drivers/net/ethernet/smsc/smc91x.h
index 5f53fbb..370e13d 100644
--- a/drivers/net/ethernet/smsc/smc91x.h
+++ b/drivers/net/ethernet/smsc/smc91x.h
@@ -286,16 +286,16 @@ static inline void mcf_outsw(void *a, unsigned char *p, int l)
 
 #define SMC_IO_SHIFT		(lp->io_shift)
 
-#define SMC_inb(a, r)		readb((a) + (r))
-#define SMC_inw(a, r)		readw((a) + (r))
-#define SMC_inl(a, r)		readl((a) + (r))
-#define SMC_outb(v, a, r)	writeb(v, (a) + (r))
-#define SMC_outw(v, a, r)	writew(v, (a) + (r))
-#define SMC_outl(v, a, r)	writel(v, (a) + (r))
-#define SMC_insw(a, r, p, l)	readsw((a) + (r), p, l)
-#define SMC_outsw(a, r, p, l)	writesw((a) + (r), p, l)
-#define SMC_insl(a, r, p, l)	readsl((a) + (r), p, l)
-#define SMC_outsl(a, r, p, l)	writesl((a) + (r), p, l)
+#define SMC_inb(a, r)		ioread8((a) + (r))
+#define SMC_inw(a, r)		ioread16((a) + (r))
+#define SMC_inl(a, r)		ioread32((a) + (r))
+#define SMC_outb(v, a, r)	iowrite8(v, (a) + (r))
+#define SMC_outw(v, a, r)	iowrite16(v, (a) + (r))
+#define SMC_outl(v, a, r)	iowrite32(v, (a) + (r))
+#define SMC_insw(a, r, p, l)	ioread16_rep((a) + (r), p, l)
+#define SMC_outsw(a, r, p, l)	iowrite16_rep((a) + (r), p, l)
+#define SMC_insl(a, r, p, l)	ioread32_rep((a) + (r), p, l)
+#define SMC_outsl(a, r, p, l)	iowrite32_rep((a) + (r), p, l)
 
 #define RPC_LSA_DEFAULT		RPC_LED_100_10
 #define RPC_LSB_DEFAULT		RPC_LED_TX_RX
-- 
1.8.0

^ permalink raw reply related

* [PATCH v2 04/10] net: smc911x: use io{read,write}*_rep accessors
From: Will Deacon @ 2012-12-10 19:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, benh, arnd, james.hogan, Matthew Leach, netdev,
	Will Deacon
In-Reply-To: <1355166762-15133-1-git-send-email-will.deacon@arm.com>

From: Matthew Leach <matthew@mattleach.net>

The {read,write}s{b,w,l} operations are not defined by all
architectures and are being removed from the asm-generic/io.h
interface.

This patch replaces the usage of these string functions in the smc911x
accessors with io{read,write}{8,16,32}_rep calls instead.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Matthew Leach <matthew@mattleach.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 drivers/net/ethernet/smsc/smc911x.h | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/smsc/smc911x.h b/drivers/net/ethernet/smsc/smc911x.h
index 3269292..d51261b 100644
--- a/drivers/net/ethernet/smsc/smc911x.h
+++ b/drivers/net/ethernet/smsc/smc911x.h
@@ -159,12 +159,12 @@ static inline void SMC_insl(struct smc911x_local *lp, int reg,
 	void __iomem *ioaddr = lp->base + reg;
 
 	if (lp->cfg.flags & SMC911X_USE_32BIT) {
-		readsl(ioaddr, addr, count);
+		ioread32_rep(ioaddr, addr, count);
 		return;
 	}
 
 	if (lp->cfg.flags & SMC911X_USE_16BIT) {
-		readsw(ioaddr, addr, count * 2);
+		ioread16_rep(ioaddr, addr, count * 2);
 		return;
 	}
 
@@ -177,12 +177,12 @@ static inline void SMC_outsl(struct smc911x_local *lp, int reg,
 	void __iomem *ioaddr = lp->base + reg;
 
 	if (lp->cfg.flags & SMC911X_USE_32BIT) {
-		writesl(ioaddr, addr, count);
+		iowrite32_rep(ioaddr, addr, count);
 		return;
 	}
 
 	if (lp->cfg.flags & SMC911X_USE_16BIT) {
-		writesw(ioaddr, addr, count * 2);
+		iowrite16_rep(ioaddr, addr, count * 2);
 		return;
 	}
 
@@ -196,14 +196,14 @@ static inline void SMC_outsl(struct smc911x_local *lp, int reg,
 		 writew(v & 0xFFFF, (lp)->base + (r));	 \
 		 writew(v >> 16, (lp)->base + (r) + 2); \
 	 } while (0)
-#define SMC_insl(lp, r, p, l)	 readsw((short*)((lp)->base + (r)), p, l*2)
-#define SMC_outsl(lp, r, p, l)	 writesw((short*)((lp)->base + (r)), p, l*2)
+#define SMC_insl(lp, r, p, l)	 ioread16_rep((short*)((lp)->base + (r)), p, l*2)
+#define SMC_outsl(lp, r, p, l)	 iowrite16_rep((short*)((lp)->base + (r)), p, l*2)
 
 #elif	SMC_USE_32BIT
 #define SMC_inl(lp, r)		 readl((lp)->base + (r))
 #define SMC_outl(v, lp, r)	 writel(v, (lp)->base + (r))
-#define SMC_insl(lp, r, p, l)	 readsl((int*)((lp)->base + (r)), p, l)
-#define SMC_outsl(lp, r, p, l)	 writesl((int*)((lp)->base + (r)), p, l)
+#define SMC_insl(lp, r, p, l)	 ioread32_rep((int*)((lp)->base + (r)), p, l)
+#define SMC_outsl(lp, r, p, l)	 iowrite32_rep((int*)((lp)->base + (r)), p, l)
 
 #endif /* SMC_USE_16BIT */
 #endif /* SMC_DYNAMIC_BUS_CONFIG */
-- 
1.8.0

^ permalink raw reply related

* [PATCH v2 05/10] net: dm9000: use io{read,write}*_rep accessors
From: Will Deacon @ 2012-12-10 19:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, benh, arnd, james.hogan, Matthew Leach, David Miller,
	netdev, Will Deacon
In-Reply-To: <1355166762-15133-1-git-send-email-will.deacon@arm.com>

From: Matthew Leach <matthew@mattleach.net>

The {read,write}s{b,w,l} operations are not defined by all
architectures and are being removed from the asm-generic/io.h
interface.

This patch replaces the usage of these string functions in the default
DM9000 accessors with io{read,write}{8,16,32}_rep calls instead. This
is required as the dm9000 driver is in use by the blackfin
architecture which uses the asm-generic io accessors.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Matthew Leach <matthew@mattleach.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 drivers/net/ethernet/davicom/dm9000.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/davicom/dm9000.c b/drivers/net/ethernet/davicom/dm9000.c
index 36499d5..61648a6 100644
--- a/drivers/net/ethernet/davicom/dm9000.c
+++ b/drivers/net/ethernet/davicom/dm9000.c
@@ -193,35 +193,35 @@ iow(board_info_t * db, int reg, int value)
 
 static void dm9000_outblk_8bit(void __iomem *reg, void *data, int count)
 {
-	writesb(reg, data, count);
+	iowrite8_rep(reg, data, count);
 }
 
 static void dm9000_outblk_16bit(void __iomem *reg, void *data, int count)
 {
-	writesw(reg, data, (count+1) >> 1);
+	iowrite16_rep(reg, data, (count+1) >> 1);
 }
 
 static void dm9000_outblk_32bit(void __iomem *reg, void *data, int count)
 {
-	writesl(reg, data, (count+3) >> 2);
+	iowrite32_rep(reg, data, (count+3) >> 2);
 }
 
 /* input block from chip to memory */
 
 static void dm9000_inblk_8bit(void __iomem *reg, void *data, int count)
 {
-	readsb(reg, data, count);
+	ioread8_rep(reg, data, count);
 }
 
 
 static void dm9000_inblk_16bit(void __iomem *reg, void *data, int count)
 {
-	readsw(reg, data, (count+1) >> 1);
+	ioread16_rep(reg, data, (count+1) >> 1);
 }
 
 static void dm9000_inblk_32bit(void __iomem *reg, void *data, int count)
 {
-	readsl(reg, data, (count+3) >> 2);
+	ioread32_rep(reg, data, (count+3) >> 2);
 }
 
 /* dump block from chip to null */
-- 
1.8.0

^ permalink raw reply related

* [PATCH v2 06/10] net: 8390: use io{read,write}*_rep accessors
From: Will Deacon @ 2012-12-10 19:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, benh, arnd, james.hogan, Matthew Leach, David Miller,
	netdev, Will Deacon
In-Reply-To: <1355166762-15133-1-git-send-email-will.deacon@arm.com>

From: Matthew Leach <matthew@mattleach.net>

The {read,write}s{b,w,l} operations are not defined by all
architectures and are being removed from the asm-generic/io.h
interface.

This patch replaces the usage of these string functions in the 8390
accessors with io{read,write}{8,16,32}_rep calls instead.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Matthew Leach <matthew@mattleach.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 drivers/net/ethernet/8390/ax88796.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/8390/ax88796.c b/drivers/net/ethernet/8390/ax88796.c
index 203ff9d..0338352 100644
--- a/drivers/net/ethernet/8390/ax88796.c
+++ b/drivers/net/ethernet/8390/ax88796.c
@@ -191,11 +191,11 @@ static void ax_get_8390_hdr(struct net_device *dev, struct e8390_pkt_hdr *hdr,
 	ei_outb(E8390_RREAD+E8390_START, nic_base + NE_CMD);
 
 	if (ei_local->word16)
-		readsw(nic_base + NE_DATAPORT, hdr,
-		       sizeof(struct e8390_pkt_hdr) >> 1);
+		ioread16_rep(nic_base + NE_DATAPORT, hdr,
+			     sizeof(struct e8390_pkt_hdr) >> 1);
 	else
-		readsb(nic_base + NE_DATAPORT, hdr,
-		       sizeof(struct e8390_pkt_hdr));
+		ioread8_rep(nic_base + NE_DATAPORT, hdr,
+			    sizeof(struct e8390_pkt_hdr));
 
 	ei_outb(ENISR_RDC, nic_base + EN0_ISR);	/* Ack intr. */
 	ei_local->dmaing &= ~0x01;
@@ -237,12 +237,12 @@ static void ax_block_input(struct net_device *dev, int count,
 	ei_outb(E8390_RREAD+E8390_START, nic_base + NE_CMD);
 
 	if (ei_local->word16) {
-		readsw(nic_base + NE_DATAPORT, buf, count >> 1);
+		ioread16_rep(nic_base + NE_DATAPORT, buf, count >> 1);
 		if (count & 0x01)
 			buf[count-1] = ei_inb(nic_base + NE_DATAPORT);
 
 	} else {
-		readsb(nic_base + NE_DATAPORT, buf, count);
+		ioread8_rep(nic_base + NE_DATAPORT, buf, count);
 	}
 
 	ei_local->dmaing &= ~1;
@@ -286,9 +286,9 @@ static void ax_block_output(struct net_device *dev, int count,
 
 	ei_outb(E8390_RWRITE+E8390_START, nic_base + NE_CMD);
 	if (ei_local->word16)
-		writesw(nic_base + NE_DATAPORT, buf, count >> 1);
+		iowrite16_rep(nic_base + NE_DATAPORT, buf, count >> 1);
 	else
-		writesb(nic_base + NE_DATAPORT, buf, count);
+		iowrite8_rep(nic_base + NE_DATAPORT, buf, count);
 
 	dma_start = jiffies;
 
-- 
1.8.0

^ permalink raw reply related

* RE: ipgre rss is broken since gro
From: Dmitry Kravkov @ 2012-12-10 19:20 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, netdev@vger.kernel.org
In-Reply-To: <1355158471.27891.44.camel@edumazet-glaptop>

> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Monday, December 10, 2012 6:55 PM
> To: Dmitry Kravkov
> Cc: Eric Dumazet; netdev@vger.kernel.org
> Subject: RE: ipgre rss is broken since gro
> 
> On Mon, 2012-12-10 at 11:32 +0000, Dmitry Kravkov wrote:
> 
> 
> dropped:51610  so obviously  one cpu is fully loaded.
Link partner continued pushing data overnight

> 
> I believe performance problem might come from the
> skb_set_queue_mapping(skb, 0); in __skb_tunnel_rx()
> 
> So all packets are queued into a single GRO queue, instead of being
> split as intended in multiple queues.
> 
> I cant find why we must clear queue_mapping, so could you try :
> 
> 
> diff --git a/include/net/dst.h b/include/net/dst.h
> index 9a78810..4cb27df 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -329,7 +329,6 @@ static inline void __skb_tunnel_rx(struct sk_buff *skb,
> struct net_device *dev)
>  	 */
>  	if (!skb->l4_rxhash)
>  		skb->rxhash = 0;
> -	skb_set_queue_mapping(skb, 0);
>  	skb_dst_drop(skb);
>  	nf_reset(skb);
>  }
> 
Yep, this resolved the issue - Interface is functional after 3 and 100 TCP connections. Thanks
   


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox