Netdev List
* RE: [PATCH v2] net/macb: Use non-coherent memory for rx buffers
From: David Laight @ 2012-12-05 15:22 UTC (permalink / raw)
  To: Nicolas Ferre
  Cc: David S. Miller, netdev, linux-arm-kernel, linux-kernel,
	Joachim Eastwood, Jean-Christophe PLAGNIOL-VILLARD,
	Havard Skinnemoen
In-Reply-To: <50BF6467.5060701@atmel.com>

> Well, for the 10/100 MACB interface, I am stuck with 128 Bytes buffers!
> So this use of pages seems sensible.

If you have dma coherent memory you can make the rx buffer space
be an array of short buffers referenced by adjacent ring entries
(possibly with the last one slightly short to allow for the
2 byte offset).
Then, when a big frame ends up in multiple buffers, you need
a maximum of two copies to extract the data.
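
A minimal sketch of that layout, assuming a ring of eight 128-byte buffers
carved out of one contiguous area. The names RX_RING_SIZE, rx_area and
copy_frame() are made up for illustration and are not macb code, and the
2-byte offset is left out for brevity. Because adjacent ring entries are
adjacent in memory, a frame only needs a second copy when it wraps past the
end of the ring:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RX_BUF_SIZE   128   /* MACB buffer size, from the thread */
#define RX_RING_SIZE  8     /* hypothetical ring length */

/* One contiguous area; ring entry i points at rx_area + i * RX_BUF_SIZE. */
static unsigned char rx_area[RX_RING_SIZE * RX_BUF_SIZE];

/*
 * Copy a frame of len bytes whose first buffer is ring entry first.
 * Returns the number of memcpy() calls used: 1, or 2 if the frame wraps.
 */
static int copy_frame(unsigned char *dst, unsigned int first, size_t len)
{
	size_t start = (size_t)first * RX_BUF_SIZE;
	size_t to_end = sizeof(rx_area) - start;

	assert(first < RX_RING_SIZE && len <= sizeof(rx_area));

	if (len <= to_end) {
		/* Frame is contiguous in memory: a single copy is enough. */
		memcpy(dst, rx_area + start, len);
		return 1;
	}
	/* Frame wraps: copy the tail of the ring, then the head. */
	memcpy(dst, rx_area + start, to_end);
	memcpy(dst + to_end, rx_area, len - to_end);
	return 2;
}
```

With these sizes a 300-byte frame starting at entry 1 needs one copy, while
the same frame starting at entry 6 wraps and needs two.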

	David


* Webmail Limit
From: CORREO @ 2012-12-05 12:29 UTC (permalink / raw)


Your webmail quota has exceeded the set quota/limit, which is 20GB. You are currently running on 23SE because of hidden files and folders in your mailbox. Please fill in the link below to confirm your mailbox and increase your quota.
Username:
Old password:
New password:


* Re: WARNING: drivers/net/ethernet/dlink/sundance.o(.text+0x2e87): Section mismatch in reference from the function sundance_probe1() to the variable .devinit.rodata:sundance_pci_tbl
From: Greg Kroah-Hartman @ 2012-12-05 15:32 UTC (permalink / raw)
  To: Denis Kirjanov; +Cc: kbuild test robot, Bill Pemberton, netdev
In-Reply-To: <CAOJe8K3xBAuhVND-9dJgSxShhpHQ3zko=bE2CNrAKVa-gvDouA@mail.gmail.com>

On Wed, Dec 05, 2012 at 11:12:32AM +0300, Denis Kirjanov wrote:
> I'll fix it.
> 
> Thanks.
> 
> On 12/5/12, kbuild test robot <fengguang.wu@intel.com> wrote:
> > tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
> > master
> > head:   193c1e478cc496844fcbef402a10976c95a634ff
> > commit: 64bc40de134bb5c7826ff384016f654219ed3956 dlink: remove __dev*
> > attributes
> > date:   27 hours ago
> > config: make ARCH=x86_64 allmodconfig
> >
> > All warnings:
> >
> > WARNING: drivers/net/ethernet/dlink/sundance.o(.text+0x2e87): Section
> > mismatch in reference from the function sundance_probe1() to the variable
> > .devinit.rodata:sundance_pci_tbl
> > The function sundance_probe1() references
> > the variable __devinitconst sundance_pci_tbl.
> > This is often because sundance_probe1 lacks a __devinitconst
> > annotation or the annotation of sundance_pci_tbl is wrong.


No, there's no need to do this. It's fallout from the big dev* removal that
is in net-next right now; the warnings will go away once that is merged with
my driver-core-next tree, which will happen in 3.8-rc1.

So please don't worry about it, it's a harmless message at the moment.

greg k-h


* Re: [B.A.T.M.A.N.] net, batman: lockdep circular dependency warning
From: Simon Wunderlich @ 2012-12-05 15:33 UTC (permalink / raw)
  To: The list for a Better Approach To Mobile Ad-hoc Networking
  Cc: netdev, Sasha Levin
In-Reply-To: <2538946.bxqpERrOoj@bentobox>


Hey Sven,

thanks for showing these approaches! Comments inline ...

On Tue, Dec 04, 2012 at 03:51:55PM +0100, Sven Eckelmann wrote:
> Hi,
> 
> thanks for your report. It seems nobody else wanted to give an answer... so I 
> try to give a small overview.
> 
> On Monday 12 November 2012 15:37:47 Sasha Levin wrote:
> > Hi all,
> > 
> > While fuzzing with trinity inside a KVM tools (lkvm) guest running latest
> > -next kernel, I've stumbled on the following:
> > 
> > [ 1002.969392] ======================================================
> > [ 1002.971639] [ INFO: possible circular locking dependency detected ]
> > [ 1002.975805] 3.7.0-rc5-next-20121112-sasha-00018-g2f4ce0e #127 Tainted: G        W
> > [ 1002.983691] -------------------------------------------------------
> > [ 1002.983691] trinity-child18/8149 is trying to acquire lock:
> > [ 1002.983691]  (s_active#313){++++.+}, at: [<ffffffff812f9941>] sysfs_addrm_finish+0x31/0x60
> > [ 1002.983691]
> > [ 1002.983691] but task is already holding lock:
> > [ 1002.983691]  (rtnl_mutex){+.+.+.}, at: [<ffffffff834fcc62>] rtnl_lock+0x12/0x20
> > [ 1002.983691]
> > [ 1002.983691] which lock already depends on the new lock.
> 
> It is known that batman-adv has a problem with the attaching/detaching of 
> interfaces over this sysfs. The cause of this problem is related to the fact 
> that batman-adv not only creates its own net_devices, but also unregisters 
> net_devices. This unregister will add a new element in the net_todo_list. This 
> will cause a rtnl_lock when it calls netdev_wait_allrefs (there are some
> conditions, but we just ignore them for now). So the whole exercise of using
> rtnl_trylock was useless.
> 
> This extra rtnl_lock can cause a deadlock as you found out because it is 
> activated through a sysfs file and therefore the s_active mutex is locked (we 
> have the dependency s_active -> rtnl_mutex, but other users have rtnl_mutex -> 
> s_active).
> 
> So, what to do? There are different possibilities. We have to keep in mind 
> that there is a patchset (not yet accepted by the batman-adv maintainers) 
> which allows using `ip link` or compatible tools to create/destroy batman-adv
> devices and attach/detach other devices.
> 
> 1. Remove the sysfs interface to attach/detach net_devices (which
>    destroys/creates batman-adv devices)
> 
>    This is not really backward compatible and therefore not really acceptable.
>    Marek Lindner and Simon Wunderlich are also against forcing users to
>    require special tools to add/configure batman-adv devices (even batctl, ip
>    and so on).
> 

Yeah, at least I think we should keep what we have for now and fix it before
moving to the next interface. It has its merits I would like to keep, having
text output is one of them. :)

> 2. Ignore the possible deadlock
> 
>    (sry, fill in your own comment...)
> 

That probably won't help. :)

> 3. Add workarounds in the core net code
> 
>    Simon Wunderlich already tried it... I personally think it is not the right
>    way because it is more likely to introduce more bugs by hiding a batman-adv
>    bug. And these bugs are a lot harder to find... trust me
> 
>    For example the usage of __rtnl_unlock will let this bug appear in
>    other places which use rtnl_trylock. This is caused by the fact that the
>    todo item isn't processed by __rtnl_unlock (this is the whole idea of
>    calling it) and therefore the todo work stays in net_todo_list. Another
>    user of rtnl_trylock will now call rtnl_unlock and doesn't expect an entry
>    in net_todo_list because he never unregistered a device. So he now has the
>    problem of batman-adv (what an unsocial läderlappen).
> 
>    And moving everybody using rtnl_trylock to __rtnl_unlock still has the
>    problem that batman-adv doesn't immediately work on its todo and so
>    maybe causes other side effects because... the notifications weren't
>    sent and therefore the refcount of the unregistered device didn't go
>    to zero.
> 
>    (I'll leave other side effects as homework for the reader)
> 

You are right, it probably cannot be solved as easily as I thought before. Also,
it seems the bridge code is not affected, contrary to what I thought at first.
I still don't like the rtnl_unlock() concept in general, but I can't provide
an alternative here so I shouldn't moan about that. :)

> 4. Don't automatically remove batman-adv devices
> 
>    The current approach is to automatically unregister batman-adv devices
>    when they don't have attached slave-devices (hardif called by batman-adv).
>    Removing this will slightly change the behaviour, but the interface can
>    still be removed using `ip link del dev bat0` or a similar tool.
> 

That would be possible, but we must at least make sure that the initialization
is done for all internal tables (tt, bla, ...), counters, seqnos, etc. when the
first device is added. Otherwise old users might assume that the device is
reset correctly when removing all hard interfaces of one soft interface
and adding them again under the same soft interface name.

> 5. Add a workaround solution and promote the use of the standard interface
> 
>    So, the basic problem is the s_active mutex locked by the sysfs interface.
>    An idea is to postpone the part which needs the rtnl_mutex to a later time.
>    This obviously has the problem that we cannot return an error code to the
>    caller when the device creation failed in the postponed part. This problem
>    can be reduced slightly by moving only the unregister part, but for now
>    I'll leave this out for simplicity of the description.

We probably won't need the return code anyway - usually it should never fail,
and if it does we don't handle it now either.

> 
>    A possible implementation would create a work_struct and add it to
>    batadv_event_workqueue. This work item has to contain all information given
>    by the user (which hardif, name of the batman-adv device).

Sounds good.
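
To make the idea concrete, here is a small userspace model of the deferral,
not kernel code. The two flags stand in for the s_active lock and rtnl_mutex,
and struct attach_work, store_sketch() and worker_sketch() are hypothetical
names. A real version would presumably embed a work_struct and queue it on
batadv_event_workqueue as described above. The point it shows is that the
sysfs path only records what the user asked for, and rtnl is taken later in a
context that no longer holds s_active:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN     16  /* hypothetical, plays the role of IFNAMSIZ */
#define MAX_PENDING  4

/* Everything the user wrote to sysfs, saved for the deferred part. */
struct attach_work {
	char hardif[NAME_LEN];
	char softif[NAME_LEN];
};

static struct attach_work pending[MAX_PENDING];
static int npending;
static int s_active_held;  /* stand-in for the sysfs s_active lock */
static int rtnl_held;      /* stand-in for rtnl_mutex */

/* sysfs store path: s_active is held, so rtnl must not be taken here. */
static int store_sketch(const char *hardif, const char *softif)
{
	assert(s_active_held && !rtnl_held);
	if (npending == MAX_PENDING)
		return -1;
	snprintf(pending[npending].hardif, NAME_LEN, "%s", hardif);
	snprintf(pending[npending].softif, NAME_LEN, "%s", softif);
	npending++;
	return 0;
}

/* Worker: runs later without s_active, so taking rtnl is safe. */
static int worker_sketch(void)
{
	int done = 0;

	assert(!s_active_held);
	rtnl_held = 1;
	while (npending > 0) {
		npending--;   /* the real attach/unregister would go here */
		done++;
	}
	rtnl_held = 0;
	return done;
}
```

Note that store_sketch() can only report "queue full", which matches the
limitation discussed above: errors from the postponed part cannot be returned
to the writer.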

> 
>    Simon Wunderlich already disliked this workaround, but Antonio Quartulli
>    tried to encourage an RFC implementation. I preferred a textual
>    description rather than a patch missing explanations of the other
>    alternatives.

Well, actually that doesn't sound so bad - I currently don't have an overview
of how "big" this change would be. That was one concern; the return code was
another, but it appears that this isn't a problem. If we don't add too much
bloat, this one would probably be the best alternative. At least as long as
rtnl_unlock() behaves like this. :)

What do others think?

Cheers,
	Simon



* [RFC PATCH] Bug on AT91 macb driver rx with high network traffic
From: matteo.fortini @ 2012-12-05 15:48 UTC (permalink / raw)
  To: netdev; +Cc: nicolas.ferre

We are testing the robustness of the driver with UDP packets of 
increasing length, up to the maximum allowed.

We have a UDP echo server listening on a port on the AT91 board, and 
we send it UDP packets of increasing length, waiting for the echo reply 
before sending the next one.
When packets get largish, in the >40000 bytes range, we see that the 
server is not receiving packets anymore, so the client does not receive 
the reply and the test stops. Pinging the interface once resumes the 
test, meaning that the packet has actually been received, but the driver 
is waiting for an interrupt that is not coming.

We traced this down to slow/missing IRQ response, and we fixed it as in 
the following patch, which calls napi_reschedule() before leaving the 
polling loop if the loop condition is still valid at the end of the 
polling loop, as other net drivers appear to do.

We don't know if this is the perfectly right way to do it, and we'd like 
your opinion on this before submitting a proper patch.
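
To illustrate the window we think we are hitting, here is a small userspace
model, not driver code. The used[] array plays the role of the RX_USED bit in
the ring descriptors, and struct ring_sketch, rx_sketch() and
complete_sketch() are made-up names. The model assumes a frame can land after
the rx loop has stopped but before polling is marked complete, so the ring is
checked once more at that point:

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 4

struct ring_sketch {
	bool used[RING_SIZE];  /* models the RX_USED bit of each descriptor */
	unsigned int tail;     /* next descriptor to look at */
	bool scheduled;        /* models "NAPI poll is scheduled" */
};

/* Consume received frames, at most budget of them. */
static int rx_sketch(struct ring_sketch *r, int budget)
{
	int done = 0;

	while (done < budget && r->used[r->tail]) {
		r->used[r->tail] = false;
		r->tail = (r->tail + 1) % RING_SIZE;
		done++;
	}
	return done;
}

/*
 * Finish polling, then look at the ring once more. Returns true when a
 * frame slipped in and polling has to be scheduled again.
 */
static bool complete_sketch(struct ring_sketch *r)
{
	r->scheduled = false;           /* like napi_complete() */
	if (r->used[r->tail])
		r->scheduled = true;    /* like napi_reschedule() */
	return r->scheduled;
}
```

Without the second look in complete_sketch(), the late frame would sit in the
ring until some other event (such as the ping above) triggers another poll.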

I have included the udp_server.c and udp_client.c test programs below, which may be useful.

Thank you in advance,
Matteo Fortini

===================================================================

 From 2d8895022a0668f6a3c1112f15ebe471db1a471e Mon Sep 17 00:00:00 2001
From: Matteo Fortini <matteo.fortini@sadel.it>
Date: Thu, 8 Nov 2012 16:12:10 +0100
Subject: [PATCH] AT91 macb: Fix lost rx packets on high rx traffic

---
  drivers/net/ethernet/cadence/macb.c |    8 ++++++++
  1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 033064b..348a20f 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -522,8 +522,16 @@ static int macb_poll(struct napi_struct *napi, int budget)

         work_done = macb_rx(bp, budget);
         if (work_done < budget) {
+               u32 addr;
+
                 napi_complete(napi);

+               addr = bp->rx_ring[bp->rx_tail].addr;
+
+               if ((addr & MACB_BIT(RX_USED))) {
+                       netdev_warn(bp->dev, "poll: reschedule");
+                       napi_reschedule(napi);
+               }
                 /*
                  * We've done what we can to clean the buffers. Make sure we
                  * get notified when new packets arrive.
-- 
1.7.10.4

====================================================

TEST PROGRAMS:

/*
  * UDP client
  * Copyright 2012 SADEL SpA
  * Castel Maggiore
  * Bologna
  * Italy
  */

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>   /* inet_addr() */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <strings.h>     /* bzero() */

#define BUFFER_LEN      70000   ///< Length of buffer
#define LISTEN_PORT     32000   ///< Listen port
#define MIN_PACKET_LEN      8   ///< Min length of a packet
#define MAX_PACKET_LEN  65499   ///< Max length of a packet
#define TIMEOUT             0   ///< Default wait_time between each packet send

static int port         = LISTEN_PORT;
static int minPacketLen = MIN_PACKET_LEN;
static int maxPacketLen = MAX_PACKET_LEN;
static int packetLen    = 0;
static int wait_time    = TIMEOUT;
static char *server     = NULL;

static void usage(const char *program)
{
     fprintf(stderr,"usage: %s [options]\n",program);
     fprintf(stderr,"          -c <ipaddr>   Server IP address\n");
     fprintf(stderr,"          -p <port>     Specify port number (default: %d)\n", LISTEN_PORT);
     fprintf(stderr,"          -t <waittime> Wait time between each send action in usec\n");
     fprintf(stderr,"          -m <min_len>  Min packet length (default: %d)\n",MIN_PACKET_LEN);
     fprintf(stderr,"          -M <max_len>  Max packet length (default: %d)\n",MAX_PACKET_LEN);
     fprintf(stderr,"          -l <length>   Length of packet (%d <= <length> <= %d). If not specified, variable-length packets are sent\n", MIN_PACKET_LEN, MAX_PACKET_LEN);
     fprintf(stderr,"          -h            Show this help and exit\n");
}

static void check_parameter(int argc, char **argv)
{
     int opt;
     while ((opt = getopt(argc, argv, "c:l:t:p:m:M:h")) != -1) {
         switch (opt) {
             case 'c':
                 server = optarg;
                 break;
             case 'p':
                 port = atoi(optarg);
                 if( port <= 0 ) {
                     fprintf(stderr, "Wrong parameter -p\n");
                     usage(argv[0]);
                     exit(EXIT_FAILURE);
                 }
                 break;
             case 't':
                 wait_time = atoi(optarg);
                 break;
             case 'm':
                 minPacketLen = atoi(optarg);
                 if( minPacketLen < MIN_PACKET_LEN ) {
                     fprintf(stderr, "Min packet length too low. Set to: %d\n", MIN_PACKET_LEN);
                     minPacketLen = MIN_PACKET_LEN;
                 }
                 break;
             case 'M':
                 maxPacketLen = atoi(optarg);
                 if( maxPacketLen > MAX_PACKET_LEN ) {
                     fprintf(stderr, "Max packet length too high. Set to: %d\n", MAX_PACKET_LEN);
                     maxPacketLen = MAX_PACKET_LEN;
                 }
                 break;
             case 'l':
                 packetLen = atoi(optarg);
                 break;
             case 'h':
                 usage(argv[0]);
                 exit(0);
             default: /* '?' */
                 /* getopt() returns '?' here; the offending character is in optopt */
                 fprintf(stderr, "Unrecognized parameter '%c'\n", optopt);
                 usage(argv[0]);
                 exit(EXIT_FAILURE);
         }
     }

     if( server == NULL ) {
         fprintf(stderr, "Wrong or missing -c parameter\n");
         usage(argv[0]);
         exit(EXIT_FAILURE);
     }
     if( minPacketLen > maxPacketLen ) {
         fprintf(stderr, "minPacketLen cannot be greater than maxPacketLen: %d, %d\n", minPacketLen, maxPacketLen);
         usage(argv[0]);
         exit(EXIT_FAILURE);
     }

     fprintf(stderr, "UDP Client - server: %s, port: %d, wait_time: %d", server, port, wait_time);
     if( packetLen > 0 ) {
         fprintf(stderr, " - fixed packet len: %d bytes\n", packetLen);
     } else {
         fprintf(stderr, " - variable packet len from %d to %d bytes\n", minPacketLen, maxPacketLen);
     }
}

static int sendAndCheck(int sockfd, struct sockaddr_in *servaddr, const char *msg_tx, size_t msg_len) {
     ssize_t t, n;
     int iRetVal = 0;
     char msg_rx[BUFFER_LEN];
     socklen_t len = sizeof(*servaddr);

     t = sendto(sockfd, msg_tx, msg_len, 0, (struct sockaddr *)servaddr, len);
     fprintf(stderr, "TX [%zd] bytes, ", t);

     n = recvfrom(sockfd,msg_rx,BUFFER_LEN,0,(struct sockaddr *)servaddr,&len);
     fprintf(stderr, "RX [%zd] bytes - ",n);

     /* A short or failed reply is an error, so stop the test. */
     if( t != n ) {
         fprintf(stderr, "size differs!!!\n");
         iRetVal = -1;
     }

     /* Only compare the payload when a full-length reply was received. */
     if( n != (ssize_t)msg_len || memcmp(msg_tx, msg_rx, msg_len) != 0 ) {
         fprintf(stderr, "data = ko\n");
         iRetVal = -1;
     } else {
         fprintf(stderr, "data = ok\n");
     }

     return iRetVal;
}

static int fixed_send(int sockfd, struct sockaddr_in *servaddr)
{
     int i = 0, iRetVal = 0;
     char msg_tx[BUFFER_LEN];

     /* memset() takes (buffer, fill value, length) */
     memset(msg_tx,'a',sizeof(msg_tx));

     while(1) {
         fprintf(stderr, "[%d] - ",i);
         if( sendAndCheck(sockfd, servaddr, msg_tx, packetLen) != 0 ) {
             break;
         }

         usleep(wait_time);
         i++;
     }
     return iRetVal;
}

static int variable_send(int sockfd, struct sockaddr_in *servaddr)
{
     int i, iRetVal = 0;
     char msg_tx[BUFFER_LEN];

     /* memset() takes (buffer, fill value, length) */
     memset(msg_tx,'a',sizeof(msg_tx));

     for ( i = 0; i <= maxPacketLen - minPacketLen; i++) {
         fprintf(stderr, "[%d] - ",i);
         if( sendAndCheck(sockfd, servaddr, msg_tx, minPacketLen+i) != 0 ) {
             break;
         }

         usleep(wait_time);
     }
     return iRetVal;
}

int main(int argc, char**argv)
{
     int sockfd;
     struct sockaddr_in servaddr;

     check_parameter(argc,argv);

     bzero(&servaddr,sizeof(servaddr));
     servaddr.sin_family = AF_INET;
     servaddr.sin_addr.s_addr=inet_addr(server);
     servaddr.sin_port=htons(port);

     sockfd = socket(AF_INET,SOCK_DGRAM,0);
     if( sockfd < 0 ) {
         perror("socket");
         exit(1);
     }

     // Optional
     if( connect(sockfd,(struct sockaddr *)&servaddr,sizeof(servaddr)) != 0 ) {
         fprintf(stderr, "Failed to connect\n");
         exit(1);
     }

     if ( packetLen > 0 ) {
         fixed_send(sockfd,&servaddr);
     } else {
         variable_send(sockfd,&servaddr);
     }


     return 0;
}

=================================================================
/*
  * UDP server
  * Copyright 2012 SADEL SpA
  * Castel Maggiore
  * Bologna
  * Italy
  */

#include <sys/socket.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <strings.h>     /* bzero() */

#define BUFFER_LEN      70000   ///< Length of buffer
#define LISTEN_PORT     32000   ///< Listen port
#define TIMEOUT             0   ///< Default wait_time between each packet send

static int port         = LISTEN_PORT;
static int wait_time    = TIMEOUT;

static void usage(const char *program)
{
     fprintf(stderr,"usage: %s [options]\n",program);
     fprintf(stderr,"          -p <port>     Specify port number (default: %d)\n", LISTEN_PORT);
     fprintf(stderr,"          -t <waittime> Wait time between each send action in usec\n");
     fprintf(stderr,"          -h            Show this help and exit\n");
}

static void check_parameter(int argc, char **argv)
{
     int opt;
     while ((opt = getopt(argc, argv, "t:p:h")) != -1) {
         switch (opt) {
             case 'p':
                 port = atoi(optarg);
                 if( port <= 0 ) {
                     fprintf(stderr, "Wrong parameter -p\n");
                     usage(argv[0]);
                     exit(EXIT_FAILURE);
                 }
                 break;
             case 't':
                 wait_time = atoi(optarg);
                 break;
             case 'h':
                 usage(argv[0]);
                 exit(0);
             default: /* '?' */
                 /* getopt() returns '?' here; the offending character is in optopt */
                 fprintf(stderr, "Unrecognized parameter '%c'\n", optopt);
                 usage(argv[0]);
                 exit(EXIT_FAILURE);
         }
     }

     fprintf(stderr, "UDP Server - port: %d, wait_time: %d\n", port, 
wait_time);
}

int main(int argc, char**argv)
{
     int sockfd,n;
     struct sockaddr_in servaddr,cliaddr;
     socklen_t len;
     char mesg[BUFFER_LEN];
     int i = 1;
     int t;

     check_parameter(argc, argv);

     sockfd=socket(AF_INET,SOCK_DGRAM,0);

     bzero(&servaddr,sizeof(servaddr));
     servaddr.sin_family = AF_INET;
     servaddr.sin_addr.s_addr=htonl(INADDR_ANY);
     servaddr.sin_port=htons(port);
     if( bind(sockfd,(struct sockaddr *)&servaddr,sizeof(servaddr)) != 0 ) {
         perror("bind");
         exit(EXIT_FAILURE);
     }

     for (;;)
     {
         len = sizeof(cliaddr);
         n = recvfrom(sockfd,mesg,BUFFER_LEN,0,(struct sockaddr *)&cliaddr,&len);
         if( n < 0 ) {
             /* Nothing was received, so there is nothing to echo back. */
             perror("recvfrom");
             continue;
         }
         fprintf(stderr, "[%d] - RX [%d] bytes ",i, n);
         usleep(wait_time);
         t = sendto(sockfd,mesg,n,0,(struct sockaddr *)&cliaddr,len);
         fprintf(stderr, "- TX [%d] bytes\n",t);
         i++;
     }
}




* Re: [RFC PATCH 2/2] tun: fix LSM/SELinux labeling of tun/tap devices
From: Paul Moore @ 2012-12-05 16:00 UTC (permalink / raw)
  To: Jason Wang; +Cc: Michael S. Tsirkin, netdev, linux-security-module, selinux
In-Reply-To: <2433879.zRVUYBGg1f@jason-thinkpad-t430s>

On Wednesday, December 05, 2012 10:01:31 PM Jason Wang wrote:
> On Wednesday, December 05, 2012 01:44:55 PM Michael S. Tsirkin wrote:
> > On Wed, Dec 05, 2012 at 02:19:22PM +0800, Jason Wang wrote:
> > > On 12/05/2012 02:17 AM, Paul Moore wrote:
> > > > On Tuesday, December 04, 2012 07:36:26 PM Michael S. Tsirkin wrote:
> > > >> On Tue, Dec 04, 2012 at 11:18:57AM -0500, Paul Moore wrote:
> > > >>> Okay, based on your explanation of TUNSETQUEUE, the steps below are
> > > >>> what I
> > > >>> believe we need to do ... if you disagree speak up quickly please.
> > > >>> 
> > > >>> A. TUNSETIFF (new, non-persistent device)
> > > >>> 
> > > >>> [Allocate and initialize the tun_struct LSM state based on the
> > > >>> calling
> > > >>> process, use this state to label the TUN socket.]
> > > >>> 
> > > >>> 1. Call security_tun_dev_create() which authorizes the action.
> > > >>> 2. Call security_tun_dev_alloc_security() which allocates the
> > > >>> tun_struct
> > > >>> LSM blob and SELinux sets some internal blob state to record the
> > > >>> label
> > > >>> of
> > > >>> the calling process.
> > > >>> 3. Call security_tun_dev_attach() which sets the label of the TUN
> > > >>> socket
> > > >>> to match the label stored in the tun_struct LSM blob during A2.  No
> > > >>> authorization is done at this point since the socket is
> > > >>> new/unlabeled.
> > > >>> 
> > > >>> B. TUNSETIFF (existing, persistent device)
> > > >>> 
> > > >>> [Relabel the existing tun_struct LSM state based on the calling
> > > >>> process,
> > > >>> use this state to label the TUN socket.]
> > > >>> 
> > > >>> 1. Attempt to relabel/reset the tun_struct LSM blob from the
> > > >>> currently
> > > >>> stored value, set during A2, to the label of the current calling
> > > >>> process.
> > > >>> *** THIS IS NOT CURRENTLY DONE IN THE RFC PATCH ***
> > > >>> 2. Call security_tun_dev_attach() which sets the label of the TUN
> > > >>> socket
> > > >>> to match the label stored in the tun_struct LSM blob during B1. No
> > > >>> authorization is done at this point since the socket is
> > > >>> new/unlabeled.
> > > >>> 
> > > >>> C. TUNSETQUEUE
> > > >>> 
> > > >>> [Use the existing tun_struct LSM state to label the new TUN socket.]
> > > >>> 
> > > >>> 1. Call security_tun_dev_attach() which sets the label of the TUN
> > > >>> socket
> > > >>> to match the label stored in the tun_struct LSM blob set during
> > > >>> either
> > > >>> A2
> > > >>> or B1. No authorization is done at this point since the socket is
> > > >>> new/unlabeled.
> > > >> 
> > > >> Here's what bothers me. libvirt currently opens tun and passes
> > > >> fd to qemu. What would prevent qemu from attaching fd using
> > > >> TUNSETQUEUE
> > > >> to another device it does not own?
> > > > 
> > > > True, assuming all the above is correct and that I'm understanding it
> > > > correctly (Jason?), we should probably add a new SELinux access
> > > > control
> > > > for
> > > > TUNSETQUEUE.
> > > 
> > > Yes, we need make sure qemu can call TUNSETQUEUE for the device it does
> > > not own.
> > 
> > Meaning can *not* call?
> 
> Sorry for not being clear, I mean qemu can call TUNSETQUEUE for the device
> it owns and for the device it does not own, it can't call.

Okay, let me add an access control for TUNSETQUEUE and I'll post an updated 
patchset later today.

-- 
paul moore
security and virtualization @ redhat



* Re: [RFC PATCH 1/2] tun: correctly report an error in tun_flow_init()
From: Paul Moore @ 2012-12-05 16:02 UTC (permalink / raw)
  To: jasowang; +Cc: netdev, linux-security-module, selinux
In-Reply-To: <20121129220629.30020.99947.stgit@sifl>

On Thursday, November 29, 2012 05:06:29 PM Paul Moore wrote:
> On error, the error code from tun_flow_init() is lost inside
> tun_set_iff(). This patch fixes this by assigning the tun_flow_init()
> error code to the "err" variable, which is returned by
> the tun_set_iff() function on error.
> 
> Signed-off-by: Paul Moore <pmoore@redhat.com>

Jason, we've had some good discussion around patch 2/2 but nothing on this 
fix; can I assume you are okay with this patch?  If so I think we should go 
ahead and apply this ...

> ---
>  drivers/net/tun.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 607a3a5..877ffe2 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1605,7 +1605,8 @@ static int tun_set_iff(struct net *net, struct file
> *file, struct ifreq *ifr)
> 
>  		tun_net_init(dev);
> 
> -		if (tun_flow_init(tun))
> +		err = tun_flow_init(tun);
> +		if (err < 0)
>  			goto err_free_dev;
> 
>  		dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
-- 
paul moore
security and virtualization @ redhat



* RE: kernel BUG at /build/buildd/linux-2.6.32/mm/mempolicy.c
From: Devendra C @ 2012-12-05 16:46 UTC (permalink / raw)
  To: Bokhan Artem, linux-kernel@vger.kernel.org, Borislav Petkov (
  Cc: netdev@vger.kernel.org
In-Reply-To: <50BF66A0.1050503@eml.ru>

> Date: Wed, 5 Dec 2012 22:22:08 +0700
> From: art@eml.ru
> To: linux-kernel@vger.kernel.org
> Subject: kernel BUG at /build/buildd/linux-2.6.32/mm/mempolicy.c
> 
> Hello.
> 
> We have several servers with mongodb running. Each server has several mongodb 
> instances. The Mongodb dataset is larger than available memory (mongodb uses 
> memory-mapped files for all disk I/O).
> 2.6.32 and 2.6.38 kernels periodically crash and crash happens only with mongodb 
> servers.
> 
> 2.6.38's trace is in attachment.
> For 2.6.32 I only have "kernel BUG at 
> /build/buildd/linux-2.6.32/mm/mempolicy.c:1489!"
> 
> Need help! :)

Adding netdev to cc.

This looks like it could be a networking bug?

If so, please attach some tcpdump traces and the output of lspci -vv -n.

Please forgive me if it's not related to networking. :(

Thanks,


* Re: WARNING: drivers/net/ethernet/dlink/sundance.o(.text+0x2e87): Section mismatch in reference from the function sundance_probe1() to the variable .devinit.rodata:sundance_pci_tbl
From: David Miller @ 2012-12-05 17:50 UTC (permalink / raw)
  To: kda; +Cc: fengguang.wu, wfp5p, netdev, gregkh
In-Reply-To: <CAOJe8K3xBAuhVND-9dJgSxShhpHQ3zko=bE2CNrAKVa-gvDouA@mail.gmail.com>

From: Denis Kirjanov <kda@linux-powerpc.org>
Date: Wed, 5 Dec 2012 11:12:32 +0300

> I'll fix it.

You can't.

It's only going to get fixed by a change in Greg KH's driver-core tree
which updates the PCI table macros to not use __dev* section tags.

Fengguang, _PLEASE_, as Greg requested, stop reporting these section
mismatch errors, they aren't helpful and are wasting valuable
developer time.


* Re: [PATCH net-next 0/7] Allow to monitor multicast cache event via rtnetlink
From: David Miller @ 2012-12-05 17:53 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev
In-Reply-To: <50BF29DA.7020903@6wind.com>

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Wed, 05 Dec 2012 12:02:50 +0100

> On 04/12/2012 21:02, Nicolas Dichtel wrote:
>> I can have a try on a tile platform. I don't have access to sparc or
>> mips.
> Hmm, I've read arm instead of mips! So I've tried on mips. Data are
> aligned on 32-bit, like for all netlink messages. nla_put_u64() will
> do the same, as it calls nla_put().
> 
> And the kernel will only use memcpy() to treat this attribute. Reader
> will be in userland.

Then userland will trap if the 64-bit values are only 32-bit aligned.

That's the problem I'm talking about.

I don't want to export any more unaligned 64-bit values in netlink
messages, it's a complete mess.
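
For illustration, this is roughly what every userland reader has to do today
to stay safe. It is only a sketch and get_u64_unaligned() is a made-up helper
name. A netlink attribute header is 4 bytes long, so a 64-bit payload that
follows a 4-byte-aligned header can land on an address that is not 8-byte
aligned, and dereferencing it through a u64 pointer cast can trap on
strict-alignment architectures:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Read a 64-bit value from a payload that may be only 32-bit aligned.
 * memcpy() makes no alignment assumption, unlike *(const uint64_t *)payload.
 */
static uint64_t get_u64_unaligned(const void *payload)
{
	uint64_t v;

	memcpy(&v, payload, sizeof(v));
	return v;
}
```

Needing a helper like this at every call site is exactly the kind of special
handling described above.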


* Re: [PATCH net-next 0/7] Allow to monitor multicast cache event via rtnetlink
From: David Miller @ 2012-12-05 17:54 UTC (permalink / raw)
  To: David.Laight; +Cc: nicolas.dichtel, netdev
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B70DA@saturn3.aculab.com>

From: "David Laight" <David.Laight@ACULAB.COM>
Date: Wed, 5 Dec 2012 11:41:33 -0000

> Probably worth commenting that the 64bit items might only be 32bit aligned.
> Just to stop anyone trying to read/write them with pointer casts.

Rather, let's not create this situation at all.

It's totally inappropriate to have special code to handle every single
time we want to put 64-bit values into netlink messages.

We need a real solution to this issue.


* Re: [PATCH] net: ICMPv6 packets transmitted on wrong interface if nfmark is mangled
From: David Miller @ 2012-12-05 17:57 UTC (permalink / raw)
  To: dries.dewinter; +Cc: pablo, kaber, netdev, netfilter-devel
In-Reply-To: <CA+e04fgiMDoaCx0fpdK5F6YfJ_CSOZ13q1fn9xs7De4JO02c6g@mail.gmail.com>

From: Dries De Winter <dries.dewinter@gmail.com>
Date: Wed, 5 Dec 2012 14:41:59 +0100

> My "noreroute" patch will not fix this. Therefore it's indeed maybe
> better to add a simple check to ip6_route_me_harder(): not a check for
> ICMPv6, but a check for (ipv6_addr_type(&iph->daddr) &
> IPV6_ADDR_LINKLOCAL) instead. What do you think?

What if a packet is rewritten from a non-link-local destination address
into a link-local one?  Or vice versa?

Your test will fail in those cases.
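
A userspace analogue of the concern, as a sketch only. It uses the POSIX
IN6_IS_ADDR_LINKLOCAL() macro rather than the kernel's ipv6_addr_type(), and
is_link_local_sketch() is a made-up name. The result of the test depends on
which destination address is looked at, so a check made after the address was
rewritten answers a different question than one made before:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>

/* Returns true if text parses as an IPv6 link-local (fe80::/10) address. */
static bool is_link_local_sketch(const char *text)
{
	struct in6_addr a;

	if (inet_pton(AF_INET6, text, &a) != 1)
		return false;   /* not a valid IPv6 address */
	return IN6_IS_ADDR_LINKLOCAL(&a) != 0;
}
```

For a packet rewritten from 2001:db8::1 to fe80::1, the check gives "no"
before the rewrite and "yes" after it.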


* Re: [PATCH] 3com: make 3c59x depend on HAS_IOPORT
From: David Miller @ 2012-12-05 17:58 UTC (permalink / raw)
  To: jang; +Cc: netdev
In-Reply-To: <1354716280.29038.2.camel@hal>

From: Jan Glauber <jang@linux.vnet.ibm.com>
Date: Wed, 05 Dec 2012 15:04:40 +0100

> From: Jan Glauber <jang@linux.vnet.ibm.com>
> 
> The 3com driver for 3c59x requires ioport_map. Since not all
> architectures support IO port mapping make 3c59x dependent on HAS_IOPORT.
> 
> Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>

Which platforms support PCI or EISA yet do not set HAS_IOPORT?


* Re: [PATCH] net/macb: increase RX buffer size for GEM
From: David Miller @ 2012-12-05 17:58 UTC (permalink / raw)
  To: nicolas.ferre; +Cc: netdev, linux-arm-kernel, linux-kernel, manabian, plagnioj
In-Reply-To: <50BF6366.8080600@atmel.com>

From: Nicolas Ferre <nicolas.ferre@atmel.com>
Date: Wed, 5 Dec 2012 16:08:22 +0100

> On 12/04/2012 07:22 PM, David Miller wrote:
>> From: Nicolas Ferre <nicolas.ferre@atmel.com>
>> Date: Mon, 3 Dec 2012 13:15:43 +0100
>> 
>>> Macb Ethernet controller requires a RX buffer of 128 bytes. It is
>>> highly sub-optimal for Gigabit-capable GEM that is able to use
>>> a bigger DMA buffer. Change this constant and associated macros
>>> with data stored in the private structure.
>>> I also kept the result of buffers per page calculation to lower the
>>> impact of this move to a variable rx buffer size on rx hot path.
>>> RX DMA buffer size has to be multiple of 64 bytes as indicated in
>>> DMA Configuration Register specification.
>>>
>>> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
>> 
>> This looks like it will waste a couple hundred bytes for 1500 MTU
>> frames, am I right?
> 
> Yep! But buffers get recycled, and with the current memory management by
> pages, it seems that I have to rework some part of it to optimize this
> memory usage (8KB memory blocks split into 5 buffers each as David said...).
> 
> Do you think it is worth digging this way or may I rework the rx buffer
> management in case of the GEM interface. If I implement a different path
> for GEM interface, I will have the possibility to tailor rx DMA buffers
> from 1500 Bytes up to 10KB jumbo frames...

I almost think you have to.
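
For reference, the sizing arithmetic being discussed can be sketched in
plain C. This is a minimal user-space sketch, not driver code: the
helper names rx_buf_size() and bufs_per_block(), the 1518-byte frame
length and the 2-byte offset are assumptions made for illustration.
Only the 64-byte granularity and the 8KB block size come from the
messages quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define RX_BUF_ALIGN   64u    /* DMA buffer size granularity quoted above */
#define RX_BLOCK_SIZE  8192u  /* 8KB memory block mentioned above */

/* Round a frame length plus an assumed offset up to a multiple of 64. */
static size_t rx_buf_size(size_t frame_len, size_t offset)
{
	size_t need = frame_len + offset;

	assert(need > 0);
	return (need + RX_BUF_ALIGN - 1) / RX_BUF_ALIGN * RX_BUF_ALIGN;
}

/* Number of whole buffers that fit into one memory block. */
static size_t bufs_per_block(size_t buf_size)
{
	assert(buf_size > 0 && buf_size <= RX_BLOCK_SIZE);
	return RX_BLOCK_SIZE / buf_size;
}
```

With these assumed inputs, rx_buf_size(1518, 2) is 1536 and an 8KB
block holds 5 such buffers, leaving 512 bytes of the block unused.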

^ permalink raw reply

* [PATCH net-next] ipv6: avoid taking locks at socket dismantle
From: Eric Dumazet @ 2012-12-05 19:18 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

ipv6_sock_mc_close() is called for ipv6 sockets at close time, and most
of them don't use multicast.

Add a test to avoid contention on a shared spinlock.

Same heuristic applies for ipv6_sock_ac_close(), to avoid contention
on a shared rwlock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/anycast.c |    3 +++
 net/ipv6/mcast.c   |    3 +++
 2 files changed, 6 insertions(+)

diff --git a/net/ipv6/anycast.c b/net/ipv6/anycast.c
index 2f4f584..757a810 100644
--- a/net/ipv6/anycast.c
+++ b/net/ipv6/anycast.c
@@ -189,6 +189,9 @@ void ipv6_sock_ac_close(struct sock *sk)
 	struct net *net = sock_net(sk);
 	int	prev_index;
 
+	if (!np->ipv6_ac_list)
+		return;
+
 	write_lock_bh(&ipv6_sk_ac_lock);
 	pac = np->ipv6_ac_list;
 	np->ipv6_ac_list = NULL;
diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index b19ed51..28dfa5f 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -284,6 +284,9 @@ void ipv6_sock_mc_close(struct sock *sk)
 	struct ipv6_mc_socklist *mc_lst;
 	struct net *net = sock_net(sk);
 
+	if (!rcu_access_pointer(np->ipv6_mc_list))
+		return;
+
 	spin_lock(&ipv6_sk_mc_lock);
 	while ((mc_lst = rcu_dereference_protected(np->ipv6_mc_list,
 				lockdep_is_held(&ipv6_sk_mc_lock))) != NULL) {

^ permalink raw reply related

* [PATCH rfc] netfilter: two xtables matches
From: Willem de Bruijn @ 2012-12-05 19:22 UTC (permalink / raw)
  To: netfilter-devel, netdev, edumazet, davem, kaber, pablo

The second patch is more speculative and aims to be a more general
workaround, as well as a performance optimization: support
(preferably JIT compiled) BPF programs as iptables match rules.

Potentially, the skb->priority match can be implemented by applying
only the second patch and adding a new BPF_S_ANC ancillary field to
Linux Socket Filters.

I also wrote corresponding userspace patches to iptables. The process
for submitting both kernel and user patches is not 100% clear to me.
Sending the kernel bits to both netdev and netfilter-devel for
initial feedback. Please correct me if you want it another way.

The patches apply to net-next.


^ permalink raw reply

* [PATCH 1/2] netfilter: add xt_priority xtables match
From: Willem de Bruijn @ 2012-12-05 19:22 UTC (permalink / raw)
  To: netfilter-devel, netdev, edumazet, davem, kaber, pablo; +Cc: Willem de Bruijn
In-Reply-To: <1354735339-13402-1-git-send-email-willemb@google.com>

Add an iptables match based on the skb->priority field. This field
can be set by socket option SO_PRIORITY, among others.

The match supports range based matching on packet priority, with
optional inversion. Before matching, a mask can be applied to the
priority field to handle the case where different regions of the
bitfield are reserved for unrelated uses.
---
 include/linux/netfilter/xt_priority.h |   13 ++++++++
 net/netfilter/Kconfig                 |    9 ++++++
 net/netfilter/Makefile                |    1 +
 net/netfilter/xt_priority.c           |   51 +++++++++++++++++++++++++++++++++
 4 files changed, 74 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_priority.h
 create mode 100644 net/netfilter/xt_priority.c

diff --git a/include/linux/netfilter/xt_priority.h b/include/linux/netfilter/xt_priority.h
new file mode 100644
index 0000000..da9a288
--- /dev/null
+++ b/include/linux/netfilter/xt_priority.h
@@ -0,0 +1,13 @@
+#ifndef _XT_PRIORITY_H
+#define _XT_PRIORITY_H
+
+#include <linux/types.h>
+
+struct xt_priority_info {
+	__u32 min;
+	__u32 max;
+	__u32 mask;
+	__u8  invert;
+};
+
+#endif /*_XT_PRIORITY_H */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index fefa514..c9739c6 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -1093,6 +1093,15 @@ config NETFILTER_XT_MATCH_PKTTYPE
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
+config NETFILTER_XT_MATCH_PRIORITY
+	tristate '"priority" match support'
+	depends on NETFILTER_ADVANCED
+	help
+	  This option adds a match based on the value of the sk_buff
+	  priority field.
+
+	  To compile it as a module, choose M here.  If unsure, say N.
+
 config NETFILTER_XT_MATCH_QUOTA
 	tristate '"quota" match support'
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 3259697..8e5602f 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -124,6 +124,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_OWNER) += xt_owner.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_PHYSDEV) += xt_physdev.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_PKTTYPE) += xt_pkttype.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_POLICY) += xt_policy.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_PRIORITY) += xt_priority.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_QUOTA) += xt_quota.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_RATEEST) += xt_rateest.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_REALM) += xt_realm.o
diff --git a/net/netfilter/xt_priority.c b/net/netfilter/xt_priority.c
new file mode 100644
index 0000000..4982eee
--- /dev/null
+++ b/net/netfilter/xt_priority.c
@@ -0,0 +1,51 @@
+/* Xtables module to match packets based on their sk_buff priority field.
+ * Copyright 2012 Google Inc.
+ * Written by Willem de Bruijn <willemb@google.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+
+#include <linux/netfilter/xt_priority.h>
+#include <linux/netfilter/x_tables.h>
+
+MODULE_AUTHOR("Willem de Bruijn <willemb@google.com>");
+MODULE_DESCRIPTION("Xtables: priority filter match");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("ipt_priority");
+MODULE_ALIAS("ip6t_priority");
+
+static bool priority_mt(const struct sk_buff *skb,
+			struct xt_action_param *par)
+{
+	const struct xt_priority_info *info = par->matchinfo;
+
+	__u32 priority = skb->priority & info->mask;
+	return (priority >= info->min && priority <= info->max) ^ info->invert;
+}
+
+static struct xt_match priority_mt_reg __read_mostly = {
+	.name		= "priority",
+	.revision	= 0,
+	.family		= NFPROTO_UNSPEC,
+	.match		= priority_mt,
+	.matchsize	= sizeof(struct xt_priority_info),
+	.me		= THIS_MODULE,
+};
+
+static int __init priority_mt_init(void)
+{
+	return xt_register_match(&priority_mt_reg);
+}
+
+static void __exit priority_mt_exit(void)
+{
+	xt_unregister_match(&priority_mt_reg);
+}
+
+module_init(priority_mt_init);
+module_exit(priority_mt_exit);
-- 
1.7.7.3

^ permalink raw reply related

* [PATCH 2/2] netfilter: add xt_bpf xtables match
From: Willem de Bruijn @ 2012-12-05 19:22 UTC (permalink / raw)
  To: netfilter-devel, netdev, edumazet, davem, kaber, pablo; +Cc: Willem de Bruijn
In-Reply-To: <1354735339-13402-1-git-send-email-willemb@google.com>

A new match that executes sk_run_filter on every packet. BPF filters
can access skbuff fields that are out of scope for existing iptables
rules, allow more expressive logic, and on platforms with JIT support
can even be faster.

I have a corresponding iptables patch that takes `tcpdump -ddd`
output, as used in the examples below. The two parts communicate
using a variable length structure. This is similar to ebt_among,
but new for iptables.

Verified functionality by inserting an ip source filter on chain
INPUT and an ip dest filter on chain OUTPUT and noting that ping
failed while a rule was active:

iptables -v -A INPUT -m bpf --bytecode '4,32 0 0 12,21 0 1 $SADDR,6 0 0 96,6 0 0 0,' -j DROP
iptables -v -A OUTPUT -m bpf --bytecode '4,32 0 0 16,21 0 1 $DADDR,6 0 0 96,6 0 0 0,' -j DROP

Evaluated throughput by running netperf TCP_STREAM over loopback on
x86_64. I expected the BPF filter to outperform hardcoded iptables
filters when replacing multiple matches with a single bpf match, but
even a single comparison to u32 appears to do better. Relative to the
benchmark with no filter applied, rate with 100 BPF filters dropped
to 81%. With 100 U32 filters it dropped to 55%. The difference sounds
excessive to me, but was consistent on my hardware. Commands used:

for i in `seq 100`; do iptables -A OUTPUT -m bpf --bytecode '4,48 0 0 9,21 0 1 20,6 0 0 96,6 0 0 0,' -j DROP; done
for i in `seq 3`; do netperf -t TCP_STREAM -I 99 -H localhost; done

iptables -F OUTPUT

for i in `seq 100`; do iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP; done
for i in `seq 3`; do netperf -t TCP_STREAM -I 99 -H localhost; done

FYI: perf top

[bpf]
    33.94%  [kernel]          [k] copy_user_generic_string
     8.92%  [kernel]          [k] sk_run_filter
     7.77%  [ip_tables]       [k] ipt_do_table

[u32]
    22.63%  [kernel]          [k] copy_user_generic_string
    14.46%  [kernel]          [k] memcpy
     9.19%  [ip_tables]       [k] ipt_do_table
     8.47%  [xt_u32]          [k] u32_mt
     5.32%  [kernel]          [k] skb_copy_bits

The big difference appears to be in memory copying. I have not
looked into u32, so cannot explain this right now. More interestingly,
at higher rate, sk_run_filter appears to use as many cycles as u32_mt
(both traces have roughly the same number of events).

One caveat: to work independent of device link layer, the filter
expects DLT_RAW style BPF programs, i.e., those that expect the
packet to start at the IP layer.
---
 include/linux/netfilter/xt_bpf.h |   17 +++++++
 net/netfilter/Kconfig            |    9 ++++
 net/netfilter/Makefile           |    1 +
 net/netfilter/x_tables.c         |    5 +-
 net/netfilter/xt_bpf.c           |   88 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 118 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/netfilter/xt_bpf.h
 create mode 100644 net/netfilter/xt_bpf.c

diff --git a/include/linux/netfilter/xt_bpf.h b/include/linux/netfilter/xt_bpf.h
new file mode 100644
index 0000000..23502c0
--- /dev/null
+++ b/include/linux/netfilter/xt_bpf.h
@@ -0,0 +1,17 @@
+#ifndef _XT_BPF_H
+#define _XT_BPF_H
+
+#include <linux/filter.h>
+#include <linux/types.h>
+
+struct xt_bpf_info {
+	__u16 bpf_program_num_elem;
+
+	/* only used in kernel */
+	struct sk_filter *filter __attribute__((aligned(8)));
+
+	/* variable size, based on bpf_program_num_elem */
+	struct sock_filter bpf_program[0];
+};
+
+#endif /*_XT_BPF_H */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index c9739c6..c7cc0b8 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -798,6 +798,15 @@ config NETFILTER_XT_MATCH_ADDRTYPE
 	  If you want to compile it as a module, say M here and read
 	  <file:Documentation/kbuild/modules.txt>.  If unsure, say `N'.
 
+config NETFILTER_XT_MATCH_BPF
+	tristate '"bpf" match support'
+	depends on NETFILTER_ADVANCED
+	help
+	  BPF matching applies a linux socket filter to each packet and
+	  accepts those for which the filter returns non-zero.
+
+	  To compile it as a module, choose M here.  If unsure, say N.
+
 config NETFILTER_XT_MATCH_CLUSTER
 	tristate '"cluster" match support'
 	depends on NF_CONNTRACK
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 8e5602f..9f12eeb 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -98,6 +98,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o
 
 # matches
 obj-$(CONFIG_NETFILTER_XT_MATCH_ADDRTYPE) += xt_addrtype.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_BPF) += xt_bpf.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 8d987c3..26306be 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -379,8 +379,9 @@ int xt_check_match(struct xt_mtchk_param *par,
 	if (XT_ALIGN(par->match->matchsize) != size &&
 	    par->match->matchsize != -1) {
 		/*
-		 * ebt_among is exempt from centralized matchsize checking
-		 * because it uses a dynamic-size data set.
+		 * matches of variable size length, such as ebt_among,
+		 * are exempt from centralized matchsize checking. They
+		 * skip the test by setting xt_match.matchsize to -1.
 		 */
 		pr_err("%s_tables: %s.%u match: invalid size "
 		       "%u (kernel) != (user) %u\n",
diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c
new file mode 100644
index 0000000..07077c5
--- /dev/null
+++ b/net/netfilter/xt_bpf.c
@@ -0,0 +1,88 @@
+/* Xtables module to match packets using a BPF filter.
+ * Copyright 2012 Google Inc.
+ * Written by Willem de Bruijn <willemb@google.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/ipv6.h>
+#include <linux/filter.h>
+#include <net/ip.h>
+
+#include <linux/netfilter/xt_bpf.h>
+#include <linux/netfilter/x_tables.h>
+
+MODULE_AUTHOR("Willem de Bruijn <willemb@google.com>");
+MODULE_DESCRIPTION("Xtables: BPF filter match");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("ipt_bpf");
+MODULE_ALIAS("ip6t_bpf");
+
+static int bpf_mt_check(const struct xt_mtchk_param *par)
+{
+	struct xt_bpf_info *info = par->matchinfo;
+	const struct xt_entry_match *match;
+	struct sock_fprog program;
+	int expected_len;
+
+	match = container_of(par->matchinfo, const struct xt_entry_match, data);
+	expected_len = sizeof(struct xt_entry_match) +
+		       sizeof(struct xt_bpf_info) +
+		       (sizeof(struct sock_filter) *
+			info->bpf_program_num_elem);
+
+	if (match->u.match_size != expected_len) {
+		pr_info("bpf: check failed: incorrect length\n");
+		return -EINVAL;
+	}
+
+	program.len = info->bpf_program_num_elem;
+	program.filter = info->bpf_program;
+	if (sk_unattached_filter_create(&info->filter, &program)) {
+		pr_info("bpf: check failed: parse error\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static bool bpf_mt(const struct sk_buff *skb, struct xt_action_param *par)
+{
+	const struct xt_bpf_info *info = par->matchinfo;
+
+	return SK_RUN_FILTER(info->filter, skb);
+}
+
+static void bpf_mt_destroy(const struct xt_mtdtor_param *par)
+{
+	const struct xt_bpf_info *info = par->matchinfo;
+	sk_unattached_filter_destroy(info->filter);
+}
+
+static struct xt_match bpf_mt_reg __read_mostly = {
+	.name		= "bpf",
+	.revision	= 0,
+	.family		= NFPROTO_UNSPEC,
+	.checkentry	= bpf_mt_check,
+	.match		= bpf_mt,
+	.destroy	= bpf_mt_destroy,
+	.matchsize	= -1, /* skip xt_check_match because of dynamic len */
+	.me		= THIS_MODULE,
+};
+
+static int __init bpf_mt_init(void)
+{
+	return xt_register_match(&bpf_mt_reg);
+}
+
+static void __exit bpf_mt_exit(void)
+{
+	xt_unregister_match(&bpf_mt_reg);
+}
+
+module_init(bpf_mt_init);
+module_exit(bpf_mt_exit);
-- 
1.7.7.3

^ permalink raw reply related

* Re: [PATCH rfc] netfilter: two xtables matches
From: Willem de Bruijn @ 2012-12-05 19:28 UTC (permalink / raw)
  To: netfilter-devel, netdev, Eric Dumazet, David Miller, kaber, pablo
In-Reply-To: <1354735339-13402-1-git-send-email-willemb@google.com>

Somehow, the first part of this email went missing. Not critical,
but for completeness:

These two patches each add an xtables match.

The xt_priority match is a straightforward addition in the style of
xt_mark, adding the option to filter on one more sk_buff field. I
have an immediate application for this. The amount of code (in
kernel + userspace) to add a single check proved quite large.
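
The check itself is small. A minimal user-space sketch of the masked
range test with optional inversion follows; struct prio_rule and
prio_match() are hypothetical names used only for illustration (the
field names mirror struct xt_priority_info from patch 1/2), and the
sketch normalizes the invert flag to a boolean instead of XOR-ing the
raw byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of the match parameters (illustrative only). */
struct prio_rule {
	uint32_t min;
	uint32_t max;
	uint32_t mask;
	uint8_t  invert;	/* treated as a boolean */
};

/* Return true if the masked priority is in [min, max], flipped by invert. */
static bool prio_match(uint32_t skb_priority, const struct prio_rule *rule)
{
	uint32_t prio;
	bool in_range;

	assert(rule != NULL);
	prio = skb_priority & rule->mask;
	in_range = prio >= rule->min && prio <= rule->max;
	return in_range != (rule->invert != 0);
}
```

For example, with min=2, max=4 and mask=0xff, a priority of 0x103
matches (the masked value is 3), while 0x105 does not.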

On Wed, Dec 5, 2012 at 2:22 PM, Willem de Bruijn <willemb@google.com> wrote:
> The second patch is more speculative and aims to be a more general
> workaround, as well as a performance optimization: support
> (preferably JIT compiled) BPF programs as iptables match rules.
>
> Potentially, the skb->priority match can be implemented by applying
> only the second patch and adding a new BPF_S_ANC ancillary field to
> Linux Socket Filters.
>
> I also wrote corresponding userspace patches to iptables. The process
> for submitting both kernel and user patches is not 100% clear to me.
> Sending the kernel bits to both netdev and netfilter-devel for
> initial feedback. Please correct me if you want it another way.
>
> The patches apply to net-next.
>

^ permalink raw reply

* Re: [RFT PATCH] 8139cp: properly support change of MTU values [v2]
From: John Greene @ 2012-12-05 19:41 UTC (permalink / raw)
  To: Francois Romieu; +Cc: David Miller, netdev, dwmw2
In-Reply-To: <20121203204608.GA9815@electric-eye.fr.zoreil.com>

On 12/03/2012 03:46 PM, Francois Romieu wrote:
> David Miller <davem@davemloft.net> :
> [...]
>> I've applied this to net-next, if it triggers any problems we have
>> some time to work it out before 3.8 is released.
>
> I have bounced the messages to David Woodhouse since he authored the
> last 8139cp changes in net-next and owns the hardware to notice
> regressions.
>
> My message of two days ago was wrong : it is not possible for the irq
> handler to process a Tx event after the rings have been freed. Things
> still look racy wrt netpoll though.
>
> Any objection against the patch below ?
>
> (I did not gotoize the dev == NULL test: it is really unlikely and
> should go away).
>
> diff --git a/drivers/net/ethernet/realtek/8139cp.c b/drivers/net/ethernet/realtek/8139cp.c
> index 0da3f5e..57cd542 100644
> --- a/drivers/net/ethernet/realtek/8139cp.c
> +++ b/drivers/net/ethernet/realtek/8139cp.c
> @@ -577,28 +577,30 @@ static irqreturn_t cp_interrupt (int irq, void *dev_instance)
>   {
>   	struct net_device *dev = dev_instance;
>   	struct cp_private *cp;
> +	int handled = 0;
>   	u16 status;
>
>   	if (unlikely(dev == NULL))
>   		return IRQ_NONE;
>   	cp = netdev_priv(dev);
>
> +	spin_lock(&cp->lock);
> +
>   	status = cpr16(IntrStatus);
>   	if (!status || (status == 0xFFFF))
> -		return IRQ_NONE;
> +		goto out_unlock;
> +
> +	handled = 1;
>
>   	netif_dbg(cp, intr, dev, "intr, status %04x cmd %02x cpcmd %04x\n",
>   		  status, cpr8(Cmd), cpr16(CpCmd));
>
>   	cpw16(IntrStatus, status & ~cp_rx_intr_mask);
>
> -	spin_lock(&cp->lock);
> -
>   	/* close possible race's with dev_close */
>   	if (unlikely(!netif_running(dev))) {
>   		cpw16(IntrMask, 0);
> -		spin_unlock(&cp->lock);
> -		return IRQ_HANDLED;
> +		goto out_unlock;
>   	}
>
>   	if (status & (RxOK | RxErr | RxEmpty | RxFIFOOvr))
> @@ -612,8 +614,6 @@ static irqreturn_t cp_interrupt (int irq, void *dev_instance)
>   	if (status & LinkChg)
>   		mii_check_media(&cp->mii_if, netif_msg_link(cp), false);
>
> -	spin_unlock(&cp->lock);
> -
>   	if (status & PciErr) {
>   		u16 pci_status;
>
> @@ -625,7 +625,10 @@ static irqreturn_t cp_interrupt (int irq, void *dev_instance)
>   		/* TODO: reset hardware */
>   	}
>
> -	return IRQ_HANDLED;
> +out_unlock:
> +	spin_unlock(&cp->lock);
> +
> +	return IRQ_RETVAL(handled);
>   }
>
>   #ifdef CONFIG_NET_POLL_CONTROLLER
>
I think this is a good change. It is interesting that it isn't in
already, or that its absence isn't causing more issues on
multi-processor boxes (perhaps it is?).

So do you think these patches need to go together? I could make a case 
either way.

Is this upstream yet?

-- 
John Greene
jogreene@redhat.com

^ permalink raw reply

* Re: [PATCH] 3com: make 3c59x depend on HAS_IOPORT
From: Jan Glauber @ 2012-12-05 19:44 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20121205.125804.979819753893520607.davem@davemloft.net>

On Wed, 2012-12-05 at 12:58 -0500, David Miller wrote:
> From: Jan Glauber <jang@linux.vnet.ibm.com>
> Date: Wed, 05 Dec 2012 15:04:40 +0100
> 
> > From: Jan Glauber <jang@linux.vnet.ibm.com>
> > 
> > The 3com driver for 3c59x requires ioport_map. Since not all
> > architectures support IO port mapping make 3c59x dependent on HAS_IOPORT.
> > 
> > Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
> 
> Which platforms support PCI or EISA yet do not set HAS_IOPORT?
> 

s390. We won't get support for port I/O in the PCI hardware/firmware
layer. (The patches for PCI on s390 are currently in linux-next).

--Jan

^ permalink raw reply

* Re: [PATCH 2/2] netfilter: add xt_bpf xtables match
From: Pablo Neira Ayuso @ 2012-12-05 19:48 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: netfilter-devel, netdev, edumazet, davem, kaber
In-Reply-To: <1354735339-13402-3-git-send-email-willemb@google.com>

Hi Willem,

On Wed, Dec 05, 2012 at 02:22:19PM -0500, Willem de Bruijn wrote:
> A new match that executes sk_run_filter on every packet. BPF filters
> can access skbuff fields that are out of scope for existing iptables
> rules, allow more expressive logic, and on platforms with JIT support
> can even be faster.
> 
> I have a corresponding iptables patch that takes `tcpdump -ddd`
> output, as used in the examples below. The two parts communicate
> using a variable length structure. This is similar to ebt_among,
> but new for iptables.
> 
> Verified functionality by inserting an ip source filter on chain
> INPUT and an ip dest filter on chain OUTPUT and noting that ping
> failed while a rule was active:
> 
> iptables -v -A INPUT -m bpf --bytecode '4,32 0 0 12,21 0 1 $SADDR,6 0 0 96,6 0 0 0,' -j DROP
> iptables -v -A OUTPUT -m bpf --bytecode '4,32 0 0 16,21 0 1 $DADDR,6 0 0 96,6 0 0 0,' -j DROP

I like this BPF idea for iptables.

I made a similar extension some time ago, but it took a file as a
parameter. That file contained BPF code. I made a simple bison
parser that takes BPF code and puts it into the BPF array of
instructions. It would be a bit more intuitive to define a filter
that way, and we can distribute it with iptables.

Let me check on my internal trees, I can put that user-space code
somewhere in case you're interested.
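
As a rough illustration of the user-space side, the comma-separated
string passed to --bytecode in the examples could be turned into an
instruction array along these lines. This is a minimal sketch that
assumes the format is exactly "N,code jt jf k,...," as in the quoted
commands; struct bpf_insn_sketch and parse_bytecode() are hypothetical
names and are not taken from any posted patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Same fields as a classic BPF instruction: opcode, jumps, constant. */
struct bpf_insn_sketch {
	uint16_t code;
	uint8_t  jt;
	uint8_t  jf;
	uint32_t k;
};

/*
 * Parse "N,code jt jf k,code jt jf k,...," into out[].
 * Returns the number of instructions, or -1 on malformed input.
 */
static int parse_bytecode(const char *s, struct bpf_insn_sketch *out, int max)
{
	unsigned int n, code, jt, jf, k;
	int used, i;

	assert(s != NULL && out != NULL && max > 0);

	used = 0;	/* stays 0 if the trailing ',' is missing */
	if (sscanf(s, "%u,%n", &n, &used) != 1 || used == 0 ||
	    n == 0 || n > (unsigned int)max)
		return -1;
	s += used;

	for (i = 0; i < (int)n; i++) {
		used = 0;
		if (sscanf(s, "%u %u %u %u,%n", &code, &jt, &jf, &k, &used) != 4 ||
		    used == 0 || code > 0xffff || jt > 0xff || jf > 0xff)
			return -1;
		out[i].code = (uint16_t)code;
		out[i].jt = (uint8_t)jt;
		out[i].jf = (uint8_t)jf;
		out[i].k = k;
		s += used;
	}

	/* Reject trailing garbage after the last instruction. */
	return *s == '\0' ? (int)n : -1;
}
```

Fed the string from the benchmark command, '4,48 0 0 9,21 0 1 20,6 0 0
96,6 0 0 0,', this yields four instructions: a byte load at offset 9
(code 48), a jump-if-equal against 20 (code 21), and two returns
(code 6) with k=96 and k=0.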

> Evaluated throughput by running netperf TCP_STREAM over loopback on
> x86_64. I expected the BPF filter to outperform hardcoded iptables
> filters when replacing multiple matches with a single bpf match, but
> even a single comparison to u32 appears to do better. Relative to the
> benchmark with no filter applied, rate with 100 BPF filters dropped
> to 81%. With 100 U32 filters it dropped to 55%. The difference sounds
> excessive to me, but was consistent on my hardware. Commands used:
> 
> for i in `seq 100`; do iptables -A OUTPUT -m bpf --bytecode '4,48 0 0 9,21 0 1 20,6 0 0 96,6 0 0 0,' -j DROP; done
> for i in `seq 3`; do netperf -t TCP_STREAM -I 99 -H localhost; done
> 
> iptables -F OUTPUT
> 
> for i in `seq 100`; do iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP; done
> for i in `seq 3`; do netperf -t TCP_STREAM -I 99 -H localhost; done
> 
> FYI: perf top
> 
> [bpf]
>     33.94%  [kernel]          [k] copy_user_generic_string
>      8.92%  [kernel]          [k] sk_run_filter
>      7.77%  [ip_tables]       [k] ipt_do_table
> 
> [u32]
>     22.63%  [kernel]          [k] copy_user_generic_string
>     14.46%  [kernel]          [k] memcpy
>      9.19%  [ip_tables]       [k] ipt_do_table
>      8.47%  [xt_u32]          [k] u32_mt
>      5.32%  [kernel]          [k] skb_copy_bits
> 
> The big difference appears to be in memory copying. I have not
> looked into u32, so cannot explain this right now. More interestingly,
> at higher rate, sk_run_filter appears to use as many cycles as u32_mt
> (both traces have roughly the same number of events).
> 
> One caveat: to work independent of device link layer, the filter
> expects DLT_RAW style BPF programs, i.e., those that expect the
> packet to start at the IP layer.
> ---
>  include/linux/netfilter/xt_bpf.h |   17 +++++++
>  net/netfilter/Kconfig            |    9 ++++
>  net/netfilter/Makefile           |    1 +
>  net/netfilter/x_tables.c         |    5 +-
>  net/netfilter/xt_bpf.c           |   88 ++++++++++++++++++++++++++++++++++++++
>  5 files changed, 118 insertions(+), 2 deletions(-)
>  create mode 100644 include/linux/netfilter/xt_bpf.h
>  create mode 100644 net/netfilter/xt_bpf.c
> 
> diff --git a/include/linux/netfilter/xt_bpf.h b/include/linux/netfilter/xt_bpf.h
> new file mode 100644
> index 0000000..23502c0
> --- /dev/null
> +++ b/include/linux/netfilter/xt_bpf.h
> @@ -0,0 +1,17 @@
> +#ifndef _XT_BPF_H
> +#define _XT_BPF_H
> +
> +#include <linux/filter.h>
> +#include <linux/types.h>
> +
> +struct xt_bpf_info {
> +	__u16 bpf_program_num_elem;
> +
> +	/* only used in kernel */
> +	struct sk_filter *filter __attribute__((aligned(8)));
> +
> +	/* variable size, based on bpf_program_num_elem */
> +	struct sock_filter bpf_program[0];
> +};
> +
> +#endif /*_XT_BPF_H */
> diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
> index c9739c6..c7cc0b8 100644
> --- a/net/netfilter/Kconfig
> +++ b/net/netfilter/Kconfig
> @@ -798,6 +798,15 @@ config NETFILTER_XT_MATCH_ADDRTYPE
>  	  If you want to compile it as a module, say M here and read
>  	  <file:Documentation/kbuild/modules.txt>.  If unsure, say `N'.
>  
> +config NETFILTER_XT_MATCH_BPF
> +	tristate '"bpf" match support'
> +	depends on NETFILTER_ADVANCED
> +	help
> +	  BPF matching applies a linux socket filter to each packet and
> +	  accepts those for which the filter returns non-zero.
> +
> +	  To compile it as a module, choose M here.  If unsure, say N.
> +
>  config NETFILTER_XT_MATCH_CLUSTER
>  	tristate '"cluster" match support'
>  	depends on NF_CONNTRACK
> diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
> index 8e5602f..9f12eeb 100644
> --- a/net/netfilter/Makefile
> +++ b/net/netfilter/Makefile
> @@ -98,6 +98,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o
>  
>  # matches
>  obj-$(CONFIG_NETFILTER_XT_MATCH_ADDRTYPE) += xt_addrtype.o
> +obj-$(CONFIG_NETFILTER_XT_MATCH_BPF) += xt_bpf.o
>  obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o
>  obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o
>  obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o
> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> index 8d987c3..26306be 100644
> --- a/net/netfilter/x_tables.c
> +++ b/net/netfilter/x_tables.c
> @@ -379,8 +379,9 @@ int xt_check_match(struct xt_mtchk_param *par,
>  	if (XT_ALIGN(par->match->matchsize) != size &&
>  	    par->match->matchsize != -1) {
>  		/*
> -		 * ebt_among is exempt from centralized matchsize checking
> -		 * because it uses a dynamic-size data set.
> +		 * matches of variable size length, such as ebt_among,
> +		 * are exempt from centralized matchsize checking. They
> +		 * skip the test by setting xt_match.matchsize to -1.
>  		 */
>  		pr_err("%s_tables: %s.%u match: invalid size "
>  		       "%u (kernel) != (user) %u\n",
> diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c
> new file mode 100644
> index 0000000..07077c5
> --- /dev/null
> +++ b/net/netfilter/xt_bpf.c
> @@ -0,0 +1,88 @@
> +/* Xtables module to match packets using a BPF filter.
> + * Copyright 2012 Google Inc.
> + * Written by Willem de Bruijn <willemb@google.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/skbuff.h>
> +#include <linux/ipv6.h>
> +#include <linux/filter.h>
> +#include <net/ip.h>
> +
> +#include <linux/netfilter/xt_bpf.h>
> +#include <linux/netfilter/x_tables.h>
> +
> +MODULE_AUTHOR("Willem de Bruijn <willemb@google.com>");
> +MODULE_DESCRIPTION("Xtables: BPF filter match");
> +MODULE_LICENSE("GPL");
> +MODULE_ALIAS("ipt_bpf");
> +MODULE_ALIAS("ip6t_bpf");
> +
> +static int bpf_mt_check(const struct xt_mtchk_param *par)
> +{
> +	struct xt_bpf_info *info = par->matchinfo;
> +	const struct xt_entry_match *match;
> +	struct sock_fprog program;
> +	int expected_len;
> +
> +	match = container_of(par->matchinfo, const struct xt_entry_match, data);
> +	expected_len = sizeof(struct xt_entry_match) +
> +		       sizeof(struct xt_bpf_info) +
> +		       (sizeof(struct sock_filter) *
> +			info->bpf_program_num_elem);
> +
> +	if (match->u.match_size != expected_len) {
> +		pr_info("bpf: check failed: incorrect length\n");
> +		return -EINVAL;
> +	}
> +
> +	program.len = info->bpf_program_num_elem;
> +	program.filter = info->bpf_program;
> +	if (sk_unattached_filter_create(&info->filter, &program)) {
> +		pr_info("bpf: check failed: parse error\n");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static bool bpf_mt(const struct sk_buff *skb, struct xt_action_param *par)
> +{
> +	const struct xt_bpf_info *info = par->matchinfo;
> +
> +	return SK_RUN_FILTER(info->filter, skb);
> +}
> +
> +static void bpf_mt_destroy(const struct xt_mtdtor_param *par)
> +{
> +	const struct xt_bpf_info *info = par->matchinfo;
> +	sk_unattached_filter_destroy(info->filter);
> +}
> +
> +static struct xt_match bpf_mt_reg __read_mostly = {
> +	.name		= "bpf",
> +	.revision	= 0,
> +	.family		= NFPROTO_UNSPEC,
> +	.checkentry	= bpf_mt_check,
> +	.match		= bpf_mt,
> +	.destroy	= bpf_mt_destroy,
> +	.matchsize	= -1, /* skip xt_check_match because of dynamic len */
> +	.me		= THIS_MODULE,
> +};
> +
> +static int __init bpf_mt_init(void)
> +{
> +	return xt_register_match(&bpf_mt_reg);
> +}
> +
> +static void __exit bpf_mt_exit(void)
> +{
> +	xt_unregister_match(&bpf_mt_reg);
> +}
> +
> +module_init(bpf_mt_init);
> +module_exit(bpf_mt_exit);
> -- 
> 1.7.7.3
> 

^ permalink raw reply

* Re: [Patch 1/1] net/phy: Add interrupt support for dp83640 phy.
From: Stephan Gatzka @ 2012-12-05 19:52 UTC (permalink / raw)
  To: Richard Cochran; +Cc: netdev, davem
In-Reply-To: <20121205100544.GA2293@netboy.at.omicron.at>


> The patch looks okay to me, but I worry that this might fail on boards
> which have not connected the phyter's PWRDOWN/INTN pin to anything.
> Such designs really need the PHY_POLL working.
> Taking a brief glance at the drivers for two such boards I know of
> (m5234bcc and an IXP), it looks like their MAC drivers set mii_bus irq
> to PHY_POLL, so it might work fine, but this patch still makes me
> nervous that some other board might break.
>
> Maybe this should be a kconfig option?

I don't think so.

Systems using device tree just don't specify the interrupt tag in the 
mdio section.

I have to admit that I don't know how systems that do not use
device tree get the phy interrupt configured; maybe someone can explain
that briefly?

Nevertheless, other drivers for very common phys like the lxt971 also 
just set the function pointers to config_intr and ack_interrupt and also 
set the flag PHY_HAS_INTERRUPT. So I don't think that my patch
breaks anything.

Regards,

Stephan

^ permalink raw reply

* [PATCH 1/2 net-next] cnic: Reset iSCSI EQ during shutdown.
From: Michael Chan @ 2012-12-05 20:10 UTC (permalink / raw)
  To: davem; +Cc: netdev

Without the reset, reloading the cnic driver can cause the iSCSI
Event Queue to be out of sync with the driver and cause intermittent
crashes.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Acked-by: Ariel Elior <ariele@broadcom.com>
---
 .../net/ethernet/broadcom/bnx2x/bnx2x_fw_defs.h    |    5 +++++
 drivers/net/ethernet/broadcom/cnic.c               |   19 +++++++++++++++++++
 2 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_fw_defs.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_fw_defs.h
index 620fe93..60a83ad 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_fw_defs.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_fw_defs.h
@@ -23,6 +23,11 @@
 	(IRO[159].base + ((funcId) * IRO[159].m1))
 #define CSTORM_FUNC_EN_OFFSET(funcId) \
 	(IRO[149].base + ((funcId) * IRO[149].m1))
+#define CSTORM_HC_SYNC_LINE_INDEX_E1X_OFFSET(hcIndex, sbId) \
+	(IRO[139].base + ((hcIndex) * IRO[139].m1) + ((sbId) * IRO[139].m2))
+#define CSTORM_HC_SYNC_LINE_INDEX_E2_OFFSET(hcIndex, sbId) \
+	(IRO[138].base + (((hcIndex)>>2) * IRO[138].m1) + (((hcIndex)&3) \
+	* IRO[138].m2) + ((sbId) * IRO[138].m3))
 #define CSTORM_IGU_MODE_OFFSET (IRO[157].base)
 #define CSTORM_ISCSI_CQ_SIZE_OFFSET(pfId) \
 	(IRO[316].base + ((pfId) * IRO[316].m1))
diff --git a/drivers/net/ethernet/broadcom/cnic.c b/drivers/net/ethernet/broadcom/cnic.c
index cc8434f..2c1f66d 100644
--- a/drivers/net/ethernet/broadcom/cnic.c
+++ b/drivers/net/ethernet/broadcom/cnic.c
@@ -5344,8 +5344,27 @@ static void cnic_stop_bnx2_hw(struct cnic_dev *dev)
 static void cnic_stop_bnx2x_hw(struct cnic_dev *dev)
 {
 	struct cnic_local *cp = dev->cnic_priv;
+	u32 hc_index = HC_INDEX_ISCSI_EQ_CONS;
+	u32 sb_id = cp->status_blk_num;
+	u32 idx_off, syn_off;
 
 	cnic_free_irq(dev);
+
+	if (BNX2X_CHIP_IS_E2_PLUS(cp->chip_id)) {
+		idx_off = offsetof(struct hc_status_block_e2, index_values) +
+			  (hc_index * sizeof(u16));
+
+		syn_off = CSTORM_HC_SYNC_LINE_INDEX_E2_OFFSET(hc_index, sb_id);
+	} else {
+		idx_off = offsetof(struct hc_status_block_e1x, index_values) +
+			  (hc_index * sizeof(u16));
+
+		syn_off = CSTORM_HC_SYNC_LINE_INDEX_E1X_OFFSET(hc_index, sb_id);
+	}
+	CNIC_WR16(dev, BAR_CSTRORM_INTMEM + syn_off, 0);
+	CNIC_WR16(dev, BAR_CSTRORM_INTMEM + CSTORM_STATUS_BLOCK_OFFSET(sb_id) +
+		  idx_off, 0);
+
 	*cp->kcq1.hw_prod_idx_ptr = 0;
 	CNIC_WR(dev, BAR_CSTRORM_INTMEM +
 		CSTORM_ISCSI_EQ_CONS_OFFSET(cp->pfid, 0), 0);
-- 
1.7.1
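
As an aside for readers of the patch above, the two new offset macros can be
sketched in plain C. The struct and the model_* functions are hypothetical
stand-ins for the firmware IRO table entries, and any field values fed to them
are made up for illustration. The arithmetic follows the macros in the diff.
The E2 form splits hcIndex into (hcIndex >> 2) and (hcIndex & 3), which reads
as if indices are grouped four at a time, though that interpretation is an
assumption and not something the patch states.

```c
#include <stdint.h>

/* Hypothetical stand-in for one IRO[] entry (base plus multipliers). */
struct model_iro {
	uint32_t base;
	uint32_t m1;
	uint32_t m2;
	uint32_t m3;
};

/* Same arithmetic as CSTORM_HC_SYNC_LINE_INDEX_E1X_OFFSET(hcIndex, sbId). */
static uint32_t model_sync_line_e1x_offset(const struct model_iro *iro,
					   uint32_t hc_index, uint32_t sb_id)
{
	return iro->base + hc_index * iro->m1 + sb_id * iro->m2;
}

/* Same arithmetic as CSTORM_HC_SYNC_LINE_INDEX_E2_OFFSET(hcIndex, sbId). */
static uint32_t model_sync_line_e2_offset(const struct model_iro *iro,
					  uint32_t hc_index, uint32_t sb_id)
{
	return iro->base + (hc_index >> 2) * iro->m1 +
	       (hc_index & 3) * iro->m2 + sb_id * iro->m3;
}
```

With made-up values base = 0x1000, m1 = 16, m2 = 2, m3 = 64, an hc_index of 6
and sb_id of 3 give an E2 offset of 0x1000 + 1 * 16 + 2 * 2 + 3 * 64.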

^ permalink raw reply related

* [PATCH 2/2 net-next] cnic: Fix rare race condition during iSCSI disconnect.
From: Michael Chan @ 2012-12-05 20:10 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1354738215-6644-1-git-send-email-mchan@broadcom.com>

From: Eddie Wai <eddie.wai@broadcom.com>

If the initiator and target try to close the connection at about the same
time, there is a race condition in the termination sequence for bnx2x.
Fix the problem by waiting for the remote termination to complete before
deleting the Connection ID.  This will prevent the firmware assert.

Update version to 2.5.15.

Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/ethernet/broadcom/cnic.c    |   13 +++++++++++--
 drivers/net/ethernet/broadcom/cnic_if.h |    4 ++--
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/cnic.c b/drivers/net/ethernet/broadcom/cnic.c
index 2c1f66d..1c2a851 100644
--- a/drivers/net/ethernet/broadcom/cnic.c
+++ b/drivers/net/ethernet/broadcom/cnic.c
@@ -3853,12 +3853,17 @@ static int cnic_cm_abort(struct cnic_sock *csk)
 		return cnic_cm_abort_req(csk);
 
 	/* Getting here means that we haven't started connect, or
-	 * connect was not successful.
+	 * connect was not successful, or it has been reset by the target.
 	 */
 
 	cp->close_conn(csk, opcode);
-	if (csk->state != opcode)
+	if (csk->state != opcode) {
+		/* Wait for remote reset sequence to complete */
+		while (test_bit(SK_F_PG_OFFLD_COMPLETE, &csk->flags))
+			msleep(1);
+
 		return -EALREADY;
+	}
 
 	return 0;
 }
@@ -3872,6 +3877,10 @@ static int cnic_cm_close(struct cnic_sock *csk)
 		csk->state = L4_KCQE_OPCODE_VALUE_CLOSE_COMP;
 		return cnic_cm_close_req(csk);
 	} else {
+		/* Wait for remote reset sequence to complete */
+		while (test_bit(SK_F_PG_OFFLD_COMPLETE, &csk->flags))
+			msleep(1);
+
 		return -EALREADY;
 	}
 	return 0;
diff --git a/drivers/net/ethernet/broadcom/cnic_if.h b/drivers/net/ethernet/broadcom/cnic_if.h
index 865095a..502e11e 100644
--- a/drivers/net/ethernet/broadcom/cnic_if.h
+++ b/drivers/net/ethernet/broadcom/cnic_if.h
@@ -14,8 +14,8 @@
 
 #include "bnx2x/bnx2x_mfw_req.h"
 
-#define CNIC_MODULE_VERSION	"2.5.14"
-#define CNIC_MODULE_RELDATE	"Sep 30, 2012"
+#define CNIC_MODULE_VERSION	"2.5.15"
+#define CNIC_MODULE_RELDATE	"Dec 04, 2012"
 
 #define CNIC_ULP_RDMA		0
 #define CNIC_ULP_ISCSI		1
-- 
1.7.1

^ permalink raw reply related

