Netdev List
* Re: [PATCH] net: b43legacy: fix compile error
From: Eric Dumazet @ 2010-10-25 15:51 UTC (permalink / raw)
  To: Larry Finger
  Cc: Arnd Hannemann, David S. Miller, netdev, linux-kernel,
	linux-wireless
In-Reply-To: <4CC5A301.1080606@lwfinger.net>

On Monday, 25 October 2010 at 10:32 -0500, Larry Finger wrote:
> On 10/25/2010 09:41 AM, Arnd Hannemann wrote:
> > On today's Linus tree, the following compile error happened to me:
> > 
> >   CC [M]  drivers/net/wireless/b43legacy/xmit.o
> > In file included from include/net/dst.h:11,
> >                  from drivers/net/wireless/b43legacy/xmit.c:31:
> > include/net/dst_ops.h:28: error: expected ':', ',', ';', '}' or '__attribute__' before '____cacheline_aligned_in_smp'
> > include/net/dst_ops.h: In function 'dst_entries_get_fast':
> > include/net/dst_ops.h:33: error: 'struct dst_ops' has no member named 'pcpuc_entries'
> > include/net/dst_ops.h: In function 'dst_entries_get_slow':
> > include/net/dst_ops.h:41: error: 'struct dst_ops' has no member named 'pcpuc_entries'
> > include/net/dst_ops.h: In function 'dst_entries_add':
> > include/net/dst_ops.h:49: error: 'struct dst_ops' has no member named 'pcpuc_entries'
> > include/net/dst_ops.h: In function 'dst_entries_init':
> > include/net/dst_ops.h:55: error: 'struct dst_ops' has no member named 'pcpuc_entries'
> > include/net/dst_ops.h: In function 'dst_entries_destroy':
> > include/net/dst_ops.h:60: error: 'struct dst_ops' has no member named 'pcpuc_entries'
> > make[4]: *** [drivers/net/wireless/b43legacy/xmit.o] Error 1
> > make[3]: *** [drivers/net/wireless/b43legacy] Error 2
> > make[2]: *** [drivers/net/wireless] Error 2
> > make[1]: *** [drivers/net] Error 2
> > make: *** [drivers] Error 2
> > 
> > This patch fixes this issue by adding "linux/cache.h" as an include to
> > "include/net/dst_ops.h".
> 
> Strange. Compiling b43legacy from the linux-2.6.git tree (git describe is
> v2.6.36-4464-g229aebb) works fine on x86_64. I wonder what is different.

Well, x86_64 must include cache.h, this is probably why I missed it in
my build tests.
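
To illustrate the failure mode, here is a userspace sketch under assumptions, not the kernel code. The macro normally comes from <linux/cache.h>; the stand-in definition, the 64-byte line size and all demo_* names below are hypothetical. When no header in the include chain defines the macro, the compiler sees a stray identifier after the member name, which matches the first error in the log, and the member is then missing for every later use.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace stand-in for the definition that <linux/cache.h> supplies on
 * SMP builds. The 64-byte line size is an assumption for this sketch.
 */
#define DEMO_CACHE_BYTES 64
#define ____cacheline_aligned_in_smp \
	__attribute__((__aligned__(DEMO_CACHE_BYTES)))

/* Hypothetical struct, loosely shaped like the failing declaration. */
struct demo_ops {
	unsigned short family;
	int gc_thresh;
	/*
	 * If the macro were undefined here, the compiler would reject this
	 * member and later report "no member named 'pcpuc_entries'".
	 */
	long pcpuc_entries ____cacheline_aligned_in_smp;
};

/* Offset of the aligned member inside the struct. */
size_t demo_entries_offset(void)
{
	return offsetof(struct demo_ops, pcpuc_entries);
}

/* Small self-check: the member must start on a cache-line boundary. */
void demo_ops_selftest(void)
{
	assert(demo_entries_offset() % DEMO_CACHE_BYTES == 0);
}
```

Deleting the two #define lines from this sketch should reproduce the same class of error on gcc, which is why an architecture that pulls in cache.h indirectly hides the problem.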

I wonder also why #include <net/dst.h> is needed at all in this
driver...

diff --git a/drivers/net/wireless/b43legacy/xmit.c b/drivers/net/wireless/b43legacy/xmit.c
index 7d177d9..a261aec 100644
--- a/drivers/net/wireless/b43legacy/xmit.c
+++ b/drivers/net/wireless/b43legacy/xmit.c
@@ -28,8 +28,6 @@
 
 */
 
-#include <net/dst.h>
-
 #include "xmit.h"
 #include "phy.h"
 #include "dma.h"


* Re: [v3 RFC PATCH 0/4] Implement multiqueue virtio-net
From: Krishna Kumar2 @ 2010-10-25 15:50 UTC (permalink / raw)
  To: Krishna Kumar2
  Cc: anthony, arnd, avi, davem, eric.dumazet, kvm, mst, netdev, rusty
In-Reply-To: <20101020085452.15579.76002.sendpatchset@krkumar2.in.ibm.com>

> Krishna Kumar2/India/IBM@IBMIN wrote on 10/20/2010 02:24:52 PM:

Any feedback, comments, objections, issues or bugs about the
patches? Please let me know if something needs to be done.

Some more test results:
_____________________________________________________
         Host->Guest BW (numtxqs=2)
#       BW%     CPU%    RCPU%   SD%     RSD%
_____________________________________________________
1       5.53    .31     .67     -5.88   0
2       -2.11   -1.01   -2.08   4.34    0
4       13.53   10.77   13.87   -1.96   0
8       34.22   22.80   30.53   -8.46   -2.50
16      30.89   24.06   35.17   -5.20   3.20
24      33.22   26.30   43.39   -5.17   7.58
32      30.85   27.27   47.74   -.59    15.51
40      33.80   27.33   48.00   -7.42   7.59
48      45.93   26.33   45.46   -12.24  1.10
64      33.51   27.11   45.00   -3.27   10.30
80      39.28   29.21   52.33   -4.88   12.17
96      32.05   31.01   57.72   -1.02   19.05
128     35.66   32.04   60.00   -.66    20.41
_____________________________________________________
BW: 23.5%  CPU/RCPU: 28.6%,51.2%  SD/RSD: -2.6%,15.8%

____________________________________________________
Guest->Host 512 byte (numtxqs=2):
#       BW%     CPU%    RCPU%   SD%     RSD%
_____________________________________________________
1       3.02    -3.84   -4.76   -12.50  -7.69
2       52.77   -15.73  -8.66   -45.31  -40.33
4       -23.14  13.84   7.50    50.58   40.81
8       -21.44  28.08   16.32   63.06   47.43
16      33.53   46.50   27.19   7.61    -6.60
24      55.77   42.81   30.49   -8.65   -16.48
32      52.59   38.92   29.08   -9.18   -15.63
40      50.92   36.11   28.92   -10.59  -15.30
48      46.63   34.73   28.17   -7.83   -12.32
64      45.56   37.12   28.81   -5.05   -10.80
80      44.55   36.60   28.45   -4.95   -10.61
96      43.02   35.97   28.89   -.11    -5.31
128     38.54   33.88   27.19   -4.79   -9.54
_____________________________________________________
BW: 34.4%  CPU/RCPU: 35.9%,27.8%  SD/RSD: -4.1%,-9.3%


Thanks,

- KK



> [v3 RFC PATCH 0/4] Implement multiqueue virtio-net
>
> The following set of patches implements transmit MQ in virtio-net.  Also
> included are the user qemu changes.  MQ is disabled by default unless
> qemu specifies it.
>
>                   Changes from rev2:
>                   ------------------
> 1. Define (in virtio_net.h) the maximum send txqs; and use in
>    virtio-net and vhost-net.
> 2. vi->sq[i] is allocated individually, resulting in cache line
>    aligned sq[0] to sq[n].  Another option was to define
>    'send_queue' as:
>        struct send_queue {
>                struct virtqueue *svq;
>                struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
>        } ____cacheline_aligned_in_smp;
>    and to statically allocate 'VIRTIO_MAX_SQ' of those.  I hope
>    the submitted method is preferable.
> 3. Changed vhost model such that vhost[0] handles RX and vhost[1-MAX]
>    handles TX[0-n].
> 4. Further change TX handling such that vhost[0] handles both RX/TX
>    for single stream case.
>
>                   Enabling MQ on virtio:
>                   -----------------------
> When the following options are passed to qemu:
>         - smp > 1
>         - vhost=on
>         - mq=on (new option, default:off)
> then #txqueues = #cpus.  The #txqueues can be changed by using an
> optional 'numtxqs' option.  e.g. for a smp=4 guest:
>         vhost=on                   ->   #txqueues = 1
>         vhost=on,mq=on             ->   #txqueues = 4
>         vhost=on,mq=on,numtxqs=2   ->   #txqueues = 2
>         vhost=on,mq=on,numtxqs=8   ->   #txqueues = 8
>
>
>                    Performance (guest -> local host):
>                    -----------------------------------
> System configuration:
>         Host:  8 Intel Xeon, 8 GB memory
>         Guest: 4 cpus, 2 GB memory
> Test: Each test case runs for 60 secs, sum over three runs (except
> when number of netperf sessions is 1, which has 10 runs of 12 secs
> each).  No tuning (default netperf) other than taskset vhost's to
> cpus 0-3.  numtxqs=32 gave the best results though the guest had
> only 4 vcpus (I haven't tried beyond that).
>
> ______________ numtxqs=2, vhosts=3  ____________________
> #sessions  BW%      CPU%    RCPU%    SD%      RSD%
> ________________________________________________________
> 1          4.46    -1.96     .19     -12.50   -6.06
> 2          4.93    -1.16    2.10      0       -2.38
> 4          46.17    64.77   33.72     19.51   -2.48
> 8          47.89    70.00   36.23     41.46    13.35
> 16         48.97    80.44   40.67     21.11   -5.46
> 24         49.03    78.78   41.22     20.51   -4.78
> 32         51.11    77.15   42.42     15.81   -6.87
> 40         51.60    71.65   42.43     9.75    -8.94
> 48         50.10    69.55   42.85     11.80   -5.81
> 64         46.24    68.42   42.67     14.18   -3.28
> 80         46.37    63.13   41.62     7.43    -6.73
> 96         46.40    63.31   42.20     9.36    -4.78
> 128        50.43    62.79   42.16     13.11   -1.23
> ________________________________________________________
> BW: 37.2%,  CPU/RCPU: 66.3%,41.6%,  SD/RSD: 11.5%,-3.7%
>
> ______________ numtxqs=8, vhosts=5  ____________________
> #sessions   BW%      CPU%     RCPU%     SD%      RSD%
> ________________________________________________________
> 1           -.76    -1.56     2.33      0        3.03
> 2           17.41    11.11    11.41     0       -4.76
> 4           42.12    55.11    30.20     19.51    .62
> 8           54.69    80.00    39.22     24.39    -3.88
> 16          54.77    81.62    40.89     20.34    -6.58
> 24          54.66    79.68    41.57     15.49    -8.99
> 32          54.92    76.82    41.79     17.59    -5.70
> 40          51.79    68.56    40.53     15.31    -3.87
> 48          51.72    66.40    40.84     9.72     -7.13
> 64          51.11    63.94    41.10     5.93     -8.82
> 80          46.51    59.50    39.80     9.33     -4.18
> 96          47.72    57.75    39.84     4.20     -7.62
> 128         54.35    58.95    40.66     3.24     -8.63
> ________________________________________________________
> BW: 38.9%,  CPU/RCPU: 63.0%,40.1%,  SD/RSD: 6.0%,-7.4%
>
> ______________ numtxqs=16, vhosts=5  ___________________
> #sessions   BW%      CPU%     RCPU%     SD%      RSD%
> ________________________________________________________
> 1           -1.43    -3.52    1.55      0          3.03
> 2           33.09     21.63   20.12    -10.00     -9.52
> 4           67.17     94.60   44.28     19.51     -11.80
> 8           75.72     108.14  49.15     25.00     -10.71
> 16          80.34     101.77  52.94     25.93     -4.49
> 24          70.84     93.12   43.62     27.63     -5.03
> 32          69.01     94.16   47.33     29.68     -1.51
> 40          58.56     63.47   25.91    -3.92      -25.85
> 48          61.16     74.70   34.88     .89       -22.08
> 64          54.37     69.09   26.80    -6.68      -30.04
> 80          36.22     22.73   -2.97    -8.25      -27.23
> 96          41.51     50.59   13.24     9.84      -16.77
> 128         48.98     38.15   6.41     -.33       -22.80
> ________________________________________________________
> BW: 46.2%,  CPU/RCPU: 55.2%,18.8%,  SD/RSD: 1.2%,-22.0%
>
> ______________ numtxqs=32, vhosts=5  ___________________
> #            BW%       CPU%    RCPU%    SD%     RSD%
> ________________________________________________________
> 1            7.62     -38.03   -26.26  -50.00   -33.33
> 2            28.95     20.46    21.62   0       -7.14
> 4            84.05     60.79    45.74  -2.43    -12.42
> 8            86.43     79.57    50.32   15.85   -3.10
> 16           88.63     99.48    58.17   9.47    -13.10
> 24           74.65     80.87    41.99  -1.81    -22.89
> 32           63.86     59.21    23.58  -18.13   -36.37
> 40           64.79     60.53    22.23  -15.77   -35.84
> 48           49.68     26.93    .51    -36.40   -49.61
> 64           54.69     36.50    5.41   -26.59   -43.23
> 80           45.06     12.72   -13.25  -37.79   -52.08
> 96           40.21    -3.16    -24.53  -39.92   -52.97
> 128          36.33    -33.19   -43.66  -5.68    -20.49
> ________________________________________________________
> BW: 49.3%,  CPU/RCPU: 15.5%,-8.2%,  SD/RSD: -22.2%,-37.0%
>
>
> Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
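
The queue-count rule in the quoted cover letter can be sketched as a small C function. This is only an illustration: struct demo_mq_opts and demo_num_txqueues() are hypothetical names, numtxqs == 0 is assumed to mean "not specified", and the maximum-txqs cap mentioned in the changelog is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical option bundle; fields mirror the qemu options quoted above. */
struct demo_mq_opts {
	unsigned int smp;      /* number of guest vcpus */
	bool vhost;            /* vhost=on */
	bool mq;               /* mq=on (new option, default off) */
	unsigned int numtxqs;  /* assumption: 0 means "not specified" */
};

/* Number of TX queues the guest would get under the quoted rule. */
unsigned int demo_num_txqueues(const struct demo_mq_opts *o)
{
	/* MQ only takes effect with smp > 1, vhost=on and mq=on. */
	if (o->smp <= 1 || !o->vhost || !o->mq)
		return 1;

	/* An explicit numtxqs overrides the default of one queue per cpu. */
	return o->numtxqs ? o->numtxqs : o->smp;
}

/* Self-check against the default case for a smp=4 guest. */
void demo_mq_selftest(void)
{
	struct demo_mq_opts o = { 4, true, true, 0 };

	assert(demo_num_txqueues(&o) == 4);
}
```

The four smp=4 cases listed in the cover letter (1, 4, 2 and 8 queues) all follow from this rule.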



* Re: [PATCH] drivers: rtl818x: request DMA-able memory
From: Hin-Tak Leung @ 2010-10-25 15:39 UTC (permalink / raw)
  To: John W. Linville
  Cc: Larry Finger, Serafeim Zanikolas, herton, joe, davem,
	linux-wireless, netdev, linux-kernel
In-Reply-To: <20101025142232.GC2414@tuxdriver.com>


--- On Mon, 25/10/10, John W. Linville <linville@tuxdriver.com> wrote:

> > I had a quick look for similar constructs and AFAIK only the
> > b43/b43legacy drivers use DMA buffers. Seems to be a rare practice.
> > Is that something we should or should not do?
> 
> It doesn't mean what you think it means.  It is a relic of the past,
> used to indicate memory below 16MB so that ISA devices could do DMA.

Okay, sorry about the confusion. I was grepping for GFP_DMA; only b43/b43legacy have it, and it is relatively rare. AFAIK none of the rtl8187 devices are non-USB, so this is probably a NACK then, but I should ask Serafeim whether there is a reason for submitting this patch (other than "it says dma").

Hin-tak


      


* Re: [PATCH] net: b43legacy: fix compile error
From: Larry Finger @ 2010-10-25 15:32 UTC (permalink / raw)
  To: Arnd Hannemann; +Cc: David S. Miller, netdev, linux-kernel, linux-wireless
In-Reply-To: <1288017690-31248-1-git-send-email-arnd@arndnet.de>

On 10/25/2010 09:41 AM, Arnd Hannemann wrote:
> On today's Linus tree, the following compile error happened to me:
> 
>   CC [M]  drivers/net/wireless/b43legacy/xmit.o
> In file included from include/net/dst.h:11,
>                  from drivers/net/wireless/b43legacy/xmit.c:31:
> include/net/dst_ops.h:28: error: expected ':', ',', ';', '}' or '__attribute__' before '____cacheline_aligned_in_smp'
> include/net/dst_ops.h: In function 'dst_entries_get_fast':
> include/net/dst_ops.h:33: error: 'struct dst_ops' has no member named 'pcpuc_entries'
> include/net/dst_ops.h: In function 'dst_entries_get_slow':
> include/net/dst_ops.h:41: error: 'struct dst_ops' has no member named 'pcpuc_entries'
> include/net/dst_ops.h: In function 'dst_entries_add':
> include/net/dst_ops.h:49: error: 'struct dst_ops' has no member named 'pcpuc_entries'
> include/net/dst_ops.h: In function 'dst_entries_init':
> include/net/dst_ops.h:55: error: 'struct dst_ops' has no member named 'pcpuc_entries'
> include/net/dst_ops.h: In function 'dst_entries_destroy':
> include/net/dst_ops.h:60: error: 'struct dst_ops' has no member named 'pcpuc_entries'
> make[4]: *** [drivers/net/wireless/b43legacy/xmit.o] Error 1
> make[3]: *** [drivers/net/wireless/b43legacy] Error 2
> make[2]: *** [drivers/net/wireless] Error 2
> make[1]: *** [drivers/net] Error 2
> make: *** [drivers] Error 2
> 
> This patch fixes this issue by adding "linux/cache.h" as an include to
> "include/net/dst_ops.h".

Strange. Compiling b43legacy from the linux-2.6.git tree (git describe is
v2.6.36-4464-g229aebb) works fine on x86_64. I wonder what is different.

Larry



* [PATCH] net: b43legacy: fix compile error
From: Arnd Hannemann @ 2010-10-25 14:41 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, linux-kernel, linux-wireless, Arnd Hannemann

On today's Linus tree, the following compile error happened to me:

  CC [M]  drivers/net/wireless/b43legacy/xmit.o
In file included from include/net/dst.h:11,
                 from drivers/net/wireless/b43legacy/xmit.c:31:
include/net/dst_ops.h:28: error: expected ':', ',', ';', '}' or '__attribute__' before '____cacheline_aligned_in_smp'
include/net/dst_ops.h: In function 'dst_entries_get_fast':
include/net/dst_ops.h:33: error: 'struct dst_ops' has no member named 'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_get_slow':
include/net/dst_ops.h:41: error: 'struct dst_ops' has no member named 'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_add':
include/net/dst_ops.h:49: error: 'struct dst_ops' has no member named 'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_init':
include/net/dst_ops.h:55: error: 'struct dst_ops' has no member named 'pcpuc_entries'
include/net/dst_ops.h: In function 'dst_entries_destroy':
include/net/dst_ops.h:60: error: 'struct dst_ops' has no member named 'pcpuc_entries'
make[4]: *** [drivers/net/wireless/b43legacy/xmit.o] Error 1
make[3]: *** [drivers/net/wireless/b43legacy] Error 2
make[2]: *** [drivers/net/wireless] Error 2
make[1]: *** [drivers/net] Error 2
make: *** [drivers] Error 2

This patch fixes this issue by adding "linux/cache.h" as an include to
"include/net/dst_ops.h".

Signed-off-by: Arnd Hannemann <arnd@arndnet.de>
---
 include/net/dst_ops.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
index 1fa5306..51665b3 100644
--- a/include/net/dst_ops.h
+++ b/include/net/dst_ops.h
@@ -2,6 +2,7 @@
 #define _NET_DST_OPS_H
 #include <linux/types.h>
 #include <linux/percpu_counter.h>
+#include <linux/cache.h>
 
 struct dst_entry;
 struct kmem_cachep;
-- 
1.7.0.4


* Re: [PATCH] drivers: rtl818x: request DMA-able memory
From: John W. Linville @ 2010-10-25 14:22 UTC (permalink / raw)
  To: Hin-Tak Leung
  Cc: Larry Finger, Serafeim Zanikolas, herton, joe, davem,
	linux-wireless, netdev, linux-kernel
In-Reply-To: <4CC5900A.2050209@users.sourceforge.net>

On Mon, Oct 25, 2010 at 03:11:22PM +0100, Hin-Tak Leung wrote:
> 
> 
> Larry Finger wrote:
> >On 10/24/2010 03:32 PM, Serafeim Zanikolas wrote:
> >>Despite the indicated intention in comment, the kmalloc() call was not
> >>explicitly requesting memory from ZONE_DMA.
> >>
> >>Signed-off-by: Serafeim Zanikolas <sez@debian.org>
> >>---
> >> drivers/net/wireless/rtl818x/rtl8187_dev.c |    3 ++-
> >> 1 files changed, 2 insertions(+), 1 deletions(-)
> >>
> >>diff --git a/drivers/net/wireless/rtl818x/rtl8187_dev.c b/drivers/net/wireless/rtl818x/rtl8187_dev.c
> >>index 38fa824..771794d 100644
> >>--- a/drivers/net/wireless/rtl818x/rtl8187_dev.c
> >>+++ b/drivers/net/wireless/rtl818x/rtl8187_dev.c
> >>@@ -1343,7 +1343,8 @@ static int __devinit rtl8187_probe(struct usb_interface *intf,
> >> 	priv->is_rtl8187b = (id->driver_info == DEVICE_RTL8187B);
> >> 	/* allocate "DMA aware" buffer for register accesses */
> >>-	priv->io_dmabuf = kmalloc(sizeof(*priv->io_dmabuf), GFP_KERNEL);
> >>+	priv->io_dmabuf = kmalloc(sizeof(*priv->io_dmabuf),
> >>+				  GFP_DMA | GFP_KERNEL);
> >> 	if (!priv->io_dmabuf) {
> >> 		err = -ENOMEM;
> >> 		goto err_free_dev;
> >
> >ACK.
> >
> >Larry
> 
> Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>
> 
> I had a quick look for similar constructs and AFAIK only the
> b43/b43legacy drivers use DMA buffers. Seems to be a rare practice.
> Is that something we should or should not do?

It doesn't mean what you think it means.  It is a relic of the past,
used to indicate memory below 16MB so that ISA devices could do DMA.

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.


* Re: [PATCH] drivers: rtl818x: request DMA-able memory
From: Johannes Berg @ 2010-10-25 14:23 UTC (permalink / raw)
  To: Hin-Tak Leung
  Cc: Larry Finger, Serafeim Zanikolas, herton, linville, joe, davem,
	linux-wireless, netdev, linux-kernel
In-Reply-To: <4CC5900A.2050209@users.sourceforge.net>

On Mon, 2010-10-25 at 15:11 +0100, Hin-Tak Leung wrote:

> >> Despite the indicated intention in comment, the kmalloc() call was not
> >> explicitly requesting memory from ZONE_DMA.

> I had a quick look for similar constructs and AFAIK only the b43/b43legacy
> drivers use DMA buffers. Seems to be a rare practice. Is that something we
> should or should not do?

I think there's some confusion here about ZONE_DMA vs. DMA-able memory.
All memory you get with kmalloc can be used for DMA, GFP_DMA means using
ZONE_DMA which is a hack for ISA (and in b43 maybe PCMCIA/Cardbus)
devices to put memory into something they can address. I don't think the
latter is necessary for USB devices.
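
To make the 16MB figure concrete: ISA DMA drives 24 address bits, so it reaches 1 << 24 bytes. A userspace sketch of that arithmetic follows; demo_dma_reachable() and the DEMO_* constants are hypothetical names for illustration, not a kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ISA DMA can drive 24 address bits, i.e. the first 16MB of memory. */
#define DEMO_ISA_DMA_BITS  24u
#define DEMO_ISA_DMA_LIMIT (UINT64_C(1) << DEMO_ISA_DMA_BITS)

/*
 * True if the buffer [phys, phys + len) lies entirely within the range
 * that a device with addr_bits address bits can reach.
 */
bool demo_dma_reachable(uint64_t phys, uint64_t len, unsigned int addr_bits)
{
	uint64_t limit;

	if (len == 0)
		return false;
	if (addr_bits >= 64)
		return len - 1 <= UINT64_MAX - phys; /* only wrap-around matters */

	limit = UINT64_C(1) << addr_bits;
	return phys < limit && len <= limit - phys;
}

/* Self-check: the 24-bit limit is exactly 16MB. */
void demo_dma_selftest(void)
{
	assert(DEMO_ISA_DMA_LIMIT == UINT64_C(16) * 1024 * 1024);
}
```

A page at the 16MB mark fails the 24-bit check but passes a 32-bit one, which is the only case where restricting the allocation to the low zone would help.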

johannes


* Re: [PATCH] drivers: rtl818x: request DMA-able memory
From: Hin-Tak Leung @ 2010-10-25 14:11 UTC (permalink / raw)
  To: Larry Finger
  Cc: Serafeim Zanikolas, herton, linville, joe, davem, linux-wireless,
	netdev, linux-kernel
In-Reply-To: <4CC5851D.1040204@lwfinger.net>



Larry Finger wrote:
> On 10/24/2010 03:32 PM, Serafeim Zanikolas wrote:
>> Despite the indicated intention in comment, the kmalloc() call was not
>> explicitly requesting memory from ZONE_DMA.
>>
>> Signed-off-by: Serafeim Zanikolas <sez@debian.org>
>> ---
>>  drivers/net/wireless/rtl818x/rtl8187_dev.c |    3 ++-
>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/net/wireless/rtl818x/rtl8187_dev.c b/drivers/net/wireless/rtl818x/rtl8187_dev.c
>> index 38fa824..771794d 100644
>> --- a/drivers/net/wireless/rtl818x/rtl8187_dev.c
>> +++ b/drivers/net/wireless/rtl818x/rtl8187_dev.c
>> @@ -1343,7 +1343,8 @@ static int __devinit rtl8187_probe(struct usb_interface *intf,
>>  	priv->is_rtl8187b = (id->driver_info == DEVICE_RTL8187B);
>>  
>>  	/* allocate "DMA aware" buffer for register accesses */
>> -	priv->io_dmabuf = kmalloc(sizeof(*priv->io_dmabuf), GFP_KERNEL);
>> +	priv->io_dmabuf = kmalloc(sizeof(*priv->io_dmabuf),
>> +				  GFP_DMA | GFP_KERNEL);
>>  	if (!priv->io_dmabuf) {
>>  		err = -ENOMEM;
>>  		goto err_free_dev;
> 
> ACK.
> 
> Larry

Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>

I had a quick look for similar constructs and AFAIK only the b43/b43legacy
drivers use DMA buffers. Seems to be a rare practice. Is that something we
should or should not do?

Hin-Tak


* Re: VLAN packets silently dropped in promiscuous mode
From: Guillaume Gaudonville @ 2010-10-25 13:48 UTC (permalink / raw)
  To: Jesse Gross; +Cc: Roger Luethi, netdev, Patrick McHardy
In-Reply-To: <AANLkTi=MYxSUzVUF2sf1G32Z2EcjhfyOJ4EZyv5ePGWM@mail.gmail.com>

Jesse Gross wrote:
> On Fri, Oct 15, 2010 at 2:16 AM, Guillaume Gaudonville
> <guillaume.gaudonville@6wind.com> wrote:
>   
>> Jesse Gross wrote:
>>     
>>> On Thu, Sep 30, 2010 at 1:07 AM, Roger Luethi <rl@hellgate.ch> wrote:
>>>
>>>       
>>>> On Wed, 29 Sep 2010 10:44:26 -0700, Jesse Gross wrote:
>>>>
>>>>         
>>>>> On Wed, Sep 29, 2010 at 4:37 AM, Roger Luethi <rl@hellgate.ch> wrote:
>>>>>
>>>>>           
>>>>>> I noticed packets for unknown VLANs getting silently dropped even in
>>>>>> promiscuous mode (this is true only for the hardware accelerated path).
>>>>>> netif_nit_deliver was introduced specifically to prevent that, but the
>>>>>> function gets called only _after_ packets from unknown VLANs have been
>>>>>> dropped.
>>>>>>
>>>>>>             
>>>>> Some drivers are fixing this on a case by case basis by disabling
>>>>> hardware accelerated VLAN stripping when in promiscuous mode, i.e.:
>>>>>
>>>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5f6c01819979afbfec7e0b15fe52371b8eed87e8
>>>>>
>>>>> However, at this point it is more or less random which drivers do
>>>>> this.  It would obviously be much better if it were consistent.
>>>>>
>>>>>           
>>>> My understanding is this. Hardware VLAN tagging and stripping can always
>>>> be
>>>> enabled. The kernel passes 802.1Q information along with the stripped
>>>> header to libpcap which reassembles the original header where necessary.
>>>> Works for me.
>>>>
>>>>         
>>> Sorry, I misread your original post as saying that the VLAN header
>>> gets dropped, rather than the entire packet.  I agree that this is how
>>> it should work but not necessarily how it does work (again, depending
>>> on the driver).  Here's the problem that I was talking about:
>>>
>>> Most drivers have a snippet of code that looks something like this
>>> (taken from ixgbe):
>>>
>>> if (adapter->vlgrp && is_vlan && (tag & VLAN_VID_MASK))
>>>        vlan_gro_receive(napi, adapter->vlgrp, tag, skb);
>>> else
>>>        napi_gro_receive(napi, skb);
>>>
>>> At this point the VLAN has already been stripped in hardware.  If
>>> there is no VLAN group configured on the device then we hit the second
>>> case.  The VLAN header was removed from the SKB and the tag variable
>>> is unused.  It is no longer possible for libpcap to reconstruct the
>>> header because the information was thrown away (even the fact that
>>> there was a VLAN tag at all).
>>>
>>> There are a couple ways to fix this:
>>>
>>> * Turn off VLAN stripping when in promiscuous mode (as done by the ixgbe
>>> driver)
>>>
>>>       
>> This is not totally true: when changing the MTU, ixgbe_change_mtu will call:
>> ixgbe_reinit_locked --> ixgbe_up --> ixgbe_configure:
>>                --> ixgbe_set_rx_mode: flag IFF_PROMISC is tested,
>>                    ixgbe_vlan_filter_enable is not called
>>                --> ixgbe_restore_vlan --> ixgbe_vlan_rx_register: flag
>>                    IFF_PROMISC is not tested, ixgbe_vlan_filter_enable
>>                    will be called.
>>
>> In fact, this should happen each time we configure something that needs a
>> reset of the device. Why not add a test for the promiscuous flag
>> directly in ixgbe_vlan_filter_enable? Or do it on each call, if we want
>> to allow a device in promiscuous mode to enable this feature.
>>
>> What do you think?
>>     
>
> I can believe that there are paths that lead to this not working
> correctly.  That was actually my larger point: this is something that
> is commonly not implemented correctly in drivers.  Rather than try to
> study every driver my goal is to just avoid the problem completely by
> handling vlan acceleration centrally in the networking core.  I sent
> out an RFC patch series a few days ago that should solve this problem:
>
> http://marc.info/?l=linux-netdev&m=128700022614170&w=3
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>   
Thank you, I'm going to check these patches and try to apply them in our 
kernel.
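
For reference, the header reconstruction discussed in the quoted thread only works if the stripped tag is kept alongside the packet. A minimal userspace sketch, assuming a plain Ethernet frame and the standard 802.1Q layout (TPID 0x8100 followed by the 16-bit TCI, inserted after the two MAC addresses); demo_vlan_reinsert() and the DEMO_* constants are hypothetical names, not kernel or libpcap functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN   6       /* bytes per MAC address */
#define DEMO_VLAN_HLEN  4       /* TPID (2 bytes) + TCI (2 bytes) */
#define DEMO_TPID_8021Q 0x8100  /* 802.1Q tag protocol identifier */

/*
 * Rebuild a tagged frame from an untagged one plus the TCI that the NIC
 * stripped. Returns the new length, or 0 if the input is too short or the
 * output buffer is too small.
 */
size_t demo_vlan_reinsert(const uint8_t *in, size_t in_len, uint16_t tci,
			  uint8_t *out, size_t out_cap)
{
	const size_t mac_len = 2 * DEMO_ETH_ALEN;

	if (in_len < mac_len + 2 || out_cap < in_len + DEMO_VLAN_HLEN)
		return 0;

	memcpy(out, in, mac_len);                          /* dst + src MAC */
	out[mac_len + 0] = (uint8_t)(DEMO_TPID_8021Q >> 8);   /* TPID, big endian */
	out[mac_len + 1] = (uint8_t)(DEMO_TPID_8021Q & 0xff);
	out[mac_len + 2] = (uint8_t)(tci >> 8);            /* TCI, big endian */
	out[mac_len + 3] = (uint8_t)(tci & 0xff);
	/* Original EtherType and payload follow the tag. */
	memcpy(out + mac_len + DEMO_VLAN_HLEN, in + mac_len, in_len - mac_len);
	return in_len + DEMO_VLAN_HLEN;
}

/* Self-check: a 14-byte header grows by exactly the tag length. */
void demo_vlan_selftest(void)
{
	uint8_t in[14] = { 0 };
	uint8_t out[18];

	assert(demo_vlan_reinsert(in, sizeof(in), 100, out, sizeof(out)) == 18);
}
```

If the driver discards the tag, as in the else branch of the quoted ixgbe snippet, there is no tci value left to pass to a function like this.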

Best Regards,

-- 
Guillaume Gaudonville
6WIND
Software Engineer

Tel: +33 1 39 30 92 63
Mob: +33 6 47 85 34 33
Fax: +33 1 39 30 92 11
guillaume.gaudonville@6wind.com
www.6wind.com
Join the Multicore Packet Processing Forum: www.multicorepacketprocessing.com

This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and contains information that is confidential and proprietary to 6WIND. All unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.



* RE: Question w.r.t debugfs / netdevice pass-through IOCTL
From: Shyam_Iyer @ 2010-10-25 13:48 UTC (permalink / raw)
  To: john.r.fastabend, shemminger; +Cc: ddutt, netdev
In-Reply-To: <4CC0A0E9.9090903@intel.com>



> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of John Fastabend
> Sent: Thursday, October 21, 2010 4:22 PM
> To: Stephen Hemminger
> Cc: Debashis Dutt; netdev@vger.kernel.org
> Subject: Re: Question w.r.t debugfs / netdevice pass-through IOCTL
> 
> On 10/20/2010 9:19 PM, Stephen Hemminger wrote:
> > On Wed, 20 Oct 2010 20:26:50 -0700
> > Debashis Dutt <ddutt@Brocade.COM> wrote:
> >
> >> Hi,
> >>
> >> For the Brocade 10G Ethernet driver (bna) we want to implement a set of operations which is not
> supported by current tools like ethtool.
> >>
> >> Examples of such operations would be
> >>        a) Queries related to CEE, if the link is CEE.
> 
> Assuming CEE is Converged Enhanced Ethernet here.
> 
> For CEE queries please consider using the dcbnl interface in /net/dcb/dcbnl.c. If
> it is missing an interface that would be useful to all DCB devices we could
> entertain adding it. Also this way DCB queries will work with existing tools that
> query these things lldpad/dcbtool.
> 
> The things you would want to know about a CEE device should be about the same
> regardless of the hardware in use. Let's try to use a single interface and
> avoid private interfaces.

John - I agree with this. On a side note, I would like the interface not to be netdev-device specific.
Consider the CEE device as not just an Ethernet controller but a SCSI controller as well; hence it would help if this interface were generally accessible by other subsystems.

> 
> Thanks,
> John.
> 
> >>        b) Get traces from firmware.
> >
> >>
> >> I was wondering what would be right approach to take here:
> >>                 a) use debugfs (like the Chelsio cxgb4 driver)
> > Works as long as they are really debug operations. The debugfs isn't always
> > available, and support should be a config option for your driver.
> >
> >>                 b) use SIOCDEVPRIVATE for the pass through IOCTL defined in
> >>                     struct net_device_ops{}
> >
> > The problem with ioctl is it doesn't work for 32 bit user space
> > compatibility. The ioctl compat layer does not have enough context
> > to translate SIOCDEVPRIVATE
> >
> >>                     As per comments in the header file, b) should not be used
> >>                     since this IOCTL is supposed to be deprecated.
> >>                 c) use procfs / sysfs (these may not scale, in our opinion)
> >
> > Although less common, there were drivers putting things in /proc/net/xxx/ethX
> >
> >
> >
> 


* [PATCH] net: add __rcu annotation to sk_filter
From: Eric Dumazet @ 2010-10-25 13:47 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Add __rcu annotation to :
        (struct sock)->sk_filter

And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/sock.h |    2 +-
 net/core/filter.c  |    4 ++--
 net/core/sock.c    |    2 +-
 net/ipv4/udp.c     |    2 +-
 net/ipv6/raw.c     |    2 +-
 net/ipv6/udp.c     |    2 +-
 6 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 73a4f97..c7a7362 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -301,7 +301,7 @@ struct sock {
 	const struct cred	*sk_peer_cred;
 	long			sk_rcvtimeo;
 	long			sk_sndtimeo;
-	struct sk_filter      	*sk_filter;
+	struct sk_filter __rcu	*sk_filter;
 	void			*sk_protinfo;
 	struct timer_list	sk_timer;
 	ktime_t			sk_stamp;
diff --git a/net/core/filter.c b/net/core/filter.c
index 7adf503..7beaec3 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -89,8 +89,8 @@ int sk_filter(struct sock *sk, struct sk_buff *skb)
 	rcu_read_lock_bh();
 	filter = rcu_dereference_bh(sk->sk_filter);
 	if (filter) {
-		unsigned int pkt_len = sk_run_filter(skb, filter->insns,
-				filter->len);
+		unsigned int pkt_len = sk_run_filter(skb, filter->insns, filter->len);
+
 		err = pkt_len ? pskb_trim(skb, pkt_len) : -EPERM;
 	}
 	rcu_read_unlock_bh();
diff --git a/net/core/sock.c b/net/core/sock.c
index 11db436..3eed542 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1225,7 +1225,7 @@ struct sock *sk_clone(const struct sock *sk, const gfp_t priority)
 		sock_reset_flag(newsk, SOCK_DONE);
 		skb_queue_head_init(&newsk->sk_error_queue);
 
-		filter = newsk->sk_filter;
+		filter = rcu_dereference_protected(newsk->sk_filter, 1);
 		if (filter != NULL)
 			sk_filter_charge(newsk, filter);
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index b3f7e8c..28cb2d7 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1413,7 +1413,7 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 		}
 	}
 
-	if (sk->sk_filter) {
+	if (rcu_dereference_raw(sk->sk_filter)) {
 		if (udp_lib_checksum_complete(skb))
 			goto drop;
 	}
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 45e6efb..86c3952 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -373,7 +373,7 @@ void raw6_icmp_error(struct sk_buff *skb, int nexthdr,
 
 static inline int rawv6_rcv_skb(struct sock * sk, struct sk_buff * skb)
 {
-	if ((raw6_sk(sk)->checksum || sk->sk_filter) &&
+	if ((raw6_sk(sk)->checksum || rcu_dereference_raw(sk->sk_filter)) &&
 	    skb_checksum_complete(skb)) {
 		atomic_inc(&sk->sk_drops);
 		kfree_skb(skb);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index c84dad4..91def93 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -527,7 +527,7 @@ int udpv6_queue_rcv_skb(struct sock * sk, struct sk_buff *skb)
 		}
 	}
 
-	if (sk->sk_filter) {
+	if (rcu_dereference_raw(sk->sk_filter)) {
 		if (udp_lib_checksum_complete(skb))
 			goto drop;
 	}



^ permalink raw reply related

* Re: [PATCH] drivers: rtl818x: request DMA-able memory
From: John W. Linville @ 2010-10-25 13:35 UTC (permalink / raw)
  To: Serafeim Zanikolas
  Cc: herton, htl10, Larry.Finger, joe, davem, linux-wireless, netdev,
	linux-kernel
In-Reply-To: <1287952327-9924-1-git-send-email-sez@debian.org>

On Sun, Oct 24, 2010 at 10:32:07PM +0200, Serafeim Zanikolas wrote:
> Despite the indicated intention in comment, the kmalloc() call was not
> explicitly requesting memory from ZONE_DMA.
> 
> Signed-off-by: Serafeim Zanikolas <sez@debian.org>
> ---
>  drivers/net/wireless/rtl818x/rtl8187_dev.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/wireless/rtl818x/rtl8187_dev.c b/drivers/net/wireless/rtl818x/rtl8187_dev.c
> index 38fa824..771794d 100644
> --- a/drivers/net/wireless/rtl818x/rtl8187_dev.c
> +++ b/drivers/net/wireless/rtl818x/rtl8187_dev.c
> @@ -1343,7 +1343,8 @@ static int __devinit rtl8187_probe(struct usb_interface *intf,
>  	priv->is_rtl8187b = (id->driver_info == DEVICE_RTL8187B);
>  
>  	/* allocate "DMA aware" buffer for register accesses */
> -	priv->io_dmabuf = kmalloc(sizeof(*priv->io_dmabuf), GFP_KERNEL);
> +	priv->io_dmabuf = kmalloc(sizeof(*priv->io_dmabuf),
> +				  GFP_DMA | GFP_KERNEL);
>  	if (!priv->io_dmabuf) {
>  		err = -ENOMEM;
>  		goto err_free_dev;

Are you sure about this?  Are there USB controllers out there with
the ISA 16MB limitation for DMA?

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply

* [PATCH] ipv4: add __rcu annotations to ip_ra_chain
From: Eric Dumazet @ 2010-10-25 13:32 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Add __rcu annotations to:
	(struct ip_ra_chain)->next
	struct ip_ra_chain *ip_ra_chain;

And use appropriate rcu primitives.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/ip.h       |    4 ++--
 net/ipv4/ip_sockglue.c |   10 +++++++---
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index dbee3fe..86e2b18 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -59,7 +59,7 @@ struct ipcm_cookie {
 #define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb))
 
 struct ip_ra_chain {
-	struct ip_ra_chain	*next;
+	struct ip_ra_chain __rcu *next;
 	struct sock		*sk;
 	union {
 		void			(*destructor)(struct sock *);
@@ -68,7 +68,7 @@ struct ip_ra_chain {
 	struct rcu_head		rcu;
 };
 
-extern struct ip_ra_chain *ip_ra_chain;
+extern struct ip_ra_chain __rcu *ip_ra_chain;
 
 /* IP flags. */
 #define IP_CE		0x8000		/* Flag: "Congestion"		*/
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 64b70ad..3948c86 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -238,7 +238,7 @@ int ip_cmsg_send(struct net *net, struct msghdr *msg, struct ipcm_cookie *ipc)
    but receiver should be enough clever f.e. to forward mtrace requests,
    sent to multicast group to reach destination designated router.
  */
-struct ip_ra_chain *ip_ra_chain;
+struct ip_ra_chain __rcu *ip_ra_chain;
 static DEFINE_SPINLOCK(ip_ra_lock);
 
 
@@ -253,7 +253,8 @@ static void ip_ra_destroy_rcu(struct rcu_head *head)
 int ip_ra_control(struct sock *sk, unsigned char on,
 		  void (*destructor)(struct sock *))
 {
-	struct ip_ra_chain *ra, *new_ra, **rap;
+	struct ip_ra_chain *ra, *new_ra;
+	struct ip_ra_chain __rcu **rap;
 
 	if (sk->sk_type != SOCK_RAW || inet_sk(sk)->inet_num == IPPROTO_RAW)
 		return -EINVAL;
@@ -261,7 +262,10 @@ int ip_ra_control(struct sock *sk, unsigned char on,
 	new_ra = on ? kmalloc(sizeof(*new_ra), GFP_KERNEL) : NULL;
 
 	spin_lock_bh(&ip_ra_lock);
-	for (rap = &ip_ra_chain; (ra = *rap) != NULL; rap = &ra->next) {
+	for (rap = &ip_ra_chain;
+	     (ra = rcu_dereference_protected(*rap,
+			lockdep_is_held(&ip_ra_lock))) != NULL;
+	     rap = &ra->next) {
 		if (ra->sk == sk) {
 			if (on) {
 				spin_unlock_bh(&ip_ra_lock);
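
The loop in ip_ra_control() walks the chain through a pointer to the link itself (rap) rather than a pointer to the node. A minimal userspace sketch of that idiom follows. The struct node type and the unlink_key() helper are hypothetical, and the sketch leaves out the spinlock and the RCU accessors that the kernel code uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; stands in for struct ip_ra_chain. */
struct node {
	struct node *next;
	int key;
};

/*
 * Unlink the first node with a matching key and return it, or NULL.
 * 'pp' always points at the link that leads to the current node
 * (the list head or a previous node's 'next'), so unlinking needs
 * only one store and no special case for the head.
 */
static struct node *unlink_key(struct node **head, int key)
{
	struct node **pp, *n;

	for (pp = head; (n = *pp) != NULL; pp = &n->next) {
		if (n->key == key) {
			*pp = n->next;	/* bypass the node */
			n->next = NULL;
			return n;
		}
	}
	return NULL;
}
```

Because rap points at an __rcu-annotated link, the patch has to annotate rap itself and load *rap through rcu_dereference_protected() while ip_ra_lock is held.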



^ permalink raw reply related

* Re: [PATCH] drivers: rtl818x: request DMA-able memory
From: Larry Finger @ 2010-10-25 13:24 UTC (permalink / raw)
  To: Serafeim Zanikolas
  Cc: herton, htl10, linville, joe, davem, linux-wireless, netdev,
	linux-kernel
In-Reply-To: <1287952327-9924-1-git-send-email-sez@debian.org>

On 10/24/2010 03:32 PM, Serafeim Zanikolas wrote:
> Despite the indicated intention in comment, the kmalloc() call was not
> explicitly requesting memory from ZONE_DMA.
> 
> Signed-off-by: Serafeim Zanikolas <sez@debian.org>
> ---
>  drivers/net/wireless/rtl818x/rtl8187_dev.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/wireless/rtl818x/rtl8187_dev.c b/drivers/net/wireless/rtl818x/rtl8187_dev.c
> index 38fa824..771794d 100644
> --- a/drivers/net/wireless/rtl818x/rtl8187_dev.c
> +++ b/drivers/net/wireless/rtl818x/rtl8187_dev.c
> @@ -1343,7 +1343,8 @@ static int __devinit rtl8187_probe(struct usb_interface *intf,
>  	priv->is_rtl8187b = (id->driver_info == DEVICE_RTL8187B);
>  
>  	/* allocate "DMA aware" buffer for register accesses */
> -	priv->io_dmabuf = kmalloc(sizeof(*priv->io_dmabuf), GFP_KERNEL);
> +	priv->io_dmabuf = kmalloc(sizeof(*priv->io_dmabuf),
> +				  GFP_DMA | GFP_KERNEL);
>  	if (!priv->io_dmabuf) {
>  		err = -ENOMEM;
>  		goto err_free_dev;

ACK.

Larry


^ permalink raw reply

* [PATCH] net_ns: add __rcu annotations
From: Eric Dumazet @ 2010-10-25 13:20 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Add an __rcu annotation to (struct net)->gen, and use
rcu_dereference_protected() in net_assign_generic().

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/net_namespace.h |    2 +-
 net/core/net_namespace.c    |    4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 65af9a0..1bf812b 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -88,7 +88,7 @@ struct net {
 #ifdef CONFIG_WEXT_CORE
 	struct sk_buff_head	wext_nlevents;
 #endif
-	struct net_generic	*gen;
+	struct net_generic __rcu	*gen;
 
 	/* Note : following structs are cache line aligned */
 #ifdef CONFIG_XFRM
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index c988e68..3f86026 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -42,7 +42,9 @@ static int net_assign_generic(struct net *net, int id, void *data)
 	BUG_ON(!mutex_is_locked(&net_mutex));
 	BUG_ON(id == 0);
 
-	ng = old_ng = net->gen;
+	old_ng = rcu_dereference_protected(net->gen,
+					   lockdep_is_held(&net_mutex));
+	ng = old_ng;
 	if (old_ng->len >= id)
 		goto assign;
 



^ permalink raw reply related

* [PATCH] rps: add __rcu annotations
From: Eric Dumazet @ 2010-10-25 13:02 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Add __rcu annotations to :
	(struct netdev_rx_queue)->rps_map
	(struct netdev_rx_queue)->rps_flow_table
	struct rps_sock_flow_table *rps_sock_flow_table;

And use appropriate rcu primitives.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/netdevice.h  |   12 ++++++------
 net/core/dev.c             |   12 ++++++------
 net/core/net-sysfs.c       |   20 +++++++++++++-------
 net/core/sysctl_net_core.c |    3 ++-
 4 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fcd3dda..2475206 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -585,15 +585,15 @@ static inline void rps_reset_sock_flow(struct rps_sock_flow_table *table,
 		table->ents[hash & table->mask] = RPS_NO_CPU;
 }
 
-extern struct rps_sock_flow_table *rps_sock_flow_table;
+extern struct rps_sock_flow_table __rcu *rps_sock_flow_table;
 
 /* This structure contains an instance of an RX queue. */
 struct netdev_rx_queue {
-	struct rps_map *rps_map;
-	struct rps_dev_flow_table *rps_flow_table;
-	struct kobject kobj;
-	struct netdev_rx_queue *first;
-	atomic_t count;
+	struct rps_map __rcu		*rps_map;
+	struct rps_dev_flow_table __rcu	*rps_flow_table;
+	struct kobject			kobj;
+	struct netdev_rx_queue		*first;
+	atomic_t			count;
 } ____cacheline_aligned_in_smp;
 #endif /* CONFIG_RPS */
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 78b5a89..625fde2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2413,7 +2413,7 @@ EXPORT_SYMBOL(__skb_get_rxhash);
 #ifdef CONFIG_RPS
 
 /* One global table that all flow-based protocols share. */
-struct rps_sock_flow_table *rps_sock_flow_table __read_mostly;
+struct rps_sock_flow_table __rcu *rps_sock_flow_table __read_mostly;
 EXPORT_SYMBOL(rps_sock_flow_table);
 
 /*
@@ -2425,7 +2425,7 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 		       struct rps_dev_flow **rflowp)
 {
 	struct netdev_rx_queue *rxqueue;
-	struct rps_map *map = NULL;
+	struct rps_map *map;
 	struct rps_dev_flow_table *flow_table;
 	struct rps_sock_flow_table *sock_flow_table;
 	int cpu = -1;
@@ -2444,15 +2444,15 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 	} else
 		rxqueue = dev->_rx;
 
-	if (rxqueue->rps_map) {
-		map = rcu_dereference(rxqueue->rps_map);
-		if (map && map->len == 1) {
+	map = rcu_dereference(rxqueue->rps_map);
+	if (map) {
+		if (map->len == 1) {
 			tcpu = map->cpus[0];
 			if (cpu_online(tcpu))
 				cpu = tcpu;
 			goto done;
 		}
-	} else if (!rxqueue->rps_flow_table) {
+	} else if (!rcu_dereference_raw(rxqueue->rps_flow_table)) {
 		goto done;
 	}
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index b143173..a5ff5a8 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -598,7 +598,8 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
 	}
 
 	spin_lock(&rps_map_lock);
-	old_map = queue->rps_map;
+	old_map = rcu_dereference_protected(queue->rps_map,
+					    lockdep_is_held(&rps_map_lock));
 	rcu_assign_pointer(queue->rps_map, map);
 	spin_unlock(&rps_map_lock);
 
@@ -677,7 +678,8 @@ static ssize_t store_rps_dev_flow_table_cnt(struct netdev_rx_queue *queue,
 		table = NULL;
 
 	spin_lock(&rps_dev_flow_lock);
-	old_table = queue->rps_flow_table;
+	old_table = rcu_dereference_protected(queue->rps_flow_table,
+					      lockdep_is_held(&rps_dev_flow_lock));
 	rcu_assign_pointer(queue->rps_flow_table, table);
 	spin_unlock(&rps_dev_flow_lock);
 
@@ -705,13 +707,17 @@ static void rx_queue_release(struct kobject *kobj)
 {
 	struct netdev_rx_queue *queue = to_rx_queue(kobj);
 	struct netdev_rx_queue *first = queue->first;
+	struct rps_map *map;
+	struct rps_dev_flow_table *flow_table;
 
-	if (queue->rps_map)
-		call_rcu(&queue->rps_map->rcu, rps_map_release);
 
-	if (queue->rps_flow_table)
-		call_rcu(&queue->rps_flow_table->rcu,
-		    rps_dev_flow_table_release);
+	map = rcu_dereference_raw(queue->rps_map);
+	if (map)
+		call_rcu(&map->rcu, rps_map_release);
+
+	flow_table = rcu_dereference_raw(queue->rps_flow_table);
+	if (flow_table)
+		call_rcu(&flow_table->rcu, rps_dev_flow_table_release);
 
 	if (atomic_dec_and_test(&first->count))
 		kfree(first);
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 01eee5d..385b609 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -34,7 +34,8 @@ static int rps_sock_flow_sysctl(ctl_table *table, int write,
 
 	mutex_lock(&sock_flow_mutex);
 
-	orig_sock_table = rps_sock_flow_table;
+	orig_sock_table = rcu_dereference_protected(rps_sock_flow_table,
+					lockdep_is_held(&sock_flow_mutex));
 	size = orig_size = orig_sock_table ? orig_sock_table->mask + 1 : 0;
 
 	ret = proc_dointvec(&tmp, write, buffer, lenp, ppos);
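
The reader/updater split used throughout this patch can be approximated in userspace with C11 atomics. This is only an analogy under stated assumptions: struct map, struct rx_queue, reader_len(), replace_map() and the update_lock_held flag are hypothetical names, an acquire load stands in for rcu_dereference(), a relaxed load stands in for rcu_dereference_protected(), and a release store stands in for rcu_assign_pointer(). Grace periods and real locking are not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical payload; stands in for struct rps_map. */
struct map {
	int len;
};

struct rx_queue {
	_Atomic(struct map *) map;	/* plays the role of an __rcu pointer */
};

static bool update_lock_held;	/* stand-in for lockdep_is_held() */

/* Reader side: load the pointer once and use only the local copy. */
static int reader_len(struct rx_queue *q)
{
	struct map *m = atomic_load_explicit(&q->map, memory_order_acquire);

	return m ? m->len : 0;
}

/* Update side: the caller holds the lock, so a relaxed load is enough. */
static struct map *replace_map(struct rx_queue *q, struct map *new_map)
{
	struct map *old;

	assert(update_lock_held);
	old = atomic_load_explicit(&q->map, memory_order_relaxed);
	atomic_store_explicit(&q->map, new_map, memory_order_release);
	return old;	/* caller may free it once no reader can still see it */
}
```

The reader mirrors the get_rps_cpu() change above: the pointer is fetched once and then tested, instead of being tested and fetched in two separate reads.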



^ permalink raw reply related

* [PATCH] mlx4_en: Fix out of bounds array access
From: Eli Cohen @ 2010-10-25 12:56 UTC (permalink / raw)
  To: davem, yevgenyp; +Cc: Roland Dreier, RDMA list, netdev

When searching for a free entry in either mlx4_register_vlan() or
mlx4_register_mac(), and there is no free entry, the loop terminates without
updating the local variable free, which is then used as an index and causes
an out-of-bounds array access. Fix this by adding a proper check outside the
loop.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
---
 drivers/net/mlx4/port.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/net/mlx4/port.c b/drivers/net/mlx4/port.c
index 56371ef..4513395 100644
--- a/drivers/net/mlx4/port.c
+++ b/drivers/net/mlx4/port.c
@@ -111,6 +111,12 @@ int mlx4_register_mac(struct mlx4_dev *dev, u8 port, u64 mac, int *index)
 			goto out;
 		}
 	}
+
+	if (free < 0) {
+		err = -ENOMEM;
+		goto out;
+	}
+
 	mlx4_dbg(dev, "Free MAC index is %d\n", free);
 
 	if (table->total == table->max) {
@@ -224,6 +230,11 @@ int mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan, int *index)
 		}
 	}
 
+	if (free < 0) {
+		err = -ENOMEM;
+		goto out;
+	}
+
 	if (table->total == table->max) {
 		/* No free vlan entries */
 		err = -ENOSPC;
-- 
1.7.3.1
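
The failure mode described in this patch can be reproduced in a small userspace sketch. The table layout, the TABLE_SIZE constant and the find_slot() helper are hypothetical and only mimic the shape of the search loop; this is not the mlx4 code, and it assumes that keys are non-zero so that 0 can mark an unused slot.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TABLE_SIZE 4	/* hypothetical; stands in for the MAC/VLAN table size */

struct slot_table {
	uint64_t entries[TABLE_SIZE];	/* 0 means "unused" in this sketch */
};

/*
 * Return the index where 'key' is stored, or a negative errno.
 * 'free' starts at -1 and stays there if the loop finds no unused slot,
 * so it must be checked before it is used as an array index.
 */
static int find_slot(struct slot_table *t, uint64_t key)
{
	int free = -1;
	int i;

	for (i = 0; i < TABLE_SIZE; i++) {
		if (free < 0 && t->entries[i] == 0) {
			free = i;	/* remember the first unused slot */
			continue;
		}
		if (t->entries[i] == key)
			return i;	/* already registered */
	}

	if (free < 0)
		return -ENOMEM;	/* without this check, entries[-1] would be written */

	t->entries[free] = key;
	return free;
}
```

Filling all four slots and then registering a fifth key returns -ENOMEM instead of indexing the table with -1.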


^ permalink raw reply related

* RE: [PATCH] macb: Don't re-enable interrupts while in polling mode
From: Joshua Hoke @ 2010-10-25 11:44 UTC (permalink / raw)
  To: David Miller
  Cc: nicolas.ferre, jpirko, peter.korsgaard, eric.dumazet,
	haavard.skinnemoen, netdev, linux-kernel, akpm
In-Reply-To: <20101024.152306.48512783.davem@davemloft.net>

> Your email client is breaking up long lines which corrupts the
> patch.
> 
> Please fix this up in your email client and resubmit.

Thanks for pointing this out. It looks fine in my sent folder but here's
a second try. In case this doesn't work, I've attached the contents to
the bug as: https://bugzilla.kernel.org/attachment.cgi?id=34972

Contents of first e-mail follow, maybe not mangled this time.



From: Joshua Hoke <joshua.hoke@sixnet.com>

[PATCH] macb: Don't re-enable interrupts while in polling mode

On a busy network, the macb driver could get stuck in the interrupt
handler, quickly triggering the watchdog, due to a confluence of
factors:

 1. macb_poll re-enables interrupts unconditionally, even when it will
    be called again because it exhausted its rx budget

 2. macb_interrupt only disables interrupts after scheduling
    macb_poll, but scheduling fails when macb_poll is already scheduled
    because it didn't call napi_complete

 3. macb_interrupt loops until the interrupt status register is clear,
    which will never happen in this case if the driver doesn't disable
    the RX interrupt

Since macb_interrupt runs in interrupt context, this effectively locks
up the machine, triggering the hardware watchdog.

This issue was readily reproducible on a flooded network with a
modified 2.6.27.48 kernel. The same problem appears to still be in the
2.6.36-rc8 driver code, so I am submitting this patch against that
version. I have not tested this version of the patch except to make
sure the kernel compiles.

Signed-off-by: Joshua Hoke <joshua.hoke@sixnet.com>

---

I'm submitting this at the request of Andrew Morton in this bug:

  https://bugzilla.kernel.org/show_bug.cgi?id=20732

This version of the patch applies to 2.6.36-rc8 but has not been
tested. In particular I am assuming that napi_schedule_prep() behaves
the same as netif_rx_schedule_prep() did by failing when the macb_poll
callback is already scheduled.

diff --git a/drivers/net/macb.c b/drivers/net/macb.c
index ff2f158..36cf594 100644
--- a/drivers/net/macb.c
+++ b/drivers/net/macb.c
@@ -515,14 +515,15 @@ static int macb_poll(struct napi_struct *napi, int budget)
 		(unsigned long)status, budget);
 
 	work_done = macb_rx(bp, budget);
-	if (work_done < budget)
+	if (work_done < budget) {
 		napi_complete(napi);
 
-	/*
-	 * We've done what we can to clean the buffers. Make sure we
-	 * get notified when new packets arrive.
-	 */
-	macb_writel(bp, IER, MACB_RX_INT_FLAGS);
+		/*
+		 * We've done what we can to clean the buffers. Make sure we
+		 * get notified when new packets arrive.
+		 */
+		macb_writel(bp, IER, MACB_RX_INT_FLAGS);
+	}
 
 	/* TODO: Handle errors */
 
@@ -550,12 +551,16 @@ static irqreturn_t macb_interrupt(int irq, void *dev_id)
 		}
 
 		if (status & MACB_RX_INT_FLAGS) {
+			/*
+			 * There's no point taking any more interrupts
+			 * until we have processed the buffers. The
+			 * scheduling call may fail if the poll routine
+			 * is already scheduled, so disable interrupts
+			 * now.
+			 */
+			macb_writel(bp, IDR, MACB_RX_INT_FLAGS);
+
 			if (napi_schedule_prep(&bp->napi)) {
-				/*
-				 * There's no point taking any more interrupts
-				 * until we have processed the buffers
-				 */
-				macb_writel(bp, IDR, MACB_RX_INT_FLAGS);
 				dev_dbg(&bp->pdev->dev,
 					"scheduling RX softirq\n");
 				__napi_schedule(&bp->napi);
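
The ordering that this patch establishes can be checked with a small state model. Everything below is a hypothetical userspace sketch: struct nic_model and the model_*() functions are invented names that imitate the RX interrupt mask, the NAPI scheduled state and the RX ring, not the macb driver itself.

```c
#include <assert.h>
#include <stdbool.h>

struct nic_model {
	bool rx_irq_enabled;	/* models the RX bits of the interrupt mask */
	bool poll_scheduled;	/* models the NAPI "scheduled" state */
	int pending;		/* packets waiting in the RX ring */
};

/* Models napi_schedule_prep(): fails if the poll routine is already scheduled. */
static bool model_schedule_prep(struct nic_model *n)
{
	if (n->poll_scheduled)
		return false;
	n->poll_scheduled = true;
	return true;
}

/* Interrupt handler, patched ordering: mask RX first, then try to schedule. */
static void model_interrupt(struct nic_model *n)
{
	n->rx_irq_enabled = false;
	(void)model_schedule_prep(n);
}

/* Poll routine, patched: unmask only when the ring was drained within budget. */
static int model_poll(struct nic_model *n, int budget)
{
	int done = n->pending < budget ? n->pending : budget;

	n->pending -= done;
	if (done < budget) {
		n->poll_scheduled = false;	/* napi_complete() */
		n->rx_irq_enabled = true;	/* re-enable RX interrupts */
	}
	return done;
}
```

In this model the RX interrupt stays masked for as long as the poll routine remains scheduled, so the handler cannot spin on an interrupt source that nothing will clear.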

^ permalink raw reply related

* [PATCH net-next-2.6 2/2] be2net: Schedule/Destroy worker thread in probe()/remove() rather than open()/close()
From: Somnath Kotur @ 2010-10-25 11:13 UTC (permalink / raw)
  To: netdev

When async MCC completions are received on an interface that is down (and so
interrupts are disabled), they just lie unprocessed in the completion queue.
The completion queue can eventually fill up and cause the BE to lock up. The
fix is to use be_worker to reap MCC completions when the interface is down.
be_worker is now launched in be_probe() and canceled in be_remove().

Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
---
 drivers/net/benet/be_main.c |   22 ++++++++++++++++++----
 1 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 1262292..2546273 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -1809,6 +1809,20 @@ static void be_worker(struct work_struct *work)
 	struct be_rx_obj *rxo;
 	int i;
 
+	/* when interrupts are not yet enabled, just reap any pending
+	* mcc completions */
+	if (!netif_running(adapter->netdev)) {
+		int mcc_compl, status = 0;
+
+		mcc_compl = be_process_mcc(adapter, &status);
+
+		if (mcc_compl) {
+			struct be_mcc_obj *mcc_obj = &adapter->mcc_obj;
+			be_cq_notify(adapter, mcc_obj->cq.id, false, mcc_compl);
+		}
+		goto reschedule;
+	}
+
 	if (!adapter->stats_ioctl_sent)
 		be_cmd_get_stats(adapter, &adapter->stats_cmd);
 
@@ -1827,6 +1841,7 @@ static void be_worker(struct work_struct *work)
 	if (!adapter->ue_detected)
 		be_detect_dump_ue(adapter);
 
+reschedule:
 	schedule_delayed_work(&adapter->work, msecs_to_jiffies(1000));
 }
 
@@ -2022,8 +2037,6 @@ static int be_close(struct net_device *netdev)
 	struct be_eq_obj *tx_eq = &adapter->tx_eq;
 	int vec, i;
 
-	cancel_delayed_work_sync(&adapter->work);
-
 	be_async_mcc_disable(adapter);
 
 	netif_stop_queue(netdev);
@@ -2088,8 +2101,6 @@ static int be_open(struct net_device *netdev)
 	/* Now that interrupts are on we can process async mcc */
 	be_async_mcc_enable(adapter);
 
-	schedule_delayed_work(&adapter->work, msecs_to_jiffies(100));
-
 	status = be_cmd_link_status_query(adapter, &link_up, &mac_speed,
 			&link_speed);
 	if (status)
@@ -2718,6 +2729,8 @@ static void __devexit be_remove(struct pci_dev *pdev)
 	if (!adapter)
 		return;
 
+	cancel_delayed_work_sync(&adapter->work);
+
 	unregister_netdev(adapter->netdev);
 
 	be_clear(adapter);
@@ -2874,6 +2887,7 @@ static int __devinit be_probe(struct pci_dev *pdev,
 		goto unsetup;
 
 	dev_info(&pdev->dev, "%s port %d\n", nic_name(pdev), adapter->port_num);
+	schedule_delayed_work(&adapter->work, msecs_to_jiffies(100));
 	return 0;
 
 unsetup:
-- 
1.5.6.1


^ permalink raw reply related

* [PATCH net-next-2.6 1/2] be2net: Adding an option to use INTx instead of MSI-X
From: Somnath Kotur @ 2010-10-25 11:12 UTC (permalink / raw)
  To: netdev

By default, be2net uses MSI-X wherever possible.
Add a module parameter to use INTx for users who do not want to use MSI-X.

Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
---
 drivers/net/benet/be_main.c |   12 +++++++++++-
 1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 45b1f66..1262292 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -27,10 +27,13 @@ MODULE_LICENSE("GPL");
 
 static unsigned int rx_frag_size = 2048;
 static unsigned int num_vfs;
+static unsigned int msix = 1;
 module_param(rx_frag_size, uint, S_IRUGO);
 module_param(num_vfs, uint, S_IRUGO);
+module_param(msix, uint, S_IRUGO);
 MODULE_PARM_DESC(rx_frag_size, "Size of a fragment that holds rcvd data.");
 MODULE_PARM_DESC(num_vfs, "Number of PCI VFs to initialize");
+MODULE_PARM_DESC(msix, "Enable/Disable the MSIx (MSIx enabled by default)");
 
 static bool multi_rxq = true;
 module_param(multi_rxq, bool, S_IRUGO | S_IWUSR);
@@ -2856,7 +2859,8 @@ static int __devinit be_probe(struct pci_dev *pdev,
 	if (status)
 		goto stats_clean;
 
-	be_msix_enable(adapter);
+	if (msix)
+		be_msix_enable(adapter);
 
 	INIT_DELAYED_WORK(&adapter->work, be_worker);
 
@@ -3082,6 +3086,12 @@ static int __init be_init_module(void)
 		num_vfs = 32;
 	}
 
+	if (!msix && num_vfs > 0) {
+		printk(KERN_WARNING DRV_NAME
+			" : MSIx required for num_vfs > 0. Ignoring msix=0\n");
+		msix = 1;
+	}
+
 	return pci_register_driver(&be_driver);
 }
 module_init(be_init_module);
-- 
1.5.6.1


^ permalink raw reply related

* [PATCH 2/2] be2net: Fix CSO for UDP packets
From: Somnath Kotur @ 2010-10-25 11:11 UTC (permalink / raw)
  To: netdev

We're setting skb->ip_summed to CHECKSUM_NONE for all non-TCP packets, making
the stack recompute the checksum. This is a bug for UDP packets, for which CSO
must be used.

Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
---
 drivers/net/benet/be_main.c |   20 ++++++++------------
 1 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 8767355..6b80a06 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -849,20 +849,16 @@ static void be_rx_stats_update(struct be_rx_obj *rxo,
 		stats->rx_mcast_pkts++;
 }
 
-static inline bool do_pkt_csum(struct be_eth_rx_compl *rxcp, bool cso)
+static inline bool csum_passed(struct be_eth_rx_compl *rxcp)
 {
-	u8 l4_cksm, ip_version, ipcksm, tcpf = 0, udpf = 0, ipv6_chk;
+	u8 l4_cksm, ipv6, ipcksm;
 
 	l4_cksm = AMAP_GET_BITS(struct amap_eth_rx_compl, l4_cksm, rxcp);
 	ipcksm = AMAP_GET_BITS(struct amap_eth_rx_compl, ipcksm, rxcp);
-	ip_version = AMAP_GET_BITS(struct amap_eth_rx_compl, ip_version, rxcp);
-	if (ip_version) {
-		tcpf = AMAP_GET_BITS(struct amap_eth_rx_compl, tcpf, rxcp);
-		udpf = AMAP_GET_BITS(struct amap_eth_rx_compl, udpf, rxcp);
-	}
-	ipv6_chk = (ip_version && (tcpf || udpf));
+	ipv6 = AMAP_GET_BITS(struct amap_eth_rx_compl, ip_version, rxcp);
 
-	return ((l4_cksm && ipv6_chk && ipcksm) && cso) ? false : true;
+	/* Ignore ipcksm for ipv6 pkts */
+	return l4_cksm && (ipcksm || ipv6);
 }
 
 static struct be_rx_page_info *
@@ -1017,10 +1013,10 @@ static void be_rx_compl_process(struct be_adapter *adapter,
 
 	skb_fill_rx_data(adapter, rxo, skb, rxcp, num_rcvd);
 
-	if (do_pkt_csum(rxcp, adapter->rx_csum))
-		skb_checksum_none_assert(skb);
-	else
+	if (likely(adapter->rx_csum && csum_passed(rxcp)))
 		skb->ip_summed = CHECKSUM_UNNECESSARY;
+	else
+		skb_checksum_none_assert(skb);
 
 	skb->truesize = skb->len + sizeof(struct sk_buff);
 	skb->protocol = eth_type_trans(skb, adapter->netdev);
-- 
1.5.6.1
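
The new predicate is easy to check in isolation. The sketch below restates it as plain userspace C. The boolean parameters replace the AMAP_GET_BITS() lookups, and checksum_unnecessary() is a hypothetical wrapper for the condition tested in be_rx_compl_process().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * l4_cksm: hardware verified the L4 (TCP or UDP) checksum
 * ipcksm:  hardware verified the IPv4 header checksum
 * ipv6:    packet is IPv6, so ipcksm is ignored
 */
static bool csum_passed(bool l4_cksm, bool ipcksm, bool ipv6)
{
	return l4_cksm && (ipcksm || ipv6);
}

/* True when the stack can skip recomputing the checksum. */
static bool checksum_unnecessary(bool rx_csum_enabled, bool l4_cksm,
				 bool ipcksm, bool ipv6)
{
	return rx_csum_enabled && csum_passed(l4_cksm, ipcksm, ipv6);
}
```

Unlike the removed do_pkt_csum(), the predicate no longer looks at the tcpf/udpf flags, so a UDP packet with valid checksums is treated the same way as a TCP one.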


^ permalink raw reply related

* [PATCH 1/2] be2net: Call netif_carrier_off() after register_netdev()
From: Somnath Kotur @ 2010-10-25 11:11 UTC (permalink / raw)
  To: netdev

Calling netif_carrier_off() before register_netdev() caused the network
interface to miss a pending linkwatch event, leading to an inconsistent state
if the link is not up when the interface is initialized. It is now invoked
after register_netdev().

Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
---
 drivers/net/benet/be_main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 1248626..8767355 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -2556,7 +2556,6 @@ static void be_netdev_init(struct net_device *netdev)
 	netif_napi_add(netdev, &adapter->tx_eq.napi, be_poll_tx_mcc,
 		BE_NAPI_WEIGHT);
 
-	netif_carrier_off(netdev);
 	netif_stop_queue(netdev);
 }
 
@@ -2865,6 +2864,7 @@ static int __devinit be_probe(struct pci_dev *pdev,
 	status = register_netdev(netdev);
 	if (status != 0)
 		goto unsetup;
+	netif_carrier_off(netdev);
 
 	dev_info(&pdev->dev, "%s port %d\n", nic_name(pdev), adapter->port_num);
 	return 0;
-- 
1.5.6.1


^ permalink raw reply related

* Re: [PATCH v2 1/9] tproxy: split off ipv6 defragmentation to a separate module
From: Eric Dumazet @ 2010-10-25 10:14 UTC (permalink / raw)
  To: KOVACS Krisztian
  Cc: Patrick McHardy, netdev, netfilter-devel, Balazs Scheidler,
	David Miller
In-Reply-To: <1287999512.2160.25.camel@este.odu>

On Monday, 25 October 2010 at 11:38 +0200, KOVACS Krisztian wrote:
> Hi,
> 
> On Fri, 2010-10-22 at 00:19 +0200, Eric Dumazet wrote:
> > On Thursday, 21 October 2010 at 16:04 +0200, Patrick McHardy wrote:
> > > On 21.10.2010 13:43, KOVACS Krisztian wrote:
> > > > tproxy: split off ipv6 defragmentation to a separate module
> > > >     
> > > >     Like with IPv4, TProxy needs IPv6 defragmentation but does not
> > > >     require connection tracking. Since defragmentation was coupled
> > > >     with conntrack, I split off the two, creating an nf_defrag_ipv6 module,
> > > >     similar to the already existing nf_defrag_ipv4.
> > > 
> > > Applied, thanks.
> > 
> > Hmm...
> > 
> > CONFIG_IPV6=m
> > CONFIG_NETFILTER_TPROXY=m
> > 
> > 
> >   MODPOST 201 modules
> > ERROR: "nf_defrag_ipv6_enable" [net/netfilter/xt_TPROXY.ko] undefined!
> > ERROR: "ipv6_find_hdr" [net/netfilter/xt_TPROXY.ko] undefined!
> > 
> > Sorry, it's late here, I won't fix this ;)
> 
> Oops, I guess this is because you do have IPv6 support but don't have
> ip6tables enabled in your config. Does the patch below fix the issue for
> you? (For me it now compiles with and without IPv6 conntrack, ip6tables
> and IPv6 support, too.)
> 
> 

I had ip6tables enabled, but not CONFIG_NF_CONNTRACK_IPV6 ;)

> 
> netfilter: fix module dependency issues with IPv6 defragmentation, ip6tables and xt_TPROXY
> 
> One of the previous tproxy related patches split IPv6 defragmentation and
> connection tracking, but did not correctly add Kconfig stanzas to handle the
> new dependencies correctly. This patch fixes that by making the config options
> mirror the setup we have for IPv4: a distinct config option for defragmentation
> that is automatically selected by both connection tracking and
> xt_TPROXY/xt_socket.
> 
> The patch also changes the #ifdefs enclosing IPv6 specific code in xt_socket
> and xt_TPROXY: we only compile these in case we have ip6tables support enabled.
> 
> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>

Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>

Thanks !


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2 1/9] tproxy: split off ipv6 defragmentation to a separate module
From: KOVACS Krisztian @ 2010-10-25  9:38 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Patrick McHardy, netdev, netfilter-devel, Balazs Scheidler,
	David Miller
In-Reply-To: <1287699558.2607.5.camel@edumazet-laptop>

Hi,

On Fri, 2010-10-22 at 00:19 +0200, Eric Dumazet wrote:
> On Thursday, 21 October 2010 at 16:04 +0200, Patrick McHardy wrote:
> > On 21.10.2010 13:43, KOVACS Krisztian wrote:
> > > tproxy: split off ipv6 defragmentation to a separate module
> > >     
> > >     Like with IPv4, TProxy needs IPv6 defragmentation but does not
> > >     require connection tracking. Since defragmentation was coupled
> > >     with conntrack, I split off the two, creating an nf_defrag_ipv6 module,
> > >     similar to the already existing nf_defrag_ipv4.
> > 
> > Applied, thanks.
> 
> Hmm...
> 
> CONFIG_IPV6=m
> CONFIG_NETFILTER_TPROXY=m
> 
> 
>   MODPOST 201 modules
> ERROR: "nf_defrag_ipv6_enable" [net/netfilter/xt_TPROXY.ko] undefined!
> ERROR: "ipv6_find_hdr" [net/netfilter/xt_TPROXY.ko] undefined!
> 
> Sorry, it's late here, I won't fix this ;)

Oops, I guess this is because you do have IPv6 support but don't have
ip6tables enabled in your config. Does the patch below fix the issue for
you? (For me it now compiles with and without IPv6 conntrack, ip6tables
and IPv6 support, too.)



netfilter: fix module dependency issues with IPv6 defragmentation, ip6tables and xt_TPROXY

One of the previous tproxy related patches split IPv6 defragmentation and
connection tracking, but did not correctly add Kconfig stanzas to handle the
new dependencies correctly. This patch fixes that by making the config options
mirror the setup we have for IPv4: a distinct config option for defragmentation
that is automatically selected by both connection tracking and
xt_TPROXY/xt_socket.

The patch also changes the #ifdefs enclosing IPv6 specific code in xt_socket
and xt_TPROXY: we only compile these in case we have ip6tables support enabled.

Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>

---
 net/ipv6/netfilter/Kconfig  |    5 +++++
 net/ipv6/netfilter/Makefile |    5 ++++-
 net/netfilter/Kconfig       |    2 ++
 net/netfilter/xt_TPROXY.c   |   10 ++++++----
 net/netfilter/xt_socket.c   |   12 ++++++++----
 5 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig
index 29d643b..e5f6edc 100644
--- a/net/ipv6/netfilter/Kconfig
+++ b/net/ipv6/netfilter/Kconfig
@@ -5,10 +5,15 @@
 menu "IPv6: Netfilter Configuration"
 	depends on INET && IPV6 && NETFILTER
 
+config NF_DEFRAG_IPV6
+	tristate
+	default n
+
 config NF_CONNTRACK_IPV6
 	tristate "IPv6 connection tracking support"
 	depends on INET && IPV6 && NF_CONNTRACK
 	default m if NETFILTER_ADVANCED=n
+	select NF_DEFRAG_IPV6
 	---help---
 	  Connection tracking keeps a record of what packets have passed
 	  through your machine, in order to figure out how they are related
diff --git a/net/ipv6/netfilter/Makefile b/net/ipv6/netfilter/Makefile
index 3f8e4a3..0a432c9 100644
--- a/net/ipv6/netfilter/Makefile
+++ b/net/ipv6/netfilter/Makefile
@@ -12,11 +12,14 @@ obj-$(CONFIG_IP6_NF_SECURITY) += ip6table_security.o
 
 # objects for l3 independent conntrack
 nf_conntrack_ipv6-objs  :=  nf_conntrack_l3proto_ipv6.o nf_conntrack_proto_icmpv6.o
-nf_defrag_ipv6-objs := nf_defrag_ipv6_hooks.o nf_conntrack_reasm.o
 
 # l3 independent conntrack
 obj-$(CONFIG_NF_CONNTRACK_IPV6) += nf_conntrack_ipv6.o nf_defrag_ipv6.o
 
+# defrag
+nf_defrag_ipv6-objs := nf_defrag_ipv6_hooks.o nf_conntrack_reasm.o
+obj-$(CONFIG_NF_DEFRAG_IPV6) += nf_defrag_ipv6.o
+
 # matches
 obj-$(CONFIG_IP6_NF_MATCH_AH) += ip6t_ah.o
 obj-$(CONFIG_IP6_NF_MATCH_EUI64) += ip6t_eui64.o
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 4328825..1534f2b 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -525,6 +525,7 @@ config NETFILTER_XT_TARGET_TPROXY
 	depends on NETFILTER_XTABLES
 	depends on NETFILTER_ADVANCED
 	select NF_DEFRAG_IPV4
+	select NF_DEFRAG_IPV6 if IP6_NF_IPTABLES
 	help
 	  This option adds a `TPROXY' target, which is somewhat similar to
 	  REDIRECT.  It can only be used in the mangle table and is useful
@@ -927,6 +928,7 @@ config NETFILTER_XT_MATCH_SOCKET
 	depends on NETFILTER_ADVANCED
 	depends on !NF_CONNTRACK || NF_CONNTRACK
 	select NF_DEFRAG_IPV4
+	select NF_DEFRAG_IPV6 if IP6_NF_IPTABLES
 	help
 	  This option adds a `socket' match, which can be used to match
 	  packets for which a TCP or UDP socket lookup finds a valid socket.
diff --git a/net/netfilter/xt_TPROXY.c b/net/netfilter/xt_TPROXY.c
index 19c482c..640678f 100644
--- a/net/netfilter/xt_TPROXY.c
+++ b/net/netfilter/xt_TPROXY.c
@@ -21,7 +21,9 @@
 #include <linux/netfilter_ipv4/ip_tables.h>
 
 #include <net/netfilter/ipv4/nf_defrag_ipv4.h>
-#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+
+#if defined(CONFIG_IP6_NF_IPTABLES) || defined(CONFIG_IP6_NF_IPTABLES_MODULE)
+#define XT_TPROXY_HAVE_IPV6 1
 #include <net/if_inet6.h>
 #include <net/addrconf.h>
 #include <linux/netfilter_ipv6/ip6_tables.h>
@@ -172,7 +174,7 @@ tproxy_tg4_v1(struct sk_buff *skb, const struct xt_action_param *par)
 	return tproxy_tg4(skb, tgi->laddr.ip, tgi->lport, tgi->mark_mask, tgi->mark_value);
 }
 
-#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+#ifdef XT_TPROXY_HAVE_IPV6
 
 static inline const struct in6_addr *
 tproxy_laddr6(struct sk_buff *skb, const struct in6_addr *user_laddr,
@@ -372,7 +374,7 @@ static struct xt_target tproxy_tg_reg[] __read_mostly = {
 		.hooks		= 1 << NF_INET_PRE_ROUTING,
 		.me		= THIS_MODULE,
 	},
-#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+#ifdef XT_TPROXY_HAVE_IPV6
 	{
 		.name		= "TPROXY",
 		.family		= NFPROTO_IPV6,
@@ -391,7 +393,7 @@ static struct xt_target tproxy_tg_reg[] __read_mostly = {
 static int __init tproxy_tg_init(void)
 {
 	nf_defrag_ipv4_enable();
-#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+#ifdef XT_TPROXY_HAVE_IPV6
 	nf_defrag_ipv6_enable();
 #endif
 
diff --git a/net/netfilter/xt_socket.c b/net/netfilter/xt_socket.c
index 2dbd4c8..d94a858 100644
--- a/net/netfilter/xt_socket.c
+++ b/net/netfilter/xt_socket.c
@@ -14,7 +14,6 @@
 #include <linux/skbuff.h>
 #include <linux/netfilter/x_tables.h>
 #include <linux/netfilter_ipv4/ip_tables.h>
-#include <linux/netfilter_ipv6/ip6_tables.h>
 #include <net/tcp.h>
 #include <net/udp.h>
 #include <net/icmp.h>
@@ -22,7 +21,12 @@
 #include <net/inet_sock.h>
 #include <net/netfilter/nf_tproxy_core.h>
 #include <net/netfilter/ipv4/nf_defrag_ipv4.h>
+
+#if defined(CONFIG_IP6_NF_IPTABLES) || defined(CONFIG_IP6_NF_IPTABLES_MODULE)
+#define XT_SOCKET_HAVE_IPV6 1
+#include <linux/netfilter_ipv6/ip6_tables.h>
 #include <net/netfilter/ipv6/nf_defrag_ipv6.h>
+#endif
 
 #include <linux/netfilter/xt_socket.h>
 
@@ -186,7 +190,7 @@ socket_mt4_v1(const struct sk_buff *skb, struct xt_action_param *par)
 	return socket_match(skb, par, par->matchinfo);
 }
 
-#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+#ifdef XT_SOCKET_HAVE_IPV6
 
 static int
 extract_icmp6_fields(const struct sk_buff *skb,
@@ -331,7 +335,7 @@ static struct xt_match socket_mt_reg[] __read_mostly = {
 				  (1 << NF_INET_LOCAL_IN),
 		.me		= THIS_MODULE,
 	},
-#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+#ifdef XT_SOCKET_HAVE_IPV6
 	{
 		.name		= "socket",
 		.revision	= 1,
@@ -348,7 +352,7 @@ static struct xt_match socket_mt_reg[] __read_mostly = {
 static int __init socket_mt_init(void)
 {
 	nf_defrag_ipv4_enable();
-#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+#ifdef XT_SOCKET_HAVE_IPV6
 	nf_defrag_ipv6_enable();
 #endif
 





