Netdev List
* GPF in skb_flow_dissect
From: Dave Jones @ 2012-12-13  4:16 UTC
  To: netdev

Since today's net merge, I see this when I start openvpn:

general protection fault: 0000 [#1] PREEMPT SMP 
Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables xfs iTCO_wdt iTCO_vendor_support snd_emu10k1 snd_util_mem snd_ac97_codec coretemp ac97_bus microcode snd_hwdep snd_seq pcspkr snd_pcm snd_page_alloc snd_timer lpc_ich i2c_i801 snd_rawmidi mfd_core snd_seq_device snd e1000e soundcore emu10k1_gp gameport i82975x_edac edac_core vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc nfsd auth_rpcgss nfs_acl lockd sunrpc btrfs libcrc32c zlib_deflate firewire_ohci sata_sil firewire_core crc_itu_t radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core floppy
CPU 0 
Pid: 1381, comm: openvpn Not tainted 3.7.0+ #14                  /D975XBX
RIP: 0010:[<ffffffff815b54a4>]  [<ffffffff815b54a4>] skb_flow_dissect+0x314/0x3e0
RSP: 0018:ffff88007d0d9c48  EFLAGS: 00010206
RAX: 000000000000055d RBX: 6b6b6b6b6b6b6b4b RCX: 1471030a0180040a
RDX: 0000000000000005 RSI: 00000000ffffffe0 RDI: ffff8800ba83fa80
RBP: ffff88007d0d9cb8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000101 R12: ffff8800ba83fa80
R13: 0000000000000008 R14: ffff88007d0d9cc8 R15: ffff8800ba83fa80
FS:  00007f6637104800(0000) GS:ffff8800bf600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f563f5b01c4 CR3: 000000007d140000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process openvpn (pid: 1381, threadinfo ffff88007d0d8000, task ffff8800a540cd60)
Stack:
 ffff8800ba83fa80 0000000000000296 0000000000000000 0000000000000000
 ffff88007d0d9cc8 ffffffff815bcff4 ffff88007d0d9ce8 ffffffff815b1831
 ffff88007d0d9ca8 00000000703f6364 ffff8800ba83fa80 0000000000000000
Call Trace:
 [<ffffffff815bcff4>] ? netif_rx+0x114/0x4c0
 [<ffffffff815b1831>] ? skb_copy_datagram_from_iovec+0x61/0x290
 [<ffffffff815b672a>] __skb_get_rxhash+0x1a/0xd0
 [<ffffffffa03b9538>] tun_get_user+0x418/0x810 [tun]
 [<ffffffff8135f468>] ? delay_tsc+0x98/0xf0
 [<ffffffff8109605c>] ? __rcu_read_unlock+0x5c/0xa0
 [<ffffffffa03b9a41>] tun_chr_aio_write+0x81/0xb0 [tun]
 [<ffffffff81145011>] ? __buffer_unlock_commit+0x41/0x50
 [<ffffffff811db917>] do_sync_write+0xa7/0xe0
 [<ffffffff811dc01f>] vfs_write+0xaf/0x190
 [<ffffffff811dc375>] sys_write+0x55/0xa0
 [<ffffffff81705540>] tracesys+0xdd/0xe2
Code: 41 8b 44 24 68 41 2b 44 24 6c 01 de 29 f0 83 f8 03 0f 8e a0 00 00 00 48 63 de 49 03 9c 24 e0 00 00 00 48 85 db 0f 84 72 fe ff ff <8b> 03 41 89 46 08 b8 01 00 00 00 e9 43 fd ff ff 0f 1f 40 00 48 
RIP  [<ffffffff815b54a4>] skb_flow_dissect+0x314/0x3e0
 RSP <ffff88007d0d9c48>
---[ end trace 6d42c834c72c002e ]---


Faulting instruction is

   0:	8b 03                	mov    (%rbx),%eax

rbx is slab poison (0x6b6b6b6b6b6b6b6b - 0x20), so this looks like a use-after-free here...

                        flow->ports = *ports;
 314:   8b 03                   mov    (%rbx),%eax
 316:   41 89 46 08             mov    %eax,0x8(%r14)

in the inlined skb_header_pointer in skb_flow_dissect

	Dave
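
The register arithmetic behind that reading can be checked with a short
sketch. The bytes just before the faulting instruction in the Code: line
decode as "movslq %esi,%rbx" (48 63 de) followed by "add 0xe0(%r12),%rbx"
(49 03 9c 24 e0 00 00 00), so rbx is a sign-extended offset plus a 64-bit
value loaded through r12. That the loaded value is skb->data is an
assumption here, since the field offset depends on the build. The helper
names below are hypothetical; 0x6b is POISON_FREE from include/linux/poison.h.

```python
# Register values copied from the oops above.
RBX = 0x6B6B6B6B6B6B6B4B
RSI = 0x00000000FFFFFFE0

POISON_FREE = 0x6B  # byte written over freed slab objects when slab poisoning is on


def sign_extend32(value):
    """Interpret the low 32 bits of value as a signed integer (what movslq does)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def poison_offset(pointer):
    """Return pointer minus a pointer-sized run of POISON_FREE bytes."""
    poisoned_pointer = int.from_bytes(bytes([POISON_FREE]) * 8, "little")
    return pointer - poisoned_pointer


offset = sign_extend32(RSI)
# If both numbers match, rbx is exactly "poisoned pointer + offset".
print(offset, poison_offset(RBX))
```

Both values come out as -32 (that is, -0x20), which is consistent with the
pointer having been read from an already freed and poisoned object.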


* Re: [Query] TCP TFO Query
From: Yuchung Cheng @ 2012-12-13  3:49 UTC
  To: Ketan Kulkarni; +Cc: netdev
In-Reply-To: <CAD6NSj4dMG3OC0mb4Qiq2eXTNwFBonkcnw=gRF5YAef-5yjeVQ@mail.gmail.com>

On Wed, Dec 12, 2012 at 10:17 AM, Ketan Kulkarni <ketkulka@gmail.com> wrote:
> Thanks Yuchung for your reply.
>
> My only concern is: if SYN+data is sent by the client and the SYN-ACK only
> acks the ISN, then isn't this a sufficient indication that the server no
> longer supports TFO? So for further connections to this server, instead of
> sending SYN+data, only ask for a cookie (fall back to the state where it
> all started). (Note that this condition is different from SYN+data being
> dropped in the network.)
>
> I agree with you in saying it doesn't lead to any performance penalty,
> however sending syn+data to a server seems a little odd when we know we have
> sufficient information to believe that it may not be accepted at first,
> retransmitted later. And otherwise we also have a way to fall back and
> re-attempt the TFO.
Your proposal sounds reasonable. We can change that. In addition,
maybe we can change the server to send a SYN-ACK acking the ISN only, with a
cookie option, if the server prefers the client to still do SYN-data-cookie
next time for some reason. I will try to prepare an RFC patch soon.

>
> Thoughts?
>
> Thanks,
> Ketan
>
> On Dec 12, 2012 3:34 AM, "Yuchung Cheng" <ycheng@google.com> wrote:
>>
>> Hi Ketan,
>>
>> On Tue, Dec 11, 2012 at 9:29 AM, Ketan Kulkarni <ketkulka@gmail.com>
>> wrote:
>> > Hi,
>> > I am testing TCP TFO behavior with an httping client and a polipo
>> > server on 3.7-rc8.
>> >
>> > One observation from my TFO testing: if a server sends a cookie to the
>> > client for a connection, the client always does TFO for subsequent
>> > connections.
>> > This is ok.
>> >
>> > If for some reason, server stops supporting TFO (either because server
>> > got restarted without TFO support (in my case) or because path changed
>> > and the nw node is dropping packet with unknown syn option or
>> > stripping the option), client does not clear up its cookie cache. It
>> > always sends data in syn and server never acks the syn-data and client
>> > retransmits.
>> >
>> > As per the kernel code, if syn-data is not acked it is retransmitted
>> > immediately, on the assumption that the first SYN was dropped (the
>> > possibility that the server stopped supporting TFO might not have been
>> > considered).
>> >
>> > Will it be better to flush the cookie for this server and re-attempt
>> > the cookie "negotiation" on subsequent connection than to retransmit
>> > the data every time?
>> >
>> > Your thoughts?
>>
>> In our initial design the client actually removes the cookie of the
>> particular server (!= flush the entire cache though). Later on we
>> changed to the current behavior because it does not have a performance
>> penalty. It falls back to the regular handshake:
>>
>> SYN/cookie/data -> SYN-ACK acking ISN -> ACK(data).
>>
>> It may happen frequently when large server farms are upgrading to
>> support TFO.
>>
>> However there are always more options:
>> 1) Server can selectively instrument to delete old cookies by sending a
>>    SYN-ACK acking the initial sequence with a null TFO option (== caching
>>    a null cookie == removing the older one).
> In the case I mentioned, this might not help because the server got
> restarted with TFO disabled. This option can help in cases where the server
> understands/supports TFO and knows when to delete the client-side cookie.
> Or maybe I am missing something!
>
>> 2) another client-side flag in sysctl_tcp_fastopen to remove the cookie
>>    if the SYN-ACK only acks the SYN sequence.
> My view is to keep the number of knobs to a minimum, as otherwise IMO we
> put extra effort on the user to know and understand what this flag is and
> why it exists when he is simply interested in TFO.
>
>> 3) combination of 1 and 2.
>>
>> More ideas are welcome :)
>>
>> NOTE: I've checked in a patch so that syn-data not acked is not treated
>> as a network-drop.
>> http://patchwork.ozlabs.org/patch/171978/
>>
>> Yuchung
>>
>> >
>> > Thanks,
>> > Ketan


* Re: [GIT] Networking
From: David Miller @ 2012-12-13  3:22 UTC
  To: torvalds; +Cc: akpm, netdev, linux-kernel, nhorman, vyasevich
In-Reply-To: <CA+55aFxvHrNYB_J851XTkZ4EiwZ68Fb64DEU1JJmxPV-zB+9Vw@mail.gmail.com>

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed, 12 Dec 2012 18:37:08 -0800

> On Wed, Dec 12, 2012 at 6:27 PM, David Miller <davem@davemloft.net> wrote:
>>
>> There are two SCTP HMAC cookie algorithms, MD5 and SHA1.
>>
>> What used to happen is that you had to choose one at build
>> time, and then you were stuck with that decision and it was
>> all that you could use.
>>
>> Now, it's selectable at run time.
>>
>> If there's anything you find particularly anti-social about
>> this, I'm sure we can adjust it.
> 
> So I'd suggest doing the same thing that the new thermal throttling
> Kconfig does: start off by asking for the default algorithm, then ask
> about the others.
> 
> The "choice" part selects the one that is default (so it never gets
> asked about and is obviously compiled in), and the rest default to no
> like we should.
> 
> See drivers/thermal/Kconfig for an example of this. I think we do it
> in other places too, but that one happens to be new so I picked it as
> an example.
> 
> The rule should be that we *never* default anything to 'yes', unless
> it's old functionality that we always compiled in before too, and now
> it got made conditional. So if you see a "default y" on new options,
> you should basically consider it broken.
> 
> We're already bloating too much, we should not encourage people to
> make things more bloated than necessary.
> 
> Btw, that Kconfig option has basically no useful help text either.
> What's the point of repeating the question as a "help" message?
> 
> If people can't explain why anybody should enable it, it sure as hell
> shouldn't default to 'y'. Maybe it shouldn't exist at all?

Neil and Vlad, please take care of this.

Thanks.


* Re: [GIT] Networking
From: Linus Torvalds @ 2012-12-13  2:37 UTC
  To: David Miller
  Cc: Andrew Morton, Network Development, Linux Kernel Mailing List
In-Reply-To: <20121212.212734.917363230032045212.davem@davemloft.net>

On Wed, Dec 12, 2012 at 6:27 PM, David Miller <davem@davemloft.net> wrote:
>
> There are two SCTP HMAC cookie algorithms, MD5 and SHA1.
>
> What used to happen is that you had to choose one at build
> time, and then you were stuck with that decision and it was
> all that you could use.
>
> Now, it's selectable at run time.
>
> If there's anything you find particularly anti-social about
> this, I'm sure we can adjust it.

So I'd suggest doing the same thing that the new thermal throttling
Kconfig does: start off by asking for the default algorithm, then ask
about the others.

The "choice" part selects the one that is default (so it never gets
asked about and is obviously compiled in), and the rest default to no
like we should.

See drivers/thermal/Kconfig for an example of this. I think we do it
in other places too, but that one happens to be new so I picked it as
an example.

The rule should be that we *never* default anything to 'yes', unless
it's old functionality that we always compiled in before too, and now
it got made conditional. So if you see a "default y" on new options,
you should basically consider it broken.

We're already bloating too much, we should not encourage people to
make things more bloated than necessary.

Btw, that Kconfig option has basically no useful help text either.
What's the point of repeating the question as a "help" message?

If people can't explain why anybody should enable it, it sure as hell
shouldn't default to 'y'. Maybe it shouldn't exist at all?

            Linus


* Re: [GIT] Networking
From: David Miller @ 2012-12-13  2:27 UTC
  To: torvalds; +Cc: akpm, netdev, linux-kernel
In-Reply-To: <CA+55aFwzUgxQAze=mYbEx8b61V542tzm06Df=mR1BtYVbJy0mg@mail.gmail.com>

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed, 12 Dec 2012 18:15:04 -0800

> On Wed, Dec 12, 2012 at 12:11 PM, David Miller <davem@davemloft.net> wrote:
>>
>> There is one merge conflict to resolve in net/sched/cls_cgroup.c,
>> one commit changes the name of some members to "css_*" (this came
>> from Tejun's tree) and another commit adds an "attach" method.
> 
> There's more than that. The ARM board mess is apparently now affecting
> the networking merges too.
> 
> I fixed it up. Hopefully correctly.
> 
> Also, why does the new SHA1 hmac cookie support default to 'y'?

There are two SCTP HMAC cookie algorithms, MD5 and SHA1.

What used to happen is that you had to choose one at build
time, and then you were stuck with that decision and it was
all that you could use.

Now, it's selectable at run time.

If there's anything you find particularly anti-social about
this, I'm sure we can adjust it.


* Re: [GIT] Networking
From: Linus Torvalds @ 2012-12-13  2:15 UTC
  To: David Miller
  Cc: Andrew Morton, Network Development, Linux Kernel Mailing List
In-Reply-To: <20121212.151116.143443755590581447.davem@davemloft.net>

On Wed, Dec 12, 2012 at 12:11 PM, David Miller <davem@davemloft.net> wrote:
>
> There is one merge conflict to resolve in net/sched/cls_cgroup.c,
> one commit changes the name of some members to "css_*" (this came
> from Tejun's tree) and another commit adds an "attach" method.

There's more than that. The ARM board mess is apparently now affecting
the networking merges too.

I fixed it up. Hopefully correctly.

Also, why does the new SHA1 hmac cookie support default to 'y'?

                  Linus


* RFC: Launch Time Support
From: Ulf Samuelsson @ 2012-12-13  1:04 UTC
  To: netdev

Hi, I am looking for some feedback on how to implement launch time
in the kernel.

I.e., you define WHEN you want to send a packet,
and the driver will store the packet in a buffer and will send it out
on the net when the internal timestamp counter in the network controller
reaches the specified "launch time".

Some Ethernet controllers, like the new Intel i210, support "launch time".

Support for launch time is desirable for any isochronous connection,
but I am currently interested in the NTP protocol to improve the timing.

Proposed Changes to the Kernel
===========================================================
The launchtime support will be dependent on CONFIG_NET_LAUNCHTIME
If this is not set, then the kernel functionality is not changed.

My current idea is to add a new bit to the "flags" field of
"socket.c:sendto":
#define MSG_LAUNCHTIME 0x?????

struct msghdr gets an additional launchtime field.

sendto will check if the flags parameter contains MSG_LAUNCHTIME.
If it does, then the first 64-bit longword of the packet (buff) contains
the launchtime.
The launchtime from the buffer is copied to the msghdr.launchtime field,
and the first 64 bits of the packet are then shaved off before the address
is written to the msghdr.
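
The proposed buffer layout can be illustrated from the application side with
a small userspace sketch. Nothing here is an existing API: the function names
are hypothetical, and the byte order of the 64-bit launchtime is not stated
in the proposal, so little-endian is assumed purely for illustration.

```python
import struct

# Assumption: the 64-bit launchtime is an unsigned little-endian integer.
LAUNCHTIME_FORMAT = "<Q"
LAUNCHTIME_SIZE = struct.calcsize(LAUNCHTIME_FORMAT)  # 8 bytes


def prepend_launchtime(launchtime, payload):
    """Build the buffer an application would hand to sendto with MSG_LAUNCHTIME."""
    return struct.pack(LAUNCHTIME_FORMAT, launchtime) + payload


def split_launchtime(buf):
    """Mimic the proposed sendto step: read the launchtime, shave it off."""
    if len(buf) < LAUNCHTIME_SIZE:
        raise ValueError("buffer too short to hold a launchtime")
    (launchtime,) = struct.unpack_from(LAUNCHTIME_FORMAT, buf)
    return launchtime, buf[LAUNCHTIME_SIZE:]


buf = prepend_launchtime(1_355_360_640_000_000_000, b"ntp request")
launchtime, payload = split_launchtime(buf)
print(launchtime, payload)
```

One consequence visible in the sketch is that a buffer shorter than 8 bytes
has to be rejected when the flag is set, since there is no launchtime to read.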

Each network controller supporting launchtime needs to have an alternative
call, "send packet with launchtime". This call adds the launchtime
parameter.
If launchtime is supported, the exported "ops" includes the new call.

The UDP/IP packet send will check the MSG_LAUNCHTIME and
if set, it will check if the "send packet with launchtime" call
is available for the driver and if so call it, otherwise it will call
the normal send packet and thus ignore the launchtime.

Before launchtime is used, the application should send an ioctl
to the driver, making sure that launchtime is configured,
and only if the driver ACKs will the application use launchtime.

(Possibly the "ops" field for "send packet with launchtime" should be
NULL until that ioctl is complete. Comments?)

To me, this seems to be transparent for all other network stacks
so protocols and drivers not supporting launchtime should still work.

As far as I know, drivers do not support launch time today.
The Intel igb driver does not, in the latest version on the Intel web site.
The headers in the latest version contain some defines for the registers,
but so far the code is not using them.

There is the linux_igb_avb project on sourceforge which  allows use of
launch time for user space applications, but not as part of the kernel.

Maybe there is more work done somewhere else, but I am not aware
of it, so any links to such work are appreciated.

There are some FPGA-based PCIe boards that support launchtime (Endace DAG)
using proprietary APIs.
I talked to some vendors providing TCP/IP offload engines for FPGA;
they do not support launchtime and, like Endace, use proprietary APIs,
so they are only usable by custom programs. Normal networking interfaces
are not supported.

Comments on the above are appreciated.

BACKGROUND
For those that do not know how the NTP protocol works:
===================================================
The client sends a UDP packet to the NTP server using port 123.
The NTP client reads the current systime and puts that in the outgoing
packet.
There is a delay between the time the systime is read, and the time
the packet actually leaves the Ethernet controller adding jitter to the
NTP algorithm.

When the server receives the packet, it can be timestamped in H/W
and a CMSG is then created by the network stack containing that
timestamp for use by the server NTP daemon.

The server generates a reply, which needs to include the client
transmit time, the server's receive time, and the server's transmit time.
Again, the transmit time needs to be written into the NTP packet,
and then it needs to be processed through the network stack before it
leaves the Ethernet controller, causing more jitter.
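
To see why a transmit timestamp taken too early matters, the exchange above
can be run through the standard NTP on-wire calculation (as in RFC 5905),
which is not spelled out in this message. The function name and the sample
numbers below are made up for illustration; times are in microseconds, with
t1 = client transmit, t2 = server receive, t3 = server transmit,
t4 = client receive.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP clock offset and round-trip delay from four timestamps."""
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Made-up scenario: server clock is 500 us ahead, each direction takes
# 10000 us on the wire, and the server spends 100 us on the request.
exact = ntp_offset_and_delay(0, 10_500, 10_600, 20_100)

# Same exchange, but the client read its clock 40 us before the packet
# actually left the Ethernet controller.
stale = ntp_offset_and_delay(-40, 10_500, 10_600, 20_100)

print(exact, stale)
```

In this example the 40 us of stack latency shows up as a 20 us error in the
computed offset and a 40 us error in the delay, and since the latency varies
from packet to packet it appears as jitter.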

If launch time is supported, then the client NTP daemon would simply
read the systime and add a constant delay to create the transmit timestamp.
The delay needs to be sufficiently large to ensure that all processing
is done before the launch time is reached.

The server will do something similar, adding a constant to the server
receive timestamp to create the server transmit timestamp.
If both the client and the server use H/W timestamping and launch time,
then the jitter is ideally reduced to zero.

TRANSMIT TIMESTAMPING
========================
Support for TX timestamps in H/W is not really useful, since you need to
provide the TX timestamp in the packet you measure on, so when you know the
timestamp it is too late. Server-to-server NTP connections support sending
that timestamp in a new packet, but there is no such support in
client-server communication.

The i210 supports putting the timestamp inside the packet as it leaves the
Ethernet controller, but that means that you screw up the UDP checksum, so
the packet will be rejected by the receiving NTP daemon.
In addition, the i210 timestamp measures seconds and nanoseconds
which is incompatible with the NTP timestamp which uses seconds
and a 32 bit fraction of a second so that does not work either.
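
The format mismatch is easy to show in numbers. The sketch below converts
between a (seconds, nanoseconds) pair and the NTP (seconds, 32-bit fraction)
pair; the function names are hypothetical, and any difference in epoch
between the two clocks is deliberately left out, since the epoch of the
controller's counter is not stated here.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def to_ntp(seconds, nanoseconds):
    """Convert seconds + nanoseconds to seconds + 32-bit binary fraction."""
    if not 0 <= nanoseconds < NANOSECONDS_PER_SECOND:
        raise ValueError("nanoseconds out of range")
    # Scale by 2**32 / 10**9 using integer arithmetic (rounds down).
    fraction = (nanoseconds << 32) // NANOSECONDS_PER_SECOND
    return seconds, fraction


def from_ntp(seconds, fraction):
    """Convert seconds + 32-bit binary fraction back to seconds + nanoseconds."""
    nanoseconds = (fraction * NANOSECONDS_PER_SECOND) >> 32
    return seconds, nanoseconds


print(to_ntp(1, 500_000_000))
print(from_ntp(1, 1 << 31))
```

Half a second maps to a fraction of 2**31 in one format and 500000000 in the
other, so a value stamped into the packet by hardware in one format cannot
simply be read as the other. The conversion also rounds down, so a round trip
can lose up to a nanosecond.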

Best Regards
Ulf Samuelsson
eMagii.


* [net-next:master 14/17] net/bridge/br_multicast.c:677:54: sparse: incorrect type in argument 3 (different address spaces)
From: kbuild test robot @ 2012-12-13  0:51 UTC
  To: Cong Wang; +Cc: netdev

tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   520dfe3a3645257bf83660f672c47f8558f3d4c4
commit: cfd567543590f71ca0af397437e2554f9756d750 [14/17] bridge: add support of adding and deleting mdb entries


sparse warnings:

net/bridge/br_multicast.c:635:17: sparse: incorrect type in assignment (different address spaces)
net/bridge/br_multicast.c:635:17:    expected struct net_bridge_port_group [noderef] <asn:4>*next
net/bridge/br_multicast.c:635:17:    got struct net_bridge_port_group *next
+ net/bridge/br_multicast.c:677:54: sparse: incorrect type in argument 3 (different address spaces)
net/bridge/br_multicast.c:677:54:    expected struct net_bridge_port_group *next
net/bridge/br_multicast.c:677:54:    got struct net_bridge_port_group [noderef] <asn:4>*<noident>
net/bridge/br_multicast.c:1175:48: sparse: restricted __be16 degrades to integer
net/bridge/br_multicast.c:1175:48: sparse: restricted __be16 degrades to integer
net/bridge/br_multicast.c:1175:48: sparse: restricted __be16 degrades to integer
net/bridge/br_multicast.c:1175:48: sparse: restricted __be16 degrades to integer
net/bridge/br_multicast.c:1175:48: sparse: restricted __be16 degrades to integer
net/bridge/br_multicast.c:1175:48: sparse: restricted __be16 degrades to integer

vim +677 net/bridge/br_multicast.c

cfd56754 Cong Wang         2012-12-11  629  	p = kzalloc(sizeof(*p), GFP_ATOMIC);
cfd56754 Cong Wang         2012-12-11  630  	if (unlikely(!p))
cfd56754 Cong Wang         2012-12-11  631  		return NULL;
cfd56754 Cong Wang         2012-12-11  632  
cfd56754 Cong Wang         2012-12-11  633  	p->addr = *group;
cfd56754 Cong Wang         2012-12-11  634  	p->port = port;
cfd56754 Cong Wang         2012-12-11 @635  	p->next = next;
cfd56754 Cong Wang         2012-12-11  636  	hlist_add_head(&p->mglist, &port->mglist);
cfd56754 Cong Wang         2012-12-11  637  	setup_timer(&p->timer, br_multicast_port_group_expired,
cfd56754 Cong Wang         2012-12-11  638  		    (unsigned long)p);
cfd56754 Cong Wang         2012-12-11  639  	return p;
cfd56754 Cong Wang         2012-12-11  640  }
cfd56754 Cong Wang         2012-12-11  641  
eb1d1641 Herbert Xu        2010-02-27  642  static int br_multicast_add_group(struct net_bridge *br,
8ef2a9a5 YOSHIFUJI Hideaki 2010-04-18  643  				  struct net_bridge_port *port,
8ef2a9a5 YOSHIFUJI Hideaki 2010-04-18  644  				  struct br_ip *group)
eb1d1641 Herbert Xu        2010-02-27  645  {
eb1d1641 Herbert Xu        2010-02-27  646  	struct net_bridge_mdb_entry *mp;
eb1d1641 Herbert Xu        2010-02-27  647  	struct net_bridge_port_group *p;
e8051688 Eric Dumazet      2010-11-15  648  	struct net_bridge_port_group __rcu **pp;
eb1d1641 Herbert Xu        2010-02-27  649  	unsigned long now = jiffies;
eb1d1641 Herbert Xu        2010-02-27  650  	int err;
eb1d1641 Herbert Xu        2010-02-27  651  
eb1d1641 Herbert Xu        2010-02-27  652  	spin_lock(&br->multicast_lock);
eb1d1641 Herbert Xu        2010-02-27  653  	if (!netif_running(br->dev) ||
eb1d1641 Herbert Xu        2010-02-27  654  	    (port && port->state == BR_STATE_DISABLED))
eb1d1641 Herbert Xu        2010-02-27  655  		goto out;
eb1d1641 Herbert Xu        2010-02-27  656  
eb1d1641 Herbert Xu        2010-02-27  657  	mp = br_multicast_new_group(br, port, group);
eb1d1641 Herbert Xu        2010-02-27  658  	err = PTR_ERR(mp);
4c0833bc Tobias Klauser    2010-12-10  659  	if (IS_ERR(mp))
eb1d1641 Herbert Xu        2010-02-27  660  		goto err;
eb1d1641 Herbert Xu        2010-02-27  661  
eb1d1641 Herbert Xu        2010-02-27  662  	if (!port) {
8a870178 Herbert Xu        2011-02-12  663  		mp->mglist = true;
eb1d1641 Herbert Xu        2010-02-27  664  		mod_timer(&mp->timer, now + br->multicast_membership_interval);
eb1d1641 Herbert Xu        2010-02-27  665  		goto out;
eb1d1641 Herbert Xu        2010-02-27  666  	}
eb1d1641 Herbert Xu        2010-02-27  667  
e8051688 Eric Dumazet      2010-11-15  668  	for (pp = &mp->ports;
e8051688 Eric Dumazet      2010-11-15  669  	     (p = mlock_dereference(*pp, br)) != NULL;
e8051688 Eric Dumazet      2010-11-15  670  	     pp = &p->next) {
eb1d1641 Herbert Xu        2010-02-27  671  		if (p->port == port)
eb1d1641 Herbert Xu        2010-02-27  672  			goto found;
eb1d1641 Herbert Xu        2010-02-27  673  		if ((unsigned long)p->port < (unsigned long)port)
eb1d1641 Herbert Xu        2010-02-27  674  			break;
eb1d1641 Herbert Xu        2010-02-27  675  	}
eb1d1641 Herbert Xu        2010-02-27  676  
cfd56754 Cong Wang         2012-12-11 @677  	p = br_multicast_new_port_group(port, group, *pp);
eb1d1641 Herbert Xu        2010-02-27  678  	if (unlikely(!p))
eb1d1641 Herbert Xu        2010-02-27  679  		goto err;
eb1d1641 Herbert Xu        2010-02-27  680  	rcu_assign_pointer(*pp, p);

---
0-DAY kernel build testing backend         Open Source Technology Center
Fengguang Wu, Yuanhan Liu                              Intel Corporation


* [net-next:master 2/17] drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1932:17: sparse: incorrect type in initializer (different base types)
From: kbuild test robot @ 2012-12-13  0:30 UTC
  To: Rasesh Mody; +Cc: netdev

tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   520dfe3a3645257bf83660f672c47f8558f3d4c4
commit: 5216562a2ccd037d0eb85a2e8bbfd6315e3f1bb5 [2/17] bna: Tx and Rx Optimizations


sparse warnings:

drivers/net/ethernet/brocade/bna/bna_tx_rx.c:283:29:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:299:29: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:299:29:    expected unsigned short [unsigned] [usertype] num_entries
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:299:29:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:315:29: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:315:29:    expected unsigned short [unsigned] [usertype] num_entries
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:315:29:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:317:21: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:317:21:    expected unsigned short [unsigned] [usertype] handle
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:317:21:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:330:29: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:330:29:    expected unsigned short [unsigned] [usertype] num_entries
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:330:29:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:345:29: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:345:29:    expected unsigned short [unsigned] [usertype] num_entries
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:345:29:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:362:29: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:362:29:    expected unsigned short [unsigned] [usertype] num_entries
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:362:29:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:368:42: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:368:42:    expected unsigned int [unsigned] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:368:42:    got restricted __be32 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:385:29: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:385:29:    expected unsigned short [unsigned] [usertype] num_entries
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:385:29:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:400:29: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:400:29:    expected unsigned short [unsigned] [usertype] num_entries
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:400:29:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:402:19: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:402:19:    expected unsigned short [unsigned] [usertype] size
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:402:19:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:417:29: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:417:29:    expected unsigned short [unsigned] [usertype] num_entries
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:417:29:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:422:33: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:422:33:    expected unsigned int [unsigned] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:422:33:    got restricted __be32 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:436:29: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:436:29:    expected unsigned short [unsigned] [usertype] num_entries
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:436:29:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:723:17: sparse: cast to restricted __be16
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:723:17: sparse: cast to restricted __be16
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:723:17: sparse: cast to restricted __be16
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:723:17: sparse: cast to restricted __be16
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1650:33: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1650:33:    expected unsigned short [unsigned] [usertype] num_entries
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1650:33:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1664:25: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1664:25:    expected unsigned short [unsigned] [usertype] pages
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1664:25:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1664:25: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1664:25:    expected unsigned short [unsigned] [usertype] page_sz
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1664:25:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1666:61: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1666:61:    expected unsigned short [unsigned] [usertype] rx_buffer_size
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1666:61:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1672:25: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1672:25:    expected unsigned short [unsigned] [usertype] pages
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1672:25:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1672:25: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1672:25:    expected unsigned short [unsigned] [usertype] page_sz
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1672:25:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1676:61: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1676:61:    expected unsigned short [unsigned] [usertype] rx_buffer_size
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1676:61:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1684:17: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1684:17:    expected unsigned short [unsigned] [usertype] pages
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1684:17:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1684:17: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1684:17:    expected unsigned short [unsigned] [usertype] page_sz
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1684:17:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1691:54: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1691:54:    expected unsigned short [unsigned] [usertype] msix_index
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1691:54:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1702:44: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1702:44:    expected unsigned int [unsigned] [usertype] coalescing_timeout
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1702:44:    got restricted __be32 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1704:43: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1704:43:    expected unsigned int [unsigned] [usertype] inter_pkt_timeout
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1704:43:    got restricted __be32 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1406:1: sparse: symbol 'bna_rx_sm_stop_wait_entry' was not declared. Should it be static?
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1459:1: sparse: symbol 'bna_rx_sm_rxf_stop_wait_entry' was not declared. Should it be static?
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1492:1: sparse: symbol 'bna_rx_sm_started_entry' was not declared. Should it be static?
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1557:1: sparse: symbol 'bna_rx_sm_cleanup_wait_entry' was not declared. Should it be static?
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1562:1: sparse: symbol 'bna_rx_sm_cleanup_wait' was not declared. Should it be static?
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1741:29: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1741:29:    expected unsigned short [unsigned] [usertype] num_entries
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1741:29:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1926:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1926:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1926:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1926:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1926:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1926:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1926:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1926:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1926:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1926:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1926:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1926:9: sparse: cast to restricted __be32
+ drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1932:17: sparse: incorrect type in initializer (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1932:17:    expected unsigned long long [unsigned] [usertype] tmp_addr
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1932:17:    got restricted __be64 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1964:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1964:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1964:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1964:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1964:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1964:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1964:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1964:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1964:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1964:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1964:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1964:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1970:17: sparse: incorrect type in initializer (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1970:17:    expected unsigned long long [unsigned] [usertype] tmp_addr
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:1970:17:    got restricted __be64 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:2185:27: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:2185:27: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:2185:27: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:2185:27: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:2185:27: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:2185:27: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:2189:27: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:2189:27: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:2189:27: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:2189:27: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:2189:27: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:2189:27: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:2194:27: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:2194:27: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:2194:27: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:2194:27: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:2194:27: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:2194:27: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3168:33: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3168:33:    expected unsigned short [unsigned] [usertype] num_entries
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3168:33:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3177:17: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3177:17:    expected unsigned short [unsigned] [usertype] pages
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3177:17:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3177:17: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3177:17:    expected unsigned short [unsigned] [usertype] page_sz
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3177:17:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3184:54: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3184:54:    expected unsigned short [unsigned] [usertype] msix_index
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3184:54:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3194:44: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3194:44:    expected unsigned int [unsigned] [usertype] coalescing_timeout
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3194:44:    got restricted __be32 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3196:43: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3196:43:    expected unsigned int [unsigned] [usertype] inter_pkt_timeout
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3196:43:    got restricted __be32 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3201:33: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3201:33:    expected unsigned short [unsigned] [usertype] vlan_id
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3201:33:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3217:29: sparse: incorrect type in assignment (different base types)
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3217:29:    expected unsigned short [unsigned] [usertype] num_entries
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3217:29:    got restricted __be16 [usertype] <noident>
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3260:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3260:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3260:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3260:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3260:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3260:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3260:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3260:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3260:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3260:9: sparse: cast to restricted __be32
drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3260:9: sparse: cast to restricted __be32
+ drivers/net/ethernet/brocade/bna/bna_tx_rx.c:3260:9: sparse: too many warnings

vim +1932 drivers/net/ethernet/brocade/bna/bna_tx_rx.c

f3bd5173 Rasesh Mody 2011-08-08  1920  	rxq->qpt.page_size = page_size;
f3bd5173 Rasesh Mody 2011-08-08  1921  
f3bd5173 Rasesh Mody 2011-08-08  1922  	rxq->rcb->sw_qpt = (void **) swqpt_mem->kva;
5216562a Rasesh Mody 2012-12-11  1923  	rxq->rcb->sw_q = page_mem->kva;
5216562a Rasesh Mody 2012-12-11  1924  
5216562a Rasesh Mody 2012-12-11  1925  	kva = page_mem->kva;
5216562a Rasesh Mody 2012-12-11 @1926  	BNA_GET_DMA_ADDR(&page_mem->dma, dma);
f3bd5173 Rasesh Mody 2011-08-08  1927  
f3bd5173 Rasesh Mody 2011-08-08  1928  	for (i = 0; i < rxq->qpt.page_count; i++) {
5216562a Rasesh Mody 2012-12-11  1929  		rxq->rcb->sw_qpt[i] = kva;
5216562a Rasesh Mody 2012-12-11  1930  		kva += PAGE_SIZE;
5216562a Rasesh Mody 2012-12-11  1931  
5216562a Rasesh Mody 2012-12-11 @1932  		BNA_SET_DMA_ADDR(dma, &bna_dma);
f3bd5173 Rasesh Mody 2011-08-08  1933  		((struct bna_dma_addr *)rxq->qpt.kv_qpt_ptr)[i].lsb =
5216562a Rasesh Mody 2012-12-11  1934  			bna_dma.lsb;
f3bd5173 Rasesh Mody 2011-08-08  1935  		((struct bna_dma_addr *)rxq->qpt.kv_qpt_ptr)[i].msb =

---
0-DAY kernel build testing backend         Open Source Technology Center
Fengguang Wu, Yuanhan Liu                              Intel Corporation

^ permalink raw reply

* [PATCH v2] netfilter: nf_nat: Also handle non-ESTABLISHED routing changes in MASQUERADE
From: Andrew Collins @ 2012-12-13  0:23 UTC (permalink / raw)
  To: netfilter-devel, netdev, kadlec

The MASQUERADE target now handles routing changes which affect
the output interface of a connection, but only for ESTABLISHED
connections.  It is also possible for NEW connections which
already have a conntrack entry to be affected by routing changes.

This adds a check to drop entries in the NEW+conntrack state
when the oif has changed.

Signed-off-by: Andrew Collins <bsderandrew@gmail.com>
---
 net/ipv4/netfilter/iptable_nat.c  |   15 ++++++++++-----
 net/ipv6/netfilter/ip6table_nat.c |   15 ++++++++++-----
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/netfilter/iptable_nat.c b/net/ipv4/netfilter/iptable_nat.c
index da2c8a3..eeaff7e 100644
--- a/net/ipv4/netfilter/iptable_nat.c
+++ b/net/ipv4/netfilter/iptable_nat.c
@@ -124,23 +124,28 @@ nf_nat_ipv4_fn(unsigned int hooknum,
 			ret = nf_nat_rule_find(skb, hooknum, in, out, ct);
 			if (ret != NF_ACCEPT)
 				return ret;
-		} else
+		} else {
 			pr_debug("Already setup manip %s for ct %p\n",
 				 maniptype == NF_NAT_MANIP_SRC ? "SRC" : "DST",
 				 ct);
+			if (nf_nat_oif_changed(hooknum, ctinfo, nat, out))
+				goto oif_changed;
+		}
 		break;
 
 	default:
 		/* ESTABLISHED */
 		NF_CT_ASSERT(ctinfo == IP_CT_ESTABLISHED ||
 			     ctinfo == IP_CT_ESTABLISHED_REPLY);
-		if (nf_nat_oif_changed(hooknum, ctinfo, nat, out)) {
-			nf_ct_kill_acct(ct, ctinfo, skb);
-			return NF_DROP;
-		}
+		if (nf_nat_oif_changed(hooknum, ctinfo, nat, out))
+			goto oif_changed;
 	}
 
 	return nf_nat_packet(ct, ctinfo, hooknum, skb);
+
+oif_changed:
+	nf_ct_kill_acct(ct, ctinfo, skb);
+	return NF_DROP;
 }
 
 static unsigned int
diff --git a/net/ipv6/netfilter/ip6table_nat.c b/net/ipv6/netfilter/ip6table_nat.c
index 6c8ae24..e0e788d 100644
--- a/net/ipv6/netfilter/ip6table_nat.c
+++ b/net/ipv6/netfilter/ip6table_nat.c
@@ -127,23 +127,28 @@ nf_nat_ipv6_fn(unsigned int hooknum,
 			ret = nf_nat_rule_find(skb, hooknum, in, out, ct);
 			if (ret != NF_ACCEPT)
 				return ret;
-		} else
+		} else {
 			pr_debug("Already setup manip %s for ct %p\n",
 				 maniptype == NF_NAT_MANIP_SRC ? "SRC" : "DST",
 				 ct);
+			if (nf_nat_oif_changed(hooknum, ctinfo, nat, out))
+				goto oif_changed;
+		}
 		break;
 
 	default:
 		/* ESTABLISHED */
 		NF_CT_ASSERT(ctinfo == IP_CT_ESTABLISHED ||
 			     ctinfo == IP_CT_ESTABLISHED_REPLY);
-		if (nf_nat_oif_changed(hooknum, ctinfo, nat, out)) {
-			nf_ct_kill_acct(ct, ctinfo, skb);
-			return NF_DROP;
-		}
+		if (nf_nat_oif_changed(hooknum, ctinfo, nat, out))
+			goto oif_changed;
 	}
 
 	return nf_nat_packet(ct, ctinfo, hooknum, skb);
+
+oif_changed:
+	nf_ct_kill_acct(ct, ctinfo, skb);
+	return NF_DROP;
 }
 
 static unsigned int
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH] netfilter: nf_nat: Also handle non-ESTABLISHED routing changes in MASQUERADE
From: Andrew Collins @ 2012-12-13  0:17 UTC (permalink / raw)
  To: netfilter-devel, netdev, kadlec
In-Reply-To: <1355356167-10397-1-git-send-email-bsderandrew@gmail.com>

On Wed, Dec 12, 2012 at 4:49 PM, Andrew Collins <bsderandrew@gmail.com> wrote:
> The MASQUERADE target now handles routing changes which affect
> the output interface of a connection, but only for ESTABLISHED
> connections.  It is also possible for NEW connections which
> already have a conntrack entry to be affected by routing changes.
>
> This adds a check to drop entries in the NEW+conntrack state
> when the oif has changed.
>
> Signed-off-by: Andrew Collins <bsderandrew@gmail.com>
> ---
>  net/ipv4/netfilter/iptable_nat.c |   15 ++++++++++-----
>  1 files changed, 10 insertions(+), 5 deletions(-)

My mistake: I forgot to include the corresponding ip6table_nat.c
change (it's identical).  Please ignore this for now.

^ permalink raw reply

* [PATCH] netfilter: nf_nat: Also handle non-ESTABLISHED routing changes in MASQUERADE
From: Andrew Collins @ 2012-12-12 23:49 UTC (permalink / raw)
  To: netfilter-devel, netdev, kadlec

The MASQUERADE target now handles routing changes which affect
the output interface of a connection, but only for ESTABLISHED
connections.  It is also possible for NEW connections which
already have a conntrack entry to be affected by routing changes.

This adds a check to drop entries in the NEW+conntrack state
when the oif has changed.

Signed-off-by: Andrew Collins <bsderandrew@gmail.com>
---
 net/ipv4/netfilter/iptable_nat.c |   15 ++++++++++-----
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/netfilter/iptable_nat.c b/net/ipv4/netfilter/iptable_nat.c
index da2c8a3..eeaff7e 100644
--- a/net/ipv4/netfilter/iptable_nat.c
+++ b/net/ipv4/netfilter/iptable_nat.c
@@ -124,23 +124,28 @@ nf_nat_ipv4_fn(unsigned int hooknum,
 			ret = nf_nat_rule_find(skb, hooknum, in, out, ct);
 			if (ret != NF_ACCEPT)
 				return ret;
-		} else
+		} else {
 			pr_debug("Already setup manip %s for ct %p\n",
 				 maniptype == NF_NAT_MANIP_SRC ? "SRC" : "DST",
 				 ct);
+			if (nf_nat_oif_changed(hooknum, ctinfo, nat, out))
+				goto oif_changed;
+		}
 		break;
 
 	default:
 		/* ESTABLISHED */
 		NF_CT_ASSERT(ctinfo == IP_CT_ESTABLISHED ||
 			     ctinfo == IP_CT_ESTABLISHED_REPLY);
-		if (nf_nat_oif_changed(hooknum, ctinfo, nat, out)) {
-			nf_ct_kill_acct(ct, ctinfo, skb);
-			return NF_DROP;
-		}
+		if (nf_nat_oif_changed(hooknum, ctinfo, nat, out))
+			goto oif_changed;
 	}
 
 	return nf_nat_packet(ct, ctinfo, hooknum, skb);
+
+oif_changed:
+	nf_ct_kill_acct(ct, ctinfo, skb);
+	return NF_DROP;
 }
 
 static unsigned int
-- 
1.7.1


^ permalink raw reply related

* Re: Any good documentation on RTNL
From: Ben Hutchings @ 2012-12-12 23:37 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev
In-Reply-To: <50C9101A.6000906@candelatech.com>

On Wed, 2012-12-12 at 15:15 -0800, Ben Greear wrote:
> I'm wondering if anyone could point me to some documentation on
> the finer points of what the rtnl_lock() does?  I can't find anything
> overly useful in google or the kernel docs.
> 
> For instance, can the packet rx-logic run (on other threads?) while rtnl is held?
>
> How about tx-logic?

rtnl_lock() is just mutex_lock() on a particular global mutex.  Since
the RX and TX paths obviously don't take such a mutex, it has no effect
on them.

All rtnetlink operations and most net device ioctls are serialised by
this mutex (it's the BKL of networking!).

> In particular, I'm interested to know if it is valid to have
> this state:
> 
> thread 1 holds RTNL, and blocks on thread 2 due to trying to flush a work-queue.
> 
> thread 2 is processing an item on that work-queue, and the work item is sending packets
>    (and blocking for up to 200ms timeout trying to flush a wifi driver's queues).

So long as the workqueue is private (if you're flushing it, I suppose it
must be) then I don't see any problem with that.
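
As a rough userspace illustration of the situation described above (not kernel code), the sketch below models RTNL as one global pthread mutex.  All names here (fake_rtnl, change_config, tx_work, flush_while_locked) are hypothetical.  One thread holds the lock while it waits for a worker that "sends" a packet; because the worker never takes the lock, the wait completes.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for the single global RTNL mutex. */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;

static int config_generation;  /* control-path state, protected by fake_rtnl */
static int packets_sent;       /* data-path state, never touched under fake_rtnl */

static void fake_rtnl_lock(void)   { pthread_mutex_lock(&fake_rtnl); }
static void fake_rtnl_unlock(void) { pthread_mutex_unlock(&fake_rtnl); }

/* Control path: every configuration change is serialised by the mutex. */
static int change_config(void)
{
	int gen;

	fake_rtnl_lock();
	gen = ++config_generation;
	fake_rtnl_unlock();
	return gen;
}

/* Data path: does not take the mutex, so a held lock cannot stall it. */
static void *tx_work(void *unused)
{
	(void)unused;
	packets_sent++;
	return NULL;
}

/*
 * "Thread 1" holds the lock and blocks until "thread 2" (the work item)
 * has finished.  This only terminates because tx_work() never needs the lock.
 */
static int flush_while_locked(void)
{
	pthread_t worker;
	int rc;

	fake_rtnl_lock();
	rc = pthread_create(&worker, NULL, tx_work, NULL);
	assert(rc == 0);
	rc = pthread_join(worker, NULL);
	assert(rc == 0);
	(void)rc;
	fake_rtnl_unlock();
	return packets_sent;
}
```

If tx_work() called fake_rtnl_lock() itself, flush_while_locked() would deadlock, which is the case to watch for with any work item that is flushed under the lock.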

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH 00/11] Add basic VLAN support to bridges
From: Vlad Yasevich @ 2012-12-12 23:36 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: netdev, shemminger, davem, mst, john.r.fastabend
In-Reply-To: <CAJZOPZL3grqtd-Tnhr-QxkdFWrhkgcp5+fOUZSaPJd3Ker_7GQ@mail.gmail.com>

On 12/12/2012 05:54 PM, Or Gerlitz wrote:
> On Wed, Dec 12, 2012 at 10:01 PM, Vlad Yasevich <vyasevic@redhat.com> wrote:
>> This series of patches provides an ability to add VLANs to the bridge
>> ports.  This is similar to what can be found in most switches.
>
> Vlad, I wasn't sure whether these patches support both switch modes
> w.r.t. vlans, namely "access" and "trunk", or in virtualization terms
> VST and VGT.  In natural language: a mode where the entity (e.g. a VM)
> using the bridge port sends and receives untagged traffic and the
> bridge does the vlan tagging/marking and untagging/stripping; a mode
> where packets are tagged and limited to a set of allowed vlans; and a
> third, hybrid mode where a default vlan is used when packets arrive
> untagged and a set of allowed vlans acts as a filter for tagged
> packets.

The patches are generic enough that they can support all three.  It's
just a matter of configuration.

If the entity using the switch is expecting untagged traffic for a
particular vlan, you can designate that vlan as untagged or native, and
the bridge will strip the headers.  If you want more than one untagged
vlan, then you have to configure vlan interfaces under the bridge and
bridge them together.

The patch will also insert a VLAN tag on port if that is how the port
is configured.
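
A minimal sketch of the per-port decisions described in this thread, written as plain userspace C rather than bridge code.  The struct, the function names and the "drop untagged frames when no untagged vlan is set" rule are assumptions made for illustration; they are not taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_N_VID    4096    /* 12-bit VLAN id space */
#define SKETCH_NO_VID   0xffff  /* hypothetical "no untagged vlan" marker */

enum egress_action { EGRESS_DROP, EGRESS_AS_IS, EGRESS_STRIPPED };

/* Hypothetical per-port state: allowed vlans plus one optional untagged vlan. */
struct port_vlans {
	bool configured;                /* false: no vlans added, old behaviour */
	bool allowed[SKETCH_N_VID];
	uint16_t untagged_vid;
};

static void port_init(struct port_vlans *p)
{
	memset(p, 0, sizeof(*p));
	p->untagged_vid = SKETCH_NO_VID;
}

static void port_add_vlan(struct port_vlans *p, uint16_t vid, bool untagged)
{
	assert(vid < SKETCH_N_VID);
	p->configured = true;
	p->allowed[vid] = true;
	if (untagged)
		p->untagged_vid = vid;  /* only a single untagged vlan per port */
}

/* Returns the vlan the frame is assigned to on ingress, or -1 to drop it. */
static int ingress_vid(const struct port_vlans *p, bool tagged, uint16_t vid)
{
	if (!p->configured)
		return tagged ? vid : 0;        /* default behaviour unchanged */
	if (!tagged)                            /* assumption: drop if no untagged vlan */
		return p->untagged_vid == SKETCH_NO_VID ? -1 : p->untagged_vid;
	return (vid < SKETCH_N_VID && p->allowed[vid]) ? vid : -1;
}

/* Decides what happens to a frame in vlan 'vid' leaving through this port. */
static enum egress_action egress_decide(const struct port_vlans *p, uint16_t vid)
{
	if (!p->configured)
		return EGRESS_AS_IS;
	if (vid >= SKETCH_N_VID || !p->allowed[vid])
		return EGRESS_DROP;
	return vid == p->untagged_vid ? EGRESS_STRIPPED : EGRESS_AS_IS;
}
```

With vlan 10 added as tagged and vlan 20 added as untagged, untagged ingress frames land in vlan 20, frames in vlan 20 leave with the header stripped, and frames in any other vlan are dropped on this port.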

There are 2 things I don't do:  Q-in-Q (nothing is stopping it, I
just didn't write the code), and vlan translation (that would be a
headache).  I also don't set priorities yet, but that can be added later
if it is something people want.

>
> Also, does this patch set assume that a certain port is actually an
> uplink towards the physical network/external switch?

No, there is no uplink designation yet.  It might be useful for some
other work I am thinking of, but it wasn't really needed here.

-vlad

>
> Or.
>
>>   The bridge
>> port may have any number of VLANs added to it including vlan 0 priority tagged
>> traffic.  When vlans are added to the port, only traffic tagged with a particular
>> vlan will be forwarded over this port.  Additionally, vlan ids are added to FDB
>> entries and become part of the lookup.  This way we correctly identify the FDB
>> entry.
>>
>> A single vlan may also be designated as untagged.  Any untagged traffic
>> received by the port will be assigned to this vlan.  Any traffic exiting
>> the port with a VID matching the untagged vlan will exit untagged (the
>> bridge will strip the vlan header).  This is similar to "Native Vlan" support
>> available in most switches.
>>
>> The default behavior of the bridge is unchanged if no vlans have been
>> configured.
>>
>> Changes since rfc v2:
>>   - Per-port vlan bitmap is gone and is replaced with a vlan list.
>>   - Added bridge vlan list, which is referenced by each port.  Entries in
>>     the bridge vlan list have a port bitmap that shows which ports are part
>>     of which vlan.
>>   - Netlink API changes.
>>   - Dropped sysfs support for now.  If people think this is really useful,
>>     it can be added back.
>>   - Support for native/untagged vlans.
>>
>> Changes since rfc v1:
>>   - Comments addressed regarding formatting and RCU usage
>>   - ioctls have been removed and changed over to the netlink interface.
>>   - Added support of user added ndb entries.
>>   - changed sysfs interface to export a bitmap.  Also added a write interface.
>>     I am not sure how much I like it, but it made my testing easier/faster.  I
>>     might change the write interface to take text instead of binary.
>>
>> Vlad Yasevich (11):
>>    bridge: Add vlan filtering infrastructure
>>    bridge: Validate that vlan is permitted on ingress
>>    bridge: Verify that a vlan is allowed to egress on give port
>>    bridge: Cache vlan in the cb for faster egress lookup.
>>    bridge: Add vlan to unicast fdb entries
>>    bridge: Add vlan id to multicast groups
>>    bridge: Add netlink interface to configure vlans on bridge ports
>>    bridge: Add vlan support to static neighbors
>>    bridge: Add the ability to configure untagged vlans
>>    bridge: Implement untagged vlan handling
>>    bridge: Dump vlan information from a bridge port
>>
>>   drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    5 +-
>>   drivers/net/macvlan.c                         |    2 +-
>>   drivers/net/vxlan.c                           |    3 +-
>>   include/linux/netdevice.h                     |    4 +-
>>   include/uapi/linux/if_bridge.h                |   24 ++-
>>   include/uapi/linux/neighbour.h                |    1 +
>>   include/uapi/linux/rtnetlink.h                |    1 +
>>   net/bridge/br_device.c                        |   34 +++-
>>   net/bridge/br_fdb.c                           |  199 +++++++++++++---
>>   net/bridge/br_forward.c                       |  139 +++++++++++
>>   net/bridge/br_if.c                            |  312 +++++++++++++++++++++++++
>>   net/bridge/br_input.c                         |   65 +++++-
>>   net/bridge/br_multicast.c                     |   71 ++++--
>>   net/bridge/br_netlink.c                       |  154 +++++++++++--
>>   net/bridge/br_private.h                       |   66 +++++-
>>   net/core/rtnetlink.c                          |   40 +++-
>>   16 files changed, 1010 insertions(+), 110 deletions(-)
>>
>> --
>> 1.7.7.6
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Any good documentation on RTNL
From: Ben Greear @ 2012-12-12 23:15 UTC (permalink / raw)
  To: netdev

I'm wondering if anyone could point me to some documentation on
the finer points of what the rtnl_lock() does?  I can't find anything
overly useful in google or the kernel docs.

For instance, can the packet rx-logic run (on other threads?) while rtnl is held?

How about tx-logic?

In particular, I'm interested to know if it is valid to have
this state:

thread 1 holds RTNL, and blocks on thread 2 due to trying to flush a work-queue.

thread 2 is processing an item on that work-queue, and the work item is sending packets
   (and blocking for up to 200ms timeout trying to flush a wifi driver's queues).

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: [PATCH 00/11] Add basic VLAN support to bridges
From: Or Gerlitz @ 2012-12-12 22:54 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, shemminger, davem, mst, john.r.fastabend
In-Reply-To: <1355342477-4971-1-git-send-email-vyasevic@redhat.com>

On Wed, Dec 12, 2012 at 10:01 PM, Vlad Yasevich <vyasevic@redhat.com> wrote:
> This series of patches provides an ability to add VLANs to the bridge
> ports.  This is similar to what can be found in most switches.

Vlad, I wasn't sure whether these patches support both switch modes
w.r.t. vlans, namely "access" and "trunk", or in virtualization terms
VST and VGT.  In natural language: a mode where the entity (e.g. a VM)
using the bridge port sends and receives untagged traffic and the
bridge does the vlan tagging/marking and untagging/stripping; a mode
where packets are tagged and limited to a set of allowed vlans; and a
third, hybrid mode where a default vlan is used when packets arrive
untagged and a set of allowed vlans acts as a filter for tagged
packets.

Also, does this patch set assume that a certain port is actually an
uplink towards the physical network/external switch?

Or.

>  The bridge
> port may have any number of VLANs added to it including vlan 0 priority tagged
> traffic.  When vlans are added to the port, only traffic tagged with a particular
> vlan will be forwarded over this port.  Additionally, vlan ids are added to FDB
> entries and become part of the lookup.  This way we correctly identify the FDB
> entry.
>
> A single vlan may also be designated as untagged.  Any untagged traffic
> received by the port will be assigned to this vlan.  Any traffic exiting
> the port with a VID matching the untagged vlan will exit untagged (the
> bridge will strip the vlan header).  This is similar to "Native Vlan" support
> available in most switches.
>
> The default behavior of the bridge is unchanged if no vlans have been
> configured.
>
> Changes since rfc v2:
>  - Per-port vlan bitmap is gone and is replaced with a vlan list.
>  - Added bridge vlan list, which is referenced by each port.  Entries in
>    the bridge vlan list have a port bitmap that shows which ports are part
>    of which vlan.
>  - Netlink API changes.
>  - Dropped sysfs support for now.  If people think this is really useful,
>    it can be added back.
>  - Support for native/untagged vlans.
>
> Changes since rfc v1:
>  - Comments addressed regarding formatting and RCU usage
>  - ioctls have been removed and changed over to the netlink interface.
>  - Added support of user added ndb entries.
>  - changed sysfs interface to export a bitmap.  Also added a write interface.
>    I am not sure how much I like it, but it made my testing easier/faster.  I
>    might change the write interface to take text instead of binary.
>
> Vlad Yasevich (11):
>   bridge: Add vlan filtering infrastructure
>   bridge: Validate that vlan is permitted on ingress
>   bridge: Verify that a vlan is allowed to egress on give port
>   bridge: Cache vlan in the cb for faster egress lookup.
>   bridge: Add vlan to unicast fdb entries
>   bridge: Add vlan id to multicast groups
>   bridge: Add netlink interface to configure vlans on bridge ports
>   bridge: Add vlan support to static neighbors
>   bridge: Add the ability to configure untagged vlans
>   bridge: Implement untagged vlan handling
>   bridge: Dump vlan information from a bridge port
>
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    5 +-
>  drivers/net/macvlan.c                         |    2 +-
>  drivers/net/vxlan.c                           |    3 +-
>  include/linux/netdevice.h                     |    4 +-
>  include/uapi/linux/if_bridge.h                |   24 ++-
>  include/uapi/linux/neighbour.h                |    1 +
>  include/uapi/linux/rtnetlink.h                |    1 +
>  net/bridge/br_device.c                        |   34 +++-
>  net/bridge/br_fdb.c                           |  199 +++++++++++++---
>  net/bridge/br_forward.c                       |  139 +++++++++++
>  net/bridge/br_if.c                            |  312 +++++++++++++++++++++++++
>  net/bridge/br_input.c                         |   65 +++++-
>  net/bridge/br_multicast.c                     |   71 ++++--
>  net/bridge/br_netlink.c                       |  154 +++++++++++--
>  net/bridge/br_private.h                       |   66 +++++-
>  net/core/rtnetlink.c                          |   40 +++-
>  16 files changed, 1010 insertions(+), 110 deletions(-)
>
> --
> 1.7.7.6
>

^ permalink raw reply

* Re: vlan tagged packets and libpcap breakage
From: Ani Sinha @ 2012-12-12 22:16 UTC (permalink / raw)
  To: Michael Richardson, Eric W. Biederman
  Cc: netdev, Francesco Ruggeri, tcpdump-workers
In-Reply-To: <alpine.OSX.2.00.1212121205040.78903@animac.local>

+ Eric B.


On Wed, Dec 12, 2012 at 1:53 PM, Ani Sinha <ani@aristanetworks.com> wrote:
>
>>
>> unsigned int netdev_8021q_inskb = 1;
>>
>> ...
>>       {
>>               .ctl_name       = NET_CORE_8021q_INSKB,
>>               .procname       = "netdev_8021q_inskb",
>>               .data           = &netdev_8021q_inskb,
>>               .maxlen         = sizeof(int),
>>               .mode           = 0444,
>>               .proc_handler   = proc_dointvec
>>       },
>>
>> would seem to do it to me.
>> Then pcap can fopen("/proc/sys/net/core/netdev_8021q_inskb") and if it
>> finds it, and it is >0, then do the cmsg thing.
>>
>
> Does this work? This is just an experimental patch and by no means final.
> I just want to have an idea what everyone thought about it. Once we debate
> and discuss, I can cook up a final patch that would be worth committing.
>
> Also instead of having this /proc interface, we can perhaps check for a
> specific kernel version that:
>
> (a) has the vlan tag info in the skb metadata (as opposed to in the packet
> itself)
> (b) has the following patch that adds the capability to generate a filter
> based on the tag value:
>
> commit f3335031b9452baebfe49b8b5e55d3fe0c4677d1
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Sat Oct 27 02:26:17 2012 +0000
>
>     net: filter: add vlan tag access
>
> We need both of the above for userland to generate filter
> code that compares vlan tag values in the skb metadata. For kernels that
> have the vlan tag in
> the skb metadata but do not have the above commit (b), there is nothing
> that can be done. For older kernels that had the vlan tag info in the
> packet itself, the filter code can be generated differently to look at
> specific offsets within the packet (something that libpcap does
> currently).
>
> We have already ruled out the idea of generating a filter and trying to
> load and see if that fails (see previous emails on this thread).
>
> Hope this makes sense.
>
>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index c45eabc..91e2ba3 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -36,6 +36,7 @@ static inline unsigned int sk_filter_len(const struct sk_filter *fp)
>         return fp->len * sizeof(struct sock_filter) + sizeof(*fp);
>  }
>
> +extern bool sysctl_8021q_inskb;
>  extern int sk_filter(struct sock *sk, struct sk_buff *skb);
>  extern unsigned int sk_run_filter(const struct sk_buff *skb,
>                                   const struct sock_filter *filter);
> diff --git a/net/core/filter.c b/net/core/filter.c
> index c23543c..4f5a657 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -41,6 +41,8 @@
>  #include <linux/seccomp.h>
>  #include <linux/if_vlan.h>
>
> +bool sysctl_8021q_inskb = 1;
> +
>  /* No hurry in this branch
>   *
>   * Exported for the bpf jit load helper.
> diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
> index d1b0804..f9a3700 100644
> --- a/net/core/sysctl_net_core.c
> +++ b/net/core/sysctl_net_core.c
> @@ -15,6 +15,7 @@
>  #include <linux/init.h>
>  #include <linux/slab.h>
>  #include <linux/kmemleak.h>
> +#include <linux/filter.h>
>
>  #include <net/ip.h>
>  #include <net/sock.h>
> @@ -189,6 +190,13 @@ static struct ctl_table net_core_table[] = {
>                 .mode           = 0644,
>                 .proc_handler   = proc_dointvec
>         },
> +       {
> +               .procname       = "8021q_inskb",
> +               .data           = &sysctl_8021q_inskb,
> +               .maxlen         = sizeof(bool),
> +               .mode           = 0444,
> +               .proc_handler   = proc_dointvec
> +       },
>         { }
>  };
>
_______________________________________________
tcpdump-workers mailing list
tcpdump-workers@lists.tcpdump.org
https://lists.sandelman.ca/mailman/listinfo/tcpdump-workers

^ permalink raw reply

* Re: [PATCH] net: filter: return -EINVAL if BPF_S_ANC* operation is not supported
From: Ani Sinha @ 2012-12-12 22:06 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Eric Dumazet, David Miller, netdev
In-Reply-To: <50C8B008.2000804@redhat.com>

On Wed, Dec 12, 2012 at 8:25 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
> On 12/12/2012 01:22 PM, Eric Dumazet wrote:
>>
>> On Wed, 2012-12-12 at 10:31 +0100, Daniel Borkmann wrote:
>>>
>>> Currently, we return -EINVAL for malicious or wrong BPF filters.
>>> However, this is not done for BPF_S_ANC* operations, which makes it
>>> more difficult to detect if it's actually supported or not by the
>>> BPF machine. Therefore, we should also return -EINVAL if K is within
>>> the SKF_AD_OFF universe and the ancillary operation did not match.
>>>
>>> Cc: Ani Sinha <ani@aristanetworks.com>
>>> Cc: Eric Dumazet <eric.dumazet@gmail.com>
>>> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
>>> ---
>>>   net/core/filter.c | 8 +++++++-
>>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>> index c23543c..de9bed4 100644
>>> --- a/net/core/filter.c
>>> +++ b/net/core/filter.c
>>> @@ -531,7 +531,7 @@ int sk_chk_filter(struct sock_filter *filter, unsigned int flen)
>>>                 [BPF_JMP|BPF_JSET|BPF_K] = BPF_S_JMP_JSET_K,
>>>                 [BPF_JMP|BPF_JSET|BPF_X] = BPF_S_JMP_JSET_X,
>>>         };
>>> -       int pc;
>>> +       int pc, anc_found;
>>>
>>>         if (flen == 0 || flen > BPF_MAXINSNS)
>>>                 return -EINVAL;
>>> @@ -592,8 +592,10 @@ int sk_chk_filter(struct sock_filter *filter, unsigned int flen)
>>>                 case BPF_S_LD_W_ABS:
>>>                 case BPF_S_LD_H_ABS:
>>>                 case BPF_S_LD_B_ABS:
>>> +                       anc_found = 0;
>>>   #define ANCILLARY(CODE) case SKF_AD_OFF + SKF_AD_##CODE:      \
>>>                                 code = BPF_S_ANC_##CODE;        \
>>> +                               anc_found = 1;                  \
>>>                                 break
>>>                         switch (ftest->k) {
>>>                         ANCILLARY(PROTOCOL);
>>> @@ -610,6 +612,10 @@ int sk_chk_filter(struct sock_filter *filter, unsigned int flen)
>>>                         ANCILLARY(VLAN_TAG);
>>>                         ANCILLARY(VLAN_TAG_PRESENT);
>>>                         }
>>> +
>>> +                       /* ancillary operation unkown or unsupported */
>>> +                       if (anc_found == 0 && ftest->k >= SKF_AD_OFF)
>>> +                               return -EINVAL;
>>>                 }
>>>                 ftest->code = code;
>>>         }
>>
>>
>> Several points :
>>
>> 1) This might break a userland filter that was previously working, by
>> returning 0 when load_pointer() returns NULL.
>>
>> Specifying an offset bigger than skb->len is not _invalid_, it only
>> makes a filter return 0, because load_pointer() returns NULL.
>
>
> I think it will not break for code, that calls load_pointer() in such a
> circumstance which passed the sk_chk_filter() test. However, it will
> "break" for code that calls ...
>
>   { BPF_LD | BPF_(W|H|B) | BPF_ABS, 0, 0, <K> },
>
> ... where <K> is in [0xfffff000, 0xffffffff] _and_ <K> is not an ancillary.
>
> But ...
>
> Assuming some old code will have such an instruction where <K> is between
> [0xfffff000, 0xffffffff] and it doesn't know ancillary operations, then
> this will give a non-expected/unwanted behavior as well (since we do not
> return the BPF machine with 0 as it probably was the case before anc.ops,
> but load sth. into the accumulator instead and continue with the next
> instruction, for instance), right? Thus, following this argumentation, user
> space code would already have been broken by introducing ancillary
> operations into the BPF machine per se.
>
> This is probably just an assumption, but code that does such a direct load,
> e.g. "load word at packet offset 0xffffffff into accumulator"
> ("ld [0xffffffff]"), is quite broken, isn't it? Isn't the whole assumption
> of ancillary operations that no-one intentionally calls things like
> "ld [0xffffffff]" and expects this word to be loaded from the packet offset?
>
>
>> 2) This won't help applications running on old kernels where your patch
>> won't be applied, as already mentioned yesterday.
>
>
> Agreed, but leaving old kernels aside, it would be nice if newer kernels
> could validate that, so at least from kernel <xyz> onwards it could be
> checked _for sure_ if anc.op <abc> is present and can be used.
>

I second that. It would be nice to have a clean way to know whether a
particular ancillary operation is supported by the kernel. After all,
the latest kernel of today will be an ancient one soon enough ;)

^ permalink raw reply

* Re: [tcpdump-workers] vlan tagged packets and libpcap breakage
From: Ani Sinha @ 2012-12-12 21:53 UTC (permalink / raw)
  To: Michael Richardson; +Cc: netdev, tcpdump-workers, Francesco Ruggeri
In-Reply-To: <21992.1351723328@obiwan.sandelman.ca>


>
> unsigned int netdev_8021q_inskb = 1;
>
> ...
> 	{
> 		.ctl_name	= NET_CORE_8021q_INSKB,
> 		.procname	= "netdev_8021q_inskb",
> 		.data		= &netdev_8021q_inskb,
> 		.maxlen		= sizeof(int),
> 		.mode		= 0444,
> 		.proc_handler	= proc_dointvec
> 	},
>
> would seem to do it to me.
> Then pcap can fopen("/proc/sys/net/core/netdev_8021q_inskb") and if it
> finds it, and it is >0, then do the cmsg thing.
>

Does this work? This is just an experimental patch and by no means final.
I just want to get an idea of what everyone thinks about it. Once we debate
and discuss, I can cook up a final patch that would be worth committing.

Also, instead of having this /proc interface, we could perhaps check for a
specific kernel version that:

(a) has the vlan tag info in the skb metadata (as opposed to in the packet
itself)
(b) has the following patch that adds the capability to generate a filter
based on the tag value :

commit f3335031b9452baebfe49b8b5e55d3fe0c4677d1
Author: Eric Dumazet <edumazet@google.com>
Date:   Sat Oct 27 02:26:17 2012 +0000

    net: filter: add vlan tag access

We need both of the above for userland to generate filter code that
compares vlan tag values in the skb metadata. For kernels that have the
vlan tag in the skb metadata but do not have the above commit (b), there
is nothing that can be done. For older kernels that had the vlan tag info
in the packet itself, the filter code can be generated differently to look
at specific offsets within the packet (something that libpcap does
currently).

We have already ruled out the idea of generating a filter and trying to
load and see if that fails (see previous emails on this thread).

Hope this makes sense.


diff --git a/include/linux/filter.h b/include/linux/filter.h
index c45eabc..91e2ba3 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -36,6 +36,7 @@ static inline unsigned int sk_filter_len(const struct sk_filter *fp)
 	return fp->len * sizeof(struct sock_filter) + sizeof(*fp);
 }

+extern bool sysctl_8021q_inskb;
 extern int sk_filter(struct sock *sk, struct sk_buff *skb);
 extern unsigned int sk_run_filter(const struct sk_buff *skb,
 				  const struct sock_filter *filter);
diff --git a/net/core/filter.c b/net/core/filter.c
index c23543c..4f5a657 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -41,6 +41,8 @@
 #include <linux/seccomp.h>
 #include <linux/if_vlan.h>

+bool sysctl_8021q_inskb = 1;
+
 /* No hurry in this branch
  *
  * Exported for the bpf jit load helper.
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index d1b0804..f9a3700 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -15,6 +15,7 @@
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/kmemleak.h>
+#include <linux/filter.h>

 #include <net/ip.h>
 #include <net/sock.h>
@@ -189,6 +190,13 @@ static struct ctl_table net_core_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.procname	= "8021q_inskb",
+		.data		= &sysctl_8021q_inskb,
+		.maxlen		= sizeof(bool),
+		.mode		= 0444,
+		.proc_handler	= proc_dointvec
+	},
 	{ }
 };

^ permalink raw reply related
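A rough sketch of the userland side of this proposal: open the sysctl file and treat a value greater than zero as "vlan tag lives in skb metadata". The path below is hypothetical, since it exists only on a kernel carrying the experimental patch above, and `vlan_tag_in_skb` is an illustrative name rather than anything in libpcap.

```c
#include <stdio.h>

/* Hypothetical path: present only with the experimental patch applied,
 * which adds "8021q_inskb" to net_core_table. */
#define VLAN_INSKB_SYSCTL "/proc/sys/net/core/8021q_inskb"

/* Returns 1 if the file holds a value > 0, 0 if it holds 0 (or cannot be
 * parsed), and -1 if the file cannot be opened, i.e. the kernel does not
 * advertise the feature this way. */
int vlan_tag_in_skb(const char *path)
{
	FILE *f = fopen(path, "r");
	int value = 0;
	int parsed;

	if (!f)
		return -1;

	parsed = fscanf(f, "%d", &value);
	fclose(f);

	if (parsed != 1)
		return 0;
	return value > 0 ? 1 : 0;
}
```

A capture library would call `vlan_tag_in_skb(VLAN_INSKB_SYSCTL)` once and pick between the metadata-based filter and the packet-offset filter accordingly; a return of -1 covers every kernel that predates the sysctl.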

* Re: [RFC PATCH net-next 0/5] Ease netns management for userland
From: Eric W. Biederman @ 2012-12-12 21:48 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev, davem, aatteka
In-Reply-To: <87zk1jht7d.fsf@xmission.com>

ebiederm@xmission.com (Eric W. Biederman) writes:

> It is very wrong to presume that without context you know the reason for
> the existence of any network namespace and that you should or even that
> you can manage it.  Think of running your multi-network namespace
> managing application in a container.

Good examples of network namespaces you don't want to mess with are
the ones created by vsftpd and chrome for security purposes,
to remove any possibility of creating new connections to the network.

Eric

^ permalink raw reply

* Re: [RFC PATCH net-next 0/5] Ease netns management for userland
From: Eric W. Biederman @ 2012-12-12 21:11 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev, davem, aatteka
In-Reply-To: <50C8EEF0.2010201@6wind.com>

Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:

> Le 12/12/2012 20:25, Eric W. Biederman a écrit :

>> Short answer: you don't need to do any of this.
>>
>> setns with the namespace files in /proc/<pid>/ns/net gives you more than
>> enough mechanism to solve this problem.  And iproute2 already supports
>> all of this.
>>
>> And your approach creates very serious maintenance problems, to the
>> point I don't even want to read your patches.  What namespace do your
>> namespace ids live in?
>>
>> A socket option to change the namespace of a socket is nasty because
>> sockets changing which network namespace they are in leads to races which
>> aren't worth thinking about writing the code to handle.
>>
>> Longer answer.
>>
>> You can bind mount the namespace id's /proc/<pid>/ns/net files to
>> give you any name you want.  This puts naming policy in userspace
>> control, and nests just fine.
>>
>> You can open a socket in any network namespace you want just
>> by calling setns before socket.  Wrapping this idiom in a library call
>> or if there is sufficient need in a socketat system call seems
>> reasonable.
> Yes, I agree that this SO_NETNS may be a bad idea.
>
>>
>> There is a classic question of if two network namespace files refer to
>> the same network namespace and I have code in linux-next and my pull
>> request to Linus to give those files a unique inode number.
> Interesting to know.
>
>>
>> So please use the facilities already merged into the kernel.
> Ok, but how can a daemon get the list of netns? Suppose that we want
> quagga to manage all netns: how can it get this list in order to open the
> needed netlink sockets?
>
> For example, iproute2 is only aware of netns created with iproute2; it
> will not detect other netns.

iproute2 is only aware of network namespaces created with the convention
that iproute2 uses.

If you want other network namespaces to be visible globally use the same
or a similar convention. All iproute2 does is
"mount --bind /proc/<pid>/ns/net /var/run/netns/<name>".   So this
convention is not hard to follow.

It is very wrong to presume that without context you know the reason for
the existence of any network namespace and that you should or even that
you can manage it.  Think of running your multi-network namespace
managing application in a container.

Eric

^ permalink raw reply
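The "setns before socket" idiom described in this reply can be sketched as a small library call. This is only an illustration under stated assumptions: `socket_in_netns` is a hypothetical name, the caller is single-threaded and has the privilege setns requires, and setns is invoked through syscall(2) so the sketch does not depend on the C library providing a wrapper.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef CLONE_NEWNET
#define CLONE_NEWNET 0x40000000
#endif

/* Thin wrapper around the setns system call for network namespaces. */
static int netns_enter(int fd)
{
	return (int)syscall(SYS_setns, fd, CLONE_NEWNET);
}

/* Open a socket inside the network namespace referred to by ns_path,
 * e.g. a bind mount such as /var/run/netns/<name>, then switch back.
 * Returns the socket fd, or -1 on failure. */
int socket_in_netns(const char *ns_path, int domain, int type, int protocol)
{
	int target, self, sock = -1;

	target = open(ns_path, O_RDONLY);
	if (target < 0)
		return -1;

	/* Remember where we came from so we can return afterwards. */
	self = open("/proc/self/ns/net", O_RDONLY);
	if (self < 0) {
		close(target);
		return -1;
	}

	if (netns_enter(target) == 0) {
		sock = socket(domain, type, protocol);
		/* If we cannot get back, do not hand out a socket from a
		 * process that is now stranded in the wrong namespace. */
		if (netns_enter(self) != 0 && sock >= 0) {
			close(sock);
			sock = -1;
		}
	}

	close(self);
	close(target);
	return sock;
}
```

The socket stays bound to the namespace it was created in, so the switch back does not affect it; that is what makes a wrapper like this sufficient without a per-socket option.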

* Re: netconsole fun
From: Peter Hurley @ 2012-12-12 20:59 UTC (permalink / raw)
  To: Neil Horman; +Cc: Cong Wang, netdev
In-Reply-To: <20121211164526.GB7481@neilslaptop.think-freely.org>

On Tue, 2012-12-11 at 11:45 -0500, Neil Horman wrote:
> On Tue, Dec 11, 2012 at 10:16:51AM -0500, Peter Hurley wrote:
> > On Tue, 2012-12-11 at 09:30 -0500, Neil Horman wrote:
> > > On Tue, Dec 11, 2012 at 09:19:52AM -0500, Peter Hurley wrote:
> > > > On Tue, 2012-12-11 at 04:51 +0000, Cong Wang wrote:
> > > > > On Mon, 10 Dec 2012 at 14:17 GMT, Peter Hurley <peter@hurleysoftware.com> wrote:
> > > > > > Now that netpoll has been disabled for slaved devices, is there a
> > > > > > recommended method of running netconsole on a machine that has a slaved
> > > > > > device?
> > > > > >
> > > > > 
> > > > > Yes, running it on the master device instead.
> > > > 
> > > > Thanks for the suggestion, but:
> > > > 
> > > > [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.7.0-rc8-xeon ...... netconsole=@192.168.10.99/br0,30000@192.168.10.100/xx:xx:xx:xx:xx:xx
> > > > ...
> > > > [ 5.289869] netpoll: netconsole: local port 6665
> > > > [ 5.289885] netpoll: netconsole: local IP 192.168.10.99
> > > > [ 5.289892] netpoll: netconsole: interface 'br0'
> > > > [ 5.289898] netpoll: netconsole: remote port 30000
> > > > [ 5.289907] netpoll: netconsole: remote IP 192.168.10.100
> > > > [ 5.289914] netpoll: netconsole: remote ethernet address xx:xx:xx:xx:xx:xx
> > > > [ 5.289922] netpoll: netconsole: br0 doesn't exist, aborting
> > > > [ 5.289929] netconsole: cleaning up
> > > > ...
> > > > [ 9.392291] Bridge firewalling registered
> > > > [ 9.396805] device eth1 entered promiscuous mode
> > > > [ 9.418350] eth1:  setting full-duplex.
> > > > [ 9.421268] br0: port 1(eth1) entered forwarding state
> > > > [ 9.423354] br0: port 1(eth1) entered forwarding state
> > > > 
> > > > 
> > > > Is there a way to control or associate network device names prior to
> > > > udev renaming?
> > > > 
> > > That looks like a systemd problem (or more specifically a boot dependency
> > > problem).  You need to modify your netconsole unit/service file to start after
> > > all your networking is up.  NetworkManager provides a dummy service file for
> > > this purpose, called NetworkManager-wait-online.service
> > 
> > Ok. So with a single physical network interface that will be bridged,
> > netconsole cannot be used for kernel boot messages.
> > 
> > With a machine with multiple nics, is there a way to control device
> > naming so that the interface name to be used by netconsole specified on
> > the boot command line will actually correspond to the intended
> > device? For example,
> > 
> > [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.7.0-rc8-xeon ...... netconsole=@192.168.1.123/eth0,30000@192.168.1.139/xx:xx:xx:xx:xx:xx
> > ....
> > [ 4.092184] 3c59x: Donald Becker and others.
> > [ 4.092204] 0000:07:05.0: 3Com PCI 3c905C Tornado at ffffc9000186cf80.
> > [ 4.094035] tg3.c:v3.125 (September 26, 2012)
> > ....
> > [ 4.125038] tg3 0000:08:00.0 eth1: Tigon3 [partno(BCM95754) rev b002] (PCI Express) MAC address xx:xx:xx:xx:xx:xx
> > [ 4.125055] tg3 0000:08:00.0 eth1: attached PHY is 5787 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
> > [ 4.125062] tg3 0000:08:00.0 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
> > [ 4.125068] tg3 0000:08:00.0 eth1: dma_rwctrl[76180000] dma_mask[64-bit]
> > 
> > This is attaching netconsole to the wrong device because bus
> > enumeration, and therefore load order, is not consistent from boot to
> > boot.
> > 
> No, there's no way to do that.  As you note, device enumeration isn't consistent
> across boots; that's why udev creates rules to rename devices based on immutable
> (or semi-immutable) data, like mac addresses or pci bus locations.  Once that
> happens, you'll have consistent names for your interfaces, and that work will be
> guaranteed to be done after networkmanager has finished opening all the
> interfaces that it needs (hence my suggestion to make netconsole service
> dependent on networkmanager service startup completing).

Just wondering if you think something like the patch below is
suitable/acceptable for insulating netconsole from inconsistent device
name scenarios without changing the existing semantics. The basic idea
is to allow an ethernet MAC address in the <dev> field of the
netconsole= options, and if a MAC address was specified rather than a
device name, to do the dev lookup from the MAC address instead.

This doesn't extend to, but also doesn't interfere with, the dynamic
config of netconsole via configfs.

Would you mind reviewing it?

Regards,
Peter

-- >% --
Subject: [PATCH] netconsole: allow mac addr to specify local interface device

Allow the local interface device to be specified by ethernet
MAC address. For example,

  netconsole=@10.0.0.1/12:34:56:78:9a:bc,30000@10.0.0.3/cb:a9:87:65:43:21

This alternate form enables netconsole to start and log boot messages
even if the network device name varies (e.g., a machine with multiple NICs).

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
---
 Documentation/networking/netconsole.txt |  9 +++++++--
 drivers/net/netconsole.c                |  2 ++
 include/linux/netpoll.h                 |  1 +
 net/core/netpoll.c                      | 19 +++++++++++++++++--
 4 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/netconsole.txt b/Documentation/networking/netconsole.txt
index 2e9e0ae2..2dfd703 100644
--- a/Documentation/networking/netconsole.txt
+++ b/Documentation/networking/netconsole.txt
@@ -23,12 +23,13 @@ Sender and receiver configuration:
 It takes a string configuration parameter "netconsole" in the
 following format:
 
- netconsole=[src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]
+ netconsole=[src-port]@[src-ip]/[dev|macaddr],[tgt-port]@<tgt-ip>/[tgt-macaddr]
 
    where
         src-port      source for UDP packets (defaults to 6665)
         src-ip        source IP to use (interface address)
-        dev           network interface (eth0)
+        dev|macaddr   network interface (eth0)
+		      alternate: ethernet MAC address of network interface
         tgt-port      port for logging agent (6666)
         tgt-ip        IP address for logging agent
         tgt-macaddr   ethernet MAC address for logging agent (broadcast)
@@ -47,6 +48,10 @@ complete string enclosed in "quotes", thusly:
 
  modprobe netconsole netconsole="@/,@10.0.0.2/;@/eth1,6892@10.0.0.3/"
 
+The alternate form for specifying the local network interface with the
+ethernet MAC address is useful when the device names are inconsistent from
+boot to boot (eg., if the machine has multiple NICs).
+
 Built-in netconsole starts immediately after the TCP stack is
 initialized and attempts to bring up the supplied dev at the supplied
 address.
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 6989ebe..3808a31 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -180,6 +180,7 @@ static struct netconsole_target *alloc_param_target(char *target_config)
 	strlcpy(nt->np.dev_name, "eth0", IFNAMSIZ);
 	nt->np.local_port = 6665;
 	nt->np.remote_port = 6666;
+	memset(nt->np.local_mac, 0, ETH_ALEN);
 	memset(nt->np.remote_mac, 0xff, ETH_ALEN);
 
 	/* Parse parameters and setup netpoll */
@@ -560,6 +561,7 @@ static struct config_item *make_netconsole_target(struct config_group *group,
 	strlcpy(nt->np.dev_name, "eth0", IFNAMSIZ);
 	nt->np.local_port = 6665;
 	nt->np.remote_port = 6666;
+	memset(nt->np.local_mac, 0, ETH_ALEN);
 	memset(nt->np.remote_mac, 0xff, ETH_ALEN);
 
 	/* Initialize the config_item member */
diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index 66d5379..d646b26 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -20,6 +20,7 @@ struct netpoll {
 
 	__be32 local_ip, remote_ip;
 	u16 local_port, remote_port;
+	u8 local_mac[ETH_ALEN];
 	u8 remote_mac[ETH_ALEN];
 
 	struct list_head rx; /* rx_np list element */
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 77a0388..8910a95 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -660,6 +660,7 @@ void netpoll_print_options(struct netpoll *np)
 	np_info(np, "local port %d\n", np->local_port);
 	np_info(np, "local IP %pI4\n", &np->local_ip);
 	np_info(np, "interface '%s'\n", np->dev_name);
+	np_info(np, "local ethernet address %pM\n", np->local_mac);
 	np_info(np, "remote port %d\n", np->remote_port);
 	np_info(np, "remote IP %pI4\n", &np->remote_ip);
 	np_info(np, "remote ethernet address %pM\n", np->remote_mac);
@@ -693,7 +694,8 @@ int netpoll_parse_options(struct netpoll *np, char *opt)
 		if ((delim = strchr(cur, ',')) == NULL)
 			goto parse_failed;
 		*delim = 0;
-		strlcpy(np->dev_name, cur, sizeof(np->dev_name));
+		if (!mac_pton(cur, np->local_mac))
+			strlcpy(np->dev_name, cur, sizeof(np->dev_name));
 		cur = delim;
 	}
 	cur++;
@@ -806,8 +808,21 @@ int netpoll_setup(struct netpoll *np)
 	struct in_device *in_dev;
 	int err;
 
-	if (np->dev_name)
+	if (!is_zero_ether_addr(np->local_mac)) {
+		rcu_read_lock();
+		ndev = dev_getbyhwaddr_rcu(&init_net, ARPHRD_ETHER, np->local_mac);
+		if (!ndev) {
+			rcu_read_unlock();
+			np_err(np, "%pM doesn't exist, aborting\n", np->local_mac);
+			return -ENODEV;
+		}
+		dev_hold(ndev);
+		rcu_read_unlock();
+		strlcpy(np->dev_name, ndev->name, IFNAMSIZ);
+
+	} else if (np->dev_name)
 		ndev = dev_get_by_name(&init_net, np->dev_name);
+
 	if (!ndev) {
 		np_err(np, "%s doesn't exist, aborting\n", np->dev_name);
 		return -ENODEV;
-- 
1.8.0.1

^ permalink raw reply related
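The parsing decision in this patch (try the <dev> field as a MAC address first, otherwise treat it as an interface name) can be sketched in plain userland C. This is a stand-in, not the kernel code: the patch itself relies on mac_pton() and dev_getbyhwaddr_rcu(), while `parse_mac`, `parse_dev_field` and `struct local_dev` below are hypothetical names, and the sketch accepts only the exact "xx:xx:xx:xx:xx:xx" form.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SK_MAC_LEN  6
#define SK_NAME_LEN 16	/* same size as the kernel's IFNAMSIZ */

struct local_dev {
	int is_mac;			/* 1 if the field parsed as a MAC */
	uint8_t mac[SK_MAC_LEN];
	char name[SK_NAME_LEN];
};

static int hex_digit(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Accepts exactly "xx:xx:xx:xx:xx:xx"; returns 1 and fills mac on success,
 * returns 0 and leaves mac untouched otherwise. */
int parse_mac(const char *s, uint8_t mac[SK_MAC_LEN])
{
	uint8_t tmp[SK_MAC_LEN];
	int i;

	if (strlen(s) != 3 * SK_MAC_LEN - 1)
		return 0;
	for (i = 0; i < SK_MAC_LEN; i++) {
		int hi = hex_digit(s[3 * i]);
		int lo = hex_digit(s[3 * i + 1]);

		if (hi < 0 || lo < 0)
			return 0;
		if (i < SK_MAC_LEN - 1 && s[3 * i + 2] != ':')
			return 0;
		tmp[i] = (uint8_t)((hi << 4) | lo);
	}
	memcpy(mac, tmp, sizeof(tmp));
	return 1;
}

/* Mirrors the patch's order of checks: default to "eth0", prefer a MAC,
 * and only copy the field as a device name when it is not a MAC. */
void parse_dev_field(const char *field, struct local_dev *out)
{
	memset(out, 0, sizeof(*out));
	snprintf(out->name, sizeof(out->name), "%s", "eth0");
	if (field[0] == '\0')
		return;
	out->is_mac = parse_mac(field, out->mac);
	if (!out->is_mac)
		snprintf(out->name, sizeof(out->name), "%s", field);
}
```

Because an all-zero `mac` doubles as "no MAC given", the sketch shares the patch's implicit assumption that 00:00:00:00:00:00 is never a real local address.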

* Re: [RFC PATCH net-next 0/5] Ease netns management for userland
From: Nicolas Dichtel @ 2012-12-12 20:54 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev, davem, aatteka
In-Reply-To: <87fw3boyxn.fsf@xmission.com>

Le 12/12/2012 20:25, Eric W. Biederman a écrit :
> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>
>> The goal of this series is to ease netns management by daemons. Some systems use
>> netns only to virtualize the network stack and don't want to multiply userland
>> daemons.  These systems may have a lot of netns, up to 2000. We don't want to
>> launch an instance of each daemon (quagga, strongswan, conntrackd, ...) for
>> each netns because it will consume a lot of resources. Having one daemon that
>> manages all netns is more efficient (mainly if there are few objects to manage:
>> one or two routes per netns for example).
>> Hence, one goal of this series is to allow a daemon to monitor netns
>> activity, so that it can open or close netlink sockets and allocate the
>> structures needed to manage these netns when they are created or deleted.
>> To help identify a netns, an index has been added to each netns.
>>
>> A new setsockopt() option is also added, to help daemons open a socket in the
>> right netns. For now, a daemon that wants to open a socket in a specified netns
>> needs to call setns(CLONE_NEWNET) with a fd (not so easy to find), open the
>> socket and then call setns() again to go back to the initial netns. Having this
>> kind of setsockopt() will simplify operations. Obviously, this setsockopt()
>> should be done early enough (is a test on sk_state enough?). The first target is
>> netlink sockets, but it can be useful for other kinds of socket; that's why I
>> add a generic socket option.
>>
>> As usual, the patch against iproute2 will be sent once the patches are included
>> and net-next merged. I can send it on demand.
>
> Short answer: you don't need to do any of this.
>
> setns with the namespace files in /proc/<pid>/ns/net gives you more than
> enough mechanism to solve this problem.  And iproute2 already supports
> all of this.
>
> And your approach creates very serious maintenance problems, to the
> point I don't even want to read your patches.  What namespace do your
> namespace ids live in?
>
> A socket option to change the namespace of a socket is nasty because
> sockets changing which network namespace they are in leads to races which
> aren't worth thinking about writing the code to handle.
>
> Longer answer.
>
> You can bind mount the namespace id's /proc/<pid>/ns/net files to
> give you any name you want.  This puts naming policy in userspace
> control, and nests just fine.
>
> You can open a socket in any network namespace you want just
> by calling setns before socket.  Wrapping this idiom in a library call
> or if there is sufficient need in a socketat system call seems
> reasonable.
Yes, I agree that this SO_NETNS may be a bad idea.

>
> There is a classic question of if two network namespace files refer to
> the same network namespace and I have code in linux-next and my pull
> request to Linus to give those files a unique inode number.
Interesting to know.

>
> So please use the facilities already merged into the kernel.
Ok, but how can a daemon get the list of netns? Suppose that we want
quagga to manage all netns: how can it get this list in order to open the
needed netlink sockets?
For example, iproute2 is only aware of netns created with iproute2; it
will not detect other netns.

^ permalink raw reply

* Re: [PATCH V1 net-next 1/3] net: ethtool: Add destination MAC address to flow steering API
From: Amir Vadai @ 2012-12-12 20:40 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David S. Miller, netdev, Or Gerlitz, Hadar Har-Zion, Yan Burman
In-Reply-To: <1355336881.3149.27.camel@bwh-desktop.uk.solarflarecom.com>

On 12/12/2012 20:28, Ben Hutchings wrote:
> On Wed, 2012-12-12 at 14:13 +0200, Amir Vadai wrote:
>> From: Yan Burman <yanb@mellanox.com>
>>
>> Add ability to specify destination MAC address for L3/L4 flow spec
>> in order to be able to specify action for different VM's under vSwitch
>> configuration. This change is transparent to older userspace.
>>
>> Signed-off-by: Yan Burman <yanb@mellanox.com>
>> Signed-off-by: Amir Vadai <amirv@mellanox.com>
>> ---
>>   include/uapi/linux/ethtool.h | 11 +++++++----
>>   1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
>> index d3eaaaf..be8c41e 100644
>> --- a/include/uapi/linux/ethtool.h
>> +++ b/include/uapi/linux/ethtool.h
>> @@ -500,13 +500,15 @@ union ethtool_flow_union {
>>   	struct ethtool_ah_espip4_spec		esp_ip4_spec;
>>   	struct ethtool_usrip4_spec		usr_ip4_spec;
>>   	struct ethhdr				ether_spec;
>> -	__u8					hdata[60];
>> +	__u8					hdata[52];
>>   };
>>
>>   struct ethtool_flow_ext {
>> -	__be16	vlan_etype;
>> -	__be16	vlan_tci;
>> -	__be32	data[2];
>> +	__u8		padding[2];
>> +	unsigned char	h_dest[ETH_ALEN];	/* destination eth addr	*/
>> +	__be16		vlan_etype;
>> +	__be16		vlan_tci;
>> +	__be32		data[2];
>>   };
>>
>>   /**
>> @@ -1027,6 +1029,7 @@ enum ethtool_sfeatures_retval_bits {
>>   #define	ETHER_FLOW	0x12	/* spec only (ether_spec) */
>>   /* Flag to enable additional fields in struct ethtool_rx_flow_spec */
>>   #define	FLOW_EXT	0x80000000
>> +#define	FLOW_MAC_EXT	0x40000000
>
> Please can you send another patch that adds kernel-doc to struct
> ethtool_flow_ext explaining which fields are dependent on which flags.
>
> Ben.
>
>>   /* L3-L4 network traffic flow hash options */
>>   #define	RXH_L2DA	(1 << 1)
>

Sure. Will send it.

Amir

^ permalink raw reply
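The "transparent to older userspace" claim comes down to layout arithmetic: the union shrinks by 8 bytes (60 to 52) while the extension grows by 8 bytes (2 of padding plus a 6-byte MAC), so the existing fields keep their offsets. A sketch with simplified stand-in types (fixed-width integers instead of __be16/__be32, and only the first three members of the flow spec modelled) lets the compiler confirm it:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the layout before the patch. The uint32_t member only
 * mimics the 4-byte alignment of the real union's struct members. */
union old_flow_union { uint32_t align; uint8_t hdata[60]; };
struct old_flow_ext {
	uint16_t vlan_etype;
	uint16_t vlan_tci;
	uint32_t data[2];
};
struct old_spec {
	uint32_t flow_type;
	union old_flow_union h_u;
	struct old_flow_ext h_ext;
};

/* Stand-ins for the layout after the patch. */
union new_flow_union { uint32_t align; uint8_t hdata[52]; };
struct new_flow_ext {
	uint8_t padding[2];
	unsigned char h_dest[6];	/* destination eth addr */
	uint16_t vlan_etype;
	uint16_t vlan_tci;
	uint32_t data[2];
};
struct new_spec {
	uint32_t flow_type;
	union new_flow_union h_u;
	struct new_flow_ext h_ext;
};

/* Overall size is unchanged, and the pre-existing extension fields sit
 * at the same byte offsets as before. */
_Static_assert(sizeof(struct old_spec) == sizeof(struct new_spec),
	       "flow spec prefix must not change size");
_Static_assert(offsetof(struct old_spec, h_ext.vlan_etype) ==
	       offsetof(struct new_spec, h_ext.vlan_etype),
	       "vlan_etype must not move");
_Static_assert(offsetof(struct old_spec, h_ext.data) ==
	       offsetof(struct new_spec, h_ext.data),
	       "data[] must not move");
```

The new `h_dest` field lands in bytes that used to be the tail of `hdata`, which is why a flag such as FLOW_MAC_EXT is needed to tell the kernel those bytes are meaningful.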

* Re: [PATCH 00/11] Add basic VLAN support to bridges
From: Vlad Yasevich @ 2012-12-12 20:12 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, davem, mst, john.r.fastabend
In-Reply-To: <20121212120541.72ba1ba9@nehalam.linuxnetplumber.net>

On 12/12/2012 03:05 PM, Stephen Hemminger wrote:
> On Wed, 12 Dec 2012 15:01:06 -0500
> Vlad Yasevich <vyasevic@redhat.com> wrote:
>
>> This series of patches provides the ability to add VLANs to the bridge
>> ports.  This is similar to what can be found in most switches.  The bridge
>> port may have any number of VLANs added to it, including vlan 0 priority-tagged
>> traffic.  When vlans are added to the port, only traffic tagged with a configured
>> vlan will be forwarded over this port.  Additionally, vlan ids are added to FDB
>> entries and become part of the lookup.  This way we correctly identify the FDB
>> entry.
>>
>> A single vlan may also be designated as untagged.  Any untagged traffic
>> received by the port will be assigned to this vlan.  Any traffic exiting
>> the port with a VID matching the untagged vlan will exit untagged (the
>> bridge will strip the vlan header).  This is similar to "Native Vlan" support
>> available in most switches.
>>
>> The default behavior of the bridge is unchanged if no vlans have been
>> configured.
>
> My concern is that all older configurations must continue to work.
> Is it still unchanged if a vlan device is added to the bridge, or vlan is
> configured on an ethernet that has been added to the bridge?
>

Yes. I've tested with vlans under the bridge and on top of the bridge.
This code doesn't have any effect if no filtering is configured.
I am finishing changes to iproute to do the configuration.

-vlad

^ permalink raw reply
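The per-port rules in the cover letter can be modelled in a few lines of userland C. This is only a toy model of the described behavior, not the bridge code from the series: every name below is hypothetical, and dropping untagged frames on a port that has vlans but no untagged vlan is an assumption the cover letter does not spell out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SK_N_VID  4096
#define SK_NO_VID (-1)

struct port_vlans {
	uint8_t allowed[SK_N_VID / 8];	/* bitmap of vlans added to the port */
	int untagged_vid;		/* SK_NO_VID if none designated */
	bool configured;		/* false until the first vlan is added */
};

void port_init(struct port_vlans *p)
{
	memset(p, 0, sizeof(*p));
	p->untagged_vid = SK_NO_VID;
}

bool port_add_vlan(struct port_vlans *p, int vid, bool untagged)
{
	if (vid < 0 || vid >= SK_N_VID)
		return false;
	p->allowed[vid / 8] |= (uint8_t)(1u << (vid % 8));
	if (untagged)
		p->untagged_vid = vid;	/* a single vlan may be untagged */
	p->configured = true;
	return true;
}

static bool port_has_vlan(const struct port_vlans *p, int vid)
{
	if (vid < 0 || vid >= SK_N_VID)
		return false;
	return (p->allowed[vid / 8] & (1u << (vid % 8))) != 0;
}

/* Ingress: returns false to drop. For untagged frames, *vid is set to the
 * port's untagged vlan; for tagged frames it must hold the frame's VID. */
bool ingress_accept(const struct port_vlans *p, bool tagged, int *vid)
{
	if (!p->configured)
		return true;		/* default behavior unchanged */
	if (!tagged) {
		if (p->untagged_vid == SK_NO_VID)
			return false;	/* assumption, see lead-in */
		*vid = p->untagged_vid;
		return true;
	}
	return port_has_vlan(p, *vid);
}

/* Egress: strip the vlan header when the VID is the port's untagged vlan. */
bool egress_strip_tag(const struct port_vlans *p, int vid)
{
	return p->configured && vid == p->untagged_vid;
}
```

A port set up with `port_add_vlan(&p, 10, false)` and `port_add_vlan(&p, 20, true)` would accept frames tagged 10, map untagged frames to vlan 20, and send vlan 20 out without a tag.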

