Netdev List

* [PATCH v2 1/2] PCI: Add Solarflare vendor ID and SFC4000 device IDs
From: Ben Hutchings @ 2011-10-05 21:28 UTC
  To: Jesse Barnes; +Cc: netdev, linux-pci, linux-net-drivers

These will be shared between the sfc driver and a PCI quirk.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/sfc/efx.c           |   10 ++++++----
 drivers/net/sfc/efx.h           |    4 ----
 drivers/net/sfc/falcon.c        |    3 ++-
 drivers/net/sfc/falcon_boards.c |    3 ++-
 include/linux/pci_ids.h         |    5 +++++
 5 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/net/sfc/efx.c b/drivers/net/sfc/efx.c
index b59abc7..f8b9be3 100644
--- a/drivers/net/sfc/efx.c
+++ b/drivers/net/sfc/efx.c
@@ -2197,13 +2197,15 @@ void efx_schedule_reset(struct efx_nic *efx, enum reset_type type)
 
 /* PCI device ID table */
 static DEFINE_PCI_DEVICE_TABLE(efx_pci_table) = {
-	{PCI_DEVICE(EFX_VENDID_SFC, FALCON_A_P_DEVID),
+	{PCI_DEVICE(PCI_VENDOR_ID_SOLARFLARE,
+		    PCI_DEVICE_ID_SOLARFLARE_SFC4000A_0),
 	 .driver_data = (unsigned long) &falcon_a1_nic_type},
-	{PCI_DEVICE(EFX_VENDID_SFC, FALCON_B_P_DEVID),
+	{PCI_DEVICE(PCI_VENDOR_ID_SOLARFLARE,
+		    PCI_DEVICE_ID_SOLARFLARE_SFC4000B),
 	 .driver_data = (unsigned long) &falcon_b0_nic_type},
-	{PCI_DEVICE(EFX_VENDID_SFC, BETHPAGE_A_P_DEVID),
+	{PCI_DEVICE(PCI_VENDOR_ID_SOLARFLARE, BETHPAGE_A_P_DEVID),
 	 .driver_data = (unsigned long) &siena_a0_nic_type},
-	{PCI_DEVICE(EFX_VENDID_SFC, SIENA_A_P_DEVID),
+	{PCI_DEVICE(PCI_VENDOR_ID_SOLARFLARE, SIENA_A_P_DEVID),
 	 .driver_data = (unsigned long) &siena_a0_nic_type},
 	{0}			/* end of list */
 };
diff --git a/drivers/net/sfc/efx.h b/drivers/net/sfc/efx.h
index b0d1209..c7e7523 100644
--- a/drivers/net/sfc/efx.h
+++ b/drivers/net/sfc/efx.h
@@ -15,10 +15,6 @@
 #include "filter.h"
 
 /* PCI IDs */
-#define EFX_VENDID_SFC	        0x1924
-#define FALCON_A_P_DEVID	0x0703
-#define FALCON_A_S_DEVID        0x6703
-#define FALCON_B_P_DEVID        0x0710
 #define BETHPAGE_A_P_DEVID      0x0803
 #define SIENA_A_P_DEVID         0x0813
 
diff --git a/drivers/net/sfc/falcon.c b/drivers/net/sfc/falcon.c
index 94bf4aa..9334b59 100644
--- a/drivers/net/sfc/falcon.c
+++ b/drivers/net/sfc/falcon.c
@@ -1424,7 +1424,8 @@ static int falcon_probe_nic(struct efx_nic *efx)
 		}
 
 		dev = pci_dev_get(efx->pci_dev);
-		while ((dev = pci_get_device(EFX_VENDID_SFC, FALCON_A_S_DEVID,
+		while ((dev = pci_get_device(PCI_VENDOR_ID_SOLARFLARE,
+					     PCI_DEVICE_ID_SOLARFLARE_SFC4000A_1,
 					     dev))) {
 			if (dev->bus == efx->pci_dev->bus &&
 			    dev->devfn == efx->pci_dev->devfn + 1) {
diff --git a/drivers/net/sfc/falcon_boards.c b/drivers/net/sfc/falcon_boards.c
index b9cc846..6cc16b8 100644
--- a/drivers/net/sfc/falcon_boards.c
+++ b/drivers/net/sfc/falcon_boards.c
@@ -764,7 +764,8 @@ int falcon_probe_board(struct efx_nic *efx, u16 revision_info)
 
 	if (board->type) {
 		netif_info(efx, probe, efx->net_dev, "board is %s rev %c%d\n",
-			 (efx->pci_dev->subsystem_vendor == EFX_VENDID_SFC)
+			 (efx->pci_dev->subsystem_vendor ==
+			  PCI_VENDOR_ID_SOLARFLARE)
 			 ? board->type->ref_model : board->type->gen_type,
 			 'A' + board->major, board->minor);
 		return 0;
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index ae96bbe..1679ff6 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2302,6 +2302,11 @@
 #define PCI_DEVICE_ID_RENESAS_SH7785	0x0007
 #define PCI_DEVICE_ID_RENESAS_SH7786	0x0010
 
+#define PCI_VENDOR_ID_SOLARFLARE	0x1924
+#define PCI_DEVICE_ID_SOLARFLARE_SFC4000A_0	0x0703
+#define PCI_DEVICE_ID_SOLARFLARE_SFC4000A_1	0x6703
+#define PCI_DEVICE_ID_SOLARFLARE_SFC4000B	0x0710
+
 #define PCI_VENDOR_ID_TDI               0x192E
 #define PCI_DEVICE_ID_TDI_EHCI          0x0101
 
-- 
1.7.4.4



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
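
As a rough userspace illustration of what sharing these definitions allows, the sketch below mirrors the new IDs and does a table lookup similar in spirit to a driver's PCI device table or a quirk's match. The `sketch_` names and `struct sketch_pci_id` are hypothetical stand-ins, not kernel APIs; only the numeric IDs are taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Values copied from the include/linux/pci_ids.h hunk above. */
#define PCI_VENDOR_ID_SOLARFLARE		0x1924
#define PCI_DEVICE_ID_SOLARFLARE_SFC4000A_0	0x0703
#define PCI_DEVICE_ID_SOLARFLARE_SFC4000A_1	0x6703
#define PCI_DEVICE_ID_SOLARFLARE_SFC4000B	0x0710

/* Hypothetical stand-in for a PCI ID table entry; not the kernel type. */
struct sketch_pci_id {
	uint16_t vendor;
	uint16_t device;
};

static const struct sketch_pci_id sketch_table[] = {
	{ PCI_VENDOR_ID_SOLARFLARE, PCI_DEVICE_ID_SOLARFLARE_SFC4000A_0 },
	{ PCI_VENDOR_ID_SOLARFLARE, PCI_DEVICE_ID_SOLARFLARE_SFC4000A_1 },
	{ PCI_VENDOR_ID_SOLARFLARE, PCI_DEVICE_ID_SOLARFLARE_SFC4000B },
};

/* Return the index of the matching entry, or -1 if none matches. */
static int sketch_lookup(uint16_t vendor, uint16_t device)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_table) / sizeof(sketch_table[0]); i++)
		if (sketch_table[i].vendor == vendor &&
		    sketch_table[i].device == device)
			return (int)i;
	return -1;
}
```

With the IDs in one header, a second user (such as the quirk mentioned above) can match on the same names instead of carrying private copies of the numbers.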

* Re: loopback IP alias breaks tftp?
From: Julian Anastasov @ 2011-10-05 21:18 UTC
  To: Josh Boyer; +Cc: netdev
In-Reply-To: <20111005202723.GE2479@zod.bos.redhat.com>


	Hello,

On Wed, 5 Oct 2011, Josh Boyer wrote:

> Hi All,
> 
> We've had a report [1] of a change in behavior when trying to use an IP
> alias to tftp from a loopback device.  Apparently the steps outlined in
> the bug worked in 2.6.35, and broke somewhere before 2.6.38.6.
> 
> I can confirm the steps fail on a 3.0 based kernel and I'm trying to do
> a git bisect to find the commit involved, but I thought I would send
> this along to see if anyone might have an idea.  (Also, I'm not really
> sure how valid of a usecase this was to begin with.)

	What about commit 9fc3bbb4a752f108cf096d96640f3b548bbbce6c ?

ipv4/route.c: respect prefsrc for local routes

http://marc.info/?t=129412232500001&r=1&w=2

> [1] https://bugzilla.redhat.com/show_bug.cgi?id=739534
> 
> josh

Regards

--
Julian Anastasov <ja@ssi.bg>

* Re: [PATCH 1/8] vxge: convert to SKB paged frag API.
From: Michał Mirosław @ 2011-10-05 21:03 UTC
  To: Ian Campbell; +Cc: netdev, Jon Mason
In-Reply-To: <1317810532-24984-1-git-send-email-ian.campbell@citrix.com>

2011/10/5 Ian Campbell <ian.campbell@citrix.com>:
[...]
> --- a/drivers/net/ethernet/neterion/vxge/vxge-main.c
> +++ b/drivers/net/ethernet/neterion/vxge/vxge-main.c
> @@ -923,9 +923,9 @@ vxge_xmit(struct sk_buff *skb, struct net_device *dev)
>                if (!frag->size)
>                        continue;
>
> -               dma_pointer = (u64) pci_map_page(fifo->pdev, frag->page,
> -                               frag->page_offset, frag->size,
> -                               PCI_DMA_TODEVICE);
> +               dma_pointer = (u64)skb_frag_dma_map(&fifo->pdev->dev, frag,
> +                                                   0, frag->size,
> +                                                   PCI_DMA_TODEVICE);

This should be DMA_TO_DEVICE instead of PCI_DMA_TODEVICE.

>
>                if (unlikely(pci_dma_mapping_error(fifo->pdev, dma_pointer)))
>                        goto _exit2;

I would also change this to dma_mapping_error() in one go.

Just a random patch check.

Best Regards,
Michał Mirosław

* Re: IPv4 multicast and mac-vlans acting weird on 3.0.4+
From: Ben Greear @ 2011-10-05 20:56 UTC
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1317846693.3457.11.camel@edumazet-laptop>

On 10/05/2011 01:31 PM, Eric Dumazet wrote:
> On Wednesday, 05 October 2011 at 13:19 -0700, Ben Greear wrote:
>> On 10/05/2011 01:17 PM, Eric Dumazet wrote:
>>> On Wednesday, 05 October 2011 at 13:09 -0700, Ben Greear wrote:
>>>> On 10/05/2011 12:54 PM, Eric Dumazet wrote:
>>>>> On Wednesday, 05 October 2011 at 09:46 -0700, Ben Greear wrote:
>>>>>> This is on a hacked 3.0.4 kernel...
>>>>>>
>>>>>> I am seeing an issue where an IPv4 mcast receiver will not receive
>>>>>> a 1473 or larger byte mcast message, but will receive a 1472.  The difference
>>>>>> being that 1473 ends up being two packets on the wire.  It works on
>>>>>> 802.1Q VLANs, VETH interfaces and real Ethernet.  It does not work
>>>>>> on a mac-vlan hanging off the VETH.
>>>>>>
>>>>>> I see packets received on the macvlan in tshark, and they appear correct.  No
>>>>>> obvious errors in the macvlan port stats or netstat -s,
>>>>>> and the 'ss' tool doesn't appear to support UDP sockets at all.
>>>>>>
>>>>>> So, I'm about to go digging into the code, but if anyone has any
>>>>>> suggestions for places to look, please let me know!
>>>>>>
>>>>>
>>>>> Well, problem is defragmentation and macvlan cooperation.
>>>>>
>>>>> Multicast messages are broadcasted on all macvlan ports.
>>>>>
>>>>> But IP defrag will probably deliver a single final frame.
>>>>>
>>>>> We probably need to handle defrag in macvlan before broadcasting to all
>>>>> ports.
>>>>
>>>> I see packets get to this code in ip_input.c (line 467 or so),
>>>> and that printk is mine of course.
>>>>
>>>> 	if ((dev && strcmp(dev->name, "rddVR10#0") == 0) ||
>>>> 	    (dev && strcmp(dev->name, "rddVR10") == 0)) {
>>>> 		printk("calling ip_rcv_finish through NF_HOOK, dev: %s, len: %i\n",
>>>> 		       dev->name, skb->len);
>>>> 	}
>>>>
>>>> 	return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING, skb, dev, NULL,
>>>> 		       ip_rcv_finish);
>>>>
>>>> But, the macvlan packets never make it to the ip_rcv_finish method.
>>>>
>>>> I do see a big and a little packet entering this code.
>>>>
>>>> I have no firewall rules that I'm aware of, though there
>>>> is some conn-track logic (though not associated with the
>>>> mac-vlan interface):
>>>
>>> Say you have 10 vlans on your eth0, how many times do you want one
>>> incoming multicast frame being delivered to your application listening
>>> on 0.0.0.0:port ?
>>
>> How would it work for two Ethernet devices on the same LAN?  I'd
>> say that mac-vlans should mimic that case.
>>
>> And in my case, I'm binding hard to a device & IP address,
>> so my app should get it once regardless.
>>
>
> OK, but before frame being delivered to your app, it must be
> re-assembled by net/ipv4/inet_fragment.c & net/ipv4/ip_fragment.c
> machinery.
>
> This machinery uses :
>
> static int ip4_frag_match(struct inet_frag_queue *q, void *a)
> {
>          struct ipq *qp;
>          struct ip4_create_arg *arg = a;
>
>          qp = container_of(q, struct ipq, q);
>          return  qp->id == arg->iph->id &&
>                          qp->saddr == arg->iph->saddr &&
>                          qp->daddr == arg->iph->daddr &&
>                          qp->protocol == arg->iph->protocol &&
>                          qp->user == arg->user;
> }
>
> All frames broadcasted (because of multicast code in macvlan) on vlans
> have same saddr/daddr/protocol (and user).

Wouldn't you have the same problem with two real Ethernet interfaces on
the same LAN, or two 802.1Q devices for that matter?  The addrs will all
be the same in that case too?

Also, if I have just a single mac-vlan active (the other 3 are 'ifconfig foo down'),
I still see the problem with mcast.

From what you describe, I am thinking I may be hitting a different
issue.  Any ideas on how to figure out why exactly the NF_HOOK isn't
calling the ip_rcv_finish method?

> So kernel will discard all redundant copies of frames and deliver one
> copy only to upper stack.

> Check commit 7736d33f4262d437c5 (packet: Add pre-defragmentation support
> for ipv4 fanouts) for a possible hint :
>
> We could perform the re-assembly in macvlan code, before doing the
> "broadcast the frame on all ports" part.

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

* Re: [PATCH 2/2] bridge: allow forwarding some link local frames
From: Stephen Hemminger @ 2011-10-05 20:50 UTC
  To: Ben Hutchings; +Cc: David S. Miller, netdev
In-Reply-To: <1317843619.2802.32.camel@bwh-desktop>

On Wed, 05 Oct 2011 20:40:19 +0100
Ben Hutchings <bhutchings@solarflare.com> wrote:

> On Mon, 2011-10-03 at 21:14 -0700, Stephen Hemminger wrote:
> > plain text document attachment (bridge-multicast-filter.patch)
> > This is based on an earlier patch by Nick Carter with comments
> > by David Lamparter but with some refinements. Thanks for their patience
> > this is a confusing area with overlap of standards, user requirements,
> > and compatibility with earlier releases.
> > 
> > It adds a new sysfs attribute 
> >    /sys/class/net/brX/bridge/group_fwd_mask
> > that controls forwarding of frames with address of: 01-80-C2-00-00-0X
> > The default setting has no forwarding to retain compatibility.
> > 
> > One change from earlier releases is that forwarding of group
> > addresses is not dependent on STP being enabled or disabled. This
> > choice was made based on interpretation of the 802.1 standards.
> > I expect complaints will arise because of this, but better to follow
> > the standard than continue acting incorrectly by default.
> > 
> > The filtering mask is writeable, but only values that don't forward
> > known control frames are allowed. It intentionally blocks attempts
> > to filter control protocols. For example: writing an 8 allows
> > forwarding 802.1X PAE addresses which is the most common request.
> [...]
> 
> I wonder why you don't forbid forwarding frames sent to reserved
> destination addresses?  The standards seem pretty clear that this should
> not be allowed.

Future proofing. Since addresses are unassigned there is no certainty of the
assigned semantics when they are used.
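
To make the mask semantics in the quoted description concrete, here is a small userspace sketch. It assumes that bit N of `group_fwd_mask` corresponds to destination 01-80-C2-00-00-0N, which is consistent with the statement that writing 8 allows the 802.1X PAE address (01-80-C2-00-00-03). The function names are hypothetical, this is not the bridge code, and it does not model which mask values the kernel refuses to accept.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* First five octets shared by the 01-80-C2-00-00-0X group addresses. */
static const uint8_t sketch_group_prefix[5] = { 0x01, 0x80, 0xC2, 0x00, 0x00 };

/* True if dest is one of the sixteen 01-80-C2-00-00-0X addresses. */
static bool sketch_is_link_local(const uint8_t dest[6])
{
	return memcmp(dest, sketch_group_prefix, 5) == 0 &&
	       (dest[5] & 0xF0) == 0;
}

/*
 * Assumed mapping: bit N of the mask enables forwarding of ...-0N.
 * Returns false for any address outside the link-local group range,
 * since the mask says nothing about those frames.
 */
static bool sketch_mask_allows(uint16_t group_fwd_mask, const uint8_t dest[6])
{
	if (!sketch_is_link_local(dest))
		return false;
	return (group_fwd_mask & (1u << dest[5])) != 0;
}
```

Under that assumption the default mask of 0 allows nothing, and a mask of 8 allows only the ...-03 address.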

* Re: [PATCH 0/8] skb fragment API: convert network drivers (part IV)
From: David Miller @ 2011-10-05 20:49 UTC
  To: gregkh; +Cc: Ian.Campbell, netdev
In-Reply-To: <20111005202558.GA13436@suse.de>

From: Greg KH <gregkh@suse.de>
Date: Wed, 5 Oct 2011 13:25:58 -0700

> Then feel free to take these changes through your tree, I have no
> objection to that.  I'll handle any merge issues on my side that might
> happen, but given the size of these patches (i.e. almost trivial), I
> doubt that would happen.
> 
> So feel free to add my:
> 	Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
> 
> to these patches and take them through the networking tree.

Perfect, thanks a lot Greg.

* Re: [PATCH 0/8] skb fragment API: convert network drivers (part IV)
From: Greg KH @ 2011-10-05 20:25 UTC
  To: David Miller; +Cc: Ian.Campbell, netdev
In-Reply-To: <20111005.161341.840183744862645286.davem@davemloft.net>

On Wed, Oct 05, 2011 at 04:13:41PM -0400, David Miller wrote:
> From: Greg KH <gregkh@suse.de>
> Date: Wed, 5 Oct 2011 12:17:17 -0700
> 
> > On Wed, Oct 05, 2011 at 11:28:30AM +0100, Ian Campbell wrote:
> >> The following series converts a fourth batch of network drivers to the
> >> SKB pages fragment API introduced in 131ea6675c76.
> >> 
> >> There are two drivers/staging patches in here. I'm not sure who normally
> >> takes these, I'm sure you guys know though.
> > 
> > I take them.  As I don't have commit 131ea6675c76 in my tree
> > (staging-next), I'd have to wait until that patch gets merged into
> > Linus's tree, which should be after 3.2-rc1, right?  Mind if I hold off
> > in taking them until then?
> 
> We're trying to change all the driver accesses so we can change
> the datastructures underneath this stuff.
> 
> If we have to wait until after 3.2-rc1, then we won't be able to
> make the datastructure change this merge window.

Then feel free to take these changes through your tree, I have no
objection to that.  I'll handle any merge issues on my side that might
happen, but given the size of these patches (i.e. almost trivial), I
doubt that would happen.

So feel free to add my:
	Acked-by: Greg Kroah-Hartman <gregkh@suse.de>

to these patches and take them through the networking tree.

thanks,

greg k-h

* Re: [net-next 00/10 v3][pull request] Intel Wired LAN Driver Updates
From: David Miller @ 2011-10-05 20:44 UTC
  To: jeffrey.t.kirsher; +Cc: netdev, gospo
In-Reply-To: <1317810024-20846-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed,  5 Oct 2011 03:20:14 -0700

> The following series contains updates to e1000e and ixgbe. The one
> patch for e1000e makes function tables const, thanks to Stephen
> Hemminger for reporting this.  The remaining patches are for ixgbe,
> and they contain the following:
> 
>  - minor cleanups
>  - add support for 82599 device and ethtool -E support
>  - removal of a PHY which is not used in production silicon
> 
> v2- Updated patch 11 with the suggested changes from Ben Hutchings
> v3- Drop patch 11 so that Emil can re-work the patch based on feedback
> 
> The following are changes since commit 70f14381299984f05764c3188f1706288285c953:
>   bna: Multiple Definition and Interface Setup Fix
> and are available in the git repository at
>   git://github.com/Jkirsher/net-next.git

Pulled, thanks Jeff.

* Re: IPv4 multicast and mac-vlans acting weird on 3.0.4+
From: Eric Dumazet @ 2011-10-05 20:31 UTC
  To: Ben Greear; +Cc: netdev
In-Reply-To: <4E8CBBD6.3080500@candelatech.com>

On Wednesday, 05 October 2011 at 13:19 -0700, Ben Greear wrote:
> On 10/05/2011 01:17 PM, Eric Dumazet wrote:
> > On Wednesday, 05 October 2011 at 13:09 -0700, Ben Greear wrote:
> >> On 10/05/2011 12:54 PM, Eric Dumazet wrote:
> >>> On Wednesday, 05 October 2011 at 09:46 -0700, Ben Greear wrote:
> >>>> This is on a hacked 3.0.4 kernel...
> >>>>
> >>>> I am seeing an issue where an IPv4 mcast receiver will not receive
> >>>> a 1473 or larger byte mcast message, but will receive a 1472.  The difference
> >>>> being that 1473 ends up being two packets on the wire.  It works on
> >>>> 802.1Q VLANs, VETH interfaces and real Ethernet.  It does not work
> >>>> on a mac-vlan hanging off the VETH.
> >>>>
> >>>> I see packets received on the macvlan in tshark, and they appear correct.  No
> >>>> obvious errors in the macvlan port stats or netstat -s,
> >>>> and the 'ss' tool doesn't appear to support UDP sockets at all.
> >>>>
> >>>> So, I'm about to go digging into the code, but if anyone has any
> >>>> suggestions for places to look, please let me know!
> >>>>
> >>>
> >>> Well, problem is defragmentation and macvlan cooperation.
> >>>
> >>> Multicast messages are broadcasted on all macvlan ports.
> >>>
> >>> But IP defrag will probably deliver a single final frame.
> >>>
> >>> We probably need to handle defrag in macvlan before broadcasting to all
> >>> ports.
> >>
> >> I see packets get to this code in ip_input.c (line 467 or so),
> >> and that printk is mine of course.
> >>
> >> 	if ((dev && strcmp(dev->name, "rddVR10#0") == 0) ||
> >> 	    (dev && strcmp(dev->name, "rddVR10") == 0)) {
> >> 		printk("calling ip_rcv_finish through NF_HOOK, dev: %s, len: %i\n",
> >> 		       dev->name, skb->len);
> >> 	}
> >>
> >> 	return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING, skb, dev, NULL,
> >> 		       ip_rcv_finish);
> >>
> >> But, the macvlan packets never make it to the ip_rcv_finish method.
> >>
> >> I do see a big and a little packet entering this code.
> >>
> >> I have no firewall rules that I'm aware of, though there
> >> is some conn-track logic (though not associated with the
> >> mac-vlan interface):
> >
> > Say you have 10 vlans on your eth0, how many times do you want one
> > incoming multicast frame being delivered to your application listening
> > on 0.0.0.0:port ?
> 
> How would it work for two Ethernet devices on the same LAN?  I'd
> say that mac-vlans should mimic that case.
> 
> And in my case, I'm binding hard to a device & IP address,
> so my app should get it once regardless.
> 

OK, but before frame being delivered to your app, it must be
re-assembled by net/ipv4/inet_fragment.c & net/ipv4/ip_fragment.c
machinery.

This machinery uses :

static int ip4_frag_match(struct inet_frag_queue *q, void *a)
{
        struct ipq *qp;
        struct ip4_create_arg *arg = a;

        qp = container_of(q, struct ipq, q);
        return  qp->id == arg->iph->id &&
                        qp->saddr == arg->iph->saddr &&
                        qp->daddr == arg->iph->daddr &&
                        qp->protocol == arg->iph->protocol &&
                        qp->user == arg->user;
}

All frames broadcasted (because of multicast code in macvlan) on vlans
have same saddr/daddr/protocol (and user).

So kernel will discard all redundant copies of frames and deliver one
copy only to upper stack.
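
A userspace model of that effect, as a sketch only: the struct below is a hypothetical stand-in carrying the five fields that `ip4_frag_match()` compares, plus an `ifindex` field added purely to show that the receiving device is not part of the comparison. Names prefixed `sketch_` are not kernel symbols.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields compared by ip4_frag_match(). */
struct sketch_frag_key {
	uint16_t id;		/* IP identification field */
	uint32_t saddr;
	uint32_t daddr;
	uint8_t protocol;
	uint32_t user;		/* which reassembly user owns the queue */
	int ifindex;		/* receiving device; NOT part of the match */
};

/* Same comparison as the function quoted above; ifindex is ignored. */
static bool sketch_frag_match(const struct sketch_frag_key *q,
			      const struct sketch_frag_key *pkt)
{
	return q->id == pkt->id &&
	       q->saddr == pkt->saddr &&
	       q->daddr == pkt->daddr &&
	       q->protocol == pkt->protocol &&
	       q->user == pkt->user;
}
```

Two copies of one fragment that differ only in the device they arrived on compare equal here, so they would land in the same reassembly queue, and the second copy looks like a duplicate.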

Check commit 7736d33f4262d437c5 (packet: Add pre-defragmentation support
for ipv4 fanouts) for a possible hint :

We could perform the re-assembly in macvlan code, before doing the
"broadcast the frame on all ports" part.

* loopback IP alias breaks tftp?
From: Josh Boyer @ 2011-10-05 20:27 UTC
  To: netdev

Hi All,

We've had a report [1] of a change in behavior when trying to use an IP
alias to tftp from a loopback device.  Apparently the steps outlined in
the bug worked in 2.6.35, and broke somewhere before 2.6.38.6.

I can confirm the steps fail on a 3.0 based kernel and I'm trying to do
a git bisect to find the commit involved, but I thought I would send
this along to see if anyone might have an idea.  (Also, I'm not really
sure how valid of a usecase this was to begin with.)

[1] https://bugzilla.redhat.com/show_bug.cgi?id=739534

josh

* Re: IPv4 multicast and mac-vlans acting weird on 3.0.4+
From: Ben Greear @ 2011-10-05 20:19 UTC
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1317845835.3457.5.camel@edumazet-laptop>

On 10/05/2011 01:17 PM, Eric Dumazet wrote:
> On Wednesday, 05 October 2011 at 13:09 -0700, Ben Greear wrote:
>> On 10/05/2011 12:54 PM, Eric Dumazet wrote:
>>> On Wednesday, 05 October 2011 at 09:46 -0700, Ben Greear wrote:
>>>> This is on a hacked 3.0.4 kernel...
>>>>
>>>> I am seeing an issue where an IPv4 mcast receiver will not receive
>>>> a 1473 or larger byte mcast message, but will receive a 1472.  The difference
>>>> being that 1473 ends up being two packets on the wire.  It works on
>>>> 802.1Q VLANs, VETH interfaces and real Ethernet.  It does not work
>>>> on a mac-vlan hanging off the VETH.
>>>>
>>>> I see packets received on the macvlan in tshark, and they appear correct.  No
>>>> obvious errors in the macvlan port stats or netstat -s,
>>>> and the 'ss' tool doesn't appear to support UDP sockets at all.
>>>>
>>>> So, I'm about to go digging into the code, but if anyone has any
>>>> suggestions for places to look, please let me know!
>>>>
>>>
>>> Well, problem is defragmentation and macvlan cooperation.
>>>
>>> Multicast messages are broadcasted on all macvlan ports.
>>>
>>> But IP defrag will probably deliver a single final frame.
>>>
>>> We probably need to handle defrag in macvlan before broadcasting to all
>>> ports.
>>
>> I see packets get to this code in ip_input.c (line 467 or so),
>> and that printk is mine of course.
>>
>> 	if ((dev && strcmp(dev->name, "rddVR10#0") == 0) ||
>> 	    (dev && strcmp(dev->name, "rddVR10") == 0)) {
>> 		printk("calling ip_rcv_finish through NF_HOOK, dev: %s, len: %i\n",
>> 		       dev->name, skb->len);
>> 	}
>>
>> 	return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING, skb, dev, NULL,
>> 		       ip_rcv_finish);
>>
>> But, the macvlan packets never make it to the ip_rcv_finish method.
>>
>> I do see a big and a little packet entering this code.
>>
>> I have no firewall rules that I'm aware of, though there
>> is some conn-track logic (though not associated with the
>> mac-vlan interface):
>
> Say you have 10 vlans on your eth0, how many times do you want one
> incoming multicast frame being delivered to your application listening
> on 0.0.0.0:port ?

How would it work for two Ethernet devices on the same LAN?  I'd
say that mac-vlans should mimic that case.

And in my case, I'm binding hard to a device & IP address,
so my app should get it once regardless.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
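
For reference, the 1472/1473-byte boundary in the quoted report is what one would expect if the path MTU is 1500 bytes and the IPv4 header carries no options; both are assumptions here, not something stated in the thread. A minimal sketch of the arithmetic, with a hypothetical function name:

```c
#include <assert.h>

enum { SKETCH_IPV4_HDR = 20, SKETCH_UDP_HDR = 8 };

/*
 * Minimum number of IPv4 packets needed to carry one UDP datagram with
 * the given payload size, assuming a 20-byte IPv4 header (no options).
 */
static unsigned int sketch_udp_fragments(unsigned int payload,
					 unsigned int mtu)
{
	unsigned int ip_payload = payload + SKETCH_UDP_HDR;
	unsigned int last_max, per_frag, rest;

	assert(mtu >= SKETCH_IPV4_HDR + 8);
	last_max = mtu - SKETCH_IPV4_HDR;	/* the final fragment may be any size */
	if (ip_payload <= last_max)
		return 1;			/* fits without fragmentation */
	per_frag = last_max & ~7u;		/* other fragments: multiple of 8 */
	rest = ip_payload - last_max;
	return 1 + (rest + per_frag - 1) / per_frag;
}
```

With an MTU of 1500, a 1472-byte payload plus the UDP header fills exactly the 1480 bytes available after the IPv4 header, while 1473 bytes spills one byte into a second fragment, matching the "big and a little packet" observation.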

* Re: IPv4 multicast and mac-vlans acting weird on 3.0.4+
From: Eric Dumazet @ 2011-10-05 20:17 UTC
  To: Ben Greear; +Cc: netdev
In-Reply-To: <4E8CB990.1010406@candelatech.com>

On Wednesday, 05 October 2011 at 13:09 -0700, Ben Greear wrote:
> On 10/05/2011 12:54 PM, Eric Dumazet wrote:
> > On Wednesday, 05 October 2011 at 09:46 -0700, Ben Greear wrote:
> >> This is on a hacked 3.0.4 kernel...
> >>
> >> I am seeing an issue where an IPv4 mcast receiver will not receive
> >> a 1473 or larger byte mcast message, but will receive a 1472.  The difference
> >> being that 1473 ends up being two packets on the wire.  It works on
> >> 802.1Q VLANs, VETH interfaces and real Ethernet.  It does not work
> >> on a mac-vlan hanging off the VETH.
> >>
> >> I see packets received on the macvlan in tshark, and they appear correct.  No
> >> obvious errors in the macvlan port stats or netstat -s,
> >> and the 'ss' tool doesn't appear to support UDP sockets at all.
> >>
> >> So, I'm about to go digging into the code, but if anyone has any
> >> suggestions for places to look, please let me know!
> >>
> >
> > Well, problem is defragmentation and macvlan cooperation.
> >
> > Multicast messages are broadcasted on all macvlan ports.
> >
> > But IP defrag will probably deliver a single final frame.
> >
> > We probably need to handle defrag in macvlan before broadcasting to all
> > ports.
> 
> I see packets get to this code in ip_input.c (line 467 or so),
> and that printk is mine of course.
> 
> 	if ((dev && strcmp(dev->name, "rddVR10#0") == 0) ||
> 	    (dev && strcmp(dev->name, "rddVR10") == 0)) {
> 		printk("calling ip_rcv_finish through NF_HOOK, dev: %s, len: %i\n",
> 		       dev->name, skb->len);
> 	}
> 
> 	return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING, skb, dev, NULL,
> 		       ip_rcv_finish);
> 
> But, the macvlan packets never make it to the ip_rcv_finish method.
> 
> I do see a big and a little packet entering this code.
> 
> I have no firewall rules that I'm aware of, though there
> is some conn-track logic (though not associated with the
> mac-vlan interface):

Say you have 10 vlans on your eth0, how many times do you want one
incoming multicast frame being delivered to your application listening
on 0.0.0.0:port ?

10 or 1 ?

* [PATCH] net: ipv6: Allow netlink to set IPv6 address scope
From: Lorenzo Colitti @ 2011-10-05 20:15 UTC
  To: maze, lorenzo, yoshfuji, netdev; +Cc: c

net: ipv6: Allow netlink to set IPv6 address scope

Currently, userspace cannot specify the scope of IPv6
addresses when creating or modifying them. Instead, the
scope is automatically determined from the address itself.
In IPv4, userspace can set whatever scope it likes.

Allow userspace to specify the scope of IPv6 addresses in
a backwards-compatible way: if the scope passed in is zero,
use the old behaviour of automatically determining the
scope based on the address.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 12368c5..b05892d 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2109,17 +2109,44 @@ err_exit:
 	return err;
 }
 
+static inline int rt_scope(int ifa_scope)
+{
+	if (ifa_scope & IFA_HOST)
+		return RT_SCOPE_HOST;
+	else if (ifa_scope & IFA_LINK)
+		return RT_SCOPE_LINK;
+	else if (ifa_scope & IFA_SITE)
+		return RT_SCOPE_SITE;
+	else
+		return RT_SCOPE_UNIVERSE;
+}
+
+static inline int ifa_scope(u8 rt_scope)
+{
+	switch (rt_scope) {
+	case RT_SCOPE_UNIVERSE:
+		return IPV6_ADDR_ANY;
+	case RT_SCOPE_SITE:
+		return IPV6_ADDR_SITELOCAL;
+	case RT_SCOPE_LINK:
+		return IPV6_ADDR_LINKLOCAL;
+	case RT_SCOPE_HOST:
+		return IPV6_ADDR_LOOPBACK;
+	default:
+		return __IPV6_ADDR_SCOPE_INVALID;
+	}
+}
+
 /*
  *	Manual configuration of address on an interface
  */
 static int inet6_addr_add(struct net *net, int ifindex, const struct in6_addr *pfx,
-			  unsigned int plen, __u8 ifa_flags, __u32 prefered_lft,
-			  __u32 valid_lft)
+			  unsigned int plen, __u8 ifa_flags, __u8 scope,
+			  __u32 prefered_lft, __u32 valid_lft)
 {
 	struct inet6_ifaddr *ifp;
 	struct inet6_dev *idev;
 	struct net_device *dev;
-	int scope;
 	u32 flags;
 	clock_t expires;
 	unsigned long timeout;
@@ -2141,8 +2168,6 @@ static int inet6_addr_add(struct net *net, int ifindex, const struct in6_addr *p
 	if (IS_ERR(idev))
 		return PTR_ERR(idev);
 
-	scope = ipv6_addr_scope(pfx);
-
 	timeout = addrconf_timeout_fixup(valid_lft, HZ);
 	if (addrconf_finite_timeout(timeout)) {
 		expires = jiffies_to_clock_t(timeout * HZ);
@@ -2239,7 +2264,8 @@ int addrconf_add_ifaddr(struct net *net, void __user *arg)
 	rtnl_lock();
 	err = inet6_addr_add(net, ireq.ifr6_ifindex, &ireq.ifr6_addr,
 			     ireq.ifr6_prefixlen, IFA_F_PERMANENT,
-			     INFINITY_LIFE_TIME, INFINITY_LIFE_TIME);
+			     IPV6_ADDR_ANY, INFINITY_LIFE_TIME,
+			     INFINITY_LIFE_TIME);
 	rtnl_unlock();
 	return err;
 }
@@ -3333,7 +3359,7 @@ inet6_rtm_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	return inet6_addr_del(net, ifm->ifa_index, pfx, ifm->ifa_prefixlen);
 }
 
-static int inet6_addr_modify(struct inet6_ifaddr *ifp, u8 ifa_flags,
+static int inet6_addr_modify(struct inet6_ifaddr *ifp, u8 ifa_flags, u8 scope,
 			     u32 prefered_lft, u32 valid_lft)
 {
 	u32 flags;
@@ -3363,6 +3389,7 @@ static int inet6_addr_modify(struct inet6_ifaddr *ifp, u8 ifa_flags,
 
 	spin_lock_bh(&ifp->lock);
 	ifp->flags = (ifp->flags & ~(IFA_F_DEPRECATED | IFA_F_PERMANENT | IFA_F_NODAD | IFA_F_HOMEADDRESS)) | ifa_flags;
+	ifp->scope = scope;
 	ifp->tstamp = jiffies;
 	ifp->valid_lft = valid_lft;
 	ifp->prefered_lft = prefered_lft;
@@ -3388,7 +3415,7 @@ inet6_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	struct inet6_ifaddr *ifa;
 	struct net_device *dev;
 	u32 valid_lft = INFINITY_LIFE_TIME, preferred_lft = INFINITY_LIFE_TIME;
-	u8 ifa_flags;
+	u8 ifa_flags, scope;
 	int err;
 
 	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFA_MAX, ifa_ipv6_policy);
@@ -3418,6 +3445,13 @@ inet6_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	/* We ignore other flags so far. */
 	ifa_flags = ifm->ifa_flags & (IFA_F_NODAD | IFA_F_HOMEADDRESS);
 
+	if (ifm->ifa_scope == RT_SCOPE_UNIVERSE) {
+		/* Be backwards compatible with tools that don't set scope. */
+		scope = ipv6_addr_scope(pfx);
+	} else {
+		scope = ifa_scope(ifm->ifa_scope);
+	}
+
 	ifa = ipv6_get_ifaddr(net, pfx, dev, 1);
 	if (ifa == NULL) {
 		/*
@@ -3425,7 +3459,7 @@ inet6_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 		 * userspace alreay relies on not having to provide this.
 		 */
 		return inet6_addr_add(net, ifm->ifa_index, pfx,
-				      ifm->ifa_prefixlen, ifa_flags,
+				      ifm->ifa_prefixlen, ifa_flags, scope,
 				      preferred_lft, valid_lft);
 	}
 
@@ -3433,7 +3467,8 @@ inet6_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	    !(nlh->nlmsg_flags & NLM_F_REPLACE))
 		err = -EEXIST;
 	else
-		err = inet6_addr_modify(ifa, ifa_flags, preferred_lft, valid_lft);
+		err = inet6_addr_modify(ifa, ifa_flags, scope,
+					preferred_lft, valid_lft);
 
 	in6_ifa_put(ifa);
 
@@ -3466,18 +3501,6 @@ static int put_cacheinfo(struct sk_buff *skb, unsigned long cstamp,
 	return nla_put(skb, IFA_CACHEINFO, sizeof(ci), &ci);
 }
 
-static inline int rt_scope(int ifa_scope)
-{
-	if (ifa_scope & IFA_HOST)
-		return RT_SCOPE_HOST;
-	else if (ifa_scope & IFA_LINK)
-		return RT_SCOPE_LINK;
-	else if (ifa_scope & IFA_SITE)
-		return RT_SCOPE_SITE;
-	else
-		return RT_SCOPE_UNIVERSE;
-}
-
 static inline int inet6_ifaddr_msgsize(void)
 {
 	return NLMSG_ALIGN(sizeof(struct ifaddrmsg))
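
The backwards-compatibility rule from the commit message can be modelled in userspace roughly as follows. This is a sketch under stated assumptions: the `SK_` constants are stand-ins meant to mirror `RT_SCOPE_*` and `IPV6_ADDR_*`, and their numeric values should be checked against the kernel headers; `sketch_auto_scope()` is a hypothetical placeholder for `ipv6_addr_scope()` that only classifies ::1, fe80::/10 and fec0::/10.

```c
#include <stdint.h>
#include <string.h>

/* Stand-ins mirroring RT_SCOPE_*; values are assumptions for this sketch. */
enum { SK_RT_UNIVERSE = 0, SK_RT_SITE = 200, SK_RT_LINK = 253, SK_RT_HOST = 254 };

/* Stand-ins mirroring IPV6_ADDR_*; values are assumptions for this sketch. */
enum { SK_A6_ANY = 0x00, SK_A6_LOOPBACK = 0x10, SK_A6_LINKLOCAL = 0x20,
       SK_A6_SITELOCAL = 0x40, SK_A6_INVALID = -1 };

/* Mirrors the patch's ifa_scope(): rtnetlink scope to address scope. */
static int sketch_ifa_scope(uint8_t rt_scope)
{
	switch (rt_scope) {
	case SK_RT_UNIVERSE:	return SK_A6_ANY;
	case SK_RT_SITE:	return SK_A6_SITELOCAL;
	case SK_RT_LINK:	return SK_A6_LINKLOCAL;
	case SK_RT_HOST:	return SK_A6_LOOPBACK;
	default:		return SK_A6_INVALID;
	}
}

/* Hypothetical placeholder for ipv6_addr_scope(): classify by prefix. */
static int sketch_auto_scope(const uint8_t addr[16])
{
	static const uint8_t loopback[16] = { [15] = 1 };

	if (memcmp(addr, loopback, 16) == 0)
		return SK_A6_LOOPBACK;
	if (addr[0] == 0xfe && (addr[1] & 0xc0) == 0x80)
		return SK_A6_LINKLOCAL;
	if (addr[0] == 0xfe && (addr[1] & 0xc0) == 0xc0)
		return SK_A6_SITELOCAL;
	return SK_A6_ANY;
}

/* Selection rule from the inet6_rtm_newaddr() hunk above. */
static int sketch_pick_scope(uint8_t requested, const uint8_t addr[16])
{
	if (requested == SK_RT_UNIVERSE)
		return sketch_auto_scope(addr);	/* old behaviour */
	return sketch_ifa_scope(requested);
}
```

One consequence visible in the sketch: because zero doubles as the universe scope, a request for universe scope cannot be told apart from no request at all, so such a request always falls back to the scope derived from the address.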

* Re: [PATCH 0/8] skb fragment API: convert network drivers (part IV)
From: David Miller @ 2011-10-05 20:13 UTC
  To: gregkh; +Cc: Ian.Campbell, netdev
In-Reply-To: <20111005191717.GC30378@suse.de>

From: Greg KH <gregkh@suse.de>
Date: Wed, 5 Oct 2011 12:17:17 -0700

> On Wed, Oct 05, 2011 at 11:28:30AM +0100, Ian Campbell wrote:
>> The following series converts a fourth batch of network drivers to the
>> SKB pages fragment API introduced in 131ea6675c76.
>> 
>> There are two drivers/staging patches in here. I'm not sure who normally
>> takes these, I'm sure you guys know though.
> 
> I take them.  As I don't have commit 131ea6675c76 in my tree
> (staging-next), I'd have to wait until that patch gets merged into
> Linus's tree, which should be after 3.2-rc1, right?  Mind if I hold off
> in taking them until then?

We're trying to change all the driver accesses so we can change
the data structures underneath this stuff.

If we have to wait until after 3.2-rc1, then we won't be able to
make the data structure change in this merge window.


* Re: IPv4 multicast and mac-vlans acting weird on 3.0.4+
From: Ben Greear @ 2011-10-05 20:09 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1317844449.3457.3.camel@edumazet-laptop>

On 10/05/2011 12:54 PM, Eric Dumazet wrote:
> On Wednesday, 05 October 2011 at 09:46 -0700, Ben Greear wrote:
>> This is on a hacked 3.0.4 kernel...
>>
>> I am seeing an issue where an IPv4 mcast receiver will not receive
>> a 1473 or larger byte mcast message, but will receive a 1472.  The difference
>> being that 1473 ends up being two packets on the wire.  It works on
>> 802.1Q VLANs, VETH interfaces and real Ethernet.  It does not work
>> on a mac-vlan hanging off the VETH.
>>
>> I see packets received on the macvlan in tshark, and they appear correct.  No
>> obvious errors in the macvlan port stats or netstat -s,
>> and the 'ss' tool doesn't appear to support UDP sockets at all.
>>
>> So, I'm about to go digging into the code, but if anyone has any
>> suggestions for places to look, please let me know!
>>
>
> Well, the problem is how defragmentation and macvlan cooperate.
>
> Multicast messages are broadcast to all macvlan ports.
>
> But IP defrag will probably deliver a single final frame.
>
> We probably need to handle defrag in macvlan before broadcasting to all
> ports.

I see packets get to this code in ip_input.c (line 467 or so),
and that printk is mine of course.

	if ((dev && strcmp(dev->name, "rddVR10#0") == 0) ||
	    (dev && strcmp(dev->name, "rddVR10") == 0)) {
		printk("calling ip_rcv_finish through NF_HOOK, dev: %s, len: %i\n",
		       dev->name, skb->len);
	}

	return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING, skb, dev, NULL,
		       ip_rcv_finish);

But, the macvlan packets never make it to the ip_rcv_finish method.

I do see a big and a little packet entering this code.

I have no firewall rules that I'm aware of, though there
is some conn-track logic (though not associated with the
mac-vlan interface):

[root@lec2010-ath9k-1 ~]# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination


And, I just reproduced the problem on vanilla linus top-of-tree (3.1.0-rc9).


Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: [PATCH] x25: Validate incoming call user data lengths
From: Andrew Hendry @ 2011-10-05 20:09 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Matthew Daley, netdev
In-Reply-To: <1317738819.2440.9.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

Ran this on a few systems and put a few GB through it, looks good.

Tested-by: Andrew Hendry <andrew.hendry@gmail.com>

On Wed, Oct 5, 2011 at 1:33 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tuesday, 04 October 2011 at 14:00 +0000, Matthew Daley wrote:
>> X.25 call user data is being copied in its entirety from incoming messages
>> without consideration to the size of the destination buffers, leading to
>> possible buffer overflows. Validate incoming call user data lengths before
>> these copies are performed.
>>
>> It appears this issue was noticed some time ago, however nothing seemed to
>> come of it: see http://www.spinics.net/lists/linux-x25/msg00043.html and
>> commit 8db09f26f912f7c90c764806e804b558da520d4f.
>>
>> Signed-off-by: Matthew Daley <mattjd@gmail.com>
>> Cc: Eric Dumazet <eric.dumazet@gmail.com>
>> Cc: Andrew Hendry <andrew.hendry@gmail.com>
>> Cc: stable <stable@kernel.org>
>> ---
>> Hopefully this is acceptable; I thought someone with greater knowledge of
>> X.25 might step in :) It should take into account Andrew's suggestion to
>> move af_x25.c's check to before the call acceptance.
>>
>>  net/x25/af_x25.c |    6 ++++++
>>  net/x25/x25_in.c |    3 +++
>>  2 files changed, 9 insertions(+), 0 deletions(-)
>>
>> diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
>> index d306154..a4bd172 100644
>> --- a/net/x25/af_x25.c
>> +++ b/net/x25/af_x25.c
>> @@ -959,6 +959,12 @@ int x25_rx_call_request(struct sk_buff *skb, struct x25_neigh *nb,
>>       skb_pull(skb,len);
>>
>>       /*
>> +      *      Ensure that the amount of call user data is valid.
>> +      */
>> +     if (skb->len > X25_MAX_CUD_LEN)
>> +             goto out_clear_request;
>> +
>> +     /*
>>        *      Find a listener for the particular address/cud pair.
>>        */
>>       sk = x25_find_listener(&source_addr,skb);
>> diff --git a/net/x25/x25_in.c b/net/x25/x25_in.c
>> index 0b073b5..63488fd 100644
>> --- a/net/x25/x25_in.c
>> +++ b/net/x25/x25_in.c
>> @@ -127,6 +127,9 @@ static int x25_state1_machine(struct sock *sk, struct sk_buff *skb, int frametyp
>>                *      Copy any Call User Data.
>>                */
>>               if (skb->len > 0) {
>> +                     if (skb->len > X25_MAX_CUD_LEN)
>> +                             goto out_clear;
>> +
>>                       skb_copy_from_linear_data(skb,
>>                                                 x25->calluserdata.cuddata,
>>                                                 skb->len);
>> --
>
> It seems fine to me.
>
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
>
> Now, you could also fix x25 for undersized/frags frames :
>
> Apparently, no check is done in this respect (missing pskb_may_pull()
> calls)...
>
>
>
>
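
For anyone following along, the fix under test comes down to checking the
length before the copy. The following is a minimal user-space sketch of that
pattern, not the kernel code: copy_cud(), struct calluserdata as laid out
here, and CUD_MAX_LEN (set to 128 purely for illustration) are hypothetical
stand-ins for the real x25 structures and X25_MAX_CUD_LEN.

```c
#include <stddef.h>
#include <string.h>

/* Assumed capacity of the destination buffer. In the patch this role is
 * played by X25_MAX_CUD_LEN; 128 is only an illustrative value here. */
#define CUD_MAX_LEN 128

/* Hypothetical stand-in for the structure that holds call user data. */
struct calluserdata {
	size_t        cudlength;
	unsigned char cuddata[CUD_MAX_LEN];
};

/* Copy len bytes of call user data into dst.
 * Returns 0 on success, or -1 if the data would not fit, in which case
 * dst is left untouched. */
static int copy_cud(struct calluserdata *dst, const unsigned char *src,
		    size_t len)
{
	if (len > CUD_MAX_LEN)
		return -1;	/* reject rather than overflow cuddata */

	memcpy(dst->cuddata, src, len);
	dst->cudlength = len;
	return 0;
}
```

The point of the ordering is that an oversized request is refused before any
byte is written, which is what the two added checks in the patch do with
"goto out_clear_request" and "goto out_clear".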


* Re: IPv4 multicast and mac-vlans acting weird on 3.0.4+
From: Eric Dumazet @ 2011-10-05 19:54 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev
In-Reply-To: <4E8C89EE.3090600@candelatech.com>

On Wednesday, 05 October 2011 at 09:46 -0700, Ben Greear wrote:
> This is on a hacked 3.0.4 kernel...
> 
> I am seeing an issue where an IPv4 mcast receiver will not receive
> a 1473 or larger byte mcast message, but will receive a 1472.  The difference
> being that 1473 ends up being two packets on the wire.  It works on
> 802.1Q VLANs, VETH interfaces and real Ethernet.  It does not work
> on a mac-vlan hanging off the VETH.
> 
> I see packets received on the macvlan in tshark, and they appear correct.  No
> obvious errors in the macvlan port stats or netstat -s,
> and the 'ss' tool doesn't appear to support UDP sockets at all.
> 
> So, I'm about to go digging into the code, but if anyone has any
> suggestions for places to look, please let me know!
> 

Well, the problem is how defragmentation and macvlan cooperate.

Multicast messages are broadcast to all macvlan ports.

But IP defrag will probably deliver a single final frame.

We probably need to handle defrag in macvlan before broadcasting to all
ports.
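
As a side note on the 1472/1473 boundary quoted above, the arithmetic can be
sketched as follows. This assumes a 1500-byte MTU, a 20-byte IPv4 header
without options and an 8-byte UDP header; udp_fragment_count() is a
hypothetical helper written for this illustration, not a kernel function,
and it assumes the MTU is larger than the headers.

```c
#include <stddef.h>

#define IPV4_HDR_LEN 20	/* assumed: no IP options */
#define UDP_HDR_LEN   8

/* Number of IPv4 fragments needed to carry a UDP datagram with
 * payload bytes of data over a link with the given MTU. */
static unsigned int udp_fragment_count(size_t payload, size_t mtu)
{
	size_t ip_payload = payload + UDP_HDR_LEN;
	/* Every fragment except the last carries a multiple of 8 bytes. */
	size_t per_frag = ((mtu - IPV4_HDR_LEN) / 8) * 8;

	if (ip_payload <= mtu - IPV4_HDR_LEN)
		return 1;	/* fits in a single unfragmented packet */

	return (unsigned int)((ip_payload + per_frag - 1) / per_frag);
}
```

With an MTU of 1500, a 1472-byte message gives 1480 bytes of IP payload and
fits in one packet, while 1473 bytes gives 1481 and needs a second fragment
carrying the last byte. That second case is the one that has to go through
IP defrag.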


* Re: [PATCH 2/2] bridge: allow forwarding some link local frames
From: Ben Hutchings @ 2011-10-05 19:40 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David S. Miller, netdev
In-Reply-To: <20111004041509.292932641@vyatta.com>

On Mon, 2011-10-03 at 21:14 -0700, Stephen Hemminger wrote:
> plain text document attachment (bridge-multicast-filter.patch)
> This is based on an earlier patch by Nick Carter with comments
> by David Lamparter but with some refinements. Thanks for their patience
> this is a confusing area with overlap of standards, user requirements,
> and compatibility with earlier releases.
> 
> It adds a new sysfs attribute 
>    /sys/class/net/brX/bridge/group_fwd_mask
> that controls forwarding of frames with address of: 01-80-C2-00-00-0X
> The default setting has no forwarding to retain compatibility.
> 
> One change from earlier releases is that forwarding of group
> addresses is not dependent on STP being enabled or disabled. This
> choice was made based on interpretation of the 802.1 standards.
> I expect complaints will arise because of this, but better to follow
> the standard than continue acting incorrectly by default.
> 
> The filtering mask is writeable, but only values that don't forward
> known control frames are allowed. It intentionally blocks attempts
> to filter control protocols. For example: writing a 8 allows
> forwarding 802.1X PAE addresses which is the most common request.
[...]

I wonder why you don't forbid forwarding frames sent to reserved
destination addresses?  The standards seem pretty clear that this should
not be allowed.
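
To make the mask semantics concrete, here is a small sketch of how a mask
value could map to the 01-80-C2-00-00-0X addresses. It assumes bit X of the
mask corresponds to the address ending in 0X, which is consistent with the
description's example that writing 8 allows the 802.1X PAE address
(01-80-C2-00-00-03). group_fwd_bit() and group_fwd_allowed() are
hypothetical names for illustration, not functions from the patch.

```c
#include <stdint.h>

/* Bit in a 16-bit forwarding mask that corresponds to the group address
 * 01-80-C2-00-00-0X: bit X, where X is the low nibble of the last octet.
 * (Assumed mapping, see the note above.) */
static uint16_t group_fwd_bit(const uint8_t addr[6])
{
	return (uint16_t)(1u << (addr[5] & 0x0f));
}

/* Nonzero if the mask would allow forwarding frames sent to addr. */
static int group_fwd_allowed(uint16_t mask, const uint8_t addr[6])
{
	return (mask & group_fwd_bit(addr)) != 0;
}
```

Under that assumption a mask of 8 selects only the ...-03 address and leaves
the other fifteen addresses in the block unforwarded.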

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH 5/7] bridge: allow creating/deleting fdb entries via netlink
From: Ben Hutchings @ 2011-10-05 19:36 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Kevin Wilson, netdev
In-Reply-To: <20111005121305.26ac48bf@nehalam.linuxnetplumber.net>

On Wed, 2011-10-05 at 12:13 -0700, Stephen Hemminger wrote:
> On Wed, 5 Oct 2011 21:06:54 +0200
> Kevin Wilson <wkevils@gmail.com> wrote:
> 
> > Hello all,
> > I would appreciate it if someone could elaborate on the "bridge extensions
> > to iproute2" mentioned here.
> > I downloaded the latest iproute2 git tree and did not find them there.
> > Googling did not turn up much information either.
> > I would appreciate it if someone could tell me who develops them, and what
> > the status, site, repository tree, etc. are.
> > 
> > rgs,
> > Kevin
> 
> The patch to handle this was posted, but is not committed to the tree yet.
[...]
> +static void usage(void)
> +{
> +	fprintf(stderr,
> +"Usage: br [ OPTIONS ] OBJECT { COMMAND | help }\n"
[...]

This and the sub-command usage functions still report the command name
as 'br'.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH 0/8] skb fragment API: convert network drivers (part IV)
From: Greg KH @ 2011-10-05 19:17 UTC (permalink / raw)
  To: Ian Campbell; +Cc: netdev
In-Reply-To: <1317810511.21903.204.camel@zakaz.uk.xensource.com>

On Wed, Oct 05, 2011 at 11:28:30AM +0100, Ian Campbell wrote:
> The following series converts a fourth batch of network drivers to the
> SKB pages fragment API introduced in 131ea6675c76.
> 
> There are two drivers/staging patches in here. I'm not sure who normally
> takes these, I'm sure you guys know though.

I take them.  As I don't have commit 131ea6675c76 in my tree
(staging-next), I'd have to wait until that patch gets merged into
Linus's tree, which should be after 3.2-rc1, right?  Mind if I hold off
in taking them until then?

thanks,

greg k-h


* Re: [PATCH 5/7] bridge: allow creating/deleting fdb entries via netlink
From: Stephen Hemminger @ 2011-10-05 19:13 UTC (permalink / raw)
  To: Kevin Wilson; +Cc: netdev
In-Reply-To: <CAGXs5wXc0RLE9KbTVVYRa0xuLEKh8mTM7AR7hMBe86HNfDX0HQ@mail.gmail.com>

On Wed, 5 Oct 2011 21:06:54 +0200
Kevin Wilson <wkevils@gmail.com> wrote:

> Hello all,
> I would appreciate it if someone could elaborate on the "bridge extensions
> to iproute2" mentioned here.
> I downloaded the latest iproute2 git tree and did not find them there.
> Googling did not turn up much information either.
> I would appreciate it if someone could tell me who develops them, and what
> the status, site, repository tree, etc. are.
> 
> rgs,
> Kevin

The patch to handle this was posted, but is not committed to the tree yet.

--
From 8e3d00d0602420dadc3d23877a4995ea3d7496c2 Mon Sep 17 00:00:00 2001
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Tue, 4 Oct 2011 09:31:58 -0700
Subject: [PATCH] Add support for bridging control.

This adds a new 'bridge' command which is the bridging equivalent of
the ip command.
---
 Makefile          |    2 +-
 br/.gitignore     |    1 +
 br/Makefile       |   14 +++
 br/br_common.h    |   13 +++
 br/bridge.c       |  104 ++++++++++++++++++++++
 br/fdb.c          |  245 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 br/link.c         |  142 +++++++++++++++++++++++++++++++
 br/monitor.c      |  138 ++++++++++++++++++++++++++++++
 man/man8/bridge.8 |  177 ++++++++++++++++++++++++++++++++++++++
 9 files changed, 835 insertions(+), 1 deletions(-)
 create mode 100644 br/.gitignore
 create mode 100644 br/Makefile
 create mode 100644 br/br_common.h
 create mode 100644 br/bridge.c
 create mode 100644 br/fdb.c
 create mode 100644 br/link.c
 create mode 100644 br/monitor.c
 create mode 100644 man/man8/bridge.8

diff --git a/Makefile b/Makefile
index d1ace1f..f1d360a 100644
--- a/Makefile
+++ b/Makefile
@@ -33,7 +33,7 @@ CCOPTS = -D_GNU_SOURCE -O2 -Wstrict-prototypes -Wall
 CFLAGS = $(CCOPTS) -I../include $(DEFINES)
 YACCFLAGS = -d -t -v
 
-SUBDIRS=lib ip tc misc netem genl
+SUBDIRS=lib ip tc br misc netem genl
 
 LIBNETLINK=../lib/libnetlink.a ../lib/libutil.a
 LDLIBS += $(LIBNETLINK)
diff --git a/br/.gitignore b/br/.gitignore
new file mode 100644
index 0000000..7096907
--- /dev/null
+++ b/br/.gitignore
@@ -0,0 +1 @@
+bridge
diff --git a/br/Makefile b/br/Makefile
new file mode 100644
index 0000000..9a6743e
--- /dev/null
+++ b/br/Makefile
@@ -0,0 +1,14 @@
+BROBJ = bridge.o fdb.o monitor.o link.o
+
+include ../Config
+
+all: bridge
+
+bridge: $(BROBJ) $(LIBNETLINK) 
+
+install: all
+	install -m 0755 bridge $(DESTDIR)$(SBINDIR)
+
+clean:
+	rm -f $(BROBJ) bridge
+
diff --git a/br/br_common.h b/br/br_common.h
new file mode 100644
index 0000000..ec1671d
--- /dev/null
+++ b/br/br_common.h
@@ -0,0 +1,13 @@
+extern int print_linkinfo(const struct sockaddr_nl *who,
+			  struct nlmsghdr *n,
+			  void *arg);
+extern int print_fdb(const struct sockaddr_nl *who,
+		     struct nlmsghdr *n, void *arg);
+
+extern int do_fdb(int argc, char **argv);
+extern int do_monitor(int argc, char **argv);
+
+extern int show_stats;
+extern int show_details;
+extern int timestamp;
+extern struct rtnl_handle rth;
diff --git a/br/bridge.c b/br/bridge.c
new file mode 100644
index 0000000..9e5f69c
--- /dev/null
+++ b/br/bridge.c
@@ -0,0 +1,104 @@
+/*
+ * Get/set/delete bridge with netlink
+ *
+ * Authors:	Stephen Hemminger <shemminger@vyatta.com>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/socket.h>
+#include <string.h>
+
+#include "SNAPSHOT.h"
+#include "utils.h"
+#include "br_common.h"
+
+struct rtnl_handle rth = { .fd = -1 };
+int resolve_hosts;
+int show_stats;
+int show_details;
+int timestamp;
+
+static void usage(void) __attribute__((noreturn));
+
+static void usage(void)
+{
+	fprintf(stderr,
+"Usage: br [ OPTIONS ] OBJECT { COMMAND | help }\n"
+"where  OBJECT := { fdb |  monitor }\n"
+"       OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails]\n" );
+	exit(-1);
+}
+
+static int do_help(int argc, char **argv)
+{
+	usage();
+}
+
+
+static const struct cmd {
+	const char *cmd;
+	int (*func)(int argc, char **argv);
+} cmds[] = {
+	{ "fdb", 	do_fdb },
+	{ "monitor",	do_monitor },
+	{ "help",	do_help },
+	{ 0 }
+};
+
+static int do_cmd(const char *argv0, int argc, char **argv)
+{
+	const struct cmd *c;
+
+	for (c = cmds; c->cmd; ++c) {
+		if (matches(argv0, c->cmd) == 0)
+			return c->func(argc-1, argv+1);
+	}
+
+	fprintf(stderr, "Object \"%s\" is unknown, try \"br help\".\n", argv0);
+	return -1;
+}
+
+int
+main(int argc, char **argv)
+{
+	while (argc > 1) {
+		char *opt = argv[1];
+		if (strcmp(opt,"--") == 0) {
+			argc--; argv++;
+			break;
+		}
+		if (opt[0] != '-')
+			break;
+		if (opt[1] == '-')
+			opt++;
+
+		if (matches(opt, "-help") == 0) {
+			usage();
+		} else if (matches(opt, "-Version") == 0) {
+			printf("br utility, 0.0\n");
+			exit(0);
+		} else if (matches(opt, "-stats") == 0 ||
+			   matches(opt, "-statistics") == 0) {
+			++show_stats;
+		} else if (matches(opt, "-details") == 0) {
+			++show_details;
+		} else if (matches(opt, "-timestamp") == 0) {
+			++timestamp;
+		} else {
+			fprintf(stderr, "Option \"%s\" is unknown, try \"br -help\".\n", opt);
+			exit(-1);
+		}
+		argc--;	argv++;
+	}
+
+	if (rtnl_open(&rth, 0) < 0)
+		exit(1);
+
+	if (argc > 1)
+		return do_cmd(argv[1], argc-1, argv+1);
+
+	rtnl_close(&rth);
+	usage();
+}
diff --git a/br/fdb.c b/br/fdb.c
new file mode 100644
index 0000000..d849f97
--- /dev/null
+++ b/br/fdb.c
@@ -0,0 +1,245 @@
+/*
+ * Get/set/delete fdb table with netlink
+ *
+ * Authors:	Stephen Hemminger <shemminger@vyatta.com>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <time.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <sys/time.h>
+#include <net/if.h>
+#include <netinet/in.h>
+#include <linux/if_bridge.h>
+#include <linux/if_ether.h>
+#include <linux/neighbour.h>
+#include <string.h>
+
+#include "libnetlink.h"
+#include "br_common.h"
+#include "utils.h"
+
+int filter_index;
+
+static void usage(void)
+{
+	fprintf(stderr, "Usage: br fdb { add | del | replace } ADDR dev DEV\n");
+	fprintf(stderr, "       br fdb {show} [ dev DEV ]\n");
+	exit(-1);
+}
+
+static const char *state_n2a(unsigned s)
+{
+	static char buf[32];
+
+	if (s & NUD_PERMANENT) 
+		return "local";
+
+	if (s & NUD_NOARP)
+		return "static";
+
+	if (s & NUD_STALE)
+		return "stale";
+	
+	if (s & NUD_REACHABLE)
+		return "";
+
+	sprintf(buf, "state=%#x", s);
+	return buf;
+}
+
+static char *fmt_time(char *b, size_t l, unsigned long tick)
+{
+	static int hz;
+	
+	if (hz == 0)
+		hz = __get_user_hz();
+
+	snprintf(b, l, "%lu.%02lu", tick / hz, ((tick % hz) * 100) / hz);
+	return b;
+}
+
+int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
+{
+	FILE *fp = arg;
+	struct ndmsg *r = NLMSG_DATA(n);
+	int len = n->nlmsg_len;
+	struct rtattr * tb[NDA_MAX+1];
+	const __u8 *addr = NULL;
+	char b1[32];
+
+	len -= NLMSG_LENGTH(sizeof(*r));
+	if (len < 0) {
+		fprintf(stderr, "BUG: wrong nlmsg len %d\n", len);
+		return -1;
+	}
+
+	if (r->ndm_family != AF_BRIDGE)
+		return 0;
+
+	if (filter_index && filter_index != r->ndm_ifindex)
+		return 0;
+
+	parse_rtattr(tb, NDA_MAX, NDA_RTA(r),
+		     n->nlmsg_len - NLMSG_LENGTH(sizeof(*r)));
+
+	if (n->nlmsg_type == RTM_DELNEIGH)
+		fprintf(fp, "Deleted ");
+
+	if (tb[NDA_LLADDR])
+		addr = RTA_DATA(tb[NDA_LLADDR]);
+	else {
+		fprintf(stderr, "missing lladdr\n");
+		return -1;
+	}
+
+	fprintf(fp, "%s\t%.2x:%.2x:%.2x:%.2x:%.2x:%.2x\t%s",
+		ll_index_to_name(r->ndm_ifindex),
+		addr[0], addr[1], addr[2],
+		addr[3], addr[4], addr[5],
+		state_n2a(r->ndm_state));
+
+	if (show_stats && tb[NDA_CACHEINFO]) {
+		struct nda_cacheinfo *ci = RTA_DATA(tb[NDA_CACHEINFO]);
+
+		fprintf(fp, "\t%8s", fmt_time(b1, sizeof(b1), ci->ndm_updated));
+		fprintf(fp, " %8s", fmt_time(b1, sizeof(b1), ci->ndm_used));
+	}
+	fprintf(fp, "\n");
+	fflush(fp);
+	return 0;
+}
+
+static int fdb_show(int argc, char **argv)
+{
+	char *filter_dev = NULL;
+	
+	while (argc > 0) {
+		if (strcmp(*argv, "dev") == 0) {
+			NEXT_ARG();
+			if (filter_dev)
+				duparg("dev", *argv);
+			filter_dev = *argv;
+		}
+		argc--; argv++;
+	}
+
+	if (filter_dev) {
+		if ((filter_index = if_nametoindex(filter_dev)) == 0) {
+			fprintf(stderr, "Cannot find device \"%s\"\n", filter_dev);
+			return -1;
+		}
+	}
+
+	if (rtnl_wilddump_request(&rth, PF_BRIDGE, RTM_GETNEIGH) < 0) {
+		perror("Cannot send dump request");
+		exit(1);
+	}
+	
+	printf("port\tmac addr\t\tflags%s\n",
+	       show_stats ? "\t updated     used" : "");
+
+	if (rtnl_dump_filter(&rth, print_fdb, stdout, NULL, NULL) < 0) {
+		fprintf(stderr, "Dump terminated\n");
+		exit(1);
+	}
+
+	return 0;
+}
+
+static int fdb_modify(int cmd, int flags, int argc, char **argv)
+{
+	struct {
+		struct nlmsghdr 	n;
+		struct ndmsg 		ndm;
+		char   			buf[256];
+	} req;
+	char *addr = NULL;
+	char *d = NULL;
+	char abuf[ETH_ALEN];
+
+	memset(&req, 0, sizeof(req));
+
+	req.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg));
+	req.n.nlmsg_flags = NLM_F_REQUEST|flags;
+	req.n.nlmsg_type = cmd;
+	req.ndm.ndm_family = PF_BRIDGE;
+	req.ndm.ndm_state = NUD_NOARP;
+
+	while (argc > 0) {
+		if (strcmp(*argv, "dev") == 0) {
+			NEXT_ARG();
+			d = *argv;
+		} else if (strcmp(*argv, "local") == 0) {
+			req.ndm.ndm_state = NUD_PERMANENT;
+		} else if (strcmp(*argv, "temp") == 0) {
+			req.ndm.ndm_state = NUD_REACHABLE;
+		} else {
+			if (strcmp(*argv, "to") == 0) {
+				NEXT_ARG();
+			}
+			if (matches(*argv, "help") == 0) {
+				NEXT_ARG();
+			}
+			if (addr)
+				duparg2("to", *argv);
+			addr = *argv;
+		}
+		argc--; argv++;
+	}
+
+	if (d == NULL || addr == NULL) {
+		fprintf(stderr, "Device and address are required arguments.\n");
+		exit(-1);
+	}
+
+	if (sscanf(addr, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx", 
+		   abuf, abuf+1, abuf+2,
+		   abuf+3, abuf+4, abuf+5) != 6) {
+		fprintf(stderr, "Invalid mac address %s\n", addr);
+		exit(-1);
+	}
+
+	addattr_l(&req.n, sizeof(req), NDA_LLADDR, abuf, ETH_ALEN);
+
+	req.ndm.ndm_ifindex = ll_name_to_index(d);
+	if (req.ndm.ndm_ifindex == 0) {
+		fprintf(stderr, "Cannot find device \"%s\"\n", d);
+		return -1;
+	}
+
+	if (rtnl_talk(&rth, &req.n, 0, 0, NULL, NULL, NULL) < 0)
+		exit(2);
+
+	return 0;
+}
+
+int do_fdb(int argc, char **argv)
+{
+	ll_init_map(&rth);
+
+	if (argc > 0) {
+		if (matches(*argv, "add") == 0)
+			return fdb_modify(RTM_NEWNEIGH, NLM_F_CREATE|NLM_F_EXCL, argc-1, argv+1);
+		if (matches(*argv, "change") == 0)
+			return fdb_modify(RTM_NEWNEIGH, NLM_F_REPLACE, argc-1, argv+1);
+
+		if (matches(*argv, "replace") == 0)
+			return fdb_modify(RTM_NEWNEIGH, NLM_F_CREATE|NLM_F_REPLACE, argc-1, argv+1);
+		if (matches(*argv, "delete") == 0)
+			return fdb_modify(RTM_DELNEIGH, 0, argc-1, argv+1);
+		if (matches(*argv, "show") == 0 ||
+		    matches(*argv, "lst") == 0 ||
+		    matches(*argv, "list") == 0)
+			return fdb_show(argc-1, argv+1);
+		if (matches(*argv, "help") == 0)
+			usage();
+	} else
+		return fdb_show(0, NULL);
+
+	fprintf(stderr, "Command \"%s\" is unknown, try \"bridge fdb help\".\n", *argv);
+	exit(-1);
+}
diff --git a/br/link.c b/br/link.c
new file mode 100644
index 0000000..1b9541d
--- /dev/null
+++ b/br/link.c
@@ -0,0 +1,142 @@
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <time.h>
+#include <sys/socket.h>
+#include <sys/time.h>
+#include <netinet/in.h>
+#include <linux/if.h>
+#include <linux/if_bridge.h>
+#include <string.h>
+
+#include "utils.h"
+#include "br_common.h"
+
+static const char *port_states[] = {
+	[BR_STATE_DISABLED] = "disabled",
+	[BR_STATE_LISTENING] = "listening",
+	[BR_STATE_LEARNING] = "learning",
+	[BR_STATE_FORWARDING] = "forwarding",
+	[BR_STATE_BLOCKING] = "blocking",
+};
+
+extern char *if_indextoname (unsigned int __ifindex, char *__ifname);
+
+static void print_link_flags(FILE *fp, unsigned flags)
+{
+	fprintf(fp, "<");
+	if (flags & IFF_UP && !(flags & IFF_RUNNING))
+		fprintf(fp, "NO-CARRIER%s", flags ? "," : "");
+	flags &= ~IFF_RUNNING;
+#define _PF(f) if (flags&IFF_##f) { \
+                  flags &= ~IFF_##f ; \
+                  fprintf(fp, #f "%s", flags ? "," : ""); }
+	_PF(LOOPBACK);
+	_PF(BROADCAST);
+	_PF(POINTOPOINT);
+	_PF(MULTICAST);
+	_PF(NOARP);
+	_PF(ALLMULTI);
+	_PF(PROMISC);
+	_PF(MASTER);
+	_PF(SLAVE);
+	_PF(DEBUG);
+	_PF(DYNAMIC);
+	_PF(AUTOMEDIA);
+	_PF(PORTSEL);
+	_PF(NOTRAILERS);
+	_PF(UP);
+	_PF(LOWER_UP);
+	_PF(DORMANT);
+	_PF(ECHO);
+#undef _PF
+        if (flags)
+		fprintf(fp, "%x", flags);
+	fprintf(fp, "> ");
+}
+
+static const char *oper_states[] = {
+	"UNKNOWN", "NOTPRESENT", "DOWN", "LOWERLAYERDOWN", 
+	"TESTING", "DORMANT",	 "UP"
+};
+
+static void print_operstate(FILE *f, __u8 state)
+{
+	if (state >= sizeof(oper_states)/sizeof(oper_states[0]))
+		fprintf(f, "state %#x ", state);
+	else
+		fprintf(f, "state %s ", oper_states[state]);
+}
+
+int print_linkinfo(const struct sockaddr_nl *who,
+		   struct nlmsghdr *n, void *arg)
+{
+	FILE *fp = arg;
+	int len = n->nlmsg_len;
+	struct ifinfomsg *ifi = NLMSG_DATA(n);
+	struct rtattr * tb[IFLA_MAX+1];
+	char b1[IFNAMSIZ];
+
+	len -= NLMSG_LENGTH(sizeof(*ifi));
+	if (len < 0) {
+		fprintf(stderr, "Message too short!\n");
+		return -1;
+        }
+
+	if (!(ifi->ifi_family == AF_BRIDGE || ifi->ifi_family == AF_UNSPEC))
+		return 0;
+
+	parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
+
+	if (tb[IFLA_IFNAME] == NULL) {
+		fprintf(stderr, "BUG: nil ifname\n");
+		return -1;
+	}
+
+	if (n->nlmsg_type == RTM_DELLINK)
+		fprintf(fp, "Deleted ");
+
+	fprintf(fp, "%d: %s ", ifi->ifi_index,
+		tb[IFLA_IFNAME] ? (char*)RTA_DATA(tb[IFLA_IFNAME]) : "<nil>");
+
+	if (tb[IFLA_OPERSTATE]) 
+		print_operstate(fp, *(__u8 *)RTA_DATA(tb[IFLA_OPERSTATE]));
+	
+	if (tb[IFLA_LINK]) {
+		SPRINT_BUF(b1);
+		int iflink = *(int*)RTA_DATA(tb[IFLA_LINK]);
+		
+		if (iflink == 0)
+			fprintf(fp, "@NONE: ");
+		else {
+			fprintf(fp, "@%s: ", 
+				if_indextoname(iflink, b1));
+		}
+	} else {
+		fprintf(fp, ": ");
+	}
+
+	print_link_flags(fp, ifi->ifi_flags);
+
+	if (tb[IFLA_MTU])
+		fprintf(fp, "mtu %u ", *(int*)RTA_DATA(tb[IFLA_MTU]));
+
+	if (tb[IFLA_MASTER]) {
+		fprintf(fp, "master %s ", 
+			if_indextoname(*(int*)RTA_DATA(tb[IFLA_MASTER]), b1));
+	}
+
+	if (tb[IFLA_PROTINFO]) {
+		uint8_t state = *(uint8_t *)RTA_DATA(tb[IFLA_PROTINFO]);
+		if (state <= BR_STATE_BLOCKING)
+			fprintf(fp, "state %s", port_states[state]);
+		else
+			fprintf(fp, "state (%d)", state);
+	}
+
+
+	fprintf(fp, "\n");
+	fflush(fp);
+	return 0;
+}
diff --git a/br/monitor.c b/br/monitor.c
new file mode 100644
index 0000000..37468e6
--- /dev/null
+++ b/br/monitor.c
@@ -0,0 +1,138 @@
+/*
+ * brmonitor.c		"br monitor"
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Stephen Hemminger <shemminger@vyatta.com>
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <time.h>
+#include <sys/socket.h>
+#include <sys/time.h>
+#include <net/if.h>
+#include <netinet/in.h>
+#include <linux/if_bridge.h>
+#include <linux/neighbour.h>
+#include <string.h>
+
+#include "utils.h"
+#include "br_common.h"
+
+
+static void usage(void) __attribute__((noreturn));
+int prefix_banner;
+
+static void usage(void)
+{
+	fprintf(stderr, "Usage: br monitor\n");
+	exit(-1);
+}
+
+static int show_mark(FILE *fp, const struct nlmsghdr *n)
+{
+	char *tstr;
+	time_t secs = ((__u32*)NLMSG_DATA(n))[0];
+	long usecs = ((__u32*)NLMSG_DATA(n))[1];
+	tstr = asctime(localtime(&secs));
+	tstr[strlen(tstr)-1] = 0;
+	fprintf(fp, "Timestamp: %s %lu us\n", tstr, usecs);
+	return 0;
+}
+
+int accept_msg(const struct sockaddr_nl *who,
+	       struct nlmsghdr *n, void *arg)
+{
+	FILE *fp = arg;
+
+	if (timestamp)
+		print_timestamp(fp);
+
+	switch (n->nlmsg_type) {
+	case RTM_NEWLINK:
+	case RTM_DELLINK:
+		if (prefix_banner)
+			fprintf(fp, "[LINK]");
+
+		return print_linkinfo(who, n, arg);
+
+	case RTM_NEWNEIGH:
+	case RTM_DELNEIGH:
+		if (prefix_banner)
+			fprintf(fp, "[NEIGH]");
+		return print_fdb(who, n, arg);
+
+	case 15:
+		return show_mark(fp, n);
+
+	default:
+		return 0;
+	}
+	
+
+}
+
+int do_monitor(int argc, char **argv)
+{
+	char *file = NULL;
+	unsigned groups = ~RTMGRP_TC;
+	int llink=0;
+	int lneigh=0;
+
+	rtnl_close(&rth);
+
+	while (argc > 0) {
+		if (matches(*argv, "file") == 0) {
+			NEXT_ARG();
+			file = *argv;
+		} else if (matches(*argv, "link") == 0) {
+			llink=1;
+			groups = 0;
+		} else if (matches(*argv, "fdb") == 0) {
+			lneigh = 1;
+			groups = 0;
+		} else if (strcmp(*argv, "all") == 0) {
+			groups = ~RTMGRP_TC;
+			prefix_banner=1;
+		} else if (matches(*argv, "help") == 0) {
+			usage();
+		} else {
+			fprintf(stderr, "Argument \"%s\" is unknown, try \"br monitor help\".\n", *argv);
+			exit(-1);
+		}
+		argc--;	argv++;
+	}
+
+	if (llink)
+		groups |= nl_mgrp(RTNLGRP_LINK);
+
+	if (lneigh) {
+		groups |= nl_mgrp(RTNLGRP_NEIGH);
+	}
+
+	if (file) {
+		FILE *fp;
+		fp = fopen(file, "r");
+		if (fp == NULL) {
+			perror("Cannot fopen");
+			exit(-1);
+		}
+		return rtnl_from_file(fp, accept_msg, stdout);
+	}
+
+	if (rtnl_open(&rth, groups) < 0)
+		exit(1);
+	ll_init_map(&rth);
+
+	if (rtnl_listen(&rth, accept_msg, stdout) < 0)
+		exit(2);
+
+	return 0;
+}
+
diff --git a/man/man8/bridge.8 b/man/man8/bridge.8
new file mode 100644
index 0000000..8a5d21e
--- /dev/null
+++ b/man/man8/bridge.8
@@ -0,0 +1,177 @@
+.TH BRIDGE 8 "4 October 2011" "iproute2" "Linux"
+.SH NAME
+bridge \- show / manipulate bridge addresses and devices
+.SH SYNOPSIS
+
+.ad l
+.in +8
+.ti -8
+.B bridge
+.RI "[ " OPTIONS " ] " OBJECT " { " COMMAND " | "
+.BR help " }"
+.sp
+
+.ti -8
+.IR OBJECT " := { "
+.BR fdb " | " monitor " }"
+.sp
+
+.ti -8
+.IR OPTIONS " := { "
+\fB\-V\fR[\fIersion\fR] |
+\fB\-s\fR[\fItatistics\fR]
+
+.ti -8
+.BR "bridge fdb" " { " add " | " del " | " change " | " replace " } "
+.I LLADDR
+.B  dev
+.IR DEV " { "
+.BR local " | " temp " }"
+
+.ti -8
+.BR "bridge fdb" " [ " show " ] [ "
+.B  dev
+.IR DEV " ]"
+
+.ti -8
+.BR "bridge monitor" " [ " all " | " neigh " | " link " ]"
+
+.SH OPTIONS
+
+.TP
+.BR "\-V" , " -Version"
+print the version of the
+.B bridge
+utility and exit.
+
+.TP
+.BR "\-s" , " \-stats", " \-statistics"
+output more information.  If the option
+appears twice or more, the amount of information increases.
+As a rule, the information is statistics or some time values.
+
+
+.SH BRIDGE - COMMAND SYNTAX
+
+.SS
+.I OBJECT
+
+.TP
+.B fdb
+- Forwarding Database entry.
+
+.SS
+.I COMMAND
+
+Specifies the action to perform on the object.
+The set of possible actions depends on the object type.
+As a rule, it is possible to
+.BR "add" , " delete"
+and
+.B show
+(or
+.B list
+) objects, but some objects do not allow all of these operations
+or have some additional commands.  The
+.B help
+command is available for all objects.  It prints
+out a list of available commands and argument syntax conventions.
+.sp
+If no command is given, some default command is assumed.
+Usually it is
+.B list
+or, if the objects of this class cannot be listed,
+.BR "help" .
+
+.SH bridge fdb - forwarding database management
+
+.B fdb
+objects contain known Ethernet addresses on a link.
+
+.P
+The corresponding commands display fdb entries, add new entries,
+and delete old ones.
+
+.SS bridge fdb add - add a new neighbour entry
+.SS bridge fdb change - change an existing entry
+.SS bridge fdb replace - add a new entry or change an existing one
+
+These commands create new neighbour records or update existing ones.
+
+.TP
+.BI "ADDRESS"
+the Ethernet MAC address.
+
+.TP
+.BI dev " NAME"
+the interface to which this address is associated.
+
+.TP
+.in +8
+.B local
+- the address is associated with a local interface on the system
+and is never forwarded.
+.sp
+
+.B temp
+- the address is a dynamic entry, and will be removed if not used.
+.sp
+
+.in -8
+
+.SS bridge fdb delete - delete a forwarding database entry
+This command removes an existing fdb entry.
+
+.PP
+The arguments are the same as with
+.BR "bridge fdb add" ,
+
+.SS bridge fdb show - list forwarding entries.
+
+This command displays the current forwarding table.
+
+.PP
+With the
+.B -statistics
+option, the command becomes verbose.  It prints out the last updated
+and last used time for each entry.
+
+.SH bridge monitor - state monitoring
+
+The
+.B bridge
+utility can monitor the state of devices and  addresses
+continuously.  This option has a slightly different format.
+Namely, the
+.B monitor
+command is the first in the command line and then the object list follows:
+
+.BR "bridge monitor" " [ " all " |"
+.IR LISTofOBJECTS " ]"
+
+.I OBJECT-LIST
+is the list of object types that we want to monitor.
+It may contain
+.BR link ",  and " fdb "."
+If no
+.B file
+argument is given,
+.B bridge
+opens RTNETLINK, listens on it and dumps state changes in the format
+described in previous sections.
+
+.P
+If a file name is given, it does not listen on RTNETLINK,
+but opens the file containing RTNETLINK messages saved in binary format
+and dumps them.  Such a history file can be generated with the
+
+.SH HISTORY
+.B bridge
+was written by Stephen Hemminger and uses kernel facilities added in Linux 3.0.
+.SH SEE ALSO
+.BR ip (8)
+.br
+.RB "Please direct bugreports and patches to: " <netdev@vger.kernel.org>
+
+.SH AUTHOR
+Original Manpage by Stephen Hemminger
-- 
1.7.6.3


* Re: [PATCH 5/7] bridge: allow creating/deleting fdb entries via netlink
From: Kevin Wilson @ 2011-10-05 19:06 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20110405000537.372602366@vyatta.com>

Hello all,
I would appreciate it if someone could elaborate on the "bridge extensions
to iproute2" mentioned here.
I downloaded the latest iproute2 git tree and did not find it there.
Googling for it did not give much info about it.
I would appreciate it if someone could tell me who develops it, and what its
status, site, repository tree, etc. are.

rgs,
Kevin

On Tue, Apr 5, 2011 at 3:03 AM, Stephen Hemminger <shemminger@vyatta.com> wrote:
> Use RTM_NEWNEIGH and RTM_DELNEIGH to allow updating of entries
> in bridge forwarding table. This allows manipulating static entries
> which is not possible with existing tools.
>
> Example (using bridge extensions to iproute2)
>   # br fdb add 00:02:03:04:05:06 dev eth0
>
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>
> ---
>  net/bridge/br_fdb.c     |  139 ++++++++++++++++++++++++++++++++++++++++++++++++
>  net/bridge/br_netlink.c |    3 +
>  net/bridge/br_private.h |    2
>  3 files changed, 144 insertions(+)
>
> --- a/net/bridge/br_fdb.c       2011-03-22 10:25:00.329008182 -0700
> +++ b/net/bridge/br_fdb.c       2011-03-22 10:25:01.057042585 -0700
> @@ -555,3 +555,142 @@ skip:
>
>        return skb->len;
>  }
> +
> +/* Create new static fdb entry */
> +static int fdb_add_entry(struct net_bridge_port *source, const __u8 *addr,
> +                        __u16 state)
> +{
> +       struct net_bridge *br = source->br;
> +       struct hlist_head *head = &br->hash[br_mac_hash(addr)];
> +       struct net_bridge_fdb_entry *fdb;
> +
> +       fdb = fdb_find(head, addr);
> +       if (fdb)
> +               return -EEXIST;
> +
> +       fdb = fdb_create(head, source, addr);
> +       if (!fdb)
> +               return -ENOMEM;
> +
> +       if (state & NUD_PERMANENT)
> +               fdb->is_local = fdb->is_static = 1;
> +       else if (state & NUD_NOARP)
> +               fdb->is_static = 1;
> +       return 0;
> +}
> +
> +/* Add new permanent fdb entry with RTM_NEWNEIGH */
> +int br_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
> +{
> +       struct net *net = sock_net(skb->sk);
> +       struct ndmsg *ndm;
> +       struct nlattr *tb[NDA_MAX+1];
> +       struct net_device *dev;
> +       struct net_bridge_port *p;
> +       const __u8 *addr;
> +       int err;
> +
> +       ASSERT_RTNL();
> +       err = nlmsg_parse(nlh, sizeof(*ndm), tb, NDA_MAX, NULL);
> +       if (err < 0)
> +               return err;
> +
> +       ndm = nlmsg_data(nlh);
> +       if (ndm->ndm_ifindex == 0) {
> +               pr_info("bridge: RTM_NEWNEIGH with invalid ifindex\n");
> +               return -EINVAL;
> +       }
> +
> +       dev = __dev_get_by_index(net, ndm->ndm_ifindex);
> +       if (dev == NULL) {
> +               pr_info("bridge: RTM_NEWNEIGH with unknown ifindex\n");
> +               return -ENODEV;
> +       }
> +
> +       if (!tb[NDA_LLADDR] || nla_len(tb[NDA_LLADDR]) != ETH_ALEN) {
> +               pr_info("bridge: RTM_NEWNEIGH with invalid address\n");
> +               return -EINVAL;
> +       }
> +
> +       addr = nla_data(tb[NDA_LLADDR]);
> +       if (!is_valid_ether_addr(addr)) {
> +               pr_info("bridge: RTM_NEWNEIGH with invalid ether address\n");
> +               return -EINVAL;
> +       }
> +
> +       p = br_port_get_rtnl(dev);
> +       if (p == NULL) {
> +               pr_info("bridge: RTM_NEWNEIGH %s not a bridge port\n",
> +                       dev->name);
> +               return -EINVAL;
> +       }
> +
> +       spin_lock_bh(&p->br->hash_lock);
> +       err = fdb_add_entry(p, addr, ndm->ndm_state);
> +       spin_unlock_bh(&p->br->hash_lock);
> +
> +       return err;
> +}
> +
> +static int fdb_delete_by_addr(struct net_bridge_port *p, const u8 *addr)
> +{
> +       struct net_bridge *br = p->br;
> +       struct hlist_head *head = &br->hash[br_mac_hash(addr)];
> +       struct net_bridge_fdb_entry *fdb;
> +
> +       fdb = fdb_find(head, addr);
> +       if (!fdb)
> +               return -ENOENT;
> +
> +       fdb_delete(fdb);
> +       return 0;
> +}
> +
> +/* Remove neighbor entry with RTM_DELNEIGH */
> +int br_fdb_delete(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
> +{
> +       struct net *net = sock_net(skb->sk);
> +       struct ndmsg *ndm;
> +       struct net_bridge_port *p;
> +       struct nlattr *llattr;
> +       const __u8 *addr;
> +       struct net_device *dev;
> +       int err;
> +
> +       ASSERT_RTNL();
> +       if (nlmsg_len(nlh) < sizeof(*ndm))
> +               return -EINVAL;
> +
> +       ndm = nlmsg_data(nlh);
> +       if (ndm->ndm_ifindex == 0) {
> +               pr_info("bridge: RTM_DELNEIGH with invalid ifindex\n");
> +               return -EINVAL;
> +       }
> +
> +       dev = __dev_get_by_index(net, ndm->ndm_ifindex);
> +       if (dev == NULL) {
> +               pr_info("bridge: RTM_DELNEIGH with unknown ifindex\n");
> +               return -ENODEV;
> +       }
> +
> +       llattr = nlmsg_find_attr(nlh, sizeof(*ndm), NDA_LLADDR);
> +       if (llattr == NULL || nla_len(llattr) != ETH_ALEN) {
> +               pr_info("bridge: RTM_DELNEIGH with invalid address\n");
> +               return -EINVAL;
> +       }
> +
> +       addr = nla_data(llattr);
> +
> +       p = br_port_get_rtnl(dev);
> +       if (p == NULL) {
> +               pr_info("bridge: RTM_DELNEIGH %s not a bridge port\n",
> +                       dev->name);
> +               return -EINVAL;
> +       }
> +
> +       spin_lock_bh(&p->br->hash_lock);
> +       err = fdb_delete_by_addr(p, addr);
> +       spin_unlock_bh(&p->br->hash_lock);
> +
> +       return err;
> +}
> --- a/net/bridge/br_netlink.c   2011-03-22 10:25:00.329008182 -0700
> +++ b/net/bridge/br_netlink.c   2011-03-22 10:25:01.057042585 -0700
> @@ -196,6 +196,9 @@ int __init br_netlink_init(void)
>
>        /* Only the first call to __rtnl_register can fail */
>        __rtnl_register(PF_BRIDGE, RTM_SETLINK, br_rtm_setlink, NULL);
> +
> +       __rtnl_register(PF_BRIDGE, RTM_NEWNEIGH, br_fdb_add, NULL);
> +       __rtnl_register(PF_BRIDGE, RTM_DELNEIGH, br_fdb_delete, NULL);
>        __rtnl_register(PF_BRIDGE, RTM_GETNEIGH, NULL, br_fdb_dump);
>
>        return 0;
> --- a/net/bridge/br_private.h   2011-03-22 10:25:00.329008182 -0700
> +++ b/net/bridge/br_private.h   2011-03-22 10:25:01.057042585 -0700
> @@ -355,6 +355,8 @@ extern void br_fdb_update(struct net_bri
>                          struct net_bridge_port *source,
>                          const unsigned char *addr);
>  extern int br_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb);
> +extern int br_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg);
> +extern int br_fdb_delete(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg);
>
>  /* br_forward.c */
>  extern void br_deliver(const struct net_bridge_port *to,
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


* Re: [PATCH v2 2/2] virtio-net: Prevent NULL dereference
From: Michael S. Tsirkin @ 2011-10-05 18:59 UTC (permalink / raw)
  To: Sasha Levin; +Cc: linux-kernel, Rusty Russell, virtualization, netdev, kvm
In-Reply-To: <1317822614.3676.0.camel@lappy>

On Wed, Oct 05, 2011 at 03:50:14PM +0200, Sasha Levin wrote:
> On Mon, 2011-10-03 at 20:40 +0200, Michael S. Tsirkin wrote:
> > On Wed, Sep 28, 2011 at 05:40:55PM +0300, Sasha Levin wrote:
> > > This patch prevents a NULL dereference when the user has passed a length
> > > longer than an actual buffer to virtio-net.
> > > 
> > > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > > Cc: virtualization@lists.linux-foundation.org
> > > Cc: netdev@vger.kernel.org
> > > Cc: kvm@vger.kernel.org
> > > Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
> > > ---
> > >  drivers/net/virtio_net.c |   12 +++++++++++-
> > >  1 files changed, 11 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index bde0dec..4a53d2a 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -208,12 +208,22 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > >  		return NULL;
> > >  	}
> > >  
> > > -	while (len) {
> > > +	while (len && page) {
> > >  		set_skb_frag(skb, page, offset, &len);
> > >  		page = (struct page *)page->private;
> > >  		offset = 0;
> > >  	}
> > >  
> > > +	/*
> > > +	 * This is the case where we ran out of pages in our linked list, but
> > > +	 * supposedly have more data to read.
> > 
> > Again, let's clarify that this only happens with broken devices.
> 
> I think that the code within the if() makes it clear that it isn't the
> regular path.

It doesn't make it clear that this never happens in absence of bugs.
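
To make the case under discussion concrete, here is a minimal userspace
sketch of the guarded loop, assuming a simplified page chain. The
ex_page struct, ex_consume_pages() and EX_PAGE_SIZE are hypothetical
stand-ins, not the driver's types. A non-zero return corresponds to the
"ran out of pages but supposedly have more data" situation, which
should only arise when the claimed length is wrong.

```c
#include <assert.h>
#include <stddef.h>

#define EX_PAGE_SIZE 4096u /* placeholder page size for the sketch */

/* Hypothetical stand-in for a page chained through page->private. */
struct ex_page {
	struct ex_page *next;
};

/*
 * Consume up to len bytes from the chain, one page at a time, starting
 * at offset within the first page. Returns the number of bytes still
 * unaccounted for when the chain ends; non-zero means the claimed
 * length was longer than the buffer actually provided.
 */
static size_t ex_consume_pages(const struct ex_page *page, size_t offset,
			       size_t len)
{
	assert(offset < EX_PAGE_SIZE);

	while (len && page) {
		size_t avail = EX_PAGE_SIZE - offset;
		size_t take = len < avail ? len : avail;

		len -= take;
		page = page->next;
		offset = 0;
	}
	return len;
}
```

Without the "&& page" test the loop would dereference a NULL pointer as
soon as the chain ended with len still positive.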

> -- 
> 
> Sasha.


* Re: [PATCH v2 1/2] virtio-net: Verify page list size before fitting into  skb
From: Michael S. Tsirkin @ 2011-10-05 18:59 UTC (permalink / raw)
  To: Sasha Levin; +Cc: linux-kernel, Rusty Russell, virtualization, netdev, kvm
In-Reply-To: <1317822654.3676.1.camel@lappy>

On Wed, Oct 05, 2011 at 03:50:54PM +0200, Sasha Levin wrote:
> On Mon, 2011-10-03 at 21:04 +0200, Michael S. Tsirkin wrote:
> > On Wed, Sep 28, 2011 at 05:40:54PM +0300, Sasha Levin wrote:
> > > This patch verifies that the length of a buffer stored in a linked list
> > > of pages is small enough to fit into a skb.
> > > 
> > > If the size is larger than a max size of a skb, it means that we shouldn't
> > > go ahead building skbs anyway since we won't be able to send the buffer as
> > > the user requested.
> > > 
> > > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > > Cc: virtualization@lists.linux-foundation.org
> > > Cc: netdev@vger.kernel.org
> > > Cc: kvm@vger.kernel.org
> > > Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
> > > ---
> > >  drivers/net/virtio_net.c |   13 +++++++++++++
> > >  1 files changed, 13 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 0c7321c..bde0dec 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -195,6 +195,19 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > >  	len -= copy;
> > >  	offset += copy;
> > >  
> > > +	/*
> > > +	 * Verify that we can indeed put this data into a skb.
> > > +	 * This is here to handle cases when the device erroneously
> > > +	 * tries to receive more than is possible. This is usually
> > > +	 * the case of a broken device.
> > > +	 */
> > > +	if (unlikely(len > MAX_SKB_FRAGS * PAGE_SIZE)) {
> > > +		if (net_ratelimit())
> > > +			pr_debug("%s: too much data\n", skb->dev->name);
> > > +		dev_kfree_skb(skb);
> > > +		return NULL;
> > > +	}
> > > +
> > 
> > BTW, receive_mergeable does
> >                         pr_debug("%s: packet too long\n", skb->dev->name);
> >                         skb->dev->stats.rx_length_errors++;
> > 
> > which makes sense.
> 
> Do you think we should increase rx_length_errors here as well?

This is all a debugging tool for devices/drivers, right?
So maybe it is not worth the noise.

> -- 
> 
> Sasha.


* Re: [PATCH net-next] Add ethtool -g support to 8139cp
From: Ben Hutchings @ 2011-10-05 18:50 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev
In-Reply-To: <20111005180854.3D68129003A8@tardy>

On Wed, 2011-10-05 at 11:08 -0700, Rick Jones wrote:
> From: Rick Jones <rick.jones2@hp.com>
> 
> Add support for reporting ring sizes via ethtool -g to the 8139cp driver.
> 
> Signed-off-by: Rick Jones <rick.jones2@hp.com>
> 
> ---
> 
> diff --git a/drivers/net/ethernet/realtek/8139cp.c b/drivers/net/ethernet/realtek/8139cp.c
> index c77d5af..1dcfe57 100644
> --- a/drivers/net/ethernet/realtek/8139cp.c
> +++ b/drivers/net/ethernet/realtek/8139cp.c
> @@ -1324,6 +1324,19 @@ static void cp_get_drvinfo (struct net_device *dev, struct ethtool_drvinfo *info
>  	strcpy (info->bus_info, pci_name(cp->pdev));
>  }
>  
> +static void cp_get_ringparam(struct net_device *dev,
> +				struct ethtool_ringparam *ring)
> +{
> +	ring->rx_max_pending = CP_RX_RING_SIZE;
> +	ring->tx_max_pending = CP_TX_RING_SIZE;
> +	ring->rx_mini_max_pending = 0;
> +	ring->rx_jumbo_max_pending = 0;
> +	ring->rx_pending = CP_RX_RING_SIZE;
> +	ring->tx_pending = CP_TX_RING_SIZE;
> +	ring->rx_mini_pending = 0;
> +	ring->rx_jumbo_pending = 0;
> +}

You generally don't need to set fields to zero in ethtool 'get'
operations, as the ethtool core will initialise the entire structure to
zero.

Ben.
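
The point above can be sketched in userspace as follows, assuming the
caller zeroes the structure before invoking the driver callback. All
names here (example_ringparam, example_get_ringparam,
example_query_ring, EXAMPLE_*_RING_SIZE) are hypothetical stand-ins,
and the ring sizes are placeholders rather than the driver's values.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder ring sizes for illustration; the driver defines its own. */
#define EXAMPLE_RX_RING_SIZE 64
#define EXAMPLE_TX_RING_SIZE 64

/* Userspace stand-in for the kernel's struct ethtool_ringparam. */
struct example_ringparam {
	uint32_t cmd;
	uint32_t rx_max_pending;
	uint32_t rx_mini_max_pending;
	uint32_t rx_jumbo_max_pending;
	uint32_t tx_max_pending;
	uint32_t rx_pending;
	uint32_t rx_mini_pending;
	uint32_t rx_jumbo_pending;
	uint32_t tx_pending;
};

/* Driver-side getter: only the fields the hardware supports are set. */
static void example_get_ringparam(struct example_ringparam *ring)
{
	ring->rx_max_pending = EXAMPLE_RX_RING_SIZE;
	ring->tx_max_pending = EXAMPLE_TX_RING_SIZE;
	ring->rx_pending = EXAMPLE_RX_RING_SIZE;
	ring->tx_pending = EXAMPLE_TX_RING_SIZE;
}

/* Models the caller: zero-initialise the structure, then call the getter. */
static struct example_ringparam example_query_ring(void)
{
	struct example_ringparam ring = { .cmd = 0 };

	example_get_ringparam(&ring);
	return ring;
}
```

The mini and jumbo fields come back as zero without the getter touching
them, because the initialiser clears every member it does not name.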

>  static int cp_get_regs_len(struct net_device *dev)
>  {
>  	return CP_REGS_SIZE;
> @@ -1525,6 +1538,7 @@ static const struct ethtool_ops cp_ethtool_ops = {
>  	.get_eeprom_len		= cp_get_eeprom_len,
>  	.get_eeprom		= cp_get_eeprom,
>  	.set_eeprom		= cp_set_eeprom,
> +	.get_ringparam		= cp_get_ringparam,
>  };
>  
>  static int cp_ioctl (struct net_device *dev, struct ifreq *rq, int cmd)

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

