Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 63/72] tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled
From: YOSHIFUJI Hideaki @ 2010-10-21 21:07 UTC (permalink / raw)
  To: kaber; +Cc: bazsi, hidden, davem, netfilter-devel, netdev, yoshfuji
In-Reply-To: <1287674399-31455-64-git-send-email-kaber@trash.net>

Hello.

On 2010-10-21, kaber@trash.net wrote:
> From: Balazs Scheidler <bazsi@balabit.hu>
> 
> Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
> Signed-off-by: Patrick McHardy <kaber@trash.net>
> ---
>  net/ipv6/af_inet6.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> index 56b9bf2..4869797 100644
> --- a/net/ipv6/af_inet6.c
> +++ b/net/ipv6/af_inet6.c
> @@ -343,7 +343,8 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
>  			 */
>  			v4addr = LOOPBACK4_IPV6;
>  			if (!(addr_type & IPV6_ADDR_MULTICAST))	{
> -				if (!ipv6_chk_addr(net, &addr->sin6_addr,
> +				if (!inet->transparent &&
> +				    !ipv6_chk_addr(net, &addr->sin6_addr,
>  						   dev, 0)) {
>  					err = -EADDRNOTAVAIL;
>  					goto out_unlock;

Sorry, NAK.

http://kerneltrap.org/mailarchive/linux-netdev/2010/7/5/6280572

--yoshfuji


^ permalink raw reply

* Re: [PATCH 4/9] tproxy: added tproxy sockopt interface in the IPV6 layer
From: YOSHIFUJI Hideaki @ 2010-10-21 21:09 UTC (permalink / raw)
  To: KOVACS Krisztian
  Cc: Jan Engelhardt, netdev, netfilter-devel, Patrick McHardy,
	David Miller, yoshfuji
In-Reply-To: <1287650781.13326.1.camel@este.odu>

On 2010-10-21, KOVACS Krisztian wrote:
> Hi,
> 
> On Thu, 2010-10-21 at 10:39 +0200, Jan Engelhardt wrote:
> > On Wednesday 2010-10-20 13:21, KOVACS Krisztian wrote:
> > 
> > >@@ -268,6 +268,10 @@ struct in6_flowlabel_req {
> > > /* RFC5082: Generalized Ttl Security Mechanism */
> > > #define IPV6_MINHOPCOUNT		73
> > > 
> > >+#define IPV6_ORIGDSTADDR        74
> > >+#define IPV6_RECVORIGDSTADDR    IPV6_ORIGDSTADDR
> > >+#define IPV6_TRANSPARENT        75
> > >+
> > 
> > Why do we actually need two names for the same thing?
> 
> IPV6_RECVORIGDSTADDR is the name of the socket option you're supposed to
> set if you require the original destination address. IPV6_ORIGDSTADDR is
> the name of the ancillary message you get with the actual address in it.
> Just like we have it for IP_TOS/IP_RECVTOS, for example.

I agree.

--yoshfuji


^ permalink raw reply

* Re: [PATCH 5/9] tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled
From: YOSHIFUJI Hideaki @ 2010-10-21 21:24 UTC (permalink / raw)
  To: Balazs Scheidler
  Cc: KOVACS Krisztian, netdev, netfilter-devel, Patrick McHardy,
	David Miller, yoshfuji
In-Reply-To: <1287583653.29676.9.camel@bzorp.lan>

Hello.

2010-10-20, Balazs Scheidler wrote:
> On Wed, 2010-10-20 at 21:45 +0900, YOSHIFUJI Hideaki wrote:
> > (2010/10/20 20:21), KOVACS Krisztian wrote:
> > > From: Balazs Scheidler<bazsi@balabit.hu>
> > > 
> > > Signed-off-by: Balazs Scheidler<bazsi@balabit.hu>
> > > Signed-off-by: KOVACS Krisztian<hidden@balabit.hu>
> > > ---
> > >   net/ipv6/af_inet6.c |    2 +-
> > >   1 files changed, 1 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> > > index 6022098..9480572 100644
> > > --- a/net/ipv6/af_inet6.c
> > > +++ b/net/ipv6/af_inet6.c
> > > @@ -343,7 +343,7 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
> > >   			 */
> > >   			v4addr = LOOPBACK4_IPV6;
> > >   			if (!(addr_type&  IPV6_ADDR_MULTICAST))	{
> > > -				if (!ipv6_chk_addr(net,&addr->sin6_addr,
> > > +				if (!inet->transparent&&  !ipv6_chk_addr(net,&addr->sin6_addr,
> > >   						   dev, 0)) {
> > >   					err = -EADDRNOTAVAIL;
> > >   					goto out_unlock;
> > > 
> > > 
> > 
> > As I wrote before in other thread, this does not seem sufficient --
> > well, it is sufficient to allow non-local bind, but before we're
> > allowing this, we need add checks of source address in sending side.
> 
> Can you please elaborate or point us to the other thread? Is it some
> kind of address-type check that we miss?

Please see my comment at:
<http://kerneltrap.org/mailarchive/linux-netdev/2010/7/5/6280572>

This will result in allowing non-privileged users easily sending from
non-local / unauthorized address, which is not good, and which should
not be allowed from security aspects.

Regards,

--yoshfuji


^ permalink raw reply

* [PATCH 1/2] cxgb4: fix crash due to manipulating queues before registration
From: Dimitris Michailidis @ 2010-10-21 21:29 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1287696596-15175-1-git-send-email-dm@chelsio.com>

Before commit "net: allocate tx queues in register_netdevice"
netif_tx_stop_all_queues and related functions could be used between
device allocation and registration but now only after registration.
cxgb4 has such a call before registration and crashes now.  Move it
after register_netdev.

Signed-off-by: Dimitris Michailidis <dm@chelsio.com>
---
 drivers/net/cxgb4/cxgb4_main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/cxgb4/cxgb4_main.c b/drivers/net/cxgb4/cxgb4_main.c
index 930bd07..bc354ee 100644
--- a/drivers/net/cxgb4/cxgb4_main.c
+++ b/drivers/net/cxgb4/cxgb4_main.c
@@ -3657,7 +3657,6 @@ static int __devinit init_one(struct pci_dev *pdev,
 		pi->rx_offload = RX_CSO;
 		pi->port_id = i;
 		netif_carrier_off(netdev);
-		netif_tx_stop_all_queues(netdev);
 		netdev->irq = pdev->irq;
 
 		netdev->features |= NETIF_F_SG | TSO_FLAGS;
@@ -3729,6 +3728,7 @@ static int __devinit init_one(struct pci_dev *pdev,
 
 			__set_bit(i, &adapter->registered_device_map);
 			adapter->chan_map[adap2pinfo(adapter, i)->tx_chan] = i;
+			netif_tx_stop_all_queues(adapter->port[i]);
 		}
 	}
 	if (!adapter->registered_device_map) {
-- 
1.5.4


^ permalink raw reply related

* [PATCH 2/2] cxgb4: update to utilize the newer VLAN infrastructure
From: Dimitris Michailidis @ 2010-10-21 21:29 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1287696596-15175-2-git-send-email-dm@chelsio.com>

Signed-off-by: Dimitris Michailidis <dm@chelsio.com>
---
 drivers/net/cxgb4/cxgb4.h      |    1 -
 drivers/net/cxgb4/cxgb4_main.c |   31 +++++++++++++++++++------------
 drivers/net/cxgb4/sge.c        |   23 +++++------------------
 3 files changed, 24 insertions(+), 31 deletions(-)

diff --git a/drivers/net/cxgb4/cxgb4.h b/drivers/net/cxgb4/cxgb4.h
index eaa49e4..3d4253d 100644
--- a/drivers/net/cxgb4/cxgb4.h
+++ b/drivers/net/cxgb4/cxgb4.h
@@ -281,7 +281,6 @@ struct sge_rspq;
 
 struct port_info {
 	struct adapter *adapter;
-	struct vlan_group *vlan_grp;
 	u16    viid;
 	s16    xact_addr_filt;        /* index of exact MAC address filter */
 	u16    rss_size;              /* size of VI's RSS table slice */
diff --git a/drivers/net/cxgb4/cxgb4_main.c b/drivers/net/cxgb4/cxgb4_main.c
index bc354ee..26a88a0 100644
--- a/drivers/net/cxgb4/cxgb4_main.c
+++ b/drivers/net/cxgb4/cxgb4_main.c
@@ -403,7 +403,7 @@ static int link_start(struct net_device *dev)
 	 * that step explicitly.
 	 */
 	ret = t4_set_rxmode(pi->adapter, mb, pi->viid, dev->mtu, -1, -1, -1,
-			    pi->vlan_grp != NULL, true);
+			    !!(dev->features & NETIF_F_HW_VLAN_RX), true);
 	if (ret == 0) {
 		ret = t4_change_mac(pi->adapter, mb, pi->viid,
 				    pi->xact_addr_filt, dev->dev_addr, true,
@@ -1881,7 +1881,24 @@ static int set_tso(struct net_device *dev, u32 value)
 
 static int set_flags(struct net_device *dev, u32 flags)
 {
-	return ethtool_op_set_flags(dev, flags, ETH_FLAG_RXHASH);
+	int err;
+	unsigned long old_feat = dev->features;
+
+	err = ethtool_op_set_flags(dev, flags, ETH_FLAG_RXHASH |
+				   ETH_FLAG_RXVLAN | ETH_FLAG_TXVLAN);
+	if (err)
+		return err;
+
+	if ((old_feat ^ dev->features) & NETIF_F_HW_VLAN_RX) {
+		const struct port_info *pi = netdev_priv(dev);
+
+		err = t4_set_rxmode(pi->adapter, pi->adapter->fn, pi->viid, -1,
+				    -1, -1, -1, !!(flags & ETH_FLAG_RXVLAN),
+				    true);
+		if (err)
+			dev->features = old_feat;
+	}
+	return err;
 }
 
 static int get_rss_table(struct net_device *dev, struct ethtool_rxfh_indir *p)
@@ -2841,15 +2858,6 @@ static int cxgb_set_mac_addr(struct net_device *dev, void *p)
 	return 0;
 }
 
-static void vlan_rx_register(struct net_device *dev, struct vlan_group *grp)
-{
-	struct port_info *pi = netdev_priv(dev);
-
-	pi->vlan_grp = grp;
-	t4_set_rxmode(pi->adapter, pi->adapter->fn, pi->viid, -1, -1, -1, -1,
-		      grp != NULL, true);
-}
-
 #ifdef CONFIG_NET_POLL_CONTROLLER
 static void cxgb_netpoll(struct net_device *dev)
 {
@@ -2877,7 +2885,6 @@ static const struct net_device_ops cxgb4_netdev_ops = {
 	.ndo_validate_addr    = eth_validate_addr,
 	.ndo_do_ioctl         = cxgb_ioctl,
 	.ndo_change_mtu       = cxgb_change_mtu,
-	.ndo_vlan_rx_register = vlan_rx_register,
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller  = cxgb_netpoll,
 #endif
diff --git a/drivers/net/cxgb4/sge.c b/drivers/net/cxgb4/sge.c
index 9967f3d..1702225 100644
--- a/drivers/net/cxgb4/sge.c
+++ b/drivers/net/cxgb4/sge.c
@@ -1530,18 +1530,11 @@ static void do_gro(struct sge_eth_rxq *rxq, const struct pkt_gl *gl,
 		skb->rxhash = (__force u32)pkt->rsshdr.hash_val;
 
 	if (unlikely(pkt->vlan_ex)) {
-		struct port_info *pi = netdev_priv(rxq->rspq.netdev);
-		struct vlan_group *grp = pi->vlan_grp;
-
+		__vlan_hwaccel_put_tag(skb, ntohs(pkt->vlan));
 		rxq->stats.vlan_ex++;
-		if (likely(grp)) {
-			ret = vlan_gro_frags(&rxq->rspq.napi, grp,
-					     ntohs(pkt->vlan));
-			goto stats;
-		}
 	}
 	ret = napi_gro_frags(&rxq->rspq.napi);
-stats:	if (ret == GRO_HELD)
+	if (ret == GRO_HELD)
 		rxq->stats.lro_pkts++;
 	else if (ret == GRO_MERGED || ret == GRO_MERGED_FREE)
 		rxq->stats.lro_merged++;
@@ -1608,16 +1601,10 @@ int t4_ethrx_handler(struct sge_rspq *q, const __be64 *rsp,
 		skb_checksum_none_assert(skb);
 
 	if (unlikely(pkt->vlan_ex)) {
-		struct vlan_group *grp = pi->vlan_grp;
-
+		__vlan_hwaccel_put_tag(skb, ntohs(pkt->vlan));
 		rxq->stats.vlan_ex++;
-		if (likely(grp))
-			vlan_hwaccel_receive_skb(skb, grp, ntohs(pkt->vlan));
-		else
-			dev_kfree_skb_any(skb);
-	} else
-		netif_receive_skb(skb);
-
+	}
+	netif_receive_skb(skb);
 	return 0;
 }
 
-- 
1.5.4


^ permalink raw reply related

* [PATCH 0/2] cxgb4 updates
From: Dimitris Michailidis @ 2010-10-21 21:29 UTC (permalink / raw)
  To: netdev


Here are two patches for cxgb4.  The first fixes a crash triggered by
e6484930d7c73d324bccda7d43d131088da697b9.  The second updates the driver
to utilize the newer VLAN infrastructure.  If it's too late for the latter
let me know and I'll resend it when net-next reopens.

^ permalink raw reply

* [PATCH 1/2] vlan: Calling vlan_hwaccel_do_receive() is always valid.
From: Jesse Gross @ 2010-10-21 21:30 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

It is now acceptable to receive vlan tagged packets at any time,
even if CONFIG_VLAN_8021Q is not set.  This means that calling
vlan_hwaccel_do_receive() should not result in BUG() but rather just
behave as if there were no vlan devices configured.

Reported-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
---
 include/linux/if_vlan.h |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index e607256..cbd3dcd 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -153,7 +153,8 @@ static inline u16 vlan_dev_vlan_id(const struct net_device *dev)
 
 static inline bool vlan_hwaccel_do_receive(struct sk_buff **skb)
 {
-	BUG();
+	if ((*skb)->vlan_tci & VLAN_VID_MASK)
+		(*skb)->pkt_type = PACKET_OTHERHOST;
 	return false;
 }
 #endif
-- 
1.7.1


^ permalink raw reply related

* [PATCH 2/2] bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL.
From: Jesse Gross @ 2010-10-21 21:30 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <1287696643-9695-1-git-send-email-jesse@nicira.com>

Some cards don't support changing vlan offloading settings.  Make
Ethtool set_flags return -EINVAL in those cases.

Reported-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
---
 drivers/net/bnx2.c                |    2 +-
 drivers/net/bnx2x/bnx2x_ethtool.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index bf3c830..062600b 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -7528,7 +7528,7 @@ bnx2_set_flags(struct net_device *dev, u32 data)
 
 	if (!(bp->flags & BNX2_FLAG_CAN_KEEP_VLAN) &&
 	    !(data & ETH_FLAG_RXVLAN))
-		return -EOPNOTSUPP;
+		return -EINVAL;
 
 	rc = ethtool_op_set_flags(dev, data, ETH_FLAG_RXHASH | ETH_FLAG_RXVLAN |
 				  ETH_FLAG_TXVLAN);
diff --git a/drivers/net/bnx2x/bnx2x_ethtool.c b/drivers/net/bnx2x/bnx2x_ethtool.c
index daefef6..d02ffbd 100644
--- a/drivers/net/bnx2x/bnx2x_ethtool.c
+++ b/drivers/net/bnx2x/bnx2x_ethtool.c
@@ -1123,7 +1123,7 @@ static int bnx2x_set_flags(struct net_device *dev, u32 data)
 	}
 
 	if (!(data & ETH_FLAG_RXVLAN))
-		return -EOPNOTSUPP;
+		return -EINVAL;
 
 	if ((data & ETH_FLAG_LRO) && bp->rx_csum && bp->disable_tpa)
 		return -EINVAL;
-- 
1.7.1


^ permalink raw reply related

* Re: [PATCH v2 11/14] bnx2x: Update bnx2x to use new vlan accleration.
From: Jesse Gross @ 2010-10-21 21:34 UTC (permalink / raw)
  To: Vladislav Zolotarov
  Cc: David Miller, netdev@vger.kernel.org, Hao Zheng, Eilon Greenstein
In-Reply-To: <8628FE4E7912BF47A96AE7DD7BAC0AADDDEE429137@SJEXCHCCR02.corp.ad.broadcom.com>

On Thu, Oct 21, 2010 at 7:02 AM, Vladislav Zolotarov <vladz@broadcom.com> wrote:
>> Guys, when I compiled the kernel with these patches without VLAN
>> support (CONFIG_VLAN_8021Q is not set) and tried to send VLAN tagged
>> frames from the remote side to the bnx2x interface the kernel panicked.
>>
>> The stack trace got cut with the __netif_receive_skb() on top by the
>> IPKVM and I'll have to connect a serial to get it all. But until I
>> did that maybe somebody will have any ideas anyway...
>>
>> It happens regardless there is HW RX VLAN stripping enabled or not.
>
> When RX VLAN stripping is enabled we hit the BUG() in the
> vlan_hwaccel_do_receive().

Thanks, this is just a stupid mistake - I didn't update the
non-CONFIG_VLAN_8021Q version when I changed the semantics of that
function.  I've sent out a patch to fix it.

^ permalink raw reply

* Re: [PATCH v2 11/14] bnx2x: Update bnx2x to use new vlan accleration.
From: Jesse Gross @ 2010-10-21 21:36 UTC (permalink / raw)
  To: Vladislav Zolotarov
  Cc: David Miller, netdev@vger.kernel.org, Hao Zheng, Eilon Greenstein
In-Reply-To: <8628FE4E7912BF47A96AE7DD7BAC0AADDDEE42913C@SJEXCHCCR02.corp.ad.broadcom.com>

On Thu, Oct 21, 2010 at 7:50 AM, Vladislav Zolotarov <vladz@broadcom.com> wrote:
>> > Guys, when I compiled the kernel with these patches without VLAN
>> > support (CONFIG_VLAN_8021Q is not set) and tried to send VLAN tagged
>> > frames from the remote side to the bnx2x interface the kernel
>> panicked.
>> >
>> > The stack trace got cut with the __netif_receive_skb() on top by the
>> > IPKVM and I'll have to connect a serial to get it all. But until I
>> > did that maybe somebody will have any ideas anyway...
>> >
>> > It happens regardless there is HW RX VLAN stripping enabled or not.
>>
>> When RX VLAN stripping is enabled we hit the BUG() in the
>> vlan_hwaccel_do_receive().:
>
> We hit the same BUG() both when VLAN stripping is disabled.

This one surprises me because that function shouldn't get called at
all when VLAN stripping is disabled.  Are you sure that it is
disabled?  From my reading of the bnx2x driver it seems like it is
always enabled.

^ permalink raw reply

* Re: [PATCH v2 09/14] bnx2: Update bnx2 to use new vlan accleration.
From: Jesse Gross @ 2010-10-21 21:38 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, netdev, Michael Chan
In-Reply-To: <1287675106.2235.10.camel@achroite.uk.solarflarecom.com>

On Thu, Oct 21, 2010 at 8:31 AM, Ben Hutchings
<bhutchings@solarflare.com> wrote:
> On Wed, 2010-10-20 at 16:56 -0700, Jesse Gross wrote:
>> Make the bnx2 driver use the new vlan accleration model.
>>
>> Signed-off-by: Jesse Gross <jesse@nicira.com>
>> CC: Michael Chan <mchan@broadcom.com>
>> ---
>>  drivers/net/bnx2.c |   97 +++++++++++++++-------------------------------------
>>  drivers/net/bnx2.h |    4 --
>>  2 files changed, 28 insertions(+), 73 deletions(-)
>>
>> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
>> index 363ca8b..bf3c830 100644
>> --- a/drivers/net/bnx2.c
>> +++ b/drivers/net/bnx2.c
> [...]
>> @@ -7578,7 +7523,28 @@ bnx2_set_tx_csum(struct net_device *dev, u32 data)
>>  static int
>>  bnx2_set_flags(struct net_device *dev, u32 data)
>>  {
>> -     return ethtool_op_set_flags(dev, data, ETH_FLAG_RXHASH);
>> +     struct bnx2 *bp = netdev_priv(dev);
>> +     int rc;
>> +
>> +     if (!(bp->flags & BNX2_FLAG_CAN_KEEP_VLAN) &&
>> +         !(data & ETH_FLAG_RXVLAN))
>> +             return -EOPNOTSUPP;
> [...]
>
> Should be -EINVAL.

OK, thanks.  I've sent out a patch to fix this (and a similar one in
the bnx2x driver).

^ permalink raw reply

* Re: [PATCH v2 04/14] vlan: Enable software emulation for vlan accleration.
From: Jesse Gross @ 2010-10-21 21:44 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, netdev
In-Reply-To: <1287675008.2235.8.camel@achroite.uk.solarflarecom.com>

On Thu, Oct 21, 2010 at 8:30 AM, Ben Hutchings
<bhutchings@solarflare.com> wrote:
> On Wed, 2010-10-20 at 16:56 -0700, Jesse Gross wrote:
>> Currently users of hardware vlan accleration need to know whether
>> the device supports it before generating packets.  However, vlan
>> acceleration will soon be available in a more flexible manner so
>> knowing ahead of time becomes much more difficult.  This adds
>> a software fallback path for vlan packets on devices without the
>> necessary offloading support, similar to other types of hardware
>> accleration.
> [...]
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 4c3ac53..1bfd96b 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -1694,7 +1694,12 @@ static bool can_checksum_protocol(unsigned long features, __be16 protocol)
>>
>>  static bool dev_can_checksum(struct net_device *dev, struct sk_buff *skb)
>>  {
>> -     if (can_checksum_protocol(dev->features, skb->protocol))
>> +     int features = dev->features;
>> +
>> +     if (vlan_tx_tag_present(skb))
>> +             features &= dev->vlan_features;
>> +
>> +     if (can_checksum_protocol(features, skb->protocol))
>>               return true;
>>
>>       if (skb->protocol == htons(ETH_P_8021Q)) {
> [...]
>
> Additional context:
>
>                struct vlan_ethhdr *veh = (struct vlan_ethhdr *)skb->data;
>                if (can_checksum_protocol(dev->features & dev->vlan_features,
>                                          veh->h_vlan_encapsulated_proto))
>                        return true;
>        }
>
>        return false;
> }
>
> I don't think this will do the right thing if the NIC does VLAN tag
> insertion and checksum offload with only one layer of VLAN
> encapsulation, but the skb has two layers of VLAN encapsulation.
>
> I think we actually want something like:
>
> static bool dev_can_checksum(struct net_device *dev, struct sk_buff *skb)
> {
>        __be16 protocol = skb->protocol;
>        int features = dev->features;
>
>        if (vlan_tx_tag_present(skb)) {
>                features &= dev->vlan_features;
>        } else if (skb->protocol == htons(ETH_P_8021Q)) {
>                struct vlan_ethhdr *veh = (struct vlan_ethhdr *)skb->data;
>                protocol = veh->h_vlan_encapsulated_proto;
>                features &= dev->vlan_features;
>        }
>
>        return can_checksum_protocol(features, protocol);
> }
>
> Does that look right?

Thanks, good catch.  Yes, that looks right.  Will you submit a patch
to fix this?

^ permalink raw reply

* [RFC][net-next-2.6 PATCH 1/4] net: consolidate 8021q tagging
From: John Fastabend @ 2010-10-21 22:10 UTC (permalink / raw)
  To: netdev; +Cc: jesse

Now that VLAN packets are tagged in dev_hard_start_xmit()
at the bottom of the stack we no longer need to tag them
in the 8021Q module (Except in the !VLAN_FLAG_REORDER_HDR
case).

This allows the accel path and non accel paths to be consolidated.
Here the vlan_tci in the skb is always set and we allow the
stack to add the actual tag in dev_hard_start_xmit().

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 net/8021q/vlan_dev.c |  105 +++-----------------------------------------------
 1 files changed, 7 insertions(+), 98 deletions(-)

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 14e3d1f..78b1618 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -326,24 +326,12 @@ static netdev_tx_t vlan_dev_hard_start_xmit(struct sk_buff *skb,
 	 */
 	if (veth->h_vlan_proto != htons(ETH_P_8021Q) ||
 	    vlan_dev_info(dev)->flags & VLAN_FLAG_REORDER_HDR) {
-		unsigned int orig_headroom = skb_headroom(skb);
 		u16 vlan_tci;
-
-		vlan_dev_info(dev)->cnt_encap_on_xmit++;
-
 		vlan_tci = vlan_dev_info(dev)->vlan_id;
 		vlan_tci |= vlan_dev_get_egress_qos_mask(dev, skb);
-		skb = __vlan_put_tag(skb, vlan_tci);
-		if (!skb) {
-			txq->tx_dropped++;
-			return NETDEV_TX_OK;
-		}
-
-		if (orig_headroom < VLAN_HLEN)
-			vlan_dev_info(dev)->cnt_inc_headroom_on_tx++;
+		skb = __vlan_hwaccel_put_tag(skb, vlan_tci);
 	}
 
-
 	skb_set_dev(skb, vlan_dev_info(dev)->real_dev);
 	len = skb->len;
 	ret = dev_queue_xmit(skb);
@@ -357,32 +345,6 @@ static netdev_tx_t vlan_dev_hard_start_xmit(struct sk_buff *skb,
 	return ret;
 }
 
-static netdev_tx_t vlan_dev_hwaccel_hard_start_xmit(struct sk_buff *skb,
-						    struct net_device *dev)
-{
-	int i = skb_get_queue_mapping(skb);
-	struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
-	u16 vlan_tci;
-	unsigned int len;
-	int ret;
-
-	vlan_tci = vlan_dev_info(dev)->vlan_id;
-	vlan_tci |= vlan_dev_get_egress_qos_mask(dev, skb);
-	skb = __vlan_hwaccel_put_tag(skb, vlan_tci);
-
-	skb->dev = vlan_dev_info(dev)->real_dev;
-	len = skb->len;
-	ret = dev_queue_xmit(skb);
-
-	if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) {
-		txq->tx_packets++;
-		txq->tx_bytes += len;
-	} else
-		txq->tx_dropped++;
-
-	return ret;
-}
-
 static u16 vlan_dev_select_queue(struct net_device *dev, struct sk_buff *skb)
 {
 	struct net_device *rdev = vlan_dev_info(dev)->real_dev;
@@ -719,8 +681,7 @@ static const struct header_ops vlan_header_ops = {
 	.parse	 = eth_header_parse,
 };
 
-static const struct net_device_ops vlan_netdev_ops, vlan_netdev_accel_ops,
-		    vlan_netdev_ops_sq, vlan_netdev_accel_ops_sq;
+static const struct net_device_ops vlan_netdev_ops, vlan_netdev_ops_sq;
 
 static int vlan_dev_init(struct net_device *dev)
 {
@@ -755,19 +716,16 @@ static int vlan_dev_init(struct net_device *dev)
 	if (real_dev->features & NETIF_F_HW_VLAN_TX) {
 		dev->header_ops      = real_dev->header_ops;
 		dev->hard_header_len = real_dev->hard_header_len;
-		if (real_dev->netdev_ops->ndo_select_queue)
-			dev->netdev_ops = &vlan_netdev_accel_ops_sq;
-		else
-			dev->netdev_ops = &vlan_netdev_accel_ops;
 	} else {
 		dev->header_ops      = &vlan_header_ops;
 		dev->hard_header_len = real_dev->hard_header_len + VLAN_HLEN;
-		if (real_dev->netdev_ops->ndo_select_queue)
-			dev->netdev_ops = &vlan_netdev_ops_sq;
-		else
-			dev->netdev_ops = &vlan_netdev_ops;
 	}
 
+	if (real_dev->netdev_ops->ndo_select_queue)
+		dev->netdev_ops = &vlan_netdev_ops_sq;
+	else
+		dev->netdev_ops = &vlan_netdev_ops;
+
 	if (is_vlan_dev(real_dev))
 		subclass = 1;
 
@@ -908,30 +866,6 @@ static const struct net_device_ops vlan_netdev_ops = {
 #endif
 };
 
-static const struct net_device_ops vlan_netdev_accel_ops = {
-	.ndo_change_mtu		= vlan_dev_change_mtu,
-	.ndo_init		= vlan_dev_init,
-	.ndo_uninit		= vlan_dev_uninit,
-	.ndo_open		= vlan_dev_open,
-	.ndo_stop		= vlan_dev_stop,
-	.ndo_start_xmit =  vlan_dev_hwaccel_hard_start_xmit,
-	.ndo_validate_addr	= eth_validate_addr,
-	.ndo_set_mac_address	= vlan_dev_set_mac_address,
-	.ndo_set_rx_mode	= vlan_dev_set_rx_mode,
-	.ndo_set_multicast_list	= vlan_dev_set_rx_mode,
-	.ndo_change_rx_flags	= vlan_dev_change_rx_flags,
-	.ndo_do_ioctl		= vlan_dev_ioctl,
-	.ndo_neigh_setup	= vlan_dev_neigh_setup,
-	.ndo_get_stats64	= vlan_dev_get_stats64,
-#if defined(CONFIG_FCOE) || defined(CONFIG_FCOE_MODULE)
-	.ndo_fcoe_ddp_setup	= vlan_dev_fcoe_ddp_setup,
-	.ndo_fcoe_ddp_done	= vlan_dev_fcoe_ddp_done,
-	.ndo_fcoe_enable	= vlan_dev_fcoe_enable,
-	.ndo_fcoe_disable	= vlan_dev_fcoe_disable,
-	.ndo_fcoe_get_wwn	= vlan_dev_fcoe_get_wwn,
-#endif
-};
-
 static const struct net_device_ops vlan_netdev_ops_sq = {
 	.ndo_select_queue	= vlan_dev_select_queue,
 	.ndo_change_mtu		= vlan_dev_change_mtu,
@@ -957,31 +891,6 @@ static const struct net_device_ops vlan_netdev_ops_sq = {
 #endif
 };
 
-static const struct net_device_ops vlan_netdev_accel_ops_sq = {
-	.ndo_select_queue	= vlan_dev_select_queue,
-	.ndo_change_mtu		= vlan_dev_change_mtu,
-	.ndo_init		= vlan_dev_init,
-	.ndo_uninit		= vlan_dev_uninit,
-	.ndo_open		= vlan_dev_open,
-	.ndo_stop		= vlan_dev_stop,
-	.ndo_start_xmit =  vlan_dev_hwaccel_hard_start_xmit,
-	.ndo_validate_addr	= eth_validate_addr,
-	.ndo_set_mac_address	= vlan_dev_set_mac_address,
-	.ndo_set_rx_mode	= vlan_dev_set_rx_mode,
-	.ndo_set_multicast_list	= vlan_dev_set_rx_mode,
-	.ndo_change_rx_flags	= vlan_dev_change_rx_flags,
-	.ndo_do_ioctl		= vlan_dev_ioctl,
-	.ndo_neigh_setup	= vlan_dev_neigh_setup,
-	.ndo_get_stats64	= vlan_dev_get_stats64,
-#if defined(CONFIG_FCOE) || defined(CONFIG_FCOE_MODULE)
-	.ndo_fcoe_ddp_setup	= vlan_dev_fcoe_ddp_setup,
-	.ndo_fcoe_ddp_done	= vlan_dev_fcoe_ddp_done,
-	.ndo_fcoe_enable	= vlan_dev_fcoe_enable,
-	.ndo_fcoe_disable	= vlan_dev_fcoe_disable,
-	.ndo_fcoe_get_wwn	= vlan_dev_fcoe_get_wwn,
-#endif
-};
-
 void vlan_setup(struct net_device *dev)
 {
 	ether_setup(dev);


^ permalink raw reply related

* [RFC][net-next-2.6 PATCH 4/4] net: remove check for headroom in vlan_dev_create
From: John Fastabend @ 2010-10-21 22:10 UTC (permalink / raw)
  To: netdev; +Cc: jesse
In-Reply-To: <20101021221004.22906.58438.stgit@jf-dev1-dcblab>

It is possible for the headroom to be smaller then the
hard_header_len for a short period of time after toggling
the vlan offload setting.

This is not a hard error and skb_cow_head is called in
__vlan_put_tag() to resolve this.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 net/8021q/vlan_dev.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 1645c3c..e043389 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -238,9 +238,6 @@ static int vlan_dev_hard_header(struct sk_buff *skb, struct net_device *dev,
 {
 	int rc;
 
-	if (WARN_ON(skb_headroom(skb) < dev->hard_header_len))
-		return -ENOSPC;
-
 	/* When this flag is not set we make the vlan_tci visible
 	 * by setting the skb field.
 	 *


^ permalink raw reply related

* [RFC][net-next-2.6 PATCH 3/4] ethtool: set hard_header_len using ETH_FLAG_{TX|RX}VLAN
From: John Fastabend @ 2010-10-21 22:10 UTC (permalink / raw)
  To: netdev; +Cc: jesse
In-Reply-To: <20101021221004.22906.58438.stgit@jf-dev1-dcblab>

Toggling the vlan tx|rx hw offloads needs to set the hard_header_len
as well otherwise we end up using LL_RESERVED_SPACE incorrectly.
This results in pskb_expand_head() being used unnecessarily.

This add a check in ethtool_op_set_flags to catch the ETH_FLAG_TXVLAN
flag and set the header length.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 net/core/ethtool.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 956a9f4..4f7fe26 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -21,6 +21,7 @@
 #include <linux/uaccess.h>
 #include <linux/vmalloc.h>
 #include <linux/slab.h>
+#include <linux/if_vlan.h>
 
 /*
  * Some useful ethtool_ops methods that're device independent.
@@ -151,6 +152,14 @@ int ethtool_op_set_flags(struct net_device *dev, u32 data, u32 supported)
 	if (data & ~supported)
 		return -EINVAL;
 
+	/* is ETH_FLAGS_TXVLAN being toggled */
+	if ((dev->features & ETH_FLAG_TXVLAN) ^ (data & ETH_FLAG_TXVLAN)) {
+		if (data & ETH_FLAG_TXVLAN)
+			dev->hard_header_len -= VLAN_HLEN;
+		else
+			dev->hard_header_len += VLAN_HLEN;
+	}
+
 	dev->features = ((dev->features & ~flags_dup_features) |
 			 (data & flags_dup_features));
 	return 0;


^ permalink raw reply related

* [RFC][net-next-2.6 PATCH 2/4] net: 8021Q consolidate header_ops routines
From: John Fastabend @ 2010-10-21 22:10 UTC (permalink / raw)
  To: netdev; +Cc: jesse
In-Reply-To: <20101021221004.22906.58438.stgit@jf-dev1-dcblab>

The only thing the 8021Q header ops routines are required
for is the VLAN_FLAG_REORDER_HDR otherwise by the time
the VLAN tag has been added the packet is already on
its way down the stack. In this case using the Ethernet
ops works OK.

At present the VLAN_FLAG_REORDER_HDR flag does not work
with vlan offloads. As I understand the flag the intent
is to allow taps on the vlan device and possibly the
QOS layer to see the vlan tag info.

By inserting the tag in vlan_tci any taps or QOS policies
should be able to retrieve the vlan info. This allows
the flag to work the same in both the offload case and
non-offloaded case. And allows us to use the underlying
ethernet ops.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 net/8021q/vlan_dev.c |   83 +++++++++++++-------------------------------------
 1 files changed, 21 insertions(+), 62 deletions(-)

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 78b1618..1645c3c 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -32,39 +32,6 @@
 #include "vlanproc.h"
 #include <linux/if_vlan.h>
 
-/*
- *	Rebuild the Ethernet MAC header. This is called after an ARP
- *	(or in future other address resolution) has completed on this
- *	sk_buff. We now let ARP fill in the other fields.
- *
- *	This routine CANNOT use cached dst->neigh!
- *	Really, it is used only when dst->neigh is wrong.
- *
- * TODO:  This needs a checkup, I'm ignorant here. --BLG
- */
-static int vlan_dev_rebuild_header(struct sk_buff *skb)
-{
-	struct net_device *dev = skb->dev;
-	struct vlan_ethhdr *veth = (struct vlan_ethhdr *)(skb->data);
-
-	switch (veth->h_vlan_encapsulated_proto) {
-#ifdef CONFIG_INET
-	case htons(ETH_P_IP):
-
-		/* TODO:  Confirm this will work with VLAN headers... */
-		return arp_find(veth->h_dest, skb);
-#endif
-	default:
-		pr_debug("%s: unable to resolve type %X addresses.\n",
-			 dev->name, ntohs(veth->h_vlan_encapsulated_proto));
-
-		memcpy(veth->h_source, dev->dev_addr, ETH_ALEN);
-		break;
-	}
-
-	return 0;
-}
-
 static inline struct sk_buff *vlan_check_reorder_header(struct sk_buff *skb)
 {
 	if (vlan_dev_info(skb->dev)->flags & VLAN_FLAG_REORDER_HDR) {
@@ -269,33 +236,26 @@ static int vlan_dev_hard_header(struct sk_buff *skb, struct net_device *dev,
 				const void *daddr, const void *saddr,
 				unsigned int len)
 {
-	struct vlan_hdr *vhdr;
-	unsigned int vhdrlen = 0;
-	u16 vlan_tci = 0;
 	int rc;
 
 	if (WARN_ON(skb_headroom(skb) < dev->hard_header_len))
 		return -ENOSPC;
 
+	/* When this flag is not set we make the vlan_tci visible
+	 * by setting the skb field.
+	 *
+	 * We do not immediately insert the tag here the intent
+	 * of setting VLAN_FLAG_REORDER_HDR is to make the vlan
+	 * info avaiable to tap devices and the QOS layer. Adding
+	 * the tag present bit shoould be enough for other layers
+	 * to learn the vlan tag.
+	 */
 	if (!(vlan_dev_info(dev)->flags & VLAN_FLAG_REORDER_HDR)) {
-		vhdr = (struct vlan_hdr *) skb_push(skb, VLAN_HLEN);
+		u16 vlan_tci = 0;
 
 		vlan_tci = vlan_dev_info(dev)->vlan_id;
 		vlan_tci |= vlan_dev_get_egress_qos_mask(dev, skb);
-		vhdr->h_vlan_TCI = htons(vlan_tci);
-
-		/*
-		 *  Set the protocol type. For a packet of type ETH_P_802_3/2 we
-		 *  put the length in here instead.
-		 */
-		if (type != ETH_P_802_3 && type != ETH_P_802_2)
-			vhdr->h_vlan_encapsulated_proto = htons(type);
-		else
-			vhdr->h_vlan_encapsulated_proto = htons(len);
-
-		skb->protocol = htons(ETH_P_8021Q);
-		type = ETH_P_8021Q;
-		vhdrlen = VLAN_HLEN;
+		skb = __vlan_hwaccel_put_tag(skb, vlan_tci);
 	}
 
 	/* Before delegating work to the lower layer, enter our MAC-address */
@@ -304,9 +264,7 @@ static int vlan_dev_hard_header(struct sk_buff *skb, struct net_device *dev,
 
 	/* Now make the underlying real hard header */
 	dev = vlan_dev_info(dev)->real_dev;
-	rc = dev_hard_header(skb, dev, type, daddr, saddr, len + vhdrlen);
-	if (rc > 0)
-		rc += vhdrlen;
+	rc = dev_hard_header(skb, dev, type, daddr, saddr, len);
 	return rc;
 }
 
@@ -676,9 +634,11 @@ static void vlan_dev_set_lockdep_class(struct net_device *dev, int subclass)
 }
 
 static const struct header_ops vlan_header_ops = {
-	.create	 = vlan_dev_hard_header,
-	.rebuild = vlan_dev_rebuild_header,
-	.parse	 = eth_header_parse,
+	.create		= vlan_dev_hard_header,
+	.rebuild	= eth_rebuild_header,
+	.parse		= eth_header_parse,
+	.cache		= eth_header_cache,
+	.cache_update	= eth_header_cache_update,
 };
 
 static const struct net_device_ops vlan_netdev_ops, vlan_netdev_ops_sq;
@@ -713,13 +673,12 @@ static int vlan_dev_init(struct net_device *dev)
 	dev->fcoe_ddp_xid = real_dev->fcoe_ddp_xid;
 #endif
 
-	if (real_dev->features & NETIF_F_HW_VLAN_TX) {
-		dev->header_ops      = real_dev->header_ops;
+	dev->header_ops = &vlan_header_ops;
+
+	if (real_dev->features & NETIF_F_HW_VLAN_TX)
 		dev->hard_header_len = real_dev->hard_header_len;
-	} else {
-		dev->header_ops      = &vlan_header_ops;
+	else
 		dev->hard_header_len = real_dev->hard_header_len + VLAN_HLEN;
-	}
 
 	if (real_dev->netdev_ops->ndo_select_queue)
 		dev->netdev_ops = &vlan_netdev_ops_sq;


^ permalink raw reply related

* Re: 2.6.36-rc7: net/bridge causes temporary network I/O lockups [2]
From: Patrick Ringl @ 2010-10-21 22:18 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Patrick Ringl, netdev, linux-kernel, bridge
In-Reply-To: <20101020061631.GA21679@gondor.apana.org.au>

[-- Attachment #1: Type: text/plain, Size: 727 bytes --]

On 10/20/2010 08:16 AM, Herbert Xu wrote:
> On Mon, Oct 18, 2010 at 10:37:40PM +0200, Patrick Ringl wrote:
>    
>> Anything else I could possibly provide? :-)
>>      
> Yes, testing :)
>
> First of all I'd like to rule out (or in) the IPv6 query code,
> which is clearly generating a bogus packet (wrong payload_len).
>
> So can you apply this patch and see if it makes the problem
> go away? Please take packet dumps so we know that the IPv6 query
> is no longer being sent.
>    
Hi,

Hi,

sorry for the late response. I've been using your patch on 2.6.36 and 
unfortunately, the bogus ipv6 packet is not the cause of the lockups. I 
have attached two packet dumps (br0 and eth1) again.


regards,
Patrick
> Thanks,
>    


[-- Attachment #2: n_br0 --]
[-- Type: application/octet-stream, Size: 1784 bytes --]

[-- Attachment #3: n_eth1 --]
[-- Type: application/octet-stream, Size: 1840 bytes --]

^ permalink raw reply

* Re: [PATCH v2 1/9] tproxy: split off ipv6 defragmentation to a separate module
From: Eric Dumazet @ 2010-10-21 22:19 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: KOVACS Krisztian, netdev, netfilter-devel, Balazs Scheidler,
	David Miller
In-Reply-To: <4CC0486D.60703@trash.net>

Le jeudi 21 octobre 2010 à 16:04 +0200, Patrick McHardy a écrit :
> Am 21.10.2010 13:43, schrieb KOVACS Krisztian:
> > tproxy: split off ipv6 defragmentation to a separate module
> >     
> >     Like with IPv4, TProxy needs IPv6 defragmentation but does not
> >     require connection tracking. Since defragmentation was coupled
> >     with conntrack, I split off the two, creating an nf_defrag_ipv6 module,
> >     similar to the already existing nf_defrag_ipv4.
> 
> Applied, thanks.

Hmm...

CONFIG_IPV6=m
CONFIG_NETFILTER_TPROXY=m


  MODPOST 201 modules
ERROR: "nf_defrag_ipv6_enable" [net/netfilter/xt_TPROXY.ko] undefined!
ERROR: "ipv6_find_hdr" [net/netfilter/xt_TPROXY.ko] undefined!

Sorry, it's late here, I wont fix this ;)



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC PATCH 5/9] ipvs network name space aware
From: Julian Anastasov @ 2010-10-21 22:56 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: Simon Horman, lvs-devel@vger.kernel.org, netdev@vger.kernel.org,
	netfilter-devel@vger.kernel.org, wensong@linux-vs.org,
	daniel.lezcano@free.fr
In-Reply-To: <201010211342.42833.hans.schillstrom@ericsson.com>


 	Hello,

On Thu, 21 Oct 2010, Hans Schillstrom wrote:

> static inline struct net * ipvs_devnet(struct skb *skb)
> {
> #ifdef CONFIG_NS_NET
> 	return dev_net(skb->dev);
> #else
> 	return init_net;
> #endif
> }

 	After last IPVS changes may be something like that
is needed instead:

static inline struct net *ipvs_skbnet(struct sk_buff *skb)
{
 	return dev_net(skb->dev ? : skb_dst(skb)->dev);
}

 	because now we have LOCAL_OUT processing.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: 2.6.36-rc7: net/bridge causes temporary network I/O lockups [2]
From: Herbert Xu @ 2010-10-21 23:07 UTC (permalink / raw)
  To: Patrick Ringl; +Cc: netdev, linux-kernel, bridge
In-Reply-To: <4CC0BC3E.6080906@freenet.de>

On Fri, Oct 22, 2010 at 12:18:38AM +0200, Patrick Ringl wrote:
>
> sorry for the late response. I've been using your patch on 2.6.36 and  
> unfortunately, the bogus ipv6 packet is not the cause of the lockups. I  
> have attached two packet dumps (br0 and eth1) again.

OK I see, I had thought that your whole system locked up for
20-30 seconds but it was only the external network responses
that stopped.

I think the problem is your switch.  It appears to purge our
port entry when it receives our general query.

So to work around this, I suggest that you disable the startup
queries through the parameter multicast_startup_query_count.
You can do this either through sysfs or a sufficiently recent
brctl command.

BTW, what brand/model is your switch? If this problem is common
enough then we may have to disable general queries by default.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* TIPC security issues
From: Dan Rosenberg @ 2010-10-21 23:45 UTC (permalink / raw)
  To: jon.maloy, allan.stephens; +Cc: security, netdev

The tipc_msg_build() function in net/tipc/msg.c is written in such a way
as to create a highly exploitable kernel heap overflow that would allow
a local user to escalate privileges to root by issuing maliciously
crafted sendmsg() calls.  At a minimum, the following issues should be
fixed:

1. The tipc_msg_calc_data_size() function is almost totally broken.  It
sums together size_t values (iov_lens), but returns an integer.  Two
things can go wrong - the total value can wrap around, or on 64-bit
platforms, iov_len values greater than UINT_MAX will be truncated.

2. The comparison of dsz to TIPC_MAX_USER_MSG_SIZE is signed, so
negative (large unsigned) values will pass this check.

3. The comparison of sz to max_size is also signed.

As a result of these issues, it's possible to cause the allocation of a
small heap buffer and the subsequent copying of a carefully controlled
larger amount of data into that buffer.

I haven't found a Linux distribution that defines a module alias for
TIPC (even though most compile it as a module), so an administrator will
have had to explicitly load the TIPC module for a system to be
vulnerable.

-Dan


^ permalink raw reply

* Re: [Security] TIPC security issues
From: Linus Torvalds @ 2010-10-22  0:31 UTC (permalink / raw)
  To: Dan Rosenberg; +Cc: jon.maloy, allan.stephens, netdev, security
In-Reply-To: <1287704752.11051.79.camel@Dan>

[-- Attachment #1: Type: text/plain, Size: 1555 bytes --]

On Thu, Oct 21, 2010 at 4:45 PM, Dan Rosenberg <drosenberg@vsecurity.com> wrote:
>
> 1. The tipc_msg_calc_data_size() function is almost totally broken.  It
> sums together size_t values (iov_lens), but returns an integer.  Two
> things can go wrong - the total value can wrap around, or on 64-bit
> platforms, iov_len values greater than UINT_MAX will be truncated.

Hmm. I actually think that this is a problem with "verify_iovec()". We
should just make that one stricter, I think.

We already (long ago) decided that POSIX.1 compatibility be damned,
and that reading and writing more than 2GB in a single system call is
bogus: so normal write calls will actually limit size_t arguments to
MAX_INT, exactly so that various filesystems don't have to worry about
overflow and can keep length arguments in an "int".

And we probably should just do the same in verify[_compat]_iovec(). We
do 99% of the necessary work there already - adding a few simple
checks would get us there all the way. And then silly things like this
wouldn't matter.

Something simple like:
  if an iovec has a total length that is > MAX_INT, then limit that
entry to MAX_INT, and clear out the rest

Something like the appended UNTESTED patch. NOTE! it actually makes
"verify_iovec()" *change* the iovec if it grows too big.

Also note that not only is it untested, it also doesn't do this for
the verify_compat_iovec() function, so this is really more of a "look,
we could do this" patch - to get discussion started.

                       Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 993 bytes --]

 net/core/iovec.c |   17 ++++++++---------
 1 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/net/core/iovec.c b/net/core/iovec.c
index e6b133b..b489fd6 100644
--- a/net/core/iovec.c
+++ b/net/core/iovec.c
@@ -38,7 +38,7 @@
 long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode)
 {
 	int size, ct;
-	long err;
+	size_t err;
 
 	if (m->msg_namelen) {
 		if (mode == VERIFY_READ) {
@@ -60,14 +60,13 @@ long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address,
 	err = 0;
 
 	for (ct = 0; ct < m->msg_iovlen; ct++) {
-		err += iov[ct].iov_len;
-		/*
-		 * Goal is not to verify user data, but to prevent returning
-		 * negative value, which is interpreted as errno.
-		 * Overflow is still possible, but it is harmless.
-		 */
-		if (err < 0)
-			return -EMSGSIZE;
+		size_t len = iov[ct].iov_len;
+
+		if (len > INT_MAX - err) {
+			len = INT_MAX - err;
+			iov[ct].iov_len = len;
+		}
+		err += len;
 	}
 
 	return err;

^ permalink raw reply related

* RE: [PATCH 1/2] vlan: Calling vlan_hwaccel_do_receive() is always valid.
From: Dmitry Kravkov @ 2010-10-22  0:46 UTC (permalink / raw)
  To: Jesse Gross, David Miller; +Cc: netdev@vger.kernel.org, Vladislav Zolotarov
In-Reply-To: <1287696643-9695-1-git-send-email-jesse@nicira.com>

Thanks, this solves the problem. 

Tested-by: Dmitry Kravkov <dmitry@broadcom.com>   

-----Original Message-----
From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Jesse Gross
Sent: Thursday, October 21, 2010 11:31 PM
To: David Miller
Cc: netdev@vger.kernel.org
Subject: [PATCH 1/2] vlan: Calling vlan_hwaccel_do_receive() is always valid.

It is now acceptable to receive vlan tagged packets at any time,
even if CONFIG_VLAN_8021Q is not set.  This means that calling
vlan_hwaccel_do_receive() should not result in BUG() but rather just
behave as if there were no vlan devices configured.

Reported-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
---
 include/linux/if_vlan.h |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index e607256..cbd3dcd 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -153,7 +153,8 @@ static inline u16 vlan_dev_vlan_id(const struct net_device *dev)
 
 static inline bool vlan_hwaccel_do_receive(struct sk_buff **skb)
 {
-	BUG();
+	if ((*skb)->vlan_tci & VLAN_VID_MASK)
+		(*skb)->pkt_type = PACKET_OTHERHOST;
 	return false;
 }
 #endif
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply related

* RE: [PATCH v2 11/14] bnx2x: Update bnx2x to use new vlan accleration.
From: Dmitry Kravkov @ 2010-10-22  0:57 UTC (permalink / raw)
  To: Jesse Gross, Vladislav Zolotarov
  Cc: David Miller, netdev@vger.kernel.org, Hao Zheng, Eilon Greenstein
In-Reply-To: <AANLkTinudtxjGAEcc9cLjn+7Y7kqcaQXMW_5SRoNe5QU@mail.gmail.com>

With the latest patch this one also disappeared and I do see the packets with vlan tag on the receiver's side
(For this purpose the driver was patched do not configure the stripping)

-----Original Message-----
From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Jesse Gross
Sent: Thursday, October 21, 2010 11:37 PM
To: Vladislav Zolotarov
Cc: David Miller; netdev@vger.kernel.org; Hao Zheng; Eilon Greenstein
Subject: Re: [PATCH v2 11/14] bnx2x: Update bnx2x to use new vlan accleration.

On Thu, Oct 21, 2010 at 7:50 AM, Vladislav Zolotarov <vladz@broadcom.com> wrote:
>> > Guys, when I compiled the kernel with these patches without VLAN
>> > support (CONFIG_VLAN_8021Q is not set) and tried to send VLAN tagged
>> > frames from the remote side to the bnx2x interface the kernel
>> panicked.
>> >
>> > The stack trace got cut with the __netif_receive_skb() on top by the
>> > IPKVM and I'll have to connect a serial to get it all. But until I
>> > did that maybe somebody will have any ideas anyway...
>> >
>> > It happens regardless there is HW RX VLAN stripping enabled or not.
>>
>> When RX VLAN stripping is enabled we hit the BUG() in the
>> vlan_hwaccel_do_receive().:
>
> We hit the same BUG() both when VLAN stripping is disabled.

This one surprises me because that function shouldn't get called at
all when VLAN stripping is disabled.  Are you sure that it is
disabled?  From my reading of the bnx2x driver it seems like it is
always enabled.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply

* linux-next: manual merge of the net tree with Linus' tree
From: Stephen Rothwell @ 2010-10-22  1:19 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: linux-next, linux-kernel, Koki Sanagi, Frederic Weisbecker,
	Jesse Gross

Hi all,

Today's linux-next merge of the net tree got a conflict in net/core/dev.c
between commit cf66ba58b5cb8b1526e9dd2fb96ff8db048d4d44 ("netdev: Add
tracepoints to netdev layer") from Linus' tree and commits
05532121da0728eaedac2a0a5c3cecad3a95d765 ("net: 802.1q: make
vlan_hwaccel_do_receive() return void") and
3701e51382a026cba10c60b03efabe534fba4ca4 ("vlan: Centralize handling of
hardware acceleration") from the net tree.

Just context changes.  I fixed it up (see below) and can carry the fix as
necessary.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc net/core/dev.c
index 154e25e,6d4218c..0000000
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@@ -128,9 -128,8 +128,10 @@@
  #include <linux/jhash.h>
  #include <linux/random.h>
  #include <trace/events/napi.h>
 +#include <trace/events/net.h>
 +#include <trace/events/skb.h>
  #include <linux/pci.h>
+ #include <linux/inetdevice.h>
  
  #include "net-sysfs.h"
  
@@@ -2835,10 -2898,6 +2905,8 @@@ static int __netif_receive_skb(struct s
  	if (!netdev_tstamp_prequeue)
  		net_timestamp_check(skb);
  
 +	trace_netif_receive_skb(skb);
- 	if (vlan_tx_tag_present(skb) && vlan_hwaccel_do_receive(skb))
- 		return NET_RX_SUCCESS;
 +
  	/* if we've gotten here through NAPI, check netpoll */
  	if (netpoll_receive_skb(skb))
  		return NET_RX_DROP;

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox