Netdev List
 help / color / mirror / Atom feed
* [PATCH] ip_gre: fix possible use after free
From: Eric Dumazet @ 2012-12-21  2:00 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Isaku Yamahata

From: Eric Dumazet <edumazet@google.com>

Once skb_realloc_headroom() is called, tiph might point to freed memory.

Cache tiph->ttl value before the reallocation, to avoid unexpected
behavior.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Isaku Yamahata <yamahata@valinux.co.jp>
---
 net/ipv4/ip_gre.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index a85ae2f..4c22e70 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -750,6 +750,7 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
 	int    gre_hlen;
 	__be32 dst;
 	int    mtu;
+	u8     ttl;
 
 	if (skb->ip_summed == CHECKSUM_PARTIAL &&
 	    skb_checksum_help(skb))
@@ -812,6 +813,7 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
 			goto tx_error;
 	}
 
+	ttl = tiph->ttl;
 	tos = tiph->tos;
 	if (tos == 1) {
 		tos = 0;
@@ -904,6 +906,7 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
 		dev_kfree_skb(skb);
 		skb = new_skb;
 		old_iph = ip_hdr(skb);
+		/* Warning : tiph value might point to freed memory */
 	}
 
 	skb_reset_transport_header(skb);
@@ -927,8 +930,9 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
 	iph->tos		=	ipgre_ecn_encapsulate(tos, old_iph, skb);
 	iph->daddr		=	fl4.daddr;
 	iph->saddr		=	fl4.saddr;
+	iph->ttl		=	ttl;
 
-	if ((iph->ttl = tiph->ttl) == 0) {
+	if (ttl == 0) {
 		if (skb->protocol == htons(ETH_P_IP))
 			iph->ttl = old_iph->ttl;
 #if IS_ENABLED(CONFIG_IPV6)

^ permalink raw reply related

* Re: [PATCH] ipv4/ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally
From: Eric Dumazet @ 2012-12-21  1:48 UTC (permalink / raw)
  To: Isaku Yamahata; +Cc: netdev
In-Reply-To: <9ee0cd8f94e1ee866b40ee7b6755e8d8705325c9.1356052319.git.yamahata@valinux.co.jp>

On Fri, 2012-12-21 at 10:12 +0900, Isaku Yamahata wrote:
> ipgre_tunnel_xmit() parses network header as IP unconditionally.
> But transmitting packets are not always IP packet. For example such packet
> can be sent by packet socket with sockaddr_ll.sll_protocol set.
> So make the function check if skb->protocol is IP.
> 
> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
> ---
>  net/ipv4/ip_gre.c |    5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> index a85ae2f..8fcf0ed 100644
> --- a/net/ipv4/ip_gre.c
> +++ b/net/ipv4/ip_gre.c
> @@ -760,7 +760,10 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
>  
>  	if (dev->header_ops && dev->type == ARPHRD_IPGRE) {
>  		gre_hlen = 0;
> -		tiph = (const struct iphdr *)skb->data;
> +		if (skb->protocol == htons(ETH_P_IP))
> +			tiph = (const struct iphdr *)skb->data;
> +		else
> +			tiph = &tunnel->parms.iph;
>  	} else {
>  		gre_hlen = tunnel->hlen;
>  		tiph = &tunnel->parms.iph;

Seems good to me thanks !

Acked-by: Eric Dumazet <edumazet@google.com>

BTW, it seems another bug exists at line 931 : We dereference tiph while
it could point to freed memory because of the skb_realloc_headroom() at
line 893

I'll send a patch.

^ permalink raw reply

* [PATCH] ipv4/ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally
From: Isaku Yamahata @ 2012-12-21  1:12 UTC (permalink / raw)
  To: netdev; +Cc: yamahata

ipgre_tunnel_xmit() parses network header as IP unconditionally.
But transmitting packets are not always IP packet. For example such packet
can be sent by packet socket with sockaddr_ll.sll_protocol set.
So make the function check if skb->protocol is IP.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 net/ipv4/ip_gre.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index a85ae2f..8fcf0ed 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -760,7 +760,10 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
 
 	if (dev->header_ops && dev->type == ARPHRD_IPGRE) {
 		gre_hlen = 0;
-		tiph = (const struct iphdr *)skb->data;
+		if (skb->protocol == htons(ETH_P_IP))
+			tiph = (const struct iphdr *)skb->data;
+		else
+			tiph = &tunnel->parms.iph;
 	} else {
 		gre_hlen = tunnel->hlen;
 		tiph = &tunnel->parms.iph;
-- 
1.7.10.4

^ permalink raw reply related

* Re: TUN problems (regression?)
From: Stephen Hemminger @ 2012-12-20 23:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Paul Moore, Jason Wang, netdev
In-Reply-To: <1356046697.21834.3606.camel@edumazet-glaptop>

On Thu, 20 Dec 2012 15:38:17 -0800
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Thu, 2012-12-20 at 18:16 -0500, Paul Moore wrote:
> > [CC'ing netdev in case this is a known problem I just missed ...]
> > 
> > Hi Jason,
> > 
> > I started doing some more testing with the multiqueue TUN changes and I ran 
> > into a problem when running tunctl: running it once w/o arguments works as 
> > expected, but running it a second time results in failure and a 
> > kmem_cache_sanity_check() failure.  The problem appears to be very repeatable 
> > on my test VM and happens independent of the LSM/SELinux fixup patches.
> > 
> > Have you seen this before?
> > 
> 
> Obviously code in tun_flow_init() is wrong...
> 
> static int tun_flow_init(struct tun_struct *tun)
> {
>         int i;
> 
>         tun->flow_cache = kmem_cache_create("tun_flow_cache",
>                                             sizeof(struct tun_flow_entry), 0, 0,
>                                             NULL);
>         if (!tun->flow_cache)
>                 return -ENOMEM;
> ...
> }
> 
> 
> I have no idea why we would need a kmem_cache per tun_struct,
> and why we even need a kmem_cache.

Normally flow malloc/free should be good enough.
It might make sense to use private kmem_cache if doing hlist_nulls.


Acked-by: Stephen Hemminger <shemminger@vyatta.com>

^ permalink raw reply

* Re: TUN problems (regression?)
From: Eric Dumazet @ 2012-12-20 23:38 UTC (permalink / raw)
  To: Paul Moore; +Cc: Jason Wang, netdev
In-Reply-To: <4151394.nMo40zlg68@sifl>

On Thu, 2012-12-20 at 18:16 -0500, Paul Moore wrote:
> [CC'ing netdev in case this is a known problem I just missed ...]
> 
> Hi Jason,
> 
> I started doing some more testing with the multiqueue TUN changes and I ran 
> into a problem when running tunctl: running it once w/o arguments works as 
> expected, but running it a second time results in failure and a 
> kmem_cache_sanity_check() failure.  The problem appears to be very repeatable 
> on my test VM and happens independent of the LSM/SELinux fixup patches.
> 
> Have you seen this before?
> 

Obviously code in tun_flow_init() is wrong...

static int tun_flow_init(struct tun_struct *tun)
{
        int i;

        tun->flow_cache = kmem_cache_create("tun_flow_cache",
                                            sizeof(struct tun_flow_entry), 0, 0,
                                            NULL);
        if (!tun->flow_cache)
                return -ENOMEM;
...
}


I have no idea why we would need a kmem_cache per tun_struct,
and why we even need a kmem_cache.


I would try following patch :

 drivers/net/tun.c |   24 +++---------------------
 1 file changed, 3 insertions(+), 21 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 504f7f1..fbd106e 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -180,7 +180,6 @@ struct tun_struct {
 	int debug;
 #endif
 	spinlock_t lock;
-	struct kmem_cache *flow_cache;
 	struct hlist_head flows[TUN_NUM_FLOW_ENTRIES];
 	struct timer_list flow_gc_timer;
 	unsigned long ageing_time;
@@ -209,8 +208,8 @@ static struct tun_flow_entry *tun_flow_create(struct tun_struct *tun,
 					      struct hlist_head *head,
 					      u32 rxhash, u16 queue_index)
 {
-	struct tun_flow_entry *e = kmem_cache_alloc(tun->flow_cache,
-						    GFP_ATOMIC);
+	struct tun_flow_entry *e = kmalloc(sizeof(*e), GFP_ATOMIC);
+
 	if (e) {
 		tun_debug(KERN_INFO, tun, "create flow: hash %u index %u\n",
 			  rxhash, queue_index);
@@ -223,19 +222,12 @@ static struct tun_flow_entry *tun_flow_create(struct tun_struct *tun,
 	return e;
 }
 
-static void tun_flow_free(struct rcu_head *head)
-{
-	struct tun_flow_entry *e
-		= container_of(head, struct tun_flow_entry, rcu);
-	kmem_cache_free(e->tun->flow_cache, e);
-}
-
 static void tun_flow_delete(struct tun_struct *tun, struct tun_flow_entry *e)
 {
 	tun_debug(KERN_INFO, tun, "delete flow: hash %u index %u\n",
 		  e->rxhash, e->queue_index);
 	hlist_del_rcu(&e->hash_link);
-	call_rcu(&e->rcu, tun_flow_free);
+	kfree_rcu(e, rcu);
 }
 
 static void tun_flow_flush(struct tun_struct *tun)
@@ -833,12 +825,6 @@ static int tun_flow_init(struct tun_struct *tun)
 {
 	int i;
 
-	tun->flow_cache = kmem_cache_create("tun_flow_cache",
-					    sizeof(struct tun_flow_entry), 0, 0,
-					    NULL);
-	if (!tun->flow_cache)
-		return -ENOMEM;
-
 	for (i = 0; i < TUN_NUM_FLOW_ENTRIES; i++)
 		INIT_HLIST_HEAD(&tun->flows[i]);
 
@@ -854,10 +840,6 @@ static void tun_flow_uninit(struct tun_struct *tun)
 {
 	del_timer_sync(&tun->flow_gc_timer);
 	tun_flow_flush(tun);
-
-	/* Wait for completion of call_rcu()'s */
-	rcu_barrier();
-	kmem_cache_destroy(tun->flow_cache);
 }
 
 /* Initialize net device. */

^ permalink raw reply related

* TUN problems (regression?)
From: Paul Moore @ 2012-12-20 23:16 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 528 bytes --]

[CC'ing netdev in case this is a known problem I just missed ...]

Hi Jason,

I started doing some more testing with the multiqueue TUN changes and I ran 
into a problem when running tunctl: running it once w/o arguments works as 
expected, but running it a second time results in failure and a 
kmem_cache_sanity_check() failure.  The problem appears to be very repeatable 
on my test VM and happens independent of the LSM/SELinux fixup patches.

Have you seen this before?

-- 
paul moore
security and virtualization @ redhat

[-- Attachment #2: tun_problem.png --]
[-- Type: image/png, Size: 13740 bytes --]

^ permalink raw reply

* Re: bonding driver - how to recognize the active slave
From: Eric Dumazet @ 2012-12-20 22:52 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Erez Shitrit, netdev@vger.kernel.org
In-Reply-To: <CAJZOPZLOj2QCWN75hAsQnmB=YiSGRhdhzf4ODhAD-QoU+Y0ZGg@mail.gmail.com>

On Fri, 2012-12-21 at 00:21 +0200, Or Gerlitz wrote:
> On Thu, Dec 20, 2012 at 11:19 PM, Eric Dumazet <erdnetdev@gmail.com> wrote:
> > cat /sys/class/net/bond0/bonding/active_slave
> 
> sure, I think Erez would like to know that from within a network
> device kernel code, e.g maybe register to netdev kernel events and on
> the event of bonding fail-over identify the active slave for
> active-backup mode, etc.

OK, but then why a network driver should be aware of this bonding
detail ?

It seems part of this commit could be reverted,
ie setting IFF_SLAVE_INACTIVE again.

commit 2d7011ca79f1a8792e04d131b8ea21db179ab917
Author: Jiri Pirko <jpirko@redhat.com>
Date:   Wed Mar 16 08:46:43 2011 +0000

    bonding: get rid of IFF_SLAVE_INACTIVE netdev->priv_flag
    
    Since bond-related code was moved from net/core/dev.c into bonding,
    IFF_SLAVE_INACTIVE is no longer needed. Replace is with flag "inactive"
    stored in slave structure
    
    Signed-off-by: Jiri Pirko <jpirko@redhat.com>
    Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
    Signed-off-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* Re: bonding driver - how to recognize the active slave
From: Or Gerlitz @ 2012-12-20 22:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Erez Shitrit, netdev@vger.kernel.org
In-Reply-To: <1356038370.21834.3315.camel@edumazet-glaptop>

On Thu, Dec 20, 2012 at 11:19 PM, Eric Dumazet <erdnetdev@gmail.com> wrote:
> cat /sys/class/net/bond0/bonding/active_slave

sure, I think Erez would like to know that from within a network
device kernel code, e.g maybe register to netdev kernel events and on
the event of bonding fail-over identify the active slave for
active-backup mode, etc.

Or.

^ permalink raw reply

* Re: [RFC PATCH v3 2/2] tun: fix LSM/SELinux labeling of tun/tap devices
From: Eric Paris @ 2012-12-20 21:58 UTC (permalink / raw)
  To: Paul Moore
  Cc: Jason Wang, Michael S. Tsirkin, Linux Netdev List, LSM List,
	SE-Linux
In-Reply-To: <2374585.ppB8RWyPuB@sifl>

You may add my Ack to the series.

-Eric

On Wed, Dec 19, 2012 at 11:58 AM, Paul Moore <pmoore@redhat.com> wrote:
> On Wednesday, December 19, 2012 01:46:25 PM Jason Wang wrote:
>> On 12/19/2012 07:08 AM, Michael S. Tsirkin wrote:
>> > On Tue, Dec 18, 2012 at 05:53:52PM -0500, Paul Moore wrote:
>> >> This patch corrects some problems with LSM/SELinux that were introduced
>> >> with the multiqueue patchset.  The problem stems from the fact that the
>> >> multiqueue work changed the relationship between the tun device and its
>> >> associated socket; before the socket persisted for the life of the
>> >> device, however after the multiqueue changes the socket only persisted
>> >> for the life of the userspace connection (fd open).  For non-persistent
>> >> devices this is not an issue, but for persistent devices this can cause
>> >> the tun device to lose its SELinux label.
>> >>
>> >> We correct this problem by adding an opaque LSM security blob to the
>> >> tun device struct which allows us to have the LSM security state, e.g.
>> >> SELinux labeling information, persist for the lifetime of the tun
>> >> device.  In the process we tweak the LSM hooks to work with this new
>> >> approach to TUN device/socket labeling and introduce a new LSM hook,
>> >> security_tun_dev_attach_queue(), to approve requests to attach to a
>> >> TUN queue via TUNSETQUEUE.
>> >>
>> >> The SELinux code has been adjusted to match the new LSM hooks, the
>> >> other LSMs do not make use of the LSM TUN controls.  This patch makes
>> >> use of the recently added "tun_socket:attach_queue" permission to
>> >> restrict access to the TUNSETQUEUE operation.  On older SELinux
>> >> policies which do not define the "tun_socket:attach_queue" permission
>> >> the access control decision for TUNSETQUEUE will be handled according
>> >> to the SELinux policy's unknown permission setting.
>> >>
>> >> Signed-off-by: Paul Moore <pmoore@redhat.com>
>> >
>> > Looks good to me. A comment not directly related to this patch, below.
>>
>> Good to me too, will do some test on this.
>
> Great.  I'll do some more testing and make sure the LSM and SELinux crowd are
> okay with the changes.
>
> --
> paul moore
> security and virtualization @ redhat
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [PATCH net-next] be2net: fix INTx ISR for interrupt behaviour on BE2
From: Ben Hutchings @ 2012-12-20 21:20 UTC (permalink / raw)
  To: Perla, Sathya; +Cc: netdev@vger.kernel.org
In-Reply-To: <CF9D1877D81D214CB0CA0669EFAE020C0E0C422D@CMEXMB1.ad.emulex.com>

On Tue, 2012-12-18 at 12:57 +0000, Perla, Sathya wrote:
> >-----Original Message-----
> >From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> 
> >On Wed, 2012-11-28 at 20:20 +0000, Ben Hutchings wrote:
> >> On Wed, 2012-11-28 at 11:20 +0530, Sathya Perla wrote:
> >> > On BE2 chip, an interrupt may be raised even when EQ is in un-armed state.
> >> > As a result be_intx()::events_get() and be_poll:events_get() can race and
> >> > notify an EQ wrongly.
> >> >
> >> > Fix this by counting events only in be_poll(). Commit 0b545a629 fixes
> >> > the same issue in the MSI-x path.
> >> >
> >> > But, on Lancer, INTx can be de-asserted only by notifying num evts. This
> >> > is not an issue as the above BE2 behavior doesn't exist/has never been
> >> > seen on Lancer.
> >> [...]
> >> > @@ -2014,15 +1996,23 @@ static int be_rx_cqs_create(struct be_adapter
> >*adapter)
> >> >
> >> >  static irqreturn_t be_intx(int irq, void *dev)
> >> >  {
> >> > -	struct be_adapter *adapter = dev;
> >> > -	int num_evts;
> >> > +	struct be_eq_obj *eqo = dev;
> >> > +	struct be_adapter *adapter = eqo->adapter;
> >> > +	int num_evts = 0;
> >> >
> >> > -	/* With INTx only one EQ is used */
> >> > -	num_evts = event_handle(&adapter->eq_obj[0]);
> >> > -	if (num_evts)
> >> > -		return IRQ_HANDLED;
> >> > -	else
> >> > -		return IRQ_NONE;
> >> > +	/* On Lancer, clear-intr bit of the EQ DB does not work.
> >> > +	 * INTx is de-asserted only on notifying num evts.
> >> > +	 */
> >> > +	if (lancer_chip(adapter))
> >> > +		num_evts = events_get(eqo);
> >> > +
> >> > +	/* The EQ-notify may not de-assert INTx rightaway, causing
> >> > +	 * the ISR to be invoked again. So, return HANDLED even when
> >> > +	 * num_evts is zero.
> >> > +	 */
> >> > +	be_eq_notify(adapter, eqo->q.id, false, true, num_evts);
> >> > +	napi_schedule(&eqo->napi);
> >> > +	return IRQ_HANDLED;
> >> >  }
> >> [...]
> >>
> >> You shouldn't unconditionally return IRQ_HANDLED.  This prevents
> >> interrupt storm detection from working, not just for your device but for
> >> anything else sharing its IRQ.
> >>
> >> I understand there is a real problem to be fixed (PCIe write completions
> >> overtaking INTx deassertion, and maybe a specific hardware bug).
> >[...]
> >
> >I was thinking of read completions; there are no write completions to
> >wait for so you're pretty much guaranteed to get called a second time.
> >Maybe you should add an MMIO read after calling be_eq_notify().
> 
> Ben, I'm very sorry for not replying to this thread earlier. I needed time to play with
> your suggested solution and better understand the HW behavior in this case.
> 
> Adding an extra PCI memory read after the EQ-notify helps in syncing the PCI de-assert
> message *only* if it was already issued.
> But, there are cases when the HW block takes some time (longer) to initiate the INTx de-assert.
> The PCI memory read would complete but the de-assert wouldn't have happened yet.
> The PCI read sync will work only if the de-assert was issued *before* the PCI-read request was seen by the HW.

Yes, I've seen the exact same problem with our controllers.  At the PCIe
transaction level, legacy interrupt messages and read completions are
queued and flow-controlled separately.  If the chip's PCIe core doesn't
provide a way to restrict reordering of the two TLPs, this seems to be
inevitable.

What we do in sfc is to count the number of times in a row we've seen no
reason for the interrupt (irq_zero_count), and return IRQ_HANDLED only
if this is the first time.  This might work for you too.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: bonding driver - how to recognize the active slave
From: Eric Dumazet @ 2012-12-20 21:19 UTC (permalink / raw)
  To: Erez Shitrit; +Cc: netdev@vger.kernel.org
In-Reply-To: <BA0B6857ABCA474FB07F0DA6F50A3077918B2450@MTRDAG01.mtl.com>

On Thu, 2012-12-20 at 20:38 +0000, Erez Shitrit wrote:
> Hi,
> Is there any way to know who is the active slave in active-passive mode?
> 
> Is there any indication for that in netdevice flags? Or some other way?
> (I would like to get in my network interface driver which can be a slave of the bonding interface)
> 
> I saw that there is the flag IFF_SLAVE_INACTIVE but it is not used as far as I can see.


cat /sys/class/net/bond0/bonding/active_slave

^ permalink raw reply

* Re: [PATCH 4/4] net/smsc911x: Provide common clock functionality
From: Russell King - ARM Linux @ 2012-12-20 20:51 UTC (permalink / raw)
  To: Lee Jones
  Cc: Linus Walleij, Steve Glendinning, Robert Marklund, linus.walleij,
	arnd, netdev, linux-kernel, linux-arm-kernel
In-Reply-To: <20121220203514.GN2691@gmail.com>

On Thu, Dec 20, 2012 at 08:35:14PM +0000, Lee Jones wrote:
> On Thu, 20 Dec 2012, Russell King - ARM Linux wrote:
> 
> > On Thu, Dec 20, 2012 at 08:12:08PM +0100, Linus Walleij wrote:
> > > On Wed, Dec 19, 2012 at 6:19 PM, Lee Jones <lee.jones@linaro.org> wrote:
> > > 
> > > > Some platforms provide clocks which require enabling before the
> > > > SMSC911x chip will power on. This patch uses the new common clk
> > > > framework to do just that. If no clock is provided, it will just
> > > > be ignored and the driver will continue to assume that no clock
> > > > is required for the chip to run successfully.
> > > >
> > > > Cc: Steve Glendinning <steve.glendinning@shawell.net>
> > > > Cc: netdev@vger.kernel.org
> > > > Signed-off-by: Lee Jones <lee.jones@linaro.org>
> > > 
> > > Seems to me like it'll do the trick.
> > > Acked-by: Linus Walleij <linus.walleij@linaro.org>
> > 
> > This looks fairly dangerous.  What about those platforms which use this
> > driver, but don't provide a clock for it?
> > 
> > It looks like this will result in those platforms losing their ethernet
> > support.  There's at least a bunch of the ARM evaluation boards which
> > make use of this driver...
> 
> Right, but nothing should regress. If no clock is provided the driver
> moves on during the request and will refuse to prepare, enable and
> disable there after. 
> 
> Unless I've made a mistake somewhere? If so, I'd be happy to fixup.

No, but... don't use NULL for that.  Use IS_ERR(pdata->clk) instead.

^ permalink raw reply

* bonding driver - how to recognize the active slave
From: Erez Shitrit @ 2012-12-20 20:38 UTC (permalink / raw)
  To: netdev@vger.kernel.org

Hi,
Is there any way to know who is the active slave in active-passive mode?

Is there any indication for that in netdevice flags? Or some other way?
(I would like to get in my network interface driver which can be a slave of the bonding interface)

I saw that there is the flag IFF_SLAVE_INACTIVE but it is not used as far as I can see.

Thanks, Erez

^ permalink raw reply

* Re: [PATCH 4/4] net/smsc911x: Provide common clock functionality
From: Lee Jones @ 2012-12-20 20:35 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Linus Walleij, Steve Glendinning, Robert Marklund, linus.walleij,
	arnd, netdev, linux-kernel, linux-arm-kernel
In-Reply-To: <20121220192441.GC14363@n2100.arm.linux.org.uk>

On Thu, 20 Dec 2012, Russell King - ARM Linux wrote:

> On Thu, Dec 20, 2012 at 08:12:08PM +0100, Linus Walleij wrote:
> > On Wed, Dec 19, 2012 at 6:19 PM, Lee Jones <lee.jones@linaro.org> wrote:
> > 
> > > Some platforms provide clocks which require enabling before the
> > > SMSC911x chip will power on. This patch uses the new common clk
> > > framework to do just that. If no clock is provided, it will just
> > > be ignored and the driver will continue to assume that no clock
> > > is required for the chip to run successfully.
> > >
> > > Cc: Steve Glendinning <steve.glendinning@shawell.net>
> > > Cc: netdev@vger.kernel.org
> > > Signed-off-by: Lee Jones <lee.jones@linaro.org>
> > 
> > Seems to me like it'll do the trick.
> > Acked-by: Linus Walleij <linus.walleij@linaro.org>
> 
> This looks fairly dangerous.  What about those platforms which use this
> driver, but don't provide a clock for it?
> 
> It looks like this will result in those platforms losing their ethernet
> support.  There's at least a bunch of the ARM evaluation boards which
> make use of this driver...

Right, but nothing should regress. If no clock is provided the driver
moves on during the request and will refuse to prepare, enable and
disable there after. 

Unless I've made a mistake somewhere? If so, I'd be happy to fixup.

-- 
Lee Jones
Linaro ST-Ericsson Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply

* Re: [PATCH net-next V4 03/13] bridge: Validate that vlan is permitted on ingress
From: Shmulik Ladkani @ 2012-12-20 20:13 UTC (permalink / raw)
  To: vyasevic; +Cc: netdev, shemminger, davem, or.gerlitz, jhs, mst, erdnetdev, jiri
In-Reply-To: <50D342B7.9010901@redhat.com>

Hi Vlad,

On Thu, 20 Dec 2012 11:54:15 -0500 Vlad Yasevich <vyasevic@redhat.com> wrote:
> > (2) frame ingress on a "non-vlan" port may egress on a "vlan enabled"
> >    port, depending on the ingress VID and the port-membership map of the
> >    egress port
> >    (and thus, PVID should be defined even to "non-vlan" ports, for the
> >    case where untagged frame is received on the non-vlan port)
> 
> Sort of.  The way I did it (testing now), is like this:
>     if there is egress policy
> 	apply policy and forward.
>     else if there was ingress policy (pvid)
> 	apply it and forward
>     else
> 	forward as is (old bridge behavior).
> 
> This way if there was a pvid on an ingress port and nothing on egress,
> then pvid applies.  If there was nothing configured on ingress port,
> but we have a egress policy, we'll apply any vlan information from
> the frame to egress policy.  In this case, ingress untagged traffic 
> would be assigned vlan 0.

Sorry, got too cryptic too me ;)
We're probably aligned, but if you don't mind I'd like to make sure I
got it right.

I'd expect the following logic if the bridge is a vlan bridge:

1. Frame ingress on a port
  Frame's VID is collected: if frame was tagged, its the VID found in
  the tag; if frame was untagged (or prio-tagged), the VID would be
  port's PVID.
2. Ingress membership verification
  Verify the ingress port is a member of the frame's VID vlan (collected
  on step 1).
  (Usually policy is 'drop' in case port is not a member).
  Easily calculated by testing if the port bit is set in vlan's port
  membership map.
3. Switching logic
  Consult FDB for a forward/flood/drop decision, resulting in a map of
  candidate ports the frame might egress upon (e.g. in the common case,
  a valid existing unicast entry, the result is just one candidate
  port).
4. Egress membership verification
  For each candidate port found on step 3, verify it is a member of the
  frame's VID vlan.
  (Usually, candidate ports that aren't members of the vlan will not be
  selected for actual egress).
  This can be easily calculated by masking the candidates port map
  (found on step 3) with the vlan's port membership map. The resulting
  masked map is final egress portmap.
5. Frame tag construction and egress 
  For each final egress port (found on step 4), verify its
  tagged/untagged policy in the vlan's egress-policy map.
  Properly add/remove the vlan tag (if needed) according to port's
  egress policy, and transmit.

To my best understanding, if all the ports are "vlan-enabled" (having a
non-empty vid list, i.e. belonging to at least one vlan), the behavior
of the implemented bridge is aligned with the above scheme.

For "non-vlan" ports (having en empty vid list), we treat them as if
they belong to ANY POSSIBLE vlan (as if their bit is always set in every
vlan port membeship map). Meaning, in step (2) verification always
suceeds for such ports, and in step (4) such ports will never be masked
out of the egress candidates portmap.

Please let me know if the implementation fits this.

> I'll try to document things sufficiently.  This hybrid approach may
> produce some unintended results.  We could always remove it or introduce
> the tunable to change default policy to drop once vlan configuration is
> in effect.

Ok.

Regards,
Shmulik

^ permalink raw reply

* RE: [PATCH] wireless: mwifiex: remove unreachable paths
From: Bing Zhao @ 2012-12-20 20:13 UTC (permalink / raw)
  To: Sasha Levin
  Cc: John W. Linville, linux-wireless@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <1356030701-16284-17-git-send-email-sasha.levin@oracle.com>

Hi Sasha,

Thanks for the patch.

> We know 'firmware' is non-NULL from the beginning of mwifiex_prog_fw_w_helper,
> remove all !firmware paths from the rest of the function.

After removing all !firmware paths the function 'mwifiex_get_fw_data' becomes an orphan.

Could you please remove that function as well and resend a v2 patch?

Thanks,
Bing

> 
> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
> ---
>  drivers/net/wireless/mwifiex/usb.c | 15 +++------------
>  1 file changed, 3 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/net/wireless/mwifiex/usb.c b/drivers/net/wireless/mwifiex/usb.c
> index 63ac9f2..8bd7098 100644
> --- a/drivers/net/wireless/mwifiex/usb.c
> +++ b/drivers/net/wireless/mwifiex/usb.c
> @@ -836,23 +836,14 @@ static int mwifiex_prog_fw_w_helper(struct mwifiex_adapter *adapter,
>  			dlen = 0;
>  		} else {
>  			/* copy the header of the fw_data to get the length */
> -			if (firmware)
> -				memcpy(&fwdata->fw_hdr, &firmware[tlen],
> -				       sizeof(struct fw_header));
> -			else
> -				mwifiex_get_fw_data(adapter, tlen,
> -						    sizeof(struct fw_header),
> -						    (u8 *)&fwdata->fw_hdr);
> +			memcpy(&fwdata->fw_hdr, &firmware[tlen],
> +			       sizeof(struct fw_header));
> 
>  			dlen = le32_to_cpu(fwdata->fw_hdr.data_len);
>  			dnld_cmd = le32_to_cpu(fwdata->fw_hdr.dnld_cmd);
>  			tlen += sizeof(struct fw_header);
> 
> -			if (firmware)
> -				memcpy(fwdata->data, &firmware[tlen], dlen);
> -			else
> -				mwifiex_get_fw_data(adapter, tlen, dlen,
> -						    (u8 *)fwdata->data);
> +			memcpy(fwdata->data, &firmware[tlen], dlen);
> 
>  			fwdata->seq_num = cpu_to_le32(fw_seqnum);
>  			tlen += dlen;
> --
> 1.8.0

^ permalink raw reply

* Re: NAPI documentation needed
From: Ben Hutchings @ 2012-12-20 20:09 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Rafał Miłecki, netdev, David S. Miller
In-Reply-To: <1356033878.21834.3163.camel@edumazet-glaptop>

On Thu, 2012-12-20 at 12:04 -0800, Eric Dumazet wrote:
> On Thu, 2012-12-20 at 20:39 +0100, Rafał Miłecki wrote:
> > I wanted to report some problem I've encouraged during bgmac driver development.
> > 
> > At the very beginning I've implemented IRQ using threaded IRQ
> > (request_threaded_irq). I didn't know about NAPI until someone pointed
> > me that mistake. So I decided to rewrite IRQs handling to use NAPI.
> > I've found following documents:
> > http://www.linuxfoundation.org/collaborate/workgroups/networking/napi
> > ftp://robur.slu.se/pub/Linux/net-development/NAPI/README
> > ftp://robur.slu.se/pub/Linux/net-development/NAPI/NAPI_HOWTO.txt
> > ftp://robur.slu.se/pub/Linux/net-development/NAPI/converting-to-NAPI.txt~
> > but nothing really official sitting in kernel's Documentation dir.
> > 
> > So I started to using found documents, but then noticed they are quite outdated.
> > 
> > 1) We don't have netif_rx_schedule and netif_rx_complete anymore.
> > 2) We don't set poll and weight manually anymore but use netif_napi_add
> > 3) Return type and arguments has changed in poll. None of the
> > following is up-to-date:
> > static void my_poll (struct net_device *dev, int *budget)
> > int (*poll)(struct net_device *dev, int *budget);
> > 
> > It would be great if someone with NAPI knowledge could document it in
> > a kernel. Would be really helpful for new network drivers developers.
> 
> I think you might be the one to update/create the official NAPI
> documentation, now ideas are clear for you.
> 
> That would be really great indeed.

It would.  Last time someone asked, my answer was:

> The initial change to napi_struct is explained in
> <http://lwn.net/Articles/244640/>.
> 
> Since then there have been further changes:
> 
> - netif_napi_del() has been added.  You must call it to clean up NAPI
> contexts before freeing the associated net device(s).
> 
> - Instead of netif_rx_schedule(), netif_rx_complete(), etc. you must use
> napi_schedule(), napi_complete() etc. which just take a napi_struct
> pointer.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: NAPI documentation needed
From: Eric Dumazet @ 2012-12-20 20:04 UTC (permalink / raw)
  To: Rafał Miłecki; +Cc: netdev, David S. Miller
In-Reply-To: <CACna6rzG3KcMYUpMZv-wv6e62J9j1NLuZpTRSQHLpyfTTbKyUg@mail.gmail.com>

On Thu, 2012-12-20 at 20:39 +0100, Rafał Miłecki wrote:
> I wanted to report some problem I've encouraged during bgmac driver development.
> 
> At the very beginning I've implemented IRQ using threaded IRQ
> (request_threaded_irq). I didn't know about NAPI until someone pointed
> me that mistake. So I decided to rewrite IRQs handling to use NAPI.
> I've found following documents:
> http://www.linuxfoundation.org/collaborate/workgroups/networking/napi
> ftp://robur.slu.se/pub/Linux/net-development/NAPI/README
> ftp://robur.slu.se/pub/Linux/net-development/NAPI/NAPI_HOWTO.txt
> ftp://robur.slu.se/pub/Linux/net-development/NAPI/converting-to-NAPI.txt~
> but nothing really official sitting in kernel's Documentation dir.
> 
> So I started to using found documents, but then noticed they are quite outdated.
> 
> 1) We don't have netif_rx_schedule and netif_rx_complete anymore.
> 2) We don't set poll and weight manually anymore but use netif_napi_add
> 3) Return type and arguments has changed in poll. None of the
> following is up-to-date:
> static void my_poll (struct net_device *dev, int *budget)
> int (*poll)(struct net_device *dev, int *budget);
> 
> It would be great if someone with NAPI knowledge could document it in
> a kernel. Would be really helpful for new network drivers developers.

I think you might be the one to update/create the official NAPI
documentation, now ideas are clear for you.

That would be really great indeed.

^ permalink raw reply

* NAPI documentation needed
From: Rafał Miłecki @ 2012-12-20 19:39 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller

I wanted to report some problem I've encouraged during bgmac driver development.

At the very beginning I've implemented IRQ using threaded IRQ
(request_threaded_irq). I didn't know about NAPI until someone pointed
me that mistake. So I decided to rewrite IRQs handling to use NAPI.
I've found following documents:
http://www.linuxfoundation.org/collaborate/workgroups/networking/napi
ftp://robur.slu.se/pub/Linux/net-development/NAPI/README
ftp://robur.slu.se/pub/Linux/net-development/NAPI/NAPI_HOWTO.txt
ftp://robur.slu.se/pub/Linux/net-development/NAPI/converting-to-NAPI.txt~
but nothing really official sitting in kernel's Documentation dir.

So I started to using found documents, but then noticed they are quite outdated.

1) We don't have netif_rx_schedule and netif_rx_complete anymore.
2) We don't set poll and weight manually anymore but use netif_napi_add
3) Return type and arguments has changed in poll. None of the
following is up-to-date:
static void my_poll (struct net_device *dev, int *budget)
int (*poll)(struct net_device *dev, int *budget);

It would be great if someone with NAPI knowledge could document it in
a kernel. Would be really helpful for new network drivers developers.

-- 
Rafał

^ permalink raw reply

* Re: [PATCH 4/4] net/smsc911x: Provide common clock functionality
From: Russell King - ARM Linux @ 2012-12-20 19:24 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Lee Jones, Steve Glendinning, Robert Marklund, linus.walleij,
	arnd, netdev, linux-kernel, linux-arm-kernel
In-Reply-To: <CACRpkda79O_b3Z8g7Sy7vMtW9neZU4x-Z=iEQgjqu4X5tFKyhw@mail.gmail.com>

On Thu, Dec 20, 2012 at 08:12:08PM +0100, Linus Walleij wrote:
> On Wed, Dec 19, 2012 at 6:19 PM, Lee Jones <lee.jones@linaro.org> wrote:
> 
> > Some platforms provide clocks which require enabling before the
> > SMSC911x chip will power on. This patch uses the new common clk
> > framework to do just that. If no clock is provided, it will just
> > be ignored and the driver will continue to assume that no clock
> > is required for the chip to run successfully.
> >
> > Cc: Steve Glendinning <steve.glendinning@shawell.net>
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Lee Jones <lee.jones@linaro.org>
> 
> Seems to me like it'll do the trick.
> Acked-by: Linus Walleij <linus.walleij@linaro.org>

This looks fairly dangerous.  What about those platforms which use this
driver, but don't provide a clock for it?

It looks like this will result in those platforms losing their ethernet
support.  There's at least a bunch of the ARM evaluation boards which
make use of this driver...

^ permalink raw reply

* [PATCH] wireless: mwifiex: remove unreachable paths
From: Sasha Levin @ 2012-12-20 19:11 UTC (permalink / raw)
  To: Bing Zhao, John W. Linville, linux-wireless, netdev, linux-kernel
  Cc: Sasha Levin
In-Reply-To: <1356030701-16284-1-git-send-email-sasha.levin@oracle.com>

We know 'firmware' is non-NULL from the beginning of mwifiex_prog_fw_w_helper,
remove all !firmware paths from the rest of the function.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/net/wireless/mwifiex/usb.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/net/wireless/mwifiex/usb.c b/drivers/net/wireless/mwifiex/usb.c
index 63ac9f2..8bd7098 100644
--- a/drivers/net/wireless/mwifiex/usb.c
+++ b/drivers/net/wireless/mwifiex/usb.c
@@ -836,23 +836,14 @@ static int mwifiex_prog_fw_w_helper(struct mwifiex_adapter *adapter,
 			dlen = 0;
 		} else {
 			/* copy the header of the fw_data to get the length */
-			if (firmware)
-				memcpy(&fwdata->fw_hdr, &firmware[tlen],
-				       sizeof(struct fw_header));
-			else
-				mwifiex_get_fw_data(adapter, tlen,
-						    sizeof(struct fw_header),
-						    (u8 *)&fwdata->fw_hdr);
+			memcpy(&fwdata->fw_hdr, &firmware[tlen],
+			       sizeof(struct fw_header));
 
 			dlen = le32_to_cpu(fwdata->fw_hdr.data_len);
 			dnld_cmd = le32_to_cpu(fwdata->fw_hdr.dnld_cmd);
 			tlen += sizeof(struct fw_header);
 
-			if (firmware)
-				memcpy(fwdata->data, &firmware[tlen], dlen);
-			else
-				mwifiex_get_fw_data(adapter, tlen, dlen,
-						    (u8 *)fwdata->data);
+			memcpy(fwdata->data, &firmware[tlen], dlen);
 
 			fwdata->seq_num = cpu_to_le32(fw_seqnum);
 			tlen += dlen;
-- 
1.8.0

^ permalink raw reply related

* Re: [PATCH 4/4] net/smsc911x: Provide common clock functionality
From: Linus Walleij @ 2012-12-20 19:12 UTC (permalink / raw)
  To: Lee Jones
  Cc: linux-arm-kernel, linux-kernel, arnd, linus.walleij,
	Steve Glendinning, netdev, Robert Marklund
In-Reply-To: <1355937587-31730-4-git-send-email-lee.jones@linaro.org>

On Wed, Dec 19, 2012 at 6:19 PM, Lee Jones <lee.jones@linaro.org> wrote:

> Some platforms provide clocks which require enabling before the
> SMSC911x chip will power on. This patch uses the new common clk
> framework to do just that. If no clock is provided, it will just
> be ignored and the driver will continue to assume that no clock
> is required for the chip to run successfully.
>
> Cc: Steve Glendinning <steve.glendinning@shawell.net>
> Cc: netdev@vger.kernel.org
> Signed-off-by: Lee Jones <lee.jones@linaro.org>

Seems to me like it'll do the trick.
Acked-by: Linus Walleij <linus.walleij@linaro.org>

Yours,
Linus Walleij

^ permalink raw reply

* [PATCH] bnx2x: use ARRAY_SIZE where possible
From: Sasha Levin @ 2012-12-20 19:11 UTC (permalink / raw)
  To: Eilon Greenstein, netdev, linux-kernel; +Cc: Sasha Levin
In-Reply-To: <1356030701-16284-1-git-send-email-sasha.levin@oracle.com>

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
index 09096b4..cb41f54 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
@@ -3659,7 +3659,7 @@ static void bnx2x_warpcore_enable_AN_KR2(struct bnx2x_phy *phy,
 	bnx2x_cl45_read_or_write(bp, phy, MDIO_WC_DEVAD,
 				 MDIO_WC_REG_CL49_USERB0_CTRL, (3<<6));
 
-	for (i = 0; i < sizeof(reg_set)/sizeof(struct bnx2x_reg_set); i++)
+	for (i = 0; i < ARRAY_SIZE(reg_set); i++)
 		bnx2x_cl45_write(bp, phy, reg_set[i].devad, reg_set[i].reg,
 				 reg_set[i].val);
 
@@ -3713,7 +3713,7 @@ static void bnx2x_warpcore_enable_AN_KR(struct bnx2x_phy *phy,
 	};
 	DP(NETIF_MSG_LINK, "Enable Auto Negotiation for KR\n");
 	/* Set to default registers that may be overriden by 10G force */
-	for (i = 0; i < sizeof(reg_set)/sizeof(struct bnx2x_reg_set); i++)
+	for (i = 0; i < ARRAY_SIZE(reg_set); i++)
 		bnx2x_cl45_write(bp, phy, reg_set[i].devad, reg_set[i].reg,
 				 reg_set[i].val);
 
@@ -3854,7 +3854,7 @@ static void bnx2x_warpcore_set_10G_KR(struct bnx2x_phy *phy,
 		{MDIO_PMA_DEVAD, MDIO_WC_REG_PMD_KR_CONTROL, 0x2}
 	};
 
-	for (i = 0; i < sizeof(reg_set)/sizeof(struct bnx2x_reg_set); i++)
+	for (i = 0; i < ARRAY_SIZE(reg_set); i++)
 		bnx2x_cl45_write(bp, phy, reg_set[i].devad, reg_set[i].reg,
 				 reg_set[i].val);
 
@@ -4242,7 +4242,7 @@ static void bnx2x_warpcore_clear_regs(struct bnx2x_phy *phy,
 	bnx2x_cl45_read_or_write(bp, phy, MDIO_WC_DEVAD,
 				 MDIO_WC_REG_RX66_CONTROL, (3<<13));
 
-	for (i = 0; i < sizeof(wc_regs)/sizeof(struct bnx2x_reg_set); i++)
+	for (i = 0; i < ARRAY_SIZE(wc_regs); i++)
 		bnx2x_cl45_write(bp, phy, wc_regs[i].devad, wc_regs[i].reg,
 				 wc_regs[i].val);
 
@@ -9520,7 +9520,7 @@ static void bnx2x_save_848xx_spirom_version(struct bnx2x_phy *phy,
 	} else {
 		/* For 32-bit registers in 848xx, access via MDIO2ARM i/f. */
 		/* (1) set reg 0xc200_0014(SPI_BRIDGE_CTRL_2) to 0x03000000 */
-		for (i = 0; i < sizeof(reg_set)/sizeof(struct bnx2x_reg_set);
+		for (i = 0; i < ARRAY_SIZE(reg_set);
 		      i++)
 			bnx2x_cl45_write(bp, phy, reg_set[i].devad,
 					 reg_set[i].reg, reg_set[i].val);
@@ -9592,7 +9592,7 @@ static void bnx2x_848xx_set_led(struct bnx2x *bp,
 			 MDIO_PMA_DEVAD,
 			 MDIO_PMA_REG_8481_LINK_SIGNAL, val);
 
-	for (i = 0; i < sizeof(reg_set)/sizeof(struct bnx2x_reg_set); i++)
+	for (i = 0; i < ARRAY_SIZE(reg_set); i++)
 		bnx2x_cl45_write(bp, phy, reg_set[i].devad, reg_set[i].reg,
 				 reg_set[i].val);
 
@@ -13395,7 +13395,7 @@ static void bnx2x_disable_kr2(struct link_params *params,
 	};
 	DP(NETIF_MSG_LINK, "Disabling 20G-KR2\n");
 
-	for (i = 0; i < sizeof(reg_set)/sizeof(struct bnx2x_reg_set); i++)
+	for (i = 0; i < ARRAY_SIZE(reg_set); i++)
 		bnx2x_cl45_write(bp, phy, reg_set[i].devad, reg_set[i].reg,
 				 reg_set[i].val);
 	vars->link_attr_sync &= ~LINK_ATTR_SYNC_KR2_ENABLE;
-- 
1.8.0

^ permalink raw reply related

* Re: [PATCH 3/3] iproute2: make `bridge mdb` output consistent with input
From: Stephen Hemminger @ 2012-12-20 18:58 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, bridge
In-Reply-To: <1356013915-20835-3-git-send-email-amwang@redhat.com>

On Thu, 20 Dec 2012 22:31:55 +0800
Cong Wang <amwang@redhat.com> wrote:

> bridge -> dev
> group -> grp
> 

All three patches accepted for next version of iproute2.

^ permalink raw reply

* Re: Lockdep warning in vxlan
From: Stephen Hemminger @ 2012-12-20 18:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Yan Burman, netdev
In-Reply-To: <1356027360.21834.2973.camel@edumazet-glaptop>

On Thu, 20 Dec 2012 10:16:00 -0800
Eric Dumazet <erdnetdev@gmail.com> wrote:

> On Thu, 2012-12-20 at 08:34 -0800, Stephen Hemminger wrote:
> > On Thu, 20 Dec 2012 16:00:32 +0200
> > Yan Burman <yanb@mellanox.com> wrote:
> >   
> > > Hi.
> > > 
> > > When working with vxlan from current net-next, I got a lockdep warning 
> > > (below).
> > > It seems to happen when I have host B pinging host A and while the pings 
> > > continue,
> > > I do "ip link del" on the vxlan interface on host A. The lockdep warning 
> > > is on host A.
> > > Tell me if you need some more info.
> > >   
> > 
> > Looks like the case of nested ARP requests, the initial request is coming
> > from neigh_timer (ARP retransmit), but inside neigh_probe the lock
> > is dropped?  
> 
> Bug is from arp_solicit(), releasing the lock after arp_send()
> 
> Its used to protect neigh->ha
> 
> We could instead copy neigh->ha, without taking n->lock but ha_lock
> seqlock, using neigh_ha_snapshot() helper
> 
> Yan, could you test the following patch ?
> 
> Thanks
> diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
> index ce6fbdf..1169ed4 100644
> --- a/net/ipv4/arp.c
> +++ b/net/ipv4/arp.c
> @@ -321,7 +321,7 @@ static void arp_error_report(struct neighbour *neigh, struct sk_buff *skb)
>  static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb)
>  {
>  	__be32 saddr = 0;
> -	u8  *dst_ha = NULL;
> +	u8 dst_ha[MAX_ADDR_LEN];
>  	struct net_device *dev = neigh->dev;
>  	__be32 target = *(__be32 *)neigh->primary_key;
>  	int probes = atomic_read(&neigh->probes);
> @@ -363,9 +363,9 @@ static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb)
>  	if (probes < 0) {
>  		if (!(neigh->nud_state & NUD_VALID))
>  			pr_debug("trying to ucast probe in NUD_INVALID\n");
> -		dst_ha = neigh->ha;
> -		read_lock_bh(&neigh->lock);
> +		neigh_ha_snapshot(dst_ha, neigh, dev);
>  	} else {
> +		memset(dst_ha, 0, dev->addr_len);
>  		probes -= neigh->parms->app_probes;
>  		if (probes < 0) {
>  #ifdef CONFIG_ARPD
> @@ -377,8 +377,6 @@ static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb)
>  
>  	arp_send(ARPOP_REQUEST, ETH_P_ARP, target, dev, saddr,
>  		 dst_ha, dev->dev_addr, NULL);
> -	if (dst_ha)
> -		read_unlock_bh(&neigh->lock);
>  }
>  
>  static int arp_ignore(struct in_device *in_dev, __be32 sip, __be32 tip)

I like this. Getting rid of yet another read lock

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox