Netdev List


* Re: [PATCH net-next 1/2] tg3: Fix NETIF_F_LOOPBACK error
From: Matt Carlson @ 2011-05-20  1:59 UTC
  To: Mahesh Bandewar; +Cc: Matthew Carlson, David Miller, linux-netdev
In-Reply-To: <BANLkTi==K_eTcqQ39HwcBcg9AyuMLuhz6w@mail.gmail.com>

On Thu, May 19, 2011 at 06:15:18PM -0700, Mahesh Bandewar wrote:
> On Thu, May 19, 2011 at 6:11 PM, Matt Carlson <mcarlson@broadcom.com> wrote:
> > Mahesh Bandewar noticed that the features cleanup in commit
> > 0da0606f493c5cdab74bdcc96b12f4305ad94085, entitled
> > "tg3: Consolidate all netdev feature assignments", mistakenly sets
> > NETIF_F_LOOPBACK by default.  This patch corrects the error.
> >
> > Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
> > ---
> >  drivers/net/tg3.c |    3 ++-
> >  1 files changed, 2 insertions(+), 1 deletions(-)
> >
> > diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
> > index 012ce70..0b78c5d 100644
> > --- a/drivers/net/tg3.c
> > +++ b/drivers/net/tg3.c
> > @@ -15080,6 +15080,8 @@ static int __devinit tg3_init_one(struct pci_dev *pdev,
> >                        features |= NETIF_F_TSO_ECN;
> >        }
> >
> > +       dev->features |= features;
> > +
> >        /*
> >         * Add loopback capability only for a subset of devices that support
> >         * MAC-LOOPBACK. Eventually this need to be enhanced to allow INT-PHY
> > @@ -15090,7 +15092,6 @@ static int __devinit tg3_init_one(struct pci_dev *pdev,
> >                /* Add the loopback capability */
> >                features |= NETIF_F_LOOPBACK;
> >
> > -       dev->features |= features;
> >        dev->hw_features |= features;
> >        dev->vlan_features |= features;
> I think this line should go up too. Otherwise newly created vlan
> device(s) will have spurious loopback bit set.

Yes.  You are right.  I thought vlan_features functioned like
hw_features.



* [PATCH net-next v2 1/2] tg3: Fix NETIF_F_LOOPBACK error
From: Matt Carlson @ 2011-05-20  2:02 UTC
  To: davem; +Cc: netdev, mcarlson, Mahesh Bandewar

Mahesh Bandewar noticed that the features cleanup in commit
0da0606f493c5cdab74bdcc96b12f4305ad94085, entitled
"tg3: Consolidate all netdev feature assignments", mistakenly sets
NETIF_F_LOOPBACK by default.  This patch corrects the error.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
 drivers/net/tg3.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 012ce70..284e998 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -15080,6 +15080,9 @@ static int __devinit tg3_init_one(struct pci_dev *pdev,
 			features |= NETIF_F_TSO_ECN;
 	}
 
+	dev->features |= features;
+	dev->vlan_features |= features;
+
 	/*
 	 * Add loopback capability only for a subset of devices that support
 	 * MAC-LOOPBACK. Eventually this need to be enhanced to allow INT-PHY
@@ -15090,9 +15093,7 @@ static int __devinit tg3_init_one(struct pci_dev *pdev,
 		/* Add the loopback capability */
 		features |= NETIF_F_LOOPBACK;
 
-	dev->features |= features;
 	dev->hw_features |= features;
-	dev->vlan_features |= features;
 
 	if (tp->pci_chip_rev_id == CHIPREV_ID_5705_A1 &&
 	    !tg3_flag(tp, TSO_CAPABLE) &&
-- 
1.7.3.4
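
The ordering that the v2 patch establishes can be sketched in user space. This is only an illustration under assumptions: `struct fake_netdev`, `assign_features()`, and the flag values below are hypothetical stand-ins, not the kernel's `struct net_device` or its real `NETIF_F_*` bit values. The point is that the default and VLAN feature sets are copied before the loopback bit is added, so only the user-toggleable set carries it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration; not the kernel's NETIF_F_* values. */
#define F_TSO_ECN  (1u << 0)
#define F_LOOPBACK (1u << 1)

/* Hypothetical stand-in for the three feature masks touched by the patch. */
struct fake_netdev {
	uint32_t features;      /* enabled by default */
	uint32_t hw_features;   /* may be toggled by the user */
	uint32_t vlan_features; /* inherited by VLAN devices */
};

static void assign_features(struct fake_netdev *dev, uint32_t features,
			    bool mac_loopback_ok)
{
	/* Snapshot the defaults before the loopback bit is added. */
	dev->features |= features;
	dev->vlan_features |= features;

	/* Loopback is only advertised as a capability, never on by default. */
	if (mac_loopback_ok)
		features |= F_LOOPBACK;

	dev->hw_features |= features;
}
```

With this ordering, a device that supports MAC loopback ends up with the loopback bit in `hw_features` only, which matches the intent stated in the review comment about newly created VLAN devices.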




* [PATCH net-next v2 0/2] tg3: Quickfixes
From: Matt Carlson @ 2011-05-20  2:02 UTC
  To: davem; +Cc: netdev, mcarlson

This patchset applies some quickfixes to the previous patchset.




* [PATCH net-next v2 2/2] tg3: Add braces around 5906 workaround.
From: Matt Carlson @ 2011-05-20  2:02 UTC
  To: davem; +Cc: netdev, mcarlson

Commit dabc5c670d3f86d15ee4f42ab38ec5bd2682487d, entitled
"tg3: Move TSO_CAPABLE assignment", moved some TSO flagging code around.
In the process it failed to add braces around an exceptional 5906
condition.  This patch fixes the problem.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
---
 drivers/net/tg3.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 284e998..db19332 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -13707,9 +13707,11 @@ static int __devinit tg3_get_invariants(struct tg3 *tp)
 				     tp->pcie_cap + PCI_EXP_LNKCTL,
 				     &lnkctl);
 		if (lnkctl & PCI_EXP_LNKCTL_CLKREQ_EN) {
-			if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5906)
+			if (GET_ASIC_REV(tp->pci_chip_rev_id) ==
+			    ASIC_REV_5906) {
 				tg3_flag_clear(tp, HW_TSO_2);
 				tg3_flag_clear(tp, TSO_CAPABLE);
+			}
 			if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5784 ||
 			    GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5761 ||
 			    tp->pci_chip_rev_id == CHIPREV_ID_57780_A0 ||
-- 
1.7.3.4
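
The class of bug fixed here can be reproduced in a few lines of plain C. The sketch below is an assumption-laden illustration, not tg3 code: `struct tso_flags` and both helpers are hypothetical. Without braces, only the first statement belongs to the `if`, so the second flag is cleared on every chip.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two tg3 flags involved. */
struct tso_flags {
	bool hw_tso_2;
	bool tso_capable;
};

/* Buggy shape: only the first statement is guarded by the condition. */
static void clear_unbraced(struct tso_flags *f, bool is_5906)
{
	if (is_5906)
		f->hw_tso_2 = false;
	f->tso_capable = false; /* runs for every chip */
}

/* Fixed shape: both statements are guarded. */
static void clear_braced(struct tso_flags *f, bool is_5906)
{
	if (is_5906) {
		f->hw_tso_2 = false;
		f->tso_capable = false;
	}
}
```

Calling `clear_unbraced()` for a chip that is not a 5906 still clears `tso_capable`, while `clear_braced()` leaves both flags alone.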




* RE: packet received in a wrong rx-queue?
From: Jon Zhou @ 2011-05-20  1:48 UTC
  To: David Miller; +Cc: e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org
In-Reply-To: <20110518.233614.1134947771941589398.davem@davemloft.net>

> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Thursday, May 19, 2011 11:36 AM
> To: Jon Zhou
> Cc: e1000-devel@lists.sourceforge.net; netdev@vger.kernel.org
> Subject: Re: packet received in a wrong rx-queue?
> 
> From: Jon Zhou <Jon.Zhou@jdsu.com>
> Date: Wed, 18 May 2011 20:22:07 -0700
> 
> > From the 82599 datasheet, the hash algorithm consists of src/dst
> > IP, src/dst port, and protocol. Why did it get a different hash
> > value for the same IP/port pair?
> 
> The same reason why feeding different sets of discrete 32-bit and
> 16-bit values to a cryptographic hash results in a different final
> hash value.
> 
> Our software RPS/RFS implementation used to have this quality too.

Even if they get different hash values, shouldn't both packets be
delivered to the same rx-queue, since they belong to the same flow?


* Re: [PATCH] tcp: Expose the initial RTO via a new sysctl.
From: H.K. Jerry Chu @ 2011-05-20  2:01 UTC
  To: tsuna
  Cc: David Miller, kuznet, pekkas, jmorris, yoshfuji, kaber, hkchu,
	netdev, linux-kernel
In-Reply-To: <BANLkTik-6wh8=jb6oMEpJvYC8+KTGGsMsw@mail.gmail.com>

On Wed, May 18, 2011 at 12:40 PM, tsuna <tsunanet@gmail.com> wrote:
> On Wed, May 18, 2011 at 12:26 PM, David Miller <davem@davemloft.net> wrote:
>> If you read the ietf draft that reduces the initial RTO down to 1
>> second, it states that if we take a timeout during the initial
>> connection handshake then we have to revert the RTO back up to 3
>> seconds.
>>
>> This fallback logic conflicts with being able to only change the
>> initial RTO via sysctl, I think.  Because there are actually two
>> values at stake and they depend upon each other, the initial RTO and
>> the value we fallback to on initial handshake retransmissions.
>>
>> So I'd rather get a patch that implements the 1 second initial
>> RTO with the 3 second fallback on SYN retransmit, than this patch.
>>
>> We already have too many knobs.
>
> I was hoping this knob would be accepted because this is such an
> important issue that it even warrants an IETF draft to attempt to
> change the standard.  I'm not sure how long it will take for this
> draft to be accepted and then implemented, so I thought adding this
> simple knob today would really help in the future.

As one of the co-authors of rfc2988bis I was planning to provide a patch
as soon as the draft gets approved but it looks like you have beaten
me to it :)

Personally I'm in favor of a knob too. We at Google have had such a
knob for years.

Jerry

>
> Plus, should the draft be accepted, this knob will still be just as
> useful (e.g. to revert back to today's behavior), and people might
> want to consider adding another knob for the fallback initRTO (this is
> debatable).  I don't believe this knob conflicts with the proposed
> change to the standard, it actually goes along with it pretty well and
> helps us prepare better for this upcoming change.
>
> I agree that there are too many knobs, and I hate feature creep too,
> but I've found many of these knobs to be really useful, and the degree
> to which Linux's TCP stack can be tuned is part of what makes it so
> versatile.
>
> --
> Benoit "tsuna" Sigoure
> Software Engineer @ www.StumbleUpon.com


* Re: [Bridge] [Patch] bridge: call NETDEV_ENSLAVE notifiers when adding a slave
From: Cong Wang @ 2011-05-20  3:06 UTC
  To: Stephen Hemminger
  Cc: Neil Horman, bridge, netdev, Jay Vosburgh, linux-kernel, akpm,
	David S. Miller
In-Reply-To: <20110519090411.1f039a88@nehalam>

On 2011-05-20 00:04, Stephen Hemminger wrote:
> On Thu, 19 May 2011 08:12:13 -0700
> Stephen Hemminger<shemminger@linux-foundation.org>  wrote:
>
>> On Thu, 19 May 2011 18:24:17 +0800
>> Amerigo Wang<amwang@redhat.com>  wrote:
>>
>>> In the previous patch I added NETDEV_ENSLAVE, now
>>> we can notify netconsole when adding a device to a bridge too.
>>>
>>> By the way, s/netdev_bonding_change/call_netdevice_notifiers/ in
>>> bond_main.c, since this is not bonding specific.
>>>
>>> Signed-off-by: WANG Cong<amwang@redhat.com>
>>> Cc: Neil Horman<nhorman@redhat.com>
>>>
>>
>> Is there a usage for this? What listens for this notification?
>
> Never mind it was in the first patch which you did not send.
> You should always put a number on group of patches and send
> to all parties.

Ah, sorry, my script simply runs get_maintainer.pl to
get the Cc list, so the bridge list was not included in
the first patch.

>
> Also, sending networking patches to LKML is a waste of bandwidth
> please don't bother.

Ok, will fix my script.

Thanks.


* Re: [V2 Patch net-next-2.6] netpoll: disable netpoll when enslave a device
From: Cong Wang @ 2011-05-20  3:10 UTC
  To: Neil Horman
  Cc: Andy Gospodarek, linux-kernel, akpm, Jay Vosburgh,
	David S. Miller, Ian Campbell, Paul E. McKenney, Josh Triplett,
	netdev
In-Reply-To: <20110519132533.GA6729@shamino.rdu.redhat.com>

On 2011-05-19 21:25, Neil Horman wrote:
> On Thu, May 19, 2011 at 07:31:27AM -0400, Andy Gospodarek wrote:
>> On Thu, May 19, 2011 at 04:39:53PM +0800, Amerigo Wang wrote:
>> [...]
>>> diff --git a/include/linux/notifier.h b/include/linux/notifier.h
>>> index 621dfa1..3d82867 100644
>>> --- a/include/linux/notifier.h
>>> +++ b/include/linux/notifier.h
>>> @@ -211,6 +211,7 @@ static inline int notifier_to_errno(int ret)
>>>   #define NETDEV_UNREGISTER_BATCH 0x0011
>>>   #define NETDEV_BONDING_DESLAVE  0x0012
>>>   #define NETDEV_NOTIFY_PEERS	0x0013
>>> +#define NETDEV_ENSLAVE		0x0014
>>>
>>>   #define SYS_DOWN	0x0001	/* Notify of system down */
>>>   #define SYS_RESTART	SYS_DOWN
>>
>> Neil just noted the same concern I had -- the asymmetry between
>> NETDEV_ENSLAVE and NETDEV_BONDING_DESLAVE bothers me a bit.  I also
>> don't really like the followup patch that uses 'ENSLAVE' in the bridging
>> code when we typically use that language for bonding only.
>>
>> What about changing NETDEV_BONDING_DESLAVE to NETDEV_RELEASE and create
>> NETDEV_JOIN instead of NETDEV_ENSLAVE?  I would prefer that or something
>> else that might use more generic language that could be applied to all
>> for stacked interfaces.
> JOIN and RELEASE (or perhaps LEAVE) sounds good to me.

Thanks, Andy and Neil! I will rename them to JOIN and RELEASE.


* Re: packet received in a wrong rx-queue?
From: David Miller @ 2011-05-20  3:12 UTC
  To: Jon.Zhou; +Cc: e1000-devel, netdev
In-Reply-To: <4A6A2125329CFD4D8CC40C9E8ABCAB9F250D8853B7@MILEXCH2.ds.jdsu.net>

From: Jon Zhou <Jon.Zhou@jdsu.com>
Date: Thu, 19 May 2011 18:48:02 -0700

> Even if they get different hash values, shouldn't both packets be
> delivered to the same rx-queue, since they belong to the same flow?

That's not how it works.  They go to different queues, because the bits
are different for the flow keys in each direction.

I think we've brought this discussion to a conclusion, and that this
behavior is intentional.

Thank you.
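
The direction dependence described above can be sketched with a Toeplitz hash, the function commonly used for receive-side scaling. This is an illustration under assumptions, not the 82599's or the ixgbe driver's actual configuration: the key is the widely published Microsoft RSS verification key, the 12-byte input layout (source IP, destination IP, source port, destination port) follows the usual IPv4/TCP RSS convention, and `flow_hash()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed key: the commonly published Microsoft RSS verification key. */
static const uint8_t rss_key[40] = {
	0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
	0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
	0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
	0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
	0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/*
 * Toeplitz hash: for every set input bit, XOR in the 32-bit window of
 * the key that starts at that bit position.  Returns 0 if the key is
 * too short for the input.
 */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
			      const uint8_t *data, size_t data_len)
{
	uint32_t hash = 0;
	uint32_t window;
	size_t i;
	int bit;

	if (key_len < data_len + 4)
		return 0;

	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
		 ((uint32_t)key[2] << 8) | (uint32_t)key[3];

	for (i = 0; i < data_len; i++) {
		for (bit = 7; bit >= 0; bit--) {
			if (data[i] & (1u << bit))
				hash ^= window;
			/* Slide the window one key bit to the right. */
			window <<= 1;
			if (key[i + 4] & (1u << bit))
				window |= 1;
		}
	}
	return hash;
}

/* Hypothetical helper: hash an IPv4 4-tuple given in host byte order. */
static uint32_t flow_hash(uint32_t saddr, uint32_t daddr,
			  uint16_t sport, uint16_t dport)
{
	const uint8_t in[12] = {
		(uint8_t)(saddr >> 24), (uint8_t)(saddr >> 16),
		(uint8_t)(saddr >> 8), (uint8_t)saddr,
		(uint8_t)(daddr >> 24), (uint8_t)(daddr >> 16),
		(uint8_t)(daddr >> 8), (uint8_t)daddr,
		(uint8_t)(sport >> 8), (uint8_t)sport,
		(uint8_t)(dport >> 8), (uint8_t)dport,
	};

	return toeplitz_hash(rss_key, sizeof(rss_key), in, sizeof(in));
}
```

Hashing 10.0.0.1:1024 -> 10.0.0.2:1025 and the reverse tuple 10.0.0.2:1025 -> 10.0.0.1:1024 with this key gives two different values, because swapping source and destination moves the set bits to different key positions. A queue chosen from the low bits of the hash can therefore differ per direction.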



* Re: TCP funny-ness when over-driving a 1Gbps link.
From: Ben Greear @ 2011-05-20  3:39 UTC
  To: rick.jones2; +Cc: Stephen Hemminger, netdev
In-Reply-To: <1305852377.8149.1133.camel@tardy>

On 05/19/2011 05:46 PM, Rick Jones wrote:
> On Thu, 2011-05-19 at 17:37 -0700, Ben Greear wrote:
>> On 05/19/2011 05:24 PM, Rick Jones wrote:
>>>>>> [root@i7-965-1 igb]# netstat -an|grep tcp|grep 8.1.1
>>>>>> tcp        0      0 8.1.1.1:33038               0.0.0.0:*                   LISTEN
>>>>>> tcp        0      0 8.1.1.1:33040               0.0.0.0:*                   LISTEN
>>>>>> tcp        0      0 8.1.1.1:33042               0.0.0.0:*                   LISTEN
>>>>>> tcp        0 9328612 8.1.1.2:33039               8.1.1.1:33040               ESTABLISHED
>>>>>> tcp        0 17083176 8.1.1.1:33038               8.1.1.2:33037               ESTABLISHED
>>>>>> tcp        0 9437340 8.1.1.2:33037               8.1.1.1:33038               ESTABLISHED
>>>>>> tcp        0 17024620 8.1.1.1:33040               8.1.1.2:33039               ESTABLISHED
>>>>>> tcp        0 19557040 8.1.1.1:33042               8.1.1.2:33041               ESTABLISHED
>>>>>> tcp        0 9416600 8.1.1.2:33041               8.1.1.1:33042               ESTABLISHED
>>>>>
>>>>> I take it your system has higher values for the tcp_wmem value:
>>>>>
>>>>> net.ipv4.tcp_wmem = 4096 16384 4194304
>>>>
>>>> Yes:
>>>> [root@i7-965-1 igb]# cat /proc/sys/net/ipv4/tcp_wmem
>>>> 4096	16384	50000000
>>>
>>> Why?!?  Are you trying to get link-rate to Mars or something?  (I assume
>>> tcp_rmem is similarly set...)  If you are indeed doing one 1 GbE, and no
>>> more than 100ms then the default (?) of 4194304 should have been more
>>> than sufficient.
>>
>> Well, we occasionally do tests over emulated links that have several
>> seconds of delay and may be running multiple Gbps.  Either way,
>> I'd hope that offering extra RAM to a subsystem wouldn't cause it
>> to go nuts.
>
> It has been my experience that the autotuning tends to grow things
> beyond the bandwidthXdelay product.

Seems a likely culprit, or somehow it's not detecting round-trip-time
correctly, or maybe the timestamp is calculated when the pkt goes into
the send queue, and not when it's actually sent to the NIC?

>
> As for several seconds of delay and multiple Gbps - unless you are
> shooting the Moon, sounds like bufferbloat?-)

We try to test our stuff in all sorts of strange cases.  Maybe
some users really are emulating lunar traffic, or even beyond.
We also can emulate buffer bloat..but in this particular case,
real round-trip time is about 1-2ms, so if the socket is queuing up
a second's worth of bytes on the xmit buffer, then it's not
the network's fault...it's the sender.

>> Assuming this isn't some magical 1Gbps issue, you
>> could probably hit the same problem with a wifi link and
>> default tcp_wmem settings...
>
> Do you also increase tx queue's for the NIC(s)?

No, they are at the default (1000, I think).  That's only
a few ms at 1Gbps speed, so the problem is mostly higher
in the stack.

Thanks,
Ben

>
> rick


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
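
The arithmetic behind the bandwidth-delay argument in this thread can be sketched as follows. The helper names are hypothetical, and the 1500-byte frame size used for the tx queue estimate is an assumption, not something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Bandwidth-delay product in bytes for a link rate (bits/s) and RTT (us). */
static uint64_t bdp_bytes(uint64_t rate_bps, uint64_t rtt_us)
{
	return rate_bps / 8 * rtt_us / 1000000;
}

/* Time in microseconds to drain a queue of the given size at line rate. */
static uint64_t drain_time_us(uint64_t queued_bytes, uint64_t rate_bps)
{
	return queued_bytes * 8 * 1000000 / rate_bps;
}
```

At 1 Gbps with a 2 ms round trip, `bdp_bytes()` gives 250,000 bytes. The 17,083,176-byte send queue from the netstat output would take about 137 ms to drain at line rate, and a tx queue of 1000 frames of 1500 bytes each would take about 12 ms under that frame-size assumption.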


* linux-next: build failure after merge of the tip tree (net tree interaction)
From: Stephen Rothwell @ 2011-05-20  4:14 UTC
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Peter Zijlstra
  Cc: linux-next, linux-kernel, Jacek Luczak, David Miller, netdev,
	Lai Jiangshan, Paul E. McKenney

Hi all,

After merging the tip tree, today's linux-next build (x86_64 allmodconfig)
failed like this:

net/sctp/bind_addr.c: In function 'sctp_bind_addr_clean':
net/sctp/bind_addr.c:148: error: 'sctp_local_addr_free' undeclared (first use in this function)

Caused by commit 1231f0baa547 ("net,rcu: convert call_rcu
(sctp_local_addr_free) to kfree_rcu()") interacting with commit
c182f90bc1f2 ("SCTP: fix race between sctp_bind_addr_free() and
sctp_bind_addr_conflict()") from the net tree.

I applied the following patch as a merge fix:

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Fri, 20 May 2011 14:11:11 +1000
Subject: [PATCH] net,rcu: convert another call to call_rcu(sctp_local_addr_free)

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
 net/sctp/bind_addr.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
index 6338413..83e3011 100644
--- a/net/sctp/bind_addr.c
+++ b/net/sctp/bind_addr.c
@@ -145,7 +145,7 @@ static void sctp_bind_addr_clean(struct sctp_bind_addr *bp)
 	/* Empty the bind address list. */
 	list_for_each_entry_safe(addr, temp, &bp->address_list, list) {
 		list_del_rcu(&addr->list);
-		call_rcu(&addr->rcu, sctp_local_addr_free);
+		kfree_rcu(addr, rcu);
 		SCTP_DBG_OBJCNT_DEC(addr);
 	}
 }
-- 
1.7.5.1


-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/


* Re: [PATCH net-next-2.6] macvlan: remove one synchronize_rcu() call
From: Eric Dumazet @ 2011-05-20  4:28 UTC
  To: Ben Greear; +Cc: David Miller, Patrick McHardy, netdev
In-Reply-To: <4DD5B36C.2020407@candelatech.com>

On Thursday, 19 May 2011 at 17:18 -0700, Ben Greear wrote:

> I applied this to today's wireless-testing kernel.  There is a consistent
> speedup in deleting mac-vlans!  I wouldn't read much into changes in
> creating macvlans or adding IPs..those numbers just jump around a bit
> from run to run.
> 
> Before the patch:
> Deleted 500 macvlan in 25.424282 seconds. (0.050848564 per interface)
> 
> 
> After the patch:
> 
> Deleted 500 macvlan in 21.831413 seconds. (0.043662826 per interface)
> 

Thanks for testing !

My ultimate goal would be to reduce vlan/macvlan delete latency from 50
ms to less than 5 ms. Step by step...

(But this will wait after this merge window, since next changes are a
bit more complex)

Stay tuned ;)





* Re: [PATCH net-next v2 0/2] tg3: Quickfixes
From: David Miller @ 2011-05-20  4:33 UTC
  To: mcarlson; +Cc: netdev
In-Reply-To: <1305856964-2291-1-git-send-email-mcarlson@broadcom.com>

From: "Matt Carlson" <mcarlson@broadcom.com>
Date: Thu, 19 May 2011 19:02:42 -0700

> This patchset applies some quickfixes to the previous patchset.

Both applied, thanks Matt.


* Re: [PATCH net-next-2.6] macvlan: remove one synchronize_rcu() call
From: David Miller @ 2011-05-20  4:33 UTC
  To: eric.dumazet; +Cc: kaber, netdev, greearb
In-Reply-To: <1305843856.3156.36.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 20 May 2011 00:24:16 +0200

> When one macvlan device is dismantled, we can avoid one
> synchronize_rcu() call done after deletion from hash list, since caller
> will perform a synchronize_net() call after its ndo_stop() call.
> 
> Add a new netdev->dismantle field to signal this dismantle intent.
> 
> Reduces RTNL hold time.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.


* [PATCH v3 resend] netfilter: nf_conntrack_sip: Handle Cisco 7941/7945 IP phones
From: Kevin Cernekee @ 2011-05-20  4:36 UTC
  To: Patrick McHardy, David S. Miller
  Cc: Eric Dumazet, netfilter-devel, netfilter, coreteam, linux-kernel,
	netdev

Most SIP devices use a source port of 5060/udp on SIP requests, so the
response automatically comes back to port 5060:

phone_ip:5060 -> proxy_ip:5060   REGISTER
proxy_ip:5060 -> phone_ip:5060   100 Trying

The newer Cisco IP phones, however, use a randomly chosen high source
port for the SIP request but expect the response on port 5060:

phone_ip:49173 -> proxy_ip:5060  REGISTER
proxy_ip:5060 -> phone_ip:5060   100 Trying

Standard Linux NAT, with or without nf_nat_sip, will send the reply back
to port 49173, not 5060:

phone_ip:49173 -> proxy_ip:5060  REGISTER
proxy_ip:5060 -> phone_ip:49173  100 Trying

But the phone is not listening on 49173, so it will never see the reply.

This patch modifies nf_*_sip to work around this quirk by extracting
the SIP response port from the Via: header, iff the source IP in the
packet header matches the source IP in the SIP request.

Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
---
 include/linux/netfilter/nf_conntrack_sip.h |    3 +++
 net/ipv4/netfilter/nf_nat_sip.c            |   26 +++++++++++++++++++++++---
 net/netfilter/nf_conntrack_sip.c           |   17 +++++++++++++++++
 3 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/include/linux/netfilter/nf_conntrack_sip.h b/include/linux/netfilter/nf_conntrack_sip.h
index 0ce91d5..feda699 100644
--- a/include/linux/netfilter/nf_conntrack_sip.h
+++ b/include/linux/netfilter/nf_conntrack_sip.h
@@ -2,12 +2,15 @@
 #define __NF_CONNTRACK_SIP_H__
 #ifdef __KERNEL__
 
+#include <linux/types.h>
+
 #define SIP_PORT	5060
 #define SIP_TIMEOUT	3600
 
 struct nf_ct_sip_master {
 	unsigned int	register_cseq;
 	unsigned int	invite_cseq;
+	__be16		forced_dport;
 };
 
 enum sip_expectation_classes {
diff --git a/net/ipv4/netfilter/nf_nat_sip.c b/net/ipv4/netfilter/nf_nat_sip.c
index e40cf78..e5856b0 100644
--- a/net/ipv4/netfilter/nf_nat_sip.c
+++ b/net/ipv4/netfilter/nf_nat_sip.c
@@ -73,6 +73,7 @@ static int map_addr(struct sk_buff *skb, unsigned int dataoff,
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
 	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
+	struct nf_conn_help *help = nfct_help(ct);
 	char buffer[sizeof("nnn.nnn.nnn.nnn:nnnnn")];
 	unsigned int buflen;
 	__be32 newaddr;
@@ -85,7 +86,8 @@ static int map_addr(struct sk_buff *skb, unsigned int dataoff,
 	} else if (ct->tuplehash[dir].tuple.dst.u3.ip == addr->ip &&
 		   ct->tuplehash[dir].tuple.dst.u.udp.port == port) {
 		newaddr = ct->tuplehash[!dir].tuple.src.u3.ip;
-		newport = ct->tuplehash[!dir].tuple.src.u.udp.port;
+		newport = help->help.ct_sip_info.forced_dport ? :
+			  ct->tuplehash[!dir].tuple.src.u.udp.port;
 	} else
 		return 1;
 
@@ -121,6 +123,7 @@ static unsigned int ip_nat_sip(struct sk_buff *skb, unsigned int dataoff,
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
 	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
+	struct nf_conn_help *help = nfct_help(ct);
 	unsigned int coff, matchoff, matchlen;
 	enum sip_header_types hdr;
 	union nf_inet_addr addr;
@@ -229,6 +232,20 @@ next:
 	    !map_sip_addr(skb, dataoff, dptr, datalen, SIP_HDR_TO))
 		return NF_DROP;
 
+	/* Mangle destination port for Cisco phones, then fix up checksums */
+	if (dir == IP_CT_DIR_REPLY && help->help.ct_sip_info.forced_dport) {
+		struct udphdr *uh;
+
+		if (!skb_make_writable(skb, skb->len))
+			return NF_DROP;
+
+		uh = (struct udphdr *)(skb->data + ip_hdrlen(skb));
+		uh->dest = help->help.ct_sip_info.forced_dport;
+
+		if (!nf_nat_mangle_udp_packet(skb, ct, ctinfo, 0, 0, NULL, 0))
+			return NF_DROP;
+	}
+
 	return NF_ACCEPT;
 }
 
@@ -280,8 +297,10 @@ static unsigned int ip_nat_sip_expect(struct sk_buff *skb, unsigned int dataoff,
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
 	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
+	struct nf_conn_help *help = nfct_help(ct);
 	__be32 newip;
 	u_int16_t port;
+	__be16 srcport;
 	char buffer[sizeof("nnn.nnn.nnn.nnn:nnnnn")];
 	unsigned buflen;
 
@@ -294,8 +313,9 @@ static unsigned int ip_nat_sip_expect(struct sk_buff *skb, unsigned int dataoff,
 	/* If the signalling port matches the connection's source port in the
 	 * original direction, try to use the destination port in the opposite
 	 * direction. */
-	if (exp->tuple.dst.u.udp.port ==
-	    ct->tuplehash[dir].tuple.src.u.udp.port)
+	srcport = help->help.ct_sip_info.forced_dport ? :
+		  ct->tuplehash[dir].tuple.src.u.udp.port;
+	if (exp->tuple.dst.u.udp.port == srcport)
 		port = ntohs(ct->tuplehash[!dir].tuple.dst.u.udp.port);
 	else
 		port = ntohs(exp->tuple.dst.u.udp.port);
diff --git a/net/netfilter/nf_conntrack_sip.c b/net/netfilter/nf_conntrack_sip.c
index 237cc19..b0c16b0 100644
--- a/net/netfilter/nf_conntrack_sip.c
+++ b/net/netfilter/nf_conntrack_sip.c
@@ -1363,8 +1363,25 @@ static int process_sip_request(struct sk_buff *skb, unsigned int dataoff,
 {
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
+	struct nf_conn_help *help = nfct_help(ct);
+	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
 	unsigned int matchoff, matchlen;
 	unsigned int cseq, i;
+	union nf_inet_addr addr;
+	__be16 port;
+
+	/* Many Cisco IP phones use a high source port for SIP requests, but
+	 * listen for the response on port 5060.  If we are the local
+	 * router for one of these phones, save the port number from the
+	 * Via: header so that nf_nat_sip can redirect the responses to
+	 * the correct port.
+	 */
+	if (ct_sip_parse_header_uri(ct, *dptr, NULL, *datalen,
+				    SIP_HDR_VIA_UDP, NULL, &matchoff,
+				    &matchlen, &addr, &port) > 0 &&
+	    port != ct->tuplehash[dir].tuple.src.u.udp.port &&
+	    nf_inet_addr_cmp(&addr, &ct->tuplehash[dir].tuple.src.u3))
+		help->help.ct_sip_info.forced_dport = port;
 
 	for (i = 0; i < ARRAY_SIZE(sip_handlers); i++) {
 		const struct sip_handler *handler;
-- 
1.7.5
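
The port-selection rule the patch describes can be sketched outside the kernel. Everything below is a simplified assumption for illustration: `struct sip_flow`, `note_via()`, and `reply_dport()` are hypothetical, and ports are kept in host byte order, whereas the patch stores a `__be16` in `struct nf_ct_sip_master`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-connection state; forced_dport == 0 means "not set". */
struct sip_flow {
	uint32_t src_ip;       /* source IP seen in the packet header */
	uint16_t src_port;     /* source port seen in the packet header */
	uint16_t forced_dport; /* port taken from the Via: header, if any */
};

/*
 * Remember the Via: port only when it differs from the packet's source
 * port and the Via: address matches the packet's source address.
 */
static void note_via(struct sip_flow *flow, uint32_t via_ip, uint16_t via_port)
{
	if (via_port != flow->src_port && via_ip == flow->src_ip)
		flow->forced_dport = via_port;
}

/* Destination port for the reply: the forced port wins when it is set. */
static uint16_t reply_dport(const struct sip_flow *flow)
{
	return flow->forced_dport ? flow->forced_dport : flow->src_port;
}
```

For a phone that sends from port 49173 but advertises 5060 in its Via: header, `reply_dport()` returns 5060. For a device that already sends from 5060, or whose Via: address does not match the packet source, the reply port stays the original source port.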


* [PATCH net-next-2.6] net: remove synchronize_net() from netdev_set_master()
From: Eric Dumazet @ 2011-05-20  5:37 UTC
  To: David Miller; +Cc: netdev, Stephen Hemminger, Jiri Pirko

In the old days, we used to access dev->master in __netif_receive_skb()
in a rcu_read_lock section.

So one synchronize_net() call was needed in netdev_set_master() to make
sure another cpu could not use old master while/after we release it.

We now use netdev_rx_handler infrastructure and added one
synchronize_net() call in bond_release()/bond_release_all()

Remove the obsolete synchronize_net() from netdev_set_master() and add
one in bridge del_nbp() after its netdev_rx_handler_unregister() call.

This makes enslave -d a bit faster.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jiri Pirko <jpirko@redhat.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
---
 net/bridge/br_if.c |    1 +
 net/core/dev.c     |    4 +---
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 5dbdfdf..3e18d14 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -147,6 +147,7 @@ static void del_nbp(struct net_bridge_port *p)
 	dev->priv_flags &= ~IFF_BRIDGE_PORT;
 
 	netdev_rx_handler_unregister(dev);
+	synchronize_net();
 
 	netdev_set_master(dev, NULL);
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 155de20..29b3f7a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4294,10 +4294,8 @@ int netdev_set_master(struct net_device *slave, struct net_device *master)
 
 	slave->master = master;
 
-	if (old) {
-		synchronize_net();
+	if (old)
 		dev_put(old);
-	}
 	return 0;
 }
 EXPORT_SYMBOL(netdev_set_master);




* linux-next: build failure after merge of the final tree
From: Stephen Rothwell @ 2011-05-20  6:18 UTC
  To: Linus
  Cc: linux-next, linux-kernel, David S. Miller, netdev, Andrew Morton,
	Mel Gorman, linux-mm, Alexander Viro, linux-fsdevel,
	Paul E. McKenney, Dipankar Sarma

Hi all,

After merging the final tree, today's linux-next build (sparc32 defconfig)
failed like this:

mm/page_alloc.c: In function '__free_pages_bootmem':
mm/page_alloc.c:704: error: implicit declaration of function 'prefetchw'
fs/dcache.c: In function '__d_lookup_rcu':
fs/dcache.c:1810: error: implicit declaration of function 'prefetch'
fs/inode.c: In function 'new_inode':
fs/inode.c:894: error: implicit declaration of function 'spin_lock_prefetch'
net/core/skbuff.c: In function '__alloc_skb':
net/core/skbuff.c:184: error: implicit declaration of function 'prefetchw'
In file included from net/ipv4/ip_forward.c:32:
include/net/udp.h: In function 'udp_csum_outgoing':
include/net/udp.h:141: error: implicit declaration of function 'prefetch'
In file included from net/ipv6/af_inet6.c:48:
include/net/udp.h: In function 'udp_csum_outgoing':
include/net/udp.h:141: error: implicit declaration of function 'prefetch'
net/unix/af_unix.c: In function 'unix_ioctl':
net/unix/af_unix.c:2066: error: implicit declaration of function 'prefetch'
In file included from net/sunrpc/xprtsock.c:44:
include/net/udp.h: In function 'udp_csum_outgoing':
include/net/udp.h:141: error: implicit declaration of function 'prefetch'
kernel/rcutiny.c: In function 'rcu_process_callbacks':
kernel/rcutiny.c:180: error: implicit declaration of function 'prefetch'

Caused by commit e66eed651fd1 ("list: remove prefetching from regular list
iterators").

I added the following patch for today:

From 1a101eb2766057372006b1b487d05f40fe899478 Mon Sep 17 00:00:00 2001
From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Fri, 20 May 2011 16:08:48 +1000
Subject: [PATCH] include prefetch.h where needed

Commit e66eed651fd1 ("list: remove prefetching from regular list
iterators") removed the include of prefetch.h from list.h.

fixes these build errors on sparc:

mm/page_alloc.c: In function '__free_pages_bootmem':
mm/page_alloc.c:704: error: implicit declaration of function 'prefetchw'
fs/dcache.c: In function '__d_lookup_rcu':
fs/dcache.c:1810: error: implicit declaration of function 'prefetch'
fs/inode.c: In function 'new_inode':
fs/inode.c:894: error: implicit declaration of function 'spin_lock_prefetch'
net/core/skbuff.c: In function '__alloc_skb':
net/core/skbuff.c:184: error: implicit declaration of function 'prefetchw'
In file included from net/ipv4/ip_forward.c:32:
include/net/udp.h: In function 'udp_csum_outgoing':
include/net/udp.h:141: error: implicit declaration of function 'prefetch'
In file included from net/ipv6/af_inet6.c:48:
include/net/udp.h: In function 'udp_csum_outgoing':
include/net/udp.h:141: error: implicit declaration of function 'prefetch'
net/unix/af_unix.c: In function 'unix_ioctl':
net/unix/af_unix.c:2066: error: implicit declaration of function 'prefetch'
In file included from net/sunrpc/xprtsock.c:44:
include/net/udp.h: In function 'udp_csum_outgoing':
include/net/udp.h:141: error: implicit declaration of function 'prefetch'
kernel/rcutiny.c: In function 'rcu_process_callbacks':
kernel/rcutiny.c:180: error: implicit declaration of function 'prefetch'

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
 fs/dcache.c            |    1 +
 fs/inode.c             |    1 +
 include/linux/skbuff.h |    1 +
 kernel/rcutiny.c       |    1 +
 mm/page_alloc.c        |    1 +
 net/core/skbuff.c      |    1 +
 6 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 22a0ef4..18b2a1f 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -35,6 +35,7 @@
 #include <linux/hardirq.h>
 #include <linux/bit_spinlock.h>
 #include <linux/rculist_bl.h>
+#include <linux/prefetch.h>
 #include "internal.h"
 
 /*
diff --git a/fs/inode.c b/fs/inode.c
index 33c963d..c77081f 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -26,6 +26,7 @@
 #include <linux/posix_acl.h>
 #include <linux/ima.h>
 #include <linux/cred.h>
+#include <linux/prefetch.h>
 #include "internal.h"
 
 /*
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 79aafbb..f963b8f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -29,6 +29,7 @@
 #include <linux/rcupdate.h>
 #include <linux/dmaengine.h>
 #include <linux/hrtimer.h>
+#include <linux/prefetch.h>
 
 /* Don't change this without changing skb_csum_unnecessary! */
 #define CHECKSUM_NONE 0
diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
index 421abfd..7bbac7d 100644
--- a/kernel/rcutiny.c
+++ b/kernel/rcutiny.c
@@ -35,6 +35,7 @@
 #include <linux/init.h>
 #include <linux/time.h>
 #include <linux/cpu.h>
+#include <linux/prefetch.h>
 
 /* Controls for rcu_kthread() kthread, replacing RCU_SOFTIRQ used previously. */
 static struct task_struct *rcu_kthread_task;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 44b3d7b..9d5498e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -54,6 +54,7 @@
 #include <trace/events/kmem.h>
 #include <linux/ftrace_event.h>
 #include <linux/memcontrol.h>
+#include <linux/prefetch.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3e934fe..46cbd28 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -57,6 +57,7 @@
 #include <linux/init.h>
 #include <linux/scatterlist.h>
 #include <linux/errqueue.h>
+#include <linux/prefetch.h>
 
 #include <net/protocol.h>
 #include <net/dst.h>
-- 
1.7.5.1

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
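
The build errors listed in this patch come from files that call prefetch(), prefetchw() or spin_lock_prefetch() while relying on another header to pull in the declaration. A minimal userspace sketch of such a caller follows. It assumes a GCC or Clang compiler with `__builtin_prefetch`; the `my_prefetch`/`my_prefetchw` macros, `struct node` and `sum_list()` are hypothetical names for illustration and are not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins; assumes GCC or Clang builtins. */
#define my_prefetch(p)   __builtin_prefetch((p), 0)  /* expect a read */
#define my_prefetchw(p)  __builtin_prefetch((p), 1)  /* expect a write */

struct node {
	struct node *next;
	int value;
};

/* Sum a singly linked list, hinting the next node before using this one. */
static long sum_list(const struct node *head)
{
	long total = 0;

	for (const struct node *n = head; n != NULL; n = n->next) {
		my_prefetch(n->next);   /* a hint only; a NULL next is harmless */
		total += n->value;
	}
	return total;
}
```

The point of the sketch is that the caller names the macro directly, so the file that uses it is the one that should include the header providing it.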

^ permalink raw reply related

* Re: [PATCH net-next 1/2] tg3: Fix NETIF_F_LOOPBACK error
From: Michał Mirosław @ 2011-05-20  6:57 UTC (permalink / raw)
  To: Matt Carlson; +Cc: Mahesh Bandewar, David Miller, linux-netdev
In-Reply-To: <20110520015934.GA2258@mcarlson.broadcom.com>

2011/5/20 Matt Carlson <mcarlson@broadcom.com>:
> On Thu, May 19, 2011 at 06:15:18PM -0700, Mahesh Bandewar wrote:
>> On Thu, May 19, 2011 at 6:11 PM, Matt Carlson <mcarlson@broadcom.com> wrote:
>> > Mahesh Bandewar noticed that the features cleanup in commit
>> > 0da0606f493c5cdab74bdcc96b12f4305ad94085, entitled
>> > "tg3: Consolidate all netdev feature assignments", mistakenly sets
>> > NETIF_F_LOOPBACK by default.  This patch corrects the error.
>> >
>> > Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
>> > ---
>> >  drivers/net/tg3.c |    3 ++-
>> >  1 files changed, 2 insertions(+), 1 deletions(-)
>> >
>> > diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
>> > index 012ce70..0b78c5d 100644
>> > --- a/drivers/net/tg3.c
>> > +++ b/drivers/net/tg3.c
>> > @@ -15080,6 +15080,8 @@ static int __devinit tg3_init_one(struct pci_dev *pdev,
>> >                        features |= NETIF_F_TSO_ECN;
>> >        }
>> >
>> > +       dev->features |= features;
>> > +
>> >        /*
>> >         * Add loopback capability only for a subset of devices that support
>> >         * MAC-LOOPBACK. Eventually this need to be enhanced to allow INT-PHY
>> > @@ -15090,7 +15092,6 @@ static int __devinit tg3_init_one(struct pci_dev *pdev,
>> >                /* Add the loopback capability */
>> >                features |= NETIF_F_LOOPBACK;
>> >
>> > -       dev->features |= features;
>> >        dev->hw_features |= features;
>> >        dev->vlan_features |= features;
>> I think this line should go up too. Otherwise newly created vlan
>> device(s) will have spurious loopback bit set.
> Yes.  You are right.  I thought vlan_features functioned like
> hw_features.

NETIF_F_LOOPBACK should probably be forcibly set on VLAN devices when the
underlying device has it enabled. Just a quick thought for discussion.

Best Regards,
Michał Mirosław
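
The ordering issue discussed in this thread can be sketched in plain C. This is a minimal userspace model, not the tg3 code: the `F_*` bit values, `struct netdev_model` and `init_features()` are hypothetical names. It assumes, following the review comments quoted above, that the loopback bit should be offered as a toggleable capability (hw_features) without being enabled by default (features) or inherited by VLAN devices (vlan_features).

```c
#include <stdint.h>

/* Hypothetical feature bits; the real NETIF_F_* values differ. */
#define F_SG        (1u << 0)
#define F_TSO       (1u << 1)
#define F_LOOPBACK  (1u << 2)

/* Hypothetical stand-in for the three masks on struct net_device. */
struct netdev_model {
	uint32_t features;      /* enabled right now */
	uint32_t hw_features;   /* what the user may toggle */
	uint32_t vlan_features; /* inherited by VLAN devices on top */
};

static void init_features(struct netdev_model *dev, int supports_mac_loopback)
{
	uint32_t features = F_SG | F_TSO;

	/* Commit the defaults before the optional capability is added. */
	dev->features |= features;
	dev->vlan_features |= features;

	if (supports_mac_loopback)
		features |= F_LOOPBACK;

	/* Only the "can be toggled" mask sees the loopback bit. */
	dev->hw_features |= features;
}
```

With this ordering a device that supports MAC loopback reports the bit in hw_features only, which is the behaviour the v2 patch appears to aim for.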

^ permalink raw reply

* [V3 Patch 1/3] netpoll: disable netpoll when enslave a device
From: Amerigo Wang @ 2011-05-20  7:39 UTC (permalink / raw)
  To: netdev; +Cc: WANG Cong, Neil Horman

V3: rename NETDEV_ENSLAVE to NETDEV_JOIN

Currently we do nothing when we enslave a net device that is running netconsole.
Neil pointed out that we may get weird results in that case, so let's disable
netpoll on the device being enslaved. I think it is too harsh to prevent
the device from being enslaved just because it is running netconsole.

By the way, this patch also removes NETDEV_GOING_DOWN from the netconsole
netdev notifier, because netpoll will check whether the device is running,
and we don't handle NETDEV_PRE_UP either.

This patch is based on net-next-2.6.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>

---

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 088fd84..f4960f5 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1640,6 +1640,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 		}
 	}
 
+	call_netdevice_notifiers(NETDEV_JOIN, slave_dev);
+
 	/* If this is the first slave, then we need to set the master's hardware
 	 * address to be the same as the slave's. */
 	if (is_zero_ether_addr(bond->dev->dev_addr))
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index a83e101..4190786 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -621,11 +621,10 @@ static int netconsole_netdev_event(struct notifier_block *this,
 	bool stopped = false;
 
 	if (!(event == NETDEV_CHANGENAME || event == NETDEV_UNREGISTER ||
-	      event == NETDEV_BONDING_DESLAVE || event == NETDEV_GOING_DOWN))
+	      event == NETDEV_BONDING_DESLAVE || event == NETDEV_JOIN))
 		goto done;
 
 	spin_lock_irqsave(&target_list_lock, flags);
-restart:
 	list_for_each_entry(nt, &target_list, list) {
 		netconsole_target_get(nt);
 		if (nt->np.dev == dev) {
@@ -633,6 +632,8 @@ restart:
 			case NETDEV_CHANGENAME:
 				strlcpy(nt->np.dev_name, dev->name, IFNAMSIZ);
 				break;
+			case NETDEV_BONDING_DESLAVE:
+			case NETDEV_JOIN:
 			case NETDEV_UNREGISTER:
 				/*
 				 * rtnl_lock already held
@@ -647,11 +648,7 @@ restart:
 					dev_put(nt->np.dev);
 					nt->np.dev = NULL;
 					netconsole_target_put(nt);
-					goto restart;
 				}
-				/* Fall through */
-			case NETDEV_GOING_DOWN:
-			case NETDEV_BONDING_DESLAVE:
 				nt->enabled = 0;
 				stopped = true;
 				break;
@@ -660,10 +657,21 @@ restart:
 		netconsole_target_put(nt);
 	}
 	spin_unlock_irqrestore(&target_list_lock, flags);
-	if (stopped && (event == NETDEV_UNREGISTER || event == NETDEV_BONDING_DESLAVE))
+	if (stopped) {
 		printk(KERN_INFO "netconsole: network logging stopped on "
-			"interface %s as it %s\n",  dev->name,
-			event == NETDEV_UNREGISTER ? "unregistered" : "released slaves");
+		       "interface %s as it ", dev->name);
+		switch (event) {
+		case NETDEV_UNREGISTER:
+			printk(KERN_CONT "unregistered\n");
+			break;
+		case NETDEV_BONDING_DESLAVE:
+			printk(KERN_CONT "released slaves\n");
+			break;
+		case NETDEV_JOIN:
+			printk(KERN_CONT "is joining a master device\n");
+			break;
+		}
+	}
 
 done:
 	return NOTIFY_DONE;
diff --git a/include/linux/notifier.h b/include/linux/notifier.h
index 621dfa1..a577762 100644
--- a/include/linux/notifier.h
+++ b/include/linux/notifier.h
@@ -211,6 +211,7 @@ static inline int notifier_to_errno(int ret)
 #define NETDEV_UNREGISTER_BATCH 0x0011
 #define NETDEV_BONDING_DESLAVE  0x0012
 #define NETDEV_NOTIFY_PEERS	0x0013
+#define NETDEV_JOIN		0x0014
 
 #define SYS_DOWN	0x0001	/* Notify of system down */
 #define SYS_RESTART	SYS_DOWN
-- 
1.7.1
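
The behaviour this patch gives the netconsole notifier can be modelled in a few lines of userspace C. This is a hedged sketch, not the driver code: `enum netdev_event`, `struct target` and `handle_event()` are hypothetical, locking and reference counting are left out, and the rename case (NETDEV_CHANGENAME) is omitted. It only shows that a join, a release or an unregister detaches and disables every target bound to the device, while an event such as going-down is now ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event codes; only the names follow the patch. */
enum netdev_event {
	EV_UNREGISTER,
	EV_BONDING_DESLAVE,
	EV_JOIN,
	EV_GOING_DOWN
};

struct target {
	const char *dev;   /* name of the bound device, NULL when detached */
	bool enabled;
};

/* Returns true if logging was stopped on at least one target. */
static bool handle_event(struct target *t, size_t n,
			 const char *dev, enum netdev_event ev)
{
	bool stopped = false;

	if (ev != EV_UNREGISTER && ev != EV_BONDING_DESLAVE && ev != EV_JOIN)
		return false;           /* other events leave targets alone */

	for (size_t i = 0; i < n; i++) {
		if (t[i].dev == NULL || strcmp(t[i].dev, dev) != 0)
			continue;
		t[i].dev = NULL;        /* drop the binding to the device */
		t[i].enabled = false;
		stopped = true;
	}
	return stopped;
}
```

The return value plays the role of the `stopped` flag in the patch, which decides whether the "network logging stopped" message is printed.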


^ permalink raw reply related

* [Patch 2/3] bridge: call NETDEV_JOIN notifiers when add a slave
From: Amerigo Wang @ 2011-05-20  7:39 UTC (permalink / raw)
  To: netdev; +Cc: WANG Cong, Neil Horman
In-Reply-To: <1305877152-30970-1-git-send-email-amwang@redhat.com>

The previous patch added NETDEV_JOIN, so now we can notify netconsole
when a device is added to a bridge too.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>

---
 net/bridge/br_if.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 5dbdfdf..d5147dd 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -338,6 +338,8 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
 	if (IS_ERR(p))
 		return PTR_ERR(p);
 
+	call_netdevice_notifiers(NETDEV_JOIN, dev);
+
 	err = dev_set_promiscuity(dev, 1);
 	if (err)
 		goto put_back;
-- 
1.7.1
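
For readers unfamiliar with notifier chains, the idea behind the added call can be sketched as a list of callbacks that all receive the same event. The code below is a simplified userspace model under stated assumptions: `register_notifier()`, `call_notifiers()` and `count_joins()` are hypothetical names, the chain is a fixed array with no locking or priorities, and only the `0x0014` value is taken from the notifier.h hunk earlier in this series.

```c
#include <stddef.h>

#define MAX_NOTIFIERS 8
#define EV_JOIN 0x0014UL   /* value of NETDEV_JOIN in this series */

/* Hypothetical callback type: event code, device name, private data. */
typedef void (*notifier_fn)(unsigned long event, const char *dev, void *ctx);

struct notifier {
	notifier_fn fn;
	void *ctx;
};

static struct notifier chain[MAX_NOTIFIERS];
static size_t chain_len;

/* Returns 0 on success, -1 if the chain is full. */
static int register_notifier(notifier_fn fn, void *ctx)
{
	if (chain_len == MAX_NOTIFIERS)
		return -1;
	chain[chain_len].fn = fn;
	chain[chain_len].ctx = ctx;
	chain_len++;
	return 0;
}

/* Deliver one event to every subscriber, in registration order. */
static void call_notifiers(unsigned long event, const char *dev)
{
	for (size_t i = 0; i < chain_len; i++)
		chain[i].fn(event, dev, chain[i].ctx);
}

/* Example subscriber: counts join events, ignores everything else. */
static void count_joins(unsigned long event, const char *dev, void *ctx)
{
	(void)dev;
	if (event == EV_JOIN)
		(*(int *)ctx)++;
}
```

In this model the bridge code would be the caller of `call_notifiers()` and netconsole would be one of the registered subscribers.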


^ permalink raw reply related

* [Patch 3/3] net: rename NETDEV_BONDING_DESLAVE to NETDEV_RELEASE
From: Amerigo Wang @ 2011-05-20  7:39 UTC (permalink / raw)
  To: netdev; +Cc: WANG Cong, Andy Gospodarek, Neil Horman
In-Reply-To: <1305877152-30970-1-git-send-email-amwang@redhat.com>

s/NETDEV_BONDING_DESLAVE/NETDEV_RELEASE/ as Andy suggested.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Neil Horman <nhorman@redhat.com>

---
 drivers/net/bonding/bond_main.c |    2 +-
 drivers/net/netconsole.c        |    6 +++---
 include/linux/notifier.h        |    2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index f4960f5..6dc4284 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1974,7 +1974,7 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
 	}
 
 	block_netpoll_tx();
-	netdev_bonding_change(bond_dev, NETDEV_BONDING_DESLAVE);
+	netdev_bonding_change(bond_dev, NETDEV_RELEASE);
 	write_lock_bh(&bond->lock);
 
 	slave = bond_get_slave_by_dev(bond, slave_dev);
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 4190786..dfc8272 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -621,7 +621,7 @@ static int netconsole_netdev_event(struct notifier_block *this,
 	bool stopped = false;
 
 	if (!(event == NETDEV_CHANGENAME || event == NETDEV_UNREGISTER ||
-	      event == NETDEV_BONDING_DESLAVE || event == NETDEV_JOIN))
+	      event == NETDEV_RELEASE || event == NETDEV_JOIN))
 		goto done;
 
 	spin_lock_irqsave(&target_list_lock, flags);
@@ -632,7 +632,7 @@ static int netconsole_netdev_event(struct notifier_block *this,
 			case NETDEV_CHANGENAME:
 				strlcpy(nt->np.dev_name, dev->name, IFNAMSIZ);
 				break;
-			case NETDEV_BONDING_DESLAVE:
+			case NETDEV_RELEASE:
 			case NETDEV_JOIN:
 			case NETDEV_UNREGISTER:
 				/*
@@ -664,7 +664,7 @@ static int netconsole_netdev_event(struct notifier_block *this,
 		case NETDEV_UNREGISTER:
 			printk(KERN_CONT "unregistered\n");
 			break;
-		case NETDEV_BONDING_DESLAVE:
+		case NETDEV_RELEASE:
 			printk(KERN_CONT "released slaves\n");
 			break;
 		case NETDEV_JOIN:
diff --git a/include/linux/notifier.h b/include/linux/notifier.h
index a577762..c0688b0 100644
--- a/include/linux/notifier.h
+++ b/include/linux/notifier.h
@@ -209,7 +209,7 @@ static inline int notifier_to_errno(int ret)
 #define NETDEV_POST_TYPE_CHANGE	0x000F
 #define NETDEV_POST_INIT	0x0010
 #define NETDEV_UNREGISTER_BATCH 0x0011
-#define NETDEV_BONDING_DESLAVE  0x0012
+#define NETDEV_RELEASE		0x0012
 #define NETDEV_NOTIFY_PEERS	0x0013
 #define NETDEV_JOIN		0x0014
 
-- 
1.7.1


^ permalink raw reply related

* Re: [PATCH 14/18] virtio: add api for delayed callbacks
From: Rusty Russell @ 2011-05-20  7:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Krishna Kumar, Carsten Otte, lguest-uLR06cmDAlY/bJ5BZ2RsiQ,
	Shirley Ma, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	habanero-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8, Heiko Carstens,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	steved-r/Jw6+rmf7HQT0dZR+AlfA, Christian Borntraeger,
	Tom Lendacky, Martin Schwidefsky, linux390-tA70FqPdS9bQT0dZR+AlfA
In-Reply-To: <20110519072412.GA31253-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Thu, 19 May 2011 10:24:12 +0300, "Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On Mon, May 16, 2011 at 04:43:21PM +0930, Rusty Russell wrote:
> > On Sun, 15 May 2011 15:48:18 +0300, "Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > > On Mon, May 09, 2011 at 03:27:33PM +0930, Rusty Russell wrote:
> > > > On Wed, 4 May 2011 23:52:33 +0300, "Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > > > > Add an API that tells the other side that callbacks
> > > > > should be delayed until a lot of work has been done.
> > > > > Implement using the new used_event feature.
> > > > 
> > > > Since you're going to add a capacity query anyway, why not add the
> > > > threshold argument here?
> > > 
> > > I thought that if we keep the API kind of generic
> > > there might be more of a chance that future transports
> > > will be able to implement it. For example, with an
> > > old host we can't commit to a specific index.
> > 
> > No, it's always a hint anyway: you can be notified before the threshold
> > is reached.  But best make it explicit I think.
> > 
> > Cheers,
> > Rusty.
> 
> 
> I tried doing that and remembered the real reason I went for this API:
> 
> capacity is limited by descriptor table space, not
> used ring space: each entry in the used ring frees up multiple entries
> in the descriptor ring. Thus the ring can't provide
> callback after capacity is N: capacity is only available
> after we get bufs.

I think this indicates a problem, but I haven't reviewed your patches
except very cursorily.  I am slack...

> We could try and make the API pass in the number of freed bufs, however:
> - this is not really what virtio-net cares about (it cares about
>   capacity)

Yes, max sg elements and max requests are separate, though in the
current virtio implementation remaining sg <= remaining request slots.

That's why we currently feed back remaining descriptors to the driver,
not the number of request slots.

This implies that the thresholds should refer to descriptor numbers
(ie. wake me when there are this many descriptors freed), not the used
ring at all.  Which means we're barking up the wrong tree...

I think this needs more thought.

> - if the driver passes a number > number of outstanding bufs, it will
>   never get a callback. So to stay correct the driver will need to
>   track number of outstanding requests. The simpler API avoids that. 

I think the driver should simply say "wake me when you have this many
descriptors free".  And set it during initialization, rather than every
time.  The virtio_ring code should handle it from there.

Perhaps that can be done with the current technique, where the
virtio_ring makes an educated guess on when sufficient capacity will be
available...

Cheers,
Rusty.
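
The distinction drawn above between used-ring entries and descriptors can be made concrete with a small model. This is only a sketch under assumptions: `struct ring_model`, `add_buf()` and `complete_buf()` are hypothetical and do not correspond to the virtio_ring implementation. It illustrates the "wake me when this many descriptors are free" idea, with the threshold set once at initialization.

```c
#include <stdbool.h>

/* Hypothetical model of a ring; not the virtio_ring data structures. */
struct ring_model {
	unsigned int num_free;     /* free descriptors */
	unsigned int wake_thresh;  /* set once at initialization */
};

/* Queue a request needing n_desc descriptors; false if it does not fit. */
static bool add_buf(struct ring_model *r, unsigned int n_desc)
{
	if (n_desc > r->num_free)
		return false;
	r->num_free -= n_desc;
	return true;
}

/*
 * One used-ring entry completes one request and returns all of its
 * descriptors. Returns true when the driver should be woken.
 */
static bool complete_buf(struct ring_model *r, unsigned int n_desc)
{
	r->num_free += n_desc;
	return r->num_free >= r->wake_thresh;
}
```

For example, with 8 descriptors and a threshold of 6, two requests of 5 and 3 descriptors fill the ring; completing the 3-descriptor request does not reach the threshold, but completing the 5-descriptor one does, even though each completion is a single used-ring entry.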

^ permalink raw reply

* Re: [PATCHv2 00/14] virtio and vhost-net performance enhancements
From: Rusty Russell @ 2011-05-20  7:51 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Krishna Kumar, Carsten Otte, lguest-uLR06cmDAlY/bJ5BZ2RsiQ,
	Shirley Ma, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	habanero-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8, Heiko Carstens,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	steved-r/Jw6+rmf7HQT0dZR+AlfA, Christian Borntraeger,
	Tom Lendacky, Martin Schwidefsky, linux390-tA70FqPdS9bQT0dZR+AlfA
In-Reply-To: <cover.1305846412.git.mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Fri, 20 May 2011 02:10:07 +0300, "Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> OK, here is the large patchset that implements the virtio spec update
> that I sent earlier (the spec itself needs a minor update, will send
> that out too next week, but I think we are on the same page here
> already). It supersedes the PUBLISH_USED_IDX patches I sent
> out earlier.
> 
> What will follow will be a patchset that actually includes 4 sets of
> patches.  I note below their status.  Please consider for 2.6.40, at
> least partially. Rusty, do you think it's feasible?

Erk.  I'm still unsure that we should be using ring capacity as the
thresholding mechanism, given that *descriptor* exhaustion is what we
actually face.

That said, I will review these thoroughly in 14 hours (Sat morning my
time).  Perhaps I can convince myself that it's not a problem, because
it *is* simpler...

> List of patches and what they do:
> 
> I) With the first patchset, we change virtio ring notification
> hand-off to work like the one in Xen -
> each side publishes an event index, the other one
> notifies when it reaches that value -
> With the one difference that event index starts at 0,
> same as request index (in xen event index starts at 1).
> 
> These are the patches in this set:
> virtio: event index interface
> virtio ring: inline function to check for events
> virtio_ring: support event idx feature
> vhost: support event index
> virtio_test: support event index
> 
> Changes in this part of the patchset from v1 - address comments by Rusty et al.
> 
> I tested this a lot with virtio net block and with the simulator and esp
> with the simulator it's easy to see drastic performance improvement
> here:
> 
> [virtio]# time ./virtio_test 
> spurious wakeus: 0x7
> 
> real    0m0.169s
> user    0m0.140s
> sys     0m0.019s
> [virtio]# time ./virtio_test --no-event-idx
> spurious wakeus: 0x11
> 
> real    0m0.649s
> user    0m0.295s
> sys     0m0.335s
> 
> And these patches are mostly unchanged from the very first version,
> changes being almost exclusively code cleanups.  So I consider this part
> the most stable, I strongly think these patches should go into 2.6.40.
> One extra reason besides performance is that maintaining
> them out of tree is very painful as guest/host ABI is affected.
> 
> II) Second set of patches: new apis and use in virtio_net
> With the indexes in place it becomes possible to request an event after
> many requests (and not just on the next one as done now). This shall fix
> the TX queue overrun which currently triggers a storm of interrupts.
> 
> Another issue I tried to fix is capacity checks in virtio-net,
> there's a new API for that, and on top of that,
> I implemented a patch improving real-time characteristics
> of virtio_net
> 
> Thus we get the second patchset:
> virtio: add api for delayed callbacks
> virtio_net: delay TX callbacks
> virtio_ring: Add capacity check API
> virtio_net: fix TX capacity checks using new API
> virtio_net: limit xmit polling
> 
> This has some fixes that I posted previously applied,
> but otherwise identical to v1. I tried to change API
> for enable_cb_delayed as Rusty suggested but failed to do this.
> I think it's not possible to define cleanly.
> 
> These work fine for me, I think they can be merged for 2.6.40
> too but would be nice to hear back from Shirley, Tom, Krishna.

See other mail.

> III) There's also a patch that adds a tweak to virtio ring
> virtio: don't delay avail index update
> 
> This seems to help small message sizes where we are constantly draining
> the RX VQ.

This is independent.  If someone shows some benchmark improvement I'm
definitely happy to put this in .40, if nothing else.

> I'll need to benchmark this to be able to give any numbers
> with confidence, but I don't see how it can hurt anything.
> Thoughts?
> 
> IV) Last part is a set of patches to extend feature bits
> to 64 bit. I tested this by using feature bit 32.
> vhost: fix 64 bit features
> virtio_test: update for 64 bit features
> virtio: 64 bit features

Sweetness, but .41 material at this stage.

Thanks,
Rusty.
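
The event-index hand-off described in part I of the quoted cover letter (each side publishes an index, the other notifies when it reaches that value) needs a comparison that survives 16-bit wrap-around. The sketch below shows one such comparison. It is modelled on my understanding of how the event-index check is written, so treat the exact form as an assumption; `need_event()` is a hypothetical name used here for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * The index moved from old_idx to new_idx. Return true if it passed
 * event_idx on the way, i.e. old_idx <= event_idx < new_idx when the
 * values are read modulo 2^16.
 */
static bool need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}
```

Casting both differences back to 16 bits is what makes the test wrap-safe: the subtraction is done modulo 65536, so an index that rolls over from 0xFFFF to 0 is handled the same way as any other step.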

^ permalink raw reply

* Re: linux-next: build failure after merge of the tip tree (net tree interaction)
From: Jacek Luczak @ 2011-05-20  7:59 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Peter Zijlstra,
	linux-next, linux-kernel, David Miller, netdev, Lai Jiangshan,
	Paul E. McKenney
In-Reply-To: <20110520141417.f49bf364.sfr@canb.auug.org.au>

2011/5/20 Stephen Rothwell <sfr@canb.auug.org.au>:
> Hi all,
>
> After merging the tip tree, today's linux-next build (x86_64 allmodconfig)
> failed like this:
>
> net/sctp/bind_addr.c: In function 'sctp_bind_addr_clean':
> net/sctp/bind_addr.c:148: error: 'sctp_local_addr_free' undeclared (first use in this function)
>
> Caused by commit 1231f0baa547 ("net,rcu: convert call_rcu
> (sctp_local_addr_free) to kfree_rcu()") interacting with commit
> c182f90bc1f2 ("SCTP: fix race between sctp_bind_addr_free() and
> sctp_bind_addr_conflict()") from the net tree.
>
> I applied the following patch as a merge fix:
>
> From: Stephen Rothwell <sfr@canb.auug.org.au>
> Date: Fri, 20 May 2011 14:11:11 +1000
> Subject: [PATCH] net,rcu: convert another call to call_rcu(sctp_local_addr_free)
>
> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
> ---
>  net/sctp/bind_addr.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
> index 6338413..83e3011 100644
> --- a/net/sctp/bind_addr.c
> +++ b/net/sctp/bind_addr.c
> @@ -145,7 +145,7 @@ static void sctp_bind_addr_clean(struct sctp_bind_addr *bp)
>        /* Empty the bind address list. */
>        list_for_each_entry_safe(addr, temp, &bp->address_list, list) {
>                list_del_rcu(&addr->list);
> -               call_rcu(&addr->rcu, sctp_local_addr_free);
> +               kfree_rcu(addr, rcu);
>                SCTP_DBG_OBJCNT_DEC(addr);
>        }
>  }
> --
> 1.7.5.1
>


Hi,

As this fix is planned to be backported to stable/longterm, kfree_rcu
was not used here. I guess that in the meantime the callback was removed
when kfree_rcu was introduced in the sctp module. I have a patch ready to
send for net-next-2.6 that converts sctp to kfree_rcu; I have only been
waiting for it to land in the tree.

This patch is of course valid.

-Jacek
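
The two styles that collided in this merge can be contrasted in a userspace model. The sketch below is an illustration only: `struct rcu_head_model`, `addr_free_cb()`, `free_by_offset()` and `kfree_rcu_model()` are hypothetical, no grace period is modelled (the object is freed immediately), and the macro assumes a compiler that supports `__typeof__` (GCC or Clang).

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in; no real RCU grace period is modelled. */
struct rcu_head_model {
	int unused;
};

struct addr_entry {
	int addr;
	struct rcu_head_model rcu;
};

static int frees;   /* counts releases so the two paths can be compared */

/* Style 1: a dedicated callback recovers the object from its rcu member. */
static void addr_free_cb(struct rcu_head_model *head)
{
	struct addr_entry *e = (struct addr_entry *)
		((char *)head - offsetof(struct addr_entry, rcu));

	free(e);
	frees++;
}

/* Style 2: a generic helper needs only the member offset, no callback. */
static void free_by_offset(struct rcu_head_model *head, size_t offset)
{
	free((char *)head - offset);
	frees++;
}

#define kfree_rcu_model(ptr, member) \
	free_by_offset(&(ptr)->member, offsetof(__typeof__(*(ptr)), member))
```

Once every caller uses the second style, the dedicated callback has no users and can be removed, which is why a new call site that still names the callback fails to build.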

^ permalink raw reply

* [PATCH 4/3] rtnetlink: ignore NETDEV_RELEASE and NETDEV_JOIN event
From: Amerigo Wang @ 2011-05-20  9:06 UTC (permalink / raw)
  To: netdev; +Cc: WANG Cong
In-Reply-To: <1305877152-30970-1-git-send-email-amwang@redhat.com>

These two events are not expected to be caught by userspace.

Signed-off-by: WANG Cong <amwang@redhat.com>
---
 net/core/rtnetlink.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d2ba259..d1644e3 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1956,6 +1956,8 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi
 	case NETDEV_GOING_DOWN:
 	case NETDEV_UNREGISTER:
 	case NETDEV_UNREGISTER_BATCH:
+	case NETDEV_RELEASE:
+	case NETDEV_JOIN:
 		break;
 	default:
 		rtmsg_ifinfo(RTM_NEWLINK, dev, 0);
-- 
1.7.1
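
The effect of the two added case labels can be sketched as a simple filter: events listed in the switch are swallowed, everything else is reported to userspace. This is a minimal model under assumptions: `enum ev`, its values and `forwards_to_userspace()` are hypothetical, and `EV_OTHER` stands for any event that falls through to the default branch.

```c
#include <stdbool.h>

/* Hypothetical event codes; only the names follow the patch. */
enum ev {
	EV_GOING_DOWN,
	EV_UNREGISTER,
	EV_UNREGISTER_BATCH,
	EV_RELEASE,
	EV_JOIN,
	EV_OTHER
};

/* Returns true if the event would produce a link message for userspace. */
static bool forwards_to_userspace(enum ev event)
{
	switch (event) {
	case EV_GOING_DOWN:
	case EV_UNREGISTER:
	case EV_UNREGISTER_BATCH:
	case EV_RELEASE:
	case EV_JOIN:
		return false;   /* internal events, nothing to report */
	default:
		return true;    /* the patch's default branch sends RTM_NEWLINK */
	}
}
```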


^ permalink raw reply related
