Netdev List


* Re: [Bugme-new] [Bug 41152] New: kernel 3.0 and above fails to handle vlan id 0 (802.1p) packets properly without hardware acceleration
From: Jiri Pirko @ 2011-08-17 17:50 UTC (permalink / raw)
  To: Mike Auty; +Cc: Andrew Morton, bugme-daemon, netdev
In-Reply-To: <20110817105950.GA4259@minipsycho.brq.redhat.com>

Wed, Aug 17, 2011 at 12:59:51PM CEST, jpirko@redhat.com wrote:
>Wed, Aug 17, 2011 at 08:36:15AM CEST, mike.auty@gmail.com wrote:
>>On 17/08/11 06:37, Jiri Pirko wrote:
>>> 
>>> Hi Mike. May I ask what NIC are you seeing the regression on?
>>> It may have something to do with dev->vlangrp and ndo_vlan_add/kill_vid.
>>> VID 0 was recently only added by the latter ones. So if driver only
>>> depended on dev->vlangrp, 0 was not there. This was changed recently by
>>> my "vlan cleanup" patches. It may work for you again on net-next today.
>>> 
>>
>>Hi there,
>>
>>I'm finding it on the following two NICs:
>>
>>02:00.0 Ethernet controller: Attansic Technology Corp. L1 Gigabit
>>Ethernet Adapter (rev b0)
>>
>>05:01.0 Network controller [0280]: Broadcom Corporation BCM4306
>>802.11b/g Wireless LAN Controller [14e4:4320] (rev 03)
>>
>>and (on a different machine):
>>
>>02:00.0 Network controller [0280]: Intel Corporation WiFi Link 6000
>>Series [8086:422c] (rev 35)
>
>I just obtained very similar card (8086:422b). Going to look at it right
>away.
>
>One more thing. What do you use to generate vlan0 tagged packets? I'm
>using pktgen with "vlan_id 0". Would you please try that it behaves the
>same for you?

I'm using the following pktgen script:
http://pastebin.com/E3f4R8XY

On the rx side, with the Intel wireless card, I use the following stap
script to observe incoming packets:
http://pastebin.com/VeXLhauu

All is looking good there. What do you use to look at incoming packets?

Jirka

>	
>>
>>The only one I have had any success with is:
>>
>>00:19.0 Ethernet controller [0200]: Intel Corporation 82577LM Gigabit
>>Network Connection [8086:10ea] (rev 05)
>>
>>Which I assume is because it has actual hardware acceleration.  I may be
>>able to put net-next on the Intel Wifi box for testing at some point,
>>but I don't know how soon I'll be able to do that.  Please let me know
>>whether that would be a worthwhile test, and if so I'll try and get it
>>done.  Thanks...
>>
>>Mike  5:)
>>

^ permalink raw reply

* Re: Linux vs FreeBSD Which is correct.
From: Stephen Clark @ 2011-08-17 17:20 UTC (permalink / raw)
  To: Emil S Tantilov; +Cc: Linux Kernel Network Developers
In-Reply-To: <CANh3MnZnyuvdiMyQTKKN=v-+y98FGbu+zA+=D4VnFL41AgRHpw@mail.gmail.com>

On 08/17/2011 01:10 PM, Emil S Tantilov wrote:
> On Wed, Aug 17, 2011 at 10:03 AM, Stephen Clark<sclark46@earthlink.net>  wrote:
>    
>> Hello List,
>>
>> Firstly thank you for your patience.  I am replacing a bunch of FreeBSD
>> vpn/fw/routers
>> with a Linux based system.
>>
>> I have run into a situation where if I ping our HQ the response comes back
>> on a different
>> interface than what the request went out on. FreeBSD is happy and says it
>> got the response,
>> Linux is not and gives no indication it got a response.
>>      
> Try enabling ARP filtering:
> echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
Just tried it - made no difference.

Thanks.
[root@L101111 ~]# echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
L101111:~
$ ping -I 172.21.76.150 172.21.232.55
PING 172.21.232.55 (172.21.232.55) from 172.21.76.150 : 56(84) bytes of data.
^C
--- 172.21.232.55 ping statistics ---
9 packets transmitted, 0 received, 100% packet loss, time 8238ms

L101111:~
$ sudo tcpdump -nli eth0 icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
13:19:13.808262 IP 172.21.232.55 > 172.21.76.150: ICMP echo reply, id 19545, seq 6, length 64
13:19:14.807541 IP 172.21.232.55 > 172.21.76.150: ICMP echo reply, id 19545, seq 7, length 64
^C





^ permalink raw reply

* Re: Linux vs FreeBSD Which is correct.
From: Stephen Clark @ 2011-08-17 19:44 UTC (permalink / raw)
  To: Rémi Denis-Courmont; +Cc: Linux Kernel Network Developers
In-Reply-To: <201108172017.48683.remi@remlab.net>

On 08/17/2011 01:17 PM, Rémi Denis-Courmont wrote:
> On Wednesday 17 August 2011 at 20:03:18, Stephen Clark wrote:
>    
>> I have run into a situation where if I ping our HQ the response comes
>> back on a different
>> interface than what the request went out on. FreeBSD is happy and says
>> it got the response,
>> Linux is not and gives no indication it got a response.
>>
>> So is FreeBSD wrong or is Linux wrong?
>>      
> Most distributions enable reverse path filtering by default.
> It can be disabled:
> # echo -n 0 > /proc/sys/net/ipv4/conf/all/rp_filter
>
> But you should probably fix the configuration instead (e.g. /etc/sysctl.conf).
>
>    
Sorry that didn't help either.

-- 

"They that give up essential liberty to obtain temporary safety,
deserve neither liberty nor safety."  (Ben Franklin)

"The course of history shows that as a government grows, liberty
decreases."  (Thomas Jefferson)




^ permalink raw reply

* Re: Linux vs FreeBSD Which is correct.
From: Pascal Hambourg @ 2011-08-17 20:15 UTC (permalink / raw)
  To: sclark46; +Cc: Rémi Denis-Courmont, Linux Kernel Network Developers
In-Reply-To: <4E4C1A00.80207@earthlink.net>

Hello,

Stephen Clark wrote:
> On 08/17/2011 01:17 PM, Rémi Denis-Courmont wrote:
>> On Wednesday 17 August 2011 at 20:03:18, Stephen Clark wrote:
>>    
>>> I have run into a situation where if I ping our HQ the response comes
>>> back on a different
>>> interface than what the request went out on. FreeBSD is happy and says
>>> it got the response,
>>> Linux is not and gives no indication it got a response.
>>>
>>> So is FreeBSD wrong or is Linux wrong?

Neither is right or wrong. It partly depends whether you want to enforce
so-called "weak" or "strong" host model.

>> Most distributions enable reverse path filtering by default.
>> It can be disabled:
>> # echo -n 0 > /proc/sys/net/ipv4/conf/all/rp_filter
>>
>> But you should probably fix the configuration instead (e.g. /etc/sysctl.conf).
>>    
> Sorry that didn't help either.

Since some kernel version, the logic of this sysctl has changed from
AND(all, $interface) to MAX(all, $interface). So you must set
net/ipv4/conf/$interface/rp_filter to 0 as well to disable it,
or set net/ipv4/conf/all/rp_filter to 2 to make it weaker.
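
A minimal sketch of the MAX(all, $interface) rule described above, in
plain C. The helper name effective_rp_filter is hypothetical and only
models the arithmetic; it does not read or write /proc. The value
meanings (0 = off, 1 = strict, 2 = loose) are the ones documented for
the rp_filter sysctl.

```c
/* rp_filter values: 0 = no source validation, 1 = strict, 2 = loose. */
enum rp_mode { RP_OFF = 0, RP_STRICT = 1, RP_LOOSE = 2 };

/*
 * Hypothetical helper: the value applied on an interface is the larger
 * of conf/all/rp_filter and conf/<interface>/rp_filter.
 */
static int effective_rp_filter(int all, int iface)
{
	return all > iface ? all : iface;
}
```

With this rule, effective_rp_filter(0, 1) is 1, so clearing only "all"
leaves strict filtering active on that interface, while
effective_rp_filter(2, 1) is 2, which selects the loose mode.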

^ permalink raw reply

* Re: [Bugme-new] [Bug 41312] New: Regression: some web services (e.g. Dropbox, Amazon Cloud Reader) stops working in 3.1-rc2
From: Andrew Morton @ 2011-08-17 20:41 UTC (permalink / raw)
  To: netdev; +Cc: bugme-daemon, salimma
In-Reply-To: <bug-41312-10286@https.bugzilla.kernel.org/>


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Wed, 17 Aug 2011 20:17:05 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=41312
> 
>            Summary: Regression: some web services (e.g. Dropbox, Amazon
>                     Cloud Reader) stops working in 3.1-rc2
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 3.1-rc2
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: IPV4
>         AssignedTo: shemminger@linux-foundation.org
>         ReportedBy: salimma@fedoraproject.org
>         Regression: Yes
> 
> 
> Created an attachment (id=69102)
>  --> (https://bugzilla.kernel.org/attachment.cgi?id=69102)
> (working) config for kernel 3.0
> 
> Dropbox and Amazon Cloud Reader work just fine on kernel 3.0, but the former
> does not work on 3.1-rc1 and 3.1-rc2, and the latter does not work on 3.1-rc2
> (I no longer have a copy of the 3.1-rc1 build I did).
> 
> With Dropbox, ps reports that the daemon process is in an uninterruptible
> state, and it cannot be killed (even with kill -9). Setting SELinux to
> permissive does not affect things (and Dropbox works with SELinux set to
> enforcing with kernel 3.0).
> 
> With Cloud Reader, on Chrome, it does not display any book -- I perpetually get
> the "refreshing" icon.
> 
> Will attach kernel configurations (I keep the 3.0 and 3.1-rc2 configs as
> similar as possible, by using 'make oldconfig' on 3.0 with 3.1-rc2's .config).
> 
> Reported on Dropbox's forum: http://forums.dropbox.com/topic.php?id=43140
> 


^ permalink raw reply

* RE: [PATCH 04/14] bna: Add Multiple Tx Queue Support
From: Rasesh Mody @ 2011-08-17 21:24 UTC (permalink / raw)
  To: John Fastabend
  Cc: Ben Hutchings, davem@davemloft.net, netdev@vger.kernel.org,
	Adapter Linux Open SRC Team, Gurunatha Karaje
In-Reply-To: <4E4B55FC.5070000@intel.com>

>From: John Fastabend [mailto:john.r.fastabend@intel.com]
>Sent: Tuesday, August 16, 2011 10:48 PM
>
>On 8/16/2011 7:14 PM, Rasesh Mody wrote:
>>> From: John Fastabend [mailto:john.r.fastabend@intel.com]
>>> Sent: Tuesday, August 16, 2011 5:18 PM
>>>
>>> On 8/16/2011 4:43 PM, Ben Hutchings wrote:
>>>> On Tue, 2011-08-16 at 16:32 -0700, Rasesh Mody wrote:
>>>>>> From: Ben Hutchings [mailto:bhutchings@solarflare.com]
>>>>>> Sent: Tuesday, August 16, 2011 2:49 PM
>>>>>>
>>>>>> On Tue, 2011-08-16 at 14:19 -0700, Rasesh Mody wrote:
>>>>>>> Change details:
>>>>>>>  - Add macros bna_prio_allow, bna_default_nw_prio,
>bna_iscsi_prio,
>>>>>>>    bna_is_iscsi_over_cee
>>>>>>>  - Added support for multiple Tx queues with a separate iSCSI Tx
>>> queue
>>>>>> based
>>>>>>>    on the default value of iSCSI port number. The feature is
>>> supported
>>>>>> based
>>>>>>>    on the underlying hardware and enabled for DCB (CEE) mode
>only.
>>>>>>>  - Allocate multiple TxQ resource in netdev
>>>>>>>  - Implement bnad_tx_select_queue() which enables the correct
>>>>>> selection of
>>>>>>>    TxQ Id (and tcb). This function is called either by the kernel
>>> to
>>>>>> channel
>>>>>>>    packets to the right TxQ
>>>>>>>  - bnad_tx_select_queue() returns priority, while only a few
>>> packets
>>>>>> during
>>>>>>>    transition could have wrong priority, all will be associated
>>> with a
>>>>>> valid
>>>>>>>    non-NULL tcb.
>>>>>>>  - Implement bnad_iscsi_tcb_get() and BNAD_IS_ISCSI_PKT() for
>iSCSI
>>>>>> packet
>>>>>>>    inspection and retrieval of tcb corresponding to the iSCSI
>>>>>> priority.
>>>>>>>  - Construction of priority indirection table to be used by bnad
>to
>>>>>> direct
>>>>>>>    packets into TxQs
>>>>>> [...]
>>>>>>
>>>>>> You probably should implement TX priority classes through the
>>>>>> ndo_setup_tc operation, not ndo_select_queue.
>>>>>
>>>>> The reason we went with ndo_select_queue is due to the need for
>>> mapping
>>>>> iSCSI packets (TCP port 3260) to a priority derived from DCB
>>> configuration.
>>>>> Here the iSCSI packets may not have any tc defined in the packet
>>> header.
>>>>
>>>> There is an skb priority, which is derived from the socket priority
>>>> option (SO_PRIORITY).  If you implement ndo_setup_tc then the
>>> networking
>>>> core will take care of mapping the skb priority onto a different
>queue
>>>> (or range of queues).
>>>>
>>>> I don't know whether the socket priority option is easily
>configurable
>>>> for the existing iSCSI implementations.  But looking at port numbers
>>>> really doesn't seem like the right way to do this.
>>>>
>>>> Ben.
>>>>
>>>
>>> At least open-iscsi supports DCB by listening to the RTNLGRP_DCB for
>>> events.
>>> These are generated with dcbnl_cee_notify and dcbnl_ieee_notify. To
>>> support
>>> this in your driver you need to call dcb_setapp() or
>dcb_ieee_setapp()
>>> to
>>> add the application data and follow this with the appropriate notify
>>> call.
>>> Then assuming you do the necessary tc setup work the stack will
>handle
>>> the
>>> mapping of priority to queues.
>>
>> This is how the iSCSI over DCB feature is expected to work in the BNA
>driver:-
>> FW running in the BNA adapter implements the DCB protocol. It learns
>the
>> iSCSI priority from the switch through iSCSI TLV exchange. BNA driver
>> extracts the iSCSI priority from the FW that needs to be used for
>iSCSI
>> packets.
>
>Up to here this is fine. What I was suggesting was to then use the
>dcb_setapp() routines to program the iSCSI TLV and generate an event
>so any user space applications listening to DCB events could handle
>this. It would also be nice if your driver notified user space of
>any PG or PFC changes as well. Then management agents (lldpad) could
>use the DCB attributes to make policy decisions.
>
>> For every outgoing packet, BNA driver does a TCP header
>> inspection to classify iSCSI packet and attach right 802.1q priority &
>> send it on the correct TX queue.
>>
>> This is expected to work with iSCSI applications that do not configure
>the
>> priority with SO_PRIORITY - here the iSCSI priority configuration
>actually
>> comes from the switch to the adapter.
>>
>
>Although this works I don't think it is optimal for a few reasons. Your
>L2 driver is inspecting TCP frames which is bad layering IMHO. The
>iSCSI port number is hard coded into the driver so it will only work
>with the well known port number. Also it adds more driver specific
>behavior into select_queue() where I think the trend is to try to
>use select_queue() less not more.
>
>To address iSCSI applications that do not configure the priority
>we could either work on adding the DCB hooks needed in those
>applications. Or look at adding a hook in the qdisc layer
>to do what your select_queue() hook does here. I started prototyping
>this a while ago but this requires running the packet classifiers
>and associated actions before picking a queue assuming you want
>to use mq or mqprio. I hope to get back to this soon; I have some
>more details to flesh out wrt this and need to run it by some other
>folks to make sure it's a sane idea.
>
>
>> The goal of this iSCSI priority is:
>> a) adapter applies prioritized scheduling for packets in its egress -
>to
>> guarantee minimum bandwidth as per ETS
>> b) packets are tagged with right priority so that switch can also
>identify
>> and guarantee BW on its egress.
>
>Correct. This is the same for other drivers that support DCB and
>use tc_setup as Ben suggested. See the bnx2x driver for an example
>that also uses a FW based LLDP engine and does this.
>
>>
>> Hope this explains.
>
>I think the one valid item is the unsupported applications but
>hopefully that can be addressed. Thanks for the details.

Can we go with the select_queue() approach for now, until DCB hooks are
added to the iSCSI applications or the qdisc layer? We will change this
to use setup_tc() as the framework becomes available.

Thanks,
Rasesh


^ permalink raw reply

* Re: [PATCH V2 net-next 0/3] bnx2x: Message logging cleanups
From: David Miller @ 2011-08-17 22:47 UTC (permalink / raw)
  To: eilong; +Cc: joe, netdev, linux-kernel
In-Reply-To: <1313588528.7266.6.camel@lb-tlvb-eilong.il.broadcom.com>

From: "Eilon Greenstein" <eilong@broadcom.com>
Date: Wed, 17 Aug 2011 16:42:08 +0300

> On Sun, 2011-08-14 at 15:16 -0700, Joe Perches wrote: 
>> Joe Perches (3):
>>   bnx2x: Remove local defines for %pM and mac address
>>   bnx2x: Coalesce pr_cont uses and fix DP typos
>>   bnx2x: Use pr_fmt and message logging cleanups
>> 
>>  drivers/net/ethernet/broadcom/bnx2x/bnx2x.h        |   67 ++++-----
>>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c    |   48 ++++---
>>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h    |    4 +-
>>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c    |    5 +-
>>  .../net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c    |   23 ++--
>>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c   |  153 ++++++++++----------
>>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c   |   72 ++++-----
>>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c     |   67 ++++-----
>>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c  |   46 +++---
>>  9 files changed, 244 insertions(+), 241 deletions(-)
>> 
> 
> Thanks Joe, sorry it took a while - but ACK for all :)

All applied, thanks everyone.

^ permalink raw reply

* Re: [Bugme-new] [Bug 41152] New: kernel 3.0 and above fails to handle vlan id 0 (802.1p) packets properly without hardware acceleration
From: Mike Auty @ 2011-08-17 22:48 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Andrew Morton, bugme-daemon, netdev
In-Reply-To: <20110817105950.GA4259@minipsycho.brq.redhat.com>

[-- Attachment #1: Type: text/plain, Size: 1356 bytes --]

On 17/08/11 11:59, Jiri Pirko wrote:
> 
> I just obtained very similar card (8086:422b). Going to look at it right
> away.
> 
> One more thing. What do you use to generate vlan0 tagged packets? I'm
> using pktgen with "vlan_id 0". Would you please try that it behaves the
> same for you?
> 	

Sorry, I haven't been using pktgen.  I've got an actual device (a
Samsung android phone) which seems to tag all normal outbound packets
with this type of vlan tag.  I only discovered a month ago that I needed
the 8021q module to be able to talk to it, and then suddenly it stopped
working once I moved to the 3.0 kernel.
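
To make "this type of vlan tag" concrete, here is a small sketch of how
the 16-bit 802.1Q Tag Control Information splits into its fields. The
struct and function names are hypothetical and the TCI is assumed to be
in host byte order already; a VLAN ID of 0 marks a priority-tagged
(802.1p) frame that carries a priority but no VLAN membership.

```c
#include <stdint.h>

/* Fields of the 802.1Q Tag Control Information. */
struct vlan_tci {
	unsigned int pcp; /* priority code point, top 3 bits */
	unsigned int dei; /* drop eligible indicator, 1 bit */
	unsigned int vid; /* VLAN identifier, low 12 bits */
};

/* Split a host-order TCI into its three fields. */
static struct vlan_tci parse_tci(uint16_t tci)
{
	struct vlan_tci t;

	t.pcp = (tci >> 13) & 0x7;
	t.dei = (tci >> 12) & 0x1;
	t.vid = tci & 0x0fff;
	return t;
}

/* A frame is priority-tagged when its VLAN ID is 0. */
static int is_priority_tagged(uint16_t tci)
{
	return (tci & 0x0fff) == 0;
}
```

For example, a TCI of 0xA000 parses to priority 5 with VLAN ID 0, so
is_priority_tagged() returns 1 for it, while a TCI of 0x0064 (VLAN 100,
priority 0) is an ordinary VLAN tag.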

I might not have made it clear, but the packets are received (in so much
as the packet is definitely sent, and it's seen by tools such as
wireshark), but no reply is ever sent.  I've attached packet logs from
the 3.0.1 kernel and the 2.6.39.3 kernel.  Oddly the tagging only seems
to be used on the first SYN,ACK packet, but again I don't know enough
about the pipeline or what the Samsung kernel's doing to cause that.

I hope that's of some help?  I may be able to get systemtap support
rolled into my kernel tomorrow at some point, but if not then it will
have to wait until the weekend.  I don't know if that will provide
useful information for debugging this, but I am happy to run whatever
tests I can to figure this out...

Mike  5:)

[-- Attachment #2: vlan0-on-kernel-2.6.39.3.pcap --]
[-- Type: application/x-extension-pcap, Size: 687 bytes --]

[-- Attachment #3: vlan0-on-kernel-3.0.1.pcap --]
[-- Type: application/x-extension-pcap, Size: 786 bytes --]

^ permalink raw reply

* [Patch] Scm: Remove unnecessary pid & credential references in Unix socket's send and receive path
From: Tim Chen @ 2011-08-17 23:56 UTC (permalink / raw)
  To: Eric Dumazet, David S. Miller, Al Viro, Eric W. Biederman
  Cc: Andi Kleen, Matt Fleming, linux-kernel, netdev

Patch series 109f6e39..7361c36c back in 2.6.36 added functionality to
allow credentials to work across pid namespaces for packets sent via
UNIX sockets.  However, the atomic reference counts on pid and
credentials caused plenty of cache bouncing when there are numerous
threads of the same pid sharing a UNIX socket.  This patch mitigates the
problem by eliminating extraneous reference counts on pid and
credentials on both send and receive path of UNIX sockets. I found a 2x
improvement in hackbench's threaded case.

On the receive path in unix_dgram_recvmsg, there is currently one
increment of the reference counts on pid and credentials, in
scm_set_cred, followed by two decrements: one in scm_recv and one when
skb_free_datagram calls the skb->destructor function unix_destruct_scm.
One increment/decrement pair on pid and credentials can be eliminated
from the receive path: until the skb is destroyed, it still holds the
reference that was taken when it was created on the send side.

On the send path, there are two increments of ref count on pid and
credentials, once in scm_send and once in unix_scm_to_skb.  Then there
is a decrement of the reference counts in scm_destroy's call to
scm_destroy_cred at the end of unix_dgram_send* functions.   One pair of
increment and decrement of the reference counts can be removed so we
only need to increment the ref counts once.

By incorporating these changes, hackbench running 50 groups of
20 threads on a 4 socket NHM-EX machine with 40 cores sped up by
a factor of 2.
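
The reference counting described above can be illustrated with a toy,
single-threaded model. This is not kernel code: struct obj stands in
for a pid or cred, plain ints replace the atomic counters, and the
functions old_path and new_path are hypothetical names that simply
replay the get/put sequence for one datagram as described in this
message.

```c
/* Toy stand-in for struct pid / struct cred. */
struct obj {
	int refs;       /* current reference count */
	int atomic_ops; /* number of get/put operations on the shared counter */
};

static void get_ref(struct obj *o) { o->refs++; o->atomic_ops++; }
static void put_ref(struct obj *o) { o->refs--; o->atomic_ops++; }

/* One message through the path before the patch. */
static void old_path(struct obj *o)
{
	get_ref(o); /* scm_send */
	get_ref(o); /* unix_scm_to_skb */
	put_ref(o); /* scm_destroy at the end of sendmsg */
	get_ref(o); /* scm_set_cred in recvmsg */
	put_ref(o); /* scm_recv */
	put_ref(o); /* unix_destruct_scm when the skb is freed */
}

/* One message through the patched path: the skb inherits the reference. */
static void new_path(struct obj *o)
{
	get_ref(o); /* scm_send */
	/* unix_scm_to_skb(..., ref = false): no extra get */
	/* scm_release: sender forgets its pointers, no put */
	/* scm_set_cred_noref: receiver borrows from the skb, no get */
	put_ref(o); /* unix_destruct_scm when the skb is freed */
}
```

In this model both paths end with a balanced count, but the old path
touches the shared counter six times per message and the new one only
twice, which is where the reduction in cache line bouncing comes from.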

Hackbench command used for testing:
./hackbench 50 thread 2000

Reviews and comments are appreciated.

Thanks.

Tim

---

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
--
diff --git a/include/net/scm.h b/include/net/scm.h
index 745460f..e24ec88 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -53,6 +53,14 @@ static __inline__ void scm_set_cred(struct scm_cookie *scm,
 	cred_to_ucred(pid, cred, &scm->creds);
 }
 
+static __inline__ void scm_set_cred_noref(struct scm_cookie *scm,
+				    struct pid *pid, const struct cred *cred)
+{
+	scm->pid  = pid;
+	scm->cred = cred;
+	cred_to_ucred(pid, cred, &scm->creds);
+}
+
 static __inline__ void scm_destroy_cred(struct scm_cookie *scm)
 {
 	put_pid(scm->pid);
@@ -70,6 +78,15 @@ static __inline__ void scm_destroy(struct scm_cookie *scm)
 		__scm_destroy(scm);
 }
 
+static __inline__ void scm_release(struct scm_cookie *scm)
+{
+	/* keep ref on pid and cred */
+	scm->pid = NULL;
+	scm->cred = NULL;
+	if (scm && scm->fp)
+		__scm_destroy(scm);
+}
+
 static __inline__ int scm_send(struct socket *sock, struct msghdr *msg,
 			       struct scm_cookie *scm)
 {
@@ -108,15 +125,14 @@ static __inline__ void scm_recv(struct socket *sock, struct msghdr *msg,
 	if (!msg->msg_control) {
 		if (test_bit(SOCK_PASSCRED, &sock->flags) || scm->fp)
 			msg->msg_flags |= MSG_CTRUNC;
-		scm_destroy(scm);
+		if (scm && scm->fp)
+			__scm_destroy(scm);
 		return;
 	}
 
 	if (test_bit(SOCK_PASSCRED, &sock->flags))
 		put_cmsg(msg, SOL_SOCKET, SCM_CREDENTIALS, sizeof(scm->creds), &scm->creds);
 
-	scm_destroy_cred(scm);
-
 	scm_passec(sock, msg, scm);
 
 	if (!scm->fp)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 0722a25..bd85c06 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1382,11 +1382,17 @@ static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 	return max_level;
 }
 
-static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool send_fds)
+static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb,
+			   bool send_fds, bool ref)
 {
 	int err = 0;
-	UNIXCB(skb).pid  = get_pid(scm->pid);
-	UNIXCB(skb).cred = get_cred(scm->cred);
+	if (ref) {
+		UNIXCB(skb).pid  = get_pid(scm->pid);
+		UNIXCB(skb).cred = get_cred(scm->cred);
+	} else {
+		UNIXCB(skb).pid  = scm->pid;
+		UNIXCB(skb).cred = scm->cred;
+	}
 	UNIXCB(skb).fp = NULL;
 	if (scm->fp && send_fds)
 		err = unix_attach_fds(scm, skb);
@@ -1411,7 +1417,7 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
 	int namelen = 0; /* fake GCC */
 	int err;
 	unsigned hash;
-	struct sk_buff *skb;
+	struct sk_buff *skb = NULL;
 	long timeo;
 	struct scm_cookie tmp_scm;
 	int max_level;
@@ -1452,7 +1458,7 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
 	if (skb == NULL)
 		goto out;
 
-	err = unix_scm_to_skb(siocb->scm, skb, true);
+	err = unix_scm_to_skb(siocb->scm, skb, true, false);
 	if (err < 0)
 		goto out_free;
 	max_level = err + 1;
@@ -1548,7 +1554,7 @@ restart:
 	unix_state_unlock(other);
 	other->sk_data_ready(other, len);
 	sock_put(other);
-	scm_destroy(siocb->scm);
+	scm_release(siocb->scm);
 	return len;
 
 out_unlock:
@@ -1558,7 +1564,8 @@ out_free:
 out:
 	if (other)
 		sock_put(other);
-	scm_destroy(siocb->scm);
+	if (skb == NULL)
+		scm_destroy(siocb->scm);
 	return err;
 }
 
@@ -1570,7 +1577,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 	struct sock *sk = sock->sk;
 	struct sock *other = NULL;
 	int err, size;
-	struct sk_buff *skb;
+	struct sk_buff *skb = NULL;
 	int sent = 0;
 	struct scm_cookie tmp_scm;
 	bool fds_sent = false;
@@ -1635,11 +1642,14 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 		size = min_t(int, size, skb_tailroom(skb));
 
 
-		/* Only send the fds in the first buffer */
-		err = unix_scm_to_skb(siocb->scm, skb, !fds_sent);
+		/* Only send the fds and no ref to pid in the first buffer */
+		if (fds_sent)
+			err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, true);
+		else
+			err = unix_scm_to_skb(siocb->scm, skb, !fds_sent, false);
 		if (err < 0) {
 			kfree_skb(skb);
-			goto out_err;
+			goto out;
 		}
 		max_level = err + 1;
 		fds_sent = true;
@@ -1647,7 +1657,7 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 		err = memcpy_fromiovec(skb_put(skb, size), msg->msg_iov, size);
 		if (err) {
 			kfree_skb(skb);
-			goto out_err;
+			goto out;
 		}
 
 		unix_state_lock(other);
@@ -1664,7 +1674,10 @@ static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
 		sent += size;
 	}
 
-	scm_destroy(siocb->scm);
+	if (skb)
+		scm_release(siocb->scm);
+	else
+		scm_destroy(siocb->scm);
 	siocb->scm = NULL;
 
 	return sent;
@@ -1677,7 +1690,9 @@ pipe_err:
 		send_sig(SIGPIPE, current, 0);
 	err = -EPIPE;
 out_err:
-	scm_destroy(siocb->scm);
+	if (skb == NULL)
+		scm_destroy(siocb->scm);
+out:
 	siocb->scm = NULL;
 	return sent ? : err;
 }
@@ -1781,7 +1796,7 @@ static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock,
 		siocb->scm = &tmp_scm;
 		memset(&tmp_scm, 0, sizeof(tmp_scm));
 	}
-	scm_set_cred(siocb->scm, UNIXCB(skb).pid, UNIXCB(skb).cred);
+	scm_set_cred_noref(siocb->scm, UNIXCB(skb).pid, UNIXCB(skb).cred);
 	unix_set_secdata(siocb->scm, skb);
 
 	if (!(flags & MSG_PEEK)) {
@@ -1943,7 +1958,8 @@ static int unix_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
 			}
 		} else {
 			/* Copy credentials */
-			scm_set_cred(siocb->scm, UNIXCB(skb).pid, UNIXCB(skb).cred);
+			scm_set_cred_noref(siocb->scm, UNIXCB(skb).pid,
+					   UNIXCB(skb).cred);
 			check_creds = 1;
 		}
 




^ permalink raw reply related

* Re: [PATCH v2 0/4] rps: Look into tunnels to get hash
From: David Miller @ 2011-08-18  3:06 UTC (permalink / raw)
  To: therbert; +Cc: netdev
In-Reply-To: <alpine.DEB.2.00.1108142135340.24599@pokey.mtv.corp.google.com>

From: Tom Herbert <therbert@google.com>
Date: Sun, 14 Aug 2011 22:44:33 -0700 (PDT)

> In this version fixed calls to sock_rps_save_rxhash in IPv6 with correct
> arguments and addressed comments from Eric Dumazet.
> 
> The patches in this series are to look into encapsulated packets
> to compute the rx hash for RPS.  Before these patches, all packets
> received on the same tunnel would wind up on the same RPS CPU-- this
> can lead to very poor loading, and make RFS ineffective on these
> packets.
> 
> This patch supports getting the rxhash out of a GRE encapsulated packet.

I'm sure we'll have some follow-on tweaks to this, but the basic
infrastructure looks fine to me.

Applied, thanks!

^ permalink raw reply

* Re: [PATCH] net: netdev-features.txt update to Documentation/networking/00-INDEX
From: David Miller @ 2011-08-18  3:09 UTC (permalink / raw)
  To: willemb; +Cc: netdev, mircus
In-Reply-To: <4E496526.8090201@google.com>

From: Willem de Bruijn <willemb@google.com>
Date: Mon, 15 Aug 2011 14:27:50 -0400

> Update netdev-features.txt entry in 00-INDEX to incorporate
> feedback by Michał Mirosław. A trivial update to my previous
> patch, but I was too late with preparing a v2. Will try to
> avoid having to send patches to my own patches in the future. 
> 
>   willem
> 
> Signed-off-by: Willem de Bruijn <willemb@gmail.com>

Patch corrupted by your email client, please fix and do a full
fresh resubmission.

Thanks.

^ permalink raw reply

* Re: [PATCH] net_sched: fix port mirror/redirect stats reporting
From: David Miller @ 2011-08-18  3:10 UTC (permalink / raw)
  To: jhs, hadi; +Cc: netdev
In-Reply-To: <1313421940.1798.2.camel@mojatatu>

From: jamal <hadi@cyberus.ca>
Date: Mon, 15 Aug 2011 11:25:40 -0400

> commit 6a51508a01671114c236602d071d4bff53422c60
> Author: Jamal Hadi Salim <jhs@mojatatu.com>
> Date:   Mon Aug 15 11:17:06 2011 -0400
> 
>     [PATCH] net_sched: fix port mirror/redirect stats reporting
>     
>     When a redirected or mirrored packet is dropped by the target
>     device we need to record statistics.
>     
>     Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-2.6 V2] bonding:reset backup and inactive flag of slave
From: David Miller @ 2011-08-18  3:12 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev
In-Reply-To: <j2dnte$p7a$1@dough.gmane.org>

From: WANG Cong <xiyou.wangcong@gmail.com>
Date: Tue, 16 Aug 2011 12:30:39 +0000 (UTC)

> On Tue, 16 Aug 2011 09:57:35 +0800, Weiping Pan wrote:
> 
>> Eduard Sinelnikov (eduard.sinelnikov@gmail.com) found that if we change
>> bonding mode from active backup to round robin, some slaves are still
>> keeping "backup", and won't transmit packets.
>> 
>> As Jay Vosburgh(fubar@us.ibm.com) pointed out that we can work around
>> that by removing the bond_is_active_slave() check, because the "backup"
>> flag is only meaningful for active backup mode.
>> 
>> But if we just simply ignore the bond_is_active_slave() check, the
>> transmission will work fine, but we can't maintain the correct value of
>> "backup" flag for each slaves, though it is meaningless for other mode
>> than active backup.
>> 
>> I'd like to reset "backup" and "inactive" flag in bond_open, thus we can
>> keep the correct value of them.
>> 
>> As for bond_is_active_slave(), I'd like to prepare another patch to
>> handle it.
>> 
>> V2:
>> Use C style comment.
>> Move read_lock(&bond->curr_slave_lock). Replace restore with reset, for
>> active backup mode, it means "restore", but for other modes, it means
>> "reset".
>> 
>> Signed-off-by: Weiping Pan <panweiping3@gmail.com>
> 
> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>

Applied, thanks.

^ permalink raw reply

* Re: [patch net-2.6] via-velocity: remove non-tagged packet filtering
From: David Miller @ 2011-08-18  3:14 UTC (permalink / raw)
  To: jpirko; +Cc: netdev, shemminger, romieu, stephan.baerwolf
In-Reply-To: <1313483614-2188-1-git-send-email-jpirko@redhat.com>

From: Jiri Pirko <jpirko@redhat.com>
Date: Tue, 16 Aug 2011 10:33:34 +0200

> It's undesired to filter untagged packets at any time. So simply remove this.
> 
> Reported-by: Stephan Bärwolf <stephan.baerwolf@tu-ilmenau.de>
> Tested-by: Stephan Bärwolf <stephan.baerwolf@tu-ilmenau.de>
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] net/can/mscan: add __iomem annotations
From: David Miller @ 2011-08-18  3:16 UTC (permalink / raw)
  To: mkl; +Cc: netdev, w.sang, socketcan-core
In-Reply-To: <1313405834-10340-1-git-send-email-mkl@pengutronix.de>

From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Mon, 15 Aug 2011 12:57:14 +0200

> This patch fixes the following sparse warning by adding the missing
> __iomem annotation.
> 
> drivers/net/can/mscan/mscan.c:73:32: warning: incorrect type in argument 1 (different address spaces)
> drivers/net/can/mscan/mscan.c:73:32:    expected unsigned char volatile [noderef] [usertype] <asn:2>*addr
> drivers/net/can/mscan/mscan.c:73:32:    got unsigned char *<noident>
> 
> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
> ---
> 
> This patch applies to current net-next-2.6/master and can be pulled:
> 
> The following changes since commit b333293058aa2d401737c7246bce58f8ba00906d:
> 
>   qeth: add support for af_iucv HiperSockets transport (2011-08-13 01:10:17 -0700)
> 
> are available in the git repository at:
>   git://git.pengutronix.de/git/mkl/linux-2.6.git can/mscan

Pulled, thanks.

^ permalink raw reply

* Re: [patch net-next-2.6] bonding: use ndo_change_rx_flags callback
From: David Miller @ 2011-08-18  3:18 UTC (permalink / raw)
  To: jpirko; +Cc: mirqus, netdev, fubar, andy
In-Reply-To: <20110816131503.GA26958@minipsycho.brq.redhat.com>

From: Jiri Pirko <jpirko@redhat.com>
Date: Tue, 16 Aug 2011 15:15:04 +0200

> V2:
> 
> Subject: [patch net-next-2.6 v2] bonding: use ndo_change_rx_flags callback
> 
> Benefit from use of ndo_change_rx_flags in handling change of promisc
> and allmulti. No need to store previous state locally.
> 
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
> 
> v1->v2: fixed IFF_ALLMULTI/IFF_PROMISC c&p typo

Applied, thanks.

^ permalink raw reply

* Re: [patch net-next-2.6 0/3] net: remove ndo_set_multicast_list
From: David Miller @ 2011-08-18  3:22 UTC (permalink / raw)
  To: jpirko
  Cc: netdev, shemminger, eric.dumazet, kaber, yoshfuji, rdunlap,
	kuznet, jmorris, mirqus, jesse
In-Reply-To: <1313512142-3355-1-git-send-email-jpirko@redhat.com>

From: Jiri Pirko <jpirko@redhat.com>
Date: Tue, 16 Aug 2011 18:28:59 +0200

> Now we have 2 callbacks exported to drivers to set rx filtering:
> ndo_set_multicast_list
> ndo_set_rx_mode
> 
> The second one was added with the second unicast address list, and only
> drivers that are able to handle unicast filtering should use it.
> 
> Having these two together is confusing and as I found out in many drivers,
> developers are not using them properly. So just kill one of them.
> ndo_set_multicast_list gets the bullet because I believe the name "set_rx_mode"
> is more appropriate.
> 
> The IFF_UNICAST_FLT flag is used so the core knows whether a driver handles unicast filtering.
> 
> applies cleanly on most recent net-next with "[patch net-next-2.6 v2] bonding: use ndo_change_rx_flags callback"

All applied, thanks Jiri!
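
For readers following the thread, the role of IFF_UNICAST_FLT described
in the cover letter can be modelled roughly as below. This is a
simplified userspace sketch, not kernel code: the NetDev class, its
field names and the flag's bit value are made up for illustration, and
the fallback to promiscuous mode for drivers that do not set the flag is
an assumption about how the core uses it, not something stated in the
cover letter.

```python
from dataclasses import dataclass, field

# Illustrative bit value only; not the kernel's actual constant.
IFF_UNICAST_FLT = 0x1


@dataclass
class NetDev:
    """Hypothetical stand-in for the few net_device fields used here."""
    priv_flags: int = 0
    secondary_uc: list = field(default_factory=list)  # extra unicast addresses
    promiscuity: int = 0      # reference count, as in the kernel
    uc_promisc: bool = False  # True while the core holds a promisc reference
    driver_calls: int = 0     # how often the driver callback was invoked


def set_rx_mode(dev: NetDev) -> None:
    """Model of the core deciding how to apply rx filtering.

    Assumption: if the driver cannot filter unicast addresses itself,
    the core turns on promiscuous mode while secondary addresses exist
    and drops it again once the list is empty.
    """
    if not dev.priv_flags & IFF_UNICAST_FLT:
        if dev.secondary_uc and not dev.uc_promisc:
            dev.promiscuity += 1
            dev.uc_promisc = True
        elif not dev.secondary_uc and dev.uc_promisc:
            dev.promiscuity -= 1
            dev.uc_promisc = False
    # Stands in for the single remaining driver callback, ndo_set_rx_mode.
    dev.driver_calls += 1
```

With only one callback left, a driver in this model differs solely in
whether it advertises the flag; the multicast handling that used to live
in ndo_set_multicast_list would move into the same callback.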

^ permalink raw reply

* Re: [PATCH v13 0/6] flexcan: Add support for powerpc flexcan (freescale p1010)
From: David Miller @ 2011-08-18  3:36 UTC (permalink / raw)
  To: holt-sJ/iWh9BUns
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, B22300-KZfg59tc24xl57MIdRCFDg,
	galak-XVmvHMARGAS8U2dJNN8I7kB+6BGkLq7r,
	socketcan-core-0fE9KPoRgkgATYTw5x5z8w, mkl-bIcnvbaLZ9MEGnE8C9+IrQ,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ, wg-5Yr1BZd7O62+XT7JhA+gdA
In-Reply-To: <1313551944-28603-1-git-send-email-holt-sJ/iWh9BUns@public.gmane.org>

From: Robin Holt <holt-sJ/iWh9BUns@public.gmane.org>
Date: Tue, 16 Aug 2011 22:32:18 -0500

> The following set of patches have been reviewed by the above parties and
> all comments have been integrated.  Although the patches stray from the
> drivers/net/can directory, the diversions are related to changes for
> the flexcan driver.
> 
> The patch set is based upon your net-next-2.6 tree's commit 6c37e46.
> 
> Could you please queue these up for the next appropriate push to Linus'
> tree?

Applied to net-next, thanks!

^ permalink raw reply

* Re: linux-next: boot test failure (net tree)
From: Stephen Rothwell @ 2011-08-18  5:22 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-next, linux-kernel, jeffrey.t.kirsher,
	Michael Neuling, Linus, Andrew Morton
In-Reply-To: <20110817105002.efebf85d08460ad99b14be8e@canb.auug.org.au>

[-- Attachment #1: Type: text/plain, Size: 1726 bytes --]

Hi Dave,

On Wed, 17 Aug 2011 10:50:02 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> On Tue, 16 Aug 2011 17:15:25 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
> >
> > From: Stephen Rothwell <sfr@canb.auug.org.au>
> > Date: Wed, 17 Aug 2011 10:01:46 +1000
> > 
> > > In particular, CONFIG_TIGON3 newly depends on
> > > CONFIG_NET_VENDOR_BROADCOM which will not be selected when doing a
> > > "make oldconfig" from a working config.
> > 
> > When you type "make oldconfig" with an existing .config it prompts you
> > for those vendor guards, giving you ample opportunity to say yes to
> > them.
> 
> Which is a bit of a pain for automated systems.  Ours does (essentially):
> 
> yes '' | make oldconfig
> 
> We really don't want to select every new config item that comes along.

So, Mikey did a test for me (he was bitten by this today).  Just one of
the powerpc configs (pseries_defconfig which should, in theory, build a
kernel that will boot on almost all our POWER server machines) loses all
these drivers if you do a "make pseries_defconfig":

 -CONFIG_IBMVETH=y
 -CONFIG_PCNET32=y
 -CONFIG_E100=y
 -CONFIG_ACENIC=m
 -CONFIG_ACENIC_OMIT_TIGON_I=y
 -CONFIG_E1000=y
 -CONFIG_E1000E=y
 -CONFIG_BNX2=m
 -CONFIG_CHELSIO_T3=m
 -CONFIG_CHELSIO_T4=m
 -CONFIG_IXGBE=m
 -CONFIG_IXGB=m
 -CONFIG_S2IO=m
 -CONFIG_MYRI10GE=m
 -CONFIG_NETXEN_NIC=m
 -CONFIG_QLGE=m
 -CONFIG_BE2NET=m

That is just one of our defconfigs ... there are over 400 defconfigs in
the kernel and a lot of them will need to be updated.
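
A list like the one above can be produced by comparing the .config
generated before and after the change. The following is a minimal
sketch of that comparison, assuming both files have already been read
into strings; the function names are hypothetical and this is not the
tool that was used for the test above.

```python
import re

# Matches enabled symbols such as CONFIG_E1000=y or CONFIG_BNX2=m.
# Lines like "# CONFIG_E1000 is not set" start with '#' and do not match.
_ENABLED = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")


def enabled_symbols(config_text):
    """Return {symbol: value} for every enabled symbol in a .config text."""
    symbols = {}
    for line in config_text.splitlines():
        match = _ENABLED.match(line.strip())
        if match:
            symbols[match.group(1)] = match.group(2)
    return symbols


def lost_symbols(before_text, after_text):
    """Return the sorted symbols enabled before but no longer enabled after."""
    before = enabled_symbols(before_text)
    after = enabled_symbols(after_text)
    return sorted(name for name in before if name not in after)
```

Running lost_symbols() on the output of "make pseries_defconfig" from a
tree without the vendor guards and from one with them would give the
symbols that silently dropped out.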

Mikey asks:  Will Dave take these updates if we get Acks from the
maintainers?  :-)

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* Re: linux-next: boot test failure (net tree)
From: Jeff Kirsher @ 2011-08-18  5:40 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: David Miller, netdev@vger.kernel.org, linux-next@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Neuling, Linus,
	Andrew Morton
In-Reply-To: <20110818152214.661858a61496993aaef2c704@canb.auug.org.au>

[-- Attachment #1: Type: text/plain, Size: 2308 bytes --]

On Wed, 2011-08-17 at 22:22 -0700, Stephen Rothwell wrote:
> Hi Dave,
> 
> On Wed, 17 Aug 2011 10:50:02 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> >
> > On Tue, 16 Aug 2011 17:15:25 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
> > >
> > > From: Stephen Rothwell <sfr@canb.auug.org.au>
> > > Date: Wed, 17 Aug 2011 10:01:46 +1000
> > > 
> > > > In particular, CONFIG_TIGON3 newly depends on
> > > > CONFIG_NET_VENDOR_BROADCOM which will not be selected when doing a
> > > > "make oldconfig" from a working config.
> > > 
> > > When you type "make oldconfig" with an existing .config it prompts you
> > > for those vendor guards, giving you ample opportunity to say yes to
> > > them.
> > 
> > Which is a bit of a pain for automated systems.  Ours does (essentially):
> > 
> > yes '' | make oldconfig
> > 
> > We really don't want to select every new config item that comes along.
> 
> So, Mikey did a test for me (he was bitten by this today).  Just one of
> the powerpc configs (pseries_defconfig which should, in theory, build a
> kernel that will boot on almost all our POWER server machines) loses all
> these drivers if you do a "make pseries_defconfig":
> 
>  -CONFIG_IBMVETH=y
>  -CONFIG_PCNET32=y
>  -CONFIG_E100=y
>  -CONFIG_ACENIC=m
>  -CONFIG_ACENIC_OMIT_TIGON_I=y
>  -CONFIG_E1000=y
>  -CONFIG_E1000E=y
>  -CONFIG_BNX2=m
>  -CONFIG_CHELSIO_T3=m
>  -CONFIG_CHELSIO_T4=m
>  -CONFIG_IXGBE=m
>  -CONFIG_IXGB=m
>  -CONFIG_S2IO=m
>  -CONFIG_MYRI10GE=m
>  -CONFIG_NETXEN_NIC=m
>  -CONFIG_QLGE=m
>  -CONFIG_BE2NET=m
> 
> That is just one of our defconfigs ... there are over 400 defconfigs in
> the kernel and a lot of them will need to be updated.
> 
> Mikey asks:  Will Dave take these updates if we get Acks from the
> maintainers?  :-)
> 

I am open to the idea.  I considered updating the defconfigs, but
was not sure what the best way of making the changes was.  I was also not
sure whether making these changes in Dave's net-next tree would upset the
arch maintainers, especially if there is a better tree(s) for the changes.

I am finishing up the patches for drivers/net/ (FDDI, PPP, SLIP,
HIPPI, etc.) and since I started this change, I would sign up for making
the defconfig changes if that helps.

Cheers,
Jeff

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* Re: linux-next: boot test failure (net tree)
From: David Miller @ 2011-08-18  5:53 UTC (permalink / raw)
  To: sfr
  Cc: netdev, linux-next, linux-kernel, jeffrey.t.kirsher, mikey,
	torvalds, akpm
In-Reply-To: <20110818152214.661858a61496993aaef2c704@canb.auug.org.au>

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Thu, 18 Aug 2011 15:22:14 +1000

> Mikey asks:  Will Dave take these updates if we get Acks from the
> maintainers?  :-)

I'm more than happy to :-)

^ permalink raw reply
