Netdev List


* Re: UDP regression with packets rates < 10k per sec
From: Christoph Lameter @ 2009-09-26 16:13 UTC
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <4ABCD18E.2010501@gmail.com>

On Fri, 25 Sep 2009, Eric Dumazet wrote:

> With my current kernel on receiver (linux-2.6 32bit + some networking
> patches + SLUB_STATS) mcast -n1 -b eth3 -r 2000 on the sender (2.6.29
> unfortunately, I cannot change it at this moment)

My tests are all done using SLAB since I wanted to exclude differences as
much as possible.

>           <idle>-0     [000] 13580.504040: __kmalloc_track_caller <-__alloc_skb
>           <idle>-0     [000] 13580.504040: get_slab <-__kmalloc_track_caller
>           <idle>-0     [000] 13580.504040: __slab_alloc <-__kmalloc_track_caller
>
> hmm... is it normal we call deactivate_slab() ?

deactivate_slab() is called when the slab page we are allocating from runs
out of objects or is not suitable for the allocation (e.g., we want objects
from a different node).
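A minimal model of that decision, loosely following the 2.6.31-era SLUB slow path. The struct, enum and function names are hypothetical and not kernel identifiers. It also shows where the alloc_refill counter in Eric's statistics fits: the per-CPU freelist is empty, but other CPUs have freed objects back onto the page, so no deactivate_slab() is needed yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified view of a CPU's current slab page.
 * The real SLUB state lives in struct kmem_cache_cpu and struct page. */
struct model_cpu_slab {
	bool has_page;	/* does this CPU currently own a slab page? */
	int page_node;	/* NUMA node of that page */
	int cpu_free;	/* objects left on the per-CPU freelist */
	int page_free;	/* objects other CPUs freed back onto the page */
};

enum model_alloc_path {
	PATH_FAST,		/* served from the per-CPU freelist */
	PATH_REFILL,		/* per-CPU list empty, refilled from remote frees */
	PATH_DEACTIVATE,	/* page exhausted or on the wrong node */
	PATH_NEW_SLAB		/* no current page: get a partial or new one */
};

/* Classify one allocation; wanted_node < 0 means "any node". */
static enum model_alloc_path classify_alloc(const struct model_cpu_slab *c,
					    int wanted_node)
{
	assert(c != NULL);
	if (!c->has_page)
		return PATH_NEW_SLAB;
	if (wanted_node >= 0 && c->page_node != wanted_node)
		return PATH_DEACTIVATE;	/* objects wanted from another node */
	if (c->cpu_free > 0)
		return PATH_FAST;
	if (c->page_free > 0)
		return PATH_REFILL;
	return PATH_DEACTIVATE;		/* page has run out of objects */
}
```

In this model every PATH_DEACTIVATE would be followed by fetching a partial or new page, which is why deactivate_slab() shows up in the trace on the allocation side.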

>            mcast-21429 [000] 13580.504066: sock_rfree <-skb_release_head_state
>            mcast-21429 [000] 13580.504066: skb_release_data <-__kfree_skb
>            mcast-21429 [000] 13580.504066: kfree <-skb_release_data
>            mcast-21429 [000] 13580.504066: __slab_free <-kfree
>
>   is it normal we call add_partial() ?

add_partial() is called when we free an object in a slab page that
previously had all of its objects allocated. The page can then be used for
allocations again, so it must be tracked on the partial list. Fully
allocated slab pages are not tracked.
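The free side can be sketched the same way. This is a hypothetical model of one slab page's bookkeeping, not SLUB's code; it ignores the page a CPU is currently allocating from, which SLUB handles separately, and only shows when a free moves a full, untracked page onto the partial list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified bookkeeping for one slab page that is not
 * any CPU's current page. */
struct model_slab {
	int objects;		/* capacity: 8 for 4096-byte objects at order 3 */
	int inuse;		/* objects currently allocated */
	bool on_partial_list;	/* fully allocated pages are not tracked */
};

/* Free one object. Returns true when this free turns a fully allocated,
 * untracked page into a partially free one, i.e. the point where an
 * add_partial()-style step puts it on the partial list. */
static bool model_free_object(struct model_slab *s)
{
	assert(s != NULL && s->inuse > 0);
	bool was_full = (s->inuse == s->objects);

	s->inuse--;
	if (was_full && s->inuse > 0 && !s->on_partial_list) {
		s->on_partial_list = true;
		return true;
	}
	/* A page whose last object was freed may be released instead. */
	return false;
}
```

Only the first free after the page became full triggers the list insertion; later frees on the same page do not.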

> Too many slowpaths for 4096 slabs ?
>
> $ cd /sys/kernel/slab/:t-0004096
> $ grep . *
> aliases:1
> align:8
> grep: alloc_calls: Function not implemented
> alloc_fastpath:416584 C0=234119 C1=52938 C2=18413 C3=4739 C4=49140 C5=14679 C6=39266 C7=3290
> alloc_from_partial:459402 C0=459391 C1=8 C2=1 C5=2
> alloc_refill:459619 C0=459460 C1=54 C2=1 C3=4 C4=52 C5=31 C6=2 C7=15
> alloc_slab:103 C0=45 C1=28 C3=1 C4=26 C5=1 C6=1 C7=1
> alloc_slowpath:459628 C0=459462 C1=55 C2=2 C3=5 C4=53 C5=32 C6=3 C7=16

Hmmm. That is a high percentage, and nearly all of them are refills. So
there are remote frees from another processor into the slab page the
first processor allocates from. Does one processor allocate the object
and then push it to another for freeing? That is bad for caching.
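To put numbers on "high percentage", the ratios can be computed directly from the quoted counters. This sketch assumes every allocation is counted in exactly one of alloc_fastpath or alloc_slowpath; the helper names are hypothetical.

```c
#include <assert.h>

/* Percentage of allocations that took the slow path. */
static double slowpath_pct(unsigned long fastpath, unsigned long slowpath)
{
	assert(fastpath + slowpath > 0);
	return 100.0 * (double)slowpath / (double)(fastpath + slowpath);
}

/* Percentage of slow-path allocations served by a refill, i.e. (per the
 * interpretation above) by objects other CPUs freed back to the page. */
static double refill_pct(unsigned long slowpath, unsigned long refill)
{
	assert(slowpath > 0);
	return 100.0 * (double)refill / (double)slowpath;
}
```

With the quoted totals, slowpath_pct(416584, 459628) comes out at about 52.5%, and refill_pct(459628, 459619) at about 99.998%.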

> free_slowpath:657340 C0=656835 C1=119 C2=76 C3=36 C4=159 C5=69 C6=15 C7=31

Also quite high, and consistent with remote freeing of objects allocated
on the first processor. The objects are very short-lived.

> comments :
> - lots of disable_bh()/enable_bh(), (enable_bh is slow), that could be avoided...
> - many ktime_get() calls
> - my HZ=1000 setup might be stupid on a CONFIG_NO_HZ=y kernel :(

There are 8 objects per slab (order 3 allocation). You could maybe tune
things a bit by increasing the objects per slab, which may cut down on the
number of deactivate_slab() calls and will also reduce the need for
add_partial(). But I don't see either call causing significant latencies.

Both calls should happen on roughly every 8th call of kfree/kmalloc.

To increase the objects per slab to 32:

boot with slub_max_order=5

and then

echo 5 >/sys/kernel/slab/kmalloc-4096/order

This requires order-5 allocations. Don't expect it to make much of a
difference.
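The objects-per-slab arithmetic behind "8 objects at order 3" and "32 at order 5" is simply slab size divided by object size. This sketch assumes 4 KiB pages and ignores metadata and alignment padding, which reproduces the 8 objects per slab quoted above.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size */

/* Objects of obj_size bytes that fit in a slab of 2^order pages. */
static unsigned long objects_per_slab(unsigned int order, unsigned long obj_size)
{
	assert(obj_size > 0);
	return (MODEL_PAGE_SIZE << order) / obj_size;
}
```

objects_per_slab(3, 4096) gives 8 (a 32 KiB slab) and objects_per_slab(5, 4096) gives 32 (a 128 KiB slab), so the page-level slow-path work would happen roughly every 32nd instead of every 8th allocation on the busy CPU.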


* Re: TCP stack bug related to F-RTO?
From: Joe Cao @ 2009-09-26 16:53 UTC
  To: Ilpo Järvinen; +Cc: Ray Lee, Netdev, LKML
In-Reply-To: <Pine.LNX.4.64.0909252049260.1854@melkinkari.cs.Helsinki.FI>

Hi Ilpo,

Can you elaborate on "Some retransmission would happen here as step 3"?  When the second timeout happens, it will again go into FRTO and then retransmit the write queue head.

I looked at the patch (Debian Bug#478062), which is probably what you mentioned as the fix. All it does is exclude the SACK case when considering FRTO. But in my case, SACK was enabled, as seen in the trace.

In other words, do we still have a problem with FRTO when SACK is enabled in the latest kernel?

Thanks,
Joe  

--- On Fri, 9/25/09, Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:

> From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> Subject: Re: TCP stack bug related to F-RTO?
> To: "Joe Cao" <caoco2002@yahoo.com>
> Cc: "Ray Lee" <ray-lk@madrabbit.org>, "Netdev" <netdev@vger.kernel.org>, "LKML" <linux-kernel@vger.kernel.org>
> Date: Friday, September 25, 2009, 11:03 AM
> On Fri, 25 Sep 2009, Joe Cao wrote:
>
> > Thanks for the reply!  Do you happen to know which patch fixed the
> > problem?
>
> You can find those patches in the stable queue git tree. I gave you a
> hint about which release to look at in the last mail. However, as 2.6.24
> is obsolete anyway, my recommendation is that you should probably
> consider upgrading to fix all the other bugs that have been found since
> 2.6.24 was obsoleted.
>
> > Is there a bug tracking system for the Linux kernel?
>
> Nothing that knows everything about everything.
>
> > I studied the FRTO code in the latest kernel, 2.6.31.  It seems the
> > problem is still there:
> >
> > 1. Every time an RTO fires, because tcp_is_sackfrto(tp) returns 1,
> > tcp_use_frto() returns true, and the server TCP enters FRTO.
> > 2. After the head of the write queue is retransmitted, two new data
> > packets are transmitted and the server receives two dup-ACKs.  That
> > makes the TCP enter tcp_enter_frto_loss(); however, that only resets
> > ssthresh and some other fields.
>
> Perhaps those other fields are far more important than you think... :-)
> ...Some retransmission would happen here as step 3.
>
> > 3. After another, longer RTO fires, because tcp_is_sackfrto(tp)
> > returns 1, tcp_use_frto() again returns true.  The stack enters FRTO
> > again.
> > 4. The above repeats and the stack can't retransmit the lost packets
> > any faster.
> >
> > Is my understanding above correct?
>
> ...No. All the magic that happens in tcp_enter_frto_loss should be enough
> to really do more than a single retransmission (that is, in any kernel
> other than the 2.6.24 series). There was an unfortunate bug in this area
> in 2.6.24 which basically undid the effect of the correct actions
> tcp_enter_frto_loss took, which effectively prevented
> tcp_xmit_retransmit_queue from doing its part.
>
> -- 
>  i.
>
> --- On Fri, 9/25/09, Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
>
> > From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> > Subject: Re: TCP stack bug related to F-RTO?
> > To: "Ray Lee" <ray-lk@madrabbit.org>
> > Cc: "Joe Cao" <caoco2002@yahoo.com>, "Netdev" <netdev@vger.kernel.org>, "LKML" <linux-kernel@vger.kernel.org>, jcaoco2002@yahoo.com
> > Date: Friday, September 25, 2009, 6:09 AM
> > On Thu, 24 Sep 2009, Ray Lee wrote:
> >
> > > [adding netdev cc:]
> > >
> > > On Thu, Sep 24, 2009 at 10:43 AM, Joe Cao <caoco2002@yahoo.com> wrote:
> > > >
> > > > Hello,
> > > >
> > > > I have found the following behavior with different versions of the
> > > > Linux kernel. The attached pcap trace was collected with the server
> > > > (192.168.0.13) running 2.6.24 and shows the problem. Basically the
> > > > behavior is like this:
> > > >
> > > > 1. The client opens up a big window.
> > > > 2. The server sends 19 packets in a row (pkts #14-#32 in the trace),
> > > > but all of them are dropped due to some congestion.
> > > > 3. The server hits RTO and retransmits pkt #14 in #33.
> > > > 4. The client immediately acks #33 (=#14), and the server (seemingly
> > > > entering F-RTO) expands the window and sends *NEW* pkts #35 & #36.
> > > > The timeout is doubled to 2*RTO. The client immediately sends two
> > > > dup-ACKs for #35 and #36.
> > > > 5. After 2*RTO, pkt #15 is retransmitted in #39.
> > > > 6. The client immediately acks #39 (=#15) in #40, and the server
> > > > continues to expand the window and sends two *NEW* pkts #41 & #42.
> > > > Now the timeout is doubled to 4*RTO.
> > > > 8. After the 4*RTO timeout, #16 is retransmitted.
> > > > 9. ...
> > > > 10. The above steps repeat for retransmitting pkts #16-#32, and each
> > > > time the timeout is doubled.
> > > > 11. It takes a very long time to retransmit all the lost packets, and
> > > > before that is done, the client sends a RST because of a timeout.
> > > >
> > > > The above behavior looks like F-RTO is in effect, and there seems to
> > > > be a bug in TCP's congestion control and retransmission algorithm.
> > > > Why doesn't the TCP on the server (running 2.6.24) enter slow start?
> > > > Why should the server take that long to recover from a short period
> > > > of packet loss?
> > > >
> > > > Has anyone else noticed a similar problem before?  If my analysis is
> > > > wrong, can anyone give me some pointers to what's really wrong and
> > > > how to fix it?
> >
> > Yes, 2.6.24 is an obsolete version with known bugs in the FRTO
> > implementation. The fixes never went to the 2.6.24 stable series as it
> > was _already_ obsolete when the problems were reported and found. The
> > correct fixes may be found in 2.6.25.7 (.7 iirc) and are included from
> > 2.6.26 onward too.
> >
> > Just in case you happen to run an Ubuntu-based kernel from that era (of
> > course you should be reporting the bug here then...), a word of warning:
> > it seemed nearly impossible for them to get a simple thing like that
> > fixed. I haven't been checking whether they eventually came to some
> > sensible conclusion in that matter or whether it is still unresolved
> > (or e.g., closed without real resolution).


* Re: TCP stack bug related to F-RTO?
From: Ilpo Järvinen @ 2009-09-26 17:51 UTC
  To: Joe Cao; +Cc: Ray Lee, Netdev, LKML
In-Reply-To: <844721.77331.qm@web63403.mail.re1.yahoo.com>


On Sat, 26 Sep 2009, Joe Cao wrote:

> Can you elaborate on "Some retransmission would happen here as step 3"?  
> When the second timeout happens, it will again go into FRTO and then 
> retransmit the write queue head.

Why do you think that the second RTO will happen with anything other than
2.6.24? And it's perfectly ok to go into FRTO for the second time.

> I looked at the patch (Debian Bug#478062), which is probably what you
> mentioned as the fix. All it does is exclude the SACK case when
> considering FRTO.  But in my case, SACK was enabled, as seen in the
> trace.

You should be looking where I said rather than picking your own
sources and assuming that they'll tell you the whole story :-). In fact,
two fixes were made in a row, plus one workaround in the same
timeframe. ...And you managed to pick the wrong one of the fixes, so
I kind of understand why you got confused :-).

> In other words, do we still have a problem with FRTO when SACK is 
> enabled in the latest kernel?

For sure we might have all kinds of problems no one has yet
noticed/reported :-). ...However, it seems that the particular problem
your trace shows is solved. Please test with a fixed kernel
before coming back here with these claims.


-- 
 i.
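For readers following the F-RTO discussion, here is a simplified sketch of the basic (non-SACK) F-RTO decision logic from RFC 5682. It is not Linux's tcp_input.c code, and it does not model the SACK-enhanced variant that tcp_is_sackfrto() refers to; all names are hypothetical. Under this model, the dup-ACKs in step 4 of the trace should lead straight to conventional recovery (slow start retransmitting the unacknowledged segments) rather than another RTO. Per the explanation above, that is the part the 2.6.24 bug broke.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Where a simplified F-RTO sender is after an RTO retransmission. */
enum frto_phase {
	FRTO_STEP2,	/* waiting for the first ACK after the RTO retransmission */
	FRTO_STEP3,	/* sent up to two new segments, waiting for the second ACK */
	FRTO_DONE	/* decision made; normal processing resumes */
};

/* The only properties of an incoming ACK this sketch looks at. */
enum frto_ack {
	ACK_DUPLICATE,		/* acknowledges no new data */
	ACK_ADVANCES,		/* acknowledges new data, but not up to "recover" */
	ACK_COVERS_RECOVER	/* acknowledges everything sent before the RTO */
};

enum frto_action {
	FRTO_SEND_TWO_NEW,	/* step 2b: probe with up to two new segments */
	FRTO_CONVENTIONAL,	/* slow start, retransmit unacknowledged segments */
	FRTO_SPURIOUS		/* step 3b: the timeout was spurious */
};

/* Feed one ACK to the model and get the sender's next action.
 * have_new_data: unsent data exists and the receive window allows it. */
static enum frto_action frto_on_ack(enum frto_phase *phase, enum frto_ack ack,
				    bool have_new_data)
{
	assert(phase != NULL && *phase != FRTO_DONE);

	if (*phase == FRTO_STEP2) {
		if (ack == ACK_ADVANCES && have_new_data) {
			*phase = FRTO_STEP3;
			return FRTO_SEND_TWO_NEW;
		}
		/* Step 2a, or nothing new to send: conventional recovery. */
		*phase = FRTO_DONE;
		return FRTO_CONVENTIONAL;
	}

	/* Step 3: a duplicate ACK means the segments really were lost. */
	*phase = FRTO_DONE;
	if (ack == ACK_DUPLICATE)
		return FRTO_CONVENTIONAL;
	return FRTO_SPURIOUS;
}
```

Replaying the trace: the ACK of #33 advances the window, giving FRTO_SEND_TWO_NEW (#35, #36); the following dup-ACK gives FRTO_CONVENTIONAL.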



* (unknown), 
From: Alvin Baptiste @ 2009-09-26 18:22 UTC
  To: netdev

unsubscribe


* [PATCH 9/9] Add explicit bound checks in net/socket.c
From: Arjan van de Ven @ 2009-09-26 18:54 UTC
  To: Arjan van de Ven; +Cc: linux-kernel, torvalds, mingo, netdev
In-Reply-To: <20090926204951.424e567e@infradead.org>

From: Arjan van de Ven <arjan@linux.intel.com>
Subject: [PATCH 9/9] Add explicit bound checks in net/socket.c
CC: netdev@vger.kernel.org

The sys_socketcall() function has a very clever system for the copy
size of its arguments. Unfortunately, gcc cannot use it to prove that
the copy_from_user() is then always in bounds. This is the last such
case around (the 9th of this series, but the last in the kernel).

With this patch, we can turn on code that requires the bounds to be
provably correct for the whole kernel, and detect newly introduced
security accidents of this type early on.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>


diff --git a/net/socket.c b/net/socket.c
index 49917a1..13a8d67 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2098,12 +2098,17 @@ SYSCALL_DEFINE2(socketcall, int, call, unsigned long __user *, args)
 	unsigned long a[6];
 	unsigned long a0, a1;
 	int err;
+	unsigned int len;
 
 	if (call < 1 || call > SYS_ACCEPT4)
 		return -EINVAL;
 
+	len = nargs[call];
+	if (len > 6)
+		return -EINVAL;
+
 	/* copy_from_user should be SMP safe. */
-	if (copy_from_user(a, args, nargs[call]))
+	if (copy_from_user(a, args, len))
 		return -EFAULT;
 
 	audit_socketcall(nargs[call] / sizeof(unsigned long), a);


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


* RE: [PATCH v2 0/2] cxgb3/cxgb3i: added support of private MAC address and provisioning packet handler for iSCSI
From: Karen Xie @ 2009-09-26 18:55 UTC
  To: David Miller
  Cc: michaelc, James.Bottomley, Steve Wise, Divy Le Ray, Rakesh Ranjan,
	linux-scsi, open-iscsi, linux-kernel, netdev
In-Reply-To: <20090926.001646.06991481.davem@davemloft.net>

Thanks, understood.

We are submitting a new version whose changes are isolated to the net
driver and do not affect the scsi driver.

Thanks a lot.
Karen

-----Original Message-----
From: David Miller [mailto:davem@davemloft.net] 
Sent: Saturday, September 26, 2009 12:17 AM
To: Karen Xie
Cc: michaelc@cs.wisc.edu; James.Bottomley@HansenPartnership.com; Steve
Wise; Divy Le Ray; Rakesh Ranjan; linux-scsi@vger.kernel.org;
open-iscsi@googlegroups.com; linux-kernel@vger.kernel.org;
netdev@vger.kernel.org
Subject: Re: [PATCH v2 0/2] cxgb3/cxgb3i: added support of private MAC
address and provisioning packet handler for iSCSI

From: "Karen Xie" <kxie@chelsio.com>
Date: Fri, 25 Sep 2009 15:34:22 -0700

> Hmm, I am wondering how this merge activity could be coordinated. If
> only the driver/scsi change is merged, then it won't compile either,
> since it requires the driver/net change.

That's ridiculous, frankly.

Since they are two separate changes, you are knowingly creating
a bisection point that will not work.  That's wrong.

You need to split up the changes so that each and every one of them
is independent and the tree can be checked out at any of them and
everything can be expected to work.


* [PATCH v3 net-next-2.6] cxgb3: Added private MAC address and provisioning packet handler for iSCSI
From: kxie @ 2009-09-26 19:03 UTC
  To: davem
  Cc: swise, divy, rakesh, kxie, michaelc, James.Bottomley,
	linux-kernel, netdev

From: Karen Xie <kxie@chelsio.com>

[PATCH v3 net-next-2.6] cxgb3: Added private MAC address and provisioning packet handler for iSCSI

This patch adds support for a private MAC address per port and a
provisioning packet handler for iSCSI traffic only.

Acked-by: Karen Xie <kxie@chelsio.com>
Acked-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: Rakesh Ranjan <rakesh@chelsio.com>
---

 drivers/net/cxgb3/adapter.h    |   16 ++++++++++++++++
 drivers/net/cxgb3/cxgb3_main.c |   22 ++++++++++++++++++----
 drivers/net/cxgb3/sge.c        |   28 +++++++++++++++++++---------
 3 files changed, 53 insertions(+), 13 deletions(-)


diff --git a/drivers/net/cxgb3/adapter.h b/drivers/net/cxgb3/adapter.h
index 2b1aea6..463633e 100644
--- a/drivers/net/cxgb3/adapter.h
+++ b/drivers/net/cxgb3/adapter.h
@@ -48,12 +48,27 @@
 struct vlan_group;
 struct adapter;
 struct sge_qset;
+struct port_info;
 
 enum {			/* rx_offload flags */
 	T3_RX_CSUM	= 1 << 0,
 	T3_LRO		= 1 << 1,
 };
 
+enum {
+	LAN_MAC_IDX	= 0,
+	SAN_MAC_IDX,
+
+	MAX_MAC_IDX
+};
+
+struct iscsi_config {
+	__u8	mac_addr[ETH_ALEN];
+	__u32	flags;
+	int (*send)(struct port_info *pi, struct sk_buff **skb);
+	int (*recv)(struct port_info *pi, struct sk_buff *skb);
+};
+
 struct port_info {
 	struct adapter *adapter;
 	struct vlan_group *vlan_grp;
@@ -68,6 +83,7 @@ struct port_info {
 	struct net_device_stats netstats;
 	int activity;
 	__be32 iscsi_ipv4addr;
+	struct iscsi_config iscsic;
 
 	int link_fault; /* link fault was detected */
 };
diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
index 34e776c..c9113d3 100644
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -344,8 +344,10 @@ static void link_start(struct net_device *dev)
 
 	init_rx_mode(&rm, dev, dev->mc_list);
 	t3_mac_reset(mac);
+	t3_mac_set_num_ucast(mac, MAX_MAC_IDX);
 	t3_mac_set_mtu(mac, dev->mtu);
-	t3_mac_set_address(mac, 0, dev->dev_addr);
+	t3_mac_set_address(mac, LAN_MAC_IDX, dev->dev_addr);
+	t3_mac_set_address(mac, SAN_MAC_IDX, pi->iscsic.mac_addr);
 	t3_mac_set_rx_mode(mac, &rm);
 	t3_link_start(&pi->phy, mac, &pi->link_config);
 	t3_mac_enable(mac, MAC_DIRECTION_RX | MAC_DIRECTION_TX);
@@ -903,6 +905,7 @@ static inline int offload_tx(struct t3cdev *tdev, struct sk_buff *skb)
 static int write_smt_entry(struct adapter *adapter, int idx)
 {
 	struct cpl_smt_write_req *req;
+	struct port_info *pi = netdev_priv(adapter->port[idx]);
 	struct sk_buff *skb = alloc_skb(sizeof(*req), GFP_KERNEL);
 
 	if (!skb)
@@ -913,8 +916,8 @@ static int write_smt_entry(struct adapter *adapter, int idx)
 	OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_SMT_WRITE_REQ, idx));
 	req->mtu_idx = NMTUS - 1;	/* should be 0 but there's a T3 bug */
 	req->iff = idx;
-	memset(req->src_mac1, 0, sizeof(req->src_mac1));
 	memcpy(req->src_mac0, adapter->port[idx]->dev_addr, ETH_ALEN);
+	memcpy(req->src_mac1, pi->iscsic.mac_addr, ETH_ALEN);
 	skb->priority = 1;
 	offload_tx(&adapter->tdev, skb);
 	return 0;
@@ -2516,7 +2519,7 @@ static int cxgb_set_mac_addr(struct net_device *dev, void *p)
 		return -EINVAL;
 
 	memcpy(dev->dev_addr, addr->sa_data, dev->addr_len);
-	t3_mac_set_address(&pi->mac, 0, dev->dev_addr);
+	t3_mac_set_address(&pi->mac, LAN_MAC_IDX, dev->dev_addr);
 	if (offload_running(adapter))
 		write_smt_entry(adapter, pi->port_id);
 	return 0;
@@ -2654,7 +2657,7 @@ static void check_t3b2_mac(struct adapter *adapter)
 			struct cmac *mac = &p->mac;
 
 			t3_mac_set_mtu(mac, dev->mtu);
-			t3_mac_set_address(mac, 0, dev->dev_addr);
+			t3_mac_set_address(mac, LAN_MAC_IDX, dev->dev_addr);
 			cxgb_set_rxmode(dev);
 			t3_link_start(&p->phy, mac, &p->link_config);
 			t3_mac_enable(mac, MAC_DIRECTION_RX | MAC_DIRECTION_TX);
@@ -3112,6 +3115,14 @@ static const struct net_device_ops cxgb_netdev_ops = {
 #endif
 };
 
+static void __devinit cxgb3_init_iscsi_mac(struct net_device *dev)
+{
+	struct port_info *pi = netdev_priv(dev);
+
+	memcpy(pi->iscsic.mac_addr, dev->dev_addr, ETH_ALEN);
+	pi->iscsic.mac_addr[3] |= 0x80;
+}
+
 static int __devinit init_one(struct pci_dev *pdev,
 			      const struct pci_device_id *ent)
 {
@@ -3270,6 +3281,9 @@ static int __devinit init_one(struct pci_dev *pdev,
 		goto out_free_dev;
 	}
 
+	for_each_port(adapter, i)
+		cxgb3_init_iscsi_mac(adapter->port[i]);
+
 	/* Driver's ready. Reflect it on LEDs */
 	t3_led_ready(adapter);
 
diff --git a/drivers/net/cxgb3/sge.c b/drivers/net/cxgb3/sge.c
index f866128..b7f4ee4 100644
--- a/drivers/net/cxgb3/sge.c
+++ b/drivers/net/cxgb3/sge.c
@@ -1946,10 +1946,9 @@ static void restart_tx(struct sge_qset *qs)
  *	Check if the ARP request is probing the private IP address
  *	dedicated to iSCSI, generate an ARP reply if so.
  */
-static void cxgb3_arp_process(struct adapter *adapter, struct sk_buff *skb)
+static void cxgb3_arp_process(struct port_info *pi, struct sk_buff *skb)
 {
 	struct net_device *dev = skb->dev;
-	struct port_info *pi;
 	struct arphdr *arp;
 	unsigned char *arp_ptr;
 	unsigned char *sha;
@@ -1972,12 +1971,11 @@ static void cxgb3_arp_process(struct adapter *adapter, struct sk_buff *skb)
 	arp_ptr += dev->addr_len;
 	memcpy(&tip, arp_ptr, sizeof(tip));
 
-	pi = netdev_priv(dev);
 	if (tip != pi->iscsi_ipv4addr)
 		return;
 
 	arp_send(ARPOP_REPLY, ETH_P_ARP, sip, dev, tip, sha,
-		 dev->dev_addr, sha);
+		 pi->iscsic.mac_addr, sha);
 
 }
 
@@ -1986,6 +1984,19 @@ static inline int is_arp(struct sk_buff *skb)
 	return skb->protocol == htons(ETH_P_ARP);
 }
 
+static void cxgb3_process_iscsi_prov_pack(struct port_info *pi,
+					struct sk_buff *skb)
+{
+	if (is_arp(skb)) {
+		cxgb3_arp_process(pi, skb);
+		return;
+	}
+
+	if (pi->iscsic.recv)
+		pi->iscsic.recv(pi, skb);
+
+}
+
 /**
  *	rx_eth - process an ingress ethernet packet
  *	@adap: the adapter
@@ -2024,13 +2035,12 @@ static void rx_eth(struct adapter *adap, struct sge_rspq *rq,
 				vlan_gro_receive(&qs->napi, grp,
 						 ntohs(p->vlan), skb);
 			else {
-				if (unlikely(pi->iscsi_ipv4addr &&
-				    is_arp(skb))) {
+				if (unlikely(pi->iscsic.flags)) {
 					unsigned short vtag = ntohs(p->vlan) &
 								VLAN_VID_MASK;
 					skb->dev = vlan_group_get_device(grp,
 									 vtag);
-					cxgb3_arp_process(adap, skb);
+					cxgb3_process_iscsi_prov_pack(pi, skb);
 				}
 				__vlan_hwaccel_rx(skb, grp, ntohs(p->vlan),
 					  	  rq->polling);
@@ -2041,8 +2051,8 @@ static void rx_eth(struct adapter *adap, struct sge_rspq *rq,
 		if (lro)
 			napi_gro_receive(&qs->napi, skb);
 		else {
-			if (unlikely(pi->iscsi_ipv4addr && is_arp(skb)))
-				cxgb3_arp_process(adap, skb);
+			if (unlikely(pi->iscsic.flags))
+				cxgb3_process_iscsi_prov_pack(pi, skb);
 			netif_receive_skb(skb);
 		}
 	} else
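cxgb3_init_iscsi_mac() in the patch derives the iSCSI (SAN) MAC from the port's LAN MAC by copying it and setting bit 7 of byte 3. A userspace sketch of that derivation; the function name and MODEL_ETH_ALEN are hypothetical stand-ins, and the addresses in the usage note are made-up examples.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_ETH_ALEN 6	/* stand-in for the kernel's ETH_ALEN */

/* Userspace version of the derivation in cxgb3_init_iscsi_mac():
 * start from the LAN MAC and set the top bit of byte 3. */
static void derive_iscsi_mac(const unsigned char lan[MODEL_ETH_ALEN],
			     unsigned char iscsi[MODEL_ETH_ALEN])
{
	assert(lan != NULL && iscsi != NULL);
	memcpy(iscsi, lan, MODEL_ETH_ALEN);
	iscsi[3] |= 0x80;
}
```

For example, 00:07:43:12:34:56 becomes 00:07:43:92:34:56. One design consequence worth checking: if the LAN MAC already has bit 7 of byte 3 set, the derived address is identical to the LAN address, and the patch as posted does not appear to guard against that.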


* Re: [PATCH 9/9] Add explicit bound checks in net/socket.c
From: Cyrill Gorcunov @ 2009-09-26 19:01 UTC
  To: Arjan van de Ven; +Cc: linux-kernel, torvalds, mingo, netdev
In-Reply-To: <20090926205432.24aa1023@infradead.org>

[Arjan van de Ven - Sat, Sep 26, 2009 at 08:54:32PM +0200]
| From: Arjan van de Ven <arjan@linux.intel.com>
| Subject: [PATCH 9/9] Add explicit bound checks in net/socket.c
| CC: netdev@vger.kernel.org
| 
| The sys_socketcall() function has a very clever system for the copy
| size of its arguments. Unfortunately, gcc cannot deal with this in
| terms of proving that the copy_from_user() is then always in bounds.
| This is the last (well 9th of this series, but last in the kernel) such
| case around.
| 
| With this patch, we can turn on code to make having the boundary provably
| right for the whole kernel, and detect introduction of new security
| accidents of this type early on.
| 
| Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
| 
| 
| diff --git a/net/socket.c b/net/socket.c
| index 49917a1..13a8d67 100644
| --- a/net/socket.c
| +++ b/net/socket.c
| @@ -2098,12 +2098,17 @@ SYSCALL_DEFINE2(socketcall, int, call, unsigned long __user *, args)
|  	unsigned long a[6];
|  	unsigned long a0, a1;
|  	int err;
| +	unsigned int len;
|  
|  	if (call < 1 || call > SYS_ACCEPT4)
|  		return -EINVAL;
|  
| +	len = nargs[call];
| +	if (len > 6)

Hi Arjan, wouldn't ARRAY_SIZE be better there?
Or am I missing something?

| +		return -EINVAL;
| +
|  	/* copy_from_user should be SMP safe. */
| -	if (copy_from_user(a, args, nargs[call]))
| +	if (copy_from_user(a, args, len))
|  		return -EFAULT;
|  
|  	audit_socketcall(nargs[call] / sizeof(unsigned long), a);
| 
| 
| -- 
| Arjan van de Ven 	Intel Open Source Technology Centre
| For development, discussion and tips for power savings, 
| visit http://www.lesswatts.org
|

	-- Cyrill


* Re: [PATCH 9/9] Add explicit bound checks in net/socket.c
From: Arjan van de Ven @ 2009-09-26 19:05 UTC
  To: Cyrill Gorcunov; +Cc: linux-kernel, torvalds, mingo, netdev
In-Reply-To: <20090926190103.GB4356@lenovo>

On Sat, 26 Sep 2009 23:01:03 +0400
Cyrill Gorcunov <gorcunov@gmail.com> wrote:

> [Arjan van de Ven - Sat, Sep 26, 2009 at 08:54:32PM +0200]
> | From: Arjan van de Ven <arjan@linux.intel.com>
> | Subject: [PATCH 9/9] Add explicit bound checks in net/socket.c
> | CC: netdev@vger.kernel.org
> | 
> | The sys_socketcall() function has a very clever system for the copy
> | size of its arguments. Unfortunately, gcc cannot deal with this in
> | terms of proving that the copy_from_user() is then always in bounds.
> | This is the last (well 9th of this series, but last in the kernel) such
> | case around.
> | 
> | With this patch, we can turn on code to make having the boundary provably
> | right for the whole kernel, and detect introduction of new security
> | accidents of this type early on.
> | 
> | Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> | 
> | 
> | diff --git a/net/socket.c b/net/socket.c
> | index 49917a1..13a8d67 100644
> | --- a/net/socket.c
> | +++ b/net/socket.c
> | @@ -2098,12 +2098,17 @@ SYSCALL_DEFINE2(socketcall, int, call, unsigned long __user *, args)
> |  	unsigned long a[6];
> |  	unsigned long a0, a1;
> |  	int err;
> | +	unsigned int len;
> |  
> |  	if (call < 1 || call > SYS_ACCEPT4)
> |  		return -EINVAL;
> |  
> | +	len = nargs[call];
> | +	if (len > 6)
> 
> Hi Arjan, wouldn't ARRAY_SIZE be better there?
> Or am I missing something?

yeah you missed that I screwed up ;(

From: Arjan van de Ven <arjan@linux.intel.com>
Subject: [PATCH 9/9] Add explicit bound checks in net/socket.c
CC: netdev@vger.kernel.org

The sys_socketcall() function has a very clever system for the copy
size of its arguments. Unfortunately, gcc cannot use it to prove that
the copy_from_user() is then always in bounds. This is the last such
case around (the 9th of this series, but the last in the kernel).

With this patch, we can turn on code that requires the bounds to be
provably correct for the whole kernel, and detect newly introduced
security accidents of this type early on.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>


diff --git a/net/socket.c b/net/socket.c
index 49917a1..13a8d67 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2098,12 +2098,17 @@ SYSCALL_DEFINE2(socketcall, int, call, unsigned long __user *, args)
 	unsigned long a[6];
 	unsigned long a0, a1;
 	int err;
+	unsigned int len;
 
 	if (call < 1 || call > SYS_ACCEPT4)
 		return -EINVAL;
 
+	len = nargs[call];
+	if (len > 6 * sizeof(unsigned long))
+		return -EINVAL;
+
 	/* copy_from_user should be SMP safe. */
-	if (copy_from_user(a, args, nargs[call]))
+	if (copy_from_user(a, args, len))
 		return -EFAULT;
 
 	audit_socketcall(nargs[call] / sizeof(unsigned long), a);



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


* Re: [PATCH v3 net-next-2.6] cxgb3: Added private MAC address and provisioning packet handler for iSCSI
From: Daniel Walker @ 2009-09-26 19:22 UTC (permalink / raw)
  To: kxie
  Cc: davem, swise, divy, rakesh, michaelc, James.Bottomley,
	linux-kernel, netdev
In-Reply-To: <200909261903.n8QJ3D2b000882@localhost.localdomain>

On Sat, 2009-09-26 at 12:03 -0700, kxie@chelsio.com wrote:
>  enum {                 /* rx_offload flags */
>         T3_RX_CSUM      = 1 << 0,
>         T3_LRO          = 1 << 1,
>  };
>  
> +enum {
> +       LAN_MAC_IDX     = 0,
> +       SAN_MAC_IDX,
> +
> +       MAX_MAC_IDX
> +};

Why not name the enum and use it in the function declarations? I see
there are some other unnamed enums in there, so you are already following
the style of the file. However, naming the enum and using it makes the
valid input values explicit instead of just saying "int n", so I think
that's a better method.
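A minimal sketch of the suggestion, assuming a hypothetical type name `enum mac_idx` and a hypothetical helper `mac_idx_name()` (neither appears in the posted cxgb3 patch). C does not range-check enum arguments, so the gain is mainly that the prototype documents which values are expected:

```c
/* Named version of the enum from the patch; "mac_idx" is a hypothetical name. */
enum mac_idx {
	LAN_MAC_IDX = 0,
	SAN_MAC_IDX,

	MAX_MAC_IDX
};

/* Hypothetical helper: the parameter type tells the reader which values
 * are meaningful, where a bare "int n" would not. */
static const char *mac_idx_name(enum mac_idx idx)
{
	switch (idx) {
	case LAN_MAC_IDX:
		return "LAN";
	case SAN_MAC_IDX:
		return "SAN";
	default:
		return "invalid";	/* MAX_MAC_IDX or an out-of-range value */
	}
}
```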

Daniel


* Re: [PATCH 9/9] Add explicit bound checks in net/socket.c
From: Arjan van de Ven @ 2009-09-26 19:23 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: linux-kernel, torvalds, mingo, netdev
In-Reply-To: <20090926190103.GB4356@lenovo>

On Sat, 26 Sep 2009 23:01:03 +0400
Cyrill Gorcunov <gorcunov@gmail.com> wrote:

> [Arjan van de Ven - Sat, Sep 26, 2009 at 08:54:32PM +0200]
> | From: Arjan van de Ven <arjan@linux.intel.com>
> | Subject: [PATCH 9/9] Add explicit bound checks in net/socket.c
> | CC: netdev@vger.kernel.org
> | 
> | The sys_socketcall() function has a very clever system for the copy
> | size of its arguments. Unfortunately, gcc cannot deal with this in
> | terms of proving that the copy_from_user() is then always in bounds.
> | This is the last (well 9th of this series, but last in the kernel) such
> | case around.
> | 
> | With this patch, we can turn on code to make having the boundary provably
> | right for the whole kernel, and detect introduction of new security
> | accidents of this type early on.
> | 
> | Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> | 
> | 
> | diff --git a/net/socket.c b/net/socket.c
> | index 49917a1..13a8d67 100644
> | --- a/net/socket.c
> | +++ b/net/socket.c
> | @@ -2098,12 +2098,17 @@ SYSCALL_DEFINE2(socketcall, int, call, unsigned long __user *, args)
> |  	unsigned long a[6];
> |  	unsigned long a0, a1;
> |  	int err;
> | +	unsigned int len;
> |  
> |  	if (call < 1 || call > SYS_ACCEPT4)
> |  		return -EINVAL;
> |  
> | +	len = nargs[call];
> | +	if (len > 6)
> 
> Hi Arjan, wouldn't ARRAY_SIZE suffice better there?
> Or am I missing something?
> 

goof once goof twice, make it sizeof.. that's nicer.

From: Arjan van de Ven <arjan@linux.intel.com>
Subject: [PATCH 9/9] Add explicit bound checks in net/socket.c
CC: netdev@vger.kernel.org

The sys_socketcall() function has a very clever system for the copy
size of its arguments. Unfortunately, gcc cannot deal with this in
terms of proving that the copy_from_user() is then always in bounds.
This is the last (well 9th of this series, but last in the kernel) such
case around.

With this patch, we can turn on code to make having the boundary provably
right for the whole kernel, and detect introduction of new security
accidents of this type early on.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>


diff --git a/net/socket.c b/net/socket.c
index 49917a1..13a8d67 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2098,12 +2098,17 @@ SYSCALL_DEFINE2(socketcall, int, call, unsigned long __user *, args)
 	unsigned long a[6];
 	unsigned long a0, a1;
 	int err;
+	unsigned int len;
 
 	if (call < 1 || call > SYS_ACCEPT4)
 		return -EINVAL;
 
+	len = nargs[call];
+	if (len > sizeof(a))
+		return -EINVAL;
+
 	/* copy_from_user should be SMP safe. */
-	if (copy_from_user(a, args, nargs[call]))
+	if (copy_from_user(a, args, len))
 		return -EFAULT;
 
 	audit_socketcall(nargs[call] / sizeof(unsigned long), a);
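A userspace sketch of the bounded copy in this final version, assuming a hypothetical `nargs_demo[]` table (the real `nargs[]` in net/socket.c has one entry per socketcall number) and using `memcpy()` in place of `copy_from_user()`. Because the table stores byte counts, the limit must be in bytes too: `sizeof(a)` works because `a` is a real array in that scope, whereas `ARRAY_SIZE(a)` would give the element count (6).

```c
#include <errno.h>
#include <string.h>

/* Byte count for x unsigned-long arguments, in the style of socket.c's AL(). */
#define AL(x) ((x) * sizeof(unsigned long))

/* Hypothetical table: entry 3 deliberately asks for 7 arguments so the
 * bound check has something to reject. */
static const unsigned char nargs_demo[] = { AL(0), AL(3), AL(6), AL(7) };
#define NCALLS_DEMO (sizeof(nargs_demo) / sizeof(nargs_demo[0]))

/* Copies the arguments for `call` into out[]; returns the argument
 * count, or -EINVAL for a bad call number or an oversized length. */
static int socketcall_demo(int call, const unsigned long *args,
			   unsigned long out[6])
{
	unsigned long a[6];
	unsigned int len;

	if (call < 1 || call >= (int)NCALLS_DEMO)
		return -EINVAL;

	len = nargs_demo[call];
	if (len > sizeof(a))		/* the explicit bound the patch adds */
		return -EINVAL;

	memcpy(a, args, len);		/* stands in for copy_from_user() */
	memcpy(out, a, len);
	return (int)(len / sizeof(unsigned long));
}
```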




* Re: [PATCH 9/9] Add explicit bound checks in net/socket.c
From: Cyrill Gorcunov @ 2009-09-26 19:35 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, torvalds, mingo, netdev
In-Reply-To: <20090926212302.0ce64a5c@infradead.org>

[Arjan van de Ven - Sat, Sep 26, 2009 at 09:23:02PM +0200]
...
| 
| goof once goof twice, make it sizeof.. that's nicer.
| 

yeah, I was about to propose the same :)

...
	- Cyrill


* Re: TCP stack bug related to F-RTO?
From: Joe Cao @ 2009-09-26 20:48 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Ray Lee, Netdev, LKML
In-Reply-To: <Pine.LNX.4.64.0909262034130.12882@melkinkari.cs.Helsinki.FI>

Hi Ilpo,

Thanks for the reply. We noticed the problem while we were debugging a connection failure reported by one of our customers (we are a network device vendor). We have already suggested that our customer upgrade their server software to fix the problem, and we are still waiting for their feedback. Meanwhile, I asked all those questions because I want to understand the issue and the fixes. We also have to convince the customer to move to the right kernel, and we don't want them to come back with the same problem again.

Again, thanks for the help!

Joe

--- On Sat, 9/26/09, Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:

> From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> Subject: Re: TCP stack bug related to F-RTO?
> To: "Joe Cao" <caoco2002@yahoo.com>
> Cc: "Ray Lee" <ray-lk@madrabbit.org>, "Netdev" <netdev@vger.kernel.org>, "LKML" <linux-kernel@vger.kernel.org>
> Date: Saturday, September 26, 2009, 10:51 AM
> On Sat, 26 Sep 2009, Joe Cao wrote:
> 
> > Can you elaborate on "Some retransmission would happen here as step 3"?
> > When the second timeout happens, it will again go into FRTO and then
> > retransmit the write queue head.
> 
> Why do you think that the second RTO will happen with anything other than
> 2.6.24? And it's perfectly OK to go into FRTO for the second time.
> 
> > I looked at the patch (Debian Bug#478062) that's probably what you
> > mentioned as the fix. All it does is exclude the SACK case when
> > considering FRTO.  But in my case, SACK was enabled, as seen in the
> > trace.
> 
> You should be looking where I said rather than picking up your own
> sources and assuming that they'll tell you the whole story :-). In fact,
> there are two fixes that were made in a row and one workaround in the
> same timeframe. ...And you managed to pick the wrong one of the fixes, so
> I kind of understand why you got confused :-).
> 
> > In other words, do we still have a problem with FRTO when SACK is
> > enabled in the latest kernel?
> 
> For sure we might have all kinds of problems no one has yet
> noticed/reported :-). ....However, it seems that the particular problem
> your trace is showing is solved. Can you please test with a fixed kernel
> before coming back here with these claims.
> 
> 
> -- 
>  i.
> 
> --- On Fri, 9/25/09, Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
> 
> > From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> > Subject: Re: TCP stack bug related to F-RTO?
> > To: "Joe Cao" <caoco2002@yahoo.com>
> > Cc: "Ray Lee" <ray-lk@madrabbit.org>, "Netdev" <netdev@vger.kernel.org>, "LKML" <linux-kernel@vger.kernel.org>
> > Date: Friday, September 25, 2009, 11:03 AM
> > On Fri, 25 Sep 2009, Joe Cao wrote:
> > 
> > > Thanks for the reply!  Do you happen to know which patch fixed the
> > > problem?
> > 
> > You can find those patches in the stable queue git tree. I gave you a
> > hint in the last mail about which release to look in. However, as 2.6.24
> > is obsolete anyway, my recommendation is that you should probably
> > consider upgrading to fix all the other bugs that have been found since
> > 2.6.24 was obsoleted.
> > 
> > > Is there a bug tracking system for the Linux kernel?
> > 
> > Nothing that knows everything about everything.
> > 
> > > I studied the FRTO code in the latest kernel, 2.6.31. It seems the
> > > problem is still there:
> > >
> > > 1. Every time an RTO fires, because tcp_is_sackfrto(tp) returns 1,
> > > tcp_use_frto() returns true, and the server TCP enters FRTO.
> > > 2. After the head of the write queue is retransmitted, two new data
> > > packets are transmitted, and the server receives two dup-ACKs. That
> > > makes TCP enter tcp_enter_frto_loss(); however, that only resets
> > > ssthresh and some other fields.
> > 
> > Perhaps those other fields are far more important than you think... :-)
> > ...Some retransmission would happen here as step 3.
> > 
> > > 3. After another, longer RTO fires, because tcp_is_sackfrto(tp)
> > > returns 1, tcp_use_frto() again returns true. The stack enters FRTO
> > > again.
> > > 4. The above repeats, and the stack can't retransmit the lost packets
> > > any faster.
> > > 
> > > Is my understanding above correct?
> > 
> > ...No. All the magic that happens in tcp_enter_frto_loss should be
> > enough to really do more than a single retransmission (that is, in any
> > kernel other than the 2.6.24 series). There was an unfortunate bug in
> > this area in 2.6.24 which basically undid the effect of the correct
> > actions tcp_enter_frto_loss did, which effectively prevented
> > tcp_xmit_retransmit_queue from doing its part.
> > 
> > -- 
> >  i.
> > 
> > --- On Fri, 9/25/09, Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
> > 
> > > From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> > > Subject: Re: TCP stack bug related to F-RTO?
> > > To: "Ray Lee" <ray-lk@madrabbit.org>
> > > Cc: "Joe Cao" <caoco2002@yahoo.com>, "Netdev" <netdev@vger.kernel.org>, "LKML" <linux-kernel@vger.kernel.org>, jcaoco2002@yahoo.com
> > > Date: Friday, September 25, 2009, 6:09 AM
> > > On Thu, 24 Sep 2009, Ray Lee wrote:
> > > 
> > > > [adding netdev cc:]
> > > > 
> > > > On Thu, Sep 24, 2009 at 10:43 AM, Joe Cao <caoco2002@yahoo.com> wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > I have found the following behavior with different versions of the
> > > > > Linux kernel. The attached pcap trace was collected with the server
> > > > > (192.168.0.13) running 2.6.24 and shows the problem. Basically the
> > > > > behavior is like this:
> > > > >
> > > > > 1. The client opens up a big window.
> > > > > 2. The server sends 19 packets in a row (pkt #14-#32 in the trace),
> > > > > but all of them are dropped due to some congestion.
> > > > > 3. The server hits RTO and retransmits pkt #14 in #33.
> > > > > 4. The client immediately acks #33 (=#14), and the server (seemingly
> > > > > entering F-RTO) expands the window and sends *NEW* pkt #35 & #36.
> > > > > The timeout is doubled to 2*RTO; the client immediately sends two
> > > > > dup-ACKs for #35 and #36.
> > > > > 5. After 2*RTO, pkt #15 is retransmitted in #39.
> > > > > 6. The client immediately acks #39 (=#15) in #40, and the server
> > > > > continues to expand the window and sends two *NEW* pkt #41 & #42.
> > > > > Now the timeout is doubled to 4*RTO.
> > > > > 7. After the 4*RTO timeout, #16 is retransmitted.
> > > > > 8. ...
> > > > > 9. The above steps repeat for retransmitting pkt #16-#32, and each
> > > > > time the timeout is doubled.
> > > > > 10. It takes a very long time to retransmit all the lost packets,
> > > > > and before that is done, the client sends a RST because of a timeout.
> > > > >
> > > > > The above behavior looks like F-RTO is in effect, and there seems to
> > > > > be a bug in TCP's congestion control and retransmission algorithm.
> > > > > Why doesn't TCP on the server (running 2.6.24) enter slow start?
> > > > > Why should the server take that long to recover from a short period
> > > > > of packet loss?
> > > > >
> > > > > Has anyone else noticed a similar problem before? If my analysis is
> > > > > wrong, can anyone give me some pointers to what's really wrong and
> > > > > how to fix it?
> > > 
> > > Yes, 2.6.24 is an obsoleted version with known wrongs in the FRTO
> > > implementation. Fixes never went to the 2.6.24 stable series, as it was
> > > _already_ obsoleted when the problems were reported and found. The
> > > correct fixes may be found in 2.6.25.7 (.7 iirc) and are included from
> > > 2.6.26 onward too.
> > > 
> > > Just in case you happen to run an Ubuntu-based kernel from that era (of
> > > course you should be reporting the bug here then...), a word of
> > > warning: it seemed nearly impossible for them to get a simple thing
> > > like that fixed. I haven't been looking at whether they eventually
> > > came to some sensible conclusion in that matter or whether it is still
> > > unresolved (or e.g., closed without real resolution).
> 


* [PATCH] /proc/net/tcp, overhead removed
From: Yakov Lerner @ 2009-09-26 21:31 UTC (permalink / raw)
  To: linux-kernel, netdev, davem, kuznet, pekkas, jmorris, yoshfuji,
	kaber, torval
  Cc: Yakov Lerner

With this patch, /proc/net/tcp lists 20,000 sockets in 60-80 milliseconds.

The overhead was in tcp_seq_start(). See analysis (3) below.
The patch is against Linus git tree (1). The patch is small.

------------  -----------   ------------------------------------
Before patch  After patch   20,000 sockets (10,000 tw + 10,000 estab)(2)
------------  -----------   ------------------------------------
6 sec          0.06 sec     dd bs=1k if=/proc/net/tcp >/dev/null 
1.5 sec        0.06 sec     dd bs=4k if=/proc/net/tcp >/dev/null

1.9 sec        0.16 sec     netstat -4ant >/dev/null
------------  -----------   ------------------------------------

This is a ~25x improvement for 4k reads (~100x for 1k reads, ~12x for netstat).
The new time does not depend on the read block size.
netstat naturally speeds up too, with both -4 and -6.
/proc/net/tcp6 lists 20,000 sockets in 100 milliseconds.

(1) against git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

(2) Used the 'manysock' utility to stress the system with a large number of sockets:
  "manysock 10000 10000"    - 10,000 tw + 10,000 estab ip4 sockets.
  "manysock -6 10000 10000" - 10,000 tw + 10,000 estab ip6 sockets.
Found at http://ilerner.3b1.org/manysock/manysock.c

(3) Algorithmic analysis. 
    Old algorithm.

During 'cat </proc/net/tcp', tcp_seq_start() is called O(numsockets) times (4).
On average, every call to tcp_seq_start() scans half the whole hashtable. Ouch.
This is O(numsockets * hashsize). 95-99% of the time of 'cat </proc/net/tcp'
is spent in tcp_seq_start()->tcp_get_idx(). This overhead is eliminated by the
new algorithm, which is O(numsockets + hashsize).

    New algorithm.

The new algorithm is O(numsockets + hashsize). We jump to the right
hash bucket in tcp_seq_start(), without scanning half the hash.
To jump directly to the hash bucket corresponding to *pos in tcp_seq_start(),
we reuse three pieces of state (st->num, st->bucket, st->sbucket)
as follows:
 - we check that the requested pos >= the last seen pos (st->num), the typical case.
 - if so, we jump to bucket st->bucket.
 - to arrive at the right item within st->bucket, we keep in st->sbucket
   the position corresponding to the beginning of the bucket.
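The resume-from-bucket idea above can be sketched as a toy model in userspace. The bucket sizes, the `struct iter_state` layout and both seek functions are hypothetical (this is not the patch's tcp_iter_state); `steps` counts buckets visited plus items skipped, so the two strategies can be compared on the same sequence of increasing positions:

```c
/* Toy stand-in for the iterator state (hypothetical, not the kernel's
 * struct tcp_iter_state). */
struct iter_state {
	int  bucket;	/* bucket that held the last item found */
	long sbucket;	/* stream position of that bucket's first item */
	long steps;	/* work done: buckets visited plus items skipped */
};

/* Walk buckets from b (whose first item is at position base) until the
 * item at position pos is found; returns its bucket, or -1 past the end. */
static int walk(const int *sizes, int nbuckets, int b, long base, long pos,
		struct iter_state *st)
{
	for (; b < nbuckets; b++) {
		st->steps++;			/* visit bucket b */
		if (pos < base + sizes[b]) {
			st->steps += pos - base;	/* skip items inside b */
			st->bucket = b;
			st->sbucket = base;
			return b;
		}
		base += sizes[b];
	}
	return -1;
}

/* Old scheme: every seek starts again from bucket 0. */
static int seek_from_start(const int *sizes, int nbuckets, long pos,
			   struct iter_state *st)
{
	return walk(sizes, nbuckets, 0, 0, pos, st);
}

/* Resume scheme: if pos has not moved back before the remembered bucket
 * start, continue from the remembered bucket. */
static int seek_resume(const int *sizes, int nbuckets, long pos,
		       struct iter_state *st)
{
	if (pos >= st->sbucket)
		return walk(sizes, nbuckets, st->bucket, st->sbucket, pos, st);
	return walk(sizes, nbuckets, 0, 0, pos, st);
}
```

Both functions land on the same bucket for every position; only the amount of rescanning differs, which is the cost the patch removes.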

(4) Explanation of the O(numsockets * hashsize) cost of the old algorithm.

tcp_seq_start() is called once for every ~7 lines of netstat output
if the read size is 1kb, or once for every ~28 lines if the read size is >= 4kb.
Since each /proc/net/tcp record is 150 bytes long, the number of calls
to tcp_seq_start() is
            (numsockets * 150 / min(4096, readsize)).
Netstat uses a 4kb read size (newer versions) or 1kb (older versions).
Note that the speed of the old algorithm does not improve above a 4kb block size.
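As a worked instance of the formula above for the 20,000-socket test (integer division, 150-byte records; the helper name is hypothetical):

```c
/* Approximate number of tcp_seq_start() calls for one full read of
 * /proc/net/tcp: numsockets * 150 / min(4096, readsize). */
static long seq_start_calls(long numsockets, long readsize)
{
	long chunk = readsize < 4096 ? readsize : 4096;

	return numsockets * 150 / chunk;
}
```

For 20,000 sockets this gives about 2,929 calls with 1k reads and about 732 with 4k or larger reads; with the old algorithm each of those calls rescanned the hash from the beginning.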

The speed of the new algorithm does not depend on the block size.

The speed of the new algorithm does not perceptibly depend on hashsize (which
depends on RAM size). The speed of the old algorithm drops with a bigger hashsize.

(5) Reporting order.

The reporting order is exactly the same as before if the hash does not change
underfoot. When hash elements come and go during the report, the reporting
order will be the same as that of tcpdiag.

Signed-off-by: Yakov Lerner <iler.ml@gmail.com>
---
 net/ipv4/tcp_ipv4.c |   26 ++++++++++++++++++++++++--
 1 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 7cda24b..7d9421a 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1994,13 +1994,14 @@ static inline int empty_bucket(struct tcp_iter_state *st)
 		hlist_nulls_empty(&tcp_hashinfo.ehash[st->bucket].twchain);
 }
 
-static void *established_get_first(struct seq_file *seq)
+static void *established_get_first_after(struct seq_file *seq, int bucket)
 {
 	struct tcp_iter_state *st = seq->private;
 	struct net *net = seq_file_net(seq);
 	void *rc = NULL;
 
-	for (st->bucket = 0; st->bucket < tcp_hashinfo.ehash_size; ++st->bucket) {
+	for (st->bucket = bucket; st->bucket < tcp_hashinfo.ehash_size;
+	     ++st->bucket) {
 		struct sock *sk;
 		struct hlist_nulls_node *node;
 		struct inet_timewait_sock *tw;
@@ -2036,6 +2037,11 @@ out:
 	return rc;
 }
 
+static void *established_get_first(struct seq_file *seq)
+{
+	return established_get_first_after(seq, 0);
+}
+
 static void *established_get_next(struct seq_file *seq, void *cur)
 {
 	struct sock *sk = cur;
@@ -2045,6 +2051,7 @@ static void *established_get_next(struct seq_file *seq, void *cur)
 	struct net *net = seq_file_net(seq);
 
 	++st->num;
+	st->sbucket = st->num;
 
 	if (st->state == TCP_SEQ_STATE_TIME_WAIT) {
 		tw = cur;
@@ -2116,6 +2123,21 @@ static void *tcp_get_idx(struct seq_file *seq, loff_t pos)
 static void *tcp_seq_start(struct seq_file *seq, loff_t *pos)
 {
 	struct tcp_iter_state *st = seq->private;
+
+	if (*pos && *pos >= st->sbucket &&
+	    (st->state == TCP_SEQ_STATE_ESTABLISHED ||
+	     st->state == TCP_SEQ_STATE_TIME_WAIT)) {
+		int nskip;
+		void *cur;
+
+		st->num = st->sbucket;
+		st->state = TCP_SEQ_STATE_ESTABLISHED;
+		cur = established_get_first_after(seq, st->bucket);
+		for (nskip = *pos - st->sbucket; nskip > 0 && cur; --nskip)
+			cur = established_get_next(seq, cur);
+		return cur;
+	}
+
 	st->state = TCP_SEQ_STATE_LISTENING;
 	st->num = 0;
 	return *pos ? tcp_get_idx(seq, *pos - 1) : SEQ_START_TOKEN;
-- 
1.6.5.rc2



* tg3 and Broadcom PHY driver
From: Felix Radensky @ 2009-09-26 21:32 UTC (permalink / raw)
  To: netdev

Hi,

I've noticed that in linux-2.6.31 I have to build the tg3 driver as a module,
due to its dependency on the Broadcom PHY driver. If both tg3 and the PHY driver
are compiled into the kernel, tg3 fails to detect a PHY, apparently because the
PHY driver is loaded later. I'm using a BCM57760 on an embedded powerpc platform
(MPC8536).

How can I make tg3 work when it's compiled into the kernel?

Thanks.

Felix.


* tg3: Badness at kernel/mutex.c:207
From: Felix Radensky @ 2009-09-26 21:20 UTC (permalink / raw)
  To: netdev

Hi,

I'm running linux-2.6.31 on a custom MPC8536-based board with a BCM57760 chip.
Both the tg3 driver and the Broadcom PHY driver are modules.

Each time I run ifconfig eth2 up, I get the following error message:

Badness at kernel/mutex.c:207
NIP: c025132c LR: c0251314 CTR: c0251334
REGS: efbedbd0 TRAP: 0700   Not tainted  (2.6.31)
MSR: 00029000 <EE,ME,CE>  CR: 24020422  XER: 00000000
TASK = efacce10[1080] 'ifconfig' THREAD: efbec000
GPR00: 00000000 efbedc80 efacce10 00000001 00007020 00000002 00000000 00000200
GPR08: 00029000 c0350000 c0330000 00000001 24020424 10057d94 000002a0 1000d82c
GPR16: 1000d81c 1000d814 10010000 10050000 ef897a0c efbede18 ffff8914 ef897a00
GPR24: 00008000 c034b480 efbec000 efb0122c c0350000 efacce10 ef82d2c0 efb01228
NIP [c025132c] __mutex_lock_slowpath+0x1f0/0x1f8
LR [c0251314] __mutex_lock_slowpath+0x1d8/0x1f8
Call Trace:
[efbedcd0] [c025134c] mutex_lock+0x18/0x34
[efbedcf0] [f534a228] tg3_chip_reset+0x7cc/0x9f8 [tg3]
[efbedd20] [f534a8f0] tg3_reset_hw+0x58/0x2360 [tg3]
[efbedd70] [f5351dd4] tg3_open+0x610/0x910 [tg3]
[efbeddb0] [c01e1c6c] dev_open+0x100/0x138
[efbeddd0] [c01dff20] dev_change_flags+0x80/0x1ac
[efbeddf0] [c02232cc] devinet_ioctl+0x648/0x824
[efbede60] [c0223de4] inet_ioctl+0xcc/0xf8
[efbede70] [c01cdf44] sock_ioctl+0x60/0x300
[efbede90] [c008a35c] vfs_ioctl+0x34/0x8c
[efbedea0] [c008a580] do_vfs_ioctl+0x88/0x724
[efbedf10] [c008ac5c] sys_ioctl+0x40/0x74
[efbedf40] [c000f814] ret_from_syscall+0x0/0x3c
Instruction dump:
0fe00000 4bfffe80 801a000c 5409016f 4182fe60 4bf0f6d9 2f830000 41befe54
3d20c035 8009c2c0 2f800000 40befe44 <0fe00000> 4bfffe3c 9421ffe0 7c0802a6

Does it indicate a real problem, or something that can be ignored?

Additional information from kernel log:

tg3.c:v3.99 (April 20, 2009)
tg3 0002:05:00.0: enabling bus mastering
tg3 0002:05:00.0: PME# disabled
tg3 mdio bus: probed
eth2: Tigon3 [partno(BCM57760) rev 57780001] (PCI Express) MAC address 00:10:18:00:00:00
eth2: attached PHY driver [Broadcom BCM57780] (mii_bus:phy_addr=500:01)
eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
eth2: dma_rwctrl[76180000] dma_mask[64-bit]
tg3 0002:05:00.0: PME# disabled

Thanks.

Felix.


* Re: 2.6.31 regression: e1000e jumbo frames no longer work: 'Unsupported MTU setting'
From: Alexander Duyck @ 2009-09-27  2:14 UTC (permalink / raw)
  To: Nix; +Cc: e1000-devel, netdev, bruce.w.allan, linux-kernel
In-Reply-To: <871vluc5wi.fsf@spindle.srvr.nix>

On Sat, Sep 26, 2009 at 4:16 AM, Nix <nix@esperi.org.uk> wrote:
> [Bruce, you have changes in net-next in this area, so you might have a clue
>  what's going on here.]
>
> In 2.6.30.x, I was happily bringing up the 82574L cards in one server like
> this:
>
> ip link set fastnet up mtu 7200
>
> As of 2.6.31.x, what I see is this:
>
> spindle:/root# ip link set mtu 7200 dev fastnet
> RTNETLINK answers: Invalid argument
> [ 3380.261796] 0000:02:00.0: fastnet: Unsupported MTU setting
>
> As far as I can tell, all MTUs above 1500 now fail.
>
> 'Unsupported' or not, this used to work, and I'd certainly expect jumbo
> frames to be supported on a gigabit card!
>
> I can't see any terribly relevant changes to e1000e between 2.6.30 and
> 2.6.31, so I'm Cc:ing netdev on the offchance that this is something
> more generic (unlikely, as 7200-byte MTUs still work fine in 2.6.31 with
> the r8169 I'm typing this on, but that doesn't help if half the subnet
> is forced to use MTUs of 1500).

It looks like the problem is that the 82574 and 82583 have their
max_hw_frame_size values swapped.  You might try applying the patch
below.  I am not sure it will apply, since I hand-generated it from
the git patch that seems to have introduced the problem, and I am
sending it through an untested account that may mangle it.  I will
see about submitting an official patch for this sometime in the next
few days.
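A userspace sketch of why every MTU above 1500 is rejected when max_hw_frame_size is ETH_FRAME_LEN + ETH_FCS_LEN. It assumes the driver compares new_mtu + ETH_HLEN + ETH_FCS_LEN against max_hw_frame_size; the helper name is hypothetical, and DEFAULT_JUMBO = 9234 is an assumed value for e1000e of that era.

```c
/* Ethernet constants as defined in <linux/if_ether.h>. */
#define ETH_HLEN      14
#define ETH_FRAME_LEN 1514
#define ETH_FCS_LEN   4
/* Assumed value of e1000e's DEFAULT_JUMBO. */
#define DEFAULT_JUMBO 9234

/* Hypothetical check: does the frame implied by new_mtu fit the
 * hardware's maximum frame size? */
static int mtu_fits(int new_mtu, int max_hw_frame_size)
{
	int max_frame = new_mtu + ETH_HLEN + ETH_FCS_LEN;

	return max_frame <= max_hw_frame_size;
}
```

With the swapped value the 82574 is capped at 1518-byte frames, i.e. an MTU of 1500, which matches the "all MTUs above 1500 now fail" report; with DEFAULT_JUMBO an MTU of 7200 fits.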

Thanks,

Alex

diff --git a/drivers/net/e1000e/82571.c b/drivers/net/e1000e/82571.c
--- a/drivers/net/e1000e/82571.c
+++ b/drivers/net/e1000e/82571.c
@@ -1803,7 +1803,7 @@ struct e1000_info e1000_82574_info = {
 				  | FLAG_HAS_AMT
 				  | FLAG_HAS_CTRLEXT_ON_LOAD,
 	.pba			= 20,
-	.max_hw_frame_size	= ETH_FRAME_LEN + ETH_FCS_LEN,
+	.max_hw_frame_size	= DEFAULT_JUMBO,
 	.get_variants		= e1000_get_variants_82571,
 	.mac_ops		= &e82571_mac_ops,
 	.phy_ops		= &e82_phy_ops_bm,
@@ -1820,7 +1820,7 @@ struct e1000_info e1000_82583_info = {
 				  | FLAG_HAS_AMT
 				  | FLAG_HAS_CTRLEXT_ON_LOAD,
 	.pba			= 20,
-	.max_hw_frame_size	= DEFAULT_JUMBO,
+	.max_hw_frame_size	= ETH_FRAME_LEN + ETH_FCS_LEN,
 	.get_variants		= e1000_get_variants_82571,
 	.mac_ops		= &e82571_mac_ops,
 	.phy_ops		= &e82_phy_ops_bm,



* Re: [net-2.6 PATCH 01/13] e1000: drop dead pcie code from e1000
From: David Miller @ 2009-09-27  3:18 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, jesse.brandeburg, donald.c.skidmore
In-Reply-To: <20090925221613.26715.66796.stgit@localhost.localdomain>


All 13 patches applied, thanks.


* Re: [net-2.6 PATCH 1/3] net: fix vlan_get_size to include vlan_flags size
From: David Miller @ 2009-09-27  3:18 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, john.r.fastabend
In-Reply-To: <20090925231124.23450.94680.stgit@localhost.localdomain>


All 3 patches applied, thanks!


* Re: [PATCH 0/4] ISDN patches for 2.6.32 (v2)
From: David Miller @ 2009-09-27  3:24 UTC (permalink / raw)
  To: tilman; +Cc: isdn, keil, i4ldeveloper, netdev, linux-kernel
In-Reply-To: <4ABDFE9F.2030404@imap.cc>

From: Tilman Schmidt <tilman@imap.cc>
Date: Sat, 26 Sep 2009 13:44:31 +0200

> Hello Karsten,
> 
> is there any chance of getting these and the Gigaset patches forwarded
> for inclusion in 2.6.32 before the merge window closes?
> If not all of them, perhaps at least those which you had already acked
> before David Miller asked that they should formally go through you
> (#2-4 of the ISDN series), and those which are just fixes to the
> existing i4l version of the driver (#1-10 of the Gigaset series)?
> I would really appreciate not having to maintain all of them out of
> tree for another release cycle.

I am extremely disappointed in Karsten's lack of responsiveness
to ISDN patches.  He wants to handle them, but he has effectively
disappeared during the most critical time for patch integration, which
is the merge window.

If he's busy in life or whatever, that's fine, but in such a case he
should appoint someone to handle ISDN patches until he does have time.

And if no ISDN expert is available, Tilman's suggestion of letting
me integrate the patches should be taken.


* Re: [PATCH] atm: dereference of he_dev->rbps_virt in he_init_group()
From: David Miller @ 2009-09-27  3:26 UTC (permalink / raw)
  To: roel.kluin; +Cc: joe, chas, linux-atm-general, netdev, akpm
In-Reply-To: <4ABE152F.20507@gmail.com>

From: Roel Kluin <roel.kluin@gmail.com>
Date: Sat, 26 Sep 2009 15:20:47 +0200

> he_dev->rbps_virt or he_dev->rbpl_virt allocation may fail, so
> check them. Make sure that he_init_group() cleans up after
> errors.
> 
> Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
> Signed-off-by: "Juha Leppanen" <juha_motorsportcom@luukku.com>
> ---
> David, I swapped rbps and rbpl arguments in my last patch, and
> there were some other problems. This was pointed out by Juha
> Leppanen. Can you please replace the former patch by this one? 
> 
> This version was build-, sparse- and checkpatch-tested.
> 
> Sorry for the mess.

I can't just "replace" it, especially since your change is even
already in Linus's tree.

Please send me a relative fixup rather than a new patch.


* Re: [PATCH] Revert "sit: stateless autoconf for isatap"
From: David Miller @ 2009-09-27  3:28 UTC (permalink / raw)
  To: contact; +Cc: netdev, fred.l.templin
In-Reply-To: <1253977393-7757-1-git-send-email-contact@saschahlusiak.de>

From: Sascha Hlusiak <contact@saschahlusiak.de>
Date: Sat, 26 Sep 2009 17:03:13 +0200

> This reverts commit 645069299a1c7358cf7330afe293f07552f11a5d.
> 
> While the code does not actually break anything, it does not completely follow
> RFC5214 yet. After talking with Fred L. Templin, I agree that completing the
> ISATAP-specific RS/RA code would pollute the kernel with a lot of code that is
> better implemented in userspace.
> 
> The kernel should not send RS packets for ISATAP at all.
> 
> Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
> Acked-by: Fred L. Templin <Fred.L.Templin@boeing.com>

Applied, thanks.


* [PATCH 3/3] drivers/staging/hv/: use %pU to print UUID/GUIDs
From: Joe Perches @ 2009-09-27  5:57 UTC (permalink / raw)
  To: linux-kernel, netdev, Greg KH
In-Reply-To: <cover.1254030722.git.joe@perches.com>

Converted individual GUID/UUID printing functions
to use the new %pU[Xr] in lib/vsprintf.c

Signed-off-by: Joe Perches <joe@perches.com>
---
 drivers/staging/hv/ChannelMgmt.c |   22 +-------
 drivers/staging/hv/vmbus_drv.c   |  116 ++++----------------------------------
 2 files changed, 14 insertions(+), 124 deletions(-)
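For reference, a userspace sketch of the byte order the removed open-coded format strings used, which %pUr is meant to reproduce: the first three GUID fields are stored little-endian and printed reversed, the last eight bytes as stored. The helper and table names are hypothetical. The sketch prints the standard 8-4-4-4-12 grouping; the removed strings printed the last eight bytes as a single 16-digit group, so the exact dash placement of %pUr output may be worth checking against lib/vsprintf.c.

```c
#include <stdio.h>

/* Byte order of the removed format strings: fields 1-3 reversed,
 * the final 8 bytes in stored order. */
static const int guid_le_order[16] = {
	3, 2, 1, 0, 5, 4, 7, 6, 8, 9, 10, 11, 12, 13, 14, 15
};

/* Hypothetical userspace helper (not lib/vsprintf.c): writes the GUID in
 * 8-4-4-4-12 form into out, which must hold at least 37 bytes. */
static void format_guid_le(const unsigned char data[16], char out[37])
{
	char *p = out;

	for (int i = 0; i < 16; i++) {
		if (i == 4 || i == 6 || i == 8 || i == 10)
			*p++ = '-';
		p += sprintf(p, "%02x", data[guid_le_order[i]]);
	}
	*p = '\0';
}
```

For bytes 0x00..0x0f this yields 03020100-0504-0706-0809-0a0b0c0d0e0f.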

diff --git a/drivers/staging/hv/ChannelMgmt.c b/drivers/staging/hv/ChannelMgmt.c
index 3db62ca..8b0fb81 100644
--- a/drivers/staging/hv/ChannelMgmt.c
+++ b/drivers/staging/hv/ChannelMgmt.c
@@ -263,28 +263,10 @@ static void VmbusChannelOnOffer(struct vmbus_channel_message_header *hdr)
 
 	DPRINT_INFO(VMBUS, "Channel offer notification - "
 		    "child relid %d monitor id %d allocated %d, "
-		    "type {%02x%02x%02x%02x-%02x%02x-%02x%02x-"
-		    "%02x%02x%02x%02x%02x%02x%02x%02x} "
-		    "instance {%02x%02x%02x%02x-%02x%02x-%02x%02x-"
-		    "%02x%02x%02x%02x%02x%02x%02x%02x}",
+		    "type {%pUr} instance {%pUr}",
 		    offer->ChildRelId, offer->MonitorId,
 		    offer->MonitorAllocated,
-		    guidType->data[3], guidType->data[2],
-		    guidType->data[1], guidType->data[0],
-		    guidType->data[5], guidType->data[4],
-		    guidType->data[7], guidType->data[6],
-		    guidType->data[8], guidType->data[9],
-		    guidType->data[10], guidType->data[11],
-		    guidType->data[12], guidType->data[13],
-		    guidType->data[14], guidType->data[15],
-		    guidInstance->data[3], guidInstance->data[2],
-		    guidInstance->data[1], guidInstance->data[0],
-		    guidInstance->data[5], guidInstance->data[4],
-		    guidInstance->data[7], guidInstance->data[6],
-		    guidInstance->data[8], guidInstance->data[9],
-		    guidInstance->data[10], guidInstance->data[11],
-		    guidInstance->data[12], guidInstance->data[13],
-		    guidInstance->data[14], guidInstance->data[15]);
+		    guidType->data, guidInstance->data);
 
 	/* Allocate the channel object and save this offer. */
 	newChannel = AllocVmbusChannel();
diff --git a/drivers/staging/hv/vmbus_drv.c b/drivers/staging/hv/vmbus_drv.c
index 582318f..73119a9 100644
--- a/drivers/staging/hv/vmbus_drv.c
+++ b/drivers/staging/hv/vmbus_drv.c
@@ -143,43 +143,10 @@ static ssize_t vmbus_show_device_attr(struct device *dev,
 	vmbus_child_device_get_info(&device_ctx->device_obj, &device_info);
 
 	if (!strcmp(dev_attr->attr.name, "class_id")) {
-		return sprintf(buf, "{%02x%02x%02x%02x-%02x%02x-%02x%02x-"
-			       "%02x%02x%02x%02x%02x%02x%02x%02x}\n",
-			       device_info.ChannelType.data[3],
-			       device_info.ChannelType.data[2],
-			       device_info.ChannelType.data[1],
-			       device_info.ChannelType.data[0],
-			       device_info.ChannelType.data[5],
-			       device_info.ChannelType.data[4],
-			       device_info.ChannelType.data[7],
-			       device_info.ChannelType.data[6],
-			       device_info.ChannelType.data[8],
-			       device_info.ChannelType.data[9],
-			       device_info.ChannelType.data[10],
-			       device_info.ChannelType.data[11],
-			       device_info.ChannelType.data[12],
-			       device_info.ChannelType.data[13],
-			       device_info.ChannelType.data[14],
-			       device_info.ChannelType.data[15]);
+		return sprintf(buf, "{%pUr}\n", device_info.ChannelType.data);
 	} else if (!strcmp(dev_attr->attr.name, "device_id")) {
-		return sprintf(buf, "{%02x%02x%02x%02x-%02x%02x-%02x%02x-"
-			       "%02x%02x%02x%02x%02x%02x%02x%02x}\n",
-			       device_info.ChannelInstance.data[3],
-			       device_info.ChannelInstance.data[2],
-			       device_info.ChannelInstance.data[1],
-			       device_info.ChannelInstance.data[0],
-			       device_info.ChannelInstance.data[5],
-			       device_info.ChannelInstance.data[4],
-			       device_info.ChannelInstance.data[7],
-			       device_info.ChannelInstance.data[6],
-			       device_info.ChannelInstance.data[8],
-			       device_info.ChannelInstance.data[9],
-			       device_info.ChannelInstance.data[10],
-			       device_info.ChannelInstance.data[11],
-			       device_info.ChannelInstance.data[12],
-			       device_info.ChannelInstance.data[13],
-			       device_info.ChannelInstance.data[14],
-			       device_info.ChannelInstance.data[15]);
+		return sprintf(buf, "{%pUr}\n",
+			       device_info.ChannelInstance.data);
 	} else if (!strcmp(dev_attr->attr.name, "state")) {
 		return sprintf(buf, "%d\n", device_info.ChannelState);
 	} else if (!strcmp(dev_attr->attr.name, "id")) {
@@ -487,23 +454,9 @@ static struct hv_device *vmbus_child_device_create(struct hv_guid *type,
 	}
 
 	DPRINT_DBG(VMBUS_DRV, "child device (%p) allocated - "
-		"type {%02x%02x%02x%02x-%02x%02x-%02x%02x-"
-		"%02x%02x%02x%02x%02x%02x%02x%02x},"
-		"id {%02x%02x%02x%02x-%02x%02x-%02x%02x-"
-		"%02x%02x%02x%02x%02x%02x%02x%02x}",
+		"type {%pUr}, id {%pUr}",
 		&child_device_ctx->device,
-		type->data[3], type->data[2], type->data[1], type->data[0],
-		type->data[5], type->data[4], type->data[7], type->data[6],
-		type->data[8], type->data[9], type->data[10], type->data[11],
-		type->data[12], type->data[13], type->data[14], type->data[15],
-		instance->data[3], instance->data[2],
-		instance->data[1], instance->data[0],
-		instance->data[5], instance->data[4],
-		instance->data[7], instance->data[6],
-		instance->data[8], instance->data[9],
-		instance->data[10], instance->data[11],
-		instance->data[12], instance->data[13],
-		instance->data[14], instance->data[15]);
+		type->data, instance->data);
 
 	child_device_obj = &child_device_ctx->device_obj;
 	child_device_obj->context = context;
@@ -629,65 +582,20 @@ static int vmbus_uevent(struct device *device, struct kobj_uevent_env *env)
 
 	DPRINT_ENTER(VMBUS_DRV);
 
-	DPRINT_INFO(VMBUS_DRV, "generating uevent - VMBUS_DEVICE_CLASS_GUID={"
-		    "%02x%02x%02x%02x-%02x%02x-%02x%02x-"
-		    "%02x%02x%02x%02x%02x%02x%02x%02x}",
-		    device_ctx->class_id.data[3], device_ctx->class_id.data[2],
-		    device_ctx->class_id.data[1], device_ctx->class_id.data[0],
-		    device_ctx->class_id.data[5], device_ctx->class_id.data[4],
-		    device_ctx->class_id.data[7], device_ctx->class_id.data[6],
-		    device_ctx->class_id.data[8], device_ctx->class_id.data[9],
-		    device_ctx->class_id.data[10],
-		    device_ctx->class_id.data[11],
-		    device_ctx->class_id.data[12],
-		    device_ctx->class_id.data[13],
-		    device_ctx->class_id.data[14],
-		    device_ctx->class_id.data[15]);
+	DPRINT_INFO(VMBUS_DRV,
+		    "generating uevent - VMBUS_DEVICE_CLASS_GUID={%pUr}",
+		    device_ctx->class_id.data);
 
 	env->envp_idx = i;
 	env->buflen = len;
-	ret = add_uevent_var(env, "VMBUS_DEVICE_CLASS_GUID={"
-			     "%02x%02x%02x%02x-%02x%02x-%02x%02x-"
-			     "%02x%02x%02x%02x%02x%02x%02x%02x}",
-			     device_ctx->class_id.data[3],
-			     device_ctx->class_id.data[2],
-			     device_ctx->class_id.data[1],
-			     device_ctx->class_id.data[0],
-			     device_ctx->class_id.data[5],
-			     device_ctx->class_id.data[4],
-			     device_ctx->class_id.data[7],
-			     device_ctx->class_id.data[6],
-			     device_ctx->class_id.data[8],
-			     device_ctx->class_id.data[9],
-			     device_ctx->class_id.data[10],
-			     device_ctx->class_id.data[11],
-			     device_ctx->class_id.data[12],
-			     device_ctx->class_id.data[13],
-			     device_ctx->class_id.data[14],
-			     device_ctx->class_id.data[15]);
+	ret = add_uevent_var(env, "VMBUS_DEVICE_CLASS_GUID={%pUr}",
+			     device_ctx->class_id.data);
 
 	if (ret)
 		return ret;
 
-	ret = add_uevent_var(env, "VMBUS_DEVICE_DEVICE_GUID={"
-			     "%02x%02x%02x%02x-%02x%02x-%02x%02x-"
-			     "%02x%02x%02x%02x%02x%02x%02x%02x}",
-			     device_ctx->device_id.data[3],
-			     device_ctx->device_id.data[2],
-			     device_ctx->device_id.data[1],
-			     device_ctx->device_id.data[0],
-			     device_ctx->device_id.data[5],
-			     device_ctx->device_id.data[4],
-			     device_ctx->device_id.data[7],
-			     device_ctx->device_id.data[6],
-			     device_ctx->device_id.data[8],
-			     device_ctx->device_id.data[9],
-			     device_ctx->device_id.data[10],
-			     device_ctx->device_id.data[11],
-			     device_ctx->device_id.data[12],
-			     device_ctx->device_id.data[13],
-			     device_ctx->device_id.data[14],
-			     device_ctx->device_id.data[15]);
+	ret = add_uevent_var(env, "VMBUS_DEVICE_DEVICE_GUID={%pUr}",
+			     device_ctx->device_id.data);
 	if (ret)
 		return ret;
 
-- 
1.6.3.1.10.g659a0.dirty



* [RFC PATCH 0/3] print UUID/GUIDs with %pU
From: Joe Perches @ 2009-09-27  5:57 UTC
  To: linux-kernel, netdev, Greg KH

Perhaps UUIDs are common enough to warrant a %p extension.

Joe Perches (3):
  lib/vsprintf.c: Add %pU - ptr to a UUID/GUID
  treewide: use %pU to print UUID/GUIDs
  drivers/staging/hv/: use %pU to print UUID/GUIDs

 drivers/char/random.c                |   10 +--
 drivers/firmware/dmi_scan.c          |    5 +-
 drivers/md/md.c                      |   16 +----
 drivers/media/video/uvc/uvc_ctrl.c   |   69 +++++++++-----------
 drivers/media/video/uvc/uvc_driver.c |    7 +-
 drivers/media/video/uvc/uvcvideo.h   |   10 ---
 drivers/staging/hv/ChannelMgmt.c     |   22 +------
 drivers/staging/hv/vmbus_drv.c       |  116 ++++------------------------------
 fs/gfs2/sys.c                        |   16 +----
 fs/ubifs/debug.c                     |    9 +--
 fs/ubifs/super.c                     |    7 +--
 fs/xfs/xfs_log_recover.c             |   14 +---
 include/linux/efi.h                  |    6 +--
 lib/vsprintf.c                       |   58 +++++++++++++++++-
 14 files changed, 125 insertions(+), 240 deletions(-)


* [PATCH 1/3] lib/vsprintf.c: Add %pU - ptr to a UUID/GUID
From: Joe Perches @ 2009-09-27  5:57 UTC
  To: linux-kernel, netdev, Greg KH
In-Reply-To: <cover.1254030722.git.joe@perches.com>

UUID/GUIDs are somewhat common in kernel source.

Standardize the printed style of UUID/GUIDs by using
another extension to %p.

%pU:     01020304-0506-0708-090a-0b0c0d0e0f10
%pUr:    04030201-0605-0807-090a-0b0c0d0e0f10
%pU[r]X: use upper case hex

Signed-off-by: Joe Perches <joe@perches.com>
---
 lib/vsprintf.c |   58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 57 insertions(+), 1 deletions(-)

diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index b91839e..68a49bb 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -790,6 +790,53 @@ static char *ip4_addr_string(char *buf, char *end, const u8 *addr,
 	return string(buf, end, ip4_addr, spec);
 }
 
+static char *uuid_string(char *buf, char *end, const u8 *addr,
+			 struct printf_spec spec, const char *fmt)
+{
+	char uuid[sizeof("xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx")];
+	char *p = uuid;
+	int i;
+	static const u8 r[16] = {3,2,1,0,5,4,7,6,8,9,10,11,12,13,14,15};
+	static const u8 n[16] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+	const u8 *index = n;
+	bool uc = false;
+
+	while (isalnum(*(++fmt))) {
+		switch (*fmt) {
+		case 'r':
+			index = r;
+			break;
+		case 'X':
+			uc = true;
+			break;
+		}
+	}
+
+	for (i = 0; i < 16; i++) {
+		p = pack_hex_byte(p, addr[index[i]]);
+		switch (i) {
+		case 3:
+		case 5:
+		case 7:
+		case 9:
+			*p++ = '-';
+			break;
+		}
+	}
+
+	*p = 0;
+
+	if (uc) {
+		p = uuid;
+		while (*p) {
+			*p = toupper(*p);
+			p++;
+		}
+	}
+
+	return string(buf, end, uuid, spec);
+}
+
 /*
  * Show a '%p' thing.  A kernel extension is that the '%p' is followed
  * by an extra set of alphanumeric characters that are extended format
@@ -814,6 +861,13 @@ static char *ip4_addr_string(char *buf, char *end, const u8 *addr,
  *       IPv4 uses dot-separated decimal with leading 0's (010.123.045.006)
  * - 'I6c' for IPv6 addresses printed as specified by
  *       http://www.ietf.org/id/draft-kawamura-ipv6-text-representation-03.txt
+ * - 'U' for a 16-byte UUID/GUID, printed in the form
+ *       "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
+ *       Options for %pU are:
+ *       'X' use upper case hex digits
+ *       'r' use LE byte order for the u32 and the two u16 fields.  Use indices:
+ *       [3][2][1][0]-[5][4]-[7][6]-[8][9]-[10]...[15]
+ *
  * Note: The difference between 'S' and 'F' is that on ia64 and ppc64
  * function pointers are really function descriptors, which contain a
  * pointer to the real address.
@@ -828,9 +882,9 @@ static char *pointer(const char *fmt, char *buf, char *end, void *ptr,
 	case 'F':
 	case 'f':
 		ptr = dereference_function_descriptor(ptr);
-	case 's':
 		/* Fallthrough */
 	case 'S':
+	case 's':
 		return symbol_string(buf, end, ptr, spec, *fmt);
 	case 'R':
 		return resource_string(buf, end, ptr, spec);
@@ -853,6 +907,8 @@ static char *pointer(const char *fmt, char *buf, char *end, void *ptr,
 			return ip4_addr_string(buf, end, ptr, spec, fmt);
 		}
 		break;
+	case 'U':
+		return uuid_string(buf, end, ptr, spec, fmt);
 	}
 	spec.flags |= SMALL;
 	if (spec.field_width == -1) {
-- 
1.6.3.1.10.g659a0.dirty

