Netdev List


* Re: [PATCH 9/9] Add explicit bound checks in net/socket.c
From: Arjan van de Ven @ 2009-09-26 19:05 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: linux-kernel, torvalds, mingo, netdev
In-Reply-To: <20090926190103.GB4356@lenovo>

On Sat, 26 Sep 2009 23:01:03 +0400
Cyrill Gorcunov <gorcunov@gmail.com> wrote:

> [Arjan van de Ven - Sat, Sep 26, 2009 at 08:54:32PM +0200]
> | From: Arjan van de Ven <arjan@linux.intel.com>
> | Subject: [PATCH 9/9] Add explicit bound checks in net/socket.c
> | CC: netdev@vger.kernel.org
> | 
> | The sys_socketcall() function has a very clever system for the copy
> | size of its arguments. Unfortunately, gcc cannot deal with this in
> | terms of proving that the copy_from_user() is then always in bounds.
> | This is the last (well 9th of this series, but last in the kernel) such
> | case around.
> | 
> | With this patch, we can turn on code to make having the boundary provably
> | right for the whole kernel, and detect introduction of new security
> | accidents of this type early on.
> | 
> | Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> | 
> | 
> | diff --git a/net/socket.c b/net/socket.c
> | index 49917a1..13a8d67 100644
> | --- a/net/socket.c
> | +++ b/net/socket.c
> | @@ -2098,12 +2098,17 @@ SYSCALL_DEFINE2(socketcall, int, call, unsigned long __user *, args)
> |  	unsigned long a[6];
> |  	unsigned long a0, a1;
> |  	int err;
> | +	unsigned int len;
> |  
> |  	if (call < 1 || call > SYS_ACCEPT4)
> |  		return -EINVAL;
> |  
> | +	len = nargs[call];
> | +	if (len > 6)
> 
> Hi Arjan, wouldn't ARRAY_SIZE suffice better there?
> Or am I missing something?

yeah you missed that I screwed up ;(

From: Arjan van de Ven <arjan@linux.intel.com>
Subject: [PATCH 9/9] Add explicit bound checks in net/socket.c
CC: netdev@vger.kernel.org

The sys_socketcall() function has a very clever system for the copy
size of its arguments. Unfortunately, gcc cannot deal with this in
terms of proving that the copy_from_user() is then always in bounds.
This is the last (well 9th of this series, but last in the kernel) such
case around.

With this patch, we can turn on code to make having the boundary provably
right for the whole kernel, and detect introduction of new security
accidents of this type early on.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>


diff --git a/net/socket.c b/net/socket.c
index 49917a1..13a8d67 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2098,12 +2098,17 @@ SYSCALL_DEFINE2(socketcall, int, call, unsigned long __user *, args)
 	unsigned long a[6];
 	unsigned long a0, a1;
 	int err;
+	unsigned int len;
 
 	if (call < 1 || call > SYS_ACCEPT4)
 		return -EINVAL;
 
+	len = nargs[call];
+	if (len > 6 * sizeof(unsigned long))
+		return -EINVAL;
+
 	/* copy_from_user should be SMP safe. */
-	if (copy_from_user(a, args, nargs[call]))
+	if (copy_from_user(a, args, len))
 		return -EFAULT;
 
 	audit_socketcall(nargs[call] / sizeof(unsigned long), a);



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


* Re: [PATCH v3 net-next-2.6] cxgb3: Added private MAC address and provisioning packet handler for iSCSI
From: Daniel Walker @ 2009-09-26 19:22 UTC (permalink / raw)
  To: kxie
  Cc: davem, swise, divy, rakesh, michaelc, James.Bottomley,
	linux-kernel, netdev
In-Reply-To: <200909261903.n8QJ3D2b000882@localhost.localdomain>

On Sat, 2009-09-26 at 12:03 -0700, kxie@chelsio.com wrote:
>  enum {                 /* rx_offload flags */
>         T3_RX_CSUM      = 1 << 0,
>         T3_LRO          = 1 << 1,
>  };
>  
> +enum {
> +       LAN_MAC_IDX     = 0,
> +       SAN_MAC_IDX,
> +
> +       MAX_MAC_IDX
> +};

Why not name the enum and use it in the function declarations? I see
there are some other unnamed enums in there so you are following the
style in the file already.. However, naming the enum and using it allows
the input values to be known instead of just saying "int n", so I think
that's a better method..

Daniel
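
A minimal sketch of the suggestion above, assuming a hypothetical enum tag `mac_idx` and a hypothetical helper `mac_idx_name()`; only the three enumerators come from the quoted patch, and nothing here is taken from the cxgb3 driver itself:

```c
#include <stddef.h>

/* Enumerators are copied from the quoted patch; the tag "mac_idx" is an
 * assumed name chosen for this illustration. */
enum mac_idx {
	LAN_MAC_IDX = 0,
	SAN_MAC_IDX,

	MAX_MAC_IDX
};

/* Hypothetical helper: the parameter type documents which values are
 * accepted, where a plain "int n" would not. */
static const char *mac_idx_name(enum mac_idx idx)
{
	switch (idx) {
	case LAN_MAC_IDX:
		return "lan";
	case SAN_MAC_IDX:
		return "san";
	default:
		return NULL;	/* MAX_MAC_IDX or any out-of-range value */
	}
}
```

With the named type in the prototype, a reader (and some static checkers) can see the intended input range without looking up the call sites.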


* Re: [PATCH 9/9] Add explicit bound checks in net/socket.c
From: Arjan van de Ven @ 2009-09-26 19:23 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: linux-kernel, torvalds, mingo, netdev
In-Reply-To: <20090926190103.GB4356@lenovo>

On Sat, 26 Sep 2009 23:01:03 +0400
Cyrill Gorcunov <gorcunov@gmail.com> wrote:

> [Arjan van de Ven - Sat, Sep 26, 2009 at 08:54:32PM +0200]
> | From: Arjan van de Ven <arjan@linux.intel.com>
> | Subject: [PATCH 9/9] Add explicit bound checks in net/socket.c
> | CC: netdev@vger.kernel.org
> | 
> | The sys_socketcall() function has a very clever system for the copy
> | size of its arguments. Unfortunately, gcc cannot deal with this in
> | terms of proving that the copy_from_user() is then always in bounds.
> | This is the last (well 9th of this series, but last in the kernel) such
> | case around.
> | 
> | With this patch, we can turn on code to make having the boundary provably
> | right for the whole kernel, and detect introduction of new security
> | accidents of this type early on.
> | 
> | Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> | 
> | 
> | diff --git a/net/socket.c b/net/socket.c
> | index 49917a1..13a8d67 100644
> | --- a/net/socket.c
> | +++ b/net/socket.c
> | @@ -2098,12 +2098,17 @@ SYSCALL_DEFINE2(socketcall, int, call, unsigned long __user *, args)
> |  	unsigned long a[6];
> |  	unsigned long a0, a1;
> |  	int err;
> | +	unsigned int len;
> |  
> |  	if (call < 1 || call > SYS_ACCEPT4)
> |  		return -EINVAL;
> |  
> | +	len = nargs[call];
> | +	if (len > 6)
> 
> Hi Arjan, wouldn't ARRAY_SIZE suffice better there?
> Or am I missing something?
> 

goof once goof twice, make it sizeof.. that's nicer.

From: Arjan van de Ven <arjan@linux.intel.com>
Subject: [PATCH 9/9] Add explicit bound checks in net/socket.c
CC: netdev@vger.kernel.org

The sys_socketcall() function has a very clever system for the copy
size of its arguments. Unfortunately, gcc cannot deal with this in
terms of proving that the copy_from_user() is then always in bounds.
This is the last (well 9th of this series, but last in the kernel) such
case around.

With this patch, we can turn on code to make having the boundary provably
right for the whole kernel, and detect introduction of new security
accidents of this type early on.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>


diff --git a/net/socket.c b/net/socket.c
index 49917a1..13a8d67 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2098,12 +2098,17 @@ SYSCALL_DEFINE2(socketcall, int, call, unsigned long __user *, args)
 	unsigned long a[6];
 	unsigned long a0, a1;
 	int err;
+	unsigned int len;
 
 	if (call < 1 || call > SYS_ACCEPT4)
 		return -EINVAL;
 
+	len = nargs[call];
+	if (len > sizeof(a))
+		return -EINVAL;
+
 	/* copy_from_user should be SMP safe. */
-	if (copy_from_user(a, args, nargs[call]))
+	if (copy_from_user(a, args, len))
 		return -EFAULT;
 
 	audit_socketcall(nargs[call] / sizeof(unsigned long), a);


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
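
The pattern the revised patch relies on can be sketched outside the kernel as follows. This is an illustration under assumptions, not the kernel code: `WORDS()`, `nargs_demo` and `fetch_args()` are made-up names, the size table is invented, and `memcpy()` stands in for copy_from_user().

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: size in bytes of x unsigned longs. */
#define WORDS(x) ((x) * sizeof(unsigned long))

/* Hypothetical per-call argument sizes in bytes; not the kernel's table. */
static const unsigned char nargs_demo[] = { WORDS(0), WORDS(3), WORDS(6) };

/* Copies the arguments for "call" from src into out.
 * Returns the number of bytes copied, or -EINVAL on a bad call number or
 * a table entry larger than the local buffer. */
static int fetch_args(int call, const unsigned long *src, unsigned long out[6])
{
	unsigned long a[6];
	unsigned int len;

	if (call < 1 || call >= (int)(sizeof(nargs_demo) / sizeof(nargs_demo[0])))
		return -EINVAL;

	len = nargs_demo[call];
	if (len > sizeof(a))	/* explicit bound, tied to the buffer itself */
		return -EINVAL;

	memcpy(a, src, len);	/* stands in for copy_from_user() */
	memcpy(out, a, len);
	return (int)len;
}
```

Comparing against `sizeof(a)` rather than a literal keeps the check correct if the array is ever resized, which is the point of the "make it sizeof" revision.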


* Re: [PATCH 9/9] Add explicit bound checks in net/socket.c
From: Cyrill Gorcunov @ 2009-09-26 19:35 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, torvalds, mingo, netdev
In-Reply-To: <20090926212302.0ce64a5c@infradead.org>

[Arjan van de Ven - Sat, Sep 26, 2009 at 09:23:02PM +0200]
...
| 
| goof once goof twice, make it sizeof.. that's nicer.
| 

yeah, I was about to propose the same :)

...
	- Cyrill


* Re: TCP stack bug related to F-RTO?
From: Joe Cao @ 2009-09-26 20:48 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Ray Lee, Netdev, LKML
In-Reply-To: <Pine.LNX.4.64.0909262034130.12882@melkinkari.cs.Helsinki.FI>

Hi Ilpo,

Thanks for the reply.  We noticed the problem while debugging a connection failure reported by one of our customers (we are a network device vendor).  We have already suggested that the customer upgrade their server software to fix the problem, and we are still waiting for their feedback.  Meanwhile, I asked all those questions because I want to understand the issue and the fixes.  We also have to convince the customer to move to the right kernel, and we don't want them to come back with the same problem again.

Again, thanks for the help!

Joe

--- On Sat, 9/26/09, Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:

> From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> Subject: Re: TCP stack bug related to F-RTO?
> To: "Joe Cao" <caoco2002@yahoo.com>
> Cc: "Ray Lee" <ray-lk@madrabbit.org>, "Netdev" <netdev@vger.kernel.org>, "LKML" <linux-kernel@vger.kernel.org>
> Date: Saturday, September 26, 2009, 10:51 AM
> On Sat, 26 Sep 2009, Joe Cao wrote:
> 
> > Can you elaborate on "Some retransmission would happen here as step 3"?
> > When the second timeout happens, it will again go into FRTO and then
> > retransmit the write queue head.
> 
> Why do you think that the second RTO will happen with anything else than
> with 2.6.24. And it's perfectly ok to go into FRTO for the second time.
> 
> > I looked at the patch (debian Bug#478062) that's probably what you
> > mentioned as the fix. All it does was to exclude the SACK case when
> > considering FRTO.  But in my case, SACK was enabled, as seen in the
> > trace..
> 
> You should be looking from where I said rather than picking up your own
> sources and assuming that they'll tell you all the story :-). In fact,
> there are two fixes that were made in a row and one workaround in the
> same timeframe. ...And you managed to pick the wrong one of the fixes, so
> I kind of understand why you got confused :-).
> 
> > In other words, do we still have a problem with FRTO when SACK is
> > enabled in the latest kernel?
> 
> For sure we might have all kinds of problems no one has yet
> noticed/reported :-). ....However, it seems that this particular problem
> your trace is showing is solved. Can you please test with a fixed kernel
> before coming back here with these claims.
> 
> 
> -- 
>  i.
> 
> --- On Fri, 9/25/09, Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
> 
> > From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> > Subject: Re: TCP stack bug related to F-RTO?
> > To: "Joe Cao" <caoco2002@yahoo.com>
> > Cc: "Ray Lee" <ray-lk@madrabbit.org>, "Netdev" <netdev@vger.kernel.org>, "LKML" <linux-kernel@vger.kernel.org>
> > Date: Friday, September 25, 2009, 11:03 AM
> > On Fri, 25 Sep 2009, Joe Cao wrote:
> > 
> > > Thanks for the reply!  Do you happen to know which patch fixed the
> > > problem?
> > 
> > You can find those patches from the stable queue git tree. I gave you a
> > hint from what release to look from in the last mail. However, as 2.6.24
> > is anyway obsolete my recommendation is that you should probably consider
> > upgrading to fix all the other bugs that have been found since 2.6.24 was
> > obsoleted.
> > 
> > > Is there a bug tracking system for linux kernel?
> > 
> > Nothing that knows everything about everything.
> > 
> > > I studied the FRTO code in latest kernel 2.6.31..  It seems the problem
> > > is still there:
> > >
> > > 1. Every time a RTO fires, because tcp_is_sackfrto(tp) returns 1,
> > > tcp_use_frto() returns true.  And the server tcp enters FRTO.
> > > 2. After the head of write queue is retransmitted, two new data packets
> > > are transmitted, the server receives two dup-ACKs.  That will make the
> > > TCP enter tcp_enter_frto_loss(), however, that only resets ssthresh and
> > > some other fields.
> > 
> > Perhaps those other fields are far more important than you think... :-)
> > ...Some retransmission would happen here as step 3.
> > 
> > > 3. After another longer RTO fires, because tcp_is_sackfrto(tp) returns
> > > 1, tcp_use_frto() again returns true.  The stack enters FRTO again.
> > > 4. The above repeats and the stack couldn't retransmit the lost packets
> > > faster.
> > > 
> > > Is my understanding above correct?
> > 
> > ...No. All magic that happens in tcp_enter_frto_loss should be enough to
> > really do more than a single retransmission (that is, in any other than
> > 2.6.24 series kernel). There was an unfortunate bug in this area in 2.6.24
> > which basically undid the effect of correct actions tcp_enter_frto_loss
> > did which effectively prevented tcp_xmit_retransmit_queue from doing its
> > part.
> > 
> > -- 
> >  i.
> > 
> > > --- On Fri, 9/25/09, Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
> > > 
> > > > From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> > > > Subject: Re: TCP stack bug related to F-RTO?
> > > > To: "Ray Lee" <ray-lk@madrabbit.org>
> > > > Cc: "Joe Cao" <caoco2002@yahoo.com>, "Netdev" <netdev@vger.kernel.org>, "LKML" <linux-kernel@vger.kernel.org>, jcaoco2002@yahoo.com
> > > > Date: Friday, September 25, 2009, 6:09 AM
> > > > On Thu, 24 Sep 2009, Ray Lee wrote:
> > > > 
> > > > > [adding netdev cc:]
> > > > > 
> > > > > On Thu, Sep 24, 2009 at 10:43 AM, Joe Cao <caoco2002@yahoo.com> wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > I have found the following behavior with different versions of
> > > > > > linux kernel. The attached pcap trace is collected with server
> > > > > > (192.168.0.13) running 2.6.24 and shows the problem. Basically the
> > > > > > behavior is like this:
> > > > > >
> > > > > > 1. The client opens up a big window,
> > > > > > 2. the server sends 19 packets in a row (pkt #14-#32 in the
> > > > > > trace), but all of them are dropped due to some congestion.
> > > > > > 3. The server hits RTO and retransmits pkt #14 in #33
> > > > > > 4. The client immediately acks #33 (=#14), and the server (seems
> > > > > > like to enter F-RTO) expands the window and sends *NEW* pkt #35 &
> > > > > > #36.  Timeout is doubled to 2*RTO; The client immediately sends
> > > > > > two Dup-ack to #35 and #36.
> > > > > > 5. after 2*RTO, pkt #15 is retransmitted in #39.
> > > > > > 6. The client immediately acks #39 (=#15) in #40, and the server
> > > > > > continues to expand the window and sends two *NEW* pkt #41 & #42.
> > > > > > Now the timeout is doubled to 4*RTO.
> > > > > > 8. After 4*RTO timeout, #16 is retransmitted.
> > > > > > 9....
> > > > > > 10. The above steps repeat for retransmitting pkt #16-#32 and each
> > > > > > time the timeout is doubled.
> > > > > > 11. It takes a long long time to retransmit all the lost packets
> > > > > > and before that is done, the client sends a RST because of timeout.
> > > > > >
> > > > > > The above behavior looks like F-RTO is in effect.  And there seems
> > > > > > to be a bug in the TCP's congestion control and retransmission
> > > > > > algorithm. Why doesn't the TCP on server (running 2.6.24) enter
> > > > > > the slow start? Why should the server take that long to recover
> > > > > > from a short period of packet loss?
> > > > > >
> > > > > > Has anyone else noticed similar problem before?  If my analysis
> > > > > > was wrong, can anyone give me some pointers to what's really
> > > > > > wrong and how to fix it?
> > > > 
> > > > Yes, 2.6.24 is an obsoleted version with known wrongs in FRTO
> > > > implementation. Fixes never went to 2.6.24 stable series as it was
> > > > _already_ obsoleted when the problems were reported and found. The
> > > > correct fixes may be found from 2.6.25.7 (.7 iirc) and are included
> > > > from 2.6.26 onward too.
> > > > 
> > > > Just in case you happen to run ubuntu based kernel from that era (of
> > > > course you should be reporting the bug here then...), a word of
> > > > warning: it seemed nearly impossible for them to get a simple thing
> > > > like that fixed, I haven't been looking if they'd eventually come to
> > > > some sensible conclusion in that matter or is it still unresolved (or
> > > > e.g., closed without real resolution).


      


^ permalink raw reply

* [PATCH] /proc/net/tcp, overhead removed
From: Yakov Lerner @ 2009-09-26 21:31 UTC (permalink / raw)
  To: linux-kernel, netdev, davem, kuznet, pekkas, jmorris, yoshfuji,
	kaber, torval
  Cc: Yakov Lerner

/proc/net/tcp does 20,000 sockets in 60-80 milliseconds, with this patch.

The overhead was in tcp_seq_start(). See analysis (3) below.
The patch is against Linus git tree (1). The patch is small.

------------  -----------   ------------------------------------
Before patch  After patch   20,000 sockets (10,000 tw + 10,000 estab)(2)
------------  -----------   ------------------------------------
6 sec          0.06 sec     dd bs=1k if=/proc/net/tcp >/dev/null 
1.5 sec        0.06 sec     dd bs=4k if=/proc/net/tcp >/dev/null

1.9 sec        0.16 sec     netstat -4ant >/dev/null
------------  -----------   ------------------------------------

This is a ~25x improvement.
The new time does not depend on the read block size.
Speed of netstat, naturally, improves, too; both -4 and -6.
/proc/net/tcp6 does 20,000 sockets in 100 millisec.

(1) against git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

(2) Used 'manysock' utility to stress system with large number of sockets:
  "manysock 10000 10000"    - 10,000 tw + 10,000 estab ip4 sockets.
  "manysock -6 10000 10000" - 10,000 tw + 10,000 estab ip6 sockets.
Found at http://ilerner.3b1.org/manysock/manysock.c

(3) Algorithmic analysis. 
    Old algorithm.

During 'cat </proc/net/tcp', tcp_seq_start() is called O(numsockets) times (4).
On average, every call to tcp_seq_start() scans half the whole hashtable. Ouch.
This is O(numsockets * hashsize). 95-99% of 'cat </proc/net/tcp' is spent in
tcp_seq_start()->tcp_get_idx. This overhead is eliminated by new algorithm,
which is O(numsockets + hashsize).

    New algorithm.

The new algorithm is O(numsockets + hashsize). We jump to the right
hash bucket in tcp_seq_start(), without scanning half the hash.
To jump right to the hash bucket corresponding to *pos in tcp_seq_start(),
we reuse three pieces of state (st->num, st->bucket, st->sbucket)
as follows:
 - we check that requested pos >= last seen pos (st->num), the typical case. 
 - if so, we jump to bucket st->bucket
 - to arrive at the right item after the beginning of st->bucket, we
keep in st->sbucket the position corresponding to the beginning of
the bucket.

(4) Explanation of O( numsockets * hashsize) of old algorithm.

tcp_seq_start() is called once for every ~7 lines of netstat output 
if readsize is 1kb, or once for every ~28 lines if readsize >= 4kb.
Since record length of /proc/net/tcp records is 150 bytes, formula for
number of calls to tcp_seq_start() is
            (numsockets * 150 / min(4096,readsize)).
Netstat uses 4kb readsize (newer versions), or 1kb (older versions).
Note that speed of old algorithm does not improve above 4kb blocksize.

Speed of the new algorithm does not depend on blocksize.

Speed of the new algorithm does not perceptibly depend on hashsize (which
depends on ramsize). Speed of old algorithm drops with bigger hashsize.

(5) Reporting order.

Reporting order is exactly same as before if hash does not change underfoot.
When hash elements come and go during report, reporting order will be
same as that of tcpdiag.

Signed-off-by: Yakov Lerner <iler.ml@gmail.com>
---
 net/ipv4/tcp_ipv4.c |   26 ++++++++++++++++++++++++--
 1 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 7cda24b..7d9421a 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1994,13 +1994,14 @@ static inline int empty_bucket(struct tcp_iter_state *st)
 		hlist_nulls_empty(&tcp_hashinfo.ehash[st->bucket].twchain);
 }
 
-static void *established_get_first(struct seq_file *seq)
+static void *established_get_first_after(struct seq_file *seq, int bucket)
 {
 	struct tcp_iter_state *st = seq->private;
 	struct net *net = seq_file_net(seq);
 	void *rc = NULL;
 
-	for (st->bucket = 0; st->bucket < tcp_hashinfo.ehash_size; ++st->bucket) {
+	for (st->bucket = bucket; st->bucket < tcp_hashinfo.ehash_size;
+	     ++st->bucket) {
 		struct sock *sk;
 		struct hlist_nulls_node *node;
 		struct inet_timewait_sock *tw;
@@ -2036,6 +2037,11 @@ out:
 	return rc;
 }
 
+static void *established_get_first(struct seq_file *seq)
+{
+	return established_get_first_after(seq, 0);
+}
+
 static void *established_get_next(struct seq_file *seq, void *cur)
 {
 	struct sock *sk = cur;
@@ -2045,6 +2051,7 @@ static void *established_get_next(struct seq_file *seq, void *cur)
 	struct net *net = seq_file_net(seq);
 
 	++st->num;
+	st->sbucket = st->num;
 
 	if (st->state == TCP_SEQ_STATE_TIME_WAIT) {
 		tw = cur;
@@ -2116,6 +2123,21 @@ static void *tcp_get_idx(struct seq_file *seq, loff_t pos)
 static void *tcp_seq_start(struct seq_file *seq, loff_t *pos)
 {
 	struct tcp_iter_state *st = seq->private;
+
+	if (*pos && *pos >= st->sbucket &&
+	    (st->state == TCP_SEQ_STATE_ESTABLISHED ||
+	     st->state == TCP_SEQ_STATE_TIME_WAIT)) {
+		int nskip;
+		void *cur;
+
+		st->num = st->sbucket;
+		st->state = TCP_SEQ_STATE_ESTABLISHED;
+		cur = established_get_first_after(seq, st->bucket);
+		for (nskip = *pos - st->sbucket; nskip > 0 && cur; --nskip)
+			cur = established_get_next(seq, cur);
+		return cur;
+	}
+
 	st->state = TCP_SEQ_STATE_LISTENING;
 	st->num = 0;
 	return *pos ? tcp_get_idx(seq, *pos - 1) : SEQ_START_TOKEN;
-- 
1.6.5.rc2



* tg3 and Broadcom PHY driver
From: Felix Radensky @ 2009-09-26 21:32 UTC (permalink / raw)
  To: netdev

Hi,

I've noticed that in linux-2.6.31 I have to make tg3 driver modular, due to
its dependency on Broadcom PHY driver. If both tg3 and PHY driver are
compiled into the kernel, tg3 fails to detect a PHY, apparently because PHY
driver is loaded later. I'm using BCM57760 on embedded powerpc platform
(MPC8536).

How can I make tg3 work when it's compiled into the kernel ?

Thanks.

Felix.


* tg3: Badness at kernel/mutex.c:207
From: Felix Radensky @ 2009-09-26 21:20 UTC (permalink / raw)
  To: netdev

Hi,

I'm running linux-2.6.31 on a custom MPC8536 based board with BCM57760 chip.
Both tg3 driver, and Broadcom PHY driver are modules.

Each time I run ifconfig eth2 up, I get the following error message:

Badness at kernel/mutex.c:207
NIP: c025132c LR: c0251314 CTR: c0251334
REGS: efbedbd0 TRAP: 0700   Not tainted  (2.6.31)
MSR: 00029000 <EE,ME,CE>  CR: 24020422  XER: 00000000
TASK = efacce10[1080] 'ifconfig' THREAD: efbec000
GPR00: 00000000 efbedc80 efacce10 00000001 00007020 00000002 00000000 00000200
GPR08: 00029000 c0350000 c0330000 00000001 24020424 10057d94 000002a0 1000d82c
GPR16: 1000d81c 1000d814 10010000 10050000 ef897a0c efbede18 ffff8914 ef897a00
GPR24: 00008000 c034b480 efbec000 efb0122c c0350000 efacce10 ef82d2c0 efb01228
NIP [c025132c] __mutex_lock_slowpath+0x1f0/0x1f8
LR [c0251314] __mutex_lock_slowpath+0x1d8/0x1f8
Call Trace:
[efbedcd0] [c025134c] mutex_lock+0x18/0x34
[efbedcf0] [f534a228] tg3_chip_reset+0x7cc/0x9f8 [tg3]
[efbedd20] [f534a8f0] tg3_reset_hw+0x58/0x2360 [tg3]
[efbedd70] [f5351dd4] tg3_open+0x610/0x910 [tg3]
[efbeddb0] [c01e1c6c] dev_open+0x100/0x138
[efbeddd0] [c01dff20] dev_change_flags+0x80/0x1ac
[efbeddf0] [c02232cc] devinet_ioctl+0x648/0x824
[efbede60] [c0223de4] inet_ioctl+0xcc/0xf8
[efbede70] [c01cdf44] sock_ioctl+0x60/0x300
[efbede90] [c008a35c] vfs_ioctl+0x34/0x8c
[efbedea0] [c008a580] do_vfs_ioctl+0x88/0x724
[efbedf10] [c008ac5c] sys_ioctl+0x40/0x74
[efbedf40] [c000f814] ret_from_syscall+0x0/0x3c
Instruction dump:
0fe00000 4bfffe80 801a000c 5409016f 4182fe60 4bf0f6d9 2f830000 41befe54
3d20c035 8009c2c0 2f800000 40befe44 <0fe00000> 4bfffe3c 9421ffe0 7c0802a6

Does it indicate a real problem, or something that can be ignored ?

Additional information from kernel log:

tg3.c:v3.99 (April 20, 2009)
tg3 0002:05:00.0: enabling bus mastering
tg3 0002:05:00.0: PME# disabled
tg3 mdio bus: probed
eth2: Tigon3 [partno(BCM57760) rev 57780001] (PCI Express) MAC address 00:10:18:00:00:00
eth2: attached PHY driver [Broadcom BCM57780] (mii_bus:phy_addr=500:01)
eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
eth2: dma_rwctrl[76180000] dma_mask[64-bit]
tg3 0002:05:00.0: PME# disabled

Thanks.

Felix.


* Re: 2.6.31 regression: e1000e jumbo frames no longer work: 'Unsupported MTU setting'
From: Alexander Duyck @ 2009-09-27  2:14 UTC (permalink / raw)
  To: Nix; +Cc: e1000-devel, netdev, bruce.w.allan, linux-kernel
In-Reply-To: <871vluc5wi.fsf@spindle.srvr.nix>

On Sat, Sep 26, 2009 at 4:16 AM, Nix <nix@esperi.org.uk> wrote:
> [Bruce, you have changes in net-next in this area, so you might have a clue
>  what's going on here.]
>
> In 2.6.30.x, I was happily bringing up the 82574L cards in one server like
> this:
>
> ip link set fastnet up mtu 7200
>
> As of 2.6.31.x, what I see is this:
>
> spindle:/root# ip link set mtu 7200 dev fastnet
> RTNETLINK answers: Invalid argument
> [ 3380.261796] 0000:02:00.0: fastnet: Unsupported MTU setting
>
> As far as I can tell, all MTUs above 1500 now fail.
>
> 'Unsupported' or not, this used to work, and I'd certainly expect jumbo
> frames to be supported on a gigabit card!
>
> I can't see any terribly relevant changes to e1000e between 2.6.30 and
> 2.6.31, so I'm Cc:ing netdev on the offchance that this is something
> more generic (unlikely, as 7200-byte MTUs still work fine in 2.6.31 with
> the r8169 I'm typing this on, but that doesn't help if half the subnet
> is forced to use MTUs of 1500).

It looks like the problem is that the 82574 and 82583 seem to have
their max_hw_frame_size values swapped.  You might try applying the
patch below.  I am not sure if it will apply since I hand generated it
using the git patch that seems to have introduced the problem, and I
am sending the patch through an untested account that may mangle the
patch.  I will see about submitting an official patch for this
sometime in the next few days.

Thanks,

Alex

diff --git a/drivers/net/e1000e/82571.c b/drivers/net/e1000e/82571.c
--- a/drivers/net/e1000e/82571.c
+++ b/drivers/net/e1000e/82571.c
@@ -1803,7 +1803,7 @@ struct e1000_info e1000_82574_info = {
 				  | FLAG_HAS_AMT
 				  | FLAG_HAS_CTRLEXT_ON_LOAD,
 	.pba			= 20,
-	.max_hw_frame_size	= ETH_FRAME_LEN + ETH_FCS_LEN,
+	.max_hw_frame_size	= DEFAULT_JUMBO,
 	.get_variants		= e1000_get_variants_82571,
 	.mac_ops		= &e82571_mac_ops,
 	.phy_ops		= &e82_phy_ops_bm,
@@ -1820,7 +1820,7 @@ struct e1000_info e1000_82583_info = {
 				  | FLAG_HAS_AMT
 				  | FLAG_HAS_CTRLEXT_ON_LOAD,
 	.pba			= 20,
-	.max_hw_frame_size	= DEFAULT_JUMBO,
+	.max_hw_frame_size	= ETH_FRAME_LEN + ETH_FCS_LEN,
 	.get_variants		= e1000_get_variants_82571,
 	.mac_ops		= &e82571_mac_ops,
 	.phy_ops		= &e82_phy_ops_bm,
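
Why a swapped `max_hw_frame_size` would reject every MTU above 1500 can be shown with a simplified check. This is a sketch under assumptions, not the driver's code: `check_mtu()` is a hypothetical function, and the numeric values of the constants (in particular `DEFAULT_JUMBO`) are assumed rather than taken from the thread.

```c
#include <errno.h>

/* Assumed values; only the names ETH_FRAME_LEN, ETH_FCS_LEN and
 * DEFAULT_JUMBO appear in the patch above. */
#define ETH_HLEN	14	/* Ethernet header length */
#define ETH_FCS_LEN	4	/* frame check sequence length */
#define ETH_FRAME_LEN	1514	/* largest standard frame without FCS */
#define DEFAULT_JUMBO	9234	/* assumed jumbo frame limit */

/* Hypothetical, simplified version of an MTU validity check: the full
 * frame for the requested MTU must fit the hardware limit. */
static int check_mtu(int new_mtu, int max_hw_frame_size)
{
	int max_frame = new_mtu + ETH_HLEN + ETH_FCS_LEN;

	if (max_frame > max_hw_frame_size)
		return -EINVAL;	/* surfaces as "Unsupported MTU setting" */
	return 0;
}
```

With the limit set to `ETH_FRAME_LEN + ETH_FCS_LEN` (1518 bytes), an MTU of 1500 still passes but 7200 fails, which matches the symptom reported for the 82574L; with the limit set to `DEFAULT_JUMBO`, 7200 passes.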


* Re: [net-2.6 PATCH 01/13] e1000: drop dead pcie code from e1000
From: David Miller @ 2009-09-27  3:18 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, jesse.brandeburg, donald.c.skidmore
In-Reply-To: <20090925221613.26715.66796.stgit@localhost.localdomain>


All 13 patches applied, thanks.


* Re: [net-2.6 PATCH 1/3] net: fix vlan_get_size to include vlan_flags size
From: David Miller @ 2009-09-27  3:18 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, john.r.fastabend
In-Reply-To: <20090925231124.23450.94680.stgit@localhost.localdomain>


All 3 patches applied, thanks!


* Re: [PATCH 0/4] ISDN patches for 2.6.32 (v2)
From: David Miller @ 2009-09-27  3:24 UTC (permalink / raw)
  To: tilman; +Cc: isdn, keil, i4ldeveloper, netdev, linux-kernel
In-Reply-To: <4ABDFE9F.2030404@imap.cc>

From: Tilman Schmidt <tilman@imap.cc>
Date: Sat, 26 Sep 2009 13:44:31 +0200

> Hello Karsten,
> 
> is there any chance of getting these and the Gigaset patches forwarded
> for inclusion in 2.6.32 before the merge window closes?
> If not all of them, perhaps at least those which you had already acked
> before David Miller asked that they should formally go through you
> (#2-4 of the ISDN series), and those which are just fixes to the
> existing i4l version of the driver (#1-10 of the Gigaset series)?
> I would really appreciate not having to maintain all of them out of
> tree for another release cycle.

I am extremely disappointed in the lack of responsiveness of Karsten
to ISDN patches.  He wants to handle them, but he has effectively
disappeared during the most critical time for patch integration, which
is during the merge window.

If he's busy in life or whatever, that's fine, but in such a case he
should appoint someone to handle ISDN patches until he does have time.

And if no ISDN expert is available, Tilman's suggestion of letting
me integrate the patches should be taken.


* Re: [PATCH] atm: dereference of he_dev->rbps_virt in he_init_group()
From: David Miller @ 2009-09-27  3:26 UTC (permalink / raw)
  To: roel.kluin; +Cc: joe, chas, linux-atm-general, netdev, akpm
In-Reply-To: <4ABE152F.20507@gmail.com>

From: Roel Kluin <roel.kluin@gmail.com>
Date: Sat, 26 Sep 2009 15:20:47 +0200

> he_dev->rbps_virt or he_dev->rbpl_virt allocation may fail, so
> check them. Make sure that he_init_group() cleans up after
> errors.
> 
> Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
> Signed-off-by: "Juha Leppanen" <juha_motorsportcom@luukku.com>
> ---
> David, I swapped rbps and rbpl arguments in my last patch, and
> there were some other problems. This was pointed out by Juha
> Leppanen. Can you please replace the former patch by this one? 
> 
> This version was build, sparse and checkpatch tested.
> 
> Sorry for the mess.

I can't just "replace" it, especially since your change is even
already in Linus's tree.

Please send me a relative fixup rather than a new patch.


* Re: [PATCH] Revert "sit: stateless autoconf for isatap"
From: David Miller @ 2009-09-27  3:28 UTC (permalink / raw)
  To: contact; +Cc: netdev, fred.l.templin
In-Reply-To: <1253977393-7757-1-git-send-email-contact@saschahlusiak.de>

From: Sascha Hlusiak <contact@saschahlusiak.de>
Date: Sat, 26 Sep 2009 17:03:13 +0200

> This reverts commit 645069299a1c7358cf7330afe293f07552f11a5d.
> 
> While the code does not actually break anything, it does not completely follow
> RFC5214 yet. After talking back with Fred L. Templin, I agree that completing the
> ISATAP specific RS/RA code, would pollute the kernel a lot with code that is better
> implemented in userspace.
> 
> The kernel should not send RS packages for ISATAP at all.
> 
> Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
> Acked-by: Fred L. Templin <Fred.L.Templin@boeing.com>

Applied, thanks.


* [PATCH 3/3] drivers/staging/hv/: use %pU to print UUID/GUIDs
From: Joe Perches @ 2009-09-27  5:57 UTC (permalink / raw)
  To: linux-kernel, netdev, Greg KH
In-Reply-To: <cover.1254030722.git.joe@perches.com>

Converted individual GUID/UUID printing functions
to use the new %pU[Xr] in lib/vsprintf.c

Signed-off-by: Joe Perches <joe@perches.com>
---
 drivers/staging/hv/ChannelMgmt.c |   22 +-------
 drivers/staging/hv/vmbus_drv.c   |  116 ++++----------------------------------
 2 files changed, 14 insertions(+), 124 deletions(-)

diff --git a/drivers/staging/hv/ChannelMgmt.c b/drivers/staging/hv/ChannelMgmt.c
index 3db62ca..8b0fb81 100644
--- a/drivers/staging/hv/ChannelMgmt.c
+++ b/drivers/staging/hv/ChannelMgmt.c
@@ -263,28 +263,10 @@ static void VmbusChannelOnOffer(struct vmbus_channel_message_header *hdr)
 
 	DPRINT_INFO(VMBUS, "Channel offer notification - "
 		    "child relid %d monitor id %d allocated %d, "
-		    "type {%02x%02x%02x%02x-%02x%02x-%02x%02x-"
-		    "%02x%02x%02x%02x%02x%02x%02x%02x} "
-		    "instance {%02x%02x%02x%02x-%02x%02x-%02x%02x-"
-		    "%02x%02x%02x%02x%02x%02x%02x%02x}",
+		    "type {%pUr} instance {%pUr}",
 		    offer->ChildRelId, offer->MonitorId,
 		    offer->MonitorAllocated,
-		    guidType->data[3], guidType->data[2],
-		    guidType->data[1], guidType->data[0],
-		    guidType->data[5], guidType->data[4],
-		    guidType->data[7], guidType->data[6],
-		    guidType->data[8], guidType->data[9],
-		    guidType->data[10], guidType->data[11],
-		    guidType->data[12], guidType->data[13],
-		    guidType->data[14], guidType->data[15],
-		    guidInstance->data[3], guidInstance->data[2],
-		    guidInstance->data[1], guidInstance->data[0],
-		    guidInstance->data[5], guidInstance->data[4],
-		    guidInstance->data[7], guidInstance->data[6],
-		    guidInstance->data[8], guidInstance->data[9],
-		    guidInstance->data[10], guidInstance->data[11],
-		    guidInstance->data[12], guidInstance->data[13],
-		    guidInstance->data[14], guidInstance->data[15]);
+		    guidType->data, guidInstance->data);
 
 	/* Allocate the channel object and save this offer. */
 	newChannel = AllocVmbusChannel();
diff --git a/drivers/staging/hv/vmbus_drv.c b/drivers/staging/hv/vmbus_drv.c
index 582318f..73119a9 100644
--- a/drivers/staging/hv/vmbus_drv.c
+++ b/drivers/staging/hv/vmbus_drv.c
@@ -143,43 +143,10 @@ static ssize_t vmbus_show_device_attr(struct device *dev,
 	vmbus_child_device_get_info(&device_ctx->device_obj, &device_info);
 
 	if (!strcmp(dev_attr->attr.name, "class_id")) {
-		return sprintf(buf, "{%02x%02x%02x%02x-%02x%02x-%02x%02x-"
-			       "%02x%02x%02x%02x%02x%02x%02x%02x}\n",
-			       device_info.ChannelType.data[3],
-			       device_info.ChannelType.data[2],
-			       device_info.ChannelType.data[1],
-			       device_info.ChannelType.data[0],
-			       device_info.ChannelType.data[5],
-			       device_info.ChannelType.data[4],
-			       device_info.ChannelType.data[7],
-			       device_info.ChannelType.data[6],
-			       device_info.ChannelType.data[8],
-			       device_info.ChannelType.data[9],
-			       device_info.ChannelType.data[10],
-			       device_info.ChannelType.data[11],
-			       device_info.ChannelType.data[12],
-			       device_info.ChannelType.data[13],
-			       device_info.ChannelType.data[14],
-			       device_info.ChannelType.data[15]);
+		return sprintf(buf, "{%pUr}\n", device_info.ChannelType.data);
 	} else if (!strcmp(dev_attr->attr.name, "device_id")) {
-		return sprintf(buf, "{%02x%02x%02x%02x-%02x%02x-%02x%02x-"
-			       "%02x%02x%02x%02x%02x%02x%02x%02x}\n",
-			       device_info.ChannelInstance.data[3],
-			       device_info.ChannelInstance.data[2],
-			       device_info.ChannelInstance.data[1],
-			       device_info.ChannelInstance.data[0],
-			       device_info.ChannelInstance.data[5],
-			       device_info.ChannelInstance.data[4],
-			       device_info.ChannelInstance.data[7],
-			       device_info.ChannelInstance.data[6],
-			       device_info.ChannelInstance.data[8],
-			       device_info.ChannelInstance.data[9],
-			       device_info.ChannelInstance.data[10],
-			       device_info.ChannelInstance.data[11],
-			       device_info.ChannelInstance.data[12],
-			       device_info.ChannelInstance.data[13],
-			       device_info.ChannelInstance.data[14],
-			       device_info.ChannelInstance.data[15]);
+		return sprintf(buf, "{%pUr}\n",
+			       device_info.ChannelInstance.data);
 	} else if (!strcmp(dev_attr->attr.name, "state")) {
 		return sprintf(buf, "%d\n", device_info.ChannelState);
 	} else if (!strcmp(dev_attr->attr.name, "id")) {
@@ -487,23 +454,9 @@ static struct hv_device *vmbus_child_device_create(struct hv_guid *type,
 	}
 
 	DPRINT_DBG(VMBUS_DRV, "child device (%p) allocated - "
-		"type {%02x%02x%02x%02x-%02x%02x-%02x%02x-"
-		"%02x%02x%02x%02x%02x%02x%02x%02x},"
-		"id {%02x%02x%02x%02x-%02x%02x-%02x%02x-"
-		"%02x%02x%02x%02x%02x%02x%02x%02x}",
+		"type {%pUr}, id {%pUr}",
 		&child_device_ctx->device,
-		type->data[3], type->data[2], type->data[1], type->data[0],
-		type->data[5], type->data[4], type->data[7], type->data[6],
-		type->data[8], type->data[9], type->data[10], type->data[11],
-		type->data[12], type->data[13], type->data[14], type->data[15],
-		instance->data[3], instance->data[2],
-		instance->data[1], instance->data[0],
-		instance->data[5], instance->data[4],
-		instance->data[7], instance->data[6],
-		instance->data[8], instance->data[9],
-		instance->data[10], instance->data[11],
-		instance->data[12], instance->data[13],
-		instance->data[14], instance->data[15]);
+		type->data, instance->data);
 
 	child_device_obj = &child_device_ctx->device_obj;
 	child_device_obj->context = context;
@@ -629,65 +582,20 @@ static int vmbus_uevent(struct device *device, struct kobj_uevent_env *env)
 
 	DPRINT_ENTER(VMBUS_DRV);
 
-	DPRINT_INFO(VMBUS_DRV, "generating uevent - VMBUS_DEVICE_CLASS_GUID={"
-		    "%02x%02x%02x%02x-%02x%02x-%02x%02x-"
-		    "%02x%02x%02x%02x%02x%02x%02x%02x}",
-		    device_ctx->class_id.data[3], device_ctx->class_id.data[2],
-		    device_ctx->class_id.data[1], device_ctx->class_id.data[0],
-		    device_ctx->class_id.data[5], device_ctx->class_id.data[4],
-		    device_ctx->class_id.data[7], device_ctx->class_id.data[6],
-		    device_ctx->class_id.data[8], device_ctx->class_id.data[9],
-		    device_ctx->class_id.data[10],
-		    device_ctx->class_id.data[11],
-		    device_ctx->class_id.data[12],
-		    device_ctx->class_id.data[13],
-		    device_ctx->class_id.data[14],
-		    device_ctx->class_id.data[15]);
+	DPRINT_INFO(VMBUS_DRV,
+		    "generating uevent - VMBUS_DEVICE_CLASS_GUID={%pUr}",
+		    device_ctx->class_id.data);
 
 	env->envp_idx = i;
 	env->buflen = len;
-	ret = add_uevent_var(env, "VMBUS_DEVICE_CLASS_GUID={"
-			     "%02x%02x%02x%02x-%02x%02x-%02x%02x-"
-			     "%02x%02x%02x%02x%02x%02x%02x%02x}",
-			     device_ctx->class_id.data[3],
-			     device_ctx->class_id.data[2],
-			     device_ctx->class_id.data[1],
-			     device_ctx->class_id.data[0],
-			     device_ctx->class_id.data[5],
-			     device_ctx->class_id.data[4],
-			     device_ctx->class_id.data[7],
-			     device_ctx->class_id.data[6],
-			     device_ctx->class_id.data[8],
-			     device_ctx->class_id.data[9],
-			     device_ctx->class_id.data[10],
-			     device_ctx->class_id.data[11],
-			     device_ctx->class_id.data[12],
-			     device_ctx->class_id.data[13],
-			     device_ctx->class_id.data[14],
-			     device_ctx->class_id.data[15]);
+	ret = add_uevent_var(env, "VMBUS_DEVICE_CLASS_GUID={%pUr}",
+			     device_ctx->class_id.data);
 
 	if (ret)
 		return ret;
 
-	ret = add_uevent_var(env, "VMBUS_DEVICE_DEVICE_GUID={"
-			     "%02x%02x%02x%02x-%02x%02x-%02x%02x-"
-			     "%02x%02x%02x%02x%02x%02x%02x%02x}",
-			     device_ctx->device_id.data[3],
-			     device_ctx->device_id.data[2],
-			     device_ctx->device_id.data[1],
-			     device_ctx->device_id.data[0],
-			     device_ctx->device_id.data[5],
-			     device_ctx->device_id.data[4],
-			     device_ctx->device_id.data[7],
-			     device_ctx->device_id.data[6],
-			     device_ctx->device_id.data[8],
-			     device_ctx->device_id.data[9],
-			     device_ctx->device_id.data[10],
-			     device_ctx->device_id.data[11],
-			     device_ctx->device_id.data[12],
-			     device_ctx->device_id.data[13],
-			     device_ctx->device_id.data[14],
-			     device_ctx->device_id.data[15]);
+	ret = add_uevent_var(env, "VMBUS_DEVICE_DEVICE_GUID={%pUr}",
+			     device_ctx->device_id.data);
 	if (ret)
 		return ret;
 
-- 
1.6.3.1.10.g659a0.dirty



* [RFC PATCH 0/3] print UUID/GUIDs with %pU
From: Joe Perches @ 2009-09-27  5:57 UTC (permalink / raw)
  To: linux-kernel, netdev, Greg KH

Perhaps UUIDs are common enough to use a %p extension

Joe Perches (3):
  lib/vsprintf.c: Add %pU - ptr to a UUID/GUID
  treewide: use %pU to print UUID/GUIDs
  drivers/staging/hv/: use %pU to print UUID/GUIDs

 drivers/char/random.c                |   10 +--
 drivers/firmware/dmi_scan.c          |    5 +-
 drivers/md/md.c                      |   16 +----
 drivers/media/video/uvc/uvc_ctrl.c   |   69 +++++++++-----------
 drivers/media/video/uvc/uvc_driver.c |    7 +-
 drivers/media/video/uvc/uvcvideo.h   |   10 ---
 drivers/staging/hv/ChannelMgmt.c     |   22 +------
 drivers/staging/hv/vmbus_drv.c       |  116 ++++------------------------------
 fs/gfs2/sys.c                        |   16 +----
 fs/ubifs/debug.c                     |    9 +--
 fs/ubifs/super.c                     |    7 +--
 fs/xfs/xfs_log_recover.c             |   14 +---
 include/linux/efi.h                  |    6 +--
 lib/vsprintf.c                       |   58 +++++++++++++++++-
 14 files changed, 125 insertions(+), 240 deletions(-)


* [PATCH 1/3] lib/vsprintf.c: Add %pU - ptr to a UUID/GUID
From: Joe Perches @ 2009-09-27  5:57 UTC (permalink / raw)
  To: linux-kernel, netdev, Greg KH
In-Reply-To: <cover.1254030722.git.joe@perches.com>

UUID/GUIDs are somewhat common in kernel source.

Standardize the printed style of UUID/GUIDs by using
another extension to %p.

%pU:    01020304-0506-0708-090a-0b0c0d0e0f10
%pUr:   04030201-0605-0807-090a-0b0c0d0e0f10
%pU[r]X:Use upper case hex

Signed-off-by: Joe Perches <joe@perches.com>
---
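A minimal userspace sketch of the byte ordering, for readers who want to
check the two layouts without building a kernel. It mirrors the index
tables in the uuid_string() hunk of this patch; format_uuid() and its
parameters are hypothetical names chosen for illustration and are not
kernel API.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace stand-in for the patch's uuid_string().
 * out must hold 37 bytes: 32 hex digits, 4 dashes and the NUL.
 * swap_le corresponds to the 'r' option, upper to the 'X' option.
 */
static void format_uuid(char out[37], const unsigned char *addr,
			bool swap_le, bool upper)
{
	/* Same index tables as in the patch: 'r' swaps the first three fields */
	static const unsigned char r[16] = {3,2,1,0,5,4,7,6,8,9,10,11,12,13,14,15};
	static const unsigned char n[16] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
	const unsigned char *index = swap_le ? r : n;
	const char *digits = upper ? "0123456789ABCDEF" : "0123456789abcdef";
	char *p = out;
	int i;

	for (i = 0; i < 16; i++) {
		unsigned char b = addr[index[i]];

		*p++ = digits[b >> 4];
		*p++ = digits[b & 0x0f];
		/* Dashes follow the 4th, 6th, 8th and 10th printed byte */
		if (i == 3 || i == 5 || i == 7 || i == 9)
			*p++ = '-';
	}
	*p = '\0';
}
```

With the 16 input bytes 0x01 through 0x10, the plain order gives
01020304-0506-0708-090a-0b0c0d0e0f10 and the swapped order gives
04030201-0605-0807-090a-0b0c0d0e0f10. Bytes 8 and 9 are not swapped,
because both tables list them in the same order.
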
 lib/vsprintf.c |   58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 57 insertions(+), 1 deletions(-)

diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index b91839e..68a49bb 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -790,6 +790,53 @@ static char *ip4_addr_string(char *buf, char *end, const u8 *addr,
 	return string(buf, end, ip4_addr, spec);
 }
 
+static char *uuid_string(char *buf, char *end, const u8 *addr,
+			 struct printf_spec spec, const char *fmt)
+{
+	char uuid[sizeof("xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx")];
+	char *p = uuid;
+	int i;
+	static const u8 r[16] = {3,2,1,0,5,4,7,6,8,9,10,11,12,13,14,15};
+	static const u8 n[16] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+	const u8 *index = n;
+	bool uc = false;
+
+	while (isalnum(*(++fmt))) {
+		switch (*fmt) {
+		case 'r':
+			index = r;
+			break;
+		case 'X':
+			uc = true;
+			break;
+		}
+	}
+
+	for (i = 0; i < 16; i++) {
+		p = pack_hex_byte(p, addr[index[i]]);
+		switch (i) {
+		case 3:
+		case 5:
+		case 7:
+		case 9:
+			*p++ = '-';
+			break;
+		}
+	}
+
+	*p = 0;
+
+	if (uc) {
+		p = uuid;
+		while (*p) {
+			*p = toupper(*p);
+			p++;
+		}
+	}
+
+	return string(buf, end, uuid, spec);
+}
+
 /*
  * Show a '%p' thing.  A kernel extension is that the '%p' is followed
  * by an extra set of alphanumeric characters that are extended format
@@ -814,6 +861,13 @@ static char *ip4_addr_string(char *buf, char *end, const u8 *addr,
  *       IPv4 uses dot-separated decimal with leading 0's (010.123.045.006)
  * - 'I6c' for IPv6 addresses printed as specified by
  *       http://www.ietf.org/id/draft-kawamura-ipv6-text-representation-03.txt
+ * - 'U' For a 16 byte UUID/GUID, it prints the UUID/GUID in the form
+ *       "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
+ *       Options for %pU are:
+ *       'X' use upper case hex digits
+ *       'r' use LE byte order for U32 and U16s equivalents.  Use indices:
+ *       [3][2][1][0]-[5][4]-[7][6]-[8][9]-[10]...[15]
+ *
  * Note: The difference between 'S' and 'F' is that on ia64 and ppc64
  * function pointers are really function descriptors, which contain a
  * pointer to the real address.
@@ -828,9 +882,9 @@ static char *pointer(const char *fmt, char *buf, char *end, void *ptr,
 	case 'F':
 	case 'f':
 		ptr = dereference_function_descriptor(ptr);
-	case 's':
 		/* Fallthrough */
 	case 'S':
+	case 's':
 		return symbol_string(buf, end, ptr, spec, *fmt);
 	case 'R':
 		return resource_string(buf, end, ptr, spec);
@@ -853,6 +907,8 @@ static char *pointer(const char *fmt, char *buf, char *end, void *ptr,
 			return ip4_addr_string(buf, end, ptr, spec, fmt);
 		}
 		break;
+	case 'U':
+		return uuid_string(buf, end, ptr, spec, fmt);
 	}
 	spec.flags |= SMALL;
 	if (spec.field_width == -1) {
-- 
1.6.3.1.10.g659a0.dirty


* [PATCH 2/3] treewide: use %pU to print UUID/GUIDs
From: Joe Perches @ 2009-09-27  5:57 UTC (permalink / raw)
  To: linux-kernel, netdev, Greg KH
In-Reply-To: <cover.1254030722.git.joe@perches.com>

Converted individual GUID/UUID printing functions
to use the new %pU[Xr] in lib/vsprintf.c

Signed-off-by: Joe Perches <joe@perches.com>
---
 drivers/char/random.c                |   10 +---
 drivers/firmware/dmi_scan.c          |    5 +--
 drivers/md/md.c                      |   16 ++------
 drivers/media/video/uvc/uvc_ctrl.c   |   69 ++++++++++++++++------------------
 drivers/media/video/uvc/uvc_driver.c |    7 +--
 drivers/media/video/uvc/uvcvideo.h   |   10 -----
 fs/gfs2/sys.c                        |   16 +------
 fs/ubifs/debug.c                     |    9 +---
 fs/ubifs/super.c                     |    7 +---
 fs/xfs/xfs_log_recover.c             |   14 ++-----
 include/linux/efi.h                  |    6 +--
 11 files changed, 54 insertions(+), 115 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 04b505e..7104df9 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1245,12 +1245,8 @@ static int proc_do_uuid(ctl_table *table, int write,
 	if (uuid[8] == 0)
 		generate_random_uuid(uuid);
 
-	sprintf(buf, "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
-		"%02x%02x%02x%02x%02x%02x",
-		uuid[0],  uuid[1],  uuid[2],  uuid[3],
-		uuid[4],  uuid[5],  uuid[6],  uuid[7],
-		uuid[8],  uuid[9],  uuid[10], uuid[11],
-		uuid[12], uuid[13], uuid[14], uuid[15]);
+	sprintf(buf, "%pU", uuid);
+
 	fake_table.data = buf;
 	fake_table.maxlen = sizeof(buf);
 
@@ -1350,7 +1346,7 @@ ctl_table random_table[] = {
 
 /********************************************************************
  *
- * Random funtions for networking
+ * Random functions for networking
  *
  ********************************************************************/
 
diff --git a/drivers/firmware/dmi_scan.c b/drivers/firmware/dmi_scan.c
index 938100f..c0deabb 100644
--- a/drivers/firmware/dmi_scan.c
+++ b/drivers/firmware/dmi_scan.c
@@ -169,10 +169,7 @@ static void __init dmi_save_uuid(const struct dmi_header *dm, int slot, int inde
 	if (!s)
 		return;
 
-	sprintf(s,
-		"%02X%02X%02X%02X-%02X%02X-%02X%02X-%02X%02X-%02X%02X%02X%02X%02X%02X",
-		d[0], d[1], d[2], d[3], d[4], d[5], d[6], d[7],
-		d[8], d[9], d[10], d[11], d[12], d[13], d[14], d[15]);
+	sprintf(s, "%pUX", d);
 
         dmi_ident[slot] = s;
 }
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 26ba42a..68b52d7 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1813,15 +1813,11 @@ static void print_sb_1(struct mdp_superblock_1 *sb)
 
 	uuid = sb->set_uuid;
 	printk(KERN_INFO
-	       "md:  SB: (V:%u) (F:0x%08x) Array-ID:<%02x%02x%02x%02x"
-	       ":%02x%02x:%02x%02x:%02x%02x:%02x%02x%02x%02x%02x%02x>\n"
+	       "md:  SB: (V:%u) (F:0x%08x) Array-ID:<%pU>\n"
 	       "md:    Name: \"%s\" CT:%llu\n",
 		le32_to_cpu(sb->major_version),
 		le32_to_cpu(sb->feature_map),
-		uuid[0], uuid[1], uuid[2], uuid[3],
-		uuid[4], uuid[5], uuid[6], uuid[7],
-		uuid[8], uuid[9], uuid[10], uuid[11],
-		uuid[12], uuid[13], uuid[14], uuid[15],
+		uuid,
 		sb->set_name,
 		(unsigned long long)le64_to_cpu(sb->ctime)
 		       & MD_SUPERBLOCK_1_TIME_SEC_MASK);
@@ -1830,8 +1826,7 @@ static void print_sb_1(struct mdp_superblock_1 *sb)
 	printk(KERN_INFO
 	       "md:       L%u SZ%llu RD:%u LO:%u CS:%u DO:%llu DS:%llu SO:%llu"
 			" RO:%llu\n"
-	       "md:     Dev:%08x UUID: %02x%02x%02x%02x:%02x%02x:%02x%02x:%02x%02x"
-	                ":%02x%02x%02x%02x%02x%02x\n"
+	       "md:     Dev:%08x UUID: %pU\n"
 	       "md:       (F:0x%08x) UT:%llu Events:%llu ResyncOffset:%llu CSUM:0x%08x\n"
 	       "md:         (MaxDev:%u) \n",
 		le32_to_cpu(sb->level),
@@ -1844,10 +1839,7 @@ static void print_sb_1(struct mdp_superblock_1 *sb)
 		(unsigned long long)le64_to_cpu(sb->super_offset),
 		(unsigned long long)le64_to_cpu(sb->recovery_offset),
 		le32_to_cpu(sb->dev_number),
-		uuid[0], uuid[1], uuid[2], uuid[3],
-		uuid[4], uuid[5], uuid[6], uuid[7],
-		uuid[8], uuid[9], uuid[10], uuid[11],
-		uuid[12], uuid[13], uuid[14], uuid[15],
+		uuid,
 		sb->devflags,
 		(unsigned long long)le64_to_cpu(sb->utime) & MD_SUPERBLOCK_1_TIME_SEC_MASK,
 		(unsigned long long)le64_to_cpu(sb->events),
diff --git a/drivers/media/video/uvc/uvc_ctrl.c b/drivers/media/video/uvc/uvc_ctrl.c
index c3225a5..2959e46 100644
--- a/drivers/media/video/uvc/uvc_ctrl.c
+++ b/drivers/media/video/uvc/uvc_ctrl.c
@@ -1093,8 +1093,8 @@ int uvc_xu_ctrl_query(struct uvc_video_chain *chain,
 
 	if (!found) {
 		uvc_trace(UVC_TRACE_CONTROL,
-			"Control " UVC_GUID_FORMAT "/%u not found.\n",
-			UVC_GUID_ARGS(entity->extension.guidExtensionCode),
+			"Control %pUr/%u not found.\n",
+			entity->extension.guidExtensionCode,
 			xctrl->selector);
 		return -EINVAL;
 	}
@@ -1171,9 +1171,9 @@ int uvc_ctrl_resume_device(struct uvc_device *dev)
 			    (ctrl->info->flags & UVC_CONTROL_RESTORE) == 0)
 				continue;
 
-			printk(KERN_INFO "restoring control " UVC_GUID_FORMAT
-				"/%u/%u\n", UVC_GUID_ARGS(ctrl->info->entity),
-				ctrl->info->index, ctrl->info->selector);
+			printk(KERN_INFO "restoring control %pUr/%u/%u\n",
+			       ctrl->info->entity,
+			       ctrl->info->index, ctrl->info->selector);
 			ctrl->dirty = 1;
 		}
 
@@ -1228,46 +1228,43 @@ static void uvc_ctrl_add_ctrl(struct uvc_device *dev,
 			dev->intfnum, info->selector, (__u8 *)&size, 2);
 		if (ret < 0) {
 			uvc_trace(UVC_TRACE_CONTROL, "GET_LEN failed on "
-				"control " UVC_GUID_FORMAT "/%u (%d).\n",
-				UVC_GUID_ARGS(info->entity), info->selector,
-				ret);
+				  "control %pUr/%u (%d).\n",
+				  info->entity, info->selector, ret);
 			return;
 		}
 
 		if (info->size != le16_to_cpu(size)) {
-			uvc_trace(UVC_TRACE_CONTROL, "Control " UVC_GUID_FORMAT
-				"/%u size doesn't match user supplied "
-				"value.\n", UVC_GUID_ARGS(info->entity),
-				info->selector);
+			uvc_trace(UVC_TRACE_CONTROL,
+				  "Control %pUr/%u size doesn't match user supplied value.\n",
+				  info->entity, info->selector);
 			return;
 		}
 
 		ret = uvc_query_ctrl(dev, UVC_GET_INFO, ctrl->entity->id,
 			dev->intfnum, info->selector, &inf, 1);
 		if (ret < 0) {
-			uvc_trace(UVC_TRACE_CONTROL, "GET_INFO failed on "
-				"control " UVC_GUID_FORMAT "/%u (%d).\n",
-				UVC_GUID_ARGS(info->entity), info->selector,
-				ret);
+			uvc_trace(UVC_TRACE_CONTROL,
+				  "GET_INFO failed on control %pUr/%u (%d).\n",
+				  info->entity, info->selector, ret);
 			return;
 		}
 
 		flags = info->flags;
 		if (((flags & UVC_CONTROL_GET_CUR) && !(inf & (1 << 0))) ||
 		    ((flags & UVC_CONTROL_SET_CUR) && !(inf & (1 << 1)))) {
-			uvc_trace(UVC_TRACE_CONTROL, "Control "
-				UVC_GUID_FORMAT "/%u flags don't match "
-				"supported operations.\n",
-				UVC_GUID_ARGS(info->entity), info->selector);
+			uvc_trace(UVC_TRACE_CONTROL,
+				  "Control %pUr/%u flags don't match supported operations.\n",
+				  info->entity, info->selector);
 			return;
 		}
 	}
 
 	ctrl->info = info;
 	ctrl->data = kmalloc(ctrl->info->size * UVC_CTRL_NDATA, GFP_KERNEL);
-	uvc_trace(UVC_TRACE_CONTROL, "Added control " UVC_GUID_FORMAT "/%u "
-		"to device %s entity %u\n", UVC_GUID_ARGS(ctrl->info->entity),
-		ctrl->info->selector, dev->udev->devpath, entity->id);
+	uvc_trace(UVC_TRACE_CONTROL,
+		  "Added control %pUr/%u to device %s entity %u\n",
+		  ctrl->info->entity, ctrl->info->selector,
+		  dev->udev->devpath, entity->id);
 }
 
 /*
@@ -1293,17 +1290,16 @@ int uvc_ctrl_add_info(struct uvc_control_info *info)
 			continue;
 
 		if (ctrl->selector == info->selector) {
-			uvc_trace(UVC_TRACE_CONTROL, "Control "
-				UVC_GUID_FORMAT "/%u is already defined.\n",
-				UVC_GUID_ARGS(info->entity), info->selector);
+			uvc_trace(UVC_TRACE_CONTROL,
+				  "Control %pUr/%u is already defined.\n",
+				  info->entity, info->selector);
 			ret = -EEXIST;
 			goto end;
 		}
 		if (ctrl->index == info->index) {
-			uvc_trace(UVC_TRACE_CONTROL, "Control "
-				UVC_GUID_FORMAT "/%u would overwrite index "
-				"%d.\n", UVC_GUID_ARGS(info->entity),
-				info->selector, info->index);
+			uvc_trace(UVC_TRACE_CONTROL,
+				  "Control %pUr/%u would overwrite index %d.\n",
+				  info->entity, info->selector, info->index);
 			ret = -EEXIST;
 			goto end;
 		}
@@ -1344,10 +1340,9 @@ int uvc_ctrl_add_mapping(struct uvc_control_mapping *mapping)
 			continue;
 
 		if (info->size * 8 < mapping->size + mapping->offset) {
-			uvc_trace(UVC_TRACE_CONTROL, "Mapping '%s' would "
-				"overflow control " UVC_GUID_FORMAT "/%u\n",
-				mapping->name, UVC_GUID_ARGS(info->entity),
-				info->selector);
+			uvc_trace(UVC_TRACE_CONTROL,
+				  "Mapping '%s' would overflow control %pUr/%u\n",
+				  mapping->name, info->entity, info->selector);
 			ret = -EOVERFLOW;
 			goto end;
 		}
@@ -1366,9 +1361,9 @@ int uvc_ctrl_add_mapping(struct uvc_control_mapping *mapping)
 
 		mapping->ctrl = info;
 		list_add_tail(&mapping->list, &info->mappings);
-		uvc_trace(UVC_TRACE_CONTROL, "Adding mapping %s to control "
-			UVC_GUID_FORMAT "/%u.\n", mapping->name,
-			UVC_GUID_ARGS(info->entity), info->selector);
+		uvc_trace(UVC_TRACE_CONTROL,
+			  "Adding mapping %s to control %pUr/%u.\n",
+			  mapping->name, info->entity, info->selector);
 
 		ret = 0;
 		break;
diff --git a/drivers/media/video/uvc/uvc_driver.c b/drivers/media/video/uvc/uvc_driver.c
index 8756be5..647d0a2 100644
--- a/drivers/media/video/uvc/uvc_driver.c
+++ b/drivers/media/video/uvc/uvc_driver.c
@@ -328,11 +328,10 @@ static int uvc_parse_format(struct uvc_device *dev,
 				sizeof format->name);
 			format->fcc = fmtdesc->fcc;
 		} else {
-			uvc_printk(KERN_INFO, "Unknown video format "
-				UVC_GUID_FORMAT "\n",
-				UVC_GUID_ARGS(&buffer[5]));
+			uvc_printk(KERN_INFO, "Unknown video format %pUr\n",
+				   &buffer[5]);
 			snprintf(format->name, sizeof format->name,
-				UVC_GUID_FORMAT, UVC_GUID_ARGS(&buffer[5]));
+				 "%pUr", &buffer[5]);
 			format->fcc = 0;
 		}
 
diff --git a/drivers/media/video/uvc/uvcvideo.h b/drivers/media/video/uvc/uvcvideo.h
index e7958aa..9f4a437 100644
--- a/drivers/media/video/uvc/uvcvideo.h
+++ b/drivers/media/video/uvc/uvcvideo.h
@@ -555,16 +555,6 @@ extern unsigned int uvc_trace_param;
 #define uvc_printk(level, msg...) \
 	printk(level "uvcvideo: " msg)
 
-#define UVC_GUID_FORMAT "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-" \
-			"%02x%02x%02x%02x%02x%02x"
-#define UVC_GUID_ARGS(guid) \
-	(guid)[3],  (guid)[2],  (guid)[1],  (guid)[0], \
-	(guid)[5],  (guid)[4], \
-	(guid)[7],  (guid)[6], \
-	(guid)[8],  (guid)[9], \
-	(guid)[10], (guid)[11], (guid)[12], \
-	(guid)[13], (guid)[14], (guid)[15]
-
 /* --------------------------------------------------------------------------
  * Internal functions.
  */
diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index 4463297..56901be 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -85,11 +85,7 @@ static ssize_t uuid_show(struct gfs2_sbd *sdp, char *buf)
 	buf[0] = '\0';
 	if (!gfs2_uuid_valid(uuid))
 		return 0;
-	return snprintf(buf, PAGE_SIZE, "%02X%02X%02X%02X-%02X%02X-"
-			"%02X%02X-%02X%02X-%02X%02X%02X%02X%02X%02X\n",
-			uuid[0], uuid[1], uuid[2], uuid[3], uuid[4], uuid[5],
-			uuid[6], uuid[7], uuid[8], uuid[9], uuid[10], uuid[11],
-			uuid[12], uuid[13], uuid[14], uuid[15]);
+	return snprintf(buf, PAGE_SIZE, "%pUX\n", uuid);
 }
 
 static ssize_t freeze_show(struct gfs2_sbd *sdp, char *buf)
@@ -573,14 +569,8 @@ static int gfs2_uevent(struct kset *kset, struct kobject *kobj,
 	add_uevent_var(env, "LOCKPROTO=%s", sdp->sd_proto_name);
 	if (!sdp->sd_args.ar_spectator)
 		add_uevent_var(env, "JOURNALID=%u", sdp->sd_lockstruct.ls_jid);
-	if (gfs2_uuid_valid(uuid)) {
-		add_uevent_var(env, "UUID=%02X%02X%02X%02X-%02X%02X-%02X%02X-"
-			       "%02X%02X-%02X%02X%02X%02X%02X%02X",
-			       uuid[0], uuid[1], uuid[2], uuid[3], uuid[4],
-			       uuid[5], uuid[6], uuid[7], uuid[8], uuid[9],
-			       uuid[10], uuid[11], uuid[12], uuid[13],
-			       uuid[14], uuid[15]);
-	}
+	if (gfs2_uuid_valid(uuid))
+		add_uevent_var(env, "UUID=%pUX", uuid);
 	return 0;
 }
 
diff --git a/fs/ubifs/debug.c b/fs/ubifs/debug.c
index dbc093a..b16779e 100644
--- a/fs/ubifs/debug.c
+++ b/fs/ubifs/debug.c
@@ -350,13 +350,8 @@ void dbg_dump_node(const struct ubifs_info *c, const void *node)
 		       le32_to_cpu(sup->fmt_version));
 		printk(KERN_DEBUG "\ttime_gran      %u\n",
 		       le32_to_cpu(sup->time_gran));
-		printk(KERN_DEBUG "\tUUID           %02X%02X%02X%02X-%02X%02X"
-		       "-%02X%02X-%02X%02X-%02X%02X%02X%02X%02X%02X\n",
-		       sup->uuid[0], sup->uuid[1], sup->uuid[2], sup->uuid[3],
-		       sup->uuid[4], sup->uuid[5], sup->uuid[6], sup->uuid[7],
-		       sup->uuid[8], sup->uuid[9], sup->uuid[10], sup->uuid[11],
-		       sup->uuid[12], sup->uuid[13], sup->uuid[14],
-		       sup->uuid[15]);
+		printk(KERN_DEBUG "\tUUID           %pUX\n",
+		       sup->uuid);
 		break;
 	}
 	case UBIFS_MST_NODE:
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index 333e181..7d59ab7 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -1393,12 +1393,7 @@ static int mount_ubifs(struct ubifs_info *c)
 		c->leb_size, c->leb_size >> 10);
 	dbg_msg("data journal heads:  %d",
 		c->jhead_cnt - NONDATA_JHEADS_CNT);
-	dbg_msg("UUID:                %02X%02X%02X%02X-%02X%02X"
-	       "-%02X%02X-%02X%02X-%02X%02X%02X%02X%02X%02X",
-	       c->uuid[0], c->uuid[1], c->uuid[2], c->uuid[3],
-	       c->uuid[4], c->uuid[5], c->uuid[6], c->uuid[7],
-	       c->uuid[8], c->uuid[9], c->uuid[10], c->uuid[11],
-	       c->uuid[12], c->uuid[13], c->uuid[14], c->uuid[15]);
+	dbg_msg("UUID:                %pUX", c->uuid);
 	dbg_msg("big_lpt              %d", c->big_lpt);
 	dbg_msg("log LEBs:            %d (%d - %d)",
 		c->log_lebs, UBIFS_LOG_LNUM, c->log_last);
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 1099395..3b8e3df 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -225,16 +225,10 @@ xlog_header_check_dump(
 	xfs_mount_t		*mp,
 	xlog_rec_header_t	*head)
 {
-	int			b;
-
-	cmn_err(CE_DEBUG, "%s:  SB : uuid = ", __func__);
-	for (b = 0; b < 16; b++)
-		cmn_err(CE_DEBUG, "%02x", ((__uint8_t *)&mp->m_sb.sb_uuid)[b]);
-	cmn_err(CE_DEBUG, ", fmt = %d\n", XLOG_FMT);
-	cmn_err(CE_DEBUG, "    log : uuid = ");
-	for (b = 0; b < 16; b++)
-		cmn_err(CE_DEBUG, "%02x", ((__uint8_t *)&head->h_fs_uuid)[b]);
-	cmn_err(CE_DEBUG, ", fmt = %d\n", be32_to_cpu(head->h_fmt));
+	cmn_err(CE_DEBUG, "%s:  SB : uuid = %pU, fmt = %d\n",
+		__func__, &mp->m_sb.sb_uuid, XLOG_FMT);
+	cmn_err(CE_DEBUG, "    log : uuid = %pU, fmt = %d\n",
+		&head->h_fs_uuid, be32_to_cpu(head->h_fmt));
 }
 #else
 #define xlog_header_check_dump(mp, head)
diff --git a/include/linux/efi.h b/include/linux/efi.h
index ce4581f..dd85b39 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -280,11 +280,7 @@ efi_guidcmp (efi_guid_t left, efi_guid_t right)
 static inline char *
 efi_guid_unparse(efi_guid_t *guid, char *out)
 {
-	sprintf(out, "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-%02x%02x%02x%02x%02x%02x",
-		guid->b[3], guid->b[2], guid->b[1], guid->b[0],
-		guid->b[5], guid->b[4], guid->b[7], guid->b[6],
-		guid->b[8], guid->b[9], guid->b[10], guid->b[11],
-		guid->b[12], guid->b[13], guid->b[14], guid->b[15]);
+	sprintf(out, "%pUr", guid->b);
         return out;
 }
 
-- 
1.6.3.1.10.g659a0.dirty


* Re: [PATCH] ax25: Fix ax25_cb refcounting in ax25_ctl_ioctl
From: Ralf Baechle DL5RB @ 2009-09-27  7:23 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: David Miller, Bernard Pidoux F6BVP, Bernard Pidoux,
	Linux Netdev List, linux-hams
In-Reply-To: <20090925183504.GA3307@del.dom.local>

On Fri, Sep 25, 2009 at 08:35:04PM +0200, Jarek Poplawski wrote:

> > > This bug isn't responsible for these oopses here, but looks quite
> > > obviously. (I'm not sure if it's easy to test/hit with the common
> > > tools.)
> > 
> > The issue your patch fixes is obvious enough.
> 
> Yes, with new code there would be no doubt. But here, if you know it's
> worked for some time, you wonder if you're not blind. |-)

Most of the ioctls are used by AX.25 userland which does error
checking on user supplied values so userland will never attempt invalid
ioctl calls.  So no surprise this went unnoticed.

  Ralf


* Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
From: Michael S. Tsirkin @ 2009-09-27  7:43 UTC (permalink / raw)
  To: Ira W. Snyder
  Cc: netdev, virtualization, kvm, linux-kernel, mingo, linux-mm, akpm,
	hpa, gregory.haskins, Rusty Russell, s.hetze
In-Reply-To: <20090925170158.GA16014@ovro.caltech.edu>

On Fri, Sep 25, 2009 at 10:01:58AM -0700, Ira W. Snyder wrote:
> > +	case VHOST_SET_VRING_KICK:
> > +		r = copy_from_user(&f, argp, sizeof f);
> > +		if (r < 0)
> > +			break;
> > +		eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
> > +		if (IS_ERR(eventfp))
> > +			return PTR_ERR(eventfp);
> > +		if (eventfp != vq->kick) {
> > +			pollstop = filep = vq->kick;
> > +			pollstart = vq->kick = eventfp;
> > +		} else
> > +			filep = eventfp;
> > +		break;
> > +	case VHOST_SET_VRING_CALL:
> > +		r = copy_from_user(&f, argp, sizeof f);
> > +		if (r < 0)
> > +			break;
> > +		eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
> > +		if (IS_ERR(eventfp))
> > +			return PTR_ERR(eventfp);
> > +		if (eventfp != vq->call) {
> > +			filep = vq->call;
> > +			ctx = vq->call_ctx;
> > +			vq->call = eventfp;
> > +			vq->call_ctx = eventfp ?
> > +				eventfd_ctx_fileget(eventfp) : NULL;
> > +		} else
> > +			filep = eventfp;
> > +		break;
> > +	case VHOST_SET_VRING_ERR:
> > +		r = copy_from_user(&f, argp, sizeof f);
> > +		if (r < 0)
> > +			break;
> > +		eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
> > +		if (IS_ERR(eventfp))
> > +			return PTR_ERR(eventfp);
> > +		if (eventfp != vq->error) {
> > +			filep = vq->error;
> > +			vq->error = eventfp;
> > +			ctx = vq->error_ctx;
> > +			vq->error_ctx = eventfp ?
> > +				eventfd_ctx_fileget(eventfp) : NULL;
> > +		} else
> > +			filep = eventfp;
> > +		break;
> 
> I'm not sure how these eventfd's save a trip to userspace.
> 
> AFAICT, eventfd's cannot be used to signal another part of the kernel,
> they can only be used to wake up userspace.

Yes, they can.  See irqfd code in virt/kvm/eventfd.c.

> In my system, when an IRQ for kick() comes in, I have an eventfd which
> gets signalled to notify userspace. When I want to send a call(), I have
> to use a special ioctl(), just like lguest does.
> 
> Doesn't this mean that for call(), vhost is just going to signal an
> eventfd to wake up userspace, which is then going to call ioctl(), and
> then we're back in kernelspace. Seems like a wasted userspace
> round-trip.
> 
> Or am I mis-reading this code?

Yes. Kernel can poll eventfd and deliver an interrupt directly
without involving userspace.
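
For anyone unfamiliar with the primitive, here is a small userspace
sketch of the eventfd counter semantics that any poller relies on. It is
an illustration only, not vhost or irqfd code, and signal_and_drain() is
a hypothetical helper name. The in-kernel consumer hooks the same wait
queue instead of calling poll() from a process.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: signal an eventfd twice, check that it polls
 * readable, then drain it. Returns the drained count, or -1 on error.
 */
static long long signal_and_drain(void)
{
	uint64_t one = 1, total = 0;
	struct pollfd pfd;
	int fd = eventfd(0, EFD_NONBLOCK);

	if (fd < 0)
		return -1;

	/* Each write adds to the 64-bit counter; this is the "signal" */
	if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one) ||
	    write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one))
		goto fail;

	/* A non-zero counter makes the descriptor readable */
	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;
	if (poll(&pfd, 1, 0) != 1 || !(pfd.revents & POLLIN))
		goto fail;

	/* A read returns the accumulated count and resets it to zero */
	if (read(fd, &total, sizeof(total)) != (ssize_t)sizeof(total))
		goto fail;

	close(fd);
	return (long long)total;
fail:
	close(fd);
	return -1;
}
```

Two signals collapse into a single wakeup that reads back as 2, so the
waiter never needs one round trip per event.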

> PS - you can see my current code at:
> http://www.mmarray.org/~iws/virtio-phys/
> 
> Thanks,
> Ira
> 
> > +	default:
> > +		r = -ENOIOCTLCMD;
> > +	}
> > +
> > +	if (pollstop && vq->handle_kick)
> > +		vhost_poll_stop(&vq->poll);
> > +
> > +	if (ctx)
> > +		eventfd_ctx_put(ctx);
> > +	if (filep)
> > +		fput(filep);
> > +
> > +	if (pollstart && vq->handle_kick)
> > +		vhost_poll_start(&vq->poll, vq->kick);
> > +
> > +	mutex_unlock(&vq->mutex);
> > +
> > +	if (pollstop && vq->handle_kick)
> > +		vhost_poll_flush(&vq->poll);
> > +	return 0;
> > +}
> > +
> > +long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, unsigned long arg)
> > +{
> > +	void __user *argp = (void __user *)arg;
> > +	long r;
> > +
> > +	mutex_lock(&d->mutex);
> > +	/* If you are not the owner, you can become one */
> > +	if (ioctl == VHOST_SET_OWNER) {
> > +		r = vhost_dev_set_owner(d);
> > +		goto done;
> > +	}
> > +
> > +	/* You must be the owner to do anything else */
> > +	r = vhost_dev_check_owner(d);
> > +	if (r)
> > +		goto done;
> > +
> > +	switch (ioctl) {
> > +	case VHOST_SET_MEM_TABLE:
> > +		r = vhost_set_memory(d, argp);
> > +		break;
> > +	default:
> > +		r = vhost_set_vring(d, ioctl, argp);
> > +		break;
> > +	}
> > +done:
> > +	mutex_unlock(&d->mutex);
> > +	return r;
> > +}
> > +
> > +static const struct vhost_memory_region *find_region(struct vhost_memory *mem,
> > +						     __u64 addr, __u32 len)
> > +{
> > +	struct vhost_memory_region *reg;
> > +	int i;
> > +	/* linear search is not brilliant, but we really have on the order of 6
> > +	 * regions in practice */
> > +	for (i = 0; i < mem->nregions; ++i) {
> > +		reg = mem->regions + i;
> > +		if (reg->guest_phys_addr <= addr &&
> > +		    reg->guest_phys_addr + reg->memory_size - 1 >= addr)
> > +			return reg;
> > +	}
> > +	return NULL;
> > +}
> > +
> > +int translate_desc(struct vhost_dev *dev, u64 addr, u32 len,
> > +		   struct iovec iov[], int iov_size)
> > +{
> > +	const struct vhost_memory_region *reg;
> > +	struct vhost_memory *mem;
> > +	struct iovec *_iov;
> > +	u64 s = 0;
> > +	int ret = 0;
> > +
> > +	rcu_read_lock();
> > +
> > +	mem = rcu_dereference(dev->memory);
> > +	while ((u64)len > s) {
> > +		u64 size;
> > +		if (ret >= iov_size) {
> > +			ret = -ENOBUFS;
> > +			break;
> > +		}
> > +		reg = find_region(mem, addr, len);
> > +		if (!reg) {
> > +			ret = -EFAULT;
> > +			break;
> > +		}
> > +		_iov = iov + ret;
> > +		size = reg->memory_size - addr + reg->guest_phys_addr;
> > +		_iov->iov_len = min((u64)len, size);
> > +		_iov->iov_base = (void *)
> > +			(reg->userspace_addr + addr - reg->guest_phys_addr);
> > +		s += size;
> > +		addr += size;
> > +		++ret;
> > +	}
> > +
> > +	rcu_read_unlock();
> > +	return ret;
> > +}
> > +
> > +/* Each buffer in the virtqueues is actually a chain of descriptors.  This
> > + * function returns the next descriptor in the chain, or vq->vring.num if we're
> > + * at the end. */
> > +static unsigned next_desc(struct vhost_virtqueue *vq, struct vring_desc *desc)
> > +{
> > +	unsigned int next;
> > +
> > +	/* If this descriptor says it doesn't chain, we're done. */
> > +	if (!(desc->flags & VRING_DESC_F_NEXT))
> > +		return vq->num;
> > +
> > +	/* Check they're not leading us off end of descriptors. */
> > +	next = desc->next;
> > +	/* Make sure compiler knows to grab that: we don't want it changing! */
> > +	/* We will use the result as an index in an array, so most
> > +	 * architectures only need a compiler barrier here. */
> > +	read_barrier_depends();
> > +
> > +	if (next >= vq->num) {
> > +		vq_err(vq, "Desc next is %u > %u", next, vq->num);
> > +		return vq->num;
> > +	}
> > +
> > +	return next;
> > +}
> > +
> > +/* This looks in the virtqueue for the first available buffer, and converts
> > + * it to an iovec for convenient access.  Since descriptors consist of some
> > + * number of output then some number of input descriptors, it's actually two
> > + * iovecs, but we pack them into one and note how many of each there were.
> > + *
> > + * This function returns the descriptor number found, or vq->num (which
> > + * is never a valid descriptor number) if none was found. */
> > +unsigned vhost_get_vq_desc(struct vhost_dev *dev, struct vhost_virtqueue *vq,
> > +			   struct iovec iov[],
> > +			   unsigned int *out_num, unsigned int *in_num)
> > +{
> > +	struct vring_desc desc;
> > +	unsigned int i, head;
> > +	u16 last_avail_idx;
> > +	int ret;
> > +
> > +	/* Check it isn't doing very strange things with descriptor numbers. */
> > +	last_avail_idx = vq->last_avail_idx;
> > +	if (get_user(vq->avail_idx, &vq->avail->idx)) {
> > +		vq_err(vq, "Failed to access avail idx at %p\n",
> > +		       &vq->avail->idx);
> > +		return vq->num;
> > +	}
> > +
> > +	if ((u16)(vq->avail_idx - last_avail_idx) > vq->num) {
> > +		vq_err(vq, "Guest moved avail index from %u to %u",
> > +		       last_avail_idx, vq->avail_idx);
> > +		return vq->num;
> > +	}
> > +
> > +	/* If there's nothing new since last we looked, return invalid. */
> > +	if (vq->avail_idx == last_avail_idx)
> > +		return vq->num;
> > +
> > +	/* Grab the next descriptor number they're advertising, and increment
> > +	 * the index we've seen. */
> > +	if (get_user(head, &vq->avail->ring[last_avail_idx % vq->num])) {
> > +		vq_err(vq, "Failed to read head: idx %d address %p\n",
> > +		       last_avail_idx,
> > +		       &vq->avail->ring[last_avail_idx % vq->num]);
> > +		return vq->num;
> > +	}
> > +
> > +	/* If their number is silly, that's an error. */
> > +	if (head >= vq->num) {
> > +		vq_err(vq, "Guest says index %u > %u is available",
> > +		       head, vq->num);
> > +		return vq->num;
> > +	}
> > +
> > +	vq->last_avail_idx++;
> > +
> > +	/* When we start there are neither input nor output descriptors. */
> > +	*out_num = *in_num = 0;
> > +
> > +	i = head;
> > +	do {
> > +		unsigned iov_count = *in_num + *out_num;
> > +		if (copy_from_user(&desc, vq->desc + i, sizeof desc)) {
> > +			vq_err(vq, "Failed to get descriptor: idx %d addr %p\n",
> > +			       i, vq->desc + i);
> > +			return vq->num;
> > +		}
> > +		ret = translate_desc(dev, desc.addr, desc.len, iov + iov_count,
> > +				     VHOST_NET_MAX_SG - iov_count);
> > +		if (ret < 0) {
> > +			vq_err(vq, "Translation failure %d descriptor idx %d\n",
> > +			       ret, i);
> > +			return vq->num;
> > +		}
> > +		/* If this is an input descriptor, increment that count. */
> > +		if (desc.flags & VRING_DESC_F_WRITE)
> > +			*in_num += ret;
> > +		else {
> > +			/* If it's an output descriptor, they're all supposed
> > +			 * to come before any input descriptors. */
> > +			if (*in_num) {
> > +				vq_err(vq, "Descriptor has out after in: "
> > +				       "idx %d\n", i);
> > +				return vq->num;
> > +			}
> > +			*out_num += ret;
> > +		}
> > +	} while ((i = next_desc(vq, &desc)) != vq->num);
> > +	return head;
> > +}
> > +
> > +/* Reverse the effect of vhost_get_vq_desc. Useful for error handling. */
> > +void vhost_discard_vq_desc(struct vhost_virtqueue *vq)
> > +{
> > +	vq->last_avail_idx--;
> > +}
> > +
> > +/* After we've used one of their buffers, we tell them about it.  We'll then
> > + * want to send them an interrupt, using vq->call. */
> > +int vhost_add_used(struct vhost_virtqueue *vq,
> > +			  unsigned int head, int len)
> > +{
> > +	struct vring_used_elem *used;
> > +
> > +	/* The virtqueue contains a ring of used buffers.  Get a pointer to the
> > +	 * next entry in that used ring. */
> > +	used = &vq->used->ring[vq->last_used_idx % vq->num];
> > +	if (put_user(head, &used->id)) {
> > +		vq_err(vq, "Failed to write used id");
> > +		return -EFAULT;
> > +	}
> > +	if (put_user(len, &used->len)) {
> > +		vq_err(vq, "Failed to write used len");
> > +		return -EFAULT;
> > +	}
> > +	/* Make sure buffer is written before we update index. */
> > +	wmb();
> > +	if (put_user(vq->last_used_idx + 1, &vq->used->idx)) {
> > +		vq_err(vq, "Failed to increment used idx");
> > +		return -EFAULT;
> > +	}
> > +	vq->last_used_idx++;
> > +	return 0;
> > +}
> > +
> > +/* This actually sends the interrupt for this virtqueue */
> > +void vhost_trigger_irq(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> > +{
> > +	__u16 flags = 0;
> > +	if (get_user(flags, &vq->avail->flags)) {
> > +		vq_err(vq, "Failed to get flags");
> > +		return;
> > +	}
> > +
> > +	/* If they don't want an interrupt, don't send one, unless empty. */
> > +	if ((flags & VRING_AVAIL_F_NO_INTERRUPT) &&
> > +	    (!vhost_has_feature(dev, VIRTIO_F_NOTIFY_ON_EMPTY) ||
> > +	     vq->avail_idx != vq->last_avail_idx))
> > +		return;
> > +
> > +	/* Send the Guest an interrupt to tell them we used something up. */
> > +	if (vq->call_ctx)
> > +		eventfd_signal(vq->call_ctx, 1);
> > +}
> > +
> > +/* And here's the combo meal deal.  Supersize me! */
> > +void vhost_add_used_and_trigger(struct vhost_dev *dev,
> > +				struct vhost_virtqueue *vq,
> > +				unsigned int head, int len)
> > +{
> > +	vhost_add_used(vq, head, len);
> > +	vhost_trigger_irq(dev, vq);
> > +}
> > +
> > +/* OK, now we need to know about added descriptors. */
> > +bool vhost_notify(struct vhost_virtqueue *vq)
> > +{
> > +	int r;
> > +	if (!(vq->used_flags & VRING_USED_F_NO_NOTIFY))
> > +		return false;
> > +	vq->used_flags &= ~VRING_USED_F_NO_NOTIFY;
> > +	r = put_user(vq->used_flags, &vq->used->flags);
> > +	if (r)
> > +		vq_err(vq, "Failed to enable notification: %d\n", r);
> > +	/* They could have slipped one in as we were doing that: make
> > +	 * sure it's written, tell caller it needs to check again. */
> > +	mb();
> > +	return true;
> > +}
> > +
> > +/* We don't need to be notified again. */
> > +void vhost_no_notify(struct vhost_virtqueue *vq)
> > +{
> > +	int r;
> > +	if (vq->used_flags & VRING_USED_F_NO_NOTIFY)
> > +		return;
> > +	vq->used_flags |= VRING_USED_F_NO_NOTIFY;
> > +	r = put_user(vq->used_flags, &vq->used->flags);
> > +	if (r)
> > +		vq_err(vq, "Failed to disable notification: %d\n", r);
> > +}
> > +
> > +int vhost_init(void)
> > +{
> > +	vhost_workqueue = create_workqueue("vhost");
> > +	if (!vhost_workqueue)
> > +		return -ENOMEM;
> > +	return 0;
> > +}
> > +
> > +void vhost_cleanup(void)
> > +{
> > +	destroy_workqueue(vhost_workqueue);
> > +}
> > diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> > new file mode 100644
> > index 0000000..8e13d06
> > --- /dev/null
> > +++ b/drivers/vhost/vhost.h
> > @@ -0,0 +1,122 @@
> > +#ifndef _VHOST_H
> > +#define _VHOST_H
> > +
> > +#include <linux/eventfd.h>
> > +#include <linux/vhost.h>
> > +#include <linux/mm.h>
> > +#include <linux/mutex.h>
> > +#include <linux/workqueue.h>
> > +#include <linux/poll.h>
> > +#include <linux/file.h>
> > +#include <linux/skbuff.h>
> > +#include <linux/uio.h>
> > +#include <linux/virtio_config.h>
> > +
> > +struct vhost_dev;
> > +
> > +enum {
> > +	VHOST_NET_MAX_SG = MAX_SKB_FRAGS + 2,
> > +};
> > +
> > +/* Poll a file (eventfd or socket) */
> > +/* Note: there's nothing vhost specific about this structure. */
> > +struct vhost_poll {
> > +	poll_table                table;
> > +	wait_queue_head_t        *wqh;
> > +	wait_queue_t              wait;
> > +	/* struct which will handle all actual work. */
> > +	struct work_struct        work;
> > +	unsigned long		  mask;
> > +};
> > +
> > +void vhost_poll_init(struct vhost_poll *poll, work_func_t func,
> > +		     unsigned long mask);
> > +void vhost_poll_start(struct vhost_poll *poll, struct file *file);
> > +void vhost_poll_stop(struct vhost_poll *poll);
> > +void vhost_poll_flush(struct vhost_poll *poll);
> > +
> > +/* The virtqueue structure describes a queue attached to a device. */
> > +struct vhost_virtqueue {
> > +	struct vhost_dev *dev;
> > +
> > +	/* The actual ring of buffers. */
> > +	struct mutex mutex;
> > +	unsigned int num;
> > +	struct vring_desc __user *desc;
> > +	struct vring_avail __user *avail;
> > +	struct vring_used __user *used;
> > +	struct file *kick;
> > +	struct file *call;
> > +	struct file *error;
> > +	struct eventfd_ctx *call_ctx;
> > +	struct eventfd_ctx *error_ctx;
> > +
> > +	struct vhost_poll poll;
> > +
> > +	/* The routine to call when the Guest pings us, or timeout. */
> > +	work_func_t handle_kick;
> > +
> > +	/* Last available index we saw. */
> > +	u16 last_avail_idx;
> > +
> > +	/* Caches available index value from user. */
> > +	u16 avail_idx;
> > +
> > +	/* Last index we used. */
> > +	u16 last_used_idx;
> > +
> > +	/* Used flags */
> > +	u16 used_flags;
> > +
> > +	struct iovec iov[VHOST_NET_MAX_SG];
> > +	struct iovec hdr[VHOST_NET_MAX_SG];
> > +};
> > +
> > +struct vhost_dev {
> > +	/* Readers use RCU to access memory table pointer.
> > +	 * Writers use mutex below.*/
> > +	struct vhost_memory *memory;
> > +	struct mm_struct *mm;
> > +	struct vhost_virtqueue *vqs;
> > +	int nvqs;
> > +	struct mutex mutex;
> > +	unsigned acked_features;
> > +};
> > +
> > +long vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue *vqs, int nvqs);
> > +long vhost_dev_check_owner(struct vhost_dev *);
> > +long vhost_dev_reset_owner(struct vhost_dev *);
> > +void vhost_dev_cleanup(struct vhost_dev *);
> > +long vhost_dev_ioctl(struct vhost_dev *, unsigned int ioctl, unsigned long arg);
> > +
> > +unsigned vhost_get_vq_desc(struct vhost_dev *, struct vhost_virtqueue *,
> > +			   struct iovec iov[],
> > +			   unsigned int *out_num, unsigned int *in_num);
> > +void vhost_discard_vq_desc(struct vhost_virtqueue *);
> > +
> > +int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len);
> > +void vhost_trigger_irq(struct vhost_dev *, struct vhost_virtqueue *);
> > +void vhost_add_used_and_trigger(struct vhost_dev *, struct vhost_virtqueue *,
> > +				unsigned int head, int len);
> > +void vhost_no_notify(struct vhost_virtqueue *);
> > +bool vhost_notify(struct vhost_virtqueue *);
> > +
> > +int vhost_init(void);
> > +void vhost_cleanup(void);
> > +
> > +#define vq_err(vq, fmt, ...) do {                                  \
> > +		pr_debug(pr_fmt(fmt), ##__VA_ARGS__);       \
> > +		if ((vq)->error_ctx)                               \
> > +				eventfd_signal((vq)->error_ctx, 1);\
> > +	} while (0)
> > +
> > +enum {
> > +	VHOST_FEATURES = 1 << VIRTIO_F_NOTIFY_ON_EMPTY,
> > +};
> > +
> > +static inline int vhost_has_feature(struct vhost_dev *dev, int bit)
> > +{
> > +	return dev->acked_features & (1 << bit);
> > +}
> > +
> > +#endif
> > diff --git a/include/linux/Kbuild b/include/linux/Kbuild
> > index dec2f18..975df9a 100644
> > --- a/include/linux/Kbuild
> > +++ b/include/linux/Kbuild
> > @@ -360,6 +360,7 @@ unifdef-y += uio.h
> >  unifdef-y += unistd.h
> >  unifdef-y += usbdevice_fs.h
> >  unifdef-y += utsname.h
> > +unifdef-y += vhost.h
> >  unifdef-y += videodev2.h
> >  unifdef-y += videodev.h
> >  unifdef-y += virtio_config.h
> > diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
> > index 0521177..781a8bb 100644
> > --- a/include/linux/miscdevice.h
> > +++ b/include/linux/miscdevice.h
> > @@ -30,6 +30,7 @@
> >  #define HPET_MINOR		228
> >  #define FUSE_MINOR		229
> >  #define KVM_MINOR		232
> > +#define VHOST_NET_MINOR		233
> >  #define MISC_DYNAMIC_MINOR	255
> >  
> >  struct device;
> > diff --git a/include/linux/vhost.h b/include/linux/vhost.h
> > new file mode 100644
> > index 0000000..3f441a9
> > --- /dev/null
> > +++ b/include/linux/vhost.h
> > @@ -0,0 +1,101 @@
> > +#ifndef _LINUX_VHOST_H
> > +#define _LINUX_VHOST_H
> > +/* Userspace interface for in-kernel virtio accelerators. */
> > +
> > +/* vhost is used to reduce the number of system calls involved in virtio.
> > + *
> > + * Existing virtio net code is used in the guest without modification.
> > + *
> > + * This header includes interface used by userspace hypervisor for
> > + * device configuration.
> > + */
> > +
> > +#include <linux/types.h>
> > +#include <linux/compiler.h>
> > +#include <linux/ioctl.h>
> > +#include <linux/virtio_config.h>
> > +#include <linux/virtio_ring.h>
> > +
> > +struct vhost_vring_state {
> > +	unsigned int index;
> > +	unsigned int num;
> > +};
> > +
> > +struct vhost_vring_file {
> > +	unsigned int index;
> > +	int fd;
> > +};
> > +
> > +struct vhost_vring_addr {
> > +	unsigned int index;
> > +	unsigned int padding;
> > +	__u64 user_addr;
> > +};
> > +
> > +struct vhost_memory_region {
> > +	__u64 guest_phys_addr;
> > +	__u64 memory_size; /* bytes */
> > +	__u64 userspace_addr;
> > +	__u64 padding; /* read/write protection? */
> > +};
> > +
> > +struct vhost_memory {
> > +	__u32 nregions;
> > +	__u32 padding;
> > +	struct vhost_memory_region regions[0];
> > +};
> > +
> > +/* ioctls */
> > +
> > +#define VHOST_VIRTIO 0xAF
> > +
> > +/* Features bitmask for forward compatibility.  Transport bits are used for
> > + * vhost specific features. */
> > +#define VHOST_GET_FEATURES	_IOR(VHOST_VIRTIO, 0x00, __u64)
> > +#define VHOST_ACK_FEATURES	_IOW(VHOST_VIRTIO, 0x00, __u64)
> > +
> > +/* Set current process as the (exclusive) owner of this file descriptor.  This
> > + * must be called before any other vhost command.  Further calls to
> > + * VHOST_SET_OWNER fail until VHOST_RESET_OWNER is called. */
> > +#define VHOST_SET_OWNER _IO(VHOST_VIRTIO, 0x01)
> > +/* Give up ownership, and reset the device to default values.
> > + * Allows subsequent call to VHOST_SET_OWNER to succeed. */
> > +#define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02)
> > +
> > +/* Set up/modify memory layout */
> > +#define VHOST_SET_MEM_TABLE	_IOW(VHOST_VIRTIO, 0x03, struct vhost_memory)
> > +
> > +/* Ring setup. These parameters can not be modified while ring is running
> > + * (bound to a device). */
> > +/* Set number of descriptors in ring */
> > +#define VHOST_SET_VRING_NUM _IOW(VHOST_VIRTIO, 0x10, struct vhost_vring_state)
> > +/* Start of array of descriptors (virtually contiguous) */
> > +#define VHOST_SET_VRING_DESC _IOW(VHOST_VIRTIO, 0x11, struct vhost_vring_addr)
> > +/* Used structure address */
> > +#define VHOST_SET_VRING_USED _IOW(VHOST_VIRTIO, 0x12, struct vhost_vring_addr)
> > +/* Available structure address */
> > +#define VHOST_SET_VRING_AVAIL _IOW(VHOST_VIRTIO, 0x13, struct vhost_vring_addr)
> > +/* Base value where queue looks for available descriptors */
> > +#define VHOST_SET_VRING_BASE _IOW(VHOST_VIRTIO, 0x14, struct vhost_vring_state)
> > +/* Get accessor: reads index, writes value in num */
> > +#define VHOST_GET_VRING_BASE _IOWR(VHOST_VIRTIO, 0x14, struct vhost_vring_state)
> > +
> > +/* The following ioctls use eventfd file descriptors to signal and poll
> > + * for events. */
> > +
> > +/* Set eventfd to poll for added buffers */
> > +#define VHOST_SET_VRING_KICK _IOW(VHOST_VIRTIO, 0x20, struct vhost_vring_file)
> > +/* Set eventfd to signal when buffers have been used */
> > +#define VHOST_SET_VRING_CALL _IOW(VHOST_VIRTIO, 0x21, struct vhost_vring_file)
> > +/* Set eventfd to signal an error */
> > +#define VHOST_SET_VRING_ERR _IOW(VHOST_VIRTIO, 0x22, struct vhost_vring_file)
> > +
> > +/* VHOST_NET specific defines */
> > +
> > +/* Attach virtio net device to a raw socket. The socket must be already
> > + * bound to an ethernet device, this device will be used for transmit.
> > + * Pass -1 to unbind from the socket and the transmit device.
> > + * This can be used to stop the device (e.g. for migration). */
> > +#define VHOST_NET_SET_SOCKET _IOW(VHOST_VIRTIO, 0x30, int)
> > +
> > +#endif
> > -- 
> > 1.6.2.5
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
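
For reference, the index arithmetic in the quoted vhost_get_vq_desc()
can be looked at in isolation.  The sketch below is only an
illustration with made-up names (ring_pending, ring_idx_sane,
ring_slot), assuming free-running 16-bit indices as in the patch: the
cast to a 16-bit type keeps the distance correct across wraparound,
and the slot is the index modulo the ring size.

```c
#include <assert.h>
#include <stdint.h>

/* Entries published since we last looked, modulo 2^16.  The cast makes
 * the subtraction wrap correctly when avail_idx has overflowed past
 * 65535 and last_seen has not yet. */
static uint16_t ring_pending(uint16_t avail_idx, uint16_t last_seen)
{
	return (uint16_t)(avail_idx - last_seen);
}

/* A guest cannot have more than ring_size entries outstanding, so a
 * larger distance means the index moved in an impossible way. */
static int ring_idx_sane(uint16_t avail_idx, uint16_t last_seen,
			 unsigned int ring_size)
{
	return ring_pending(avail_idx, last_seen) <= ring_size;
}

/* Map a free-running index onto a slot in the ring array. */
static unsigned int ring_slot(uint16_t idx, unsigned int ring_size)
{
	return idx % ring_size;
}
```

With a 256-entry ring, moving from index 65534 to index 2 is a
distance of 4 and passes the check, while a jump from 0 to 1000 does
not.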


^ permalink raw reply

* Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
From: Avi Kivity @ 2009-09-27  9:43 UTC (permalink / raw)
  To: Gregory Haskins
  Cc: Ira W. Snyder, Michael S. Tsirkin, netdev, virtualization, kvm,
	linux-kernel, mingo, linux-mm, akpm, hpa, Rusty Russell, s.hetze,
	alacrityvm-devel
In-Reply-To: <4ABD36E3.9070503@gmail.com>

On 09/26/2009 12:32 AM, Gregory Haskins wrote:
>>>
>>> I realize in retrospect that my choice of words above implies vbus _is_
>>> complete, but this is not what I was saying.  What I was trying to
>>> convey is that vbus is _more_ complete.  Yes, in either case some kind
>>> of glue needs to be written.  The difference is that vbus implements
>>> more of the glue generally, and leaves less required to be customized
>>> for each iteration.
>>>
>>>        
>>
>> No argument there.  Since you care about non-virt scenarios and virtio
>> doesn't, naturally vbus is a better fit for them as the code stands.
>>      
> Thanks for finally starting to acknowledge there's a benefit, at least.
>    

I think I've mentioned vbus' finer grained layers as helpful here, 
though I doubt the value of this.  Hypervisors are added rarely, while 
devices and drivers are added (and modified) much more often.  I don't 
buy the anything-to-anything promise.

> To be more precise, IMO virtio is designed to be a performance oriented
> ring-based driver interface that supports all types of hypervisors (e.g.
> shmem based kvm, and non-shmem based Xen).  vbus is designed to be a
> high-performance generic shared-memory interconnect (for rings or
> otherwise) framework for environments where linux is the underpinning
> "host" (physical or virtual).  They are distinctly different, but
> complementary (the former addresses the part of the front-end, and
> latter addresses the back-end, and a different part of the front-end).
>    

They're not truly complementary since they're incompatible.  A 2.6.27 
guest, or Windows guest with the existing virtio drivers, won't work 
over vbus.  Further, non-shmem virtio can't work over vbus.  Since 
virtio is guest-oriented and host-agnostic, it can't ignore 
non-shared-memory hosts (even though it's unlikely virtio will be 
adopted there).

> In addition, the kvm-connector used in AlacrityVM's design strives to
> add value and improve performance via other mechanisms, such as dynamic
>   allocation, interrupt coalescing (thus reducing exit-ratio, which is a
> serious issue in KVM)

Do you have measurements of inter-interrupt coalescing rates (excluding 
intra-interrupt coalescing)?

> and prioritizable/nestable signals.
>    

That doesn't belong in a bus.

> Today there is a large performance disparity between what a KVM guest
> sees and what a native linux application sees on that same host.  Just
> take a look at some of my graphs between "virtio", and "native", for
> example:
>
> http://developer.novell.com/wiki/images/b/b7/31-rc4_throughput.png
>    

That's a red herring.  The problem is not with virtio as an ABI, but 
with its implementation in userspace.  vhost-net should offer equivalent 
performance to vbus.

> A dominant vbus design principle is to try to achieve the same IO
> performance for all "linux applications" whether they be literally
> userspace applications, or things like KVM vcpus or Ira's physical
> boards.  It also aims to solve problems not previously expressible with
> current technologies (even virtio), like nested real-time.
>
> And even though you repeatedly insist otherwise, the neat thing here is
> that the two technologies mesh (at least under certain circumstances,
> like when virtio is deployed on a shared-memory friendly linux backend
> like KVM).  I hope that my stack diagram below depicts that clearly.
>    

Right, when you ignore the points where they don't fit, it's a perfect mesh.

>> But that's not a strong argument for vbus; instead of adding vbus you
>> could make virtio more friendly to non-virt
>>      
> Actually, it _is_ a strong argument then because adding vbus is what
> helps make virtio friendly to non-virt, at least for when performance
> matters.
>    

As vhost-net shows, you can do that without vbus and without breaking 
compatibility.



>> Right.  virtio assumes that it's in a virt scenario and that the guest
>> architecture already has enumeration and hotplug mechanisms which it
>> would prefer to use.  That happens to be the case for kvm/x86.
>>      
> No, virtio doesn't assume that.  Its stack provides the "virtio-bus"
> abstraction and what it does assume is that it will be wired up to
> something underneath. Kvm/x86 conveniently has pci, so the virtio-pci
> adapter was created to reuse much of that facility.  For other things
> like lguest and s390, something new had to be created underneath to make
> up for the lack of pci-like support.
>    

Right, I was wrong there.  But it does allow you to have a 1:1 mapping 
between native devices and virtio devices.


>>> So to answer your question, the difference is that the part that has to
>>> be customized in vbus should be a fraction of what needs to be
>>> customized with vhost because it defines more of the stack.
>>>        
>> But if you want to use the native mechanisms, vbus doesn't have any
>> added value.
>>      
> First of all, that's incorrect.  If you want to use the "native"
> mechanisms (via the way the vbus-connector is implemented, for instance)
> you at least still have the benefit that the backend design is more
> broadly re-useable in more environments (like non-virt, for instance),
> because vbus does a proper job of defining the requisite
> layers/abstractions compared to vhost.  So it adds value even in that
> situation.
>    

Maybe.  If vhost-net isn't sufficient I'm sure there will be patches sent.

> Second of all, with PV there is no such thing as "native".  It's
> software so it can be whatever we want.  Sure, you could argue that the
> guest may have built-in support for something like PCI protocol.
> However, PCI protocol itself isn't suitable for high-performance PV out
> of the can.  So you will therefore invariably require new software
> layers on top anyway, even if part of the support is already included.
>    

Of course there is such a thing as native, a pci-ready guest has tons of 
support built into it that doesn't need to be retrofitted.  Since 
practically everyone (including Xen) does their paravirt drivers atop 
pci, the claim that pci isn't suitable for high performance is incorrect.


> And lastly, why would you _need_ to use the so called "native"
> mechanism?  The short answer is, "you don't".  Any given system (guest
> or bare-metal) already has a wide range of buses (try running "tree
> /sys/bus" in Linux).  More importantly, the concept of adding new buses
> is widely supported in both the Windows and Linux driver model (and
> probably any other guest-type that matters).  Therefore, despite claims
> to the contrary, it's not hard or even unusual to add a new bus to the mix.
>    

The short answer is "compatibility".


> In summary, vbus is simply one more bus of many, purpose built to
> support high-end IO in a virt-like model, giving controlled access to
> the linux-host underneath it.  You can write a high-performance layer
> below the OS bus-model (vbus), or above it (virtio-pci) but either way
> you are modifying the stack to add these capabilities, so we might as
> well try to get this right.
>
> With all due respect, you are making a big deal out of a minor issue.
>    

It's not minor to me.

>>> And, as
>>> alluded to in my diagram, both virtio-net and vhost (with some
>>> modifications to fit into the vbus framework) are potentially
>>> complementary, not competitors.
>>>
>>>        
>> Only theoretically.  The existing installed base would have to be thrown
>> away
>>      
> "Thrown away" is pure hyperbole.  The installed base, worst case, needs
> to load a new driver for a missing device.

Yes, we all know how fun this is.  Especially if the device changed is 
your boot disk.  You may not care about the pain caused to users, but I 
do, so I will continue to insist on compatibility.

>> or we'd need to support both.
>>
>>
>>      
> No matter what model we talk about, there's always going to be a "both"
> since the userspace virtio models are probably not going to go away (nor
> should they).
>    

virtio allows you to have userspace-only, kernel-only, or 
start-with-userspace-and-move-to-kernel-later, all transparent to the 
guest.  In many cases we'll stick with userspace-only.

>> All this is after kvm has decoded that vbus is addressed.  It can't work
>> without someone outside vbus deciding that.
>>      
> How the connector message is delivered is really not relevant.  Some
> architectures will simply deliver the message point-to-point (like the
> original hypercall design for KVM, or something like Ira's rig), and
> some will need additional demuxing (like pci-bridge/pio based KVM).
> It's an implementation detail of the connector.
>
> However, the real point here is that something needs to establish a
> scoped namespace mechanism, add items to that namespace, and advertise
> the presence of the items to the guest.  vbus has this facility built in
> to its stack.  vhost doesn't, so it must come from elsewhere.
>    

So we have: vbus needs a connector, vhost needs a connector.  vbus 
doesn't need userspace to program the addresses (but does need userspace 
to instantiate the devices and to program the bus address decode), vhost 
needs userspace to instantiate the devices and program the addresses.

>>> In fact, it's actually a simpler design to unify things this way because
>>> you avoid splitting the device model up. Consider how painful the vhost
>>> implementation would be if it didn't already have the userspace
>>> virtio-net to fall-back on.  This is effectively what we face for new
>>> devices going forward if that model is to persist.
>>>
>>>        
>>
>> It doesn't have just virtio-net, it has userspace-based hotplug
>>      
> vbus has hotplug too: mkdir and rmdir
>    

Does that work from nonprivileged processes?  Does it work on Windows?

> As an added bonus, its device-model is modular.  A developer can write a
> new device model, compile it, insmod it to the host kernel, hotplug it
> to the running guest with mkdir/ln, and then come back out again
> (hotunplug with rmdir, rmmod, etc).  They may do this all without taking
> the guest down, and while eating QEMU based IO solutions for breakfast
> performance wise.
>
> Afaict, qemu can't do either of those things.
>    

We've seen that herring before, and it's redder than ever.



>> Refactor instead of duplicating.
>>      
> There is no duplicating.  vbus has no equivalent today as virtio doesn't
> define these layers.
>    

So define them if they're missing.


>>>
>>>        
>>>>    Use libraries (virtio-shmem.ko, libvhost.so).
>>>>
>>>>          
>>> What do you suppose vbus is?  vbus-proxy.ko = virtio-shmem.ko, and you
>>> don't need libvhost.so per se since you can just use standard kernel
>>> interfaces (like configfs/sysfs).  I could create an .so going forward
>>> for the new ioctl-based interface, I suppose.
>>>
>>>        
>> Refactor instead of rewriting.
>>      
> There is no rewriting.  vbus has no equivalent today as virtio doesn't
> define these layers.
>
> By your own admission, you said if you wanted that capability, use a
> library.  What I think you are not understanding is vbus _is_ that
> library.  So what is the problem, exactly?
>    

It's not compatible.  If you were truly worried about code duplication 
in virtio, you'd refactor it to remove the duplication, without 
affecting existing guests.

>>>> For kvm/x86 pci definitely remains king.
>>>>
>>>>          
>>> For full virtualization, sure.  I agree.  However, we are talking about
>>> PV here.  For PV, PCI is not a requirement and is a technical dead-end
>>> IMO.
>>>
>>> KVM seems to be the only virt solution that thinks otherwise (*), but I
>>> believe that is primarily a condition of its maturity.  I aim to help
>>> advance things here.
>>>
>>> (*) citation: xen has xenbus, lguest has lguest-bus, vmware has some
>>> vmi-esque thing (I forget what it's called) to name a few.  Love 'em or
>>> hate 'em, most other hypervisors do something along these lines.  I'd
>>> like to try to create one for KVM, but to unify them all (at least for
>>> the Linux-based host designs).
>>>
>>>        
>> VMware are throwing VMI away (won't be supported in their new product,
>> and they've sent a patch to rip it out of Linux);
>>      
> vmware only cares about x86 iiuc, so probably not a good example.
>    

Well, you brought it up.  Between you and me, I only care about x86 too.

>> Xen has to tunnel
>> xenbus in pci for full virtualization (which is where Windows is, and
>> where Linux will be too once people realize it's faster).  lguest is
>> meant as an example hypervisor, not an attempt to take over the world.
>>      
> So pick any other hypervisor, and the situation is often similar.
>    

The situation is often pci.

>
>> An right now you can have a guest using pci to access a mix of
>> userspace-emulated devices, userspace-emulated-but-kernel-accelerated
>> virtio devices, and real host devices.  All on one dead-end bus.  Try
>> that with vbus.
>>      
> vbus is not interested in userspace devices.  The charter is to provide
> facilities for utilizing the host linux kernel's IO capabilities in the
> most efficient, yet safe, manner possible.  Those devices that fit
> outside that charter can ride on legacy mechanisms if that suits them best.
>    

vbus isn't, but I am.  I would prefer not to have to expose 
implementation decisions (kernel vs userspace) to the guest (vbus vs pci).

>>> That won't cut it.  For one, creating an eventfd is only part of the
>>> equation.  I.e. you need to have originate/terminate somewhere
>>> interesting (and in-kernel, otherwise use tuntap).
>>>
>>>        
>> vbus needs the same thing so it cancels out.
>>      
> No, it does not.  vbus just needs a relatively simple single message
> pipe between the guest and host (think "hypercall tunnel", if you will).
>    

That's ioeventfd.  So far so similar.

>   Per queue/device addressing is handled by the same conceptual namespace
> as the one that would trigger eventfds in the model you mention.  And
> that namespace is built in to the vbus stack, and objects are registered
> automatically as they are created.
>
> Contrast that to vhost, which requires some other kernel interface to
> exist, and to be managed manually for each object that is created.  Your
> libvhostconfig would need to somehow know how to perform this
> registration operation, and there would have to be something in the
> kernel to receive it, presumably on a per platform basis.  Solving this
> problem generally would probably end up looking eerily like vbus,
> because thats what vbus does.
>    

vbus devices aren't magically instantiated.  Userspace needs to 
instantiate them too.  Sure, there's less work on the host side since 
you're using vbus instead of the native interface, but more work on the 
guest side since you're using vbus instead of the native interface.



>> Well, let's see.  Can vbus today:
>>
>> - let userspace know which features are available (so it can decide if
>> live migration is possible)
>>      
> yes, its in sysfs.
>
>    
>> - let userspace limit which features are exposed to the guest (so it can
>> make live migration possible among hosts of different capabilities)
>>      
> yes, its in sysfs.
>    

Per-device?  non-privileged-user capable?

>> - let userspace know which features were negotiated (so it can transfer
>> them to the other host during live migration)
>>      
> no, but we can easily add ->save()/->restore() to the model going
> forward, and the negotiated features are just a subcomponent if its
> serialized stream.
>
>    
>> - let userspace tell the kernel which features were negotiated (when
>> live migration completes, to avoid requiring the guest to re-negotiate)
>>      
> that would be the function of the ->restore() deserializer.
>
>    
>> - do all that from an unprivileged process
>>      
> yes, in the upcoming alacrityvm v0.3 with the ioctl based control plane.
>    

Ah, so you have two control planes.

> Bottom line: vbus isn't done, especially w.r.t. live-migration..but that
> is not an valid argument against the idea if you believe in
> release-early/release-often. kvm wasn't (isn't) done either when it was
> proposed/merged.
>
>    

kvm didn't have an existing counterpart in Linux when it was 
proposed/merged.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply

* Re: [PATCH] /proc/net/tcp, overhead removed
From: Eric Dumazet @ 2009-09-27  9:53 UTC (permalink / raw)
  To: Yakov Lerner
  Cc: linux-kernel, netdev, davem, kuznet, pekkas, jmorris, yoshfuji,
	kaber, torvalds
In-Reply-To: <1254000675-8327-1-git-send-email-iler.ml@gmail.com>

Yakov Lerner a écrit :
> /proc/net/tcp does 20,000 sockets in 60-80 milliseconds, with this patch.
> 
> The overhead was in tcp_seq_start(). See analysis (3) below.
> The patch is against Linus git tree (1). The patch is small.
> 
> ------------  -----------   ------------------------------------
> Before patch  After patch   20,000 sockets (10,000 tw + 10,000 estab)(2)
> ------------  -----------   ------------------------------------
> 6 sec          0.06 sec     dd bs=1k if=/proc/net/tcp >/dev/null 
> 1.5 sec        0.06 sec     dd bs=4k if=/proc/net/tcp >/dev/null
> 
> 1.9 sec        0.16 sec     netstat -4ant >/dev/null
> ------------  -----------   ------------------------------------
> 
> This is ~ x25 improvement.
> The new time is not dependent on read blockize.
> Speed of netstat, naturally, improves, too; both -4 and -6.
> /proc/net/tcp6 does 20,000 sockets in 100 millisec.
> 
> (1) against git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
> 
> (2) Used 'manysock' utility to stress system with large number of sockets:
>   "manysock 10000 10000"    - 10,000 tw + 10,000 estab ip4 sockets.
>   "manysock -6 10000 10000" - 10,000 tw + 10,000 estab ip6 sockets.
> Found at http://ilerner.3b1.org/manysock/manysock.c
> 
> (3) Algorithmic analysis. 
>     Old algorithm.
> 
> During 'cat </proc/net/tcp', tcp_seq_start() is called O(numsockets) times (4).
> On average, every call to tcp_seq_start() scans half the whole hashtable. Ouch.
> This is O(numsockets * hashsize). 95-99% of 'cat </proc/net/tcp' is spent in
> tcp_seq_start()->tcp_get_idx. This overhead is eliminated by new algorithm,
> which is O(numsockets + hashsize).
> 
>     New algorithm.
> 
> New algorithms is O(numsockets + hashsize). We jump to the right
> hash bucket in tcp_seq_start(), without scanning half the hash.
> To jump right to the hash bucket corresponding to *pos in tcp_seq_start(),
> we reuse three pieces of state (st->num, st->bucket, st->sbucket)
> as follows:
>  - we check that requested pos >= last seen pos (st->num), the typical case. 
>  - if so, we jump to bucket st->bucket
>  - to arrive to the right item after beginning of st->bucket, we
> keep in st->sbucket the position corresponding to the beginning of
> bucket.
> 
> (4) Explanation of O( numsockets * hashsize) of old algorithm.
> 
> tcp_seq_start() is called once for every ~7 lines of netstat output 
> if readsize is 1kb, or once for every ~28 lines if readsize >= 4kb.
> Since record length of /proc/net/tcp records is 150 bytes, formula for
> number of calls to tcp_seq_start() is
>             (numsockets * 150 / min(4096,readsize)).
> Netstat uses 4kb readsize (newer versions), or 1kb (older versions).
> Note that speed of old algorithm does not improve above 4kb blocksize.
> 
> Speed of the new algorithm does not depend on blocksize.
> 
> Speed of the new algorithm does not perceptibly depend on hashsize (which
> depends on ramsize). Speed of old algorithm drops with bigger hashsize.
> 
> (5) Reporting order.
> 
> Reporting order is exactly same as before if hash does not change underfoot.
> When hash elements come and go during report, reporting order will be
> same as that of tcpdiag.
> 
> Signed-off-by: Yakov Lerner <iler.ml@gmail.com>
> ---
>  net/ipv4/tcp_ipv4.c |   26 ++++++++++++++++++++++++--
>  1 files changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 7cda24b..7d9421a 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1994,13 +1994,14 @@ static inline int empty_bucket(struct tcp_iter_state *st)
>  		hlist_nulls_empty(&tcp_hashinfo.ehash[st->bucket].twchain);
>  }
>  
> -static void *established_get_first(struct seq_file *seq)
> +static void *established_get_first_after(struct seq_file *seq, int bucket)
>  {
>  	struct tcp_iter_state *st = seq->private;
>  	struct net *net = seq_file_net(seq);
>  	void *rc = NULL;
>  
> -	for (st->bucket = 0; st->bucket < tcp_hashinfo.ehash_size; ++st->bucket) {
> +	for (st->bucket = bucket; st->bucket < tcp_hashinfo.ehash_size;
> +	     ++st->bucket) {
>  		struct sock *sk;
>  		struct hlist_nulls_node *node;
>  		struct inet_timewait_sock *tw;
> @@ -2036,6 +2037,11 @@ out:
>  	return rc;
>  }
>  
> +static void *established_get_first(struct seq_file *seq)
> +{
> +	return established_get_first_after(seq, 0);
> +}
> +
>  static void *established_get_next(struct seq_file *seq, void *cur)
>  {
>  	struct sock *sk = cur;
> @@ -2045,6 +2051,7 @@ static void *established_get_next(struct seq_file *seq, void *cur)
>  	struct net *net = seq_file_net(seq);
>  
>  	++st->num;
> +	st->sbucket = st->num;

Hello Yakov

The intention of your patch is very good, but it does not currently work.

It seems you believe there is at most one entry per hash slot, or something like that.
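
To illustrate the point, here is a minimal userspace sketch (not the patch's code, and not kernel code) of a resumable lookup over a chained hash table. The names struct node, struct iter_state and seek() are hypothetical; bucket and sbucket only mirror the fields described in the changelog. The cached position has to be the global index of the first entry in the bucket, and the walk has to count every entry in each chain, because a bucket may hold zero, one or many entries.

```c
#include <stddef.h>

/* Toy chained hash table entry (hypothetical, for illustration only). */
struct node {
	int value;
	struct node *next;
};

/* Resume state, loosely modelled on the st->bucket/st->sbucket idea. */
struct iter_state {
	size_t bucket;	/* bucket where the last successful lookup ended */
	size_t sbucket;	/* global position of the first entry in that bucket */
};

/* Return the entry at global position pos, or NULL if pos is past the end. */
static struct node *seek(struct node *const *table, size_t nbuckets,
			 struct iter_state *st, size_t pos)
{
	size_t b = 0, base = 0;

	/* Resume from the cached bucket only if pos is not before its start. */
	if (st->bucket < nbuckets && pos >= st->sbucket) {
		b = st->bucket;
		base = st->sbucket;
	}

	for (; b < nbuckets; b++) {
		size_t off = 0;
		struct node *n;

		for (n = table[b]; n; n = n->next, off++) {
			if (base + off == pos) {
				st->bucket = b;
				st->sbucket = base;
				return n;
			}
		}
		/* Advance by the real chain length, not by one per bucket. */
		base += off;
	}
	return NULL;
}
```

Starting from a zeroed struct iter_state, sequential calls with increasing pos walk each chain once per bucket, and a smaller pos simply restarts from bucket 0. The sketch assumes the table does not change between calls.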

Please reboot your test machine with "thash_entries=4096" so that tcp hash
size is 4096, and try to fill 20000 tcp sockets with a test program.

Then:

# ss | wc -l
20001
(ok)

# cat /proc/net/tcp | wc -l
22160
(not quite correct ...)

# netstat -tn | wc -l
<never ends>


# dd if=/proc/net/tcp ibs=1024 | wc -l
<never ends>


Please send your next patch to netdev@vger.kernel.org and DaveM only. Netdev people
review netdev patches there; there is no need to include other people for first submissions.

Thank you


#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

int fdlisten;

int main(void)
{
        int i;
        struct sockaddr_in sockaddr;

        /* Listening socket on port 2222, any local address. */
        fdlisten = socket(AF_INET, SOCK_STREAM, 0);
        if (fdlisten == -1) {
                perror("socket");
                return 1;
        }
        memset(&sockaddr, 0, sizeof(sockaddr));
        sockaddr.sin_family = AF_INET;
        sockaddr.sin_port = htons(2222);
        if (bind(fdlisten, (struct sockaddr *)&sockaddr, sizeof(sockaddr)) == -1) {
                perror("bind");
                return 1;
        }
        if (listen(fdlisten, 10) == -1) {
                perror("listen");
                return 1;
        }
        if (fork() == 0) {
                /* Child: accept forever; accepted fds are kept open on purpose. */
                while (1) {
                        socklen_t len = sizeof(sockaddr);
                        accept(fdlisten, (struct sockaddr *)&sockaddr, &len);
                }
        }
        /* Parent: open 10000 client connections to the listener and keep them. */
        for (i = 0; i < 10000; i++) {
                int fd = socket(AF_INET, SOCK_STREAM, 0);
                if (fd == -1) {
                        perror("socket");
                        break;
                }
                connect(fd, (struct sockaddr *)&sockaddr, sizeof(sockaddr));
        }
        pause();
        return 0;
}

^ permalink raw reply

* Re: [PATCH 1/3] lib/vsprintf.c: Add %pU - ptr to a UUID/GUID
From: Ingo Oeser @ 2009-09-27 10:45 UTC (permalink / raw)
  To: Joe Perches; +Cc: linux-kernel, netdev, Greg KH
In-Reply-To: <1507728e0ea3deafa71c481d508a6e9765c92221.1254030722.git.joe@perches.com>

Hi Joe,

On Sunday 27 September 2009, Joe Perches wrote:
> UUID/GUIDs are somewhat common in kernel source.
> 
> Standardize the printed style of UUID/GUIDs by using
> another extension to %p.
> 
> %pU:    01020304:0506:0708:090a:0b0c0d0e0f10
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ here
> %pUr:   04030201:0605:0807:0a09:0b0c0d0e0f10
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ and here

The code prints "01020304-0506-0708-090a-0b0c0d0e0f10".
This is not what the commit message promises. Please change the commit message!
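
For reference, a minimal userspace sketch of the dash-separated layout. This is not the vsprintf code from the patch: format_uuid() is a hypothetical helper, and the byte order used for the reversed form is only assumed from the %pUr example quoted above.

```c
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only: format 16 UUID bytes as
 * "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" (36 characters plus NUL).
 * Returns 0 on success, -1 if the output buffer is too small.
 */
static int format_uuid(char *out, size_t outlen, const unsigned char u[16],
		       int reversed)
{
	static const char hex[] = "0123456789abcdef";
	/* Byte order for the plain form. */
	static const int plain[16] = {
		0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
	};
	/* Assumed from the %pUr example: first four groups byte-swapped. */
	static const int rev[16] = {
		3, 2, 1, 0, 5, 4, 7, 6, 9, 8, 10, 11, 12, 13, 14, 15
	};
	const int *idx = reversed ? rev : plain;
	size_t pos = 0;
	int i;

	if (outlen < 37)
		return -1;

	for (i = 0; i < 16; i++) {
		out[pos++] = hex[u[idx[i]] >> 4];
		out[pos++] = hex[u[idx[i]] & 0x0f];
		/* Dashes close the 8-, 4-, 4- and 4-digit groups. */
		if (i == 3 || i == 5 || i == 7 || i == 9)
			out[pos++] = '-';
	}
	out[pos] = '\0';
	return 0;
}
```

With the bytes 0x01 through 0x10 as input, the plain form matches the string the code prints, which is what the commit message should show instead of the colon-separated version.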

Best Regards

Ingo Oeser

^ permalink raw reply

* [bisected] Wireless regression in 2.6.32-git
From: Arjan van de Ven @ 2009-09-27 13:18 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, linux-wireless

Hi,

With today's git my laptop fails to associate with my access point.
Bisection points to the commit below, and reverting this one commit on
the HEAD of tree also fixes the issue, so I'm pretty confident that this
commit is to blame.

I have a 4965 wifi card in my laptop, and the network I'm trying to
connect to has no encryption. I'm running Fedora 11 as OS.

I would like to kindly request that this commit be reverted until a
more permanent solution is found (I'm happy to test any patches).

94f85853324e02c3a32bc3101f090dc9a3f512b4 is first bad commit
commit 94f85853324e02c3a32bc3101f090dc9a3f512b4
Author: Johannes Berg <johannes@sipsolutions.net>
Date:   Thu Sep 17 17:15:31 2009 -0700

    cfg80211: don't overwrite privacy setting

    When cfg80211 is instructed to connect, it always
    uses the default WEP key for the privacy setting,
    which clearly is wrong when using wpa_supplicant.
    Don't overwrite the setting, and rely on it being
    false when wpa_supplicant is not running, instead
    set it to true when we have keys.

    Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>

:040000 040000 27fb46273e88eefee373699eb7e3f2923ac0886b
9518ee3e52c8320613cc5eee5ac54aabf082432f M	net

-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply

* Re: [bisected] Wireless regression in 2.6.32-git
From: Maciej Rutecki @ 2009-09-27 13:24 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: netdev, linux-kernel, linux-wireless
In-Reply-To: <20090927151855.29efcc53@infradead.org>

Do you have messages in dmesg similar to these:

http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2089
(use WPA)
?

I am trying to make sure that the result of my bisection is correct.
-- 
Maciej Rutecki
http://www.maciek.unixy.pl

^ permalink raw reply
