Netdev List
 help / color / mirror / Atom feed
* [PATCH] af_unix: unix_write_space() use keyed wakeups
From: Eric Dumazet @ 2010-10-30  6:44 UTC (permalink / raw)
  To: Alban Crequy, Davide Libenzi
  Cc: David S. Miller, Stephen Hemminger, Cyrill Gorcunov,
	Alexey Dobriyan, netdev, linux-kernel, Pauli Nieminen,
	Rainer Weikusat
In-Reply-To: <1288380431.2680.3.camel@edumazet-laptop>

Le vendredi 29 octobre 2010 à 21:27 +0200, Eric Dumazet a écrit :
> Le vendredi 29 octobre 2010 à 19:18 +0100, Alban Crequy a écrit :
> > Hi,
> > 
> > When a process calls the poll or select, the kernel calls (struct
> > file_operations)->poll on every file descriptor and returns a mask of
> > events which are ready. If the process is only interested by POLLIN
> > events, the mask is still computed for POLLOUT and it can be expensive.
> > For example, on Unix datagram sockets, a process running poll() with
> > POLLIN will wakes-up when the remote end call read(). This is a
> > performance regression introduced when fixing another bug by
> > 3c73419c09a5ef73d56472dbfdade9e311496e9b and
> > ec0d215f9420564fc8286dcf93d2d068bb53a07e.
> > 

unix_write_space() doesn’t yet use the wake_up_interruptible_sync_poll()
to restrict wakeups to only POLLOUT | POLLWRNORM | POLLWRBAND interested
sleepers. Same for unix_dgram_recvmsg()

In your pathological case, each time the other process calls
unix_dgram_recvmsg(), it loops on 800 pollwake() /
default_wake_function() / try_to_wake_up(), which are obviously
expensive, as you pointed out with your test program, carefully designed
to show the false sharing and O(N^2) effect :)

Once do_select() thread can _really_ block, the false sharing problem
disappears for good.

We still loop on 800 items, on each wake_up_interruptible_sync_poll()
call, so maybe we want to optimize this later, adding a global key,
ORing all items keys. I dont think its worth the added complexity, given
the biased usage of your program (800 'listeners' to one event). Is it a
real life scenario ?

Thanks

[PATCH] af_unix: use keyed wakeups

Instead of wakeup all sleepers, use wake_up_interruptible_sync_poll() to
wakeup only ones interested into writing the socket.

This patch is a specialization of commit 37e5540b3c9d (epoll keyed
wakeups: make sockets use keyed wakeups).

On a test program provided by Alan Crequy :

Before:
real    0m3.101s
user    0m0.000s
sys     0m6.104s

After:

real	0m0.211s
user	0m0.000s
sys	0m0.208s

Reported-by: Alban Crequy <alban.crequy@collabora.co.uk>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
---
 net/unix/af_unix.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 3c95304..f33c595 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -316,7 +316,8 @@ static void unix_write_space(struct sock *sk)
 	if (unix_writable(sk)) {
 		wq = rcu_dereference(sk->sk_wq);
 		if (wq_has_sleeper(wq))
-			wake_up_interruptible_sync(&wq->wait);
+			wake_up_interruptible_sync_poll(&wq->wait,
+				POLLOUT | POLLWRNORM | POLLWRBAND);
 		sk_wake_async(sk, SOCK_WAKE_SPACE, POLL_OUT);
 	}
 	rcu_read_unlock();
@@ -1710,7 +1711,8 @@ static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock,
 		goto out_unlock;
 	}
 
-	wake_up_interruptible_sync(&u->peer_wait);
+	wake_up_interruptible_sync_poll(&u->peer_wait,
+					POLLOUT | POLLWRNORM | POLLWRBAND);
 
 	if (msg->msg_name)
 		unix_copy_addr(msg, skb->sk);



^ permalink raw reply related

* [PATCH] isdn: mISDN: socket: fix information leak to userland
From: Vasiliy Kulikov @ 2010-10-30  9:04 UTC (permalink / raw)
  To: kernel-janitors
  Cc: Karsten Keil, Arnaldo Carvalho de Melo, David S. Miller,
	Tejun Heo, Eric Paris, netdev, linux-kernel

Structure mISDN_devinfo is copied to userland with the field "name"
that has the last elements unitialized.  It leads to leaking of
contents of kernel stack memory.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
---
 Compile tested.

 drivers/isdn/mISDN/socket.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index 3232206..7446d8b 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -392,6 +392,7 @@ data_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 		if (dev) {
 			struct mISDN_devinfo di;
 
+			memset(&di, 0, sizeof(di));
 			di.id = dev->id;
 			di.Dprotocols = dev->Dprotocols;
 			di.Bprotocols = dev->Bprotocols | get_all_Bprotocols();
@@ -672,6 +673,7 @@ base_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 		if (dev) {
 			struct mISDN_devinfo di;
 
+			memset(&di, 0, sizeof(di));
 			di.id = dev->id;
 			di.Dprotocols = dev->Dprotocols;
 			di.Bprotocols = dev->Bprotocols | get_all_Bprotocols();
-- 
1.7.0.4

^ permalink raw reply related

* [PATCH] af_unix: optimize unix_dgram_poll()
From: Eric Dumazet @ 2010-10-30  9:53 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Alban Crequy, David S. Miller, Stephen Hemminger, Cyrill Gorcunov,
	Alexey Dobriyan, netdev, Linux Kernel Mailing List,
	Pauli Nieminen, Rainer Weikusat
In-Reply-To: <alpine.DEB.2.00.1010291339180.8517@davide-lnx1>

Le vendredi 29 octobre 2010 à 13:46 -0700, Davide Libenzi a écrit :

> Also, why not using the existing wait->key instead of adding a poll2()?

Indeed, if wait is not null, we have in wait->key the interest of
poller. If a particular poll() function is expensive, it can test these
bits.

Thanks !

Note: I chose the 'goto skip_write' to make this patch really obvious.

[PATCH] af_unix: optimize unix_dgram_poll()

unix_dgram_poll() is pretty expensive to check POLLOUT status, because
it has to lock the socket to get its peer, take a reference on the peer
to check its receive queue status, and queue another poll_wait on
peer_wait. This all can be avoided if the process calling
unix_dgram_poll() is not interested in POLLOUT status. It makes
unix_dgram_recvmsg() faster by not queueing irrelevant pollers in
peer_wait.

On a test program provided by Alan Crequy :

Before:

real    0m0.211s
user    0m0.000s
sys     0m0.208s

After:

real	0m0.044s
user	0m0.000s
sys	0m0.040s

Suggested-by: Davide Libenzi <davidel@xmailserver.org>
Reported-by: Alban Crequy <alban.crequy@collabora.co.uk>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/unix/af_unix.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 3c95304..dcb84fe 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2090,6 +2090,9 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
 			return mask;
 	}
 
+	if (wait && !(wait->key & (POLLWRBAND | POLLWRNORM | POLLOUT)))
+		goto skip_write;
+
 	/* writable? */
 	writable = unix_writable(sk);
 	if (writable) {
@@ -2111,6 +2114,7 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
 	else
 		set_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags);
 
+skip_write:
 	return mask;
 }
 




^ permalink raw reply related

* Re: [PATCH 0/1] RFC: poll/select performance on datagram sockets
From: Alban Crequy @ 2010-10-30 11:34 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Stephen Hemminger, Cyrill Gorcunov,
	Alexey Dobriyan, netdev, linux-kernel, Pauli Nieminen,
	Rainer Weikusat, Davide Libenzi
In-Reply-To: <1288380431.2680.3.camel@edumazet-laptop>

Le Fri, 29 Oct 2010 21:27:11 +0200,
Eric Dumazet <eric.dumazet@gmail.com> a écrit :

> Le vendredi 29 octobre 2010 à 19:18 +0100, Alban Crequy a écrit :
> > Hi,
> > 
> > When a process calls the poll or select, the kernel calls (struct
> > file_operations)->poll on every file descriptor and returns a mask
> > of events which are ready. If the process is only interested by
> > POLLIN events, the mask is still computed for POLLOUT and it can be
> > expensive. For example, on Unix datagram sockets, a process running
> > poll() with POLLIN will wakes-up when the remote end call read().
> > This is a performance regression introduced when fixing another bug
> > by 3c73419c09a5ef73d56472dbfdade9e311496e9b and
> > ec0d215f9420564fc8286dcf93d2d068bb53a07e.
> > 
> > The attached program illustrates the problem. It compares the
> > performance of sending/receiving data on an Unix datagram socket and
> > select(). When the datagram sockets are not connected, the
> > performance problem is not triggered, but when they are connected
> > it becomes a lot slower. On my computer, I have the following time:
> > 
> > Connected datagram sockets: >4 seconds
> > Non-connected datagram sockets: <1 second
> > 
> > The patch attached in the next email fixes the performance problem:
> > it becomes <1 second for both cases. I am not suggesting the patch
> > for inclusion; I would like to change the prototype of (struct
> > file_operations)->poll instead of adding ->poll2. But there is a
> > lot of poll functions to change (grep tells me 337 functions).
> > 
> > Any opinions?
> 
> My opinion would be to use epoll() for this kind of workload.

I found a problem with epoll() with the following program. When there
is several datagram sockets connected to the same server and the
receiving queue is full, epoll(EPOLLOUT) wakes up only the emitter who
has its skb removed from the queue, and not all the emitters. It is
because sock_wfree() runs sk->sk_write_space() only for one emitter.

poll/select do not have this problem.

-----------------------

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <time.h>
#include <sys/epoll.h>

int
main(int argc, char *argv[])
{

// cat /proc/sys/net/unix/max_dgram_qlen
#define NB_CLIENTS (10 + 2)

  int client_fds[NB_CLIENTS];
  int server_fd;
  struct sockaddr_un addr_server;
  int epollfd = epoll_create(10);
#define MAX_EVENTS 10
  struct epoll_event ev, events[MAX_EVENTS];
  int i;
  char buffer[1024];

  int trigger = atoi(argv[1]);
  printf("trigger = %d\n", trigger);

  memset(&addr_server, 0, sizeof(addr_server));
  addr_server.sun_family = AF_UNIX;
  addr_server.sun_path[0] = '\0';
  strcpy(addr_server.sun_path + 1, "dgram_perfs_server");

  server_fd = socket(AF_UNIX, SOCK_DGRAM, 0);
  bind(server_fd, (struct sockaddr*)&addr_server, sizeof(addr_server));

  for (i = 0 ; i < NB_CLIENTS ; i++)
    {
      client_fds[i] = socket(AF_UNIX, SOCK_DGRAM, 0);
    }

  ev.events = EPOLLOUT;
  ev.data.fd = client_fds[NB_CLIENTS-1];
  if (epoll_ctl(epollfd, EPOLL_CTL_ADD, client_fds[NB_CLIENTS-1], &ev) == -1) {
      perror("epoll_ctl: client_fds max");
      exit(EXIT_FAILURE);
  }
  if (trigger == 0)
    {
      ev.events = EPOLLOUT;
      ev.data.fd = client_fds[0];
      if (epoll_ctl(epollfd, EPOLL_CTL_ADD, client_fds[0], &ev) == -1) {
        perror("epoll_ctl: client_fds 0");
        exit(EXIT_FAILURE);
      }
    }

  for (i = 0 ; i < NB_CLIENTS ; i++)
    {
      connect(client_fds[i], (struct sockaddr*)&addr_server, sizeof(addr_server));
    }

  if (fork() > 0)
    {
      for (i = 0 ; i < NB_CLIENTS - 1 ; i++)
        sendto(client_fds[i], "S", 1, 0, (struct sockaddr*)&addr_server, sizeof(addr_server));
      printf("Everything sent successfully. Now epoll_wait...\n");

      epoll_wait(epollfd, events, MAX_EVENTS, -1);
      printf("epoll_wait works fine :-)\n");
      wait(NULL);
      exit(0);
    }

  sleep(1);

  printf("Receiving one buffer...\n");
  recv(server_fd, buffer, 1024, 0);
  printf("One buffer received\n");

  return 0;
}

^ permalink raw reply

* Re: [PATCH 0/1] RFC: poll/select performance on datagram sockets
From: Eric Dumazet @ 2010-10-30 12:53 UTC (permalink / raw)
  To: Alban Crequy
  Cc: David S. Miller, Stephen Hemminger, Cyrill Gorcunov,
	Alexey Dobriyan, netdev, linux-kernel, Pauli Nieminen,
	Rainer Weikusat, Davide Libenzi
In-Reply-To: <20101030123403.5e01540d@chocolatine.cbg.collabora.co.uk>

Le samedi 30 octobre 2010 à 12:34 +0100, Alban Crequy a écrit :
> Le Fri, 29 Oct 2010 21:27:11 +0200,
> Eric Dumazet <eric.dumazet@gmail.com> a écrit :
> 
> > Le vendredi 29 octobre 2010 à 19:18 +0100, Alban Crequy a écrit :
> > > Hi,
> > > 
> > > When a process calls the poll or select, the kernel calls (struct
> > > file_operations)->poll on every file descriptor and returns a mask
> > > of events which are ready. If the process is only interested by
> > > POLLIN events, the mask is still computed for POLLOUT and it can be
> > > expensive. For example, on Unix datagram sockets, a process running
> > > poll() with POLLIN will wakes-up when the remote end call read().
> > > This is a performance regression introduced when fixing another bug
> > > by 3c73419c09a5ef73d56472dbfdade9e311496e9b and
> > > ec0d215f9420564fc8286dcf93d2d068bb53a07e.
> > > 
> > > The attached program illustrates the problem. It compares the
> > > performance of sending/receiving data on an Unix datagram socket and
> > > select(). When the datagram sockets are not connected, the
> > > performance problem is not triggered, but when they are connected
> > > it becomes a lot slower. On my computer, I have the following time:
> > > 
> > > Connected datagram sockets: >4 seconds
> > > Non-connected datagram sockets: <1 second
> > > 
> > > The patch attached in the next email fixes the performance problem:
> > > it becomes <1 second for both cases. I am not suggesting the patch
> > > for inclusion; I would like to change the prototype of (struct
> > > file_operations)->poll instead of adding ->poll2. But there is a
> > > lot of poll functions to change (grep tells me 337 functions).
> > > 
> > > Any opinions?
> > 
> > My opinion would be to use epoll() for this kind of workload.
> 
> I found a problem with epoll() with the following program. When there
> is several datagram sockets connected to the same server and the
> receiving queue is full, epoll(EPOLLOUT) wakes up only the emitter who
> has its skb removed from the queue, and not all the emitters. It is
> because sock_wfree() runs sk->sk_write_space() only for one emitter.
> 

I dont think this is the reason.

sock_wfree() really is good here, since it copes with one socket (the
one that sent the message)

Problem is the peer_wait, that epoll doesnt seem to be plugged into.

Bug is in unix_dgram_poll()

It calls sock_poll_wait( ... &unix_sk(other)->peer_wait,) only if socket
is 'writable'. Its a clear bug

Try this patch please ?

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 0ebc777..315716c 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2092,7 +2092,7 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
 
 	/* writable? */
 	writable = unix_writable(sk);
-	if (writable) {
+	if (1 /*writable*/) {
 		other = unix_peer_get(sk);
 		if (other) {
 			if (unix_peer(other) != sk) {

^ permalink raw reply related

* Re: [Bugme-new] [Bug 16350] New: RTL8103EL interface new tcp connections get stuck on SYN_SENT state after 10 hours uptime
From: zic422 @ 2010-10-30 12:58 UTC (permalink / raw)
  To: Andrew Morton; +Cc: netdev, bugzilla-daemon, bugme-daemon, Francois Romieu
In-Reply-To: <20100708142526.5bf5938e.akpm@linux-foundation.org>

Now I cannot reproduce this bug. Don't know when it get banished. I
did not managed to reach more than ten hours uptime last two months.

On Fri, Jul 9, 2010 at 12:25 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Wed, 7 Jul 2010 14:04:59 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
>
>> https://bugzilla.kernel.org/show_bug.cgi?id=16350
>>
>>            Summary: RTL8103EL interface new tcp connections get stuck on
>>                     SYN_SENT state after 10 hours uptime
>>            Product: Networking
>>            Version: 2.5
>>     Kernel Version: 2.6.34.1
>>           Platform: All
>>         OS/Version: Linux
>>               Tree: Mainline
>>             Status: NEW
>>           Severity: normal
>>           Priority: P1
>>          Component: IPV4
>>         AssignedTo: shemminger@linux-foundation.org
>>         ReportedBy: zic422@gmail.com
>>         Regression: No
>>
>>
>> Created an attachment (id=27038)
>>  --> (https://bugzilla.kernel.org/attachment.cgi?id=27038)
>> ioports, iomem, lspci, ver_linux and kernel config
>>
>> After 9-11 hours uptime network interface of RTL8103EL card cannot make new tcp
>> connections to internet and correctly close existing ones. ICMP and UDP works
>> fine. Besides r8169 driver I tried also the r8101-1.015.00 one from realtek
>> site with older kernel and bug were in place.
>>
>> Connections inside home network work normally. I tried to reboot router but
>> nor reconnecting ethernet cable nor restarting network daemon nor reloading
>> kernel module does not make any help. Bug disappears only after rebooting my
>> pc.
>>
>> repeatability of bug is 100% on arch, debian and gentoo
>>
>> default kernel config
>>
>> http://bugs.archlinux.org/task/19162  (this bug)

^ permalink raw reply

* Re: [PATCH 0/1] RFC: poll/select performance on datagram sockets
From: Eric Dumazet @ 2010-10-30 13:17 UTC (permalink / raw)
  To: Alban Crequy
  Cc: David S. Miller, Stephen Hemminger, Cyrill Gorcunov,
	Alexey Dobriyan, netdev, linux-kernel, Pauli Nieminen,
	Rainer Weikusat, Davide Libenzi
In-Reply-To: <1288443217.2680.962.camel@edumazet-laptop>


> Problem is the peer_wait, that epoll doesnt seem to be plugged into.
> 
> Bug is in unix_dgram_poll()
> 
> It calls sock_poll_wait( ... &unix_sk(other)->peer_wait,) only if socket
> is 'writable'. Its a clear bug
> 
> Try this patch please ?
> 
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 0ebc777..315716c 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -2092,7 +2092,7 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
>  
>  	/* writable? */
>  	writable = unix_writable(sk);
> -	if (writable) {
> +	if (1 /*writable*/) {
>  		other = unix_peer_get(sk);
>  		if (other) {
>  			if (unix_peer(other) != sk) {
> 
> 
> 

Also you'll need to change your program to make the epoll registrations
_after_ sockets connect(), or else you can see that epoll() wont know
about the other peer stuff.

for (i = 0 ; i < NB_CLIENTS ; i++) {
	client_fds[i] = socket(AF_UNIX, SOCK_DGRAM, 0);
}


for (i = 0 ; i < NB_CLIENTS ; i++) {
	connect(client_fds[i], (struct sockaddr*)&addr_server, sizeof(addr_server));
}

ev.events = EPOLLOUT;
ev.data.fd = client_fds[NB_CLIENTS-1];
if (epoll_ctl(epollfd, EPOLL_CTL_ADD, client_fds[NB_CLIENTS-1], &ev) == -1) {
      perror("epoll_ctl: client_fds max");
      exit(EXIT_FAILURE);
}
if (trigger == 0) {
      ev.events = EPOLLOUT;
      ev.data.fd = client_fds[0];
      if (epoll_ctl(epollfd, EPOLL_CTL_ADD, client_fds[0], &ev) == -1) {
        perror("epoll_ctl: client_fds 0");
        exit(EXIT_FAILURE);
      }
}




^ permalink raw reply

* [PATCH] bluetooth: bnep: fix information leak to userland
From: Vasiliy Kulikov @ 2010-10-30 14:26 UTC (permalink / raw)
  To: kernel-janitors
  Cc: Marcel Holtmann, Gustavo F. Padovan, David S. Miller,
	Eric Dumazet, Thadeu Lima de Souza Cascardo, Tejun Heo,
	Jiri Kosina, linux-bluetooth, netdev, linux-kernel

Structure bnep_conninfo is copied to userland with the field "device"
that has the last elements unitialized.  It leads to leaking of
contents of kernel stack memory.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
---
 Compile tested.

 net/bluetooth/bnep/core.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/bluetooth/bnep/core.c b/net/bluetooth/bnep/core.c
index f10b41f..5868597 100644
--- a/net/bluetooth/bnep/core.c
+++ b/net/bluetooth/bnep/core.c
@@ -648,6 +648,7 @@ int bnep_del_connection(struct bnep_conndel_req *req)
 
 static void __bnep_copy_ci(struct bnep_conninfo *ci, struct bnep_session *s)
 {
+	memset(ci, 0, sizeof(*ci));
 	memcpy(ci->dst, s->eh.h_source, ETH_ALEN);
 	strcpy(ci->device, s->dev->name);
 	ci->flags = s->flags;
-- 
1.7.0.4

^ permalink raw reply related

* [PATCH] bluetooth: cmtp: fix information leak to userland
From: Vasiliy Kulikov @ 2010-10-30 14:26 UTC (permalink / raw)
  To: kernel-janitors
  Cc: Marcel Holtmann, Gustavo F. Padovan, David S. Miller,
	Eric Dumazet, linux-bluetooth, netdev, linux-kernel

Structure cmtp_conninfo is copied to userland with some padding fields
unitialized.  It leads to leaking of contents of kernel stack memory.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
---
 Compile tested.

 net/bluetooth/cmtp/core.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/bluetooth/cmtp/core.c b/net/bluetooth/cmtp/core.c
index ec0a134..8e5f292 100644
--- a/net/bluetooth/cmtp/core.c
+++ b/net/bluetooth/cmtp/core.c
@@ -78,6 +78,7 @@ static void __cmtp_unlink_session(struct cmtp_session *session)
 
 static void __cmtp_copy_session(struct cmtp_session *session, struct cmtp_conninfo *ci)
 {
+	memset(ci, 0, sizeof(*ci));
 	bacpy(&ci->bdaddr, &session->bdaddr);
 
 	ci->flags = session->flags;
-- 
1.7.0.4

^ permalink raw reply related

* [PATCH] bluetooth: hidp: fix information leak to userland
From: Vasiliy Kulikov @ 2010-10-30 14:26 UTC (permalink / raw)
  To: kernel-janitors
  Cc: Marcel Holtmann, Gustavo F. Padovan, David S. Miller, Jiri Kosina,
	Michael Poole, Bastien Nocera, linux-bluetooth, netdev,
	linux-kernel

Structure hidp_conninfo is copied to userland with version, product,
vendor and name fields unitialized if both session->input and session->hid
are NULL.  It leads to leaking of contents of kernel stack memory.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
---
 Compile tested.

 net/bluetooth/hidp/core.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c
index c0ee8b3..29544c2 100644
--- a/net/bluetooth/hidp/core.c
+++ b/net/bluetooth/hidp/core.c
@@ -107,6 +107,7 @@ static void __hidp_unlink_session(struct hidp_session *session)
 
 static void __hidp_copy_session(struct hidp_session *session, struct hidp_conninfo *ci)
 {
+	memset(ci, 0, sizeof(*ci));
 	bacpy(&ci->bdaddr, &session->bdaddr);
 
 	ci->flags = session->flags;
@@ -115,7 +116,6 @@ static void __hidp_copy_session(struct hidp_session *session, struct hidp_connin
 	ci->vendor  = 0x0000;
 	ci->product = 0x0000;
 	ci->version = 0x0000;
-	memset(ci->name, 0, 128);
 
 	if (session->input) {
 		ci->vendor  = session->input->id.vendor;
-- 
1.7.0.4

^ permalink raw reply related

* [PATCH] net: core: scm: fix information leak to userland
From: Vasiliy Kulikov @ 2010-10-30 14:26 UTC (permalink / raw)
  To: kernel-janitors
  Cc: David S. Miller, Eric W. Biederman, Eric Dumazet, Tejun Heo,
	Serge E. Hallyn, netdev, linux-kernel

Structure cmsghdr is copied to userland with padding bytes
unitialized on architectures where __kernel_size_t is unsigned long.
It leads to leaking of contents of kernel stack memory.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
---
 Compile tested.

 net/core/scm.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/core/scm.c b/net/core/scm.c
index 413cab8..a4a9b70 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -233,6 +233,7 @@ int put_cmsg(struct msghdr * msg, int level, int type, int len, void *data)
 		msg->msg_flags |= MSG_CTRUNC;
 		cmlen = msg->msg_controllen;
 	}
+	memset(&cmhdr, 0, sizeof(cmhdr));
 	cmhdr.cmsg_level = level;
 	cmhdr.cmsg_type = type;
 	cmhdr.cmsg_len = cmlen;
-- 
1.7.0.4

^ permalink raw reply related

* [PATCH] net: core: sock: fix information leak to userland
From: Vasiliy Kulikov @ 2010-10-30 14:26 UTC (permalink / raw)
  To: kernel-janitors
  Cc: David S. Miller, Eric Dumazet, Eric W. Biederman, Herbert Xu,
	Paul E. McKenney, netdev, linux-kernel

"Address" variable might be not fully initialized in sock->ops->get_name().
The only current implementation is get_name(), it leaves some padding
fields of sockaddr_tipc uninitialized.  It leads to leaking of contents
of kernel stack memory.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
---
 Compile tested.

 net/core/sock.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 3eed542..759dd81 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -930,6 +930,7 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 	{
 		char address[128];
 
+		memset(&address, 0, sizeof(address));
 		if (sock->ops->getname(sock, (struct sockaddr *)address, &lv, 2))
 			return -ENOTCONN;
 		if (lv < len)
-- 
1.7.0.4

^ permalink raw reply related

* [PATCH] ipv4: netfilter: arp_tables: fix information leak to userland
From: Vasiliy Kulikov @ 2010-10-30 14:26 UTC (permalink / raw)
  To: kernel-janitors
  Cc: Patrick McHardy, David S. Miller, Alexey Kuznetsov,
	Pekka Savola (ipv6), James Morris, Hideaki YOSHIFUJI,
	netfilter-devel, netfilter, coreteam, netdev, linux-kernel

Structure arpt_getinfo is copied to userland with the field "name"
that has the last elements unitialized.  It leads to leaking of
contents of kernel stack memory.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
---
 Compile tested.

 net/ipv4/netfilter/arp_tables.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 3cad259..3fac340 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -927,6 +927,7 @@ static int get_info(struct net *net, void __user *user,
 			private = &tmp;
 		}
 #endif
+		memset(&info, 0, sizeof(info));
 		info.valid_hooks = t->valid_hooks;
 		memcpy(info.hook_entry, private->hook_entry,
 		       sizeof(info.hook_entry));
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH] ipv4: netfilter: ip_tables: fix information leak to userland
From: Vasiliy Kulikov @ 2010-10-30 14:26 UTC (permalink / raw)
  To: kernel-janitors
  Cc: Patrick McHardy, David S. Miller, Alexey Kuznetsov,
	Pekka Savola (ipv6), James Morris, Hideaki YOSHIFUJI,
	netfilter-devel, netfilter, coreteam, netdev, linux-kernel

Structure ipt_getinfo is copied to userland with the field "name"
that has the last elements unitialized.  It leads to leaking of
contents of kernel stack memory.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
---
 Compile tested.

 net/ipv4/netfilter/ip_tables.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index d31b007..a846d63 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -1124,6 +1124,7 @@ static int get_info(struct net *net, void __user *user,
 			private = &tmp;
 		}
 #endif
+		memset(&info, 0, sizeof(info));
 		info.valid_hooks = t->valid_hooks;
 		memcpy(info.hook_entry, private->hook_entry,
 		       sizeof(info.hook_entry));
-- 
1.7.0.4

^ permalink raw reply related

* Re: [PATCH] net: core: scm: fix information leak to userland
From: Eric Dumazet @ 2010-10-30 14:33 UTC (permalink / raw)
  To: Vasiliy Kulikov
  Cc: kernel-janitors, David S. Miller, Eric W. Biederman, Tejun Heo,
	Serge E. Hallyn, netdev, linux-kernel
In-Reply-To: <1288448796-6147-1-git-send-email-segooon@gmail.com>

Le samedi 30 octobre 2010 à 18:26 +0400, Vasiliy Kulikov a écrit :
> Structure cmsghdr is copied to userland with padding bytes
> unitialized on architectures where __kernel_size_t is unsigned long.
> It leads to leaking of contents of kernel stack memory.
> 
> Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
> ---
>  Compile tested.
> 
>  net/core/scm.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/net/core/scm.c b/net/core/scm.c
> index 413cab8..a4a9b70 100644
> --- a/net/core/scm.c
> +++ b/net/core/scm.c
> @@ -233,6 +233,7 @@ int put_cmsg(struct msghdr * msg, int level, int type, int len, void *data)
>  		msg->msg_flags |= MSG_CTRUNC;
>  		cmlen = msg->msg_controllen;
>  	}
> +	memset(&cmhdr, 0, sizeof(cmhdr));
>  	cmhdr.cmsg_level = level;
>  	cmhdr.cmsg_type = type;
>  	cmhdr.cmsg_len = cmlen;


???

struct cmsghdr {
        __kernel_size_t cmsg_len;       /* data byte count, including hdr */
        int             cmsg_level;     /* originating protocol */
        int             cmsg_type;      /* protocol-specific type */
};

Could you explain where are the padding bytes ?




^ permalink raw reply

* Re: [PATCH] net: core: sock: fix information leak to userland
From: Eric Dumazet @ 2010-10-30 14:35 UTC (permalink / raw)
  To: Vasiliy Kulikov
  Cc: kernel-janitors, David S. Miller, Eric W. Biederman, Herbert Xu,
	Paul E. McKenney, netdev, linux-kernel
In-Reply-To: <1288448801-6303-1-git-send-email-segooon@gmail.com>

Le samedi 30 octobre 2010 à 18:26 +0400, Vasiliy Kulikov a écrit :
> "Address" variable might be not fully initialized in sock->ops->get_name().
> The only current implementation is get_name(), it leaves some padding
> fields of sockaddr_tipc uninitialized.  It leads to leaking of contents
> of kernel stack memory.
> 
> Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
> ---
>  Compile tested.
> 
>  net/core/sock.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 3eed542..759dd81 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -930,6 +930,7 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
>  	{
>  		char address[128];
>  
> +		memset(&address, 0, sizeof(address));
>  		if (sock->ops->getname(sock, (struct sockaddr *)address, &lv, 2))
>  			return -ENOTCONN;
>  		if (lv < len)

???

Please fix the real bug.

^ permalink raw reply

* Re: [PATCH] net: core: sock: fix information leak to userland
From: Vasiliy Kulikov @ 2010-10-30 14:49 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: kernel-janitors, David S. Miller, Eric W. Biederman, Herbert Xu,
	Paul E. McKenney, netdev, linux-kernel
In-Reply-To: <1288449350.2680.970.camel@edumazet-laptop>

On Sat, Oct 30, 2010 at 16:35 +0200, Eric Dumazet wrote:
> Le samedi 30 octobre 2010 à 18:26 +0400, Vasiliy Kulikov a écrit :
> > "Address" variable might be not fully initialized in sock->ops->get_name().
> > The only current implementation is get_name(), it leaves some padding
> > fields of sockaddr_tipc uninitialized.  It leads to leaking of contents
> > of kernel stack memory.
> > 
> > Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
> > ---
> >  Compile tested.
> > 
> >  net/core/sock.c |    1 +
> >  1 files changed, 1 insertions(+), 0 deletions(-)
> > 
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index 3eed542..759dd81 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -930,6 +930,7 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
> >  	{
> >  		char address[128];
> >  
> > +		memset(&address, 0, sizeof(address));
> >  		if (sock->ops->getname(sock, (struct sockaddr *)address, &lv, 2))
> >  			return -ENOTCONN;
> >  		if (lv < len)
> 
> ???
> 
> Please fix the real bug.

What if somebody want to create his own implementation of getname()?
IMO it's much safer to introduce memset() here and relax getname()'s
responsibilities.  Quite many drivers "forget" to initialize outputs
structures.  E.g. new net_device's private field is kzalloc'ed to
simplify driver's code.

-- 
Vasiliy

^ permalink raw reply

* Re: [PATCH] net: core: scm: fix information leak to userland
From: Vasiliy Kulikov @ 2010-10-30 14:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: kernel-janitors, David S. Miller, Eric W. Biederman, Tejun Heo,
	Serge E. Hallyn, netdev, linux-kernel
In-Reply-To: <1288449205.2680.968.camel@edumazet-laptop>

On Sat, Oct 30, 2010 at 16:33 +0200, Eric Dumazet wrote:
> Le samedi 30 octobre 2010 à 18:26 +0400, Vasiliy Kulikov a écrit :
> > Structure cmsghdr is copied to userland with padding bytes
> > unitialized on architectures where __kernel_size_t is unsigned long.
> > It leads to leaking of contents of kernel stack memory.
> > 
> > Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
> > ---
> >  Compile tested.
> > 
> >  net/core/scm.c |    1 +
> >  1 files changed, 1 insertions(+), 0 deletions(-)
> > 
> > diff --git a/net/core/scm.c b/net/core/scm.c
> > index 413cab8..a4a9b70 100644
> > --- a/net/core/scm.c
> > +++ b/net/core/scm.c
> > @@ -233,6 +233,7 @@ int put_cmsg(struct msghdr * msg, int level, int type, int len, void *data)
> >  		msg->msg_flags |= MSG_CTRUNC;
> >  		cmlen = msg->msg_controllen;
> >  	}
> > +	memset(&cmhdr, 0, sizeof(cmhdr));
> >  	cmhdr.cmsg_level = level;
> >  	cmhdr.cmsg_type = type;
> >  	cmhdr.cmsg_len = cmlen;
> 
> 
> ???
> 
> struct cmsghdr {
>         __kernel_size_t cmsg_len;       /* data byte count, including hdr */
>         int             cmsg_level;     /* originating protocol */
>         int             cmsg_type;      /* protocol-specific type */
> };
> 
> Could you explain where are the padding bytes ?

Ah, sorry, nowhere :)  int is stored quite OK after long.  Please ignore this patch.

Thanks,

-- 
Vasiliy

^ permalink raw reply

* Re: [PATCH] af_unix: unix_write_space() use keyed wakeups
From: Davide Libenzi @ 2010-10-30 15:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alban Crequy, David S. Miller, Stephen Hemminger, Cyrill Gorcunov,
	Alexey Dobriyan, netdev, linux-kernel, Pauli Nieminen,
	Rainer Weikusat
In-Reply-To: <1288421084.2680.717.camel@edumazet-laptop>

On Sat, 30 Oct 2010, Eric Dumazet wrote:

> This patch is a specialization of commit 37e5540b3c9d (epoll keyed
> wakeups: make sockets use keyed wakeups).

Whoops, I must have missed AF_UNIX :)
Looks good to me.


- Davide



^ permalink raw reply

* Pica8 - Linux Cloud Switch
From: Lin Pica8 @ 2010-10-30 16:54 UTC (permalink / raw)
  To: netdev

Hello,

For your information :

http://www.pica8.org/

http://www.networkworld.com/news/2010/102810-pica8-opensource-switching.html?page=1

Mail : pica8@gmail.org

^ permalink raw reply

* Re: [PATCH] macvlan: Introduce 'passthru' mode to takeover the underlying device
From: Michael S. Tsirkin @ 2010-10-30 16:57 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Sridhar Samudrala, Patrick McHardy, Stephen Hemminger, netdev,
	kvm@vger.kernel.org
In-Reply-To: <201010291545.17910.arnd@arndb.de>

On Fri, Oct 29, 2010 at 03:45:17PM +0200, Arnd Bergmann wrote:
> On Friday 29 October 2010, Sridhar Samudrala wrote:
> > With the current default 'vepa' mode, a KVM guest using virtio with 
> > macvtap backend has the following limitations.
> > - cannot change/add a mac address on the guest virtio-net
> 
> I believe this could be changed if there is a neeed, but I actually
> consider it one of the design points of macvlan that the guest
> is not able to change the mac address. With 802.1Qbg you rely on
> the switch being able to identify the guest by its MAC address,
> which the host kernel must ensure.
> 
> > - cannot create a vlan device on the guest virtio-net
> 
> Why not? If this doesn't work, it's probably a bug!
> Why does the passthru mode enable it if it doesn't work
> already?
> 
> > - cannot enable promiscuous mode on guest virtio-net
> 
> Could you elaborate why such a setup would be useful?
> 
> 	Arnd

E.g. to support bridging in the guest.

-- 
MST

^ permalink raw reply

* Re: [PATCH] macvlan: Introduce 'passthru' mode to takeover the underlying device
From: Michael S. Tsirkin @ 2010-10-30 17:03 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Sridhar Samudrala, Patrick McHardy, Stephen Hemminger, netdev,
	kvm@vger.kernel.org
In-Reply-To: <201010291545.17910.arnd@arndb.de>

On Fri, Oct 29, 2010 at 03:45:17PM +0200, Arnd Bergmann wrote:
> On Friday 29 October 2010, Sridhar Samudrala wrote:
> > With the current default 'vepa' mode, a KVM guest using virtio with 
> > macvtap backend has the following limitations.
> > - cannot change/add a mac address on the guest virtio-net
> 
> I believe this could be changed if there is a neeed, but I actually
> consider it one of the design points of macvlan that the guest
> is not able to change the mac address.

It's a policy question that we should not set at the kernel level.

> With 802.1Qbg you rely on
> the switch being able to identify the guest by its MAC address,
> which the host kernel must ensure.

This is required to be able to get feature parity with
both tun and device assignment. At the moment, changing
the mac when using macvtap silently breaks guest networking.

-- 
MST

^ permalink raw reply

* @ms_world you won!
From: (ms-world) @ 2010-10-30 18:07 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 0 bytes --]



[-- Attachment #2: YOU HAVE WON A PRIZE.doc --]
[-- Type: application/msword, Size: 196096 bytes --]

^ permalink raw reply

* Re: [PATCH] af_unix: optimize unix_dgram_poll()
From: Davide Libenzi @ 2010-10-30 17:45 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Davide Libenzi, Alban Crequy, David S. Miller, Stephen Hemminger,
	Cyrill Gorcunov, Alexey Dobriyan, netdev,
	Linux Kernel Mailing List, Pauli Nieminen, Rainer Weikusat
In-Reply-To: <1288432420.2680.933.camel@edumazet-laptop>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 740 bytes --]

On Sat, 30 Oct 2010, Eric Dumazet wrote:

> Le vendredi 29 octobre 2010 à 13:46 -0700, Davide Libenzi a écrit :
> 
> > Also, why not using the existing wait->key instead of adding a poll2()?
> 
> Indeed, if wait is not null, we have in wait->key the interest of
> poller. If a particular poll() function is expensive, it can test these
> bits.
> 
> Thanks !
> 
> Note: I chose the 'goto skip_write' to make this patch really obvious.

Plain agreement on th patch, and I understand the indent overflow 
concerns, but why not ...

	/*
	 * No write status requested, avoid expensive OUT tests.
	 */
	if (wait && !(wait->key & (POLLWRBAND | POLLWRNORM | POLLOUT)))
		return mask

The write-test code is the last one we do anyway.


- Davide


^ permalink raw reply

* Re: [PATCH 1/2] IPv6: conntrack: Use protocol-related initialization routine to initial queues of IPv6 connection track
From: Pascal Hambourg @ 2010-10-30 18:41 UTC (permalink / raw)
  To: David Miller; +Cc: kaber, shanwei, yasuyuki.kozakai, netfilter-devel, netdev
In-Reply-To: <20100126.051147.256313206.davem@davemloft.net>

Hello,

David Miller a écrit :
> 
> Anyways, meanwhile I'll apply the fix.  And yes I know it needs
> to go to stable too... :-)

It appears that the patch "ipv6: reassembly: use seperate reassembly
queues for conntrack and local delivery" went to both the 2.6.32 and
2.6.27 stable branches, but the fix from Shan Wei went to the 2.6.32
branch only, not to the 2.6.27 branch. However I checked that it applies
 on the latest 2.6.27 release, and fixes the problem.

Could someone submit it for the next 2.6.27 stable release ? Or if I
have to do it, what is the process ?
Thanks.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox