Netdev List
 help / color / mirror / Atom feed
* Re: mac80211: NOHZ: local_softirq_pending 08
From: Oliver Hartkopp @ 2009-09-30 17:51 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Michael Buesch, Kalle Valo, John W. Linville,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1254324077.3959.7.camel-YfaajirXv2244ywRPIzf9A@public.gmane.org>

Johannes Berg wrote:
> On Wed, 2009-09-30 at 17:10 +0200, Michael Buesch wrote:
>> On Wednesday 30 September 2009 16:54:26 Johannes Berg wrote:
>>> On Wed, 2009-09-30 at 17:47 +0300, Kalle Valo wrote:
>>>
>>>> I agree with Michael. The bug is real and I have verified that
>>>> Michael's patch fixes the issue. Better to apply the patch now, it's
>>>> trivial to change the implementation if/when the network stack has
>>>> support for this.
>>> FWIW, I think in mac80211 the in_interrupt() check can never return true
>>> since we postpone all RX to the tasklet. But the tasklet seems to be ok
>>> -- so should it really be in_interrupt()?
>> I think a tasklet is also in_interrupt(), because it's a softirq.
> 
> Ah, yes, indeed, in_interrupt() vs. in_irq().
> 

Oops!

I missed that for my previous patch i added for two occurrences in the CAN
sources.

I'm currently compiling the patch for netif_rx_ti() and will post it in some
minutes (for CAN and mac80211) when it runs without probs.

Regards,
Oliver
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] tg3: Remove prev_vlan_tag from struct tx_ring_info
From: Matt Carlson @ 2009-09-30 17:26 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Matthew Carlson, Linux Netdev List, Michael Chan
In-Reply-To: <4AC36638.8070304@gmail.com>

On Wed, Sep 30, 2009 at 07:07:52AM -0700, Eric Dumazet wrote:
> prev_vlan_tag field is not used.
> 
> Patch saves 512*8 bytes per tx queue ring on 64bit arches.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Looks good to me.

Acked-by: Matthew Carlson <mcarlson@broadcom.com>

> ---
> 
> diff --git a/drivers/net/tg3.h b/drivers/net/tg3.h
> index 82b45d8..524691c 100644
> --- a/drivers/net/tg3.h
> +++ b/drivers/net/tg3.h
> @@ -2412,7 +2412,6 @@ struct ring_info {
>  
>  struct tx_ring_info {
>  	struct sk_buff                  *skb;
> -	u32                             prev_vlan_tag;
>  };
>  
>  struct tg3_config_info {
> 


^ permalink raw reply

* Re: N_PPP_SYNC ldisc BUG: sleeping function called from invalid context
From: Alan Cox @ 2009-09-30 16:47 UTC (permalink / raw)
  To: Tilman Schmidt; +Cc: linux-kernel, netdev, Alan Cox
In-Reply-To: <4AC37032.4030901@imap.cc>

>  [<c026d39b>] tty_unthrottle+0x10/0x38
>  [<f8dcc31f>] ppp_sync_receive+0x168/0x170 [ppp_synctty]
>  [<f8fbb9ce>] handle_minor_recv+0x187/0x1cd [capi]
>  [<f8fbc19b>] capi_recv_message+0x1d9/0x24e [capi]

Really need to see the rest of the call trace to be sure

> Turns out the ppp_sync_receive() function (drivers/net/ppp_synctty.c
> line 385ff.) has a comment in front stating:
> 
> /*
>  * This can now be called from hard interrupt level as well
>  * as soft interrupt level or mainline.
>  */

Which is wrong. The flip_buffer_push -> rx processing path should never
be called from IRQ context and that was fixed for various drivers that
mis-set tty->low_latency, as well as in the PPP rework. The PPP case is
actually unrelated in many was.

> Opinions?

See how we got into that code direct from an IRQ path. The expectation of
the tty logic is that it gets processed from work queues either
specifically in driver or via tty_flip_buffer_push when tty->low_latency
= 0

^ permalink raw reply

* Re: [PATCH] ipvs: Add boundary check on ioctl arguments
From: Hannes Eder @ 2009-09-30 15:33 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Wensong Zhang, netdev, linux-kernel, Simon Horman
In-Reply-To: <20090930171833.5ce0011d@infradead.org>

On Wed, Sep 30, 2009 at 17:18, Arjan van de Ven <arjan@infradead.org> wrote:
> On Wed, 30 Sep 2009 15:38:12 +0200
> Hannes Eder <heder@google.com> wrote:
>>  > @@ -2353,17 +2357,25 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd,
>>  > void __user
>> *user, int *len)
>>  >  {
>>  >    unsigned char arg[128];
>>
>> can MAX_ARG_LEN be used here?
>
> I am not convinced... it is a different numerical value,
> so it could be an ABI change. Rather not do that in this
> type of patch...

For do_ip_vs_set_ctl MAX_ARG_LEN is used:

static int
do_ip_vs_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
{
	int ret;
	unsigned char arg[MAX_ARG_LEN];
...

I assume that will be fine for do_ip_vs_get_ctl as well.

-Hannes

^ permalink raw reply

* Re: mac80211: NOHZ: local_softirq_pending 08
From: Johannes Berg @ 2009-09-30 15:21 UTC (permalink / raw)
  To: Michael Buesch
  Cc: Kalle Valo, Oliver Hartkopp, John W. Linville, linux-wireless,
	netdev
In-Reply-To: <200909301710.31082.mb@bu3sch.de>

[-- Attachment #1: Type: text/plain, Size: 773 bytes --]

On Wed, 2009-09-30 at 17:10 +0200, Michael Buesch wrote:
> On Wednesday 30 September 2009 16:54:26 Johannes Berg wrote:
> > On Wed, 2009-09-30 at 17:47 +0300, Kalle Valo wrote:
> > 
> > > I agree with Michael. The bug is real and I have verified that
> > > Michael's patch fixes the issue. Better to apply the patch now, it's
> > > trivial to change the implementation if/when the network stack has
> > > support for this.
> > 
> > FWIW, I think in mac80211 the in_interrupt() check can never return true
> > since we postpone all RX to the tasklet. But the tasklet seems to be ok
> > -- so should it really be in_interrupt()?
> 
> I think a tasklet is also in_interrupt(), because it's a softirq.

Ah, yes, indeed, in_interrupt() vs. in_irq().

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: [PATCH] ipvs: Add boundary check on ioctl arguments
From: Arjan van de Ven @ 2009-09-30 15:18 UTC (permalink / raw)
  To: Hannes Eder; +Cc: Wensong Zhang, netdev, linux-kernel, Simon Horman
In-Reply-To: <4AC35F44.60707@google.com>

On Wed, 30 Sep 2009 15:38:12 +0200
Hannes Eder <heder@google.com> wrote:
>  > @@ -2353,17 +2357,25 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd,
>  > void __user 
> *user, int *len)
>  >  {
>  >  	unsigned char arg[128];
> 
> can MAX_ARG_LEN be used here?

I am not convinced... it is a different numerical value,
so it could be an ABI change. Rather not do that in this
type of patch...

>  > +	copylen = get_arglen[GET_CMDID(cmd)];
>  > +	if (copylen > 128)
> 
> I think it's better to use 'copylen > sizeof(arg)' here.

fair enough; updated patch below

>From 28ae217858e683c0c94c02219d46a9a9c87f61c6 Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <arjan@linux.intel.com>
Date: Wed, 30 Sep 2009 13:05:51 +0200
Subject: [PATCH] ipvs: Add boundary check on ioctl arguments

The ipvs code has a nifty system for doing the size of ioctl command copies;
it defines an array with values into which it indexes the cmd to find the
right length.

Unfortunately, the ipvs code forgot to check if the cmd was in the range
that the array provides, allowing for an index outside of the array,
which then gives a "garbage" result into the length, which then gets
used for copying into a stack buffer.

Fix this by adding sanity checks on these as well as the copy size.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 net/netfilter/ipvs/ip_vs_ctl.c |   14 +++++++++++++-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index ac624e5..7adc876 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -2077,6 +2077,10 @@ do_ip_vs_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
 
+	if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_SET_MAX + 1)
+		return -EINVAL;
+	if (len < 0 || len >  sizeof(arg))
+		return -EINVAL;
 	if (len != set_arglen[SET_CMDID(cmd)]) {
 		pr_err("set_ctl: len %u != %u\n",
 		       len, set_arglen[SET_CMDID(cmd)]);
@@ -2353,17 +2357,25 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	unsigned char arg[128];
 	int ret = 0;
+	unsigned int copylen;
 
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
 
+	if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_GET_MAX + 1)
+		return -EINVAL;
+
 	if (*len < get_arglen[GET_CMDID(cmd)]) {
 		pr_err("get_ctl: len %u < %u\n",
 		       *len, get_arglen[GET_CMDID(cmd)]);
 		return -EINVAL;
 	}
 
-	if (copy_from_user(arg, user, get_arglen[GET_CMDID(cmd)]) != 0)
+	copylen = get_arglen[GET_CMDID(cmd)];
+	if (copylen > sizeof(arg))
+		return -EINVAL;
+
+	if (copy_from_user(arg, user, copylen) != 0)
 		return -EFAULT;
 
 	if (mutex_lock_interruptible(&__ip_vs_mutex))
-- 
1.6.2.5



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply related

* Re: mac80211: NOHZ: local_softirq_pending 08
From: Michael Buesch @ 2009-09-30 15:10 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Kalle Valo, Oliver Hartkopp, John W. Linville,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1254322466.3959.5.camel-YfaajirXv2244ywRPIzf9A@public.gmane.org>

On Wednesday 30 September 2009 16:54:26 Johannes Berg wrote:
> On Wed, 2009-09-30 at 17:47 +0300, Kalle Valo wrote:
> 
> > I agree with Michael. The bug is real and I have verified that
> > Michael's patch fixes the issue. Better to apply the patch now, it's
> > trivial to change the implementation if/when the network stack has
> > support for this.
> 
> FWIW, I think in mac80211 the in_interrupt() check can never return true
> since we postpone all RX to the tasklet. But the tasklet seems to be ok
> -- so should it really be in_interrupt()?

I think a tasklet is also in_interrupt(), because it's a softirq.
in_interrupt() returns false in process context. The problem appeared when
the b43 driver started passing RX frames while being in process context (threaded IRQ).
It previously was in tasklet (= softirq) context.

-- 
Greetings, Michael.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: mac80211: NOHZ: local_softirq_pending 08
From: Johannes Berg @ 2009-09-30 14:54 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Michael Buesch, Oliver Hartkopp, John W. Linville, linux-wireless,
	netdev
In-Reply-To: <87ocosqykb.fsf@purkki.valot.fi>

[-- Attachment #1: Type: text/plain, Size: 489 bytes --]

On Wed, 2009-09-30 at 17:47 +0300, Kalle Valo wrote:

> I agree with Michael. The bug is real and I have verified that
> Michael's patch fixes the issue. Better to apply the patch now, it's
> trivial to change the implementation if/when the network stack has
> support for this.

FWIW, I think in mac80211 the in_interrupt() check can never return true
since we postpone all RX to the tasklet. But the tasklet seems to be ok
-- so should it really be in_interrupt()?

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* N_PPP_SYNC ldisc BUG: sleeping function called from invalid context
From: Tilman Schmidt @ 2009-09-30 14:50 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: Alan Cox

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

While testing pppd with capiplugin on kernel 2.6.31 (plus my CAPI and
Gigaset patches) I came across this BUG message. (Please ignore the
Tainted flag, it's from the Nvidia driver and doesn't influence what's
following.)

BUG: sleeping function called from invalid context at kernel/mutex.c:280
in_atomic(): 1, irqs_disabled(): 1, pid: 10, name: events/0
INFO: lockdep is turned off.
irq event stamp: 2034
hardirqs last  enabled at (2033): [<c0347b07>]
_spin_unlock_irqrestore+0x3c/0x6c
hardirqs last disabled at (2034): [<c03475d0>] _spin_lock_irq+0x11/0x3f
softirqs last  enabled at (2008): [<c012f598>] __do_softirq+0x114/0x11c
softirqs last disabled at (1971): [<c010593a>] do_softirq+0x69/0xc7
Pid: 10, comm: events/0 Tainted: P           2.6.31-vanilla #1
Call Trace:
 [<c0124a3c>] __might_sleep+0x10e/0x116
 [<c0346bdd>] mutex_lock_nested+0x18/0x31
 [<c026d39b>] tty_unthrottle+0x10/0x38
 [<f8dcc31f>] ppp_sync_receive+0x168/0x170 [ppp_synctty]
 [<f8fbb9ce>] handle_minor_recv+0x187/0x1cd [capi]
 [<f8fbc19b>] capi_recv_message+0x1d9/0x24e [capi]
 [<fa5be461>] recv_handler+0x56/0x6f [kernelcapi]
 [<c0138cba>] worker_thread+0x14a/0x21d
 [<c0138c9e>] ? worker_thread+0x12e/0x21d
 [<fa5be40b>] ? recv_handler+0x0/0x6f [kernelcapi]
 [<c013b375>] ? autoremove_wake_function+0x0/0x30
 [<c013b201>] kthread+0x64/0x69
 [<c0138b70>] ? worker_thread+0x0/0x21d
 [<c013b19d>] ? kthread+0x0/0x69
 [<c0103f33>] kernel_thread_helper+0x7/0x10

Turns out the ppp_sync_receive() function (drivers/net/ppp_synctty.c
line 385ff.) has a comment in front stating:

/*
 * This can now be called from hard interrupt level as well
 * as soft interrupt level or mainline.
 */

but calls tty_unthrottle() which in turn calls mutex_lock() which of
course can sleep.

That tty_unthrottle() call was already removed once by

commit a6540f731d506d9e82444cf0020e716613d4c46c
Author: Alan Cox <alan@linux.intel.com>
Date:   Mon Jun 22 18:42:29 2009 +0100

    ppp: Fix throttling bugs

but re-added by

commit 4a21b8cb3550f19f838f7c48345fbbf6a0e8536b
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Thu Jul 16 09:14:23 2009 -0700

    Revert "ppp: Fix throttling bugs"

    This reverts commit a6540f731d506d9e82444cf0020e716613d4c46c, as
    requested by Alan:

      "... as it was wrong, the pty code is now fixed and the fact this
       isn't reverted is breaking pptp setups."

Opinions?

Thanks,
Tilman

- --
Tilman Schmidt                    E-Mail: tilman@imap.cc
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Ungeöffnet mindestens haltbar bis: (siehe Rückseite)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.4 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFKw3AyQ3+did9BuFsRAiZXAKCGaos/qZNTlStEP4SE90PA0ZoMAQCdFtvf
U9chE7at35y8c6CGGS1IGg0=
=Vpq4
-----END PGP SIGNATURE-----

^ permalink raw reply

* Re: mac80211: NOHZ: local_softirq_pending 08
From: Kalle Valo @ 2009-09-30 14:47 UTC (permalink / raw)
  To: Michael Buesch
  Cc: Oliver Hartkopp, John W. Linville,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Johannes Berg
In-Reply-To: <200909301633.04376.mb-fseUSCV1ubazQB+pC5nmwQ@public.gmane.org>

Michael Buesch <mb-fseUSCV1ubazQB+pC5nmwQ@public.gmane.org> writes:

>> I don't know how expensive in_interrupt() is, but it IMO should be
>> avoided when the context for a code section can be determined in
>> another way.
>
> What if we just get the fix merged and discuss later whether it's
> worth to optimize a picosecond or not?? My patch fixes the _bug_.
> You can merge a more "efficient" fix later that saves one or two CPU
> cycles.

I agree with Michael. The bug is real and I have verified that
Michael's patch fixes the issue. Better to apply the patch now, it's
trivial to change the implementation if/when the network stack has
support for this.

-- 
Kalle Valo
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: mac80211: NOHZ: local_softirq_pending 08
From: Michael Buesch @ 2009-09-30 14:33 UTC (permalink / raw)
  To: Oliver Hartkopp
  Cc: John W. Linville, Kalle Valo, linux-wireless, netdev,
	Johannes Berg
In-Reply-To: <4AC3475C.7000403@hartkopp.net>

On Wednesday 30 September 2009 13:56:12 Oliver Hartkopp wrote:
> John W. Linville wrote:
> > On Sat, Sep 12, 2009 at 06:41:12PM +0200, Oliver Hartkopp wrote:
> > 
> >> i cooked a patch that introduces netif_rx_ti() and fixes up the problems in
> >> mac80211 and the CAN subsystem.
> > 
> > Oliver,
> > 
> > Are you going to send this patch to Dave?  If you want me to carry
> > it instead, please resend it with a proper changelog including a
> > Signed-off-by line.  For that matter, Dave will most certainly want
> > that as well...
> 
> Hello John,
> 
> as i wrote here
> 
> http://marc.info/?l=linux-netdev&m=125277885910179&w=2
> 
> there are currently only three occurrences of checks that use netif_rx() and
> netif_rx_ni() depending on in_interrupt().
> 
> And regarding the suggested fix from Michael, that checked every(!) netif_rx()
> whether it is in interrupt or not, i was unsure if a netif_tx_ti() would make
> sense for only three cases?!?
> 
> If you think it makes sense, i can post a patch for that ... but:
> 
> Indeed it costs some additional investigation to prove whether netif_rx() or
> netif_rx_ni() should be used in each case. But IMHO this has to be done before
> providing a pump-gun function that solves the problem without thinking if we
> are in irq-context or not. I want to avoid that people are using netif_rx_ti()
> as some kind of default ...
> 
> I don't know how expensive in_interrupt() is, but it IMO should be avoided
> when the context for a code section can be determined in another way.

What if we just get the fix merged and discuss later whether it's worth to optimize a picosecond or not??
My patch fixes the _bug_. You can merge a more "efficient" fix later that saves one or two CPU cycles.

-- 
Greetings, Michael.

^ permalink raw reply

* [PATCH] tg3: Remove prev_vlan_tag from struct tx_ring_info
From: Eric Dumazet @ 2009-09-30 14:07 UTC (permalink / raw)
  To: David S. Miller; +Cc: Matt Carlson, Linux Netdev List, Michael Chan

prev_vlan_tag field is not used.

Patch saves 512*8 bytes per tx queue ring on 64bit arches.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---

diff --git a/drivers/net/tg3.h b/drivers/net/tg3.h
index 82b45d8..524691c 100644
--- a/drivers/net/tg3.h
+++ b/drivers/net/tg3.h
@@ -2412,7 +2412,6 @@ struct ring_info {
 
 struct tx_ring_info {
 	struct sk_buff                  *skb;
-	u32                             prev_vlan_tag;
 };
 
 struct tg3_config_info {

^ permalink raw reply related

* Re: [PATCH] bcm63xx_enet: timeout off by one in do_mdio_op()
From: Maxime Bizon @ 2009-09-30 14:01 UTC (permalink / raw)
  To: Roel Kluin; +Cc: netdev, Andrew Morton
In-Reply-To: <4AB7DD50.6030409@gmail.com>


On Mon, 2009-09-21 at 22:08 +0200, Roel Kluin wrote:

Hi Roel,

> `while (limit-- >= 0)' reaches -2 after the loop upon timeout.

The 1000us limit was chosen arbitrarily, since mdio access are much
shorter, and was just to prevent CPU lockup in case of hardware bug.

But it looks like a bug, and since you're the second one reporting this,
this should be fixed :)


> Signed-off-by: Roel Kluin <roel.kluin@gmail.com>

Acked-by: Maxime Bizon <mbizon@freebox.fr>

> ---
> Small chance to occur, probably.
> 
> diff --git a/drivers/net/bcm63xx_enet.c b/drivers/net/bcm63xx_enet.c
> index 09d2709..ba29dc3 100644
> --- a/drivers/net/bcm63xx_enet.c
> +++ b/drivers/net/bcm63xx_enet.c
> @@ -90,7 +90,7 @@ static int do_mdio_op(struct bcm_enet_priv *priv, unsigned int data)
>  		if (enet_readl(priv, ENET_IR_REG) & ENET_IR_MII)
>  			break;
>  		udelay(1);
> -	} while (limit-- >= 0);
> +	} while (limit-- > 0);
>  
>  	return (limit < 0) ? 1 : 0;
>  }


-- 
Maxime



^ permalink raw reply

* Re: [PATCH] ipvs: Add boundary check on ioctl arguments
From: Hannes Eder @ 2009-09-30 13:38 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Wensong Zhang, netdev, linux-kernel, Simon Horman
In-Reply-To: <20090930131109.2b3f71b8@infradead.org>

[cc: +Simon Horman]

Arjan van de Ven wrote:
 > From 761a182f96b3707e1fee44e1079ba227e48745d1 Mon Sep 17 00:00:00 2001
 > From: Arjan van de Ven <arjan@linux.intel.com>
 > Date: Wed, 30 Sep 2009 13:05:51 +0200
 > Subject: [PATCH] ipvs: Add boundary check on ioctl arguments
 >
 > The ipvs code has a nifty system for doing the size of ioctl command copies;
 > it defines an array with values into which it indexes the cmd to find the
 > right length.
 >
 > Unfortunately, the ipvs code forgot to check if the cmd was in the range
 > that the array provides, allowing for an index outside of the array,
 > which then gives a "garbage" result into the length, which then gets
 > used for copying into a stack buffer.
 >
 > Fix this by adding sanity checks on these as well as the copy size.
 >
 > Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
 > ---
 >  net/netfilter/ipvs/ip_vs_ctl.c |   14 +++++++++++++-
 >  1 files changed, 13 insertions(+), 1 deletions(-)
 >
 > diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
 > index ac624e5..3c52796 100644
 > --- a/net/netfilter/ipvs/ip_vs_ctl.c
 > +++ b/net/netfilter/ipvs/ip_vs_ctl.c
 > @@ -2077,6 +2077,10 @@ do_ip_vs_set_ctl(struct sock *sk, int cmd, void __user 
*user, unsigned int len)
 >  	if (!capable(CAP_NET_ADMIN))
 >  		return -EPERM;
 >
 > +	if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_SET_MAX + 1)
 > +		return -EINVAL;
 > +	if (len < 0 || len >  MAX_ARG_LEN)
 > +		return -EINVAL;
 >  	if (len != set_arglen[SET_CMDID(cmd)]) {
 >  		pr_err("set_ctl: len %u != %u\n",
 >  		       len, set_arglen[SET_CMDID(cmd)]);
 > @@ -2353,17 +2357,25 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user 
*user, int *len)
 >  {
 >  	unsigned char arg[128];

can MAX_ARG_LEN be used here?

 >  	int ret = 0;
 > +	unsigned int copylen;
 >
 >  	if (!capable(CAP_NET_ADMIN))
 >  		return -EPERM;
 >
 > +	if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_GET_MAX + 1)
 > +		return -EINVAL;
 > +
 >  	if (*len < get_arglen[GET_CMDID(cmd)]) {
 >  		pr_err("get_ctl: len %u < %u\n",
 >  		       *len, get_arglen[GET_CMDID(cmd)]);
 >  		return -EINVAL;
 >  	}
 >
 > -	if (copy_from_user(arg, user, get_arglen[GET_CMDID(cmd)]) != 0)
 > +	copylen = get_arglen[GET_CMDID(cmd)];
 > +	if (copylen > 128)

I think it's better to use 'copylen > sizeof(arg)' here.

 > +		return -EINVAL;
 > +
 > +	if (copy_from_user(arg, user, copylen) != 0)
 >  		return -EFAULT;
 >
 >  	if (mutex_lock_interruptible(&__ip_vs_mutex))

^ permalink raw reply

* Re: [PATCH] connector: Allow permission checking in the receiver callbacks
From: Lars Ellenberg @ 2009-09-30 13:20 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: Philipp Reisner, linux-kernel, netdev, Andrew Morton
In-Reply-To: <20090930112057.GA15150@ioremap.net>

On Wed, Sep 30, 2009 at 03:20:57PM +0400, Evgeniy Polyakov wrote:
> Hi Philipp.
> 
> On Tue, Sep 29, 2009 at 04:48:07PM +0200, Philipp Reisner (philipp.reisner@linbit.com) wrote:
> > Various users of the connector should actually check if the
> > sender's capabilities of a netlink/connector packet are
> > actually sufficient for the operation they trigger. Up to
> > now the connector framework did not allow the kernel side
> > receiver to do so.
> > 
> > This patch set does the groundwork.
> > 
> > Philipp Reisner (4):
> >   connector: Keep the skb in cn_callback_data
> >   connector: Provide the sender's credentials to the callback
> >   connector/dm: Fixed a compilation warning
> >   connector: Removed the destruct_data callback since it is always
> >     kfree_skb()
> 
> Patches look good to me.
> Andrew please apply to the appropriate tree. I do not know whether it is
> acceptible now, since it is not a bugfix, but merely a simple cleanup.
> Feel free to add my signed off or ack, thank you.

Thanks.

Actually it is the basis for follow-up security fixes.

Without this, unprivileged user space is able to send arbitrary
connector requests to kernel subsystems, which have no way to verify the
privileges of the sender anymore, because that information, even though
available at the netlink layer, has been dropped by the connector.

Once this is applied, the various in-kernel receiving connector
callbacks can (and need to) add cap_raised(nsb->eff_cap, cap) where
appropriate. For example, you don't want some guest user to be able to
trigger a dst_del_node callback by sending a crafted netlink message,
right?

So it _is_ a (design-) bug fix.
Or am I missing something?

	Lars

^ permalink raw reply

* Re: [PATCH] [RFC] IPv4 TCP fails to send window scale option when window scale is zero
From: Eric Dumazet @ 2009-09-30 13:06 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Gilad Ben-Yossef, Netdev, Ori Finkalman
In-Reply-To: <alpine.DEB.2.00.0909301428140.13543@wel-95.cs.helsinki.fi>

Ilpo Järvinen a écrit :
> On Wed, 30 Sep 2009, Eric Dumazet wrote:
> 
>> Gilad Ben-Yossef a écrit :
>>> Eric Dumazet wrote:
>>>
>>>> Gilad Ben-Yossef a écrit :
>>>>  
>>>>> From: Ori Finkalman <ori@comsleep.com>
>>>>>
>>>>>
>>>>> Acknowledge TCP window scale support by inserting the proper option in
>>>>> SYN/ACK header
>>>>> even if our window scale is zero.
>>>>>
>>>>>
>>>>> This fixes the following observed behavior:
>>>>>
>>>>>
>>>>> 1. Client sends a SYN with TCP window scaling option and non zero window
>>>>> scale value to a Linux box.
>>>>>
>>>>> 2. Linux box notes large receive window from client.
>>>>>
>>>>> 3. Linux decides on a zero value of window scale for its part.
>>>>>
>>>>> 4. Due to compare against requested window scale size option, Linux does
>>>>> not to send windows scale
>>>>>
>>>>> TCP option header on SYN/ACK at all.
>>>>>
>>>>>
>>>>> Result:
>>>>>
>>>>>
>>>>> Client box thinks TCP window scaling is not supported, since SYN/ACK had
>>>>> no TCP window scale option,
>>>>> while Linux thinks that TCP window scaling is supported (and scale might
>>>>> be non zero), since SYN had
>>>>>
>>>>> TCP window scale option and we have a mismatched idea between the client
>>>>> and server regarding window sizes.
>>>>>
>>>>>
>>>>> Please comment and/or apply.
>>>>> ...
>>>>>
>>>>>
>>>>> Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
>>>>> Signed-off-by: Ori Finkelman <ori@comsleep.com>
>>>>>
>>>>>
>>>>> Index: net/ipv4/tcp_output.c
>>>>> ===================================================================
>>>>> --- net/ipv4/tcp_output.c    (revision 46)
>>>>> +++ net/ipv4/tcp_output.c    (revision 210)
>>>>> @@ -353,6 +353,7 @@ static void tcp_init_nondata_skb(struct
>>>>> #define OPTION_SACK_ADVERTISE    (1 << 0)
>>>>> #define OPTION_TS        (1 << 1)
>>>>> #define OPTION_MD5        (1 << 2)
>>>>> +#define OPTION_WSCALE        (1 << 3)
>>>>>
>>>>> struct tcp_out_options {
>>>>>     u8 options;        /* bit field of OPTION_* */
>>>>> @@ -417,7 +418,7 @@ static void tcp_options_write(__be32 *pt
>>>>>                    TCPOLEN_SACK_PERM);
>>>>>     }
>>>>>
>>>>> -    if (unlikely(opts->ws)) {
>>>>> +    if (unlikely(OPTION_WSCALE & opts->options)) {
>>>>>         *ptr++ = htonl((TCPOPT_NOP << 24) |
>>>>>                    (TCPOPT_WINDOW << 16) |
>>>>>                    (TCPOLEN_WINDOW << 8) |
>>>>> @@ -530,8 +531,8 @@ static unsigned tcp_synack_options(struc
>>>>>
>>>>>     if (likely(ireq->wscale_ok)) {
>>>>>         opts->ws = ireq->rcv_wscale;
>>>>> -        if(likely(opts->ws))
>>>>> -            size += TCPOLEN_WSCALE_ALIGNED;
>>>>> +        opts->options |= OPTION_WSCALE;
>>>>> +        size += TCPOLEN_WSCALE_ALIGNED;
>>>>>     }
>>>>>     if (likely(doing_ts)) {
>>>>>         opts->options |= OPTION_TS;
>>>>>
>>>>>
>>>>>
>>>>>     
>>>> Seems not the more logical places to put this logic...
>>>>
>>>> How about this instead ?
>>>>
>>>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>>>> index 5200aab..b78c084 100644
>>>> --- a/net/ipv4/tcp_output.c
>>>> +++ b/net/ipv4/tcp_output.c
>>>> @@ -216,6 +216,11 @@ void tcp_select_initial_window(int __space, __u32
>>>> mss,
>>>>              space >>= 1;
>>>>              (*rcv_wscale)++;
>>>>          }
>>>> +        /*
>>>> +         * Set a minimum wscale of 1
>>>> +         */
>>>> +        if (*rcv_wscale == 0)
>>>> +            *rcv_wscale = 1;
>>>>         }
>>>>
>>>>         /* Set initial window to value enough for senders,
>>>>
>>>>   
>>> Thank you for the patch review. The suggested replacement patch
>>> certainly is shorter, code wise, which is an advantage.
>>>
>>> I cant help but feel though, that it is less readable - a window scale
>>> of zero is a perfectly legit value. Adding special logic to rule it out
>>> just because we chose to overload this setting for something else
>>> (whether window scaling is supported or not) seems like an invitation
>>> for someone to get it wrong again down the line, in my opinion.
>> As a matter of fact I didnot test your patch.
>>
>> My reaction was driven by :
>>
>> Your version slows down the tcp_options_write() function, once per tx packet.
> 
> Are you serious that anding would cost that much? :-/

Not really :)

> 
>> tcp_options_write() should not change socket state,
> 
> I fail to see how his patch was changing socket state in anyway in 
> anywhere?

Me too, now you say it :)

> 
>> while
>> tcp_select_initial_window() is the exact place where we are supposed to
>> compute wscale. 
> 
> And it calculated yielding to result of 0, which is perfectly valid. The 
> problem is that tcp_write_options thinks that 0 is indication of no window 
> scaling, instead of the correct interpretation of zero window scaling 
> which makes the huge difference for the opposite direction traffic as 
> these guys have noted. Not that I find your approach that bad either as 
> we only lose 1-bit accuracy for the window which is rather insignificant 
> as 1-byte window increments do not really make that much sense anyway 
> (and we have to specifically code against doing them anyway so the 
> effective granularity is much higher).

Yes, wscale 0 is RFC valid, but are we sure some equipment wont play funny games
with such value ? At least sending "wscale 1-14" must be working...


My quick&dirty patch was only for discussion, I have no strong opinion on it,
only that was on one place to patch instead of two/three/four I dont know yet.

So please Gilad & Ori send us a new patch :)


> 
>> Also how is managed tcp_syn_options() case (for outgoing connections ?)
>>
>>         if (likely(sysctl_tcp_window_scaling)) {
>>                 opts->ws = tp->rx_opt.rcv_wscale;
>>                 if (likely(opts->ws))
>>                         size += TCPOLEN_WSCALE_ALIGNED;
>>         }
>>
>> Dont you need to patch it as well ?
> 
> One certainly should change that too if that patch is the way to go 
> forward.
> 

^ permalink raw reply

* [PATCH] conntrack: do not print garbage after the usage message
From: Hannes Eder @ 2009-09-30 13:02 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netdev, netfilter-devel

When 'conntrack' is called with no arguments then garbage is printed
after the usage message.  This patch fixes this.

Signed-off-by: Hannes Eder <heder@google.com>

 src/conntrack.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/conntrack.c b/src/conntrack.c
index 42b5133..5ca68d1 100644
--- a/src/conntrack.c
+++ b/src/conntrack.c
@@ -1493,7 +1493,7 @@ int main(int argc, char *argv[])
 
 	free_options();
 
-	if (exit_msg[cmd][0]) {
+	if (cmd != CT_NONE && exit_msg[cmd][0]) {
 		fprintf(stderr, "%s v%s (conntrack-tools): ",PROGNAME,VERSION);
 		fprintf(stderr, exit_msg[cmd], counter);
 		if (counter == 0 && !(command & (CT_LIST | EXP_LIST)))


^ permalink raw reply related

* Re: [Pv-drivers] [PATCH 2.6.31-rc9] net: VMware virtual Ethernet NIC driver: vmxnet3
From: Arnd Bergmann @ 2009-09-29 21:23 UTC (permalink / raw)
  To: David Miller
  Cc: bhavesh, chrisw, pv-drivers, netdev, linux-kernel, virtualization,
	greg, anthony, jgarzik, akpm, shemminger
In-Reply-To: <20090929.125530.31894576.davem@davemloft.net>

On Tuesday 29 September 2009, David Miller wrote:
> > 
> > These header files are indeed shared with the host implementation,
> > as you've guessed. If it's not a big deal, we would like to keep
> > the names the same, just for our own sanity's sake?
> 
> No.  This isn't your source tree, it's everyone's.  So you should
> adhere to basic naming conventions and coding standards of the
> tree regardless of what you happen to use or need to use internally.

Well, there is nothing wrong with making the identifiers the same
everywhere, as long as they all follow the Linux coding style ;-).

I heard that a number of cross-OS device drivers do that nowadays.

	Arnd <><

^ permalink raw reply

* Re: mac80211: NOHZ: local_softirq_pending 08
From: Oliver Hartkopp @ 2009-09-30 11:56 UTC (permalink / raw)
  To: John W. Linville
  Cc: Michael Buesch, Kalle Valo, linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Johannes Berg
In-Reply-To: <20090929192928.GF2678-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>

John W. Linville wrote:
> On Sat, Sep 12, 2009 at 06:41:12PM +0200, Oliver Hartkopp wrote:
> 
>> i cooked a patch that introduces netif_rx_ti() and fixes up the problems in
>> mac80211 and the CAN subsystem.
> 
> Oliver,
> 
> Are you going to send this patch to Dave?  If you want me to carry
> it instead, please resend it with a proper changelog including a
> Signed-off-by line.  For that matter, Dave will most certainly want
> that as well...

Hello John,

as i wrote here

http://marc.info/?l=linux-netdev&m=125277885910179&w=2

there are currently only three occurrences of checks that use netif_rx() and
netif_rx_ni() depending on in_interrupt().

And regarding the suggested fix from Michael, that checked every(!) netif_rx()
whether it is in interrupt or not, i was unsure if a netif_tx_ti() would make
sense for only three cases?!?

If you think it makes sense, i can post a patch for that ... but:

Indeed it costs some additional investigation to prove whether netif_rx() or
netif_rx_ni() should be used in each case. But IMHO this has to be done before
providing a pump-gun function that solves the problem without thinking if we
are in irq-context or not. I want to avoid that people are using netif_rx_ti()
as some kind of default ...

I don't know how expensive in_interrupt() is, but it IMO should be avoided
when the context for a code section can be determined in another way.

Regards,
Oliver

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] [RFC] IPv4 TCP fails to send window scale option when window scale is zero
From: Ilpo Järvinen @ 2009-09-30 11:42 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Gilad Ben-Yossef, Netdev, Ori Finkalman
In-Reply-To: <4AC305BF.6080306@gmail.com>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5468 bytes --]

On Wed, 30 Sep 2009, Eric Dumazet wrote:

> Gilad Ben-Yossef a écrit :
> >
> > Eric Dumazet wrote:
> > 
> >> Gilad Ben-Yossef a écrit :
> >>  
> >>> From: Ori Finkalman <ori@comsleep.com>
> >>>
> >>>
> >>> Acknowledge TCP window scale support by inserting the proper option in
> >>> SYN/ACK header
> >>> even if our window scale is zero.
> >>>
> >>>
> >>> This fixes the following observed behavior:
> >>>
> >>>
> >>> 1. Client sends a SYN with TCP window scaling option and non zero window
> >>> scale value to a Linux box.
> >>>
> >>> 2. Linux box notes large receive window from client.
> >>>
> >>> 3. Linux decides on a zero value of window scale for its part.
> >>>
> >>> 4. Due to compare against requested window scale size option, Linux does
> >>> not to send windows scale
> >>>
> >>> TCP option header on SYN/ACK at all.
> >>>
> >>>
> >>> Result:
> >>>
> >>>
> >>> Client box thinks TCP window scaling is not supported, since SYN/ACK had
> >>> no TCP window scale option,
> >>> while Linux thinks that TCP window scaling is supported (and scale might
> >>> be non zero), since SYN had
> >>>
> >>> TCP window scale option and we have a mismatched idea between the client
> >>> and server regarding window sizes.
> >>>
> >>>
> >>> Please comment and/or apply.
> >>> ...
> >>>
> >>>
> >>> Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
> >>> Signed-off-by: Ori Finkelman <ori@comsleep.com>
> >>>
> >>>
> >>> Index: net/ipv4/tcp_output.c
> >>> ===================================================================
> >>> --- net/ipv4/tcp_output.c    (revision 46)
> >>> +++ net/ipv4/tcp_output.c    (revision 210)
> >>> @@ -353,6 +353,7 @@ static void tcp_init_nondata_skb(struct
> >>> #define OPTION_SACK_ADVERTISE    (1 << 0)
> >>> #define OPTION_TS        (1 << 1)
> >>> #define OPTION_MD5        (1 << 2)
> >>> +#define OPTION_WSCALE        (1 << 3)
> >>>
> >>> struct tcp_out_options {
> >>>     u8 options;        /* bit field of OPTION_* */
> >>> @@ -417,7 +418,7 @@ static void tcp_options_write(__be32 *pt
> >>>                    TCPOLEN_SACK_PERM);
> >>>     }
> >>>
> >>> -    if (unlikely(opts->ws)) {
> >>> +    if (unlikely(OPTION_WSCALE & opts->options)) {
> >>>         *ptr++ = htonl((TCPOPT_NOP << 24) |
> >>>                    (TCPOPT_WINDOW << 16) |
> >>>                    (TCPOLEN_WINDOW << 8) |
> >>> @@ -530,8 +531,8 @@ static unsigned tcp_synack_options(struc
> >>>
> >>>     if (likely(ireq->wscale_ok)) {
> >>>         opts->ws = ireq->rcv_wscale;
> >>> -        if(likely(opts->ws))
> >>> -            size += TCPOLEN_WSCALE_ALIGNED;
> >>> +        opts->options |= OPTION_WSCALE;
> >>> +        size += TCPOLEN_WSCALE_ALIGNED;
> >>>     }
> >>>     if (likely(doing_ts)) {
> >>>         opts->options |= OPTION_TS;
> >>>
> >>>
> >>>
> >>>     
> >>
> >> Seems not the more logical places to put this logic...
> >>
> >> How about this instead ?
> >>
> >> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> >> index 5200aab..b78c084 100644
> >> --- a/net/ipv4/tcp_output.c
> >> +++ b/net/ipv4/tcp_output.c
> >> @@ -216,6 +216,11 @@ void tcp_select_initial_window(int __space, __u32
> >> mss,
> >>              space >>= 1;
> >>              (*rcv_wscale)++;
> >>          }
> >> +        /*
> >> +         * Set a minimum wscale of 1
> >> +         */
> >> +        if (*rcv_wscale == 0)
> >> +            *rcv_wscale = 1;
> >>         }
> >>
> >>         /* Set initial window to value enough for senders,
> >>
> >>   
> > 
> > Thank you for the patch review. The suggested replacement patch
> > certainly is shorter, code wise, which is an advantage.
> > 
> > I cant help but feel though, that it is less readable - a window scale
> > of zero is a perfectly legit value. Adding special logic to rule it out
> > just because we chose to overload this setting for something else
> > (whether window scaling is supported or not) seems like an invitation
> > for someone to get it wrong again down the line, in my opinion.
> 
> As a matter of fact I didnot test your patch.
> 
> My reaction was driven by :
> 
> Your version slows down the tcp_options_write() function, once per tx packet.

Are you serious that anding would cost that much? :-/

> tcp_options_write() should not change socket state,

I fail to see how his patch was changing socket state in anyway in 
anywhere?

> while
> tcp_select_initial_window() is the exact place where we are supposed to
> compute wscale. 

And it calculated yielding to result of 0, which is perfectly valid. The 
problem is that tcp_write_options thinks that 0 is indication of no window 
scaling, instead of the correct interpretation of zero window scaling 
which makes the huge difference for the opposite direction traffic as 
these guys have noted. Not that I find your approach that bad either as 
we only lose 1-bit accuracy for the window which is rather insignificant 
as 1-byte window increments do not really make that much sense anyway 
(and we have to specifically code against doing them anyway so the 
effective granularity is much higher).

> Also how is managed tcp_syn_options() case (for outgoing connections ?)
> 
>         if (likely(sysctl_tcp_window_scaling)) {
>                 opts->ws = tp->rx_opt.rcv_wscale;
>                 if (likely(opts->ws))
>                         size += TCPOLEN_WSCALE_ALIGNED;
>         }
> 
> Dont you need to patch it as well ?

One certainly should change that too if that patch is the way to go 
forward.

-- 
 i.

^ permalink raw reply

* Re: WARNING: at net/ipv4/af_inet.c:154 inet_sock_destruct
From: Francis Moreau @ 2009-09-30 11:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Linux Kernel Mailing List, Linux Netdev List, David S. Miller
In-Reply-To: <4AC1D0F5.4050709@gmail.com>

On Tue, Sep 29, 2009 at 11:18 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Francis Moreau a écrit :
>> Hello,
>>
>> I got this kernel warning when stopping nfsd:
>>
>> [260104.553720] WARNING: at net/ipv4/af_inet.c:154
>> inet_sock_destruct+0x164/0x182()
>> [260104.553722] Hardware name: P5K-VM
>> [260104.553724] Modules linked in: jfs loop nfsd lockd nfs_acl
>> auth_rpcgss exportfs sunrpc [last unloaded: microcode]
>> [260104.553736] Pid: 858, comm: nfsd Tainted: G   M       2.6.31 #13
>> [260104.553738] Call Trace:
>> [260104.553743]  [<ffffffff813ed53a>] ? inet_sock_destruct+0x164/0x182
>> [260104.553748]  [<ffffffff81044471>] warn_slowpath_common+0x7c/0xa9
>> [260104.553751]  [<ffffffff810444b2>] warn_slowpath_null+0x14/0x16
>> [260104.553754]  [<ffffffff813ed53a>] inet_sock_destruct+0x164/0x182
>> [260104.553759]  [<ffffffff8138e1c0>] __sk_free+0x23/0xe7
>> [260104.553762]  [<ffffffff8138e2fd>] sk_free+0x1f/0x21
>> [260104.553765]  [<ffffffff8138e3c7>] sk_common_release+0xc8/0xcd
>> [260104.553769]  [<ffffffff813e4459>] udp_lib_close+0xe/0x10
>> [260104.553772]  [<ffffffff813ecfe2>] inet_release+0x55/0x5c
>> [260104.553775]  [<ffffffff8138b746>] sock_release+0x1f/0x71
>> [260104.553778]  [<ffffffff8138b7bf>] sock_close+0x27/0x2b
>> [260104.553782]  [<ffffffff810d0641>] __fput+0xfb/0x1c0
>> [260104.553787]  [<ffffffff8104a197>] ? local_bh_disable+0x12/0x14
>> [260104.553790]  [<ffffffff810d0723>] fput+0x1d/0x1f
>> [260104.553810]  [<ffffffffa0014035>] svc_sock_free+0x40/0x56 [sunrpc]
>> [260104.553827]  [<ffffffffa001dea0>] svc_xprt_free+0x43/0x53 [sunrpc]
>> [260104.553843]  [<ffffffffa001de5d>] ? svc_xprt_free+0x0/0x53 [sunrpc]
>> [260104.553847]  [<ffffffff811b4641>] kref_put+0x43/0x4f
>> [260104.553863]  [<ffffffffa001d224>] svc_close_xprt+0x55/0x5e [sunrpc]
>> [260104.553879]  [<ffffffffa001d27d>] svc_close_all+0x50/0x69 [sunrpc]
>> [260104.553894]  [<ffffffffa0012922>] svc_destroy+0x9e/0x142 [sunrpc]
>> [260104.553910]  [<ffffffffa0012a7f>] svc_exit_thread+0xb9/0xc2 [sunrpc]
>> [260104.553922]  [<ffffffffa00707b1>] ? nfsd+0x0/0x151 [nfsd]
>> [260104.553932]  [<ffffffffa00708e8>] nfsd+0x137/0x151 [nfsd]
>> [260104.553936]  [<ffffffff8105ad28>] kthread+0x94/0x9c
>> [260104.553941]  [<ffffffff8100c1fa>] child_rip+0xa/0x20
>> [260104.553944]  [<ffffffff81047b00>] ? do_exit+0x5d7/0x691
>> [260104.553948]  [<ffffffff81039cf8>] ? finish_task_switch+0x6a/0xc7
>> [260104.553953]  [<ffffffff8100bb6d>] ? restore_args+0x0/0x30
>> [260104.553956]  [<ffffffff8105ac94>] ? kthread+0x0/0x9c
>> [260104.553959]  [<ffffffff8100c1f0>] ? child_rip+0x0/0x20
>>
>> It happens on 2.6.31 and older kernels as well though I don't remember
>> when it really started.
>
> Could you please try following patch ?
>

No trace of this bug has been seen so far.

thanks
-- 
Francis

^ permalink raw reply

* Re: [PATCH] connector: Allow permission checking in the receiver callbacks
From: Evgeniy Polyakov @ 2009-09-30 11:20 UTC (permalink / raw)
  To: Philipp Reisner; +Cc: linux-kernel, netdev, Lars Ellenberg, Andrew Morton
In-Reply-To: <1254235692-1631-1-git-send-email-philipp.reisner@linbit.com>

Hi Philipp.

On Tue, Sep 29, 2009 at 04:48:07PM +0200, Philipp Reisner (philipp.reisner@linbit.com) wrote:
> Various users of the connector should actually check if the
> sender's capabilities of a netlink/connector packet are
> actually sufficient for the operation they trigger. Up to
> now the connector framework did not allow the kernel side
> receiver to do so.
> 
> This patch set does the groundwork.
> 
> Philipp Reisner (4):
>   connector: Keep the skb in cn_callback_data
>   connector: Provide the sender's credentials to the callback
>   connector/dm: Fixed a compilation warning
>   connector: Removed the destruct_data callback since it is always
>     kfree_skb()

Patches look good to me.
Andrew please apply to the appropriate tree. I do not know whether it is
acceptible now, since it is not a bugfix, but merely a simple cleanup.
Feel free to add my signed off or ack, thank you.

>  Documentation/connector/cn_test.c      |    2 +-
>  Documentation/connector/connector.txt  |    8 ++++----
>  drivers/connector/cn_queue.c           |   12 +++++++-----
>  drivers/connector/connector.c          |   22 ++++++++--------------
>  drivers/md/dm-log-userspace-transfer.c |    3 +--
>  drivers/staging/dst/dcore.c            |    2 +-
>  drivers/staging/pohmelfs/config.c      |    2 +-
>  drivers/video/uvesafb.c                |    2 +-
>  drivers/w1/w1_netlink.c                |    2 +-
>  include/linux/connector.h              |   11 ++++-------
>  10 files changed, 29 insertions(+), 37 deletions(-)

-- 
	Evgeniy Polyakov

^ permalink raw reply

* [PATCH] ipvs: Add boundary check on ioctl arguments
From: Arjan van de Ven @ 2009-09-30 11:11 UTC (permalink / raw)
  To: Wensong Zhang; +Cc: netdev, linux-kernel


>From 761a182f96b3707e1fee44e1079ba227e48745d1 Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <arjan@linux.intel.com>
Date: Wed, 30 Sep 2009 13:05:51 +0200
Subject: [PATCH] ipvs: Add boundary check on ioctl arguments

The ipvs code has a nifty system for doing the size of ioctl command copies;
it defines an array with values into which it indexes the cmd to find the
right length.

Unfortunately, the ipvs code forgot to check if the cmd was in the range
that the array provides, allowing for an index outside of the array,
which then gives a "garbage" result into the length, which then gets
used for copying into a stack buffer.

Fix this by adding sanity checks on these as well as the copy size.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 net/netfilter/ipvs/ip_vs_ctl.c |   14 +++++++++++++-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index ac624e5..3c52796 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -2077,6 +2077,10 @@ do_ip_vs_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
 
+	if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_SET_MAX + 1)
+		return -EINVAL;
+	if (len < 0 || len >  MAX_ARG_LEN)
+		return -EINVAL;
 	if (len != set_arglen[SET_CMDID(cmd)]) {
 		pr_err("set_ctl: len %u != %u\n",
 		       len, set_arglen[SET_CMDID(cmd)]);
@@ -2353,17 +2357,25 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	unsigned char arg[128];
 	int ret = 0;
+	unsigned int copylen;
 
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
 
+	if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_GET_MAX + 1)
+		return -EINVAL;
+
 	if (*len < get_arglen[GET_CMDID(cmd)]) {
 		pr_err("get_ctl: len %u < %u\n",
 		       *len, get_arglen[GET_CMDID(cmd)]);
 		return -EINVAL;
 	}
 
-	if (copy_from_user(arg, user, get_arglen[GET_CMDID(cmd)]) != 0)
+	copylen = get_arglen[GET_CMDID(cmd)];
+	if (copylen > 128)
+		return -EINVAL;
+
+	if (copy_from_user(arg, user, copylen) != 0)
 		return -EFAULT;
 
 	if (mutex_lock_interruptible(&__ip_vs_mutex))
-- 
1.6.2.5


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply related

* [PATCH net-next-2.6] be2net: Workaround to fix a bug in Rx Completion processing.
From: Ajit Khaparde @ 2009-09-30  9:07 UTC (permalink / raw)
  To: davem, netdev

vtp bit in RX completion descriptor could be wrongly set in
some skews of BladEngine.  Ignore this  bit if vtm is not set.
This patch is against the net-next-2.6 tree.

Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
---
 drivers/net/benet/be.h      |    1 +
 drivers/net/benet/be_cmds.c |    3 ++-
 drivers/net/benet/be_cmds.h |    3 ++-
 drivers/net/benet/be_main.c |   23 +++++++++++++++++++----
 4 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/net/benet/be.h b/drivers/net/benet/be.h
index 13b72ce..7503b92 100644
--- a/drivers/net/benet/be.h
+++ b/drivers/net/benet/be.h
@@ -258,6 +258,7 @@ struct be_adapter {
 	bool link_up;
 	u32 port_num;
 	bool promiscuous;
+	u32 cap;
 };
 
 extern const struct ethtool_ops be_ethtool_ops;
diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
index 1db0924..513e47b 100644
--- a/drivers/net/benet/be_cmds.c
+++ b/drivers/net/benet/be_cmds.c
@@ -995,7 +995,7 @@ int be_cmd_get_flow_control(struct be_adapter *adapter, u32 *tx_fc, u32 *rx_fc)
 	return status;
 }
 
-int be_cmd_query_fw_cfg(struct be_adapter *adapter, u32 *port_num)
+int be_cmd_query_fw_cfg(struct be_adapter *adapter, u32 *port_num, u32 *cap)
 {
 	struct be_mcc_wrb *wrb = wrb_from_mbox(&adapter->mbox_mem);
 	struct be_cmd_req_query_fw_cfg *req = embedded_payload(wrb);
@@ -1014,6 +1014,7 @@ int be_cmd_query_fw_cfg(struct be_adapter *adapter, u32 *port_num)
 	if (!status) {
 		struct be_cmd_resp_query_fw_cfg *resp = embedded_payload(wrb);
 		*port_num = le32_to_cpu(resp->phys_port);
+		*cap = le32_to_cpu(resp->function_cap);
 	}
 
 	spin_unlock(&adapter->mbox_lock);
diff --git a/drivers/net/benet/be_cmds.h b/drivers/net/benet/be_cmds.h
index fd7028e..f4fd74e 100644
--- a/drivers/net/benet/be_cmds.h
+++ b/drivers/net/benet/be_cmds.h
@@ -759,7 +759,8 @@ extern int be_cmd_set_flow_control(struct be_adapter *adapter,
 			u32 tx_fc, u32 rx_fc);
 extern int be_cmd_get_flow_control(struct be_adapter *adapter,
 			u32 *tx_fc, u32 *rx_fc);
-extern int be_cmd_query_fw_cfg(struct be_adapter *adapter, u32 *port_num);
+extern int be_cmd_query_fw_cfg(struct be_adapter *adapter,
+			u32 *port_num, u32 *cap);
 extern int be_cmd_reset_function(struct be_adapter *adapter);
 extern void be_process_mcc(struct be_adapter *adapter);
 extern int be_cmd_write_flashrom(struct be_adapter *adapter,
diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index ce11bba..44d2e6a 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -747,9 +747,16 @@ static void be_rx_compl_process(struct be_adapter *adapter,
 			struct be_eth_rx_compl *rxcp)
 {
 	struct sk_buff *skb;
-	u32 vtp, vid;
+	u32 vlanf, vid;
+	u8 vtm;
 
-	vtp = AMAP_GET_BITS(struct amap_eth_rx_compl, vtp, rxcp);
+	vlanf = AMAP_GET_BITS(struct amap_eth_rx_compl, vtp, rxcp);
+	vtm = AMAP_GET_BITS(struct amap_eth_rx_compl, vtm, rxcp);
+
+	/* vlanf could be wrongly set in some cards.
+	 * ignore if vtm is not set */
+	if ((adapter->cap == 0x400) && !vtm)
+		vlanf = 0;
 
 	skb = netdev_alloc_skb(adapter->netdev, BE_HDR_LEN + NET_IP_ALIGN);
 	if (!skb) {
@@ -772,7 +779,7 @@ static void be_rx_compl_process(struct be_adapter *adapter,
 	skb->protocol = eth_type_trans(skb, adapter->netdev);
 	skb->dev = adapter->netdev;
 
-	if (vtp) {
+	if (vlanf) {
 		if (!adapter->vlan_grp || adapter->num_vlans == 0) {
 			kfree_skb(skb);
 			return;
@@ -797,11 +804,18 @@ static void be_rx_compl_process_gro(struct be_adapter *adapter,
 	struct be_eq_obj *eq_obj =  &adapter->rx_eq;
 	u32 num_rcvd, pkt_size, remaining, vlanf, curr_frag_len;
 	u16 i, rxq_idx = 0, vid, j;
+	u8 vtm;
 
 	num_rcvd = AMAP_GET_BITS(struct amap_eth_rx_compl, numfrags, rxcp);
 	pkt_size = AMAP_GET_BITS(struct amap_eth_rx_compl, pktsize, rxcp);
 	vlanf = AMAP_GET_BITS(struct amap_eth_rx_compl, vtp, rxcp);
 	rxq_idx = AMAP_GET_BITS(struct amap_eth_rx_compl, fragndx, rxcp);
+	vtm = AMAP_GET_BITS(struct amap_eth_rx_compl, vtm, rxcp);
+
+	/* vlanf could be wrongly set in some cards.
+	 * ignore if vtm is not set */
+	if ((adapter->cap == 0x400) && !vtm)
+		vlanf = 0;
 
 	skb = napi_get_frags(&eq_obj->napi);
 	if (!skb) {
@@ -2045,7 +2059,8 @@ static int be_hw_up(struct be_adapter *adapter)
 	if (status)
 		return status;
 
-	status = be_cmd_query_fw_cfg(adapter, &adapter->port_num);
+	status = be_cmd_query_fw_cfg(adapter,
+				&adapter->port_num, &adapter->cap);
 	return status;
 }
 
-- 
1.6.0.4


^ permalink raw reply related

* Re: [PATCH 3/3] bonding: send ARP requests on interfaces other than the primary for tlb/alb
From: Eric Dumazet @ 2009-09-30  7:49 UTC (permalink / raw)
  To: David Miller; +Cc: fubar, netdev, andy
In-Reply-To: <20090930.002721.160750704.davem@davemloft.net>

David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 30 Sep 2009 07:20:23 +0200
> 
>> Jay Vosburgh a écrit :
>>> -		/* The ARP relpy packets must be delayed so that
>>> -		 * they can cancel out the influence of the ARP request.
>>> -		 */
>> COmments should have the following form :
>> /*
>>  * This is a fine comment
>>  */
> 
> I definitely prefer what is used in the patch, whereas your suggestion
> wastes a precious line on the screen and doesn't look markedly better
> at all.
> 
> Look at any TCP protocol source file and you'll see the consistent
> application of:
> 
> 	/* This form of
> 	 * comment.
> 	 */
> 
> there.

No problem David, I dont want to waste time to discuss this kind of stuff,
but you know my own opinion, and I know yours as well !

As a non native, some things that are painful details for you definitly help
my brain to decode English sentences. Uppercase on first letter, final point,
spelling, ...


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox