* linux-next: build warnings after merge of the net tree
From: Stephen Rothwell @ 2010-06-16 3:38 UTC
To: David Miller, netdev; +Cc: linux-next, linux-kernel, Ben Hutchings
Hi Dave,
After merging the net tree, today's linux-next build (x86_64 allmodconfig)
produced these warnings:
In file included from drivers/usb/gadget/ether.c:123:
drivers/usb/gadget/rndis.c: In function 'gen_ndis_query_resp':
drivers/usb/gadget/rndis.c:197: warning: assignment from incompatible pointer type
In file included from drivers/usb/gadget/multi.c:67:
drivers/usb/gadget/rndis.c: In function 'gen_ndis_query_resp':
drivers/usb/gadget/rndis.c:197: warning: assignment from incompatible pointer type
In file included from drivers/usb/gadget/g_ffs.c:30:
drivers/usb/gadget/rndis.c: In function 'gen_ndis_query_resp':
drivers/usb/gadget/rndis.c:197: warning: assignment from incompatible pointer type
Introduced by commit be1f3c2c027cc5ad735df6a45a542ed1db7ec48b ("net:
Enable 64-bit net device statistics on 32-bit architectures"). This is a
call to dev_get_stats() and the return value is being assigned to a
"struct net_device_stats *".
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
* Re: [0/8] netpoll/bridge fixes
From: Herbert Xu @ 2010-06-16 3:33 UTC
To: Eric Dumazet
Cc: David Miller, Paul E. McKenney, shemminger, mst, frzhang, netdev,
amwang, mpm
In-Reply-To: <1276657400.19249.53.camel@edumazet-laptop>
On Wed, Jun 16, 2010 at 05:03:20AM +0200, Eric Dumazet wrote:
>
> I wonder how these patches were tested, Herbert ?
You know, not everyone enables RCU debugging...
Anyway, this patch should fix the problems you've spotted.
netpoll: Use correct primitives for RCU dereferencing
Now that RCU debugging checks for matching rcu_dereference calls
and rcu_read_lock, we need to use the correct primitives or face
nasty warnings.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index 4c77fe7..413742c 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -64,7 +64,7 @@ static inline bool netpoll_rx(struct sk_buff *skb)
bool ret = false;
rcu_read_lock_bh();
- npinfo = rcu_dereference(skb->dev->npinfo);
+ npinfo = rcu_dereference_bh(skb->dev->npinfo);
if (!npinfo || (list_empty(&npinfo->rx_np) && !npinfo->rx_flags))
goto out;
@@ -82,7 +82,7 @@ out:
static inline int netpoll_rx_on(struct sk_buff *skb)
{
- struct netpoll_info *npinfo = rcu_dereference(skb->dev->npinfo);
+ struct netpoll_info *npinfo = rcu_dereference_bh(skb->dev->npinfo);
return npinfo && (!list_empty(&npinfo->rx_np) || npinfo->rx_flags);
}
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: Proposed linux kernel changes : scaling tcp/ip stack : 2nd part
From: Mitchell Erblich @ 2010-06-16 3:30 UTC
To: Mitchell Erblich; +Cc: Eric Dumazet, netdev
In-Reply-To: <97746864-ED54-4A12-AFE7-752AA6E41CDD@earthlink.net>
On Jun 15, 2010, at 8:11 PM, Mitchell Erblich wrote:
Sorry :),
2nd part:
use GFP_NOWAIT as 2nd arg to alloc_skb()
Mitchell Erblich
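A userspace sketch of how GFP_NOWAIT differs from the flags discussed in this proposal, modeled on the 2.6.35-era compositions in <linux/gfp.h> as I understand them: GFP_ATOMIC is __GFP_HIGH, GFP_NOWAIT is GFP_ATOMIC with __GFP_HIGH cleared, and GFP_KERNEL includes __GFP_WAIT. The SKETCH_* bit values are invented for illustration, and the real page allocator applies further watermark rules to non-sleeping requests that this sketch does not model, so "does not request reserves" here is approximate.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the kernel's __GFP_* values differ. */
#define SKETCH_GFP_WAIT 0x1u /* allocator may sleep and reclaim      */
#define SKETCH_GFP_HIGH 0x2u /* asks for the high-priority reserves   */
#define SKETCH_GFP_IO   0x4u /* may start physical I/O during reclaim */
#define SKETCH_GFP_FS   0x8u /* may call into filesystem code         */

/* Compositions modeled on the 2.6.35-era definitions (assumption). */
#define SKETCH_GFP_ATOMIC (SKETCH_GFP_HIGH)
#define SKETCH_GFP_KERNEL (SKETCH_GFP_WAIT | SKETCH_GFP_IO | SKETCH_GFP_FS)
#define SKETCH_GFP_NOWAIT (SKETCH_GFP_ATOMIC & ~SKETCH_GFP_HIGH)

static bool may_sleep(unsigned int gfp)
{
	return (gfp & SKETCH_GFP_WAIT) != 0;
}

static bool requests_reserves(unsigned int gfp)
{
	return (gfp & SKETCH_GFP_HIGH) != 0;
}

/* GFP_NOWAIT neither sleeps nor asks for the __GFP_HIGH boost,
 * so under pressure it simply fails. */
_Static_assert(!(SKETCH_GFP_NOWAIT & SKETCH_GFP_WAIT), "NOWAIT must not sleep");
_Static_assert(!(SKETCH_GFP_NOWAIT & SKETCH_GFP_HIGH), "NOWAIT must not ask for reserves");
```

In tcp_send_ack() terms, the proposal amounts to swapping GFP_ATOMIC for GFP_NOWAIT in the alloc_skb() call; the NULL-return path right after that call is what would then have to absorb the more frequent failures.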
* Re: Proposed linux kernel changes : scaling tcp/ip stack
From: Mitchell Erblich @ 2010-06-16 3:11 UTC
To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1275556440.2456.19.camel@edumazet-laptop>
On Jun 3, 2010, at 2:14 AM, Eric Dumazet wrote:
> On Thursday, 03 June 2010 at 01:16 -0700, Mitchell Erblich wrote:
>> To whom it may concern,
>>
>> First, my assumption is to keep this discussion local to just a few tcp/ip
>> developers to see if there is any consensus that the below is a logical
>> approach. Please also pass this email on if there is an "owner(s)" of this stack
>> to identify if a case exists for the below possible changes.
>>
>> I am not currently on the linux kernel mail group.
>>
>> I have experience with modifications of the Linux tcp/ip stack, and have
>> merged the changes into the company's local tree and left the possible
>> global integration to others.
>>
>> I have been approached by a number of companies about scaling the
>> stack with the assumption of a number of cpu cores. At present, I find extra
>> time on my hands and am considering looking into this area on my own.
>>
>> The first assumption is that if extra cores are available, that a single
>> received homogeneous flow of a large number of packets/segments per
>> second (pps) can be split into non-equal flows. This split can in effect
>> allow a larger recv'd pps rate at the same core load while splitting off
>> other workloads, such as xmit'ing pure ACKs.
>>
>> Simply, again assuming Amdahl's law (and not looking to equalize the load
>> between cores), and creating logical separations where in a many core
>> system, different cores could have new kernel threads that operate in
>> parallel within the tcp/ip stack. The initial separation points would be at
>> the ip/tcp layer boundary and where any recv'd sk/pkt would generate some
>> form of output.
>>
>> The ip/tcp layer would be split like the vintage AT&T STREAMS protocol;
>> some form of queuing & scheduling would be needed. In addition,
>> the queuing/scheduling of other kernel threads would occur within ip & tcp
>> to separate the I/O.
>>
>> A possible validation test is to identify the max recv'd pps rate within the
>> tcp/ip modules within normal flow TCP established state with normal order
>> of say 64byte non fragmented segments, before and after each
>> incremental change. Or the same rate with fewer core/cpu cycles.
>>
>> I am willing to have a private git Linux.org tree that concentrates proposed
>> changes into this tree and if there is willingness, a seen want/need then identify
>> how to implement the merge.
>
> Hi Mitchell
>
> We work everyday to improve network stack, and standard linux tree is
> pretty scalable, you don't need to set up a separate git tree for that.
>
> Our beloved maintainer David S. Miller handles two trees, net-2.6 and
> net-next-2.6 where we put all our changes.
>
> http://git.kernel.org/?p=linux/kernel/git/davem/net-next-2.6.git
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6.git
>
> I suggest you read the last patches (say .. about 10.000 of them), to
> have an idea of things we did during last years.
>
> keywords : RCU, multiqueue, RPS, percpu data, lockless algos, cache line
> placement...
>
> It's nice to see another man joining the team!
>
> Thanks
>
Let's start with a two-part Linux kernel change and a TCP input/output change:
2 Parts: 2nd part TBD
Summary: Don't use the last free pages for TCP ACKs with GFP_ATOMIC for our
skbuf allocs. A 1-line change in tcp_output.c with a new gfp.h arg, and a change
in the generic kernel. TBD.
This change should have no effect when kernel memory is readily available.
Assuming memory pressure (WAITING for clean memory), we should be allocating
our last pages for input skbufs and not for xmit allocs.
By delaying skbuf allocations when we have low kmem, we secondarily slow down the
TCP flow: if in slow start (SS), we are almost doing a DELACK; else CA should/could
decrease the number of in-flight ACKs, and the peer should do burst avoidance
if our later ACK increases the window by a larger chunk.
And use the last pages to decrease the chance of dropping an input pkt or
running out of recv descriptors because of memory back pressure.
The change could check for some form of memory pressure before the alloc,
but the alloc in itself should suffice. We could also do an ECN-type check before
the alloc.
Now the kicker. I want a GFP_KERNEL with NO_SLEEP, or a GFP_ATOMIC that does
NOT use emergency pools and thus CAN FAIL, to have 0 other secondary effects
and change just the 1 arg.
code : tcp_output.c : tcp_send_ack()
line : buff = alloc_skb(MAX_TCP_HEADER, GFP_KERNEL_NSLEEP); /* with a NO SLEEP */
Suggestions, feedback?
Mitchell Erblich
* Re: [0/8] netpoll/bridge fixes
From: Eric Dumazet @ 2010-06-16 3:03 UTC
To: David Miller
Cc: Paul E. McKenney, herbert, shemminger, mst, frzhang, netdev,
amwang, mpm
In-Reply-To: <1276657139.19249.50.camel@edumazet-laptop>
On Wednesday, 16 June 2010 at 04:59 +0200, Eric Dumazet wrote:
> On Tuesday, 15 June 2010 at 11:39 -0700, David Miller wrote:
> > From: Herbert Xu <herbert@gondor.apana.org.au>
> > Date: Fri, 11 Jun 2010 12:11:42 +1000
> >
> > > On Fri, Jun 11, 2010 at 08:48:39AM +1000, Herbert Xu wrote:
> > >> On Thu, Jun 10, 2010 at 02:59:15PM -0700, Stephen Hemminger wrote:
> > >> >
> > >> > Okay, then add a comment where in_irq is used?
> > >>
> > >> Actually let me put it into a wrapper. I'll respin the patches.
> > >
> > > OK here is a repost. And this time it really is 8 patches :)
> > > I've tested it lightly.
> >
> > All applied to net-next-2.6, thanks Herbert.
>
For this second splat, I don't know yet how to fix it; it's 5 in the
morning here and I need some sleep ;)
At this point, no rcu_lock is held.
I wonder how these patches were tested, Herbert?
[ 74.431712]
[ 74.431713] ===================================================
[ 74.431717] [ INFO: suspicious rcu_dereference_check() usage. ]
[ 74.431719] ---------------------------------------------------
[ 74.431722] include/linux/netpoll.h:85 invoked rcu_dereference_check() without protection!
[ 74.431725]
[ 74.431726] other info that might help us debug this:
[ 74.431727]
[ 74.431730]
[ 74.431730] rcu_scheduler_active = 1, debug_locks = 1
[ 74.431733] no locks held by swapper/0.
[ 74.431735]
[ 74.431736] stack backtrace:
[ 74.431739] Pid: 0, comm: swapper Not tainted 2.6.35-rc1-00508-gdbe3a24-dirty #78
[ 74.431742] Call Trace:
[ 74.431748] [<c132cf0c>] ? printk+0xf/0x13
[ 74.431754] [<c1059ac6>] lockdep_rcu_dereference+0x74/0x7d
[ 74.431759] [<c1297628>] __napi_gro_receive+0x4d/0xf6
[ 74.431764] [<c12977a3>] napi_gro_receive+0x19/0x24
[ 74.431775] [<f805d83f>] bnx2x_rx_int+0x101b/0x124e [bnx2x]
[ 74.431781] [<c1050ffc>] ? async_thread+0x198/0x1de
[ 74.431787] [<c129580f>] ? net_tx_action+0x9a/0x12a
[ 74.431797] [<f805f267>] bnx2x_poll+0x5d/0x18b [bnx2x]
[ 74.431801] [<c1297360>] ? net_rx_action+0x1e4/0x21a
[ 74.431805] [<c105ccb2>] ? trace_hardirqs_on_caller+0xe2/0x11c
[ 74.431810] [<c1297218>] net_rx_action+0x9c/0x21a
[ 74.431814] [<c1039a21>] __do_softirq+0x126/0x277
[ 74.431819] [<c10398fb>] ? __do_softirq+0x0/0x277
[ 74.431821] <IRQ> [<c1039c0d>] ? irq_exit+0x38/0x74
[ 74.431828] [<c1003d1f>] ? do_IRQ+0x87/0x9b
[ 74.431833] [<c1002d2e>] ? common_interrupt+0x2e/0x34
[ 74.431838] [<c105007b>] ? sched_clock_local+0x3f/0x11f
[ 74.431843] [<c11ba45b>] ? acpi_idle_enter_bm+0x271/0x2a0
[ 74.431848] [<c12797bd>] ? cpuidle_idle_call+0x76/0x151
[ 74.431852] [<c1001565>] ? cpu_idle+0x49/0x76
[ 74.431857] [<c1319ece>] ? rest_init+0xd6/0xdb
[ 74.431861] [<c156579f>] ? start_kernel+0x31b/0x320
[ 74.431865] [<c15650c9>] ? i386_start_kernel+0xc9/0xd0
* Re: [0/8] netpoll/bridge fixes
From: Eric Dumazet @ 2010-06-16 2:58 UTC
To: David Miller, Paul E. McKenney
Cc: herbert, shemminger, mst, frzhang, netdev, amwang, mpm
In-Reply-To: <20100615.113940.245399246.davem@davemloft.net>
On Tuesday, 15 June 2010 at 11:39 -0700, David Miller wrote:
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Fri, 11 Jun 2010 12:11:42 +1000
>
> > On Fri, Jun 11, 2010 at 08:48:39AM +1000, Herbert Xu wrote:
> >> On Thu, Jun 10, 2010 at 02:59:15PM -0700, Stephen Hemminger wrote:
> >> >
> >> > Okay, then add a comment where in_irq is used?
> >>
> >> Actually let me put it into a wrapper. I'll respin the patches.
> >
> > OK here is a repost. And this time it really is 8 patches :)
> > I've tested it lightly.
>
> All applied to net-next-2.6, thanks Herbert.
Well...
[ 52.914014] ===================================================
[ 52.914018] [ INFO: suspicious rcu_dereference_check() usage. ]
[ 52.914020] ---------------------------------------------------
[ 52.914024] include/linux/netpoll.h:67 invoked rcu_dereference_check() without protection!
[ 52.914027]
[ 52.914027] other info that might help us debug this:
[ 52.914029]
[ 52.914031]
[ 52.914032] rcu_scheduler_active = 1, debug_locks = 1
[ 52.914035] 4 locks held by swapper/0:
[ 52.914037] #0: (&n->timer){+.-...}, at: [<c103fd95>] run_timer_softirq+0x1b8/0x419
[ 52.914052] #1: (slock-AF_INET){+.....}, at: [<c12f2b3d>] icmp_send+0x149/0x58b
[ 52.914063] #2: (rcu_read_lock_bh){.+....}, at: [<c129978d>] dev_queue_xmit+0xf7/0x5df
[ 52.914073] #3: (rcu_read_lock_bh){.+....}, at: [<c12977ae>] netif_rx+0x0/0x195
[ 52.914081]
[ 52.914081] stack backtrace:
[ 52.914086] Pid: 0, comm: swapper Not tainted 2.6.35-rc1-00508-gdbe3a24-dirty #78
[ 52.914089] Call Trace:
[ 52.914095] [<c132cf0c>] ? printk+0xf/0x13
[ 52.914103] [<c1059ac6>] lockdep_rcu_dereference+0x74/0x7d
[ 52.914107] [<c1297819>] netif_rx+0x6b/0x195
[ 52.914111] [<c129978d>] ? dev_queue_xmit+0xf7/0x5df
[ 52.914117] [<c1240775>] loopback_xmit+0x4a/0x70
[ 52.914122] [<c12995cf>] dev_hard_start_xmit+0x25b/0x322
[ 52.914126] [<c1299b5b>] dev_queue_xmit+0x4c5/0x5df
[ 52.914131] [<c105ccf7>] ? trace_hardirqs_on+0xb/0xd
[ 52.914135] [<c129f611>] neigh_resolve_output+0x2e8/0x33f
[ 52.914142] [<c12a8b2a>] ? eth_header+0x0/0x8e
[ 52.914147] [<c12d3dbb>] ip_finish_output+0x323/0x3b1
[ 52.914152] [<c103955f>] ? local_bh_enable_ip+0x97/0xad
[ 52.914156] [<c12d485d>] ip_output+0xe2/0xfe
[ 52.914160] [<c12d3ff5>] ip_local_out+0x41/0x55
[ 52.914164] [<c12d5755>] ip_push_pending_frames+0x284/0x2fa
[ 52.914169] [<c12f218d>] icmp_push_reply+0xe8/0xf3
[ 52.914174] [<c12f2f36>] icmp_send+0x542/0x58b
[ 52.914181] [<c102b6af>] ? find_busiest_group+0x1c9/0x631
[ 52.914188] [<c12cb280>] ipv4_link_failure+0x17/0x7b
[ 52.914193] [<c12f0841>] arp_error_report+0x46/0x61
[ 52.914197] [<c129f8e0>] neigh_invalidate+0x68/0x80
[ 52.914201] [<c12a0bef>] neigh_timer_handler+0x124/0x1d2
[ 52.914206] [<c103fe7b>] run_timer_softirq+0x29e/0x419
[ 52.914210] [<c12a0acb>] ? neigh_timer_handler+0x0/0x1d2
[ 52.914215] [<c1039a21>] __do_softirq+0x126/0x277
[ 52.914219] [<c10398fb>] ? __do_softirq+0x0/0x277
[ 52.914222] <IRQ> [<c1039c0d>] ? irq_exit+0x38/0x74
[ 52.914230] [<c1003d1f>] ? do_IRQ+0x87/0x9b
[ 52.914235] [<c1002d2e>] ? common_interrupt+0x2e/0x34
[ 52.914241] [<c105007b>] ? sched_clock_local+0x3f/0x11f
[ 52.914249] [<c11ba45b>] ? acpi_idle_enter_bm+0x271/0x2a0
[ 52.914256] [<c12797bd>] ? cpuidle_idle_call+0x76/0x151
[ 52.914261] [<c1001565>] ? cpu_idle+0x49/0x76
[ 52.914266] [<c1319ece>] ? rest_init+0xd6/0xdb
[ 52.914274] [<c156579f>] ? start_kernel+0x31b/0x320
[ 52.914278] [<c15650c9>] ? i386_start_kernel+0xc9/0xd0
Paul, could you please explain whether the current lockdep rules are correct, or whether they could be relaxed?
I thought that:
rcu_read_lock_bh();
was shorthand for:
local_bh_disable();
rcu_read_lock();
Why is lockdep not able to make a correct diagnosis?
Thanks
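One plausible reading of the splat above (rcu_read_lock_bh is listed as held, yet rcu_dereference_check() still complains) is that lockdep tracks each RCU flavor separately, so a plain rcu_dereference() only counts rcu_read_lock() sections. The toy model below sketches that assumption in userspace with simple counters; it is not how lockdep is implemented, every model_* name is hypothetical, disabling bottom halves is not modeled, and the real checks may accept additional contexts.

```c
#include <assert.h>

/* Toy nesting counters, one per RCU flavor (real lockdep uses lock maps). */
static int rcu_nesting;    /* depth of model_rcu_read_lock() sections    */
static int rcu_bh_nesting; /* depth of model_rcu_read_lock_bh() sections */
static int splats;         /* number of "suspicious usage" reports       */

static void model_rcu_read_lock(void) { rcu_nesting++; }
static void model_rcu_read_unlock(void) { assert(rcu_nesting > 0); rcu_nesting--; }

/* Disabling bottom halves is not modeled; only the RCU-bh flavor is. */
static void model_rcu_read_lock_bh(void) { rcu_bh_nesting++; }
static void model_rcu_read_unlock_bh(void) { assert(rcu_bh_nesting > 0); rcu_bh_nesting--; }

/* Each checked dereference consults only its own flavor's counter. */
static void *model_rcu_dereference(void *p)
{
	if (rcu_nesting == 0)
		splats++;
	return p;
}

static void *model_rcu_dereference_bh(void *p)
{
	if (rcu_bh_nesting == 0)
		splats++;
	return p;
}
```

In this model the netpoll_rx() pattern (rcu_read_lock_bh() followed by rcu_dereference()) produces a report, while the _bh accessor used by both patches in this thread does not.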
[PATCH net-next-2.6] netpoll: Fix one rcu_dereference() lockdep splat
lockdep doesn't yet allow the following construct:
rcu_read_lock_bh();
npinfo = rcu_dereference(skb->dev->npinfo);
Fix lockdep splat using rcu_dereference_bh()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
include/linux/netpoll.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index 4c77fe7..472365e 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -64,7 +64,7 @@ static inline bool netpoll_rx(struct sk_buff *skb)
bool ret = false;
rcu_read_lock_bh();
- npinfo = rcu_dereference(skb->dev->npinfo);
+ npinfo = rcu_dereference_bh(skb->dev->npinfo);
if (!npinfo || (list_empty(&npinfo->rx_np) && !npinfo->rx_flags))
goto out;
* [PATCH net-next-2.6] inetpeer: do not use zero refcnt for freed entries
From: Eric Dumazet @ 2010-06-16 2:45 UTC
To: David Miller; +Cc: netdev, paulmck
In-Reply-To: <20100615.142506.02275206.davem@davemloft.net>
On Tuesday, 15 June 2010 at 14:25 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 15 Jun 2010 20:23:14 +0200
>
> > inetpeer currently uses an AVL tree protected by an rwlock.
> >
> > It's possible to make most lookups use RCU
> ...
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>
> Applied, nice work Eric.
Thanks David !
Re-reading the patch, I realize refcnt is expected to be 0 for unused entries
(obviously), so we should use a different marker for 'about to be freed'
ones.
Thanks
[PATCH net-next-2.6] inetpeer: do not use zero refcnt for freed entries
Followup of commit aa1039e73cc2 (inetpeer: RCU conversion)
Unused inet_peer entries have a zero refcnt.
Using atomic_inc_not_zero() in RCU lookups is not going to work for
them, and the slow path is taken.
Fix this by using a -1 marker instead of 0 for deleted entries.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
net/ipv4/inetpeer.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index 58fbc7e..39a14ba 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -187,7 +187,12 @@ static struct inet_peer *lookup_rcu_bh(__be32 daddr)
while (u != peer_avl_empty) {
if (daddr == u->v4daddr) {
- if (unlikely(!atomic_inc_not_zero(&u->refcnt)))
+ /* Before taking a reference, check if this entry was
+ * deleted, unlink_from_pool() sets refcnt=-1 to make
+ * distinction between an unused entry (refcnt=0) and
+ * a freed one.
+ */
+ if (unlikely(!atomic_add_unless(&u->refcnt, 1, -1)))
u = NULL;
return u;
}
@@ -322,8 +327,9 @@ static void unlink_from_pool(struct inet_peer *p)
* in cleanup() function to prevent sudden disappearing. If we can
* atomically (because of lockless readers) take this last reference,
* it's safe to remove the node and free it later.
+ * We use refcnt=-1 to alert lockless readers this entry is deleted.
*/
- if (atomic_cmpxchg(&p->refcnt, 1, 0) == 1) {
+ if (atomic_cmpxchg(&p->refcnt, 1, -1) == 1) {
struct inet_peer **stack[PEER_MAXDEPTH];
struct inet_peer ***stackptr, ***delp;
if (lookup(p->v4daddr, stack) != p)
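A userspace C11 sketch of the marker scheme in this patch: 0 still means "unused but alive", and only -1 means "being freed". The add_unless() helper is a hand-written equivalent of the kernel's atomic_add_unless() built on compare-exchange; struct peer_sketch and the other names are hypothetical, and the AVL tree, RCU and the actual freeing are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct inet_peer: only the refcount matters. */
struct peer_sketch {
	atomic_int refcnt; /* 0 = unused but live, >0 = in use, -1 = being freed */
};

/* Add a to *v unless *v == u; returns true if the add happened. */
static bool add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
		/* On failure, old was reloaded with the current value; retry. */
	}
	return false;
}

/* The old lookup rule: also refuses unused (refcnt == 0) entries. */
static bool inc_not_zero(atomic_int *v)
{
	return add_unless(v, 1, 0);
}

/* New lookup rule: take a reference unless the entry is marked deleted. */
static bool peer_try_get(struct peer_sketch *p)
{
	return add_unless(&p->refcnt, 1, -1);
}

/* Unlink rule: only the holder of the last reference may swap 1 -> -1. */
static bool peer_mark_deleted(struct peer_sketch *p)
{
	int expected = 1;

	return atomic_compare_exchange_strong(&p->refcnt, &expected, -1);
}
```

Because the compare-exchange only succeeds from 1, a concurrent lookup that has already raised the count makes the unlink back off, and a lookup that arrives after -1 is stored fails cleanly instead of resurrecting the entry.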
* Re: [PATCH] virtio_net: implements ethtool_ops.get_drvinfo
From: Rusty Russell @ 2010-06-16 1:54 UTC
To: Taku Izumi; +Cc: David S. Miller, netdev@vger.kernel.org, Michael S. Tsirkin
In-Reply-To: <4C170D9E.5090407@jp.fujitsu.com>
On Tue, 15 Jun 2010 02:50:30 pm Taku Izumi wrote:
> Hi Rusty,
>
> (2010/06/15 13:28), Rusty Russell wrote:
> > On Fri, 11 Jun 2010 10:59:02 am Taku Izumi wrote:
> >> This patch implements ethtool_ops.get_drvinfo interface of virtio_net driver.
> >>
> >> Signed-off-by: Taku Izumi<izumi.taku@jp.fujitsu.com>
> >
> > Hi Taku!
> >
> > Does this have any useful effect?
>
> I often use "ethtool -i" command to check what driver controls the Ethernet device.
> But because current virtio_net driver doesn't support "ethtool -i", it becomes the
> following:
>
> # ethtool -i eth3
> Cannot get driver information: Operation not supported
>
> My patch simply adds the "ethtool -i" support. The following is the result when
> using the virtio_net driver with my patch applied to.
>
> # ethtool -i eth3
> driver: virtio_net
> version: N/A
> firmware-version: N/A
> bus-info: virtio0
>
> Personally, "-i" is one of the most frequently-used option, and
> most network drivers support "ethtool -i", so I think virtio_net also should do.
Thanks, Taku.
I put this explanation in the commit message, and changed 32 to ARRAY_SIZE().
It's queued for sending to DaveM for the next merge window.
Result below.
Thanks!
Rusty.
Subject: virtio_net: implements ethtool_ops.get_drvinfo
Date: Fri, 11 Jun 2010 10:29:02 +0900
From: Taku Izumi <izumi.taku@jp.fujitsu.com>
I often use the "ethtool -i" command to check which driver controls an
Ethernet device. But because the current virtio_net driver doesn't
support "ethtool -i", the result is the following:
# ethtool -i eth3
Cannot get driver information: Operation not supported
This patch simply adds "ethtool -i" support. The following is the
result when using the virtio_net driver with this patch applied:
# ethtool -i eth3
driver: virtio_net
version: N/A
firmware-version: N/A
bus-info: virtio0
Personally, "-i" is one of the most frequently used options, and most
network drivers support "ethtool -i", so I think virtio_net should
too.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (use ARRAY_SIZE)
---
0 files changed
Index: net-next.35/drivers/net/virtio_net.c
===================================================================
--- net-next.35.orig/drivers/net/virtio_net.c
+++ net-next.35/drivers/net/virtio_net.c
@@ -701,6 +701,19 @@ static int virtnet_close(struct net_devi
return 0;
}
+static void virtnet_get_drvinfo(struct net_device *dev,
+ struct ethtool_drvinfo *drvinfo)
+{
+ struct virtnet_info *vi = netdev_priv(dev);
+ struct virtio_device *vdev = vi->vdev;
+
+ strncpy(drvinfo->driver, KBUILD_MODNAME, ARRAY_SIZE(drvinfo->driver));
+ strncpy(drvinfo->version, "N/A", ARRAY_SIZE(drvinfo->version));
+ strncpy(drvinfo->fw_version, "N/A", ARRAY_SIZE(drvinfo->fw_version));
+ strncpy(drvinfo->bus_info, dev_name(&vdev->dev),
+ ARRAY_SIZE(drvinfo->bus_info));
+}
+
static int virtnet_set_tx_csum(struct net_device *dev, u32 data)
{
struct virtnet_info *vi = netdev_priv(dev);
@@ -813,6 +825,7 @@ static void virtnet_vlan_rx_kill_vid(str
}
static const struct ethtool_ops virtnet_ethtool_ops = {
+ .get_drvinfo = virtnet_get_drvinfo,
.set_tx_csum = virtnet_set_tx_csum,
.set_sg = ethtool_op_set_sg,
.set_tso = ethtool_op_set_tso,
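A userspace sketch of the same fill-in pattern, assuming 32-byte fields as in struct ethtool_drvinfo (drvinfo_sketch below is a hypothetical cut-down copy, and ARRAY_SIZE is redefined locally). Note that strncpy() bounded by ARRAY_SIZE() leaves a field unterminated if the source string is 32 bytes or longer; that cannot happen with the short strings used here, but the sketch forces termination to make the bound explicit.

```c
#include <assert.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical cut-down copy of struct ethtool_drvinfo (32-byte fields assumed). */
struct drvinfo_sketch {
	char driver[32];
	char version[32];
	char fw_version[32];
	char bus_info[32];
};

/* Copy src into a fixed-size field, always leaving it NUL-terminated. */
static void copy_field(char *dst, size_t size, const char *src)
{
	strncpy(dst, src, size);
	dst[size - 1] = '\0';
}

static void fill_drvinfo(struct drvinfo_sketch *info, const char *modname,
			 const char *bus_name)
{
	copy_field(info->driver, ARRAY_SIZE(info->driver), modname);
	copy_field(info->version, ARRAY_SIZE(info->version), "N/A");
	copy_field(info->fw_version, ARRAY_SIZE(info->fw_version), "N/A");
	copy_field(info->bus_info, ARRAY_SIZE(info->bus_info), bus_name);
}
```

With "virtio_net" and "virtio0" as inputs, the four fields hold exactly the strings that the "ethtool -i eth3" output above reports.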
* Re: [PATCH net-next-2.6] net: NET_SKB_PAD should depend on L1_CACHE_BYTES
From: David Miller @ 2010-06-16 1:16 UTC
To: eric.dumazet
Cc: alexander.h.duyck, jeffrey.t.kirsher, mingo, tglx, hpa, x86,
linux-kernel, netdev, gospo
In-Reply-To: <1276520234.2478.82.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 14 Jun 2010 14:57:14 +0200
> [PATCH net-next-2.6] net: NET_SKB_PAD should depend on L1_CACHE_BYTES
>
> In old kernels, NET_SKB_PAD was defined to 16.
>
> Then commit d6301d3dd1c2 (net: Increase default NET_SKB_PAD to 32), and
> commit 18e8c134f4e9 (net: Increase NET_SKB_PAD to 64 bytes) increased it
> to 64.
>
> While first patch was governed by network stack needs, second was more
> driven by performance issues on current hardware. Real intent was to
> align data on a cache line boundary.
>
> So use max(32, L1_CACHE_BYTES) instead of 64, to be more generic.
>
> Remove microblaze and powerpc own NET_SKB_PAD definitions.
>
> Thanks to Alexander Duyck and David Miller for their comments.
>
> Suggested-by: David Miller <davem@davemloft.net>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Applied, thanks Eric.
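The padding rule in the quoted patch can be sketched as a plain function. net_skb_pad_for() is a hypothetical helper; the patch itself expresses the same rule as max(32, L1_CACHE_BYTES), evaluated at build time for the target architecture.

```c
#include <assert.h>

/* Hypothetical helper: max(32, L1_CACHE_BYTES) for a given cache line size. */
static unsigned int net_skb_pad_for(unsigned int l1_cache_bytes)
{
	/* Cache line sizes are assumed to be non-zero powers of two. */
	assert(l1_cache_bytes != 0 && (l1_cache_bytes & (l1_cache_bytes - 1)) == 0);
	return l1_cache_bytes > 32 ? l1_cache_bytes : 32;
}
```

An architecture with 16-byte lines thus keeps the 32-byte floor that the network stack needs, while one with 128-byte lines gets a full cache line of headroom instead of a fixed 64.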
* Re: [PATCH 0/3]netxen: bug fixes
From: David Miller @ 2010-06-16 1:15 UTC
To: amit.salecha; +Cc: netdev, ameen.rahman
In-Reply-To: <1276508345-17070-1-git-send-email-amit.salecha@qlogic.com>
From: Amit Kumar Salecha <amit.salecha@qlogic.com>
Date: Mon, 14 Jun 2010 02:39:02 -0700
> Sending series of 3 bug fixes. Please apply them on net-2.6.
All applied, thanks.
* Re: [PATCH net-next-2.6] ipfrag : frag_kfree_skb() cleanup
From: David Miller @ 2010-06-16 1:13 UTC
To: eric.dumazet; +Cc: netdev
In-Reply-To: <1276507363.2478.43.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 14 Jun 2010 11:22:43 +0200
> Third param (work) is unused, remove it.
>
> Remove __inline__ and inline qualifiers.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Applied.
* Re: [PATCH net-next-2.6] ip_frag: Remove some atomic ops
From: David Miller @ 2010-06-16 1:13 UTC
To: eric.dumazet; +Cc: netdev
In-Reply-To: <1276506144.2478.40.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 14 Jun 2010 11:02:24 +0200
> Instead of doing one atomic operation per frag, we can factorize them.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Applied.
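A userspace C11 sketch of the factorization idea described in this patch: accumulate the fragments' sizes in a local variable and issue one atomic subtraction for the whole chain instead of one per fragment. struct frag_sketch and both release functions are hypothetical; this is not the code of the applied patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical fragment node; only the accounted size matters here. */
struct frag_sketch {
	struct frag_sketch *next;
	int truesize;
};

/* Before: one atomic read-modify-write per fragment. */
static void release_each(atomic_int *mem, struct frag_sketch *head)
{
	for (; head; head = head->next)
		atomic_fetch_sub(mem, head->truesize);
}

/* After: sum locally, then a single atomic operation for the whole chain. */
static void release_batched(atomic_int *mem, struct frag_sketch *head)
{
	int sum = 0;

	for (; head; head = head->next)
		sum += head->truesize;
	atomic_fetch_sub(mem, sum);
}
```

Both variants leave the counter at the same final value; the batched one simply touches the shared cache line once per chain instead of once per fragment.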
* Re: [PATCH net-next-2.6] ipv6: syncookies: do not skip ->iif initialization
From: David Miller @ 2010-06-16 1:10 UTC
To: fw; +Cc: netdev, ggriffin.kernel
In-Reply-To: <1276464579-4399-1-git-send-email-fw@strlen.de>
From: Florian Westphal <fw@strlen.de>
Date: Sun, 13 Jun 2010 23:29:39 +0200
> When syncookies are in effect, req->iif is left uninitialized.
> In case of e.g. link-local addresses the route lookup then fails
> and no syn-ack is sent.
>
> Rearrange things so ->iif is also initialized in the syncookie case.
>
> want_cookie can only be true when the isn was zero, thus move the want_cookie
> check into the "!isn" branch.
>
> Cc: Glenn Griffin <ggriffin.kernel@gmail.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
Applied, thanks.
* Re: [PATCH net-next-2.6] syncookies: check decoded options against sysctl settings
From: David Miller @ 2010-06-16 1:09 UTC
To: fw; +Cc: netdev
In-Reply-To: <1276464875-4460-1-git-send-email-fw@strlen.de>
From: Florian Westphal <fw@strlen.de>
Date: Sun, 13 Jun 2010 23:34:35 +0200
> - if (tcp_opt->sack_ok)
> - tcp_sack_reset(tcp_opt);
> + if (tcp_opt->sack_ok && !sysctl_tcp_sack)
> + return false;
>
If you remove the tcp_sack_reset() call here, who is going to
do it?
* Re: RX/close vcc race with solos/atmtcp/usbatm/he
From: Nathan Williams @ 2010-06-16 0:33 UTC
To: David Woodhouse; +Cc: linux-atm-general, netdev
In-Reply-To: <1275904970.17903.4658.camel@macbook.infradead.org>
On 7/06/2010 8:02 PM, David Woodhouse wrote:
> On Wed, 2010-05-26 at 12:16 +0100, David Woodhouse wrote:
>> I've had this crash reported to me...
>>
>> [18842.727906] EIP: [<e082f490>] br2684_push+0x19/0x234 [br2684]
>> SS:ESP 0068:dfb89d14
>
> Nathan, did you manage to get your customer to confirm that this fixes
> the problem? It'd be useful to get this into 2.6.35 and -stable.
>
No, I still haven't heard back from the customer. I'll keep sending email reminders.
* Re: [PATCH] drivers/staging/batman-adv: Use (pr|netdev)_<level> macro helpers
From: Joe Perches @ 2010-06-15 22:58 UTC
To: Sven Eckelmann, netdev
Cc: devel, Greg Kroah-Hartman,
b.a.t.m.a.n-ZwoEplunGu2X36UT3dwlltHuzzzSOjJt,
b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Simon Wunderlich,
Marek Lindner
In-Reply-To: <201006160037.48573.sven.eckelmann-Mmb7MZpHnFY@public.gmane.org>
On Wed, 2010-06-16 at 00:37 +0200, Sven Eckelmann wrote:
> Sven Eckelmann wrote:
Hi Sven.
> > The problem seems to be that dev_printk is used by netdev_printk (which is
> > used by netdev_info). netdev_printk will add (netdev)->dev.parent as second
> > parameter of dev_printk (and parent is NULL in our case). This macro will
> > now call dev_driver_string with NULL as parameter and just dereference
> > this null pointer.
> >
> > Maybe it is related to something else, but at least I think that this could
> > be the cause of the crash.
Nope, I think that's exactly correct.
> As far as I understand, the netdev_* stuff is made to be used by real drivers
> with more or less physical hardware. batman-adv is a virtual bridge used for
> mesh networks. Like net/bridge/ it has no physical parent device and only
> other net_devices are used inside of it - which may have real physical network
> devices as parents.
> Please correct me if my assumption is wrong.
No correction necessary...
netdev_printk and netdev_<level> are meant to be used
with parented network devices.
I think that netdev_<level> will eventually do the right
thing when dev->dev.parent is NULL. Right now, that'd
be a bit of an expensive test as it would be expanded in
place for every use of the macro.
Right now it's:
#define netdev_printk(level, netdev, format, args...) \
dev_printk(level, (netdev)->dev.parent, \
"%s: " format, \
netdev_name(netdev), ##args)
It could be something like:
#define netdev_printk(level, netdev, format, args...) \
do { \
if ((netdev)->dev.parent) \
dev_printk(level, (netdev)->dev.parent, \
"%s: " format, \
netdev_name(netdev), ##args); \
else \
printk(level "%s: " format, \
netdev_name(netdev), ##args); \
} while (0)
Unfortunately, that just about doubles the format string space,
so I don't really want to do that.
If/when %pV is accepted,
http://lkml.org/lkml/2010/3/4/17
http://lkml.org/lkml/2010/3/4/18
then the netdev_<level> macros will be converted to functions,
so size reduced with an added test for dev.parent == NULL without
the need to double the string space.
^ permalink raw reply
* Re: [PATCH 2/8] user_ns: Introduce user_nsmap_uid and user_ns_map_gid.
From: Eric W. Biederman @ 2010-06-15 22:37 UTC (permalink / raw)
To: Pavel Emelyanov
Cc: David Miller, Serge Hallyn, Linux Containers, Daniel Lezcano,
netdev
In-Reply-To: <4C173389.1010000@openvz.org>
Pavel Emelyanov <xemul@openvz.org> writes:
> On 06/13/2010 05:28 PM, Eric W. Biederman wrote:
>>
>> Define what happens when we view a uid from one user_namespace
>> in another user_namespace.
>>
>> - If the user namespaces are the same no mapping is necessary.
>>
>> - For most cases of difference use overflowuid and overflowgid,
>> the uid and gid currently used for 16bit apis when we have a 32bit uid
>> that does not fit in 16 bits. Effectively the situation is the same,
>> we want to return a uid or gid that is not assigned to any user.
>>
>> - For the case when we happen to be mapping the uid or gid of the
>> creator of the target user namespace, use uid 0 and gid 0, as confusing
>> that user with root is not a problem.
>>
>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>
> I suppose this one should go via Andrew, not Dave.
If it were standalone I would send it that way.
In this case I hope Dave will indulge me because this bit is
simple, the only user for now is the network stack, and the people
maintaining the code have already acked the patch.
Eric
^ permalink raw reply
* Re: [PATCH] net: Fix error in comment on net_device_ops::ndo_get_stats
From: Ben Hutchings @ 2010-06-15 22:24 UTC (permalink / raw)
To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <20100615.151040.191416638.davem@davemloft.net>
On Tue, 2010-06-15 at 15:10 -0700, David Miller wrote:
> From: Ben Hutchings <bhutchings@solarflare.com>
> Date: Mon, 14 Jun 2010 16:19:41 +0100
>
> > ndo_get_stats still returns struct net_device_stats *; there is
> > no struct net_device_stats64.
> >
> > Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
>
> Applied.
>
> But, I am ignoring every single patch you submit from here on
> out that lacks a proper "net-2.6" or "net-next-2.6" destination
> tree indication in your Subject line.
>
> I've asked this of you at least 3 times, and you seem content to just
> ignore my request. But that's OK, as I think you'll stop ignoring
> me when it starts causing patches you care about to be dropped.
>
> Thanks :-)
Sorry Dave, I thought net-next-2.6 was still the default and I only
needed to make it more obvious when I wanted patches to go to net-2.6.
I've added the suffix to my git configuration now.
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply
* Re: [PATCH] net: Fix error in comment on net_device_ops::ndo_get_stats
From: David Miller @ 2010-06-15 22:10 UTC (permalink / raw)
To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1276528781.2074.0.camel@achroite.uk.solarflarecom.com>
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Mon, 14 Jun 2010 16:19:41 +0100
> ndo_get_stats still returns struct net_device_stats *; there is
> no struct net_device_stats64.
>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Applied.
But, I am ignoring every single patch you submit from here on
out that lacks a proper "net-2.6" or "net-next-2.6" destination
tree indication in your Subject line.
I've asked this of you at least 3 times, and you seem content to just
ignore my request. But that's OK, as I think you'll stop ignoring
me when it starts causing patches you care about to be dropped.
Thanks :-)
^ permalink raw reply
* Re: [PATCH 6/8] scm: Capture the full credentials of the scm sender.
From: Eric W. Biederman @ 2010-06-15 22:08 UTC (permalink / raw)
To: Serge E. Hallyn
Cc: David Miller, Linux Containers, Serge Hallyn, Pavel Emelyanov,
netdev
In-Reply-To: <20100615214541.GA22570@hallyn.com>
"Serge E. Hallyn" <serge@hallyn.com> writes:
> Quoting Eric W. Biederman (ebiederm@xmission.com):
>>
>> Start capturing not only the userspace pid, uid and gid values of the
>> sending process but also the struct pid and struct cred of the sending
>> process as well.
>>
>> This is in preparation for properly supporting SCM_CREDENTIALS for
>> sockets that have different uid and/or pid namespaces at the different
>> ends.
>>
>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>> ---
>> include/net/scm.h | 28 ++++++++++++++++++++++++----
>> net/core/scm.c | 24 ++++++++++++++++++++++++
>> 2 files changed, 48 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/net/scm.h b/include/net/scm.h
>> index 17d9d2e..3165650 100644
>> --- a/include/net/scm.h
>> +++ b/include/net/scm.h
>> @@ -19,6 +19,8 @@ struct scm_fp_list {
>> };
>>
>> struct scm_cookie {
>> + struct pid *pid; /* Skb credentials */
>> + const struct cred *cred;
>> struct scm_fp_list *fp; /* Passed files */
>> struct ucred creds; /* Skb credentials */
>> #ifdef CONFIG_SECURITY_NETWORK
>> @@ -42,8 +44,27 @@ static __inline__ void unix_get_peersec_dgram(struct socket *sock, struct scm_co
>> { }
>> #endif /* CONFIG_SECURITY_NETWORK */
>>
>> +static __inline__ void scm_set_cred(struct scm_cookie *scm,
>> + struct pid *pid, const struct cred *cred)
>> +{
>> + scm->pid = get_pid(pid);
>> + scm->cred = get_cred(cred);
>> + cred_to_ucred(pid, cred, &scm->creds);
>> +}
>> +
>> +static __inline__ void scm_destroy_cred(struct scm_cookie *scm)
>> +{
>> + put_pid(scm->pid);
>> + scm->pid = NULL;
>> +
>> + if (scm->cred)
>> + put_cred(scm->cred);
>> + scm->cred = NULL;
>> +}
>> +
>> static __inline__ void scm_destroy(struct scm_cookie *scm)
>> {
>> + scm_destroy_cred(scm);
>> if (scm && scm->fp)
>> __scm_destroy(scm);
>> }
>> @@ -51,10 +72,7 @@ static __inline__ void scm_destroy(struct scm_cookie *scm)
>> static __inline__ int scm_send(struct socket *sock, struct msghdr *msg,
>> struct scm_cookie *scm)
>> {
>> - struct task_struct *p = current;
>> - scm->creds.uid = current_uid();
>> - scm->creds.gid = current_gid();
>> - scm->creds.pid = task_tgid_vnr(p);
>> + scm_set_cred(scm, task_tgid(current), current_cred());
>> scm->fp = NULL;
>> unix_get_peersec_dgram(sock, scm);
>> if (msg->msg_controllen <= 0)
>> @@ -96,6 +114,8 @@ static __inline__ void scm_recv(struct socket *sock, struct msghdr *msg,
>> if (test_bit(SOCK_PASSCRED, &sock->flags))
>> put_cmsg(msg, SOL_SOCKET, SCM_CREDENTIALS, sizeof(scm->creds), &scm->creds);
>>
>> + scm_destroy_cred(scm);
>> +
>> scm_passec(sock, msg, scm);
>>
>> if (!scm->fp)
>> diff --git a/net/core/scm.c b/net/core/scm.c
>> index b88f6f9..681c976 100644
>> --- a/net/core/scm.c
>> +++ b/net/core/scm.c
>> @@ -170,6 +170,30 @@ int __scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *p)
>> err = scm_check_creds(&p->creds);
>> if (err)
>> goto error;
>> +
>
> I think this hunk needs to be documented. I.e. given that scm_send()
> will call scm_set_cred() before calling __scm_send, I don't see how
> these conditions could happen? If the condition can legitimately
> happen, then given all of the pid_t vs struct pid and 'cred' vs. 'creds'
> in these two hunks, I think a comment over each would be nice.
I think if you have the full context of __scm_send it becomes pretty obvious.
case SCM_CREDENTIALS:
if (cmsg->cmsg_len != CMSG_LEN(sizeof(struct ucred)))
goto error;
memcpy(&p->creds, CMSG_DATA(cmsg), sizeof(struct ucred));
err = scm_check_creds(&p->creds);
if (err)
goto error;
At this point we have just copied ucred from userspace. We have done
scm_check_creds to ensure we allow the user to send the pid, uid, and
gid they have passed in.
These tests catch the case where the user is legitimately sending
something other than their own credentials.
>> + if (pid_vnr(p->pid) != p->creds.pid) {
>> + struct pid *pid;
>> + err = -ESRCH;
>> + pid = find_get_pid(p->creds.pid);
>> + if (!pid)
>> + goto error;
>> + put_pid(p->pid);
>> + p->pid = pid;
>> + }
>> +
>> + if ((p->cred->euid != p->creds.uid) ||
>> + (p->cred->egid != p->creds.gid)) {
>> + struct cred *cred;
>> + err = -ENOMEM;
>> + cred = prepare_creds();
>> + if (!cred)
>> + goto error;
>> +
>> + cred->uid = cred->euid = p->creds.uid;
>> + cred->gid = cred->egid = p->creds.uid;
>> + put_cred(p->cred);
>> + p->cred = cred;
>> + }
>> break;
>> default:
>> goto error;
Eric
^ permalink raw reply
* Re: [PATCH v4] netdev:bfin_mac: reclaim and free tx skb as soon as possible after transfer
From: David Miller @ 2010-06-15 22:04 UTC (permalink / raw)
To: sonic.adi; +Cc: netdev, uclinux-dist-devel
In-Reply-To: <1276250702.30044.2.camel@eight.analog.com>
From: sonic zhang <sonic.adi@gmail.com>
Date: Fri, 11 Jun 2010 18:05:02 +0800
> From 4779e43a5a8446f695f8d6f5a006cfb45dc093d8 Mon Sep 17 00:00:00 2001
> From: Sonic Zhang <sonic.zhang@analog.com>
> Date: Fri, 11 Jun 2010 17:44:31 +0800
> Subject: [PATCH v4] netdev:bfin_mac: reclaim and free tx skb as soon as possible after transfer
>
> SKBs hold onto resources that can't be held indefinitely, such as TCP
> socket references and netfilter conntrack state. So if a packet is left
> in TX ring for a long time, there might be a TCP socket that cannot be
> closed and freed up.
>
> The current Blackfin EMAC driver always reclaims and frees used tx skbs only
> during later transfers. The problem is that a later transfer may not come
> for a long time. This patch starts a timer after each transfer to reclaim and free skbs.
> There is nearly no performance drop with this patch.
>
> TX interrupt is not enabled because of a strange behavior of the Blackfin EMAC.
> If EMAC TX transfer control is turned on, endless TX interrupts are triggered
> no matter if TX DMA is enabled or not. Since DMA walks down the ring automatically,
> TX transfer control can't be turned off in the middle. The only way is to disable
> TX interrupt completely.
>
> Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Applied to net-next-2.6, thanks.
^ permalink raw reply
* Re: [PATCH 2/2] pktgen: receive packets and process incoming rate
From: David Miller @ 2010-06-15 21:59 UTC (permalink / raw)
To: daniel.turull; +Cc: eric.dumazet, netdev, robert, jens.laas, voravit
In-Reply-To: <4C10F117.60800@gmail.com>
From: Daniel Turull <daniel.turull@gmail.com>
Date: Thu, 10 Jun 2010 16:05:11 +0200
> This patch adds a receiver part to pktgen, taking advantage of SMP systems
> with multiple rx queues:
> - Creation of new proc file /proc/net/pktgen/pgrx to control and display the receiver.
> - It uses per-CPU variables to store the results for each CPU.
> - Results displayed per CPU and aggregated.
> - The packet handler is added to the protocol handlers (dev_add_pack())
> - Available statistics: packets and bytes received, work time and rate
> - Only process pktgen packets
> - It is possible to select the incoming interface
> - Documentation updated with the new commands to control the receiver part.
>
> Signed-off-by: Daniel Turull <daniel.turull@gmail.com>
I completely disagree with this patch on two levels:
1) pktgen is for "generating" packets, not receiving them.
Trying to put lipstick on a pig is never a good idea.
2) The information it gathers and shows is completely useless.
What's interesting as "RX work cost" is what happens deep
down in the netif_receive_skb() code paths, IP input, routing,
netfilter, whatever... but that is not what this thing is
measuring at all.
Sorry, I'm not applying this. You can probably do something more
clever with tracepoints.
^ permalink raw reply
* Re: [PATCH 6/8] scm: Capture the full credentials of the scm sender.
From: Serge E. Hallyn @ 2010-06-15 21:45 UTC (permalink / raw)
To: Eric W. Biederman
Cc: David Miller, Linux Containers, Serge Hallyn, Pavel Emelyanov,
netdev
In-Reply-To: <m1d3vvgirx.fsf@fess.ebiederm.org>
Quoting Eric W. Biederman (ebiederm@xmission.com):
>
> Start capturing not only the userspace pid, uid and gid values of the
> sending process but also the struct pid and struct cred of the sending
> process as well.
>
> This is in preparation for properly supporting SCM_CREDENTIALS for
> sockets that have different uid and/or pid namespaces at the different
> ends.
>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
> include/net/scm.h | 28 ++++++++++++++++++++++++----
> net/core/scm.c | 24 ++++++++++++++++++++++++
> 2 files changed, 48 insertions(+), 4 deletions(-)
>
> diff --git a/include/net/scm.h b/include/net/scm.h
> index 17d9d2e..3165650 100644
> --- a/include/net/scm.h
> +++ b/include/net/scm.h
> @@ -19,6 +19,8 @@ struct scm_fp_list {
> };
>
> struct scm_cookie {
> + struct pid *pid; /* Skb credentials */
> + const struct cred *cred;
> struct scm_fp_list *fp; /* Passed files */
> struct ucred creds; /* Skb credentials */
> #ifdef CONFIG_SECURITY_NETWORK
> @@ -42,8 +44,27 @@ static __inline__ void unix_get_peersec_dgram(struct socket *sock, struct scm_co
> { }
> #endif /* CONFIG_SECURITY_NETWORK */
>
> +static __inline__ void scm_set_cred(struct scm_cookie *scm,
> + struct pid *pid, const struct cred *cred)
> +{
> + scm->pid = get_pid(pid);
> + scm->cred = get_cred(cred);
> + cred_to_ucred(pid, cred, &scm->creds);
> +}
> +
> +static __inline__ void scm_destroy_cred(struct scm_cookie *scm)
> +{
> + put_pid(scm->pid);
> + scm->pid = NULL;
> +
> + if (scm->cred)
> + put_cred(scm->cred);
> + scm->cred = NULL;
> +}
> +
> static __inline__ void scm_destroy(struct scm_cookie *scm)
> {
> + scm_destroy_cred(scm);
> if (scm && scm->fp)
> __scm_destroy(scm);
> }
> @@ -51,10 +72,7 @@ static __inline__ void scm_destroy(struct scm_cookie *scm)
> static __inline__ int scm_send(struct socket *sock, struct msghdr *msg,
> struct scm_cookie *scm)
> {
> - struct task_struct *p = current;
> - scm->creds.uid = current_uid();
> - scm->creds.gid = current_gid();
> - scm->creds.pid = task_tgid_vnr(p);
> + scm_set_cred(scm, task_tgid(current), current_cred());
> scm->fp = NULL;
> unix_get_peersec_dgram(sock, scm);
> if (msg->msg_controllen <= 0)
> @@ -96,6 +114,8 @@ static __inline__ void scm_recv(struct socket *sock, struct msghdr *msg,
> if (test_bit(SOCK_PASSCRED, &sock->flags))
> put_cmsg(msg, SOL_SOCKET, SCM_CREDENTIALS, sizeof(scm->creds), &scm->creds);
>
> + scm_destroy_cred(scm);
> +
> scm_passec(sock, msg, scm);
>
> if (!scm->fp)
> diff --git a/net/core/scm.c b/net/core/scm.c
> index b88f6f9..681c976 100644
> --- a/net/core/scm.c
> +++ b/net/core/scm.c
> @@ -170,6 +170,30 @@ int __scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *p)
> err = scm_check_creds(&p->creds);
> if (err)
> goto error;
> +
I think this hunk needs to be documented. I.e. given that scm_send()
will call scm_set_cred() before calling __scm_send, I don't see how
these conditions could happen? If the condition can legitimately
happen, then given all of the pid_t vs struct pid and 'cred' vs. 'creds'
in these two hunks, I think a comment over each would be nice.
> + if (pid_vnr(p->pid) != p->creds.pid) {
> + struct pid *pid;
> + err = -ESRCH;
> + pid = find_get_pid(p->creds.pid);
> + if (!pid)
> + goto error;
> + put_pid(p->pid);
> + p->pid = pid;
> + }
> +
> + if ((p->cred->euid != p->creds.uid) ||
> + (p->cred->egid != p->creds.gid)) {
> + struct cred *cred;
> + err = -ENOMEM;
> + cred = prepare_creds();
> + if (!cred)
> + goto error;
> +
> + cred->uid = cred->euid = p->creds.uid;
> + cred->gid = cred->egid = p->creds.uid;
> + put_cred(p->cred);
> + p->cred = cred;
> + }
> break;
> default:
> goto error;
> --
> 1.6.5.2.143.g8cc62
^ permalink raw reply
* Re: [Bugme-new] [Bug 16187] New: Carrier detection failed in dhcpcd when link is up
From: Andrew Morton @ 2010-06-15 21:24 UTC (permalink / raw)
To: netdev, Grant Grundler, Kyle McMartin
Cc: bugzilla-daemon, bugme-daemon, casteyde.christian
In-Reply-To: <bug-16187-10286@https.bugzilla.kernel.org/>
(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).
On Sat, 12 Jun 2010 15:15:31 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=16187
>
> Summary: Carrier detection failed in dhcpcd when link is up
> Product: Networking
> Version: 2.5
> Kernel Version: 2.6.35-rc2
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Other
> AssignedTo: acme@ghostprotocols.net
> ReportedBy: casteyde.christian@free.fr
> Regression: Yes
>
>
> Created an attachment (id=26742)
> --> (https://bugzilla.kernel.org/attachment.cgi?id=26742)
> lspci output for 2.6.34 on my computer
>
> Kernel: at least 2.6.35-rc2 is affected; 2.6.34 works fine
Seems to be post-2.6.34 breakage in the tulip driver.
> Athlon X2 3000 in 64bits mode
> Slackware 13.1
> "Ethernet controller: ALi Corporation ULi 1689,1573 integrated ethernet. (rev
> 40)" (from lspci)
>
> Since 2.6.35-rc2 (I didn't check -rc1), dhcpcd fails to detect carrier
> appearance at boot.
>
> The Slackware network script uses dhcpcd to bring DHCP interfaces up. At boot,
> it seems my network device doesn't have the link up immediately. dhcpcd tries to
> bring it up, and waits for the carrier. The carrier indeed goes up, but dhcpcd
> gets absolutely no notification of that, and therefore times out.
>
> logs says that:
> Jun 12 16:53:58 sirius logger: /etc/rc.d/rc.inet1: /sbin/route add -net
> 127.0.0.0 netmask 255.0.0.0 lo
> Jun 12 16:53:58 sirius logger: /etc/rc.d/rc.inet1: /sbin/dhcpcd -t 10 eth0
> Jun 12 16:53:58 sirius dhcpcd: version 5.2.2 starting
> Jun 12 16:53:58 sirius kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready
> Jun 12 16:53:58 sirius dhcpcd: eth0: waiting for carrier
> Jun 12 16:54:01 sirius kernel: uli526x: eth0 NIC Link is Up 100 Mbps Full
> duplex
> Jun 12 16:54:01 sirius kernel: ADDRCONF(NETDEV_CHANGE): eth0: link becomes
> ready
> then dhcpcd times out a little later (10 seconds as -t 10 is specified).
>
> If I use the -K option (no carrier detection), and do ifconfig eth0 up just
> before issuing the dhcpcd command, dhcpcd doesn't wait for the carrier and
> gets a lease correctly.
>
> Conversely, with the 2.6.34 kernel, dhcpcd indeed gets the link up notification,
> and gets the lease immediately. In this case, the log says:
>
> Jun 12 17:04:21 sirius logger: /etc/rc.d/rc.inet1: /sbin/route add -net
> 127.0.0.0 netmask 255.0.0.0 lo
> Jun 12 17:04:21 sirius logger: /etc/rc.d/rc.inet1: /sbin/dhcpcd -t 10 eth0
> Jun 12 17:04:22 sirius dhcpcd: version 5.2.2 starting
> Jun 12 17:04:22 sirius kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready
> Jun 12 17:04:22 sirius dhcpcd: eth0: waiting for carrier
> Jun 12 17:04:25 sirius dhcpcd: eth0: carrier acquired
> Jun 12 17:04:25 sirius kernel: uli526x: eth0 NIC Link is Up 100 Mbps Full
> duplex
> Jun 12 17:04:25 sirius kernel: ADDRCONF(NETDEV_CHANGE): eth0: link becomes
> ready
> Jun 12 17:04:25 sirius dhcpcd: eth0: broadcasting for a lease
> Jun 12 17:04:29 sirius dhcpcd: eth0: offered 192.168.1.3 from 192.168.1.1
> Jun 12 17:04:29 sirius dhcpcd: eth0: acknowledged 192.168.1.3 from 192.168.1.1
> Jun 12 17:04:29 sirius dhcpcd: eth0: checking for 192.168.1.3
> Jun 12 17:04:34 sirius dhcpcd: eth0: leased 192.168.1.3 for 864000 seconds
> Jun 12 17:04:34 sirius dhcpcd: forking to background
>
> As you can see, dhcpcd asks for a lease as soon as the kernel tells it the link
> is up (message "carrier acquired").
>
> Therefore, I think 2.6.35-rc* carrier notification is broken, and at least
> it broke dhcpcd's way of watching the link.
>
^ permalink raw reply
* Re: [PATCH net-next-2.6] inetpeer: RCU conversion
From: David Miller @ 2010-06-15 21:25 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, paulmck
In-Reply-To: <1276626194.2541.186.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 15 Jun 2010 20:23:14 +0200
> inetpeer currently uses an AVL tree protected by an rwlock.
>
> It's possible to make most lookups use RCU
...
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Applied, nice work Eric.
^ permalink raw reply