netdev.vger.kernel.org archive mirror
* race in skb_splice_bits?
@ 2008-05-27  0:25 Octavian Purdila
  2008-05-27  2:08 ` Ben Hutchings
  2008-05-27 11:01 ` Evgeniy Polyakov
  0 siblings, 2 replies; 29+ messages in thread
From: Octavian Purdila @ 2008-05-27  0:25 UTC
  To: netdev


Hi,

Dropping the socket lock in skb_splice_bits(), as done below, seems to open a
race condition that causes an invalid kernel memory access:

>        if (spd.nr_pages) {
>                int ret;
>
>                /*
>                 * Drop the socket lock, otherwise we have reverse
>                 * locking dependencies between sk_lock and i_mutex
>                 * here as compared to sendfile(). We enter here
>                 * with the socket lock held, and splice_to_pipe() will
>                 * grab the pipe inode lock. For sendfile() emulation,
>                 * we call into ->sendpage() with the i_mutex lock held
>                 * and networking will grab the socket lock.
>                 */
>                release_sock(__skb->sk);
>                ret = splice_to_pipe(pipe, &spd);
>                lock_sock(__skb->sk);
>                return ret;
>        }

Setup:

- powerpc, non-SMP, no preemption, 2.6.25
- RX side: LRO enabled, splice from socket to /dev/null
- TX side: MTU set to 128 bytes, GSO enabled, splice from file to socket

The oops (RX side):

Unable to handle kernel paging request for data at address 0x00000030
Faulting instruction address: 0x80109ee0
Oops: Kernel access of bad area, sig: 11 [#1]
Ixia TCPX
Modules linked in: almfmanager(P) filtermanager ixnam_llm(P) ixnam_tcpx(P) hwstate ixllm ixhostm ixsysctl(P) nlproc_driver
NIP: 80109ee0 LR: 80109edc CTR: 8010c52c
REGS: bcd25b90 TRAP: 0300   Tainted: P          (2.6.25-00005-gf7b547d)
MSR: 00009032 <EE,ME,IR,DR>  CR: 24000822  XER: 20000000
DAR: 00000030, DSISR: 40000000
TASK = bfbe1bf0[156] 'splice' THREAD: bcd24000
GPR00: 8010c94c bcd25c40 bfbe1bf0 00000000 00000000 802835f8 00000001 0000004c 
GPR08: 00024000 00000100 00000032 bcd24000 00010dc4 100198b4 390046a8 0a5042f3 
GPR16: 8028238c bd18fe00 00000008 10010000 6fbcbac0 00000000 10001060 bcd25dd8 
GPR24: 8014b520 00000000 bcd25e30 bccefa00 bf33e300 fffffe00 bcd25d70 00000000 
NIP [80109ee0] lock_sock_nested+0x1c/0x50
LR [80109edc] lock_sock_nested+0x18/0x50
Call Trace:
[bcd25c60] [8010c94c] skb_splice_bits+0x130/0x134
[bcd25dc0] [8014b548] tcp_splice_data_recv+0x28/0x38
[bcd25dd0] [8014d08c] tcp_read_sock+0x108/0x1f8
[bcd25e20] [8014b58c] __tcp_splice_read+0x34/0x44
[bcd25e40] [8014b61c] tcp_splice_read+0x80/0x220
[bcd25e90] [80105730] sock_splice_read+0x2c/0x44
[bcd25ea0] [8008a374] do_splice_to+0x90/0xac
[bcd25ed0] [8008a850] do_splice+0x258/0x2f0
[bcd25f10] [8008b1d4] sys_splice+0xe0/0xe8
[bcd25f40] [8000ff14] ret_from_syscall+0x0/0x38
 --- Exception: c01 at 0x10000894
     LR = 0x10000e2c


Analysis:

Printks show that __skb->sk is non-NULL before splice_to_pipe() and NULL
after it. Using a hardware watchpoint, I was able to see that the write to
__skb->sk comes from the memset() in __alloc_skb(), which suggests that __skb
is freed (and its memory reused for a new skb) between release_sock() and
lock_sock(). With slab debugging turned on, the hardware watchpoint shows that
the free happens in tcp_collapse(), reached via timer interrupt -> softirq ->
NAPI polling -> lro_flush_all().

Commenting out the sequence that drops the socket lock seems to fix the 
problem on my setup.

Regards,
tavi



Thread overview: 29+ messages
2008-05-27  0:25 race in skb_splice_bits? Octavian Purdila
2008-05-27  2:08 ` Ben Hutchings
2008-05-27 10:41   ` Octavian Purdila
2008-05-27 11:01 ` Evgeniy Polyakov
2008-05-27 11:08   ` Ben Hutchings
2008-05-27 11:52     ` Evgeniy Polyakov
2008-05-27 11:56       ` Evgeniy Polyakov
2008-05-27 12:53         ` Octavian Purdila
2008-05-27 13:21           ` Evgeniy Polyakov
2008-05-27 14:03             ` Evgeniy Polyakov
2008-05-27 14:39               ` Octavian Purdila
2008-05-27 15:09                 ` Evgeniy Polyakov
2008-05-27 15:12                   ` Evgeniy Polyakov
2008-05-27 15:22                     ` Evgeniy Polyakov
2008-05-27 15:33                       ` Octavian Purdila
2008-05-27 15:47                         ` Evgeniy Polyakov
2008-05-27 17:28                           ` Evgeniy Polyakov
2008-05-27 23:59                             ` Octavian Purdila
2008-05-28  8:52                               ` Evgeniy Polyakov
2008-05-28 13:20                                 ` Octavian Purdila
2008-05-28 14:11                                   ` Evgeniy Polyakov
2008-05-28 15:20                                     ` Octavian Purdila
2008-05-28 15:42                                       ` Evgeniy Polyakov
2008-05-28 17:08                                       ` Octavian Purdila
2008-05-28 17:51                                         ` Evgeniy Polyakov
2008-05-28 18:02                                           ` Octavian Purdila
2008-05-28 20:01                                             ` Jarek Poplawski
2008-05-28 20:09                                               ` Octavian Purdila
2008-05-28 20:16                                                 ` Jarek Poplawski
