Re : Oops preceded by WARNING: at net/ipv4/tcp_input.c:1571 tcp_remove_reno

netdev.vger.kernel.org archive mirror

* Re : Oops preceded by WARNING: at net/ipv4/tcp_input.c:1571 tcp_remove_reno_sacks()
@ 2007-11-11  1:39 Chazarain Guillaume
  2007-11-11 22:40 ` Ilpo Järvinen
  0 siblings, 1 reply; 10+ messages in thread
From: Chazarain Guillaume @ 2007-11-11  1:39 UTC (permalink / raw)
  To: Ilpo Järvinen, David Miller; +Cc: Netdev

Hello Ilpo, thanks a lot for your investigation

> Do you have GSO enabled?

According to ethtool -k, no.

> Is this reproducible?

Unfortunately not, I saw it only once.

> You can try to provoke it by setting tcp_sack sysctl 
> to 0 as this seems to be non-SACK related... If so, you could try the 
> debug patch below

> # CONFIG_DEBUG_LIST is not set

I'm currently running bittorrent with all of this, I just saw this (for the first time ever),
but otherwise it works fine:

WARNING: at net/ipv4/tcp_output.c:1807 tcp_simple_retransmit()
 [<c0104cb3>] show_trace_log_lvl+0x1a/0x2f
 [<c0105563>] show_trace+0x12/0x14
 [<c0105668>] dump_stack+0x15/0x17
 [<c02f6a79>] tcp_simple_retransmit+0xfa/0x185
 [<c02fa072>] tcp_v4_err+0x35d/0x4cb
 [<c0301f7d>] icmp_unreach+0x327/0x352
 [<c030159d>] icmp_rcv+0xe0/0xf7
 [<c02e2d75>] ip_local_deliver_finish+0x124/0x1ba
 [<c02e3178>] ip_local_deliver+0x72/0x7e
 [<c02e2c31>] ip_rcv_finish+0x299/0x2b9
 [<c02e30e8>] ip_rcv+0x1e1/0x1ff
 [<c02c755c>] netif_receive_skb+0x37d/0x401
 [<c02c9372>] process_backlog+0x5b/0x96
 [<c02c9037>] net_rx_action+0x87/0x152
 [<c0121c9f>] __do_softirq+0x38/0x7a
 [<c0105975>] do_softirq+0x41/0x92

> Have you run memtest recently?

Just ran it with no errors for 6 minutes 30. The box is otherwise stable though.

I forgot to say that I have a kdump image of the crash (I had to recompile this
2.6.24-rc2 kernel as I deleted its vmlinux), so I could check that you are
right on track with your assertions at the time of the crash.

> +    if (WARN_ON(tcp_write_queue_head(sk) == NULL))
> +        return;

(gdb) p sk->sk_write_queue.next
$11 = (struct sk_buff *) 0xe43a04b0
(gdb) p &sk->sk_write_queue
$12 = (struct sk_buff_head *) 0xe43a04b0


> +    if (WARN_ON(!tp->packets_out))
> +        return;

(gdb) p ((struct tcp_sock *) sk)->packets_out
$13 = 0

> +    if (tp->lost_out > tp->packets_out)
> +        printk(KERN_ERR "Lost underflowed to %u\n", tp->lost_out);

(gdb) p ((struct tcp_sock *) sk)->lost_out
$14 = 4294967295

Some more gdb output for information:

#0  tcp_xmit_retransmit_queue (sk=0xe43a0440) at net/ipv4/tcp_output.c:1962
1962                            __u8 sacked = TCP_SKB_CB(skb)->sacked;

(gdb) bt
#0  tcp_xmit_retransmit_queue (sk=0xe43a0440) at net/ipv4/tcp_output.c:1962
#1  0xc02f298a in tcp_ack (sk=0xe43a0440, skb=0xc75720c0, flag=1038) at net/ipv4/tcp_input.c:2524
#2  0xc02f5208 in tcp_rcv_established (sk=0xe43a0440, skb=0xc75720c0, th=0xeac35058, len=32) at net/ipv4/tcp_input.c:4502
#3  0xc02fa711 in tcp_v4_do_rcv (sk=0xe43a0440, skb=0xc75720c0) at net/ipv4/tcp_ipv4.c:1572
#4  0xc02fc557 in tcp_v4_rcv (skb=0xc75720c0) at net/ipv4/tcp_ipv4.c:1696
#5  0xc02e4961 in ip_local_deliver_finish (skb=0xc75720c0) at net/ipv4/ip_input.c:233
#6  0xc02e4d64 in ip_local_deliver (skb=0xc75720c0) at net/ipv4/ip_input.c:271
#7  0xc02e481d in ip_rcv_finish (skb=0xc75720c0) at include/net/dst.h:241
#8  0xc02e4cd4 in ip_rcv (skb=<value optimized out>, dev=0xc717c000, pt=<value optimized out>, orig_dev=0xc717c000) at net/ipv4/ip_input.c:445
#9  0xc02c9062 in netif_receive_skb (skb=0xc75720c0) at net/core/dev.c:2088
#10 0xc02cae8e in process_backlog (napi=0xc04b651c, quota=64) at net/core/dev.c:2125
#11 0xc02cab3f in net_rx_action (h=<value optimized out>) at net/core/dev.c:2195
#12 0xc0121d17 in __do_softirq () at kernel/softirq.c:232
#13 0xc0105975 in do_softirq () at arch/x86/kernel/irq_32.c:216
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

(gdb) bt full
#0  tcp_xmit_retransmit_queue (sk=0xe43a0440) at net/ipv4/tcp_output.c:1962
        sacked = 176 '\260'
        skb = (struct sk_buff *) 0xe43a04b0
        packet_cnt = 0
#1  0xc02f298a in tcp_ack (sk=0xe43a0440, skb=0xc75720c0, flag=1038) at net/ipv4/tcp_input.c:2524
        packets_acked = 1
        sacked = 134 '\206'
        tp = <value optimized out>
        prior_snd_una = 2015065950
        ack_seq = 4044906321
        ack = 2015065959
        prior_in_flight = 2
        seq_rtt = -1
        frto_cwnd = <value optimized out>
#2  0xc02f5208 in tcp_rcv_established (sk=0xe43a0440, skb=0xc75720c0, th=0xeac35058, len=32) at net/ipv4/tcp_input.c:4502
        tcp_header_len = <value optimized out>
        tp = <value optimized out>
#3  0xc02fa711 in tcp_v4_do_rcv (sk=0xe43a0440, skb=0xc75720c0) at net/ipv4/tcp_ipv4.c:1572
        rsk = <value optimized out>
#4  0xc02fc557 in tcp_v4_rcv (skb=0xc75720c0) at net/ipv4/tcp_ipv4.c:1696
        err = -950591296
        filter = <value optimized out>
        iph = (const struct iphdr *) 0xeac35044
        th = (struct tcphdr *) 0xeac35058
        sk = (struct sock *) 0xe43a0440
        ret = <value optimized out>
#5  0xc02e4961 in ip_local_deliver_finish (skb=0xc75720c0) at net/ipv4/ip_input.c:233
        ret = <value optimized out>
        protocol = <value optimized out>
        hash = 0
        raw_sk = (struct sock *) 0x0
#6  0xc02e4d64 in ip_local_deliver (skb=0xc75720c0) at net/ipv4/ip_input.c:271
        __ret = -465959760
#7  0xc02e481d in ip_rcv_finish (skb=0xc75720c0) at include/net/dst.h:241
        iph = (const struct iphdr *) 0xeac35044
        rt = <value optimized out>
#8  0xc02e4cd4 in ip_rcv (skb=<value optimized out>, dev=0xc717c000, pt=<value optimized out>, orig_dev=0xc717c000) at net/ipv4/ip_input.c:445
        __ret = <value optimized out>
        iph = (struct iphdr *) 0xeac35044
        len = 3829007536
#9  0xc02c9062 in netif_receive_skb (skb=0xc75720c0) at net/core/dev.c:2088
        ptype = (struct packet_type *) 0xc0437a08
        pt_prev = <value optimized out>
        orig_dev = (struct net_device *) 0xc717c000
        ret = 1
        type = 8
#10 0xc02cae8e in process_backlog (napi=0xc04b651c, quota=64) at net/core/dev.c:2125
        skb = (struct sk_buff *) 0xe43a04b0
        dev = (struct net_device *) 0xc717c000
        work = 0
        start_time = 35819170
#11 0xc02cab3f in net_rx_action (h=<value optimized out>) at net/core/dev.c:2195
        n = (struct napi_struct *) 0xc04b651c
        work = 0
        weight = 64
        start_time = 35819170
        budget = 300
        have = (void *) 0x0
        __func__ = "net_rx_action"
        __warned = 0
#12 0xc0121d17 in __do_softirq () at kernel/softirq.c:232
        h = (struct softirq_action *) 0xc049e6b8
        pending = 1
        max_restart = 9
#13 0xc0105975 in do_softirq () at arch/x86/kernel/irq_32.c:216
        flags = 70
        irqctx = <value optimized out>
        isp = (u32 *) 0xc0439f14
        __func__ = "do_softirq"
        __warned = 0
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

(gdb) p *((struct tcp_sock *) sk)
$1 = {inet_conn = {icsk_inet = {sk = {__sk_common = {skc_family = 2, skc_state = 1 '\001', skc_reuse = 1 '\001', skc_bound_dev_if = 0, skc_node = {next = 0x0, pprev = 0xc65e0f38}, skc_bind_node = {next = 0xe604acd0, pprev = 0xe42fa890}, skc_refcnt = {counter = 3}, skc_hash = 2132787687, skc_prot = 0xc042ef40, skc_net = 0xc04b64a0}, sk_shutdown = 0 '\0', sk_no_check = 0 '\0', sk_userlocks = 0 '\0', sk_protocol = 6 '\006', sk_type = 1, sk_rcvbuf = 87380, sk_lock = {slock = {raw_lock = {<No data fields>}}, owned = 0, wq = {lock = {raw_lock = {<No data fields>}}, task_list = {next = 0xe43a0474, prev = 0xe43a0474}}}, sk_backlog = {head = 0x0, tail = 0x0}, sk_sleep = 0xc9be8d98, sk_dst_cache = 0xc86eb200, sk_policy = {0x0, 0x0}, sk_dst_lock = {raw_lock = {<No data fields>}}, sk_rmem_alloc = {counter = 0}, sk_wmem_alloc = {counter = 0}, sk_omem_alloc = {counter = 0}, sk_sndbuf = 35520, sk_receive_queue = {next = 0xe43a04a4, prev = 0xe43a04a4, qlen = 0, lock =
 {raw_lock = {<No data fields>}}}, sk_write_queue = {next = 0xe43a04b0, prev = 0xe43a04b0, qlen = 0, lock = {raw_lock = {<No data fields>}}}, sk_async_wait_queue = {next = 0x0, prev = 0x0, qlen = 0, lock = {raw_lock = {<No data fields>}}}, sk_wmem_queued = 0, sk_forward_alloc = 4096, sk_allocation = 208, sk_route_caps = 0, sk_gso_type = 1, sk_rcvlowat = 1, sk_flags = 17152, sk_lingertime = 0, sk_error_queue = {next = 0xe43a04e8, prev = 0xe43a04e8, qlen = 0, lock = {raw_lock = {<No data fields>}}}, sk_prot_creator = 0xc042ef40, sk_callback_lock = {raw_lock = {<No data fields>}}, sk_err = 0, sk_err_soft = 0, sk_ack_backlog = 0, sk_max_ack_backlog = 50, sk_priority = 2, sk_peercred = {pid = 0, uid = 4294967295, gid = 4294967295}, sk_rcvtimeo = 2147483647, sk_sndtimeo = 2147483647, sk_filter = 0x0, sk_protinfo = 0x0, sk_timer = {entry = {next = 0x0, prev = 0xc049ef80}, expires = 34855119, function = 0xc02f8e64 <tcp_keepalive_timer>, data = 3829007424, base =
 0xc049e900}, sk_stamp = {tv64 = 3294967295}, sk_socket = 0xc9be8d80, sk_user_data = 0x0, sk_sndmsg_page = 0x0, sk_send_head = 0x0, sk_sndmsg_off = 0, sk_write_pending = 0, sk_security = 0x0, sk_state_change = 0xc02c2378 <sock_def_wakeup>, sk_data_ready = 0xc02c2b7c <sock_def_readable>, sk_write_space = 0xc02c6817 <sk_stream_write_space>, sk_error_report = 0xc02c2b12 <sock_def_error_report>, sk_backlog_rcv = 0xc02fa6e6 <tcp_v4_do_rcv>, sk_destruct = 0xc0306a97 <inet_sock_destruct>}, pinet6 = 0x0, daddr = 2282963090, rcv_saddr = 50374848, dport = 41928, num = 6881, saddr = 50374848, uc_ttl = -1, cmsg_flags = 0, opt = 0x0, sport = 57626, id = 11867, tos = 8 '\b', mc_ttl = 46 '.', pmtudisc = 1 '\001', recverr = 0 '\0', is_icsk = 1 '\001', freebind = 0 '\0', hdrincl = 0 '\0', mc_loop = 1 '\001', mc_index = 2, mc_addr = 0, mc_list = 0x0, cork = {flags = 0, fragsize = 0, opt = 0x0, rt = 0x0, length = 0, addr = 0, fl = {oif = 0, iif = 0, mark = 0, nl_u = {ip4_u
 = {daddr = 0, saddr = 0, tos = 0 '\0', scope = 0 '\0'}, ip6_u = {daddr = {in6_u = {u6_addr8 = {0 '\0' <repeats 16 times>}, u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, u6_addr32 = {0, 0, 0, 0}}}, saddr = {in6_u = {u6_addr8 = {0 '\0' <repeats 16 times>}, u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, u6_addr32 = {0, 0, 0, 0}}}, flowlabel = 0}, dn_u = {daddr = 0, saddr = 0, scope = 0 '\0'}}, proto = 0 '\0', flags = 0 '\0', uli_u = {ports = {sport = 0, dport = 0}, icmpt = {type = 0 '\0', code = 0 '\0'}, dnports = {sport = 0, dport = 0}, spi = 0, mht = {type = 0 '\0'}}, secid = 0}}}, icsk_accept_queue = {rskq_accept_head = 0x0, rskq_accept_tail = 0x0, syn_wait_lock = {raw_lock = {<No data fields>}}, rskq_defer_accept = 0 '\0', listen_opt = 0x0}, icsk_bind_hash = 0xc7137bd0, icsk_timeout = 35822107, icsk_retransmit_timer = {entry = {next = 0xe4302e94, prev = 0xc851c1d4}, expires = 35822107, function = 0xc02f91a8 <tcp_write_timer>, data = 3829007424, base = 0xc049e900},
 icsk_delack_timer = {entry = {next = 0x0, prev = 0x200200}, expires = 35816596, function = 0xc02f9026 <tcp_delack_timer>, data = 3829007424, base = 0xc049e900}, icsk_rto = 1860, icsk_pmtu_cookie = 1500, icsk_ca_ops = 0xc0430c20, icsk_af_ops = 0xc042ef00, icsk_sync_mss = 0xc02f59a9 <tcp_sync_mss>, icsk_ca_state = 3 '\003', icsk_retransmits = 0 '\0', icsk_pending = 0 '\0', icsk_backoff = 0 '\0', icsk_syn_retries = 0 '\0', icsk_probes_out = 0 '\0', icsk_ext_hdr_len = 0, icsk_ack = {pending = 0 '\0', quick = 14 '\016', pingpong = 0 '\0', blocked = 0 '\0', ato = 40, timeout = 35816596, lrcvtime = 35816556, last_seg_size = 0, rcv_mss = 1368}, icsk_mtup = {enabled = 0, search_high = 1420, search_low = 564, probe_size = 0}, icsk_ca_priv = {0, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 0, 0}}, tcp_header_len = 32, xmit_size_goal = 1368, pred_flags = 2520649856, rcv_nxt = 4044906321, copied_seq = 4044906321, rcv_wup = 4044906321, snd_nxt = 2015065959, snd_una =
 2015065959, snd_sml = 2015065959, rcv_tstamp = 35819170, lsndtime = 35818339, ucopy = {prequeue = {next = 0xe43a06ec, prev = 0xe43a06ec, qlen = 0, lock = {raw_lock = {<No data fields>}}}, task = 0x0, iov = 0x0, memory = 0, len = 0}, snd_wl1 = 4044906321, snd_wnd = 64088, max_window = 64088, mss_cache = 1368, window_clamp = 64087, rcv_ssthresh = 64087, frto_highmark = 0, reordering = 3 '\003', frto_counter = 0 '\0', nonagle = 0 '\0', keepalive_probes = 0 '\0', srtt = 8775, mdev = 764, mdev_max = 200, rttvar = 764, rtt_seq = 2015065959, packets_out = 0, retrans_out = 0, rx_opt = {ts_recent_stamp = 1194667355, ts_recent = 106039901, rcv_tsval = 106039901, rcv_tsecr = 35818339, saw_tstamp = 1, tstamp_ok = 1, dsack = 0, wscale_ok = 1, sack_ok = 0, snd_wscale = 2, rcv_wscale = 7, eff_sacks = 0 '\0', num_sacks = 0 '\0', user_mss = 0, mss_clamp = 1380}, snd_ssthresh = 2, snd_cwnd = 2, snd_cwnd_cnt = 1, snd_cwnd_clamp = 4294967295, snd_cwnd_used = 0,
 snd_cwnd_stamp = 35819170, out_of_order_queue = {next = 0xe43a0774, prev = 0xe43a0774, qlen = 0, lock = {raw_lock = {<No data fields>}}}, rcv_wnd = 64128, write_seq = 2015065959, pushed_seq = 2015065959, duplicate_sack = {{start_seq = 0, end_seq = 0}}, selective_acks = {{start_seq = 0, end_seq = 0}, {start_seq = 0, end_seq = 0}, {start_seq = 0, end_seq = 0}, {start_seq = 0, end_seq = 0}}, recv_sack_cache = {{start_seq = 0, end_seq = 0}, {start_seq = 0, end_seq = 0}, {start_seq = 0, end_seq = 0}, {start_seq = 0, end_seq = 0}}, highest_sack = 0, lost_skb_hint = 0x0, scoreboard_skb_hint = 0x0, retransmit_skb_hint = 0x0, forward_skb_hint = 0x0, fastpath_skb_hint = 0x0, fastpath_cnt_hint = 0, lost_cnt_hint = 1, retransmit_cnt_hint = 0, lost_retrans_low = 2015065959, advmss = 1448, prior_ssthresh = 3, lost_out = 4294967295, sacked_out = 0, fackets_out = 0, high_seq = 2015065959, retrans_stamp = 35794455, undo_marker = 2015065959, undo_retrans = 0, urg_seq =
 0, urg_data = 0, urg_mode = 0 '\0', ecn_flags = 0 '\0', snd_up = 0, total_retrans = 143, bytes_acked = 0, keepalive_time = 0, keepalive_intvl = 0, linger2 = 0, last_synq_overflow = 0, tso_deferred = 0, rcv_rtt_est = {rtt = 15645, seq = 4044968664, time = 35713869}, rcvq_space = {space = 10944, seq = 4044906321, time = 35816556}, mtu_probe = {probe_seq_start = 0, probe_seq_end = 0}}

My naive attempt at understanding what's going on:

My oops starts with:
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000045


gdb tells me the crash is in:
#0  tcp_xmit_retransmit_queue (sk=0xe43a0440) at net/ipv4/tcp_output.c:1962
1962                            __u8 sacked = TCP_SKB_CB(skb)->sacked;

(gdb) p ((struct tcp_skb_cb *)((struct sk_buff *)0)->cb)->sacked
Cannot access memory at address 0x45

A 0x45 offset is definitely a ->sacked on a null skb, but:

(gdb) p skb
$5 = (struct sk_buff *) 0xe43a04b0

which is sk->sk_write_queue so I don't understand why the tcp_for_write_queue_from made an iteration.

I don't know if gdb is playing tricks or if it's because I had to recompile the crashing kernel.

Thanks.

-- 
Guillaume




* Re : Oops preceded by WARNING: at net/ipv4/tcp_input.c:1571 tcp_remove_reno_sacks()
@ 2007-11-11 12:53 Chazarain Guillaume
  0 siblings, 0 replies; 10+ messages in thread
From: Chazarain Guillaume @ 2007-11-11 12:53 UTC (permalink / raw)
  To: Ilpo Järvinen, David Miller; +Cc: Netdev

Hi,

> I'm currently running bittorrent with all of this, I just saw this (for
> the first time ever), but otherwise it works fine:
> 
> WARNING: at net/ipv4/tcp_output.c:1807 tcp_simple_retransmit()
>  [<c0104cb3>] show_trace_log_lvl+0x1a/0x2f
>  [<c0105563>] show_trace+0x12/0x14
>  [<c0105668>] dump_stack+0x15/0x17
>  [<c02f6a79>] tcp_simple_retransmit+0xfa/0x185
>  [<c02fa072>] tcp_v4_err+0x35d/0x4cb
>  [<c0301f7d>] icmp_unreach+0x327/0x352
>  [<c030159d>] icmp_rcv+0xe0/0xf7
>  [<c02e2d75>] ip_local_deliver_finish+0x124/0x1ba
>  [<c02e3178>] ip_local_deliver+0x72/0x7e
>  [<c02e2c31>] ip_rcv_finish+0x299/0x2b9
>  [<c02e30e8>] ip_rcv+0x1e1/0x1ff
>  [<c02c755c>] netif_receive_skb+0x37d/0x401
>  [<c02c9372>] process_backlog+0x5b/0x96
>  [<c02c9037>] net_rx_action+0x87/0x152
>  [<c0121c9f>] __do_softirq+0x38/0x7a
>  [<c0105975>] do_softirq+0x41/0x92

I don't know if it's caused by disabling tcp_sack, but after bittorrenting the whole night, I
have a lot more tcp_verify_left_out() warnings in the logs:

WARNING: at net/ipv4/tcp_output.c:1807 tcp_simple_retransmit()
 [<c0104cb3>] show_trace_log_lvl+0x1a/0x2f
 [<c0105563>] show_trace+0x12/0x14
 [<c0105668>] dump_stack+0x15/0x17
 [<c02f6a79>] tcp_simple_retransmit+0xfa/0x185
 [<c02fa072>] tcp_v4_err+0x35d/0x4cb
 [<c0301f7d>] icmp_unreach+0x327/0x352
 [<c030159d>] icmp_rcv+0xe0/0xf7
 [<c02e2d75>] ip_local_deliver_finish+0x124/0x1ba
 [<c02e3178>] ip_local_deliver+0x72/0x7e
 [<c02e2c31>] ip_rcv_finish+0x299/0x2b9
 [<c02e30e8>] ip_rcv+0x1e1/0x1ff
 [<c02c755c>] netif_receive_skb+0x37d/0x401
 [<c02c9372>] process_backlog+0x5b/0x96
 [<c02c9037>] net_rx_action+0x87/0x152
 [<c0121c9f>] __do_softirq+0x38/0x7a
 [<c0105975>] do_softirq+0x41/0x92
 =======================
WARNING: at net/ipv4/tcp_input.c:2405 tcp_fastretrans_alert()
 [<c0104cb3>] show_trace_log_lvl+0x1a/0x2f
 [<c0105563>] show_trace+0x12/0x14
 [<c0105668>] dump_stack+0x15/0x17
 [<c02f05a3>] tcp_ack+0xd8c/0x17dd
 [<c02f36bd>] tcp_rcv_established+0xdb/0x5f2
 [<c02f8bc5>] tcp_v4_do_rcv+0x2b/0x310
 [<c02faa0b>] tcp_v4_rcv+0x82b/0x89d
 [<c02e2d75>] ip_local_deliver_finish+0x124/0x1ba
 [<c02e3178>] ip_local_deliver+0x72/0x7e
 [<c02e2c31>] ip_rcv_finish+0x299/0x2b9
 [<c02e30e8>] ip_rcv+0x1e1/0x1ff
 [<c02c755c>] netif_receive_skb+0x37d/0x401
 [<c02c9372>] process_backlog+0x5b/0x96
 [<c02c9037>] net_rx_action+0x87/0x152
 [<c0121c9f>] __do_softirq+0x38/0x7a
 [<c0105975>] do_softirq+0x41/0x92
 =======================
WARNING: at net/ipv4/tcp_input.c:2405 tcp_fastretrans_alert()
 [<c0104cb3>] show_trace_log_lvl+0x1a/0x2f
 [<c0105563>] show_trace+0x12/0x14
 [<c0105668>] dump_stack+0x15/0x17
 [<c02f05a3>] tcp_ack+0xd8c/0x17dd
 [<c02f3b03>] tcp_rcv_established+0x521/0x5f2
 [<c02f8bc5>] tcp_v4_do_rcv+0x2b/0x310
 [<c02faa0b>] tcp_v4_rcv+0x82b/0x89d
 [<c02e2d75>] ip_local_deliver_finish+0x124/0x1ba
 [<c02e3178>] ip_local_deliver+0x72/0x7e
 [<c02e2c31>] ip_rcv_finish+0x299/0x2b9
 [<c02e30e8>] ip_rcv+0x1e1/0x1ff
 [<c02c755c>] netif_receive_skb+0x37d/0x401
 [<c02c9372>] process_backlog+0x5b/0x96
 [<c02c9037>] net_rx_action+0x87/0x152
 [<c0121c9f>] __do_softirq+0x38/0x7a
 [<c0105975>] do_softirq+0x41/0x92
 =======================
WARNING: at net/ipv4/tcp_input.c:1672 tcp_enter_frto()
 [<c0104cb3>] show_trace_log_lvl+0x1a/0x2f
 [<c0105563>] show_trace+0x12/0x14
 [<c0105668>] dump_stack+0x15/0x17
 [<c02f2a60>] tcp_enter_frto+0x166/0x1db
 [<c02f7a06>] tcp_write_timer+0x3aa/0x5bc
 [<c01249b4>] run_timer_softirq+0x105/0x177
 [<c0121c9f>] __do_softirq+0x38/0x7a
 [<c0105975>] do_softirq+0x41/0x92
 =======================
WARNING: at net/ipv4/tcp_input.c:2941 tcp_process_frto()
 [<c0104cb3>] show_trace_log_lvl+0x1a/0x2f
 [<c0105563>] show_trace+0x12/0x14
 [<c0105668>] dump_stack+0x15/0x17
 [<c02f014b>] tcp_ack+0x934/0x17dd
 [<c02f3b03>] tcp_rcv_established+0x521/0x5f2
 [<c02f8bc5>] tcp_v4_do_rcv+0x2b/0x310
 [<c02faa0b>] tcp_v4_rcv+0x82b/0x89d
 [<c02e2d75>] ip_local_deliver_finish+0x124/0x1ba
 [<c02e3178>] ip_local_deliver+0x72/0x7e
 [<c02e2c31>] ip_rcv_finish+0x299/0x2b9
 [<c02e30e8>] ip_rcv+0x1e1/0x1ff
 [<c02c755c>] netif_receive_skb+0x37d/0x401
 [<c02c9372>] process_backlog+0x5b/0x96
 [<c02c9037>] net_rx_action+0x87/0x152
 [<c0121c9f>] __do_softirq+0x38/0x7a
 [<c0105975>] do_softirq+0x41/0x92
 =======================
WARNING: at net/ipv4/tcp_input.c:2405 tcp_fastretrans_alert()
 [<c0104cb3>] show_trace_log_lvl+0x1a/0x2f
 [<c0105563>] show_trace+0x12/0x14
 [<c0105668>] dump_stack+0x15/0x17
 [<c02f05a3>] tcp_ack+0xd8c/0x17dd
 [<c02f3b03>] tcp_rcv_established+0x521/0x5f2
 [<c02f8bc5>] tcp_v4_do_rcv+0x2b/0x310
 [<c02faa0b>] tcp_v4_rcv+0x82b/0x89d
 [<c02e2d75>] ip_local_deliver_finish+0x124/0x1ba
 [<c02e3178>] ip_local_deliver+0x72/0x7e
 [<c02e2c31>] ip_rcv_finish+0x299/0x2b9
 [<c02e30e8>] ip_rcv+0x1e1/0x1ff
 [<c02c755c>] netif_receive_skb+0x37d/0x401
 [<c02c9372>] process_backlog+0x5b/0x96
 [<c02c9037>] net_rx_action+0x87/0x152
 [<c0121c9f>] __do_softirq+0x38/0x7a
 [<c0105975>] do_softirq+0x41/0x92
 =======================
WARNING: at net/ipv4/tcp_input.c:2306 tcp_try_to_open()
 [<c0104cb3>] show_trace_log_lvl+0x1a/0x2f
 [<c0105563>] show_trace+0x12/0x14
 [<c0105668>] dump_stack+0x15/0x17
 [<c02f0a8c>] tcp_ack+0x1275/0x17dd
 [<c02f3b03>] tcp_rcv_established+0x521/0x5f2
 [<c02f8bc5>] tcp_v4_do_rcv+0x2b/0x310
 [<c02faa0b>] tcp_v4_rcv+0x82b/0x89d
 [<c02e2d75>] ip_local_deliver_finish+0x124/0x1ba
 [<c02e3178>] ip_local_deliver+0x72/0x7e
 [<c02e2c31>] ip_rcv_finish+0x299/0x2b9
 [<c02e30e8>] ip_rcv+0x1e1/0x1ff
 [<c02c755c>] netif_receive_skb+0x37d/0x401
 [<c02c9372>] process_backlog+0x5b/0x96
 [<c02c9037>] net_rx_action+0x87/0x152
 [<c0121c9f>] __do_softirq+0x38/0x7a
 [<c0105975>] do_softirq+0x41/0x92
 =======================
WARNING: at net/ipv4/tcp_output.c:1807 tcp_simple_retransmit()
 [<c0104cb3>] show_trace_log_lvl+0x1a/0x2f
 [<c0105563>] show_trace+0x12/0x14
 [<c0105668>] dump_stack+0x15/0x17
 [<c02f6a79>] tcp_simple_retransmit+0xfa/0x185
 [<c02fa072>] tcp_v4_err+0x35d/0x4cb
 [<c0301f7d>] icmp_unreach+0x327/0x352
 [<c030159d>] icmp_rcv+0xe0/0xf7
 [<c02e2d75>] ip_local_deliver_finish+0x124/0x1ba
 [<c02e3178>] ip_local_deliver+0x72/0x7e
 [<c02e2c31>] ip_rcv_finish+0x299/0x2b9
 [<c02e30e8>] ip_rcv+0x1e1/0x1ff
 [<c02c755c>] netif_receive_skb+0x37d/0x401
 [<c02c9372>] process_backlog+0x5b/0x96
 [<c02c9037>] net_rx_action+0x87/0x152
 [<c0121c9f>] __do_softirq+0x38/0x7a
 [<c0105975>] do_softirq+0x41/0x92
 =======================


Cheers.

-- 
Guillaume






* Re : Oops preceded by WARNING: at net/ipv4/tcp_input.c:1571 tcp_remove_reno_sacks()
  2007-11-11  1:39 Re : Oops preceded by WARNING: at net/ipv4/tcp_input.c:1571 tcp_remove_reno_sacks() Chazarain Guillaume
@ 2007-11-11 22:40 ` Ilpo Järvinen
  2007-11-13 21:35   ` Ilpo Järvinen
  0 siblings, 1 reply; 10+ messages in thread
From: Ilpo Järvinen @ 2007-11-11 22:40 UTC (permalink / raw)
  To: Chazarain Guillaume; +Cc: David Miller, Netdev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 12317 bytes --]

On Sun, 11 Nov 2007, Chazarain Guillaume wrote:

> > Do you have GSO enabled?
>
> According to ethtool -k, no.

Ok, thanks, that excludes a lot of possibilities...

> > Is this reproducible?
>
> Unfortunately not, I saw it only once.

The messages you had in the other mail are very likely symptoms of the 
same problem; it's just hard to tell from them where it really originates 
(because that would require expensive verification that nobody wants 
to do by default after simple operations). In many cases that WARN_ON fires 
simply too late to tell when the adjustment/corruption causing the problem 
occurred, but it's still better than nothing as a starting point :-).
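
To illustrate why such a check can only report that the counters disagree,
not where they went wrong, here is a minimal user-space sketch. It assumes
the check boils down to "sacked_out + lost_out must not exceed packets_out";
struct mock_tp and left_out_ok() are made-up names for illustration, not
kernel code.

```c
#include <stdint.h>

/* Hypothetical stand-in for three counters kept in struct tcp_sock. */
struct mock_tp {
	uint32_t packets_out;
	uint32_t sacked_out;
	uint32_t lost_out;
};

/*
 * Returns 1 when the counters are consistent, 0 when a warning would fire.
 * The sum is computed in 32 bits, as it would be with u32 fields.
 */
static int left_out_ok(const struct mock_tp *tp)
{
	uint32_t left_out = tp->sacked_out + tp->lost_out;

	return left_out <= tp->packets_out;
}
```

Fed with the values from the dump (packets_out = 0, sacked_out = 0,
lost_out = 4294967295), left_out_ok() returns 0, but nothing in the
result says which earlier operation made lost_out go wrong.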

> > You can try to provoke it by setting tcp_sack sysctl 
> >  to 0 as this seems to be non-SACK related... If so, you could try the 
> > debug patch below
>
> > > # CONFIG_DEBUG_LIST is not set

> I'm currently running bittorrent with all of this, I just saw this (for 
> the first time ever), but otherwise it works fine:

> WARNING: at net/ipv4/tcp_output.c:1807 tcp_simple_retransmit()
>  [<c0104cb3>] show_trace_log_lvl+0x1a/0x2f
>  [<c0105563>] show_trace+0x12/0x14
>  [<c0105668>] dump_stack+0x15/0x17
>  [<c02f6a79>] tcp_simple_retransmit+0xfa/0x185
>  [<c02fa072>] tcp_v4_err+0x35d/0x4cb
>  [<c0301f7d>] icmp_unreach+0x327/0x352

Hmm, that's related to path MTU things... It might have something to do 
with this... I'm not at all sure how it handles pcounts...

> > Have you run memtest recently?
>
> Just ran it with no errors for 6 minutes 30. The box is otherwise stable though.

Yeah, it's more likely a miscount somewhere rather than corruption but 
that wasn't obvious from the first mail...

...but alas, I haven't yet been able to come up with any theory on how 
a miscount could occur....

> I forgot to say that I have a kdump image of the crash (I had to 
> recompile this
>
> 2.6.24-rc2 kernel as I deleted its vmlinux), so I could check that you 
> are right on track with your assertions at the time of the crash.
>
> > +    if (WARN_ON(tcp_write_queue_head(sk) == NULL))
> > +        return;
> 
> (gdb) p sk->sk_write_queue.next
> $11 = (struct sk_buff *) 0xe43a04b0
> (gdb) p &sk->sk_write_queue
> $12 = (struct sk_buff_head *) 0xe43a04b0
> 
> 
> > +    if (WARN_ON(!tp->packets_out))
> > +        return;
> 
> (gdb) p ((struct tcp_sock *) sk)->packets_out
> $13 = 0

Yeah, those are expected: the write_queue is empty. Another cause for 
them could have been a corrupted write_queue (that's why I asked about the 
list debugging).
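
The gdb output quoted above shows sk_write_queue.next equal to the address
of sk_write_queue itself (0xe43a04b0), which is what an empty circular list
with a sentinel head looks like. A small user-space sketch of that idea
follows; struct mock_node and the mock_queue_*() helpers are hypothetical
names, only loosely modelled on the kernel's list handling.

```c
#include <stddef.h>

/* Hypothetical miniature of a doubly linked queue with a sentinel head. */
struct mock_node {
	struct mock_node *next;
	struct mock_node *prev;
};

/* An empty queue points back at its own head in both directions. */
static void mock_queue_init(struct mock_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Link n in just before the sentinel, i.e. at the tail. */
static void mock_queue_add_tail(struct mock_node *head, struct mock_node *n)
{
	n->next = head;
	n->prev = head->prev;
	head->prev->next = n;
	head->prev = n;
}

/* First element, or NULL when the queue is empty (next == head). */
static struct mock_node *mock_queue_head(struct mock_node *head)
{
	return head->next == head ? NULL : head->next;
}
```

With this layout, a "head of queue" helper returns NULL for exactly the
state seen in the dump, which is why the first WARN_ON in the debug patch
would have fired.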

> > +    if (tp->lost_out > tp->packets_out)
> > +        printk(KERN_ERR "Lost underflowed to %u\n", tp->lost_out);
>
> (gdb) p ((struct tcp_sock *) sk)->lost_out
> $14 = 4294967295

Underflows by one. ...We should just find out what causes this and fix 
that and we're done with it. :-)
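
For anyone wondering where 4294967295 comes from: it is what a 32-bit
unsigned counter holds after being decremented once too often from zero.
A minimal sketch, with dec_unguarded() and as_signed() as illustrative
helper names (not kernel functions):

```c
#include <stdint.h>

/*
 * Decrement a 32-bit unsigned counter without any guard. Unsigned
 * arithmetic wraps modulo 2^32, so 0 - 1 yields 4294967295.
 */
static uint32_t dec_unguarded(uint32_t counter, uint32_t pcount)
{
	return counter - pcount;
}

/*
 * Signed view of the same bits, similar in spirit to the (int) casts
 * used by the BUG_TRAP checks. The conversion is implementation-defined
 * in C, but two's complement compilers give -1 for 4294967295.
 */
static int32_t as_signed(uint32_t counter)
{
	return (int32_t)counter;
}
```

So dec_unguarded(0, 1) gives 4294967295, and viewed as a signed value that
is -1, i.e. one decrement too many.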


> Some more gdb output for information:

Thanks for them, though they're not that useful because the problem 
occurred prior to its detection... :-)

> My naive attempt at understanding what's going on:
> 
> My oops starts with:
> BUG: unable to handle kernel NULL pointer dereference at virtual address 
> 00000045
>
>
> gdb tells me the crash is in:
> #0  tcp_xmit_retransmit_queue (sk=0xe43a0440) at 
> net/ipv4/tcp_output.c:1962
> 1962                            __u8 sacked = TCP_SKB_CB(skb)->sacked;
>
> (gdb) p ((struct tcp_skb_cb *)((struct sk_buff *)0)->cb)->sacked
> Cannot access memory at address 0x45
> 
> A 0x45 offset is definitely a ->sacked on a null skb, but:

This is right.
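
The arithmetic behind that can be sketched with offsetof(). The layouts
below are assumptions made up for illustration: the 0x20 bytes before cb[]
and the 24-byte leading area of the control block were chosen so that the
sum matches the faulting address, and the real offsets depend on the kernel
version and configuration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts; the sizes are illustrative assumptions only. */
struct mock_skb {
	unsigned char before_cb[0x20];	/* assumed: fields preceding cb[] */
	unsigned char cb[48];		/* control block area */
};

struct mock_skb_cb {
	unsigned char header[24];	/* assumed size of the leading union */
	uint32_t seq;
	uint32_t end_seq;
	uint32_t when;
	uint8_t  flags;
	uint8_t  sacked;
};

/* Address that reading ->sacked through a NULL skb pointer would touch. */
static size_t sacked_fault_offset(void)
{
	return offsetof(struct mock_skb, cb) +
	       offsetof(struct mock_skb_cb, sacked);
}
```

Under these assumed sizes the function returns 0x45 (0x20 for cb[] plus
0x25 for sacked within the control block), matching the address in the
oops.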

> (gdb) p skb
> $5 = (struct sk_buff *) 0xe43a04b0
>
> which is sk->sk_write_queue so I don't understand why the 
> tcp_for_write_queue_from made an iteration.
>
> I don't know if gdb is playing tricks or if it's because I had to 
> recompile the crashing kernel.

No, it won't happen like that. ...I'd say that gdb is just confused. When 
packets_out is zero (which happens only after a cumulative ACK), skb will 
certainly be NULL, because retransmit_skb_hint was cleared by that 
cumulative ACK.

The crash location is the expected one when packets_out drops to zero 
during recovery while lost_out is miscounted/corrupt, as your dump shows.

Anyway, thanks for digging these out.


Here's a brute-force patch below... Since you had a couple of these during 
your overnight test, I'm sure it's relatively easy to catch... The 
first place where tcp_verify_lost triggers is the most 
interesting; the rest are likely ripples from that earlier corruption... 
(Hopefully this time I've placed the calls where both the queue and 
lost_out states should agree; I once did a similar patch that had
some verification calls placed incorrectly, which caused a lot of spurious 
stacktraces :-))

...I left those !packets_out checks in to prevent crashing when it 
occurs, though that's not the main problem itself.

Please keep tcp_sack set to 0, and once you have at least one 
stacktrace with it, you could also try with tcp_sack enabled to see if 
the same thing occurs there as well.
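
The idea of the verification in the patch below can be tried out in user
space with a small sketch: recount the lost packets from the per-segment
flags and compare with the cached counter. The names here (struct mock_seg,
MOCK_LOST, recount_lost(), lost_counter_ok()) are hypothetical, and an
array stands in for the write queue.

```c
#include <stddef.h>
#include <stdint.h>

#define MOCK_LOST 0x04	/* hypothetical flag bit standing in for TCPCB_LOST */

struct mock_seg {
	uint8_t  sacked;	/* per-segment flag bits */
	uint32_t pcount;	/* packets this segment accounts for */
};

/* Recount lost packets by walking every segment and checking its flag. */
static uint32_t recount_lost(const struct mock_seg *q, size_t n)
{
	uint32_t lost = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (q[i].sacked & MOCK_LOST)
			lost += q[i].pcount;
	return lost;
}

/* Returns 1 when the cached counter agrees with the recount, else 0. */
static int lost_counter_ok(const struct mock_seg *q, size_t n, uint32_t cached)
{
	return recount_lost(q, n) == cached;
}
```

The walk costs time proportional to the queue length on every call, which
is why this kind of check is only practical as a temporary debugging aid.
An empty queue with a cached value of 4294967295, as in the dump, fails
the comparison immediately.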

--
[PATCH] TCP DEBUG

- Check if empty queue is passed to xmit_retrans...
- Print lost_out underflow value
- Track lost_out and LOST discrepancies everywhere (costs a bit).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 include/net/tcp.h     |    2 ++
 net/ipv4/tcp_input.c  |   21 +++++++++++++++++++++
 net/ipv4/tcp_ipv4.c   |   19 +++++++++++++++++++
 net/ipv4/tcp_output.c |   14 ++++++++++++++
 4 files changed, 56 insertions(+), 0 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index d695cea..a939bd5 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -272,6 +272,8 @@ DECLARE_SNMP_STAT(struct tcp_mib, tcp_statistics);
 #define TCP_ADD_STATS_BH(field, val)	SNMP_ADD_STATS_BH(tcp_statistics, field, val)
 #define TCP_ADD_STATS_USER(field, val)	SNMP_ADD_STATS_USER(tcp_statistics, field, val)
 
+extern void			tcp_verify_lost(struct sock *sk);
+
 extern void			tcp_v4_err(struct sk_buff *skb, u32);
 
 extern void			tcp_shutdown (struct sock *sk, int how);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ca9590f..588b105 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1512,6 +1512,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 		flag |= tcp_mark_lost_retrans(sk, highest_sack_end_seq);
 
 	tcp_verify_left_out(tp);
+	tcp_verify_lost(sk);
 
 	if ((reord < tp->fackets_out) && icsk->icsk_ca_state != TCP_CA_Loss &&
 	    (!tp->frto_highmark || after(tp->snd_una, tp->frto_highmark)))
@@ -1552,6 +1553,7 @@ static void tcp_add_reno_sack(struct sock *sk)
 	tp->sacked_out++;
 	tcp_check_reno_reordering(sk, 0);
 	tcp_verify_left_out(tp);
+	tcp_verify_lost(sk);
 }
 
 /* Account for ACK, ACKing some data in Reno Recovery phase. */
@@ -1569,6 +1571,7 @@ static void tcp_remove_reno_sacks(struct sock *sk, int acked)
 	}
 	tcp_check_reno_reordering(sk, acked);
 	tcp_verify_left_out(tp);
+	tcp_verify_lost(sk);
 }
 
 static inline void tcp_reset_reno_sack(struct tcp_sock *tp)
@@ -1670,6 +1673,7 @@ void tcp_enter_frto(struct sock *sk)
 		tp->retrans_out -= tcp_skb_pcount(skb);
 	}
 	tcp_verify_left_out(tp);
+	tcp_verify_lost(sk);
 
 	/* Earlier loss recovery underway (see RFC4138; Appendix B).
 	 * The last condition is necessary at least in tp->frto_counter case.
@@ -1727,6 +1731,7 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
 		}
 	}
 	tcp_verify_left_out(tp);
+	tcp_verify_lost(sk);
 
 	tp->snd_cwnd = tcp_packets_in_flight(tp) + allowed_segments;
 	tp->snd_cwnd_cnt = 0;
@@ -1812,6 +1817,7 @@ void tcp_enter_loss(struct sock *sk, int how)
 		}
 	}
 	tcp_verify_left_out(tp);
+	tcp_verify_lost(sk);
 
 	tp->reordering = min_t(unsigned int, tp->reordering,
 					     sysctl_tcp_reordering);
@@ -2044,6 +2050,7 @@ static void tcp_mark_head_lost(struct sock *sk, int packets)
 		}
 	}
 	tcp_verify_left_out(tp);
+	tcp_verify_lost(sk);
 }
 
 /* Account newly detected lost packet(s) */
@@ -2088,6 +2095,7 @@ static void tcp_update_scoreboard(struct sock *sk)
 		tp->scoreboard_skb_hint = skb;
 
 		tcp_verify_left_out(tp);
+		tcp_verify_lost(sk);
 	}
 }
 
@@ -2304,6 +2312,7 @@ static void tcp_try_to_open(struct sock *sk, int flag)
 	struct tcp_sock *tp = tcp_sk(sk);
 
 	tcp_verify_left_out(tp);
+	tcp_verify_lost(sk);
 
 	if (tp->retrans_out == 0)
 		tp->retrans_stamp = 0;
@@ -2403,6 +2412,7 @@ tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 
 	/* D. Check consistency of the current state. */
 	tcp_verify_left_out(tp);
+	tcp_verify_lost(sk);
 
 	/* E. Check state exit conditions. State can be terminated
 	 *    when high_seq is ACKed. */
@@ -2521,6 +2531,12 @@ tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 	if (do_lost || tcp_head_timedout(sk))
 		tcp_update_scoreboard(sk);
 	tcp_cwnd_down(sk, flag);
+	
+	if (WARN_ON(tcp_write_queue_head(sk) == NULL))
+		return;
+	if (WARN_ON(!tp->packets_out))
+		return;
+	
 	tcp_xmit_retransmit_queue(sk);
 }
 
@@ -2721,6 +2737,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p)
 		sk_stream_free_skb(sk, skb);
 		tcp_clear_all_retrans_hints(tp);
 	}
+	
+	tcp_verify_lost(sk);
 
 	if (flag & FLAG_ACKED) {
 		u32 pkts_acked = prior_packets - tp->packets_out;
@@ -2759,6 +2777,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p)
 #if FASTRETRANS_DEBUG > 0
 	BUG_TRAP((int)tp->sacked_out >= 0);
 	BUG_TRAP((int)tp->lost_out >= 0);
+	if (tp->lost_out > tp->packets_out)
+		printk(KERN_ERR "Lost underflowed to %u\n", tp->lost_out);
 	BUG_TRAP((int)tp->retrans_out >= 0);
 	if (!tp->packets_out && tcp_is_sack(tp)) {
 		icsk = inet_csk(sk);
@@ -2931,6 +2951,7 @@ static int tcp_process_frto(struct sock *sk, int flag)
 	struct tcp_sock *tp = tcp_sk(sk);
 
 	tcp_verify_left_out(tp);
+	tcp_verify_lost(sk);
 
 	/* Duplicate the behavior from Loss state (fastretrans_alert) */
 	if (flag&FLAG_DATA_ACKED)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index e566f3c..5e10d90 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -108,6 +108,25 @@ struct inet_hashinfo __cacheline_aligned tcp_hashinfo = {
 	.lhash_wait  = __WAIT_QUEUE_HEAD_INITIALIZER(tcp_hashinfo.lhash_wait),
 };
 
+void tcp_verify_lost(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	u32 lost = 0;
+	struct sk_buff *skb;
+
+	tcp_for_write_queue(skb, sk) {
+		if (skb == tcp_send_head(sk))
+			break;
+		if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
+			lost += tcp_skb_pcount(skb);
+	}
+	
+	if (WARN_ON(lost != tp->lost_out)) {
+		printk(KERN_ERR "Lost: %u vs %u, %u (%d)\n", lost, tp->lost_out,
+		       tp->packets_out, tcp_is_sack(tp));
+	}
+}
+
 static int tcp_v4_get_port(struct sock *sk, unsigned short snum)
 {
 	return inet_csk_get_port(&tcp_hashinfo, sk, snum,
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 324b420..09260ac 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -779,6 +779,8 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss
 	skb_header_release(buff);
 	tcp_insert_write_queue_after(skb, buff, sk);
 
+	tcp_verify_lost(sk);
+
 	return 0;
 }
 
@@ -1443,10 +1445,12 @@ static int tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle)
 
 	/* Do MTU probing. */
 	if ((result = tcp_mtu_probe(sk)) == 0) {
+		tcp_verify_lost(sk);
 		return 0;
 	} else if (result > 0) {
 		sent_pkts = 1;
 	}
+	tcp_verify_lost(sk);
 
 	while ((skb = tcp_send_head(sk))) {
 		unsigned int limit;
@@ -1767,6 +1771,8 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
 		}
 
 		sk_stream_free_skb(sk, next_skb);
+		
+		tcp_verify_lost(sk);
 	}
 }
 
@@ -1798,6 +1804,8 @@ void tcp_simple_retransmit(struct sock *sk)
 			}
 		}
 	}
+	
+	tcp_verify_lost(sk);
 
 	tcp_clear_all_retrans_hints(tp);
 
@@ -1819,6 +1827,8 @@ void tcp_simple_retransmit(struct sock *sk)
 		tcp_set_ca_state(sk, TCP_CA_Loss);
 	}
 	tcp_xmit_retransmit_queue(sk);
+	
+	tcp_verify_lost(sk);
 }
 
 /* This retransmits one SKB.  Policy decisions and retransmit queue
@@ -2000,6 +2010,8 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 			}
 		}
 	}
+	
+	tcp_verify_lost(sk);
 
 	/* OK, demanded retransmission is finished. */
 
@@ -2058,6 +2070,8 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 
 		NET_INC_STATS_BH(LINUX_MIB_TCPFORWARDRETRANS);
 	}
+	
+	tcp_verify_lost(sk);
 }
 
 
-- 
1.5.0.6
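
For anyone following along outside the kernel tree, the invariant that the
debug helper tcp_verify_lost() checks can be modelled in plain userspace C.
This is only a sketch under stated assumptions: struct seg, SEG_LOST,
recount_lost() and lost_counter_consistent() are hypothetical names, not
kernel APIs, and an array stands in for the write queue up to the send head.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit, standing in for the kernel's TCPCB_LOST. */
#define SEG_LOST 0x04u

/* Hypothetical stand-in for one skb on the write queue. */
struct seg {
	uint8_t  sacked;  /* per-segment flag bits */
	uint32_t pcount;  /* packets covered by this segment */
};

/* Walk the already-sent part of the queue and recount the LOST packets. */
static uint32_t recount_lost(const struct seg *q, size_t n_sent)
{
	uint32_t lost = 0;

	for (size_t i = 0; i < n_sent; i++) {
		if (q[i].sacked & SEG_LOST)
			lost += q[i].pcount;
	}
	return lost;
}

/* True when the cached counter agrees with the per-segment LOST bits. */
static bool lost_counter_consistent(const struct seg *q, size_t n_sent,
				    uint32_t lost_out)
{
	return recount_lost(q, n_sent) == lost_out;
}
```

A caller would run lost_counter_consistent() after every step that touches
the flags or the counter, which is what the patch above does by sprinkling
tcp_verify_lost(sk) calls next to the existing tcp_verify_left_out(tp) calls.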


* Re: Re : Oops preceded by WARNING: at net/ipv4/tcp_input.c:1571 tcp_remove_reno_sacks()
  2007-11-11 22:40 ` Ilpo Järvinen
@ 2007-11-13 21:35   ` Ilpo Järvinen
  2007-11-14  5:04     ` David Miller
  0 siblings, 1 reply; 10+ messages in thread
From: Ilpo Järvinen @ 2007-11-13 21:35 UTC (permalink / raw)
  To: Chazarain Guillaume, David Miller; +Cc: Netdev


On Mon, 12 Nov 2007, Ilpo Järvinen wrote:

> Yeah, it's more likely a miscount somewhere rather than corruption but 
> that wasn't obvious from the first mail...
> 
> ...but alas, I haven't yet been able to come up with any theory on how 
> a miscount could occur....

Cancel that; a first idea is presented in this patch. I'm not sure it is the 
one that fixes your symptoms, but it at least seems a potential place where 
such a thing could happen. I have no idea what events could cause that to 
occur, though :-(

--
[PATCH] [TCP] FRTO: Plug potential LOST-bit leak

It might be possible that, in some extreme scenario that
I just cannot now construct in my mind, the end_seq <=
frto_highmark check does not match, causing lost_out
and the LOST bits to become out of sync due to clearing and
recounting in the loop.

This may fix the LOST-bit leak reported by Chazarain Guillaume
<guichaz@yahoo.fr>.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 23a0092..cc358d4 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1706,6 +1706,8 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
 	tcp_for_write_queue(skb, sk) {
 		if (skb == tcp_send_head(sk))
 			break;
+
+		TCP_SKB_CB(skb)->sacked &= ~TCPCB_LOST;
 		/*
 		 * Count the retransmission made on RTO correctly (only when
 		 * waiting for the first ACK and did not get it)...
@@ -1719,7 +1721,7 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
 		} else {
 			if (TCP_SKB_CB(skb)->sacked & TCPCB_RETRANS)
 				tp->undo_marker = 0;
-			TCP_SKB_CB(skb)->sacked &= ~(TCPCB_LOST|TCPCB_SACKED_RETRANS);
+			TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_RETRANS;
 		}
 
 		/* Don't lost mark skbs that were fwd transmitted after RTO */
-- 
1.5.0.6
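
The leak this patch plugs can be reproduced in a small userspace model. The
sketch below is a simplification, not the kernel loop: it assumes the counter
is restarted from zero before the walk, that only some segments used to have
their LOST bit cleared (skip_clear models the branch that did not clear it),
and that a segment is re-marked only when end_seq <= highmark. All names are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEG_LOST 0x04u  /* hypothetical stand-in for TCPCB_LOST */

struct seg {
	uint8_t  flags;
	uint32_t end_seq;
	bool     skip_clear;  /* models the branch that did not clear LOST */
};

/* Wraparound-safe "a <= b" on 32-bit sequence numbers. */
static bool seq_leq(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) <= 0;
}

/* Clear and re-mark LOST bits; returns the recomputed counter. */
static uint32_t remark_lost(struct seg *q, size_t n, uint32_t highmark,
			    bool clear_always)
{
	uint32_t lost_out = 0;  /* assumption: counter restarts at zero */

	for (size_t i = 0; i < n; i++) {
		/* Old behaviour: clear only on one branch. New: always. */
		if (clear_always || !q[i].skip_clear)
			q[i].flags &= (uint8_t)~SEG_LOST;

		if (seq_leq(q[i].end_seq, highmark)) {
			q[i].flags |= SEG_LOST;
			lost_out++;
		}
	}
	return lost_out;
}

/* Count segments that still carry the LOST bit. */
static uint32_t count_lost_bits(const struct seg *q, size_t n)
{
	uint32_t lost = 0;

	for (size_t i = 0; i < n; i++) {
		if (q[i].flags & SEG_LOST)
			lost++;
	}
	return lost;
}
```

With clear_always false, a segment that already has the bit set, is not
cleared, and fails the highmark check keeps a LOST bit that the counter never
sees. With clear_always true, the bits and the counter agree whatever the
highmark check decides.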


* Re: Re : Oops preceded by WARNING: at net/ipv4/tcp_input.c:1571 tcp_remove_reno_sacks()
  2007-11-13 21:35   ` Ilpo Järvinen
@ 2007-11-14  5:04     ` David Miller
  2007-11-14 13:32       ` Ilpo Järvinen
  2007-11-15 10:31       ` Guillaume Chazarain
  0 siblings, 2 replies; 10+ messages in thread
From: David Miller @ 2007-11-14  5:04 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: guichaz, netdev

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Tue, 13 Nov 2007 23:35:39 +0200 (EET)

> [PATCH] [TCP] FRTO: Plug potential LOST-bit leak
> 
> It might be possible that, in some extreme scenario that
> I just cannot now construct in my mind, the end_seq <=
> frto_highmark check does not match, causing lost_out
> and the LOST bits to become out of sync due to clearing and
> recounting in the loop.
> 
> This may fix the LOST-bit leak reported by Chazarain Guillaume
> <guichaz@yahoo.fr>.
> 
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

This patch looks correct to me, so I added it to net-2.6

Chazarain please let us know if it does indeed cure your
problem.

Thanks.


* Re: Re : Oops preceded by WARNING: at net/ipv4/tcp_input.c:1571 tcp_remove_reno_sacks()
  2007-11-14  5:04     ` David Miller
@ 2007-11-14 13:32       ` Ilpo Järvinen
  2007-11-14 23:55         ` David Miller
  2007-11-15 10:31       ` Guillaume Chazarain
  1 sibling, 1 reply; 10+ messages in thread
From: Ilpo Järvinen @ 2007-11-14 13:32 UTC (permalink / raw)
  To: David Miller; +Cc: guichaz, Netdev


On Tue, 13 Nov 2007, David Miller wrote:

> From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
> Date: Tue, 13 Nov 2007 23:35:39 +0200 (EET)
> 
> > [PATCH] [TCP] FRTO: Plug potential LOST-bit leak
> > 
> > It might be possible that, in some extreme scenario that
> > I just cannot now construct in my mind, the end_seq <=
> > frto_highmark check does not match, causing lost_out
> > and the LOST bits to become out of sync due to clearing and
> > recounting in the loop.
> > 
> > This may fix the LOST-bit leak reported by Chazarain Guillaume
> > <guichaz@yahoo.fr>.
> > 
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> 
> This patch looks correct to me, so I added it to net-2.6
> 
> Chazarain please let us know if it does indeed cure your
> problem.

OK, after one more night I think I know what it was; this patch
indeed "cured" it (and IMHO we can leave it there too).

...But here's a fix that explains very well why the frto_highmark
check could give somewhat strange results :-).

--
[PATCH] [TCP] FRTO: Clear frto_highmark only after process_frto that uses it

I broke this in commit 3de96471bd7fb76406e975ef6387abe3a0698149.
tcp_process_frto should always see a valid frto_highmark. An
invalid frto_highmark (zero) is very likely what ultimately
caused a seqno compare in tcp_enter_frto_loss to do the wrong
thing, leading to the LOST-bit leak.

Having LOST-bit integrity ensured, as done after commit
23aeeec365dcf8bc87fae44c533e50d0bb4f23cc, won't hurt. It may
still be useful in some other, possibly legitimate, scenario.

Reported by Chazarain Guillaume <guichaz@yahoo.fr>.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 3f126ec..0f0c1c9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3113,11 +3113,11 @@ static int tcp_ack(struct sock *sk, struct sk_buff *skb, int flag)
 	/* See if we can take anything off of the retransmit queue. */
 	flag |= tcp_clean_rtx_queue(sk, &seq_rtt, prior_fackets);
 
+	if (tp->frto_counter)
+		frto_cwnd = tcp_process_frto(sk, flag);
 	/* Guarantee sacktag reordering detection against wrap-arounds */
 	if (before(tp->frto_highmark, tp->snd_una))
 		tp->frto_highmark = 0;
-	if (tp->frto_counter)
-		frto_cwnd = tcp_process_frto(sk, flag);
 
 	if (tcp_ack_is_dubious(sk, flag)) {
 		/* Advance CWND, if state allows this. */
-- 
1.5.0.6
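
Why the ordering in tcp_ack() matters can be shown with a small userspace
sketch. seq_before() and seq_after() follow the same signed-difference idea
as the kernel's before() and after() helpers. highmark_seen_by_frto() is a
hypothetical function that models only the two orderings of "clear the
highmark" and "run the F-RTO processing", reporting the value the processing
step would observe.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is before b" on 32-bit sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Wraparound-safe "a is after b". */
static bool seq_after(uint32_t a, uint32_t b)
{
	return seq_before(b, a);
}

/*
 * Returns the highmark value that the F-RTO processing step observes.
 * process_first == false models the old order (clear, then process);
 * process_first == true models the order after the patch.
 */
static uint32_t highmark_seen_by_frto(uint32_t *highmark, uint32_t snd_una,
				      bool process_first)
{
	uint32_t seen;

	if (process_first) {
		seen = *highmark;              /* processing runs first */
		if (seq_before(*highmark, snd_una))
			*highmark = 0;         /* then the wrap guard */
	} else {
		if (seq_before(*highmark, snd_una))
			*highmark = 0;         /* wrap guard runs first */
		seen = *highmark;              /* processing sees zero */
	}
	return seen;
}
```

With a highmark of 5000 and snd_una advanced to 6000, the old order hands
zero to the processing step, and a segment ending at 1000 then compares as
"after" the highmark even though it lies below the real one. In the new order
the processing step still sees 5000, and the value is cleared only afterwards.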


* Re: Re : Oops preceded by WARNING: at net/ipv4/tcp_input.c:1571 tcp_remove_reno_sacks()
  2007-11-14 13:32       ` Ilpo Järvinen
@ 2007-11-14 23:55         ` David Miller
  2007-11-15  8:11           ` Ilpo Järvinen
  0 siblings, 1 reply; 10+ messages in thread
From: David Miller @ 2007-11-14 23:55 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: guichaz, netdev

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Wed, 14 Nov 2007 15:32:58 +0200 (EET)

> [PATCH] [TCP] FRTO: Clear frto_highmark only after process_frto that uses it
> 
> I broke this in commit 3de96471bd7fb76406e975ef6387abe3a0698149.
> tcp_process_frto should always see a valid frto_highmark. An
> invalid frto_highmark (zero) is very likely what ultimately
> caused a seqno compare in tcp_enter_frto_loss to do the wrong
> thing, leading to the LOST-bit leak.
> 
> Having LOST-bit integrity ensured, as done after commit
> 23aeeec365dcf8bc87fae44c533e50d0bb4f23cc, won't hurt. It may
> still be useful in some other, possibly legitimate, scenario.
> 
> Reported by Chazarain Guillaume <guichaz@yahoo.fr>.
> 
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Applied.

Thanks for making such an incredibly thorough investigation
into this bug!


* Re: Re : Oops preceded by WARNING: at net/ipv4/tcp_input.c:1571 tcp_remove_reno_sacks()
  2007-11-14 23:55         ` David Miller
@ 2007-11-15  8:11           ` Ilpo Järvinen
  0 siblings, 0 replies; 10+ messages in thread
From: Ilpo Järvinen @ 2007-11-15  8:11 UTC (permalink / raw)
  To: David Miller; +Cc: guichaz, Netdev


On Wed, 14 Nov 2007, David Miller wrote:

> From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
> Date: Wed, 14 Nov 2007 15:32:58 +0200 (EET)
> 
> > [PATCH] [TCP] FRTO: Clear frto_highmark only after process_frto that uses it
> > 
> > I broke this in commit 3de96471bd7fb76406e975ef6387abe3a0698149.
> > tcp_process_frto should always see a valid frto_highmark. An
> > invalid frto_highmark (zero) is very likely what ultimately
> > caused a seqno compare in tcp_enter_frto_loss to do the wrong
> > thing, leading to the LOST-bit leak.
> > 
> > Having LOST-bit integrity ensured, as done after commit
> > 23aeeec365dcf8bc87fae44c533e50d0bb4f23cc, won't hurt. It may
> > still be useful in some other, possibly legitimate, scenario.
> > 
> > Reported by Chazarain Guillaume <guichaz@yahoo.fr>.
> > 
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> 
> Applied.
> 
> Thanks for making such an incredibly thorough investigation
> into this bug!

I suppose this bug also caused all those spurious RTOs I used to see with 
my home connection (~10% of all RTOs during a 10M scp transfer). They seemed 
a bit out of place because it's all wired and low RTT. There are bandwidth 
limits enforced by the ISP, which I first suspected could cause them, besides 
suspecting a bug in my code of course :-). ...It seems I can stop 
investigating them now, since last evening's test run gave 0 spurious
RTOs :-).

Thanks, Chazarain, for your report.

-- 
 i.


* Re: Re : Oops preceded by WARNING: at net/ipv4/tcp_input.c:1571 tcp_remove_reno_sacks()
  2007-11-14  5:04     ` David Miller
  2007-11-14 13:32       ` Ilpo Järvinen
@ 2007-11-15 10:31       ` Guillaume Chazarain
  2007-11-15 11:51         ` Ilpo Järvinen
  1 sibling, 1 reply; 10+ messages in thread
From: Guillaume Chazarain @ 2007-11-15 10:31 UTC (permalink / raw)
  To: David Miller; +Cc: ilpo.jarvinen, netdev

David Miller <davem@davemloft.net> wrote:

> Chazarain please let us know if it does indeed cure your
> problem.

Unfortunately, I couldn't manage to reproduce the problem with an
unpatched kernel. But your investigation Ilpo was really impressive.

BTW, even though I messed up the yahoo webmail configuration, you can
call me by my first name: Guillaume ;-)

Thanks again for such an awesome bug fixing attitude!

-- 
Guillaume


* Re: Re : Oops preceded by WARNING: at net/ipv4/tcp_input.c:1571 tcp_remove_reno_sacks()
  2007-11-15 10:31       ` Guillaume Chazarain
@ 2007-11-15 11:51         ` Ilpo Järvinen
  0 siblings, 0 replies; 10+ messages in thread
From: Ilpo Järvinen @ 2007-11-15 11:51 UTC (permalink / raw)
  To: Guillaume Chazarain; +Cc: David Miller, Netdev

On Thu, 15 Nov 2007, Guillaume Chazarain wrote:

> David Miller <davem@davemloft.net> wrote:
> 
> > Chazarain please let us know if it does indeed cure your
> > problem.
> 
> Unfortunately, I couldn't manage to reproduce the problem with an
> unpatched kernel. But your investigation Ilpo was really impressive.

These are usually very sensitive to other traffic, because even a simple 
change in packet pattern changes behavior enough for the problem to disappear.
The same thing occurred with the fackets_out miscount a month ago as 
well: on a different weekday it just wasn't reproducible. ...Anyway, I'm 
pretty sure it's now fixed, because there's a simple explanation for it 
in the premature frto_highmark clearing bug. But if you still 
end up seeing them after that, make sure to report it... :-)

> BTW, even though I messed up the yahoo webmail configuration, you can
> call me by my first name: Guillaume ;-)

Fair enough. :-)

> Thanks again for such an awesome bug fixing attitude!

The best thing is that usually, when forced to really think about what could 
go wrong, other, unrelated bugs seem to come up as well, though up to 10%
of the initial oh-nos end up being genuine bugs. ...Thus I still have 
a couple of miscount-due-to-GSO&hints fixes to do as a result of this 
venture, besides the problems already fixed.

-- 
 i.

