From: Daniel Petre <daniel.petre@rcs-rds.ro>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev <netdev@vger.kernel.org>
Subject: Re: [PATCH] ip_gre: fix kernel panic with icmp_dest_unreach
Date: Wed, 22 May 2013 18:40:32 +0300
Message-ID: <519CE6F0.3040703@rcs-rds.ro>
In-Reply-To: <1369230739.3301.334.camel@edumazet-glaptop>
On 05/22/2013 04:52 PM, Eric Dumazet wrote:
> On Wed, 2013-05-22 at 14:49 +0300, Daniel Petre wrote:
>
>> Hello Eric,
>> some machines have e1000e others have tg3 (with mtu 1524) then we have
>> few gre tunnels on top of the downlink ethernet and the traffic goes up
>> the router via the second ethernet interface, nothing complicated.
>>
>
> The crash by the way is happening in icmp_send() called from
> ipv4_link_failure(), called from ip_tunnel_xmit() when IPv4 destination
> cannot be reached.
>
> Your patch therefore should not 'avoid' the problem ...
>
> My guess is kernel stack is too small to afford icmp_send() being called
> twice (recursively)
>
> Could you try :
>
Hello Eric,
thanks for the patch. We compiled the patched kernel and put it live;
it panicked when we shut the port to the server.
crash> bt
PID: 0 TASK: ffffffff81813420 CPU: 0 COMMAND: "swapper/0"
#0 [ffff88003fc05df0] machine_kexec at ffffffff81027430
#1 [ffff88003fc05e40] crash_kexec at ffffffff8107da80
#2 [ffff88003fc05f10] oops_end at ffffffff81005bf8
#3 [ffff88003fc05f30] do_stack_segment at ffffffff8100365f
#4 [ffff88003fc05f50] retint_signal at ffffffff81542d12
[exception RIP: __kmalloc+144]
RIP: ffffffff810d0a20 RSP: ffff88003fc03a30 RFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff88003d672a00 RCX: 00000000003c1bf9
RDX: 00000000003c1bf8 RSI: 0000000000008020 RDI: 0000000000013ba0
RBP: 37f5089fae060a80 R8: ffffffff814d5def R9: ffff88003fc03a80
R10: 00000000557809c3 R11: ffff88003e1053c0 R12: ffff88003e001240
R13: 0000000000008020 R14: 0000000000000000 R15: 0000000000000001
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <STACKFAULT exception stack> ---
#5 [ffff88003fc03a30] __kmalloc at ffffffff810d0a20
#6 [ffff88003fc03a58] icmp_send at ffffffff814d5def
#7 [ffff88003fc03bc8] sch_direct_xmit at ffffffff81487d66
#8 [ffff88003fc03c08] __qdisc_run at ffffffff81487efd
#9 [ffff88003fc03c48] dev_queue_xmit at ffffffff8146e5a7
#10 [ffff88003fc03c88] ip_finish_output at ffffffff814ab596
#11 [ffff88003fc03ce8] __netif_receive_skb at ffffffff8146ed13
#12 [ffff88003fc03d88] napi_gro_receive at ffffffff8146fc50
#13 [ffff88003fc03da8] e1000_clean_rx_irq at ffffffff813bc67b
#14 [ffff88003fc03e48] e1000e_poll at ffffffff813c3a20
#15 [ffff88003fc03e98] net_rx_action at ffffffff8146f796
#16 [ffff88003fc03ee8] __do_softirq at ffffffff8103ebb9
#17 [ffff88003fc03f38] segment_not_present at ffffffff8154438c
#18 [ffff88003fc03f70] irq_exit at ffffffff8103e9cd
#19 [ffff88003fc03f80] do_IRQ at ffffffff81003f6c
#20 [ffff88003fc03fb0] save_paranoid at ffffffff81542b6a
--- <IRQ stack> ---
#21 [ffffffff81801ea8] save_paranoid at ffffffff81542b6a
[exception RIP: mwait_idle+95]
RIP: ffffffff8100ad8f RSP: ffffffff81801f50 RFLAGS: 00000246
RAX: 0000000000000000 RBX: ffffffff8154189e RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff81801fd8 RDI: ffff88003fc0d840
RBP: ffffffff8185be80 R8: 0000000000000000 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff81813420 R14: ffff88003fc11000 R15: ffffffff81813420
ORIG_RAX: ffffffffffffff1e CS: 0010 SS: 0018
#22 [ffffffff81801f50] cpu_idle at ffffffff8100b126
---------------------
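To put numbers on the stack-depth question, the frame addresses in the bt output above can be subtracted directly. The following is a minimal userspace sketch, not kernel code: the array holds the sixteen IRQ-stack frame addresses (#5 through #20) copied from that output, and the helper names `irq_frames`, `frame_span` and `total_span` are made up for illustration. Note that crash lists only the frames it resolved, so the span attributed to one entry covers everything between two listed addresses, which may include several functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Frame addresses from the "crash> bt" output, innermost frame first.
 * Only the IRQ-stack part of the trace (#5 .. #20) is included. */
static const uint64_t irq_frames[] = {
    0xffff88003fc03a30ULL, /* #5  __kmalloc */
    0xffff88003fc03a58ULL, /* #6  icmp_send */
    0xffff88003fc03bc8ULL, /* #7  sch_direct_xmit */
    0xffff88003fc03c08ULL, /* #8  __qdisc_run */
    0xffff88003fc03c48ULL, /* #9  dev_queue_xmit */
    0xffff88003fc03c88ULL, /* #10 ip_finish_output */
    0xffff88003fc03ce8ULL, /* #11 __netif_receive_skb */
    0xffff88003fc03d88ULL, /* #12 napi_gro_receive */
    0xffff88003fc03da8ULL, /* #13 e1000_clean_rx_irq */
    0xffff88003fc03e48ULL, /* #14 e1000e_poll */
    0xffff88003fc03e98ULL, /* #15 net_rx_action */
    0xffff88003fc03ee8ULL, /* #16 __do_softirq */
    0xffff88003fc03f38ULL, /* #17 */
    0xffff88003fc03f70ULL, /* #18 irq_exit */
    0xffff88003fc03f80ULL, /* #19 do_IRQ */
    0xffff88003fc03fb0ULL, /* #20 */
};

#define N_IRQ_FRAMES (sizeof(irq_frames) / sizeof(irq_frames[0]))

/* Bytes between listed frame i and the next outer listed frame. */
static uint64_t frame_span(const uint64_t *frames, size_t n, size_t i)
{
    assert(i + 1 < n);
    return frames[i + 1] - frames[i];
}

/* Bytes between the innermost and outermost listed frames. */
static uint64_t total_span(const uint64_t *frames, size_t n)
{
    assert(n > 0);
    return frames[n - 1] - frames[0];
}
```

Calling `frame_span(irq_frames, N_IRQ_FRAMES, 1)` gives the distance from the icmp_send entry to the sch_direct_xmit entry, and `total_span(irq_frames, N_IRQ_FRAMES)` gives the distance across all listed IRQ-stack frames. These are distances between listed addresses only, not a verdict on whether the stack overflowed.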
[ 645.650121] e1000e: eth3 NIC Link is Down
[ 664.596968] stack segment: 0000 [#1] SMP
[ 664.597121] Modules linked in: coretemp
[ 664.597264] CPU 0
[ 664.597309] Pid: 0, comm: swapper/0 Not tainted 3.8.13 #4 IBM IBM System x3250 M2
[ 664.597447] RIP: 0010:[<ffffffff810d0a20>] [<ffffffff810d0a20>] __kmalloc+0x90/0x180
[ 664.597559] RSP: 0018:ffff88003fc03a30 EFLAGS: 00010202
[ 664.597621] RAX: 0000000000000000 RBX: ffff88003d672a00 RCX: 00000000003c1bf9
[ 664.597687] RDX: 00000000003c1bf8 RSI: 0000000000008020 RDI: 0000000000013ba0
[ 664.597752] RBP: 37f5089fae060a80 R08: ffffffff814d5def R09: ffff88003fc03a80
[ 664.597817] R10: 00000000557809c3 R11: ffff88003e1053c0 R12: ffff88003e001240
[ 664.597882] R13: 0000000000008020 R14: 0000000000000000 R15: 0000000000000001
[ 664.597948] FS: 0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[ 664.598015] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 664.598077] CR2: 00007fefa9e458e0 CR3: 000000003d848000 CR4: 00000000000007f0
[ 664.598143] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 664.598208] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 664.598273] Process swapper/0 (pid: 0, threadinfo ffffffff81800000, task ffffffff81813420)
[ 664.598340] Stack:
[ 664.598396] 00000000c3097855 ffff88003d672a00 0000000000000003 0000000000000001
[ 664.598627] ffff880039ead70e ffffffff814d5def ffff88003ce11840 0000000000000246
[ 664.598859] ffff88003d0b4000 ffffffff814a2beb 0000000000010018 ffff88003e1053c0
[ 664.599090] Call Trace:
[ 664.599147] <IRQ>
[ 664.599190]
[ 664.599289] [<ffffffff814d5def>] ? icmp_send+0x11f/0x390
[ 664.599353] [<ffffffff814a2beb>] ? __ip_rt_update_pmtu+0xbb/0x110
[ 664.599418] [<ffffffff814a1795>] ? ipv4_link_failure+0x15/0x60
[ 664.599482] [<ffffffff814e78b5>] ? ipgre_tunnel_xmit+0x7f5/0x9f0
[ 664.599547] [<ffffffff8146e032>] ? dev_hard_start_xmit+0x102/0x490
[ 664.599612] [<ffffffff81487d66>] ? sch_direct_xmit+0x106/0x1e0
[ 664.599676] [<ffffffff81487efd>] ? __qdisc_run+0xbd/0x150
[ 664.599739] [<ffffffff8146e5a7>] ? dev_queue_xmit+0x1e7/0x3a0
[ 664.600002] [<ffffffff814ab596>] ? ip_finish_output+0x2e6/0x3e0
[ 664.600002] [<ffffffff8146ed13>] ? __netif_receive_skb+0x5b3/0x7c0
[ 664.600002] [<ffffffff8146f114>] ? netif_receive_skb+0x24/0x80
[ 664.600002] [<ffffffff8146fc50>] ? napi_gro_receive+0x110/0x140
[ 664.600002] [<ffffffff813bc67b>] ? e1000_clean_rx_irq+0x29b/0x490
[ 664.600002] [<ffffffff813c3a20>] ? e1000e_poll+0x90/0x3a0
[ 664.600002] [<ffffffff8146f796>] ? net_rx_action+0xc6/0x1e0
[ 664.600002] [<ffffffff8103ebb9>] ? __do_softirq+0xa9/0x170
[ 664.600002] [<ffffffff8154438c>] ? call_softirq+0x1c/0x30
[ 664.600002] [<ffffffff810047dd>] ? do_softirq+0x4d/0x80
[ 664.600002] [<ffffffff8103e9cd>] ? irq_exit+0x7d/0x90
[ 664.600002] [<ffffffff81003f6c>] ? do_IRQ+0x5c/0xd0
[ 664.600002] [<ffffffff81542b6a>] ? common_interrupt+0x6a/0x6a
[ 664.600002] <EOI>
[ 664.600002]
[ 664.600002] [<ffffffff8154189e>] ? __schedule+0x26e/0x5b0
[ 664.600002] [<ffffffff8100ad8f>] ? mwait_idle+0x5f/0x70
[ 664.600002] [<ffffffff8100b126>] ? cpu_idle+0xf6/0x110
[ 664.600002] [<ffffffff81875c58>] ? start_kernel+0x33d/0x348
[ 664.600002] [<ffffffff8187573b>] ? repair_env_string+0x5b/0x5b
[ 664.600002] [<ffffffff8187541d>] ? x86_64_start_kernel+0xee/0xf2
[ 664.600002] Code: 28 49 8b 0c 24 65 48 03 0c 25 88 cc 00 00 48 8b 51 08 48 8b 29 48 85 ed 0f 84 d3 00 00 00 49 63 44 24 20 49 8b 3c 24 48 8d 4a 01 <48> 8b 5c 05 00 48 89 e8 65 48 0f c7 0f 0f 94 c0 3c 01 75 c2 49
[ 664.600002] RIP [<ffffffff810d0a20>] __kmalloc+0x90/0x180
[ 664.600002] RSP <ffff88003fc03a30>
> net/ipv4/icmp.c | 72 ++++++++++++++++++++++++----------------------
> 1 file changed, 38 insertions(+), 34 deletions(-)
>
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index 76e10b4..e33f3b0 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -208,7 +208,7 @@ static struct sock *icmp_sk(struct net *net)
> return net->ipv4.icmp_sk[smp_processor_id()];
> }
>
> -static inline struct sock *icmp_xmit_lock(struct net *net)
> +static struct sock *icmp_xmit_lock(struct net *net)
> {
> struct sock *sk;
>
> @@ -226,7 +226,7 @@ static inline struct sock *icmp_xmit_lock(struct net *net)
> return sk;
> }
>
> -static inline void icmp_xmit_unlock(struct sock *sk)
> +static void icmp_xmit_unlock(struct sock *sk)
> {
> spin_unlock_bh(&sk->sk_lock.slock);
> }
> @@ -235,8 +235,8 @@ static inline void icmp_xmit_unlock(struct sock *sk)
> * Send an ICMP frame.
> */
>
> -static inline bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt,
> - struct flowi4 *fl4, int type, int code)
> +static bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt,
> + struct flowi4 *fl4, int type, int code)
> {
> struct dst_entry *dst = &rt->dst;
> bool rc = true;
> @@ -375,19 +375,22 @@ out_unlock:
> icmp_xmit_unlock(sk);
> }
>
> -static struct rtable *icmp_route_lookup(struct net *net,
> - struct flowi4 *fl4,
> - struct sk_buff *skb_in,
> - const struct iphdr *iph,
> - __be32 saddr, u8 tos,
> - int type, int code,
> - struct icmp_bxm *param)
> +struct icmp_send_data {
> + struct icmp_bxm icmp_param;
> + struct ipcm_cookie ipc;
> + struct flowi4 fl4;
> +};
> +
> +static noinline_for_stack struct rtable *
> +icmp_route_lookup(struct net *net, struct flowi4 *fl4,
> + struct sk_buff *skb_in, const struct iphdr *iph,
> + __be32 saddr, u8 tos, int type, int code,
> + struct icmp_bxm *param)
> {
> struct rtable *rt, *rt2;
> struct flowi4 fl4_dec;
> int err;
>
> - memset(fl4, 0, sizeof(*fl4));
> fl4->daddr = (param->replyopts.opt.opt.srr ?
> param->replyopts.opt.opt.faddr : iph->saddr);
> fl4->saddr = saddr;
> @@ -482,14 +485,12 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
> {
> struct iphdr *iph;
> int room;
> - struct icmp_bxm icmp_param;
> struct rtable *rt = skb_rtable(skb_in);
> - struct ipcm_cookie ipc;
> - struct flowi4 fl4;
> __be32 saddr;
> u8 tos;
> struct net *net;
> struct sock *sk;
> + struct icmp_send_data *data = NULL;
>
> if (!rt)
> goto out;
> @@ -585,7 +586,11 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
> IPTOS_PREC_INTERNETCONTROL) :
> iph->tos;
>
> - if (ip_options_echo(&icmp_param.replyopts.opt.opt, skb_in))
> + data = kzalloc(sizeof(*data), GFP_ATOMIC);
> + if (!data)
> + goto out_unlock;
> +
> + if (ip_options_echo(&data->icmp_param.replyopts.opt.opt, skb_in))
> goto out_unlock;
>
>
> @@ -593,23 +598,21 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
> * Prepare data for ICMP header.
> */
>
> - icmp_param.data.icmph.type = type;
> - icmp_param.data.icmph.code = code;
> - icmp_param.data.icmph.un.gateway = info;
> - icmp_param.data.icmph.checksum = 0;
> - icmp_param.skb = skb_in;
> - icmp_param.offset = skb_network_offset(skb_in);
> + data->icmp_param.data.icmph.type = type;
> + data->icmp_param.data.icmph.code = code;
> + data->icmp_param.data.icmph.un.gateway = info;
> + data->icmp_param.skb = skb_in;
> + data->icmp_param.offset = skb_network_offset(skb_in);
> inet_sk(sk)->tos = tos;
> - ipc.addr = iph->saddr;
> - ipc.opt = &icmp_param.replyopts.opt;
> - ipc.tx_flags = 0;
> + data->ipc.addr = iph->saddr;
> + data->ipc.opt = &data->icmp_param.replyopts.opt;
>
> - rt = icmp_route_lookup(net, &fl4, skb_in, iph, saddr, tos,
> - type, code, &icmp_param);
> + rt = icmp_route_lookup(net, &data->fl4, skb_in, iph, saddr, tos,
> + type, code, &data->icmp_param);
> if (IS_ERR(rt))
> goto out_unlock;
>
> - if (!icmpv4_xrlim_allow(net, rt, &fl4, type, code))
> + if (!icmpv4_xrlim_allow(net, rt, &data->fl4, type, code))
> goto ende;
>
> /* RFC says return as much as we can without exceeding 576 bytes. */
> @@ -617,19 +620,20 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
> room = dst_mtu(&rt->dst);
> if (room > 576)
> room = 576;
> - room -= sizeof(struct iphdr) + icmp_param.replyopts.opt.opt.optlen;
> + room -= sizeof(struct iphdr) + data->icmp_param.replyopts.opt.opt.optlen;
> room -= sizeof(struct icmphdr);
>
> - icmp_param.data_len = skb_in->len - icmp_param.offset;
> - if (icmp_param.data_len > room)
> - icmp_param.data_len = room;
> - icmp_param.head_len = sizeof(struct icmphdr);
> + data->icmp_param.data_len = skb_in->len - data->icmp_param.offset;
> + if (data->icmp_param.data_len > room)
> + data->icmp_param.data_len = room;
> + data->icmp_param.head_len = sizeof(struct icmphdr);
>
> - icmp_push_reply(&icmp_param, &fl4, &ipc, &rt);
> + icmp_push_reply(&data->icmp_param, &data->fl4, &data->ipc, &rt);
> ende:
> ip_rt_put(rt);
> out_unlock:
> icmp_xmit_unlock(sk);
> + kfree(data);
> out:;
> }
> EXPORT_SYMBOL(icmp_send);
>
>
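The quoted patch shrinks the icmp_send() stack frame by grouping three on-stack structures into one heap allocation that is zeroed when allocated and freed on the exit path. The sketch below shows that pattern in userspace C, assuming `calloc`/`free` as stand-ins for `kzalloc`/`kfree`. All struct and function names here (`params`, `cookie`, `flow`, `send_data`, `send_reply`) are hypothetical simplifications, not the kernel's definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the three structures the patch moves off the stack. */
struct params { char opts[64]; int type; int code; };
struct cookie { unsigned int addr; int tx_flags; };
struct flow   { unsigned int daddr; unsigned int saddr; };

/* One container, so a single allocation replaces three stack objects. */
struct send_data {
    struct params p;
    struct cookie c;
    struct flow f;
};

/* Returns 0 on success, -1 on allocation failure or invalid type.
 * On success, *out_tx_flags receives the (zero-initialised) tx_flags. */
static int send_reply(int type, int code, unsigned int addr, int *out_tx_flags)
{
    struct send_data *data;
    int ret = -1;

    assert(out_tx_flags != NULL);

    /* calloc zeroes the block, so fields such as tx_flags need no
     * explicit "= 0", mirroring the assignments the patch drops. */
    data = calloc(1, sizeof(*data));
    if (!data)
        goto out;

    data->p.type = type;
    data->p.code = code;
    data->c.addr = addr;

    if (type < 0)          /* simulated error path */
        goto out_free;

    *out_tx_flags = data->c.tx_flags;
    ret = 0;

out_free:
    free(data);            /* single release point for every path after allocation */
out:
    return ret;
}
```

The trade-off is an allocation on every call in exchange for a smaller stack frame; in the patch the allocation uses GFP_ATOMIC and a failure simply skips sending the ICMP message.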