netdev.vger.kernel.org archive mirror
* System hangs (unable to handle kernel paging request)
From: Oleksii Berezhniak @ 2016-04-04  7:59 UTC
  To: netdev

Good day.

We run a PPPoE server on CentOS 7 (kernel 3.10.0-327.10.1.el7.dsip.x86_64).

We applied several PPPoE-related patches to this kernel:

ppp: don't override sk->sk_state in pppoe_flush_dev()
ppp: fix pppoe_dev deletion condition in pppoe_release()
pppoe: fix memory corruption in padt work structure
pppoe: fix reference counting in PPPoE proxy

We also built the latest version of the ixgbe driver from Intel.

The system now crashes after approximately one week of uptime:

[545444.673270] BUG: unable to handle kernel paging request at ffff88a005040200
[545444.673306] IP: [<ffffffff811c0e95>] kmem_cache_alloc+0x75/0x1d0
[545444.673335] PGD 0
[545444.673348] Oops: 0000 [#1] SMP
[545444.673367] Modules linked in: arc4 ppp_mppe act_police cls_u32 sch_ingress sch_tbf pptp gre pppoe pppox ppp_generic slhc 8021q garp stp mrp llc iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_filter xt_TCPMSS iptable_mangle xt_CT nf_conntrack iptable_raw w83793 hwmon_vid snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec coretemp snd_hda_core iTCO_wdt kvm iTCO_vendor_support snd_hwdep snd_seq snd_seq_device ipmi_ssif ppdev lpc_ich snd_pcm pcspkr mfd_core sg ipmi_si snd_timer snd i2c_i801 ipmi_msghandler ioatdma parport_pc parport shpchp soundcore i7core_edac tpm_infineon edac_core ip_tables ext4 mbcache jbd2 sd_mod crct10dif_generic crc_t10dif crct10dif_common syscopyarea sysfillrect firewire_ohci sysimgblt i2c_algo_bit drm_kms_helper ata_generic pata_acpi
[545444.674383]  ttm firewire_core crc_itu_t serio_raw drm ata_piix libata crc32c_intel i2c_core ixgbe(OE) vxlan e1000e ip6_udp_tunnel udp_tunnel aacraid dca ptp pps_core
[545444.674783] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G           OE ------------   3.10.0-327.10.1.el7.dsip.x86_64 #1
[545444.675032] Hardware name: empty empty/S7010, BIOS 'V2.06  ' 03/31/2010
[545444.675162] task: ffff880139c55c00 ti: ffff880139c84000 task.ti: ffff880139c84000
[545444.675400] RIP: 0010:[<ffffffff811c0e95>]  [<ffffffff811c0e95>] kmem_cache_alloc+0x75/0x1d0
[545444.675641] RSP: 0018:ffff88023fc23ce8  EFLAGS: 00010286
[545444.675766] RAX: 0000000000000000 RBX: ffff8802302eab00 RCX: 000000010eb8edbe
[545444.676002] RDX: 000000010eb8edbd RSI: 0000000000000020 RDI: ffff88013b803700
[545444.676237] RBP: ffff88023fc23d18 R08: 00000000000175a0 R09: ffffffff81517e70
[545444.676472] R10: 000000000000006b R11: 0000000000000000 R12: ffff88a005040200
[545444.676706] R13: 0000000000000020 R14: ffff88013b803700 R15: ffff88013b803700
[545444.676942] FS:  0000000000000000(0000) GS:ffff88023fc20000(0000) knlGS:0000000000000000
[545444.677180] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[545444.677307] CR2: ffff88a005040200 CR3: 0000000237e63000 CR4: 00000000000007e0
[545444.677543] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[545444.677779] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[545444.678014] Stack:
[545444.678127]  ffff880237ea2040 ffff8802302eab00 0000000000000280 0000000000000280
[545444.678370]  0000000000000006 ffff880236bb1b60 ffff88023fc23d40 ffffffff81517e70
[545444.678614]  0000000000000280 ffff8802302eab00 0000000000000000 ffff88023fc23d60
[545444.678857] Call Trace:
[545444.678973]  <IRQ>

[545444.678982]
[545444.679100]  [<ffffffff81517e70>] build_skb+0x30/0x1d0
[545444.679222]  [<ffffffff8151a973>] __alloc_rx_skb+0x63/0xb0
[545444.679349]  [<ffffffff8151a9db>] __netdev_alloc_skb+0x1b/0x40
[545444.679492]  [<ffffffffa0104d8e>] ixgbe_clean_rx_irq+0xee/0xa50 [ixgbe]
[545444.679624]  [<ffffffff8152862f>] ? __napi_complete+0x1f/0x30
[545444.679756]  [<ffffffffa0106738>] ixgbe_poll+0x2d8/0x6d0 [ixgbe]
[545444.679886]  [<ffffffff8152b092>] net_rx_action+0x152/0x240
[545444.680015]  [<ffffffff81084aef>] __do_softirq+0xef/0x280
[545444.680144]  [<ffffffff8164735c>] call_softirq+0x1c/0x30
[545444.680277]  [<ffffffff81016fc5>] do_softirq+0x65/0xa0
[545444.680402]  [<ffffffff81084e85>] irq_exit+0x115/0x120
[545444.680529]  [<ffffffff81647ef8>] do_IRQ+0x58/0xf0
[545444.680660]  [<ffffffff8163d1ad>] common_interrupt+0x6d/0x6d
[545444.680786]  <EOI>
[545444.680794]
[545444.680914]  [<ffffffff81058e96>] ? native_safe_halt+0x6/0x10
[545444.681041]  [<ffffffff8101dbcf>] default_idle+0x1f/0xc0
[545444.681168]  [<ffffffff8101e4d6>] arch_cpu_idle+0x26/0x30
[545444.681297]  [<ffffffff810d62c5>] cpu_startup_entry+0x245/0x290
[545444.681427]  [<ffffffff810475fa>] start_secondary+0x1ba/0x230
[545444.681554] Code: ce 00 00 49 8b 50 08 4d 8b 20 49 8b 40 10 4d 85 e4 0f 84 1f 01 00 00 48 85 c0 0f 84 16 01 00 00 49 63 46 20 48 8d 4a 01 4d 8b 06 <49> 8b 1c 04 4c 89 e0 65 49 0f c7 08 0f 94 c0 84 c0 74 b9 49 63
[545444.682056] RIP  [<ffffffff811c0e95>] kmem_cache_alloc+0x75/0x1d0
[545444.682186]  RSP <ffff88023fc23ce8>
[545444.682305] CR2: ffff88a005040200


The error message and call stack are the same every time.

What could be causing these crashes?
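
As a reading aid for the oops above: the bytes marked in the Code: line, `<49> 8b 1c 04`, decode to `mov (%r12,%rax,1),%rbx`, so the faulting load reads from R12 + RAX. The sketch below parses the register dump and checks that this sum equals CR2. It is only an illustration in Python; the function names are made up here, and the check says nothing about what wrote the bad value into R12. A match may be consistent with the allocator following a corrupted free-list pointer, but the trace alone does not prove that.

```python
import re

# Matches "NAME: <16 hex digits>" pairs such as "R12: ffff88a005040200".
# Lines like "RSP: 0018:ffff..." are skipped because of the segment prefix.
REG_RE = re.compile(r"\b(R[A-Z0-9]{1,2}|CR[0-9])\s*:\s*([0-9a-f]{16})\b")


def parse_registers(oops_text):
    """Return {register name: integer value} for each pair found in the text."""
    # Collapse whitespace so values wrapped onto the next line still parse.
    flat = " ".join(oops_text.split())
    return {name: int(value, 16) for name, value in REG_RE.findall(flat)}


def fault_matches_operand(regs, base="R12", index="RAX"):
    """True if CR2 equals base + index, the memory operand of the faulting mov."""
    return regs["CR2"] == (regs[base] + regs[index]) & 0xFFFFFFFFFFFFFFFF
```

Feeding the register lines of this report to `parse_registers` and then calling `fault_matches_operand` on the result shows whether the fault address is exactly the pointer held in R12, which is the case here because RAX is zero and CR2 equals R12.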

Thanks.

-- 
WBR


* Re: System hangs (unable to handle kernel paging request)
From: Bastien Philbert @ 2016-04-04 14:30 UTC
  To: Oleksii Berezhniak, netdev



I wonder whether your kernel includes commit 32b3e08fff60494cd1d281a39b51583edfd2b18f.
It appears to have been added to fix issues that look very similar to the trace you are seeing.
Nick


* Re: System hangs (unable to handle kernel paging request)
From: Oleksii Berezhniak @ 2016-04-04 15:01 UTC
  To: netdev

Could you please point me to a more detailed description of the similar
issues you mentioned?

I can only find this:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=32b3e08fff60494cd1d281a39b51583edfd2b18f

But it does not mention any hangs, only performance issues.

BTW, GRO (Generic Receive Offload) is disabled on our network adapter.
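
For anyone who wants to confirm the offload state in a script, here is a minimal sketch in Python that parses text in the format printed by `ethtool -k <interface>`. It assumes the output has already been captured as a string; the function name and the interface name `eth0` used in the example are illustrative only and do not come from this report.

```python
def parse_offload_features(ethtool_output):
    """Map feature name -> True (on) / False (off) from `ethtool -k` style text."""
    features = {}
    for line in ethtool_output.splitlines():
        name, sep, state = line.partition(":")
        words = state.split()
        # Skip the "Features for <iface>:" header and lines without on/off.
        if not sep or not words or words[0] not in ("on", "off"):
            continue
        # Suffixes such as "[fixed]" are ignored; only on/off is recorded.
        features[name.strip()] = words[0] == "on"
    return features


# Hypothetical captured output, trimmed to three lines for illustration.
sample = "Features for eth0:\nrx-checksumming: on\ngeneric-receive-offload: off\n"
gro_enabled = parse_offload_features(sample)["generic-receive-offload"]
```

With the sample text above, `gro_enabled` is `False`, which is the state described in this message.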

-- 
WBR


* Re: System hangs (unable to handle kernel paging request)
From: Bastien Philbert @ 2016-04-04 17:50 UTC
  To: Oleksii Berezhniak, netdev



On 2016-04-04 11:01 AM, Oleksii Berezhniak wrote:
> Can you please point me to more detailed description of similar issues
> that you mentioned?
> 
Mostly it is in the reworks of the Intel drivers aimed at improving performance, in order
to avoid excessive CPU usage that leads to soft lockups during kernel polling at high
loads, with millions of packets being sent per second. These changes are spread across
various parts of the drivers, so it is hard to point to one commit with a detailed
description. I suggested this commit based on the release history of the longterm kernel
you are using: the commit was released well after your kernel. You may want to check
whether the commit with the id I sent you has been backported to your kernel. If it has
and the crash is *still* triggered, then the bug is probably somewhere else.
Cheers,
Bastien
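
One way to carry out the backport check suggested above on a distribution kernel is to search the package changelog, since a backported patch often does not keep the upstream commit id. The Python sketch below assumes the changelog text has already been captured (for example from `rpm -q --changelog kernel`) and that the search phrase is taken from the subject line of the upstream commit. The function name, the sample changelog lines and the search phrase are all made up for illustration.

```python
def changelog_mentions(changelog_text, needle):
    """Return the changelog lines that contain `needle`, ignoring case."""
    needle = needle.lower()
    return [
        line.strip()
        for line in changelog_text.splitlines()
        if needle in line.lower()
    ]


# Hypothetical changelog excerpt; real entries will look different.
sample_changelog = """\
- [netdrv] exampledrv: rework receive polling path (Some Author)
- [fs] examplefs: fix an unrelated issue (Other Author)
"""
hits = changelog_mentions(sample_changelog, "Rework receive polling")
```

An empty result does not prove the fix is absent, because a vendor may reword the subject or fold several upstream commits into one entry; it only narrows down where to look next.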
