* Lockup/High ksoftirqd when rate-limiting is enabled

From: Jean-Louis Dupond @ 2017-06-20 09:31 UTC
To: xen-devel; +Cc: paul.durrant, wei.liu2

Hi,

As requested via IRC, I'm sending this to xen-devel & the netback
maintainers.

We are using Xen 4.4.4-23.el6 with kernel 3.18.44-20.el6.x86_64.
Recently we have been having issues with rate-limiting enabled.

When we enable rate limiting in Xen and then push a lot of outbound
traffic on the domU, we notice a high ksoftirqd load. In some cases the
system locks up completely. This gives the following stack trace:

Jun 4 11:07:56 xensrv1 kernel: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
Jun 4 11:07:56 xensrv1 kernel: Modules linked in: fuse tun cls_fw sch_htb iptable_mangle ip6table_mangle sch_tbf nf_conntrack_ipv4 nf_defrag_ipv4 xt_state xt_multiport 8021q garp xt_mark ip6_tables xt_physdev br_netfilter dm_zero xfs ipt_REJECT nf_reject_ipv4 dm_cache_mq dm_cache dm_bio_prison
Jun 4 11:07:56 xensrv1 kernel: NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
Jun 4 11:07:56 xensrv1 kernel: Modules linked in: fuse tun cls_fw sch_htb iptable_mangle ip6table_mangle sch_tbf nf_conntrack_ipv4 nf_defrag_ipv4 xt_state xt_multiport 8021q garp xt_mark ip6_tables xt_physdev br_netfilter dm_zero xfs ipt_REJECT nf_reject_ipv4 dm_cache_mq dm_cache dm_bio_prison dm_persistent_data libcrc32c ext2 mbcache arptable_filter arp_tables xt_CT nf_conntrack iptable_raw iptable_filter ip_tables nbd(O) xen_gntalloc rdma_ucm(O) ib_ucm(O) rdma_cm(O) iw_cm(O) configfs ib_ipoib(O) ib_cm(O) ib_uverbs(O) ib_umad(O) mlx5_ib(O) mlx5_core(O) mlx4_en(O) vxlan udp_tunnel ip6_udp_tunnel mlx4_ib(O) ib_sa(O) ib_mad(O) ib_core(O) ib_addr(O) ib_netlink(O) mlx4_core(O) mlx_compat(O) xen_acpi_processor blktap xen_netback xen_blkback xen_gntdev xen_evtchn xenfs xen_privcmd dm_snapshot dm_bufio dm_mirror_sync(O) dm_mirror dm_region_hash dm_log nfsv3 nfs_acl nfs fscache lockd sunrpc grace bridge ipv6 stp llc sg iTCO_wdt iTCO_vendor_support sd_mod mxm_wmi dcdbas pcspkr dm_mod ixgbe mdio sb_edac edac_core mgag200
Jun 4 11:07:56 xensrv1 kernel: ttm drm_kms_helper shpchp lpc_ich 8250_fintek ipmi_devintf ipmi_si ipmi_msghandler mei_me mei ahci libahci igb dca ptp pps_core megaraid_sas wmi acpi_power_meter hwmon xen_pciback cramfs
Jun 4 11:07:56 xensrv1 kernel: CPU: 1 PID: 0 Comm: swapper/1 Tainted: G O 3.18.44-20.el6.x86_64 #1
Jun 4 11:07:56 xensrv1 kernel: Hardware name: Dell Inc. PowerEdge R730xd/xxxx, BIOS 2.1.6 05/19/2016
Jun 4 11:07:56 xensrv1 kernel: task: ffff880275f6e010 ti: ffff880275fd0000 task.ti: ffff880275fd0000
Jun 4 11:07:56 xensrv1 kernel: RIP: e030:[<ffffffff8100bf38>] [<ffffffff8100bf38>] xen_restore_fl_direct+0x18/0x1b
Jun 4 11:07:56 xensrv1 kernel: RSP: e02b:ffff88027aa23e30 EFLAGS: 00000297
Jun 4 11:07:56 xensrv1 kernel: RAX: 0000000000000008 RBX: 0000000000000200 RCX: 0000000000000003
Jun 4 11:07:56 xensrv1 kernel: RDX: ffff88027aa33f50 RSI: ffffc90013f88000 RDI: 0000000000000200
Jun 4 11:07:56 xensrv1 kernel: RBP: ffff88027aa23e48 R08: ffff88027aa33340 R09: ffff8802758d8a00
Jun 4 11:07:56 xensrv1 kernel: R10: ffff880283400c48 R11: 0000000000000000 R12: 0000000000000040
Jun 4 11:07:56 xensrv1 kernel: R13: ffffc90013f50000 R14: 0000000000000040 R15: 000000000000012b
Jun 4 11:07:56 xensrv1 kernel: FS: 0000000000000000(0000) GS:ffff88027aa20000(0000) knlGS:ffff88027aa20000
Jun 4 11:07:56 xensrv1 kernel: CS: e033 DS: 002b ES: 002b CR0: 0000000080050033
Jun 4 11:07:56 xensrv1 kernel: CR2: 00007fad4acc6b08 CR3: 000000024e0a1000 CR4: 0000000000042660
Jun 4 11:07:56 xensrv1 kernel: Stack:
Jun 4 11:07:56 xensrv1 kernel: ffffffff815a1139 ffff88027aa23e58 ffffc90013f50028 ffff88027aa23e58
Jun 4 11:07:56 xensrv1 kernel: ffffffffa036fc81 ffff88027aa23e98 ffffffffa03733cd ffff88027aa23e98
Jun 4 11:07:56 xensrv1 kernel: ffffffff00000000 ffff880251e25050 ffffc90013f50028 0000000000000000
Jun 4 11:07:56 xensrv1 kernel: Call Trace:
Jun 4 11:07:56 xensrv1 kernel: <IRQ> [<ffffffff815a1139>] ? __napi_schedule+0x59/0x60
Jun 4 11:07:56 xensrv1 kernel: [<ffffffffa036fc81>] xenvif_napi_schedule_or_enable_events+0x81/0x90 [xen_netback]
Jun 4 11:07:56 xensrv1 kernel: [<ffffffffa03733cd>] xenvif_poll+0x4d/0x68 [xen_netback]
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff815a8b32>] net_rx_action+0x112/0x2c0
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff81077d4c>] __do_softirq+0xfc/0x2f0
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff8107804d>] irq_exit+0xbd/0xd0
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff813b668c>] xen_evtchn_do_upcall+0x3c/0x50
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff8167c49e>] xen_do_hypervisor_callback+0x1e/0x40
Jun 4 11:07:56 xensrv1 kernel: <EOI> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff8100b700>] ? xen_safe_halt+0x10/0x20
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff8101fd44>] ? default_idle+0x24/0xf0
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff8101f34f>] ? arch_cpu_idle+0xf/0x20
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff810b37f6>] ? cpuidle_idle_call+0xd6/0x1d0
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff810926c2>] ? __atomic_notifier_call_chain+0x12/0x20
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff810b3a25>] ? cpu_idle_loop+0x135/0x200
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff810b3b0b>] ? cpu_startup_entry+0x1b/0x70
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff810b3b50>] ? cpu_startup_entry+0x60/0x70
Jun 4 11:07:56 xensrv1 kernel: [<ffffffff8101261a>] ? cpu_bringup_and_idle+0x2a/0x40
Jun 4 11:07:56 xensrv1 kernel: Code: 44 00 00 65 f6 04 25 c1 a0 00 00 ff 0f 94 c4 00 e4 c3 90 66 f7 c7 00 02 65 0f 94 04 25 c1 a0 00 00 65 66 83 3c 25 c0 a0 00 00 01 <75> 05 e8 01 00 00 00 c3 50 51 52 56 57 41 50 41 51 41 52 41 53

Sometimes we get these lockups for minutes, and then the system
recovers. But it's clear we need to find a solution for this :)

And it seems like we're not the only ones:
https://lists.centos.org/pipermail/centos-virt/2016-March/005014.html

There was also another thread where a patch was proposed
(https://www.spinics.net/lists/netdev/msg282765.html), but I don't see
any follow-up on it.

Any advice?

Thanks!
Jean-Louis Dupond
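The call trace above shows the mechanism: xenvif_poll() runs, the queue
is out of credit so no work gets done, NAPI completes, the ring still
holds unconsumed requests, and xenvif_napi_schedule_or_enable_events()
immediately reschedules the queue -- a softirq busy-loop that spins
until the credit timer fires. The toy userspace model below sketches
that loop; the names merely mirror the netback code (tx_action,
rate_limited) for illustration, it is not the kernel source:

/* Toy model of the netback rate-limit livelock (illustrative only). */
#include <stdbool.h>
#include <stdio.h>

struct queue {
	unsigned long remaining_credit; /* rate-limit budget */
	bool ring_has_requests;         /* guest keeps transmitting */
	bool rate_limited;              /* the flag the fix introduces */
};

/* Consume up to 'budget' requests; stop early when out of credit. */
static int tx_action(struct queue *q, int budget)
{
	int done = 0;

	while (done < budget && q->ring_has_requests) {
		if (q->remaining_credit == 0) {
			q->rate_limited = true; /* where tx_credit_exceeded() would set it */
			break;                  /* requests stay on the ring */
		}
		q->remaining_credit--;
		done++;
	}
	return done;
}

/* Returns true if the queue reschedules itself for another poll. */
static bool poll_once(struct queue *q, int budget, bool patched)
{
	int work_done = tx_action(q, budget);

	if (work_done < budget) {
		/* napi_complete(); then reschedule if the ring is non-empty.
		 * The patched variant skips this for rate-limited queues. */
		if (q->ring_has_requests && (!patched || !q->rate_limited))
			return true;
	}
	return false; /* wait for an event (or the credit timer) */
}

int main(void)
{
	struct queue q = { .remaining_credit = 0, .ring_has_requests = true };
	int spins = 0;

	/* Unpatched: zero work done, yet the queue keeps rescheduling. */
	while (poll_once(&q, 64, false) && spins < 5)
		spins++;
	printf("unpatched: rescheduled %d+ times with no work done\n", spins);

	/* Patched: the first poll marks the queue rate-limited and stops. */
	q.rate_limited = false;
	printf("patched: reschedule? %s\n",
	       poll_once(&q, 64, true) ? "yes" : "no");
	return 0;
}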
* Re: Lockup/High ksoftirqd when rate-limiting is enabled

From: Wei Liu @ 2017-06-20 11:18 UTC
To: Jean-Louis Dupond; +Cc: paul.durrant, wei.liu2, xen-devel

On Tue, Jun 20, 2017 at 11:31:02AM +0200, Jean-Louis Dupond wrote:
> Hi,
>
> As requested via IRC, I'm sending this to xen-devel & the netback
> maintainers.
>
> We are using Xen 4.4.4-23.el6 with kernel 3.18.44-20.el6.x86_64.
> Recently we have been having issues with rate-limiting enabled.
>
> When we enable rate limiting in Xen and then push a lot of outbound
> traffic on the domU, we notice a high ksoftirqd load. In some cases
> the system locks up completely.
>

Can you give this patch a try?

---8<--
From a242d4a74cc4ec46c5e3d43dd07eb146be4ca233 Mon Sep 17 00:00:00 2001
From: Wei Liu <wei.liu2@citrix.com>
Date: Tue, 20 Jun 2017 11:49:28 +0100
Subject: [PATCH] xen-netback: correctly schedule rate-limited queues

Add a flag to indicate if a queue is rate-limited. Test the flag in
the NAPI poll handler and avoid rescheduling the queue if true,
otherwise we risk locking up the host. The rescheduling shall be done
when replenishing credit.

Reported-by: Jean-Louis Dupond <jean-louis@dupond.be>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/common.h    | 1 +
 drivers/net/xen-netback/interface.c | 6 +++++-
 drivers/net/xen-netback/netback.c   | 6 +++++-
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 530586be05b4..5b1d2e8402d9 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -199,6 +199,7 @@ struct xenvif_queue { /* Per-queue data for xenvif */
 	unsigned long remaining_credit;
 	struct timer_list credit_timeout;
 	u64 credit_window_start;
+	bool rate_limited;
 
 	/* Statistics */
 	struct xenvif_stats stats;
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 8397f6c92451..e322a862ddfe 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -106,7 +106,11 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
 
 	if (work_done < budget) {
 		napi_complete_done(napi, work_done);
-		xenvif_napi_schedule_or_enable_events(queue);
+		/* If the queue is rate-limited, it shall be
+		 * rescheduled in the timer callback.
+		 */
+		if (likely(!queue->rate_limited))
+			xenvif_napi_schedule_or_enable_events(queue);
 	}
 
 	return work_done;
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 602d408fa25e..5042ff8d449a 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -180,6 +180,7 @@ static void tx_add_credit(struct xenvif_queue *queue)
 		max_credit = ULONG_MAX; /* wrapped: clamp to ULONG_MAX */
 
 	queue->remaining_credit = min(max_credit, max_burst);
+	queue->rate_limited = false;
 }
 
 void xenvif_tx_credit_callback(unsigned long data)
@@ -686,8 +687,10 @@ static bool tx_credit_exceeded(struct xenvif_queue *queue, unsigned size)
 		msecs_to_jiffies(queue->credit_usec / 1000);
 
 	/* Timer could already be pending in rare cases. */
-	if (timer_pending(&queue->credit_timeout))
+	if (timer_pending(&queue->credit_timeout)) {
+		queue->rate_limited = true;
 		return true;
+	}
 
 	/* Passed the point where we can replenish credit? */
 	if (time_after_eq64(now, next_credit)) {
@@ -702,6 +705,7 @@ static bool tx_credit_exceeded(struct xenvif_queue *queue, unsigned size)
 		mod_timer(&queue->credit_timeout,
 			  next_credit);
 		queue->credit_window_start = next_credit;
+		queue->rate_limited = true;
 
 		return true;
 	}
-- 
2.11.0
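A note on why "the rescheduling shall be done when replenishing credit"
closes the loop: tx_credit_exceeded() arms queue->credit_timeout, and
when that timer fires its callback refills the credit (which, with this
patch, also clears rate_limited inside tx_add_credit()) and only then
schedules NAPI again. Paraphrased from the netback source of this era --
the exact timer-callback signature differs in later kernels, so treat
this as a sketch rather than the authoritative code:

/* Credit timer callback (sketch, modelled on 3.18/4.x netback). */
void xenvif_tx_credit_callback(unsigned long data)
{
	struct xenvif_queue *queue = (struct xenvif_queue *)data;

	tx_add_credit(queue);                         /* also clears rate_limited */
	xenvif_napi_schedule_or_enable_events(queue); /* safe: credit is back */
}

So a rate-limited queue is woken exactly once per credit window instead
of spinning in the poll handler.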
* Re: Lockup/High ksoftirqd when rate-limiting is enabled

From: Jean-Louis Dupond @ 2017-06-21 08:35 UTC
To: Wei Liu; +Cc: paul.durrant, xen-devel

Thanks for this quick patch.
I was able to test it today, and the high ksoftirqd CPU usage is gone.

Great!

Is there a chance this can get pushed into the stable kernel versions
(3.18.x, 4.4.x, etc.)? There shouldn't be much backport work, as the
netback driver hasn't changed a lot recently.

Tested-by: Jean-Louis Dupond <jean-louis@dupond.be>

On 2017-06-20 13:18, Wei Liu wrote:
> On Tue, Jun 20, 2017 at 11:31:02AM +0200, Jean-Louis Dupond wrote:
>> [...]
>
> Can you give this patch a try?
>
> ---8<--
> From a242d4a74cc4ec46c5e3d43dd07eb146be4ca233 Mon Sep 17 00:00:00 2001
> From: Wei Liu <wei.liu2@citrix.com>
> Subject: [PATCH] xen-netback: correctly schedule rate-limited queues
>
> [patch quoted in full in the message above; snipped]
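For anyone wanting to reproduce the problem or verify the fix: the rate
limiting discussed in this thread is netback's per-vif credit scheduler,
enabled from the domU configuration via the vif "rate" parameter. An
illustrative fragment (the MAC, bridge, and rate values below are
placeholders, not taken from this thread):

# domU config fragment -- illustrative values only.
# 'rate=' caps the vif; the optional '@' interval sets the credit
# replenishment window that the timer in tx_credit_exceeded() uses.
vif = [ 'mac=00:16:3e:00:00:01, bridge=xenbr0, rate=100Mb/s@20ms' ]

Driving sustained outbound traffic from such a domU (e.g. with iperf)
while watching ksoftirqd in top on dom0 should show the difference
before and after the patch.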
* Re: Lockup/High ksoftirqd when rate-limiting is enabled

From: Wei Liu @ 2017-06-21 09:10 UTC
To: Jean-Louis Dupond; +Cc: paul.durrant, Wei Liu, xen-devel

On Wed, Jun 21, 2017 at 10:35:11AM +0200, Jean-Louis Dupond wrote:
> Thanks for this quick patch.
> I was able to test it today, and the high ksoftirqd CPU usage is gone.
>
> Great!
>
> Is there a chance this can get pushed into the stable kernel versions
> (3.18.x, 4.4.x, etc.)? There shouldn't be much backport work, as the
> netback driver hasn't changed a lot recently.

3.18 is EOL. I think it will eventually trickle down to all 4.x
longterm kernels.

>
> Tested-by: Jean-Louis Dupond <jean-louis@dupond.be>
>

Thanks. I will submit this to netdev soon.