* [uml-devel] UML Network Related Crashing
@ 2013-01-06 12:16 Dave Humphreys (Bob)
  2013-01-06 12:26 ` Richard Weinberger
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Humphreys (Bob) @ 2013-01-06 12:16 UTC (permalink / raw)
  To: user-mode-linux-devel

I note that my message sent yesterday ended up heavily truncated, so I have 
turned on the line-wrap for this one. Sorry about that.

What I was trying to say is that I can repeatedly crash UML by exercising the 
networking heavily, specifically by rsync'ing data to and from it.

I also said that I have an older kernel (3.1.0) UML (which operates reasonably 
successfully under most circumstances) that I access over the Internet. If I 
try to rsync data from it, I can crash it reliably after 5 or 10 minutes.

This 3.1.0 UML was the first step in upgrading a much older UML-based 
virtual server that has been in operation for about a decade without any 
trouble. The main objective of upgrading was to take advantage of BTRFS, but I 
have noticed that the UML is less reliable and crashes occasionally. This UML 
is not actually using BTRFS, it has exactly the same disk images as the 
historic one, so the issue is not, I believe, related to BTRFS, but some other 
aspect of the newer kernel. I now believe that the reliability issue is 
probably related to the networking issue.

I have tried setting up a second, 3.8.0-rc2 based, UML on the same host and 
rsync'd between them over mcast network interfaces. I get crashing on one or 
the other UML.

I have found that I don't get much information about the failures. When things 
go wrong, the UML is completely locked up and is not responsive either from 
the session in which it was started, or via uml_mconsole. Things usually seem 
to have locked-up before any message comes into view. Once or twice I have 
seen the line that says '---[ cut here ]---', showing that something was 
trying to come out, but whatever message there was does not become visible.

I have set up two 3.8.0-rc2 based UMLs on a local machine and rsync data over 
mcast network interfaces. I'm hoping that I get something out that will help 
someone to identify the problem.

The latest result that has some information is:

------------[ cut here ]------------0:00
WARNING: at net/core/skbuff.c:573 skb_release_head_state+0x60/0xba()
Modules linked in:

but that is as much as I get. This UML instance is now totally locked up. This 
behaviour is consistent with the other failures I get on my older UML.

Previous conversations suggest that it is known that there is some network 
related problem in UML, but it has not been tracked down. I feel that it 
rather defeats the object if a UML instance cannot run reliably with 
networking. I will carry on with my exercising of my UMLs and report anything 
that I find.
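For anyone wanting to exercise a UML in the same way, here is a minimal sketch of a loop that repeats a transfer until it fails or hangs. It is not the script used for the tests described above. The function name `run_until_failure`, the timeout value and the rsync arguments in the comment are all assumptions made for illustration.

```python
import subprocess
import sys
import time


def run_until_failure(cmd, max_runs=100, timeout_s=600.0):
    """Run cmd repeatedly until it fails, hangs, or max_runs is reached.

    Returns (runs_started, returncode). returncode is None when the
    command exceeded timeout_s, which is how a locked-up peer would
    normally show itself from the client side.
    """
    for attempt in range(1, max_runs + 1):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        try:
            returncode = subprocess.run(cmd, timeout=timeout_s).returncode
        except subprocess.TimeoutExpired:
            print(f"{stamp} run {attempt}: no completion within {timeout_s}s",
                  file=sys.stderr)
            return attempt, None
        print(f"{stamp} run {attempt}: exit status {returncode}",
              file=sys.stderr)
        if returncode != 0:
            return attempt, returncode
    return max_runs, 0


# Hypothetical usage; PEER and both paths are placeholders:
# run_until_failure(["rsync", "--archive", "--delete",
#                    "PEER:/some/path/", "/some/path/"])
```

The timestamps make it possible to line up the moment the transfer stalls with whatever the UML console printed last.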

Regards,
David Humphreys

------------------------------------------------------------------------------
Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current
with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
MVPs and experts. ON SALE this month only -- learn more at:
http://p.sf.net/sfu/learnmore_123012
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


* [uml-devel] UML Network Related Crashing
@ 2013-01-19 18:59 Dave Humphreys (Bob)
  2013-01-22  6:07 ` Dave (Bob)
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Humphreys (Bob) @ 2013-01-19 18:59 UTC (permalink / raw)
  To: user-mode-linux-devel


I have experimented further with the network related crashing problem that I 
have had for some time with UML.

I have been using 3.8.0-rc3 UML kernels with Debian root filesystems.

I have been running two UML instances side-by-side on the same host, 
communicating via mcast networking.

My current testing uses a 64 bit UML from which I copy data using rsync 
(because this is how I found the original problem).

I seem to be able to copy from 64 bit to 64 bit, but when I copy from 64 bit 
to 32 bit I get a crash.

The original problem that I had was with a 32 bit UML that I access via a real 
ethernet network connection; it crashes occasionally. I find that I can 
crash it regularly by rsyncing data from it.

I can therefore say that the problem seems to occur with both mcast and 
tuntap networking, and whether the bulk of the network traffic is received 
or transmitted.

It does appear to be a problem with the 32 bit variant and not the 64 bit.

As an aside, I also find that I can crash the 32 bit UML by rsyncing between 
the UML and the host via a hostfs mount. I have a copy of the crash output for 
this if it is of any interest. I mention this in case there could be a link, 
but my current principal concern is to cure the networking problem.

Here is what happens when it crashes:


 map/@DEB1-32:~# rsync --archive --delete --progress 10.0.10.30:/var/imap/ /var/imap/
The authenticity of host '10.0.10.30 (10.0.10.30)' can't be established.
RSA key fingerprint is b8:5d:e0:69:bc:61:d9:88:e4:12:80:84:c9:94:81:2f.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.0.10.30' (RSA) to the list of known hosts.
receiving incremental file list
./
.bash_history
        1970 100%    1.88MB/s    0:00:00 (xfer#1, to-check=1005/1007)
config/
config/annotations.db
         144 100%    3.70kB/s    0:00:00 (xfer#2, to-check=1356/1362)
config/deliver.db
           0   0%    0.00kB/s    0:00:00  ------------[ cut here ]------------
WARNING: at net/core/skbuff.c:573 skb_release_head_state+0x60/0x6c()
Modules linked in:

2367232c:  [<082050f7>] dump_stack+0x16/0x1a
23672340:  [<08070515>] warn_slowpath_common+0x48/0x5e
23672358:  [<08070540>] warn_slowpath_null+0x15/0x19
23672368:  [<081a1f93>] skb_release_head_state+0x60/0x6c
23672380:  [<081a3705>] __kfree_skb+0xe/0x72
2367238c:  [<081a3791>] consume_skb+0x28/0x2b
23672398:  [<080615e2>] uml_net_start_xmit+0xd5/0xdf
236723b8:  [<081ac85b>] dev_hard_start_xmit+0x26c/0x374
236723e4:  [<081bac2c>] sch_direct_xmit+0x35/0x10d
23672408:  [<081acb0b>] dev_queue_xmit+0xd6/0x203
23672438:  [<081c519a>] ip_finish_output+0x272/0x2d9
23672470:  [<081c5f9b>] ip_output+0x4b/0x50
23672484:  [<081c5a18>] ip_local_out+0x1d/0x23
23672494:  [<081c5ccd>] ip_queue_xmit+0x2af/0x2f7
236724d0:  [<081d70a7>] tcp_transmit_skb+0x697/0x6fb
23672520:  [<081d8e65>] tcp_send_ack+0xcc/0xd4
23672534:  [<081cfabb>] __tcp_ack_snd_check+0x42/0x7a
23672548:  [<081d47f1>] tcp_rcv_established+0x36e/0x594
23672570:  [<081da56f>] tcp_v4_do_rcv+0x5d/0x18e
2367259c:  [<081dcbcc>] tcp_v4_rcv+0x6a0/0x6f7
236725d4:  [<081c2195>] ip_local_deliver+0x11d/0x1b8
236725f0:  [<081c25f7>] ip_rcv+0x3c7/0x410
23672608:  [<081aa94e>] __netif_receive_skb+0x34a/0x3e4
23672650:  [<081aaa47>] process_backlog+0x5f/0xe1
2367266c:  [<081aadd5>] net_rx_action+0x49/0x121
23672690:  [<080757c9>] __do_softirq+0x84/0x129
236726b8:  [<080758d0>] do_softirq+0x30/0x3c
236726c8:  [<08075a6a>] irq_exit+0x35/0x6d
236726d4:  [<0805af16>] do_IRQ+0x24/0x34
236726e4:  [<0805af68>] sigio_handler+0x42/0x56
236726f8:  [<08068db7>] sig_handler_common+0x79/0x8c
23672978:  [<08068d2d>] unblock_signals+0x48/0x59
23672984:  [<0808accf>] finish_task_switch.isra.63+0x1b/0x51
2367299c:  [<08208103>] __schedule+0x234/0x28a
236729c0:  [<08208250>] schedule+0x57/0x59
236729cc:  [<082078ce>] schedule_hrtimeout_range_clock+0x33/0x128
23672a18:  [<082079d6>] schedule_hrtimeout_range+0x13/0x15
23672a30:  [<080d497f>] poll_schedule_timeout+0x2a/0x51
23672a4c:  [<080d5136>] do_select+0x4cd/0x504
23672d3c:  [<080d539b>] core_sys_select+0x22e/0x24b
23672e7c:  [<080d5416>] sys_select+0x5e/0x86
23672eb0:  [<0805d742>] handle_syscall+0x6a/0x80
23672ef4:  [<0806aee8>] userspace+0x362/0x488
23672fe4:  [<0805b3d6>] fork_handler+0x56/0x5b
23672ffc:  [<00746f6f>] 0x746f6f

---[ end trace 8e9ba3f2efd7a2c6 ]---
------------[ cut here ]------------
WARNING: at kernel/softirq.c:160 local_bh_enable+0x2f/0x83()
Modules linked in:
236723f4:  [<082050f7>] dump_stack+0x16/0x1a
23672408:  [<08070515>] warn_slowpath_common+0x48/0x5e
23672420:  [<08070540>] warn_slowpath_null+0x15/0x19
23672430:  [<0807598e>] local_bh_enable+0x2f/0x83
23672444:  [<081c51b1>] ip_finish_output+0x289/0x2d9
23672470:  [<081c5f9b>] ip_output+0x4b/0x50
23672484:  [<081c5a18>] ip_local_out+0x1d/0x23
23672494:  [<081c5ccd>] ip_queue_xmit+0x2af/0x2f7
236724d0:  [<081d70a7>] tcp_transmit_skb+0x697/0x6fb
23672520:  [<081d8e65>] tcp_send_ack+0xcc/0xd4
23672534:  [<081cfabb>] __tcp_ack_snd_check+0x42/0x7a
23672548:  [<081d47f1>] tcp_rcv_established+0x36e/0x594
23672570:  [<081da56f>] tcp_v4_do_rcv+0x5d/0x18e
2367259c:  [<081dcbcc>] tcp_v4_rcv+0x6a0/0x6f7
236725d4:  [<081c2195>] ip_local_deliver+0x11d/0x1b8
236725f0:  [<081c25f7>] ip_rcv+0x3c7/0x410
23672608:  [<081aa94e>] __netif_receive_skb+0x34a/0x3e4
23672650:  [<081aaa47>] process_backlog+0x5f/0xe1
2367266c:  [<081aadd5>] net_rx_action+0x49/0x121
23672690:  [<080757c9>] __do_softirq+0x84/0x129
236726b8:  [<080758d0>] do_softirq+0x30/0x3c
236726c8:  [<08075a6a>] irq_exit+0x35/0x6d
236726d4:  [<0805af16>] do_IRQ+0x24/0x34
236726e4:  [<0805af68>] sigio_handler+0x42/0x56
236726f8:  [<08068db7>] sig_handler_common+0x79/0x8c
23672978:  [<08068d2d>] unblock_signals+0x48/0x59
23672984:  [<0808accf>] finish_task_switch.isra.63+0x1b/0x51
2367299c:  [<08208103>] __schedule+0x234/0x28a
236729c0:  [<08208250>] schedule+0x57/0x59
236729cc:  [<082078ce>] schedule_hrtimeout_range_clock+0x33/0x128
23672a18:  [<082079d6>] schedule_hrtimeout_range+0x13/0x15
23672a30:  [<080d497f>] poll_schedule_timeout+0x2a/0x51
23672a4c:  [<080d5136>] do_select+0x4cd/0x504
23672d3c:  [<080d539b>] core_sys_select+0x22e/0x24b
23672e7c:  [<080d5416>] sys_select+0x5e/0x86
23672eb0:  [<0805d742>] handle_syscall+0x6a/0x80
23672ef4:  [<0806aee8>] userspace+0x362/0x488
23672fe4:  [<0805b3d6>] fork_handler+0x56/0x5b
23672ffc:  [<00746f6f>] 0x746f6f

---[ end trace 8e9ba3f2efd7a2c7 ]---
huh, entered softirq 3 NET_RX 081aad8c preempt_count 00000100, exited with fffffe01?

EIP: 0023:[<080d9e01>] CPU: 0 Tainted: G        W    ESP: 002b:23672a44 EFLAGS: 00010202
    Tainted: G        W   
EAX: 00082a40 EBX: 23672ac4 ECX: 00fffffe EDX: 00000003
ESI: 00000005 EDI: 23672d54 EBP: 23672a50 DS: 002b ES: 002b
082a4750:  [<0806d29a>] show_regs+0xc0/0xc6
082a477c:  [<0805ceef>] segv+0x57/0x218
082a481c:  [<0805d102>] segv_handler+0x52/0x5d
082a4848:  [<08068db7>] sig_handler_common+0x79/0x8c
082a4ac8:  [<08068ea5>] sig_handler+0x34/0x43
082a4ad4:  [<08068b3a>] hard_handler+0x5a/0x88
082a4afc:  [<ffffe410>] 0xffffe410

Kernel panic - not syncing: Segfault with no mm
082a474c:  [<082050f7>] dump_stack+0x16/0x1a
082a4760:  [<08205181>] panic+0x67/0x149
082a4778:  [<0805cef9>] segv+0x61/0x218
082a481c:  [<0805d102>] segv_handler+0x52/0x5d
082a4848:  [<08068db7>] sig_handler_common+0x79/0x8c
082a4ac8:  [<08068ea5>] sig_handler+0x34/0x43
082a4ad4:  [<08068b3a>] hard_handler+0x5a/0x88
082a4afc:  [<ffffe410>] 0xffffe410


EIP: 0000:[<00000000>] CPU: 0 Tainted: G        W    EFLAGS: 00000000
    Tainted: G        W   
EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: 00000000 DS: 0000 ES: 0000
082a46bc:  [<0806d29a>] show_regs+0xc0/0xc6
082a46e8:  [<0805d32b>] panic_exit+0x20/0x36
082a46fc:  [<08088c3b>] notifier_call_chain+0x20/0x4b
082a4724:  [<08088c7d>] __atomic_notifier_call_chain+0x17/0x19
082a4734:  [<08088c94>] atomic_notifier_call_chain+0x15/0x17
082a4750:  [<08205199>] panic+0x7f/0x149
082a4778:  [<0805cef9>] segv+0x61/0x218
082a481c:  [<0805d102>] segv_handler+0x52/0x5d
082a4848:  [<08068db7>] sig_handler_common+0x79/0x8c
082a4ac8:  [<08068ea5>] sig_handler+0x34/0x43
082a4ad4:  [<08068b3a>] hard_handler+0x5a/0x88
082a4afc:  [<ffffe410>] 0xffffe410

Terminated
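
The "entered softirq ... exited with" line in the log above reports two raw preempt_count values. The sketch below shows one way to read them. It assumes the field layout used by kernels of this era (low 8 bits for the preemption count, the next 8 bits for the softirq count); that layout is an assumption here, not something stated in this report, and the helper names are made up for illustration.

```python
# Assumed layout of the two lowest preempt_count fields.
PREEMPT_MASK = 0x000000FF
SOFTIRQ_MASK = 0x0000FF00
SOFTIRQ_SHIFT = 8


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value


def decode_low_fields(value):
    """Split out the preemption and softirq fields of a raw value."""
    return {
        "preempt": value & PREEMPT_MASK,
        "softirq": (value & SOFTIRQ_MASK) >> SOFTIRQ_SHIFT,
    }


entered = 0x00000100  # value printed as "preempt_count 00000100"
exited = 0xFFFFFE01   # value printed as "exited with fffffe01"

# Net change in the counter across the NET_RX softirq handler.
delta = to_signed32(exited) - to_signed32(entered)

print(decode_low_fields(entered), decode_low_fields(exited), delta)
```

Under that assumed layout the counter is negative when read as a signed value on exit, which would fit with the local_bh_enable warning printed just before it: something in the receive path appears to have released more than it took.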
 

* [uml-devel] UML Network Related Crashing
@ 2013-01-05 15:18 Dave Humphreys (Bob)
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Humphreys (Bob) @ 2013-01-05 15:18 UTC (permalink / raw)
  To: user-mode-linux-devel


I have problems with UML crashing that I believe are network related, because the crash happens whenever I try to make heavier use of the network connection to rsync more significant amounts of data from the UML.

It may or may not be significant, but when I am rsyncing from one UML to another, it has so far always been the one that is supplying the data that has crashed.

I have this with a UML that is accessed via a tap link and a real network connection, and also between two UMLs on the same host communicating via an 'mcast' connection.

Most of the time the affected UML simply locks up, and I can't do anything with it. Even if I try to drive it with uml_mconsole, that will not communicate with the UML. I sometimes see the ---[cut]--- type of output, but it doesn't get beyond that line. Sometimes I see no output before the thing locks up.

I've been experimenting with two UML instances communicating via mcast on my local machine, and I have experienced this output:

-bash-4.2# /sbin/sshd
-bash-4.2#
EIP: 0023:[<08075749>] CPU: 0 Not tainted ESP: 002b:165fffb8 EFLAGS: 00010206
    Not tainted
EAX: 963523cf EBX: 16600638 ECX: 01400002 EDX: 01400000
ESI: 0a0d4828 EDI: 00000000 EBP: 165fffbc DS: 002b ES: 002b
0837a750:  [<0806cf3a>] show_regs+0xc0/0xc6
0837a77c:  [<0805cd42>] segv+0x202/0x218
0837a81c:  [<0805cdaa>] segv_handler+0x52/0x5d
0837a848:  [<08068a5f>] sig_handler_common+0x79/0x8c
0837aac8:  [<08068b4d>] sig_handler+0x34/0x43
0837aad4:  [<080687e2>] hard_handler+0x5a/0x88
0837aafc:  [<ffffe410>] 0xffffe410

Kernel panic - not syncing: Kernel mode fault at addr 0x963524a7, ip 0x8075749
0837a744:  [<08249cc3>] dump_stack+0x16/0x1a
0837a758:  [<08249d4d>] panic+0x67/0x149
0837a770:  [<0805cd4e>] segv+0x20e/0x218
0837a81c:  [<0805cdaa>] segv_handler+0x52/0x5d
0837a848:  [<08068a5f>] sig_handler_common+0x79/0x8c
0837aac8:  [<08068b4d>] sig_handler+0x34/0x43
0837aad4:  [<080687e2>] hard_handler+0x5a/0x88
0837aafc:  [<ffffe410>] 0xffffe410


EIP: 0023:[<400010c2>] CPU: 0 Not tainted ESP: 002b:ff5a8674 EFLAGS: 00000212
    Not tainted
EAX: ffffffda EBX: ff5a87f0 ECX: ff5a8790 EDX: 401edff4
ESI: 00000000 EDI: ff5a87f0 EBP: ff5a8790 DS: 002b ES: 002b
0837a6b4:  [<0806cf3a>] show_regs+0xc0/0xc6
0837a6e0:  [<0805cfd3>] panic_exit+0x20/0x36
0837a6f4:  [<08088af7>] notifier_call_chain+0x20/0x4b
0837a71c:  [<08088b39>] __atomic_notifier_call_chain+0x17/0x19
0837a72c:  [<08088b50>] atomic_notifier_call_chain+0x15/0x17
0837a748:  [<08249d65>] panic+0x7f/0x149
0837a770:  [<0805cd4e>] segv+0x20e/0x218
0837a81c:  [<0805cdaa>] segv_handler+0x52/0x5d
0837a848:  [<08068a5f>] sig_handler_common+0x79/0x8c
0837aac8:  [<08068b4d>] sig_handler+0x34/0x43
0837aad4:  [<080687e2>] hard_handler+0x5a/0x88
0837aafc:  [<ffffe410>] 0xffffe410

Terminated
bash-4.1# 

What happened here was that I started two virtually identical UML instances on the same host, communicating with each other via mcast interfaces. I started sshd on one of them and then started to rsync data from it to the other. After a short while I got the above.

I have a UML running on a remote machine that I access via the Internet. I can always crash this when I try to copy data from it, and I assume that the above represents the same fault. I'm happy to do any testing required to try to help fix this.
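
The panic line in the output above gives a faulting instruction pointer but no symbol name for it. As a minimal sketch, the following pulls the addresses out of a pasted log in this format and builds an addr2line command line for them. The binary path `./linux` is an assumption, as is the availability of debug information in that binary; the function names are hypothetical.

```python
import re

# Matches frame lines such as:
#   0837a77c:  [<0805cd42>] segv+0x202/0x218
FRAME_RE = re.compile(
    r"^\s*[0-9a-f]+:\s+\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<symbol>\S+)\s*$"
)
# Matches the "ip 0x..." part of the "Kernel mode fault" panic line.
FAULT_IP_RE = re.compile(r"\bip 0x(?P<ip>[0-9a-f]+)")


def parse_frames(log_text):
    """Return a list of (address, symbol_text) for each stack frame line."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append((int(match.group("addr"), 16), match.group("symbol")))
    return frames


def parse_fault_ip(log_text):
    """Return the faulting instruction pointer, or None if not present."""
    match = FAULT_IP_RE.search(log_text)
    return int(match.group("ip"), 16) if match else None


def addr2line_command(addresses, binary="./linux"):
    """Build (but do not run) an addr2line invocation for the addresses."""
    args = ["addr2line", "-f", "-i", "-e", binary]
    return args + [f"0x{addr:08x}" for addr in addresses]


sample = """\
Kernel panic - not syncing: Kernel mode fault at addr 0x963524a7, ip 0x8075749
0837a744:  [<08249cc3>] dump_stack+0x16/0x1a
0837a758:  [<08249d4d>] panic+0x67/0x149
"""
command = addr2line_command([parse_fault_ip(sample)])
print(" ".join(command))
```

The command is only built, not executed, so it can be checked before being run against the exact binary that produced the crash. Addresses from a different build will not resolve to meaningful lines.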

David Humphreys




end of thread, other threads:[~2013-01-22  6:07 UTC | newest]

Thread overview: 5+ messages
2013-01-06 12:16 [uml-devel] UML Network Related Crashing Dave Humphreys (Bob)
2013-01-06 12:26 ` Richard Weinberger
  -- strict thread matches above, loose matches on Subject: below --
2013-01-19 18:59 Dave Humphreys (Bob)
2013-01-22  6:07 ` Dave (Bob)
2013-01-05 15:18 Dave Humphreys (Bob)
