[Qemu-devel] [BUG] Regression in networking code (SIGSEGV)

* [Qemu-devel] [BUG] Regression in networking code (SIGSEGV)
@ 2009-01-21 21:17 Stefan Weil
  2009-01-22  7:38 ` Gleb Natapov
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Weil @ 2009-01-21 21:17 UTC (permalink / raw)
  To: QEMU Developers

Hi,

the SIGSEGV crash below can be reproduced with Qemu r6391 under "high"
network load.

I booted a MIPS Malta kernel from a Debian NFS root. The boot itself worked
fine, but aptitude update hangs during downloads, the NFS root is lost, and
after some time Qemu gets a SIGSEGV.

A similar crash occurs with a different MIPS machine (ar7) and different
network hardware (ar7 emac / cpmac), so it is not restricted to pcnet.
This second system does not even survive the network boot.

So far, I have not been able to run tests with non-MIPS systems.

I'm fairly sure that networking worked without problems in both cases
two weeks ago.

Regards
Stefan Weil

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f3be89386e0 (LWP 14845)]
0x00000000004d1a6f in ip_reass (ip=0xfe96e0, fp=0x1109050) at ~/src/qemu/trunk/slirp/ip_input.c:408
408             ip->ip_len = next;
(gdb) i s
#0  0x00000000004d1a6f in ip_reass (ip=0xfe96e0, fp=0x1109050) at ~/src/qemu/trunk/slirp/ip_input.c:408
#1  0x00000000004d15ad in ip_input (m=0x110e010) at ~/src/qemu/trunk/slirp/ip_input.c:228
#2  0x00000000004b3d25 in slirp_input (pkt=0x12fcbd0 "RT", pkt_len=1294) at ~/src/qemu/trunk/slirp/slirp.c:679
#3  0x000000000049a07d in qemu_send_packet (vc1=0x10ff440, buf=0x12fcbd0 "RT", size=1294) at ~/src/qemu/trunk/net.c:399
#4  0x000000000042ea6a in pcnet_transmit (s=0x12fc810) at ~/src/qemu/trunk/hw/pcnet.c:1300
#5  0x000000000042ebd8 in pcnet_poll_timer (opaque=<value optimized out>) at ~/src/qemu/trunk/hw/pcnet.c:1363
#6  0x000000000042f270 in pcnet_ioport_writew (opaque=0x7f3be72a79e0, addr=17884784, val=16684768) at ~/src/qemu/trunk/hw/pcnet.c:1645
#7  0x0000000000405eb8 in ioport_write (index=1, address=4146, data=0) at ~/src/qemu/trunk/vl.c:302
#8  0x00000000004062b5 in cpu_outw (env=0x0, addr=4146, val=0) at ~/src/qemu/trunk/vl.c:432
#9  0x00000000420ae755 in ?? ()
#10 0x0000000000000000 in ?? ()
(gdb) up
#1  0x00000000004d15ad in ip_input (m=0x110e010) at ~/src/qemu/trunk/slirp/ip_input.c:228
228                             ip = ip_reass(ip, fp);
(gdb) p ip
$1 = (struct ip *) 0x110e060
(gdb) do
#0  0x00000000004d1a6f in ip_reass (ip=0xfe96e0, fp=0x1109050) at ~/src/qemu/trunk/slirp/ip_input.c:408
408             ip->ip_len = next;
(gdb) p *ip
Cannot access memory at address 0xfe96e0


* Re: [Qemu-devel] [BUG] Regression in networking code (SIGSEGV)
  2009-01-21 21:17 [Qemu-devel] [BUG] Regression in networking code (SIGSEGV) Stefan Weil
@ 2009-01-22  7:38 ` Gleb Natapov
  2009-01-24 21:00   ` Stefan Weil
  0 siblings, 1 reply; 10+ messages in thread
From: Gleb Natapov @ 2009-01-22  7:38 UTC (permalink / raw)
  To: Stefan Weil; +Cc: QEMU Developers

What is your host CPU?  How do you run qemu (what is your command line)?


--
			Gleb.


* Re: [Qemu-devel] [BUG] Regression in networking code (SIGSEGV)
  2009-01-22  7:38 ` Gleb Natapov
@ 2009-01-24 21:00   ` Stefan Weil
  2009-01-24 21:53     ` Aurelien Jarno
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Weil @ 2009-01-24 21:00 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: QEMU Developers

Gleb Natapov wrote:
> What is your host CPU?  How you run qemu (what is your command line)?
>   

Debian amd64 host.

mipsel-softmmu/qemu-system-mipsel --kernel vmlinux \
    --append "debug nohz=off root=/dev/nfs rw ip=::::malta-le::dhcp" \
    -M malta --net nic --net user -m 256

The target system is Debian mips.

Regards
Stefan


* Re: [Qemu-devel] [BUG] Regression in networking code (SIGSEGV)
  2009-01-24 21:00   ` Stefan Weil
@ 2009-01-24 21:53     ` Aurelien Jarno
  2009-01-25 21:29       ` Stefan Weil
  0 siblings, 1 reply; 10+ messages in thread
From: Aurelien Jarno @ 2009-01-24 21:53 UTC (permalink / raw)
  To: Stefan Weil; +Cc: qemu-devel

On Sat, Jan 24, 2009 at 10:00:33PM +0100, Stefan Weil wrote:
> Debian amd64 host.
> 
> mipsel-softmmu/qemu-system-mipsel --kernel vmlinux \
>     --append "debug nohz=off root=/dev/nfs rw ip=::::malta-le::dhcp" \
>     -M malta --net nic --net user -m 256

The fact that your host system is 64-bit and that you are using user
networking is interesting. You could try looking at revisions 6272 and
6288; they are probably the cause of your problem.


-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [BUG] Regression in networking code (SIGSEGV)
  2009-01-24 21:53     ` Aurelien Jarno
@ 2009-01-25 21:29       ` Stefan Weil
  2009-01-26  6:29         ` Gleb Natapov
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Weil @ 2009-01-25 21:29 UTC (permalink / raw)
  To: Aurelien Jarno, QEMU Developers

Aurelien Jarno wrote:
> The fact that your host system is 64-bit and you are using the user
> networking is interesting. You could try to look at revisions 6272 and
> 6288, they are probably the cause of your problem.
>   

Yes, you are right. Reverting r6288 results in stable networking again.

Thank you for this hint.

Stefan


* Re: [Qemu-devel] [BUG] Regression in networking code (SIGSEGV)
  2009-01-25 21:29       ` Stefan Weil
@ 2009-01-26  6:29         ` Gleb Natapov
  2009-02-01 20:11           ` Stefan Weil
  0 siblings, 1 reply; 10+ messages in thread
From: Gleb Natapov @ 2009-01-26  6:29 UTC (permalink / raw)
  To: Stefan Weil; +Cc: qemu-devel

On Sun, Jan 25, 2009 at 10:29:38PM +0100, Stefan Weil wrote:
> Yes, you are right. Reverting r6288 results in stable networking again.
> 
> Thank you for this hint.
> 
I also use an amd64 host, but I was not able to reproduce this with other
guests. I'll try a MIPS guest. Can you try to reproduce the problem with a
different guest (x86, for instance)?


--
			Gleb.


* Re: [Qemu-devel] [BUG] Regression in networking code (SIGSEGV)
  2009-01-26  6:29         ` Gleb Natapov
@ 2009-02-01 20:11           ` Stefan Weil
       [not found]             ` <20090202163806.GA29674@redhat.com>
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Weil @ 2009-02-01 20:11 UTC (permalink / raw)
  To: QEMU Developers

Gleb Natapov wrote:
> I also use amd64 host, but I was not able to reproduce this with other
> guests. I'll try mips guest. Can you try to reproduce the problem with
> different guest (x86 for instance)?
>
>
> --
> 			Gleb.
>   

Hi,

So far I have not been able to trigger the crash with i386 guests, but I
don't think it depends on the guest, because all guests share the same
slirp code.

It does depend on the kind and amount of network data, because it is
triggered only by fragmented packets (see the stack trace in my first mail).
Please note that you have to compile without optimization in order to see
the full stack, because ip_reass is normally inlined by the compiler.

Running a qemu-system-mipsel that was compiled for 32-bit hosts results in
the same kind of crash (I tested this on my amd64 host), so I don't think
it depends on the host.

Summary: the crash occurs in function ip_reass and can be reproduced with
32- and 64-bit qemu executables running on Debian amd64. The guest systems
were big- or little-endian MIPS Malta or an ar7 router (all guests run
recent Linux kernels). The regression was caused by change r6288.

Stefan


* Re: [Qemu-devel] [BUG] Regression in networking code (SIGSEGV)
       [not found]             ` <20090202163806.GA29674@redhat.com>
@ 2009-02-04 12:01               ` Stefan Weil
  2009-02-05 15:34                 ` Gleb Natapov
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Weil @ 2009-02-04 12:01 UTC (permalink / raw)
  To: Gleb Natapov, QEMU Developers

Gleb Natapov wrote:
> Can you catch the traffic with tcpdump in a way suitable for tcpreplay
> and send it to me (or make it available for download)?
>
> --
> Gleb.
>

Hi,

Of course. But I found a simple way to reproduce the bug, which I think
is easier to handle than tcpreplay:

Host: amd64, Debian 5.0 (I think others will do, too)
Guest: i686, Debian 4.0 (I think others will do, too)

The host must export an NFS filesystem (/tftpboot in my tests).
The guest must be able to mount this NFS filesystem using special options.

Start the guest (hda.img contains a minimal Debian 4.0 installation):
$ i386-softmmu/qemu -m 512 -hda ~/hda.img

Mount the host's NFS export on the guest:
$ mount 10.0.2.2:/tftpboot /mnt -o proto=udp,rsize=4096,wsize=4096,nointr,nolock,nfsvers=2

On the guest, copy a file from the NFS mount to another location on the
same mount:
$ cp /mnt/malta-le/usr/lib/libstdc++.so.6.0.8 /mnt/malta-le/tmp

In my tests, the file to copy has 1164392 bytes; the guest creates
the destination file with 0 bytes and crashes.

The NFS mount options are identical to the options used by Linux NFS root
but differ from the defaults. With the default NFS options there is no
crash, which explains why I get crashes in my NFS root tests but had
difficulty triggering a crash with other network operations.
I know that proto=udp is important, but I did not check many other
combinations.

With Malta and other MIPS guests, the crash can be reproduced in the same
way, so I am now fairly sure that any guest (on any host) will crash like
this.

Regards
Stefan


* Re: [Qemu-devel] [BUG] Regression in networking code (SIGSEGV)
  2009-02-04 12:01               ` Stefan Weil
@ 2009-02-05 15:34                 ` Gleb Natapov
  2009-02-05 19:24                   ` Stefan Weil
  0 siblings, 1 reply; 10+ messages in thread
From: Gleb Natapov @ 2009-02-05 15:34 UTC (permalink / raw)
  To: Stefan Weil; +Cc: QEMU Developers

Cool, I can reproduce it now! Can you try the patch below?

Signed-off-by: Gleb Natapov <gleb@redhat.com>

diff --git a/qemu/slirp/ip_input.c b/qemu/slirp/ip_input.c
index e7f2756..f00a2e8 100644
--- a/qemu/slirp/ip_input.c
+++ b/qemu/slirp/ip_input.c
@@ -393,7 +393,7 @@ insert:
 	 */
 	if (m->m_flags & M_EXT) {
 	  int delta;
-	  delta = (char *)ip - m->m_dat;
+	  delta = (char *)q - m->m_dat;
 	  q = (struct ipasfrag *)(m->m_ext + delta);
 	}
 
--
			Gleb.


* Re: [Qemu-devel] [BUG] Regression in networking code (SIGSEGV)
  2009-02-05 15:34                 ` Gleb Natapov
@ 2009-02-05 19:24                   ` Stefan Weil
  0 siblings, 0 replies; 10+ messages in thread
From: Stefan Weil @ 2009-02-05 19:24 UTC (permalink / raw)
  To: Gleb Natapov, QEMU Developers

Gleb Natapov wrote:
> Cool, I can reproduce it now! Can you try the patch below?

Very good. Your patch should be applied to Qemu trunk, because
it fixes the network bug that was introduced by r6288.

MIPS Malta, ar7 and i386 no longer crash in my test scenario.

Regards
Stefan

