* [PATCH] Fix device removal on net and block frontend drivers
@ 2005-11-21 15:43 Murillo Fernandes Bernardes
2005-11-21 16:07 ` Stefan Berger
2005-11-21 17:48 ` Adam Heath
0 siblings, 2 replies; 10+ messages in thread
From: Murillo Fernandes Bernardes @ 2005-11-21 15:43 UTC (permalink / raw)
To: xen-devel
[-- Attachment #1: Type: text/plain, Size: 352 bytes --]
Frontend devices are not being unregistered when in the Closed state. The
following patch fixes that.
Fixes bug #420.
Makes "05_attach_and_dettach_device_repeatedly_pos" and
"09_attach_and_dettach_device_check_data_pos" tests pass.
Signed-off-by: Murillo Fernandes Bernardes <mfb@br.ibm.com>
--
Murillo Fernandes Bernardes
IBM Linux Technology Center
[-- Attachment #2: frontend_unregister_device.patch --]
[-- Type: text/x-diff, Size: 1117 bytes --]
diff -r 6a666940fa04 linux-2.6-xen-sparse/drivers/xen/blkfront/blkfront.c
--- a/linux-2.6-xen-sparse/drivers/xen/blkfront/blkfront.c Sun Nov 20 09:19:38 2005
+++ b/linux-2.6-xen-sparse/drivers/xen/blkfront/blkfront.c Mon Nov 21 14:58:42 2005
@@ -273,7 +273,6 @@
case XenbusStateInitialising:
case XenbusStateInitWait:
case XenbusStateInitialised:
- case XenbusStateClosed:
break;
case XenbusStateConnected:
@@ -282,6 +281,10 @@
case XenbusStateClosing:
blkfront_closing(dev);
+ break;
+
+ case XenbusStateClosed:
+ device_unregister(&dev->dev);
break;
}
}
diff -r 6a666940fa04 linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c
--- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Sun Nov 20 09:19:38 2005
+++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Mon Nov 21 14:58:42 2005
@@ -406,11 +406,14 @@
case XenbusStateInitialised:
case XenbusStateConnected:
case XenbusStateUnknown:
- case XenbusStateClosed:
break;
case XenbusStateClosing:
netfront_closing(dev);
+ break;
+
+ case XenbusStateClosed:
+ device_unregister(&dev->dev);
break;
}
}
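As a rough illustration of the control flow after this patch, here is a userspace model of the netfront hunk. It is only a sketch: `struct fake_device`, `fake_closing()`, `fake_unregister()` and `model_backend_changed()` are hypothetical stand-ins (for the xenbus device, `netfront_closing()`, `device_unregister()` and the driver's state handler), and the enum numbering is assumed to match the Xenbus interface header.

```c
#include <stdbool.h>

/* Xenbus states named in the diff; numeric values are an assumption. */
enum model_xenbus_state {
    XenbusStateUnknown = 0,
    XenbusStateInitialising = 1,
    XenbusStateInitWait = 2,
    XenbusStateInitialised = 3,
    XenbusStateConnected = 4,
    XenbusStateClosing = 5,
    XenbusStateClosed = 6
};

/* Hypothetical stand-in for the real device structure. */
struct fake_device {
    bool closing_called; /* set once the graceful-close path has run */
    bool registered;     /* cleared once the device is unregistered */
};

static void fake_closing(struct fake_device *dev)
{
    dev->closing_called = true;
}

static void fake_unregister(struct fake_device *dev)
{
    dev->registered = false;
}

/* Models the patched handler: Closed no longer falls through to a no-op. */
void model_backend_changed(struct fake_device *dev,
                           enum model_xenbus_state backend_state)
{
    switch (backend_state) {
    case XenbusStateInitialising:
    case XenbusStateInitWait:
    case XenbusStateInitialised:
    case XenbusStateConnected:
    case XenbusStateUnknown:
        break;

    case XenbusStateClosing:
        fake_closing(dev);
        break;

    case XenbusStateClosed:
        fake_unregister(dev);
        break;
    }
}
```

In this model a backend that reports Closed always causes an unregister, regardless of why Closed was reported.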
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Fix device removal on net and block frontend drivers
2005-11-21 15:43 [PATCH] Fix device removal on net and block frontend drivers Murillo Fernandes Bernardes
@ 2005-11-21 16:07 ` Stefan Berger
2005-11-21 16:33 ` Murillo Fernandes Bernardes
2005-11-21 17:48 ` Adam Heath
1 sibling, 1 reply; 10+ messages in thread
From: Stefan Berger @ 2005-11-21 16:07 UTC (permalink / raw)
To: Murillo Fernandes Bernardes; +Cc: xen-devel
[-- Attachment #1.1: Type: text/plain, Size: 932 bytes --]
xen-devel-bounces@lists.xensource.com wrote on 11/21/2005 10:43:01 AM:
>
> Frontend devices are not being unregistered when in closed state. The
> following patch fix that.
>
> Fix bug #420.
>
> Makes "05_attach_and_dettach_device_repeatedly_pos" and
> "09_attach_and_dettach_device_check_data_pos" tests pass.
Did you test this with suspending / resuming a dom U? The reason I am
asking is that when suspending the driver immediately gets into state
'Closed' and when resuming into state 'Connected', but now your device is
unregistered.
Stefan
>
>
> Signed-off-by: Murillo Fernandes Bernardes <mfb@br.ibm.com>
>
> --
> Murillo Fernandes Bernardes
> IBM Linux Technology Center
> [attachment "frontend_unregister_device.patch" deleted by Stefan
> Berger/Watson/IBM] _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Fix device removal on net and block frontend drivers
2005-11-21 16:07 ` Stefan Berger
@ 2005-11-21 16:33 ` Murillo Fernandes Bernardes
2005-11-21 16:49 ` Stefan Berger
0 siblings, 1 reply; 10+ messages in thread
From: Murillo Fernandes Bernardes @ 2005-11-21 16:33 UTC (permalink / raw)
To: Stefan Berger; +Cc: xen-devel
On Monday 21 November 2005 14:07, Stefan Berger wrote:
> xen-devel-bounces@lists.xensource.com wrote on 11/21/2005 10:43:01 AM:
> > Frontend devices are not being unregistered when in closed state. The
> > following patch fix that.
> >
> > Fix bug #420.
> >
> > Makes "05_attach_and_dettach_device_repeatedly_pos" and
> > "09_attach_and_dettach_device_check_data_pos" tests pass.
>
> Did you test this with suspending / resuming a dom U? The reason I am
> asking is that when suspending the driver immediately gets into state
> 'Closed' and when resuming into state 'Connected', but now your device is
> unregistered.
No, I did not test suspend/resume.
I really don't see why it should get into Closed on suspend, but anyway, is
this really happening? I could not find any switch to Closed in the suspend
code, nor in the resume code.
How do I test suspend/resume on a domU? It has neither /sys/power/state
nor /proc/sleep.
--
Murillo Fernandes Bernardes
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Fix device removal on net and block frontend drivers
2005-11-21 16:33 ` Murillo Fernandes Bernardes
@ 2005-11-21 16:49 ` Stefan Berger
2005-11-21 18:31 ` Ewan Mellor
0 siblings, 1 reply; 10+ messages in thread
From: Stefan Berger @ 2005-11-21 16:49 UTC (permalink / raw)
To: Murillo Fernandes Bernardes; +Cc: xen-devel
[-- Attachment #1.1: Type: text/plain, Size: 1868 bytes --]
Murillo Fernandes Bernardes <mfb@br.ibm.com> wrote on 11/21/2005 11:33:31 AM:
> On Monday 21 November 2005 14:07, Stefan Berger wrote:
> > xen-devel-bounces@lists.xensource.com wrote on 11/21/2005 10:43:01 AM:
> > > Frontend devices are not being unregistered when in closed state. The
> > > following patch fix that.
> > >
> > > Fix bug #420.
> > >
> > > Makes "05_attach_and_dettach_device_repeatedly_pos" and
> > > "09_attach_and_dettach_device_check_data_pos" tests pass.
> >
> > Did you test this with suspending / resuming a dom U? The reason I am
> > asking is that when suspending the driver immediately gets into state
> > 'Closed' and when resuming into state 'Connected', but now your device is
> > unregistered.
>
> No, I did not test suspend/resume.
>
> I really don't see why it should get into Closed on suspend, but anyway, is
> this really hapenning? I could not find any switch to Closed into suspend's
> code, neither on resume.
>
What I am seeing is that after a suspend / resume the interface 'eth0' is
completely gone. 'ifconfig -a' shows everything, but no eth0.
You might only want to unregister if the domain was not suspended. So you
probably need to implement the .suspend function in the frontend and set a
state variable to know whether the domain is being hibernated, and you
clear that variable in the .resume. You check that variable when the
driver is going into the 'Closed' state and only unregister if not in
'suspend' mode.
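A minimal userspace sketch of the flag-based approach described above, not a driver implementation. All names here (`struct fake_frontend`, `fake_suspend()`, `fake_resume()`, `fake_backend_closed()`) are hypothetical, and the sketch assumes the Closed notification is seen while the flag is still set, which is not established in this thread.

```c
#include <stdbool.h>

/* Hypothetical per-device frontend state. */
struct fake_frontend {
    bool suspended;  /* set by the .suspend hook, cleared by .resume */
    bool registered; /* true while the device is still registered */
};

/* Stand-in for the frontend's .suspend callback. */
void fake_suspend(struct fake_frontend *fe)
{
    fe->suspended = true;
}

/* Stand-in for the frontend's .resume callback. */
void fake_resume(struct fake_frontend *fe)
{
    fe->suspended = false;
}

/*
 * Called when the backend is seen in state Closed.
 * Returns true if this call unregistered the device.
 */
bool fake_backend_closed(struct fake_frontend *fe)
{
    if (fe->suspended || !fe->registered)
        return false; /* keep the device across suspend/resume */

    fe->registered = false;
    return true;
}
```

The only design point is that the unregister decision is gated on a flag owned by the suspend and resume hooks; where that flag should live (per driver or in Xenbus itself) is a separate question.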
> How to test suspend/resume on a domU? It does not have /sys/power/state
> neither /proc/sleep.
'xm save <dom id> <dom state filename>' lets you suspend a domain
'xm restore <dom state filename>' lets you resume a domain.
I would only use the network driver for testing this by booting into a
RAMDisk.
Stefan
>
> --
> Murillo Fernandes Bernardes
> IBM Linux Technology Center
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Fix device removal on net and block frontend drivers
2005-11-21 15:43 [PATCH] Fix device removal on net and block frontend drivers Murillo Fernandes Bernardes
2005-11-21 16:07 ` Stefan Berger
@ 2005-11-21 17:48 ` Adam Heath
2005-11-21 18:40 ` Ewan Mellor
1 sibling, 1 reply; 10+ messages in thread
From: Adam Heath @ 2005-11-21 17:48 UTC (permalink / raw)
To: Murillo Fernandes Bernardes; +Cc: xen-devel@lists.xensource.com
On Mon, 21 Nov 2005, Murillo Fernandes Bernardes wrote:
>
> Frontend devices are not being unregistered when in closed state. The
> following patch fix that.
>
> Fix bug #420.
>
> Makes "05_attach_and_dettach_device_repeatedly_pos" and
> "09_attach_and_dettach_device_check_data_pos" tests pass.
>
>
> Signed-off-by: Murillo Fernandes Bernardes <mfb@br.ibm.com>
Hmm. I have a way to make dom0 in unstable oops. If I attempt to set up a
virtual block device on a /dev/nbN (enbd), but never actually configure
the /dev/nbN, I get a kernel oops in dom0 when the domU shuts down. I can
then no longer reboot or shut down the dom0.
Would this be related?
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Fix device removal on net and block frontend drivers
2005-11-21 16:49 ` Stefan Berger
@ 2005-11-21 18:31 ` Ewan Mellor
2005-11-21 20:02 ` Stefan Berger
2005-11-21 21:11 ` Murillo Fernandes Bernardes
0 siblings, 2 replies; 10+ messages in thread
From: Ewan Mellor @ 2005-11-21 18:31 UTC (permalink / raw)
To: Stefan Berger; +Cc: Murillo Fernandes Bernardes, xen-devel
On Mon, Nov 21, 2005 at 11:49:19AM -0500, Stefan Berger wrote:
> Murillo Fernandes Bernardes <mfb@br.ibm.com> wrote on 11/21/2005 11:33:31 AM:
>
> > On Monday 21 November 2005 14:07, Stefan Berger wrote:
> > > xen-devel-bounces@lists.xensource.com wrote on 11/21/2005 10:43:01 AM:
> > > > Frontend devices are not being unregistered when in closed state. The
> > > > following patch fix that.
> > > >
> > > > Fix bug #420.
> > > >
> > > > Makes "05_attach_and_dettach_device_repeatedly_pos" and
> > > > "09_attach_and_dettach_device_check_data_pos" tests pass.
> > >
> > > Did you test this with suspending / resuming a dom U? The reason I am
> > > asking is that when suspending the driver immediately gets into state
> > > 'Closed' and when resuming into state 'Connected', but now your device is
> > > unregistered.
> >
> > No, I did not test suspend/resume.
> >
> > I really don't see why it should get into Closed on suspend, but anyway, is
> > this really hapenning? I could not find any switch to Closed into suspend's
> > code, neither on resume.
xenbus_read_driver_state returns Closed if the backend path is no longer
present. Maybe this is where the Closed has come from. However,
xenbus_probe.c:otherend_changed is supposed to be protecting us from watches
that have fired immediately after a resume.
Could you please enable the DPRINTK in xenbus_probe and see whether the Closed
is coming through the test in otherend_changed? This would help diagnose the
problem.
The intention with xenbus_read_driver_state returning Closed was that this was
the correct way of forcing the driver to close down if the path goes away, as
in normal use the backend path should not just disappear, and for resumption
we have a way to detect that. Perhaps one or other of these things should
change, but it's not clear to me which one it is, or if indeed this is the
problem at all.
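The behaviour described here (a state read that reports Closed when the path has gone away) can be sketched in userspace as follows. This is only a model: `struct fake_node` and `fake_read_driver_state()` are hypothetical, the array stands in for xenstore, and the numeric state values are assumed to match the Xenbus numbering.

```c
#include <stddef.h>
#include <string.h>

/* Assumed numeric values for two Xenbus states. */
enum { MODEL_STATE_CONNECTED = 4, MODEL_STATE_CLOSED = 6 };

/* Hypothetical in-memory stand-in for a xenstore "state" node. */
struct fake_node {
    const char *path;
    int state;
};

/*
 * Return the state stored at `path`. If the path is not present,
 * report Closed so that the caller shuts the device down.
 */
int fake_read_driver_state(const struct fake_node *nodes, size_t count,
                           const char *path)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(nodes[i].path, path) == 0)
            return nodes[i].state;
    }
    return MODEL_STATE_CLOSED;
}
```

Under this model a frontend that looks up a backend path from before the suspend would see Closed even though nothing asked it to close, which is the scenario raised above as a possible source of the Closed.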
> What I am seeing is that after a suspend / resume the interface 'eth0' is
> completely gone. 'ifconfig -a' shows everything, but no eth0.
>
> You might only want to unregister if the domain was not suspended. So you
> probably need to implement the .suspend function in the frontend and set a
> state variable to know whether the domain is being hibernated, and you
> clear that variable in the .resume. You check that variable when the
> driver is going into the 'Closed' state and only unregister if not in
> 'suspend' mode.
If this is necessary, and it's not clear to me that it is, then this is a
facility that Xenbus should provide in general, rather than each driver having
to hack around the problem itself.
Returning to Murillo's patch, I assumed that the unregister_netdev in
close_netdev would implicitly call device_unregister, and that this was the
correct way to close down the device. Is this not the case?
My intention for closedown of the device was that the backend would move to
state Closing, triggering a graceful shutdown of the frontend (in this case
through netfront_closing, close_netdev, etc.). AFAIK, Xend is correctly
setting the backend to state Closing, so I expect unregister_netdev to be
being called.
There is the different issue that Xend does not check for the existence or
state of a device before hotplugging a new one. This means that the frontend
might not have time to see the Closing before having a chance to close down,
for example. This is a problem with Xend that needs to be fixed there. Xend
should refuse to hotplug a device if the frontend for the old one has not yet
closed down. This is not to say that Murillo's patch is wrong, but simply to
say that I expect wider issues than can be fixed by this patch alone.
Ewan.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Fix device removal on net and block frontend drivers
2005-11-21 17:48 ` Adam Heath
@ 2005-11-21 18:40 ` Ewan Mellor
2005-11-21 21:30 ` Adam Heath
0 siblings, 1 reply; 10+ messages in thread
From: Ewan Mellor @ 2005-11-21 18:40 UTC (permalink / raw)
To: Adam Heath; +Cc: Murillo Fernandes Bernardes, xen-devel@lists.xensource.com
On Mon, Nov 21, 2005 at 11:48:10AM -0600, Adam Heath wrote:
> On Mon, 21 Nov 2005, Murillo Fernandes Bernardes wrote:
>
> >
> > Frontend devices are not being unregistered when in closed state. The
> > following patch fix that.
> >
> > Fix bug #420.
> >
> > Makes "05_attach_and_dettach_device_repeatedly_pos" and
> > "09_attach_and_dettach_device_check_data_pos" tests pass.
> >
> >
> > Signed-off-by: Murillo Fernandes Bernardes <mfb@br.ibm.com>
>
> Hmm. I have a way to make dom0 in unstable kernel-oops. If I attempt to
> setup a virtual block device to a /dev/nbN(enbd), but never actually configure
> the /dev/nbN, I get a kernel oops in dom0 when the domU shutdowns down. I can
> then no longer reboot or shutdown the dom0.
>
> Would this be related?
This doesn't seem very related, no. What does your oops look like?
Ewan.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Fix device removal on net and block frontend drivers
2005-11-21 18:31 ` Ewan Mellor
@ 2005-11-21 20:02 ` Stefan Berger
2005-11-21 21:11 ` Murillo Fernandes Bernardes
1 sibling, 0 replies; 10+ messages in thread
From: Stefan Berger @ 2005-11-21 20:02 UTC (permalink / raw)
To: Ewan Mellor; +Cc: Murillo Fernandes Bernardes, xen-devel
[-- Attachment #1.1: Type: text/plain, Size: 6982 bytes --]
xen-devel-bounces@lists.xensource.com wrote on 11/21/2005 01:31:48 PM:
> On Mon, Nov 21, 2005 at 11:49:19AM -0500, Stefan Berger wrote:
>
> > Murillo Fernandes Bernardes <mfb@br.ibm.com> wrote on 11/21/2005 11:33:31 AM:
> >
> > > On Monday 21 November 2005 14:07, Stefan Berger wrote:
> > > > xen-devel-bounces@lists.xensource.com wrote on 11/21/2005 10:43:01 AM:
> > > > > Frontend devices are not being unregistered when in closed state. The
> > > > > following patch fix that.
> > > > >
> > > > > Fix bug #420.
> > > > >
> > > > > Makes "05_attach_and_dettach_device_repeatedly_pos" and
> > > > > "09_attach_and_dettach_device_check_data_pos" tests pass.
> > > >
> > > > Did you test this with suspending / resuming a dom U? The reason I am
> > > > asking is that when suspending the driver immediately gets into state
> > > > 'Closed' and when resuming into state 'Connected', but now your device is
> > > > unregistered.
> > >
> > > No, I did not test suspend/resume.
> > >
> > > I really don't see why it should get into Closed on suspend, but anyway, is
> > > this really hapenning? I could not find any switch to Closed into suspend's
> > > code, neither on resume.
>
> xenbus_read_driver_state returns Closed if the backend path is no longer
> present. Maybe this is where the Closed has come from. However,
> xenbus_probe.c:otherend_changed is supposed to be protecting us from watches
> that have fired immediately after a resume.
>
> Could you please enable the DPRINTK in xenbus_probe and see whether the Closed
> is coming through the test in otherend_changed? This would help diagnose the
> problem.
Here's the log from domain 0's /var/log/messages:
Nov 21 14:54:43 jlfb-2 gpm[2667]: *** info [startup.c(95)]:
Nov 21 14:54:43 jlfb-2 gpm[2667]: Started gpm successfully. Entered daemon
mode.
Nov 21 14:54:43 jlfb-2 gpm[2667]: *** info [mice.c(1766)]:
Nov 21 14:54:43 jlfb-2 gpm[2667]: imps2: Auto-detected intellimouse PS/2
Nov 21 14:54:52 jlfb-2 fstab-sync[2817]: removed all generated mount
points
Nov 21 14:54:55 jlfb-2 kernel: xenbus;_probe (xenbus_probe_backend:639) .
Nov 21 14:54:55 jlfb-2 kernel: xenbus;_probe
(xenbus_probe_backend_unit:624) backend/vif/15/0
Nov 21 14:54:55 jlfb-2 kernel: .
Nov 21 14:54:55 jlfb-2 kernel: xenbus;_probe (xenbus_probe_backend:639) .
Nov 21 14:54:55 jlfb-2 kernel: xenbus;_probe
(xenbus_probe_backend_unit:624) backend/vtpm/15/6
Nov 21 14:54:55 jlfb-2 kernel: .
Nov 21 14:54:55 jlfb-2 kernel: xenbus;_probe (xenbus_probe_backend:639) .
Nov 21 14:54:55 jlfb-2 kernel: xenbus;_probe
(xenbus_probe_backend_unit:624) backend/vtpm/2/6
Nov 21 14:54:55 jlfb-2 kernel: .
Nov 21 14:54:55 jlfb-2 kernel: xenbus;_probe (frontend_changed:763) .
Nov 21 14:54:55 jlfb-2 kernel: xenbus;_probe (backend_changed:771) .
Nov 21 14:54:55 jlfb-2 gconfd (root-2848): starting (version 2.10.0), pid
2848 user 'root'
Nov 21 14:54:55 jlfb-2 gconfd (root-2848): Resolved address
"xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only configuration
source at position 0
Nov 21 14:54:55 jlfb-2 gconfd (root-2848): Resolved address
"xml:readwrite:/root/.gconf" to a writable configuration source at
position 1
Nov 21 14:54:56 jlfb-2 gconfd (root-2848): Resolved address
"xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only configuration
source at position 2
Nov 21 14:55:03 jlfb-2 gconfd (root-2848): Resolved address
"xml:readwrite:/root/.gconf" to a writable configuration source at
position 0
below: starting user domain
Nov 21 14:55:40 jlfb-2 kernel: xenbus;_probe (backend_changed:771) .
Nov 21 14:55:40 jlfb-2 kernel: xenbus;_probe (xenbus_hotplug_backend:232)
.
Nov 21 14:55:40 jlfb-2 kernel: xenbus;_probe (xenbus_dev_probe:338) .
Nov 21 14:55:40 jlfb-2 kernel: xenbus;_probe (backend_changed:771) .
Nov 21 14:55:40 jlfb-2 last message repeated 5 times
Nov 21 14:55:40 jlfb-2 kernel: xenbus;_probe (otherend_changed:307) state
is 1, /local/domain/1/device/vif/0/state,
/local/domain/1/device/vif/0/state.
Nov 21 14:55:40 jlfb-2 kernel: xenbus;_probe (xenbus_hotplug_backend:232)
.
Nov 21 14:55:40 jlfb-2 kernel: xenbus;_probe (backend_changed:771) .
Nov 21 14:55:41 jlfb-2 kernel: device vif1.0 entered promiscuous mode
Nov 21 14:55:41 jlfb-2 kernel: xenbr0: port 1(vif1.0) entering learning
state
Nov 21 14:55:41 jlfb-2 kernel: xenbr0: topology change detected,
propagating
Nov 21 14:55:41 jlfb-2 kernel: xenbr0: port 1(vif1.0) entering forwarding
state
Nov 21 14:55:41 jlfb-2 kernel: xenbus;_probe (backend_changed:771) .
Nov 21 14:55:45 jlfb-2 kernel: xenbus;_probe (otherend_changed:307) state
is 4, /local/domain/1/device/vif/0/state,
/local/domain/1/device/vif/0/state.
Nov 21 14:55:45 jlfb-2 kernel: xenbus;_probe (backend_changed:771) .
below: suspending user domain
Nov 21 14:56:09 jlfb-2 kernel: xenbus;_probe (otherend_changed:307) state
is 6, /local/domain/1/device/vif/0/state,
/local/domain/1/device/vif/0/state.
Nov 21 14:56:09 jlfb-2 kernel: xenbus;_probe (xenbus_dev_remove:376) .
Nov 21 14:56:09 jlfb-2 kernel: xenbr0: port 1(vif1.0) entering disabled
state
Nov 21 14:56:09 jlfb-2 kernel: xenbus;_probe (xenbus_hotplug_backend:232)
.
Nov 21 14:56:09 jlfb-2 kernel: xenbus;_probe (backend_changed:771) .
Nov 21 14:56:09 jlfb-2 kernel: device vif1.0 left promiscuous mode
Nov 21 14:56:09 jlfb-2 kernel: xenbr0: port 1(vif1.0) entering disabled
state
Nov 21 14:56:09 jlfb-2 kernel: xenbus;_probe (backend_changed:771) .
below: resuming user domain
Nov 21 14:56:30 jlfb-2 last message repeated 2 times
Nov 21 14:56:30 jlfb-2 kernel: xenbus;_probe (xenbus_hotplug_backend:232)
.
Nov 21 14:56:30 jlfb-2 kernel: xenbus;_probe (xenbus_dev_probe:338) .
Nov 21 14:56:30 jlfb-2 kernel: xenbus;_probe (backend_changed:771) .
Nov 21 14:56:30 jlfb-2 last message repeated 5 times
Nov 21 14:56:30 jlfb-2 kernel: xenbus;_probe (otherend_changed:307) state
is 1, /local/domain/2/device/vif/0/state,
/local/domain/2/device/vif/0/state.
Nov 21 14:56:30 jlfb-2 kernel: xenbus;_probe (xenbus_hotplug_backend:232)
.
Nov 21 14:56:30 jlfb-2 kernel: xenbus;_probe (backend_changed:771) .
Nov 21 14:56:30 jlfb-2 kernel: xenbus;_probe (otherend_changed:307) state
is 6, /local/domain/2/device/vif/0/state,
/local/domain/2/device/vif/0/state.
Nov 21 14:56:30 jlfb-2 kernel: xenbus;_probe (xenbus_dev_remove:376) .
Nov 21 14:56:30 jlfb-2 kernel: xenbus;_probe (xenbus_hotplug_backend:232)
.
Nov 21 14:56:30 jlfb-2 kernel: xenbus;_probe (backend_changed:771) .
Nov 21 14:56:30 jlfb-2 kernel: device vif2.0 entered promiscuous mode
Nov 21 14:56:30 jlfb-2 kernel: xenbr0: port 1(vif2.0) entering learning
state
Nov 21 14:56:30 jlfb-2 kernel: xenbr0: topology change detected,
propagating
Nov 21 14:56:30 jlfb-2 kernel: xenbr0: port 1(vif2.0) entering forwarding
state
Nov 21 14:56:30 jlfb-2 kernel: xenbus;_probe (backend_changed:771) .
Result: eth0 is gone.
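For reading the "state is N" lines in the log above, a small decoder may help. It is a sketch that assumes the numbering from the Xenbus interface header (0 = Unknown through 6 = Closed); `xenbus_state_name_sketch()` is a hypothetical helper, not an existing kernel function.

```c
/*
 * Map a numeric Xenbus state, as printed by the DPRINTK output,
 * to its name. The table order is an assumption about the
 * XenbusState numbering.
 */
const char *xenbus_state_name_sketch(int state)
{
    static const char *const names[] = {
        "Unknown", "Initialising", "InitWait", "Initialised",
        "Connected", "Closing", "Closed"
    };

    if (state < 0 || state >= (int)(sizeof(names) / sizeof(names[0])))
        return "invalid";
    return names[state];
}
```

With this numbering, the log reads as Initialising (1) then Connected (4) at domain start, Closed (6) at suspend, and Initialising (1) followed by Closed (6) for the new domain ID at resume.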
This is a user domain that is booting into a RAM disk. Memory of the user
domain is 64M.
Stefan
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Fix device removal on net and block frontend drivers
2005-11-21 18:31 ` Ewan Mellor
2005-11-21 20:02 ` Stefan Berger
@ 2005-11-21 21:11 ` Murillo Fernandes Bernardes
1 sibling, 0 replies; 10+ messages in thread
From: Murillo Fernandes Bernardes @ 2005-11-21 21:11 UTC (permalink / raw)
To: Ewan Mellor; +Cc: xen-devel, Stefan Berger
On Monday 21 November 2005 16:31, Ewan Mellor wrote:
> On Mon, Nov 21, 2005 at 11:49:19AM -0500, Stefan Berger wrote:
> Could you please enable the DPRINTK in xenbus_probe and see whether the
> Closed is coming through the test in otherend_changed? This would help
> diagnose the problem.
>
on DomU:
xenbus_probe (xenbus_suspend:831) .
xenbus_probe (suspend_dev:786) .
xenbus_probe (suspend_dev:786) .
xenbus_probe (suspend_dev:786) .
xenbus_probe (resume_dev:806) .
xenbus_probe (otherend_changed:301) state is
6, /local/domain/0/backend/vif/5/0/state, /local/domain/0/backend/vif/5/0/state.
xenbus_probe (otherend_changed:301) state is
6, /local/domain/0/backend/vbd/5/770/state, /local/domain/0/backend/vbd/5/770/state.
xenbus_probe (resume_dev:806) .
xenbus_probe (backend_changed:764) .
xenbus_probe (frontend_changed:756) .
xenbus_probe (resume_dev:806) .
xenbus_probe (otherend_changed:301) state is
4, /local/domain/0/backend/vbd/6/769/state, /local/domain/0/backend/vbd/6/769/state.
xenbus_probe (frontend_changed:756) .
xenbus_probe (frontend_changed:756) .
xenbus_probe (frontend_changed:756) .
xenbus_probe (otherend_changed:301) state is
4, /local/domain/0/backend/vbd/6/769/state, /local/domain/0/backend/vbd/6/769/state.
xenbus_probe (otherend_changed:301) state is
4, /local/domain/0/backend/vbd/6/770/state, /local/domain/0/backend/vbd/6/770/state.
xenbus_probe (frontend_changed:756) .
xenbus_probe (frontend_changed:756) .
xenbus_probe (frontend_changed:756) .
xenbus_probe (otherend_changed:301) state is
4, /local/domain/0/backend/vbd/6/770/state, /local/domain/0/backend/vbd/6/770/state.
xenbus_probe (otherend_changed:301) state is
4, /local/domain/0/backend/vif/6/0/state, /local/domain/0/backend/vif/6/0/state.
xenbus_probe (frontend_changed:756) .
xenbus_probe (frontend_changed:756) .
xenbus_probe (frontend_changed:756) .
xenbus_probe (frontend_changed:756) .
xenbus_probe (otherend_changed:301) state is
4, /local/domain/0/backend/vif/6/0/state, /local/domain/0/backend/vif/6/0/state.
> The intention with xenbus_read_driver_state returning Closed was that this
> was the correct way of forcing the driver to close down if the path goes
> away, as in normal use the backend path should not just disappear, and for
> resumption we have a way to detect that. Perhaps one or other of these
> things should change, but it's not clear to me which one it is, or if
> indeed this is the problem at all.
>
> > What I am seeing is that after a suspend / resume the interface 'eth0' is
> > completely gone. 'ifconfig -a' shows everything, but no eth0.
> >
> > You might only want to unregister if the domain was not suspended. So you
> > probably need to implement the .suspend function in the frontend and set
> > a state variable to know whether the domain is being hibernated, and you
> > clear that variable in the .resume. You check that variable when the
> > driver is going into the 'Closed' state and only unregister if not in
> > 'suspend' mode.
>
> If this is necessary, and it's not clear to me that it is, then this is a
> facility that Xenbus should provide in general, rather than each driver
> having to hack around the problem itself.
What about a XenbusStateSuspended?
>
> Returning to Murillo's patch, I assumed that the unregister_netdev in
> close_netdev would implicitly call device_unregister, and that this was the
> correct way to close down the device. Is this not the case?
>
It is not happening. Were all references to the device cleared? I'm not sure
if that is needed in this case.
> There is the different issue that Xend does not check for the existence or
> state of a device before hotplugging a new one. This means that the
> frontend might not have time to see the Closing before having a chance to
> close down, for example. This is a problem with Xend that needs to be
> fixed there. Xend should refuse to hotplug a device if the frontend for
> the old one has not yet closed down. This is not to say that Murillo's
> patch is wrong, but simply to say that I expect wider issues than can be
> fixed by this patch alone.
>
--
Murillo Fernandes Bernardes
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Fix device removal on net and block frontend drivers
2005-11-21 18:40 ` Ewan Mellor
@ 2005-11-21 21:30 ` Adam Heath
0 siblings, 0 replies; 10+ messages in thread
From: Adam Heath @ 2005-11-21 21:30 UTC (permalink / raw)
To: Ewan Mellor; +Cc: Murillo Fernandes Bernardes, xen-devel@lists.xensource.com
On Mon, 21 Nov 2005, Ewan Mellor wrote:
> On Mon, Nov 21, 2005 at 11:48:10AM -0600, Adam Heath wrote:
>
> > On Mon, 21 Nov 2005, Murillo Fernandes Bernardes wrote:
> >
> > >
> > > Frontend devices are not being unregistered when in closed state. The
> > > following patch fix that.
> > >
> > > Fix bug #420.
> > >
> > > Makes "05_attach_and_dettach_device_repeatedly_pos" and
> > > "09_attach_and_dettach_device_check_data_pos" tests pass.
> > >
> > >
> > > Signed-off-by: Murillo Fernandes Bernardes <mfb@br.ibm.com>
> >
> > Hmm. I have a way to make dom0 in unstable kernel-oops. If I attempt to
> > setup a virtual block device to a /dev/nbN(enbd), but never actually configure
> > the /dev/nbN, I get a kernel oops in dom0 when the domU shutdowns down. I can
> > then no longer reboot or shutdown the dom0.
> >
> > Would this be related?
>
> This doesn't seem very related, no. What does your oops look like?
From the config file:
==
disk = [ 'phy:/dev/space/xen-0-16-swap-0,hda,w',
'phy:/dev/space/xen-0-16-tmp,hdb,w', 'phy:/dev/nda,hdc,w',
'phy:/dev/ndb,hdd,w' ]
==
/dev/nda and /dev/ndb have not been configured yet.
From the domU:
==
[61776.756910] Registering block device major 3
[61776.756999] hda: unknown partition table
[61776.782867] hdb: unknown partition table
[61776.805007] Registering block device major 22
[61776.805096] hdc:end_request: I/O error, dev hdc, sector 0
[61776.805225] Buffer I/O error on device hdc, logical block 0
[61776.805372] end_request: I/O error, dev hdc, sector 0
[61776.805379] Buffer I/O error on device hdc, logical block 0
[61776.805392] unable to read partition table
==
And from dom0, the oops(plus leading lines from syslog):
Nov 21 14:21:38 xen-3 logger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/1/768
Nov 21 14:21:39 xen-3 logger: /etc/xen/scripts/block: Writing backend/vbd/1/768/physical-device 0xfe01 backend/vbd/1/768/node /dev/space/xen-0-16-swap-0 to xenstore.
Nov 21 14:21:39 xen-3 logger: /etc/xen/scripts/block: Writing backend/vbd/1/768/hotplug-status connected to xenstore.
Nov 21 14:21:39 xen-3 logger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/1/832
Nov 21 14:21:40 xen-3 logger: /etc/xen/scripts/block: Writing backend/vbd/1/832/physical-device 0xfe02 backend/vbd/1/832/node /dev/space/xen-0-16-tmp to xenstore.
Nov 21 14:21:40 xen-3 logger: /etc/xen/scripts/block: Writing backend/vbd/1/832/hotplug-status connected to xenstore.
Nov 21 14:21:40 xen-3 logger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/1/5632
Nov 21 14:21:40 xen-3 logger: /etc/xen/scripts/block: Writing backend/vbd/1/5632/physical-device 0x2b00 backend/vbd/1/5632/node /dev/nda to xenstore.
Nov 21 14:21:40 xen-3 logger: /etc/xen/scripts/block: Writing backend/vbd/1/5632/hotplug-status connected to xenstore.
Nov 21 14:21:41 xen-3 logger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/1/5696
Nov 21 14:21:41 xen-3 logger: /etc/xen/scripts/block: Writing backend/vbd/1/5696/physical-device 0x2b10 backend/vbd/1/5696/node /dev/ndb to xenstore.
Nov 21 14:21:41 xen-3 logger: /etc/xen/scripts/block: Writing backend/vbd/1/5696/hotplug-status connected to xenstore.
Nov 21 14:21:42 xen-3 logger: /etc/xen/scripts/vif-route: online XENBUS_PATH=backend/vif/1/0
Nov 21 14:21:43 xen-3 kernel: [61773.038490] ip_tables: (C) 2000-2002 Netfilter core team
Nov 21 14:21:43 xen-3 logger: /etc/xen/scripts/vif-route: Writing backend/vif/1/0/hotplug-status connected to xenstore.
Nov 21 14:21:46 xen-3 kernel: [61776.805155] nbd0: Request when not-ready
Nov 21 14:21:46 xen-3 kernel: [61776.805187] end_request: I/O error, dev nbd0, sector 0
Nov 21 14:21:46 xen-3 kernel: [61776.805323] nbd0: Request when not-ready
Nov 21 14:21:46 xen-3 kernel: [61776.805343] end_request: I/O error, dev nbd0, sector 0
Nov 21 14:21:46 xen-3 kernel: [61776.820855] general protection fault: 0000 [#1]
Nov 21 14:21:46 xen-3 kernel: [61776.820879] SMP
Nov 21 14:21:46 xen-3 kernel: [61776.820898] Modules linked in: ipt_physdev iptable_filter ip_tables i2c_i801 i2c_core dm_mod nbd sd_mod ata_
piix libata scsi_mod
Nov 21 14:21:46 xen-3 kernel: [61776.820979] CPU: 0
Nov 21 14:21:46 xen-3 kernel: [61776.820980] EIP: 0061:[<c0160b60>] Not tainted VLI
Nov 21 14:21:46 xen-3 kernel: [61776.820982] EFLAGS: 00010282 (2.6.12.6-xen)
Nov 21 14:21:46 xen-3 kernel: [61776.821039] EIP is at blkdev_put+0x9/0x13c
Nov 21 14:21:46 xen-3 kernel: [61776.821059] eax: fffffffa ebx: fffffffa ecx: 00000000 edx: 00000106
Nov 21 14:21:46 xen-3 kernel: [61776.821083] esi: c59c46a0 edi: c005a800 ebp: c59c4658 esp: c0427f40
Nov 21 14:21:46 xen-3 kernel: [61776.821107] ds: 007b es: 007b ss: 0069
Nov 21 14:21:46 xen-3 kernel: [61776.821126] Process events/0 (pid: 4, threadinfo=c0426000 task=c0057a20)
Nov 21 14:21:46 xen-3 kernel: [61776.821137] Stack: 00000000 c59c467c c59c46a0 c005a800 c59c4658 c025e0bb c59c4658 c025df6c
Nov 21 14:21:46 xen-3 kernel: [61776.821207] 00000000 c012c343 00000000 00000002 c1114c60 000de3dd c005a80c c005a814
Nov 21 14:21:46 xen-3 kernel: [61776.821272] c0426000 c59c469c c025df4a 00000001 00000000 c1114c60 00010000 00000000
Nov 21 14:21:46 xen-3 kernel: [61776.821337] Call Trace:
Nov 21 14:21:46 xen-3 kernel: [61776.821366] [<c025e0bb>] vbd_free+0xf/0x18
Nov 21 14:21:46 xen-3 kernel: [61776.821394] [<c025df6c>] free_blkif+0x22/0x4c
Nov 21 14:21:46 xen-3 kernel: [61776.821421] [<c012c343>] worker_thread+0x175/0x242
Nov 21 14:21:46 xen-3 kernel: [61776.821450] [<c025df4a>] free_blkif+0x0/0x4c
Nov 21 14:21:46 xen-3 kernel: [61776.822757] [<c0118fd3>] default_wake_function+0x0/0xc
Nov 21 14:21:46 xen-3 kernel: [61776.822786] [<c012c1ce>] worker_thread+0x0/0x242
Nov 21 14:21:46 xen-3 kernel: [61776.822814] [<c012ff99>] kthread+0x93/0x97
Nov 21 14:21:46 xen-3 kernel: [61776.822840] [<c012ff06>] kthread+0x0/0x97
Nov 21 14:21:46 xen-3 kernel: [61776.822867] [<c01070b5>] kernel_thread_helper+0x5/0xb
Nov 21 14:21:46 xen-3 kernel: [61776.822894] Code: 89 d8 5b 5e 5f c3 89 f2 89 f8 e8 b6 fa ff ff 89 c3 85 c0 74 e9 89 f8 e8 06 00 00 00 89 d8
5b 5e 5f c3 55 57 56 53 83 ec 04 89 c3 <8b> 70 04 8b 78 58 8d 40 0c 89 04 24 f0 ff 4b 0c 0f 88 3d 03 00
==
Doing an objdump of fs/block_dev.c(where blkdev_put exists), I get this:
==
00000ca7 <blkdev_put>:
ca7: 55 push %ebp
ca8: 57 push %edi
ca9: 56 push %esi
caa: 53 push %ebx
cab: 83 ec 04 sub $0x4,%esp
cae: 89 c3 mov %eax,%ebx
cb0: 8b 70 04 mov 0x4(%eax),%esi
cb3: 8b 78 58 mov 0x58(%eax),%edi
cb6: 8d 40 0c lea 0xc(%eax),%eax
cb9: 89 04 24 mov %eax,(%esp)
cbc: f0 ff 4b 0c lock decl 0xc(%ebx)
cc0: 0f 88 3d 03 00 00 js 1003 <.text.lock.block_dev+0x7e>
cc6: e8 fc ff ff ff call cc7 <lock_kernel+0xcc7>
ccb: 8b 43 08 mov 0x8(%ebx),%eax
==
So, address cb0 is at fault. The snippet from block_dev.c has this:
==
int blkdev_put(struct block_device *bdev)
{
int ret = 0;
struct inode *bd_inode = bdev->bd_inode;
struct gendisk *disk = bdev->bd_disk;
==
bdev->bd_inode is at offset 4. Combined with eax being fffffffa, the load
address becomes fffffffe, which is not a valid pointer.
This all points, however, to vbd->bdev not being initialized properly. In
fact, eax being what it is (-6 in signed int land, which is -ENXIO on Linux)
makes me believe some error condition isn't being checked.
However, after I read the log again, it looks like an error occurs once,
something is freed, an error occurs again, and then the oops occurs. However,
that may just be coincidence.
Anyway, that should be enough info for someone a bit more knowledgeable to
debug this.
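The reasoning above (a "pointer" that is really a small negative number) matches the kernel's convention of encoding an errno in a pointer value. The following userspace sketch models that convention for 32-bit values. The names `sketch_is_err()` and `sketch_ptr_err()` are hypothetical, the 4095 bound is borrowed from later kernels' `MAX_ERRNO` and is an assumption for this tree, and whether the backend actually skips such a check before calling blkdev_put is not confirmed here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed upper bound on errno values encoded in a pointer. */
#define SKETCH_MAX_ERRNO 4095u

/* True if a 32-bit pointer value lies in the encoded-errno range. */
bool sketch_is_err(uint32_t ptr_value)
{
    return ptr_value >= (uint32_t)(0u - SKETCH_MAX_ERRNO);
}

/*
 * Decode the signed error code from an encoded value using
 * two's-complement arithmetic, without relying on
 * implementation-defined unsigned-to-signed casts.
 * Only meaningful when sketch_is_err() is true.
 */
int32_t sketch_ptr_err(uint32_t ptr_value)
{
    return -(int32_t)(UINT32_MAX - ptr_value) - 1;
}
```

Applied to the register dump, 0xfffffffa falls in the encoded-errno range and decodes to -6, while the ebp value c59c4658 from the same dump does not.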
^ permalink raw reply [flat|nested] 10+ messages in thread