From: han <vanbas.han@gmail.com>
To: Xen-devel@lists.xensource.com
Subject: Re:  analyze for the P1 bug 593(xensource bug tracker)
Date: Wed, 10 May 2006 21:00:52 +0800
Message-ID: <4461E404.4000607@gmail.com>
In-Reply-To: <0EBFB99D260C5B40AC33E0F807B1AD660E08EB@pdsmsx411.ccr.corp.intel.com>


Hi, Keir!

Your patch works quite well. We have created and destroyed the VMX more than 500 times, and everything went fine! I believe the patch solves the race condition. You could combine the corresponding fixes for VBD and VNIF and send them to the mailing list; we can help you test them!
I prefer the wait_event and wakeup approach; it is clearer and more straightforward, just as you said! :-)
BTW: I'm out of the office right now, so I can't send the patch back to you. That's also why I switched to another mailbox to send this mail.
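
The wait_event and wakeup idea mentioned above can be illustrated with a small userspace model. This is only a sketch and not Keir's patch: the class `Backend` and its methods are hypothetical names, and Python's `threading.Condition` stands in for the kernel's wait queue. The point it shows is that the remove path sleeps until the device handle has been closed, so the hotplug notification can never come before the close.

```python
import threading


class Backend:
    """Toy backend: remove() must not signal hotplug until the device is closed."""

    def __init__(self):
        self.cond = threading.Condition()
        self.device_open = True   # the backend still holds the loop device
        self.events = []          # order in which the two steps completed

    def deferred_close(self):
        # Plays the role of the work item that closes the loop device.
        with self.cond:
            self.device_open = False
            self.events.append("closed")
            self.cond.notify_all()          # analogous to a wake-up call

    def remove(self):
        # Analogous to wait_event(): sleep until the condition holds.
        with self.cond:
            self.cond.wait_for(lambda: not self.device_open)
            self.events.append("hotplug-remove")


backend = Backend()
remover = threading.Thread(target=backend.remove)
closer = threading.Thread(target=backend.deferred_close)
remover.start()
closer.start()
remover.join()
closer.join()
print(backend.events)
```

Whichever thread is scheduled first, "closed" is recorded before "hotplug-remove", because the remove path only proceeds once the condition is true.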

Thanks a lot for your help!

_______________________________________________________
Best Regards,
hanzhu



Han, Zhu wrote:
> Best Regards, 
> hanzhu
>
> -----Original Message-----
> From: Han, Zhu 
> Sent: May 10, 2006 14:27
> To: Yu, Ke; 'xen-devel@lists.xensource.com'
> Cc: Helix-vmm
> Subject: analyze for the P1 bug 593(xensource bug tracker)
>
> Hi, all!
> Our QA team submitted bug 593 to the XenSource bug tracker one month ago, and it was raised to P1 several days ago, so I spent some time tracing it this week. Here is what I have found:
> 1) This bug is hard to reproduce on most of the platforms we own, especially UP boxes. The platform on which we hit the bug and can reproduce it reliably is Paxville, which has 4 physical CPUs, with 2 cores and 2 hyperthreads for each CPU.
> 2) The root cause of this problem is that "losetup -d /dev/loop*" can fail with a rather low probability. "losetup -d /dev/loop*" is invoked by /etc/xen/scripts/block when the script processes the remove action. Once all the loop devices are exhausted, the VMX cannot be initialized properly. That's why XEND complains "Error: Device creation failed for domain ****". However, if we remove the loop device manually, everything goes OK!
> 3) "losetup -d /dev/loop" fails because kernel/drivers/block/loop.c returns EBUSY for the LOOP_CLR_FD ioctl operation. The probable cause is that someone else has not yet closed the loop device when we try to delete it.
> 4) The program that holds the loop device open could be the VBD device driver. It opens the loop device in vbd_create() through open_by_devnum, and it closes the handle in vbd_free, which is called by a schedulable work item, free_blkif. Is that true? If so, the problem could arise from a race condition between the work item and the hotplug script. When the xenstore thread notifies the xenbus driver that the front-end device has been destroyed, the driver removes the back-end device and related resources, and then notifies the hotplug subsystem of the remove action. Because the code that closes the loop device's handle and the script that deletes the loop device can run concurrently, the script can fail when it tries to delete the loop device.
>
> My questions are:
> 1) Does this possible race condition exist?
> 2) Why is the code that closes the loop device deferred to a separate work item instead of finishing all the work directly in blkback_remove()? Can any operation in free_blkif() block? Which one?
>
> Since I'm really a newbie in this field, any tips and comments will be appreciated!
> Thanks a lot!
>
>
>
> Best Regards, 
> hanzhu
>
>   
