* Shared image files and block script performance
@ 2015-09-28 23:14 Mike Latimer
2015-09-29 9:25 ` Ian Campbell
0 siblings, 1 reply; 4+ messages in thread
From: Mike Latimer @ 2015-09-28 23:14 UTC (permalink / raw)
To: xen-devel
Hi,
In an environment with read-only image files being shared across domains, the
block script becomes exponentially slower with every block attached. While
this is irritating with a few domains, it becomes very problematic with
hundreds of domains.
Part of the issue was mentioned in a udev timeout thread started in August.
[1] In that thread, Roger Pau Monné pointed out a portion of the block script
that takes an increasingly longer time to execute with every instance of an
shared image being used. The block of code mentioned is:
shared_list=$(losetup -a |
sed -n -e "s@^\([^:]\+\)\(:[[:blank:]]\[0*${dev}\]:${inode}
[[:blank:]](.*)\)@\1@p" )
for dev in $shared_list
do
if [ -n "$dev" ]
then
check_file_sharing "$file" "$dev" "$mode"
fi
done
The code snippet above builds a list of all loopback mounted devices sharing
the inode of the device being attached. This is not a problem, but the
subsequent check_file_sharing (which leads to check_sharing) performs the
following xenstore actions:
for dom in $(xenstore-list "$base_path")
do
for dev in $(xenstore-list "$base_path/$dom")
do
d=$(xenstore_read_default "$base_path/$dom/$dev/physical-device" "")
...
local m=$(xenstore_read_default "$base_path/$dom/$dev/mode" "")
In other words, for every loopback device found in the "shared_list", the
entire xenstore database is walked - to determine if a domain is using that
device in write mode. If nothing else is, the 'physical-device' and 'mode' of
every vbd block device under every domain is checked and cleared. After
looking through the entire database, the next device is checked - and the tree
walk starts from the beginning again. The net result is an exponential
increase in the execution time of the block attach process.
Is there some way the efficiency of this process can be improved? Some options
I've tried include:
- Reducing the number of calls to xenstore (combining two reads into one,
etc.). I saw no real performance gains, but more optimization could be done.
- Reading all vbd devices from xenstore at the entry of check_sharing, and
searching this output for matching devices. This works, but does not seem to
improve things much as the entire database is still read for every shared_list
device.
- Passing the entire shared_list to check_sharing (in a single variable), and
checking every vbd device found in xenstore against this list. This change had
a dramatic improvement in performance. However, this change seemed like a
hack. Given enough domains, the size of this list could still be an issue.
With the last option above in place, all of my tests showed a block attach
time of around 1 second. Without the change, I saw block attach times from 1
to 1500 (with less than 40 domains sharing one device).
Any better options or ideas?
Thanks,
Mike
[1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg00132.html
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Shared image files and block script performance
2015-09-28 23:14 Shared image files and block script performance Mike Latimer
@ 2015-09-29 9:25 ` Ian Campbell
2015-09-29 21:18 ` Mike Latimer
0 siblings, 1 reply; 4+ messages in thread
From: Ian Campbell @ 2015-09-29 9:25 UTC (permalink / raw)
To: Mike Latimer, xen-devel
On Mon, 2015-09-28 at 17:14 -0600, Mike Latimer wrote:
> Any better options or ideas?
Is part of the problem that shell is a terrible choice for this kind of
check?
Would shelling out to a helper utility allow this to be written in
something better?
I think we'd still be constrained in the languages used, probably it would
have to be C to avoid adding dependencies on additional runtimes or
compilers, but that might be good enough to allow us to minimise the number
of times we need to go through the database. If nothing else it would make
it possible to use actual datastructures, locking, XS transactions etc.
Ian.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Shared image files and block script performance
2015-09-29 9:25 ` Ian Campbell
@ 2015-09-29 21:18 ` Mike Latimer
2015-09-30 10:35 ` Ian Campbell
0 siblings, 1 reply; 4+ messages in thread
From: Mike Latimer @ 2015-09-29 21:18 UTC (permalink / raw)
To: Ian Campbell; +Cc: xen-devel
Hi Ian,
On Tuesday, September 29, 2015 10:25:32 AM Ian Campbell wrote:
> On Mon, 2015-09-28 at 17:14 -0600, Mike Latimer wrote:
> > Any better options or ideas?
>
> Is part of the problem that shell is a terrible choice for this kind of
> check?
There is some truth to that... ;)
> Would shelling out to a helper utility allow this to be written in
> something better?
A helper utility would be useful, however, I'm seeing a huge amount of gain
with nothing more than a little code motion. Specifically, if shared_list is
generated within the check_sharing function, the (potentially) large list of
devices is not too painful to work with.
For example, the attached patch works well in my environment, and removes the
exponential slowdown. The main change is that $devmm becomes a comma delimited
list of devices (major:minor) to check against the vbd's found in xenstore. A
few minor changes are required along the way, but nothing significant. The
comma delimited list might become problematic at very large numbers (hundreds)
of a single shared device, but I don't think it will be a problem in practice.
Even if it has limitations, this approach offers significant improvements in
performance.
I'll continue to test this patch here, but I'm interested in your opinion.
Thanks,
Mike
---
--- a/etc/xen/scripts/block 2015-09-26 16:52:19.000000000 -0600
+++ b/etc/xen/scripts/block 2015-09-29 14:48:43.000000000 -0600
@@ -66,10 +66,27 @@ find_free_loopback_dev() {
#
check_sharing()
{
- local dev="$1"
- local mode="$2"
+ local devtype=$1
+ local dev="$2"
+ local mode="$3"
+
+ if [ "$devtype" = "file" ];
+ then
+ local inode="$4"
+
+ shared_list=$(losetup -a |
+ sed -n -e "s@^\([^:]\+\)\(:[[:blank:]]\[0*${dev}\]:${inode}
[[:blank:]](.*)\)@\1@p" )
+ for dev in $shared_list
+ do
+ if [ -n "$dev" ]
+ then
+ devmm="${devmm}$(device_major_minor $dev),"
+ fi
+ done
+ else
+ local devmm="$(device_major_minor \"$dev\"),"
+ fi
- local devmm=$(device_major_minor "$dev")
local file
if [ "$mode" = 'w' ]
@@ -85,9 +102,9 @@ check_sharing()
then
local d=$(device_major_minor "$file")
- if [ "$d" = "$devmm" ]
+ if [[ "$devmm" == *"$d,"* ]]
then
- echo 'local'
+ echo 'local' "$d"
return
fi
fi
@@ -100,13 +117,13 @@ check_sharing()
do
d=$(xenstore_read_default "$base_path/$dom/$dev/physical-device" "")
- if [ "$d" = "$devmm" ]
+ if [ -n "$d" ] && [[ "$devmm" == *"$d,"* ]]
then
if [ "$mode" = 'w' ]
then
if ! same_vm $dom
then
- echo 'guest'
+ echo 'guest' "$d"
return
fi
else
@@ -117,7 +134,7 @@ check_sharing()
then
if ! same_vm $dom
then
- echo 'guest'
+ echo 'guest' "$d"
return
fi
fi
@@ -146,11 +163,11 @@ check_device_sharing()
return 0
fi
- result=$(check_sharing "$dev" "$mode")
+ result=$(check_sharing "device" "$dev" "$mode")
- if [ "$result" != 'ok' ]
+ if [[ "$result" != "ok"* ]]
then
- do_ebusy "Device $dev is mounted " "$mode" "$result"
+ do_ebusy "Device $dev is mounted " "$mode" "${result%% *}"
fi
}
@@ -166,13 +183,14 @@ check_file_sharing()
local file="$1"
local dev="$2"
local mode="$3"
+ local inode="$4"
- result=$(check_sharing "$dev" "$mode")
+ result=$(check_sharing "file" "$dev" "$mode" "$inode")
- if [ "$result" != 'ok' ]
+ if [[ "$result" != "ok"* ]]
then
- do_ebusy "File $file is loopback-mounted through $dev,
-which is mounted " "$mode" "$result"
+ do_ebusy "File $file is loopback-mounted through ${dev#* },
+which is mounted " "$mode" "${result%% *}"
fi
}
@@ -289,15 +307,7 @@ mount it read-write in a guest domain."
fatal "Unable to lookup $file: dev: $dev inode: $inode"
fi
- shared_list=$(losetup -a |
- sed -n -e "s@^\([^:]\+\)\(:[[:blank:]]\[0*${dev}\]:${inode}
[[:blank:]](.*)\)@\1@p" )
- for dev in $shared_list
- do
- if [ -n "$dev" ]
- then
- check_file_sharing "$file" "$dev" "$mode"
- fi
- done
+ check_file_sharing "$file" "$dev" "$mode" "$inode"
fi
loopdev=$(losetup -f 2>/dev/null || find_free_loopback_dev)
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Shared image files and block script performance
2015-09-29 21:18 ` Mike Latimer
@ 2015-09-30 10:35 ` Ian Campbell
0 siblings, 0 replies; 4+ messages in thread
From: Ian Campbell @ 2015-09-30 10:35 UTC (permalink / raw)
To: Mike Latimer; +Cc: xen-devel
On Tue, 2015-09-29 at 15:18 -0600, Mike Latimer wrote:
> Hi Ian,
>
> On Tuesday, September 29, 2015 10:25:32 AM Ian Campbell wrote:
> > On Mon, 2015-09-28 at 17:14 -0600, Mike Latimer wrote:
> > > Any better options or ideas?
> >
> > Is part of the problem that shell is a terrible choice for this kind of
> > check?
>
> There is some truth to that... ;)
>
> > Would shelling out to a helper utility allow this to be written in
> > something better?
>
> A helper utility would be useful, however, I'm seeing a huge amount of gain
> with nothing more than a little code motion. Specifically, if shared_list is
> generated within the check_sharing function, the (potentially) large list of
> devices is not too painful to work with.
>
> For example, the attached patch works well in my environment, and removes the
> exponential slowdown. The main change is that $devmm becomes a comma delimited
> list of devices (major:minor) to check against the vbd's found in xenstore. A
> few minor changes are required along the way, but nothing significant. The
> comma delimited list might become problematic at very large numbers (hundreds)
> of a single shared device, but I don't think it will be a problem in practice.
> Even if it has limitations, this approach offers significant improvements in
> performance.
>
> I'll continue to test this patch here, but I'm interested in your opinion.
If you have a patch which improves things then I think you should just go
ahead formally submit it (http://wiki.xen.org/wiki/Submitting_Xen_Patches).
Ian.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2015-09-30 10:35 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-28 23:14 Shared image files and block script performance Mike Latimer
2015-09-29 9:25 ` Ian Campbell
2015-09-29 21:18 ` Mike Latimer
2015-09-30 10:35 ` Ian Campbell
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).