xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* Shared image files and block script performance
@ 2015-09-28 23:14 Mike Latimer
  2015-09-29  9:25 ` Ian Campbell
  0 siblings, 1 reply; 4+ messages in thread
From: Mike Latimer @ 2015-09-28 23:14 UTC (permalink / raw)
  To: xen-devel

Hi,

In an environment with read-only image files being shared across domains, the 
block script becomes exponentially slower with every block attached. While 
this is irritating with a few domains, it becomes very problematic with 
hundreds of domains.

Part of the issue was mentioned in a udev timeout thread started in August. 
[1] In that thread, Roger Pau Monné pointed out a portion of the block script 
that takes an increasingly longer time to execute with every instance of an 
shared image being used. The block of code mentioned is:

shared_list=$(losetup -a |
          sed -n -e "s@^\([^:]\+\)\(:[[:blank:]]\[0*${dev}\]:${inode}
[[:blank:]](.*)\)@\1@p" )
for dev in $shared_list
do
  if [ -n "$dev" ]
  then
    check_file_sharing "$file" "$dev" "$mode"
  fi
done

The code snippet above builds a list of all loopback mounted devices sharing 
the inode of the device being attached. This is not a problem, but the 
subsequent check_file_sharing (which leads to check_sharing) performs the 
following xenstore actions:

for dom in $(xenstore-list "$base_path")
do
  for dev in $(xenstore-list "$base_path/$dom")
  do
    d=$(xenstore_read_default "$base_path/$dom/$dev/physical-device" "")
    ...
    local m=$(xenstore_read_default "$base_path/$dom/$dev/mode" "")

In other words, for every loopback device found in the "shared_list", the 
entire xenstore database is walked - to determine if a domain is using that 
device in write mode. If nothing else is, the 'physical-device' and 'mode' of 
every vbd block device under every domain is checked and cleared. After 
looking through the entire database, the next device is checked - and the tree 
walk starts from the beginning again. The net result is an exponential 
increase in the execution time of the block attach process.

Is there some way the efficiency of this process can be improved? Some options 
I've tried include:

 - Reducing the number of calls to xenstore (combining two reads into one, 
etc.). I saw no real performance gains, but more optimization could be done.

 - Reading all vbd devices from xenstore at the entry of check_sharing, and 
searching this output for matching devices. This works, but does not seem to 
improve things much as the entire database is still read for every shared_list 
device.

 - Passing the entire shared_list to check_sharing (in a single variable), and 
checking every vbd device found in xenstore against this list. This change had 
a dramatic improvement in performance. However, this change seemed like a 
hack. Given enough domains, the size of this list could still be an issue.

With the last option above in place, all of my tests showed a block attach 
time of around 1 second. Without the change, I saw block attach times from 1 
to 1500 (with less than 40 domains sharing one device).

Any better options or ideas?

Thanks,
Mike

[1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg00132.html

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Shared image files and block script performance
  2015-09-28 23:14 Shared image files and block script performance Mike Latimer
@ 2015-09-29  9:25 ` Ian Campbell
  2015-09-29 21:18   ` Mike Latimer
  0 siblings, 1 reply; 4+ messages in thread
From: Ian Campbell @ 2015-09-29  9:25 UTC (permalink / raw)
  To: Mike Latimer, xen-devel

On Mon, 2015-09-28 at 17:14 -0600, Mike Latimer wrote:
> Any better options or ideas?

Is part of the problem that shell is a terrible choice for this kind of
check?

Would shelling out to a helper utility allow this to be written in
something better?

I think we'd still be constrained in the languages used, probably it would
have to be C to avoid adding dependencies on additional runtimes or
compilers, but that might be good enough to allow us to minimise the number
of times we need to go through the database. If nothing else it would make
it possible to use actual datastructures, locking, XS transactions etc.

Ian.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Shared image files and block script performance
  2015-09-29  9:25 ` Ian Campbell
@ 2015-09-29 21:18   ` Mike Latimer
  2015-09-30 10:35     ` Ian Campbell
  0 siblings, 1 reply; 4+ messages in thread
From: Mike Latimer @ 2015-09-29 21:18 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

Hi Ian,

On Tuesday, September 29, 2015 10:25:32 AM Ian Campbell wrote:
> On Mon, 2015-09-28 at 17:14 -0600, Mike Latimer wrote:
> > Any better options or ideas?
> 
> Is part of the problem that shell is a terrible choice for this kind of
> check?

There is some truth to that...  ;)

> Would shelling out to a helper utility allow this to be written in
> something better?

A helper utility would be useful, however, I'm seeing a huge amount of gain 
with nothing more than a little code motion. Specifically, if shared_list is 
generated within the check_sharing function, the (potentially) large list of 
devices is not too painful to work with.

For example, the attached patch works well in my environment, and removes the 
exponential slowdown. The main change is that $devmm becomes a comma delimited 
list of devices (major:minor) to check against the vbd's found in xenstore. A 
few minor changes are required along the way, but nothing significant. The 
comma delimited list might become problematic at very large numbers (hundreds) 
of a single shared device, but I don't think it will be a problem in practice. 
Even if it has limitations, this approach offers significant improvements in 
performance.

I'll continue to test this patch here, but I'm interested in your opinion.

Thanks,
Mike

---

--- a/etc/xen/scripts/block     2015-09-26 16:52:19.000000000 -0600
+++ b/etc/xen/scripts/block     2015-09-29 14:48:43.000000000 -0600
@@ -66,10 +66,27 @@ find_free_loopback_dev() {
 #
 check_sharing()
 {
-  local dev="$1"
-  local mode="$2"
+  local devtype=$1
+  local dev="$2"
+  local mode="$3"
+
+  if [ "$devtype" = "file" ];
+  then
+    local inode="$4"
+
+    shared_list=$(losetup -a |
+          sed -n -e "s@^\([^:]\+\)\(:[[:blank:]]\[0*${dev}\]:${inode}
[[:blank:]](.*)\)@\1@p" )
+    for dev in $shared_list
+    do
+      if [ -n "$dev" ]
+      then
+        devmm="${devmm}$(device_major_minor $dev),"
+      fi
+    done
+  else
+    local devmm="$(device_major_minor \"$dev\"),"
+  fi
 
-  local devmm=$(device_major_minor "$dev")
   local file
 
   if [ "$mode" = 'w' ]
@@ -85,9 +102,9 @@ check_sharing()
     then
       local d=$(device_major_minor "$file")
 
-      if [ "$d" = "$devmm" ]
+      if [[ "$devmm" == *"$d,"* ]]
       then
-        echo 'local'
+        echo 'local' "$d"
         return
       fi
     fi
@@ -100,13 +117,13 @@ check_sharing()
     do
       d=$(xenstore_read_default "$base_path/$dom/$dev/physical-device" "")
 
-      if [ "$d" = "$devmm" ]
+      if [ -n "$d" ] && [[ "$devmm" == *"$d,"* ]]
       then
         if [ "$mode" = 'w' ]
         then
           if ! same_vm $dom
           then
-            echo 'guest'
+            echo 'guest' "$d"
             return
           fi
         else
@@ -117,7 +134,7 @@ check_sharing()
           then
             if ! same_vm $dom
             then
-              echo 'guest'
+              echo 'guest' "$d"
               return
             fi
           fi
@@ -146,11 +163,11 @@ check_device_sharing()
     return 0
   fi
 
-  result=$(check_sharing "$dev" "$mode")
+  result=$(check_sharing "device" "$dev" "$mode")
 
-  if [ "$result" != 'ok' ]
+  if [[ "$result" != "ok"* ]]
   then
-    do_ebusy "Device $dev is mounted " "$mode" "$result"
+    do_ebusy "Device $dev is mounted " "$mode" "${result%% *}"
   fi
 }
 
@@ -166,13 +183,14 @@ check_file_sharing()
   local file="$1"
   local dev="$2"
   local mode="$3"
+  local inode="$4"
 
-  result=$(check_sharing "$dev" "$mode")
+  result=$(check_sharing "file" "$dev" "$mode" "$inode")
 
-  if [ "$result" != 'ok' ]
+  if [[ "$result" != "ok"* ]]
   then
-    do_ebusy "File $file is loopback-mounted through $dev,
-which is mounted " "$mode" "$result"
+    do_ebusy "File $file is loopback-mounted through ${dev#* },
+which is mounted " "$mode" "${result%% *}"
   fi
 }
 
@@ -289,15 +307,7 @@ mount it read-write in a guest domain."
             fatal "Unable to lookup $file: dev: $dev inode: $inode"
           fi
 
-          shared_list=$(losetup -a |
-                sed -n -e "s@^\([^:]\+\)\(:[[:blank:]]\[0*${dev}\]:${inode}
[[:blank:]](.*)\)@\1@p" )
-          for dev in $shared_list
-          do
-            if [ -n "$dev" ]
-            then
-              check_file_sharing "$file" "$dev" "$mode"
-            fi
-          done
+          check_file_sharing "$file" "$dev" "$mode" "$inode"
         fi
 
         loopdev=$(losetup -f 2>/dev/null || find_free_loopback_dev)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Shared image files and block script performance
  2015-09-29 21:18   ` Mike Latimer
@ 2015-09-30 10:35     ` Ian Campbell
  0 siblings, 0 replies; 4+ messages in thread
From: Ian Campbell @ 2015-09-30 10:35 UTC (permalink / raw)
  To: Mike Latimer; +Cc: xen-devel

On Tue, 2015-09-29 at 15:18 -0600, Mike Latimer wrote:
> Hi Ian,
> 
> On Tuesday, September 29, 2015 10:25:32 AM Ian Campbell wrote:
> > On Mon, 2015-09-28 at 17:14 -0600, Mike Latimer wrote:
> > > Any better options or ideas?
> > 
> > Is part of the problem that shell is a terrible choice for this kind of
> > check?
> 
> There is some truth to that...  ;)
> 
> > Would shelling out to a helper utility allow this to be written in
> > something better?
> 
> A helper utility would be useful, however, I'm seeing a huge amount of gain 
> with nothing more than a little code motion. Specifically, if shared_list is 
> generated within the check_sharing function, the (potentially) large list of 
> devices is not too painful to work with.
> 
> For example, the attached patch works well in my environment, and removes the 
> exponential slowdown. The main change is that $devmm becomes a comma delimited 
> list of devices (major:minor) to check against the vbd's found in xenstore. A 
> few minor changes are required along the way, but nothing significant. The 
> comma delimited list might become problematic at very large numbers (hundreds) 
> of a single shared device, but I don't think it will be a problem in practice. 
> Even if it has limitations, this approach offers significant improvements in 
> performance.
> 
> I'll continue to test this patch here, but I'm interested in your opinion.

If you have a patch which improves things then I think you should just go
ahead formally submit it (http://wiki.xen.org/wiki/Submitting_Xen_Patches).

Ian.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-09-30 10:35 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-28 23:14 Shared image files and block script performance Mike Latimer
2015-09-29  9:25 ` Ian Campbell
2015-09-29 21:18   ` Mike Latimer
2015-09-30 10:35     ` Ian Campbell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).