From: Eugene Istomin <E.Istomin@edss.ee>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] Read IOPS storm in case of reflinking running VM disk
Date: Thu, 21 May 2015 01:33:43 +0300
Message-ID: <2137407.TBXHbaxxee@evis>
In-Reply-To: <555A2544.2070607@suse.de>

Goldwyn,

thanks for the answer!

I read 
https://oss.oracle.com/osswiki/OCFS2(2f)DesignDocs(2f)RefcountTrees.html  
carefully to understand the problem.

As I understand it:
There are B-tree structures for reflink: ocfs2_refcount_tree;
ocfs2_refcount_block -> ocfs2_refcount_list -> ocfs2_refcount_rec.
"The refcount tree root is a refcount block pointed to by i_refcount_loc".
Some operations need extra uncached lookups.
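
The nesting described above can be pictured with a much-simplified in-memory model. This is only a sketch: the Python class and field names are made up for illustration, it ignores the tree levels and the on-disk layout, and it only shows how a cluster offset would be matched against refcount records.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RefcountRec:
    """Hypothetical stand-in for one ocfs2_refcount_rec."""
    cpos: int       # first cluster covered by this record
    clusters: int   # number of contiguous clusters covered
    refcount: int   # how many files share these clusters


@dataclass
class RefcountList:
    """Hypothetical stand-in for ocfs2_refcount_list: a flat array of records."""
    recs: List[RefcountRec] = field(default_factory=list)


@dataclass
class RefcountBlock:
    """Hypothetical stand-in for a leaf ocfs2_refcount_block holding one list."""
    records: RefcountList = field(default_factory=RefcountList)


def find_rec(block: RefcountBlock, cpos: int) -> Optional[RefcountRec]:
    """Return the record covering cluster `cpos`, or None if none does."""
    for rec in block.records.recs:
        if rec.cpos <= cpos < rec.cpos + rec.clusters:
            return rec
    return None


if __name__ == "__main__":
    blk = RefcountBlock(RefcountList([RefcountRec(100, 50, 2),
                                      RefcountRec(200, 10, 1)]))
    print(find_rec(blk, 120))   # covered by the first record
    print(find_rec(blk, 150))   # falls in the gap between the two records
```

In this toy model every write to a reflinked range implies one such lookup; whether the real lookup hits a cache or the disk is exactly the open question in this thread.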
I also dumped the frag/stat/refcount output from a production hypervisor
node using debugfs.ocfs2; the files are attached (also available at
http://public.edss.ee/tmp/debugfs.tar.gz ).

Hypervisor OCFS2 mount options:
rw,nosuid,noexec,noatime,heartbeat=none,nointr,data=ordered,errors=remount-ro,localalloc=2048,coherency=full,user_xattr,acl

Mkfs string:
mkfs.ocfs2 -b 4KB -C 1MB -N 2 -T vmstore -L "storage" \
  --fs-features=local,backup-super,sparse,unwritten,inline-data,metaecc,refcount,xattr,indexed-dirs,discontig-bg


Can you please explain why there are so many extent blocks (204)? Is it
really impossible to store many clusters in a single extent (like #25,
block 3874095 -> 20847 clusters)?
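
For scale, a small sketch of how much data that single extent covers. It assumes that "-C 1MB" in the mkfs string means a cluster size of 1 MiB (1048576 bytes); the helper name is made up for illustration.

```python
CLUSTER_SIZE = 1024 * 1024  # assumption: "-C 1MB" means 1 MiB clusters


def extent_bytes(clusters: int, cluster_size: int = CLUSTER_SIZE) -> int:
    """Bytes covered by one extent record spanning `clusters` clusters."""
    return clusters * cluster_size


if __name__ == "__main__":
    size = extent_bytes(20847)  # extent #25 from the debugfs dump
    print(f"{size} bytes = {size / 2**30:.2f} GiB")
```

Under that assumption extent #25 alone covers roughly 20 GiB, which is why 204 extent blocks for one image looks surprising.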

-- 
Best regards,
Eugene Istomin
IT Architect

On Monday, May 18, 2015 12:45:40 PM Goldwyn Rodrigues wrote:
> Hi Eugene,
> 
> Sorry, had been busy with other work and this slipped on the list.
> 
> >  > Do you know something about such behavior?
> >  > 
> >  > The question is why a reflink operation on VM disk leads to plenty of
> >  > read ops? Is this related to CoW specific structures?
> 
> This is in fact related to the CoW. An ocfs2 file is an extent tree,
> with the extent headers marking whether the extent is reflinked or not,
> along with the number of reflinks.
> 
> If you perform a reflink on a file which is being changed constantly,
> you not only recreate the extent tree, but also decrease the refcount of
> the ones already present. Add to that the extents which need to be read
> for replication.
> 
> 
> HTH,
> 
> >  > We can provide other details & ssh to testbed.
> >  > 
> >  > > Hello,
> >  > > 
> >  > > after deploying reflink-based VM snapshots to production servers we
> >  > > discovered a performance degradation:
> >  > > 
> >  > > OS: Opensuse 13.1, 13.2
> >  > > Hypervisors: Xen 4.4, 4.5
> >  > > Dom0 kernels: 3.12, 3.16, 3.18
> >  > > DomU kernels: 3.12, 3.16, 3.18
> >  > > Tested DomU disk backends: tapdisk2, qdisk
> >  > > 
> >  > > 1) on DomU (VM)
> >  > > #dd if=/dev/zero of=test2 bs=1M count=6000
> >  > > 
> >  > > 2) atop on Dom0:
> >  > > sdb - busy:92% - read:375 - write:130902
> >  > > Reads are from other VMs, seems OK
> >  > > 
> >  > > 3) DomU dd finished:
> >  > > 6291456000 bytes (6.3 GB) copied, 16.6265 s, 378 MB/s
> >  > > 
> >  > > 4) Let's start dd again & do a snapshot:
> >  > > #dd if=/dev/zero of=test2 bs=1M count=6000
> >  > > #reflink test.raw ref/
> >  > > 
> >  > > 5) atop on Dom0:
> >  > > sdb - busy:97% - read:112740 - write:28037
> >  > > So, Read IOPS = 112740, why?
> >  > > 
> >  > > 6) DomU dd finished:
> >  > > 6291456000 bytes (6.3 GB) copied, 175.45 s, 35.9 MB/s
> >  > > 
> >  > > 7) Second & further reflinks do not change the atop stat & dd time
> >  > > #dd if=/dev/zero of=test2 bs=1M count=6000
> >  > > #reflink --backup=t test.raw ref/ \\ * n times
> >  > > ~ 6291456000 bytes (6.3 GB) copied, 162.959 s, 38.6 MB/s
> >  > > 
> >  > > The question is why reflinking a running VM disk leads to a read IOPS
> >  > > storm?
> >  > > 
> >  > > Thanks!
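
As a cross-check of the figures in the quoted report, the following sketch recomputes the dd throughput from the byte counts and durations and compares the atop read counters before and after the reflink. The helper name is made up for illustration, MB is taken as 10^6 bytes (as dd reports it), and the script only does arithmetic on the numbers quoted above; it says nothing about the cause.

```python
NBYTES = 6000 * 1024 * 1024  # dd bs=1M count=6000 -> 6291456000 bytes


def throughput_mb_s(nbytes: int, seconds: float) -> float:
    """Throughput in MB/s, with MB = 10**6 bytes as dd reports it."""
    return nbytes / seconds / 1e6


if __name__ == "__main__":
    baseline = throughput_mb_s(NBYTES, 16.6265)       # step 3, no reflink
    first_reflink = throughput_mb_s(NBYTES, 175.45)   # step 6, first reflink
    later_reflinks = throughput_mb_s(NBYTES, 162.959) # step 7, further reflinks

    print(f"baseline:        {baseline:.1f} MB/s")
    print(f"first reflink:   {first_reflink:.1f} MB/s")
    print(f"later reflinks:  {later_reflinks:.1f} MB/s")
    print(f"write slowdown:  {baseline / first_reflink:.1f}x")
    # atop read counters from steps 2 and 5
    print(f"read increase:   {112740 / 375:.0f}x")
```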
-------------- next part --------------
A non-text attachment was scrubbed...
Name: debugfs.tar.gz
Type: application/x-compressed-tar
Size: 729820 bytes
Desc: not available
Url : http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20150521/23cd43e2/attachment-0001.bin 

