From: "Michael R. Hines" <mrhines@linux.vnet.ibm.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: aliguori@us.ibm.com, qemu-devel@nongnu.org, owasserm@redhat.com,
abali@us.ibm.com, mrhines@us.ibm.com, gokul@us.ibm.com,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v4: 03/10] more verbose documentation of the RDMA transport
Date: Fri, 05 Apr 2013 16:45:34 -0400
Message-ID: <515F37EE.1000702@linux.vnet.ibm.com>
In-Reply-To: <20130321061159.GA28328@redhat.com>
On 03/21/2013 02:11 AM, Michael S. Tsirkin wrote:
> On Tue, Mar 19, 2013 at 01:49:34PM -0400, Michael R. Hines wrote:
>> I also did a test using RDMA + cgroup, and the kernel killed my QEMU :)
>>
>> So, infiniband is not smart enough to know how to avoid pinning a
>> zero page, I guess.
>>
>> - Michael
>>
>> On 03/19/2013 01:14 PM, Paolo Bonzini wrote:
>>> Il 19/03/2013 18:09, Michael R. Hines ha scritto:
>>>> Allowing QEMU to swap due to a cgroup limit during migration is a viable
>>>> overcommit option?
>>>>
>>>> I'm trying to keep an open mind, but that would kill the migration
>>>> time.....
>>> Would it swap? Doesn't the kernel back all zero pages with a single
>>> copy-on-write page? If that still accounts towards cgroup limits, it
>>> would be a bug.
>>>
>>> Old kernels do not have a shared zero hugepage, and that includes some
>>> distro kernels. Perhaps that's the problem.
>>>
>>> Paolo
>>>
> It really shouldn't break COW if you don't request LOCAL_WRITE.
> I think it's a kernel bug, and apparently has been there in the code since the
> first version: get_user_pages parameters swapped.
>
> I'll send a patch. If it's applied, you should also
> change your code from
>
> + IBV_ACCESS_LOCAL_WRITE |
> + IBV_ACCESS_REMOTE_WRITE |
> + IBV_ACCESS_REMOTE_READ);
>
> to
>
> + IBV_ACCESS_REMOTE_READ);
>
> on send side.
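Spelled out, the reduced send-side registration that diff describes would look roughly like the function below. This is only a sketch of the usual libibverbs `ibv_reg_mr` call, not code from the patch; the `pd`/`addr`/`len` parameters stand in for whatever the patch already has in scope:

```c
#include <infiniband/verbs.h>
#include <stddef.h>

/* Send side: request no local or remote write access at registration
 * time, so a fixed get_user_pages path need not break COW on the
 * shared zero page. */
static struct ibv_mr *reg_ram_block_send(struct ibv_pd *pd,
                                         void *addr, size_t len)
{
    return ibv_reg_mr(pd, addr, len, IBV_ACCESS_REMOTE_READ);
}
```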
> Then, each time we detect a page has changed we must make sure to
> unregister and re-register it. Or if you want to be very
> smart, check that the PFN didn't change and reregister
> if it did.
>
> This will make overcommit work.
>
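As an aside, the "check that the PFN didn't change" idea can be prototyped from userspace via /proc/self/pagemap. The helpers below are my own illustration, not part of the patch; note that recent kernels report the PFN field as zero to unprivileged readers, so a real check would have to run with CAP_SYS_ADMIN:

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Look up the pagemap entry for the page containing addr.
 * Layout (Documentation/vm/pagemap.txt): bits 0-54 hold the PFN
 * (reported as 0 to unprivileged readers on recent kernels) and
 * bit 63 is the "page present" flag. Returns 0 on any error. */
static uint64_t pagemap_entry(const void *addr)
{
    uint64_t entry = 0;
    long psz = sysconf(_SC_PAGESIZE);
    int fd = open("/proc/self/pagemap", O_RDONLY);

    if (fd < 0) {
        return 0;
    }
    if (pread(fd, &entry, sizeof(entry),
              (off_t)((uintptr_t)addr / psz) * sizeof(entry))
        != sizeof(entry)) {
        entry = 0;
    }
    close(fd);
    return entry;
}

static int page_present(uint64_t entry)  { return (int)((entry >> 63) & 1); }
static uint64_t page_pfn(uint64_t entry) { return entry & ((1ULL << 55) - 1); }
```

A migration-side check would remember page_pfn() at registration time and unregister/re-register the region whenever it changes.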
Unfortunately RDMA + cgroups still kills QEMU:
I removed the *_WRITE flags and did a test like this:
1. Start QEMU with 2GB of RAM configured
$ cd /sys/fs/cgroup/memory/libvirt/qemu
$ echo "-1" > memory.memsw.limit_in_bytes
$ echo "-1" > memory.limit_in_bytes
$ echo $(pidof qemu-system-x86_64) > tasks
$ echo 512M > memory.limit_in_bytes # maximum RSS
$ echo 3G > memory.memsw.limit_in_bytes # maximum RSS + swap, extra 1G to be safe
2. Start RDMA migration
3. RSS of 512M is reached
4. swap starts filling up
5. the kernel kills QEMU
6. dmesg:
[ 2981.657135] Task in /libvirt/qemu killed as a result of limit of /libvirt/qemu
[ 2981.657140] memory: usage 524288kB, limit 524288kB, failcnt 18031
[ 2981.657143] memory+swap: usage 525460kB, limit 3145728kB, failcnt 0
[ 2981.657146] Mem-Info:
[ 2981.657148] Node 0 DMA per-cpu:
[ 2981.657152] CPU 0: hi: 0, btch: 1 usd: 0
[ 2981.657155] CPU 1: hi: 0, btch: 1 usd: 0
[ 2981.657157] CPU 2: hi: 0, btch: 1 usd: 0
[ 2981.657160] CPU 3: hi: 0, btch: 1 usd: 0
[ 2981.657163] CPU 4: hi: 0, btch: 1 usd: 0
[ 2981.657165] CPU 5: hi: 0, btch: 1 usd: 0
[ 2981.657167] CPU 6: hi: 0, btch: 1 usd: 0
[ 2981.657170] CPU 7: hi: 0, btch: 1 usd: 0
[ 2981.657172] Node 0 DMA32 per-cpu:
[ 2981.657176] CPU 0: hi: 186, btch: 31 usd: 160
[ 2981.657178] CPU 1: hi: 186, btch: 31 usd: 22
[ 2981.657181] CPU 2: hi: 186, btch: 31 usd: 179
[ 2981.657184] CPU 3: hi: 186, btch: 31 usd: 6
[ 2981.657186] CPU 4: hi: 186, btch: 31 usd: 21
[ 2981.657189] CPU 5: hi: 186, btch: 31 usd: 15
[ 2981.657191] CPU 6: hi: 186, btch: 31 usd: 19
[ 2981.657194] CPU 7: hi: 186, btch: 31 usd: 22
[ 2981.657196] Node 0 Normal per-cpu:
[ 2981.657200] CPU 0: hi: 186, btch: 31 usd: 44
[ 2981.657202] CPU 1: hi: 186, btch: 31 usd: 58
[ 2981.657205] CPU 2: hi: 186, btch: 31 usd: 156
[ 2981.657207] CPU 3: hi: 186, btch: 31 usd: 107
[ 2981.657210] CPU 4: hi: 186, btch: 31 usd: 44
[ 2981.657213] CPU 5: hi: 186, btch: 31 usd: 70
[ 2981.657215] CPU 6: hi: 186, btch: 31 usd: 76
[ 2981.657218] CPU 7: hi: 186, btch: 31 usd: 173
[ 2981.657223] active_anon:181703 inactive_anon:68856 isolated_anon:0
[ 2981.657224] active_file:66881 inactive_file:141056 isolated_file:0
[ 2981.657225] unevictable:2174 dirty:6 writeback:0 unstable:0
[ 2981.657226] free:4058168 slab_reclaimable:5152 slab_unreclaimable:10785
[ 2981.657227] mapped:7709 shmem:192 pagetables:1913 bounce:0
[ 2981.657230] Node 0 DMA free:15896kB min:56kB low:68kB high:84kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15672kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 2981.657242] lowmem_reserve[]: 0 1966 18126 18126
[ 2981.657249] Node 0 DMA32 free:1990652kB min:7324kB low:9152kB high:10984kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2013280kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 2981.657260] lowmem_reserve[]: 0 0 16160 16160
[ 2981.657268] Node 0 Normal free:14226124kB min:60200kB low:75248kB high:90300kB active_anon:726812kB inactive_anon:275424kB active_file:267524kB inactive_file:564224kB unevictable:8696kB isolated(anon):0kB isolated(file):0kB present:16547840kB mlocked:6652kB dirty:24kB writeback:0kB mapped:30832kB shmem:768kB slab_reclaimable:20608kB slab_unreclaimable:43140kB kernel_stack:1784kB pagetables:7652kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 2981.657281] lowmem_reserve[]: 0 0 0 0
[ 2981.657289] Node 0 DMA: 0*4kB 1*8kB 1*16kB 0*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15896kB
[ 2981.657307] Node 0 DMA32: 17*4kB 9*8kB 7*16kB 4*32kB 8*64kB 5*128kB 6*256kB 4*512kB 3*1024kB 6*2048kB 481*4096kB = 1990652kB
[ 2981.657325] Node 0 Normal: 2*4kB 1*8kB 991*16kB 893*32kB 271*64kB 50*128kB 50*256kB 12*512kB 5*1024kB 1*2048kB 3450*4096kB = 14225504kB
[ 2981.657343] 277718 total pagecache pages
[ 2981.657345] 68816 pages in swap cache
[ 2981.657348] Swap cache stats: add 656848, delete 588032, find 19850/22338
[ 2981.657350] Free swap = 15288376kB
[ 2981.657353] Total swap = 15564796kB
[ 2981.706982] 4718576 pages RAM