From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752356Ab2LDTO5 (ORCPT ); Tue, 4 Dec 2012 14:14:57 -0500 Received: from e9.ny.us.ibm.com ([32.97.182.139]:34712 "EHLO e9.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752105Ab2LDTOz (ORCPT ); Tue, 4 Dec 2012 14:14:55 -0500 Message-ID: <50BE4B64.6000003@linaro.org> Date: Tue, 04 Dec 2012 11:13:40 -0800 From: John Stultz User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0 MIME-Version: 1.0 To: Minchan Kim CC: Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Christoph Lameter , Android Kernel Team , Robert Love , Mel Gorman , Hugh Dickins , Dave Hansen , Rik van Riel , Dave Chinner , Neil Brown , Mike Hommey , Taras Glek , KOSAKI Motohiro , KAMEZAWA Hiroyuki Subject: Re: [RFC v2] Support volatile range for anon vma References: <1351560594-18366-1-git-send-email-minchan@kernel.org> <50AD739A.30804@linaro.org> <50B6E1F9.5010301@linaro.org> <20121204000042.GB20395@bbox> <50BD4A70.9060506@linaro.org> <20121204072207.GA9782@blaptop> In-Reply-To: <20121204072207.GA9782@blaptop> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Content-Scanned: Fidelis XPS MAILER x-cbid: 12120419-7182-0000-0000-000003860F01 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 12/03/2012 11:22 PM, Minchan Kim wrote: > On Mon, Dec 03, 2012 at 04:57:20PM -0800, John Stultz wrote: >> On 12/03/2012 04:00 PM, Minchan Kim wrote: >>> On Wed, Nov 28, 2012 at 08:18:01PM -0800, John Stultz wrote: >>>> On 11/21/2012 04:36 PM, John Stultz wrote: >>>>> 2) Being able to use this with tmpfs files. I'm currently trying >>>>> to better understand the rmap code, looking to see if there's a >>>>> way to have try_to_unmap_file() work similarly to >>>>> try_to_unmap_anon(), to allow allow users to madvise() on mmapped >>>>> tmpfs files. This would provide a very similar interface as to >>>>> what I've been proposing with fadvise/fallocate, but just using >>>>> process virtual addresses instead of (fd, offset) pairs. The >>>>> benefit with (fd,offset) pairs for Android is that its easier to >>>>> manage shared volatile ranges between two processes that are >>>>> sharing data via an mmapped tmpfs file (although this actual use >>>>> case may be fairly rare). I believe we should still be able to >>>>> rework the ashmem internals to use madvise (which would provide >>>>> legacy support for existing android apps), so then its just a >>>>> question of if we could then eventually convince Android apps to >>>>> use the madvise interface directly, rather then the ashmem unpin >>>>> ioctl. >>>> Hey Minchan, >>>> I've been playing around with your patch trying to better >>>> understand your approach and to extend it to support tmpfs files. In >>>> doing so I've found a few bugs, and have some rough fixes I wanted >>>> to share. There's still a few edge cases I need to deal with (the >>>> vma-purged flag isn't being properly handled through vma merge/split >>>> operations), but its starting to come along. >>> Hmm, my patch doesn't allow to merge volatile with another one by >>> inserting VM_VOLATILE into VM_SPECIAL so I guess merge isn't problem. >>> In case of split, __split_vma copy old vma to new vma like this >>> >>> *new = *vma; >>> >>> So the problem shouldn't happen, I guess. >>> Did you see the real problem about that? >> Yes, depending on the pattern that MADV_VOLATILE and MADV_NOVOLATILE >> is applied, we can get a result where data is purged, but we aren't >> notified of it. Also, since madvise returns early if it encounters >> an error, in the case where you have checkerboard volatile regions >> (say every other page is volatile), which you mark non-volatile with >> one large MADV_NOVOLATILE call, the first volatile vma will be >> marked non-volatile, but since it returns purged, the madvise loop >> will stop and the following volatile regions will be left volatile. >> >> The patches in the git tree below which handle the perged state >> better seem to work for my tests, as far as resolving any >> overlapping calls. Of course there may yet still be problems I've >> not found. >> >>>> Anyway, take a look at the tree here and let me know what you think. >>>> http://git.linaro.org/gitweb?p=people/jstultz/android-dev.git;a=shortlog;h=refs/heads/dev/minchan-anonvol >> Eager to hear what you think! > Below two patches look good to me. > > [rmap: Simplify volatility checking by moving it out of try_to_unmap_one] > [rmap: ClearPageDirty() when returning SWAP_DISCARD] > > [madvise: Fix NOVOLATILE bug] > I can't understand description of the patch. > Could you elaborate it with example? The case I ran into here is if you have a range where you mark every other page as volatile. Then mark all the pages in that range as non-volatile in one madvise call. sys_madvise() will then find the first vma in the range, and call madvise_vma(), which marks the first vma non-volatile and return the purged state. If the page has been purged, sys_madvise code will note that as an error, and break out of the vma iteration loop, leaving the following vmas in the range volatile. > [madvise: Fixup vma->purged handling] > I included VM_VOLATILE into VM_SPECIAL intentionally. > If comment of VM_SPECIAL is right, merge with volatile vmas shouldn't happen. > So I guess you see other problem. When I see my source code today, locking > scheme/purge handling is totally broken. I will look at it. Maybe you are seeing > bug related that. Part of patch is needed. It could be separate patch. > I will merge it. I don't think the problem is when vmas being marked VM_VOLATILE are being merged, its that when we mark the vma as *non-volatile*, and remove the VM_VOLATILE flag we merge the non-volatile vmas with neighboring vmas. So preserving the purged flag during that merge is important. Again, the example I used to trigger this was an alternating pattern of volatile and non volatile vmas, then marking the entire range non-volatile (though sometimes in two overlapping passes). thanks -john