From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [RFC V2 11/12] mm: Tag VMA with VM_CDM flag during page fault
To: Dave Hansen , Anshuman Khandual , linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20170130033602.12275-1-khandual@linux.vnet.ibm.com> <20170130033602.12275-12-khandual@linux.vnet.ibm.com> <5f1ec7f6-16d3-8653-4494-50e124916a9e@intel.com>
Cc: mhocko@suse.com, vbabka@suse.cz, mgorman@suse.de, minchan@kernel.org, aneesh.kumar@linux.vnet.ibm.com, bsingharora@gmail.com, srikar@linux.vnet.ibm.com, haren@linux.vnet.ibm.com, jglisse@redhat.com, dan.j.williams@intel.com
From: Anshuman Khandual
Date: Tue, 31 Jan 2017 10:40:07 +0530
MIME-Version: 1.0
In-Reply-To: <5f1ec7f6-16d3-8653-4494-50e124916a9e@intel.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Message-Id: <01ed36eb-bb1d-bb75-57f9-90159985e75e@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 01/30/2017 11:21 PM, Dave Hansen wrote:
> Here's the flag definition:
>
>> +#ifdef CONFIG_COHERENT_DEVICE
>> +#define VM_CDM 0x00800000 /* Contains coherent device memory */
>> +#endif
>
> But it doesn't match the implementation:
>
>> +#ifdef CONFIG_COHERENT_DEVICE
>> +static void mark_vma_cdm(nodemask_t *nmask,
>> +		struct page *page, struct vm_area_struct *vma)
>> +{
>> +	if (!page)
>> +		return;
>> +
>> +	if (vma->vm_flags & VM_CDM)
>> +		return;
>> +
>> +	if (nmask && !nodemask_has_cdm(*nmask))
>> +		return;
>> +
>> +	if (is_cdm_node(page_to_nid(page)))
>> +		vma->vm_flags |= VM_CDM;
>> +}
>
> That flag is a one-way trip. Any VMA with that flag set on it will keep
> it for the life of the VMA, despite whether it has CDM pages in it now
> or not. Even if you changed the policy back to one that doesn't allow
> CDM and forced all the pages to be migrated out.

Right, we have this limitation right now. But as I mentioned in my
reply on the other thread, I will work towards both static and runtime
re-evaluation of the VMA flag next time around.

>
> This also assumes that the only way to get a page mapped into a VMA is
> via alloc_pages_vma(). Do the NUMA migration APIs use this path?

Right now I have just taken care of these two paths.

* Page fault path
* mbind() path

Agreed, I will work on the NUMA migration API paths next. I am wondering
whether I also need to update the migrate_pages() kernel API, as it will
be used by the driver, or whether the driver should tag the VMA
explicitly, knowing what has just happened?

I had also mentioned this in the cover letter :) But as you have
pointed out, I will move the documentation into the patches.

"
VM_CDM tagged VMA:

There are two parts to this problem.

* How to mark a VMA with VM_CDM ?
	- During page fault path
	- During mbind(MPOL_BIND) call
	- Any other paths ?
	- Should a driver mark a VMA with VM_CDM explicitly ?

* How VM_CDM marked VMA gets treated ?
	- Disabled from auto NUMA migrations
	- Disabled from KSM merging
	- Anything else ?
"

>
> When you *set* this flag, you don't go and turn off KSM merging, for
> instance. You keep it from being turned on from this point forward, but
> you don't turn it off.

I was under the impression that KSM merging does not start unless we
make a madvise(MADV_MERGEABLE) call on the VMA (where it's blocked now).
I might be missing something here if it can start beforehand.

>
> This is happening with mmap_sem held for read. Correct? Is it OK that
> you're modifying the VMA? That vm_flags manipulation is non-atomic, so
> how can that even be safe?

Hmm, should it be done with mmap_sem held for write? Will look into
this further. But is intercepting the page faults inside
alloc_pages_vma() to tag the VMA okay from an overall design
perspective? Or should this be moved up or down the call chain in the
page fault path?

>
> If you're going to go down this route, I think you need to be very
> careful. We need to ensure that when this flag gets set, it's never set
> on VMAs that are "normal" and will only be set on VMAs that were
> *explicitly* set up for accessing CDM. That means that you'll need to
> make sure that there's no possible way to get a CDM page faulted into a
> VMA unless it's via an explicitly assigned policy that would have cause
> the VMA to be split from any "normal" one in the system.
>
> This all makes me really nervous.

Got it, will work towards this.
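
To make the runtime re-evaluation idea from the discussion above a bit more concrete, here is a rough userspace sketch. It is not kernel code and not part of the posted series. Every toy_* name is hypothetical, the rule that nodes 4 and above are CDM nodes is an assumption made only for illustration, and the assert() merely models the requirement that the flag be changed with mmap_sem held for write. Only the VM_CDM value is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VM_CDM		0x00800000UL	/* value from the patch */
#define TOY_MAX_PAGES	8		/* hypothetical toy limit */

/* Toy stand-in for mm_struct: only tracks the write-lock state. */
struct toy_mm {
	bool write_locked;	/* models mmap_sem held for write */
};

/* Toy stand-in for vm_area_struct. */
struct toy_vma {
	struct toy_mm *mm;
	unsigned long vm_flags;
	int page_nid[TOY_MAX_PAGES];	/* node of each mapped page, -1 if none */
};

/* Assumption for illustration only: nodes >= 4 are CDM nodes. */
bool toy_is_cdm_node(int nid)
{
	return nid >= 4;
}

void toy_mm_write_lock(struct toy_mm *mm)
{
	mm->write_locked = true;
}

void toy_mm_write_unlock(struct toy_mm *mm)
{
	mm->write_locked = false;
}

/*
 * Re-derive VM_CDM from what is currently mapped, so the flag can be
 * cleared again once all CDM pages have been migrated out. The flag is
 * only touched while the (modelled) write lock is held.
 */
void toy_vma_reevaluate_cdm(struct toy_vma *vma)
{
	bool has_cdm = false;
	size_t i;

	assert(vma->mm->write_locked);

	for (i = 0; i < TOY_MAX_PAGES; i++) {
		if (vma->page_nid[i] >= 0 && toy_is_cdm_node(vma->page_nid[i])) {
			has_cdm = true;
			break;
		}
	}

	if (has_cdm)
		vma->vm_flags |= VM_CDM;
	else
		vma->vm_flags &= ~VM_CDM;
}
```

A caller would take the toy write lock, call toy_vma_reevaluate_cdm() after a policy change or migration, and drop the lock again. Whether a real implementation should scan the mapped pages like this or track CDM pages some other way is left open here.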