Date: Fri, 22 Feb 2019 15:28:51 -0800
From: "Darrick J. Wong"
To: Dave Chinner
Cc: dan.j.williams@intel.com, linux-nvdimm@lists.01.org, zwisler@kernel.org,
	vishal.l.verma@intel.com, xfs, linux-fsdevel
Subject: Re: [RFC PATCH] pmem: advertise page alignment for pmem devices
	supporting fsdax
Message-ID: <20190222232851.GC21626@magnolia>
References: <20190222182008.GT6503@magnolia> <20190222231136.GC23020@dastard>
In-Reply-To: <20190222231136.GC23020@dastard>

On Sat, Feb 23, 2019 at 10:11:36AM +1100, Dave Chinner wrote:
> On Fri, Feb 22, 2019 at 10:20:08AM -0800, Darrick J. Wong wrote:
> > Hi all!
> >
> > Uh, we have an internal customer who's been trying out MAP_SYNC
> > on pmem, and they've observed that one has to do a fair amount of
> > legwork (in the form of mkfs.xfs parameters) to get the kernel to set up
> > 2M PMD mappings.  They (of course) want to mmap hundreds of GB of pmem,
> > so the PMD mappings are much more efficient.
> >
> > I started poking around w.r.t. what mkfs.xfs was doing and realized that
> > if the fsdax pmem device advertised iomin/ioopt of 2MB, then mkfs will
> > set up all the parameters automatically.  Below is my ham-handed attempt
> > to teach the kernel to do this.
>
> What's the before and after mkfs output?
>
> (need to see the context that this "fixes" before I comment)

Here's what we do today assuming no options and 800GB pmem devices:

# blockdev --getiomin --getioopt /dev/pmem0 /dev/pmem1
4096
0
4096
0

# mkfs.xfs -N /dev/pmem0 -r rtdev=/dev/pmem1
meta-data=/dev/pmem0             isize=512    agcount=4, agsize=52428800 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=209715200, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=102400, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =/dev/pmem1             extsz=4096   blocks=209715200, rtextents=209715200

And here's what we do to get 2M aligned mappings:

# mkfs.xfs -N /dev/pmem0 -r rtdev=/dev/pmem1,extsize=2m -d su=2m,sw=1
meta-data=/dev/pmem0             isize=512    agcount=32, agsize=6553600 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=209715200, imaxpct=25
         =                       sunit=512    swidth=512 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=102400, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =/dev/pmem1             extsz=2097152 blocks=209715200, rtextents=409600

With this patch, things change as such:

# blockdev --getiomin --getioopt /dev/pmem0 /dev/pmem1
2097152
2097152
2097152
2097152

# mkfs.xfs -N /dev/pmem0 -r rtdev=/dev/pmem1
meta-data=/dev/pmem0             isize=512    agcount=32, agsize=6553600 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=209715200, imaxpct=25
         =                       sunit=512    swidth=512 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=102400, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =/dev/pmem1             extsz=2097152 blocks=209715200, rtextents=409600

I think the only change is the agcount, which for 2M mappings probably
isn't a huge deal.  It's obviously a bigger deal for 1G pages, assuming
we decide that's even advisable.

--D

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
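
For anyone wanting to cross-check the geometry in the mkfs output above, the
numbers can be reproduced with plain shell arithmetic. This is only a rough
sketch and not part of the patch: the helper names bytes_to_blocks and
rt_extents are made up for illustration, and the inputs are simply copied
from the output above (bsize=4096, blocks=209715200, 2M = 2097152 bytes,
agcount=32).

```shell
#!/bin/sh
# Hypothetical helpers (not provided by mkfs.xfs or the patch) that redo the
# arithmetic behind the sunit, rtextents and agsize fields shown above.

# Convert a size in bytes to filesystem blocks: bytes / bsize.
bytes_to_blocks() {
    echo $(( $1 / $2 ))
}

# Number of realtime extents: blocks / (extsz / bsize).
# Dividing first keeps the intermediate values small.
rt_extents() {
    echo $(( $1 / ( $3 / $2 ) ))
}

bsize=4096          # filesystem block size, from the mkfs output
blocks=209715200    # data/realtime device size in blocks
two_mb=2097152      # the 2M stripe unit / realtime extent size
agcount=32          # AG count reported for the 2M-aligned layout

echo "sunit:     $(bytes_to_blocks "$two_mb" "$bsize") blks"
echo "rtextents: $(rt_extents "$blocks" "$bsize" "$two_mb")"
echo "agsize:    $(( blocks / agcount )) blks"
```

Run as-is, this should print sunit 512, rtextents 409600 and agsize 6553600,
matching the 2M-aligned mkfs output. With extsz=4096 the same rt_extents
calculation gives 209715200, matching the default case.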