From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Thu, 11 Jul 2024 09:49:09 -0700
Subject: Re: [PATCH net-next v9 06/13] mm: page_frag: reuse existing space for 'size' and 'pfmemalloc'
To: Yunsheng Lin
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Andrew Morton, linux-mm@kvack.org

On Thu, Jul 11, 2024 at 1:16 AM Yunsheng Lin wrote:
>
> On 2024/7/10 23:28, Alexander H Duyck wrote:

...

> >>
> >> Yes, agreed. It would be good to be more specific about how to avoid
> >> the above problem using a signed negative number for 'remaining', as
> >> I am not sure how it can be done yet.
> >>
> >
> > My advice would be to go back to patch 3 and figure out how to do this
> > re-ordering without changing the alignment behaviour. The old code
> > essentially aligned both the offset and fragsz by combining the two and
> > then doing the alignment. Since you are doing a count-up setup you will
>
> I am not sure I understand the 'aligned both the offset and fragsz' part,
> as whether 'fragsz' ends up aligned seems to depend on the last caller's
> align_mask for the same page_frag_cache instance. Suppose offset = 32768
> initially for the old code:
>
> Step 1: __page_frag_alloc_align() is called with fragsz=7 and align_mask=~0u;
>         the offset after this is 32761, and the true fragsz is 7 too.
>
> Step 2: __page_frag_alloc_align() is called with fragsz=7 and align_mask=-16;
>         the offset after this is 32752, so the true fragsz is 9, which
>         does not seem to be aligned.

I was referring to my original code before this patchset. I was doing
the subtraction of the fragsz first, and then aligning, which would end
up padding the end of the frame as it was adding to the total size by
pulling the offset down *after* I had already subtracted fragsz. The
result was that I was always adding additional room depending on the
setting of fragsz and how it related to the alignment.

After changing the code to realign on the start of the next frag, the
fragsz is at the mercy of the next caller's alignment. In the event
that the caller didn't bother to align fragsz by the align mask
beforehand, they can end up with a scenario that might result in false
sharing.
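To spell out what I mean, the old count-down flow looked roughly like
this (going from memory, heavily simplified, with the refill path
elided):

	offset = nc->offset - fragsz;
	if (unlikely(offset < 0)) {
		/* refill the cache and restart from the end of the page */
		offset = size - fragsz;
	}
	/* mask applied *after* the subtraction, so the current
	 * allocation is rounded out to the caller's alignment
	 */
	offset &= align_mask;
	nc->offset = offset;

	return nc->va + offset;

Because the masking happened after fragsz had already been subtracted,
both the start and the effective size of each allocation honored the
caller's alignment; that is the implicit end-padding the count-up
rework loses.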
> The above is why I added the below paragraph in the doc to make the
> semantics more explicit:
>
> "Depending on different aligning requirements, the page_frag API caller
> may call page_frag_alloc*_align*() to ensure the returned virtual
> address or offset of the page is aligned according to the
> 'align/alignment' parameter. Note the size of the allocated fragment is
> not aligned; the caller needs to provide an aligned fragsz if there is
> an alignment requirement for the size of the fragment."
>
> And the existing callers of the page_frag aligned API do seem to follow
> the above rule, last time I checked.
>
> Or did I miss something obvious here?

No, you didn't miss anything. It is just that there is now more
potential for error than there was before.

> > need to align the remaining, then add fragsz, and then I guess you
> > could store remaining and then subtract fragsz from your final virtual
> > address to get back to where the starting offset is actually located.
>
> remaining = __ALIGN_KERNEL_MASK(nc->remaining, ~align_mask);
> remaining += fragsz;
> nc->remaining = remaining;
> return encoded_page_address(nc->encoded_va) + (size + remaining) - fragsz;
>
> If yes, I am not sure what is the point of doing the above yet; it
> just seems to make things more complicated and harder to understand.

That isn't right. I am not sure why you are adding size + remaining, or
what those are supposed to represent. The issue was that "remaining"
ends up being an unaligned value, since you were starting by aligning
it and then adding fragsz. So by subtracting fragsz you can get back to
the aligned start. What this patch was doing before was adding the raw,
unaligned nc->remaining at the end of the function.

> >
> > Basically your "remaining" value isn't a safe number to use for an
> > offset since it isn't aligned to your starting value at any point.
>
> Does using an 'aligned_remaining' local variable to make it more obvious
> seem reasonable to you?

No, as the value you are storing above isn't guaranteed to be aligned.
If you stored it before adding fragsz then it would be aligned.

> >
> > As far as the negative value, it is more about making it easier to keep
> > track of what is actually going on. Basically we can use regular
> > pointer math, and as such I suspect the compiler is having to do extra
> > instructions to flip your value negative before it can combine the
> > values via something like the LEA (load effective address) assembler
> > instruction.
>
> I am not an asm expert; I am not sure I understand the optimization
> trick here.

The LEA instruction takes a base address, adds an index scaled by
1/2/4/8 plus a fixed displacement, and produces the resulting address,
all in one instruction. The general idea is that you could look at
converting things such that you are putting together the page address +
remaining*1 + PAGE_SIZE. Basically what I was getting at is that
addition works, but LEA doesn't do negative values for the scale.
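As a concrete illustration (hypothetical helper functions, not code
from the patch, using kernel-style void pointer arithmetic):

/* 'remaining' kept as a negative count climbing toward 0: the result
 * is base + index + displacement, which on x86-64 can compile to a
 * single lea with PAGE_SIZE folded into the displacement
 */
static void *frag_va_negative(void *page_addr, long remaining)
{
	return page_addr + PAGE_SIZE + remaining;
}

/* 'remaining' kept as a positive count-down value: lea has no -1
 * scale, so the compiler typically needs an extra neg/sub before it
 * can form the address
 */
static void *frag_va_positive(void *page_addr, long remaining)
{
	return page_addr + PAGE_SIZE - remaining;
}

Either way the math is the same; the negative representation just lets
the compiler form the address in one step.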