From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-io1-f46.google.com (mail-io1-f46.google.com [209.85.166.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8D77E226168 for ; Thu, 12 Dec 2024 15:46:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.166.46 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734018407; cv=none; b=fWcIj8JcEGM8K1VdiR3bPdNTFS9yznVcd7CqbBtxpiRJr4uCvYMDBErX4Ik1ML+VwWd8u0lF8+eFoohK4U+XhR+X8FlaWr9SxM7InOCrKdEpxopz+ehBhj2C00PxPCTHDb1xlUjvt95qOVD2NJR75Wg7TcrBzPsPhZ1Qwn9p3JQ= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734018407; c=relaxed/simple; bh=Ac/lH1N8Vwnk3NvD4ByTPUBIHOz/Uhxzw21Ss9+dQoY=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=EoB9kIp2BTj13rnypCQIilDMAp85TnmHa4Yrb+/XFQHTCoDBrkB71A9jal+ViyhR51gSMUcWYJgqjWssT6nGPfywpHbooNCYfNYo8eziiJWqzBWmQQpuasDa4r/S+5iuqrJ96wBzXcdFEO5JuiN+KT6eJcVIG2Ye+h6tuW88CPE= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk; spf=pass smtp.mailfrom=kernel.dk; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b=RFEH025H; arc=none smtp.client-ip=209.85.166.46 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=kernel.dk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b="RFEH025H" Received: by mail-io1-f46.google.com with SMTP id ca18e2360f4ac-844dfe4b136so20666139f.3 for ; Thu, 12 Dec 2024 07:46:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20230601.gappssmtp.com; s=20230601; t=1734018403; x=1734623203; darn=vger.kernel.org; h=content-transfer-encoding:in-reply-to:content-language:from :references:cc:to:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=tO8UEzJ4vRcwe2Pe8Itytncx4R9qJI/GK3O9fbQd828=; b=RFEH025H2ZIDuJmmbq+qp1qKV9wrEseWZpu5O/kvEWFv/e1g8rXhAHtOdJfSeanNAi rZoyQFjTSjGlmV0BZS7TC5/OUiAOeIwDAe23ektmVORCZy5DVhyzXy7emiK3gX4VCn9q r8wHwmx9NErjjKLLH01scm6kdg47wvi0sy4xsThunDREHOfKL8bcE83Itcfey4OeuMSl jc/lbWHEg3BC++Su8gLGllq3bHIjvmp3Ui4dMHu4zTsEXVPMFdCc0M0wt3SkGWYkiaG2 uH3P9UF46xqt5awlHa71wh3Ja2lfXERKY4q9ATTwm/pWWHtAG1vs6Bvg+4Z/yURl8hM8 znIg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1734018403; x=1734623203; h=content-transfer-encoding:in-reply-to:content-language:from :references:cc:to:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=tO8UEzJ4vRcwe2Pe8Itytncx4R9qJI/GK3O9fbQd828=; b=dSEBtDDIVv/ISIzvSnR28ewWNg24ndAqgF3l4w/pCKqYeEjYZKw4XLMKs6QGgWXJP0 hBFUfNRyUu24rInk0urPsBbUPbK0P6+ABQu/Q+Mwl7gokhmpVaFgJVOelmmHisekgXHW MLeKBSRXnw/qG/32YtF9HibyPlMIj0bNMRoT/q+TJR8zaxtYWV6Pes+kH+TCMTY3/Ldx uXGDY7S14M0YYRPIWIVHq/CQrfoDaS9H10o+rt6UEJLgkI4P1DVMcqgu8IFPKYwkbPNl XKLTqgDi5uYI08E4BnX95WfS8LokJ/IKe9benJloCMafpNa38CMPjlSEROtxwgq4ZSbA AptA== X-Forwarded-Encrypted: i=1; AJvYcCULLckHTIfzSoPPRKV6wSXJY5B7zgO5lfJfjHY2kpnah4FHmmwZ4dZdgBbmGquP+5e6vVJv5uPD+/ohQPc=@vger.kernel.org X-Gm-Message-State: AOJu0YxlDrZ6jjHeqlHiSVtOkCYrVqDytON6Wl8lIph/piSrjbC4dIoA hca/IUs9ZNy0B4SP8lrWL3/A9CObGJV/jTdnNnGARFVG452GKDghGxQ10NJprLE= X-Gm-Gg: ASbGnctOP5Sg9DPqzvAwYnPl4vXLv4vMMkQrezfLYO2djzXV6wou9hv7NzrwcHnhm7e dvjqK6QdyqrWZ6jzqUVaramPAFdvgPTGKRdNk9Inx/XvGI4bB9AxQhD7SaXz7sWUU3QC58puRD7 v5eGc6cWxuUhFNuHtuawBwlY6Jna4Sxm7L81Kb1TXD2u7r0OVSG+iBcq6Uq3jimExTE0aojF0Db G+2k4na4dBSXlAWW1IK7FZGBpEXBgRiXfQbpBJP03x9yR8w/FSI X-Google-Smtp-Source: AGHT+IEWOOfyZauproNpLzyqbWskkA8C5m2JWmRRDzC2lEgVYvSQEsLpIBT1S8N7zYh/eZTxtW0sVQ== X-Received: by 2002:a05:6602:3404:b0:843:eca9:a050 with SMTP id ca18e2360f4ac-844e560bb23mr72779339f.1.1734018403424; Thu, 12 Dec 2024 07:46:43 -0800 (PST) Received: from [192.168.1.116] ([96.43.243.2]) by smtp.gmail.com with ESMTPSA id ca18e2360f4ac-844d44183d4sm66027039f.51.2024.12.12.07.46.42 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 12 Dec 2024 07:46:42 -0800 (PST) Message-ID: <6e6475c2-7a5f-4742-b4ae-e1eef5f2c508@kernel.dk> Date: Thu, 12 Dec 2024 08:46:41 -0700 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCHSET v6 0/12] Uncached buffered IO To: Bharata B Rao Cc: bfoster@redhat.com, clm@meta.com, hannes@cmpxchg.org, kirill@shutemov.name, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, willy@infradead.org References: <20241203153232.92224-2-axboe@kernel.dk> <20241210094842.204504-1-bharata@amd.com> From: Jens Axboe Content-Language: en-US In-Reply-To: <20241210094842.204504-1-bharata@amd.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit On 12/10/24 2:48 AM, Bharata B Rao wrote: > Hi Jens, > > I ran a couple of variants of FIO to check how this patchset affects the > FIO numbers. My other motivation with this patchset is to check if it > improves the scalability issues that were discussed in [1] and [2]. > But for now, here are some initial numbers. Thanks for testing! > Also note that I am using your buffered-uncached.8 branch from > https://git.kernel.dk/cgit/linux/log/?h=buffered-uncached.8 that has > changes to enable uncached buffered IO for EXT4 and block devices. > > In the below reported numbers, > 'base' means kernel from buffered-uncached.8 branch and > 'patched' means kernel from buffered-uncached.8 branch + above shown FIO change > > FIO on EXT4 partitions > ====================== > nvme1n1 259:12 0 3.5T 0 disk > ??nvme1n1p1 259:13 0 894.3G 0 part /mnt1 > ??nvme1n1p2 259:14 0 894.3G 0 part /mnt2 > ??nvme1n1p3 259:15 0 894.3G 0 part /mnt3 > ??nvme1n1p4 259:16 0 894.1G 0 part /mnt4 > > fio -directory=/mnt4/ -direct=0 -thread -size=3G -rw=rw -rwmixwrite=30 --norandommap --randrepeat=0 -ioengine=pvsync2 -bs=64k -numjobs=252 -runtime=3600 --time_based -group_reporting -name=mytest > fio -directory=/mnt3/ -direct=0 -thread -size=3G -rw=rw -rwmixwrite=30 --norandommap --randrepeat=0 -ioengine=pvsync2 -bs=64k -numjobs=252 -runtime=3600 --time_based -group_reporting -name=mytest > fio -directory=/mnt1/ -direct=0 -thread -size=3G -rw=rw -rwmixwrite=30 --norandommap --randrepeat=0 -ioengine=pvsync2 -bs=64k -numjobs=252 -runtime=3600 --time_based -group_reporting -name=mytest > fio -directory=/mnt2/ -direct=0 -thread -size=3G -rw=rw -rwmixwrite=30 --norandommap --randrepeat=0 -ioengine=pvsync2 -bs=64k -numjobs=252 -runtime=3600 --time_based -group_reporting -name=mytest > > Four NVME devices are formatted with EXT4 and four parallel FIO instances > are run on them with the options as shown above. > > FIO output looks like this: > > base: > READ: bw=1233MiB/s (1293MB/s), 1233MiB/s-1233MiB/s (1293MB/s-1293MB/s), io=4335GiB (4654GB), run=3600097-3600097msec > WRITE: bw=529MiB/s (554MB/s), 529MiB/s-529MiB/s (554MB/s-554MB/s), io=1858GiB (1995GB), run=3600097-3600097msec > READ: bw=1248MiB/s (1308MB/s), 1248MiB/s-1248MiB/s (1308MB/s-1308MB/s), io=4387GiB (4710GB), run=3600091-3600091msec > WRITE: bw=535MiB/s (561MB/s), 535MiB/s-535MiB/s (561MB/s-561MB/s), io=1880GiB (2019GB), run=3600091-3600091msec > READ: bw=1235MiB/s (1294MB/s), 1235MiB/s-1235MiB/s (1294MB/s-1294MB/s), io=4340GiB (4660GB), run=3600094-3600094msec > WRITE: bw=529MiB/s (555MB/s), 529MiB/s-529MiB/s (555MB/s-555MB/s), io=1860GiB (1997GB), run=3600094-3600094msec > READ: bw=1234MiB/s (1294MB/s), 1234MiB/s-1234MiB/s (1294MB/s-1294MB/s), io=4337GiB (4657GB), run=3600093-3600093msec > WRITE: bw=529MiB/s (554MB/s), 529MiB/s-529MiB/s (554MB/s-554MB/s), io=1859GiB (1996GB), run=3600093-3600093msec > > patched: > READ: bw=1400MiB/s (1469MB/s), 1400MiB/s-1400MiB/s (1469MB/s-1469MB/s), io=4924GiB (5287GB), run=3600100-3600100msec > WRITE: bw=600MiB/s (629MB/s), 600MiB/s-600MiB/s (629MB/s-629MB/s), io=2110GiB (2266GB), run=3600100-3600100msec > READ: bw=1395MiB/s (1463MB/s), 1395MiB/s-1395MiB/s (1463MB/s-1463MB/s), io=4904GiB (5266GB), run=3600148-3600148msec > WRITE: bw=598MiB/s (627MB/s), 598MiB/s-598MiB/s (627MB/s-627MB/s), io=2102GiB (2257GB), run=3600148-3600148msec > READ: bw=1385MiB/s (1452MB/s), 1385MiB/s-1385MiB/s (1452MB/s-1452MB/s), io=4868GiB (5227GB), run=3600136-3600136msec > WRITE: bw=594MiB/s (622MB/s), 594MiB/s-594MiB/s (622MB/s-622MB/s), io=2087GiB (2241GB), run=3600136-3600136msec > READ: bw=1376MiB/s (1443MB/s), 1376MiB/s-1376MiB/s (1443MB/s-1443MB/s), io=4837GiB (5194GB), run=3600145-3600145msec > WRITE: bw=590MiB/s (618MB/s), 590MiB/s-590MiB/s (618MB/s-618MB/s), io=2073GiB (2226GB), run=3600145-3600145msec > > FIO on block devices > ==================== > nvme1n1 259:12 0 3.5T 0 disk > ??nvme1n1p1 259:13 0 894.3G 0 part > ??nvme1n1p2 259:14 0 894.3G 0 part > ??nvme1n1p3 259:15 0 894.3G 0 part > ??nvme1n1p4 259:16 0 894.1G 0 part > > fio -filename=/dev/nvme1n1p4 -direct=0 -thread -size=800G -rw=rw -rwmixwrite=30 --norandommap --randrepeat=0 -ioengine=pvsync2 -bs=64k -numjobs=252 -runtime=3600 --time_based -group_reporting -name=mytest > fio -filename=/dev/nvme1n1p2 -direct=0 -thread -size=800G -rw=rw -rwmixwrite=30 --norandommap --randrepeat=0 -ioengine=pvsync2 -bs=64k -numjobs=252 -runtime=3600 --time_based -group_reporting -name=mytest > fio -filename=/dev/nvme1n1p1 -direct=0 -thread -size=800G -rw=rw -rwmixwrite=30 --norandommap --randrepeat=0 -ioengine=pvsync2 -bs=64k -numjobs=252 -runtime=3600 --time_based -group_reporting -name=mytest > fio -filename=/dev/nvme1n1p3 -direct=0 -thread -size=800G -rw=rw -rwmixwrite=30 --norandommap --randrepeat=0 -ioengine=pvsync2 -bs=64k -numjobs=252 -runtime=3600 --time_based -group_reporting -name=mytest > > Four instances of FIO are run on four different NVME block devices > with the options as shown above. > > base: > READ: bw=8712MiB/s (9135MB/s), 8712MiB/s-8712MiB/s (9135MB/s-9135MB/s), io=29.9TiB (32.9TB), run=3600011-3600011msec > WRITE: bw=3734MiB/s (3915MB/s), 3734MiB/s-3734MiB/s (3915MB/s-3915MB/s), io=12.8TiB (14.1TB), run=3600011-3600011msec > READ: bw=8727MiB/s (9151MB/s), 8727MiB/s-8727MiB/s (9151MB/s-9151MB/s), io=30.0TiB (32.9TB), run=3600005-3600005msec > WRITE: bw=3740MiB/s (3922MB/s), 3740MiB/s-3740MiB/s (3922MB/s-3922MB/s), io=12.8TiB (14.1TB), run=3600005-3600005msec > READ: bw=8701MiB/s (9123MB/s), 8701MiB/s-8701MiB/s (9123MB/s-9123MB/s), io=29.9TiB (32.8TB), run=3600004-3600004msec > WRITE: bw=3729MiB/s (3910MB/s), 3729MiB/s-3729MiB/s (3910MB/s-3910MB/s), io=12.8TiB (14.1TB), run=3600004-3600004msec > READ: bw=8706MiB/s (9128MB/s), 8706MiB/s-8706MiB/s (9128MB/s-9128MB/s), io=29.9TiB (32.9TB), run=3600005-3600005msec > WRITE: bw=3731MiB/s (3913MB/s), 3731MiB/s-3731MiB/s (3913MB/s-3913MB/s), io=12.8TiB (14.1TB), run=3600005-3600005msec > > patched: > READ: bw=1844MiB/s (1933MB/s), 1844MiB/s-1844MiB/s (1933MB/s-1933MB/s), io=6500GiB (6980GB), run=3610641-3610641msec > WRITE: bw=790MiB/s (828MB/s), 790MiB/s-790MiB/s (828MB/s-828MB/s), io=2786GiB (2991GB), run=3610642-3610642msec > READ: bw=1753MiB/s (1838MB/s), 1753MiB/s-1753MiB/s (1838MB/s-1838MB/s), io=6235GiB (6695GB), run=3641973-3641973msec > WRITE: bw=751MiB/s (788MB/s), 751MiB/s-751MiB/s (788MB/s-788MB/s), io=2672GiB (2869GB), run=3641969-3641969msec > READ: bw=1078MiB/s (1130MB/s), 1078MiB/s-1078MiB/s (1130MB/s-1130MB/s), io=3788GiB (4068GB), run=3600007-3600007msec > WRITE: bw=462MiB/s (484MB/s), 462MiB/s-462MiB/s (484MB/s-484MB/s), io=1624GiB (1743GB), run=3600007-3600007msec > READ: bw=1752MiB/s (1838MB/s), 1752MiB/s-1752MiB/s (1838MB/s-1838MB/s), io=6234GiB (6694GB), run=3642657-3642657msec > WRITE: bw=751MiB/s (788MB/s), 751MiB/s-751MiB/s (788MB/s-788MB/s), io=2672GiB (2869GB), run=3642622-3642622msec > > While FIO on FS shows improvement, FIO on block shows numbers going down. > Is this expected or am I missing enabling anything else for the block option? The fs side looks as expected, that's a nice win. For the bdev side, I deliberately did not post the bdev patch for enabling uncached buffered IO on a raw block device, as it's just a hack for testing. It currently needs punting similar to dirtying of pages, and it's not optimized at all. We really need the raw bdev ops moving fully to iomap and not being dependent on buffer_heads for this to pan out, so the most likely outcome here is that raw bdev uncached buffered IO will not really be supported until the time that someone (probably Christoph) does that work. I don't think this is a big issue, can't imagine buffered IO on raw block devices being THAT interesting of a use case. -- Jens Axboe