Re: [RFC V3 7/9] dcbm: add dma core batch migrator for batch page offloading

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jonathan Cameron <jonathan.cameron@huawei.com>
To: Shivank Garg <shivankg@amd.com>
Cc: <akpm@linux-foundation.org>, <david@redhat.com>, <ziy@nvidia.com>,
	<willy@infradead.org>, <matthew.brost@intel.com>,
	<joshua.hahnjy@gmail.com>, <rakie.kim@sk.com>, <byungchul@sk.com>,
	<gourry@gourry.net>, <ying.huang@linux.alibaba.com>,
	<apopple@nvidia.com>, <lorenzo.stoakes@oracle.com>,
	<Liam.Howlett@oracle.com>, <vbabka@suse.cz>, <rppt@kernel.org>,
	<surenb@google.com>, <mhocko@suse.com>, <vkoul@kernel.org>,
	<lucas.demarchi@intel.com>, <rdunlap@infradead.org>,
	<jgg@ziepe.ca>, <kuba@kernel.org>, <justonli@chromium.org>,
	<ivecera@redhat.com>, <dave.jiang@intel.com>,
	<dan.j.williams@intel.com>, <rientjes@google.com>,
	<Raghavendra.KodsaraThimmappa@amd.com>, <bharata@amd.com>,
	<alirad.malek@zptcorp.com>, <yiannis@zptcorp.com>,
	<weixugc@google.com>, <linux-kernel@vger.kernel.org>,
	<linux-mm@kvack.org>
Subject: Re: [RFC V3 7/9] dcbm: add dma core batch migrator for batch page offloading
Date: Thu, 2 Oct 2025 12:38:31 +0100	[thread overview]
Message-ID: <20251002123831.000000f9@huawei.com> (raw)
In-Reply-To: <20250923174752.35701-8-shivankg@amd.com>

On Tue, 23 Sep 2025 17:47:42 +0000
Shivank Garg <shivankg@amd.com> wrote:

> The dcbm (DMA core batch migrator) provides a generic interface using
> DMAEngine for end-to-end testing of the batch page migration offload
> feature.
> 
> Enable DCBM offload:
> echo 1 > /sys/kernel/dcbm/offloading
> echo NR_DMA_CHAN_TO_USE > /sys/kernel/dcbm/nr_dma_chan
> 
> Disable DCBM offload:
> echo 0 > /sys/kernel/dcbm/offloading
> 
> Signed-off-by: Shivank Garg <shivankg@amd.com>
Hi Shivank,

Some minor comments inline.

J
> ---
>  drivers/migoffcopy/Kconfig       |   8 +
>  drivers/migoffcopy/Makefile      |   1 +
>  drivers/migoffcopy/dcbm/Makefile |   1 +
>  drivers/migoffcopy/dcbm/dcbm.c   | 415 +++++++++++++++++++++++++++++++
>  4 files changed, 425 insertions(+)
>  create mode 100644 drivers/migoffcopy/dcbm/Makefile
>  create mode 100644 drivers/migoffcopy/dcbm/dcbm.c
> 
> diff --git a/drivers/migoffcopy/Kconfig b/drivers/migoffcopy/Kconfig
> index e73698af3e72..c1b2eff7650d 100644
> --- a/drivers/migoffcopy/Kconfig
> +++ b/drivers/migoffcopy/Kconfig
> @@ -6,4 +6,12 @@ config MTCOPY_CPU
>           Interface MT COPY CPU driver for batch page migration
>           offloading. Say Y if you want to try offloading with
>           MultiThreaded CPU copy APIs.

I'd put a blank line here.

> +config DCBM_DMA
> +       bool "DMA Core Batch Migrator"
> +       depends on OFFC_MIGRATION && DMA_ENGINE
> +       default n

no need to say this. Everything is default n.

> +       help
> +         Interface DMA driver for batch page migration offloading.
> +         Say Y if you want to try offloading with DMAEngine APIs
> +         based driver.

Similar comment on the 'try'

>  
> diff --git a/drivers/migoffcopy/Makefile b/drivers/migoffcopy/Makefile
> index 0a3c356d67e6..dedc86ff54c1 100644
> --- a/drivers/migoffcopy/Makefile
> +++ b/drivers/migoffcopy/Makefile
> @@ -1 +1,2 @@
>  obj-$(CONFIG_MTCOPY_CPU) += mtcopy/
> +obj-$(CONFIG_DCBM_DMA)		+= dcbm/
> diff --git a/drivers/migoffcopy/dcbm/Makefile b/drivers/migoffcopy/dcbm/Makefile
> new file mode 100644
> index 000000000000..56ba47cce0f1
> --- /dev/null
> +++ b/drivers/migoffcopy/dcbm/Makefile
> @@ -0,0 +1 @@
> +obj-$(CONFIG_DCBM_DMA) += dcbm.o
> diff --git a/drivers/migoffcopy/dcbm/dcbm.c b/drivers/migoffcopy/dcbm/dcbm.c
> new file mode 100644
> index 000000000000..87a58c0c3b9b
> --- /dev/null
> +++ b/drivers/migoffcopy/dcbm/dcbm.c


> +/**
> + * folios_copy_dma - Copy folios using DMA engine
> + * @dst_list: Destination folio list
> + * @src_list: Source folio list
> + * @nr_folios: Number of folios to copy
> + *
> + * Return: 0. Fallback to CPU copy on any error.
> + */
> +static int folios_copy_dma(struct list_head *dst_list,
> +			   struct list_head *src_list,
> +			   unsigned int nr_folios)
> +{
> +	struct dma_work *works;
> +	struct list_head *src_pos = src_list->next;
> +	struct list_head *dst_pos = dst_list->next;
> +	int i, folios_per_chan, ret = 0;
> +	dma_cap_mask_t mask;
> +	int actual_channels = 0;
> +	int max_channels;
> +
> +	max_channels = min3(nr_dma_channels, nr_folios, MAX_DMA_CHANNELS);
> +
> +	works = kcalloc(max_channels, sizeof(*works), GFP_KERNEL);
> +	if (!works)
> +		goto fallback;
> +
> +	dma_cap_zero(mask);
> +	dma_cap_set(DMA_MEMCPY, mask);
> +
> +	for (i = 0; i < max_channels; i++) {
> +		works[actual_channels].chan = dma_request_chan_by_mask(&mask);
> +		if (IS_ERR(works[actual_channels].chan))
> +			break;
> +		init_completion(&works[actual_channels].done);
> +		actual_channels++;
> +	}
> +
> +	if (actual_channels == 0) {
> +		kfree(works);
> +		goto fallback;
> +	}
> +
> +	for (i = 0; i < actual_channels; i++) {
> +		folios_per_chan = nr_folios * (i + 1) / actual_channels -
> +					(nr_folios * i) / actual_channels;
> +		if (folios_per_chan == 0)
> +			continue;
> +
> +		ret = setup_sg_tables(&works[i], &src_pos, &dst_pos, folios_per_chan);
> +	if (ret)
> +		goto cleanup;
> +	}

Indent issues here.

> +
> +	for (i = 0; i < actual_channels; i++) {
> +		ret = submit_dma_transfers(&works[i]);
> +		if (ret) {
> +			dev_err(dmaengine_get_dma_device(works[i].chan),
> +				"Failed to submit transfers for channel %d\n", i);
> +			goto cleanup;
> +		}
> +	}
> +
> +	for (i = 0; i < actual_channels; i++) {
> +		if (atomic_read(&works[i].pending) > 0)
> +			dma_async_issue_pending(works[i].chan);
> +	}
> +
> +	for (i = 0; i < actual_channels; i++) {
> +		if (atomic_read(&works[i].pending) > 0) {
I'd flip logic to something like.
		if (!atomic_read(&works[i].pending)
			continue;

		if (!wait_for_...

Just to reduce the deep indent.

> +			if (!wait_for_completion_timeout(&works[i].done, msecs_to_jiffies(10000))) {
> +				dev_err(dmaengine_get_dma_device(works[i].chan),
> +					"DMA timeout on channel %d\n", i);
> +				ret = -ETIMEDOUT;
> +				goto cleanup;
> +			}
> +		}
> +	}
> +
> +cleanup:
> +	cleanup_dma_work(works, actual_channels);
> +	if (ret)
> +		goto fallback;

This goto goto dance is probably not worth it.  I'd just duplicate the
cleanup_dma_work() call to have a copy in the error path and one in the non error
path.  Then you just end up with a conventional error block of labels + stuff to do.

> +	return 0;
> +fallback:
> +	/* Fall back to CPU copy */
> +	pr_err("DCBM: Falling back to CPU copy\n");
> +	folios_mc_copy(dst_list, src_list, nr_folios);
> +	return 0;
> +}

> +static ssize_t offloading_store(struct kobject *kobj,
> +				struct kobj_attribute *attr,
> +				const char *buf, size_t count)
> +{
> +	bool enable;
> +	int ret;
> +
> +	ret = kstrtobool(buf, &enable);
> +	if (ret)
> +		return ret;
> +
> +	mutex_lock(&dcbm_mutex);
> +
> +	if (enable == offloading_enabled) {
> +		pr_err("migration offloading is already %s\n",
> +			 enable ? "ON" : "OFF");

To me that's not an error. Pointless, but not worth moaning about.
Just exit saying nothing.

> +		goto out;
> +	}
> +
> +	if (enable) {
> +		start_offloading(&dma_migrator);
> +		offloading_enabled = true;
> +		pr_info("migration offloading is now ON\n");
> +	} else {
> +		stop_offloading();
> +		offloading_enabled = false;
> +		pr_info("migration offloading is now OFF\n");
> +	}
> +out:
> +	mutex_unlock(&dcbm_mutex);

Perhaps guard and direct return above


> +	return count;
> +}

> +static struct kobj_attribute offloading_attr = __ATTR_RW(offloading);
> +static struct kobj_attribute nr_dma_chan_attr = __ATTR_RW(nr_dma_chan);
> +
> +static struct attribute *dcbm_attrs[] = {
> +	&offloading_attr.attr,
> +	&nr_dma_chan_attr.attr,
> +	NULL,
Trivial but doesn't need a trailing comma given this is a terminating entry
and nothing should ever come after it.

> +};
> +ATTRIBUTE_GROUPS(dcbm);
> +
> +static struct kobject *dcbm_kobj;
> +
> +static int __init dcbm_init(void)
> +{
> +	int ret;
> +
> +	dcbm_kobj = kobject_create_and_add("dcbm", kernel_kobj);
> +	if (!dcbm_kobj)
> +		return -ENOMEM;
> +
> +	ret = sysfs_create_groups(dcbm_kobj, dcbm_groups);
Why use a group here and separate files in the CPU thread one?

I'd prefer a group there as well given slightly simpler error
handling as seen here.


> +	if (ret) {
> +		kobject_put(dcbm_kobj);
> +		return ret;
> +	}
> +
> +	pr_info("DMA Core Batch Migrator initialized\n");
> +	return 0;
> +}
> +

next prev parent reply	other threads:[~2025-10-02 11:38 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-23 17:47 [RFC V3 0/9] Accelerate page migration with batch copying and hardware offload Shivank Garg
2025-09-23 17:47 ` [RFC V3 1/9] mm/migrate: factor out code in move_to_new_folio() and migrate_folio_move() Shivank Garg
2025-10-02 10:30   ` Jonathan Cameron
2025-09-23 17:47 ` [RFC V3 2/9] mm/migrate: revive MIGRATE_NO_COPY in migrate_mode Shivank Garg
2025-09-23 17:47 ` [RFC V3 3/9] mm: Introduce folios_mc_copy() for batch copying folios Shivank Garg
2025-09-23 17:47 ` [RFC V3 4/9] mm/migrate: add migrate_folios_batch_move to batch the folio move operations Shivank Garg
2025-10-02 11:03   ` Jonathan Cameron
2025-10-16  9:17     ` Garg, Shivank
2025-09-23 17:47 ` [RFC V3 5/9] mm: add support for copy offload for folio Migration Shivank Garg
2025-10-02 11:10   ` Jonathan Cameron
2025-10-16  9:40     ` Garg, Shivank
2025-09-23 17:47 ` [RFC V3 6/9] mtcopy: introduce multi-threaded page copy routine Shivank Garg
2025-10-02 11:29   ` Jonathan Cameron
2025-10-20  8:28   ` Byungchul Park
2025-11-06  6:27     ` Garg, Shivank
2025-11-12  2:12       ` Byungchul Park
2025-09-23 17:47 ` [RFC V3 7/9] dcbm: add dma core batch migrator for batch page offloading Shivank Garg
2025-10-02 11:38   ` Jonathan Cameron [this message]
2025-10-16  9:59     ` Garg, Shivank
2025-09-23 17:47 ` [RFC V3 8/9] adjust NR_MAX_BATCHED_MIGRATION for testing Shivank Garg
2025-09-23 17:47 ` [RFC V3 9/9] mtcopy: spread threads across die " Shivank Garg
2025-09-24  1:49 ` [RFC V3 0/9] Accelerate page migration with batch copying and hardware offload Huang, Ying
2025-09-24  2:03   ` Zi Yan
2025-09-24  3:11     ` Huang, Ying
2025-09-24  3:22 ` Zi Yan
2025-10-02 17:10   ` Garg, Shivank

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251002123831.000000f9@huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=Raghavendra.KodsaraThimmappa@amd.com \
    --cc=akpm@linux-foundation.org \
    --cc=alirad.malek@zptcorp.com \
    --cc=apopple@nvidia.com \
    --cc=bharata@amd.com \
    --cc=byungchul@sk.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=david@redhat.com \
    --cc=gourry@gourry.net \
    --cc=ivecera@redhat.com \
    --cc=jgg@ziepe.ca \
    --cc=joshua.hahnjy@gmail.com \
    --cc=justonli@chromium.org \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=lucas.demarchi@intel.com \
    --cc=matthew.brost@intel.com \
    --cc=mhocko@suse.com \
    --cc=rakie.kim@sk.com \
    --cc=rdunlap@infradead.org \
    --cc=rientjes@google.com \
    --cc=rppt@kernel.org \
    --cc=shivankg@amd.com \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=vkoul@kernel.org \
    --cc=weixugc@google.com \
    --cc=willy@infradead.org \
    --cc=yiannis@zptcorp.com \
    --cc=ying.huang@linux.alibaba.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.