* [PATCH] spi: Ensure memory used for spi_write_then_read() is DMA safe @ 2013-01-27 6:35 Mark Brown 2013-02-05 14:21 ` Grant Likely 0 siblings, 1 reply; 20+ messages in thread From: Mark Brown @ 2013-01-27 6:35 UTC (permalink / raw) To: Grant Likely; +Cc: linux-kernel, spi-devel-general, Mark Brown Use GFP_DMA in order to ensure that the memory we allocate for transfers in spi_write_then_read() can be DMAed. On most platforms this will have no effect. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> --- drivers/spi/spi.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c index 19ee901..14d0fba 100644 --- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -1656,7 +1656,8 @@ int spi_write_then_read(struct spi_device *spi, * using the pre-allocated buffer or the transfer is too large. */ if ((n_tx + n_rx) > SPI_BUFSIZ || !mutex_trylock(&lock)) { - local_buf = kmalloc(max((unsigned)SPI_BUFSIZ, n_tx + n_rx), GFP_KERNEL); + local_buf = kmalloc(max((unsigned)SPI_BUFSIZ, n_tx + n_rx), + GFP_KERNEL | GFP_DMA); if (!local_buf) return -ENOMEM; } else { -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH] spi: Ensure memory used for spi_write_then_read() is DMA safe 2013-01-27 6:35 [PATCH] spi: Ensure memory used for spi_write_then_read() is DMA safe Mark Brown @ 2013-02-05 14:21 ` Grant Likely 2025-03-20 11:43 ` Petr Tesarik 0 siblings, 1 reply; 20+ messages in thread From: Grant Likely @ 2013-02-05 14:21 UTC (permalink / raw) To: Mark Brown; +Cc: linux-kernel, spi-devel-general, Mark Brown On Sun, 27 Jan 2013 14:35:04 +0800, Mark Brown <broonie@opensource.wolfsonmicro.com> wrote: > Use GFP_DMA in order to ensure that the memory we allocate for transfers > in spi_write_then_read() can be DMAed. On most platforms this will have > no effect. > > Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Applied, thanks. g. > --- > drivers/spi/spi.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c > index 19ee901..14d0fba 100644 > --- a/drivers/spi/spi.c > +++ b/drivers/spi/spi.c > @@ -1656,7 +1656,8 @@ int spi_write_then_read(struct spi_device *spi, > * using the pre-allocated buffer or the transfer is too large. > */ > if ((n_tx + n_rx) > SPI_BUFSIZ || !mutex_trylock(&lock)) { > - local_buf = kmalloc(max((unsigned)SPI_BUFSIZ, n_tx + n_rx), GFP_KERNEL); > + local_buf = kmalloc(max((unsigned)SPI_BUFSIZ, n_tx + n_rx), > + GFP_KERNEL | GFP_DMA); > if (!local_buf) > return -ENOMEM; > } else { > -- > 1.7.10.4 > -- Grant Likely, B.Sc, P.Eng. Secret Lab Technologies, Ltd. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] spi: Ensure memory used for spi_write_then_read() is DMA safe 2013-02-05 14:21 ` Grant Likely @ 2025-03-20 11:43 ` Petr Tesarik 2025-03-20 12:29 ` Mark Brown 0 siblings, 1 reply; 20+ messages in thread From: Petr Tesarik @ 2025-03-20 11:43 UTC (permalink / raw) To: Mark Brown; +Cc: Grant Likely, linux-kernel, linux-spi On Tue, 05 Feb 2013 14:21:28 +0000 Grant Likely <grant.likely@secretlab.ca> wrote: > On Sun, 27 Jan 2013 14:35:04 +0800, Mark Brown <broonie@opensource.wolfsonmicro.com> wrote: > > Use GFP_DMA in order to ensure that the memory we allocate for transfers > > in spi_write_then_read() can be DMAed. On most platforms this will have > > no effect. > > > > Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> > > Applied, thanks. Hi all, I'm sorry to revive such an old thread, but I'm trying to clean up DMA zone use in preparation of killing the need for that zone entirely, and this use looks fishy to me. I'm curious if it solves a real-world issue. First, the semantics of GFP_DMA can be confusing. FWIW allocating with GFP_DMA does *not* mean you get a buffer that can be directly passed to a DMA controller (think of cache coherency on arm, or memory encryption with confidential computing). Second, this code path is taken only if transfer size is greater than SPI_BUFSIZ, or if there is contention over the pre-allocated buffer, which is initialized in spi_init() without GFP_DMA: buf = kmalloc(SPI_BUFSIZ, GFP_KERNEL); IIUC most transfers use this buffer, and they have apparently worked fine for the last 10+ years... What about reverting commit 2cd94c8a1b41 ("spi: Ensure memory used for spi_write_then_read() is DMA safe"), unless you have strong evidence that it is needed? Petr T > g. > > > --- > > drivers/spi/spi.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c > > index 19ee901..14d0fba 100644 > > --- a/drivers/spi/spi.c > > +++ b/drivers/spi/spi.c > > @@ -1656,7 +1656,8 @@ int spi_write_then_read(struct spi_device *spi, > > * using the pre-allocated buffer or the transfer is too large. > > */ > > if ((n_tx + n_rx) > SPI_BUFSIZ || !mutex_trylock(&lock)) { > > - local_buf = kmalloc(max((unsigned)SPI_BUFSIZ, n_tx + n_rx), GFP_KERNEL); > > + local_buf = kmalloc(max((unsigned)SPI_BUFSIZ, n_tx + n_rx), > > + GFP_KERNEL | GFP_DMA); > > if (!local_buf) > > return -ENOMEM; > > } else { > > -- > > 1.7.10.4 > > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] spi: Ensure memory used for spi_write_then_read() is DMA safe 2025-03-20 11:43 ` Petr Tesarik @ 2025-03-20 12:29 ` Mark Brown 2025-03-20 13:35 ` Arnd Bergmann 0 siblings, 1 reply; 20+ messages in thread From: Mark Brown @ 2025-03-20 12:29 UTC (permalink / raw) To: Petr Tesarik; +Cc: Grant Likely, linux-kernel, linux-spi, arnd [-- Attachment #1: Type: text/plain, Size: 2687 bytes --] On Thu, Mar 20, 2025 at 12:43:30PM +0100, Petr Tesarik wrote: > Grant Likely <grant.likely@secretlab.ca> wrote: > > On Sun, 27 Jan 2013 14:35:04 +0800, Mark Brown <broonie@opensource.wolfsonmicro.com> wrote: > > > Use GFP_DMA in order to ensure that the memory we allocate for transfers > > > in spi_write_then_read() can be DMAed. On most platforms this will have > > > no effect. > > Applied, thanks. > I'm sorry to revive such an old thread, but I'm trying to clean up DMA > zone use in preparation of killing the need for that zone entirely, and > this use looks fishy to me. I'm curious if it solves a real-world issue. Copying in Arnd who was muttering about this stuff the other day. Since the original patch was over a decade ago I have absolutely no recollection of the circumstances around the change. I imagine I was running into issues on some customer platform. > First, the semantics of GFP_DMA can be confusing. FWIW allocating with > GFP_DMA does *not* mean you get a buffer that can be directly passed to > a DMA controller (think of cache coherency on arm, or memory encryption > with confidential computing). I'm not sure what you're thinking of with cache coherency there? The usual issues are around the DMA controller not being able to address the memory. > Second, this code path is taken only if transfer size is greater than > SPI_BUFSIZ, or if there is contention over the pre-allocated buffer, > which is initialized in spi_init() without GFP_DMA: > buf = kmalloc(SPI_BUFSIZ, GFP_KERNEL); > IIUC most transfers use this buffer, and they have apparently worked > fine for the last 10+ years... On a lot of systems most transfers are short and won't be DMAed at all since PIO ends up being more efficient, and most hardware is perfectly happy to DMA to/from wherever so *shrug*. SPI_BUFSIZ is a maximum of 32 bytes which is going to be under the copybreak limit for quite a few controllers, though admittedly 16 is also a popular number, so a lot of the time we don't actually DMA out of it at all. > What about reverting commit 2cd94c8a1b41 ("spi: Ensure memory used for > spi_write_then_read() is DMA safe"), unless you have strong evidence > that it is needed? The whole goal there is to try to avoid triggering another copy to do the DMA so just reverting rather than replacing with some other construct that achieves the same goal doesn't seem great. I think possibly we should just not do the copy at all any more and trust the core to DTRT with any buffers that are passed in, I think we've got enough stuff in the core though I can't remember if it'll copy with things allocated on the stack well. I'd need to page the status back in. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] spi: Ensure memory used for spi_write_then_read() is DMA safe 2025-03-20 12:29 ` Mark Brown @ 2025-03-20 13:35 ` Arnd Bergmann 2025-03-20 14:35 ` Petr Tesarik 2025-03-20 14:42 ` Mark Brown 0 siblings, 2 replies; 20+ messages in thread From: Arnd Bergmann @ 2025-03-20 13:35 UTC (permalink / raw) To: Mark Brown, Petr Tesarik; +Cc: Grant Likely, linux-kernel, linux-spi On Thu, Mar 20, 2025, at 13:29, Mark Brown wrote: > On Thu, Mar 20, 2025 at 12:43:30PM +0100, Petr Tesarik wrote: >> Grant Likely <grant.likely@secretlab.ca> wrote: >> > On Sun, 27 Jan 2013 14:35:04 +0800, Mark Brown <broonie@opensource.wolfsonmicro.com> wrote: > >> > > Use GFP_DMA in order to ensure that the memory we allocate for transfers >> > > in spi_write_then_read() can be DMAed. On most platforms this will have >> > > no effect. > >> > Applied, thanks. > >> I'm sorry to revive such an old thread, but I'm trying to clean up DMA >> zone use in preparation of killing the need for that zone entirely, and >> this use looks fishy to me. I'm curious if it solves a real-world issue. > > Copying in Arnd who was muttering about this stuff the other day. Since > the original patch was over a decade ago I have absolutely no > recollection of the circumstances around the change. I imagine I was > running into issues on some customer platform. Thanks for adding me! >> Second, this code path is taken only if transfer size is greater than >> SPI_BUFSIZ, or if there is contention over the pre-allocated buffer, >> which is initialized in spi_init() without GFP_DMA: > >> buf = kmalloc(SPI_BUFSIZ, GFP_KERNEL); > >> IIUC most transfers use this buffer, and they have apparently worked >> fine for the last 10+ years... > > On a lot of systems most transfers are short and won't be DMAed at all > since PIO ends up being more efficient, and most hardware is perfectly > happy to DMA to/from wherever so *shrug*. SPI_BUFSIZ is a maximum of 32 > bytes which is going to be under the copybreak limit for quite a few > controllers, though admittedly 16 is also a popular number, so a lot of > the time we don't actually DMA out of it at all. I saw the same thing looked at it the other day and got confused about why 'local_buf' is allocated with GFP_DMA and 'buf' uses plain GFP_KERNEL when they are both used in the same place. It also seems that the copy happens in the regmap_bulk_read() path but not the regmap_bulk_write(), which just passes down the original buffer without copying, as far as I can tell. >> What about reverting commit 2cd94c8a1b41 ("spi: Ensure memory used for >> spi_write_then_read() is DMA safe"), unless you have strong evidence >> that it is needed? > > The whole goal there is to try to avoid triggering another copy to do > the DMA so just reverting rather than replacing with some other > construct that achieves the same goal doesn't seem great. I think > possibly we should just not do the copy at all any more and trust the > core to DTRT with any buffers that are passed in, I think we've got > enough stuff in the core though I can't remember if it'll copy with > things allocated on the stack well. I'd need to page the status back > in. From what I found, there are two scenarios that may depend on GFP_DMA today: a) a performance optimization where allocating from GFP_DMA avoids the swiotlb bounce buffering. This would be the normal case on any 64-bit machine with more than 4GB of RAM and an SPI controller with a 32-bit DMA mask. b) An SPI controller on a 32-bit machine without swiotlb and an effective DMA mask that covers less than the lowmem area. E.g. on Raspberry Pi 4, the brcm,bcm2835-spi lives on a bus with an 1GB dma-ranges translation, but there may be more than 1GB of lowmem with CONFIG_VMSPLIT_2G=y and CONFIG_SWIOTLB=n. Without GFP_DMA that would just end up causing data corruption. I've started playing around with a patch that annotates all kmalloc(..., GFP_DMA) users that use buffers for SPI transfers, as opposed to those that do it for another reason (ISA driver, odd DMA mask, ...). There are probably some missing below, and some of the regmap users are likely not SPI but something else, but overall there are not a lot of them. I think we have some corner cases where a driver allocates a GFP_DMA buffer, calls spi_write_then_read through regmap, which copies the data to the non-GFP_DMA global buffer, and then the SPI controller driver calls dma_map_single() on that, ending up with a third bounce buffer from swiotlb. I don't know what a good replacement interface would be, but ideally there should never be more than one copy here, which means that any temporary buffer would need to be allocated according to the dma_mask of the underlying bus master (dmaengine, spi controller, ...). Arnd diff --git a/drivers/firmware/cirrus/cs_dsp.c b/drivers/firmware/cirrus/cs_dsp.c index 42433c19eb30..10611858bef6 100644 --- a/drivers/firmware/cirrus/cs_dsp.c +++ b/drivers/firmware/cirrus/cs_dsp.c @@ -763,7 +763,7 @@ static int cs_dsp_coeff_write_ctrl_raw(struct cs_dsp_coeff_ctl *ctl, if (ret) return ret; - scratch = kmemdup(buf, len, GFP_KERNEL | GFP_DMA); + scratch = kmemdup(buf, len, GFP_KERNEL | GFP_SPI_DMA); if (!scratch) return -ENOMEM; @@ -868,7 +868,7 @@ static int cs_dsp_coeff_read_ctrl_raw(struct cs_dsp_coeff_ctl *ctl, if (ret) return ret; - scratch = kmalloc(len, GFP_KERNEL | GFP_DMA); + scratch = kmalloc(len, GFP_KERNEL | GFP_SPI_DMA); if (!scratch) return -ENOMEM; @@ -1724,7 +1724,7 @@ static void *cs_dsp_read_algs(struct cs_dsp *dsp, size_t n_algs, /* Convert length from DSP words to bytes */ len *= sizeof(u32); - alg = kzalloc(len, GFP_KERNEL | GFP_DMA); + alg = kzalloc(len, GFP_KERNEL | GFP_SPI_DMA); if (!alg) return ERR_PTR(-ENOMEM); diff --git a/drivers/iio/common/ssp_sensors/ssp_iio.c b/drivers/iio/common/ssp_sensors/ssp_iio.c index 78ac689de2fe..246d03187b54 100644 --- a/drivers/iio/common/ssp_sensors/ssp_iio.c +++ b/drivers/iio/common/ssp_sensors/ssp_iio.c @@ -27,7 +27,7 @@ int ssp_common_buffer_postenable(struct iio_dev *indio_dev) /* the allocation is made in post because scan size is known in this * moment * */ - spd->buffer = kmalloc(indio_dev->scan_bytes, GFP_KERNEL | GFP_DMA); + spd->buffer = kmalloc(indio_dev->scan_bytes, GFP_KERNEL | GFP_SPI_DMA); if (!spd->buffer) return -ENOMEM; diff --git a/drivers/iio/common/ssp_sensors/ssp_spi.c b/drivers/iio/common/ssp_sensors/ssp_spi.c index f32b04b63ea1..b70dd891801f 100644 --- a/drivers/iio/common/ssp_sensors/ssp_spi.c +++ b/drivers/iio/common/ssp_sensors/ssp_spi.c @@ -87,7 +87,7 @@ static struct ssp_msg *ssp_create_msg(u8 cmd, u16 len, u16 opt, u32 data) h.data = cpu_to_le32(data); msg->buffer = kzalloc(SSP_HEADER_SIZE_ALIGNED + len, - GFP_KERNEL | GFP_DMA); + GFP_KERNEL | GFP_SPI_DMA); if (!msg->buffer) { kfree(msg); return NULL; @@ -375,7 +375,7 @@ int ssp_irq_msg(struct ssp_data *data) * but the slave should not send such ones - it is to * check but let's handle this */ - buffer = kmalloc(length, GFP_KERNEL | GFP_DMA); + buffer = kmalloc(length, GFP_KERNEL | GFP_SPI_DMA); if (!buffer) { ret = -ENOMEM; goto _unlock; @@ -420,7 +420,7 @@ int ssp_irq_msg(struct ssp_data *data) mutex_unlock(&data->pending_lock); break; case SSP_HUB2AP_WRITE: - buffer = kzalloc(length, GFP_KERNEL | GFP_DMA); + buffer = kzalloc(length, GFP_KERNEL | GFP_SPI_DMA); if (!buffer) return -ENOMEM; diff --git a/drivers/input/rmi4/rmi_spi.c b/drivers/input/rmi4/rmi_spi.c index 9d92129aa432..a9abc021bfad 100644 --- a/drivers/input/rmi4/rmi_spi.c +++ b/drivers/input/rmi4/rmi_spi.c @@ -67,7 +67,7 @@ static int rmi_spi_manage_pools(struct rmi_spi_xport *rmi_spi, int len) tmp = rmi_spi->rx_buf; buf = devm_kcalloc(&spi->dev, buf_size, 2, - GFP_KERNEL | GFP_DMA); + GFP_KERNEL | GFP_SPI_DMA); if (!buf) return -ENOMEM; diff --git a/drivers/media/spi/cxd2880-spi.c b/drivers/media/spi/cxd2880-spi.c index 65fa7f857fca..7063d46c4166 100644 --- a/drivers/media/spi/cxd2880-spi.c +++ b/drivers/media/spi/cxd2880-spi.c @@ -389,7 +389,7 @@ static int cxd2880_start_feed(struct dvb_demux_feed *feed) if (dvb_spi->feed_count == 0) { dvb_spi->ts_buf = kzalloc(MAX_TRANS_PKT * 188, - GFP_KERNEL | GFP_DMA); + GFP_KERNEL | GFP_SPI_DMA); if (!dvb_spi->ts_buf) { pr_err("ts buffer allocate failed\n"); memset(&dvb_spi->filter_config, 0, diff --git a/drivers/misc/gehc-achc.c b/drivers/misc/gehc-achc.c index b8fca4d393c6..42af67fd232d 100644 --- a/drivers/misc/gehc-achc.c +++ b/drivers/misc/gehc-achc.c @@ -225,7 +225,7 @@ static int ezport_flash_transfer(struct spi_device *spi, u32 address, if (ret < 0) return ret; - command = kmalloc(4, GFP_KERNEL | GFP_DMA); + command = kmalloc(4, GFP_KERNEL | GFP_SPI_DMA); if (!command) return -ENOMEM; @@ -255,7 +255,7 @@ static int ezport_flash_compare(struct spi_device *spi, u32 address, u8 *buffer; int ret; - buffer = kmalloc(payload_size + 5, GFP_KERNEL | GFP_DMA); + buffer = kmalloc(payload_size + 5, GFP_KERNEL | GFP_SPI_DMA); if (!buffer) return -ENOMEM; diff --git a/drivers/net/wireless/st/cw1200/fwio.c b/drivers/net/wireless/st/cw1200/fwio.c index 2a03dc533b6a..6cdbb4980b02 100644 --- a/drivers/net/wireless/st/cw1200/fwio.c +++ b/drivers/net/wireless/st/cw1200/fwio.c @@ -148,7 +148,7 @@ static int cw1200_load_firmware_cw1200(struct cw1200_common *priv) goto exit; } - buf = kmalloc(DOWNLOAD_BLOCK_SIZE, GFP_KERNEL | GFP_DMA); + buf = kmalloc(DOWNLOAD_BLOCK_SIZE, GFP_KERNEL | GFP_SPI_DMA); if (!buf) { pr_err("Can't allocate firmware load buffer.\n"); ret = -ENOMEM; diff --git a/drivers/net/wireless/st/cw1200/wsm.c b/drivers/net/wireless/st/cw1200/wsm.c index 4a9e4b5d3547..9a8c6510d1d6 100644 --- a/drivers/net/wireless/st/cw1200/wsm.c +++ b/drivers/net/wireless/st/cw1200/wsm.c @@ -1775,7 +1775,7 @@ void wsm_txed(struct cw1200_common *priv, u8 *data) void wsm_buf_init(struct wsm_buf *buf) { BUG_ON(buf->begin); - buf->begin = kmalloc(FWLOAD_BLOCK_SIZE, GFP_KERNEL | GFP_DMA); + buf->begin = kmalloc(FWLOAD_BLOCK_SIZE, GFP_KERNEL | GFP_SPI_DMA); buf->end = buf->begin ? &buf->begin[FWLOAD_BLOCK_SIZE] : buf->begin; wsm_buf_reset(buf); } @@ -1804,7 +1804,7 @@ static int wsm_buf_reserve(struct wsm_buf *buf, size_t extra_size) size = round_up(size, FWLOAD_BLOCK_SIZE); - tmp = krealloc(buf->begin, size, GFP_KERNEL | GFP_DMA); + tmp = krealloc(buf->begin, size, GFP_KERNEL | GFP_SPI_DMA); if (!tmp) { wsm_buf_deinit(buf); return -ENOMEM; diff --git a/drivers/net/wireless/ti/wlcore/cmd.c b/drivers/net/wireless/ti/wlcore/cmd.c index cd8ad0fe59cc..a0063878e47c 100644 --- a/drivers/net/wireless/ti/wlcore/cmd.c +++ b/drivers/net/wireless/ti/wlcore/cmd.c @@ -172,7 +172,7 @@ int wlcore_cmd_wait_for_event_or_timeout(struct wl1271 *wl, *timeout = false; - events_vector = kmalloc(sizeof(*events_vector), GFP_KERNEL | GFP_DMA); + events_vector = kmalloc(sizeof(*events_vector), GFP_KERNEL); if (!events_vector) return -ENOMEM; diff --git a/drivers/net/wireless/ti/wlcore/main.c b/drivers/net/wireless/ti/wlcore/main.c index 8fb58a5d911c..02962702b72d 100644 --- a/drivers/net/wireless/ti/wlcore/main.c +++ b/drivers/net/wireless/ti/wlcore/main.c @@ -6477,7 +6477,7 @@ struct ieee80211_hw *wlcore_alloc_hw(size_t priv_size, u32 aggr_buf_size, } wl->mbox_size = mbox_size; - wl->mbox = kmalloc(wl->mbox_size, GFP_KERNEL | GFP_DMA); + wl->mbox = kmalloc(wl->mbox_size, GFP_KERNEL | GFP_SPI_DMA); if (!wl->mbox) { ret = -ENOMEM; goto err_fwlog; diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h index 65db9349f905..6c3d3d7cc6fe 100644 --- a/include/linux/gfp_types.h +++ b/include/linux/gfp_types.h @@ -382,6 +382,7 @@ enum { #define GFP_NOFS (__GFP_RECLAIM | __GFP_IO) #define GFP_USER (__GFP_RECLAIM | __GFP_IO | __GFP_FS | __GFP_HARDWALL) #define GFP_DMA __GFP_DMA +#define GFP_SPI_DMA __GFP_DMA #define GFP_DMA32 __GFP_DMA32 #define GFP_HIGHUSER (GFP_USER | __GFP_HIGHMEM) #define GFP_HIGHUSER_MOVABLE (GFP_HIGHUSER | __GFP_MOVABLE | __GFP_SKIP_KASAN) diff --git a/include/sound/cs35l56.h b/include/sound/cs35l56.h index 5d653a3491d0..415497307c45 100644 --- a/include/sound/cs35l56.h +++ b/include/sound/cs35l56.h @@ -295,7 +295,7 @@ static inline int cs35l56_init_config_for_spi(struct cs35l56_base *cs35l56, { cs35l56->spi_payload_buf = devm_kzalloc(&spi->dev, sizeof(*cs35l56->spi_payload_buf), - GFP_KERNEL | GFP_DMA); + GFP_KERNEL | GFP_SPI_DMA); if (!cs35l56->spi_payload_buf) return -ENOMEM; diff --git a/sound/soc/codecs/arizona.c b/sound/soc/codecs/arizona.c index 74b0968f425a..c679d278c53e 100644 --- a/sound/soc/codecs/arizona.c +++ b/sound/soc/codecs/arizona.c @@ -2734,7 +2734,7 @@ int arizona_eq_coeff_put(struct snd_kcontrol *kcontrol, len = params->num_regs * regmap_get_val_bytes(arizona->regmap); - data = kmemdup(ucontrol->value.bytes.data, len, GFP_KERNEL | GFP_DMA); + data = kmemdup(ucontrol->value.bytes.data, len, GFP_KERNEL | GFP_SPI_DMA); if (!data) return -ENOMEM; diff --git a/sound/soc/codecs/madera.c b/sound/soc/codecs/madera.c index bc3470cf2c54..c75f6a617c73 100644 --- a/sound/soc/codecs/madera.c +++ b/sound/soc/codecs/madera.c @@ -4754,7 +4754,7 @@ int madera_eq_coeff_put(struct snd_kcontrol *kcontrol, len = params->num_regs * regmap_get_val_bytes(madera->regmap); - data = kmemdup(ucontrol->value.bytes.data, len, GFP_KERNEL | GFP_DMA); + data = kmemdup(ucontrol->value.bytes.data, len, GFP_KERNEL | GFP_SPI_DMA); if (!data) return -ENOMEM; diff --git a/sound/soc/codecs/nau8810.c b/sound/soc/codecs/nau8810.c index 6f432b992941..2b66c50bbf1b 100644 --- a/sound/soc/codecs/nau8810.c +++ b/sound/soc/codecs/nau8810.c @@ -205,7 +205,7 @@ static int nau8810_eq_put(struct snd_kcontrol *kcontrol, __be16 *tmp; data = kmemdup(ucontrol->value.bytes.data, - params->max, GFP_KERNEL | GFP_DMA); + params->max, GFP_KERNEL | GFP_SPI_DMA); if (!data) return -ENOMEM; diff --git a/sound/soc/codecs/nau8821.c b/sound/soc/codecs/nau8821.c index edb95f869a4a..7cca9dac7e5f 100644 --- a/sound/soc/codecs/nau8821.c +++ b/sound/soc/codecs/nau8821.c @@ -303,7 +303,7 @@ static int nau8821_biq_coeff_put(struct snd_kcontrol *kcontrol, return -EINVAL; data = kmemdup(ucontrol->value.bytes.data, - params->max, GFP_KERNEL | GFP_DMA); + params->max, GFP_KERNEL | GFP_SPI_DMA); if (!data) return -ENOMEM; diff --git a/sound/soc/codecs/nau8822.c b/sound/soc/codecs/nau8822.c index 15d6f8d01f78..016f537ebb30 100644 --- a/sound/soc/codecs/nau8822.c +++ b/sound/soc/codecs/nau8822.c @@ -221,7 +221,7 @@ static int nau8822_eq_put(struct snd_kcontrol *kcontrol, __be16 *tmp; data = kmemdup(ucontrol->value.bytes.data, - params->max, GFP_KERNEL | GFP_DMA); + params->max, GFP_KERNEL | GFP_SPI_DMA); if (!data) return -ENOMEM; diff --git a/sound/soc/codecs/nau8825.c b/sound/soc/codecs/nau8825.c index 25b8b19e27ec..43a73aabf891 100644 --- a/sound/soc/codecs/nau8825.c +++ b/sound/soc/codecs/nau8825.c @@ -1017,7 +1017,7 @@ static int nau8825_biq_coeff_put(struct snd_kcontrol *kcontrol, return -EINVAL; data = kmemdup(ucontrol->value.bytes.data, - params->max, GFP_KERNEL | GFP_DMA); + params->max, GFP_KERNEL | GFP_SPI_DMA); if (!data) return -ENOMEM; diff --git a/sound/soc/codecs/tas571x.c b/sound/soc/codecs/tas571x.c index 6c6e7ae07d80..a0f75b089aa4 100644 --- a/sound/soc/codecs/tas571x.c +++ b/sound/soc/codecs/tas571x.c @@ -150,7 +150,7 @@ static int tas571x_reg_write_multiword(struct i2c_client *client, int ret; size_t send_size = 1 + len * sizeof(uint32_t); - buf = kzalloc(send_size, GFP_KERNEL | GFP_DMA); + buf = kzalloc(send_size, GFP_KERNEL | GFP_SPI_DMA); if (!buf) return -ENOMEM; buf[0] = reg; @@ -183,7 +183,7 @@ static int tas571x_reg_read_multiword(struct i2c_client *client, unsigned int recv_size = len * sizeof(uint32_t); int ret; - recv_buf = kzalloc(recv_size, GFP_KERNEL | GFP_DMA); + recv_buf = kzalloc(recv_size, GFP_KERNEL | GFP_DMA); // XXXX if (!recv_buf) return -ENOMEM; diff --git a/sound/soc/codecs/wm0010.c b/sound/soc/codecs/wm0010.c index 8d1a575532ff..d1ef988aa84e 100644 --- a/sound/soc/codecs/wm0010.c +++ b/sound/soc/codecs/wm0010.c @@ -405,14 +405,14 @@ static int wm0010_firmware_load(const char *name, struct snd_soc_component *comp xfer->component = component; list_add_tail(&xfer->list, &xfer_list); - out = kzalloc(len, GFP_KERNEL | GFP_DMA); + out = kzalloc(len, GFP_KERNEL | GFP_SPI_DMA); if (!out) { ret = -ENOMEM; goto abort1; } xfer->t.rx_buf = out; - img = kzalloc(len, GFP_KERNEL | GFP_DMA); + img = kzalloc(len, GFP_KERNEL | GFP_SPI_DMA); if (!img) { ret = -ENOMEM; goto abort1; @@ -504,13 +504,13 @@ static int wm0010_stage2_load(struct snd_soc_component *component) dev_dbg(component->dev, "Downloading %zu byte stage 2 loader\n", fw->size); /* Copy to local buffer first as vmalloc causes problems for dma */ - img = kmemdup(&fw->data[0], fw->size, GFP_KERNEL | GFP_DMA); + img = kmemdup(&fw->data[0], fw->size, GFP_KERNEL | GFP_SPI_DMA); if (!img) { ret = -ENOMEM; goto abort2; } - out = kzalloc(fw->size, GFP_KERNEL | GFP_DMA); + out = kzalloc(fw->size, GFP_KERNEL | GFP_SPI_DMA); if (!out) { ret = -ENOMEM; goto abort1; @@ -638,11 +638,11 @@ static int wm0010_boot(struct snd_soc_component *component) ret = -ENOMEM; len = pll_rec.length + 8; - out = kzalloc(len, GFP_KERNEL | GFP_DMA); + out = kzalloc(len, GFP_KERNEL | GFP_SPI_DMA); if (!out) goto abort; - img_swap = kzalloc(len, GFP_KERNEL | GFP_DMA); + img_swap = kzalloc(len, GFP_KERNEL | GFP_SPI_DMA); if (!img_swap) goto abort_out; diff --git a/sound/soc/codecs/wm_adsp.c b/sound/soc/codecs/wm_adsp.c index 91c8697c29c3..7e8b44b911c0 100644 --- a/sound/soc/codecs/wm_adsp.c +++ b/sound/soc/codecs/wm_adsp.c @@ -1373,7 +1373,7 @@ int wm_adsp_compr_set_params(struct snd_soc_component *component, compr->size.fragment_size, compr->size.fragments); size = wm_adsp_compr_frag_words(compr) * sizeof(*compr->raw_buf); - compr->raw_buf = kmalloc(size, GFP_DMA | GFP_KERNEL); + compr->raw_buf = kmalloc(size, GFP_SPI_DMA | GFP_KERNEL); if (!compr->raw_buf) return -ENOMEM; diff --git a/sound/soc/soc-ops.c b/sound/soc/soc-ops.c index cd5f927bcd4e..231e80ca0386 100644 --- a/sound/soc/soc-ops.c +++ b/sound/soc/soc-ops.c @@ -749,7 +749,7 @@ int snd_soc_bytes_put(struct snd_kcontrol *kcontrol, len = params->num_regs * component->val_bytes; void *data __free(kfree) = kmemdup(ucontrol->value.bytes.data, len, - GFP_KERNEL | GFP_DMA); + GFP_KERNEL | GFP_SPI_DMA); if (!data) return -ENOMEM; ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH] spi: Ensure memory used for spi_write_then_read() is DMA safe 2025-03-20 13:35 ` Arnd Bergmann @ 2025-03-20 14:35 ` Petr Tesarik 2025-03-20 15:34 ` Mark Brown 2025-03-20 14:42 ` Mark Brown 1 sibling, 1 reply; 20+ messages in thread From: Petr Tesarik @ 2025-03-20 14:35 UTC (permalink / raw) To: Arnd Bergmann Cc: Mark Brown, Grant Likely, linux-kernel, linux-spi, Robin Murphy CC'ing Robin Murphy, because there seem to be some doubts about DMA API efficiency. On Thu, 20 Mar 2025 14:35:41 +0100 "Arnd Bergmann" <arnd@arndb.de> wrote: > On Thu, Mar 20, 2025, at 13:29, Mark Brown wrote: > > On Thu, Mar 20, 2025 at 12:43:30PM +0100, Petr Tesarik wrote: > >> Grant Likely <grant.likely@secretlab.ca> wrote: > >> > On Sun, 27 Jan 2013 14:35:04 +0800, Mark Brown <broonie@opensource.wolfsonmicro.com> wrote: > > > >> > > Use GFP_DMA in order to ensure that the memory we allocate for transfers > >> > > in spi_write_then_read() can be DMAed. On most platforms this will have > >> > > no effect. > > > >> > Applied, thanks. > > > >> I'm sorry to revive such an old thread, but I'm trying to clean up DMA > >> zone use in preparation of killing the need for that zone entirely, and > >> this use looks fishy to me. I'm curious if it solves a real-world issue. > > > > Copying in Arnd who was muttering about this stuff the other day. Since > > the original patch was over a decade ago I have absolutely no > > recollection of the circumstances around the change. I imagine I was > > running into issues on some customer platform. > > Thanks for adding me! > > >> Second, this code path is taken only if transfer size is greater than > >> SPI_BUFSIZ, or if there is contention over the pre-allocated buffer, > >> which is initialized in spi_init() without GFP_DMA: > > > >> buf = kmalloc(SPI_BUFSIZ, GFP_KERNEL); > > > >> IIUC most transfers use this buffer, and they have apparently worked > >> fine for the last 10+ years... > > > > On a lot of systems most transfers are short and won't be DMAed at all > > since PIO ends up being more efficient, and most hardware is perfectly > > happy to DMA to/from wherever so *shrug*. SPI_BUFSIZ is a maximum of 32 > > bytes which is going to be under the copybreak limit for quite a few > > controllers, though admittedly 16 is also a popular number, so a lot of > > the time we don't actually DMA out of it at all. > > I saw the same thing looked at it the other day and got confused > about why 'local_buf' is allocated with GFP_DMA and 'buf' > uses plain GFP_KERNEL when they are both used in the same place. > > It also seems that the copy happens in the regmap_bulk_read() > path but not the regmap_bulk_write(), which just passes down > the original buffer without copying, as far as I can tell. > > >> What about reverting commit 2cd94c8a1b41 ("spi: Ensure memory used for > >> spi_write_then_read() is DMA safe"), unless you have strong evidence > >> that it is needed? > > > > The whole goal there is to try to avoid triggering another copy to do > > the DMA so just reverting rather than replacing with some other > > construct that achieves the same goal doesn't seem great. I think > > possibly we should just not do the copy at all any more and trust the > > core to DTRT with any buffers that are passed in, I think we've got > > enough stuff in the core though I can't remember if it'll copy with > > things allocated on the stack well. I'd need to page the status back > > in. No, I'm afraid kernel stack addresses (and many other types of addresses) cannot be used for DMA: https://docs.kernel.org/core-api/dma-api-howto.html#what-memory-is-dma-able > From what I found, there are two scenarios that may depend on > GFP_DMA today: > > a) a performance optimization where allocating from GFP_DMA avoids > the swiotlb bounce buffering. This would be the normal case on > any 64-bit machine with more than 4GB of RAM and an SPI > controller with a 32-bit DMA mask. I must be missing something. How is a memcpy() in spi_write_then_read() faster than a memcpy() by swiotlb? > b) An SPI controller on a 32-bit machine without swiotlb and an > effective DMA mask that covers less than the lowmem area. > E.g. on Raspberry Pi 4, the brcm,bcm2835-spi lives on a > bus with an 1GB dma-ranges translation, but there may be more > than 1GB of lowmem with CONFIG_VMSPLIT_2G=y and CONFIG_SWIOTLB=n. > Without GFP_DMA that would just end up causing data corruption. Thanks for mentioning RPi4. You may remember that the wrapper around the PCIe block in early BCM2711 chip revisions had a bug preventing it from accessing memory beyond the first 3GB (physical): https://www.spinics.net/lists/arm-kernel/msg740693.html What does it mean? The limit does not depend only on the device, but also on how it is connected to the CPU. We had some trouble defining the DMA zone on Arm because of that... Even worse, you can have multiple buses with different limits. An offset may be added by a bus bridge (this does happen in the wild). OTOH an IOMMU may map any physical address into the target device's limited address space... In short, I believe you should not try to reinvent the DMA API here. > I've started playing around with a patch that annotates all > kmalloc(..., GFP_DMA) users that use buffers for SPI transfers, > as opposed to those that do it for another reason (ISA driver, > odd DMA mask, ...). There are probably some missing below, and some > of the regmap users are likely not SPI but something else, but > overall there are not a lot of them. > > I think we have some corner cases where a driver allocates > a GFP_DMA buffer, calls spi_write_then_read through regmap, > which copies the data to the non-GFP_DMA global buffer, > and then the SPI controller driver calls dma_map_single() > on that, ending up with a third bounce buffer from > swiotlb. > > I don't know what a good replacement interface would be, but > ideally there should never be more than one copy here, > which means that any temporary buffer would need to be > allocated according to the dma_mask of the underlying > bus master (dmaengine, spi controller, ...). Thank you for the attached patch. That's a lot of places that may be using GFP_DMA incorrectly, and you have put a lot of effort into understanding every single one of them. Greatly appreciated! I still believe the SPI subsystem should not try to be clever. The DMA API already avoids unnecessary copying as much as possible. Petr T > > Arnd > > diff --git a/drivers/firmware/cirrus/cs_dsp.c > b/drivers/firmware/cirrus/cs_dsp.c index 42433c19eb30..10611858bef6 > 100644 --- a/drivers/firmware/cirrus/cs_dsp.c > +++ b/drivers/firmware/cirrus/cs_dsp.c > @@ -763,7 +763,7 @@ static int cs_dsp_coeff_write_ctrl_raw(struct > cs_dsp_coeff_ctl *ctl, if (ret) > return ret; > > - scratch = kmemdup(buf, len, GFP_KERNEL | GFP_DMA); > + scratch = kmemdup(buf, len, GFP_KERNEL | GFP_SPI_DMA); > if (!scratch) > return -ENOMEM; > > @@ -868,7 +868,7 @@ static int cs_dsp_coeff_read_ctrl_raw(struct > cs_dsp_coeff_ctl *ctl, if (ret) > return ret; > > - scratch = kmalloc(len, GFP_KERNEL | GFP_DMA); > + scratch = kmalloc(len, GFP_KERNEL | GFP_SPI_DMA); > if (!scratch) > return -ENOMEM; > > @@ -1724,7 +1724,7 @@ static void *cs_dsp_read_algs(struct cs_dsp > *dsp, size_t n_algs, /* Convert length from DSP words to bytes */ > len *= sizeof(u32); > > - alg = kzalloc(len, GFP_KERNEL | GFP_DMA); > + alg = kzalloc(len, GFP_KERNEL | GFP_SPI_DMA); > if (!alg) > return ERR_PTR(-ENOMEM); > > diff --git a/drivers/iio/common/ssp_sensors/ssp_iio.c > b/drivers/iio/common/ssp_sensors/ssp_iio.c index > 78ac689de2fe..246d03187b54 100644 --- > a/drivers/iio/common/ssp_sensors/ssp_iio.c +++ > b/drivers/iio/common/ssp_sensors/ssp_iio.c @@ -27,7 +27,7 @@ int > ssp_common_buffer_postenable(struct iio_dev *indio_dev) /* the > allocation is made in post because scan size is known in this > * moment > * */ > - spd->buffer = kmalloc(indio_dev->scan_bytes, GFP_KERNEL | > GFP_DMA); > + spd->buffer = kmalloc(indio_dev->scan_bytes, GFP_KERNEL | > GFP_SPI_DMA); if (!spd->buffer) > return -ENOMEM; > > diff --git a/drivers/iio/common/ssp_sensors/ssp_spi.c > b/drivers/iio/common/ssp_sensors/ssp_spi.c index > f32b04b63ea1..b70dd891801f 100644 --- > a/drivers/iio/common/ssp_sensors/ssp_spi.c +++ > b/drivers/iio/common/ssp_sensors/ssp_spi.c @@ -87,7 +87,7 @@ static > struct ssp_msg *ssp_create_msg(u8 cmd, u16 len, u16 opt, u32 data) > h.data = cpu_to_le32(data); > msg->buffer = kzalloc(SSP_HEADER_SIZE_ALIGNED + len, > - GFP_KERNEL | GFP_DMA); > + GFP_KERNEL | GFP_SPI_DMA); > if (!msg->buffer) { > kfree(msg); > return NULL; > @@ -375,7 +375,7 @@ int ssp_irq_msg(struct ssp_data *data) > * but the slave should not send such ones - > it is to > * check but let's handle this > */ > - buffer = kmalloc(length, GFP_KERNEL | > GFP_DMA); > + buffer = kmalloc(length, GFP_KERNEL | > GFP_SPI_DMA); if (!buffer) { > ret = -ENOMEM; > goto _unlock; > @@ -420,7 +420,7 @@ int ssp_irq_msg(struct ssp_data *data) > mutex_unlock(&data->pending_lock); > break; > case SSP_HUB2AP_WRITE: > - buffer = kzalloc(length, GFP_KERNEL | GFP_DMA); > + buffer = kzalloc(length, GFP_KERNEL | GFP_SPI_DMA); > if (!buffer) > return -ENOMEM; > > diff --git a/drivers/input/rmi4/rmi_spi.c > b/drivers/input/rmi4/rmi_spi.c index 9d92129aa432..a9abc021bfad 100644 > --- a/drivers/input/rmi4/rmi_spi.c > +++ b/drivers/input/rmi4/rmi_spi.c > @@ -67,7 +67,7 @@ static int rmi_spi_manage_pools(struct > rmi_spi_xport *rmi_spi, int len) > tmp = rmi_spi->rx_buf; > buf = devm_kcalloc(&spi->dev, buf_size, 2, > - GFP_KERNEL | GFP_DMA); > + GFP_KERNEL | GFP_SPI_DMA); > if (!buf) > return -ENOMEM; > > diff --git a/drivers/media/spi/cxd2880-spi.c > b/drivers/media/spi/cxd2880-spi.c index 65fa7f857fca..7063d46c4166 > 100644 --- a/drivers/media/spi/cxd2880-spi.c > +++ b/drivers/media/spi/cxd2880-spi.c > @@ -389,7 +389,7 @@ static int cxd2880_start_feed(struct > dvb_demux_feed *feed) if (dvb_spi->feed_count == 0) { > dvb_spi->ts_buf = > kzalloc(MAX_TRANS_PKT * 188, > - GFP_KERNEL | GFP_DMA); > + GFP_KERNEL | GFP_SPI_DMA); > if (!dvb_spi->ts_buf) { > pr_err("ts buffer allocate failed\n"); > memset(&dvb_spi->filter_config, 0, > diff --git a/drivers/misc/gehc-achc.c b/drivers/misc/gehc-achc.c > index b8fca4d393c6..42af67fd232d 100644 > --- a/drivers/misc/gehc-achc.c > +++ b/drivers/misc/gehc-achc.c > @@ -225,7 +225,7 @@ static int ezport_flash_transfer(struct > spi_device *spi, u32 address, if (ret < 0) > return ret; > > - command = kmalloc(4, GFP_KERNEL | GFP_DMA); > + command = kmalloc(4, GFP_KERNEL | GFP_SPI_DMA); > if (!command) > return -ENOMEM; > > @@ -255,7 +255,7 @@ static int ezport_flash_compare(struct spi_device > *spi, u32 address, u8 *buffer; > int ret; > > - buffer = kmalloc(payload_size + 5, GFP_KERNEL | GFP_DMA); > + buffer = kmalloc(payload_size + 5, GFP_KERNEL | GFP_SPI_DMA); > if (!buffer) > return -ENOMEM; > > diff --git a/drivers/net/wireless/st/cw1200/fwio.c > b/drivers/net/wireless/st/cw1200/fwio.c index > 2a03dc533b6a..6cdbb4980b02 100644 --- > a/drivers/net/wireless/st/cw1200/fwio.c +++ > b/drivers/net/wireless/st/cw1200/fwio.c @@ -148,7 +148,7 @@ static > int cw1200_load_firmware_cw1200(struct cw1200_common *priv) goto exit; > } > > - buf = kmalloc(DOWNLOAD_BLOCK_SIZE, GFP_KERNEL | GFP_DMA); > + buf = kmalloc(DOWNLOAD_BLOCK_SIZE, GFP_KERNEL | GFP_SPI_DMA); > if (!buf) { > pr_err("Can't allocate firmware load buffer.\n"); > ret = -ENOMEM; > diff --git a/drivers/net/wireless/st/cw1200/wsm.c > b/drivers/net/wireless/st/cw1200/wsm.c index > 4a9e4b5d3547..9a8c6510d1d6 100644 --- > a/drivers/net/wireless/st/cw1200/wsm.c +++ > b/drivers/net/wireless/st/cw1200/wsm.c @@ -1775,7 +1775,7 @@ void > wsm_txed(struct cw1200_common *priv, u8 *data) void > wsm_buf_init(struct wsm_buf *buf) { > BUG_ON(buf->begin); > - buf->begin = kmalloc(FWLOAD_BLOCK_SIZE, GFP_KERNEL | > GFP_DMA); > + buf->begin = kmalloc(FWLOAD_BLOCK_SIZE, GFP_KERNEL | > GFP_SPI_DMA); buf->end = buf->begin ? &buf->begin[FWLOAD_BLOCK_SIZE] > : buf->begin; wsm_buf_reset(buf); > } > @@ -1804,7 +1804,7 @@ static int wsm_buf_reserve(struct wsm_buf *buf, > size_t extra_size) > size = round_up(size, FWLOAD_BLOCK_SIZE); > > - tmp = krealloc(buf->begin, size, GFP_KERNEL | GFP_DMA); > + tmp = krealloc(buf->begin, size, GFP_KERNEL | GFP_SPI_DMA); > if (!tmp) { > wsm_buf_deinit(buf); > return -ENOMEM; > diff --git a/drivers/net/wireless/ti/wlcore/cmd.c > b/drivers/net/wireless/ti/wlcore/cmd.c index > cd8ad0fe59cc..a0063878e47c 100644 --- > a/drivers/net/wireless/ti/wlcore/cmd.c +++ > b/drivers/net/wireless/ti/wlcore/cmd.c @@ -172,7 +172,7 @@ int > wlcore_cmd_wait_for_event_or_timeout(struct wl1271 *wl, > *timeout = false; > > - events_vector = kmalloc(sizeof(*events_vector), GFP_KERNEL | > GFP_DMA); > + events_vector = kmalloc(sizeof(*events_vector), GFP_KERNEL); > if (!events_vector) > return -ENOMEM; > > diff --git a/drivers/net/wireless/ti/wlcore/main.c > b/drivers/net/wireless/ti/wlcore/main.c index > 8fb58a5d911c..02962702b72d 100644 --- > a/drivers/net/wireless/ti/wlcore/main.c +++ > b/drivers/net/wireless/ti/wlcore/main.c @@ -6477,7 +6477,7 @@ struct > ieee80211_hw *wlcore_alloc_hw(size_t priv_size, u32 aggr_buf_size, } > > wl->mbox_size = mbox_size; > - wl->mbox = kmalloc(wl->mbox_size, GFP_KERNEL | GFP_DMA); > + wl->mbox = kmalloc(wl->mbox_size, GFP_KERNEL | GFP_SPI_DMA); > if (!wl->mbox) { > ret = -ENOMEM; > goto err_fwlog; > diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h > index 65db9349f905..6c3d3d7cc6fe 100644 > --- a/include/linux/gfp_types.h > +++ b/include/linux/gfp_types.h > @@ -382,6 +382,7 @@ enum { > #define GFP_NOFS (__GFP_RECLAIM | __GFP_IO) > #define GFP_USER (__GFP_RECLAIM | __GFP_IO | __GFP_FS | > __GFP_HARDWALL) #define GFP_DMA __GFP_DMA > +#define GFP_SPI_DMA __GFP_DMA > #define GFP_DMA32 __GFP_DMA32 > #define GFP_HIGHUSER (GFP_USER | __GFP_HIGHMEM) > #define GFP_HIGHUSER_MOVABLE (GFP_HIGHUSER | __GFP_MOVABLE | > __GFP_SKIP_KASAN) diff --git a/include/sound/cs35l56.h > b/include/sound/cs35l56.h index 5d653a3491d0..415497307c45 100644 > --- a/include/sound/cs35l56.h > +++ b/include/sound/cs35l56.h > @@ -295,7 +295,7 @@ static inline int > cs35l56_init_config_for_spi(struct cs35l56_base *cs35l56, { > cs35l56->spi_payload_buf = devm_kzalloc(&spi->dev, > sizeof(*cs35l56->spi_payload_buf), > - GFP_KERNEL | > GFP_DMA); > + GFP_KERNEL | > GFP_SPI_DMA); if (!cs35l56->spi_payload_buf) > return -ENOMEM; > > diff --git a/sound/soc/codecs/arizona.c b/sound/soc/codecs/arizona.c > index 74b0968f425a..c679d278c53e 100644 > --- a/sound/soc/codecs/arizona.c > +++ b/sound/soc/codecs/arizona.c > @@ -2734,7 +2734,7 @@ int arizona_eq_coeff_put(struct snd_kcontrol > *kcontrol, > len = params->num_regs * > regmap_get_val_bytes(arizona->regmap); > - data = kmemdup(ucontrol->value.bytes.data, len, GFP_KERNEL | > GFP_DMA); > + data = kmemdup(ucontrol->value.bytes.data, len, GFP_KERNEL | > GFP_SPI_DMA); if (!data) > return -ENOMEM; > > diff --git a/sound/soc/codecs/madera.c b/sound/soc/codecs/madera.c > index bc3470cf2c54..c75f6a617c73 100644 > --- a/sound/soc/codecs/madera.c > +++ b/sound/soc/codecs/madera.c > @@ -4754,7 +4754,7 @@ int madera_eq_coeff_put(struct snd_kcontrol > *kcontrol, > len = params->num_regs * > regmap_get_val_bytes(madera->regmap); > - data = kmemdup(ucontrol->value.bytes.data, len, GFP_KERNEL | > GFP_DMA); > + data = kmemdup(ucontrol->value.bytes.data, len, GFP_KERNEL | > GFP_SPI_DMA); if (!data) > return -ENOMEM; > > diff --git a/sound/soc/codecs/nau8810.c b/sound/soc/codecs/nau8810.c > index 6f432b992941..2b66c50bbf1b 100644 > --- a/sound/soc/codecs/nau8810.c > +++ b/sound/soc/codecs/nau8810.c > @@ -205,7 +205,7 @@ static int nau8810_eq_put(struct snd_kcontrol > *kcontrol, __be16 *tmp; > > data = kmemdup(ucontrol->value.bytes.data, > - params->max, GFP_KERNEL | GFP_DMA); > + params->max, GFP_KERNEL | GFP_SPI_DMA); > if (!data) > return -ENOMEM; > > diff --git a/sound/soc/codecs/nau8821.c b/sound/soc/codecs/nau8821.c > index edb95f869a4a..7cca9dac7e5f 100644 > --- a/sound/soc/codecs/nau8821.c > +++ b/sound/soc/codecs/nau8821.c > @@ -303,7 +303,7 @@ static int nau8821_biq_coeff_put(struct > snd_kcontrol *kcontrol, return -EINVAL; > > data = kmemdup(ucontrol->value.bytes.data, > - params->max, GFP_KERNEL | GFP_DMA); > + params->max, GFP_KERNEL | GFP_SPI_DMA); > if (!data) > return -ENOMEM; > > diff --git a/sound/soc/codecs/nau8822.c b/sound/soc/codecs/nau8822.c > index 15d6f8d01f78..016f537ebb30 100644 > --- a/sound/soc/codecs/nau8822.c > +++ b/sound/soc/codecs/nau8822.c > @@ -221,7 +221,7 @@ static int nau8822_eq_put(struct snd_kcontrol > *kcontrol, __be16 *tmp; > > data = kmemdup(ucontrol->value.bytes.data, > - params->max, GFP_KERNEL | GFP_DMA); > + params->max, GFP_KERNEL | GFP_SPI_DMA); > if (!data) > return -ENOMEM; > > diff --git a/sound/soc/codecs/nau8825.c b/sound/soc/codecs/nau8825.c > index 25b8b19e27ec..43a73aabf891 100644 > --- a/sound/soc/codecs/nau8825.c > +++ b/sound/soc/codecs/nau8825.c > @@ -1017,7 +1017,7 @@ static int nau8825_biq_coeff_put(struct > snd_kcontrol *kcontrol, return -EINVAL; > > data = kmemdup(ucontrol->value.bytes.data, > - params->max, GFP_KERNEL | GFP_DMA); > + params->max, GFP_KERNEL | GFP_SPI_DMA); > if (!data) > return -ENOMEM; > > diff --git a/sound/soc/codecs/tas571x.c b/sound/soc/codecs/tas571x.c > index 6c6e7ae07d80..a0f75b089aa4 100644 > --- a/sound/soc/codecs/tas571x.c > +++ b/sound/soc/codecs/tas571x.c > @@ -150,7 +150,7 @@ static int tas571x_reg_write_multiword(struct > i2c_client *client, int ret; > size_t send_size = 1 + len * sizeof(uint32_t); > > - buf = kzalloc(send_size, GFP_KERNEL | GFP_DMA); > + buf = kzalloc(send_size, GFP_KERNEL | GFP_SPI_DMA); > if (!buf) > return -ENOMEM; > buf[0] = reg; > @@ -183,7 +183,7 @@ static int tas571x_reg_read_multiword(struct > i2c_client *client, unsigned int recv_size = len * sizeof(uint32_t); > int ret; > > - recv_buf = kzalloc(recv_size, GFP_KERNEL | GFP_DMA); > + recv_buf = kzalloc(recv_size, GFP_KERNEL | GFP_DMA); // XXXX > if (!recv_buf) > return -ENOMEM; > > diff --git a/sound/soc/codecs/wm0010.c b/sound/soc/codecs/wm0010.c > index 8d1a575532ff..d1ef988aa84e 100644 > --- a/sound/soc/codecs/wm0010.c > +++ b/sound/soc/codecs/wm0010.c > @@ -405,14 +405,14 @@ static int wm0010_firmware_load(const char > *name, struct snd_soc_component *comp xfer->component = component; > list_add_tail(&xfer->list, &xfer_list); > > - out = kzalloc(len, GFP_KERNEL | GFP_DMA); > + out = kzalloc(len, GFP_KERNEL | GFP_SPI_DMA); > if (!out) { > ret = -ENOMEM; > goto abort1; > } > xfer->t.rx_buf = out; > > - img = kzalloc(len, GFP_KERNEL | GFP_DMA); > + img = kzalloc(len, GFP_KERNEL | GFP_SPI_DMA); > if (!img) { > ret = -ENOMEM; > goto abort1; > @@ -504,13 +504,13 @@ static int wm0010_stage2_load(struct > snd_soc_component *component) dev_dbg(component->dev, "Downloading > %zu byte stage 2 loader\n", fw->size); > /* Copy to local buffer first as vmalloc causes problems for > dma */ > - img = kmemdup(&fw->data[0], fw->size, GFP_KERNEL | GFP_DMA); > + img = kmemdup(&fw->data[0], fw->size, GFP_KERNEL | > GFP_SPI_DMA); if (!img) { > ret = -ENOMEM; > goto abort2; > } > > - out = kzalloc(fw->size, GFP_KERNEL | GFP_DMA); > + out = kzalloc(fw->size, GFP_KERNEL | GFP_SPI_DMA); > if (!out) { > ret = -ENOMEM; > goto abort1; > @@ -638,11 +638,11 @@ static int wm0010_boot(struct snd_soc_component > *component) > ret = -ENOMEM; > len = pll_rec.length + 8; > - out = kzalloc(len, GFP_KERNEL | GFP_DMA); > + out = kzalloc(len, GFP_KERNEL | GFP_SPI_DMA); > if (!out) > goto abort; > > - img_swap = kzalloc(len, GFP_KERNEL | GFP_DMA); > + img_swap = kzalloc(len, GFP_KERNEL | GFP_SPI_DMA); > if (!img_swap) > goto abort_out; > > diff --git a/sound/soc/codecs/wm_adsp.c b/sound/soc/codecs/wm_adsp.c > index 91c8697c29c3..7e8b44b911c0 100644 > --- a/sound/soc/codecs/wm_adsp.c > +++ b/sound/soc/codecs/wm_adsp.c > @@ -1373,7 +1373,7 @@ int wm_adsp_compr_set_params(struct > snd_soc_component *component, compr->size.fragment_size, > compr->size.fragments); > size = wm_adsp_compr_frag_words(compr) * > sizeof(*compr->raw_buf); > - compr->raw_buf = kmalloc(size, GFP_DMA | GFP_KERNEL); > + compr->raw_buf = kmalloc(size, GFP_SPI_DMA | GFP_KERNEL); > if (!compr->raw_buf) > return -ENOMEM; > > diff --git a/sound/soc/soc-ops.c b/sound/soc/soc-ops.c > index cd5f927bcd4e..231e80ca0386 100644 > --- a/sound/soc/soc-ops.c > +++ b/sound/soc/soc-ops.c > @@ -749,7 +749,7 @@ int snd_soc_bytes_put(struct snd_kcontrol > *kcontrol, len = params->num_regs * component->val_bytes; > > void *data __free(kfree) = > kmemdup(ucontrol->value.bytes.data, len, > - GFP_KERNEL | GFP_DMA); > + GFP_KERNEL | GFP_SPI_DMA); > if (!data) > return -ENOMEM; > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] spi: Ensure memory used for spi_write_then_read() is DMA safe 2025-03-20 14:35 ` Petr Tesarik @ 2025-03-20 15:34 ` Mark Brown 2025-03-20 16:08 ` Petr Tesarik 0 siblings, 1 reply; 20+ messages in thread From: Mark Brown @ 2025-03-20 15:34 UTC (permalink / raw) To: Petr Tesarik Cc: Arnd Bergmann, Grant Likely, linux-kernel, linux-spi, Robin Murphy [-- Attachment #1: Type: text/plain, Size: 1988 bytes --] On Thu, Mar 20, 2025 at 03:35:36PM +0100, Petr Tesarik wrote: > CC'ing Robin Murphy, because there seem to be some doubts about DMA API > efficiency. Or possibly just documentation, the number of memory types we have to deal with and disjoint interfaces makes all this stuff pretty miserable. > > > The whole goal there is to try to avoid triggering another copy to do > > > the DMA so just reverting rather than replacing with some other > > > construct that achieves the same goal doesn't seem great. I think > > > possibly we should just not do the copy at all any more and trust the > > > core to DTRT with any buffers that are passed in, I think we've got > > > enough stuff in the core though I can't remember if it'll copy with > > > things allocated on the stack well. I'd need to page the status back > > > in. > No, I'm afraid kernel stack addresses (and many other types of > addresses) cannot be used for DMA: > https://docs.kernel.org/core-api/dma-api-howto.html#what-memory-is-dma-able Right, that's what I thought. Part of what spi_write_then_read() is doing is taking the edge off that particular sharp edge for users, on the off chance that the controller wants to DMA. > > From what I found, there are two scenarios that may depend on > > GFP_DMA today: > > a) a performance optimization where allocating from GFP_DMA avoids > > the swiotlb bounce buffering. This would be the normal case on > > any 64-bit machine with more than 4GB of RAM and an SPI > > controller with a 32-bit DMA mask. > I must be missing something. How is a memcpy() in spi_write_then_read() > faster than a memcpy() by swiotlb? spi_write_then_read() is just a convenience API, a good proportion of users will be using spi_sync() directly. > I still believe the SPI subsystem should not try to be clever. The > DMA API already avoids unnecessary copying as much as possible. It's not particularly trying to be clever here? [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] spi: Ensure memory used for spi_write_then_read() is DMA safe 2025-03-20 15:34 ` Mark Brown @ 2025-03-20 16:08 ` Petr Tesarik 2025-03-20 18:55 ` Mark Brown 0 siblings, 1 reply; 20+ messages in thread From: Petr Tesarik @ 2025-03-20 16:08 UTC (permalink / raw) To: Mark Brown Cc: Arnd Bergmann, Grant Likely, linux-kernel, linux-spi, Robin Murphy On Thu, 20 Mar 2025 15:34:42 +0000 Mark Brown <broonie@kernel.org> wrote: > On Thu, Mar 20, 2025 at 03:35:36PM +0100, Petr Tesarik wrote: > > > CC'ing Robin Murphy, because there seem to be some doubts about DMA API > > efficiency. > > Or possibly just documentation, the number of memory types we have to > deal with and disjoint interfaces makes all this stuff pretty miserable. I have to agree here. Plus the existing documentation is confusing, as it introduces some opaque terms: streaming, consistent, coherent ... what next? I volunteer to clean it up a bit. Or at least to give it a try. > > > > The whole goal there is to try to avoid triggering another copy to do > > > > the DMA so just reverting rather than replacing with some other > > > > construct that achieves the same goal doesn't seem great. I think > > > > possibly we should just not do the copy at all any more and trust the > > > > core to DTRT with any buffers that are passed in, I think we've got > > > > enough stuff in the core though I can't remember if it'll copy with > > > > things allocated on the stack well. I'd need to page the status back > > > > in. > > > No, I'm afraid kernel stack addresses (and many other types of > > addresses) cannot be used for DMA: > > > https://docs.kernel.org/core-api/dma-api-howto.html#what-memory-is-dma-able > > Right, that's what I thought. Part of what spi_write_then_read() is > doing is taking the edge off that particular sharp edge for users, on > the off chance that the controller wants to DMA. Thanks for explaining the goal. It seems that most subsystems pass this complexity down to individual device drivers, and I agree that it is not the best approach. If we want to make life easier for authors who don't need to squeeze the last bit of performance from their driver, the core DMA API could be extended with a wrapper function that checks DMA-ability of a buffer address and takes the appropriate action. I kind of like the idea, but I'm not a subsystem maintainer, so my opinion doesn't mean much. ;-) > > > From what I found, there are two scenarios that may depend on > > > GFP_DMA today: > > > > a) a performance optimization where allocating from GFP_DMA avoids > > > the swiotlb bounce buffering. This would be the normal case on > > > any 64-bit machine with more than 4GB of RAM and an SPI > > > controller with a 32-bit DMA mask. > > > I must be missing something. How is a memcpy() in spi_write_then_read() > > faster than a memcpy() by swiotlb? > > spi_write_then_read() is just a convenience API, a good proportion of > users will be using spi_sync() directly. Got it. > > I still believe the SPI subsystem should not try to be clever. The > > DMA API already avoids unnecessary copying as much as possible. > > It's not particularly trying to be clever here? Well, it tries to guess whether the lower layer will have to make a copy, but it does not always get it right (e.g. memory encryption). Besides, txbuf and rxbuf might be used without any copying at all, e.g. if they point to direct-mapped virtual addresses (e.g. returned by kmalloc). At the end of the day, it's no big deal, because SPI transfers are usually small and not performance-critical. I wouldn't be bothered myself if it wasn't part of a larger project (getting rid of DMA zones). Petr T ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] spi: Ensure memory used for spi_write_then_read() is DMA safe 2025-03-20 16:08 ` Petr Tesarik @ 2025-03-20 18:55 ` Mark Brown 0 siblings, 0 replies; 20+ messages in thread From: Mark Brown @ 2025-03-20 18:55 UTC (permalink / raw) To: Petr Tesarik Cc: Arnd Bergmann, Grant Likely, linux-kernel, linux-spi, Robin Murphy [-- Attachment #1: Type: text/plain, Size: 2740 bytes --] On Thu, Mar 20, 2025 at 05:08:46PM +0100, Petr Tesarik wrote: > Mark Brown <broonie@kernel.org> wrote: > > On Thu, Mar 20, 2025 at 03:35:36PM +0100, Petr Tesarik wrote: > > > CC'ing Robin Murphy, because there seem to be some doubts about DMA API > > > efficiency. > > Or possibly just documentation, the number of memory types we have to > > deal with and disjoint interfaces makes all this stuff pretty miserable. > I have to agree here. Plus the existing documentation is confusing, as > it introduces some opaque terms: streaming, consistent, coherent ... > what next? > I volunteer to clean it up a bit. Or at least to give it a try. That would be amazing. > If we want to make life easier for authors who don't need to squeeze > the last bit of performance from their driver, the core DMA API could be > extended with a wrapper function that checks DMA-ability of a buffer > address and takes the appropriate action. I kind of like the idea, but > I'm not a subsystem maintainer, so my opinion doesn't mean much. ;-) That sounds sensible. There's the dance that spi_{map,unmap}_buf() is doing which feels like it should be more generic, a general "I have this buffer, make it DMAable" which sounds like the same sort of ballpark and I always thought could be usefully factored out but never got round to finding a home for. > > > I still believe the SPI subsystem should not try to be clever. The > > > DMA API already avoids unnecessary copying as much as possible. > > It's not particularly trying to be clever here? > Well, it tries to guess whether the lower layer will have to make a > copy, but it does not always get it right (e.g. memory encryption). > Besides, txbuf and rxbuf might be used without any copying at all, e.g. > if they point to direct-mapped virtual addresses (e.g. returned by > kmalloc). > At the end of the day, it's no big deal, because SPI transfers are > usually small and not performance-critical. I wouldn't be bothered > myself if it wasn't part of a larger project (getting rid of DMA zones). Some of the IIO users might beg to differ about performance criticality and transfer sizes there, and there's things like firmware download and SPI flashes too. A lot of the performance work on the subsystem came from people with CAN controllers they're trying to saturate, some of which was large messages. It's not the same situation as block devices or networking but it's an area where anything we can do to eliminate dead time on the bus can be really noticable to applications. It gets used a lot with mixed signal applications where implementing digital logic is expensive but you might want to get a lot of data in or out. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] spi: Ensure memory used for spi_write_then_read() is DMA safe 2025-03-20 13:35 ` Arnd Bergmann 2025-03-20 14:35 ` Petr Tesarik @ 2025-03-20 14:42 ` Mark Brown 2025-03-20 16:30 ` Arnd Bergmann 1 sibling, 1 reply; 20+ messages in thread From: Mark Brown @ 2025-03-20 14:42 UTC (permalink / raw) To: Arnd Bergmann; +Cc: Petr Tesarik, Grant Likely, linux-kernel, linux-spi [-- Attachment #1: Type: text/plain, Size: 2764 bytes --] On Thu, Mar 20, 2025 at 02:35:41PM +0100, Arnd Bergmann wrote: > On Thu, Mar 20, 2025, at 13:29, Mark Brown wrote: > > On a lot of systems most transfers are short and won't be DMAed at all > > since PIO ends up being more efficient, and most hardware is perfectly > > happy to DMA to/from wherever so *shrug*. SPI_BUFSIZ is a maximum of 32 > > bytes which is going to be under the copybreak limit for quite a few > > controllers, though admittedly 16 is also a popular number, so a lot of > > the time we don't actually DMA out of it at all. > I saw the same thing looked at it the other day and got confused > about why 'local_buf' is allocated with GFP_DMA and 'buf' > uses plain GFP_KERNEL when they are both used in the same place. Really we just don't expect the small buffer to be DMAed. > It also seems that the copy happens in the regmap_bulk_read() > path but not the regmap_bulk_write(), which just passes down > the original buffer without copying, as far as I can tell. Yes, writes don't need to do anything. > From what I found, there are two scenarios that may depend on > GFP_DMA today: > > a) a performance optimization where allocating from GFP_DMA avoids > the swiotlb bounce buffering. This would be the normal case on > any 64-bit machine with more than 4GB of RAM and an SPI > controller with a 32-bit DMA mask. > b) An SPI controller on a 32-bit machine without swiotlb and an > effective DMA mask that covers less than the lowmem area. > E.g. on Raspberry Pi 4, the brcm,bcm2835-spi lives on a > bus with an 1GB dma-ranges translation, but there may be more > than 1GB of lowmem with CONFIG_VMSPLIT_2G=y and CONFIG_SWIOTLB=n. > Without GFP_DMA that would just end up causing data corruption. That sounds about right. > I think we have some corner cases where a driver allocates > a GFP_DMA buffer, calls spi_write_then_read through regmap, > which copies the data to the non-GFP_DMA global buffer, > and then the SPI controller driver calls dma_map_single() > on that, ending up with a third bounce buffer from > swiotlb. I suspect that practically speaking almost everything will be under the copybreak limit. Probably we should just make regmap use spi_sync() with the supplied buffers here and avoid spi_write_then_read(). > I don't know what a good replacement interface would be, but > ideally there should never be more than one copy here, > which means that any temporary buffer would need to be > allocated according to the dma_mask of the underlying > bus master (dmaengine, spi controller, ...). Which is a pain because even if you've got the device for the SPI controller there's no way to discover if it does it's own DMA. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] spi: Ensure memory used for spi_write_then_read() is DMA safe 2025-03-20 14:42 ` Mark Brown @ 2025-03-20 16:30 ` Arnd Bergmann 2025-03-20 18:39 ` Mark Brown 0 siblings, 1 reply; 20+ messages in thread From: Arnd Bergmann @ 2025-03-20 16:30 UTC (permalink / raw) To: Mark Brown; +Cc: Petr Tesarik, Grant Likely, linux-kernel, linux-spi On Thu, Mar 20, 2025, at 15:42, Mark Brown wrote: > On Thu, Mar 20, 2025 at 02:35:41PM +0100, Arnd Bergmann wrote: >> On Thu, Mar 20, 2025, at 13:29, Mark Brown wrote: > >> > On a lot of systems most transfers are short and won't be DMAed at all >> > since PIO ends up being more efficient, and most hardware is perfectly >> > happy to DMA to/from wherever so *shrug*. SPI_BUFSIZ is a maximum of 32 >> > bytes which is going to be under the copybreak limit for quite a few >> > controllers, though admittedly 16 is also a popular number, so a lot of >> > the time we don't actually DMA out of it at all. > >> I saw the same thing looked at it the other day and got confused >> about why 'local_buf' is allocated with GFP_DMA and 'buf' >> uses plain GFP_KERNEL when they are both used in the same place. > > Really we just don't expect the small buffer to be DMAed. I looked at a couple of drivers and found many have a copybreak limit less than SPI_BUFSIZE #define SPI_BUFSIZ max(32, SMP_CACHE_BYTES) drivers/spi/atmel-quadspi.c:#define ATMEL_QSPI_DMA_MIN_BYTES 16 drivers/spi/spi-at91-usart.c:#define US_DMA_MIN_BYTES 16 drivers/spi/spi-atmel.c:#define DMA_MIN_BYTES 16 drivers/spi/spi-davinci.c:#define DMA_MIN_BYTES 16 drivers/spi/spi-stm32.c:#define SPI_DMA_MIN_BYTES 16 drivers/spi/spi-imx.c: .fifo_size = 8, so any transfers from 17 to 32 bytes would try to use the non-GFP_DMA 'buf' and then try to map that. >> It also seems that the copy happens in the regmap_bulk_read() >> path but not the regmap_bulk_write(), which just passes down >> the original buffer without copying, as far as I can tell. > > Yes, writes don't need to do anything. Can you explain why writes are different from reads here? >> I think we have some corner cases where a driver allocates >> a GFP_DMA buffer, calls spi_write_then_read through regmap, >> which copies the data to the non-GFP_DMA global buffer, >> and then the SPI controller driver calls dma_map_single() >> on that, ending up with a third bounce buffer from >> swiotlb. > > I suspect that practically speaking almost everything will be under the > copybreak limit. Probably we should just make regmap use spi_sync() > with the supplied buffers here and avoid spi_write_then_read(). I'm a bit lost in that code, but doesn't spi_sync() require a buffer that can be passed into dma_map_sg() and (in theory at least) GFP_DMA? Based on what I see, every SPI transfer code paths goes through __spi_pump_transfer_message() and spi_map_msg(), which then tries to map it. This means two things: - any user passing 17 to 32 bytes into the read function either works correctly because the GFP_DMA was not needed, or it's broken and nobody noticed - the way that spi_map_buf_attrs() is written, it actually supports addresses from both kmap() and vmalloc() and will attempt to correctly map those rather than reject the buffer. While this sounds like a good idea, handling vmalloc data like this risks stack data corruption on non-coherent platforms when failing to map stack buffers would be the better response. >> I don't know what a good replacement interface would be, but >> ideally there should never be more than one copy here, >> which means that any temporary buffer would need to be >> allocated according to the dma_mask of the underlying >> bus master (dmaengine, spi controller, ...). > > Which is a pain because even if you've got the device for the SPI > controller there's no way to discover if it does it's own DMA. __spi_map_msg() already handles the case of an external DMA master through ctlr->dma_map_dev, so I think the same could be used to get a temporary buffer using dma_alloc_noncoherent() inside of spi_write_then_read() in place of the kmalloc(, GFP_DMA). Arnd ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] spi: Ensure memory used for spi_write_then_read() is DMA safe 2025-03-20 16:30 ` Arnd Bergmann @ 2025-03-20 18:39 ` Mark Brown 2025-03-21 12:41 ` Arnd Bergmann 0 siblings, 1 reply; 20+ messages in thread From: Mark Brown @ 2025-03-20 18:39 UTC (permalink / raw) To: Arnd Bergmann; +Cc: Petr Tesarik, Grant Likely, linux-kernel, linux-spi [-- Attachment #1: Type: text/plain, Size: 3481 bytes --] On Thu, Mar 20, 2025 at 05:30:01PM +0100, Arnd Bergmann wrote: > On Thu, Mar 20, 2025, at 15:42, Mark Brown wrote: > > Really we just don't expect the small buffer to be DMAed. > I looked at a couple of drivers and found many have a copybreak > limit less than SPI_BUFSIZE > #define SPI_BUFSIZ max(32, SMP_CACHE_BYTES) > drivers/spi/atmel-quadspi.c:#define ATMEL_QSPI_DMA_MIN_BYTES 16 > drivers/spi/spi-at91-usart.c:#define US_DMA_MIN_BYTES 16 > drivers/spi/spi-atmel.c:#define DMA_MIN_BYTES 16 > drivers/spi/spi-davinci.c:#define DMA_MIN_BYTES 16 > drivers/spi/spi-stm32.c:#define SPI_DMA_MIN_BYTES 16 > drivers/spi/spi-imx.c: .fifo_size = 8, > so any transfers from 17 to 32 bytes would try to use the > non-GFP_DMA 'buf' and then try to map that. Yes, like I said elsewhere in the thread 16 is a popular number but I suspect that was the thining there. > >> It also seems that the copy happens in the regmap_bulk_read() > >> path but not the regmap_bulk_write(), which just passes down > >> the original buffer without copying, as far as I can tell. > > Yes, writes don't need to do anything. > Can you explain why writes are different from reads here? I think there may have been some context lost there while replying? > > I suspect that practically speaking almost everything will be under the > > copybreak limit. Probably we should just make regmap use spi_sync() > > with the supplied buffers here and avoid spi_write_then_read(). > I'm a bit lost in that code, but doesn't spi_sync() require > a buffer that can be passed into dma_map_sg() and (in theory > at least) GFP_DMA? Yes, it does - the API is in general that the memory be something we could DMA, in case the controller wants to. You'll probably get away with just passing whatever for small enough transfers, it's much more common to get a controller that won't DMA than one that must DMA. I think I'd been under the impression that dma_map_sg() would handle things similarly to dma_map_single() (it's a bit of a landmine for it not to...). It's a very long time since I looked at any of this stuff. > - the way that spi_map_buf_attrs() is written, it actually > supports addresses from both kmap() and vmalloc() and > will attempt to correctly map those rather than reject > the buffer. While this sounds like a good idea, handling > vmalloc data like this risks stack data corruption > on non-coherent platforms when failing to map stack > buffers would be the better response. IIRC that's there to support filesystems on SPI flashes or some other application that uses vmalloc()ed buffers, it's definitely not intended to support data on stack. If it does anything for stack allocated data that's accidental. > >> I don't know what a good replacement interface would be, but > >> ideally there should never be more than one copy here, > >> which means that any temporary buffer would need to be > >> allocated according to the dma_mask of the underlying > >> bus master (dmaengine, spi controller, ...). > > Which is a pain because even if you've got the device for the SPI > > controller there's no way to discover if it does it's own DMA. > __spi_map_msg() already handles the case of an external > DMA master through ctlr->dma_map_dev, so I think the same > could be used to get a temporary buffer using > dma_alloc_noncoherent() inside of spi_write_then_read() > in place of the kmalloc(, GFP_DMA). That only helps spi_write_then_read() though. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] spi: Ensure memory used for spi_write_then_read() is DMA safe 2025-03-20 18:39 ` Mark Brown @ 2025-03-21 12:41 ` Arnd Bergmann 2025-03-21 14:13 ` Petr Tesarik 2025-03-21 14:45 ` Mark Brown 0 siblings, 2 replies; 20+ messages in thread From: Arnd Bergmann @ 2025-03-21 12:41 UTC (permalink / raw) To: Mark Brown; +Cc: Petr Tesarik, Grant Likely, linux-kernel, linux-spi On Thu, Mar 20, 2025, at 19:39, Mark Brown wrote: > On Thu, Mar 20, 2025 at 05:30:01PM +0100, Arnd Bergmann wrote: >> On Thu, Mar 20, 2025, at 15:42, Mark Brown wrote: > >> drivers/spi/atmel-quadspi.c:#define ATMEL_QSPI_DMA_MIN_BYTES 16 >> drivers/spi/spi-at91-usart.c:#define US_DMA_MIN_BYTES 16 >> drivers/spi/spi-atmel.c:#define DMA_MIN_BYTES 16 >> drivers/spi/spi-davinci.c:#define DMA_MIN_BYTES 16 >> drivers/spi/spi-stm32.c:#define SPI_DMA_MIN_BYTES 16 >> drivers/spi/spi-imx.c: .fifo_size = 8, > >> so any transfers from 17 to 32 bytes would try to use the >> non-GFP_DMA 'buf' and then try to map that. > > Yes, like I said elsewhere in the thread 16 is a popular number but I > suspect that was the thining there. Ok, so do we assume we don't need the GFP_DMA then? That's fine with me, but in that case we should probably enable swiotlb on all arm32 platforms that may have ZONE_DMA smaller than ZONE_NORMAL. >> >> It also seems that the copy happens in the regmap_bulk_read() >> >> path but not the regmap_bulk_write(), which just passes down >> >> the original buffer without copying, as far as I can tell. > >> > Yes, writes don't need to do anything. > >> Can you explain why writes are different from reads here? > > I think there may have been some context lost there while replying? I found the answer now: at least on common architectures (arm, powerpc, mips, ...), doing a write from an unaligned buffer on stack or in a shared data structure won't cause data corruption, but doing a read into that buffer can end up throwing away data that shares the same cacheline. >> > I suspect that practically speaking almost everything will be under the >> > copybreak limit. Probably we should just make regmap use spi_sync() >> > with the supplied buffers here and avoid spi_write_then_read(). > >> I'm a bit lost in that code, but doesn't spi_sync() require >> a buffer that can be passed into dma_map_sg() and (in theory >> at least) GFP_DMA? > > Yes, it does - the API is in general that the memory be something we > could DMA, in case the controller wants to. You'll probably get away > with just passing whatever for small enough transfers, it's much more > common to get a controller that won't DMA than one that must DMA. > > I think I'd been under the impression that dma_map_sg() would handle > things similarly to dma_map_single() (it's a bit of a landmine for it > not to...). It's a very long time since I looked at any of this stuff. Yes, dma_map_{single,page,sg} all do the same thing internally. >> - the way that spi_map_buf_attrs() is written, it actually >> supports addresses from both kmap() and vmalloc() and >> will attempt to correctly map those rather than reject >> the buffer. While this sounds like a good idea, handling >> vmalloc data like this risks stack data corruption >> on non-coherent platforms when failing to map stack >> buffers would be the better response. > > IIRC that's there to support filesystems on SPI flashes or some other > application that uses vmalloc()ed buffers, it's definitely not intended > to support data on stack. If it does anything for stack allocated data > that's accidental. Ok, then the question is what we should do about callers that pass in stack data. I can send a patch that adds a WARN_ONCE() or similar, but it would trigger on things like static int rt1711h_read16(struct rt1711h_chip *chip, unsigned int reg, u16 *val) { return regmap_raw_read(chip->data.regmap, reg, val, sizeof(u16)); } static int rt1711h_write16(struct rt1711h_chip *chip, unsigned int reg, u16 val) { return regmap_raw_write(chip->data.regmap, reg, &val, sizeof(u16)); } which happens in a number of drivers but is harmless as long as the driver doesn't actually try to DMA into that buffer. >> >> I don't know what a good replacement interface would be, but >> >> ideally there should never be more than one copy here, >> >> which means that any temporary buffer would need to be >> >> allocated according to the dma_mask of the underlying >> >> bus master (dmaengine, spi controller, ...). > >> > Which is a pain because even if you've got the device for the SPI >> > controller there's no way to discover if it does it's own DMA. > >> __spi_map_msg() already handles the case of an external >> DMA master through ctlr->dma_map_dev, so I think the same >> could be used to get a temporary buffer using >> dma_alloc_noncoherent() inside of spi_write_then_read() >> in place of the kmalloc(, GFP_DMA). > > That only helps spi_write_then_read() though. Right, we'd need to mirror this in other interfaces, either changing the existing ones, or adding safer ones that can be used from regmap and from drivers that currently allocate their own GFP_DMA buffers for this purpose. Arnd ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] spi: Ensure memory used for spi_write_then_read() is DMA safe 2025-03-21 12:41 ` Arnd Bergmann @ 2025-03-21 14:13 ` Petr Tesarik 2025-03-21 21:21 ` Arnd Bergmann 2025-03-21 14:45 ` Mark Brown 1 sibling, 1 reply; 20+ messages in thread From: Petr Tesarik @ 2025-03-21 14:13 UTC (permalink / raw) To: Arnd Bergmann; +Cc: Mark Brown, Grant Likely, linux-kernel, linux-spi On Fri, 21 Mar 2025 13:41:52 +0100 "Arnd Bergmann" <arnd@arndb.de> wrote: > On Thu, Mar 20, 2025, at 19:39, Mark Brown wrote: > > On Thu, Mar 20, 2025 at 05:30:01PM +0100, Arnd Bergmann wrote: > >> On Thu, Mar 20, 2025, at 15:42, Mark Brown wrote: >[...] > >> >> It also seems that the copy happens in the regmap_bulk_read() > >> >> path but not the regmap_bulk_write(), which just passes down > >> >> the original buffer without copying, as far as I can tell. > > > >> > Yes, writes don't need to do anything. > > > >> Can you explain why writes are different from reads here? > > > > I think there may have been some context lost there while replying? > > I found the answer now: at least on common architectures (arm, > powerpc, mips, ...), doing a write from an unaligned buffer > on stack or in a shared data structure won't cause data corruption, > but doing a read into that buffer can end up throwing away > data that shares the same cacheline. That's right. Additionally, any cache aliasing issues are irrelevant for a write. Yes, there are (were?) some processors with a virtual-indexed, virtual-tagged cache. Thank you, Arm... >[...] > >> - the way that spi_map_buf_attrs() is written, it actually > >> supports addresses from both kmap() and vmalloc() and > >> will attempt to correctly map those rather than reject > >> the buffer. While this sounds like a good idea, handling > >> vmalloc data like this risks stack data corruption > >> on non-coherent platforms when failing to map stack > >> buffers would be the better response. > > > > IIRC that's there to support filesystems on SPI flashes or some other > > application that uses vmalloc()ed buffers, it's definitely not intended > > to support data on stack. If it does anything for stack allocated data > > that's accidental. > > Ok, then the question is what we should do about callers that pass > in stack data. I can send a patch that adds a WARN_ONCE() or similar, > but it would trigger on things like > > static int rt1711h_read16(struct rt1711h_chip *chip, unsigned int reg, u16 *val) > { > return regmap_raw_read(chip->data.regmap, reg, val, sizeof(u16)); > } > static int rt1711h_write16(struct rt1711h_chip *chip, unsigned int reg, u16 val) > { > return regmap_raw_write(chip->data.regmap, reg, &val, sizeof(u16)); > } > > which happens in a number of drivers but is harmless as long > as the driver doesn't actually try to DMA into that buffer. This sounds like we should push the WARN_ONCE() one level deeper, into the DMA code. That's a good idea, actually, because it's always wrong to do DMA to a stack address, not just when SPI does it. Petr T ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] spi: Ensure memory used for spi_write_then_read() is DMA safe 2025-03-21 14:13 ` Petr Tesarik @ 2025-03-21 21:21 ` Arnd Bergmann 0 siblings, 0 replies; 20+ messages in thread From: Arnd Bergmann @ 2025-03-21 21:21 UTC (permalink / raw) To: Petr Tesarik; +Cc: Mark Brown, Grant Likely, linux-kernel, linux-spi On Fri, Mar 21, 2025, at 15:13, Petr Tesarik wrote: > On Fri, 21 Mar 2025 13:41:52 +0100 >> Ok, then the question is what we should do about callers that pass >> in stack data. I can send a patch that adds a WARN_ONCE() or similar, >> but it would trigger on things like >> >> static int rt1711h_read16(struct rt1711h_chip *chip, unsigned int reg, u16 *val) >> { >> return regmap_raw_read(chip->data.regmap, reg, val, sizeof(u16)); >> } >> static int rt1711h_write16(struct rt1711h_chip *chip, unsigned int reg, u16 val) >> { >> return regmap_raw_write(chip->data.regmap, reg, &val, sizeof(u16)); >> } >> >> which happens in a number of drivers but is harmless as long >> as the driver doesn't actually try to DMA into that buffer. > > This sounds like we should push the WARN_ONCE() one level deeper, into > the DMA code. That's a good idea, actually, because it's always wrong > to do DMA to a stack address, not just when SPI does it. This doesn't work for the current SPI code that uses vmalloc_to_page() in order to deal with vmalloc addresses. Passing a vmap stack address in here would continue working on the address from the linear map. There is already a check_for_stack() assertion in debug_dma_map_page(), which is meant to catch this problem in the DMA layer itself, but only when CONFIG_DMA_API_DEBUG is enabled. Arnd ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] spi: Ensure memory used for spi_write_then_read() is DMA safe 2025-03-21 12:41 ` Arnd Bergmann 2025-03-21 14:13 ` Petr Tesarik @ 2025-03-21 14:45 ` Mark Brown 2025-03-21 19:42 ` Arnd Bergmann 1 sibling, 1 reply; 20+ messages in thread From: Mark Brown @ 2025-03-21 14:45 UTC (permalink / raw) To: Arnd Bergmann; +Cc: Petr Tesarik, Grant Likely, linux-kernel, linux-spi [-- Attachment #1: Type: text/plain, Size: 2791 bytes --] On Fri, Mar 21, 2025 at 01:41:52PM +0100, Arnd Bergmann wrote: > On Thu, Mar 20, 2025, at 19:39, Mark Brown wrote: > > Yes, like I said elsewhere in the thread 16 is a popular number but I > > suspect that was the thining there. > Ok, so do we assume we don't need the GFP_DMA then? That's > fine with me, but in that case we should probably enable > swiotlb on all arm32 platforms that may have ZONE_DMA smaller > than ZONE_NORMAL. I think that makes sense. > >> - the way that spi_map_buf_attrs() is written, it actually > >> supports addresses from both kmap() and vmalloc() and > >> will attempt to correctly map those rather than reject > >> the buffer. While this sounds like a good idea, handling > >> vmalloc data like this risks stack data corruption > >> on non-coherent platforms when failing to map stack > >> buffers would be the better response. > > IIRC that's there to support filesystems on SPI flashes or some other > > application that uses vmalloc()ed buffers, it's definitely not intended > > to support data on stack. If it does anything for stack allocated data > > that's accidental. > Ok, then the question is what we should do about callers that pass > in stack data. I can send a patch that adds a WARN_ONCE() or similar, > but it would trigger on things like ... (single/double register raw I/O from stack) ... > which happens in a number of drivers but is harmless as long > as the driver doesn't actually try to DMA into that buffer. Hrm, right. I think that usage is reasonable so we probably should allow it rather than forcing things to do an allocation for a transfer that ends up being PIOed anyway almost all the time, OTOH the same API is also used for large transfers like firmware downloads where we don't want to trigger a spurious copy. Having different requirements at different times would be miserable though so I think we need to just accept any buffer and then figure it out inside the API, but I've not fully thought that through yet. > >> __spi_map_msg() already handles the case of an external > >> DMA master through ctlr->dma_map_dev, so I think the same > >> could be used to get a temporary buffer using > >> dma_alloc_noncoherent() inside of spi_write_then_read() > >> in place of the kmalloc(, GFP_DMA). > > That only helps spi_write_then_read() though. > Right, we'd need to mirror this in other interfaces, either changing > the existing ones, or adding safer ones that can be used from regmap > and from drivers that currently allocate their own GFP_DMA buffers > for this purpose. Yes, indeed. I don't have a clear sense for what the best solution is there yet. Possibly some libary code for the "I want to DMA this random memory" use case? [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] spi: Ensure memory used for spi_write_then_read() is DMA safe 2025-03-21 14:45 ` Mark Brown @ 2025-03-21 19:42 ` Arnd Bergmann 2025-03-26 16:20 ` Mark Brown 0 siblings, 1 reply; 20+ messages in thread From: Arnd Bergmann @ 2025-03-21 19:42 UTC (permalink / raw) To: Mark Brown; +Cc: Petr Tesarik, Grant Likely, linux-kernel, linux-spi On Fri, Mar 21, 2025, at 15:45, Mark Brown wrote: > On Fri, Mar 21, 2025 at 01:41:52PM +0100, Arnd Bergmann wrote: >> On Thu, Mar 20, 2025, at 19:39, Mark Brown wrote: > >> which happens in a number of drivers but is harmless as long >> as the driver doesn't actually try to DMA into that buffer. > > Hrm, right. I think that usage is reasonable so we probably should > allow it rather than forcing things to do an allocation for a transfer > that ends up being PIOed anyway almost all the time, OTOH the same API > is also used for large transfers like firmware downloads where we don't > want to trigger a spurious copy. Having different requirements at > different times would be miserable though so I think we need to just > accept any buffer and then figure it out inside the API, but I've not > fully thought that through yet. My feeling so far is that we want the default regmap interface to just take any buffer (stack, misaligned, vmalloc, kmap) and leave it up to the underlying bus to make sure this works in a sensible way. Using dma_alloc_noncoherent() should make the implementation much nicer than GFP_DMA in the past, so we could add a bus specific helper for SPI that checks if the controller actually wants to do DMA and whether the buffer is problematic at all, and then decides to either allocate a bounce buffer and fill the sg table with the correct DMA address, map the existing buffer, or pass it without mapping depending on what the device needs. Arnd ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] spi: Ensure memory used for spi_write_then_read() is DMA safe 2025-03-21 19:42 ` Arnd Bergmann @ 2025-03-26 16:20 ` Mark Brown 2025-03-26 21:45 ` Arnd Bergmann 0 siblings, 1 reply; 20+ messages in thread From: Mark Brown @ 2025-03-26 16:20 UTC (permalink / raw) To: Arnd Bergmann; +Cc: Petr Tesarik, Grant Likely, linux-kernel, linux-spi [-- Attachment #1: Type: text/plain, Size: 1580 bytes --] On Fri, Mar 21, 2025 at 08:42:12PM +0100, Arnd Bergmann wrote: > On Fri, Mar 21, 2025, at 15:45, Mark Brown wrote: > > Hrm, right. I think that usage is reasonable so we probably should > > allow it rather than forcing things to do an allocation for a transfer > > that ends up being PIOed anyway almost all the time, OTOH the same API > > is also used for large transfers like firmware downloads where we don't > > want to trigger a spurious copy. Having different requirements at > > different times would be miserable though so I think we need to just > > accept any buffer and then figure it out inside the API, but I've not > > fully thought that through yet. > My feeling so far is that we want the default regmap interface > to just take any buffer (stack, misaligned, vmalloc, kmap) and > leave it up to the underlying bus to make sure this works in > a sensible way. Definitely for regmap, I was thinking about the implementation of that - it feels like something other places will want to do. > Using dma_alloc_noncoherent() should make the implementation > much nicer than GFP_DMA in the past, so we could add a bus > specific helper for SPI that checks if the controller actually > wants to do DMA and whether the buffer is problematic at all, > and then decides to either allocate a bounce buffer and > fill the sg table with the correct DMA address, map the > existing buffer, or pass it without mapping depending on > what the device needs. That query feels a lot like spi_optimize_message(). Which should possibly then just do the bouncing if it's needed. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] spi: Ensure memory used for spi_write_then_read() is DMA safe 2025-03-26 16:20 ` Mark Brown @ 2025-03-26 21:45 ` Arnd Bergmann 2025-03-27 15:03 ` Mark Brown 0 siblings, 1 reply; 20+ messages in thread From: Arnd Bergmann @ 2025-03-26 21:45 UTC (permalink / raw) To: Mark Brown; +Cc: Petr Tesarik, Grant Likely, linux-kernel, linux-spi On Wed, Mar 26, 2025, at 17:20, Mark Brown wrote: > On Fri, Mar 21, 2025 at 08:42:12PM +0100, Arnd Bergmann wrote: >> Using dma_alloc_noncoherent() should make the implementation >> much nicer than GFP_DMA in the past, so we could add a bus >> specific helper for SPI that checks if the controller actually >> wants to do DMA and whether the buffer is problematic at all, >> and then decides to either allocate a bounce buffer and >> fill the sg table with the correct DMA address, map the >> existing buffer, or pass it without mapping depending on >> what the device needs. > > That query feels a lot like spi_optimize_message(). Which should > possibly then just do the bouncing if it's needed. Would that require attaching the temporary buffer to the message or could that be a permanent bounce buffer? The idea I had come up with was to have one or two pages permanently allocated in the spi_controller, the spi_device or the regmap and then use appropriate serialization to ensure only one transfer uses it at a time, similar to how spi_controller->dummy_tx gets allocated, or how spi_write_then_read() uses its small global buffer. The advantage of using a permanent buffer is that it avoids both the kmalloc and the iommu mapping in the fast path and only needs to do the dma_sync_single_() for cache management, which should be faster for small transfers. The downside would be a higher memory usage and the need for a mutex. Arnd ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] spi: Ensure memory used for spi_write_then_read() is DMA safe 2025-03-26 21:45 ` Arnd Bergmann @ 2025-03-27 15:03 ` Mark Brown 0 siblings, 0 replies; 20+ messages in thread From: Mark Brown @ 2025-03-27 15:03 UTC (permalink / raw) To: Arnd Bergmann; +Cc: Petr Tesarik, Grant Likely, linux-kernel, linux-spi [-- Attachment #1: Type: text/plain, Size: 2070 bytes --] On Wed, Mar 26, 2025 at 10:45:57PM +0100, Arnd Bergmann wrote: > On Wed, Mar 26, 2025, at 17:20, Mark Brown wrote: > > On Fri, Mar 21, 2025 at 08:42:12PM +0100, Arnd Bergmann wrote: > >> Using dma_alloc_noncoherent() should make the implementation > >> much nicer than GFP_DMA in the past, so we could add a bus > >> specific helper for SPI that checks if the controller actually > >> wants to do DMA and whether the buffer is problematic at all, > >> and then decides to either allocate a bounce buffer and > >> fill the sg table with the correct DMA address, map the > >> existing buffer, or pass it without mapping depending on > >> what the device needs. > > That query feels a lot like spi_optimize_message(). Which should > > possibly then just do the bouncing if it's needed. > Would that require attaching the temporary buffer to the message > or could that be a permanent bounce buffer? We probably want to be able to do both - have a permanent buffer for normal operation, and allocate a separate one when spi_optimize_message() is explicitly called by the client code since the idea is with the explicit calls is to be able to have the message baked for a long time and you might have multiple messages ready. > The advantage of using a permanent buffer is that it > avoids both the kmalloc and the iommu mapping in the fast > path and only needs to do the dma_sync_single_() > for cache management, which should be faster for small > transfers. > The downside would be a higher memory usage and the > need for a mutex. Yes, the memory consumption is a potential issue. We only tend to have small numbers of SPI controllers though so if it's a page or two per controller it's not too bad. We could potentially make the buffer discardable and allocate it on demand and release it under memory pressure but that feels like a worry about when it's an issue kind of thing. For cases where we could use the source buffer directly we also have to work out when it saves more overhead to use the existing mapping vs doing a new mapping that skips a copy. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2025-03-27 15:03 UTC | newest] Thread overview: 20+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-01-27 6:35 [PATCH] spi: Ensure memory used for spi_write_then_read() is DMA safe Mark Brown 2013-02-05 14:21 ` Grant Likely 2025-03-20 11:43 ` Petr Tesarik 2025-03-20 12:29 ` Mark Brown 2025-03-20 13:35 ` Arnd Bergmann 2025-03-20 14:35 ` Petr Tesarik 2025-03-20 15:34 ` Mark Brown 2025-03-20 16:08 ` Petr Tesarik 2025-03-20 18:55 ` Mark Brown 2025-03-20 14:42 ` Mark Brown 2025-03-20 16:30 ` Arnd Bergmann 2025-03-20 18:39 ` Mark Brown 2025-03-21 12:41 ` Arnd Bergmann 2025-03-21 14:13 ` Petr Tesarik 2025-03-21 21:21 ` Arnd Bergmann 2025-03-21 14:45 ` Mark Brown 2025-03-21 19:42 ` Arnd Bergmann 2025-03-26 16:20 ` Mark Brown 2025-03-26 21:45 ` Arnd Bergmann 2025-03-27 15:03 ` Mark Brown
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).