Re: [PATCH v2 2/3] usb: gadget: uvc: add scatter gather support

linux-usb.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Michael Grzeschik <mgr@pengutronix.de>
To: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: balbi@kernel.org, caleb.connolly@ideasonboard.com,
	linux-usb@vger.kernel.org, paul.elder@ideasonboard.com,
	kernel@pengutronix.de
Subject: Re: [PATCH v2 2/3] usb: gadget: uvc: add scatter gather support
Date: Mon, 28 Jun 2021 09:37:28 +0200	[thread overview]
Message-ID: <20210628073728.GC7708@pengutronix.de> (raw)
In-Reply-To: <20210625091227.GA7708@pengutronix.de>

[-- Attachment #1: Type: text/plain, Size: 15436 bytes --]

Hi Laurent!

On Fri, Jun 25, 2021 at 11:12:27AM +0200, Michael Grzeschik wrote:
>On Tue, Jun 15, 2021 at 05:10:27AM +0300, Laurent Pinchart wrote:
>>Hi Michael,
>>
>>Thank you for the patch.
>>
>>On Mon, May 31, 2021 at 12:30:40AM +0200, Michael Grzeschik wrote:
>>>This patch adds support for scatter gather transfers. If the underlying
>>>gadgets sg_supported == true, then the videeobuf2-dma-sg is used and the
>>>encode routine maps all scatter entries to separate scatterlists for the
>>>usb gadget.
>>
>>This is interesting. Could you please share measurements that show how
>>much CPU time is saved by this patch in typical use cases ?
>
>When streaming 1080p with request size of 1024 times 3 bytes I see this
>change in top when applying this patch:
>
> PID USER      PR  NI    VIRT    RES  %CPU  %MEM     TIME+ S COMMAND
>
>  64 root       0 -20    0.0m   0.0m   7.7   0.0   0:01.25 I [kworker/u5:0-uvcvideo]
>  83 root       0 -20    0.0m   0.0m   4.5   0.0   0:03.71 I [kworker/u5:3-uvcvideo]
> 307 root     -51   0    0.0m   0.0m   3.8   0.0   0:01.05 S [irq/51-dwc3]
>
>vs.
>
>  64 root       0 -20    0.0m   0.0m   5.8   0.0   0:01.79 I [kworker/u5:0-uvcvideo]
> 306 root     -51   0    0.0m   0.0m   3.2   0.0   0:01.97 S [irq/51-dwc3]
>  82 root       0 -20    0.0m   0.0m   0.6   0.0   0:01.86 I [kworker/u5:1-uvcvideo]
>
>So 6.4 % less CPU load tells the measurement.
>
>>>Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
>>>---
>>>v1 -> v2: -
>>>
>>> drivers/usb/gadget/Kconfig              |  1 +
>>> drivers/usb/gadget/function/f_uvc.c     |  1 +
>>> drivers/usb/gadget/function/uvc.h       |  2 +
>>> drivers/usb/gadget/function/uvc_queue.c | 23 ++++++-
>>> drivers/usb/gadget/function/uvc_queue.h |  5 +-
>>> drivers/usb/gadget/function/uvc_video.c | 80 ++++++++++++++++++++++++-
>>> 6 files changed, 105 insertions(+), 7 deletions(-)
>>>
>>>diff --git a/drivers/usb/gadget/Kconfig b/drivers/usb/gadget/Kconfig
>>>index 2d152571a7de8..dd58094f0b85b 100644
>>>--- a/drivers/usb/gadget/Kconfig
>>>+++ b/drivers/usb/gadget/Kconfig
>>>@@ -450,6 +450,7 @@ config USB_CONFIGFS_F_UVC
>>> 	depends on USB_CONFIGFS
>>> 	depends on VIDEO_V4L2
>>> 	depends on VIDEO_DEV
>>>+	select VIDEOBUF2_DMA_SG
>>> 	select VIDEOBUF2_VMALLOC
>>> 	select USB_F_UVC
>>> 	help
>>>diff --git a/drivers/usb/gadget/function/f_uvc.c b/drivers/usb/gadget/function/f_uvc.c
>>>index f48a00e497945..9d87c0fb8f92e 100644
>>>--- a/drivers/usb/gadget/function/f_uvc.c
>>>+++ b/drivers/usb/gadget/function/f_uvc.c
>>>@@ -418,6 +418,7 @@ uvc_register_video(struct uvc_device *uvc)
>>>
>>> 	/* TODO reference counting. */
>>> 	uvc->vdev.v4l2_dev = &uvc->v4l2_dev;
>>>+	uvc->vdev.v4l2_dev->dev = &cdev->gadget->dev;
>>
>>A good change, which could possibly be split to its own patch.
>
>Right. I will move it to a single patch.
>
>>> 	uvc->vdev.fops = &uvc_v4l2_fops;
>>> 	uvc->vdev.ioctl_ops = &uvc_v4l2_ioctl_ops;
>>> 	uvc->vdev.release = video_device_release_empty;
>>>diff --git a/drivers/usb/gadget/function/uvc.h b/drivers/usb/gadget/function/uvc.h
>>>index 83b9e945944e8..c1f06d9df5820 100644
>>>--- a/drivers/usb/gadget/function/uvc.h
>>>+++ b/drivers/usb/gadget/function/uvc.h
>>>@@ -75,6 +75,8 @@ struct uvc_request {
>>> 	struct usb_request *req;
>>> 	__u8 *req_buffer;
>>> 	struct uvc_video *video;
>>>+	struct sg_table sgt;
>>>+	u8 header[2];
>>> };
>>>
>>> struct uvc_video {
>>>diff --git a/drivers/usb/gadget/function/uvc_queue.c b/drivers/usb/gadget/function/uvc_queue.c
>>>index dcd71304d521c..e36a3506842b7 100644
>>>--- a/drivers/usb/gadget/function/uvc_queue.c
>>>+++ b/drivers/usb/gadget/function/uvc_queue.c
>>>@@ -18,6 +18,7 @@
>>>
>>> #include <media/v4l2-common.h>
>>> #include <media/videobuf2-vmalloc.h>
>>>+#include <media/videobuf2-dma-sg.h>
>>
>>Alphabetical order please.
>
>ok.
>
>>>
>>> #include "uvc.h"
>>>
>>>@@ -52,6 +53,7 @@ static int uvc_queue_setup(struct vb2_queue *vq,
>>> 	*nplanes = 1;
>>>
>>> 	sizes[0] = video->imagesize;
>>>+	alloc_devs[0] = video->uvc->v4l2_dev.dev->parent;
>>
>>Is there a guarantee that the gadget's parent is always the UDC ?
>
>No, we can not be sure. Especially the dwc3 core has the dts parameter
>"linux,sysdev_is_parent" where the parent can point to another dev.
>
>I will add a function called gadget_to_udc that will always provide the udc
>device. But while being at it I can also remove this alloc_devs assignment
>since the queue.dev can be set to the udc dev instead, since we don't need any
>per plane allocation.

I just figured that indeed dwc3 is the only case where we can not know
if the gadgets parent is the udc. But I think even this is currently
wrong in the dwc3 core. We can simply fix that, by calling
usb_initialize_gadget with dwc->sysdev instead of dwc->dev. This will
ensure that we will get the udc as parent.

I will prepend that patch in the series and send out v3.

Thanks,
Michael

>>>
>>> 	if (cdev->gadget->speed < USB_SPEED_SUPER)
>>> 		video->uvc_num_requests = 4;
>>>@@ -66,6 +68,9 @@ static int uvc_buffer_prepare(struct vb2_buffer *vb)
>>> 	struct uvc_video_queue *queue = vb2_get_drv_priv(vb->vb2_queue);
>>> 	struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
>>> 	struct uvc_buffer *buf = container_of(vbuf, struct uvc_buffer, buf);
>>>+	struct uvc_video *video = container_of(queue, struct uvc_video, queue);
>>>+	struct uvc_device *uvc = video->uvc;
>>>+	struct usb_composite_dev *cdev = uvc->func.config->cdev;
>>>
>>> 	if (vb->type == V4L2_BUF_TYPE_VIDEO_OUTPUT &&
>>> 	    vb2_get_plane_payload(vb, 0) > vb2_plane_size(vb, 0)) {
>>>@@ -77,7 +82,12 @@ static int uvc_buffer_prepare(struct vb2_buffer *vb)
>>> 		return -ENODEV;
>>>
>>> 	buf->state = UVC_BUF_STATE_QUEUED;
>>>-	buf->mem = vb2_plane_vaddr(vb, 0);
>>>+	if (cdev->gadget->sg_supported) {
>>
>>How about storing a use_sg flag in uvc_video_queue, to avoid poking
>>through layers ? The flag can also be used in uvcg_queue_init().
>
>Yes. This makes sense. I added a bool use_sg to uvc_video_queue and
>use it instead in v3.
>
>>>+		buf->sgt = vb2_dma_sg_plane_desc(vb, 0);
>>>+		buf->sg = buf->sgt->sgl;
>>>+	} else {
>>>+		buf->mem = vb2_plane_vaddr(vb, 0);
>>>+	}
>>> 	buf->length = vb2_plane_size(vb, 0);
>>> 	if (vb->type == V4L2_BUF_TYPE_VIDEO_CAPTURE)
>>> 		buf->bytesused = 0;
>>>@@ -117,9 +127,11 @@ static const struct vb2_ops uvc_queue_qops = {
>>> 	.wait_finish = vb2_ops_wait_finish,
>>> };
>>>
>>>-int uvcg_queue_init(struct uvc_video_queue *queue, enum v4l2_buf_type type,
>>>+int uvcg_queue_init(struct device *dev, struct uvc_video_queue *queue, enum v4l2_buf_type type,
>>
>>Please move the dev parameter after queue, to pass as the first
>>parameter the object that the function operates on.
>
>ok. will change in v3.
>
>>> 		    struct mutex *lock)
>>> {
>>>+	struct uvc_video *video = container_of(queue, struct uvc_video, queue);
>>>+	struct usb_composite_dev *cdev = video->uvc->func.config->cdev;
>>> 	int ret;
>>>
>>> 	queue->queue.type = type;
>>>@@ -128,9 +140,14 @@ int uvcg_queue_init(struct uvc_video_queue *queue, enum v4l2_buf_type type,
>>> 	queue->queue.buf_struct_size = sizeof(struct uvc_buffer);
>>> 	queue->queue.ops = &uvc_queue_qops;
>>> 	queue->queue.lock = lock;
>>>-	queue->queue.mem_ops = &vb2_vmalloc_memops;
>>>+	if (cdev->gadget->sg_supported)
>>>+		queue->queue.mem_ops = &vb2_dma_sg_memops;
>>>+	else
>>>+		queue->queue.mem_ops = &vb2_vmalloc_memops;
>>>+
>>> 	queue->queue.timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC
>>> 				     | V4L2_BUF_FLAG_TSTAMP_SRC_EOF;
>>>+	queue->queue.dev = dev;
>>> 	ret = vb2_queue_init(&queue->queue);
>>> 	if (ret)
>>> 		return ret;
>>>diff --git a/drivers/usb/gadget/function/uvc_queue.h b/drivers/usb/gadget/function/uvc_queue.h
>>>index 2f0fff7698430..bb8753b26074f 100644
>>>--- a/drivers/usb/gadget/function/uvc_queue.h
>>>+++ b/drivers/usb/gadget/function/uvc_queue.h
>>>@@ -34,6 +34,9 @@ struct uvc_buffer {
>>>
>>> 	enum uvc_buffer_state state;
>>> 	void *mem;
>>>+	struct sg_table *sgt;
>>>+	struct scatterlist *sg;
>>>+	unsigned int offset;
>>> 	unsigned int length;
>>> 	unsigned int bytesused;
>>> };
>>>@@ -59,7 +62,7 @@ static inline int uvc_queue_streaming(struct uvc_video_queue *queue)
>>> 	return vb2_is_streaming(&queue->queue);
>>> }
>>>
>>>-int uvcg_queue_init(struct uvc_video_queue *queue, enum v4l2_buf_type type,
>>>+int uvcg_queue_init(struct device *d, struct uvc_video_queue *queue, enum v4l2_buf_type type,
>>> 		    struct mutex *lock);
>>>
>>> void uvcg_free_buffers(struct uvc_video_queue *queue);
>>>diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
>>>index 57840083dfdda..240d361a45a44 100644
>>>--- a/drivers/usb/gadget/function/uvc_video.c
>>>+++ b/drivers/usb/gadget/function/uvc_video.c
>>>@@ -95,6 +95,71 @@ uvc_video_encode_bulk(struct usb_request *req, struct uvc_video *video,
>>> 		video->payload_size = 0;
>>> }
>>>
>>>+static void
>>>+uvc_video_encode_isoc_sg(struct usb_request *req, struct uvc_video *video,
>>>+		struct uvc_buffer *buf)
>>>+{
>>>+	int pending = buf->bytesused - video->queue.buf_used;
>>>+	struct uvc_request *ureq = req->context;
>>>+	struct scatterlist *sg, *iter;
>>>+	int len = video->req_size;
>>>+	int sg_left, part = 0;
>>>+	int ret;
>>>+	int i;
>>
>>unsigned int for pending, len, sg_left, part and i.
>
>Right.
>
>>>+
>>>+	sg = ureq->sgt.sgl;
>>>+	sg_init_table(sg, ureq->sgt.nents);
>>>+
>>>+	/* Init the header. */
>>>+	memset(ureq->header, 0, 2);
>>
>>Can you add
>>
>>#define UVCG_REQUEST_HEADER_LEN		2
>>
>>somewhere, and use it here, in the uvc_request structure definition, and
>>in uvc_video_encode_header() ? Otherwise I fear we'll forget to update
>>one of the locations when we'll add support for longer headers.
>
>Makes sense. I will change the code to use the define instead.
>
>>Is the memset() needed though ?
>>
>
>Yes, it is not needed. I will remove it.
>
>>>+	ret = uvc_video_encode_header(video, buf, ureq->header,
>>>+				      video->req_size);
>>>+	sg_set_buf(sg, ureq->header, 2);
>>>+	len -= ret;
>>>+
>>>+	if (pending <= len)
>>>+		len = pending;
>>>+
>>>+	req->length = (len == pending) ? len + 2 : video->req_size;
>>>+
>>>+	/* Init the pending sgs with payload */
>>>+	sg = sg_next(sg);
>>>+
>>>+	for_each_sg(sg, iter, ureq->sgt.nents - 1, i) {
>>>+		if (!len || !buf->sg)
>>>+			break;
>>>+
>>>+		sg_left = sg_dma_len(buf->sg) - buf->offset;
>>>+		part = min_t(unsigned int, len, sg_left);
>>>+
>>>+		sg_set_page(iter, sg_page(buf->sg), part, buf->offset);
>>>+
>>>+		if (part == sg_left) {
>>>+			buf->offset = 0;
>>>+			buf->sg = sg_next(buf->sg);
>>>+		} else {
>>>+			buf->offset += part;
>>>+		}
>>>+		len -= part;
>>>+	}
>>>+
>>>+	/* Assign the video data with header. */
>>>+	req->buf = NULL;
>>>+	req->sg	= ureq->sgt.sgl;
>>>+	req->num_sgs = i + 1;
>>
>>Given that you construct the request scatterlist manually anyway,
>>wouldn't it be much simpler to use the vb2 dma contig allocator for the
>>V4L2 buffer ? Or do you expect that the device would run out of dma
>>contig space (which I expect to be provided by CMA or an IOMMU) ?
>
>Yes, it would be simpler. But with dma contig you are limited to get contig
>memory. When we always use sg_lists it is even possible to get the data
>directly from the gpu.
>
>>It would also likely help to fill every sg entry as much as possible,
>>while the above code potentially creates smaller entries in the request
>>sgt when reaching the boundary between two entries in the V4L2 buffer
>>sgt.
>
>In case we get the sg_table from an contig allocator it would be a table with
>one big entry. So this is also a possible use case with the current
>implementation. It will never run into any boundary in that case and will run
>in maximum filled chunks.
>
>>I also wonder if this couldn't be further optimized by creating an sgt
>>with two entries, one for the header and one for the data,
>>unconditionally.
>
>That is exactly what it already does.
>
>>What's the maximum possible request size, could it be larger than what
>>an sgt entry can support ?
>
>The request size is usb maxpacket size for isoc (1024) times mult (<=3)
>times burst (<=15). I think an sgt entry has no size limitation.
>
>>>+
>>>+	req->length -= len;
>>>+	video->queue.buf_used += req->length - 2;
>>>+
>>>+	if (buf->bytesused == video->queue.buf_used || !buf->sg) {
>>>+		video->queue.buf_used = 0;
>>>+		buf->state = UVC_BUF_STATE_DONE;
>>>+		buf->offset = 0;
>>>+		uvcg_queue_next_buffer(&video->queue, buf);
>>>+		video->fid ^= UVC_STREAM_FID;
>>>+	}
>>>+}
>>>+
>>> static void
>>> uvc_video_encode_isoc(struct usb_request *req, struct uvc_video *video,
>>> 		struct uvc_buffer *buf)
>>>@@ -232,6 +297,10 @@ uvc_video_alloc_requests(struct uvc_video *video)
>>> 		video->ureq[i].video = video;
>>>
>>> 		list_add_tail(&video->ureq[i].req->list, &video->req_free);
>>>+		/* req_size/PAGE_SIZE + 1 for overruns and + 1 for header */
>>>+		sg_alloc_table(&video->ureq[i].sgt,
>>>+			       DIV_ROUND_UP(req_size - 2, PAGE_SIZE) + 2,
>>>+			       GFP_KERNEL);
>>
>>It looks like this is leaked.
>
>Oh, you are right. I will add an corresponding sg_free_table in uvc_video_free_requests.
>
>>> 	}
>>>
>>> 	video->req_size = req_size;
>>>@@ -309,6 +378,7 @@ static void uvcg_video_pump(struct work_struct *work)
>>>  */
>>> int uvcg_video_enable(struct uvc_video *video, int enable)
>>> {
>>>+	struct usb_composite_dev *cdev = video->uvc->func.config->cdev;
>>> 	unsigned int i;
>>> 	int ret;
>>>
>>>@@ -340,8 +410,12 @@ int uvcg_video_enable(struct uvc_video *video, int enable)
>>> 	if (video->max_payload_size) {
>>> 		video->encode = uvc_video_encode_bulk;
>>> 		video->payload_size = 0;
>>>-	} else
>>>-		video->encode = uvc_video_encode_isoc;
>>>+	} else {
>>>+		if (cdev->gadget->sg_supported)
>>>+			video->encode = uvc_video_encode_isoc_sg;
>>>+		else
>>>+			video->encode = uvc_video_encode_isoc;
>>>+	}
>>>
>>> 	schedule_work(&video->pump);
>>>
>>>@@ -365,7 +439,7 @@ int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc)
>>> 	video->imagesize = 320 * 240 * 2;
>>>
>>> 	/* Initialize the video buffers queue. */
>>>-	uvcg_queue_init(&video->queue, V4L2_BUF_TYPE_VIDEO_OUTPUT,
>>>+	uvcg_queue_init(uvc->v4l2_dev.dev, &video->queue, V4L2_BUF_TYPE_VIDEO_OUTPUT,
>>> 			&video->mutex);
>
>This argument will be uvc->v4l2_dev.dev->parent or the result of gadget_to_udc(uvc->v4l2_dev.dev) instead.
>
>>> 	return 0;
>>> }
>
>Thanks,
>Michael
>
>-- 
>Pengutronix e.K.                           |                             |
>Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
>31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
>Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |



-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

next prev parent reply	other threads:[~2021-06-28  7:37 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-30 22:30 [PATCH v2 0/3] usb: gadget: uvc: improve uvc gadget performance Michael Grzeschik
2021-05-30 22:30 ` [PATCH v2 1/3] usb: gadget: uvc: make uvc_num_requests depend on gadget speed Michael Grzeschik
2021-06-14 10:34   ` paul.elder
2021-06-15  1:28   ` Laurent Pinchart
2021-06-15  1:38     ` Laurent Pinchart
2021-06-22 15:00       ` Michael Grzeschik
2021-05-30 22:30 ` [PATCH v2 2/3] usb: gadget: uvc: add scatter gather support Michael Grzeschik
2021-05-31  1:33   ` kernel test robot
2021-05-31  6:30   ` kernel test robot
2021-06-15  2:10   ` Laurent Pinchart
2021-06-25  9:12     ` Michael Grzeschik
2021-06-28  7:37       ` Michael Grzeschik [this message]
2021-05-30 22:30 ` [PATCH v2 3/3] usb: gadget: uvc: decrease the interrupt load to a quarter Michael Grzeschik
2021-06-14 10:35   ` paul.elder
2021-06-15  1:36     ` Laurent Pinchart
2021-06-22 15:02       ` Michael Grzeschik
2021-06-09  8:17 ` [PATCH v2 0/3] usb: gadget: uvc: improve uvc gadget performance Greg KH

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210628073728.GC7708@pengutronix.de \
    --to=mgr@pengutronix.de \
    --cc=balbi@kernel.org \
    --cc=caleb.connolly@ideasonboard.com \
    --cc=kernel@pengutronix.de \
    --cc=laurent.pinchart@ideasonboard.com \
    --cc=linux-usb@vger.kernel.org \
    --cc=paul.elder@ideasonboard.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).