From: snjw23@gmail.com (Sylwester Nawrocki)
To: linux-arm-kernel@lists.infradead.org
Subject: [RFC PATCH v1 5/7] media: v4l2: introduce two IOCTLs for face detection
Date: Thu, 08 Dec 2011 23:27:29 +0100
Message-ID: <4EE139D1.4030503@gmail.com>
In-Reply-To: <CACVXFVMR_n946HW1eYWEfcig_D6wFx5e3vEKmVHtERcmtsXX6g@mail.gmail.com>

On 12/08/2011 04:42 AM, Ming Lei wrote:
>>> +/**
>>> + * struct v4l2_obj_detection
>>> + * @buf_index:       entry, index of v4l2_buffer for face detection

I would prefer having the frame sequence number here. It will be more
future-proof IMHO. If, for instance, we decide to use such an ioctl on
a v4l2 sub-device, without dequeuing buffers, there will be no problem
with that. And still, in your specific use case it's not a big deal to
look up the buffer index given its sequence number in the application.
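
Just to illustrate what I mean, the lookup in the application could be as
simple as this (untested sketch; the app-side bookkeeping structure is of
course made up here):

	/*
	 * Remember the sequence number of each dequeued frame, copied from
	 * v4l2_buffer.sequence right after VIDIOC_DQBUF, and look the buffer
	 * index up when an FD result arrives for that frame.
	 */
	struct app_buffer {
		__u32	sequence;	/* from v4l2_buffer.sequence */
		void	*start;		/* mmap()ed buffer data */
	};

	static int find_buffer_by_sequence(struct app_buffer *b, int count,
					   __u32 sequence)
	{
		int i;

		for (i = 0; i < count; i++)
			if (b[i].sequence == sequence)
				return i;	/* == v4l2_buffer.index */
		return -1;	/* frame already requeued or dropped */
	}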

>>> + * @centerx: return, position in x direction of detected object
>>> + * @centery: return, position in y direction of detected object
>>> + * @angle:   return, angle of detected object
>>> + *           0 deg ~ 359 deg, vertical is 0 deg, clockwise
>>> + * @sizex:   return, size in x direction of detected object
>>> + * @sizey:   return, size in y direction of detected object
>>> + * @confidence:      return, confidence level of detection result
>>> + *           0: the heighest level, 9: the lowest level
>>
>> Hmm, not a good idea to align a public interface to the capabilities
>> of a single hardware implementation.
> 
> I think that the current omap interface is general enough, so why can't
> we use it as public interface?

I meant exactly the line implying the range. What if for some hardware
it's 0..11?

> 
>> min/max confidence could be queried with
>> relevant controls and here we could remove the line implying range.
> 
> No, the confidence is used to describe the probability about
> the correctness of the current detection result. Anyway, no FD can
> make sure that it is 100% correct.  Other HW can normalize its
> confidence level to 0~9 so that application can handle it easily, IMO.

1..100 might be better, to minimize rounding errors. Nevertheless, IMO, if
we can export the exact range supported by the FD device we should do it,
and let the upper layers do the normalization. And bigger numbers should
mean higher confidence, consistently for all devices.

Do you think we could assume that the FD threshold range (the FD_LHIT
register in case of OMAP4) is always the same as the result confidence
level range?

If so, then the confidence level range could possibly be queried through
the detection threshold control. We could name it
V4L2_CID_FD_CONFIDENCE_THRESHOLD, for example.
I could take care of preparing the control class draft and the documentation
for it.
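
On the application side this would then be just the usual control query,
e.g. (only a sketch, the control ID is nothing more than a proposal at
this point):

	struct v4l2_queryctrl qc;

	memset(&qc, 0, sizeof(qc));
	qc.id = V4L2_CID_FD_CONFIDENCE_THRESHOLD;	/* proposed name */

	if (ioctl(fd, VIDIOC_QUERYCTRL, &qc) == 0) {
		/*
		 * qc.minimum..qc.maximum would then also be the range of
		 * the 'confidence' values reported in the FD results.
		 */
	}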

> 
>>> + * @reserved:        future extensions
>>> + */
>>> +struct v4l2_obj_detection {

How about changing the name of this structure to v4l2_fd_primitive or v4l2_fd_shape?

>>> +     __u16           centerx;
>>> +     __u16           centery;
>>> +     __u16           angle;
>>> +     __u16           sizex;
>>> +     __u16           sizey;
>>
>> How about using struct v4l2_rect in place of centerx/centery, sizex/sizey ?
>> After all it describes a rectangle. We could also use struct v4l2_frmsize_discrete
>> for size but there seems to be missing an equivalent for position, e.g.
> 
> Maybe user space would like to plot a circle or ellipse over the detected
> objection, and I am sure that I have seen this kind of plot over detected
> face before.

OK. In any case I suggest replacing all __u16 with __u32, to minimize
performance issues and to be consistent with the data type used for pixel
values elsewhere in V4L.
It makes sense to make 'confidence' __u32 as well and to add a flags
attribute indicating the shape.

>>
>>> +     __u16           confidence;
>>> +     __u32           reserved[4];

And then
	  __u32           reserved[10];

or
	  __u32           reserved[2];

>>> +};
>>> +
>>> +#define V4L2_FD_HAS_LEFT_EYE 0x1
>>> +#define V4L2_FD_HAS_RIGHT_EYE        0x2
>>> +#define V4L2_FD_HAS_MOUTH    0x4
>>> +#define V4L2_FD_HAS_FACE     0x8

Do you think we could change it to:

#define V4L2_FD_FL_LEFT_EYE	(1 << 0)
#define V4L2_FD_FL_RIGHT_EYE	(1 << 1)
#define V4L2_FD_FL_MOUTH	(1 << 2)
#define V4L2_FD_FL_FACE		(1 << 3)

and add:

#define V4L2_FD_FL_SMILE	(1 << 4)
#define V4L2_FD_FL_BLINK	(1 << 5)

?
>>> +
>>> +/**
>>> + * struct v4l2_fd_detection - VIDIOC_G_FD_RESULT argument
>>> + * @flag:    return, describe which objects are detected
>>> + * @left_eye:        return, left_eye position if detected
>>> + * @right_eye:       return, right_eye position if detected
>>> + * @mouth_eye:       return, mouth_eye position if detected
>>
>> mouth_eye ? ;)
> 
> Sorry, it should be mouth, :-)

:) Also, the word "return" could be omitted from these descriptions.

> 
>>
>>> + * @face:    return, face position if detected
>>> + */
>>> +struct v4l2_fd_detection {

How about changing the name to v4l2_fd_object?

>>> +     __u32   flag;
>>> +     struct v4l2_obj_detection       left_eye;
>>> +     struct v4l2_obj_detection       right_eye;
>>> +     struct v4l2_obj_detection       mouth;
>>> +     struct v4l2_obj_detection       face;
>>
>> I would do this differently, i.e. put "flag" inside struct v4l2_obj_detection
>> and then struct v4l2_fd_detection would be simply an array of
>> struct v4l2_obj_detection, i.e.
>>
>> struct v4l2_fd_detection {
>>        unsigned int count;
>>        struct v4l2_obj_detection objects[V4L2_MAX_FD_OBJECT_NUM];
>> };
>>
>> This might be more flexible, e.g. if in the future some hardware supports
>> detecting wrinkles, we could easily add that by just defining a new flag:
>> V4L2_FD_HAS_WRINKLES, etc.
> 
> This is a bit flexible, but not explicit enough for describing
> interface, how about reserving these as below for future usage?
> 
> 	struct v4l2_fd_detection {
> 		__u32   flag;
> 		Struct v4l2_obj_detection       left_eye;
> 		Struct v4l2_obj_detection       right_eye;
> 		Struct v4l2_obj_detection       mouth;
> 		Struct v4l2_obj_detection       face;
> 		Struct v4l2_obj_detection       reserved[4];
> 	};

OK, and how about this:

	struct v4l2_fd_object {
		struct v4l2_fd_shape	left_eye;
		struct v4l2_fd_shape	right_eye;
		struct v4l2_fd_shape	mouth;
		struct v4l2_fd_shape	face;
		__u32			reserved[33];
		__u32			flags;
	} __packed;

?
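
Just to double-check the intended usage, the application side would then
end up with something like this (purely illustrative, the exact ioctl
argument is of course whatever we finally agree on in this thread):

	struct v4l2_fd_object obj;

	memset(&obj, 0, sizeof(obj));
	if (ioctl(fd, VIDIOC_G_FD_RESULT, &obj) == 0) {
		if (obj.flags & V4L2_FD_FL_FACE) {
			/* obj.face is valid */
			if (obj.flags & V4L2_FD_FL_SMILE)
				/* ...and the detected face is smiling */;
		}
	}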

-- 
Regards,
Sylwester
