* reducing stack usage in v4l? @ 2003-03-05 5:25 Randy.Dunlap 2003-03-05 5:43 ` Andrew Morton 2003-03-05 9:15 ` Gerd Knorr 0 siblings, 2 replies; 12+ messages in thread From: Randy.Dunlap @ 2003-03-05 5:25 UTC (permalink / raw) To: linux-kernel Hi, I was looking at stack usage in drivers/media/video/v4l1-compat.c:: v4l_compat_translate_ioctl(). It's 0x924 bytes on my peecee. It's fairly straightforward to change the stack space to kmalloc() space, but some of these functions (ioctls) are speed-critical, so I was wondering if the changes should still be done, or done only in some cases and not others, or wait until an oops is reported here.... Comments? Thanks, ~Randy ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: reducing stack usage in v4l? 2003-03-05 5:25 reducing stack usage in v4l? Randy.Dunlap @ 2003-03-05 5:43 ` Andrew Morton 2003-03-05 9:15 ` Gerd Knorr 1 sibling, 0 replies; 12+ messages in thread From: Andrew Morton @ 2003-03-05 5:43 UTC (permalink / raw) To: Randy.Dunlap; +Cc: linux-kernel "Randy.Dunlap" <rddunlap@osdl.org> wrote: > > Hi, > > I was looking at stack usage in drivers/media/video/v4l1-compat.c:: > v4l_compat_translate_ioctl(). It's 0x924 bytes on my peecee. > > It's fairly straightforward to change the stack space to > kmalloc() space, but some of these functions (ioctls) are > speed-critical, so I was wondering if the changes should still > be done, or done only in some cases and not others, or wait > until an oops is reported here.... > Well the function is doing a memset() of 2k's worth of memory already, so it presumably doesn't care much. Your kmalloc will be a small cost compared with that. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: reducing stack usage in v4l? 2003-03-05 5:25 reducing stack usage in v4l? Randy.Dunlap 2003-03-05 5:43 ` Andrew Morton @ 2003-03-05 9:15 ` Gerd Knorr 2003-03-05 9:35 ` Russell King 2003-03-05 11:50 ` mdew 1 sibling, 2 replies; 12+ messages in thread From: Gerd Knorr @ 2003-03-05 9:15 UTC (permalink / raw) To: linux-kernel "Randy.Dunlap" <rddunlap@osdl.org> writes: > Hi, > > I was looking at stack usage in drivers/media/video/v4l1-compat.c:: > v4l_compat_translate_ioctl(). It's 0x924 bytes on my peecee. Whoops. Hmm. I assumed having the variable declarations within the case blocks where they are actually used helps to keep the stack size small, i.e. use this ... do_ioctl() { switch (cmd) case FOO: { struct foo; ... } case BAR: { struct bar; ... } ... } ... instead of this: do_ioctl() { struct foo; struct bar; switch (cmd) { ... } } But when looking at the disasm output it is obvious that it isn't true (at least with gcc 3.2). On the other hand it is common practice in many drivers, there must be a reason for that, no? Any chance this used to work with older gcc versions? > It's fairly straightforward to change the stack space to > kmalloc() space, but some of these functions (ioctls) are > speed-critical, so I was wondering if the changes should still > be done, or done only in some cases and not others, or wait > until an oops is reported here.... Although v4l_compat_translate_ioctl() is probably one of the worst offenders, this issue likely exists for lots of other ioctl functions too: Lot of stack space is allocated but never used because the variables for all cases are allocated but only one of the switch cases actually runs ... Not sure what is the best idea to fix that. Don't like the kmalloc idea that much. The individual structs are not huge, the real problem is that many of them are allocated and only few are needed. Any chance to tell gcc that it should allocate block-local variables at the start block not at the start of the function? Gerd -- /join #zonenkinder ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: reducing stack usage in v4l? 2003-03-05 9:15 ` Gerd Knorr @ 2003-03-05 9:35 ` Russell King 2003-03-05 15:34 ` Randy.Dunlap 2003-03-05 11:50 ` mdew 1 sibling, 1 reply; 12+ messages in thread From: Russell King @ 2003-03-05 9:35 UTC (permalink / raw) To: Gerd Knorr; +Cc: linux-kernel On Wed, Mar 05, 2003 at 10:15:52AM +0100, Gerd Knorr wrote: > But when looking at the disasm output it is obvious that it isn't true > (at least with gcc 3.2). On the other hand it is common practice in > many drivers, there must be a reason for that, no? Any chance this > used to work with older gcc versions? I don't believe so - I seem to remember looking at gcc 2.95 and finding the same annoying behaviour. > Not sure what is the best idea to fix that. Don't like the kmalloc > idea that much. The individual structs are not huge, the real problem > is that many of them are allocated and only few are needed. Any > chance to tell gcc that it should allocate block-local variables at > the start block not at the start of the function? Not a particularly clean idea, but maybe creating a union of the structures and putting that on the stack? (ie, doing what GCC should be doing in the first place.) -- Russell King (rmk@arm.linux.org.uk) The developer of ARM Linux http://www.arm.linux.org.uk/personal/aboutme.html ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: reducing stack usage in v4l? 2003-03-05 9:35 ` Russell King @ 2003-03-05 15:34 ` Randy.Dunlap 2003-03-05 21:40 ` Jörn Engel 2003-03-11 4:27 ` Randy.Dunlap 0 siblings, 2 replies; 12+ messages in thread From: Randy.Dunlap @ 2003-03-05 15:34 UTC (permalink / raw) To: Russell King; +Cc: kraxel, linux-kernel On Wed, 5 Mar 2003 09:35:34 +0000 Russell King <rmk@arm.linux.org.uk> wrote: | On Wed, Mar 05, 2003 at 10:15:52AM +0100, Gerd Knorr wrote: | > But when looking at the disasm output it is obvious that it isn't true | > (at least with gcc 3.2). On the other hand it is common practice in | > many drivers, there must be a reason for that, no? Any chance this | > used to work with older gcc versions? | | I don't believe so - I seem to remember looking at gcc 2.95 and finding | the same annoying behaviour. Yes, that's what I'm seeing with 2.96 as well. | > Not sure what is the best idea to fix that. Don't like the kmalloc | > idea that much. The individual structs are not huge, the real problem | > is that many of them are allocated and only few are needed. Any | > chance to tell gcc that it should allocate block-local variables at | > the start block not at the start of the function? | | Not a particularly clean idea, but maybe creating a union of the | structures and putting that on the stack? (ie, doing what GCC should | be doing in the first place.) That's an idea. Or make separate called functions for each ioctl and declare the structures inside them? -- ~Randy Here are some v4l structure sizes from gcc 2.96 on a P4: sizeof video_audio: 40 sizeof video_buffer: 20 sizeof video_capability: 60 sizeof video_channel: 48 sizeof video_mbuf: 136 sizeof video_mmap: 16 sizeof video_picture: 14 sizeof video_tuner: 52 sizeof video_window: 32 sizeof v4l2_audio: 52 sizeof v4l2_buffer: 68 sizeof v4l2_capability: 104 sizeof v4l2_control: 8 sizeof v4l2_fmtdesc: 64 sizeof v4l2_format: 204 sizeof v4l2_framebuffer: 44 sizeof v4l2_frequency: 44 sizeof v4l2_input: 76 sizeof v4l2_queryctrl: 68 sizeof v4l2_requestbuffers: 20 sizeof v4l2_tuner: 84 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: reducing stack usage in v4l? 2003-03-05 15:34 ` Randy.Dunlap @ 2003-03-05 21:40 ` Jörn Engel 2003-03-11 4:27 ` Randy.Dunlap 1 sibling, 0 replies; 12+ messages in thread From: Jörn Engel @ 2003-03-05 21:40 UTC (permalink / raw) To: Randy.Dunlap; +Cc: Russell King, kraxel, linux-kernel On Wed, 5 March 2003 07:34:37 -0800, Randy.Dunlap wrote: > Russell King <rmk@arm.linux.org.uk> wrote: > | On Wed, Mar 05, 2003 at 10:15:52AM +0100, Gerd Knorr wrote: > > | > Not sure what is the best idea to fix that. Don't like the kmalloc > | > idea that much. The individual structs are not huge, the real problem > | > is that many of them are allocated and only few are needed. Any > | > chance to tell gcc that it should allocate block-local variables at > | > the start block not at the start of the function? > | > | Not a particularly clean idea, but maybe creating a union of the > | structures and putting that on the stack? (ie, doing what GCC should > | be doing in the first place.) > > That's an idea. Or make separate called functions for each ioctl and declare > the structures inside them? As far as I have seen, at least ds_ioctl uses the union trick. And the three marked (*) ioctl-functions appear to be suffering from the same gcc inability. How much complaining is necessary for the gcc folks to worry about this? 0xc0ad420b <snd_emu10k1_fx8010_ioctl+3/5ec>: sub $0x814,%esp 0xc06dbd77 <v4l_compat_translate_ioctl+3/159c>: sub $0x804,%esp 0xc0521e43 <ida_ioctl+3/388>: sub $0x528,%esp * 0xc01ed733 <presto_ioctl+3/13c4>: sub $0x4d0,%esp * 0xc0bfbfd3 <br_ioctl_device+3/454>: sub $0x474,%esp 0xc07968bb <megadev_ioctl+3/c94>: sub $0x394,%esp * 0xc068ff5b <ray_dev_ioctl+3/8b8>: sub $0x2e0,%esp 0xc0583a63 <netdev_ethtool_ioctl+3/e64>: sub $0x2c8,%esp 0xc0865963 <ds_ioctl+3/6b0>: sub $0x2ac,%esp 0xc04370c3 <vt_ioctl+3/1a84>: sub $0x29c,%esp 0xc054c5eb <e1000_ethtool_ioctl+3/6f8>: sub $0x290,%esp 0xc0299b23 <ncp_ioctl+3/13b0>: sub $0x278,%esp 0xc0532e67 <DAC960_UserIOCTL+3/17c0>: sub $0x230,%esp Jörn -- Optimizations always bust things, because all optimizations are, in the long haul, a form of cheating, and cheaters eventually get caught. -- Larry Wall ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: reducing stack usage in v4l? 2003-03-05 15:34 ` Randy.Dunlap 2003-03-05 21:40 ` Jörn Engel @ 2003-03-11 4:27 ` Randy.Dunlap 2003-03-11 9:19 ` Gerd Knorr 1 sibling, 1 reply; 12+ messages in thread From: Randy.Dunlap @ 2003-03-11 4:27 UTC (permalink / raw) To: rddunlap; +Cc: rmk, kraxel, linux-kernel > On Wed, 5 Mar 2003 09:35:34 +0000 > Russell King <rmk@arm.linux.org.uk> wrote: > > | On Wed, Mar 05, 2003 at 10:15:52AM +0100, Gerd Knorr wrote: > | > But when looking at the disasm output it is obvious that it isn't true | >> (at least with gcc 3.2). On the other hand it is common practice in | > > many drivers, there must be a reason for that, no? Any chance this | > used > to work with older gcc versions? > | > | I don't believe so - I seem to remember looking at gcc 2.95 and finding | > the same annoying behaviour. AFAIK we have no reason to believe that this will work and we have counter-examples of it continuing to be a problem. > Yes, that's what I'm seeing with 2.96 as well. > > | > Not sure what is the best idea to fix that. Don't like the kmalloc | > > idea that much. The individual structs are not huge, the real problem | > > is that many of them are allocated and only few are needed. Any | > chance > to tell gcc that it should allocate block-local variables at | > the start > block not at the start of the function? Not that anyone has mentioned here. > | Not a particularly clean idea, but maybe creating a union of the | > structures and putting that on the stack? (ie, doing what GCC should | be > doing in the first place.) I looked into this. The various structures are used in a variety of combinations, so this method looks bad IMO. > That's an idea. Or make separate called functions for each ioctl and > declare the structures inside them? and this method looks like a possibility. Gerd, Andrew says that the kmalloc() time will be small compared to the memset()'s that the function is already doing. Do you want to do anything about this, or want me to, or want me to drop it? > -- > > Here are some v4l structure sizes from gcc 2.96 on a P4: > > sizeof v4l2_audio: 52 > sizeof v4l2_buffer: 68 > sizeof v4l2_capability: 104 > sizeof v4l2_fmtdesc: 64 > sizeof v4l2_format: 204 > sizeof v4l2_framebuffer: 44 > sizeof v4l2_frequency: 44 > sizeof v4l2_input: 76 > sizeof v4l2_queryctrl: 68 > sizeof v4l2_requestbuffers: 20 > sizeof v4l2_tuner: 84 > - And here's a summary of how they are used together: sizeof v4l2_audio: 52 + queryctrl:68 + tuner:84 / +tuner:84 sizeof v4l2_buffer: 68 + format:204 / alone sizeof v4l2_capability: 104 + framebuffer:44 sizeof v4l2_framebuffer: 44 + capability:104 / + format:204 sizeof v4l2_frequency: 44 alone sizeof v4l2_fmtdesc: 64 + format:204 sizeof v4l2_format: 204 + fmtdesc:64 / + buffer:68 sizeof v4l2_input: 76 alone / alone sizeof v4l2_queryctrl: 68 + audio:52 + tuner:84 sizeof v4l2_tuner: 84 alone / + audio:52 + queryctrl:68 / + audio:52 ~Randy ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: reducing stack usage in v4l? 2003-03-11 4:27 ` Randy.Dunlap @ 2003-03-11 9:19 ` Gerd Knorr 2003-03-11 16:29 ` Alan Cox 0 siblings, 1 reply; 12+ messages in thread From: Gerd Knorr @ 2003-03-11 9:19 UTC (permalink / raw) To: Randy.Dunlap; +Cc: rmk, linux-kernel > > That's an idea. Or make separate called functions for each ioctl and > > declare the structures inside them? > > and this method looks like a possibility. > > Gerd, Andrew says that the kmalloc() time will be small compared to > the memset()'s that the function is already doing. That is wrong, at least the 2k memset/call mentioned by Andrew. There are lots of memset() calls, but they all are within the case switches for the ioctls and zero out only the structs which are used in that code path, so it is actually much smaller (~50 -> ~300 bytes maybe, depending on the ioctl). > Do you want to do anything about this, or want me to, or want me > to drop it? It is on my todo list, just havn't found yet the time to do that. Probably I'll split it into smaller functions, this looks like the best way to me. Gerd ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: reducing stack usage in v4l? 2003-03-11 9:19 ` Gerd Knorr @ 2003-03-11 16:29 ` Alan Cox 2003-03-11 15:36 ` Daniel Jacobowitz 0 siblings, 1 reply; 12+ messages in thread From: Alan Cox @ 2003-03-11 16:29 UTC (permalink / raw) To: Gerd Knorr; +Cc: Randy.Dunlap, rmk, Linux Kernel Mailing List On Tue, 2003-03-11 at 09:19, Gerd Knorr wrote > That is wrong, at least the 2k memset/call mentioned by Andrew. There > are lots of memset() calls, but they all are within the case switches > for the ioctls and zero out only the structs which are used in that code > path, so it is actually much smaller (~50 -> ~300 bytes maybe, depending > on the ioctl). gcc sometimes does things like allocate all the objects in case statements at entry time. I assume its a performance win to do so. Alan ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: reducing stack usage in v4l? 2003-03-11 16:29 ` Alan Cox @ 2003-03-11 15:36 ` Daniel Jacobowitz 0 siblings, 0 replies; 12+ messages in thread From: Daniel Jacobowitz @ 2003-03-11 15:36 UTC (permalink / raw) To: Alan Cox; +Cc: Gerd Knorr, Randy.Dunlap, rmk, Linux Kernel Mailing List On Tue, Mar 11, 2003 at 04:29:30PM +0000, Alan Cox wrote: > On Tue, 2003-03-11 at 09:19, Gerd Knorr wrote > > > That is wrong, at least the 2k memset/call mentioned by Andrew. There > > are lots of memset() calls, but they all are within the case switches > > for the ioctls and zero out only the structs which are used in that code > > path, so it is actually much smaller (~50 -> ~300 bytes maybe, depending > > on the ioctl). > > gcc sometimes does things like allocate all the objects in case > statements at entry time. I assume its a performance win to do so. No, it's more likely a known GCC bug to do so. See PR middle-end/9997 if you're really curious. -- Daniel Jacobowitz MontaVista Software Debian GNU/Linux Developer ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: reducing stack usage in v4l? 2003-03-05 9:15 ` Gerd Knorr 2003-03-05 9:35 ` Russell King @ 2003-03-05 11:50 ` mdew 2003-03-05 12:28 ` Gerd Knorr 1 sibling, 1 reply; 12+ messages in thread From: mdew @ 2003-03-05 11:50 UTC (permalink / raw) To: Gerd Knorr; +Cc: Linux Kernel On the topic of v4l, Ive had complete lockups on recent 2.5.x kernels when loading a v4l application Zapping. I havent tried anyother v4l applications just yet, have you seen this in action? should i compile debuging support to show it? /var/log/messages currently doesnt show it. Running Debian Sid/Gcc 3.2.3 20030228 -- mdew <mdew@mdew.dyndns.org> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: reducing stack usage in v4l? 2003-03-05 11:50 ` mdew @ 2003-03-05 12:28 ` Gerd Knorr 0 siblings, 0 replies; 12+ messages in thread From: Gerd Knorr @ 2003-03-05 12:28 UTC (permalink / raw) To: linux-kernel mdew <mdew@mdew.dyndns.org> writes: > On the topic of v4l, Ive had complete lockups on recent 2.5.x kernels > when loading a v4l application Zapping. There are a few known issues, pushing updates to Linus is on my todo list. If it still happens with the latest bits from http://bytesex.org/snapshot/ I'd like to know. Gerd -- /join #zonenkinder ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2003-03-11 15:26 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2003-03-05 5:25 reducing stack usage in v4l? Randy.Dunlap 2003-03-05 5:43 ` Andrew Morton 2003-03-05 9:15 ` Gerd Knorr 2003-03-05 9:35 ` Russell King 2003-03-05 15:34 ` Randy.Dunlap 2003-03-05 21:40 ` Jörn Engel 2003-03-11 4:27 ` Randy.Dunlap 2003-03-11 9:19 ` Gerd Knorr 2003-03-11 16:29 ` Alan Cox 2003-03-11 15:36 ` Daniel Jacobowitz 2003-03-05 11:50 ` mdew 2003-03-05 12:28 ` Gerd Knorr
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox