* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback [not found] ` <20170606163401.GA27288@wotan.suse.de> @ 2017-06-06 17:52 ` Luis R. Rodriguez 0 siblings, 0 replies; 18+ messages in thread From: Luis R. Rodriguez @ 2017-06-06 17:52 UTC (permalink / raw) To: linux-fsdevel Cc: Martin Fuzzey, mcgrof, Matthew Wilcox, Alan Cox, Jonathan Corbet, Michael Kerrisk (man-pages), Linux API, David Howells, Dmitry Torokhov, Peter Zijlstra, Eric W. Biederman, Andy Lutomirski, Greg KH, Daniel Wagner, David Woodhouse, jewalt, rafal, Arend Van Spriel, Rafael J. Wysocki, Li, Yi, atull, Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro, Peter Jones, Hans de G oede, Ted Ts'o, linux-kernel@vger.kernel.org Used wrong alias for fsdevel now, its linux-fsdevel ... Luis On Tue, Jun 06, 2017 at 06:34:01PM +0200, Luis R. Rodriguez wrote: > Adding fsdevel for review on the correct semantics of handling signals on > write(), in this case a sysfs write which triggered a sync request firmware > call and what the firmware API should return in such case of a signal (I gather > this should be -EINTR and not -ERESTARTSYS). Also whether or not SIGINT should > be followed or if only allowing SIGKILL is fine (fine by me, but it would > change old behaviour). > > Hoping between fsdevel and linux-api folks we can hash this out. > > On Tue, Jun 06, 2017 at 11:04:37AM +0200, Martin Fuzzey wrote: > > On 05/06/17 22:24, Luis R. Rodriguez wrote: > > > > > > > > > For these two reasons then it would seem best we do two things actually: > > > > > > 1) return -EINTR instead of -EAGAIN when we detect swait_event_interruptible_timeout() > > > got interrupted by a signal (it returns -ERESTARTSYS) > > > > > > I disagree. That would force userspace to handle the signal rather than > > having the kernel retry. > > > > From Documentation/DocBook/kernel-hacking.tmpl: > > > > After you slept you should check if a signal occurred: the > > Unix/Linux way of handling signals is to temporarily exit the > > system call with the <constant>-ERESTARTSYS</constant> error. The > > system call entry code will switch back to user context, process > > the signal handler and then your system call will be restarted > > (unless the user disabled that). So you should be prepared to > > process the restart, e.g. if you're in the middle of manipulating > > some data structure. > > This applies but you are missing my point that the LWN article [0] I referred > to also stated "Kernel code which uses interruptible sleeps must always check > to see whether it woke up as a result of a signal, and, if so, clean up > whatever it was doing and return -EINTR back to user space." -- I realize there > may be contradiction with above documentation -- this perhaps can be clarified > with fsdevel folks *but* regardless of that the same article notes Alan Cox > explains that "Unix tradition (and thus almost all applications) believe file > store writes to be non signal interruptible. It would not be safe or practical > to change that guarantee." So for this reason alone there does seem to be an > exemption to the above documentation worth noting for file store writes, and > the patch which you tested below *moves* the sysfs write op for firmware in > that direction by adding a new killable swait. > > [0] https://lwn.net/Articles/288056/ > > > > 2) Do as you note below and add wait_event_killable_timeout() > > > > Hum, I do think that would be better but, (please correct me if I'm wrong) > > the _killable_ variants only allow SIGKILL (and not SIGINT). > > That seems correct given a TASK_KILLABLE is also TASK_UNINTERRUPTIBLE. > > > 0cb64249ca "firmware_loader: abort request if wait_for_completion is > > interrupted" > > > > specifically mentrions ctrl-c (SIGINT) in the commit message so that would > > no longer work. > > Great point, but it *also* allowed SIGKILL, so I do feel the goal was also to > allow it to be killable. I'm afraid that patch probably did not get proper > review from sufficient folks and its worth now asking ourselves what we'd like > to do. I'm fine with letting go of SIGINT for firmware sysfs calls for the > sake of keeping with the long standing unix tradition on write, given we *still > have SIGKILL*. > > > Myself I think having to use kill -9 to interrupt firmware loading by a > > usespace helper is OK but others may disagree. > > Its why I added fsdevel as well. This is really a semantics and uapi question. > Between fsdevel and linux-api folks I would hope we can come to a sensible > resolution. > > > > I do not see why we could not introduce wait_event_killable_timeout() > > > and swait_event_killable_timeout() into -stables. > > > After seeing how simple it is to do so I tend to agree. Greg, Peter, > > > what are your thoughts ? > > > > > > Martin Fuzzey can you test this patch as an alternative to your issue ? > > > > > > diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c > > > index b9f907eedbf7..70fc42e5e0da 100644 > > > --- a/drivers/base/firmware_class.c > > > +++ b/drivers/base/firmware_class.c > > > @@ -131,7 +131,7 @@ static int __fw_state_wait_common(struct fw_state *fw_st, long timeout) > > > { > > > long ret; > > > - ret = swait_event_interruptible_timeout(fw_st->wq, > > > + ret = swait_event_killable_timeout(fw_st->wq, > > > __fw_state_is_done(READ_ONCE(fw_st->status)), > > > timeout); > > > if (ret != 0 && fw_st->status == FW_STATUS_ABORTED) > > > diff --git a/include/linux/swait.h b/include/linux/swait.h > > > index c1f9c62a8a50..9c5ca2898b2f 100644 > > > --- a/include/linux/swait.h > > > +++ b/include/linux/swait.h > > > @@ -169,4 +169,29 @@ do { \ > > > __ret; \ > > > }) > > > +#define __swait_event_killable(wq, condition) \ > > > + (void)___swait_event(wq, condition, TASK_KILLABLE, 0, schedule()) > > > + > > > +#define swait_event_killable(wq, condition) \ > > > +({ \ > > > + int __ret = 0; \ > > > + if (!(condition)) \ > > > + __ret = __swait_event_killable(wq, condition); \ > > > + __ret; \ > > > +}) > > > + > > > +#define __swait_event_killable_timeout(wq, condition, timeout) \ > > > + ___swait_event(wq, ___wait_cond_timeout(condition), \ > > > + TASK_INTERRUPTIBLE, timeout, \ > > > + __ret = schedule_timeout(__ret)) > > > + > > > > Should be TASK_KILLABLE above > > Oops yes sorry. > > > > +#define swait_event_killable_timeout(wq, condition, timeout) \ > > > +({ \ > > > + long __ret = timeout; \ > > > + if (!___wait_cond_timeout(condition)) \ > > > + __ret = __swait_event_killable_timeout(wq, \ > > > + condition, timeout); \ > > > + __ret; \ > > > +}) > > > + > > > #endif /* _LINUX_SWAIT_H */ > > > > > > Luis > > > > After replacing TASK_INTERRUPTIBLE with TASK_KILLABLE above it works for me. > > Great, thanks for testing. > > Luis > -- Luis Rodriguez, SUSE LINUX GmbH Maxfeldstrasse 5; D-90409 Nuernberg ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <1496760796.5682.48.camel@linux.intel.com>]
[parent not found: <20170606164734.GB27288@wotan.suse.de>]
* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback [not found] ` <20170606164734.GB27288@wotan.suse.de> @ 2017-06-06 17:54 ` Luis R. Rodriguez 2017-06-06 22:11 ` Theodore Ts'o 1 sibling, 0 replies; 18+ messages in thread From: Luis R. Rodriguez @ 2017-06-06 17:54 UTC (permalink / raw) To: linux-fsdevel Cc: Alan Cox, fsdevel, Luis R. Rodriguez, Stephen Boyd, Li, Yi, Dmitry Torokhov, Peter Zijlstra, Jonathan Corbet, Eric W. Biederman, Michael Kerrisk (man-pages), Andy Lutomirski, Greg KH, Fuzzey, Martin, Linux API, Daniel Wagner, David Woodhouse, jewalt, rafal, Arend Van Spriel, Rafael J. Wysocki, atull, Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro, David Howells, Peter Jones, Hans de G oede, Ted Ts'o, linux-kernel@vger.kernel.org Using the right linux-fsdevel this time also, this was the second reply. Luis On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote: > Adding fsdevel folks. > > On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote: > > > "Unix tradition (and thus almost all applications) believe file store > > > writes to > > > be non signal interruptible. It would not be safe or practical to > > > change that > > > guarantee." > > > > Yep everyone codes > > > > write(disk_file, "foo", 3); > > > > not while(..) blah around it. > > Thanks for the confirmation! That's a simple enough explanation. > > > > For these two reasons then it would seem best we do two things > > > actually: > > > > > > 1) return -EINTR instead of -EAGAIN when we detect > > > swait_event_interruptible_timeout() > > > got interrupted by a signal (it returns -ERESTARTSYS) > > > 2) Do as you note below and add wait_event_killable_timeout() > > > > Pedantic detail that I don't think affects you > > > > If you have completed a part of the I/O then you should return the byte > > processed count not EINTR, but -1,EINTR if no progress was made. > > You are right with some new exceptions and with regards to the future: > > The syfs loading interface for firmware currently goes through the > data file exposed on syfs, the respective write op firmware_data_write() > only checks for signals at the beginning. After that its a full one > swoop try to write if you are following the old tradition and are using > a buffer allocated by the firmware API. > > If you are using the relatively new request_firmware_into_buf() added > by Stephen Boyd which lets the driver provide the allocated buffer then > we have a loop in firmware_rw() which should be fixed to: > > 1) Check for signals > 2) Do what you noted above. > > Furthermore Yi Li over at Intel is adding some new API calls which would > re-use some of this for FPGA firmwares which are also very large, that > work should consider the above and fix appropriately as well. > > Luis > -- Luis Rodriguez, SUSE LINUX GmbH Maxfeldstrasse 5; D-90409 Nuernberg ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback [not found] ` <20170606164734.GB27288@wotan.suse.de> 2017-06-06 17:54 ` Luis R. Rodriguez @ 2017-06-06 22:11 ` Theodore Ts'o 2017-06-07 0:22 ` Luis R. Rodriguez 1 sibling, 1 reply; 18+ messages in thread From: Theodore Ts'o @ 2017-06-06 22:11 UTC (permalink / raw) To: Luis R. Rodriguez Cc: Alan Cox, linux-fsdevel, Stephen Boyd, Li, Yi, Dmitry Torokhov, Peter Zijlstra, Jonathan Corbet, Eric W. Biederman, Michael Kerrisk (man-pages), Andy Lutomirski, Greg KH, Fuzzey, Martin, Linux API, Daniel Wagner, David Woodhouse, jewalt, rafal, Arend Van Spriel, Rafael J. Wysocki, atull, Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro, David Howells, Peter Jones, Hans de G oede, linux-kernel@vger.kernel.org On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote: > On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote: > > Yep everyone codes > > > > write(disk_file, "foo", 3); > > > > not while(..) blah around it. In general I/O to tty devices and other character mode devices was where you definitely needed to check for EINTR/EAGAIN because that was the place where historically Unix systems would interrupt system calls --- e.g., a user typing control-Z, for example. And in general writes to file systems and block devices in *general* were never interrupted by signals, although that was always a non-portable assumption. So I've always subscribed to the "be liberal in what you accept, conservative in what you send" rule of thumb. Which is to say, any programs *I* write I'll in general always check for EINTR/EAGAIN and check for partial writes, but in general, as a kernel program I try to adhere to the long-standing Unix trandition for disk based files. This does beg the question about whether firmware devices are more like tty devices or block devices or files, though. If before signals never caused them to return EINTR/EAGAIN, then it's probably best to not break backwards compatbility. That being said, not that you also have the option of using -ERESTARTNOINTR (always restart the system call, regardless of how the sighandle flags were set), and -ERESTARTNOHAND (restart the system call always if there was no signal handler and the process was not killed), in addition to -ERESTARTSYS. So that might be another option that's fairly easy to implement or experiment with. - Ted ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback 2017-06-06 22:11 ` Theodore Ts'o @ 2017-06-07 0:22 ` Luis R. Rodriguez 2017-06-07 4:56 ` Andy Lutomirski 0 siblings, 1 reply; 18+ messages in thread From: Luis R. Rodriguez @ 2017-06-07 0:22 UTC (permalink / raw) To: Theodore Ts'o, Luis R. Rodriguez, Alan Cox, linux-fsdevel, Stephen Boyd, Li, Yi, Dmitry Torokhov, Peter Zijlstra, Jonathan Corbet, Eric W. Biederman, Michael Kerrisk (man-pages), Andy Lutomirski, Greg KH, Fuzzey, Martin, Linux API, Daniel Wagner, David Woodhouse, jewalt, rafal, Arend Van Spriel, Rafael J. Wysocki, atull, Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro, David Howells, Peter Jones, Hans de G oede, linux-kernel@vger.kernel.org On Tue, Jun 06, 2017 at 06:11:51PM -0400, Theodore Ts'o wrote: > On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote: > > On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote: > > > Yep everyone codes > > > > > > write(disk_file, "foo", 3); > > > > > > not while(..) blah around it. > > In general I/O to tty devices and other character mode devices was > where you definitely needed to check for EINTR/EAGAIN because that was > the place where historically Unix systems would interrupt system calls > --- e.g., a user typing control-Z, for example. > > And in general writes to file systems and block devices in *general* > were never interrupted by signals, although that was always a > non-portable assumption. > > So I've always subscribed to the "be liberal in what you accept, > conservative in what you send" rule of thumb. Which is to say, any > programs *I* write I'll in general always check for EINTR/EAGAIN and > check for partial writes, but in general, as a kernel program I try to > adhere to the long-standing Unix trandition for disk based files. OK so userspace should consider checking for EINTR/EAGAIN. On the kernel front though we we do *strive* to go by the old unix tradition -- there are exceptions though, of course. > This does beg the question about whether firmware devices are more > like tty devices or block devices or files, though. And here you raise the analogy to see if its *worthy* to break the old unix tradition, the tty example is a case that we seem to accept as reasonable for an exception, clearly. This help, thanks! So, "firmware devices" is a misnomer, the firmware API does direct FS lookup and only if that fails and the distribution enabled the fallback mechanism will it be used if the file was not found on /lib/firmware/ paths. So the syfs interface we are evaluating here is this fallback mechanism. The "firmware device" then is really more a user of the firmware API, and these do vary widely. In fact it recent developments with the "driver data API" generalize the interface given things outside our typical device drivers for what we know as old "firmware" are looking to use the firmware API for looking for files from userspace. This in fact already has happened, it was just subtle: we look for default EEPROMs, configururation files, microcode updates, and soon we'll want to replace the userspace wireless CRDA tool for the regulatory database with a simple file lookup once we get firmware signing upstream. So -- more and more the API is being used for general file lookups. Which type of drivers request these files ? It all varies across the board, and we have really early things like CPU microcode files (so we enable built-in files), regular old firmware files for things like networking drivers, subsystem configuration stuff (for the wireless regulatory database), and it seems now *huge* (100s of MiBs I'm told) for remote-proc and FPGAs. The FPGA use case is one use case where having the uploader use a while loop makes more sense. Drivers may also want to have more fine control to the buffer where the "driver data" gets stashed into, so the API request_firmware_into_buf() was added for this purpose. The FPGA devices seem to want to use this breakdown mechanism as well. Although the remote-proc folks seem to be relying on the sysfs interface as a default mechanism. So it would seem to make sense to support the loop thing since some files these days *can* be really big. I think we can easily address this by relying on the killable signal (SIGKILL), rather than capturing SIGINT (CTRL-C). One problem is we had supported SIGINT before, it was reported however we never gave back to userspace the fact that a signal was captured, we always returned EAGAIN, so usersace did not know WTF was going on. The reporter wanted to detect this and wanted to get us to return the same value the wait call passed to the firmware API, -ERESTARTSYS. Hence this patch, however since -ERESTARTSYS is special and folks may just restart the syscall always when this happens, one question is if we should return EINTR instead of today's EAGAIN. > If before signals > never caused them to return EINTR/EAGAIN, then it's probably best to > not break backwards compatbility. There was a time where did not have signal, and always just used to return -ENOMEM on failure. After signals support was added we still were returning -ENOMEM for a while. It was only later that we started capturing the actual signal, however this was masked as EAGAIN, for no precise good reason, I suppose just lack of review. The patch in question wants us to return -ERESTARTSYS so that userspace can restart the upload. But the issue was that the sysfs firmware loader was killed by a signal as well, so this is why Dmitry noted that we can just fix this issue by just making the wait killable with SIGKILL only. The swait killable patch I supplied upon Dmitry's suggestion as a replacement would do away with capturing SIGINT but allow SIGKILL then. Since EAGAIN was always returned even if a signal was captured I'm inclined to take the position we really never gave userspace a proper clue about the signal. Additionally I cannot see why userspace would rely on SIGINT working *only* but not *SIGKILL*. The risk here I think is if userspace ever *did* rely on SIGINT for the sysfs interface as userspace functional API. If this is not worth breaking then I suppose we are stuck with the current wait. If we do that I'd say we return EINTR, based on what I have read so far. If doing away with SIGINT but keeping SIGKILL is rather sane then I'd go with the approach of the new swait killable + returning -EINTR. > That being said, note that you also have the option of using > -ERESTARTNOINTR (always restart the system call, regardless of how the > sighandle flags were set), and -ERESTARTNOHAND (restart the system > call always if there was no signal handler and the process was not > killed), in addition to -ERESTARTSYS. We rely on swait, and swait right now only uses -ERESTARTSYS. Are you saying we could mask out -ERESTARTSYS and map it to -ERESTARTNOINTR or -ERESTARTNOHAND if we see fit for some future functionality / need ? > So that might be another option > that's fairly easy to implement or experiment with. Thanks! Luis ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback 2017-06-07 0:22 ` Luis R. Rodriguez @ 2017-06-07 4:56 ` Andy Lutomirski 2017-06-07 6:25 ` Dmitry Torokhov 0 siblings, 1 reply; 18+ messages in thread From: Andy Lutomirski @ 2017-06-07 4:56 UTC (permalink / raw) To: Luis R. Rodriguez Cc: Theodore Ts'o, Alan Cox, Linux FS Devel, Stephen Boyd, Li, Yi, Dmitry Torokhov, Peter Zijlstra, Jonathan Corbet, Eric W. Biederman, Michael Kerrisk (man-pages), Andy Lutomirski, Greg KH, Fuzzey, Martin, Linux API, Daniel Wagner, David Woodhouse, jewalt, rafal, Arend Van Spriel, Rafael J. Wysocki, atull, Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro, David Howells, Peter Jones, Hans de G oede, linux-kernel@vger.kernel.org On Tue, Jun 6, 2017 at 5:22 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote: > On Tue, Jun 06, 2017 at 06:11:51PM -0400, Theodore Ts'o wrote: >> On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote: >> > On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote: > > We rely on swait, and swait right now only uses -ERESTARTSYS. Are > you saying we could mask out -ERESTARTSYS and map it to -ERESTARTNOINTR > or -ERESTARTNOHAND if we see fit for some future functionality / need ? I think that has essentially nothing to do with swait. User code does some syscall. That syscall triggers a firmware load. The caller gets a signal. If you're going to let firmware load get interrupted, you need to consider what the syscall is. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback 2017-06-07 4:56 ` Andy Lutomirski @ 2017-06-07 6:25 ` Dmitry Torokhov 2017-06-07 12:25 ` Alan Cox 2017-06-09 1:14 ` Andy Lutomirski 0 siblings, 2 replies; 18+ messages in thread From: Dmitry Torokhov @ 2017-06-07 6:25 UTC (permalink / raw) To: Andy Lutomirski Cc: Luis R. Rodriguez, Theodore Ts'o, Alan Cox, Linux FS Devel, Stephen Boyd, Li, Yi, Peter Zijlstra, Jonathan Corbet, Eric W. Biederman, Michael Kerrisk (man-pages), Greg KH, Fuzzey, Martin, Linux API, Daniel Wagner, David Woodhouse, jewalt, rafal, Arend Van Spriel, Rafael J. Wysocki, atull, Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro, David Howells, Peter Jones, Hans de G oede, linux-kernel@vger.kernel.org On Tue, Jun 06, 2017 at 09:56:47PM -0700, Andy Lutomirski wrote: > On Tue, Jun 6, 2017 at 5:22 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote: > > On Tue, Jun 06, 2017 at 06:11:51PM -0400, Theodore Ts'o wrote: > >> On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote: > >> > On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote: > > > > We rely on swait, and swait right now only uses -ERESTARTSYS. Are > > you saying we could mask out -ERESTARTSYS and map it to -ERESTARTNOINTR > > or -ERESTARTNOHAND if we see fit for some future functionality / need ? > > I think that has essentially nothing to do with swait. User code does > some syscall. That syscall triggers a firmware load. The caller gets > a signal. If you're going to let firmware load get interrupted, you > need to consider what the syscall is. I think it is way too complicated and I do not think driver writers will stand a chance of implementing this correctly, given that often firmware load might be triggered indirectly and by multitude of syscalls. What's wrong with saying that the only way to interrupt firmware loading is to kill the process? So ctrl-c will no longer interrupt it, but I do not think that ease of aborting firmware update is primary goal here. I consider simple is good here. Thanks. -- Dmitry ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback 2017-06-07 6:25 ` Dmitry Torokhov @ 2017-06-07 12:25 ` Alan Cox 2017-06-07 17:15 ` Luis R. Rodriguez 2017-06-09 1:14 ` Andy Lutomirski 1 sibling, 1 reply; 18+ messages in thread From: Alan Cox @ 2017-06-07 12:25 UTC (permalink / raw) To: Dmitry Torokhov, Andy Lutomirski Cc: Luis R. Rodriguez, Theodore Ts'o, Linux FS Devel, Stephen Boyd, Li, Yi, Peter Zijlstra, Jonathan Corbet, Eric W. Biederman, Michael Kerrisk (man-pages), Greg KH, Fuzzey, Martin, Linux API, Daniel Wagner, David Woodhouse, jewalt, rafal, Arend Van Spriel, Rafael J. Wysocki, atull, Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro, David Howells, Peter Jones, Hans de G oede, linux-kernel@vger.kernel.org > What's wrong with saying that the only way to interrupt firmware > loading is to kill the process? So ctrl-c will no longer interrupt > it, but I do not think that ease of aborting firmware update is > primary goal here. I consider simple is good here. Agreed 100%. The user process did not ask for firmware load, it asked for an I/O operation. Semantically it should appear as if someone else did the firmware load and it just had to wait for it to happen. Alan ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback 2017-06-07 12:25 ` Alan Cox @ 2017-06-07 17:15 ` Luis R. Rodriguez 0 siblings, 0 replies; 18+ messages in thread From: Luis R. Rodriguez @ 2017-06-07 17:15 UTC (permalink / raw) To: Alan Cox Cc: Dmitry Torokhov, Andy Lutomirski, Luis R. Rodriguez, Theodore Ts'o, Linux FS Devel, Stephen Boyd, Li, Yi, Peter Zijlstra, Jonathan Corbet, Eric W. Biederman, Michael Kerrisk (man-pages), Greg KH, Fuzzey, Martin, Linux API, Daniel Wagner, David Woodhouse, jewalt, rafal, Arend Van Spriel, Rafael J. Wysocki, Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro, David Howells, Peter Jones, Hans de G oede, linux-kernel@vger.kernel.org On Wed, Jun 07, 2017 at 01:25:51PM +0100, Alan Cox wrote: > > What's wrong with saying that the only way to interrupt firmware > > loading is to kill the process? So ctrl-c will no longer interrupt > > it, but I do not think that ease of aborting firmware update is > > primary goal here. I consider simple is good here. > > Agreed 100%. The user process did not ask for firmware load, it asked > for an I/O operation. Semantically it should appear as if someone else > did the firmware load and it just had to wait for it to happen. Fine by me ! Will wrap up the patch for the new killable swait then. I suppose noting it as a stable fix is worth it given the known issues with for example Android killing loaders unexpectedly. Unless I hear otherwise I'll also provide a follow up to return -EINTR instead of -EAGAIN if swait returned -ERESTARTSYS, this way at least userspace could tell a signal was definitely received. I *don't* think that follow up is required for stable though. Luis ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback 2017-06-07 6:25 ` Dmitry Torokhov 2017-06-07 12:25 ` Alan Cox @ 2017-06-09 1:14 ` Andy Lutomirski 2017-06-09 1:33 ` Luis R. Rodriguez 1 sibling, 1 reply; 18+ messages in thread From: Andy Lutomirski @ 2017-06-09 1:14 UTC (permalink / raw) To: Dmitry Torokhov Cc: Andy Lutomirski, Luis R. Rodriguez, Theodore Ts'o, Alan Cox, Linux FS Devel, Stephen Boyd, Li, Yi, Peter Zijlstra, Jonathan Corbet, Eric W. Biederman, Michael Kerrisk (man-pages), Greg KH, Fuzzey, Martin, Linux API, Daniel Wagner, David Woodhouse, jewalt, rafal, Arend Van Spriel, Rafael J. Wysocki, atull, Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro, David Howells, Peter Jones, Hans de G oede, linux-kernel@vger.kernel.org On Tue, Jun 6, 2017 at 11:25 PM, Dmitry Torokhov <dmitry.torokhov@gmail.com> wrote: > On Tue, Jun 06, 2017 at 09:56:47PM -0700, Andy Lutomirski wrote: >> On Tue, Jun 6, 2017 at 5:22 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote: >> > On Tue, Jun 06, 2017 at 06:11:51PM -0400, Theodore Ts'o wrote: >> >> On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote: >> >> > On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote: >> > >> > We rely on swait, and swait right now only uses -ERESTARTSYS. Are >> > you saying we could mask out -ERESTARTSYS and map it to -ERESTARTNOINTR >> > or -ERESTARTNOHAND if we see fit for some future functionality / need ? >> >> I think that has essentially nothing to do with swait. User code does >> some syscall. That syscall triggers a firmware load. The caller gets >> a signal. If you're going to let firmware load get interrupted, you >> need to consider what the syscall is. > > I think it is way too complicated and I do not think driver writers will > stand a chance of implementing this correctly, given that often firmware > load might be triggered indirectly and by multitude of syscalls. > That's what I meant, but I said it unclearly. I meant that, if we're going to start allowing interruption, we would need to audit all the callers. Ugh. I suppose we could have request_firmware_interruptable(), but that seems like it's barely worth it. --Andy ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback 2017-06-09 1:14 ` Andy Lutomirski @ 2017-06-09 1:33 ` Luis R. Rodriguez 2017-06-09 21:29 ` Luis R. Rodriguez 0 siblings, 1 reply; 18+ messages in thread From: Luis R. Rodriguez @ 2017-06-09 1:33 UTC (permalink / raw) To: Andy Lutomirski Cc: Dmitry Torokhov, Theodore Ts'o, Alan Cox, Linux FS Devel, Stephen Boyd, Li, Yi, Peter Zijlstra, Jonathan Corbet, Eric W. Biederman, Michael Kerrisk (man-pages), Greg KH, Fuzzey, Martin, Linux API, Daniel Wagner, David Woodhouse, jewalt, rafal, Arend Van Spriel, Rafael J. Wysocki, atull, Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro, David Howells, Peter Jones, Hans de G oede, linux-kernel@vger.kernel.org, Julia Lawall, Luis R. Rodriguez On Thu, Jun 8, 2017 at 6:14 PM, Andy Lutomirski <luto@kernel.org> wrote: > On Tue, Jun 6, 2017 at 11:25 PM, Dmitry Torokhov > <dmitry.torokhov@gmail.com> wrote: >> On Tue, Jun 06, 2017 at 09:56:47PM -0700, Andy Lutomirski wrote: >>> On Tue, Jun 6, 2017 at 5:22 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote: >>> > On Tue, Jun 06, 2017 at 06:11:51PM -0400, Theodore Ts'o wrote: >>> >> On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote: >>> >> > On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote: >>> > >>> > We rely on swait, and swait right now only uses -ERESTARTSYS. Are >>> > you saying we could mask out -ERESTARTSYS and map it to -ERESTARTNOINTR >>> > or -ERESTARTNOHAND if we see fit for some future functionality / need ? >>> >>> I think that has essentially nothing to do with swait. User code does >>> some syscall. That syscall triggers a firmware load. The caller gets >>> a signal. If you're going to let firmware load get interrupted, you >>> need to consider what the syscall is. >> >> I think it is way too complicated and I do not think driver writers will >> stand a chance of implementing this correctly, given that often firmware >> load might be triggered indirectly and by multitude of syscalls. >> > > That's what I meant, but I said it unclearly. I meant that, if we're > going to start allowing interruption, we would need to audit all the > callers. Ugh. There are actually two audits worth evaluating if what we've concluded is fair game: a) firmware sync calls on interruptible paths b) use of swait / old interruptible waits on sysfs paths As for a) we drove tons of code away from using sync, request firmware, and on async firmware the return value is lost, we only can currently know if a failure of some sort occurred. If that push to async failed we also scared away tons of drivers to use request_firmware() calls on init and later incorrectly on probe due to serialized init + probe delay on boot. From what I recall my last Coccinelle evil-monster-hunt, I did indeed find quite a bit of drivers still still relying on sync firmware which ultimately revealed use on a probe path. The signal however was only effective as of commit 0cb64249ca500 ("firmware_loader: abort request if wait_for_completion is interrupted") merged as of v4.0. Creating awareness of the issue seems fair, but I don't think its worth a huge fly swatter. I have no information to contribute for b) other than I was reluctant to even consider it. > I suppose we could have request_firmware_interruptable(), but that > seems like it's barely worth it. I don't think that's worth it, given the signal was effective only as of v4.0, we already had a big push away from sync requests, and also had the "don't use request firmware" on init scare which also propagated to probe later. Luis ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback 2017-06-09 1:33 ` Luis R. Rodriguez @ 2017-06-09 21:29 ` Luis R. Rodriguez 0 siblings, 0 replies; 18+ messages in thread From: Luis R. Rodriguez @ 2017-06-09 21:29 UTC (permalink / raw) To: Andy Lutomirski Cc: Dmitry Torokhov, Theodore Ts'o, Alan Cox, Linux FS Devel, Stephen Boyd, Li, Yi, Peter Zijlstra, Jonathan Corbet, Eric W. Biederman, Michael Kerrisk (man-pages), Greg KH, Fuzzey, Martin, Linux API, Daniel Wagner, David Woodhouse, jewalt, rafal, Arend Van Spriel, Rafael J. Wysocki, Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro, David Howells, Peter Jones, Hans de G oede, linux-kernel@vger.kernel.org, Julia Lawall, Luis R. Rodriguez, atull On Thu, Jun 8, 2017 at 6:33 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote: > On Thu, Jun 8, 2017 at 6:14 PM, Andy Lutomirski <luto@kernel.org> wrote: >> That's what I meant, but I said it unclearly. I meant that, if we're >> going to start allowing interruption, we would need to audit all the >> callers. Ugh. > > There are actually two audits worth evaluating if what we've concluded > is fair game: > > a) firmware sync calls on interruptible paths > b) use of swait / old interruptible waits on sysfs paths And as I noted in the other thread -- another possible issue could be any swait / interruptable wait on init or probe. Provided any child completes and the kernel code for wait handler does abort, that request would be terminated. This could for instance happen at bootup as modules load and any child from the loader terminates. We already have Coccinelle grammar to hunt for "though shall not request firmware on init or probe", such SmPL grammar could be in turn be repruposed to hunt for these types of conditions. Luis ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback [not found] ` <CANh8QzwPb_+RKs5QVt7mdFk8h_rOMVS3j9m0OADgvzBtNqBBLg@mail.gmail.com> [not found] ` <87fufr3mdy.fsf@xmission.com> @ 2017-06-07 17:08 ` Luis R. Rodriguez 2017-06-07 17:54 ` Martin Fuzzey 1 sibling, 1 reply; 18+ messages in thread From: Luis R. Rodriguez @ 2017-06-07 17:08 UTC (permalink / raw) To: Fuzzey, Martin, linux-fsdevel, Alan Cox, Ted Ts'o, Andy Lutomirski, Dmitry Torokhov Cc: Luis R. Rodriguez, Michael Kerrisk (man-pages), Linux API, Peter Zijlstra, Greg KH, Daniel Wagner, David Woodhouse, jewalt, rafal, Arend Van Spriel, Rafael J. Wysocki, Li, Yi, atull, Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro, David Howells, Peter Jones, Hans de Goede, linux-kernel@vger.kernel.org On Thu, May 25, 2017 at 10:28:38AM +0200, Fuzzey, Martin wrote: > 1) Android init calls write() on the sysfs file > 2) The sysfs .store() callback registered by a driver is called > 3) The driver calls request_firmware() > 4) request_firmware() sends the firmware load request to userspace and > calls wait_for_completion_interruptible() Martin, just for completeness on documenting on the commit log of the next swait proposed fix for this -- what signal did the process get from which you note the child dies below ? Exactly what in Android sent this signal ? > 5) A child dies and raises SIGCHLD > 6) wait_for_completion_interruptible() returns -ERESTARTSYS due to the signal Luis ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback 2017-06-07 17:08 ` Luis R. Rodriguez @ 2017-06-07 17:54 ` Martin Fuzzey 2017-06-09 1:10 ` Luis R. Rodriguez 0 siblings, 1 reply; 18+ messages in thread From: Martin Fuzzey @ 2017-06-07 17:54 UTC (permalink / raw) To: Luis R. Rodriguez, linux-fsdevel, Alan Cox, Ted Ts'o, Andy Lutomirski, Dmitry Torokhov Cc: Michael Kerrisk (man-pages), Linux API, Peter Zijlstra, Greg KH, Daniel Wagner, David Woodhouse, jewalt, rafal, Arend Van Spriel, Rafael J. Wysocki, Li, Yi, atull, Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro, David Howells, Peter Jones, Hans de Goede, linux-kernel@vger.kernel.org On 07/06/17 19:08, Luis R. Rodriguez wrote: > On Thu, May 25, 2017 at 10:28:38AM +0200, Fuzzey, Martin wrote: >> 1) Android init calls write() on the sysfs file >> 2) The sysfs .store() callback registered by a driver is called >> 3) The driver calls request_firmware() >> 4) request_firmware() sends the firmware load request to userspace and >> calls wait_for_completion_interruptible() > Martin, just for completeness on documenting on the commit log of the next > swait proposed fix for this -- what signal did the process get from which you > note the child dies below ? Exactly what in Android sent this signal ? Android didn't send the signal, the kernel did (SIGCHLD). Like this: 1) Android init (pid=1) fork()s (say pid=42) [this child process is totally unrelated to firmware loading] 2) Android init (pid=1) does a write() on a (driver custom) sysfs file which ends up calling request_firmware() kernel side 3) The firmware loading fallback mechanism is used, the request is sent to userspace and pid 1 waits in the kernel on wait_* 4) before firmware loading completes pid 42 dies (for any reason - in my case normal termination) 5) Kernel delivers SIGCHLD to pid=1 to tell it a child has died, which causes -ERESTARTSYS to be returned from wait_* Martin ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback 2017-06-07 17:54 ` Martin Fuzzey @ 2017-06-09 1:10 ` Luis R. Rodriguez 2017-06-09 1:57 ` Luis R. Rodriguez 0 siblings, 1 reply; 18+ messages in thread From: Luis R. Rodriguez @ 2017-06-09 1:10 UTC (permalink / raw) To: Martin Fuzzey Cc: linux-fsdevel, Alan Cox, Ted Ts'o, Andy Lutomirski, Dmitry Torokhov, Michael Kerrisk (man-pages), Linux API, Peter Zijlstra, Greg KH, Daniel Wagner, David Woodhouse, jewalt, rafal, Arend Van Spriel, Rafael J. Wysocki, Li, Yi, atull, Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro, David Howells, Peter Jones, Hans de Goede, linux-kernel@vger.kernel.org On Wed, Jun 7, 2017 at 10:54 AM, Martin Fuzzey <mfuzzey@parkeon.com> wrote: > On 07/06/17 19:08, Luis R. Rodriguez wrote: >> >> On Thu, May 25, 2017 at 10:28:38AM +0200, Fuzzey, Martin wrote: >>> >>> 1) Android init calls write() on the sysfs file >>> 2) The sysfs .store() callback registered by a driver is called >>> 3) The driver calls request_firmware() >>> 4) request_firmware() sends the firmware load request to userspace and >>> calls wait_for_completion_interruptible() >> >> Martin, just for completeness on documenting on the commit log of the next >> swait proposed fix for this -- what signal did the process get from which >> you >> note the child dies below ? Exactly what in Android sent this signal ? > > > Android didn't send the signal, the kernel did (SIGCHLD). > > Like this: > > 1) Android init (pid=1) fork()s (say pid=42) [this child process is totally > unrelated to firmware loading] > 2) Android init (pid=1) does a write() on a (driver custom) sysfs file which > ends up calling request_firmware() kernel side > 3) The firmware loading fallback mechanism is used, the request is sent to > userspace and pid 1 waits in the kernel on wait_* > 4) before firmware loading completes pid 42 dies (for any reason - in my > case normal termination) Interesting, could one interpretation here be that the process successfully finishing + the signal being sent beats out the timing of the firmware_class syfs code detecting that the write completed ? > 5) Kernel delivers SIGCHLD to pid=1 to tell it a child has died, which > causes -ERESTARTSYS to be returned from wait_* Luis ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback 2017-06-09 1:10 ` Luis R. Rodriguez @ 2017-06-09 1:57 ` Luis R. Rodriguez 2017-06-09 7:40 ` Martin Fuzzey 0 siblings, 1 reply; 18+ messages in thread From: Luis R. Rodriguez @ 2017-06-09 1:57 UTC (permalink / raw) To: Martin Fuzzey Cc: Linux FS Devel, Alan Cox, Ted Ts'o, Andy Lutomirski, Dmitry Torokhov, Michael Kerrisk (man-pages), Linux API, Peter Zijlstra, Greg KH, Daniel Wagner, David Woodhouse, jewalt, rafal, Arend Van Spriel, Rafael J. Wysocki, Li, Yi, atull, Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro, David Howells, Peter Jones, Hans de Goede, linux-kernel@vger.kernel.org, Luis R. Rodriguez On Thu, Jun 8, 2017 at 6:10 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote: > On Wed, Jun 7, 2017 at 10:54 AM, Martin Fuzzey <mfuzzey@parkeon.com> wrote: >> On 07/06/17 19:08, Luis R. Rodriguez wrote: >>> >>> On Thu, May 25, 2017 at 10:28:38AM +0200, Fuzzey, Martin wrote: >>>> >>>> 1) Android init calls write() on the sysfs file >>>> 2) The sysfs .store() callback registered by a driver is called >>>> 3) The driver calls request_firmware() >>>> 4) request_firmware() sends the firmware load request to userspace and >>>> calls wait_for_completion_interruptible() >>> >>> Martin, just for completeness on documenting on the commit log of the next >>> swait proposed fix for this -- what signal did the process get from which >>> you >>> note the child dies below ? Exactly what in Android sent this signal ? >> >> >> Android didn't send the signal, the kernel did (SIGCHLD). >> >> Like this: >> >> 1) Android init (pid=1) fork()s (say pid=42) [this child process is totally >> unrelated to firmware loading] >> 2) Android init (pid=1) does a write() on a (driver custom) sysfs file which >> ends up calling request_firmware() kernel side >> 3) The firmware loading fallback mechanism is used, the request is sent to >> userspace and pid 1 waits in the kernel on wait_* >> 4) before firmware loading completes pid 42 dies (for any reason - in my >> case normal termination) Martin just to be clear, by "normal case termination" do you mean completing successfully ?? Ie the firmware actually did make it onto the device ? > Interesting, could one interpretation here be that the process > successfully finishing + the signal being sent beats out the timing of > the firmware_class syfs code detecting that the write completed ? > >> 5) Kernel delivers SIGCHLD to pid=1 to tell it a child has died, which >> causes -ERESTARTSYS to be returned from wait_* > > Luis ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback 2017-06-09 1:57 ` Luis R. Rodriguez @ 2017-06-09 7:40 ` Martin Fuzzey 2017-06-09 21:12 ` Luis R. Rodriguez 2017-06-09 22:55 ` Luis R. Rodriguez 0 siblings, 2 replies; 18+ messages in thread From: Martin Fuzzey @ 2017-06-09 7:40 UTC (permalink / raw) To: Luis R. Rodriguez Cc: Linux FS Devel, Alan Cox, Ted Ts'o, Andy Lutomirski, Dmitry Torokhov, Michael Kerrisk (man-pages), Linux API, Peter Zijlstra, Greg KH, Daniel Wagner, David Woodhouse, jewalt, rafal, Arend Van Spriel, Rafael J. Wysocki, Li, Yi, atull, Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro, David Howells, Peter Jones, Hans de Goede, linux-kernel@vger.kernel.org On 09/06/17 03:57, Luis R. Rodriguez wrote: > On Thu, Jun 8, 2017 at 6:10 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote: >>> Android didn't send the signal, the kernel did (SIGCHLD). >>> >>> Like this: >>> >>> 1) Android init (pid=1) fork()s (say pid=42) [this child process is totally >>> unrelated to firmware loading] >>> 2) Android init (pid=1) does a write() on a (driver custom) sysfs file which >>> ends up calling request_firmware() kernel side >>> 3) The firmware loading fallback mechanism is used, the request is sent to >>> userspace and pid 1 waits in the kernel on wait_* >>> 4) before firmware loading completes pid 42 dies (for any reason - in my >>> case normal termination) > Martin just to be clear, by "normal case termination" do you mean > completing successfully ?? Ie the firmware actually did make it onto > the device ? The firmware did *not* make it onto the device since the request_firmware() call returned an error (the code that would have transfered it to the device is only executed following a successful request_firmware) The process that terminates normally is unrelated to firmware loading as I said above. The only things that matter are: - It is a child process of the process that calls request_firmware() - It terminates *while* the the wait_ is still in progress Here is a way of reproducing the problem using the test_firmware module (which I only just saw) on normal linux with no Android or custom driver #!/bin/sh set -e # Make sure the system firmware loader doesn't get in the way /etc/init.d/udev stop modprobe test_firmware DIR=/sys/devices/virtual/misc/test_firmware echo 10 >/sys/class/firmware/timeout; sleep 2 & echo -n "/some/non/existing/file.bin" > "$DIR"/trigger_request; If run with the "sleep 2 &" it terminates after 2 seconds If the sleep is commented it runs for the expected 10 seconds (the firmware loading timeout) Since the sleep process is a child of the script process requesting a firmware load its death causes a SIGCHLD causing request_firmware() to abort prematurely. Martin ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback 2017-06-09 7:40 ` Martin Fuzzey @ 2017-06-09 21:12 ` Luis R. Rodriguez 2017-06-09 22:55 ` Luis R. Rodriguez 1 sibling, 0 replies; 18+ messages in thread From: Luis R. Rodriguez @ 2017-06-09 21:12 UTC (permalink / raw) To: Martin Fuzzey Cc: Luis R. Rodriguez, Linux FS Devel, Alan Cox, Ted Ts'o, Andy Lutomirski, Dmitry Torokhov, Michael Kerrisk (man-pages), Linux API, Peter Zijlstra, Greg KH, Daniel Wagner, David Woodhouse, jewalt, rafal, Arend Van Spriel, Rafael J. Wysocki, Li, Yi, atull, Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro, David Howells, Peter Jones, Hans de Goede, linux-kernel@vger.kernel.org On Fri, Jun 09, 2017 at 09:40:47AM +0200, Martin Fuzzey wrote: > On 09/06/17 03:57, Luis R. Rodriguez wrote: > > On Thu, Jun 8, 2017 at 6:10 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote: > > > > Android didn't send the signal, the kernel did (SIGCHLD). > > > > > > > > Like this: > > > > > > > > 1) Android init (pid=1) fork()s (say pid=42) [this child process is totally > > > > unrelated to firmware loading] > > > > 2) Android init (pid=1) does a write() on a (driver custom) sysfs file which > > > > ends up calling request_firmware() kernel side > > > > 3) The firmware loading fallback mechanism is used, the request is sent to > > > > userspace and pid 1 waits in the kernel on wait_* > > > > 4) before firmware loading completes pid 42 dies (for any reason - in my > > > > case normal termination) > > Martin just to be clear, by "normal case termination" do you mean > > completing successfully ?? Ie the firmware actually did make it onto > > the device ? > > The firmware did *not* make it onto the device since the request_firmware() > call returned an error > (the code that would have transfered it to the device is only executed > following a successful request_firmware) > > The process that terminates normally is unrelated to firmware loading as I > said above. > > The only things that matter are: > - It is a child process of the process that calls request_firmware() > - It terminates *while* the the wait_ is still in progress > > > Here is a way of reproducing the problem using the test_firmware module > (which I only just saw) on normal linux with no Android or custom driver > > > #!/bin/sh > set -e > > # Make sure the system firmware loader doesn't get in the way > /etc/init.d/udev stop > > modprobe test_firmware > > DIR=/sys/devices/virtual/misc/test_firmware > > echo 10 >/sys/class/firmware/timeout; > sleep 2 & > echo -n "/some/non/existing/file.bin" > "$DIR"/trigger_request; > > > > If run with the "sleep 2 &" it terminates after 2 seconds > If the sleep is commented it runs for the expected 10 seconds (the firmware > loading timeout) > > Since the sleep process is a child of the script process requesting a > firmware load its death causes a SIGCHLD causing request_firmware() to abort > prematurely. Thanks this could mean we also *should* trigger a failure if init is issuing modprobe on a series of drivers and one completes before another while request_firmware() is called on init or probe on a subsequent driver. If true I'm surprised this never was reported back when the fallback mechanism was popular, I suppose it was not an issue given most firmware *was* present on /lib/firmware/ and the direct filesystem lookup first step always found the firmware first, so this would only be an issue for folks relying on the fallback mechanism exclusively. Will include a test case based on your above script. Thanks! Luis ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback 2017-06-09 7:40 ` Martin Fuzzey 2017-06-09 21:12 ` Luis R. Rodriguez @ 2017-06-09 22:55 ` Luis R. Rodriguez 1 sibling, 0 replies; 18+ messages in thread From: Luis R. Rodriguez @ 2017-06-09 22:55 UTC (permalink / raw) To: Martin Fuzzey Cc: Luis R. Rodriguez, Linux FS Devel, Alan Cox, Ted Ts'o, Andy Lutomirski, Dmitry Torokhov, Michael Kerrisk (man-pages), Linux API, Peter Zijlstra, Greg KH, Daniel Wagner, David Woodhouse, jewalt, rafal, Arend Van Spriel, Rafael J. Wysocki, Li, Yi, atull, Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro, David Howells, Peter Jones, Hans de Goede, linux-kernel@vger.kernel.org On Fri, Jun 09, 2017 at 09:40:47AM +0200, Martin Fuzzey wrote: > On 09/06/17 03:57, Luis R. Rodriguez wrote: > > On Thu, Jun 8, 2017 at 6:10 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote: > > > > Android didn't send the signal, the kernel did (SIGCHLD). > > > > > > > > Like this: > > > > > > > > 1) Android init (pid=1) fork()s (say pid=42) [this child process is totally > > > > unrelated to firmware loading] > > > > 2) Android init (pid=1) does a write() on a (driver custom) sysfs file which > > > > ends up calling request_firmware() kernel side > > > > 3) The firmware loading fallback mechanism is used, the request is sent to > > > > userspace and pid 1 waits in the kernel on wait_* > > > > 4) before firmware loading completes pid 42 dies (for any reason - in my > > > > case normal termination) > > Martin just to be clear, by "normal case termination" do you mean > > completing successfully ?? Ie the firmware actually did make it onto > > the device ? > > The firmware did *not* make it onto the device since the request_firmware() > call returned an error > (the code that would have transfered it to the device is only executed > following a successful request_firmware) > > The process that terminates normally is unrelated to firmware loading as I > said above. > > The only things that matter are: > - It is a child process of the process that calls request_firmware() > - It terminates *while* the the wait_ is still in progress > > > Here is a way of reproducing the problem using the test_firmware module > (which I only just saw) on normal linux with no Android or custom driver > > > #!/bin/sh > set -e > > # Make sure the system firmware loader doesn't get in the way > /etc/init.d/udev stop > > modprobe test_firmware > > DIR=/sys/devices/virtual/misc/test_firmware > > echo 10 >/sys/class/firmware/timeout; > sleep 2 & > echo -n "/some/non/existing/file.bin" > "$DIR"/trigger_request; > > > > If run with the "sleep 2 &" it terminates after 2 seconds > If the sleep is commented it runs for the expected 10 seconds (the firmware > loading timeout) > > Since the sleep process is a child of the script process requesting a > firmware load its death causes a SIGCHLD causing request_firmware() to abort > prematurely. BTW I've run some more tests and if you use request_firmware_nowait() the above issue also is avoidable. Also, if you use the custom fallback (uevent parameter is false) the timeout is max value always. The only way to kill the custom requests is through the echo -1 > loading or the max timeout triggers (don't rely on that!). The custom interface was a bad idea though so best to just avoid it. Anyway I can reproduce in my tests now, will add this testcase as well. The stable fixes will be sent as well. Luis ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2017-06-09 22:55 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <20170524205658.GK8951@wotan.suse.de> [not found] ` <20170524214027.7775-1-mcgrof@kernel.org> [not found] ` <CALCETrXUrirO-vg3M+MGhn=0gZTwx0phsRDS4TCwWWgNYC6RsA@mail.gmail.com> [not found] ` <CAB=NE6VENGmTEZ3_8EWmv6bhT81k=szg0_POWJm2SP+V21CeBw@mail.gmail.com> [not found] ` <CALCETrU4__YUGk36PN=FbuEf0SBaTrxQQqm4sWs2NrZ+6WN7jA@mail.gmail.com> [not found] ` <CANh8QzwPb_+RKs5QVt7mdFk8h_rOMVS3j9m0OADgvzBtNqBBLg@mail.gmail.com> [not found] ` <87fufr3mdy.fsf@xmission.com> [not found] ` <20170526194640.GS8951@wotan.suse.de> [not found] ` <CAKdAkRTrcTVOAP5GK-R=Au_tL5WqSn5UkQEzNe5NcCWXS8mbtA@mail.gmail.com> [not found] ` <CAB=NE6Wh9inuA-R9-zegnvJV=s_6o_doMNDOhBVhYn4ZQ_Ku5Q@mail.gmail.com> [not found] ` <20170526215518.GB40877@dtor-ws> [not found] ` <20170605202410.GQ8951@wotan.suse.de> [not found] ` <59367025.3020901@parkeon.com> [not found] ` <20170606163401.GA27288@wotan.suse.de> 2017-06-06 17:52 ` [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback Luis R. Rodriguez [not found] ` <1496760796.5682.48.camel@linux.intel.com> [not found] ` <20170606164734.GB27288@wotan.suse.de> 2017-06-06 17:54 ` Luis R. Rodriguez 2017-06-06 22:11 ` Theodore Ts'o 2017-06-07 0:22 ` Luis R. Rodriguez 2017-06-07 4:56 ` Andy Lutomirski 2017-06-07 6:25 ` Dmitry Torokhov 2017-06-07 12:25 ` Alan Cox 2017-06-07 17:15 ` Luis R. Rodriguez 2017-06-09 1:14 ` Andy Lutomirski 2017-06-09 1:33 ` Luis R. Rodriguez 2017-06-09 21:29 ` Luis R. Rodriguez 2017-06-07 17:08 ` Luis R. Rodriguez 2017-06-07 17:54 ` Martin Fuzzey 2017-06-09 1:10 ` Luis R. Rodriguez 2017-06-09 1:57 ` Luis R. Rodriguez 2017-06-09 7:40 ` Martin Fuzzey 2017-06-09 21:12 ` Luis R. Rodriguez 2017-06-09 22:55 ` Luis R. Rodriguez
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).