Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]                         ` <20170606163401.GA27288@wotan.suse.de>
@ 2017-06-06 17:52                           ` Luis R. Rodriguez
  0 siblings, 0 replies; 18+ messages in thread
From: Luis R. Rodriguez @ 2017-06-06 17:52 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Martin Fuzzey, mcgrof, Matthew Wilcox, Alan Cox, Jonathan Corbet,
	Michael Kerrisk (man-pages), Linux API, David Howells,
	Dmitry Torokhov, Peter Zijlstra, Eric W. Biederman,
	Andy Lutomirski, Greg KH, Daniel Wagner, David Woodhouse, jewalt,
	rafal, Arend Van Spriel, Rafael J. Wysocki, Li, Yi, atull,
	Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach,
	Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook,
	AKASHI Takahiro, Peter Jones, Hans de G oede, Ted Ts'o,
	linux-kernel@vger.kernel.org

Used wrong alias for fsdevel now, its linux-fsdevel ...

  Luis

On Tue, Jun 06, 2017 at 06:34:01PM +0200, Luis R. Rodriguez wrote:
> Adding fsdevel for review on the correct semantics of handling signals on
> write(), in this case a sysfs write which triggered a sync request firmware
> call and what the firmware API should return in such case of a signal (I gather
> this should be -EINTR and not -ERESTARTSYS). Also whether or not SIGINT should
> be followed or if only allowing SIGKILL is fine (fine by me, but it would
> change old behaviour).
> 
> Hoping between fsdevel and linux-api folks we can hash this out.
> 
> On Tue, Jun 06, 2017 at 11:04:37AM +0200, Martin Fuzzey wrote:
> > On 05/06/17 22:24, Luis R. Rodriguez wrote:
> > > 
> > > 
> > > For these two reasons then it would seem best we do two things actually:
> > > 
> > > 1) return -EINTR instead of -EAGAIN when we detect swait_event_interruptible_timeout()
> > > got interrupted by a signal (it returns -ERESTARTSYS)
> > 
> > 
> > I disagree. That would force userspace to handle the signal rather than
> > having the kernel retry.
> > 
> > From Documentation/DocBook/kernel-hacking.tmpl:
> > 
> >    After you slept you should check if a signal occurred: the
> >    Unix/Linux way of handling signals is to temporarily exit the
> >    system call with the <constant>-ERESTARTSYS</constant> error.  The
> >    system call entry code will switch back to user context, process
> >    the signal handler and then your system call will be restarted
> >    (unless the user disabled that).  So you should be prepared to
> >    process the restart, e.g. if you're in the middle of manipulating
> >    some data structure.
> 
> This applies but you are missing my point that the LWN article [0] I referred
> to also stated "Kernel code which uses interruptible sleeps must always check
> to see whether it woke up as a result of a signal, and, if so, clean up
> whatever it was doing and return -EINTR back to user space." -- I realize there
> may be contradiction with above documentation -- this perhaps can be clarified
> with fsdevel folks *but* regardless of that the same article notes Alan Cox
> explains that "Unix tradition (and thus almost all applications) believe file
> store writes to be non signal interruptible. It would not be safe or practical
> to change that guarantee." So for this reason alone there does seem to be an
> exemption to the above documentation worth noting for file store writes, and
> the patch which you tested below *moves* the sysfs write op for firmware in
> that direction by adding a new killable swait.
> 
> [0] https://lwn.net/Articles/288056/                                                                                                                                          
>                                                                                                                                                                                 
> > > 2) Do as you note below and add wait_event_killable_timeout()
> > 
> > Hum,  I do think that would be better but, (please correct me if I'm wrong)
> > the _killable_ variants only allow SIGKILL  (and not SIGINT).
> 
> That seems correct given a TASK_KILLABLE is also TASK_UNINTERRUPTIBLE.
> 
> > 0cb64249ca "firmware_loader: abort request if wait_for_completion is
> > interrupted"
> > 
> > specifically mentrions ctrl-c (SIGINT) in the commit message so that would
> > no longer work.
> 
> Great point, but it *also* allowed SIGKILL, so I do feel the goal was also to
> allow it to be killable. I'm afraid that patch probably did not get proper
> review from sufficient folks and its worth now asking ourselves what we'd like
> to do.  I'm fine with letting go of SIGINT for firmware sysfs calls for the
> sake of keeping with the long standing unix tradition on write, given we *still
> have SIGKILL*.
> 
> > Myself I think having to use kill -9 to interrupt firmware loading by a
> > usespace helper is OK but others may disagree.
> 
> Its why I added fsdevel as well. This is really a semantics and uapi question.
> Between fsdevel and linux-api folks I would hope we can come to a sensible
> resolution.
> 
> > > I do not see why we could not introduce wait_event_killable_timeout()
> > > and swait_event_killable_timeout() into -stables.
> > > After seeing how simple it is to do so I tend to agree. Greg, Peter,
> > > what are your thoughts ?
> > > 
> > > Martin Fuzzey can you test this patch as an alternative to your issue ?
> > > 
> > > diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
> > > index b9f907eedbf7..70fc42e5e0da 100644
> > > --- a/drivers/base/firmware_class.c
> > > +++ b/drivers/base/firmware_class.c
> > > @@ -131,7 +131,7 @@ static int __fw_state_wait_common(struct fw_state *fw_st, long timeout)
> > >   {
> > >   	long ret;
> > > -	ret = swait_event_interruptible_timeout(fw_st->wq,
> > > +	ret = swait_event_killable_timeout(fw_st->wq,
> > >   				__fw_state_is_done(READ_ONCE(fw_st->status)),
> > >   				timeout);
> > >   	if (ret != 0 && fw_st->status == FW_STATUS_ABORTED)
> > > diff --git a/include/linux/swait.h b/include/linux/swait.h
> > > index c1f9c62a8a50..9c5ca2898b2f 100644
> > > --- a/include/linux/swait.h
> > > +++ b/include/linux/swait.h
> > > @@ -169,4 +169,29 @@ do {									\
> > >   	__ret;								\
> > >   })
> > > +#define __swait_event_killable(wq, condition)				\
> > > +	(void)___swait_event(wq, condition, TASK_KILLABLE, 0, schedule())
> > > +
> > > +#define swait_event_killable(wq, condition)				\
> > > +({									\
> > > +	int __ret = 0;							\
> > > +	if (!(condition))						\
> > > +		__ret = __swait_event_killable(wq, condition);		\
> > > +	__ret;								\
> > > +})
> > > +
> > > +#define __swait_event_killable_timeout(wq, condition, timeout)		\
> > > +	___swait_event(wq, ___wait_cond_timeout(condition),		\
> > > +		      TASK_INTERRUPTIBLE, timeout,			\
> > > +		      __ret = schedule_timeout(__ret))
> > > +
> > 
> > Should be TASK_KILLABLE above
> 
> Oops yes sorry.
> 
> > > +#define swait_event_killable_timeout(wq, condition, timeout)		\
> > > +({									\
> > > +	long __ret = timeout;						\
> > > +	if (!___wait_cond_timeout(condition))				\
> > > +		__ret = __swait_event_killable_timeout(wq,		\
> > > +						condition, timeout);	\
> > > +	__ret;								\
> > > +})
> > > +
> > >   #endif /* _LINUX_SWAIT_H */
> > > 
> > >    Luis
> > 
> > After replacing TASK_INTERRUPTIBLE with TASK_KILLABLE above it works for me.
> 
> Great, thanks for testing.
> 
>   Luis
> 

-- 
Luis Rodriguez, SUSE LINUX GmbH
Maxfeldstrasse 5; D-90409 Nuernberg

^ permalink raw reply	[flat|nested] 18+ messages in thread

[parent not found: <1496760796.5682.48.camel@linux.intel.com>]

[parent not found: <20170606164734.GB27288@wotan.suse.de>]

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]                         ` <20170606164734.GB27288@wotan.suse.de>
@ 2017-06-06 17:54                           ` Luis R. Rodriguez
  2017-06-06 22:11                           ` Theodore Ts'o
  1 sibling, 0 replies; 18+ messages in thread
From: Luis R. Rodriguez @ 2017-06-06 17:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Alan Cox, fsdevel, Luis R. Rodriguez, Stephen Boyd, Li, Yi,
	Dmitry Torokhov, Peter Zijlstra, Jonathan Corbet,
	Eric W. Biederman, Michael Kerrisk (man-pages), Andy Lutomirski,
	Greg KH, Fuzzey, Martin, Linux API, Daniel Wagner,
	David Woodhouse, jewalt, rafal, Arend Van Spriel,
	Rafael J. Wysocki, atull, Moritz Fischer, Petr Mladek,
	Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo,
	Linus Torvalds, Kees Cook, AKASHI Takahiro, David Howells,
	Peter Jones, Hans de G oede, Ted Ts'o,
	linux-kernel@vger.kernel.org

Using the right linux-fsdevel this time also, this was the second reply.

  Luis

On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote:
> Adding fsdevel folks.
> 
> On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
> > > "Unix tradition (and thus almost all applications) believe file store
> > > writes to
> > > be non signal interruptible. It would not be safe or practical to
> > > change that
> > > guarantee."
> > 
> > Yep everyone codes
> > 
> > 	write(disk_file, "foo", 3);
> > 
> > not while(..) blah around it.
> 
> Thanks for the confirmation! That's a simple enough explanation.
> 
> > > For these two reasons then it would seem best we do two things
> > > actually:
> > > 
> > > 1) return -EINTR instead of -EAGAIN when we detect
> > > swait_event_interruptible_timeout()
> > > got interrupted by a signal (it returns -ERESTARTSYS)
> > > 2) Do as you note below and add wait_event_killable_timeout()
> > 
> > Pedantic detail that I don't think affects you
> > 
> > If you have completed a part of the I/O then you should return the byte
> > processed count not EINTR, but -1,EINTR if no progress was made.
> 
> You are right with some new exceptions and with regards to the future:
> 
> The syfs loading interface for firmware currently goes through the
> data file exposed on syfs, the respective write op firmware_data_write()
> only checks for signals at the beginning. After that its a full one
> swoop try to write if you are following the old tradition and are using
> a buffer allocated by the firmware API.
> 
> If you are using the relatively new request_firmware_into_buf() added
> by Stephen Boyd which lets the driver provide the allocated buffer then
> we have a loop in firmware_rw() which should be fixed to:
> 
> 1) Check for signals
> 2) Do what you noted above.
> 
> Furthermore Yi Li over at Intel is adding some new API calls which would
> re-use some of this for FPGA firmwares which are also very large, that
> work should consider the above and fix appropriately as well.
> 
>   Luis
> 

-- 
Luis Rodriguez, SUSE LINUX GmbH
Maxfeldstrasse 5; D-90409 Nuernberg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]                         ` <20170606164734.GB27288@wotan.suse.de>
  2017-06-06 17:54                           ` Luis R. Rodriguez
@ 2017-06-06 22:11                           ` Theodore Ts'o
  2017-06-07  0:22                             ` Luis R. Rodriguez
  1 sibling, 1 reply; 18+ messages in thread
From: Theodore Ts'o @ 2017-06-06 22:11 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Alan Cox, linux-fsdevel, Stephen Boyd, Li, Yi, Dmitry Torokhov,
	Peter Zijlstra, Jonathan Corbet, Eric W. Biederman,
	Michael Kerrisk (man-pages), Andy Lutomirski, Greg KH,
	Fuzzey, Martin, Linux API, Daniel Wagner, David Woodhouse, jewalt,
	rafal, Arend Van Spriel, Rafael J. Wysocki, atull, Moritz Fischer,
	Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho,
	Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro,
	David Howells, Peter Jones, Hans de G oede,
	linux-kernel@vger.kernel.org

On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote:
> On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
> > Yep everyone codes
> > 
> > 	write(disk_file, "foo", 3);
> > 
> > not while(..) blah around it.

In general I/O to tty devices and other character mode devices was
where you definitely needed to check for EINTR/EAGAIN because that was
the place where historically Unix systems would interrupt system calls
--- e.g., a user typing control-Z, for example.

And in general writes to file systems and block devices in *general*
were never interrupted by signals, although that was always a
non-portable assumption.

So I've always subscribed to the "be liberal in what you accept,
conservative in what you send" rule of thumb.  Which is to say, any
programs *I* write I'll in general always check for EINTR/EAGAIN and
check for partial writes, but in general, as a kernel program I try to
adhere to the long-standing Unix trandition for disk based files.

This does beg the question about whether firmware devices are more
like tty devices or block devices or files, though.  If before signals
never caused them to return EINTR/EAGAIN, then it's probably best to
not break backwards compatbility.

That being said, not that you also have the option of using
-ERESTARTNOINTR (always restart the system call, regardless of how the
sighandle flags were set), and -ERESTARTNOHAND (restart the system
call always if there was no signal handler and the process was not
killed), in addition to -ERESTARTSYS.  So that might be another option
that's fairly easy to implement or experiment with.

       	      	      		   - Ted

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-06-06 22:11                           ` Theodore Ts'o
@ 2017-06-07  0:22                             ` Luis R. Rodriguez
  2017-06-07  4:56                               ` Andy Lutomirski
  0 siblings, 1 reply; 18+ messages in thread
From: Luis R. Rodriguez @ 2017-06-07  0:22 UTC (permalink / raw)
  To: Theodore Ts'o, Luis R. Rodriguez, Alan Cox, linux-fsdevel,
	Stephen Boyd, Li, Yi, Dmitry Torokhov, Peter Zijlstra,
	Jonathan Corbet, Eric W. Biederman, Michael Kerrisk (man-pages),
	Andy Lutomirski, Greg KH, Fuzzey, Martin, Linux API,
	Daniel Wagner, David Woodhouse, jewalt, rafal, Arend Van Spriel,
	Rafael J. Wysocki, atull, Moritz Fischer, Petr Mladek,
	Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo,
	Linus Torvalds, Kees Cook, AKASHI Takahiro, David Howells,
	Peter Jones, Hans de G oede, linux-kernel@vger.kernel.org

On Tue, Jun 06, 2017 at 06:11:51PM -0400, Theodore Ts'o wrote:
> On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote:
> > On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
> > > Yep everyone codes
> > > 
> > > 	write(disk_file, "foo", 3);
> > > 
> > > not while(..) blah around it.
> 
> In general I/O to tty devices and other character mode devices was
> where you definitely needed to check for EINTR/EAGAIN because that was
> the place where historically Unix systems would interrupt system calls
> --- e.g., a user typing control-Z, for example.
> 
> And in general writes to file systems and block devices in *general*
> were never interrupted by signals, although that was always a
> non-portable assumption.
> 
> So I've always subscribed to the "be liberal in what you accept,
> conservative in what you send" rule of thumb.  Which is to say, any
> programs *I* write I'll in general always check for EINTR/EAGAIN and
> check for partial writes, but in general, as a kernel program I try to
> adhere to the long-standing Unix trandition for disk based files.

OK so userspace should consider checking for EINTR/EAGAIN.  On the kernel front
though we we do *strive* to go by the old unix tradition -- there are
exceptions though, of course.

> This does beg the question about whether firmware devices are more
> like tty devices or block devices or files, though.

And here you raise the analogy to see if its *worthy* to break the old unix
tradition, the tty example is a case that we seem to accept as reasonable
for an exception, clearly.

This help, thanks!

So, "firmware devices" is a misnomer, the firmware API does direct FS lookup
and only if that fails and the distribution enabled the fallback mechanism will
it be used if the file was not found on /lib/firmware/ paths. So the syfs
interface we are evaluating here is this fallback mechanism. The "firmware
device" then is really more a user of the firmware API, and these do vary
widely. In fact it recent developments with the "driver data API" generalize
the interface given things outside our typical device drivers for what we know
as old "firmware" are looking to use the firmware API for looking for files
from userspace. This in fact already has happened, it was just subtle: we look
for default EEPROMs, configururation files, microcode updates, and soon we'll
want to replace the userspace wireless CRDA tool for the regulatory database
with a simple file lookup once we get firmware signing upstream. So -- more and
more the API is being used for general file lookups.

Which type of drivers request these files ? It all varies across the board,
and we have really early things like CPU microcode files (so we enable built-in
files), regular old firmware files for things like networking drivers,
subsystem configuration stuff (for the wireless regulatory database), and it
seems now *huge* (100s of MiBs I'm told) for remote-proc and FPGAs. The FPGA
use case is one use case where having the uploader use a while loop makes more
sense. Drivers may also want to have more fine control to the buffer where the
"driver data" gets stashed into, so the API request_firmware_into_buf() was
added for this purpose. The FPGA devices seem to want to use this breakdown
mechanism as well. Although the remote-proc folks seem to be relying on the
sysfs interface as a default mechanism.

So it would seem to make sense to support the loop thing since some
files these days *can* be really big. I think we can easily address
this by relying on the killable signal (SIGKILL), rather than capturing
SIGINT (CTRL-C).

One problem is we had supported SIGINT before, it was reported however we
never gave back to userspace the fact that a signal was captured, we always
returned EAGAIN, so usersace did not know WTF was going on. The reporter wanted
to detect this and wanted to get us to return the same value the wait call
passed to the firmware API, -ERESTARTSYS. Hence this patch, however since
-ERESTARTSYS is special and folks may just restart the syscall always when this
happens, one question is if we should return EINTR instead of today's EAGAIN.

> If before signals
> never caused them to return EINTR/EAGAIN, then it's probably best to
> not break backwards compatbility.

There was a time where did not have signal, and always just used to return
-ENOMEM on failure. After signals support was added we still were
returning -ENOMEM for a while. It was only later that we started capturing
the actual signal, however this was masked as EAGAIN, for no precise good
reason, I suppose just lack of review.

The patch in question wants us to return -ERESTARTSYS so that userspace
can restart the upload. But the issue was that the sysfs firmware loader
was killed by a signal as well, so this is why Dmitry noted that we can
just fix this issue by just making the wait killable with SIGKILL only.
The swait killable patch I supplied upon Dmitry's suggestion as a replacement
would do away with capturing SIGINT but allow SIGKILL then.

Since EAGAIN was always returned even if a signal was captured I'm
inclined to take the position we really never gave userspace a proper
clue about the signal. Additionally I cannot see why userspace would
rely on SIGINT working *only* but not *SIGKILL*.

The risk here I think is if userspace ever *did* rely on SIGINT for the sysfs
interface as userspace functional API. If this is not worth breaking
then I suppose we are stuck with the current wait. If we do that I'd
say we return EINTR, based on what I have read so far.

If doing away with SIGINT but keeping SIGKILL is rather sane then I'd go
with the approach of the new swait killable + returning -EINTR.

> That being said, note that you also have the option of using
> -ERESTARTNOINTR (always restart the system call, regardless of how the
> sighandle flags were set), and -ERESTARTNOHAND (restart the system
> call always if there was no signal handler and the process was not
> killed), in addition to -ERESTARTSYS.

We rely on swait, and swait right now only uses -ERESTARTSYS. Are
you saying we could mask out -ERESTARTSYS and map it to -ERESTARTNOINTR
or -ERESTARTNOHAND if we see fit for some future functionality / need ?

> So that might be another option
> that's fairly easy to implement or experiment with.

Thanks!

  Luis

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-06-07  0:22                             ` Luis R. Rodriguez
@ 2017-06-07  4:56                               ` Andy Lutomirski
  2017-06-07  6:25                                 ` Dmitry Torokhov
  0 siblings, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2017-06-07  4:56 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Theodore Ts'o, Alan Cox, Linux FS Devel, Stephen Boyd, Li, Yi,
	Dmitry Torokhov, Peter Zijlstra, Jonathan Corbet,
	Eric W. Biederman, Michael Kerrisk (man-pages), Andy Lutomirski,
	Greg KH, Fuzzey, Martin, Linux API, Daniel Wagner,
	David Woodhouse, jewalt, rafal, Arend Van Spriel,
	Rafael J. Wysocki, atull, Moritz Fischer, Petr Mladek,
	Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo,
	Linus Torvalds, Kees Cook, AKASHI Takahiro, David Howells,
	Peter Jones, Hans de G oede, linux-kernel@vger.kernel.org

On Tue, Jun 6, 2017 at 5:22 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Tue, Jun 06, 2017 at 06:11:51PM -0400, Theodore Ts'o wrote:
>> On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote:
>> > On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
>
> We rely on swait, and swait right now only uses -ERESTARTSYS. Are
> you saying we could mask out -ERESTARTSYS and map it to -ERESTARTNOINTR
> or -ERESTARTNOHAND if we see fit for some future functionality / need ?

I think that has essentially nothing to do with swait.  User code does
some syscall.  That syscall triggers a firmware load.  The caller gets
a signal.  If you're going to let firmware load get interrupted, you
need to consider what the syscall is.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-06-07  4:56                               ` Andy Lutomirski
@ 2017-06-07  6:25                                 ` Dmitry Torokhov
  2017-06-07 12:25                                   ` Alan Cox
  2017-06-09  1:14                                   ` Andy Lutomirski
  0 siblings, 2 replies; 18+ messages in thread
From: Dmitry Torokhov @ 2017-06-07  6:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Luis R. Rodriguez, Theodore Ts'o, Alan Cox, Linux FS Devel,
	Stephen Boyd, Li, Yi, Peter Zijlstra, Jonathan Corbet,
	Eric W. Biederman, Michael Kerrisk (man-pages), Greg KH,
	Fuzzey, Martin, Linux API, Daniel Wagner, David Woodhouse, jewalt,
	rafal, Arend Van Spriel, Rafael J. Wysocki, atull, Moritz Fischer,
	Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho,
	Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro,
	David Howells, Peter Jones, Hans de G oede,
	linux-kernel@vger.kernel.org

On Tue, Jun 06, 2017 at 09:56:47PM -0700, Andy Lutomirski wrote:
> On Tue, Jun 6, 2017 at 5:22 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > On Tue, Jun 06, 2017 at 06:11:51PM -0400, Theodore Ts'o wrote:
> >> On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote:
> >> > On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
> >
> > We rely on swait, and swait right now only uses -ERESTARTSYS. Are
> > you saying we could mask out -ERESTARTSYS and map it to -ERESTARTNOINTR
> > or -ERESTARTNOHAND if we see fit for some future functionality / need ?
> 
> I think that has essentially nothing to do with swait.  User code does
> some syscall.  That syscall triggers a firmware load.  The caller gets
> a signal.  If you're going to let firmware load get interrupted, you
> need to consider what the syscall is.

I think it is way too complicated and I do not think driver writers will
stand a chance of implementing this correctly, given that often firmware
load might be triggered indirectly and by multitude of syscalls.

What's wrong with saying that the only way to interrupt firmware loading
is to kill the process? So ctrl-c will no longer interrupt it, but I do
not think that ease of aborting firmware update is primary goal here. I
consider simple is good here.

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-06-07  6:25                                 ` Dmitry Torokhov
@ 2017-06-07 12:25                                   ` Alan Cox
  2017-06-07 17:15                                     ` Luis R. Rodriguez
  2017-06-09  1:14                                   ` Andy Lutomirski
  1 sibling, 1 reply; 18+ messages in thread
From: Alan Cox @ 2017-06-07 12:25 UTC (permalink / raw)
  To: Dmitry Torokhov, Andy Lutomirski
  Cc: Luis R. Rodriguez, Theodore Ts'o, Linux FS Devel,
	Stephen Boyd, Li, Yi, Peter Zijlstra, Jonathan Corbet,
	Eric W. Biederman, Michael Kerrisk (man-pages), Greg KH,
	Fuzzey, Martin, Linux API, Daniel Wagner, David Woodhouse, jewalt,
	rafal, Arend Van Spriel, Rafael J. Wysocki, atull, Moritz Fischer,
	Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho,
	Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro,
	David Howells, Peter Jones, Hans de G oede,
	linux-kernel@vger.kernel.org

> What's wrong with saying that the only way to interrupt firmware
> loading is to kill the process? So ctrl-c will no longer interrupt
> it, but I do not think that ease of aborting firmware update is
> primary goal here. I consider simple is good here.

Agreed 100%. The user process did not ask for firmware load, it asked
for an I/O operation. Semantically it should appear as if someone else
did the firmware load and it just had to wait for it to happen.

Alan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-06-07 12:25                                   ` Alan Cox
@ 2017-06-07 17:15                                     ` Luis R. Rodriguez
  0 siblings, 0 replies; 18+ messages in thread
From: Luis R. Rodriguez @ 2017-06-07 17:15 UTC (permalink / raw)
  To: Alan Cox
  Cc: Dmitry Torokhov, Andy Lutomirski, Luis R. Rodriguez,
	Theodore Ts'o, Linux FS Devel, Stephen Boyd, Li, Yi,
	Peter Zijlstra, Jonathan Corbet, Eric W. Biederman,
	Michael Kerrisk (man-pages), Greg KH, Fuzzey, Martin, Linux API,
	Daniel Wagner, David Woodhouse, jewalt, rafal, Arend Van Spriel,
	Rafael J. Wysocki, Moritz Fischer, Petr Mladek, Johannes Berg,
	Emmanuel Grumbach, Luca Coelho, Kalle Valo, Linus Torvalds,
	Kees Cook, AKASHI Takahiro, David Howells, Peter Jones,
	Hans de G oede, linux-kernel@vger.kernel.org

On Wed, Jun 07, 2017 at 01:25:51PM +0100, Alan Cox wrote:
> > What's wrong with saying that the only way to interrupt firmware
> > loading is to kill the process? So ctrl-c will no longer interrupt
> > it, but I do not think that ease of aborting firmware update is
> > primary goal here. I consider simple is good here.
> 
> Agreed 100%. The user process did not ask for firmware load, it asked
> for an I/O operation. Semantically it should appear as if someone else
> did the firmware load and it just had to wait for it to happen.

Fine by me ! Will wrap up the patch for the new killable swait then.
I suppose noting it as a stable fix is worth it given the known issues
with for example Android killing loaders unexpectedly.

Unless I hear otherwise I'll also provide a follow up to return -EINTR instead
of -EAGAIN if swait returned -ERESTARTSYS, this way at least userspace could
tell a signal was definitely received. I *don't* think that follow up is
required for stable though.

  Luis

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-06-07  6:25                                 ` Dmitry Torokhov
  2017-06-07 12:25                                   ` Alan Cox
@ 2017-06-09  1:14                                   ` Andy Lutomirski
  2017-06-09  1:33                                     ` Luis R. Rodriguez
  1 sibling, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2017-06-09  1:14 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Andy Lutomirski, Luis R. Rodriguez, Theodore Ts'o, Alan Cox,
	Linux FS Devel, Stephen Boyd, Li, Yi, Peter Zijlstra,
	Jonathan Corbet, Eric W. Biederman, Michael Kerrisk (man-pages),
	Greg KH, Fuzzey, Martin, Linux API, Daniel Wagner,
	David Woodhouse, jewalt, rafal, Arend Van Spriel,
	Rafael J. Wysocki, atull, Moritz Fischer, Petr Mladek,
	Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo,
	Linus Torvalds, Kees Cook, AKASHI Takahiro, David Howells,
	Peter Jones, Hans de G oede, linux-kernel@vger.kernel.org

On Tue, Jun 6, 2017 at 11:25 PM, Dmitry Torokhov
<dmitry.torokhov@gmail.com> wrote:
> On Tue, Jun 06, 2017 at 09:56:47PM -0700, Andy Lutomirski wrote:
>> On Tue, Jun 6, 2017 at 5:22 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>> > On Tue, Jun 06, 2017 at 06:11:51PM -0400, Theodore Ts'o wrote:
>> >> On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote:
>> >> > On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
>> >
>> > We rely on swait, and swait right now only uses -ERESTARTSYS. Are
>> > you saying we could mask out -ERESTARTSYS and map it to -ERESTARTNOINTR
>> > or -ERESTARTNOHAND if we see fit for some future functionality / need ?
>>
>> I think that has essentially nothing to do with swait.  User code does
>> some syscall.  That syscall triggers a firmware load.  The caller gets
>> a signal.  If you're going to let firmware load get interrupted, you
>> need to consider what the syscall is.
>
> I think it is way too complicated and I do not think driver writers will
> stand a chance of implementing this correctly, given that often firmware
> load might be triggered indirectly and by multitude of syscalls.
>

That's what I meant, but I said it unclearly.  I meant that, if we're
going to start allowing interruption, we would need to audit all the
callers.  Ugh.

I suppose we could have request_firmware_interruptable(), but that
seems like it's barely worth it.

--Andy

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-06-09  1:14                                   ` Andy Lutomirski
@ 2017-06-09  1:33                                     ` Luis R. Rodriguez
  2017-06-09 21:29                                       ` Luis R. Rodriguez
  0 siblings, 1 reply; 18+ messages in thread
From: Luis R. Rodriguez @ 2017-06-09  1:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dmitry Torokhov, Theodore Ts'o, Alan Cox, Linux FS Devel,
	Stephen Boyd, Li, Yi, Peter Zijlstra, Jonathan Corbet,
	Eric W. Biederman, Michael Kerrisk (man-pages), Greg KH,
	Fuzzey, Martin, Linux API, Daniel Wagner, David Woodhouse, jewalt,
	rafal, Arend Van Spriel, Rafael J. Wysocki, atull, Moritz Fischer,
	Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho,
	Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro,
	David Howells, Peter Jones, Hans de G oede,
	linux-kernel@vger.kernel.org, Julia Lawall, Luis R. Rodriguez

On Thu, Jun 8, 2017 at 6:14 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Tue, Jun 6, 2017 at 11:25 PM, Dmitry Torokhov
> <dmitry.torokhov@gmail.com> wrote:
>> On Tue, Jun 06, 2017 at 09:56:47PM -0700, Andy Lutomirski wrote:
>>> On Tue, Jun 6, 2017 at 5:22 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>>> > On Tue, Jun 06, 2017 at 06:11:51PM -0400, Theodore Ts'o wrote:
>>> >> On Tue, Jun 06, 2017 at 06:47:34PM +0200, Luis R. Rodriguez wrote:
>>> >> > On Tue, Jun 06, 2017 at 03:53:16PM +0100, Alan Cox wrote:
>>> >
>>> > We rely on swait, and swait right now only uses -ERESTARTSYS. Are
>>> > you saying we could mask out -ERESTARTSYS and map it to -ERESTARTNOINTR
>>> > or -ERESTARTNOHAND if we see fit for some future functionality / need ?
>>>
>>> I think that has essentially nothing to do with swait.  User code does
>>> some syscall.  That syscall triggers a firmware load.  The caller gets
>>> a signal.  If you're going to let firmware load get interrupted, you
>>> need to consider what the syscall is.
>>
>> I think it is way too complicated and I do not think driver writers will
>> stand a chance of implementing this correctly, given that often firmware
>> load might be triggered indirectly and by multitude of syscalls.
>>
>
> That's what I meant, but I said it unclearly.  I meant that, if we're
> going to start allowing interruption, we would need to audit all the
> callers.  Ugh.

There are actually two audits worth evaluating if what we've concluded
is fair game:

  a) firmware sync calls on interruptible paths
  b) use of swait / old interruptible waits on sysfs paths

As for a) we drove tons of code away from using sync, request
firmware, and on async firmware the return value is lost, we only can
currently know if a failure of some sort occurred. If that push to
async failed we also scared away tons of drivers to use
request_firmware() calls on init and later incorrectly on probe due to
serialized init + probe delay on boot. From what I recall my last
Coccinelle evil-monster-hunt, I did indeed find quite a bit of drivers
still still relying on sync firmware which ultimately revealed use on
a probe path. The signal however was only effective as of commit
0cb64249ca500 ("firmware_loader: abort request if wait_for_completion
is interrupted") merged as of v4.0. Creating awareness of the issue
seems fair, but I don't think its worth a huge fly swatter.

I have no information to contribute for b) other than I was reluctant
to even consider it.

> I suppose we could have request_firmware_interruptable(), but that
> seems like it's barely worth it.

I don't think that's worth it, given the signal was effective only as
of v4.0, we already had a big push away from sync requests, and also
had the "don't use request firmware" on init scare which also
propagated to probe later.

  Luis

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-06-09  1:33                                     ` Luis R. Rodriguez
@ 2017-06-09 21:29                                       ` Luis R. Rodriguez
  0 siblings, 0 replies; 18+ messages in thread
From: Luis R. Rodriguez @ 2017-06-09 21:29 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dmitry Torokhov, Theodore Ts'o, Alan Cox, Linux FS Devel,
	Stephen Boyd, Li, Yi, Peter Zijlstra, Jonathan Corbet,
	Eric W. Biederman, Michael Kerrisk (man-pages), Greg KH,
	Fuzzey, Martin, Linux API, Daniel Wagner, David Woodhouse, jewalt,
	rafal, Arend Van Spriel, Rafael J. Wysocki, Moritz Fischer,
	Petr Mladek, Johannes Berg, Emmanuel Grumbach, Luca Coelho,
	Kalle Valo, Linus Torvalds, Kees Cook, AKASHI Takahiro,
	David Howells, Peter Jones, Hans de G oede,
	linux-kernel@vger.kernel.org, Julia Lawall, Luis R. Rodriguez,
	atull

On Thu, Jun 8, 2017 at 6:33 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Thu, Jun 8, 2017 at 6:14 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> That's what I meant, but I said it unclearly.  I meant that, if we're
>> going to start allowing interruption, we would need to audit all the
>> callers.  Ugh.
>
> There are actually two audits worth evaluating if what we've concluded
> is fair game:
>
>   a) firmware sync calls on interruptible paths
>   b) use of swait / old interruptible waits on sysfs paths

And as I noted in the other thread -- another possible issue could be
any swait / interruptable wait on init or probe. Provided any child
completes and the kernel code for wait handler does abort, that
request would be terminated. This could for instance happen at bootup
as modules load and any child from the loader terminates.

We already have Coccinelle grammar to hunt for "though shall not
request firmware on init or probe", such SmPL grammar could be in turn
be repruposed to hunt for these types of conditions.

  Luis

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
       [not found]         ` <CANh8QzwPb_+RKs5QVt7mdFk8h_rOMVS3j9m0OADgvzBtNqBBLg@mail.gmail.com>
       [not found]           ` <87fufr3mdy.fsf@xmission.com>
@ 2017-06-07 17:08           ` Luis R. Rodriguez
  2017-06-07 17:54             ` Martin Fuzzey
  1 sibling, 1 reply; 18+ messages in thread
From: Luis R. Rodriguez @ 2017-06-07 17:08 UTC (permalink / raw)
  To: Fuzzey, Martin, linux-fsdevel, Alan Cox, Ted Ts'o,
	Andy Lutomirski, Dmitry Torokhov
  Cc: Luis R. Rodriguez, Michael Kerrisk (man-pages), Linux API,
	Peter Zijlstra, Greg KH, Daniel Wagner, David Woodhouse, jewalt,
	rafal, Arend Van Spriel, Rafael J. Wysocki, Li, Yi, atull,
	Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach,
	Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook,
	AKASHI Takahiro, David Howells, Peter Jones, Hans de Goede,
	linux-kernel@vger.kernel.org

On Thu, May 25, 2017 at 10:28:38AM +0200, Fuzzey, Martin wrote:
> 1) Android init calls write() on the sysfs file
> 2) The sysfs .store() callback registered by a driver is called
> 3) The driver calls request_firmware()
> 4) request_firmware() sends the firmware load request to userspace and
> calls wait_for_completion_interruptible()

Martin, just for completeness on documenting on the commit log of the next
swait proposed fix for this -- what signal did the process get from which you
note the child dies below ? Exactly what in Android sent this signal ?

> 5) A child dies and raises SIGCHLD
> 6) wait_for_completion_interruptible() returns -ERESTARTSYS due to the signal

  Luis

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-06-07 17:08           ` Luis R. Rodriguez
@ 2017-06-07 17:54             ` Martin Fuzzey
  2017-06-09  1:10               ` Luis R. Rodriguez
  0 siblings, 1 reply; 18+ messages in thread
From: Martin Fuzzey @ 2017-06-07 17:54 UTC (permalink / raw)
  To: Luis R. Rodriguez, linux-fsdevel, Alan Cox, Ted Ts'o,
	Andy Lutomirski, Dmitry Torokhov
  Cc: Michael Kerrisk (man-pages), Linux API, Peter Zijlstra, Greg KH,
	Daniel Wagner, David Woodhouse, jewalt, rafal, Arend Van Spriel,
	Rafael J. Wysocki, Li, Yi, atull, Moritz Fischer, Petr Mladek,
	Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo,
	Linus Torvalds, Kees Cook, AKASHI Takahiro, David Howells,
	Peter Jones, Hans de Goede, linux-kernel@vger.kernel.org

On 07/06/17 19:08, Luis R. Rodriguez wrote:
> On Thu, May 25, 2017 at 10:28:38AM +0200, Fuzzey, Martin wrote:
>> 1) Android init calls write() on the sysfs file
>> 2) The sysfs .store() callback registered by a driver is called
>> 3) The driver calls request_firmware()
>> 4) request_firmware() sends the firmware load request to userspace and
>> calls wait_for_completion_interruptible()
> Martin, just for completeness on documenting on the commit log of the next
> swait proposed fix for this -- what signal did the process get from which you
> note the child dies below ? Exactly what in Android sent this signal ?

Android didn't send the signal, the kernel did (SIGCHLD).

Like this:

1) Android init (pid=1) fork()s (say pid=42) [this child process is 
totally unrelated to firmware loading]
2) Android init (pid=1) does a write() on a (driver custom) sysfs file 
which ends up calling request_firmware() kernel side
3) The firmware loading fallback mechanism is used, the request is sent 
to userspace and pid 1 waits in the kernel on wait_*
4) before firmware loading completes pid 42 dies (for any reason - in my 
case normal termination)
5) Kernel delivers SIGCHLD to pid=1 to tell it a child has died, which 
causes -ERESTARTSYS to be returned from wait_*


Martin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-06-07 17:54             ` Martin Fuzzey
@ 2017-06-09  1:10               ` Luis R. Rodriguez
  2017-06-09  1:57                 ` Luis R. Rodriguez
  0 siblings, 1 reply; 18+ messages in thread
From: Luis R. Rodriguez @ 2017-06-09  1:10 UTC (permalink / raw)
  To: Martin Fuzzey
  Cc: linux-fsdevel, Alan Cox, Ted Ts'o, Andy Lutomirski,
	Dmitry Torokhov, Michael Kerrisk (man-pages), Linux API,
	Peter Zijlstra, Greg KH, Daniel Wagner, David Woodhouse, jewalt,
	rafal, Arend Van Spriel, Rafael J. Wysocki, Li, Yi, atull,
	Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach,
	Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook,
	AKASHI Takahiro, David Howells, Peter Jones, Hans de Goede,
	linux-kernel@vger.kernel.org

On Wed, Jun 7, 2017 at 10:54 AM, Martin Fuzzey <mfuzzey@parkeon.com> wrote:
> On 07/06/17 19:08, Luis R. Rodriguez wrote:
>>
>> On Thu, May 25, 2017 at 10:28:38AM +0200, Fuzzey, Martin wrote:
>>>
>>> 1) Android init calls write() on the sysfs file
>>> 2) The sysfs .store() callback registered by a driver is called
>>> 3) The driver calls request_firmware()
>>> 4) request_firmware() sends the firmware load request to userspace and
>>> calls wait_for_completion_interruptible()
>>
>> Martin, just for completeness on documenting on the commit log of the next
>> swait proposed fix for this -- what signal did the process get from which
>> you
>> note the child dies below ? Exactly what in Android sent this signal ?
>
>
> Android didn't send the signal, the kernel did (SIGCHLD).
>
> Like this:
>
> 1) Android init (pid=1) fork()s (say pid=42) [this child process is totally
> unrelated to firmware loading]
> 2) Android init (pid=1) does a write() on a (driver custom) sysfs file which
> ends up calling request_firmware() kernel side
> 3) The firmware loading fallback mechanism is used, the request is sent to
> userspace and pid 1 waits in the kernel on wait_*
> 4) before firmware loading completes pid 42 dies (for any reason - in my
> case normal termination)

Interesting, could one interpretation here be that the process
successfully finishing + the signal being sent beats out the timing of
the firmware_class syfs code detecting that the write completed ?

> 5) Kernel delivers SIGCHLD to pid=1 to tell it a child has died, which
> causes -ERESTARTSYS to be returned from wait_*

  Luis

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-06-09  1:10               ` Luis R. Rodriguez
@ 2017-06-09  1:57                 ` Luis R. Rodriguez
  2017-06-09  7:40                   ` Martin Fuzzey
  0 siblings, 1 reply; 18+ messages in thread
From: Luis R. Rodriguez @ 2017-06-09  1:57 UTC (permalink / raw)
  To: Martin Fuzzey
  Cc: Linux FS Devel, Alan Cox, Ted Ts'o, Andy Lutomirski,
	Dmitry Torokhov, Michael Kerrisk (man-pages), Linux API,
	Peter Zijlstra, Greg KH, Daniel Wagner, David Woodhouse, jewalt,
	rafal, Arend Van Spriel, Rafael J. Wysocki, Li, Yi, atull,
	Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach,
	Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook,
	AKASHI Takahiro, David Howells, Peter Jones, Hans de Goede,
	linux-kernel@vger.kernel.org, Luis R. Rodriguez

On Thu, Jun 8, 2017 at 6:10 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Wed, Jun 7, 2017 at 10:54 AM, Martin Fuzzey <mfuzzey@parkeon.com> wrote:
>> On 07/06/17 19:08, Luis R. Rodriguez wrote:
>>>
>>> On Thu, May 25, 2017 at 10:28:38AM +0200, Fuzzey, Martin wrote:
>>>>
>>>> 1) Android init calls write() on the sysfs file
>>>> 2) The sysfs .store() callback registered by a driver is called
>>>> 3) The driver calls request_firmware()
>>>> 4) request_firmware() sends the firmware load request to userspace and
>>>> calls wait_for_completion_interruptible()
>>>
>>> Martin, just for completeness on documenting on the commit log of the next
>>> swait proposed fix for this -- what signal did the process get from which
>>> you
>>> note the child dies below ? Exactly what in Android sent this signal ?
>>
>>
>> Android didn't send the signal, the kernel did (SIGCHLD).
>>
>> Like this:
>>
>> 1) Android init (pid=1) fork()s (say pid=42) [this child process is totally
>> unrelated to firmware loading]
>> 2) Android init (pid=1) does a write() on a (driver custom) sysfs file which
>> ends up calling request_firmware() kernel side
>> 3) The firmware loading fallback mechanism is used, the request is sent to
>> userspace and pid 1 waits in the kernel on wait_*
>> 4) before firmware loading completes pid 42 dies (for any reason - in my
>> case normal termination)

Martin just to be clear, by "normal case termination" do you mean
completing successfully ?? Ie the firmware actually did make it onto
the device ?

> Interesting, could one interpretation here be that the process
> successfully finishing + the signal being sent beats out the timing of
> the firmware_class syfs code detecting that the write completed ?
>
>> 5) Kernel delivers SIGCHLD to pid=1 to tell it a child has died, which
>> causes -ERESTARTSYS to be returned from wait_*
>
>   Luis

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-06-09  1:57                 ` Luis R. Rodriguez
@ 2017-06-09  7:40                   ` Martin Fuzzey
  2017-06-09 21:12                     ` Luis R. Rodriguez
  2017-06-09 22:55                     ` Luis R. Rodriguez
  0 siblings, 2 replies; 18+ messages in thread
From: Martin Fuzzey @ 2017-06-09  7:40 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Linux FS Devel, Alan Cox, Ted Ts'o, Andy Lutomirski,
	Dmitry Torokhov, Michael Kerrisk (man-pages), Linux API,
	Peter Zijlstra, Greg KH, Daniel Wagner, David Woodhouse, jewalt,
	rafal, Arend Van Spriel, Rafael J. Wysocki, Li, Yi, atull,
	Moritz Fischer, Petr Mladek, Johannes Berg, Emmanuel Grumbach,
	Luca Coelho, Kalle Valo, Linus Torvalds, Kees Cook,
	AKASHI Takahiro, David Howells, Peter Jones, Hans de Goede,
	linux-kernel@vger.kernel.org

On 09/06/17 03:57, Luis R. Rodriguez wrote:
> On Thu, Jun 8, 2017 at 6:10 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>>> Android didn't send the signal, the kernel did (SIGCHLD).
>>>
>>> Like this:
>>>
>>> 1) Android init (pid=1) fork()s (say pid=42) [this child process is totally
>>> unrelated to firmware loading]
>>> 2) Android init (pid=1) does a write() on a (driver custom) sysfs file which
>>> ends up calling request_firmware() kernel side
>>> 3) The firmware loading fallback mechanism is used, the request is sent to
>>> userspace and pid 1 waits in the kernel on wait_*
>>> 4) before firmware loading completes pid 42 dies (for any reason - in my
>>> case normal termination)
> Martin just to be clear, by "normal case termination" do you mean
> completing successfully ?? Ie the firmware actually did make it onto
> the device ?

The firmware did *not* make it onto the device since the 
request_firmware() call returned an error
(the code that would have transfered it to the device is only executed 
following a successful request_firmware)

The process that terminates normally is unrelated to firmware loading as 
I said above.

The only things that matter are:
- It is a child process of the process that calls request_firmware()
- It terminates *while* the the wait_ is still in progress


Here is a way of reproducing the problem using the test_firmware module 
(which I only just saw) on normal linux with no Android or custom driver


#!/bin/sh
set -e

# Make sure the system firmware loader doesn't get in the way
/etc/init.d/udev stop

modprobe test_firmware

DIR=/sys/devices/virtual/misc/test_firmware

echo 10 >/sys/class/firmware/timeout;
sleep 2 &
echo -n "/some/non/existing/file.bin" > "$DIR"/trigger_request;



If run with the "sleep 2 &" it terminates after 2 seconds
If the sleep is commented it runs for the expected 10 seconds (the 
firmware loading timeout)

Since the sleep process is a child of the script process requesting a 
firmware load its death causes a SIGCHLD causing request_firmware() to 
abort prematurely.


Martin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-06-09  7:40                   ` Martin Fuzzey
@ 2017-06-09 21:12                     ` Luis R. Rodriguez
  2017-06-09 22:55                     ` Luis R. Rodriguez
  1 sibling, 0 replies; 18+ messages in thread
From: Luis R. Rodriguez @ 2017-06-09 21:12 UTC (permalink / raw)
  To: Martin Fuzzey
  Cc: Luis R. Rodriguez, Linux FS Devel, Alan Cox, Ted Ts'o,
	Andy Lutomirski, Dmitry Torokhov, Michael Kerrisk (man-pages),
	Linux API, Peter Zijlstra, Greg KH, Daniel Wagner,
	David Woodhouse, jewalt, rafal, Arend Van Spriel,
	Rafael J. Wysocki, Li, Yi, atull, Moritz Fischer, Petr Mladek,
	Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo,
	Linus Torvalds, Kees Cook, AKASHI Takahiro, David Howells,
	Peter Jones, Hans de Goede, linux-kernel@vger.kernel.org

On Fri, Jun 09, 2017 at 09:40:47AM +0200, Martin Fuzzey wrote:
> On 09/06/17 03:57, Luis R. Rodriguez wrote:
> > On Thu, Jun 8, 2017 at 6:10 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > > > Android didn't send the signal, the kernel did (SIGCHLD).
> > > > 
> > > > Like this:
> > > > 
> > > > 1) Android init (pid=1) fork()s (say pid=42) [this child process is totally
> > > > unrelated to firmware loading]
> > > > 2) Android init (pid=1) does a write() on a (driver custom) sysfs file which
> > > > ends up calling request_firmware() kernel side
> > > > 3) The firmware loading fallback mechanism is used, the request is sent to
> > > > userspace and pid 1 waits in the kernel on wait_*
> > > > 4) before firmware loading completes pid 42 dies (for any reason - in my
> > > > case normal termination)
> > Martin just to be clear, by "normal case termination" do you mean
> > completing successfully ?? Ie the firmware actually did make it onto
> > the device ?
> 
> The firmware did *not* make it onto the device since the request_firmware()
> call returned an error
> (the code that would have transfered it to the device is only executed
> following a successful request_firmware)
> 
> The process that terminates normally is unrelated to firmware loading as I
> said above.
> 
> The only things that matter are:
> - It is a child process of the process that calls request_firmware()
> - It terminates *while* the the wait_ is still in progress
> 
> 
> Here is a way of reproducing the problem using the test_firmware module
> (which I only just saw) on normal linux with no Android or custom driver
> 
> 
> #!/bin/sh
> set -e
> 
> # Make sure the system firmware loader doesn't get in the way
> /etc/init.d/udev stop
> 
> modprobe test_firmware
> 
> DIR=/sys/devices/virtual/misc/test_firmware
> 
> echo 10 >/sys/class/firmware/timeout;
> sleep 2 &
> echo -n "/some/non/existing/file.bin" > "$DIR"/trigger_request;
> 
> 
> 
> If run with the "sleep 2 &" it terminates after 2 seconds
> If the sleep is commented it runs for the expected 10 seconds (the firmware
> loading timeout)
> 
> Since the sleep process is a child of the script process requesting a
> firmware load its death causes a SIGCHLD causing request_firmware() to abort
> prematurely.

Thanks this could mean we also *should* trigger a failure if init is issuing
modprobe on a series of drivers and one completes before another while
request_firmware() is called on init or probe on a subsequent driver. If true
I'm surprised this never was reported back when the fallback mechanism was
popular, I suppose it was not an issue given most firmware *was* present on
/lib/firmware/ and the direct filesystem lookup first step always found the
firmware first, so this would only be an issue for folks relying on the
fallback mechanism exclusively.

Will include a test case based on your above script. Thanks!

  Luis

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback
  2017-06-09  7:40                   ` Martin Fuzzey
  2017-06-09 21:12                     ` Luis R. Rodriguez
@ 2017-06-09 22:55                     ` Luis R. Rodriguez
  1 sibling, 0 replies; 18+ messages in thread
From: Luis R. Rodriguez @ 2017-06-09 22:55 UTC (permalink / raw)
  To: Martin Fuzzey
  Cc: Luis R. Rodriguez, Linux FS Devel, Alan Cox, Ted Ts'o,
	Andy Lutomirski, Dmitry Torokhov, Michael Kerrisk (man-pages),
	Linux API, Peter Zijlstra, Greg KH, Daniel Wagner,
	David Woodhouse, jewalt, rafal, Arend Van Spriel,
	Rafael J. Wysocki, Li, Yi, atull, Moritz Fischer, Petr Mladek,
	Johannes Berg, Emmanuel Grumbach, Luca Coelho, Kalle Valo,
	Linus Torvalds, Kees Cook, AKASHI Takahiro, David Howells,
	Peter Jones, Hans de Goede, linux-kernel@vger.kernel.org

On Fri, Jun 09, 2017 at 09:40:47AM +0200, Martin Fuzzey wrote:
> On 09/06/17 03:57, Luis R. Rodriguez wrote:
> > On Thu, Jun 8, 2017 at 6:10 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > > > Android didn't send the signal, the kernel did (SIGCHLD).
> > > > 
> > > > Like this:
> > > > 
> > > > 1) Android init (pid=1) fork()s (say pid=42) [this child process is totally
> > > > unrelated to firmware loading]
> > > > 2) Android init (pid=1) does a write() on a (driver custom) sysfs file which
> > > > ends up calling request_firmware() kernel side
> > > > 3) The firmware loading fallback mechanism is used, the request is sent to
> > > > userspace and pid 1 waits in the kernel on wait_*
> > > > 4) before firmware loading completes pid 42 dies (for any reason - in my
> > > > case normal termination)
> > Martin just to be clear, by "normal case termination" do you mean
> > completing successfully ?? Ie the firmware actually did make it onto
> > the device ?
> 
> The firmware did *not* make it onto the device since the request_firmware()
> call returned an error
> (the code that would have transfered it to the device is only executed
> following a successful request_firmware)
> 
> The process that terminates normally is unrelated to firmware loading as I
> said above.
> 
> The only things that matter are:
> - It is a child process of the process that calls request_firmware()
> - It terminates *while* the the wait_ is still in progress
> 
> 
> Here is a way of reproducing the problem using the test_firmware module
> (which I only just saw) on normal linux with no Android or custom driver
> 
> 
> #!/bin/sh
> set -e
> 
> # Make sure the system firmware loader doesn't get in the way
> /etc/init.d/udev stop
> 
> modprobe test_firmware
> 
> DIR=/sys/devices/virtual/misc/test_firmware
> 
> echo 10 >/sys/class/firmware/timeout;
> sleep 2 &
> echo -n "/some/non/existing/file.bin" > "$DIR"/trigger_request;
> 
> 
> 
> If run with the "sleep 2 &" it terminates after 2 seconds
> If the sleep is commented it runs for the expected 10 seconds (the firmware
> loading timeout)
> 
> Since the sleep process is a child of the script process requesting a
> firmware load its death causes a SIGCHLD causing request_firmware() to abort
> prematurely.

BTW I've run some more tests and if you use request_firmware_nowait() the above
issue also is avoidable. Also, if you use the custom fallback (uevent parameter
is false) the timeout is max value always. The only way to kill the custom
requests is through the echo -1 > loading or the max timeout triggers (don't
rely on that!). The custom interface was a bad idea though so best to just
avoid it.

Anyway I can reproduce in my tests now, will add this testcase as well.
The stable fixes will be sent as well.

  Luis

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-06-09 22:55 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20170524205658.GK8951@wotan.suse.de>
     [not found] ` <20170524214027.7775-1-mcgrof@kernel.org>
     [not found]   ` <CALCETrXUrirO-vg3M+MGhn=0gZTwx0phsRDS4TCwWWgNYC6RsA@mail.gmail.com>
     [not found]     ` <CAB=NE6VENGmTEZ3_8EWmv6bhT81k=szg0_POWJm2SP+V21CeBw@mail.gmail.com>
     [not found]       ` <CALCETrU4__YUGk36PN=FbuEf0SBaTrxQQqm4sWs2NrZ+6WN7jA@mail.gmail.com>
     [not found]         ` <CANh8QzwPb_+RKs5QVt7mdFk8h_rOMVS3j9m0OADgvzBtNqBBLg@mail.gmail.com>
     [not found]           ` <87fufr3mdy.fsf@xmission.com>
     [not found]             ` <20170526194640.GS8951@wotan.suse.de>
     [not found]               ` <CAKdAkRTrcTVOAP5GK-R=Au_tL5WqSn5UkQEzNe5NcCWXS8mbtA@mail.gmail.com>
     [not found]                 ` <CAB=NE6Wh9inuA-R9-zegnvJV=s_6o_doMNDOhBVhYn4ZQ_Ku5Q@mail.gmail.com>
     [not found]                   ` <20170526215518.GB40877@dtor-ws>
     [not found]                     ` <20170605202410.GQ8951@wotan.suse.de>
     [not found]                       ` <59367025.3020901@parkeon.com>
     [not found]                         ` <20170606163401.GA27288@wotan.suse.de>
2017-06-06 17:52                           ` [PATCH v2] firmware: fix sending -ERESTARTSYS due to signal on fallback Luis R. Rodriguez
     [not found]                       ` <1496760796.5682.48.camel@linux.intel.com>
     [not found]                         ` <20170606164734.GB27288@wotan.suse.de>
2017-06-06 17:54                           ` Luis R. Rodriguez
2017-06-06 22:11                           ` Theodore Ts'o
2017-06-07  0:22                             ` Luis R. Rodriguez
2017-06-07  4:56                               ` Andy Lutomirski
2017-06-07  6:25                                 ` Dmitry Torokhov
2017-06-07 12:25                                   ` Alan Cox
2017-06-07 17:15                                     ` Luis R. Rodriguez
2017-06-09  1:14                                   ` Andy Lutomirski
2017-06-09  1:33                                     ` Luis R. Rodriguez
2017-06-09 21:29                                       ` Luis R. Rodriguez
2017-06-07 17:08           ` Luis R. Rodriguez
2017-06-07 17:54             ` Martin Fuzzey
2017-06-09  1:10               ` Luis R. Rodriguez
2017-06-09  1:57                 ` Luis R. Rodriguez
2017-06-09  7:40                   ` Martin Fuzzey
2017-06-09 21:12                     ` Luis R. Rodriguez
2017-06-09 22:55                     ` Luis R. Rodriguez

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).