netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v0 0/2] skbuff: Fix applications not being woken for errors
@ 2018-03-12 23:10 Vinicius Costa Gomes
  2018-03-12 23:10 ` [RFC PATCH v0 1/2] selftests/txtimestamp: Add more configurable parameters Vinicius Costa Gomes
  2018-03-12 23:10 ` [RFC PATCH v0 2/2] skbuff: Notify errors with sk_error_report() Vinicius Costa Gomes
  0 siblings, 2 replies; 6+ messages in thread
From: Vinicius Costa Gomes @ 2018-03-12 23:10 UTC (permalink / raw)
  To: netdev; +Cc: Vinicius Costa Gomes, randy.e.witt, davem

Hi,

This is actually a "bug report"-RFC instead of the more usual "new
feature"-RFC.

We are developing an application that uses TX hardware timestamping to
make some measurements, and during development Randy Witt initially
reported that the application poll() never unblocked when TX hardware
timestamping was enabled.

After some investigation, it turned out the problem wasn't only
exclusive to hardware timestamping, and could be reproduced with
software timestamping.

Applying patch (1), and running txtimestamp like this, for example:

$ ./txtimestamp -u -4 192.168.1.71 -c 1000 -D -l 1000 -F

('-u' to use UDP only, '-4' for ipv4 only, '-c 1000' to send 1000
packets for each test, '-D' to remove the delay between packets, '-l
1000' to set the payload to 1000 bytes, '-F' for configuring poll() to
wait forever)

will cause the application to become stuck in the poll() call in most
of the times. (Note: I couldn't reproduce the issue running against an
address that is routed through loopback.)

Another interesting fact is that if the POLLIN event is added to the
poll() .events, poll() no longer becomes stuck, and more interestingly
the returned event in .revents is only POLLERR.

After a few debugging sessions, we got to 'sock_queue_err_skb()' and
how it notifies applications of the error just enqueued. Changing it
to use 'sk->sk_error_report()', fixes the issue for hardware and
software timestamping. That is patch (2).

The "solution" proposed in patch (2) looks like too big a hammer, if
it's not, then it seems that this problem existed since a long time
ago (pre git) and was uncommon for folks to reach the necessary
conditions to trigger it (my hypothesis is that only triggers when the
error is reported from a different task context than the application).

Am I missing something here?


Cheers,
--


Vinicius Costa Gomes (2):
  selftests/txtimestamp: Add more configurable parameters
  skbuff: Notify errors with sk_error_report()

 net/core/skbuff.c                                   |  2 +-
 .../selftests/networking/timestamping/txtimestamp.c | 21 ++++++++++++++++++---
 2 files changed, 19 insertions(+), 4 deletions(-)

--
2.16.2

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-03-13 17:27 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-03-12 23:10 [RFC PATCH v0 0/2] skbuff: Fix applications not being woken for errors Vinicius Costa Gomes
2018-03-12 23:10 ` [RFC PATCH v0 1/2] selftests/txtimestamp: Add more configurable parameters Vinicius Costa Gomes
2018-03-12 23:10 ` [RFC PATCH v0 2/2] skbuff: Notify errors with sk_error_report() Vinicius Costa Gomes
2018-03-13  0:23   ` Eric Dumazet
2018-03-13 17:20     ` Vinicius Costa Gomes
2018-03-13 17:27       ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).