From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga18.intel.com ([134.134.136.126]:16614 "EHLO mga18.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751289AbeCNThy (ORCPT ); Wed, 14 Mar 2018 15:37:54 -0400 From: Vinicius Costa Gomes To: Willem de Bruijn Cc: Network Development , randy.e.witt@intel.com, David Miller , Eric Dumazet Subject: Re: [PATCH net-next 0/2] skbuff: Fix applications not being woken for errors In-Reply-To: References: <20180313203519.8638-1-vinicius.gomes@intel.com> Date: Wed, 14 Mar 2018 12:37:52 -0700 Message-ID: <878tauwir3.fsf@intel.com> MIME-Version: 1.0 Content-Type: text/plain Sender: netdev-owner@vger.kernel.org List-ID: Hi, Willem de Bruijn writes: >> Another interesting fact is that if the POLLIN event is added to the >> poll() .events, poll() no longer becomes stuck, > > The process has registered interest only in POLLIN, which the call to > sk_data_read (sock_def_readable) will trigger. > >> and more interestingly >> the returned event in .revents is only POLLERR. > > datagram_poll will set (E)POLLERR based on non-empty sk_error_queue. > >> After a few debugging sessions, we got to 'sock_queue_err_skb()' and >> how it notifies applications of the error just enqueued. Changing it >> to use 'sk->sk_error_report()', fixes the issue for hardware and >> software timestamping. That is patch (2). >> >> The "solution" proposed in patch (2) looks like too big a hammer, > > It looks fine to me. POLLERR is returned regardless of the mask a > process sets up in pollfd.events. So waking with sk_error_report > will fix this while still waking callers waiting on POLLIN. > > Note that on sock_dequeue_err_skb, if another notification (of the > right kind) is waiting, sk_error_report is already called instead of > sk_data_ready. Thank you all who did confirm that this was the right fix. > > This should perhaps go to net, instead of net-next (though not the > test). Will propose it to net, what I am thinking is that now that a few TSN features are upstream, applications that use hardware timestamping (the easiest way to trigger this bug) could be more common. > > If resending, a small nit in the test: please keep the alphabetical > order in getopt. The filepath also looks a bit fishy, but git am applied > the mbox from patchwork without issue. Will send a second version. Thanks, -- Vinicius