Netdev List
 help / color / mirror / Atom feed
* Re: Re: [PATCH net v2] NFC: hci: fix sleep in atomic context bugs in nfc_hci_hcp_message_tx
From: duoming @ 2022-05-18  4:39 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: linux-kernel, kuba, davem, edumazet, pabeni, gregkh,
	alexander.deucher, broonie, netdev
In-Reply-To: <68ccef70-ef30-8f53-6ec5-17ce5815089c@linaro.org>

Hello,

On Tue, 17 May 2022 17:28:51 +0200 Krzysztof wrote:

> >>> There are sleep in atomic context bugs when the request to secure
> >>> element of st21nfca is timeout. The root cause is that kzalloc and
> >>> alloc_skb with GFP_KERNEL parameter and mutex_lock are called in
> >>> st21nfca_se_wt_timeout which is a timer handler. The call tree shows
> >>> the execution paths that could lead to bugs:
> >>>
> >>>    (Interrupt context)
> >>> st21nfca_se_wt_timeout
> >>>   nfc_hci_send_event
> >>>     nfc_hci_hcp_message_tx
> >>>       kzalloc(..., GFP_KERNEL) //may sleep
> >>>       alloc_skb(..., GFP_KERNEL) //may sleep
> >>>       mutex_lock() //may sleep
> >>>
> >>> This patch changes allocation mode of kzalloc and alloc_skb from
> >>> GFP_KERNEL to GFP_ATOMIC and changes mutex_lock to spin_lock in
> >>> order to prevent atomic context from sleeping.
> >>>
> >>> Fixes: 2130fb97fecf ("NFC: st21nfca: Adding support for secure element")
> >>> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>

> > The nfc_hci_hcp_message_tx() is called by both process context(hci_dev_up and so on)
> > and interrupt context(st21nfca_se_wt_timeout()). The process context(hci_dev_up and so on)
> > calls device_lock, but I think calling spin_lock() within device_lock() is ok. There is
> > no device_lock() called within spin_lock(). 
> 
> There is.
> 
> nfc_hci_failure -> spin lock -> nfc_driver_failure -> nfc_targets_found
> -> device_lock
> 
> I found it just by a very quick look, so I suspect there are several
> other places, not really checked.

I agree with you, the spin_lock is not a good solution to this problem. There is another solution:

We could put the nfc_hci_send_event() of st21nfca_se_wt_timeout() in a work item, then, using
schedule_work() in st21nfca_se_wt_timeout() to execute the work item. The schedule_work() will
wake up another kernel thread which is in process context to execute the bottom half of the interrupt, 
so it allows sleep.

The following is the details.

diff --git a/drivers/nfc/st21nfca/se.c b/drivers/nfc/st21nfca/se.c
index c922f10d0d7..1e98467dbf7 100644
--- a/drivers/nfc/st21nfca/se.c
+++ b/drivers/nfc/st21nfca/se.c
@@ -241,7 +241,7 @@ int st21nfca_hci_se_io(struct nfc_hci_dev *hdev, u32 se_idx,
 }
 EXPORT_SYMBOL(st21nfca_hci_se_io);

-static void st21nfca_se_wt_timeout(struct timer_list *t)
+static void st21nfca_se_wt_work(struct work_struct *work)
 {
        /* 
         * No answer from the secure element
@@ -254,8 +254,9 @@ static void st21nfca_se_wt_timeout(struct timer_list *t)
         */
        /* hardware reset managed through VCC_UICC_OUT power supply */
        u8 param = 0x01;
-       struct st21nfca_hci_info *info = from_timer(info, t,
-                                                   se_info.bwi_timer);
+       struct st21nfca_hci_info *info = container_of(work,
+                                               struct st21nfca_hci_info,
+                                               se_info.timeout_work);

        info->se_info.bwi_active = false;
 
@@ -271,6 +272,13 @@ static void st21nfca_se_wt_timeout(struct timer_list *t)
        info->se_info.cb(info->se_info.cb_context, NULL, 0, -ETIME);
 }

+static void st21nfca_se_wt_timeout(struct timer_list *t)
+{
+       struct st21nfca_hci_info *info = from_timer(info, t, se_info.bwi_timer);
+
+       schedule_work(&info->se_info.timeout_work);
+}
+
 static void st21nfca_se_activation_timeout(struct timer_list *t)
 {
        struct st21nfca_hci_info *info = from_timer(info, t,
@@ -389,6 +397,7 @@ void st21nfca_se_init(struct nfc_hci_dev *hdev)
        struct st21nfca_hci_info *info = nfc_hci_get_clientdata(hdev);

        init_completion(&info->se_info.req_completion);
+       INIT_WORK(&info->se_info.timeout_work, st21nfca_se_wt_work);
        /* initialize timers */
        timer_setup(&info->se_info.bwi_timer, st21nfca_se_wt_timeout, 0);
        info->se_info.bwi_active = false;
@@ -416,6 +425,7 @@ void st21nfca_se_deinit(struct nfc_hci_dev *hdev)
        if (info->se_info.se_active)
                del_timer_sync(&info->se_info.se_active_timer);

+       cancel_work_sync(&info->se_info.timeout_work);
        info->se_info.bwi_active = false;
        info->se_info.se_active = false;
 }
diff --git a/drivers/nfc/st21nfca/st21nfca.h b/drivers/nfc/st21nfca/st21nfca.h
index cb6ad916be9..ae6771cc989 100644
--- a/drivers/nfc/st21nfca/st21nfca.h
+++ b/drivers/nfc/st21nfca/st21nfca.h
@@ -141,6 +141,7 @@ struct st21nfca_se_info {

        se_io_cb_t cb;
        void *cb_context;
+       struct work_struct timeout_work;
 };

 struct st21nfca_hci_info {

Do you think this solution is better?

Best regards,
Duoming Zhou

^ permalink raw reply related

* Re: [PATCH net-next v2] net: wwan: t7xx: fix GFP_KERNEL usage in spin_lock context
From: Ziyang Xuan (William) @ 2022-05-18  4:39 UTC (permalink / raw)
  To: Loic Poulain
  Cc: chandrashekar.devegowda, linuxwwan, chiranjeevi.rapolu,
	haijun.liu, m.chetan.kumar, ricardo.martinez, ryazanov.s.a,
	johannes, davem, edumazet, kuba, pabeni, netdev
In-Reply-To: <CAMZdPi-ibG9O9C2m3qVeEAbO6=TLA-8jZzX9Gbm2MQOwT_1vPg@mail.gmail.com>

> Hi Ziyang,
> 
> On Tue, 17 May 2022 at 08:30, Ziyang Xuan <william.xuanziyang@huawei.com> wrote:
>>
>> t7xx_cldma_clear_rxq() call t7xx_cldma_alloc_and_map_skb() in spin_lock
>> context, But __dev_alloc_skb() in t7xx_cldma_alloc_and_map_skb() uses
>> GFP_KERNEL, that will introduce scheduling factor in spin_lock context.
>>
>> Because t7xx_cldma_clear_rxq() is called after stopping CLDMA, so we can
>> remove the spin_lock from t7xx_cldma_clear_rxq().
>>
>> Fixes: 39d439047f1d ("net: wwan: t7xx: Add control DMA interface")
>> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
>> ---
> 
> You should normally indicate what changed in this v2.
> 
>>  drivers/net/wwan/t7xx/t7xx_hif_cldma.c | 7 ++++---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/wwan/t7xx/t7xx_hif_cldma.c b/drivers/net/wwan/t7xx/t7xx_hif_cldma.c
>> index 46066dcd2607..7493285a9606 100644
>> --- a/drivers/net/wwan/t7xx/t7xx_hif_cldma.c
>> +++ b/drivers/net/wwan/t7xx/t7xx_hif_cldma.c
>> @@ -782,10 +782,12 @@ static int t7xx_cldma_clear_rxq(struct cldma_ctrl *md_ctrl, int qnum)
>>         struct cldma_queue *rxq = &md_ctrl->rxq[qnum];
>>         struct cldma_request *req;
>>         struct cldma_gpd *gpd;
>> -       unsigned long flags;
>>         int ret = 0;
>>
>> -       spin_lock_irqsave(&rxq->ring_lock, flags);
>> +       /* CLDMA has been stopped. There is not any CLDMA IRQ, holding
>> +        * ring_lock is not needed.
> 
> If it makes sense to explain why we don't need locking, the next
> sentence is not needed:

I want to remind the possible developer if he or she want to add spin_lock
here again in future, he or she should check whether there is a scheduling
factor or not here firstly.

> 
> 
>>  Thus we can use functions that may
>> +        * introduce scheduling.
>> +        */
>>         t7xx_cldma_q_reset(rxq);
>>         list_for_each_entry(req, &rxq->tr_ring->gpd_ring, entry) {
>>                 gpd = req->gpd;
>> @@ -808,7 +810,6 @@ static int t7xx_cldma_clear_rxq(struct cldma_ctrl *md_ctrl, int qnum)
>>
>>                 t7xx_cldma_gpd_set_data_ptr(req->gpd, req->mapped_buff);
>>         }
>> -       spin_unlock_irqrestore(&rxq->ring_lock, flags);
>>
>>         return ret;
>>  }
>> --
>> 2.25.1
>>
> .
> 

^ permalink raw reply

* Re: [PATCH 1/3] net: macb: Fix PTP one step sync support
From: Harini Katakam @ 2022-05-18  4:23 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Harini Katakam, Nicolas Ferre, David Miller, Richard Cochran,
	Claudiu Beznea, Paolo Abeni, netdev, Linux Kernel Mailing List,
	Michal Simek, Radhey Shyam Pandey
In-Reply-To: <20220517194254.015e87f3@kernel.org>

Hi Jakub,

On Wed, May 18, 2022 at 8:12 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 17 May 2022 13:02:57 +0530 Harini Katakam wrote:
> > PTP one step sync packets cannot have CSUM padding and insertion in
> > SW since time stamp is inserted on the fly by HW.
> > In addition, ptp4l version 3.0 and above report an error when skb
> > timestamps are reported for packets that not processed for TX TS
> > after transmission.
> > Add a helper to identify PTP one step sync and fix the above two
> > errors.
> > Also reset ptp OSS bit when one step is not selected.
> >
> > Fixes: ab91f0a9b5f4 ("net: macb: Add hardware PTP support")
> > Fixes: 653e92a9175e ("net: macb: add support for padding and fcs computation")
>
> Please make sure to CC authors of the patches under fixes.
> ./scripts/get_maintainer should point them out.

Thanks for the review.
Rafal Ozieblo <rafalo@cadence.com> is the author of the first Fixes
patch but that
address hasn't worked in the last ~4 yrs.
I have cced Claudiu and everyone else from the maintainers
(Eric Dumazet <edumazet@google.com> also doesn't work)

<snip>
> > +/* IEEE1588 PTP flag field values  */
> > +#define PTP_FLAG_TWOSTEP     0x2
>
> Shouldn't this go into the PTP header?

Let me add it to ptp_classify where the relevant helpers are present.

<snip>
> > +static inline bool ptp_oss(struct sk_buff *skb)
>
> Please spell out then name more oss == open source software.

Will change to ptp_one_step_sync

>
> No inline here, please, let the compiler decide if the function is
> small enough. One step timestamp may be a rare use case so inlining
> this twice is not necessarily the right choice.

One step is a rare case but the check happens on every PTP packet in the
transmit data path and hence I wanted to explicitly inline it.

<snip>
> > @@ -1158,13 +1192,14 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
> >
> >                       /* First, update TX stats if needed */
> >                       if (skb) {
> > -                             if (unlikely(skb_shinfo(skb)->tx_flags &
> > -                                          SKBTX_HW_TSTAMP) &&
> > -                                 gem_ptp_do_txstamp(queue, skb, desc) == 0) {
> > -                                     /* skb now belongs to timestamp buffer
> > -                                      * and will be removed later
> > -                                      */
> > -                                     tx_skb->skb = NULL;
> > +                             if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) &&
>
> ptp_oss already checks if HW_TSTAMP is set.

The check for SKBTX_HW_TSTAMP is required here universally and not
just inside ptp_oss.
I will remove the redundant check in ptp_oss instead. Please see the
reply below.

>
> > +                                 !ptp_oss(skb)) {
> > +                                     if (gem_ptp_do_txstamp(queue, skb, desc) == 0) {
>
> Why convert the gem_ptp_do_txstamp check from a && in the condition to
> a separate if?

The intention is that ptp_oss should only be evaluated when
SKBTX_HW_TSTAMP is set and
gem_ptp_do_txstamp should only be called if ptp_oss is false. Since
compiler follows the order
of evaluation, I'll simplify this to:

if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) && !ptp_oss(skb) &&
    gem_ptp_do_txstamp(queue, skb, desc) == 0) {
...
}

Regards,
Harini

^ permalink raw reply

* Re: [PATCH nf,v2] netfilter: flowtable: fix TCP flow teardown
From: Sven Auhagen @ 2022-05-18  4:16 UTC (permalink / raw)
  To: Oz Shlomo; +Cc: Pablo Neira Ayuso, netfilter-devel, nbd, fw, paulb, netdev
In-Reply-To: <f8247247-4109-18bc-c422-a69619b50258@nvidia.com>


I tested the patch in production yesterday and everything looks good.

Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>

On Tue, May 17, 2022 at 01:04:53PM +0300, Oz Shlomo wrote:
> Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
> 
> On 5/17/2022 12:42 PM, Pablo Neira Ayuso wrote:
> > This patch addresses three possible problems:
> > 
> > 1. ct gc may race to undo the timeout adjustment of the packet path, leaving
> >     the conntrack entry in place with the internal offload timeout (one day).
> > 
> > 2. ct gc removes the ct because the IPS_OFFLOAD_BIT is not set and the CLOSE
> >     timeout is reached before the flow offload del.
> > 
> > 3. tcp ct is always set to ESTABLISHED with a very long timeout
> >     in flow offload teardown/delete even though the state might be already
> >     CLOSED. Also as a remark we cannot assume that the FIN or RST packet
> >     is hitting flow table teardown as the packet might get bumped to the
> >     slow path in nftables.
> > 
> > This patch resets IPS_OFFLOAD_BIT from flow_offload_teardown(), so
> > conntrack handles the tcp rst/fin packet which triggers the CLOSE/FIN
> > state transition.
> > 
> > Moreover, teturn the connection's ownership to conntrack upon teardown
> > by clearing the offload flag and fixing the established timeout value.
> > The flow table GC thread will asynchonrnously free the flow table and
> > hardware offload entries.
> > 
> > Before this patch, the IPS_OFFLOAD_BIT remained set for expired flows on
> > which is also misleading since the flow is back to classic conntrack
> > path.
> > 
> > If nf_ct_delete() removes the entry from the conntrack table, then it
> > calls nf_ct_put() which decrements the refcnt. This is not a problem
> > because the flowtable holds a reference to the conntrack object from
> > flow_offload_alloc() path which is released via flow_offload_free().
> > 
> > This patch also updates nft_flow_offload to skip packets in SYN_RECV
> > state. Since we might miss or bump packets to slow path, we do not know
> > what will happen there while we are still in SYN_RECV, this patch
> > postpones offload up to the next packet which also aligns to the
> > existing behaviour in tc-ct.
> > 
> > flow_offload_teardown() does not reset the existing tcp state from
> > flow_offload_fixup_tcp() to ESTABLISHED anymore, packets bump to slow
> > path might have already update the state to CLOSE/FIN.
> > 
> > Joint work with Oz and Sven.
> > 
> > Fixes: 1e5b2471bcc4 ("netfilter: nf_flow_table: teardown flow timeout race")
> > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > ---
> > v2: fix nf_conntrack_tcp_established() call, reported by Oz
> > 
> >   net/netfilter/nf_flow_table_core.c | 33 +++++++-----------------------
> >   net/netfilter/nft_flow_offload.c   |  3 ++-
> >   2 files changed, 9 insertions(+), 27 deletions(-)
> > 
> > diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
> > index 20b4a14e5d4e..ebdf5332e838 100644
> > --- a/net/netfilter/nf_flow_table_core.c
> > +++ b/net/netfilter/nf_flow_table_core.c
> > @@ -179,12 +179,11 @@ EXPORT_SYMBOL_GPL(flow_offload_route_init);
> >   static void flow_offload_fixup_tcp(struct ip_ct_tcp *tcp)
> >   {
> > -	tcp->state = TCP_CONNTRACK_ESTABLISHED;
> >   	tcp->seen[0].td_maxwin = 0;
> >   	tcp->seen[1].td_maxwin = 0;
> >   }
> > -static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
> > +static void flow_offload_fixup_ct(struct nf_conn *ct)
> >   {
> >   	struct net *net = nf_ct_net(ct);
> >   	int l4num = nf_ct_protonum(ct);
> > @@ -193,7 +192,9 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
> >   	if (l4num == IPPROTO_TCP) {
> >   		struct nf_tcp_net *tn = nf_tcp_pernet(net);
> > -		timeout = tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
> > +		flow_offload_fixup_tcp(&ct->proto.tcp);
> > +
> > +		timeout = tn->timeouts[ct->proto.tcp.state];
> >   		timeout -= tn->offload_timeout;
> >   	} else if (l4num == IPPROTO_UDP) {
> >   		struct nf_udp_net *tn = nf_udp_pernet(net);
> > @@ -211,18 +212,6 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
> >   		WRITE_ONCE(ct->timeout, nfct_time_stamp + timeout);
> >   }
> > -static void flow_offload_fixup_ct_state(struct nf_conn *ct)
> > -{
> > -	if (nf_ct_protonum(ct) == IPPROTO_TCP)
> > -		flow_offload_fixup_tcp(&ct->proto.tcp);
> > -}
> > -
> > -static void flow_offload_fixup_ct(struct nf_conn *ct)
> > -{
> > -	flow_offload_fixup_ct_state(ct);
> > -	flow_offload_fixup_ct_timeout(ct);
> > -}
> > -
> >   static void flow_offload_route_release(struct flow_offload *flow)
> >   {
> >   	nft_flow_dst_release(flow, FLOW_OFFLOAD_DIR_ORIGINAL);
> > @@ -361,22 +350,14 @@ static void flow_offload_del(struct nf_flowtable *flow_table,
> >   	rhashtable_remove_fast(&flow_table->rhashtable,
> >   			       &flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].node,
> >   			       nf_flow_offload_rhash_params);
> > -
> > -	clear_bit(IPS_OFFLOAD_BIT, &flow->ct->status);
> > -
> > -	if (nf_flow_has_expired(flow))
> > -		flow_offload_fixup_ct(flow->ct);
> > -	else
> > -		flow_offload_fixup_ct_timeout(flow->ct);
> > -
> >   	flow_offload_free(flow);
> >   }
> >   void flow_offload_teardown(struct flow_offload *flow)
> >   {
> > +	clear_bit(IPS_OFFLOAD_BIT, &flow->ct->status);
> >   	set_bit(NF_FLOW_TEARDOWN, &flow->flags);
> > -
> > -	flow_offload_fixup_ct_state(flow->ct);
> > +	flow_offload_fixup_ct(flow->ct);
> >   }
> >   EXPORT_SYMBOL_GPL(flow_offload_teardown);
> > @@ -466,7 +447,7 @@ static void nf_flow_offload_gc_step(struct nf_flowtable *flow_table,
> >   	if (nf_flow_has_expired(flow) ||
> >   	    nf_ct_is_dying(flow->ct) ||
> >   	    nf_flow_has_stale_dst(flow))
> > -		set_bit(NF_FLOW_TEARDOWN, &flow->flags);
> > +		flow_offload_teardown(flow);
> >   	if (test_bit(NF_FLOW_TEARDOWN, &flow->flags)) {
> >   		if (test_bit(NF_FLOW_HW, &flow->flags)) {
> > diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
> > index 187b8cb9a510..6f0b07fe648d 100644
> > --- a/net/netfilter/nft_flow_offload.c
> > +++ b/net/netfilter/nft_flow_offload.c
> > @@ -298,7 +298,8 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
> >   	case IPPROTO_TCP:
> >   		tcph = skb_header_pointer(pkt->skb, nft_thoff(pkt),
> >   					  sizeof(_tcph), &_tcph);
> > -		if (unlikely(!tcph || tcph->fin || tcph->rst))
> > +		if (unlikely(!tcph || tcph->fin || tcph->rst ||
> > +			     !nf_conntrack_tcp_established(ct)))
> >   			goto out;
> >   		break;
> >   	case IPPROTO_UDP:

^ permalink raw reply

* Re: [PATCH] vhost_net: fix double fget()
From: Jason Wang @ 2022-05-18  4:10 UTC (permalink / raw)
  To: Al Viro
  Cc: Michael S. Tsirkin, kvm, virtualization, netdev, linux-kernel,
	linux-fsdevel, Eric Biggers, davem
In-Reply-To: <YoQa4wzy9jSwDY7E@zeniv-ca.linux.org.uk>

On Wed, May 18, 2022 at 6:00 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Mon, May 16, 2022 at 04:44:19AM -0400, Michael S. Tsirkin wrote:
> > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> >
> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> >
> > and this is stable material I guess.
>
> It is, except that commit message ought to be cleaned up.  Something
> along the lines of
>
> ----
> Fix double fget() in vhost_net_set_backend()
>
> Descriptor table is a shared resource; two fget() on the same descriptor
> may return different struct file references.  get_tap_ptr_ring() is
> called after we'd found (and pinned) the socket we'll be using and it
> tries to find the private tun/tap data structures associated with it.
> Redoing the lookup by the same file descriptor we'd used to get the
> socket is racy - we need to same struct file.
>
> Thanks to Jason for spotting a braino in the original variant of patch -
> I'd missed the use of fd == -1 for disabling backend, and in that case
> we can end up with sock == NULL and sock != oldsock.
> ----
>
> Does the above sound sane for commit message?

Yes.

> And which tree would you
> prefer it to go through?  I can take it in vfs.git#fixes, or you could
> take it into your tree...
>

Consider Michael gave an ack, it would be fine if you want to take via
your tree.

Thanks


^ permalink raw reply

* Re: [RFC net-next] bonding: netlink error message support for options
From: Jonathan Toppins @ 2022-05-18  3:37 UTC (permalink / raw)
  To: Jakub Kicinski, Stephen Hemminger
  Cc: netdev, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	David S. Miller, Eric Dumazet, Paolo Abeni, linux-kernel
In-Reply-To: <20220517165419.540f2dc8@kernel.org>

On 5/17/22 19:54, Jakub Kicinski wrote:
> On Tue, 17 May 2022 15:44:19 -0700 Stephen Hemminger wrote:
>> On Tue, 17 May 2022 16:31:19 -0400
>> Jonathan Toppins <jtoppins@redhat.com> wrote:
>>
>>>      This is an RFC because the current NL_SET_ERR_MSG() macros do not support
>>>      printf like semantics so I rolled my own buffer setting in __bond_opt_set().
>>>      The issue is I could not quite figure out the life-cycle of the buffer, if
>>>      rtnl lock is held until after the text buffer is copied into the packet
>>>      then we are ok, otherwise, some other type of buffer management scheme will
>>>      be needed as this could result in corrupted error messages when modifying
>>>      multiple bonds.
>>
>> Might be better for others in long term if NL_SET_ERR_MSG() had printf like
>> semantics. Surely this isn't going to be first or last case.
>>
>> Then internally, it could print right to the netlink message.
> 
> Dunno. I think pointing at the bad attr + exposing per-attr netlink
> parsing policy + a string for a human worked pretty well so far.
> IMHO printf() is just a knee jerk reaction, especially when converting
> from netdev_err().

For some subsystems it is not a convert from netdev_err, it is an AND. 
In this RFC there are instances where changing the message from 
netdev_err() to the macro was trivial;

@@ -240,12 +243,14 @@ static int bond_changelink(struct net_device 
*bond_dev, st
ruct nlattr *tb[],
                 int arp_interval = 
nla_get_u32(data[IFLA_BOND_ARP_INTERVAL]);

                 if (arp_interval && miimon) {
-                       netdev_err(bond->dev, "ARP monitoring cannot be 
used with MII monitoring\n");
+                       NL_SET_ERR_MSG(extack,
+                                      "ARP monitoring cannot be used 
with MII monitoring");
                         return -EINVAL;
                 }

These are trivial because the path does not have to care about sysfs or 
some other legacy configuration interface. These macros become rather 
annoying to use once a system needs to support multiple configuration 
paths and is trying to utilize as much common configuration code[0] as 
possible so that all interfaces largely operate the same way.

> 
> Augmenting structured information is much, much better long term.
> 
> To me the never ending stream of efforts to improve printk() is a
> proof that once we let people printf() at will, efforts to contain
> it will be futile.
> 
At least for bonding I was trying to reuse the most amount of code which 
needs to deal with both sysfs and netlink. And I don't think it is a 
good idea to split the code paths, so if I am suppose to use statically 
allocated strings to support netlink errors that basically means 
anything that has to support multiple interfaces gets to sprinkle `if 
(extack)` everywhere[0]. Not great. The ownership model of the error 
buffer seems odd to me with the current macros, I am suppose to set a 
pointer in a structure subsystem X didn't allocate and has no control 
over its lifetime. Then netlink takes this pointer and does whatever 
with it. And somehow subsystem X is suppose to guarantee the pointer's 
lifetime exists forever, making a `const static char[]` buffer the only 
option. I don't understand why netlink doesn't provide the buffer and a 
subsystem just populates it. Using memcpy or snprintf doesn't matter, to 
me its a lifetime issue that makes the API not great to work with when 
you have to handle cases other than netlink.

Also as Joe Perches points out in this thread[1,2] the way the macros 
are written it is bloating the kernel because the error messages are 
getting duplicated for subsystems that need to support multiple 
configuration interfaces.

-Jon

[0] 
https://lore.kernel.org/netdev/e6b78ce8f5904a5411a809cf4205d745f8af98cb.1628650079.git.jtoppins@redhat.com/
[1] https://lore.kernel.org/netdev/cover.1628306392.git.jtoppins@redhat.com/
[2] 
https://lore.kernel.org/netdev/c8b69905c995ab887633ef11862705ee66c60aad.camel@perches.com/


^ permalink raw reply

* Re: [PATCH bpf-next v4 3/6] bpf: Move is_valid_bpf_tramp_flags() to the public trampoline code
From: Xu Kuohai @ 2022-05-18  3:35 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S . Miller, Hideaki YOSHIFUJI, David Ahern,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86, hpa,
	Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer, Mark Rutland,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi
In-Reply-To: <20220517155349.4jk5oymnjvrasw2p@MacBook-Pro.local>

On 5/17/2022 11:53 PM, Alexei Starovoitov wrote:
> On Tue, May 17, 2022 at 03:18:35AM -0400, Xu Kuohai wrote:
>>  
>> +static bool is_valid_bpf_tramp_flags(unsigned int flags)
>> +{
>> +	if ((flags & BPF_TRAMP_F_RESTORE_REGS) &&
>> +	    (flags & BPF_TRAMP_F_SKIP_FRAME))
>> +		return false;
>> +
>> +	/* BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops,
>> +	 * and it must be used alone.
>> +	 */
>> +	if ((flags & BPF_TRAMP_F_RET_FENTRY_RET) &&
>> +	    (flags & ~BPF_TRAMP_F_RET_FENTRY_RET))
>> +		return false;
>> +
>> +	return true;
>> +}
>> +
>> +int bpf_prepare_trampoline(struct bpf_tramp_image *tr, void *image,
>> +			   void *image_end, const struct btf_func_model *m,
>> +			   u32 flags, struct bpf_tramp_links *tlinks,
>> +			   void *orig_call)
>> +{
>> +	if (!is_valid_bpf_tramp_flags(flags))
>> +		return -EINVAL;
>> +
>> +	return arch_prepare_bpf_trampoline(tr, image, image_end, m, flags,
>> +					   tlinks, orig_call);
>> +}
> 
> It's an overkill to introduce a new helper function just to validate
> flags that almost compile time constants.
> The flags are not user supplied.
> Please move /* BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops ... */
> comment to bpf_struct_ops.c right before it calls arch_prepare_bpf_trampoline()
> And add a comment to trampoline.c saying that BPF_TRAMP_F_RESTORE_REGS
> and BPF_TRAMP_F_SKIP_FRAME should not be set together.
> We could add a warn_on there or in arch code, but feels like overkill.
> .

OK, will fix in next version.



^ permalink raw reply

* Re: [PATCH] bpf: add access control for map
From: Menglong Dong @ 2022-05-18  3:31 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Network Development, bpf, LKML, Menglong Dong,
	Andrii Nakryiko, Daniel Borkmann, Alan Maguire
In-Reply-To: <CAADnVQLRmv107zFL-dgB07Mf8NmR0TCAC9eG9aZ1O4DG3=ityw@mail.gmail.com>

On Wed, May 18, 2022 at 12:58 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, May 16, 2022 at 8:44 PM <menglong8.dong@gmail.com> wrote:
> >
> > From: Menglong Dong <imagedong@tencent.com>
> >
> > Hello,
> >
> > I have a idea about the access control of eBPF map, could you help
> > to see if it works?
> >
> > For now, only users with the CAP_SYS_ADMIN or CAP_BPF have the right
> > to access the data in eBPF maps. So I'm thinking, are there any way
> > to control the access to the maps, just like what we do to files?
>
> The bpf objects pinned in bpffs should always be accessible
> as files regardless of sysctl or cap-s.
>
> > Therefore, we can decide who have the right to read the map and who
> > can write.
> >
> > I think it is useful in some case. For example, I made a eBPF-based
> > network statistics program, and the information is stored in an array
> > map. And I want all users can read the information in the map, without
> > changing the capacity of them. As the information is iunsensitive,
> > normal users can read it. This make publish-consume mode possible,
> > the eBPF program is publisher and the user space program is consumer.
>
> Right. It is a choice of the bpf prog which data expose in the map.
>
> > So this aim can be achieve, if we can control the access of maps as a
> > file. There are many ways I thought, and I choosed one to implement:
> >
> > While pining the map, add the inode that is created to a list on the
> > map. root can change the permission of the inode through the pin path.
> > Therefore, we can try to find the inode corresponding to current user
> > namespace in the list, and check whether user have permission to read
> > or write.
> >
> > The steps can be:
> >
> > 1. create the map with BPF_F_UMODE flags, which imply that enable
> >    access control in this map.
> > 2. load and pin the map on /sys/fs/bpf/xxx.
> > 3. change the umode of /sys/fs/bpf/xxx with 'chmod 744 /sys/fs/bpf/xxx',
> >    therefor all user can read the map.
>
> This behavior should be available by default.
> Only sysctl was preventing it. It's being fixed by
> the following patch. Please take a look at:
> https://patchwork.kernel.org/project/netdevbpf/patch/1652788780-25520-2-git-send-email-alan.maguire@oracle.com/
>
> Does it solve your use case?

Yeah, it seems this patch gives another way: give users all access to
bpf commands (except map_create and prog_load). Therefore, users
that have the access to the pin path have corresponding r/w of the
eBPF object. This patch can cover my case.

However, this seems to give users too much permission (or the
access check is not enough?) I have do a test:

1. load a ebpf program of type cgroup and pin it on
   /sys/fs/bpf/post_bind as root.
2. give users access to read /sys/fs/bpf/post_bind
3. Now, all users can attach or detach the eBPF program
   to /sys/fs/cgroup/, who have only read access to the ebpf
   and the cgroup.

I think there are many such cases. Is this fine?

>
> > @@ -542,14 +557,26 @@ int bpf_obj_get_user(const char __user *pathname, int flags)
> >         if (IS_ERR(raw))
> >                 return PTR_ERR(raw);
> >
> > -       if (type == BPF_TYPE_PROG)
> > +       if (type != BPF_TYPE_MAP && !bpf_capable())
> > +               return -EPERM;
>
> obj_get already implements normal ACL style access to files.
> Let's not fragment this security model with extra cap checks.
>

Yeah, my way is too rough.

> > +
> > +       switch (type) {
> > +       case BPF_TYPE_PROG:
> >                 ret = bpf_prog_new_fd(raw);
> > -       else if (type == BPF_TYPE_MAP)
> > +               break;
> > +       case BPF_TYPE_MAP:
> > +               if (bpf_map_permission(raw, f_flags)) {
> > +                       bpf_any_put(raw, type);
> > +                       return -EPERM;
> > +               }
>
> bpf_obj_do_get() already does such check.
>

With the patch you mentioned above, now bpf_obj_do_get()
can do this job, as normal users can also get there too.

> > +int bpf_map_permission(struct bpf_map *map, int flags)
> > +{
> > +       struct bpf_map_inode *map_inode;
> > +       struct user_namespace *ns;
> > +
> > +       if (capable(CAP_SYS_ADMIN))
> > +               return 0;
> > +
> > +       if (!(map->map_flags & BPF_F_UMODE))
> > +               return -1;
> > +
> > +       rcu_read_lock();
> > +       list_for_each_entry_rcu(map_inode, &map->inode_list, list) {
> > +               ns = map_inode->inode->i_sb->s_user_ns;
> > +               if (ns == current_user_ns())
> > +                       goto found;
> > +       }
> > +       rcu_read_unlock();
> > +       return -1;
> > +found:
> > +       rcu_read_unlock();
> > +       return inode_permission(ns, map_inode->inode, ACC_MODE(flags));
> > +}
>
> See path_permission() in bpf_obj_do_get().
>
> >  static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
> > @@ -3720,9 +3757,6 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
> >             attr->open_flags & ~BPF_OBJ_FLAG_MASK)
> >                 return -EINVAL;
> >
> > -       if (!capable(CAP_SYS_ADMIN))
> > -               return -EPERM;
> > -
>
> This part we cannot relax.
> What you're trying to do is to bypass path checks
> by pointing at a map with its ID only.
> That contradicts to your official goal in the cover letter.
>
> bpf_map_get_fd_by_id() has to stay cap_sys_admin only.
> Exactly for the reason that bpf subsystem has file ACL style.
> fd_by_id is a debug interface used by tools like bpftool and
> root admin that needs to see the system as a whole.
> Normal tasks/processes need to use bpffs and pin files with
> correct permissions to pass maps from one process to another.
> Or use FD passing kernel facilities.

Yeah, this part is not necessary for me either. Without this
part, the current code already can do what I wanted.

Thanks
Menglong Dong

^ permalink raw reply

* Re: [PATCH net-next v2] net: txgbe: Add build support for txgbe
From: Andrew Lunn @ 2022-05-18  3:12 UTC (permalink / raw)
  To: Jiawen Wu; +Cc: netdev
In-Reply-To: <20220517092109.8161-1-jiawenwu@trustnetic.com>

> +Support
> +=======
> +If you got any problem, contact Wangxun support team via support@trustnetic.com

Since this is now a mainline driver, you should be doing support out
in the open. So indicate your should also Cc: netdev, so other members
of the networking community using this hardware can learn as well from
peoples questions.

> +config TXGBE
> +	tristate "Wangxun(R) 10GbE PCI Express adapters support"
> +	depends on PCI
> +	depends on PTP_1588_CLOCK_OPTIONAL
> +	select PHYLIB

The current driver does not depend on PTP nor need PHYLIB. Please add
these when they are actually needed.

> +++ b/drivers/net/ethernet/wangxun/txgbe/txgbe.h
> @@ -0,0 +1,76 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (c) 2015 - 2017 Beijing WangXun Technology Co., Ltd. */
> +
> +#ifndef _TXGBE_H_
> +#define _TXGBE_H_
> +
> +#include "txgbe_type.h"
> +
> +#ifndef MAX_REQUEST_SIZE
> +#define MAX_REQUEST_SIZE 256
> +#endif

Why the #ifndef? What could be setting it?

A TXGBE_ prefix would also be good.

> +
> +#define TXGBE_MAX_FDIR_INDICES          63
> +
> +#define MAX_TX_QUEUES   (TXGBE_MAX_FDIR_INDICES + 1)

Prefix here as well.

> +
> +/* board specific private data structure */
> +struct txgbe_adapter {
> +	/* OS defined structs */
> +	struct net_device *netdev;
> +	struct pci_dev *pdev;
> +
> +	unsigned long state;
> +
> +	/* structs defined in txgbe_hw.h */
> +	struct txgbe_hw hw;
> +	u16 msg_enable;
> +
> +	u8 __iomem *io_addr;    /* Mainly for iounmap use */
> +};
> +
> +enum txgbe_state_t {
> +	__TXGBE_TESTING,
> +	__TXGBE_RESETTING,
> +	__TXGBE_DOWN,
> +	__TXGBE_HANGING,
> +	__TXGBE_DISABLED,
> +	__TXGBE_REMOVING,
> +	__TXGBE_SERVICE_SCHED,
> +	__TXGBE_SERVICE_INITED,
> +	__TXGBE_IN_SFP_INIT,
> +	__TXGBE_PTP_RUNNING,
> +	__TXGBE_PTP_TX_IN_PROGRESS,
> +};
> +
> +#define TXGBE_NAME "txgbe"
> +
> +static inline struct device *pci_dev_to_dev(struct pci_dev *pdev)
> +{
> +	return &pdev->dev;
> +}

Does not have any value. &pdev->dev; is shorter than
pci_dev_to_dev(pdev), there are no casts here, etc.

> +#define txgbe_dev_info(format, arg...) \
> +	dev_info(&adapter->pdev->dev, format, ## arg)
> +#define txgbe_dev_warn(format, arg...) \
> +	dev_warn(&adapter->pdev->dev, format, ## arg)
> +#define txgbe_dev_err(format, arg...) \
> +	dev_err(&adapter->pdev->dev, format, ## arg)
> +#define txgbe_dev_notice(format, arg...) \
> +	dev_notice(&adapter->pdev->dev, format, ## arg)
> +#define txgbe_dbg(msglvl, format, arg...) \
> +	netif_dbg(adapter, msglvl, adapter->netdev, format, ## arg)
> +#define txgbe_info(msglvl, format, arg...) \
> +	netif_info(adapter, msglvl, adapter->netdev, format, ## arg)
> +#define txgbe_err(msglvl, format, arg...) \
> +	netif_err(adapter, msglvl, adapter->netdev, format, ## arg)
> +#define txgbe_warn(msglvl, format, arg...) \
> +	netif_warn(adapter, msglvl, adapter->netdev, format, ## arg)
> +#define txgbe_crit(msglvl, format, arg...) \
> +	netif_crit(adapter, msglvl, adapter->netdev, format, ## arg)

It is pretty unusual to use wrappers like this. It is also bad
practice for a macro to access something which is not passed to it as
a parameter. I suggest you remove all these.

> +
> +#define TXGBE_FAILED_READ_CFG_DWORD 0xffffffffU
> +#define TXGBE_FAILED_READ_CFG_WORD  0xffffU
> +#define TXGBE_FAILED_READ_CFG_BYTE  0xffU
> +
> +#endif /* _TXGBE_H_ */
> diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
> new file mode 100644
> index 000000000000..17a30629f76a
> --- /dev/null
> +++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
> @@ -0,0 +1,332 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2015 - 2017 Beijing WangXun Technology Co., Ltd. */
> +
> +#include <linux/types.h>
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <linux/netdevice.h>
> +#include <linux/string.h>
> +#include <linux/aer.h>
> +#include <linux/etherdevice.h>
> +
> +#include "txgbe.h"
> +
> +char txgbe_driver_name[32] = TXGBE_NAME;
> +static const char txgbe_driver_string[] =
> +			"WangXun 10 Gigabit PCI Express Network Driver";
> +
> +static const char txgbe_copyright[] =
> +	"Copyright (c) 2015 -2017 Beijing WangXun Technology Co., Ltd";

Only until 2017? You don't need this anyway, you have the copyright on
the top of each file.

> +
> +/* txgbe_pci_tbl - PCI Device ID Table
> + *
> + * Wildcard entries (PCI_ANY_ID) should come last
> + * Last entry must be all 0s
> + *
> + * { Vendor ID, Device ID, SubVendor ID, SubDevice ID,
> + *   Class, Class Mask, private data (not used) }
> + */
> +static const struct pci_device_id txgbe_pci_tbl[] = {
> +	{ PCI_VDEVICE(TRUSTNETIC, TXGBE_DEV_ID_SP1000), 0},
> +	{ PCI_VDEVICE(TRUSTNETIC, TXGBE_DEV_ID_WX1820), 0},
> +	/* required last entry */
> +	{ .device = 0 }
> +};
> +MODULE_DEVICE_TABLE(pci, txgbe_pci_tbl);
> +
> +MODULE_AUTHOR("Beijing WangXun Technology Co., Ltd, <software@trustnetic.com>");
> +MODULE_DESCRIPTION("WangXun(R) 10 Gigabit PCI Express Network Driver");
> +MODULE_LICENSE("GPL");

Traditionally, all MODULE_* things come at the end. 

> +#define DEFAULT_DEBUG_LEVEL_SHIFT 3
> +
> +static struct workqueue_struct *txgbe_wq;

No globals. 

> +
> +static bool txgbe_check_cfg_remove(struct txgbe_hw *hw, struct pci_dev *pdev);

Forwards references should only be needed if you have mutually
recursive functions. For anything else, move the code around to avoid
them.

> +
> +static void txgbe_remove_adapter(struct txgbe_hw *hw)
> +{
> +	struct txgbe_adapter *adapter = hw->back;
> +
> +	if (!hw->hw_addr)
> +		return;
> +	hw->hw_addr = NULL;
> +	txgbe_dev_err("Adapter removed\n");

It is not an error, modules can be unloaded, drivers unbound etc.

> +}
> +
> +/**
> + * txgbe_sw_init - Initialize general software structures (struct txgbe_adapter)
> + * @adapter: board private structure to initialize
> + *
> + * txgbe_sw_init initializes the Adapter private data structure.
> + * Fields are initialized based on PCI device information and
> + * OS network device settings (MTU size).
> + **/
> +static int txgbe_sw_init(struct txgbe_adapter *adapter)
> +{
> +	struct txgbe_hw *hw = &adapter->hw;
> +	struct pci_dev *pdev = adapter->pdev;
> +	int err = 0;
> +
> +	/* PCI config space info */
> +	hw->vendor_id = pdev->vendor;
> +	hw->device_id = pdev->device;
> +	pci_read_config_byte(pdev, PCI_REVISION_ID, &hw->revision_id);
> +	if (hw->revision_id == TXGBE_FAILED_READ_CFG_BYTE &&
> +	    txgbe_check_cfg_remove(hw, pdev)) {
> +		txgbe_err(probe, "read of revision id failed\n");
> +		err = -ENODEV;
> +		goto out;

goto out is used when you have something to cleanup on error. If there
is no cleanup needed, just return -ENODEV.

> +	}
> +	hw->subsystem_vendor_id = pdev->subsystem_vendor;
> +	hw->subsystem_device_id = pdev->subsystem_device;
> +
> +	pci_read_config_word(pdev, PCI_SUBSYSTEM_ID, &hw->subsystem_id);
> +	if (hw->subsystem_id == TXGBE_FAILED_READ_CFG_WORD) {
> +		txgbe_err(probe, "read of subsystem id failed\n");
> +		err = -ENODEV;
> +		goto out;

And this goto is pointless.

> +	}
> +
> +out:
> +	return err;
> +}
> +
> +static int __txgbe_shutdown(struct pci_dev *pdev, bool *enable_wake)

Please avoid __foo, unless you already have a _foo. And if you do have
foo, _foo and __foo, you should probably think about better names.

> +{
> +	struct txgbe_adapter *adapter = pci_get_drvdata(pdev);
> +	struct net_device *netdev = adapter->netdev;
> +
> +	netif_device_detach(netdev);
> +
> +	if (!test_and_set_bit(__TXGBE_DISABLED, &adapter->state))
> +		pci_disable_device(pdev);
> +
> +	return 0;

Looks like this should be a void function.


> +}
> +
> +static void txgbe_shutdown(struct pci_dev *pdev)
> +{
> +	bool wake;
> +
> +	__txgbe_shutdown(pdev, &wake);
> +
> +	if (system_state == SYSTEM_POWER_OFF) {
> +		pci_wake_from_d3(pdev, wake);
> +		pci_set_power_state(pdev, PCI_D3hot);
> +	}
> +}
> +
> +/**
> + * txgbe_probe - Device Initialization Routine
> + * @pdev: PCI device information struct
> + * @ent: entry in txgbe_pci_tbl
> + *
> + * Returns 0 on success, negative on failure
> + *
> + * txgbe_probe initializes an adapter identified by a pci_dev structure.
> + * The OS initialization, configuring of the adapter private structure,
> + * and a hardware reset occur.
> + **/
> +static int txgbe_probe(struct pci_dev *pdev,
> +		       const struct pci_device_id __always_unused *ent)
> +{
> +	struct net_device *netdev;
> +	struct txgbe_adapter *adapter = NULL;
> +	struct txgbe_hw *hw = NULL;
> +	int err, pci_using_dac;
> +	unsigned int indices = MAX_TX_QUEUES;
> +	bool disable_dev = false;

Reverse christmas tree. That is, sort these lines longest to shortest.

> +
> +	err = pci_enable_device_mem(pdev);
> +	if (err)
> +		return err;
> +
> +	if (!dma_set_mask(pci_dev_to_dev(pdev), DMA_BIT_MASK(64)) &&
> +	    !dma_set_coherent_mask(pci_dev_to_dev(pdev), DMA_BIT_MASK(64))) {
> +		pci_using_dac = 1;
> +	} else {
> +		err = dma_set_mask(pci_dev_to_dev(pdev), DMA_BIT_MASK(32));
> +		if (err) {
> +			err = dma_set_coherent_mask(pci_dev_to_dev(pdev),
> +						    DMA_BIT_MASK(32));
> +			if (err) {
> +				dev_err(pci_dev_to_dev(pdev),
> +					"No usable DMA configuration, aborting\n");
> +				goto err_dma;
> +			}
> +		}
> +		pci_using_dac = 0;
> +	}
> +
> +	err = pci_request_selected_regions(pdev,
> +					   pci_select_bars(pdev, IORESOURCE_MEM),
> +					   txgbe_driver_name);
> +	if (err) {
> +		dev_err(pci_dev_to_dev(pdev),
> +			"pci_request_selected_regions failed 0x%x\n", err);
> +		goto err_pci_reg;
> +	}
> +
> +	hw = vmalloc(sizeof(*hw));

Why vmalloc? Is *hw very big? 

> +	if (!hw)
> +		return -ENOMEM;

This should probably by a goto, to unwind what you have done above.

> +
> +	hw->vendor_id = pdev->vendor;
> +	hw->device_id = pdev->device;
> +	vfree(hw);

??? You just allocated it?

> +	pci_enable_pcie_error_reporting(pdev);
> +	pci_set_master(pdev);
> +	/* errata 16 */
> +	if (MAX_REQUEST_SIZE == 512) {

So this probably has something to do with my question above. Please
explain.

> +		pcie_capability_clear_and_set_word(pdev, PCI_EXP_DEVCTL,
> +						   PCI_EXP_DEVCTL_READRQ,
> +						   0x2000);
> +	} else {
> +		pcie_capability_clear_and_set_word(pdev, PCI_EXP_DEVCTL,
> +						   PCI_EXP_DEVCTL_READRQ,
> +						   0x1000);
> +	}
> +
> +	netdev = alloc_etherdev_mq(sizeof(struct txgbe_adapter), indices);

devm_alloc_etherdev_mqs(). Using devm makes your cleanup code simpler
and so less buggy.

> +	if (!netdev) {
> +		err = -ENOMEM;
> +		goto err_alloc_etherdev;
> +	}
> +
> +	SET_NETDEV_DEV(netdev, pci_dev_to_dev(pdev));
> +
> +	adapter = netdev_priv(netdev);
> +	adapter->netdev = netdev;
> +	adapter->pdev = pdev;
> +	hw = &adapter->hw;
> +	hw->back = adapter;

You should not need this. container_of() will get you from hw to adapter.

> +	adapter->msg_enable = (1 << DEFAULT_DEBUG_LEVEL_SHIFT) - 1;
> +
> +	hw->hw_addr = ioremap(pci_resource_start(pdev, 0),
> +			      pci_resource_len(pdev, 0));

devm_ioremap()

> +	adapter->io_addr = hw->hw_addr;

Suggests you don't actually have a clean separation. So why have hw?

> +	if (!hw->hw_addr) {
> +		err = -EIO;
> +		goto err_ioremap;
> +	}
> +
> +	strncpy(netdev->name, pci_name(pdev), sizeof(netdev->name) - 1);

The device gets a name when you register it. It is very unusual to do
this. It needs an explanation.

> +
> +	/* setup the private structure */
> +	err = txgbe_sw_init(adapter);
> +	if (err)
> +		goto err_sw_init;
> +
> +	if (pci_using_dac)
> +		netdev->features |= NETIF_F_HIGHDMA;

There should probably be a return 0; here, so the probe is
successful. Without that, you cannot test the remove function.

> +
> +err_sw_init:
> +	iounmap(adapter->io_addr);
> +err_ioremap:
> +	disable_dev = !test_and_set_bit(__TXGBE_DISABLED, &adapter->state);
> +	free_netdev(netdev);
> +err_alloc_etherdev:
> +	pci_release_selected_regions(pdev,
> +				     pci_select_bars(pdev, IORESOURCE_MEM));
> +err_pci_reg:
> +err_dma:
> +	if (!adapter || disable_dev)
> +		pci_disable_device(pdev);

Having an if in unwind code like this is very unusual. Is it really
needed?

> +	return err;
> +}
> +
> +/**
> + * txgbe_remove - Device Removal Routine
> + * @pdev: PCI device information struct
> + *
> + * txgbe_remove is called by the PCI subsystem to alert the driver
> + * that it should release a PCI device.  The could be caused by a
> + * Hot-Plug event, or because the driver is going to be removed from
> + * memory.
> + **/
> +static void txgbe_remove(struct pci_dev *pdev)
> +{
> +	struct txgbe_adapter *adapter = pci_get_drvdata(pdev);
> +	struct net_device *netdev;
> +	bool disable_dev;
> +
> +	/* if !adapter then we already cleaned up in probe */
> +	if (!adapter)
> +		return;

Remove is only called if the probe was success. So adapter is valid,
no test needed.

> +	netdev = adapter->netdev;
> +
> +	iounmap(adapter->io_addr);
> +	pci_release_selected_regions(pdev,
> +				     pci_select_bars(pdev, IORESOURCE_MEM));
> +
> +	disable_dev = !test_and_set_bit(__TXGBE_DISABLED, &adapter->state);
> +	free_netdev(netdev);
> +
> +	pci_disable_pcie_error_reporting(pdev);
> +
> +	if (disable_dev)
> +		pci_disable_device(pdev);

And this test is probably not needed.

> +}
> +
> +static bool txgbe_check_cfg_remove(struct txgbe_hw *hw, struct pci_dev *pdev)
> +{
> +	u16 value;
> +
> +	pci_read_config_word(pdev, PCI_VENDOR_ID, &value);
> +	if (value == TXGBE_FAILED_READ_CFG_WORD) {
> +		txgbe_remove_adapter(hw);
> +		return true;
> +	}
> +	return false;

This needs a comment to explain what is happening here, because it is
not clear to me.


> +}
> +
> +static struct pci_driver txgbe_driver = {
> +	.name     = txgbe_driver_name,
> +	.id_table = txgbe_pci_tbl,
> +	.probe    = txgbe_probe,
> +	.remove   = txgbe_remove,
> +	.shutdown = txgbe_shutdown,
> +};
> +
> +/**
> + * txgbe_init_module - Driver Registration Routine
> + *
> + * txgbe_init_module is the first routine called when the driver is
> + * loaded. All it does is register with the PCI subsystem.
> + **/
> +static int __init txgbe_init_module(void)
> +{
> +	int ret;
> +
> +	pr_info("%s\n", txgbe_driver_string);
> +	pr_info("%s\n", txgbe_copyright);

Don't spam the kernel log with useless information.

> +
> +	txgbe_wq = create_singlethread_workqueue(txgbe_driver_name);
> +	if (!txgbe_wq) {
> +		pr_err("%s: Failed to create workqueue\n", txgbe_driver_name);
> +		return -ENOMEM;
> +	}

Why do you need a global work queue? I suggest you start with a plain
PCI device, no __init and __exit functions. You can add this work
queue along with the code which uses it. It will then be clear why it
is needed.

> +/* Little Endian defines */
> +#ifndef __le16
> +#define __le16  u16
> +#endif
> +#ifndef __le32
> +#define __le32  u32
> +#endif
> +#ifndef __le64
> +#define __le64  u64
> +
> +#endif
> +#ifndef __be16
> +/* Big Endian defines */
> +#define __be16  u16
> +#define __be32  u32
> +#define __be64  u64

The kernel provides these. No need for your own.


^ permalink raw reply

* [PATCH bpf-next v2] selftests/bpf: fix some bugs in map_lookup_percpu_elem testcase
From: Feng zhou @ 2022-05-18  2:50 UTC (permalink / raw)
  To: shuah, ast, daniel, andrii, kafai, songliubraving, yhs,
	john.fastabend, kpsingh, rostedt, mingo, jolsa, davemarchevsky,
	joannekoong, geliang.tang
  Cc: linux-kselftest, netdev, bpf, linux-kernel, duanxiongchun,
	songmuchun, wangdongdong.6, cong.wang, zhouchengming, zhoufeng.zf,
	yosryahmed

From: Feng Zhou <zhoufeng.zf@bytedance.com>

comments from Andrii Nakryiko, details in here:
https://lore.kernel.org/lkml/20220511093854.411-1-zhoufeng.zf@bytedance.com/T/

use /* */ instead of //
use libbpf_num_possible_cpus() instead of sysconf(_SC_NPROCESSORS_ONLN)
use 8 bytes for value size
fix memory leak
use ASSERT_EQ instead of ASSERT_OK
add bpf_loop to fetch values on each possible CPU

Fixes: ed7c13776e20c74486b0939a3c1de984c5efb6aa ("selftests/bpf: add test case for bpf_map_lookup_percpu_elem")
Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
---
v1->v2: Addressed comments from Yonghong Song.
- Adjust the code format
more details can be seen from here:
https://lore.kernel.org/lkml/80ab09cf-6072-a75a-082d-2721f6f907ef@fb.com/T/

 .../bpf/prog_tests/map_lookup_percpu_elem.c   | 50 +++++++++------
 .../bpf/progs/test_map_lookup_percpu_elem.c   | 62 ++++++++++++-------
 2 files changed, 71 insertions(+), 41 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/map_lookup_percpu_elem.c b/tools/testing/selftests/bpf/prog_tests/map_lookup_percpu_elem.c
index 58b24c2112b0..f987c9278912 100644
--- a/tools/testing/selftests/bpf/prog_tests/map_lookup_percpu_elem.c
+++ b/tools/testing/selftests/bpf/prog_tests/map_lookup_percpu_elem.c
@@ -1,30 +1,38 @@
-// SPDX-License-Identifier: GPL-2.0
-// Copyright (c) 2022 Bytedance
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2022 Bytedance */
 
 #include <test_progs.h>
-
 #include "test_map_lookup_percpu_elem.skel.h"
 
-#define TEST_VALUE  1
-
 void test_map_lookup_percpu_elem(void)
 {
 	struct test_map_lookup_percpu_elem *skel;
-	int key = 0, ret;
-	int nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);
-	int *buf;
+	__u64 key = 0, sum;
+	int ret, i;
+	int nr_cpus = libbpf_num_possible_cpus();
+	__u64 *buf;
 
-	buf = (int *)malloc(nr_cpus*sizeof(int));
+	buf = (__u64 *)malloc(nr_cpus*sizeof(__u64));
 	if (!ASSERT_OK_PTR(buf, "malloc"))
 		return;
-	memset(buf, 0, nr_cpus*sizeof(int));
-	buf[0] = TEST_VALUE;
 
-	skel = test_map_lookup_percpu_elem__open_and_load();
-	if (!ASSERT_OK_PTR(skel, "test_map_lookup_percpu_elem__open_and_load"))
-		return;
+	for (i = 0; i < nr_cpus; i++)
+		buf[i] = i;
+	sum = (nr_cpus-1)*nr_cpus/2;
+
+	skel = test_map_lookup_percpu_elem__open();
+	if (!ASSERT_OK_PTR(skel, "test_map_lookup_percpu_elem__open"))
+		goto exit;
+
+	skel->rodata->nr_cpus = nr_cpus;
+
+	ret = test_map_lookup_percpu_elem__load(skel);
+	if (!ASSERT_OK(ret, "test_map_lookup_percpu_elem__load"))
+		goto cleanup;
+
 	ret = test_map_lookup_percpu_elem__attach(skel);
-	ASSERT_OK(ret, "test_map_lookup_percpu_elem__attach");
+	if (!ASSERT_OK(ret, "test_map_lookup_percpu_elem__attach"))
+		goto cleanup;
 
 	ret = bpf_map_update_elem(bpf_map__fd(skel->maps.percpu_array_map), &key, buf, 0);
 	ASSERT_OK(ret, "percpu_array_map update");
@@ -37,10 +45,14 @@ void test_map_lookup_percpu_elem(void)
 
 	syscall(__NR_getuid);
 
-	ret = skel->bss->percpu_array_elem_val == TEST_VALUE &&
-	      skel->bss->percpu_hash_elem_val == TEST_VALUE &&
-	      skel->bss->percpu_lru_hash_elem_val == TEST_VALUE;
-	ASSERT_OK(!ret, "bpf_map_lookup_percpu_elem success");
+	test_map_lookup_percpu_elem__detach(skel);
+
+	ASSERT_EQ(skel->bss->percpu_array_elem_sum, sum, "percpu_array lookup percpu elem");
+	ASSERT_EQ(skel->bss->percpu_hash_elem_sum, sum, "percpu_hash lookup percpu elem");
+	ASSERT_EQ(skel->bss->percpu_lru_hash_elem_sum, sum, "percpu_lru_hash lookup percpu elem");
 
+cleanup:
 	test_map_lookup_percpu_elem__destroy(skel);
+exit:
+	free(buf);
 }
diff --git a/tools/testing/selftests/bpf/progs/test_map_lookup_percpu_elem.c b/tools/testing/selftests/bpf/progs/test_map_lookup_percpu_elem.c
index 5d4ef86cbf48..57e875d6e6e0 100644
--- a/tools/testing/selftests/bpf/progs/test_map_lookup_percpu_elem.c
+++ b/tools/testing/selftests/bpf/progs/test_map_lookup_percpu_elem.c
@@ -1,52 +1,70 @@
-// SPDX-License-Identifier: GPL-2.0
-// Copyright (c) 2022 Bytedance
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2022 Bytedance */
 
 #include "vmlinux.h"
 #include <bpf/bpf_helpers.h>
 
-int percpu_array_elem_val = 0;
-int percpu_hash_elem_val = 0;
-int percpu_lru_hash_elem_val = 0;
+__u64 percpu_array_elem_sum = 0;
+__u64 percpu_hash_elem_sum = 0;
+__u64 percpu_lru_hash_elem_sum = 0;
+const volatile int nr_cpus;
 
 struct {
 	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
 	__uint(max_entries, 1);
 	__type(key, __u32);
-	__type(value, __u32);
+	__type(value, __u64);
 } percpu_array_map SEC(".maps");
 
 struct {
 	__uint(type, BPF_MAP_TYPE_PERCPU_HASH);
 	__uint(max_entries, 1);
-	__type(key, __u32);
-	__type(value, __u32);
+	__type(key, __u64);
+	__type(value, __u64);
 } percpu_hash_map SEC(".maps");
 
 struct {
 	__uint(type, BPF_MAP_TYPE_LRU_PERCPU_HASH);
 	__uint(max_entries, 1);
-	__type(key, __u32);
-	__type(value, __u32);
+	__type(key, __u64);
+	__type(value, __u64);
 } percpu_lru_hash_map SEC(".maps");
 
+struct read_percpu_elem_ctx {
+	void *map;
+	__u64 sum;
+};
+
+static int read_percpu_elem_callback(__u32 index, struct read_percpu_elem_ctx *ctx)
+{
+	__u64 key = 0;
+	__u64 *value;
+
+	value = bpf_map_lookup_percpu_elem(ctx->map, &key, index);
+	if (value)
+		ctx->sum += *value;
+	return 0;
+}
+
 SEC("tp/syscalls/sys_enter_getuid")
 int sysenter_getuid(const void *ctx)
 {
-	__u32 key = 0;
-	__u32 cpu = 0;
-	__u32 *value;
+	struct read_percpu_elem_ctx map_ctx;
 
-	value = bpf_map_lookup_percpu_elem(&percpu_array_map, &key, cpu);
-	if (value)
-		percpu_array_elem_val = *value;
+	map_ctx.map = &percpu_array_map;
+	map_ctx.sum = 0;
+	bpf_loop(nr_cpus, read_percpu_elem_callback, &map_ctx, 0);
+	percpu_array_elem_sum = map_ctx.sum;
 
-	value = bpf_map_lookup_percpu_elem(&percpu_hash_map, &key, cpu);
-	if (value)
-		percpu_hash_elem_val = *value;
+	map_ctx.map = &percpu_hash_map;
+	map_ctx.sum = 0;
+	bpf_loop(nr_cpus, read_percpu_elem_callback, &map_ctx, 0);
+	percpu_hash_elem_sum = map_ctx.sum;
 
-	value = bpf_map_lookup_percpu_elem(&percpu_lru_hash_map, &key, cpu);
-	if (value)
-		percpu_lru_hash_elem_val = *value;
+	map_ctx.map = &percpu_lru_hash_map;
+	map_ctx.sum = 0;
+	bpf_loop(nr_cpus, read_percpu_elem_callback, &map_ctx, 0);
+	percpu_lru_hash_elem_sum = map_ctx.sum;
 
 	return 0;
 }
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH 1/3] net: macb: Fix PTP one step sync support
From: Jakub Kicinski @ 2022-05-18  2:42 UTC (permalink / raw)
  To: Harini Katakam
  Cc: nicolas.ferre, davem, richardcochran, claudiu.beznea, pabeni,
	netdev, linux-kernel, michal.simek, harinikatakamlinux,
	radhey.shyam.pandey
In-Reply-To: <20220517073259.23476-2-harini.katakam@xilinx.com>

On Tue, 17 May 2022 13:02:57 +0530 Harini Katakam wrote:
> PTP one step sync packets cannot have CSUM padding and insertion in
> SW since time stamp is inserted on the fly by HW.
> In addition, ptp4l version 3.0 and above report an error when skb
> timestamps are reported for packets that not processed for TX TS
> after transmission.
> Add a helper to identify PTP one step sync and fix the above two
> errors.
> Also reset ptp OSS bit when one step is not selected.
> 
> Fixes: ab91f0a9b5f4 ("net: macb: Add hardware PTP support")
> Fixes: 653e92a9175e ("net: macb: add support for padding and fcs computation")

Please make sure to CC authors of the patches under fixes.
./scripts/get_maintainer should point them out.

> Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>

If this is a fix it needs to be posted as [PATCH net] separately first
so we can get it into stable as well. The trees get merged together
every Thursday, if you're quick you may still make it this week.
Then after trees get merged you should send send the remaining patches.

https://www.kernel.org/doc/html/latest/process/maintainer-netdev.html

> Notes:
> -> Added the macb pad and fcs fixes tag because strictly speaking the PTP support  
> patch precedes the fcs patch in timeline.
> -> FYI, the error observed with setting HW TX timestamp for one step sync packets:  
> ptp4l[405.292]: port 1: unexpected socket error
> 
>  drivers/net/ethernet/cadence/macb_main.c | 54 ++++++++++++++++++++----
>  drivers/net/ethernet/cadence/macb_ptp.c  |  4 +-
>  2 files changed, 48 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index e993616308f8..e23a03e8badf 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -37,6 +37,7 @@
>  #include <linux/phy/phy.h>
>  #include <linux/pm_runtime.h>
>  #include <linux/reset.h>
> +#include <linux/ptp_classify.h>

Please keep the includes in alphabetical order.

>  #include "macb.h"
>  
>  /* This structure is only used for MACB on SiFive FU540 devices */
> @@ -98,6 +99,9 @@ struct sifive_fu540_macb_mgmt {
>  
>  #define MACB_MDIO_TIMEOUT	1000000 /* in usecs */
>  
> +/* IEEE1588 PTP flag field values  */
> +#define PTP_FLAG_TWOSTEP	0x2

Shouldn't this go into the PTP header?

>  /* DMA buffer descriptor might be different size
>   * depends on hardware configuration:
>   *
> @@ -1122,6 +1126,36 @@ static void macb_tx_error_task(struct work_struct *work)
>  	napi_enable(&queue->napi_tx);
>  }
>  
> +static inline bool ptp_oss(struct sk_buff *skb)

Please spell out then name more oss == open source software.

No inline here, please, let the compiler decide if the function is
small enough. One step timestamp may be a rare use case so inlining
this twice is not necessarily the right choice.

> +{
> +	struct ptp_header *hdr;
> +	unsigned int ptp_class;
> +	u8 msgtype;
> +
> +	/* No need to parse packet if PTP TS is not involved */
> +	if (!(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
> +		goto not_oss;
> +
> +	/* Identify and return whether PTP one step sync is being processed */
> +	ptp_class = ptp_classify_raw(skb);
> +	if (ptp_class == PTP_CLASS_NONE)
> +		goto not_oss;
> +
> +	hdr = ptp_parse_header(skb, ptp_class);
> +	if (!hdr)
> +		goto not_oss;
> +
> +	if (hdr->flag_field[0] & PTP_FLAG_TWOSTEP)
> +		goto not_oss;
> +
> +	msgtype = ptp_get_msgtype(hdr, ptp_class);
> +	if (msgtype == PTP_MSGTYPE_SYNC)
> +		return true;
> +
> +not_oss:
> +	return false;
> +}
> +
>  static int macb_tx_complete(struct macb_queue *queue, int budget)
>  {
>  	struct macb *bp = queue->bp;
> @@ -1158,13 +1192,14 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
>  
>  			/* First, update TX stats if needed */
>  			if (skb) {
> -				if (unlikely(skb_shinfo(skb)->tx_flags &
> -					     SKBTX_HW_TSTAMP) &&
> -				    gem_ptp_do_txstamp(queue, skb, desc) == 0) {
> -					/* skb now belongs to timestamp buffer
> -					 * and will be removed later
> -					 */
> -					tx_skb->skb = NULL;
> +				if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) &&

ptp_oss already checks if HW_TSTAMP is set.

> +				    !ptp_oss(skb)) {
> +					if (gem_ptp_do_txstamp(queue, skb, desc) == 0) {

Why convert the gem_ptp_do_txstamp check from a && in the condition to
a separate if?

> +						/* skb now belongs to timestamp buffer
> +						 * and will be removed later
> +						 */
> +						tx_skb->skb = NULL;
> +					}
>  				}
>  				netdev_vdbg(bp->dev, "skb %u (data %p) TX complete\n",
>  					    macb_tx_ring_wrap(bp, tail),
> @@ -2063,7 +2098,7 @@ static unsigned int macb_tx_map(struct macb *bp,
>  			ctrl |= MACB_BF(TX_LSO, lso_ctrl);
>  			ctrl |= MACB_BF(TX_TCP_SEQ_SRC, seq_ctrl);
>  			if ((bp->dev->features & NETIF_F_HW_CSUM) &&
> -			    skb->ip_summed != CHECKSUM_PARTIAL && !lso_ctrl)
> +			    skb->ip_summed != CHECKSUM_PARTIAL && !lso_ctrl && !ptp_oss(skb))
>  				ctrl |= MACB_BIT(TX_NOCRC);
>  		} else
>  			/* Only set MSS/MFS on payload descriptors
> @@ -2159,9 +2194,10 @@ static int macb_pad_and_fcs(struct sk_buff **skb, struct net_device *ndev)
>  	struct sk_buff *nskb;
>  	u32 fcs;
>  
> +	/* Not available for GSO and PTP one step sync */

If the functions are named right this comment should not be needed.

>  	if (!(ndev->features & NETIF_F_HW_CSUM) ||
>  	    !((*skb)->ip_summed != CHECKSUM_PARTIAL) ||
> -	    skb_shinfo(*skb)->gso_size)	/* Not available for GSO */
> +	    skb_shinfo(*skb)->gso_size || ptp_oss(*skb))
>  		return 0;
>  
>  	if (padlen <= 0) {
> diff --git a/drivers/net/ethernet/cadence/macb_ptp.c b/drivers/net/ethernet/cadence/macb_ptp.c
> index fb6b27f46b15..9559c16078f9 100644
> --- a/drivers/net/ethernet/cadence/macb_ptp.c
> +++ b/drivers/net/ethernet/cadence/macb_ptp.c
> @@ -470,8 +470,10 @@ int gem_set_hwtst(struct net_device *dev, struct ifreq *ifr, int cmd)
>  	case HWTSTAMP_TX_ONESTEP_SYNC:
>  		if (gem_ptp_set_one_step_sync(bp, 1) != 0)
>  			return -ERANGE;
> -		fallthrough;
> +		tx_bd_control = TSTAMP_ALL_FRAMES;
> +		break;
>  	case HWTSTAMP_TX_ON:
> +		gem_ptp_set_one_step_sync(bp, 0);
>  		tx_bd_control = TSTAMP_ALL_FRAMES;
>  		break;
>  	default:


^ permalink raw reply

* Re: [net-next PATCH V3] octeontx2-pf: Add support for adaptive interrupt coalescing
From: patchwork-bot+netdevbpf @ 2022-05-18  2:30 UTC (permalink / raw)
  To: Suman Ghosh
  Cc: davem, edumazet, kuba, pabeni, sgoutham, sbhatta, gakula,
	Sunil.Goutham, hkelam, colin.king, netdev
In-Reply-To: <20220517044055.876158-1-sumang@marvell.com>

Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 17 May 2022 10:10:55 +0530 you wrote:
> Added support for adaptive IRQ coalescing. It uses net_dim
> algorithm to find the suitable delay/IRQ count based on the
> current packet rate.
> 
> Signed-off-by: Suman Ghosh <sumang@marvell.com>
> ---
> Changes since V2
> - Addressed review comments.
> 
> [...]

Here is the summary with links:
  - [net-next,V3] octeontx2-pf: Add support for adaptive interrupt coalescing
    https://git.kernel.org/netdev/net-next/c/6e144b47f560

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCHv2 net] bonding: fix missed rcu protection
From: Hangbin Liu @ 2022-05-18  2:18 UTC (permalink / raw)
  To: Jonathan Toppins
  Cc: andy, davem, dsahern, eric.dumazet, j.vosburgh, kuba, netdev,
	pabeni, syzbot+92beb3d46aab498710fa, vfalico, vladimir.oltean,
	Eric Dumazet, linux-kernel
In-Reply-To: <a4ed2a83d38a58b0984edb519382c867204b7ea2.1652804144.git.jtoppins@redhat.com>

On Tue, May 17, 2022 at 01:32:58PM -0400, Jonathan Toppins wrote:
> Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
> ---
> RESEND, list still didn't receive my last version
> 
> The diffstat is slightly larger but IMO a slightly more readable version.
> When I was reading v2 I found myself jumping around.

Hi Jon,

Thanks for the commit. But..

> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 38e152548126..f9d27b63c454 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -5591,23 +5591,32 @@ static int bond_ethtool_get_ts_info(struct net_device *bond_dev,
>  	const struct ethtool_ops *ops;
>  	struct net_device *real_dev;
>  	struct phy_device *phydev;
> +	int ret = 0;
>  
> +	rcu_read_lock();
>  	real_dev = bond_option_active_slave_get_rcu(bond);
> -	if (real_dev) {
> -		ops = real_dev->ethtool_ops;
> -		phydev = real_dev->phydev;
> -
> -		if (phy_has_tsinfo(phydev)) {
> -			return phy_ts_info(phydev, info);
> -		} else if (ops->get_ts_info) {
> -			return ops->get_ts_info(real_dev, info);
> -		}
> -	}
> +	if (real_dev)
> +		dev_hold(real_dev);
> +	rcu_read_unlock();
> +
> +	if (!real_dev)
> +		goto software;
>  
> +	ops = real_dev->ethtool_ops;
> +	phydev = real_dev->phydev;
> +
> +	if (phy_has_tsinfo(phydev))
> +		ret = phy_ts_info(phydev, info);
> +	else if (ops->get_ts_info)
> +		ret = ops->get_ts_info(real_dev, info);
	else {
		dev_put(real_dev);
		goto software;
	}

Here we need another check and goto software if !phy_has_tsinfo() and
no ops->get_ts_info. With this change we also have 2 goto and dev_put().

> +
> +	dev_put(real_dev);
> +	return ret;
> +
> +software:
>  	info->so_timestamping = SOF_TIMESTAMPING_RX_SOFTWARE |
>  				SOF_TIMESTAMPING_SOFTWARE;
>  	info->phc_index = -1;
> -
>  	return 0;
>  }

As Jakub remind, dev_hold() and dev_put() can take NULL now. So how about
this new patch:

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 38e152548126..b5c5196e03ee 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -5591,16 +5591,23 @@ static int bond_ethtool_get_ts_info(struct net_device *bond_dev,
 	const struct ethtool_ops *ops;
 	struct net_device *real_dev;
 	struct phy_device *phydev;
+	int ret = 0;
 
+	rcu_read_lock();
 	real_dev = bond_option_active_slave_get_rcu(bond);
+	dev_hold(real_dev);
+	rcu_read_unlock();
+
 	if (real_dev) {
 		ops = real_dev->ethtool_ops;
 		phydev = real_dev->phydev;
 
 		if (phy_has_tsinfo(phydev)) {
-			return phy_ts_info(phydev, info);
+			ret = phy_ts_info(phydev, info);
+			goto out;
 		} else if (ops->get_ts_info) {
-			return ops->get_ts_info(real_dev, info);
+			ret = ops->get_ts_info(real_dev, info);
+			goto out;
 		}
 	}
 
@@ -5608,7 +5615,9 @@ static int bond_ethtool_get_ts_info(struct net_device *bond_dev,
 				SOF_TIMESTAMPING_SOFTWARE;
 	info->phc_index = -1;
 
-	return 0;
+out:
+	dev_put(real_dev);
+	return ret;
 }

Thanks
Hangbin

^ permalink raw reply related

* Re: [PATCH net-next v2 1/9] net: skb: introduce __DEFINE_SKB_DROP_REASON() to simply the code
From: Menglong Dong @ 2022-05-18  2:15 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Eric Dumazet, Steven Rostedt, Ingo Molnar, David Miller,
	Hideaki YOSHIFUJI, David Ahern, Paolo Abeni, Menglong Dong,
	Martin Lau, Talal Ahmad, Kees Cook, Dongli Zhang, LKML, netdev,
	Jiang Biao, Hao Peng
In-Reply-To: <20220517181457.04c37147@kernel.org>

On Wed, May 18, 2022 at 9:15 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 17 May 2022 16:10:00 +0800 menglong8.dong@gmail.com wrote:
> > From: Menglong Dong <imagedong@tencent.com>
> >
> > It is annoying to add new skb drop reasons to 'enum skb_drop_reason'
> > and TRACE_SKB_DROP_REASON in trace/event/skb.h, and it's easy to forget
> > to add the new reasons we added to TRACE_SKB_DROP_REASON.
> >
> > TRACE_SKB_DROP_REASON is used to convert drop reason of type number
> > to string. For now, the string we passed to user space is exactly the
> > same as the name in 'enum skb_drop_reason' with a 'SKB_DROP_REASON_'
> > prefix. So why not make them togather by define a macro?
> >
> > Therefore, introduce __DEFINE_SKB_DROP_REASON() and use it for 'enum
> > skb_drop_reason' definition and string converting.
> >
> > Now, what should we with the document for the reasons? How about follow
> > __BPF_FUNC_MAPPER() and make these document togather?
>
> Hi, I know BPF does this but I really find the definition-by-macro
> counter productive :(
>
> kdoc will no longer work right because the parser will not see
> the real values. cscope and other code indexers will struggle
> to find definitions.
>

Yeah, I found this problem too. My autocomplete in vscode never helps
me anymore after I use this macro.

> Did you investigate using auto-generation? Kernel already generates
> a handful of headers. Maybe with a little script we could convert
> the enum into the string thing at build time?
>

Oh, I forgot about auto-generation, it seems it's a better choice.
I'll try to use auto-generation.

> Also let's use this opportunity to move the enum to a standalone
> header, it's getting huge.
>
> Probably worth keeping this rework separate from the TCP patches.
> Up to you which one you'd like to get done first.

Ok, I'll make the enum in a standalone header in the separated
series.

Thans!
Menglong Dong

^ permalink raw reply

* Re: [PATCH net-next v4 05/12] dt-bindings: net: dsa: add bindings for Renesas RZ/N1 Advanced 5 port switch
From: Rob Herring @ 2022-05-18  1:59 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Clément Léger, Andrew Lunn, Vivien Didelot,
	Florian Fainelli, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Krzysztof Kozlowski, Geert Uytterhoeven, Magnus Damm,
	Heiner Kallweit, Russell King, Thomas Petazzoni, Herve Codina,
	Miquèl Raynal, Milan Stevanovic, Jimmy Lalande,
	Pascal Eberhard, linux-kernel, devicetree, linux-renesas-soc,
	netdev
In-Reply-To: <20220511153337.deqxawpbbk3actxf@skbuf>

On Wed, May 11, 2022 at 06:33:37PM +0300, Vladimir Oltean wrote:
> On Wed, May 11, 2022 at 10:22:21AM -0500, Rob Herring wrote:
> > > +patternProperties:
> > > +  "^ethernet-ports$":
> > 
> > Move to 'properties', not a pattern.
> > 
> > With that,
> > 
> > Reviewed-by: Rob Herring <robh@kernel.org>
> 
> Even if it should have been "^(ethernet-)?ports$"?

Why? Allowing 'ports' is for existing users. New ones don't need the 
variability and should use just 'ethernet-ports'.

Rob

^ permalink raw reply

* [PATCH -next v2] net: ethernet: sunplus: add missing of_node_put() in spl2sw_mdio_init()
From: Yang Yingliang @ 2022-05-18  2:08 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: wellslutw, andrew, pabeni, davem, kuba

of_get_child_by_name() returns device node pointer with refcount
incremented. The refcount should be decremented before returning
from spl2sw_mdio_init().

Fixes: fd3040b9394c ("net: ethernet: Add driver for Sunplus SP7021")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
v2:
  add fix tag.
---
 drivers/net/ethernet/sunplus/spl2sw_mdio.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/sunplus/spl2sw_mdio.c b/drivers/net/ethernet/sunplus/spl2sw_mdio.c
index 139ac8f2685e..733ae1704269 100644
--- a/drivers/net/ethernet/sunplus/spl2sw_mdio.c
+++ b/drivers/net/ethernet/sunplus/spl2sw_mdio.c
@@ -97,8 +97,10 @@ u32 spl2sw_mdio_init(struct spl2sw_common *comm)
 
 	/* Allocate and register mdio bus. */
 	mii_bus = devm_mdiobus_alloc(&comm->pdev->dev);
-	if (!mii_bus)
-		return -ENOMEM;
+	if (!mii_bus) {
+		ret = -ENOMEM;
+		goto out;
+	}
 
 	mii_bus->name = "sunplus_mii_bus";
 	mii_bus->parent = &comm->pdev->dev;
@@ -110,10 +112,13 @@ u32 spl2sw_mdio_init(struct spl2sw_common *comm)
 	ret = of_mdiobus_register(mii_bus, mdio_np);
 	if (ret) {
 		dev_err(&comm->pdev->dev, "Failed to register mdiobus!\n");
-		return ret;
+		goto out;
 	}
 
 	comm->mii_bus = mii_bus;
+
+out:
+	of_node_put(mdio_np);
 	return ret;
 }
 
-- 
2.25.1


^ permalink raw reply related

* Re: Re: [PATCH v3 1/6] dt-bindings: net: dsa: convert binding for mediatek switches
From: Rob Herring @ 2022-05-18  1:54 UTC (permalink / raw)
  To: Frank Wunderlich
  Cc: Frank Wunderlich, linux-rockchip, linux-mediatek, Andrew Lunn,
	Vivien Didelot, Florian Fainelli, Vladimir Oltean,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Krzysztof Kozlowski,
	Matthias Brugger, Heiko Stuebner, Sean Wang, Landen Chao,
	DENG Qingfang, Peter Geis, netdev, devicetree, linux-arm-kernel,
	linux-kernel, Greg Ungerer, René van Dorst,
	Mauro Carvalho Chehab
In-Reply-To: <trinity-68761fe5-fcca-456b-ba50-ead759f0fb54-1652256165646@3c-app-gmx-bap47>

On Wed, May 11, 2022 at 10:02:45AM +0200, Frank Wunderlich wrote:
> Hi
> 
> thanks for review
> 
> > Gesendet: Dienstag, 10. Mai 2022 um 20:44 Uhr
> > Von: "Rob Herring" <robh@kernel.org>
> > On Sat, May 07, 2022 at 07:04:35PM +0200, Frank Wunderlich wrote:
> > > From: Frank Wunderlich <frank-w@public-files.de>
> 
> > > +++ b/Documentation/devicetree/bindings/net/dsa/mediatek,mt7530.yaml
> 
> > > +properties:
> > > +  compatible:
> > > +    enum:
> > > +      - mediatek,mt7530
> > > +      - mediatek,mt7531
> > > +      - mediatek,mt7621
> > > +
> >
> > > +  "#address-cells":
> > > +    const: 1
> > > +
> > > +  "#size-cells":
> > > +    const: 0
> >
> > I don't see any child nodes with addresses, so these can be removed.
> 
> dropping this (and address-cells/size-cells from examples) causes errors like this (address-/size-cells set in mdio
> node, so imho it should inherite):

I think you are off a level.


> Documentation/devicetree/bindings/net/dsa/mediatek,mt7530.example.dts:34.25-35: Warning (reg_format):
> /example-0/mdio/switch@0/ports/port@0:reg: property has invalid length (4 bytes) (#address-cells == 2, #size-cells == 1)

That's for yet another level where 'ports' node need the properties.

> Documentation/devicetree/bindings/net/dsa/mediatek,mt7530.example.dtb: Warning (pci_device_reg): Failed prerequisite 'reg_format'
> 
> > > +  interrupt-controller:
> > > +    type: boolean
> >
> > Already has a type. Just:
> >
> > interrupt-controller: true
> >
> > > +
> > > +  interrupts:
> > > +    maxItems: 1
> 
> > > +patternProperties:
> > > +  "^(ethernet-)?ports$":
> > > +    type: object
> >
> >        additionalProperties: false
> 
> imho this will block address-/size-cells from this level too. looks like it is needed here too (for port-regs).
> 
> > > +
> > > +    patternProperties:
> > > +      "^(ethernet-)?port@[0-9]+$":
> > > +        type: object
> > > +        description: Ethernet switch ports
> > > +
> > > +        unevaluatedProperties: false
> > > +
> > > +        properties:
> > > +          reg:
> > > +            description:
> > > +              Port address described must be 5 or 6 for CPU port and from 0
> > > +              to 5 for user ports.
> > > +
> > > +        allOf:
> > > +          - $ref: dsa-port.yaml#
> > > +          - if:
> > > +              properties:
> > > +                label:
> > > +                  items:
> > > +                    - const: cpu
> > > +            then:
> > > +              required:
> > > +                - reg
> > > +                - phy-mode
> > > +
> 
> > > +  - if:
> > > +      required:
> > > +        - interrupt-controller
> > > +    then:
> > > +      required:
> > > +        - interrupts
> >
> > This can be expressed as:
> >
> > dependencies:
> >   interrupt-controller: [ interrupts ]
> 
> ok, i will change this
> 
> > > +            ports {
> >
> > Use the preferred form: ethernet-ports
> 
> current implementation in all existing dts and examples from old binding are "ports" only.
> should they changed too?

They don't have to be the schema allows both, but the example should be 
best practice.

Rob


^ permalink raw reply

* Re: [PATCH] gcc: fix -Warray-compare
From: Jakub Kicinski @ 2022-05-18  1:50 UTC (permalink / raw)
  To: Martin Liška; +Cc: netdev, lkml, davem, edumazet, pabeni
In-Reply-To: <21708ede-f378-d20e-0b1d-410e946fcca5@suse.cz>

On Tue, 17 May 2022 16:19:16 +0200 Martin Liška wrote:
> Subject: [PATCH] gcc: fix -Warray-compare

The subject should be more like

eth: sun: cassini: remove dead code

> 
> Fixes the following GCC warning:
> 
> drivers/net/ethernet/sun/cassini.c:1316:29: error: comparison between two arrays [-Werror=array-compare]
> drivers/net/ethernet/sun/cassini.c:3783:34: error: comparison between two arrays [-Werror=array-compare]

Because AFAICT this comparison will always be true:

#define CAS_HP_ALT_FIRMWARE   cas_prog_null

Hopefully DaveM will correct if I'm getting this wrong :)

^ permalink raw reply

* Re: [PATCH v2 net-next 12/15] net: ethernet: mtk_eth_soc: introduce MTK_NETSYS_V2 support
From: Jakub Kicinski @ 2022-05-18  1:44 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: netdev, nbd, john, sean.wang, Mark-MC.Lee, davem, edumazet,
	pabeni, Sam.Shih, linux-mediatek, devicetree, robh,
	lorenzo.bianconi
In-Reply-To: <cc1bd411e3028e2d6b0365ed5d29f3cea66223f8.1652716741.git.lorenzo@kernel.org>

On Mon, 16 May 2022 18:06:39 +0200 Lorenzo Bianconi wrote:
> +	struct mtk_mac *mac = netdev_priv(dev);
> +	struct mtk_tx_dma_v2 *desc = txd;
> +	struct mtk_eth *eth = mac->hw;
> +	u32 data;
> +
> +	WRITE_ONCE(desc->txd1, info->addr);
> +
> +	data = TX_DMA_PLEN0(info->size);
> +	if (info->last)
> +		data |= TX_DMA_LS0;
> +	WRITE_ONCE(desc->txd3, data);
> +
> +	if (!info->qid && mac->id)
> +		info->qid = MTK_QDMA_GMAC2_QID;
> +
> +	data = (mac->id + 1) << TX_DMA_FPORT_SHIFT_V2; /* forward port */
> +	data |= TX_DMA_SWC_V2 | QID_BITS_V2(info->qid);
> +	WRITE_ONCE(desc->txd4, data);
> +
> +	data = 0;
> +	if (info->first) {
> +		if (info->gso)
> +			data |= TX_DMA_TSO_V2;
> +		/* tx checksum offload */
> +		if (info->csum)
> +			data |= TX_DMA_CHKSUM_V2;
> +	}
> +	WRITE_ONCE(desc->txd5, data);
> +
> +	data = 0;
> +	if (info->first && info->vlan)
> +		data |= TX_DMA_INS_VLAN_V2 | info->vlan_tci;
> +	WRITE_ONCE(desc->txd6, data);
> +
> +	WRITE_ONCE(desc->txd7, 0);
> +	WRITE_ONCE(desc->txd8, 0);

Why all the WRITE_ONCE()? Don't you just need a barrier between writing
the descriptor and kicking the HW? 

^ permalink raw reply

* Re: [PATCH v2 net-next 11/15] net: ethernet: mtk_eth_soc: introduce device register map
From: Jakub Kicinski @ 2022-05-18  1:41 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: netdev, nbd, john, sean.wang, Mark-MC.Lee, davem, edumazet,
	pabeni, Sam.Shih, linux-mediatek, devicetree, robh,
	lorenzo.bianconi
In-Reply-To: <78e8c6ed230130b75aae77e6d05a9b35e298860a.1652716741.git.lorenzo@kernel.org>

On Mon, 16 May 2022 18:06:38 +0200 Lorenzo Bianconi wrote:
>  /* PDMA RX Base Pointer Register */
> -#define MTK_PRX_BASE_PTR0	0x900
> +#define MTK_PRX_BASE_PTR0	(eth->soc->reg_map[MTK_PDMA_BASE] + 0x100)
>  #define MTK_PRX_BASE_PTR_CFG(x)	(MTK_PRX_BASE_PTR0 + (x * 0x10))

Implicit macro arguments are really unpleasant for people doing
tree-wide changes or otherwise unfamiliar with the driver.

Nothing we can do to avoid this?

^ permalink raw reply

* Re: [PATCH v2 net-next 10/15] net: ethernet: mtk_eth_soc: rely on rxd_size field in mtk_rx_alloc/mtk_rx_clean
From: Jakub Kicinski @ 2022-05-18  1:39 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: netdev, nbd, john, sean.wang, Mark-MC.Lee, davem, edumazet,
	pabeni, Sam.Shih, linux-mediatek, devicetree, robh,
	lorenzo.bianconi
In-Reply-To: <eca56ab1af7f4bbedc4a6d0990a10ff58911d842.1652716741.git.lorenzo@kernel.org>

On Mon, 16 May 2022 18:06:37 +0200 Lorenzo Bianconi wrote:
> +
> +		rxd = (void *)ring->dma + i * eth->soc->txrx.rxd_size;
> +		rxd->rxd1 = (unsigned int)dma_addr;
>  
>  		if (MTK_HAS_CAPS(eth->soc->caps, MTK_SOC_MT7628))
> -			ring->dma[i].rxd2 = RX_DMA_LSO;
> +			rxd->rxd2 = RX_DMA_LSO;
>  		else
> -			ring->dma[i].rxd2 = RX_DMA_PLEN0(ring->buf_size);
> +			rxd->rxd2 = RX_DMA_PLEN0(ring->buf_size);
> +
> +		rxd->rxd3 = 0;
> +		rxd->rxd4 = 0;

The clearing of rxd3/rxd4 should probably have been mentioned in the
commit message. It does not seem related to descriptor size.

^ permalink raw reply

* Re: [PATCH v2 net-next 05/15] net: ethernet: mtk_eth_soc: rely on txd_size in mtk_tx_alloc/mtk_tx_clean
From: Jakub Kicinski @ 2022-05-18  1:35 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: netdev, nbd, john, sean.wang, Mark-MC.Lee, davem, edumazet,
	pabeni, Sam.Shih, linux-mediatek, devicetree, robh,
	lorenzo.bianconi
In-Reply-To: <c5228daa3d277cd71a134a1062525bc19b34fa2f.1652716741.git.lorenzo@kernel.org>

On Mon, 16 May 2022 18:06:32 +0200 Lorenzo Bianconi wrote:
>  static int mtk_tx_alloc(struct mtk_eth *eth)
>  {
> +	const struct mtk_soc_data *soc = eth->soc;
>  	struct mtk_tx_ring *ring = &eth->tx_ring;
> -	int i, sz = sizeof(*ring->dma);

The change would be smaller if you left sz in place. 
I guess you have a reason not to?

> +	struct mtk_tx_dma *txd;
> +	int i;
>  
>  	ring->buf = kcalloc(MTK_DMA_SIZE, sizeof(*ring->buf),
>  			       GFP_KERNEL);
>  	if (!ring->buf)
>  		goto no_tx_mem;
>  
> -	ring->dma = dma_alloc_coherent(eth->dma_dev, MTK_DMA_SIZE * sz,
> +	ring->dma = dma_alloc_coherent(eth->dma_dev,
> +				       MTK_DMA_SIZE * soc->txrx.txd_size,
>  				       &ring->phys, GFP_ATOMIC);

Another GFP nugget.

^ permalink raw reply

* Re: [PATCH v2 net-next 04/15] net: ethernet: mtk_eth_soc: add txd_size to mtk_soc_data
From: Jakub Kicinski @ 2022-05-18  1:33 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: netdev, nbd, john, sean.wang, Mark-MC.Lee, davem, edumazet,
	pabeni, Sam.Shih, linux-mediatek, devicetree, robh,
	lorenzo.bianconi
In-Reply-To: <22bd1bd88c09205b9bf83ea4c3ab030d5dc6e670.1652716741.git.lorenzo@kernel.org>

On Mon, 16 May 2022 18:06:31 +0200 Lorenzo Bianconi wrote:
>  	eth->scratch_ring = dma_alloc_coherent(eth->dma_dev,
> -					       cnt * sizeof(struct mtk_tx_dma),
> +					       cnt * soc->txrx.txd_size,
>  					       &eth->phy_scratch_ring,
>  					       GFP_ATOMIC);
>  	if (unlikely(!eth->scratch_ring))
>  		return -ENOMEM;
>  
> -	eth->scratch_head = kcalloc(cnt, MTK_QDMA_PAGE_SIZE,
> -				    GFP_KERNEL);
> +	eth->scratch_head = kcalloc(cnt, MTK_QDMA_PAGE_SIZE, GFP_KERNEL);

Unrelated, but GFP_ATOMIC right next to GFP_KERNEL caught my attention.

^ permalink raw reply

* Re: [PATCH] fsnotify: add generic perm check for unlink/rmdir
From: Paul Moore @ 2022-05-18  1:17 UTC (permalink / raw)
  To: Guowei Du
  Cc: jack, amir73il, linux-fsdevel, linux-kernel, viro, jmorris, serge,
	ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, linux-security-module, netdev, bpf, stephen.smalley.work,
	eparis, keescook, anton, ccross, tony.luck, selinux, duguowei
In-Reply-To: <20220503183750.1977-1-duguoweisz@gmail.com>

On Tue, May 3, 2022 at 2:38 PM Guowei Du <duguoweisz@gmail.com> wrote:
>
> From: duguowei <duguowei@xiaomi.com>
>
> For now, there have been open/access/open_exec perms for file operation,
> so we add new perms check with unlink/rmdir syscall. if one app deletes
> any file/dir within pubic area, fsnotify can sends fsnotify_event to
> listener to deny that, even if the app have right dac/mac permissions.
>
> Signed-off-by: duguowei <duguowei@xiaomi.com>
> ---
>  fs/notify/fsnotify.c             |  2 +-
>  include/linux/fs.h               |  2 ++
>  include/linux/fsnotify.h         | 16 ++++++++++++++++
>  include/linux/fsnotify_backend.h |  6 +++++-
>  security/security.c              | 12 ++++++++++--
>  security/selinux/hooks.c         |  4 ++++
>  6 files changed, 38 insertions(+), 4 deletions(-)

...

> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index e9e959343de9..f0780f0eb903 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -1801,8 +1801,12 @@ static int may_create(struct inode *dir,
>  }
>
>  #define MAY_LINK       0
> +#ifndef MAY_UNLINK
>  #define MAY_UNLINK     1
> +#endif
> +#ifndef MAY_RMDIR
>  #define MAY_RMDIR      2
> +#endif

In the future if you run into a symbol collision here I would prefer
if you renamed the SELinux constants to something like SEL_MAY_LINK,
etc.

-- 
paul-moore.com

^ permalink raw reply

* Re: [PATCH v5 0/3] adin: add support for clock output
From: Jakub Kicinski @ 2022-05-18  1:16 UTC (permalink / raw)
  To: Josua Mayer; +Cc: netdev, alvaro.karsz
In-Reply-To: <20220517085143.3749-1-josua@solid-run.com>

On Tue, 17 May 2022 11:51:36 +0300 Josua Mayer wrote:
> This patch series adds support for configuring the two clock outputs of adin
> 1200 and 1300 PHYs. Certain network controllers require an external reference
> clock which can be provided by the PHY.
> 
> One of the replies to v1 was asking why the common clock framework isn't used.
> Currently no PHY driver has implemented providing a clock to the network
> controller. Instead they rely on vendor extensions to make the appropriate
> configuration. For example ar8035 uses qca,clk-out-frequency - this patchset
> aimed to replicate the same functionality.
> 
> Finally the 125MHz free-running clock is enabled in the device-tree for
> SolidRun i.MX6 SoMs, to support revisions 1.9 and later, where the original phy
> has been replaced with an adin 1300.
> To avoid introducing new warning messages during boot for SoMs before rev 1.9,
> the status field of the new phy node is disabled by default, and will be
> enabled by U-Boot on demand.
> 
> Changes since v4:
> - removed recovered clock options

Acked-by: Jakub Kicinski <kuba@kernel.org>

Thanks!

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox