* Re: [PATCH net] udp6: set dst cache for a connected sk before udp_v6_send_skb
From: Alexey Kodanev @ 2018-03-23 17:43 UTC (permalink / raw)
To: Eric Dumazet, netdev; +Cc: David Miller
In-Reply-To: <6e3bf7fc-572c-9d6f-3d15-85a3b7699f3b@oracle.com>
On 03/23/2018 08:13 PM, Alexey Kodanev wrote:
> On 03/23/2018 06:50 PM, Eric Dumazet wrote:
...
>>> + if (connected)
>>> + ip6_dst_store(sk, dst,
>>> + ipv6_addr_equal(&fl6.daddr, &sk->sk_v6_daddr) ?
>>> + &sk->sk_v6_daddr : NULL,
>>> +#ifdef CONFIG_IPV6_SUBTREES
>>> + ipv6_addr_equal(&fl6.saddr, &np->saddr) ?
>>> + &np->saddr :
>>> +#endif
>>> + NULL);
>>> +
>>
>> What about the MSG_CONFIRM stuff ?
>>
>>> if (msg->msg_flags&MSG_CONFIRM)
>>> goto do_confirm;
>>> back_from_confirm:
>>
>> Should not you move the above code here instead ?
>
> Ah, you are right, it can release that dst if it goes to "do_confirm".
>
>> Also ip6_dst_store() does not increment dst refcount.
>>
>> I fear that as soon as dst is visible to other cpus, it might be stolen.
>>
>
> So we should pass dst_clone(dst) to ip6_dst_store() instead of dst,
> because udpv6_err() can release it if it sets a new one.
>
> Then, I guess, we could leave udpv6_sendmsg()/ip6_dst_store() where it
> is now in the patch and remove the check for "connected" before dst_release(),
> similar to udp_sendmsg(), right?
And the section "release_dst:" looks redundant after that too.
Thanks,
Alexey
^ permalink raw reply
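The refcounting concern discussed in this thread can be illustrated with a
small userspace model. This is only a sketch, not the kernel code: struct obj,
struct holder, obj_hold(), obj_put(), cache_store() and send_path() are
hypothetical stand-ins for the dst entry, the socket, dst_clone(),
dst_release(), ip6_dst_store() and the send path. It assumes, as stated in the
thread, that the store does not take a reference of its own.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted route entry. */
struct obj {
	atomic_int refcnt;
};

/* Hypothetical stand-in for a socket holding one cached entry. */
struct holder {
	struct obj *cached;
};

/* Take an extra reference and return the same pointer (dst_clone()-like). */
static struct obj *obj_hold(struct obj *o)
{
	atomic_fetch_add(&o->refcnt, 1);
	return o;
}

/* Drop one reference and return the remaining count. */
static int obj_put(struct obj *o)
{
	int left = atomic_fetch_sub(&o->refcnt, 1) - 1;

	assert(left >= 0);	/* a negative count means a double release */
	return left;
}

/* Store without touching the refcount: the caller must donate a reference. */
static void cache_store(struct holder *h, struct obj *o)
{
	if (h->cached)
		obj_put(h->cached);
	h->cached = o;
}

/*
 * The sender owns one reference from its lookup. It donates a second one
 * to the cache and then unconditionally drops its own, so no "connected"
 * check is needed before the release.
 * Returns the refcount left after the send path finishes.
 */
static int send_path(struct holder *h, struct obj *o)
{
	cache_store(h, obj_hold(o));
	return obj_put(o);
}
```

With this shape the cache keeps its own reference, so another path replacing
the cached entry cannot free the object while the sender is still using it.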
* Re: [GIT] 'net' merged into 'net-next'
From: David Miller @ 2018-03-23 17:36 UTC (permalink / raw)
To: jgg; +Cc: netdev, dledford, idosch, dsahern, sd
In-Reply-To: <20180323172622.GC13003@mellanox.com>
From: Jason Gunthorpe <jgg@mellanox.com>
Date: Fri, 23 Mar 2018 11:26:22 -0600
> On Fri, Mar 23, 2018 at 11:40:59AM -0400, David Miller wrote:
>>
>> This merge was a little bit more hectic than usual.
>>
>> But thankfully, I had some sample conflict resolutions to work
>> with, in particular for the mlx5 infiniband changes which were
>> the most difficult to resolve.
>>
>> Please double check my work and provide any fixup patches if
>> necessary.
>
> The drivers/infiniband looks OK, and I also checked that merging
> netdev and rdma together gets us to the right result.
Thanks for looking at it.
^ permalink raw reply
* Re: [PATCH net] ipv6: fix possible deadlock in rt6_age_examine_exception()
From: David Miller @ 2018-03-23 17:41 UTC (permalink / raw)
To: edumazet; +Cc: netdev, eric.dumazet, weiwan, kafai
In-Reply-To: <20180323145658.154636-1-edumazet@google.com>
From: Eric Dumazet <edumazet@google.com>
Date: Fri, 23 Mar 2018 07:56:58 -0700
> syzbot reported a LOCKDEP splat [1] in rt6_age_examine_exception()
>
> rt6_age_examine_exception() is called while rt6_exception_lock is held.
> This lock is the lower one in the lock hierarchy, thus we can not
> call dst_neigh_lookup() function, as it can fallback to neigh_create()
>
> We should instead do a pure RCU lookup. As a bonus we avoid
> a pair of atomic operations on neigh refcount.
>
> [1]
...
> Fixes: c757faa8bfa2 ("ipv6: prepare fib6_age() for exception table")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied and queued up for -stable, thanks Eric.
^ permalink raw reply
* [PATCH net-next] mlxsw: spectrum_span: Prevent duplicate mirrors
From: Ido Schimmel @ 2018-03-23 18:03 UTC (permalink / raw)
To: netdev; +Cc: davem, petrm, jiri, mlxsw, Ido Schimmel
In net commit 8175f7c4736f ("mlxsw: spectrum: Prevent duplicate
mirrors") we prevented the user from mirroring more than once from a
single binding point (port-direction pair).
The fix was essentially reverted in a merge conflict resolution when net
was merged into net-next. Restore it.
Fixes: 03fe2debbb27 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
.../net/ethernet/mellanox/mlxsw/spectrum_span.c | 28 ++++++++++++++++++----
1 file changed, 24 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
index ac24e52d74db..65a77708ff61 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
@@ -600,13 +600,17 @@ int mlxsw_sp_span_port_mtu_update(struct mlxsw_sp_port *port, u16 mtu)
}
static struct mlxsw_sp_span_inspected_port *
-mlxsw_sp_span_entry_bound_port_find(struct mlxsw_sp_port *port,
- struct mlxsw_sp_span_entry *span_entry)
+mlxsw_sp_span_entry_bound_port_find(struct mlxsw_sp_span_entry *span_entry,
+ enum mlxsw_sp_span_type type,
+ struct mlxsw_sp_port *port,
+ bool bind)
{
struct mlxsw_sp_span_inspected_port *p;
list_for_each_entry(p, &span_entry->bound_ports_list, list)
- if (port->local_port == p->local_port)
+ if (type == p->type &&
+ port->local_port == p->local_port &&
+ bind == p->bound)
return p;
return NULL;
}
@@ -636,8 +640,22 @@ mlxsw_sp_span_inspected_port_add(struct mlxsw_sp_port *port,
struct mlxsw_sp_span_inspected_port *inspected_port;
struct mlxsw_sp *mlxsw_sp = port->mlxsw_sp;
char sbib_pl[MLXSW_REG_SBIB_LEN];
+ int i;
int err;
+ /* A given (source port, direction) can only be bound to one analyzer,
+ * so if a binding is requested, check for conflicts.
+ */
+ if (bind)
+ for (i = 0; i < mlxsw_sp->span.entries_count; i++) {
+ struct mlxsw_sp_span_entry *curr =
+ &mlxsw_sp->span.entries[i];
+
+ if (mlxsw_sp_span_entry_bound_port_find(curr, type,
+ port, bind))
+ return -EEXIST;
+ }
+
/* if it is an egress SPAN, bind a shared buffer to it */
if (type == MLXSW_SP_SPAN_EGRESS) {
u32 buffsize = mlxsw_sp_span_mtu_to_buffsize(mlxsw_sp,
@@ -665,6 +683,7 @@ mlxsw_sp_span_inspected_port_add(struct mlxsw_sp_port *port,
}
inspected_port->local_port = port->local_port;
inspected_port->type = type;
+ inspected_port->bound = bind;
list_add_tail(&inspected_port->list, &span_entry->bound_ports_list);
return 0;
@@ -691,7 +710,8 @@ mlxsw_sp_span_inspected_port_del(struct mlxsw_sp_port *port,
struct mlxsw_sp *mlxsw_sp = port->mlxsw_sp;
char sbib_pl[MLXSW_REG_SBIB_LEN];
- inspected_port = mlxsw_sp_span_entry_bound_port_find(port, span_entry);
+ inspected_port = mlxsw_sp_span_entry_bound_port_find(span_entry, type,
+ port, bind);
if (!inspected_port)
return;
--
2.14.3
^ permalink raw reply related
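The conflict check restored by this patch can be sketched in plain C outside
the driver. The types and names below (struct span_entry, struct
inspected_port, bound_port_find(), inspected_port_add(), MAX_BOUND) are
simplified, hypothetical stand-ins for the mlxsw structures, and a fixed array
replaces the kernel list; only the matching rule and the -EEXIST result follow
the diff above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum span_type { SPAN_EGRESS, SPAN_INGRESS };

#define MAX_BOUND 8

/* Hypothetical, simplified stand-ins for the driver structures. */
struct inspected_port {
	enum span_type type;
	int local_port;
	bool bound;
};

struct span_entry {
	struct inspected_port ports[MAX_BOUND];
	size_t nports;
};

/* Match on all three keys: direction, port and bound state. */
static struct inspected_port *
bound_port_find(struct span_entry *e, enum span_type type,
		int local_port, bool bind)
{
	for (size_t i = 0; i < e->nports; i++) {
		struct inspected_port *p = &e->ports[i];

		if (p->type == type && p->local_port == local_port &&
		    p->bound == bind)
			return p;
	}
	return NULL;
}

/*
 * Add (local_port, type) to entries[target]. When a binding is requested,
 * every entry is scanned first so that the same (port, direction) pair
 * cannot be bound to two analyzers.
 */
static int inspected_port_add(struct span_entry *entries, size_t count,
			      size_t target, enum span_type type,
			      int local_port, bool bind)
{
	struct span_entry *e;

	assert(target < count);
	e = &entries[target];

	if (bind)
		for (size_t i = 0; i < count; i++)
			if (bound_port_find(&entries[i], type, local_port, bind))
				return -EEXIST;

	if (e->nports == MAX_BOUND)
		return -ENOSPC;

	e->ports[e->nports++] = (struct inspected_port){
		.type = type, .local_port = local_port, .bound = bind,
	};
	return 0;
}
```

In this model a second bind of the same port and direction fails even when it
targets a different entry, while the opposite direction or an unbound
inspection of the same port is still accepted.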
* Re: [PATCH 06/28] aio: implement IOCB_CMD_POLL
From: Christoph Hellwig @ 2018-03-23 18:05 UTC (permalink / raw)
To: Al Viro
Cc: Christoph Hellwig, Avi Kivity, linux-aio, linux-fsdevel, netdev,
linux-api, linux-kernel
In-Reply-To: <20180322181653.GJ30522@ZenIV.linux.org.uk>
On Thu, Mar 22, 2018 at 06:16:53PM +0000, Al Viro wrote:
> On Thu, Mar 22, 2018 at 06:24:10PM +0100, Christoph Hellwig wrote:
>
> > -static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
> > +static bool aio_complete(struct aio_kiocb *iocb, long res, long res2,
> > + unsigned complete_flags)
>
> Looks like all callers are following that with "if returned true,
> fput(something)". Does it really make any sense to keep that struct
> file * in different fields?
struct kiocb is used not just for aio, but for our normal read/write_iter
APIs, and it is not suitable for poll or fsync. So I can't really find
a good way to keep it common except for duplicating it in struct kiocb
and struct aio_kiocb. But maybe we could pass a struct file argument
to aio_complete().
> Wait a sec... What ordering do we want for
> * call(s) of ->ki_complete
> * call (if any) of ->ki_cancel
> * dropping reference to struct file
> and what are the expected call chains for all of those?
fput must be done exactly once, from inside ->ki_complete OR ->ki_cancel,
by whichever of them actually managed to do the completion. The reference
to struct file isn't needed in aio_complete, but if aio_complete decides who
won the race we'll have to put after it (or inside it if we want to make
it common).
^ permalink raw reply
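The "exactly once" rule for dropping the file reference can be modelled in
userspace with a single atomic flag. This is a sketch under assumptions, not
the aio implementation: struct req, req_claim(), req_complete() and
req_cancel() are hypothetical names, and a plain counter stands in for the
reference on struct file.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request that pins a file reference. */
struct req {
	atomic_bool done;	/* set by whichever path finishes the request */
	int file_refs;		/* models the reference held on struct file */
};

/* Returns true only for the one caller that wins the race. */
static bool req_claim(struct req *r)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&r->done, &expected, true);
}

/* Completion path: drop the file reference only if we did the completion. */
static bool req_complete(struct req *r)
{
	if (!req_claim(r))
		return false;
	assert(r->file_refs > 0);
	r->file_refs--;
	return true;
}

/* Cancel path: same rule, so the reference is dropped exactly once. */
static bool req_cancel(struct req *r)
{
	if (!req_claim(r))
		return false;
	assert(r->file_refs > 0);
	r->file_refs--;
	return true;
}
```

Only the path that wins the compare-and-exchange touches the counter, which is
why the put has to happen after (or inside) whatever decides the winner.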
* Re: [GIT] 'net' merged into 'net-next'
From: Ido Schimmel @ 2018-03-23 18:05 UTC (permalink / raw)
To: David Miller; +Cc: netdev, jgg, dledford, dsahern, sd
In-Reply-To: <20180323.114059.297397691681485266.davem@davemloft.net>
On Fri, Mar 23, 2018 at 11:40:59AM -0400, David Miller wrote:
>
> This merge was a little bit more hectic than usual.
>
> But thankfully, I had some sample conflict resolutions to work
> with, in particular for the mlx5 infiniband changes which were
> the most difficult to resolve.
>
> Please double check my work and provide any fixup patches if
> necessary.
Fixup for mlxsw:
http://patchwork.ozlabs.org/patch/890093/
^ permalink raw reply
* [PATCH net-next] net/sched: remove tcf_idr_cleanup()
From: Davide Caratti @ 2018-03-23 18:09 UTC (permalink / raw)
To: Cong Wang, David S. Miller; +Cc: netdev
tcf_idr_cleanup() is no longer used, so remove it.
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
include/net/act_api.h | 1 -
net/sched/act_api.c | 8 --------
2 files changed, 9 deletions(-)
diff --git a/include/net/act_api.h b/include/net/act_api.h
index e0a9c2003b24..9e59ebfded62 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -149,7 +149,6 @@ bool tcf_idr_check(struct tc_action_net *tn, u32 index, struct tc_action **a,
int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
struct tc_action **a, const struct tc_action_ops *ops,
int bind, bool cpustats);
-void tcf_idr_cleanup(struct tc_action *a, struct nlattr *est);
void tcf_idr_insert(struct tc_action_net *tn, struct tc_action *a);
int __tcf_idr_release(struct tc_action *a, bool bind, bool strict);
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 57cf37145282..7bd1b964f021 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -296,14 +296,6 @@ bool tcf_idr_check(struct tc_action_net *tn, u32 index, struct tc_action **a,
}
EXPORT_SYMBOL(tcf_idr_check);
-void tcf_idr_cleanup(struct tc_action *a, struct nlattr *est)
-{
- if (est)
- gen_kill_estimator(&a->tcfa_rate_est);
- free_tcf(a);
-}
-EXPORT_SYMBOL(tcf_idr_cleanup);
-
int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
struct tc_action **a, const struct tc_action_ops *ops,
int bind, bool cpustats)
--
2.14.3
^ permalink raw reply related
* Re: [bpf-next V5 PATCH 10/15] xdp: rhashtable with allocator ID to pointer mapping
From: Jesper Dangaard Brouer @ 2018-03-23 18:15 UTC (permalink / raw)
To: Alexander Duyck
Cc: Netdev, BjörnTöpel, Karlsson, Magnus, Eugenia Emantayev,
Jason Wang, John Fastabend, Eran Ben Elisha, Saeed Mahameed,
Gal Pressman, Daniel Borkmann, Alexei Starovoitov, Tariq Toukan,
brouer
In-Reply-To: <CAKgT0Ucmouu0B1Hxfk2x-2cbbTqqanf43-RDMPdcEmSFaUACzQ@mail.gmail.com>
On Fri, 23 Mar 2018 09:56:50 -0700
Alexander Duyck <alexander.duyck@gmail.com> wrote:
> On Fri, Mar 23, 2018 at 5:18 AM, Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> > Use the IDA infrastructure for getting a cyclic increasing ID number,
> > that is used for keeping track of each registered allocator per
> > RX-queue xdp_rxq_info. Instead of using the IDR infrastructure, which
> > uses a radix tree, use a dynamic rhashtable, for creating ID to
> > pointer lookup table, because this is faster.
> >
> > The problem that is being solved here is that, the xdp_rxq_info
> > pointer (stored in xdp_buff) cannot be used directly, as the
> > guaranteed lifetime is too short. The info is needed on a
> > (potentially) remote CPU during DMA-TX completion time. In an
> > xdp_frame the xdp_mem_info is stored, when it got converted from an
> > xdp_buff, which is sufficient for the simple page refcnt based recycle
> > schemes.
> >
> > For more advanced allocators there is a need to store a pointer to the
> > registered allocator. Thus, there is a need to guard the lifetime or
> > validity of the allocator pointer, which is done through this
> > rhashtable ID map to pointer. The removal and validity of the
> > allocator and helper struct xdp_mem_allocator is guarded by RCU. The
> > allocator will be created by the driver, and registered with
> > xdp_rxq_info_reg_mem_model().
> >
> > It is up for debate who is responsible for freeing the allocator
> > pointer or invoking the allocator destructor function. In any case,
> > this must happen via RCU freeing.
> >
> > V4: Per req of Jason Wang
> > - Use xdp_rxq_info_reg_mem_model() in all drivers implementing
> > XDP_REDIRECT, even-though it's not strictly necessary when
> > allocator==NULL for type MEM_TYPE_PAGE_SHARED (given it's zero).
> >
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> > drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 9 +
> > drivers/net/tun.c | 6 +
> > drivers/net/virtio_net.c | 7 +
> > include/net/xdp.h | 15 --
> > net/core/xdp.c | 230 ++++++++++++++++++++++++-
> > 5 files changed, 248 insertions(+), 19 deletions(-)
> >
[...]
> > int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
> > enum mem_type type, void *allocator)
> > {
> > + struct xdp_mem_allocator *xdp_alloc;
> > + gfp_t gfp = GFP_KERNEL;
> > + int id, errno, ret;
> > + void *ptr;
> > +
> > + if (xdp_rxq->reg_state != REG_STATE_REGISTERED) {
> > + WARN(1, "Missing register, driver bug");
> > + return -EFAULT;
> > + }
> > +
> > if (type >= MEM_TYPE_MAX)
> > return -EINVAL;
> >
> > xdp_rxq->mem.type = type;
> >
> > - if (allocator)
> > - return -EOPNOTSUPP;
> > + if (!allocator)
> > + return 0;
> > +
> > + /* Delay init of rhashtable to save memory if feature isn't used */
> > + if (!mem_id_init) {
> > + mutex_lock(&mem_id_lock);
> > + ret = __mem_id_init_hash_table();
> > + mutex_unlock(&mem_id_lock);
> > + if (ret < 0) {
> > + WARN_ON(1);
> > + return ret;
> > + }
> > + }
> > +
> > + xdp_alloc = kzalloc(sizeof(*xdp_alloc), gfp);
> > + if (!xdp_alloc)
> > + return -ENOMEM;
> > +
> > + mutex_lock(&mem_id_lock);
> > + id = __mem_id_cyclic_get(gfp);
> > + if (id < 0) {
> > + errno = id;
> > + goto err;
> > + }
> > + xdp_rxq->mem.id = id;
> > + xdp_alloc->mem = xdp_rxq->mem;
> > + xdp_alloc->allocator = allocator;
> > +
> > + /* Insert allocator into ID lookup table */
> > + ptr = rhashtable_insert_slow(mem_id_ht, &id, &xdp_alloc->node);
> > + if (IS_ERR(ptr)) {
> > + errno = PTR_ERR(ptr);
> > + goto err;
> > + }
> > +
> > + mutex_unlock(&mem_id_lock);
> >
> > - /* TODO: Allocate an ID that maps to allocator pointer
> > - * See: https://www.kernel.org/doc/html/latest/core-api/idr.html
> > - */
> > return 0;
> > +err:
> > + mutex_unlock(&mem_id_lock);
> > + kfree(xdp_alloc);
> > + return errno;
> > }
> > EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model);
> > +
> > +void xdp_return_frame(void *data, struct xdp_mem_info *mem)
> > +{
> > + struct xdp_mem_allocator *xa;
> > +
> > + rcu_read_lock();
> > + if (mem->id)
> > + xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
> > + rcu_read_unlock();
> > +
> > + if (mem->type == MEM_TYPE_PAGE_SHARED) {
> > + page_frag_free(data);
> > + return;
> > + }
> > +
> > + if (mem->type == MEM_TYPE_PAGE_ORDER0) {
> > + struct page *page = virt_to_page(data); /* Assumes order0 page*/
> > +
> > + put_page(page);
> > + }
> > +}
> > +EXPORT_SYMBOL_GPL(xdp_return_frame);
> >
>
> I'm not sure what the point is of getting the xa value if it is not
> going to be used. Also I would assume there are types that won't even
> need the hash table lookup. I would prefer to see this bit held off on
> until you have something that actually needs it.
I think you misread the patch. The lookup is NOT going to be performed
when mem->id is zero, which is the case that you are interested in for
your ixgbe driver.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply
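The point about mem->id being zero can be shown with a small userspace model
of an ID to allocator map. It is a sketch only: a flat array replaces the
rhashtable, IDs are handed out monotonically rather than cyclically, there is
no RCU or locking, and struct mem_info, reg_mem_model(), lookup_allocator()
and the lookups counter are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MEM_ID_MAX 16

/* Hypothetical, simplified stand-in for the per-queue memory info. */
struct mem_info {
	int type;
	int id;		/* 0 means "no allocator registered" */
};

static void *mem_id_table[MEM_ID_MAX];	/* flat array instead of a hash table */
static int mem_id_next = 1;		/* ID 0 is reserved */
static int lookups;			/* counts table lookups, for illustration */

/* Register an allocator; a NULL allocator needs no ID and no table slot. */
static int reg_mem_model(struct mem_info *mem, int type, void *allocator)
{
	mem->type = type;
	mem->id = 0;
	if (!allocator)
		return 0;
	if (mem_id_next >= MEM_ID_MAX)
		return -1;
	mem->id = mem_id_next++;
	mem_id_table[mem->id] = allocator;
	return 0;
}

/* Resolve the allocator for a frame; skipped entirely when id is zero. */
static void *lookup_allocator(const struct mem_info *mem)
{
	if (!mem->id)
		return NULL;
	assert(mem->id > 0 && mem->id < MEM_ID_MAX);
	lookups++;
	return mem_id_table[mem->id];
}
```

A queue registered without an allocator keeps id zero, so returning its frames
never touches the table; only queues with a registered allocator pay for the
lookup.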
* [PATCH v6 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
From: Sinan Kaya @ 2018-03-23 18:21 UTC (permalink / raw)
To: jeffrey.t.kirsher
Cc: sulrich, netdev, timur, Sinan Kaya, linux-arm-msm,
linux-arm-kernel
The code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.
As a result, the CPU observes two barriers back to back before executing the
register write.
Since the code already has an explicit barrier call, change writel() to
writel_relaxed().
I did a regex search for wmb() followed by writel() in each drivers
directory.
I scrubbed the ones I care about in this series.
I considered "ease of change", "popular usage" and "performance critical
path" as the determining criteria for my filtering.
We have used the relaxed API heavily on ARM for a long time, but
it did not exist on other architectures. For this reason, weakly-ordered
architectures have been paying a double penalty in order to use the common
drivers.
Now that the relaxed API is present on all architectures, we can go and scrub
all drivers to see what needs to change and what can remain.
We start with the most used ones and hope to increase the coverage over time.
It will take a while to cover all drivers.
Feel free to apply patches individually.
Feel free to apply patches individually.
Changes since v5:
add mmiowb to missing places in order not to break PPC
Changes since v4:
posted ixgbevf: keep writel() closer to wmb() to jkircher
posted ixgbevf: eliminate duplicate barriers on weakly-ordered archs to
Sinan Kaya (7):
i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
ixgbe: eliminate duplicate barriers on weakly-ordered archs
igbvf: eliminate duplicate barriers on weakly-ordered archs
igb: eliminate duplicate barriers on weakly-ordered archs
fm10k: Eliminate duplicate barriers on weakly-ordered archs
ixgbevf: keep writel() closer to wmb()
ixgbevf: eliminate duplicate barriers on weakly-ordered archs
drivers/net/ethernet/intel/fm10k/fm10k_main.c | 9 +++++++--
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 24 +++++++++++++++++++----
drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 9 +++++++--
drivers/net/ethernet/intel/igb/igb_main.c | 9 +++++++--
drivers/net/ethernet/intel/igbvf/netdev.c | 9 +++++++--
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 23 ++++++++++++++++++----
drivers/net/ethernet/intel/ixgbevf/ixgbevf.h | 5 -----
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 21 +++++++++++++++++---
8 files changed, 85 insertions(+), 24 deletions(-)
--
2.7.4
^ permalink raw reply
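The double-barrier cost described in the cover letter can be made visible
with a userspace model that simply counts barriers. Nothing here is kernel
API: model_wmb(), model_writel(), model_writel_relaxed(), ring_doorbell_old()
and ring_doorbell_new() are hypothetical names, model_writel() assumes an
architecture where writel() implies a barrier, and the mmiowb() calls added by
the series are not modelled.

```c
#include <assert.h>
#include <stdint.h>

static unsigned int barriers;	/* barriers executed so far */

/* Model of an explicit write barrier. */
static void model_wmb(void)
{
	barriers++;
}

/* Model of a relaxed MMIO write: the store only. */
static void model_writel_relaxed(uint32_t val, volatile uint32_t *reg)
{
	*reg = val;
}

/* Model of writel() on an architecture where it implies a barrier. */
static void model_writel(uint32_t val, volatile uint32_t *reg)
{
	barriers++;
	model_writel_relaxed(val, reg);
}

/* Before the series: the explicit barrier plus the implicit one. */
static unsigned int ring_doorbell_old(uint32_t val, volatile uint32_t *tail)
{
	unsigned int start = barriers;

	model_wmb();
	model_writel(val, tail);
	return barriers - start;
}

/* After the series: the explicit barrier alone orders the write. */
static unsigned int ring_doorbell_new(uint32_t val, volatile uint32_t *tail)
{
	unsigned int start = barriers;

	model_wmb();
	model_writel_relaxed(val, tail);
	return barriers - start;
}
```

Both variants leave the same value in the tail register; the second one just
gets there with one barrier instead of two on such architectures.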
* [PATCH v6 1/7] i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
From: Sinan Kaya @ 2018-03-23 18:21 UTC (permalink / raw)
To: jeffrey.t.kirsher
Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
Sinan Kaya, intel-wired-lan, linux-kernel
In-Reply-To: <1521829277-9398-1-git-send-email-okaya@codeaurora.org>
The code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.
As a result, the CPU observes two barriers back to back before executing the
register write.
Since the code already has an explicit barrier call, change writel() to
writel_relaxed().
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 24 ++++++++++++++++++++----
drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 9 +++++++--
2 files changed, 27 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index c6972bd..fc10cc0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -186,7 +186,13 @@ static int i40e_program_fdir_filter(struct i40e_fdir_filter *fdir_data,
/* Mark the data descriptor to be watched */
first->next_to_watch = tx_desc;
- writel(tx_ring->next_to_use, tx_ring->tail);
+ writel_relaxed(tx_ring->next_to_use, tx_ring->tail);
+
+ /* We need this if more than one processor can write to our tail
+ * at a time, it synchronizes IO on IA64/Altix systems
+ */
+ mmiowb();
+
return 0;
dma_fail:
@@ -1529,7 +1535,12 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
* such as IA-64).
*/
wmb();
- writel(val, rx_ring->tail);
+ writel_relaxed(val, rx_ring->tail);
+
+ /* We need this if more than one processor can write to our tail
+ * at a time, it synchronizes IO on IA64/Altix systems
+ */
+ mmiowb();
}
/**
@@ -2412,7 +2423,12 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
*/
wmb();
- writel(xdp_ring->next_to_use, xdp_ring->tail);
+ writel_relaxed(xdp_ring->next_to_use, xdp_ring->tail);
+
+ /* We need this if more than one processor can write to our tail
+ * at a time, it synchronizes IO on IA64/Altix systems
+ */
+ mmiowb();
}
rx_ring->skb = skb;
@@ -3437,7 +3453,7 @@ static inline int i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
/* notify HW of packet */
if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
- writel(i, tx_ring->tail);
+ writel_relaxed(i, tx_ring->tail);
/* we need this if more than one processor can write to our tail
* at a time, it synchronizes IO on IA64/Altix systems
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 1ae112f..ca02762 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -810,7 +810,12 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
* such as IA-64).
*/
wmb();
- writel(val, rx_ring->tail);
+ writel_relaxed(val, rx_ring->tail);
+
+ /* We need this if more than one processor can write to our tail
+ * at a time, it synchronizes IO on IA64/Altix systems
+ */
+ mmiowb();
}
/**
@@ -2379,7 +2384,7 @@ static inline void i40evf_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
/* notify HW of packet */
if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
- writel(i, tx_ring->tail);
+ writel_relaxed(i, tx_ring->tail);
/* we need this if more than one processor can write to our tail
* at a time, it synchronizes IO on IA64/Altix systems
--
2.7.4
^ permalink raw reply related
* [PATCH v6 2/7] ixgbe: eliminate duplicate barriers on weakly-ordered archs
From: Sinan Kaya @ 2018-03-23 18:21 UTC (permalink / raw)
To: jeffrey.t.kirsher
Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
Sinan Kaya, intel-wired-lan, linux-kernel
In-Reply-To: <1521829277-9398-1-git-send-email-okaya@codeaurora.org>
The code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.
As a result, the CPU observes two barriers back to back before executing the
register write.
Since the code already has an explicit barrier call, change writel() to
writel_relaxed().
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index e3b32ea..1ecc2f5 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1701,7 +1701,12 @@ void ixgbe_alloc_rx_buffers(struct ixgbe_ring *rx_ring, u16 cleaned_count)
* such as IA-64).
*/
wmb();
- writel(i, rx_ring->tail);
+ writel_relaxed(i, rx_ring->tail);
+
+ /* We need this if more than one processor can write to our tail
+ * at a time, it synchronizes IO on IA64/Altix systems
+ */
+ mmiowb();
}
}
@@ -2470,7 +2475,12 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
* know there are new descriptors to fetch.
*/
wmb();
- writel(ring->next_to_use, ring->tail);
+ writel_relaxed(ring->next_to_use, ring->tail);
+
+ /* We need this if more than one processor can write to our tail
+ * at a time, it synchronizes IO on IA64/Altix systems
+ */
+ mmiowb();
xdp_do_flush_map();
}
@@ -8101,7 +8111,7 @@ static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
ixgbe_maybe_stop_tx(tx_ring, DESC_NEEDED);
if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
- writel(i, tx_ring->tail);
+ writel_relaxed(i, tx_ring->tail);
/* we need this if more than one processor can write to our tail
* at a time, it synchronizes IO on IA64/Altix systems
@@ -10038,7 +10048,12 @@ static void ixgbe_xdp_flush(struct net_device *dev)
* are new descriptors to fetch.
*/
wmb();
- writel(ring->next_to_use, ring->tail);
+ writel_relaxed(ring->next_to_use, ring->tail);
+
+ /* We need this if more than one processor can write to our tail
+ * at a time, it synchronizes IO on IA64/Altix systems
+ */
+ mmiowb();
return;
}
--
2.7.4
^ permalink raw reply related
* [PATCH v6 3/7] igbvf: eliminate duplicate barriers on weakly-ordered archs
From: Sinan Kaya @ 2018-03-23 18:21 UTC (permalink / raw)
To: jeffrey.t.kirsher
Cc: sulrich, netdev, timur, linux-kernel, Sinan Kaya, intel-wired-lan,
linux-arm-msm, linux-arm-kernel
In-Reply-To: <1521829277-9398-1-git-send-email-okaya@codeaurora.org>
The code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.
As a result, the CPU observes two barriers back to back before executing the
register write.
Since the code already has an explicit barrier call, change writel() to
writel_relaxed().
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
drivers/net/ethernet/intel/igbvf/netdev.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c b/drivers/net/ethernet/intel/igbvf/netdev.c
index fa07876..6dfd3dc 100644
--- a/drivers/net/ethernet/intel/igbvf/netdev.c
+++ b/drivers/net/ethernet/intel/igbvf/netdev.c
@@ -252,7 +252,12 @@ static void igbvf_alloc_rx_buffers(struct igbvf_ring *rx_ring,
* such as IA-64).
*/
wmb();
- writel(i, adapter->hw.hw_addr + rx_ring->tail);
+ writel_relaxed(i, adapter->hw.hw_addr + rx_ring->tail);
+
+ /* We need this if more than one processor can write to our tail
+ * at a time, it synchronizes IO on IA64/Altix systems
+ */
+ mmiowb();
}
}
@@ -2298,7 +2303,7 @@ static inline void igbvf_tx_queue_adv(struct igbvf_adapter *adapter,
tx_ring->buffer_info[first].next_to_watch = tx_desc;
tx_ring->next_to_use = i;
- writel(i, adapter->hw.hw_addr + tx_ring->tail);
+ writel_relaxed(i, adapter->hw.hw_addr + tx_ring->tail);
/* we need this if more than one processor can write to our tail
* at a time, it synchronizes IO on IA64/Altix systems
*/
--
2.7.4
^ permalink raw reply related
* [PATCH v6 4/7] igb: eliminate duplicate barriers on weakly-ordered archs
From: Sinan Kaya @ 2018-03-23 18:21 UTC (permalink / raw)
To: jeffrey.t.kirsher
Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
Sinan Kaya, intel-wired-lan, linux-kernel
In-Reply-To: <1521829277-9398-1-git-send-email-okaya@codeaurora.org>
The code includes wmb() followed by writel(). writel() already has a barrier
on some architectures like arm64.
As a result, the CPU observes two barriers back to back before executing the
register write.
Since the code already has an explicit barrier call, change writel() to
writel_relaxed().
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
drivers/net/ethernet/intel/igb/igb_main.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 3d4ff3c..570af25 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -5674,7 +5674,7 @@ static int igb_tx_map(struct igb_ring *tx_ring,
igb_maybe_stop_tx(tx_ring, DESC_NEEDED);
if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
- writel(i, tx_ring->tail);
+ writel_relaxed(i, tx_ring->tail);
/* we need this if more than one processor can write to our tail
* at a time, it synchronizes IO on IA64/Altix systems
@@ -8079,7 +8079,12 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count)
* such as IA-64).
*/
wmb();
- writel(i, rx_ring->tail);
+ writel_relaxed(i, rx_ring->tail);
+
+ /* We need this if more than one processor can write to our tail
+ * at a time, it synchronizes IO on IA64/Altix systems
+ */
+ mmiowb();
}
}
--
2.7.4
^ permalink raw reply related
* [PATCH v6 5/7] fm10k: Eliminate duplicate barriers on weakly-ordered archs
From: Sinan Kaya @ 2018-03-23 18:21 UTC (permalink / raw)
To: jeffrey.t.kirsher
Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
Sinan Kaya, intel-wired-lan, linux-kernel
In-Reply-To: <1521829277-9398-1-git-send-email-okaya@codeaurora.org>
The code includes wmb() followed by writel(). writel() already has a
barrier on some architectures like arm64.
As a result, the CPU observes two barriers back to back before executing
the register write.
Since the code already has an explicit barrier call, change writel() to
writel_relaxed().
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
drivers/net/ethernet/intel/fm10k/fm10k_main.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 409554d..360ff9b 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -180,7 +180,12 @@ void fm10k_alloc_rx_buffers(struct fm10k_ring *rx_ring, u16 cleaned_count)
wmb();
/* notify hardware of new descriptors */
- writel(i, rx_ring->tail);
+ writel_relaxed(i, rx_ring->tail);
+
+ /* We need this if more than one processor can write to our tail
+ * at a time, it synchronizes IO on IA64/Altix systems
+ */
+ mmiowb();
}
}
@@ -1055,7 +1060,7 @@ static void fm10k_tx_map(struct fm10k_ring *tx_ring,
/* notify HW of packet */
if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
- writel(i, tx_ring->tail);
+ writel_relaxed(i, tx_ring->tail);
/* we need this if more than one processor can write to our tail
* at a time, it synchronizes IO on IA64/Altix systems
--
2.7.4
^ permalink raw reply related
* [PATCH v6 6/7] ixgbevf: keep writel() closer to wmb()
From: Sinan Kaya @ 2018-03-23 18:21 UTC (permalink / raw)
To: jeffrey.t.kirsher
Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
Sinan Kaya, intel-wired-lan, linux-kernel
In-Reply-To: <1521829277-9398-1-git-send-email-okaya@codeaurora.org>
Remove ixgbevf_write_tail() in favor of moving writel() close to
wmb().
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
drivers/net/ethernet/intel/ixgbevf/ixgbevf.h | 5 -----
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 6 +++---
2 files changed, 3 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index f712646..f97091d 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -312,11 +312,6 @@ static inline u16 ixgbevf_desc_unused(struct ixgbevf_ring *ring)
return ((ntc > ntu) ? 0 : ring->count) + ntc - ntu - 1;
}
-static inline void ixgbevf_write_tail(struct ixgbevf_ring *ring, u32 value)
-{
- writel(value, ring->tail);
-}
-
#define IXGBEVF_RX_DESC(R, i) \
(&(((union ixgbe_adv_rx_desc *)((R)->desc))[i]))
#define IXGBEVF_TX_DESC(R, i) \
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 3d9033f..815cb1a 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -725,7 +725,7 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_ring *rx_ring,
* such as IA-64).
*/
wmb();
- ixgbevf_write_tail(rx_ring, i);
+ writel(i, rx_ring->tail);
}
}
@@ -1232,7 +1232,7 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
* know there are new descriptors to fetch.
*/
wmb();
- ixgbevf_write_tail(xdp_ring, xdp_ring->next_to_use);
+ writel(xdp_ring->next_to_use, xdp_ring->tail);
}
u64_stats_update_begin(&rx_ring->syncp);
@@ -4004,7 +4004,7 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
tx_ring->next_to_use = i;
/* notify HW of packet */
- ixgbevf_write_tail(tx_ring, i);
+ writel(i, tx_ring->tail);
return;
dma_error:
--
2.7.4
^ permalink raw reply related
* [PATCH v6 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
From: Sinan Kaya @ 2018-03-23 18:21 UTC (permalink / raw)
To: jeffrey.t.kirsher
Cc: netdev, timur, sulrich, linux-arm-msm, linux-arm-kernel,
Sinan Kaya, intel-wired-lan, linux-kernel
In-Reply-To: <1521829277-9398-1-git-send-email-okaya@codeaurora.org>
The code has wmb() followed by writel() in multiple places. writel()
already includes a barrier on some architectures, such as arm64.
As a result, the CPU observes two barriers back to back before executing
the register write.
Since the code already has an explicit barrier call, change writel() to
writel_relaxed().
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 815cb1a..9e684b1 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -725,7 +725,12 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_ring *rx_ring,
* such as IA-64).
*/
wmb();
- writel(i, rx_ring->tail);
+ writel_relaxed(i, rx_ring->tail);
+
+ /* We need this if more than one processor can write to our tail
+ * at a time, it synchronizes IO on IA64/Altix systems
+ */
+ mmiowb();
}
}
@@ -1232,7 +1237,12 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
* know there are new descriptors to fetch.
*/
wmb();
- writel(xdp_ring->next_to_use, xdp_ring->tail);
+ writel_relaxed(xdp_ring->next_to_use, xdp_ring->tail);
+
+ /* We need this if more than one processor can write to our tail
+ * at a time, it synchronizes IO on IA64/Altix systems
+ */
+ mmiowb();
}
u64_stats_update_begin(&rx_ring->syncp);
@@ -4004,7 +4014,12 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
tx_ring->next_to_use = i;
/* notify HW of packet */
- writel(i, tx_ring->tail);
+ writel_relaxed(i, tx_ring->tail);
+
+ /* We need this if more than one processor can write to our tail
+ * at a time, it synchronizes IO on IA64/Altix systems
+ */
+ mmiowb();
return;
dma_error:
--
2.7.4
^ permalink raw reply related
* Re: [bpf-next V5 PATCH 10/15] xdp: rhashtable with allocator ID to pointer mapping
From: Alexander Duyck @ 2018-03-23 18:22 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Netdev, Björn Töpel, Karlsson, Magnus, Eugenia Emantayev,
Jason Wang, John Fastabend, Eran Ben Elisha, Saeed Mahameed,
Gal Pressman, Daniel Borkmann, Alexei Starovoitov, Tariq Toukan
In-Reply-To: <20180323191509.68b62451@redhat.com>
On Fri, Mar 23, 2018 at 11:15 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
> On Fri, 23 Mar 2018 09:56:50 -0700
> Alexander Duyck <alexander.duyck@gmail.com> wrote:
>
>> On Fri, Mar 23, 2018 at 5:18 AM, Jesper Dangaard Brouer
>> <brouer@redhat.com> wrote:
>> > Use the IDA infrastructure for getting a cyclic increasing ID number,
>> > that is used for keeping track of each registered allocator per
>> > RX-queue xdp_rxq_info. Instead of using the IDR infrastructure, which
>> > uses a radix tree, use a dynamic rhashtable, for creating ID to
>> > pointer lookup table, because this is faster.
>> >
>> > The problem that is being solved here is that, the xdp_rxq_info
>> > pointer (stored in xdp_buff) cannot be used directly, as the
>> > guaranteed lifetime is too short. The info is needed on a
>> > (potentially) remote CPU during DMA-TX completion time. In an
>> > xdp_frame the xdp_mem_info is stored, when it got converted from an
>> > xdp_buff, which is sufficient for the simple page refcnt based recycle
>> > schemes.
>> >
>> > For more advanced allocators there is a need to store a pointer to the
>> > registered allocator. Thus, there is a need to guard the lifetime or
>> > validity of the allocator pointer, which is done through this
>> > rhashtable ID map to pointer. The removal and validity of the
>> > allocator and helper struct xdp_mem_allocator is guarded by RCU. The
>> > allocator will be created by the driver, and registered with
>> > xdp_rxq_info_reg_mem_model().
>> >
>> > It is up to debate who is responsible for freeing the allocator
>> > pointer or invoking the allocator destructor function. In any case,
>> > this must happen via RCU freeing.
>> >
>> > Use the IDA infrastructure for getting a cyclic increasing ID number,
>> > that is used for keeping track of each registered allocator per
>> > RX-queue xdp_rxq_info.
>> >
>> > V4: Per req of Jason Wang
>> > - Use xdp_rxq_info_reg_mem_model() in all drivers implementing
>> > XDP_REDIRECT, even-though it's not strictly necessary when
>> > allocator==NULL for type MEM_TYPE_PAGE_SHARED (given it's zero).
>> >
>> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>> > ---
>> > drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 9 +
>> > drivers/net/tun.c | 6 +
>> > drivers/net/virtio_net.c | 7 +
>> > include/net/xdp.h | 15 --
>> > net/core/xdp.c | 230 ++++++++++++++++++++++++-
>> > 5 files changed, 248 insertions(+), 19 deletions(-)
>> >
> [...]
>> > int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
>> > enum mem_type type, void *allocator)
>> > {
>> > + struct xdp_mem_allocator *xdp_alloc;
>> > + gfp_t gfp = GFP_KERNEL;
>> > + int id, errno, ret;
>> > + void *ptr;
>> > +
>> > + if (xdp_rxq->reg_state != REG_STATE_REGISTERED) {
>> > + WARN(1, "Missing register, driver bug");
>> > + return -EFAULT;
>> > + }
>> > +
>> > if (type >= MEM_TYPE_MAX)
>> > return -EINVAL;
>> >
>> > xdp_rxq->mem.type = type;
>> >
>> > - if (allocator)
>> > - return -EOPNOTSUPP;
>> > + if (!allocator)
>> > + return 0;
>> > +
>> > + /* Delay init of rhashtable to save memory if feature isn't used */
>> > + if (!mem_id_init) {
>> > + mutex_lock(&mem_id_lock);
>> > + ret = __mem_id_init_hash_table();
>> > + mutex_unlock(&mem_id_lock);
>> > + if (ret < 0) {
>> > + WARN_ON(1);
>> > + return ret;
>> > + }
>> > + }
>> > +
>> > + xdp_alloc = kzalloc(sizeof(*xdp_alloc), gfp);
>> > + if (!xdp_alloc)
>> > + return -ENOMEM;
>> > +
>> > + mutex_lock(&mem_id_lock);
>> > + id = __mem_id_cyclic_get(gfp);
>> > + if (id < 0) {
>> > + errno = id;
>> > + goto err;
>> > + }
>> > + xdp_rxq->mem.id = id;
>> > + xdp_alloc->mem = xdp_rxq->mem;
>> > + xdp_alloc->allocator = allocator;
>> > +
>> > + /* Insert allocator into ID lookup table */
>> > + ptr = rhashtable_insert_slow(mem_id_ht, &id, &xdp_alloc->node);
>> > + if (IS_ERR(ptr)) {
>> > + errno = PTR_ERR(ptr);
>> > + goto err;
>> > + }
>> > +
>> > + mutex_unlock(&mem_id_lock);
>> >
>> > - /* TODO: Allocate an ID that maps to allocator pointer
>> > - * See: https://www.kernel.org/doc/html/latest/core-api/idr.html
>> > - */
>> > return 0;
>> > +err:
>> > + mutex_unlock(&mem_id_lock);
>> > + kfree(xdp_alloc);
>> > + return errno;
>> > }
>> > EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model);
>> > +
>> > +void xdp_return_frame(void *data, struct xdp_mem_info *mem)
>> > +{
>> > + struct xdp_mem_allocator *xa;
>> > +
>> > + rcu_read_lock();
>> > + if (mem->id)
>> > + xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
>> > + rcu_read_unlock();
>> > +
>> > + if (mem->type == MEM_TYPE_PAGE_SHARED) {
>> > + page_frag_free(data);
>> > + return;
>> > + }
>> > +
>> > + if (mem->type == MEM_TYPE_PAGE_ORDER0) {
>> > + struct page *page = virt_to_page(data); /* Assumes order0 page*/
>> > +
>> > + put_page(page);
>> > + }
>> > +}
>> > +EXPORT_SYMBOL_GPL(xdp_return_frame);
>> >
>>
>> I'm not sure what the point is of getting the xa value if it is not
>> going to be used. Also I would assume there are types that won't even
>> need the hash table lookup. I would prefer to see this bit held off on
>> until you have something that actually needs it.
>
> I think, you misread the patch. The lookup is NOT going to be performed
> when mem->id is zero, which is the case that you are interested in for
> your ixgbe driver.
Sorry, to clarify: why do I have to take rcu_read_lock() and
rcu_read_unlock() if I am not doing an RCU read? Why even bother with a
conditional check on mem->id if the result of the lookup is never used?
Basically, if I am not using it, why should I pay any of the overhead
for it? I would much rather have this code reduced to be as small and
fast as possible instead of wasting cycles on the RCU acquire/release,
reading mem->id, testing mem->id, and then jumping over code whose
result is unused in either case, even if mem->id isn't 0.
What I don't get is why you aren't getting warnings about xa being
assigned but never used.
- Alex
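The restructuring asked for above can be sketched in user space. This is only an illustration and not code from the thread: `call_stats`, `return_frame_sketch()`, and `MEM_TYPE_SKETCH_ALLOC` are hypothetical names, and the counters stand in for the kernel's page_frag_free(), put_page(), and the rcu_read_lock()/rhashtable_lookup() pair. The idea is to dispatch on mem->type first, so that the page-based types never pay for the lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel types quoted above. MEM_TYPE_SKETCH_ALLOC is a
 * hypothetical allocator-backed type that does not exist in the patch.
 */
enum mem_type {
	MEM_TYPE_PAGE_SHARED,
	MEM_TYPE_PAGE_ORDER0,
	MEM_TYPE_SKETCH_ALLOC,
	MEM_TYPE_MAX
};

struct xdp_mem_info {
	enum mem_type type;
	int id;
};

/* Counters that replace the real kernel calls so the control flow can be
 * observed from a test.
 */
struct call_stats {
	int frag_frees;	/* would be page_frag_free() */
	int page_puts;	/* would be put_page() */
	int lookups;	/* would be rcu_read_lock() + rhashtable_lookup() */
};

void return_frame_sketch(const struct xdp_mem_info *mem,
			 struct call_stats *st)
{
	assert(mem != NULL && st != NULL);

	switch (mem->type) {
	case MEM_TYPE_PAGE_SHARED:
		st->frag_frees++;	/* no RCU section, no mem->id test */
		return;
	case MEM_TYPE_PAGE_ORDER0:
		st->page_puts++;	/* no RCU section, no mem->id test */
		return;
	case MEM_TYPE_SKETCH_ALLOC:
		st->lookups++;		/* only this type pays for the lookup */
		return;
	default:
		return;			/* unknown type: nothing to release */
	}
}
```

With this shape the lookup cost is confined to the one type that would consume its result, which appears to be the point of the objection.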
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH v6 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
From: Alexander Duyck @ 2018-03-23 18:25 UTC (permalink / raw)
To: Sinan Kaya
Cc: Jeff Kirsher, sulrich, Netdev, Timur Tabi, LKML, intel-wired-lan,
linux-arm-msm, linux-arm-kernel
In-Reply-To: <1521829277-9398-8-git-send-email-okaya@codeaurora.org>
On Fri, Mar 23, 2018 at 11:21 AM, Sinan Kaya <okaya@codeaurora.org> wrote:
> The code has wmb() followed by writel() in multiple places. writel()
> already includes a barrier on some architectures, such as arm64.
>
> As a result, the CPU observes two barriers back to back before executing
> the register write.
>
> Since the code already has an explicit barrier call, change writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
> drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 21 ++++++++++++++++++---
> 1 file changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> index 815cb1a..9e684b1 100644
> --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> @@ -725,7 +725,12 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_ring *rx_ring,
> * such as IA-64).
> */
> wmb();
> - writel(i, rx_ring->tail);
> + writel_relaxed(i, rx_ring->tail);
> +
> + /* We need this if more than one processor can write to our tail
> + * at a time, it synchronizes IO on IA64/Altix systems
> + */
> + mmiowb();
> }
The mmiowb shouldn't be needed for Rx. Only one CPU will be running
NAPI for the queue and we will synchronize this with a full writel
anyway when we re-enable the interrupts.
> }
>
> @@ -1232,7 +1237,12 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
> * know there are new descriptors to fetch.
> */
> wmb();
> - writel(xdp_ring->next_to_use, xdp_ring->tail);
> + writel_relaxed(xdp_ring->next_to_use, xdp_ring->tail);
> +
> + /* We need this if more than one processor can write to our tail
> + * at a time, it synchronizes IO on IA64/Altix systems
> + */
> + mmiowb();
> }
>
> u64_stats_update_begin(&rx_ring->syncp);
> @@ -4004,7 +4014,12 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
> tx_ring->next_to_use = i;
>
> /* notify HW of packet */
> - writel(i, tx_ring->tail);
> + writel_relaxed(i, tx_ring->tail);
> +
> + /* We need this if more than one processor can write to our tail
> + * at a time, it synchronizes IO on IA64/Altix systems
> + */
> + mmiowb();
>
> return;
> dma_error:
> --
> 2.7.4
>
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH v6 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
From: Sinan Kaya @ 2018-03-23 18:27 UTC (permalink / raw)
To: Alexander Duyck
Cc: Jeff Kirsher, sulrich, Netdev, Timur Tabi, LKML, intel-wired-lan,
linux-arm-msm, linux-arm-kernel
In-Reply-To: <CAKgT0UfCm4SJAhwyHKrHZnoJWKpp6-DWWVUb-YKgA2NTQhx=RA@mail.gmail.com>
On 3/23/2018 2:25 PM, Alexander Duyck wrote:
>> + /* We need this if more than one processor can write to our tail
>> + * at a time, it synchronizes IO on IA64/Altix systems
>> + */
>> + mmiowb();
>> }
> The mmiowb shouldn't be needed for Rx. Only one CPU will be running
> NAPI for the queue and we will synchronize this with a full writel
> anyway when we re-enable the interrupts.
>
OK. I can fix this on the next version. I did a blanket search and replace for
my writel_relaxed() changes as I don't know the code well enough.
Please point me to the redundant ones.
--
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH v6 1/7] i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
From: Alexander Duyck @ 2018-03-23 18:30 UTC (permalink / raw)
To: Sinan Kaya
Cc: Jeff Kirsher, sulrich, Netdev, Timur Tabi, LKML, intel-wired-lan,
linux-arm-msm, linux-arm-kernel
In-Reply-To: <1521829277-9398-2-git-send-email-okaya@codeaurora.org>
On Fri, Mar 23, 2018 at 11:21 AM, Sinan Kaya <okaya@codeaurora.org> wrote:
> The code has wmb() followed by writel(). writel() already includes a
> barrier on some architectures, such as arm64.
>
> As a result, the CPU observes two barriers back to back before executing
> the register write.
>
> Since the code already has an explicit barrier call, change writel() to
> writel_relaxed().
>
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 24 ++++++++++++++++++++----
> drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 9 +++++++--
> 2 files changed, 27 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index c6972bd..fc10cc0 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -186,7 +186,13 @@ static int i40e_program_fdir_filter(struct i40e_fdir_filter *fdir_data,
> /* Mark the data descriptor to be watched */
> first->next_to_watch = tx_desc;
>
> - writel(tx_ring->next_to_use, tx_ring->tail);
> + writel_relaxed(tx_ring->next_to_use, tx_ring->tail);
> +
> + /* We need this if more than one processor can write to our tail
> + * at a time, it synchronizes IO on IA64/Altix systems
> + */
> + mmiowb();
> +
> return 0;
>
The addition of mmiowb here is valid. All of the others in this patch
are invalid.
> dma_fail:
> @@ -1529,7 +1535,12 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
> * such as IA-64).
> */
> wmb();
> - writel(val, rx_ring->tail);
> + writel_relaxed(val, rx_ring->tail);
> +
> + /* We need this if more than one processor can write to our tail
> + * at a time, it synchronizes IO on IA64/Altix systems
> + */
> + mmiowb();
> }
>
> /**
> @@ -2412,7 +2423,12 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
> */
> wmb();
>
> - writel(xdp_ring->next_to_use, xdp_ring->tail);
> + writel_relaxed(xdp_ring->next_to_use, xdp_ring->tail);
> +
> + /* We need this if more than one processor can write to our tail
> + * at a time, it synchronizes IO on IA64/Altix systems
> + */
> + mmiowb();
> }
>
> rx_ring->skb = skb;
> @@ -3437,7 +3453,7 @@ static inline int i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
>
> /* notify HW of packet */
> if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
> - writel(i, tx_ring->tail);
> + writel_relaxed(i, tx_ring->tail);
>
> /* we need this if more than one processor can write to our tail
> * at a time, it synchronizes IO on IA64/Altix systems
> diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
> index 1ae112f..ca02762 100644
> --- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
> @@ -810,7 +810,12 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
> * such as IA-64).
> */
> wmb();
> - writel(val, rx_ring->tail);
> + writel_relaxed(val, rx_ring->tail);
> +
> + /* We need this if more than one processor can write to our tail
> + * at a time, it synchronizes IO on IA64/Altix systems
> + */
> + mmiowb();
> }
>
> /**
> @@ -2379,7 +2384,7 @@ static inline void i40evf_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
>
> /* notify HW of packet */
> if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
> - writel(i, tx_ring->tail);
> + writel_relaxed(i, tx_ring->tail);
>
> /* we need this if more than one processor can write to our tail
> * at a time, it synchronizes IO on IA64/Altix systems
> --
> 2.7.4
>
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH v6 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
From: Alexander Duyck @ 2018-03-23 18:31 UTC (permalink / raw)
To: Sinan Kaya
Cc: Jeff Kirsher, sulrich, Netdev, Timur Tabi, LKML, intel-wired-lan,
linux-arm-msm, linux-arm-kernel
In-Reply-To: <cf6191c0-07cc-226b-d780-f488a0d7c779@codeaurora.org>
On Fri, Mar 23, 2018 at 11:27 AM, Sinan Kaya <okaya@codeaurora.org> wrote:
> On 3/23/2018 2:25 PM, Alexander Duyck wrote:
>>> + /* We need this if more than one processor can write to our tail
>>> + * at a time, it synchronizes IO on IA64/Altix systems
>>> + */
>>> + mmiowb();
>>> }
>> The mmiowb shouldn't be needed for Rx. Only one CPU will be running
>> NAPI for the queue and we will synchronize this with a full writel
>> anyway when we re-enable the interrupts.
>>
>
> OK. I can fix this on the next version. I did a blanket search and replace for
> my writel_relaxed() changes as I don't know the code well enough.
>
> Please point me to the redundant ones.
So from what I can tell only this file and i40e needed any additional
mmiowb calls added. The rest are not needed.
- Alex
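The ordering pattern under discussion (publish descriptors, one explicit barrier, then a relaxed tail write) can be illustrated with a user-space analogy. This is a sketch under loud assumptions, not driver code: `ring_sketch` and `ring_post()` are hypothetical, C11 atomics order ordinary memory between threads rather than MMIO against DMA, and there is no user-space counterpart to mmiowb(), so that part of the discussion is not modeled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u

/* Hypothetical ring: desc[] stands in for DMA descriptors and tail for the
 * device's tail register.
 */
struct ring_sketch {
	uint64_t desc[RING_SIZE];
	_Atomic uint32_t tail;
	uint32_t next_to_use;
};

void ring_post(struct ring_sketch *r, uint64_t d)
{
	uint32_t i;

	assert(r != NULL);

	i = r->next_to_use;
	r->desc[i] = d;			/* fill the descriptor first */
	i = (i + 1) % RING_SIZE;
	r->next_to_use = i;

	/* Single explicit fence, playing the role of wmb(): the descriptor
	 * store above is ordered before the tail store below.
	 */
	atomic_thread_fence(memory_order_release);

	/* Relaxed store, playing the role of writel_relaxed(): it adds no
	 * second barrier of its own.
	 */
	atomic_store_explicit(&r->tail, i, memory_order_relaxed);
}
```

The point of the analogy is only that one fence is enough when the store that follows it is relaxed; whether a given call site also needs mmiowb() depends on whether more than one CPU can write that tail, as discussed above.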
^ permalink raw reply
* [PATCH net-next] net/sched: act_vlan: declare push_vid with host byte order
From: Davide Caratti @ 2018-03-23 18:31 UTC (permalink / raw)
To: Jiri Pirko, David S. Miller; +Cc: netdev
Use u16 in place of __be16 to suppress the following sparse warnings:
net/sched/act_vlan.c:150:26: warning: incorrect type in assignment (different base types)
net/sched/act_vlan.c:150:26: expected restricted __be16 [usertype] push_vid
net/sched/act_vlan.c:150:26: got unsigned short
net/sched/act_vlan.c:151:21: warning: restricted __be16 degrades to integer
net/sched/act_vlan.c:208:26: warning: incorrect type in assignment (different base types)
net/sched/act_vlan.c:208:26: expected unsigned short [unsigned] [usertype] tcfv_push_vid
net/sched/act_vlan.c:208:26: got restricted __be16 [usertype] push_vid
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
net/sched/act_vlan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 4595391c2129..41a66effeb5f 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -117,7 +117,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
struct tc_vlan *parm;
struct tcf_vlan *v;
int action;
- __be16 push_vid = 0;
+ u16 push_vid = 0;
__be16 push_proto = 0;
u8 push_prio = 0;
bool exists = false;
--
2.14.3
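The reasoning behind the type change can be shown with a small user-space sketch. It assumes, based only on the sparse output above, that push_vid is read as a plain unsigned short, range-checked as an integer, and stored into a host-order field. `SKETCH_VID_LIMIT` and the three helper names are hypothetical, and the limit value 0x0fff is an assumption made for illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Hypothetical upper bound for a VLAN ID, assumed to be 0x0fff here. */
#define SKETCH_VID_LIMIT 0x0fff

/* Range checks only make sense on a host-order value, which is why a
 * host-order type (u16 in the kernel) fits a variable used this way.
 */
int vid_is_valid(uint16_t vid_host)
{
	return vid_host < SKETCH_VID_LIMIT;
}

/* Conversion to network byte order happens only at the wire boundary. */
uint16_t vid_to_wire(uint16_t vid_host)
{
	assert(vid_is_valid(vid_host));
	return htons(vid_host);
}

uint16_t vid_from_wire(uint16_t vid_be)
{
	return ntohs(vid_be);
}
```

Keeping the variable in host order and converting only where bytes leave the host is what lets a checker such as sparse flag accidental mixing of the two representations.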
^ permalink raw reply related
* Re: [GIT] 'net' merged into 'net-next'
From: Saeed Mahameed @ 2018-03-23 18:34 UTC (permalink / raw)
To: Jason Gunthorpe, davem@davemloft.net
Cc: dsahern@gmail.com, netdev@vger.kernel.org, Ido Schimmel,
dledford@redhat.com, sd@queasysnail.net
In-Reply-To: <20180323.133616.109678949206778090.davem@davemloft.net>
On Fri, 2018-03-23 at 13:36 -0400, David Miller wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> Date: Fri, 23 Mar 2018 11:26:22 -0600
>
> > On Fri, Mar 23, 2018 at 11:40:59AM -0400, David Miller wrote:
> > >
> > > This merge was a little bit more hectic than usual.
> > >
> > > But thankfully, I had some sample conflict resolutions to work
> > > with, in particular for the mlx5 infiniband changes which were
> > > the most difficult to resolve.
> > >
> > > Please double check my work and provide any fixup patches if
> > > necessary.
> >
> > The drivers/infiniband looks OK, and I also checked that merging
> > netdev and rdma together gets us to the right result.
>
> Thanks for looking at it.
Dave,
I would like to raise this again. We already suggested a way to avoid
this kind of failure in the future, but you seem to have missed it.
Basically, we want to keep an mlx5 core branch that is clean of netdev
and rdma changes and is submitted to both subsystems.
For example, if a netdev or rdma feature needs new mlx5 core
functionality, we will end up sending pull requests with the following
structure:
mlx5/core pull request -> goes to both trees (netdev and rdma).
mlx5 netdev part pull request -> goes to netdev only.
The same applies to rdma: the mlx5/core part goes to both trees, but
net-next is not required to pull the rdma part. We will get to review
it but will never need to pull it.
Is this something that could work?
Thanks,
Saeed.
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH v6 7/7] ixgbevf: eliminate duplicate barriers on weakly-ordered archs
From: Sinan Kaya @ 2018-03-23 18:45 UTC (permalink / raw)
To: Alexander Duyck
Cc: Jeff Kirsher, sulrich, Netdev, Timur Tabi, LKML, intel-wired-lan,
linux-arm-msm, linux-arm-kernel
In-Reply-To: <CAKgT0UeAJOdvqXfRw1xF+LnpGo3K7NQt_3DcJGUkiPMF+O8PTg@mail.gmail.com>
On 3/23/2018 2:31 PM, Alexander Duyck wrote:
>> Please point me to the redundant ones.
> So from what I can tell only this file and i40e needed any additional
> mmiowb calls added. The rest are not needed.
Thanks, I'll clean up patches 2 through 6 and then make your suggested
changes to patches 1 and 7.
--
Sinan Kaya
^ permalink raw reply
* Re: [iproute2 1/1] ss: Add support for TIPC socket diag in ss tool
From: Jon Maloy @ 2018-03-23 18:50 UTC (permalink / raw)
To: Mohan Krishna Ghanta Krishnamurthy,
tipc-discussion@lists.sourceforge.net, maloy@donjonn.com,
ying.xue@windriver.com, netdev@vger.kernel.org,
stephen@networkplumber.org
In-Reply-To: <1521813662-9954-2-git-send-email-mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Hi Mohan,
I remember you mentioned the possibility of adding this functionality to the tipc tool, too.
Would it be easy to add a 'tipc ss' command that just calls 'ss' with the same parameters, i.e., with no duplication of functionality?
///jon
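One way such a pass-through could look is sketched below. Nothing here comes from the tipc tool itself: `build_ss_argv()` and `cmd_ss_sketch()` are hypothetical names, and the sketch assumes the `--tipc` option added by the quoted patch is available in the installed `ss`. The wrapper prepends `ss --tipc` to the caller's arguments and replaces the current process with `ss`, so no socket-dumping logic is duplicated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Build the vector "ss --tipc <user args...>" terminated by NULL.
 * The caller owns the returned array (but not the strings in it).
 */
char **build_ss_argv(int argc, char **argv)
{
	char **out;
	int i;

	assert(argc >= 0);

	out = calloc((size_t)argc + 3, sizeof(*out));
	if (!out)
		return NULL;

	out[0] = "ss";
	out[1] = "--tipc";	/* option introduced by the quoted patch */
	for (i = 0; i < argc; i++)
		out[i + 2] = argv[i];
	/* out[argc + 2] is already NULL thanks to calloc() */
	return out;
}

/* Hypothetical 'tipc ss' handler: hand everything over to ss. */
int cmd_ss_sketch(int argc, char **argv)
{
	char **ss_argv = build_ss_argv(argc, argv);

	if (!ss_argv)
		return -1;

	execvp(ss_argv[0], ss_argv);

	/* execvp() only returns on failure */
	free(ss_argv);
	return -1;
}
```

Under these assumptions, a `tipc ss --tipcinfo` invocation would end up running `ss --tipc --tipcinfo`.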
> -----Original Message-----
> From: Mohan Krishna Ghanta Krishnamurthy
> Sent: Friday, March 23, 2018 10:01
> To: tipc-discussion@lists.sourceforge.net; Jon Maloy
> <jon.maloy@ericsson.com>; maloy@donjonn.com;
> ying.xue@windriver.com; Mohan Krishna Ghanta Krishnamurthy
> <mohan.krishna.ghanta.krishnamurthy@ericsson.com>;
> netdev@vger.kernel.org; stephen@networkplumber.org
> Cc: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com>
> Subject: [iproute2 1/1] ss: Add support for TIPC socket diag in ss tool
>
> For iproute 4.x
> Allow TIPC socket statistics to be dumped with --tipc and tipc specific info
> with --tipcinfo.
>
> Acked-by: Jon Maloy <jon.maloy@ericsson.com>
> Signed-off-by: GhantaKrishnamurthy MohanKrishna
> <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
> Signed-off-by: Parthasarathy Bhuvaragan
> <parthasarathy.bhuvaragan@gmail.com>
> ---
> misc/ss.c | 166 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 164 insertions(+), 2 deletions(-)
>
> diff --git a/misc/ss.c b/misc/ss.c
> index e047f9c04582..812f45717af9 100644
> --- a/misc/ss.c
> +++ b/misc/ss.c
> @@ -45,6 +45,10 @@
> #include <linux/netlink_diag.h>
> #include <linux/sctp.h>
> #include <linux/vm_sockets_diag.h>
> +#include <linux/net.h>
> +#include <linux/tipc.h>
> +#include <linux/tipc_netlink.h>
> +#include <linux/tipc_sockets_diag.h>
>
> #define MAGIC_SEQ 123456
> #define BUF_CHUNK (1024 * 1024)
> @@ -104,6 +108,7 @@ int show_sock_ctx;
> int show_header = 1;
> int follow_events;
> int sctp_ino;
> +int show_tipcinfo;
>
> enum col_id {
> COL_NETID,
> @@ -191,6 +196,7 @@ enum {
> SCTP_DB,
> VSOCK_ST_DB,
> VSOCK_DG_DB,
> + TIPC_DB,
> MAX_DB
> };
>
> @@ -230,6 +236,7 @@ enum {
>
> #define SS_ALL ((1 << SS_MAX) - 1)
> #define SS_CONN (SS_ALL & ~((1<<SS_LISTEN)|(1<<SS_CLOSE)|(1<<SS_TIME_WAIT)|(1<<SS_SYN_RECV)))
> +#define TIPC_SS_CONN ((1<<SS_ESTABLISHED)|(1<<SS_LISTEN)|(1<<SS_CLOSE))
>
> #include "ssfilter.h"
>
> @@ -297,6 +304,10 @@ static const struct filter default_dbs[MAX_DB] = {
> .states = SS_CONN,
> .families = FAMILY_MASK(AF_VSOCK),
> },
> + [TIPC_DB] = {
> + .states = TIPC_SS_CONN,
> + .families = FAMILY_MASK(AF_TIPC),
> + },
> };
>
> static const struct filter default_afs[AF_MAX] = {
> @@ -324,6 +335,10 @@ static const struct filter default_afs[AF_MAX] = {
> .dbs = VSOCK_DBM,
> .states = SS_CONN,
> },
> + [AF_TIPC] = {
> + .dbs = (1 << TIPC_DB),
> + .states = TIPC_SS_CONN,
> + },
> };
>
> static int do_default = 1;
> @@ -364,6 +379,7 @@ static void filter_default_dbs(struct filter *f)
> filter_db_set(f, SCTP_DB);
> filter_db_set(f, VSOCK_ST_DB);
> filter_db_set(f, VSOCK_DG_DB);
> + filter_db_set(f, TIPC_DB);
> }
>
> static void filter_states_set(struct filter *f, int states)
> @@ -748,6 +764,14 @@ static const char *sctp_sstate_name[] = {
> [SCTP_STATE_SHUTDOWN_ACK_SENT] = "ACK_SENT",
> };
>
> +static const char * const stype_nameg[] = {
> + "UNKNOWN",
> + [SOCK_STREAM] = "STREAM",
> + [SOCK_DGRAM] = "DGRAM",
> + [SOCK_RDM] = "RDM",
> + [SOCK_SEQPACKET] = "SEQPACKET",
> +};
> +
> struct sockstat {
> struct sockstat *next;
> unsigned int type;
> @@ -888,6 +912,22 @@ static const char *vsock_netid_name(int type)
> }
> }
>
> +static const char *tipc_netid_name(int type) {
> + switch (type) {
> + case SOCK_STREAM:
> + return "ti_st";
> + case SOCK_DGRAM:
> + return "ti_dg";
> + case SOCK_RDM:
> + return "ti_rd";
> + case SOCK_SEQPACKET:
> + return "ti_sq";
> + default:
> + return "???";
> + }
> +}
> +
> /* Allocate and initialize a new buffer chunk */
> static struct buf_chunk *buf_chunk_new(void)
> {
> @@ -1274,6 +1314,9 @@ static void sock_state_print(struct sockstat *s)
> case AF_NETLINK:
> sock_name = "nl";
> break;
> + case AF_TIPC:
> + sock_name = tipc_netid_name(s->type);
> + break;
> case AF_VSOCK:
> sock_name = vsock_netid_name(s->type);
> break;
> @@ -4250,6 +4293,105 @@ static int vsock_show(struct filter *f)
> return handle_netlink_request(f, &req.nlh, sizeof(req), vsock_show_sock);
> }
>
> +static void tipc_sock_addr_print(struct rtattr *net_addr, struct rtattr *id)
> +{
> + uint32_t node = rta_getattr_u32(net_addr);
> + uint32_t identity = rta_getattr_u32(id);
> +
> + SPRINT_BUF(addr) = {};
> + SPRINT_BUF(port) = {};
> +
> + sprintf(addr, "%u", node);
> + sprintf(port, "%u", identity);
> + sock_addr_print(addr, ":", port, NULL);
> +
> +}
> +
> +static int tipc_show_sock(const struct sockaddr_nl *addr, struct nlmsghdr *nlh,
> + void *arg)
> +{
> + struct rtattr *stat[TIPC_NLA_SOCK_STAT_MAX + 1] = {};
> + struct rtattr *attrs[TIPC_NLA_SOCK_MAX + 1] = {};
> + struct rtattr *con[TIPC_NLA_CON_MAX + 1] = {};
> + struct rtattr *info[TIPC_NLA_MAX + 1] = {};
> + struct rtattr *msg_ref;
> + struct sockstat ss = {};
> +
> + parse_rtattr(info, TIPC_NLA_MAX, NLMSG_DATA(nlh),
> + NLMSG_PAYLOAD(nlh, 0));
> +
> + if (!info[TIPC_NLA_SOCK])
> + return 0;
> +
> + msg_ref = info[TIPC_NLA_SOCK];
> + parse_rtattr(attrs, TIPC_NLA_SOCK_MAX, RTA_DATA(msg_ref),
> + RTA_PAYLOAD(msg_ref));
> +
> + msg_ref = attrs[TIPC_NLA_SOCK_STAT];
> + parse_rtattr(stat, TIPC_NLA_SOCK_STAT_MAX,
> + RTA_DATA(msg_ref), RTA_PAYLOAD(msg_ref));
> +
> +
> + ss.local.family = AF_TIPC;
> + ss.type = rta_getattr_u32(attrs[TIPC_NLA_SOCK_TYPE]);
> + ss.state = rta_getattr_u32(attrs[TIPC_NLA_SOCK_TIPC_STATE]);
> + ss.uid = rta_getattr_u32(attrs[TIPC_NLA_SOCK_UID]);
> + ss.ino = rta_getattr_u32(attrs[TIPC_NLA_SOCK_INO]);
> + ss.rq = rta_getattr_u32(stat[TIPC_NLA_SOCK_STAT_RCVQ]);
> + ss.wq = rta_getattr_u32(stat[TIPC_NLA_SOCK_STAT_SENDQ]);
> + ss.sk = rta_getattr_u64(attrs[TIPC_NLA_SOCK_COOKIE]);
> +
> + sock_state_print (&ss);
> +
> + tipc_sock_addr_print(attrs[TIPC_NLA_SOCK_ADDR],
> + attrs[TIPC_NLA_SOCK_REF]);
> +
> + msg_ref = attrs[TIPC_NLA_SOCK_CON];
> + if (msg_ref) {
> + parse_rtattr(con, TIPC_NLA_CON_MAX,
> + RTA_DATA(msg_ref), RTA_PAYLOAD(msg_ref));
> +
> + tipc_sock_addr_print(con[TIPC_NLA_CON_NODE],
> + con[TIPC_NLA_CON_SOCK]);
> + } else
> + sock_addr_print("", "-", "", NULL);
> +
> + if (show_details)
> + sock_details_print(&ss);
> +
> + proc_ctx_print(&ss);
> +
> + if (show_tipcinfo) {
> + out("\n type:%s", stype_nameg[ss.type]);
> + out(" cong:%s ",
> + stat[TIPC_NLA_SOCK_STAT_LINK_CONG] ? "link" :
> + stat[TIPC_NLA_SOCK_STAT_CONN_CONG] ? "conn" : "none");
> + out(" drop:%d ",
> + rta_getattr_u32(stat[TIPC_NLA_SOCK_STAT_DROP]));
> +
> + if (attrs[TIPC_NLA_SOCK_HAS_PUBL])
> + out(" publ");
> +
> + if (con[TIPC_NLA_CON_FLAG])
> + out(" via {%u,%u} ",
> + rta_getattr_u32(con[TIPC_NLA_CON_TYPE]),
> + rta_getattr_u32(con[TIPC_NLA_CON_INST]));
> + }
> +
> + return 0;
> +}
> +
> +static int tipc_show(struct filter *f)
> +{
> + DIAG_REQUEST(req, struct tipc_sock_diag_req r);
> +
> + memset(&req.r, 0, sizeof(req.r));
> + req.r.sdiag_family = AF_TIPC;
> + req.r.tidiag_states = f->states;
> +
> + return handle_netlink_request(f, &req.nlh, sizeof(req), tipc_show_sock);
> +}
> +
> struct sock_diag_msg {
> __u8 sdiag_family;
> };
> @@ -4494,6 +4636,7 @@ static void _usage(FILE *dest)
> " -m, --memory show socket memory usage\n"
> " -p, --processes show process using socket\n"
> " -i, --info show internal TCP information\n"
> +" --tipcinfo show internal tipc socket information\n"
> " -s, --summary show socket usage summary\n"
> " -b, --bpf show bpf filter socket information\n"
> " -E, --events continually display sockets as they are destroyed\n"
> @@ -4510,15 +4653,16 @@ static void _usage(FILE *dest)
> " -d, --dccp display only DCCP sockets\n"
> " -w, --raw display only RAW sockets\n"
> " -x, --unix display only Unix domain sockets\n"
> +" --tipc display only TIPC sockets\n"
> " --vsock display only vsock sockets\n"
> " -f, --family=FAMILY display sockets of type FAMILY\n"
> -" FAMILY := {inet|inet6|link|unix|netlink|vsock|help}\n"
> +" FAMILY := {inet|inet6|link|unix|netlink|vsock|tipc|help}\n"
> "\n"
> " -K, --kill forcibly close sockets, display what was closed\n"
> " -H, --no-header Suppress header line\n"
> "\n"
> " -A, --query=QUERY, --socket=QUERY\n"
> -" QUERY := {all|inet|tcp|udp|raw|unix|unix_dgram|unix_stream|unix_seqpacket|packet|netlink|vsock_stream|vsock_dgram}[,QUERY]\n"
> +" QUERY := {all|inet|tcp|udp|raw|unix|unix_dgram|unix_stream|unix_seqpacket|packet|netlink|vsock_stream|vsock_dgram|tipc}[,QUERY]\n"
> "\n"
> " -D, --diag=FILE Dump raw information about TCP sockets to FILE\n"
> " -F, --filter=FILE read filter information from FILE\n"
> @@ -4594,6 +4738,10 @@ static int scan_state(const char *state)
> /* Values 'v' and 'V' are already used so a non-character is used */
> #define OPT_VSOCK 256
>
> +/* Values of 't' are already used so a non-character is used */
> +#define OPT_TIPCSOCK 257
> +#define OPT_TIPCINFO 258
> +
> static const struct option long_opts[] = {
> { "numeric", 0, 0, 'n' },
> { "resolve", 0, 0, 'r' },
> @@ -4610,6 +4758,7 @@ static const struct option long_opts[] = {
> { "udp", 0, 0, 'u' },
> { "raw", 0, 0, 'w' },
> { "unix", 0, 0, 'x' },
> + { "tipc", 0, 0, OPT_TIPCSOCK},
> { "vsock", 0, 0, OPT_VSOCK },
> { "all", 0, 0, 'a' },
> { "listening", 0, 0, 'l' },
> @@ -4627,6 +4776,7 @@ static const struct option long_opts[] = {
> { "context", 0, 0, 'Z' },
> { "contexts", 0, 0, 'z' },
> { "net", 1, 0, 'N' },
> + { "tipcinfo", 0, 0, OPT_TIPCINFO},
> { "kill", 0, 0, 'K' },
> { "no-header", 0, 0, 'H' },
> { 0 }
> @@ -4699,6 +4849,9 @@ int main(int argc, char *argv[])
> case OPT_VSOCK:
> filter_af_set(&current_filter, AF_VSOCK);
> break;
> + case OPT_TIPCSOCK:
> + filter_af_set(&current_filter, AF_TIPC);
> + break;
> case 'a':
> state_filter = SS_ALL;
> break;
> @@ -4725,6 +4878,8 @@ int main(int argc, char *argv[])
> filter_af_set(&current_filter, AF_UNIX);
> else if (strcmp(optarg, "netlink") == 0)
> filter_af_set(&current_filter, AF_NETLINK);
> + else if (strcmp(optarg, "tipc") == 0)
> + filter_af_set(&current_filter, AF_TIPC);
> else if (strcmp(optarg, "vsock") == 0)
> filter_af_set(&current_filter, AF_VSOCK);
> else if (strcmp(optarg, "help") == 0)
> @@ -4801,6 +4956,8 @@ int main(int argc, char *argv[])
> } else if (strcmp(p, "vsock_dgram") == 0 ||
> strcmp(p, "v_dgr") == 0) {
> filter_db_set(&current_filter, VSOCK_DG_DB);
> + } else if (strcmp(optarg, "tipc") == 0) {
> + filter_db_set(&current_filter, TIPC_DB);
> } else {
> fprintf(stderr, "ss: \"%s\" is illegal socket table id\n", p);
> usage();
> @@ -4848,6 +5005,9 @@ int main(int argc, char *argv[])
> if (netns_switch(optarg))
> exit(1);
> break;
> + case OPT_TIPCINFO:
> + show_tipcinfo = 1;
> + break;
> case 'K':
> current_filter.kill = 1;
> break;
> @@ -4979,6 +5139,8 @@ int main(int argc, char *argv[])
> sctp_show(&current_filter);
> if (current_filter.dbs & VSOCK_DBM)
> vsock_show(&current_filter);
> + if (current_filter.dbs & (1<<TIPC_DB))
> + tipc_show(&current_filter);
>
> if (show_users || show_proc_ctx || show_sock_ctx)
> user_ent_destroy();
> --
> 2.1.4
^ permalink raw reply