* Re: [PATCH 2/3] bridge: offload bridge port attributes to switch asic if feature flag set
From: Jiri Pirko @ 2014-12-05 7:38 UTC (permalink / raw)
To: roopa
Cc: sfeldma, jhs, bcrl, tgraf, john.fastabend, stephen, linville,
nhorman, nicolas.dichtel, vyasevic, f.fainelli, buytenh, aviadr,
netdev, davem, shm, gospo
In-Reply-To: <1417746401-8140-3-git-send-email-roopa@cumulusnetworks.com>
Fri, Dec 05, 2014 at 03:26:40AM CET, roopa@cumulusnetworks.com wrote:
>From: Roopa Prabhu <roopa@cumulusnetworks.com>
>
>This allows offloading to switch asic without having the user to set
>any flag. And this is done in the bridge driver to rollback kernel settings
>on hw offload failure if required in the future.
>
>With this, it also makes sure a notification goes out only after the
>attributes are set both in the kernel and hw.
>---
> net/bridge/br_netlink.c | 27 ++++++++++++++++++++++++++-
> 1 file changed, 26 insertions(+), 1 deletion(-)
>
>diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
>index 9f5eb55..ce173f0 100644
>--- a/net/bridge/br_netlink.c
>+++ b/net/bridge/br_netlink.c
>@@ -407,9 +407,21 @@ int br_setlink(struct net_device *dev, struct nlmsghdr *nlh)
> afspec, RTM_SETLINK);
> }
>
>+ if ((dev->features & NETIF_F_HW_SWITCH_OFFLOAD) &&
>+ dev->netdev_ops->ndo_bridge_setlink) {
>+ int ret = dev->netdev_ops->ndo_bridge_setlink(dev, nlh);
This (and I suspect other patches as well) has many issues which
are pointed out by scripts/checkpatch.pl sctipts. For example
here, there should be an empty line. Please run your patches by
this script before you send them.
>+ if (ret && ret != -EOPNOTSUPP) {
>+ /* XXX Fix this in the future to rollback
>+ * kernel settings and return error
>+ */
>+ br_warn(p->br, "error offloading bridge attributes "
>+ "on port %u(%s)\n", (unsigned int) p->port_no,
>+ p->dev->name);
>+ }
>+ }
>+
> if (err == 0)
> br_ifinfo_notify(RTM_NEWLINK, p);
>-
> out:
> return err;
> }
>@@ -433,6 +445,19 @@ int br_dellink(struct net_device *dev, struct nlmsghdr *nlh)
> err = br_afspec((struct net_bridge *)netdev_priv(dev), p,
> afspec, RTM_DELLINK);
>
>+ if (dev->features & NETIF_F_HW_SWITCH_OFFLOAD
>+ && dev->netdev_ops->ndo_bridge_setlink) {
>+ int ret = dev->netdev_ops->ndo_bridge_dellink(dev, nlh);
c&p issue, there should be check for dellink here.
>+ if (ret && ret != -EOPNOTSUPP) {
>+ /* XXX Fix this in the future to rollback
>+ * kernel settings and return error
>+ */
>+ br_warn(p->br, "error offloading bridge attributes "
>+ "on port %u(%s)\n", (unsigned int) p->port_no,
>+ p->dev->name);
>+ }
>+ }
>+
I agree with Scott that this code should be moved to rtnetlink.c
> return err;
> }
> static int br_validate(struct nlattr *tb[], struct nlattr *data[])
>--
>1.7.10.4
>
^ permalink raw reply
* Re: [PATCH v3 net-next 2/2 tuntap: Increase the number of queues in tun.
From: Jason Wang @ 2014-12-05 7:35 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Pankaj Gupta, linux-kernel, netdev, davem, dgibson, vfalico,
edumazet, vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings,
xii, stephen, jiri, sergei.shtylyov
In-Reply-To: <20141204102013.GC17122@redhat.com>
On 12/04/2014 06:20 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 04, 2014 at 03:03:34AM +0008, Jason Wang wrote:
>> >
>> >
>> > On Wed, Dec 3, 2014 at 5:52 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> > >On Wed, Dec 03, 2014 at 12:49:37PM +0530, Pankaj Gupta wrote:
>>>> > >> Networking under kvm works best if we allocate a per-vCPU RX and TX
>>>> > >> queue in a virtual NIC. This requires a per-vCPU queue on the host
>>>> > >>side.
>>>> > >> It is now safe to increase the maximum number of queues.
>>>> > >> Preceding patche: 'net: allow large number of rx queues'
>>> > >
>>> > >s/patche/patch/
>>> > >
>>>> > >> made sure this won't cause failures due to high order memory
>>>> > >> allocations. Increase it to 256: this is the max number of vCPUs
>>>> > >> KVM supports.
>>>> > >> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
>>>> > >> Reviewed-by: David Gibson <dgibson@redhat.com>
>>> > >
>>> > >Hmm it's kind of nasty that each tun device is now using x16 memory.
>>> > >Maybe we should look at using a flex array instead, and removing the
>>> > >limitation altogether (e.g. make it INT_MAX)?
>> >
>> > But this only happens when IFF_MULTIQUEUE were used.
> I refer to this field:
> struct tun_file __rcu *tfiles[MAX_TAP_QUEUES];
> if we make MAX_TAP_QUEUES 256, this will use 4K bytes,
> apparently unconditionally.
>
>
How about just allocate one tfile if IFF_MULTIQUEUE were disabled?
^ permalink raw reply
* Re: [patch] ipvs: uninitialized data with IP_VS_IPV6
From: Dan Carpenter @ 2014-12-05 7:25 UTC (permalink / raw)
To: Julian Anastasov
Cc: Wensong Zhang, Simon Horman, Pablo Neira Ayuso, Patrick McHardy,
Jozsef Kadlecsik, David S. Miller, netdev, lvs-devel,
netfilter-devel, coreteam, kernel-janitors
In-Reply-To: <alpine.LFD.2.11.1412042308100.4841@ja.home.ssi.bg>
On Thu, Dec 04, 2014 at 11:19:34PM +0200, Julian Anastasov wrote:
>
> I guess ip_vs_ftp_in() needs the same fix?
Good catch. Thanks for noticing that. I'll send a v2.
regards,
dan carpenter
^ permalink raw reply
* Re: [PATCH v2 1/6] net-PPP: Replacement of a printk() call by pr_warn() in mppe_rekey()
From: Julia Lawall @ 2014-12-05 7:21 UTC (permalink / raw)
To: Joe Perches
Cc: SF Markus Elfring, Sergei Shtylyov, Paul Mackerras, linux-ppp,
netdev, Eric Dumazet, LKML, kernel-janitors
In-Reply-To: <1417731809.2721.17.camel@perches.com>
On Thu, 4 Dec 2014, Joe Perches wrote:
> On Thu, 2014-12-04 at 23:10 +0100, SF Markus Elfring wrote:
> > The mppe_rekey() function contained a few update candidates.
> > * Curly brackets were still used around a single function call "printk".
> > * Unwanted space characters
> >
> > Let us improve these implementation details according to the current Linux
> > coding style convention.
>
> trivia:
>
> > diff --git a/drivers/net/ppp/ppp_mppe.c b/drivers/net/ppp/ppp_mppe.c
> []
> > @@ -172,9 +172,8 @@ static void mppe_rekey(struct ppp_mppe_state * state, int initial_key)
> > setup_sg(sg_in, state->sha1_digest, state->keylen);
> > setup_sg(sg_out, state->session_key, state->keylen);
> > if (crypto_blkcipher_encrypt(&desc, sg_out, sg_in,
> > - state->keylen) != 0) {
> > - printk(KERN_WARNING "mppe_rekey: cipher_encrypt failed\n");
> > - }
> > + state->keylen) != 0)
> > + pr_warn("mppe_rekey: cipher_encrypt failed\n");
>
> It's generally nicer to replace embedded function names
> with "%s: ", __func__
>
> pr_warn("%s: cipher_encrypt failed\n", __func__);
Doing so may potentially allow some strings to be shared, thus saving a
little space. Perhaps not in this case, though.
julia
^ permalink raw reply
* Re: [PATCH v2 1/6] net-PPP: Replacement of a printk() call by pr_warn() in mppe_rekey()
From: SF Markus Elfring @ 2014-12-05 7:18 UTC (permalink / raw)
To: Joe Perches
Cc: Sergei Shtylyov, Paul Mackerras, linux-ppp, netdev, Eric Dumazet,
LKML, kernel-janitors, Julia Lawall
In-Reply-To: <1417733101.2721.20.camel@perches.com>
>>> It's generally nicer to replace embedded function names
>>> with "%s: ", __func__
>>>
>>> pr_warn("%s: cipher_encrypt failed\n", __func__);
>>
>> Do you want that I send a third patch series for the fine-tuning of these parameters?
>
> If you want.
Would "a committer" fix such a small source code adjustment also without a resend of
a patch series?
> I just wanted you to be aware of it for future patches.
Thanks for your tip.
Does it make sense to express such implementation details in the Linux coding
style documentation more explicitly (besides the fact that this update suggestion
was also triggered by a warning from the script "checkpatch.pl").
Regards,
Markus
^ permalink raw reply
* [RESEND PATCH] net/socket.c : introduce helper function do_sock_sendmsg to replace reduplicate code
From: Gu Zheng @ 2014-12-05 7:14 UTC (permalink / raw)
To: davem; +Cc: netdev, linux-kernel, Gu Zheng
In-Reply-To: <1416799125-13972-1-git-send-email-guz.fnst@cn.fujitsu.com>
Introduce helper function do_sock_sendmsg() to simplify sock_sendmsg{_nosec},
and replace reduplicate code.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
---
net/socket.c | 22 ++++++++++------------
1 files changed, 10 insertions(+), 12 deletions(-)
diff --git a/net/socket.c b/net/socket.c
index fe20c31..2a6f3c6 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -651,7 +651,8 @@ static inline int __sock_sendmsg(struct kiocb *iocb, struct socket *sock,
return err ?: __sock_sendmsg_nosec(iocb, sock, msg, size);
}
-int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
+static int do_sock_sendmsg(struct socket *sock, struct msghdr *msg,
+ size_t size, bool nosec)
{
struct kiocb iocb;
struct sock_iocb siocb;
@@ -659,25 +660,22 @@ int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
init_sync_kiocb(&iocb, NULL);
iocb.private = &siocb;
- ret = __sock_sendmsg(&iocb, sock, msg, size);
+ ret = nosec ? __sock_sendmsg_nosec(&iocb, sock, msg, size) :
+ __sock_sendmsg(&iocb, sock, msg, size);
if (-EIOCBQUEUED == ret)
ret = wait_on_sync_kiocb(&iocb);
return ret;
}
+
+int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
+{
+ return do_sock_sendmsg(sock, msg, size, false);
+}
EXPORT_SYMBOL(sock_sendmsg);
static int sock_sendmsg_nosec(struct socket *sock, struct msghdr *msg, size_t size)
{
- struct kiocb iocb;
- struct sock_iocb siocb;
- int ret;
-
- init_sync_kiocb(&iocb, NULL);
- iocb.private = &siocb;
- ret = __sock_sendmsg_nosec(&iocb, sock, msg, size);
- if (-EIOCBQUEUED == ret)
- ret = wait_on_sync_kiocb(&iocb);
- return ret;
+ return do_sock_sendmsg(sock, msg, size, true);
}
int kernel_sendmsg(struct socket *sock, struct msghdr *msg,
--
1.7.7
.
^ permalink raw reply related
* [RESEND PATCH] net: introduce helper macra CMSG_FOREACH_HDR
From: Gu Zheng @ 2014-12-05 7:14 UTC (permalink / raw)
To: davem; +Cc: netdev, linux-kernel, Gu Zheng
Introduce helper macra CMSG_FOREACH_HDR as a wrapper of the enumerating
cmsghdr from msghdr, just cleanup.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
---
.../networking/timestamping/timestamping.c | 4 +---
.../networking/timestamping/txtimestamp.c | 4 +---
crypto/af_alg.c | 2 +-
include/linux/socket.h | 4 ++++
net/core/scm.c | 3 +--
net/dccp/proto.c | 5 ++---
net/ipv4/ip_sockglue.c | 2 +-
net/ipv6/datagram.c | 2 +-
net/iucv/af_iucv.c | 4 +---
net/rds/send.c | 4 ++--
net/rxrpc/ar-output.c | 2 +-
net/sctp/socket.c | 3 +--
12 files changed, 17 insertions(+), 22 deletions(-)
diff --git a/Documentation/networking/timestamping/timestamping.c b/Documentation/networking/timestamping/timestamping.c
index 5cdfd74..3106e88 100644
--- a/Documentation/networking/timestamping/timestamping.c
+++ b/Documentation/networking/timestamping/timestamping.c
@@ -169,9 +169,7 @@ static void printpacket(struct msghdr *msg, int res,
res,
inet_ntoa(from_addr->sin_addr),
msg->msg_controllen);
- for (cmsg = CMSG_FIRSTHDR(msg);
- cmsg;
- cmsg = CMSG_NXTHDR(msg, cmsg)) {
+ CMSG_FOREACH_HDR(cmsg, msg) {
printf(" cmsg len %zu: ", cmsg->cmsg_len);
switch (cmsg->cmsg_level) {
case SOL_SOCKET:
diff --git a/Documentation/networking/timestamping/txtimestamp.c b/Documentation/networking/timestamping/txtimestamp.c
index b32fc2a..e44ef35 100644
--- a/Documentation/networking/timestamping/txtimestamp.c
+++ b/Documentation/networking/timestamping/txtimestamp.c
@@ -149,9 +149,7 @@ static void __recv_errmsg_cmsg(struct msghdr *msg, int payload_len)
struct scm_timestamping *tss = NULL;
struct cmsghdr *cm;
- for (cm = CMSG_FIRSTHDR(msg);
- cm && cm->cmsg_len;
- cm = CMSG_NXTHDR(msg, cm)) {
+ CMSG_FOREACH_HDR(cmsg, msg) {
if (cm->cmsg_level == SOL_SOCKET &&
cm->cmsg_type == SCM_TIMESTAMPING) {
tss = (void *) CMSG_DATA(cm);
diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index 6a3ad80..3df7d53 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -399,7 +399,7 @@ int af_alg_cmsg_send(struct msghdr *msg, struct af_alg_control *con)
{
struct cmsghdr *cmsg;
- for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
+ CMSG_FOREACH_HDR(cmsg, msg) {
if (!CMSG_OK(msg, cmsg))
return -EINVAL;
if (cmsg->cmsg_level != SOL_ALG)
diff --git a/include/linux/socket.h b/include/linux/socket.h
index bb9b836..d4b592f 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -94,6 +94,10 @@ struct cmsghdr {
(cmsg)->cmsg_len <= (unsigned long) \
((mhdr)->msg_controllen - \
((char *)(cmsg) - (char *)(mhdr)->msg_control)))
+#define CMSG_FOREACH_HDR(cmsg, msg) \
+ for (cmsg = CMSG_FIRSTHDR(msg); \
+ cmsg; \
+ cmsg = CMSG_NXTHDR(msg, cmsg))
/*
* Get the next cmsg header
diff --git a/net/core/scm.c b/net/core/scm.c
index b442e7e..e938c49 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -129,8 +129,7 @@ int __scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *p)
struct cmsghdr *cmsg;
int err;
- for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg))
- {
+ CMSG_FOREACH_HDR(cmsg, msg) {
err = -EINVAL;
/* Verify that cmsg_len is at least sizeof(struct cmsghdr) */
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 5ab6627..d449cc5 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -703,7 +703,7 @@ EXPORT_SYMBOL_GPL(compat_dccp_getsockopt);
static int dccp_msghdr_parse(struct msghdr *msg, struct sk_buff *skb)
{
- struct cmsghdr *cmsg = CMSG_FIRSTHDR(msg);
+ struct cmsghdr *cmsg;
/*
* Assign an (opaque) qpolicy priority value to skb->priority.
@@ -717,8 +717,7 @@ static int dccp_msghdr_parse(struct msghdr *msg, struct sk_buff *skb)
*/
skb->priority = 0;
- for (; cmsg != NULL; cmsg = CMSG_NXTHDR(msg, cmsg)) {
-
+ CMSG_FOREACH_HDR(cmsg, msg) {
if (!CMSG_OK(msg, cmsg))
return -EINVAL;
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 9daf217..14a6f71 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -192,7 +192,7 @@ int ip_cmsg_send(struct net *net, struct msghdr *msg, struct ipcm_cookie *ipc,
int err, val;
struct cmsghdr *cmsg;
- for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
+ CMSG_FOREACH_HDR(cmsg, msg) {
if (!CMSG_OK(msg, cmsg))
return -EINVAL;
#if IS_ENABLED(CONFIG_IPV6)
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 2cdc383..9895b98 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -640,7 +640,7 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
int len;
int err = 0;
- for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
+ CMSG_FOREACH_HDR(cmsg, msg) {
int addr_type;
if (!CMSG_OK(msg, cmsg)) {
diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index a089b6b..eae4e08 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -1070,9 +1070,7 @@ static int iucv_sock_sendmsg(struct kiocb *iocb, struct socket *sock,
txmsg.class = 0;
/* iterate over control messages */
- for (cmsg = CMSG_FIRSTHDR(msg); cmsg;
- cmsg = CMSG_NXTHDR(msg, cmsg)) {
-
+ CMSG_FOREACH_HDR(cmsg, msg) {
if (!CMSG_OK(msg, cmsg)) {
err = -EINVAL;
goto out;
diff --git a/net/rds/send.c b/net/rds/send.c
index 0a64541..1accb3e 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -826,7 +826,7 @@ static int rds_rm_size(struct msghdr *msg, int data_len)
int cmsg_groups = 0;
int retval;
- for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
+ CMSG_FOREACH_HDR(cmsg, msg) {
if (!CMSG_OK(msg, cmsg))
return -EINVAL;
@@ -878,7 +878,7 @@ static int rds_cmsg_send(struct rds_sock *rs, struct rds_message *rm,
struct cmsghdr *cmsg;
int ret = 0;
- for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
+ CMSG_FOREACH_HDR(cmsg, msg) {
if (!CMSG_OK(msg, cmsg))
return -EINVAL;
diff --git a/net/rxrpc/ar-output.c b/net/rxrpc/ar-output.c
index 0b4b9a7..f915e7e 100644
--- a/net/rxrpc/ar-output.c
+++ b/net/rxrpc/ar-output.c
@@ -45,7 +45,7 @@ static int rxrpc_sendmsg_cmsg(struct rxrpc_sock *rx, struct msghdr *msg,
if (msg->msg_controllen == 0)
return -EINVAL;
- for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
+ CMSG_FOREACH_HDR(cmsg, msg) {
if (!CMSG_OK(msg, cmsg))
return -EINVAL;
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 634a2ab..58834d7 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -6592,8 +6592,7 @@ static int sctp_msghdr_parse(const struct msghdr *msg, sctp_cmsgs_t *cmsgs)
struct cmsghdr *cmsg;
struct msghdr *my_msg = (struct msghdr *)msg;
- for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL;
- cmsg = CMSG_NXTHDR(my_msg, cmsg)) {
+ CMSG_FOREACH_HDR(cmsg, my_msg) {
if (!CMSG_OK(my_msg, cmsg))
return -EINVAL;
--
1.7.7
.
^ permalink raw reply related
* Re: [PATCH 2/3] bridge: offload bridge port attributes to switch asic if feature flag set
From: Roopa Prabhu @ 2014-12-05 7:10 UTC (permalink / raw)
To: John Fastabend
Cc: Scott Feldman, Jiří Pírko, Jamal Hadi Salim,
Benjamin LaHaise, Thomas Graf, stephen@networkplumber.org,
John Linville, nhorman@tuxdriver.com, Nicolas Dichtel,
vyasevic@redhat.com, Florian Fainelli, buytenh@wantstofly.org,
Aviad Raveh, Netdev, David S. Miller, shm, Andy Gospodarek
In-Reply-To: <548156C8.5060602@gmail.com>
On 12/4/14, 10:55 PM, John Fastabend wrote:
> On 12/04/2014 10:41 PM, Scott Feldman wrote:
>> On Thu, Dec 4, 2014 at 6:26 PM, <roopa@cumulusnetworks.com> wrote:
>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>
>>> This allows offloading to switch asic without having the user to set
>>> any flag. And this is done in the bridge driver to rollback kernel
>>> settings
>>> on hw offload failure if required in the future.
>>>
>>> With this, it also makes sure a notification goes out only after the
>>> attributes are set both in the kernel and hw.
>>
>> I like this approach as it streamlines the steps for the user in
>> setting port flags. There is one case for FLOODING where you'll have
>> to turn off flooding for both, and then turn on flooding in hw. You
>> don't want flooding turned on on kernel and hw.
>>
>>> ---
>>> net/bridge/br_netlink.c | 27 ++++++++++++++++++++++++++-
>>> 1 file changed, 26 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
>>> index 9f5eb55..ce173f0 100644
>>> --- a/net/bridge/br_netlink.c
>>> +++ b/net/bridge/br_netlink.c
>>> @@ -407,9 +407,21 @@ int br_setlink(struct net_device *dev, struct
>>> nlmsghdr *nlh)
>>> afspec, RTM_SETLINK);
>>> }
>>>
>>> + if ((dev->features & NETIF_F_HW_SWITCH_OFFLOAD) &&
>>> + dev->netdev_ops->ndo_bridge_setlink) {
>>> + int ret = dev->netdev_ops->ndo_bridge_setlink(dev,
>>> nlh);
>>
>> I think you want to up-level this to net/core/rtnetlink.c because
>> you're only enabling the feature for one instance of a driver that
>> implements ndo_bridge_setlink: the bridge driver. If another driver
>> was MASTER and implemented ndo_bridge_setlink, you'd want same check
>> to push setting down to SELF port driver.
>
> Also if the user set SELF && MASTER flags && had HW_SWITCH_OFFLOAD bit
> set we would call ndo_bridge_setlink twice on the dev. Maybe you can
> catch this case in rtnetlink.c and only call it once.
yes, thought about this and when i looked at iproute2 code, it is either
master
or self today and i don't think anybody else uses both flags with the
current
kernel implementation. But yes, that does not stop anybody from setting
both flags.
I will handle it better in v2.
>
>>
>>> + if (ret && ret != -EOPNOTSUPP) {
>>> + /* XXX Fix this in the future to rollback
>>> + * kernel settings and return error
>>> + */
>>
>> The future is now. Let's fix this now for the rollback case (again up
>> in rtnetlink.c). So then a general question comes to mind: for these
>> dual target sets, is it best to try HW first and then SW, or the other
>> way around? Either way, on failure on second you need to rollback
>> first. And, on failure, you need to know rollback value for first, so
>> you have to do a getlink on first before attempting set.
>
> It might be helpful to return some indication of what object failed as
> well.
ok...
thanks,
Roopa
^ permalink raw reply
* Re: [PATCH iproute2] bridge link: add option 'self'
From: Roopa Prabhu @ 2014-12-05 7:04 UTC (permalink / raw)
To: Scott Feldman
Cc: Jiří Pírko, Jamal Hadi Salim, Benjamin LaHaise,
Thomas Graf, john fastabend, stephen@networkplumber.org,
John Linville, nhorman@tuxdriver.com, Nicolas Dichtel,
vyasevic@redhat.com, Florian Fainelli, buytenh@wantstofly.org,
Aviad Raveh, Netdev, David S. Miller, shm, Andy Gospodarek
In-Reply-To: <CAE4R7bA3JSyHxe5ptVzpCzP5nHmKm8KyPqGvK7a_N2TWX-BNRw@mail.gmail.com>
On 12/4/14, 10:52 PM, Scott Feldman wrote:
> On Thu, Dec 4, 2014 at 6:27 PM, <roopa@cumulusnetworks.com> wrote:
>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>
>> Currently self is set internally only if hwmode is set.
>> This makes it necessary for the hw to have a mode.
>> There is no hwmode really required to go to hardware. So, introduce
>> self for anybody who wants to target hardware.
> Can you include unwinding the hwmode 'swdev' setting for
> kernel/iproute2 in v2? With this "self" option, and your new
> NETIF_F_HW_SWITCH_OFFLOAD flag, we don't need the hwmde 'swdev'.
yep. will do. thanks
>> ---
>> bridge/link.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/bridge/link.c b/bridge/link.c
>> index 90d9e7f..2b86141 100644
>> --- a/bridge/link.c
>> +++ b/bridge/link.c
>> @@ -321,6 +321,9 @@ static int brlink_modify(int argc, char **argv)
>> "\"veb\".\n");
>> exit(-1);
>> }
>> + } else if (strcmp(*argv, "self") == 0) {
>> + NEXT_ARG();
>> + flags = BRIDGE_FLAGS_SELF;
>> } else {
>> usage();
>> }
>> --
>> 1.7.10.4
>>
^ permalink raw reply
* Re: [PATCH 2/3] bridge: offload bridge port attributes to switch asic if feature flag set
From: Roopa Prabhu @ 2014-12-05 7:02 UTC (permalink / raw)
To: Scott Feldman
Cc: Jiří Pírko, Jamal Hadi Salim, Benjamin LaHaise,
Thomas Graf, john fastabend, stephen@networkplumber.org,
John Linville, nhorman@tuxdriver.com, Nicolas Dichtel,
vyasevic@redhat.com, Florian Fainelli, buytenh@wantstofly.org,
Aviad Raveh, Netdev, David S. Miller, shm, Andy Gospodarek
In-Reply-To: <CAE4R7bBOCgoH0zL6+82jj2qjcP8n3kSo51x+i5SoqkN8Ff1CBQ@mail.gmail.com>
On 12/4/14, 10:41 PM, Scott Feldman wrote:
> On Thu, Dec 4, 2014 at 6:26 PM, <roopa@cumulusnetworks.com> wrote:
>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>
>> This allows offloading to switch asic without having the user to set
>> any flag. And this is done in the bridge driver to rollback kernel settings
>> on hw offload failure if required in the future.
>>
>> With this, it also makes sure a notification goes out only after the
>> attributes are set both in the kernel and hw.
> I like this approach as it streamlines the steps for the user in
> setting port flags. There is one case for FLOODING where you'll have
> to turn off flooding for both, and then turn on flooding in hw. You
> don't want flooding turned on on kernel and hw.
ok, maybe using the higher bits as in
https://patchwork.ozlabs.org/patch/413211/
might help with that. Let me think some more.
>
>> ---
>> net/bridge/br_netlink.c | 27 ++++++++++++++++++++++++++-
>> 1 file changed, 26 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
>> index 9f5eb55..ce173f0 100644
>> --- a/net/bridge/br_netlink.c
>> +++ b/net/bridge/br_netlink.c
>> @@ -407,9 +407,21 @@ int br_setlink(struct net_device *dev, struct nlmsghdr *nlh)
>> afspec, RTM_SETLINK);
>> }
>>
>> + if ((dev->features & NETIF_F_HW_SWITCH_OFFLOAD) &&
>> + dev->netdev_ops->ndo_bridge_setlink) {
>> + int ret = dev->netdev_ops->ndo_bridge_setlink(dev, nlh);
> I think you want to up-level this to net/core/rtnetlink.c because
> you're only enabling the feature for one instance of a driver that
> implements ndo_bridge_setlink: the bridge driver. If another driver
> was MASTER and implemented ndo_bridge_setlink, you'd want same check
> to push setting down to SELF port driver.
yeah, i thought about that. But i moved it here so that rollback would
be easier.
>
>> + if (ret && ret != -EOPNOTSUPP) {
>> + /* XXX Fix this in the future to rollback
>> + * kernel settings and return error
>> + */
> The future is now. Let's fix this now for the rollback case (again up
> in rtnetlink.c). So then a general question comes to mind: for these
> dual target sets, is it best to try HW first and then SW, or the other
> way around? Either way, on failure on second you need to rollback
> first. And, on failure, you need to know rollback value for first, so
> you have to do a getlink on first before attempting set.
yep, exactly, I went through the same thought process yesterday when i
was trying to implement rollback.
>
>> + br_warn(p->br, "error offloading bridge attributes "
>> + "on port %u(%s)\n", (unsigned int) p->port_no,
>> + p->dev->name);
>> + }
>> + }
>> +
>> if (err == 0)
>> br_ifinfo_notify(RTM_NEWLINK, p);
>> -
>> out:
>> return err;
>> }
>> @@ -433,6 +445,19 @@ int br_dellink(struct net_device *dev, struct nlmsghdr *nlh)
>> err = br_afspec((struct net_bridge *)netdev_priv(dev), p,
>> afspec, RTM_DELLINK);
>>
>> + if (dev->features & NETIF_F_HW_SWITCH_OFFLOAD
>> + && dev->netdev_ops->ndo_bridge_setlink) {
>> + int ret = dev->netdev_ops->ndo_bridge_dellink(dev, nlh);
>> + if (ret && ret != -EOPNOTSUPP) {
>> + /* XXX Fix this in the future to rollback
>> + * kernel settings and return error
>> + */
>> + br_warn(p->br, "error offloading bridge attributes "
>> + "on port %u(%s)\n", (unsigned int) p->port_no,
>> + p->dev->name);
>> + }
>> + }
>> +
> Same comments as setlink above.
>
>> return err;
>> }
>> static int br_validate(struct nlattr *tb[], struct nlattr *data[])
>> --
>> 1.7.10.4
>>
^ permalink raw reply
* Re: [PATCH 2/3] bridge: offload bridge port attributes to switch asic if feature flag set
From: John Fastabend @ 2014-12-05 6:55 UTC (permalink / raw)
To: Scott Feldman
Cc: Roopa Prabhu, Jiří Pírko, Jamal Hadi Salim,
Benjamin LaHaise, Thomas Graf, stephen@networkplumber.org,
John Linville, nhorman@tuxdriver.com, Nicolas Dichtel,
vyasevic@redhat.com, Florian Fainelli, buytenh@wantstofly.org,
Aviad Raveh, Netdev, David S. Miller, shm, Andy Gospodarek
In-Reply-To: <CAE4R7bBOCgoH0zL6+82jj2qjcP8n3kSo51x+i5SoqkN8Ff1CBQ@mail.gmail.com>
On 12/04/2014 10:41 PM, Scott Feldman wrote:
> On Thu, Dec 4, 2014 at 6:26 PM, <roopa@cumulusnetworks.com> wrote:
>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>
>> This allows offloading to switch asic without having the user to set
>> any flag. And this is done in the bridge driver to rollback kernel settings
>> on hw offload failure if required in the future.
>>
>> With this, it also makes sure a notification goes out only after the
>> attributes are set both in the kernel and hw.
>
> I like this approach as it streamlines the steps for the user in
> setting port flags. There is one case for FLOODING where you'll have
> to turn off flooding for both, and then turn on flooding in hw. You
> don't want flooding turned on on kernel and hw.
>
>> ---
>> net/bridge/br_netlink.c | 27 ++++++++++++++++++++++++++-
>> 1 file changed, 26 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
>> index 9f5eb55..ce173f0 100644
>> --- a/net/bridge/br_netlink.c
>> +++ b/net/bridge/br_netlink.c
>> @@ -407,9 +407,21 @@ int br_setlink(struct net_device *dev, struct nlmsghdr *nlh)
>> afspec, RTM_SETLINK);
>> }
>>
>> + if ((dev->features & NETIF_F_HW_SWITCH_OFFLOAD) &&
>> + dev->netdev_ops->ndo_bridge_setlink) {
>> + int ret = dev->netdev_ops->ndo_bridge_setlink(dev, nlh);
>
> I think you want to up-level this to net/core/rtnetlink.c because
> you're only enabling the feature for one instance of a driver that
> implements ndo_bridge_setlink: the bridge driver. If another driver
> was MASTER and implemented ndo_bridge_setlink, you'd want same check
> to push setting down to SELF port driver.
Also if the user set SELF && MASTER flags && had HW_SWITCH_OFFLOAD bit
set we would call ndo_bridge_setlink twice on the dev. Maybe you can
catch this case in rtnetlink.c and only call it once.
>
>> + if (ret && ret != -EOPNOTSUPP) {
>> + /* XXX Fix this in the future to rollback
>> + * kernel settings and return error
>> + */
>
> The future is now. Let's fix this now for the rollback case (again up
> in rtnetlink.c). So then a general question comes to mind: for these
> dual target sets, is it best to try HW first and then SW, or the other
> way around? Either way, on failure on second you need to rollback
> first. And, on failure, you need to know rollback value for first, so
> you have to do a getlink on first before attempting set.
It might be helpful to return some indication of what object failed as
well.
>
>> + br_warn(p->br, "error offloading bridge attributes "
>> + "on port %u(%s)\n", (unsigned int) p->port_no,
>> + p->dev->name);
>> + }
>> + }
>> +
>> if (err == 0)
>> br_ifinfo_notify(RTM_NEWLINK, p);
>> -
>> out:
>> return err;
>> }
>> @@ -433,6 +445,19 @@ int br_dellink(struct net_device *dev, struct nlmsghdr *nlh)
>> err = br_afspec((struct net_bridge *)netdev_priv(dev), p,
>> afspec, RTM_DELLINK);
>>
>> + if (dev->features & NETIF_F_HW_SWITCH_OFFLOAD
>> + && dev->netdev_ops->ndo_bridge_setlink) {
>> + int ret = dev->netdev_ops->ndo_bridge_dellink(dev, nlh);
>> + if (ret && ret != -EOPNOTSUPP) {
>> + /* XXX Fix this in the future to rollback
>> + * kernel settings and return error
>> + */
>> + br_warn(p->br, "error offloading bridge attributes "
>> + "on port %u(%s)\n", (unsigned int) p->port_no,
>> + p->dev->name);
>> + }
>> + }
>> +
>
> Same comments as setlink above.
>
>> return err;
>> }
>> static int br_validate(struct nlattr *tb[], struct nlattr *data[])
>> --
>> 1.7.10.4
>>
--
John Fastabend Intel Corporation
^ permalink raw reply
* Re: [PATCH iproute2] bridge link: add option 'self'
From: Scott Feldman @ 2014-12-05 6:52 UTC (permalink / raw)
To: Roopa Prabhu
Cc: Jiří Pírko, Jamal Hadi Salim, Benjamin LaHaise,
Thomas Graf, john fastabend, stephen@networkplumber.org,
John Linville, nhorman@tuxdriver.com, Nicolas Dichtel,
vyasevic@redhat.com, Florian Fainelli, buytenh@wantstofly.org,
Aviad Raveh, Netdev, David S. Miller, shm, Andy Gospodarek
In-Reply-To: <1417746436-41023-1-git-send-email-roopa@cumulusnetworks.com>
On Thu, Dec 4, 2014 at 6:27 PM, <roopa@cumulusnetworks.com> wrote:
> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>
> Currently self is set internally only if hwmode is set.
> This makes it necessary for the hw to have a mode.
> There is no hwmode really required to go to hardware. So, introduce
> self for anybody who wants to target hardware.
Can you include unwinding the hwmode 'swdev' setting for
kernel/iproute2 in v2? With this "self" option, and your new
NETIF_F_HW_SWITCH_OFFLOAD flag, we don't need the hwmde 'swdev'.
> ---
> bridge/link.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/bridge/link.c b/bridge/link.c
> index 90d9e7f..2b86141 100644
> --- a/bridge/link.c
> +++ b/bridge/link.c
> @@ -321,6 +321,9 @@ static int brlink_modify(int argc, char **argv)
> "\"veb\".\n");
> exit(-1);
> }
> + } else if (strcmp(*argv, "self") == 0) {
> + NEXT_ARG();
> + flags = BRIDGE_FLAGS_SELF;
> } else {
> usage();
> }
> --
> 1.7.10.4
>
^ permalink raw reply
* Re: [PATCH 1/3] netdev: introduce new NETIF_F_HW_SWITCH_OFFLOAD feature flag for switch device offloads
From: Scott Feldman @ 2014-12-05 6:47 UTC (permalink / raw)
To: John Fastabend
Cc: Roopa Prabhu, Jiří Pírko, Jamal Hadi Salim,
Benjamin LaHaise, Thomas Graf, stephen@networkplumber.org,
John Linville, nhorman@tuxdriver.com, Nicolas Dichtel,
vyasevic@redhat.com, Florian Fainelli, buytenh@wantstofly.org,
Aviad Raveh, Netdev, David S. Miller, shm, Andy Gospodarek
In-Reply-To: <54815180.30900@gmail.com>
On Thu, Dec 4, 2014 at 10:32 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
> On 12/04/2014 10:08 PM, Scott Feldman wrote:
>>
>> On Thu, Dec 4, 2014 at 6:26 PM, <roopa@cumulusnetworks.com> wrote:
>>>
>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>
>>> This is a generic high level feature flag for all switch asic features
>>> today.
>>>
>>> switch drivers set this flag on switch ports. Logical devices like
>>> bridge, bonds, vxlans can inherit this flag from their slaves/ports.
>>>
>>> I had to use SWITCH in the name to avoid ambiguity with other feature
>>> flags. But, since i have been harping about not calling it 'switch',
>>> I am welcome to any suggestions :)
>>>
>>> An alternative to using a feature flag is to use a IFF_HW_OFFLOAD
>>> in net_device_flags.
>>> ---
>>> include/linux/netdev_features.h | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/include/linux/netdev_features.h
>>> b/include/linux/netdev_features.h
>>> index 8e30685..68db1de 100644
>>> --- a/include/linux/netdev_features.h
>>> +++ b/include/linux/netdev_features.h
>>> @@ -66,6 +66,7 @@ enum {
>>> NETIF_F_HW_VLAN_STAG_FILTER_BIT,/* Receive filtering on VLAN
>>> STAGs */
>>> NETIF_F_HW_L2FW_DOFFLOAD_BIT, /* Allow L2 Forwarding in
>>> Hardware */
>>> NETIF_F_BUSY_POLL_BIT, /* Busy poll */
>>> + NETIF_F_HW_SWITCH_OFFLOAD_BIT, /* HW switch offload */
>>>
>>> /*
>>> * Add your fresh new feature above and remember to update
>>> @@ -124,6 +125,7 @@ enum {
>>> #define NETIF_F_HW_VLAN_STAG_TX __NETIF_F(HW_VLAN_STAG_TX)
>>> #define NETIF_F_HW_L2FW_DOFFLOAD __NETIF_F(HW_L2FW_DOFFLOAD)
>>> #define NETIF_F_BUSY_POLL __NETIF_F(BUSY_POLL)
>>> +#define NETIF_F_HW_SWITCH_OFFLOAD __NETIF_F(HW_SWITCH_OFFLOAD)
>>
>>
>> Can we use existing flag NETIF_F_HW_L2FW_DOFFLOAD? The comment above
>> on that flag suggests it would be perfect for what we're doing with L2
>> bridging: offloading the L2 forwarding path in HW.
>>
>> NETIF_F_HW_L2FW_DOFFLOAD seems to be orphaned in current net-next, so
>> I'm not sure of it's status, but see if it makes sense to recycle it.
>>
>> I could see a NETIF_F_HW_L3FW_DOFFLOAD for L3 fwd offload, down the road.
>>
>
> Its not orphaned its used by the macvlan device to indicate that the
> port switching should be done in hardware. For example put macvlan in
> VEPA mode and the hardware will deliver the packets directly to the
> netdev and get the pruning right. In VEB mode the hardware will forward
> packets correctly between macvlan netdevs. The implementation on ixgbe
> for example is to give a set of hardware queues per macvlan netdev. I
> still need to add offloaded QOS to make this more interesting.
>
> I could support a l3vlan device as well even on some of the host nics.
>
> The issue with generalizing it is at the moment it is used by ixgbe
> which can't forward between two physical ports. It works on macvlan
> because each macvlan netdev uses the same physical uplink. I think switching
> between different ports is going to require another feature
> bit like its done here.
My bad...I grepped for the _BIT variant and missed the usage in
macvlan, etc. Sorry about that.
>
> .John
>
>
>>
>>> /* Features valid for ethtool to change */
>>> /* = all defined minus driver/device-class-related */
>>> --
>>> 1.7.10.4
>>>
>
>
> --
> John Fastabend Intel Corporation
^ permalink raw reply
* Re: [PATCH 2/3] bridge: offload bridge port attributes to switch asic if feature flag set
From: Scott Feldman @ 2014-12-05 6:41 UTC (permalink / raw)
To: Roopa Prabhu
Cc: Jiří Pírko, Jamal Hadi Salim, Benjamin LaHaise,
Thomas Graf, john fastabend, stephen@networkplumber.org,
John Linville, nhorman@tuxdriver.com, Nicolas Dichtel,
vyasevic@redhat.com, Florian Fainelli, buytenh@wantstofly.org,
Aviad Raveh, Netdev, David S. Miller, shm, Andy Gospodarek
In-Reply-To: <1417746401-8140-3-git-send-email-roopa@cumulusnetworks.com>
On Thu, Dec 4, 2014 at 6:26 PM, <roopa@cumulusnetworks.com> wrote:
> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>
> This allows offloading to switch asic without having the user to set
> any flag. And this is done in the bridge driver to rollback kernel settings
> on hw offload failure if required in the future.
>
> With this, it also makes sure a notification goes out only after the
> attributes are set both in the kernel and hw.
I like this approach as it streamlines the steps for the user in
setting port flags. There is one case for FLOODING where you'll have
to turn off flooding for both, and then turn on flooding in hw. You
don't want flooding turned on on kernel and hw.
> ---
> net/bridge/br_netlink.c | 27 ++++++++++++++++++++++++++-
> 1 file changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
> index 9f5eb55..ce173f0 100644
> --- a/net/bridge/br_netlink.c
> +++ b/net/bridge/br_netlink.c
> @@ -407,9 +407,21 @@ int br_setlink(struct net_device *dev, struct nlmsghdr *nlh)
> afspec, RTM_SETLINK);
> }
>
> + if ((dev->features & NETIF_F_HW_SWITCH_OFFLOAD) &&
> + dev->netdev_ops->ndo_bridge_setlink) {
> + int ret = dev->netdev_ops->ndo_bridge_setlink(dev, nlh);
I think you want to up-level this to net/core/rtnetlink.c because
you're only enabling the feature for one instance of a driver that
implements ndo_bridge_setlink: the bridge driver. If another driver
was MASTER and implemented ndo_bridge_setlink, you'd want same check
to push setting down to SELF port driver.
> + if (ret && ret != -EOPNOTSUPP) {
> + /* XXX Fix this in the future to rollback
> + * kernel settings and return error
> + */
The future is now. Let's fix this now for the rollback case (again up
in rtnetlink.c). So then a general question comes to mind: for these
dual target sets, is it best to try HW first and then SW, or the other
way around? Either way, on failure on second you need to rollback
first. And, on failure, you need to know rollback value for first, so
you have to do a getlink on first before attempting set.
> + br_warn(p->br, "error offloading bridge attributes "
> + "on port %u(%s)\n", (unsigned int) p->port_no,
> + p->dev->name);
> + }
> + }
> +
> if (err == 0)
> br_ifinfo_notify(RTM_NEWLINK, p);
> -
> out:
> return err;
> }
> @@ -433,6 +445,19 @@ int br_dellink(struct net_device *dev, struct nlmsghdr *nlh)
> err = br_afspec((struct net_bridge *)netdev_priv(dev), p,
> afspec, RTM_DELLINK);
>
> + if (dev->features & NETIF_F_HW_SWITCH_OFFLOAD
> + && dev->netdev_ops->ndo_bridge_setlink) {
> + int ret = dev->netdev_ops->ndo_bridge_dellink(dev, nlh);
> + if (ret && ret != -EOPNOTSUPP) {
> + /* XXX Fix this in the future to rollback
> + * kernel settings and return error
> + */
> + br_warn(p->br, "error offloading bridge attributes "
> + "on port %u(%s)\n", (unsigned int) p->port_no,
> + p->dev->name);
> + }
> + }
> +
Same comments as setlink above.
> return err;
> }
> static int br_validate(struct nlattr *tb[], struct nlattr *data[])
> --
> 1.7.10.4
>
^ permalink raw reply
* Re: [PATCH 3/3] rocker: set feature NETIF_F_HW_SWITCH_OFFLOAD
From: John Fastabend @ 2014-12-05 6:38 UTC (permalink / raw)
To: Scott Feldman
Cc: Jianhua Xie, Roopa Prabhu, Jiří Pírko,
Jamal Hadi Salim, Benjamin LaHaise, Thomas Graf,
stephen@networkplumber.org, John Linville, nhorman@tuxdriver.com,
Nicolas Dichtel, vyasevic@redhat.com, Florian Fainelli,
buytenh@wantstofly.org, Aviad Raveh, Netdev, David S. Miller, shm,
Andy Gospodarek
In-Reply-To: <CAE4R7bDCcE94WQE50J+suAr9F=OFvQnC73txA+xrC2JxMseYkw@mail.gmail.com>
On 12/04/2014 10:16 PM, Scott Feldman wrote:
> On Thu, Dec 4, 2014 at 7:25 PM, Jianhua Xie <jianhua.xie@freescale.com> wrote:
>>
>> 在 2014年12月05日 10:26, roopa@cumulusnetworks.com 写道:
>>
>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>
>>> This patch just sets the feature flag on rocker ports
>>> ---
>>> drivers/net/ethernet/rocker/rocker.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ethernet/rocker/rocker.c
>>> b/drivers/net/ethernet/rocker/rocker.c
>>> index fded127..3fe19b0 100644
>>> --- a/drivers/net/ethernet/rocker/rocker.c
>>> +++ b/drivers/net/ethernet/rocker/rocker.c
>>> @@ -4003,7 +4003,8 @@ static int rocker_probe_port(struct rocker *rocker,
>>> unsigned int port_number)
>>> NAPI_POLL_WEIGHT);
>>> rocker_carrier_init(rocker_port);
>>> - dev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
>>> + dev->features |= NETIF_F_HW_VLAN_CTAG_FILTER |
>>> + NETIF_F_HW_SWITCH_OFFLOAD;
>>
>> Do you have a plan on enabling/disabling this flag dynamically by ethtool?
>
> Might get it for free if we recycle NETIF_F_HW_L2FW_DOFFLOAD;
> NETIF_F_HW_L2FW_DOFFLOAD seems to be supported by ethtool already.
See my other note but ethtool support for feature flags makes
this pretty close to trivial to implement these days.
Take a look @ netdev_features_strings.
>
>>
>> Thanks & B.R
>> Jianhua
>>
>>> err = register_netdev(dev);
>>> if (err) {
>>
>>
--
John Fastabend Intel Corporation
^ permalink raw reply
* Re: [PATCH 1/3] netdev: introduce new NETIF_F_HW_SWITCH_OFFLOAD feature flag for switch device offloads
From: John Fastabend @ 2014-12-05 6:32 UTC (permalink / raw)
To: Scott Feldman
Cc: Roopa Prabhu, Jiří Pírko, Jamal Hadi Salim,
Benjamin LaHaise, Thomas Graf, stephen@networkplumber.org,
John Linville, nhorman@tuxdriver.com, Nicolas Dichtel,
vyasevic@redhat.com, Florian Fainelli, buytenh@wantstofly.org,
Aviad Raveh, Netdev, David S. Miller, shm, Andy Gospodarek
In-Reply-To: <CAE4R7bBC+W+9F3cmR5TVuO+BbhK+zBuiQhNtk6gCvK5CC2MkMg@mail.gmail.com>
On 12/04/2014 10:08 PM, Scott Feldman wrote:
> On Thu, Dec 4, 2014 at 6:26 PM, <roopa@cumulusnetworks.com> wrote:
>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>
>> This is a generic high level feature flag for all switch asic features today.
>>
>> switch drivers set this flag on switch ports. Logical devices like
>> bridge, bonds, vxlans can inherit this flag from their slaves/ports.
>>
>> I had to use SWITCH in the name to avoid ambiguity with other feature
>> flags. But, since i have been harping about not calling it 'switch',
>> I am welcome to any suggestions :)
>>
>> An alternative to using a feature flag is to use a IFF_HW_OFFLOAD
>> in net_device_flags.
>> ---
>> include/linux/netdev_features.h | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
>> index 8e30685..68db1de 100644
>> --- a/include/linux/netdev_features.h
>> +++ b/include/linux/netdev_features.h
>> @@ -66,6 +66,7 @@ enum {
>> NETIF_F_HW_VLAN_STAG_FILTER_BIT,/* Receive filtering on VLAN STAGs */
>> NETIF_F_HW_L2FW_DOFFLOAD_BIT, /* Allow L2 Forwarding in Hardware */
>> NETIF_F_BUSY_POLL_BIT, /* Busy poll */
>> + NETIF_F_HW_SWITCH_OFFLOAD_BIT, /* HW switch offload */
>>
>> /*
>> * Add your fresh new feature above and remember to update
>> @@ -124,6 +125,7 @@ enum {
>> #define NETIF_F_HW_VLAN_STAG_TX __NETIF_F(HW_VLAN_STAG_TX)
>> #define NETIF_F_HW_L2FW_DOFFLOAD __NETIF_F(HW_L2FW_DOFFLOAD)
>> #define NETIF_F_BUSY_POLL __NETIF_F(BUSY_POLL)
>> +#define NETIF_F_HW_SWITCH_OFFLOAD __NETIF_F(HW_SWITCH_OFFLOAD)
>
> Can we use existing flag NETIF_F_HW_L2FW_DOFFLOAD? The comment above
> on that flag suggests it would be perfect for what we're doing with L2
> bridging: offloading the L2 forwarding path in HW.
>
> NETIF_F_HW_L2FW_DOFFLOAD seems to be orphaned in current net-next, so
> I'm not sure of it's status, but see if it makes sense to recycle it.
>
> I could see a NETIF_F_HW_L3FW_DOFFLOAD for L3 fwd offload, down the road.
>
Its not orphaned its used by the macvlan device to indicate that the
port switching should be done in hardware. For example put macvlan in
VEPA mode and the hardware will deliver the packets directly to the
netdev and get the pruning right. In VEB mode the hardware will forward
packets correctly between macvlan netdevs. The implementation on ixgbe
for example is to give a set of hardware queues per macvlan netdev. I
still need to add offloaded QOS to make this more interesting.
I could support a l3vlan device as well even on some of the host nics.
The issue with generalizing it is at the moment it is used by ixgbe
which can't forward between two physical ports. It works on macvlan
because each macvlan netdev uses the same physical uplink. I think
switching between different ports is going to require another feature
bit like its done here.
.John
>
>> /* Features valid for ethtool to change */
>> /* = all defined minus driver/device-class-related */
>> --
>> 1.7.10.4
>>
--
John Fastabend Intel Corporation
^ permalink raw reply
* Re: [PATCH v2 1/6] net-PPP: Replacement of a printk() call by pr_warn() in mppe_rekey()
From: Julia Lawall @ 2014-12-05 6:26 UTC (permalink / raw)
To: Joe Perches
Cc: SF Markus Elfring, Sergei Shtylyov, Paul Mackerras, linux-ppp,
netdev, Eric Dumazet, LKML, kernel-janitors, Julia Lawall
In-Reply-To: <1417733101.2721.20.camel@perches.com>
On Thu, 4 Dec 2014, Joe Perches wrote:
> On Thu, 2014-12-04 at 23:27 +0100, SF Markus Elfring wrote:
> > >> diff --git a/drivers/net/ppp/ppp_mppe.c b/drivers/net/ppp/ppp_mppe.c
> > > []
> > >> @@ -172,9 +172,8 @@ static void mppe_rekey(struct ppp_mppe_state * state, int initial_key)
> > >> setup_sg(sg_in, state->sha1_digest, state->keylen);
> > >> setup_sg(sg_out, state->session_key, state->keylen);
> > >> if (crypto_blkcipher_encrypt(&desc, sg_out, sg_in,
> > >> - state->keylen) != 0) {
> > >> - printk(KERN_WARNING "mppe_rekey: cipher_encrypt failed\n");
> > >> - }
> > >> + state->keylen) != 0)
> > >> + pr_warn("mppe_rekey: cipher_encrypt failed\n");
> > >
> > > It's generally nicer to replace embedded function names
> > > with "%s: ", __func__
> > >
> > > pr_warn("%s: cipher_encrypt failed\n", __func__);
> >
> > Do you want that I send a third patch series for the fine-tuning of these parameters?
>
> If you want.
>
> I just wanted you to be aware of it for future patches.
Markus, are you sure that you cannot use netdev_warn in this case?
julia
^ permalink raw reply
* Re: [PATCH 3/3] rocker: set feature NETIF_F_HW_SWITCH_OFFLOAD
From: Scott Feldman @ 2014-12-05 6:16 UTC (permalink / raw)
To: Jianhua Xie
Cc: Roopa Prabhu, Jiří Pírko, Jamal Hadi Salim,
Benjamin LaHaise, Thomas Graf, john fastabend,
stephen@networkplumber.org, John Linville, nhorman@tuxdriver.com,
Nicolas Dichtel, vyasevic@redhat.com, Florian Fainelli,
buytenh@wantstofly.org, Aviad Raveh, Netdev, David S. Miller, shm,
Andy Gospodarek
In-Reply-To: <5481259C.3040707@freescale.com>
On Thu, Dec 4, 2014 at 7:25 PM, Jianhua Xie <jianhua.xie@freescale.com> wrote:
>
> 在 2014年12月05日 10:26, roopa@cumulusnetworks.com 写道:
>
>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>
>> This patch just sets the feature flag on rocker ports
>> ---
>> drivers/net/ethernet/rocker/rocker.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/rocker/rocker.c
>> b/drivers/net/ethernet/rocker/rocker.c
>> index fded127..3fe19b0 100644
>> --- a/drivers/net/ethernet/rocker/rocker.c
>> +++ b/drivers/net/ethernet/rocker/rocker.c
>> @@ -4003,7 +4003,8 @@ static int rocker_probe_port(struct rocker *rocker,
>> unsigned int port_number)
>> NAPI_POLL_WEIGHT);
>> rocker_carrier_init(rocker_port);
>> - dev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
>> + dev->features |= NETIF_F_HW_VLAN_CTAG_FILTER |
>> + NETIF_F_HW_SWITCH_OFFLOAD;
>
> Do you have a plan on enabling/disabling this flag dynamically by ethtool?
Might get it for free if we recycle NETIF_F_HW_L2FW_DOFFLOAD;
NETIF_F_HW_L2FW_DOFFLOAD seems to be supported by ethtool already.
>
> Thanks & B.R
> Jianhua
>
>> err = register_netdev(dev);
>> if (err) {
>
>
^ permalink raw reply
* Re: [PATCH 1/3] netdev: introduce new NETIF_F_HW_SWITCH_OFFLOAD feature flag for switch device offloads
From: Scott Feldman @ 2014-12-05 6:08 UTC (permalink / raw)
To: Roopa Prabhu
Cc: Jiří Pírko, Jamal Hadi Salim, Benjamin LaHaise,
Thomas Graf, john fastabend, stephen@networkplumber.org,
John Linville, nhorman@tuxdriver.com, Nicolas Dichtel,
vyasevic@redhat.com, Florian Fainelli, buytenh@wantstofly.org,
Aviad Raveh, Netdev, David S. Miller, shm, Andy Gospodarek
In-Reply-To: <1417746401-8140-2-git-send-email-roopa@cumulusnetworks.com>
On Thu, Dec 4, 2014 at 6:26 PM, <roopa@cumulusnetworks.com> wrote:
> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>
> This is a generic high level feature flag for all switch asic features today.
>
> switch drivers set this flag on switch ports. Logical devices like
> bridge, bonds, vxlans can inherit this flag from their slaves/ports.
>
> I had to use SWITCH in the name to avoid ambiguity with other feature
> flags. But, since i have been harping about not calling it 'switch',
> I am welcome to any suggestions :)
>
> An alternative to using a feature flag is to use a IFF_HW_OFFLOAD
> in net_device_flags.
> ---
> include/linux/netdev_features.h | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
> index 8e30685..68db1de 100644
> --- a/include/linux/netdev_features.h
> +++ b/include/linux/netdev_features.h
> @@ -66,6 +66,7 @@ enum {
> NETIF_F_HW_VLAN_STAG_FILTER_BIT,/* Receive filtering on VLAN STAGs */
> NETIF_F_HW_L2FW_DOFFLOAD_BIT, /* Allow L2 Forwarding in Hardware */
> NETIF_F_BUSY_POLL_BIT, /* Busy poll */
> + NETIF_F_HW_SWITCH_OFFLOAD_BIT, /* HW switch offload */
>
> /*
> * Add your fresh new feature above and remember to update
> @@ -124,6 +125,7 @@ enum {
> #define NETIF_F_HW_VLAN_STAG_TX __NETIF_F(HW_VLAN_STAG_TX)
> #define NETIF_F_HW_L2FW_DOFFLOAD __NETIF_F(HW_L2FW_DOFFLOAD)
> #define NETIF_F_BUSY_POLL __NETIF_F(BUSY_POLL)
> +#define NETIF_F_HW_SWITCH_OFFLOAD __NETIF_F(HW_SWITCH_OFFLOAD)
Can we use existing flag NETIF_F_HW_L2FW_DOFFLOAD? The comment above
on that flag suggests it would be perfect for what we're doing with L2
bridging: offloading the L2 forwarding path in HW.
NETIF_F_HW_L2FW_DOFFLOAD seems to be orphaned in current net-next, so
I'm not sure of it's status, but see if it makes sense to recycle it.
I could see a NETIF_F_HW_L3FW_DOFFLOAD for L3 fwd offload, down the road.
> /* Features valid for ethtool to change */
> /* = all defined minus driver/device-class-related */
> --
> 1.7.10.4
>
^ permalink raw reply
* [PATCH 03/12] ip_generic_getfrag, udplite_getfrag: switch to passing msghdr
From: Al Viro @ 2014-12-05 5:58 UTC (permalink / raw)
To: David Miller; +Cc: netdev, linux-kernel
In-Reply-To: <20141205055623.GQ29748@ZenIV.linux.org.uk>
From: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
include/net/udplite.h | 3 ++-
net/ipv4/ip_output.c | 6 +++---
net/ipv4/raw.c | 2 +-
net/ipv4/udp.c | 4 ++--
net/ipv6/raw.c | 2 +-
net/ipv6/udp.c | 2 +-
net/l2tp/l2tp_ip6.c | 2 +-
7 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/include/net/udplite.h b/include/net/udplite.h
index 9a28a51..d5baaba 100644
--- a/include/net/udplite.h
+++ b/include/net/udplite.h
@@ -19,7 +19,8 @@ extern struct udp_table udplite_table;
static __inline__ int udplite_getfrag(void *from, char *to, int offset,
int len, int odd, struct sk_buff *skb)
{
- return memcpy_fromiovecend(to, (struct iovec *) from, offset, len);
+ struct msghdr *msg = from;
+ return memcpy_fromiovecend(to, msg->msg_iov, offset, len);
}
/* Designate sk as UDP-Lite socket */
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 4a929ad..cdedcf1 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -752,14 +752,14 @@ EXPORT_SYMBOL(ip_fragment);
int
ip_generic_getfrag(void *from, char *to, int offset, int len, int odd, struct sk_buff *skb)
{
- struct iovec *iov = from;
+ struct msghdr *msg = from;
if (skb->ip_summed == CHECKSUM_PARTIAL) {
- if (memcpy_fromiovecend(to, iov, offset, len) < 0)
+ if (memcpy_fromiovecend(to, msg->msg_iov, offset, len) < 0)
return -EFAULT;
} else {
__wsum csum = 0;
- if (csum_partial_copy_fromiovecend(to, iov, offset, len, &csum) < 0)
+ if (csum_partial_copy_fromiovecend(to, msg->msg_iov, offset, len, &csum) < 0)
return -EFAULT;
skb->csum = csum_block_add(skb->csum, csum, odd);
}
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 5c901eb..5d83bd2 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -478,7 +478,7 @@ static int raw_getfrag(void *from, char *to, int offset, int len, int odd,
offset -= rfv->hlen;
- return ip_generic_getfrag(rfv->msg->msg_iov, to, offset, len, odd, skb);
+ return ip_generic_getfrag(rfv->msg, to, offset, len, odd, skb);
}
static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index b2d6068..0cac250 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1036,7 +1036,7 @@ back_from_confirm:
/* Lockless fast path for the non-corking case. */
if (!corkreq) {
- skb = ip_make_skb(sk, fl4, getfrag, msg->msg_iov, ulen,
+ skb = ip_make_skb(sk, fl4, getfrag, msg, ulen,
sizeof(struct udphdr), &ipc, &rt,
msg->msg_flags);
err = PTR_ERR(skb);
@@ -1067,7 +1067,7 @@ back_from_confirm:
do_append_data:
up->len += ulen;
- err = ip_append_data(sk, fl4, getfrag, msg->msg_iov, ulen,
+ err = ip_append_data(sk, fl4, getfrag, msg, ulen,
sizeof(struct udphdr), &ipc, &rt,
corkreq ? msg->msg_flags|MSG_MORE : msg->msg_flags);
if (err)
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 942f67b..11a9283 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -727,7 +727,7 @@ static int raw6_getfrag(void *from, char *to, int offset, int len, int odd,
offset -= rfv->hlen;
- return ip_generic_getfrag(rfv->msg->msg_iov, to, offset, len, odd, skb);
+ return ip_generic_getfrag(rfv->msg, to, offset, len, odd, skb);
}
static int rawv6_sendmsg(struct kiocb *iocb, struct sock *sk,
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 7cfb5d74..5a164b6 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1299,7 +1299,7 @@ do_append_data:
dontfrag = np->dontfrag;
up->len += ulen;
getfrag = is_udplite ? udplite_getfrag : ip_generic_getfrag;
- err = ip6_append_data(sk, getfrag, msg->msg_iov, ulen,
+ err = ip6_append_data(sk, getfrag, msg, ulen,
sizeof(struct udphdr), hlimit, tclass, opt, &fl6,
(struct rt6_info *)dst,
corkreq ? msg->msg_flags|MSG_MORE : msg->msg_flags, dontfrag);
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index 2177b96..8611f1b 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -619,7 +619,7 @@ static int l2tp_ip6_sendmsg(struct kiocb *iocb, struct sock *sk,
back_from_confirm:
lock_sock(sk);
- err = ip6_append_data(sk, ip_generic_getfrag, msg->msg_iov,
+ err = ip6_append_data(sk, ip_generic_getfrag, msg,
ulen, transhdrlen, hlimit, tclass, opt,
&fl6, (struct rt6_info *)dst,
msg->msg_flags, dontfrag);
--
2.1.3
^ permalink raw reply related
* [PATCH 12/12] bury memcpy_toiovec()
From: Al Viro @ 2014-12-05 5:58 UTC (permalink / raw)
To: David Miller; +Cc: netdev, linux-kernel
In-Reply-To: <20141205055623.GQ29748@ZenIV.linux.org.uk>
From: Al Viro <viro@zeniv.linux.org.uk>
no users left
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
include/linux/uio.h | 1 -
lib/iovec.c | 25 -------------------------
2 files changed, 26 deletions(-)
diff --git a/include/linux/uio.h b/include/linux/uio.h
index bd8569a..a41e252 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -131,7 +131,6 @@ size_t csum_and_copy_to_iter(void *addr, size_t bytes, __wsum *csum, struct iov_
size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i);
int memcpy_fromiovec(unsigned char *kdata, struct iovec *iov, int len);
-int memcpy_toiovec(struct iovec *iov, unsigned char *kdata, int len);
int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov,
int offset, int len);
int memcpy_toiovecend(const struct iovec *v, unsigned char *kdata,
diff --git a/lib/iovec.c b/lib/iovec.c
index df3abd1..2d99cb4 100644
--- a/lib/iovec.c
+++ b/lib/iovec.c
@@ -29,31 +29,6 @@ EXPORT_SYMBOL(memcpy_fromiovec);
/*
* Copy kernel to iovec. Returns -EFAULT on error.
- *
- * Note: this modifies the original iovec.
- */
-
-int memcpy_toiovec(struct iovec *iov, unsigned char *kdata, int len)
-{
- while (len > 0) {
- if (iov->iov_len) {
- int copy = min_t(unsigned int, iov->iov_len, len);
- if (copy_to_user(iov->iov_base, kdata, copy))
- return -EFAULT;
- kdata += copy;
- len -= copy;
- iov->iov_len -= copy;
- iov->iov_base += copy;
- }
- iov++;
- }
-
- return 0;
-}
-EXPORT_SYMBOL(memcpy_toiovec);
-
-/*
- * Copy kernel to iovec. Returns -EFAULT on error.
*/
int memcpy_toiovecend(const struct iovec *iov, unsigned char *kdata,
--
2.1.3
^ permalink raw reply related
* [PATCH 11/12] skb_copy_datagram_iovec() can die
From: Al Viro @ 2014-12-05 5:58 UTC (permalink / raw)
To: David Miller; +Cc: netdev, linux-kernel
In-Reply-To: <20141205055623.GQ29748@ZenIV.linux.org.uk>
From: Al Viro <viro@zeniv.linux.org.uk>
no callers other than itself.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
include/linux/skbuff.h | 2 --
net/core/datagram.c | 84 --------------------------------------------------
2 files changed, 86 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 45ac95c..f676e54 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2642,8 +2642,6 @@ struct sk_buff *skb_recv_datagram(struct sock *sk, unsigned flags, int noblock,
int *err);
unsigned int datagram_poll(struct file *file, struct socket *sock,
struct poll_table_struct *wait);
-int skb_copy_datagram_iovec(const struct sk_buff *from, int offset,
- struct iovec *to, int size);
int skb_copy_datagram_iter(const struct sk_buff *from, int offset,
struct iov_iter *to, int size);
static inline int skb_copy_datagram_msg(const struct sk_buff *from, int offset,
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 41075ed..df493d6 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -310,90 +310,6 @@ int skb_kill_datagram(struct sock *sk, struct sk_buff *skb, unsigned int flags)
EXPORT_SYMBOL(skb_kill_datagram);
/**
- * skb_copy_datagram_iovec - Copy a datagram to an iovec.
- * @skb: buffer to copy
- * @offset: offset in the buffer to start copying from
- * @to: io vector to copy to
- * @len: amount of data to copy from buffer to iovec
- *
- * Note: the iovec is modified during the copy.
- */
-int skb_copy_datagram_iovec(const struct sk_buff *skb, int offset,
- struct iovec *to, int len)
-{
- int start = skb_headlen(skb);
- int i, copy = start - offset;
- struct sk_buff *frag_iter;
-
- trace_skb_copy_datagram_iovec(skb, len);
-
- /* Copy header. */
- if (copy > 0) {
- if (copy > len)
- copy = len;
- if (memcpy_toiovec(to, skb->data + offset, copy))
- goto fault;
- if ((len -= copy) == 0)
- return 0;
- offset += copy;
- }
-
- /* Copy paged appendix. Hmm... why does this look so complicated? */
- for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
- int end;
- const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
-
- WARN_ON(start > offset + len);
-
- end = start + skb_frag_size(frag);
- if ((copy = end - offset) > 0) {
- int err;
- u8 *vaddr;
- struct page *page = skb_frag_page(frag);
-
- if (copy > len)
- copy = len;
- vaddr = kmap(page);
- err = memcpy_toiovec(to, vaddr + frag->page_offset +
- offset - start, copy);
- kunmap(page);
- if (err)
- goto fault;
- if (!(len -= copy))
- return 0;
- offset += copy;
- }
- start = end;
- }
-
- skb_walk_frags(skb, frag_iter) {
- int end;
-
- WARN_ON(start > offset + len);
-
- end = start + frag_iter->len;
- if ((copy = end - offset) > 0) {
- if (copy > len)
- copy = len;
- if (skb_copy_datagram_iovec(frag_iter,
- offset - start,
- to, copy))
- goto fault;
- if ((len -= copy) == 0)
- return 0;
- offset += copy;
- }
- start = end;
- }
- if (!len)
- return 0;
-
-fault:
- return -EFAULT;
-}
-EXPORT_SYMBOL(skb_copy_datagram_iovec);
-
-/**
* skb_copy_datagram_iter - Copy a datagram to an iovec iterator.
* @skb: buffer to copy
* @offset: offset in the buffer to start copying from
--
2.1.3
^ permalink raw reply related
* [PATCH 10/12] ppp_read(): switch to skb_copy_datagram_iter()
From: Al Viro @ 2014-12-05 5:58 UTC (permalink / raw)
To: David Miller; +Cc: netdev, linux-kernel
In-Reply-To: <20141205055623.GQ29748@ZenIV.linux.org.uk>
From: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
drivers/net/ppp/ppp_generic.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 794a473..af034db 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -417,6 +417,7 @@ static ssize_t ppp_read(struct file *file, char __user *buf,
ssize_t ret;
struct sk_buff *skb = NULL;
struct iovec iov;
+ struct iov_iter to;
ret = count;
@@ -462,7 +463,8 @@ static ssize_t ppp_read(struct file *file, char __user *buf,
ret = -EFAULT;
iov.iov_base = buf;
iov.iov_len = count;
- if (skb_copy_datagram_iovec(skb, 0, &iov, skb->len))
+ iov_iter_init(&to, READ, &iov, 1, count);
+ if (skb_copy_datagram_iter(skb, 0, &to, skb->len))
goto outf;
ret = skb->len;
--
2.1.3
^ permalink raw reply related
* [PATCH 09/12] switch memcpy_to_msg() and skb_copy{,_and_csum}_datagram_msg() to primitives
From: Al Viro @ 2014-12-05 5:58 UTC (permalink / raw)
To: David Miller; +Cc: netdev, linux-kernel
In-Reply-To: <20141205055623.GQ29748@ZenIV.linux.org.uk>
From: Al Viro <viro@zeniv.linux.org.uk>
... making both non-draining. That means that tcp_recvmsg() becomes
non-draining. And _that_ would break iscsit_do_rx_data() unless we
a) make sure tcp_recvmsg() is uniformly non-draining (it is)
b) make sure it copes with arbitrary (including shifted)
iov_iter (it does, all it uses is iov_iter primitives)
c) make iscsit_do_rx_data() initialize ->msg_iter only once.
Fortunately, (c) is doable with minimal work and we are rid of one
the two places where kernel send/recvmsg users would be unhappy with
non-draining behaviour.
Actually, that makes all but one of ->recvmsg() instances iov_iter-clean.
The exception is skcipher_recvmsg() and it also isn't hard to convert
to primitives (iov_iter_get_pages() is needed there). That'll wait
a bit - there's some interplay with ->sendmsg() path for that one.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
drivers/target/iscsi/iscsi_target_util.c | 12 +++----
include/linux/skbuff.h | 16 +++-------
net/core/datagram.c | 54 +++++++++++---------------------
3 files changed, 28 insertions(+), 54 deletions(-)
diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
index ce87ce9..7c6a95b 100644
--- a/drivers/target/iscsi/iscsi_target_util.c
+++ b/drivers/target/iscsi/iscsi_target_util.c
@@ -1326,21 +1326,19 @@ static int iscsit_do_rx_data(
struct iscsi_conn *conn,
struct iscsi_data_count *count)
{
- int data = count->data_length, rx_loop = 0, total_rx = 0, iov_len;
- struct kvec *iov_p;
+ int data = count->data_length, rx_loop = 0, total_rx = 0;
struct msghdr msg;
if (!conn || !conn->sock || !conn->conn_ops)
return -1;
memset(&msg, 0, sizeof(struct msghdr));
-
- iov_p = count->iov;
- iov_len = count->iov_count;
+ iov_iter_kvec(&msg.msg_iter, READ | ITER_KVEC,
+ count->iov, count->iov_count, data);
while (total_rx < data) {
- rx_loop = kernel_recvmsg(conn->sock, &msg, iov_p, iov_len,
- (data - total_rx), MSG_WAITALL);
+ rx_loop = sock_recvmsg(conn->sock, &msg,
+ (data - total_rx), MSG_WAITALL);
if (rx_loop <= 0) {
pr_debug("rx_loop: %d total_rx: %d\n",
rx_loop, total_rx);
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4199dfa..45ac95c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2649,17 +2649,10 @@ int skb_copy_datagram_iter(const struct sk_buff *from, int offset,
static inline int skb_copy_datagram_msg(const struct sk_buff *from, int offset,
struct msghdr *msg, int size)
{
- /* XXX: stripping const */
- return skb_copy_datagram_iovec(from, offset, (struct iovec *)msg->msg_iter.iov, size);
-}
-int skb_copy_and_csum_datagram_iovec(struct sk_buff *skb, int hlen,
- struct iovec *iov);
-static inline int skb_copy_and_csum_datagram_msg(struct sk_buff *skb, int hlen,
- struct msghdr *msg)
-{
- /* XXX: stripping const */
- return skb_copy_and_csum_datagram_iovec(skb, hlen, (struct iovec *)msg->msg_iter.iov);
+ return skb_copy_datagram_iter(from, offset, &msg->msg_iter, size);
}
+int skb_copy_and_csum_datagram_msg(struct sk_buff *skb, int hlen,
+ struct msghdr *msg);
int skb_copy_datagram_from_iter(struct sk_buff *skb, int offset,
struct iov_iter *from, int len);
int zerocopy_sg_from_iter(struct sk_buff *skb, struct iov_iter *frm);
@@ -2695,8 +2688,7 @@ static inline int memcpy_from_msg(void *data, struct msghdr *msg, int len)
static inline int memcpy_to_msg(struct msghdr *msg, void *data, int len)
{
- /* XXX: stripping const */
- return memcpy_toiovec((struct iovec *)msg->msg_iter.iov, data, len);
+ return copy_to_iter(data, len, &msg->msg_iter) == len ? 0 : -EFAULT;
}
struct skb_checksum_ops {
diff --git a/net/core/datagram.c b/net/core/datagram.c
index b6e303b..41075ed 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -615,27 +615,25 @@ int zerocopy_sg_from_iter(struct sk_buff *skb, struct iov_iter *from)
EXPORT_SYMBOL(zerocopy_sg_from_iter);
static int skb_copy_and_csum_datagram(const struct sk_buff *skb, int offset,
- u8 __user *to, int len,
+ struct iov_iter *to, int len,
__wsum *csump)
{
int start = skb_headlen(skb);
int i, copy = start - offset;
struct sk_buff *frag_iter;
int pos = 0;
+ int n;
/* Copy header. */
if (copy > 0) {
- int err = 0;
if (copy > len)
copy = len;
- *csump = csum_and_copy_to_user(skb->data + offset, to, copy,
- *csump, &err);
- if (err)
+ n = csum_and_copy_to_iter(skb->data + offset, copy, csump, to);
+ if (n != copy)
goto fault;
if ((len -= copy) == 0)
return 0;
offset += copy;
- to += copy;
pos = copy;
}
@@ -647,26 +645,22 @@ static int skb_copy_and_csum_datagram(const struct sk_buff *skb, int offset,
end = start + skb_frag_size(frag);
if ((copy = end - offset) > 0) {
- __wsum csum2;
- int err = 0;
- u8 *vaddr;
+ __wsum csum2 = 0;
struct page *page = skb_frag_page(frag);
+ u8 *vaddr = kmap(page);
if (copy > len)
copy = len;
- vaddr = kmap(page);
- csum2 = csum_and_copy_to_user(vaddr +
- frag->page_offset +
- offset - start,
- to, copy, 0, &err);
+ n = csum_and_copy_to_iter(vaddr + frag->page_offset +
+ offset - start, copy,
+ &csum2, to);
kunmap(page);
- if (err)
+ if (n != copy)
goto fault;
*csump = csum_block_add(*csump, csum2, pos);
if (!(len -= copy))
return 0;
offset += copy;
- to += copy;
pos += copy;
}
start = end;
@@ -691,7 +685,6 @@ static int skb_copy_and_csum_datagram(const struct sk_buff *skb, int offset,
if ((len -= copy) == 0)
return 0;
offset += copy;
- to += copy;
pos += copy;
}
start = end;
@@ -744,20 +737,19 @@ __sum16 __skb_checksum_complete(struct sk_buff *skb)
EXPORT_SYMBOL(__skb_checksum_complete);
/**
- * skb_copy_and_csum_datagram_iovec - Copy and checksum skb to user iovec.
+ * skb_copy_and_csum_datagram_msg - Copy and checksum skb to user iovec.
* @skb: skbuff
* @hlen: hardware length
- * @iov: io vector
+ * @msg: destination
*
* Caller _must_ check that skb will fit to this iovec.
*
* Returns: 0 - success.
* -EINVAL - checksum failure.
- * -EFAULT - fault during copy. Beware, in this case iovec
- * can be modified!
+ * -EFAULT - fault during copy.
*/
-int skb_copy_and_csum_datagram_iovec(struct sk_buff *skb,
- int hlen, struct iovec *iov)
+int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
+ int hlen, struct msghdr *msg)
{
__wsum csum;
int chunk = skb->len - hlen;
@@ -765,28 +757,20 @@ int skb_copy_and_csum_datagram_iovec(struct sk_buff *skb,
if (!chunk)
return 0;
- /* Skip filled elements.
- * Pretty silly, look at memcpy_toiovec, though 8)
- */
- while (!iov->iov_len)
- iov++;
-
- if (iov->iov_len < chunk) {
+ if (iov_iter_count(&msg->msg_iter) < chunk) {
if (__skb_checksum_complete(skb))
goto csum_error;
- if (skb_copy_datagram_iovec(skb, hlen, iov, chunk))
+ if (skb_copy_datagram_msg(skb, hlen, msg, chunk))
goto fault;
} else {
csum = csum_partial(skb->data, hlen, skb->csum);
- if (skb_copy_and_csum_datagram(skb, hlen, iov->iov_base,
+ if (skb_copy_and_csum_datagram(skb, hlen, &msg->msg_iter,
chunk, &csum))
goto fault;
if (csum_fold(csum))
goto csum_error;
if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE))
netdev_rx_csum_fault(skb->dev);
- iov->iov_len -= chunk;
- iov->iov_base += chunk;
}
return 0;
csum_error:
@@ -794,7 +778,7 @@ csum_error:
fault:
return -EFAULT;
}
-EXPORT_SYMBOL(skb_copy_and_csum_datagram_iovec);
+EXPORT_SYMBOL(skb_copy_and_csum_datagram_msg);
/**
* datagram_poll - generic datagram poll
--
2.1.3
^ permalink raw reply related
* [PATCH 08/12] first fruits - kill l2cap ->memcpy_fromiovec()
From: Al Viro @ 2014-12-05 5:58 UTC (permalink / raw)
To: David Miller; +Cc: netdev, linux-kernel
In-Reply-To: <20141205055623.GQ29748@ZenIV.linux.org.uk>
From: Al Viro <viro@zeniv.linux.org.uk>
Just use copy_from_iter(). That's what this method is trying to do
in all cases, in a very convoluted fashion.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
include/net/bluetooth/l2cap.h | 29 -----------------------------
net/bluetooth/6lowpan.c | 3 +--
net/bluetooth/a2mp.c | 3 +--
net/bluetooth/l2cap_core.c | 7 +++----
net/bluetooth/l2cap_sock.c | 8 --------
net/bluetooth/smp.c | 4 +---
6 files changed, 6 insertions(+), 48 deletions(-)
diff --git a/include/net/bluetooth/l2cap.h b/include/net/bluetooth/l2cap.h
index bca6fc0..692f786 100644
--- a/include/net/bluetooth/l2cap.h
+++ b/include/net/bluetooth/l2cap.h
@@ -606,10 +606,6 @@ struct l2cap_ops {
struct sk_buff *(*alloc_skb) (struct l2cap_chan *chan,
unsigned long hdr_len,
unsigned long len, int nb);
- int (*memcpy_fromiovec) (struct l2cap_chan *chan,
- unsigned char *kdata,
- struct msghdr *msg,
- int len);
};
struct l2cap_conn {
@@ -903,31 +899,6 @@ static inline long l2cap_chan_no_get_sndtimeo(struct l2cap_chan *chan)
return 0;
}
-static inline int l2cap_chan_no_memcpy_fromiovec(struct l2cap_chan *chan,
- unsigned char *kdata,
- struct msghdr *msg,
- int len)
-{
- /* Following is safe since for compiler definitions of kvec and
- * iovec are identical, yielding the same in-core layout and alignment
- */
- struct kvec *vec = (struct kvec *)msg->msg_iter.iov;
-
- while (len > 0) {
- if (vec->iov_len) {
- int copy = min_t(unsigned int, len, vec->iov_len);
- memcpy(kdata, vec->iov_base, copy);
- len -= copy;
- kdata += copy;
- vec->iov_base += copy;
- vec->iov_len -= copy;
- }
- vec++;
- }
-
- return 0;
-}
-
extern bool disable_ertm;
int l2cap_init_sockets(void);
diff --git a/net/bluetooth/6lowpan.c b/net/bluetooth/6lowpan.c
index d8c67a5..76617be 100644
--- a/net/bluetooth/6lowpan.c
+++ b/net/bluetooth/6lowpan.c
@@ -541,7 +541,7 @@ static int send_pkt(struct l2cap_chan *chan, struct sk_buff *skb,
iv.iov_len = skb->len;
memset(&msg, 0, sizeof(msg));
- iov_iter_init(&msg.msg_iter, WRITE, (struct iovec *) &iv, 1, skb->len);
+ iov_iter_kvec(&msg.msg_iter, WRITE | ITER_KVEC, &iv, 1, skb->len);
err = l2cap_chan_send(chan, &msg, skb->len);
if (err > 0) {
@@ -1050,7 +1050,6 @@ static const struct l2cap_ops bt_6lowpan_chan_ops = {
.suspend = chan_suspend_cb,
.get_sndtimeo = chan_get_sndtimeo_cb,
.alloc_skb = chan_alloc_skb_cb,
- .memcpy_fromiovec = l2cap_chan_no_memcpy_fromiovec,
.teardown = l2cap_chan_no_teardown,
.defer = l2cap_chan_no_defer,
diff --git a/net/bluetooth/a2mp.c b/net/bluetooth/a2mp.c
index 716d2a3..cedfbda 100644
--- a/net/bluetooth/a2mp.c
+++ b/net/bluetooth/a2mp.c
@@ -60,7 +60,7 @@ void a2mp_send(struct amp_mgr *mgr, u8 code, u8 ident, u16 len, void *data)
memset(&msg, 0, sizeof(msg));
- iov_iter_init(&msg.msg_iter, WRITE, (struct iovec *)&iv, 1, total_len);
+ iov_iter_kvec(&msg.msg_iter, WRITE | ITER_KVEC, &iv, 1, total_len);
l2cap_chan_send(chan, &msg, total_len);
@@ -719,7 +719,6 @@ static const struct l2cap_ops a2mp_chan_ops = {
.resume = l2cap_chan_no_resume,
.set_shutdown = l2cap_chan_no_set_shutdown,
.get_sndtimeo = l2cap_chan_no_get_sndtimeo,
- .memcpy_fromiovec = l2cap_chan_no_memcpy_fromiovec,
};
static struct l2cap_chan *a2mp_chan_open(struct l2cap_conn *conn, bool locked)
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
index 5201d61..1754040 100644
--- a/net/bluetooth/l2cap_core.c
+++ b/net/bluetooth/l2cap_core.c
@@ -2096,8 +2096,7 @@ static inline int l2cap_skbuff_fromiovec(struct l2cap_chan *chan,
struct sk_buff **frag;
int sent = 0;
- if (chan->ops->memcpy_fromiovec(chan, skb_put(skb, count),
- msg, count))
+ if (copy_from_iter(skb_put(skb, count), count, &msg->msg_iter) != count)
return -EFAULT;
sent += count;
@@ -2117,8 +2116,8 @@ static inline int l2cap_skbuff_fromiovec(struct l2cap_chan *chan,
*frag = tmp;
- if (chan->ops->memcpy_fromiovec(chan, skb_put(*frag, count),
- msg, count))
+ if (copy_from_iter(skb_put(*frag, count), count,
+ &msg->msg_iter) != count)
return -EFAULT;
sent += count;
diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index 205b298..f65caf4 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -1336,13 +1336,6 @@ static struct sk_buff *l2cap_sock_alloc_skb_cb(struct l2cap_chan *chan,
return skb;
}
-static int l2cap_sock_memcpy_fromiovec_cb(struct l2cap_chan *chan,
- unsigned char *kdata,
- struct msghdr *msg, int len)
-{
- return memcpy_from_msg(kdata, msg, len);
-}
-
static void l2cap_sock_ready_cb(struct l2cap_chan *chan)
{
struct sock *sk = chan->data;
@@ -1427,7 +1420,6 @@ static const struct l2cap_ops l2cap_chan_ops = {
.set_shutdown = l2cap_sock_set_shutdown_cb,
.get_sndtimeo = l2cap_sock_get_sndtimeo_cb,
.alloc_skb = l2cap_sock_alloc_skb_cb,
- .memcpy_fromiovec = l2cap_sock_memcpy_fromiovec_cb,
};
static void l2cap_sock_destruct(struct sock *sk)
diff --git a/net/bluetooth/smp.c b/net/bluetooth/smp.c
index 21f555b..de7dc75 100644
--- a/net/bluetooth/smp.c
+++ b/net/bluetooth/smp.c
@@ -268,7 +268,7 @@ static void smp_send_cmd(struct l2cap_conn *conn, u8 code, u16 len, void *data)
memset(&msg, 0, sizeof(msg));
- iov_iter_init(&msg.msg_iter, WRITE, (struct iovec *)iv, 2, 1 + len);
+ iov_iter_kvec(&msg.msg_iter, WRITE | ITER_KVEC, iv, 2, 1 + len);
l2cap_chan_send(chan, &msg, 1 + len);
@@ -1629,7 +1629,6 @@ static const struct l2cap_ops smp_chan_ops = {
.suspend = l2cap_chan_no_suspend,
.set_shutdown = l2cap_chan_no_set_shutdown,
.get_sndtimeo = l2cap_chan_no_get_sndtimeo,
- .memcpy_fromiovec = l2cap_chan_no_memcpy_fromiovec,
};
static inline struct l2cap_chan *smp_new_conn_cb(struct l2cap_chan *pchan)
@@ -1678,7 +1677,6 @@ static const struct l2cap_ops smp_root_chan_ops = {
.resume = l2cap_chan_no_resume,
.set_shutdown = l2cap_chan_no_set_shutdown,
.get_sndtimeo = l2cap_chan_no_get_sndtimeo,
- .memcpy_fromiovec = l2cap_chan_no_memcpy_fromiovec,
};
int smp_register(struct hci_dev *hdev)
--
2.1.3
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox