All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v3 bpf-next 5/8] veth: Add ndo_xdp_xmit
From: Jakub Kicinski @ 2018-07-24  1:02 UTC (permalink / raw)
  To: Toshiaki Makita
  Cc: netdev, Alexei Starovoitov, Daniel Borkmann, Toshiaki Makita,
	Jesper Dangaard Brouer
In-Reply-To: <20180722151308.5480-6-toshiaki.makita1@gmail.com>

On Mon, 23 Jul 2018 00:13:05 +0900, Toshiaki Makita wrote:
> From: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> 
> This allows NIC's XDP to redirect packets to veth. The destination veth
> device enqueues redirected packets to the napi ring of its peer, then
> they are processed by XDP on its peer veth device.
> This can be thought as calling another XDP program by XDP program using
> REDIRECT, when the peer enables driver XDP.
> 
> Note that when the peer veth device does not set driver xdp, redirected
> packets will be dropped because the peer is not ready for NAPI.

Often we can't redirect to devices which don't have am xdp program
installed.  In your case we can't redirect unless the peer of the
target doesn't have a program installed?  :(

Perhaps it is time to reconsider what Saeed once asked for, a flag or
attribute to enable being the destination of a XDP_REDIRECT.

> v2:
> - Drop the part converting xdp_frame into skb when XDP is not enabled.
> - Implement bulk interface of ndo_xdp_xmit.
> - Implement XDP_XMIT_FLUSH bit and drop ndo_xdp_flush.
> 
> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> ---
>  drivers/net/veth.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
> 
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 4be75c58bc6a..57187e955fea 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -17,6 +17,7 @@
>  #include <net/rtnetlink.h>
>  #include <net/dst.h>
>  #include <net/xfrm.h>
> +#include <net/xdp.h>
>  #include <linux/veth.h>
>  #include <linux/module.h>
>  #include <linux/bpf.h>
> @@ -125,6 +126,11 @@ static void *veth_ptr_to_xdp(void *ptr)
>  	return (void *)((unsigned long)ptr & ~VETH_XDP_FLAG);
>  }
>  
> +static void *veth_xdp_to_ptr(void *ptr)
> +{
> +	return (void *)((unsigned long)ptr | VETH_XDP_FLAG);
> +}
> +
>  static void veth_ptr_free(void *ptr)
>  {
>  	if (veth_is_xdp_frame(ptr))
> @@ -267,6 +273,44 @@ static struct sk_buff *veth_build_skb(void *head, int headroom, int len,
>  	return skb;
>  }
>  
> +static int veth_xdp_xmit(struct net_device *dev, int n,
> +			 struct xdp_frame **frames, u32 flags)
> +{
> +	struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
> +	struct net_device *rcv;
> +	int i, drops = 0;
> +
> +	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
> +		return -EINVAL;
> +
> +	rcv = rcu_dereference(priv->peer);
> +	if (unlikely(!rcv))
> +		return -ENXIO;
> +
> +	rcv_priv = netdev_priv(rcv);
> +	/* xdp_ring is initialized on receive side? */
> +	if (!rcu_access_pointer(rcv_priv->xdp_prog))
> +		return -ENXIO;
> +
> +	spin_lock(&rcv_priv->xdp_ring.producer_lock);
> +	for (i = 0; i < n; i++) {
> +		struct xdp_frame *frame = frames[i];
> +		void *ptr = veth_xdp_to_ptr(frame);
> +
> +		if (unlikely(xdp_ok_fwd_dev(rcv, frame->len) ||
> +			     __ptr_ring_produce(&rcv_priv->xdp_ring, ptr))) {

Would you mind sparing a few more words how this is safe vs the
.ndo_close() on the peer?  Personally I'm a bit uncomfortable with the
IFF_UP check in xdp_ok_fwd_dev(), I'm not sure what's supposed to
guarantee the device doesn't go down right after that check, or is
already down, but netdev->flags are not atomic...  

> +			xdp_return_frame_rx_napi(frame);
> +			drops++;
> +		}
> +	}
> +	spin_unlock(&rcv_priv->xdp_ring.producer_lock);
> +
> +	if (flags & XDP_XMIT_FLUSH)
> +		__veth_xdp_flush(rcv_priv);
> +
> +	return n - drops;
> +}
> +
>  static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv,
>  					struct xdp_frame *frame)
>  {
> @@ -760,6 +804,7 @@ static const struct net_device_ops veth_netdev_ops = {
>  	.ndo_features_check	= passthru_features_check,
>  	.ndo_set_rx_headroom	= veth_set_rx_headroom,
>  	.ndo_bpf		= veth_xdp,
> +	.ndo_xdp_xmit		= veth_xdp_xmit,
>  };
>  
>  #define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HW_CSUM | \

^ permalink raw reply

* [PATCH] tpm: add support for partial reads
From: Jason Gunthorpe @ 2018-07-24  2:05 UTC (permalink / raw)
  To: linux-security-module
In-Reply-To: <ee0f0ccf-f757-a7d4-bf55-316f81fb490b@intel.com>

On Mon, Jul 23, 2018 at 04:42:38PM -0700, Tadeusz Struk wrote:
> On 07/23/2018 03:08 PM, Jason Gunthorpe wrote:
> > On Mon, Jul 23, 2018 at 03:00:20PM -0700, Tadeusz Struk wrote:
> >> On 07/23/2018 02:56 PM, Jason Gunthorpe wrote:
> >>> The proposed patch doesn't clear the data_pending if the entire buffer
> >>> is not consumed, so of course it is ABI breaking, that really isn't OK.
> >> The data_pending will be cleared by the timeout handler if the user doesn't
> >> read the response fully before the timeout expires. The is the same situation
> >> if the user would not read the response at all.
> > That causes write() to fail with EBUSY
> > 
> > NAK from me on breaking the ABI like this
> 
> What if we introduce this new behavior only for the non-blocking mode
> as James suggested? Or do you have some other suggestions?

I think you should do it entirely in userspace.

But something sensible linked to O_NONBLOCK could be OK.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [PATCH v6 4/5] ARM: dts: imx7s: add i.MX7 messaging unit support
From: A.s. Dong @ 2018-07-24  2:04 UTC (permalink / raw)
  To: Lucas Stach, Oleksij Rempel, Shawn Guo, Fabio Estevam,
	Rob Herring, Mark Rutland, Vladimir Zapolskiy
  Cc: kernel@pengutronix.de, devicetree@vger.kernel.org, dl-linux-imx,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <1532365033.3163.107.camel@pengutronix.de>

> -----Original Message-----
> From: Lucas Stach [mailto:l.stach@pengutronix.de]
> Sent: Tuesday, July 24, 2018 12:57 AM
> To: Oleksij Rempel <o.rempel@pengutronix.de>; Shawn Guo
> <shawnguo@kernel.org>; Fabio Estevam <fabio.estevam@nxp.com>; Rob
> Herring <robh+dt@kernel.org>; Mark Rutland <mark.rutland@arm.com>; A.s.
> Dong <aisheng.dong@nxp.com>; Vladimir Zapolskiy
> <vladimir_zapolskiy@mentor.com>
> Cc: dl-linux-imx <linux-imx@nxp.com>; linux-arm-kernel@lists.infradead.org;
> kernel@pengutronix.de; devicetree@vger.kernel.org
> Subject: Re: [PATCH v6 4/5] ARM: dts: imx7s: add i.MX7 messaging unit
> support
> 
> Am Sonntag, den 22.07.2018, 08:39 +0200 schrieb Oleksij Rempel:
> > Define the Messaging Unit (MU) for i.MX7 in the processor's dtsi.
> > The respective driver is added in the next commit.
> >
> > > Reviewed-by: Dong Aisheng <aisheng.dong@nxp.com>
> > > Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
> > ---
> >  arch/arm/boot/dts/imx7s.dtsi | 19 +++++++++++++++++++
> >  1 file changed, 19 insertions(+)
> >
> > diff --git a/arch/arm/boot/dts/imx7s.dtsi
> > b/arch/arm/boot/dts/imx7s.dtsi index ce85b3ca1a55..191a0286fa3b 100644
> > --- a/arch/arm/boot/dts/imx7s.dtsi
> > +++ b/arch/arm/boot/dts/imx7s.dtsi
> > @@ -1001,6 +1001,25 @@
> > >  				status = "disabled";
> > >  			};
> >
> > > > +			mu0a: mailbox@30aa0000 {
> > +				compatible = "fsl,imx7s-mu", "fsl,imx6sx-mu";
> 
> Those compatibles are missing documentation in the binding.
> 

Sorry for the overlooking.
They seem to be got missed during a updated binding doc review.

Oleksij,
Would you add them back and resend the patch series?

Regards
Dong Aisheng

> > +				reg = <0x30aa0000 0x10000>;
> > > +				interrupts = <GIC_SPI 88
> IRQ_TYPE_LEVEL_HIGH>;
> > > +				clocks = <&clks IMX7D_MU_ROOT_CLK>;
> > > +				#mbox-cells = <1>;
> > > +				status = "disabled";
> > > +			};
> > +
> > > > +			mu0b: mailbox@30ab0000 {
> > > +				compatible = "fsl,imx7s-mu", "fsl,imx6sx-mu";
> > > +				reg = <0x30ab0000 0x10000>;
> > > +				interrupts = <GIC_SPI 97
> IRQ_TYPE_LEVEL_HIGH>;
> > > +				clocks = <&clks IMX7D_MU_ROOT_CLK>;
> > > +				#mbox-cells = <1>;
> > > +				fsl,mu-side-b;
> > > +				status = "disabled";
> > > +			};
> > +
> > > >  			usbotg1: usb@30b10000 {
> > >  				compatible = "fsl,imx7d-usb", "fsl,imx27-usb";
> > >  				reg = <0x30b10000 0x200>;
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* [PATCH v6 4/5] ARM: dts: imx7s: add i.MX7 messaging unit support
From: A.s. Dong @ 2018-07-24  2:04 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1532365033.3163.107.camel@pengutronix.de>

> -----Original Message-----
> From: Lucas Stach [mailto:l.stach at pengutronix.de]
> Sent: Tuesday, July 24, 2018 12:57 AM
> To: Oleksij Rempel <o.rempel@pengutronix.de>; Shawn Guo
> <shawnguo@kernel.org>; Fabio Estevam <fabio.estevam@nxp.com>; Rob
> Herring <robh+dt@kernel.org>; Mark Rutland <mark.rutland@arm.com>; A.s.
> Dong <aisheng.dong@nxp.com>; Vladimir Zapolskiy
> <vladimir_zapolskiy@mentor.com>
> Cc: dl-linux-imx <linux-imx@nxp.com>; linux-arm-kernel at lists.infradead.org;
> kernel at pengutronix.de; devicetree at vger.kernel.org
> Subject: Re: [PATCH v6 4/5] ARM: dts: imx7s: add i.MX7 messaging unit
> support
> 
> Am Sonntag, den 22.07.2018, 08:39 +0200 schrieb Oleksij Rempel:
> > Define the Messaging Unit (MU) for i.MX7 in the processor's dtsi.
> > The respective driver is added in the next commit.
> >
> > > Reviewed-by: Dong Aisheng <aisheng.dong@nxp.com>
> > > Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
> > ---
> > ?arch/arm/boot/dts/imx7s.dtsi | 19 +++++++++++++++++++
> > ?1 file changed, 19 insertions(+)
> >
> > diff --git a/arch/arm/boot/dts/imx7s.dtsi
> > b/arch/arm/boot/dts/imx7s.dtsi index ce85b3ca1a55..191a0286fa3b 100644
> > --- a/arch/arm/boot/dts/imx7s.dtsi
> > +++ b/arch/arm/boot/dts/imx7s.dtsi
> > @@ -1001,6 +1001,25 @@
> > > ?				status = "disabled";
> > > ?			};
> >
> > > > +			mu0a: mailbox at 30aa0000 {
> > +				compatible = "fsl,imx7s-mu", "fsl,imx6sx-mu";
> 
> Those compatibles are missing documentation in the binding.
> 

Sorry for the overlooking.
They seem to be got missed during a updated binding doc review.

Oleksij,
Would you add them back and resend the patch series?

Regards
Dong Aisheng

> > +				reg = <0x30aa0000 0x10000>;
> > > +				interrupts = <GIC_SPI 88
> IRQ_TYPE_LEVEL_HIGH>;
> > > +				clocks = <&clks IMX7D_MU_ROOT_CLK>;
> > > +				#mbox-cells = <1>;
> > > +				status = "disabled";
> > > +			};
> > +
> > > > +			mu0b: mailbox at 30ab0000 {
> > > +				compatible = "fsl,imx7s-mu", "fsl,imx6sx-mu";
> > > +				reg = <0x30ab0000 0x10000>;
> > > +				interrupts = <GIC_SPI 97
> IRQ_TYPE_LEVEL_HIGH>;
> > > +				clocks = <&clks IMX7D_MU_ROOT_CLK>;
> > > +				#mbox-cells = <1>;
> > > +				fsl,mu-side-b;
> > > +				status = "disabled";
> > > +			};
> > +
> > > > ?			usbotg1: usb at 30b10000 {
> > > ?				compatible = "fsl,imx7d-usb", "fsl,imx27-usb";
> > > ?				reg = <0x30b10000 0x200>;

^ permalink raw reply

* Re: [PATCH] iommu/arm-smmu-v3: sync the OVACKFLG to PRIQ consumer register
From: Zhongmiao @ 2018-07-24  2:04 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Zhangshaokun,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	Leizhen (ThunderTown)
  Cc: Will Deacon


Hi ,

>"Note: In terms of delivering events, a queue in an unacknowledged overflow state does not behave any differently to normal queue state. If software were to consume events and free space but leave overflow unacknowledged, new events could be recorded."

Yeah, I agree with you . But I test  priq , It's going to be a problem.   And  See the PRI queue overflow section of the SMMUv3 specification (IHI0070A):

" Note: On detection of overflow, ARM recommends that software processes the received PRI queue entries as quickly as possible, and follows this with a single write of SMMU_PRIQ_CONS to simultaneously move on the RD pointer and acknowledge receipt of the overflow condition by setting OVACKFLG to the value read from OVFLG."

Thanks,
Miao

^ permalink raw reply

* [U-Boot] [PATCH 13/17] fs: fat: support mkdir
From: AKASHI Takahiro @ 2018-07-24  2:01 UTC (permalink / raw)
  To: u-boot
In-Reply-To: <58f7f25d-33ec-bd3f-3a03-6dca27a39be9@gmx.de>

On Fri, Jul 20, 2018 at 07:14:21PM +0200, Heinrich Schuchardt wrote:
> On 07/20/2018 04:57 AM, AKASHI Takahiro wrote:
> > In this patch, mkdir support is added to FAT file system.
> > A newly created directory contains only "." and ".." entries.
> > 
> > Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> 
> The patch does set the creation date of the directory according to the
> real time clock but to 1980-01-01T00:00:00Z.
> 
> $ ls /mnt/dir1/ -la --full-time
> drwxr-xr-x 2 root root  2048 1980-01-01 01:00:00.000000000 +0100 dir3
> 
> Please, set the time correctly if the real time clock is available.

Good point, but I'd like to put this issue into future TODO list
as it will end up introducing a new API, such as gettimeofday()?

(and this is not a FAT-specific issue.)

Thanks,
-Takahiro AKASHI

> Best regards
> 
> Heinrich
> 
> > ---
> >  fs/fat/fat_write.c | 138 +++++++++++++++++++++++++++++++++++++++++++++
> >  fs/fs.c            |   3 +-
> >  include/fat.h      |   1 +
> >  3 files changed, 141 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/fat/fat_write.c b/fs/fat/fat_write.c
> > index cc45a33876..781883c9f4 100644
> > --- a/fs/fat/fat_write.c
> > +++ b/fs/fat/fat_write.c
> > @@ -1185,3 +1185,141 @@ int file_fat_write(const char *filename, void *buffer, loff_t offset,
> >  {
> >  	return file_fat_write_at(filename, offset, buffer, maxsize, actwrite);
> >  }
> > +
> > +int fat_mkdir(const char *new_dirname)
> > +{
> > +	dir_entry *retdent;
> > +	fsdata datablock = { .fatbuf = NULL, };
> > +	fsdata *mydata = &datablock;
> > +	fat_itr *itr = NULL;
> > +	char *dirname_copy, *parent, *dirname;
> > +	char l_dirname[VFAT_MAXLEN_BYTES];
> > +	int ret = -1;
> > +	loff_t actwrite;
> > +	unsigned int bytesperclust;
> > +	dir_entry *dotdent = NULL;
> > +
> > +	dirname_copy = strdup(new_dirname);
> > +	if (!dirname_copy)
> > +		goto exit;
> > +
> > +	split_filename(dirname_copy, &parent, &dirname);
> > +	if (!strlen(dirname)) {
> > +		ret = -EINVAL;
> > +		goto exit;
> > +	}
> > +
> > +	if (normalize_longname(l_dirname, dirname)) {
> > +		printf("FAT: illegal filename (%s)\n", dirname);
> > +		ret = -EINVAL;
> > +		goto exit;
> > +	}
> > +
> > +	itr = malloc_cache_aligned(sizeof(fat_itr));
> > +	if (!itr) {
> > +		ret = -ENOMEM;
> > +		goto exit;
> > +	}
> > +
> > +	ret = fat_itr_root(itr, &datablock);
> > +	if (ret)
> > +		goto exit;
> > +
> > +	total_sector = datablock.bs_total_sect;
> > +	if (total_sector == 0)
> > +		total_sector = (int)cur_part_info.size; /* cast of lbaint_t */
> > +
> > +	ret = fat_itr_resolve(itr, parent, TYPE_DIR);
> > +	if (ret) {
> > +		printf("%s: doesn't exist (%d)\n", parent, ret);
> > +		goto exit;
> > +	}
> > +
> > +	retdent = find_directory_entry(itr, l_dirname);
> > +
> > +	if (retdent) {
> > +		printf("%s: already exists\n", l_dirname);
> > +		ret = -EEXIST;
> > +		goto exit;
> > +	} else {
> > +		if (itr->is_root) {
> > +			/* root dir cannot have "." or ".." */
> > +			if (!strcmp(l_dirname, ".") ||
> > +			    !strcmp(l_dirname, "..")) {
> > +				ret = -EINVAL;
> > +				goto exit;
> > +			}
> > +		}
> > +
> > +		if (!itr->dent) {
> > +			printf("Error: allocating new dir entry\n");
> > +			ret = -EIO;
> > +			goto exit;
> > +		}
> > +
> > +		memset(itr->dent, 0, sizeof(*itr->dent));
> > +
> > +		/* Set short name to set alias checksum field in dir_slot */
> > +		set_name(itr->dent, dirname);
> > +		fill_dir_slot(itr, dirname);
> > +
> > +		/* Set attribute as archieve for regular file */
> > +		fill_dentry(itr->fsdata, itr->dent, dirname, 0, 0,
> > +							ATTR_DIR | ATTR_ARCH);
> > +
> > +		retdent = itr->dent;
> > +	}
> > +
> > +	/* Default entries */
> > +	bytesperclust = mydata->clust_size * mydata->sect_size;
> > +	dotdent = malloc_cache_aligned(bytesperclust);
> > +	if (!dotdent) {
> > +		ret = -ENOMEM;
> > +		goto exit;
> > +	}
> > +	memset(dotdent, 0, bytesperclust);
> > +
> > +	memcpy(dotdent[0].name, ".       ", 8);
> > +	memcpy(dotdent[0].ext, "   ", 3);
> > +	dotdent[0].attr = ATTR_DIR | ATTR_ARCH;
> > +
> > +	memcpy(dotdent[1].name, "..      ", 8);
> > +	memcpy(dotdent[1].ext, "   ", 3);
> > +	dotdent[1].attr = ATTR_DIR | ATTR_ARCH;
> > +	set_start_cluster(mydata, &dotdent[1], itr->start_clust);
> > +
> > +	ret = set_contents(mydata, retdent, 0, (__u8 *)dotdent, bytesperclust,
> > +								&actwrite);
> > +	if (ret < 0) {
> > +		printf("Error: writing contents\n");
> > +		goto exit;
> > +	}
> > +	/* Write twice for "." */
> > +	set_start_cluster(mydata, &dotdent[0], START(retdent));
> > +	ret = set_contents(mydata, retdent, 0, (__u8 *)dotdent, bytesperclust,
> > +								&actwrite);
> > +	if (ret < 0) {
> > +		printf("Error: writing contents\n");
> > +		goto exit;
> > +	}
> > +
> > +	/* Flush fat buffer */
> > +	ret = flush_dirty_fat_buffer(mydata);
> > +	if (ret) {
> > +		printf("Error: flush fat buffer\n");
> > +		goto exit;
> > +	}
> > +
> > +	/* Write directory table to device */
> > +	ret = set_cluster(mydata, itr->clust, itr->block,
> > +			mydata->clust_size * mydata->sect_size);
> > +	if (ret)
> > +		printf("Error: writing directory entry\n");
> > +
> > +exit:
> > +	free(dirname_copy);
> > +	free(mydata->fatbuf);
> > +	free(itr);
> > +	free(dotdent);
> > +	return ret;
> > +}
> > diff --git a/fs/fs.c b/fs/fs.c
> > index 3cb6b21fe9..a92e060296 100644
> > --- a/fs/fs.c
> > +++ b/fs/fs.c
> > @@ -164,14 +164,15 @@ static struct fstype_info fstypes[] = {
> >  		.read = fat_read_file,
> >  #ifdef CONFIG_FAT_WRITE
> >  		.write = file_fat_write,
> > +		.mkdir = fat_mkdir,
> >  #else
> >  		.write = fs_write_unsupported,
> > +		.mkdir = fs_mkdir_unsupported,
> >  #endif
> >  		.uuid = fs_uuid_unsupported,
> >  		.opendir = fat_opendir,
> >  		.readdir = fat_readdir,
> >  		.closedir = fat_closedir,
> > -		.mkdir = fs_mkdir_unsupported,
> >  	},
> >  #endif
> >  #ifdef CONFIG_FS_EXT4
> > diff --git a/include/fat.h b/include/fat.h
> > index 295da0f243..039b6e03de 100644
> > --- a/include/fat.h
> > +++ b/include/fat.h
> > @@ -237,5 +237,6 @@ int fat_read_file(const char *filename, void *buf, loff_t offset, loff_t len,
> >  int fat_opendir(const char *filename, struct fs_dir_stream **dirsp);
> >  int fat_readdir(struct fs_dir_stream *dirs, struct fs_dirent **dentp);
> >  void fat_closedir(struct fs_dir_stream *dirs);
> > +int fat_mkdir(const char *dirname);
> >  void fat_close(void);
> >  #endif /* _FAT_H_ */
> > 
> 

^ permalink raw reply

* Re: [PATCH 2/2 v2] Add support for CPCAP regulators on Tegra devices.
From: Peter Geis @ 2018-07-24  1:57 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: broonie, lgirdwood, robh+dt, linux-kernel, devicetree,
	linux-tegra
In-Reply-To: <4588004.Gnq6nbUl2V@dimapc>



On 07/23/2018 08:27 PM, Dmitry Osipenko wrote:
> On Monday, 23 July 2018 22:38:48 MSK Peter Geis wrote:
>> Added support for the CPCAP power management regulator functions on
>> Tegra devices.
>> Added sw2_sw4 value tables, which provide power to the Tegra core and
>> aux devices.
>> Added the Tegra init tables and device tree compatibility match.
>>
>> Signed-off-by: Peter Geis <pgwipeout@gmail.com>
>> ---
>>   .../bindings/regulator/cpcap-regulator.txt    |  1 +
>>   drivers/regulator/cpcap-regulator.c           | 80 +++++++++++++++++++
>>   2 files changed, 81 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/regulator/cpcap-regulator.txt
>> b/Documentation/devicetree/bindings/regulator/cpcap-regulator.txt index
>> 675f4437ce92..3e2d33ab1731 100644
>> --- a/Documentation/devicetree/bindings/regulator/cpcap-regulator.txt
>> +++ b/Documentation/devicetree/bindings/regulator/cpcap-regulator.txt
>> @@ -4,6 +4,7 @@ Motorola CPCAP PMIC voltage regulators
>>   Requires node properties:
>>   - "compatible" value one of:
>>       "motorola,cpcap-regulator"
>> +    "motorola,tegra-cpcap-regulator"
>>       "motorola,mapphone-cpcap-regulator"
>>
>>   Required regulator properties:
>> diff --git a/drivers/regulator/cpcap-regulator.c
>> b/drivers/regulator/cpcap-regulator.c index c0b1e04bd90f..cb3774be445d
>> 100644
>> --- a/drivers/regulator/cpcap-regulator.c
>> +++ b/drivers/regulator/cpcap-regulator.c
>> @@ -412,6 +412,82 @@ static struct cpcap_regulator omap4_regulators[] = {
>>   	{ /* sentinel */ },
>>   };
>>
>> +static struct cpcap_regulator tegra_regulators[] = {
>> +	CPCAP_REG(SW1, CPCAP_REG_S1C1, CPCAP_REG_ASSIGN2,
>> +		  CPCAP_BIT_SW1_SEL, unknown_val_tbl,
>> +		  0, 0, 0, 0, 0, 0),
>> +	CPCAP_REG(SW2, CPCAP_REG_S2C1, CPCAP_REG_ASSIGN2,
>> +		  CPCAP_BIT_SW2_SEL, sw2_sw4_val_tbl,
>> +		  0xf00, 0x7f, 0, 0x800, 0, 120),
>> +	CPCAP_REG(SW3, CPCAP_REG_S3C, CPCAP_REG_ASSIGN2,
>> +		  CPCAP_BIT_SW3_SEL, unknown_val_tbl,
>> +		  0, 0, 0, 0, 0, 0),
>> +	CPCAP_REG(SW4, CPCAP_REG_S4C1, CPCAP_REG_ASSIGN2,
>> +		  CPCAP_BIT_SW4_SEL, sw2_sw4_val_tbl,
>> +		  0xf00, 0x7f, 0, 0x900, 0, 100),
>> +	CPCAP_REG(SW5, CPCAP_REG_S5C, CPCAP_REG_ASSIGN2,
>> +		  CPCAP_BIT_SW5_SEL, sw5_val_tbl,
>> +		  0x2a, 0, 0, 0x22, 0, 0),
>> +	CPCAP_REG(SW6, CPCAP_REG_S6C, CPCAP_REG_ASSIGN2,
>> +		  CPCAP_BIT_SW6_SEL, unknown_val_tbl,
>> +		  0, 0, 0, 0, 0, 0),
>> +	CPCAP_REG(VCAM, CPCAP_REG_VCAMC, CPCAP_REG_ASSIGN2,
>> +		  CPCAP_BIT_VCAM_SEL, vcam_val_tbl,
>> +		  0x87, 0x30, 4, 0x7, 0, 420),
>> +	CPCAP_REG(VCSI, CPCAP_REG_VCSIC, CPCAP_REG_ASSIGN3,
>> +		  CPCAP_BIT_VCSI_SEL, vcsi_val_tbl,
>> +		  0x47, 0x10, 4, 0x7, 0, 350),
>> +	CPCAP_REG(VDAC, CPCAP_REG_VDACC, CPCAP_REG_ASSIGN3,
>> +		  CPCAP_BIT_VDAC_SEL, vdac_val_tbl,
>> +		  0x87, 0x30, 4, 0x3, 0, 420),
>> +	CPCAP_REG(VDIG, CPCAP_REG_VDIGC, CPCAP_REG_ASSIGN2,
>> +		  CPCAP_BIT_VDIG_SEL, vdig_val_tbl,
>> +		  0x87, 0x30, 4, 0x5, 0, 420),
>> +	CPCAP_REG(VFUSE, CPCAP_REG_VFUSEC, CPCAP_REG_ASSIGN3,
>> +		  CPCAP_BIT_VFUSE_SEL, vfuse_val_tbl,
>> +		  0x80, 0xf, 0, 0x80, 0, 420),
>> +	CPCAP_REG(VHVIO, CPCAP_REG_VHVIOC, CPCAP_REG_ASSIGN3,
>> +		  CPCAP_BIT_VHVIO_SEL, vhvio_val_tbl,
>> +		  0x17, 0, 0, 0x2, 0, 0),
>> +	CPCAP_REG(VSDIO, CPCAP_REG_VSDIOC, CPCAP_REG_ASSIGN2,
>> +		  CPCAP_BIT_VSDIO_SEL, vsdio_val_tbl,
>> +		  0x87, 0x38, 3, 0x2, 0, 420),
>> +	CPCAP_REG(VPLL, CPCAP_REG_VPLLC, CPCAP_REG_ASSIGN3,
>> +		  CPCAP_BIT_VPLL_SEL, vpll_val_tbl,
>> +		  0x43, 0x18, 3, 0x1, 0, 420),
>> +	CPCAP_REG(VRF1, CPCAP_REG_VRF1C, CPCAP_REG_ASSIGN3,
>> +		  CPCAP_BIT_VRF1_SEL, vrf1_val_tbl,
>> +		  0xac, 0x2, 1, 0xc, 0, 10),
>> +	CPCAP_REG(VRF2, CPCAP_REG_VRF2C, CPCAP_REG_ASSIGN3,
>> +		  CPCAP_BIT_VRF2_SEL, vrf2_val_tbl,
>> +		  0x23, 0x8, 3, 0x3, 0, 10),
>> +	CPCAP_REG(VRFREF, CPCAP_REG_VRFREFC, CPCAP_REG_ASSIGN3,
>> +		  CPCAP_BIT_VRFREF_SEL, vrfref_val_tbl,
>> +		  0x23, 0x8, 3, 0x3, 0, 420),
>> +	CPCAP_REG(VWLAN1, CPCAP_REG_VWLAN1C, CPCAP_REG_ASSIGN3,
>> +		  CPCAP_BIT_VWLAN1_SEL, vwlan1_val_tbl,
>> +		  0x47, 0x10, 4, 0x5, 0, 420),
>> +	CPCAP_REG(VWLAN2, CPCAP_REG_VWLAN2C, CPCAP_REG_ASSIGN3,
>> +		  CPCAP_BIT_VWLAN2_SEL, vwlan2_val_tbl,
>> +		  0x20c, 0xc0, 6, 0x8, 0, 420),
>> +	CPCAP_REG(VSIM, CPCAP_REG_VSIMC, CPCAP_REG_ASSIGN3,
>> +		  0xffff, vsim_val_tbl,
>> +		  0x23, 0x8, 3, 0x3, 0, 420),
>> +	CPCAP_REG(VSIMCARD, CPCAP_REG_VSIMC, CPCAP_REG_ASSIGN3,
>> +		  0xffff, vsimcard_val_tbl,
>> +		  0x1e80, 0x8, 3, 0x1e00, 0, 420),
>> +	CPCAP_REG(VVIB, CPCAP_REG_VVIBC, CPCAP_REG_ASSIGN3,
>> +		  CPCAP_BIT_VVIB_SEL, vvib_val_tbl,
>> +		  0x1, 0xc, 2, 0, 0x1, 500),
>> +	CPCAP_REG(VUSB, CPCAP_REG_VUSBC, CPCAP_REG_ASSIGN3,
>> +		  CPCAP_BIT_VUSB_SEL, vusb_val_tbl,
>> +		  0x11c, 0x40, 6, 0xc, 0, 0),
>> +	CPCAP_REG(VAUDIO, CPCAP_REG_VAUDIOC, CPCAP_REG_ASSIGN4,
>> +		  CPCAP_BIT_VAUDIO_SEL, vaudio_val_tbl,
>> +		  0x16, 0x1, 0, 0x4, 0, 0),
>> +	{ /* sentinel */ },
>> +};
>> +
>>   static const struct of_device_id cpcap_regulator_id_table[] = {
>>   	{
>>   		.compatible = "motorola,cpcap-regulator",
>> @@ -420,6 +496,10 @@ static const struct of_device_id
>> cpcap_regulator_id_table[] = { .compatible =
>> "motorola,mapphone-cpcap-regulator",
>>   		.data = omap4_regulators,
>>   	},
>> +	{
>> +		.compatible = "motorola,tegra-cpcap-regulator",
>> +		.data = tegra_regulators,
>> +	},
>>   	{},
>>   };
>>   MODULE_DEVICE_TABLE(of, cpcap_regulator_id_table);
> 
> Are those cpcap_regulator values really common for all Tegra's? Probably
> better to name the DT compatible like "motorola,xoom-cpcap-regulator" or
> "motorola,mz602-cpcap-regulator".
> 
> The DT binding documentation should be updated with the new compatible value
> in the Documentation/devicetree/bindings/regulator/cpcap-regulator.txt
> 

I see your point, from my research the only other device that ever 
paired a Tegra with a cpcap was the Atrix 4g, and it has completely 
different values.
For now I will change the value to "motorola,xoom-cpcap-regulator", 
since these values are common across the entire Xoom line.
The DT binding documentation was updated as part of this patch, it will 
be adjusted to the new value.

^ permalink raw reply

* Re: [PATCH net-next] virtio_net: force_napi_tx module param.
From: Stephen Hemminger @ 2018-07-24  0:53 UTC (permalink / raw)
  To: Caleb Raitto; +Cc: mst, jasowang, davem, netdev, Caleb Raitto
In-Reply-To: <20180723231119.142904-1-caleb.raitto@gmail.com>

On Mon, 23 Jul 2018 16:11:19 -0700
Caleb Raitto <caleb.raitto@gmail.com> wrote:

> From: Caleb Raitto <caraitto@google.com>
> 
> The driver disables tx napi if it's not certain that completions will
> be processed affine with tx service.
> 
> Its heuristic doesn't account for some scenarios where it is, such as
> when the queue pair count matches the core but not hyperthread count.
> 
> Allow userspace to override the heuristic. This is an alternative
> solution to that in the linked patch. That added more logic in the
> kernel for these cases, but the agreement was that this was better left
> to user control.
> 
> Do not expand the existing napi_tx variable to a ternary value,
> because doing so can break user applications that expect
> boolean ('Y'/'N') instead of integer output. Add a new param instead.
> 
> Link: https://patchwork.ozlabs.org/patch/725249/
> Acked-by: Willem de Bruijn <willemb@google.com>
> Acked-by: Jon Olson <jonolson@google.com>
> Signed-off-by: Caleb Raitto <caraitto@google.com>
> ---

Not a fan of this.
Module parameters are frowned on by the distributions because they
never get well tested and they force the user to do magic things
to enable features. It looks like you are using it to paper
over a bug in this case.

^ permalink raw reply

* Re: [PATCH v2 2/2] PCI: NVMe device specific reset quirk
From: Alex Williamson @ 2018-07-24  1:57 UTC (permalink / raw)
  To: Sinan Kaya; +Cc: linux-pci, linux-kernel, linux-nvme
In-Reply-To: <abc86e6b-7df5-a84c-7e98-107c605812c6@kernel.org>

On Mon, 23 Jul 2018 17:40:02 -0700
Sinan Kaya <okaya@kernel.org> wrote:

> On 7/23/2018 5:13 PM, Alex Williamson wrote:
> > + * The NVMe specification requires that controllers support PCIe FLR, but
> > + * but some Samsung SM961/PM961 controllers fail to recover after FLR (-1
> > + * config space) unless the device is quiesced prior to FLR.  
> 
> Does disabling the memory bit in PCI config space as part of the FLR 
> reset function help? (like the very first thing)

No, it does not.  I modified this to only clear PCI_COMMAND_MEMORY and
call pcie_flr(), the Samsung controller dies just as it did previously.
 
> Can we do that in the pcie_flr() function to cover other endpoint types
> that might be pushing traffic while code is trying to do a reset?

Do you mean PCI_COMMAND_MASTER rather than PCI_COMMAND_MEMORY?  I tried
that too, it doesn't work either.  I'm not really sure the theory
behind clearing memory, clearing busmaster to stop DMA seems like a
sane thing to do, but doesn't help here.  Thanks,

Alex

^ permalink raw reply

* Re: [PATCH v2 2/2] PCI: NVMe device specific reset quirk
From: Alex Williamson @ 2018-07-24  1:57 UTC (permalink / raw)
  To: Sinan Kaya; +Cc: linux-pci, linux-kernel, linux-nvme
In-Reply-To: <abc86e6b-7df5-a84c-7e98-107c605812c6@kernel.org>

On Mon, 23 Jul 2018 17:40:02 -0700
Sinan Kaya <okaya@kernel.org> wrote:

> On 7/23/2018 5:13 PM, Alex Williamson wrote:
> > + * The NVMe specification requires that controllers support PCIe FLR, but
> > + * but some Samsung SM961/PM961 controllers fail to recover after FLR (-1
> > + * config space) unless the device is quiesced prior to FLR.  
> 
> Does disabling the memory bit in PCI config space as part of the FLR 
> reset function help? (like the very first thing)

No, it does not.  I modified this to only clear PCI_COMMAND_MEMORY and
call pcie_flr(), the Samsung controller dies just as it did previously.
 
> Can we do that in the pcie_flr() function to cover other endpoint types
> that might be pushing traffic while code is trying to do a reset?

Do you mean PCI_COMMAND_MASTER rather than PCI_COMMAND_MEMORY?  I tried
that too, it doesn't work either.  I'm not really sure the theory
behind clearing memory, clearing busmaster to stop DMA seems like a
sane thing to do, but doesn't help here.  Thanks,

Alex

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply

* [PATCH v2 2/2] PCI: NVMe device specific reset quirk
From: Alex Williamson @ 2018-07-24  1:57 UTC (permalink / raw)

In-Reply-To: <abc86e6b-7df5-a84c-7e98-107c605812c6@kernel.org>

On Mon, 23 Jul 2018 17:40:02 -0700
Sinan Kaya <okaya@kernel.org> wrote:

> On 7/23/2018 5:13 PM, Alex Williamson wrote:
> > + * The NVMe specification requires that controllers support PCIe FLR, but
> > + * but some Samsung SM961/PM961 controllers fail to recover after FLR (-1
> > + * config space) unless the device is quiesced prior to FLR.  
> 
> Does disabling the memory bit in PCI config space as part of the FLR 
> reset function help? (like the very first thing)

No, it does not.  I modified this to only clear PCI_COMMAND_MEMORY and
call pcie_flr(), the Samsung controller dies just as it did previously.
 
> Can we do that in the pcie_flr() function to cover other endpoint types
> that might be pushing traffic while code is trying to do a reset?

Do you mean PCI_COMMAND_MASTER rather than PCI_COMMAND_MEMORY?  I tried
that too, it doesn't work either.  I'm not really sure the theory
behind clearing memory, clearing busmaster to stop DMA seems like a
sane thing to do, but doesn't help here.  Thanks,

Alex

^ permalink raw reply

* Re: [PATCH v2] systemd: Allow custom coredump config options
From: Andre McCurdy @ 2018-07-24  1:56 UTC (permalink / raw)
  To: Alistair Francis; +Cc: OE Core mailing list
In-Reply-To: <CAKmqyKNrUdU9wcUH8no9J_XNE=4ztujQ95tap7Zk5HBWv9XT9g@mail.gmail.com>

On Mon, Jul 23, 2018 at 4:16 PM, Alistair Francis <alistair23@gmail.com> wrote:
> On Wed, Jul 18, 2018 at 4:48 PM, Andre McCurdy <armccurdy@gmail.com> wrote:
>> On Wed, Jul 18, 2018 at 3:53 PM, Alistair Francis
>> <alistair.francis@wdc.com> wrote:
>>> If the user has enabled coredump let's allow them to customise the
>>> config options.
>>>
>>> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
>>> ---
>>>  meta/recipes-core/systemd/systemd_239.bb | 12 ++++++++++++
>>>  1 file changed, 12 insertions(+)
>>>
>>> diff --git a/meta/recipes-core/systemd/systemd_239.bb b/meta/recipes-core/systemd/systemd_239.bb
>>> index 7822548993..a625d060a9 100644
>>> --- a/meta/recipes-core/systemd/systemd_239.bb
>>> +++ b/meta/recipes-core/systemd/systemd_239.bb
>>> @@ -279,6 +279,18 @@ do_install() {
>>>                         chown polkitd:root ${D}${datadir}/polkit-1/rules.d
>>>                 fi
>>>         fi
>>> +
>>> +  # If coredump was enabled, enable it in the config.
>>> +  # This just sets the default, but can be used to customise the values.
>>> +  if ${@bb.utils.contains('PACKAGECONFIG', 'coredump', 'true', 'false', d)}; then
>>> +    sed 's|#Storage.*|Storage=external|g' -i ${D}${sysconfdir}/systemd/coredump.conf
>>> +    sed 's|#Compress.*|Compress=yes|g' -i ${D}${sysconfdir}/systemd/coredump.conf
>>> +    sed 's|#ProcessSizeMax.*|ProcessSizeMax=2G|g' -i ${D}${sysconfdir}/systemd/coredump.conf
>>> +    sed 's|#ExternalSizeMax.*|ExternalSizeMax=2G|g' -i ${D}${sysconfdir}/systemd/coredump.conf
>>> +    sed 's|#JournalSizeMax.*|JournalSizeMax=767M|g' -i ${D}${sysconfdir}/systemd/coredump.conf
>>> +    sed 's|#MaxUse.*|MaxUse=|g' -i ${D}${sysconfdir}/systemd/coredump.conf
>>> +    sed 's|#KeepFree.*|KeepFree=|g' -i ${D}${sysconfdir}/systemd/coredump.conf
>>
>> How does this allow the user to customise the config options?
>
> They can edit the sed command line and have different options.

Editing the sed commands won't help users who prefer to customise
their builds via a .bbappend or by over-riding variables from
local.conf.

For users who edit recipes directly, providing a place-holder may not
be very useful either since those users can edit the recipe to their
liking and add any sed commands etc they need anyway. So this change
would be useful to quite a narrow audience.

>> By forcing the options to the _current_ default values you are
>> creating a maintenance problem - ie you'll still be forcing these
>> values even if/when the upstream defaults are changed.
>
> That is a good point, is there a usual way to allow edits to config
> files like this?

There are a few different ways. If an upstream reference config file
is really unusable then it might be OK to patch it from the recipe.

Perhaps the ideal case though is for the configure process to provide
control over any key values etc in the files which get installed. e.g.
instead of shipping and installing a fixed "options.conf" file, the
upstream source would ship an "options.conf.in" file which would get
transformed into "options.conf" as part of running the configure
process, based on configure options or auto detection etc in the
configure script. Configuring a build via configure options is
generally the preferred approach (better than patching or running sed
on output files etc).


^ permalink raw reply

* [PATCH 1/2] block: move dif_prepare/dif_complete functions to block layer
From: Martin K. Petersen @ 2018-07-24  1:54 UTC (permalink / raw)

In-Reply-To: <20180723072854.GA18365@lst.de>


Christoph,

>> +void blk_integrity_dif_prepare(struct request *rq, u8 protection_type,
>> +			       u32 ref_tag)
>> +{
>
> Maybe call this blk_t10_pi_prepare?

The rest of these functions have a blk_integrity_ prefix. So either
stick with that or put the functions in t10-pi.c and use a t10_pi_
prefix.

I'm a bit torn on placement since the integrity metadata could contain
other stuff than T10 PI. But the remapping is very specific to T10 PI.

>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index 9421d9877730..4186bf027c59 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -1119,7 +1119,9 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
>>  		SCpnt->cmnd[0] = WRITE_6;
>>  
>>  		if (blk_integrity_rq(rq))
>> -			sd_dif_prepare(SCpnt);
>> +			blk_integrity_dif_prepare(SCpnt->request,
>> +						  sdkp->protection_type,
>> +						  scsi_prot_ref_tag(SCpnt));
>
> scsi_prot_ref_tag could be move to the block layer as it only uses
> the sector in the eequest and the sector size, which we can get
> from the gendisk as well.  We then don't need to pass it to the function.

For Type 2, the PI can be at intervals different from the logical block
size (although we don't support that yet). We should use the
blk_integrity profile interval instead of assuming sector size.

And wrt. Keith's comment: The tuple_size should be the one from the
integrity profile as well, not sizeof(struct t10_pi_tuple).

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: [PATCH 1/2] block: move dif_prepare/dif_complete functions to block layer
From: Martin K. Petersen @ 2018-07-24  1:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Max Gurtovoy, martin.petersen, linux-block, axboe, keith.busch,
	linux-nvme, sagi
In-Reply-To: <20180723072854.GA18365@lst.de>


Christoph,

>> +void blk_integrity_dif_prepare(struct request *rq, u8 protection_type,
>> +			       u32 ref_tag)
>> +{
>
> Maybe call this blk_t10_pi_prepare?

The rest of these functions have a blk_integrity_ prefix. So either
stick with that or put the functions in t10-pi.c and use a t10_pi_
prefix.

I'm a bit torn on placement since the integrity metadata could contain
other stuff than T10 PI. But the remapping is very specific to T10 PI.

>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index 9421d9877730..4186bf027c59 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -1119,7 +1119,9 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
>>  		SCpnt->cmnd[0] = WRITE_6;
>>  
>>  		if (blk_integrity_rq(rq))
>> -			sd_dif_prepare(SCpnt);
>> +			blk_integrity_dif_prepare(SCpnt->request,
>> +						  sdkp->protection_type,
>> +						  scsi_prot_ref_tag(SCpnt));
>
> scsi_prot_ref_tag could be move to the block layer as it only uses
> the sector in the eequest and the sector size, which we can get
> from the gendisk as well.  We then don't need to pass it to the function.

For Type 2, the PI can be at intervals different from the logical block
size (although we don't support that yet). We should use the
blk_integrity profile interval instead of assuming sector size.

And wrt. Keith's comment: The tuple_size should be the one from the
integrity profile as well, not sizeof(struct t10_pi_tuple).

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* [PATCH net-next] tcp: ack immediately when a cwr packet arrives
From: Lawrence Brakmo @ 2018-07-24  0:49 UTC (permalink / raw)
  To: netdev
  Cc: Kernel Team, Alexei Starovoitov, Neal Cardwell, Yuchung Cheng,
	Eric Dumazet

We observed high 99 and 99.9% latencies when doing RPCs with DCTCP. The
problem is triggered when the last packet of a request arrives CE
marked. The reply will carry the ECE mark causing TCP to shrink its cwnd
to 1 (because there are no packets in flight). When the 1st packet of
the next request arrives, the ACK was sometimes delayed even though it
is CWR marked, adding up to 40ms to the RPC latency.

This patch insures that CWR marked data packets arriving will be acked
immediately.

Packetdrill script to reproduce the problem:

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
0.110 < [ect0] . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4

0.200 < [ect0] . 1:1001(1000) ack 1 win 257
0.200 > [ect01] . 1:1(0) ack 1001

0.200 write(4, ..., 1) = 1
0.200 > [ect01] P. 1:2(1) ack 1001

0.200 < [ect0] . 1001:2001(1000) ack 2 win 257
0.200 write(4, ..., 1) = 1
0.200 > [ect01] P. 2:3(1) ack 2001

0.200 < [ect0] . 2001:3001(1000) ack 3 win 257
0.200 < [ect0] . 3001:4001(1000) ack 3 win 257
0.200 > [ect01] . 3:3(0) ack 4001

0.210 < [ce] P. 4001:4501(500) ack 3 win 257

+0.001 read(4, ..., 4500) = 4500
+0 write(4, ..., 1) = 1
+0 > [ect01] PE. 3:4(1) ack 4501

+0.010 < [ect0] W. 4501:5501(1000) ack 4 win 257
// Previously the ACK sequence below would be 4501, causing a long RTO
+0.040~+0.045 > [ect01] . 4:4(0) ack 5501   // delayed ack

+0.311 < [ect0] . 5501:6501(1000) ack 4 win 257  // More data
+0 > [ect01] . 4:4(0) ack 6501     // now acks everything

+0.500 < F. 9501:9501(0) ack 4 win 257

Modified based on comments by Neal Cardwell <ncardwell@google.com>

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
 net/ipv4/tcp_input.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 91dbb9afb950..2370fd79c5c5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -246,8 +246,15 @@ static void tcp_ecn_queue_cwr(struct tcp_sock *tp)
 
 static void tcp_ecn_accept_cwr(struct tcp_sock *tp, const struct sk_buff *skb)
 {
-	if (tcp_hdr(skb)->cwr)
+	if (tcp_hdr(skb)->cwr) {
 		tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR;
+
+		/* If the sender is telling us it has entered CWR, then its
+		 * cwnd may be very low (even just 1 packet), so we should ACK
+		 * immediately.
+		 */
+		tcp_enter_quickack_mode((struct sock *)tp, 2);
+	}
 }
 
 static void tcp_ecn_withdraw_cwr(struct tcp_sock *tp)
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH 3/3] microblaze: add endianness options to LDFLAGS instead of LD
From: Masahiro Yamada @ 2018-07-24  1:53 UTC (permalink / raw)
  To: Michal Simek; +Cc: Masahiro Yamada, Linux Kernel Mailing List, linux-arch
In-Reply-To: <1530580921-23340-4-git-send-email-yamada.masahiro@socionext.com>

Hi Michal,

Ping?


2018-07-03 10:22 GMT+09:00 Masahiro Yamada <yamada.masahiro@socionext.com>:
> With the recent syntax extension, Kconfig is now able to evaluate the
> compiler / toolchain capability.
>
> However, accumulating flags to 'LD' is not compatible with the way
> it works; 'LD' must be passed to Kconfig to call $(ld-option,...)
> from Kconfig files.  If you tweak 'LD' in arch Makefile depending on
> CONFIG_CPU_BIG_ENDIAN, this would end up with circular dependency
> between Makefile and Kconfig.
>
> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
> ---
>
>  arch/microblaze/Makefile | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/microblaze/Makefile b/arch/microblaze/Makefile
> index d269dd4b..7333036 100644
> --- a/arch/microblaze/Makefile
> +++ b/arch/microblaze/Makefile
> @@ -40,11 +40,11 @@ CPUFLAGS-$(CONFIG_XILINX_MICROBLAZE0_USE_PCMP_INSTR) += -mxl-pattern-compare
>  ifdef CONFIG_CPU_BIG_ENDIAN
>  KBUILD_CFLAGS += -mbig-endian
>  KBUILD_AFLAGS += -mbig-endian
> -LD += -EB
> +LDFLAGS += -EB
>  else
>  KBUILD_CFLAGS += -mlittle-endian
>  KBUILD_AFLAGS += -mlittle-endian
> -LD += -EL
> +LDFLAGS += -EL
>  endif
>
>  CPUFLAGS-1 += $(call cc-option,-mcpu=v$(CPU_VER))
> --
> 2.7.4
>



-- 
Best Regards
Masahiro Yamada

^ permalink raw reply

* Re: m68k allmodconfig build errors
From: Randy Dunlap @ 2018-07-24  1:52 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: LKML, Geert Uytterhoeven, linux-m68k
In-Reply-To: <878t66idai.fsf@igel.home>

On 07/20/2018 12:20 AM, Andreas Schwab wrote:
> On Jul 19 2018, Randy Dunlap <rdunlap@infradead.org> wrote:
> 
>> block/partitions/ldm.o: In function `ldm_partition':
>> ldm.c:(.text+0x1900): undefined reference to `strcmp'
>> ldm.c:(.text+0x1964): undefined reference to `strcmp'
>> drivers/rtc/rtc-proc.o: In function `is_rtc_hctosys':
>> rtc-proc.c:(.text+0x290): undefined reference to `strcmp'
>> drivers/watchdog/watchdog_pretimeout.o: In function `watchdog_register_governor':
>> (.text+0x142): undefined reference to `strcmp'
> 
> GCC has optimized strncmp to strcmp, but at a stage where macros are no
> longer available.  I think the right fix is to use strcmp directly,
> since strncmp doesn't make sense here.

Hi Andreas,

I don't see that all of these string compare fields are null-terminated.

How does one convert strncmp() users to strcmp()?

thanks,
-- 
~Randy

^ permalink raw reply

* Re: [PATCHv3 2/2] mtd: m25p80: restore the status of SPI flash when exiting
From: NeilBrown @ 2018-07-24  1:51 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Brian Norris, Zhiqiang Hou, linux-mtd, Linux Kernel,
	David Woodhouse, Boris BREZILLON, Marek Vasut, Richard Weinberger,
	Cyrille Pitchen
In-Reply-To: <20180724012226.307f578d@bbrezillon>

[-- Attachment #1: Type: text/plain, Size: 8719 bytes --]

On Tue, Jul 24 2018, Boris Brezillon wrote:

> On Tue, 24 Jul 2018 08:46:33 +1000
> NeilBrown <neilb@suse.com> wrote:
>
>> On Mon, Jul 23 2018, Brian Norris wrote:
>> 
>> > Hi Boris,
>> >
>> > On Mon, Jul 23, 2018 at 1:10 PM, Boris Brezillon
>> > <boris.brezillon@bootlin.com> wrote:  
>> >> On Mon, 23 Jul 2018 11:13:50 -0700
>> >> Brian Norris <computersforpeace@gmail.com> wrote:  
>> >>> I noticed this got merged, but I wanted to put my 2 cents in here:  
>> >>
>> >> I wish you had replied to this thread when it was posted (more than
>> >> 6 months ago). Reverting the patch now implies making some people
>> >> unhappy because they'll have to resort to their old out-of-tree
>> >> hacks :-(.  
>> >
>> > I'd say I'm sorry for not following things closely these days, but I'm
>> > not really that sorry. There are plenty of other capable hands. And if
>> > y'all shoot yourselves in the foot, so be it. This patch isn't going
>> > to blow things up, but now that I did finally notice it (because it
>> > happened to show up in a list of backports I was looking at), I
>> > thought better late than never to remind you.
>> >
>> > For way of notification: Marek already noticed that we've started down
>> > a slippery slope months ago:
>> >
>> > https://lkml.org/lkml/2018/4/8/141
>> > Re: [PATCH] mtd: spi-nor: clear Extended Address Reg on switch to
>> > 3-byte addressing.
>> >
>> > I'm not quite sure why that wasn't taken to its logical conclusion --
>> > that the hack should be reverted.
>> >
>> > This problem has been noted many times already, and we've always
>> > stayed on the side of *avoiding* this hack. A few references from a
>> > search of my email:
>> >
>> > http://lists.infradead.org/pipermail/linux-mtd/2013-March/046343.html
>> > [PATCH 1/3] mtd: m25p80: utilize dedicated 4-byte addressing commands
>> >
>> > http://lists.infradead.org/pipermail/barebox/2014-September/020682.html
>> > [RFC] MTD m25p80 3-byte addressing and boot problem
>> >
>> > http://lists.infradead.org/pipermail/linux-mtd/2015-February/057683.html
>> > [PATCH 2/2] m25p80: if supported put chip to deep power down if not used
>> >  
>> >>> On Wed, Dec 06, 2017 at 10:53:42AM +0800, Zhiqiang Hou wrote:  
>> >>> > From: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
>> >>> >
>> >>> > Restore the status to be compatible with legacy devices.
>> >>> > Take Freescale eSPI boot for example, it copies (in 3 Byte
>> >>> > addressing mode) the RCW and bootloader images from SPI flash
>> >>> > without firing a reset signal previously, so the reboot command
>> >>> > will fail without reseting the addressing mode of SPI flash.
>> >>> > This patch implement .shutdown function to restore the status
>> >>> > in reboot process, and add the same operation to the .remove
>> >>> > function.  
>> >>>
>> >>> We have previously rejected this patch multiple times, because the above
>> >>> comment demonstrates a broken product.  
>> >>
>> >> If we were to only support working HW parts, I fear Linux would not
>> >> support a lot of HW (that's even more true when it comes to flashes :P).  
>> >
>> > You stopped allowing UBI to attach to MLC NAND recently, no? That
>> > sounds like almost the same boat -- you've probably killed quite a few
>> > shitty products, if they were to use mainline directly.
>> >
>> > Anyway, that's derailing the issue. Supporting broken hardware isn't
>> > something you try to do by applying the same hack to all systems. You
>> > normally try to apply your hack as narrowly as possible. You seem to
>> > imply that below. So maybe that's a solution to move forward with. But
>> > I'd personally be just as happy to see the patch reverted.
>> >  
>> >>> You cannot guarantee that all
>> >>> reboots will invoke the .shutdown() method -- what about crashes? What
>> >>> about watchdog resets? IIUC, those will hit the same broken behavior,
>> >>> and have unexepcted behavior in your bootloader.  
>> >>
>> >> Yes, there are corner cases that are not addressed with this approach,  
>> >
>> > Is a system crash really a corner case? :D
>> >  
>> >> but it still seems to improve things. Of course, that means the
>> >> user should try to re-route all HW reset sources to SW ones (RESET input
>> >> pin muxed to the GPIO controller, watchdog generating an interrupt
>> >> instead of directly asserting the RESET output pin), which is not always
>> >> possible, but even when it's not, isn't it better to have a setup that
>> >> works fine 99% of the time instead of 50% of the time?  
>> >
>> > Perhaps, but not at the expense of future development. And
>> > realistically, no one is doing that if they have this hack. Most
>> > people won't even know that this hack is protecting them at all (so
>> > again, they won't try to mitigate the problem any further).
>> >  
>> >>> I suppose one could argue for doing this in remove(), but AIUI you're
>> >>> just papering over system bugs by introducing the shutdown() function
>> >>> here. Thus, I'd prefer we drop the shutdown() method to avoid misleading
>> >>> other users of this driver.  
>> >>
>> >> I understand your point. But if the problem is about making sure people
>> >> designing new boards get that right, why not complaining at probe time
>> >> when things are wrong?
>> >>
>> >> I mean, spi_nor_restore() seems to only do something on very specific
>> >> NORs (those on which a SW RESET does not resets the addressing
>> >> mode).  
>> >
>> > The point isn't that SW RESET doesn't reset the addressing mode -- it
>> > does on any flash I've seen. The point is that most systems are built
>> > around a stateless assumption in these flash. IIRC, there wasn't even
>> > a SW RESET command at all until these "huge" flash came around and
>> > stateful addressing modes came about. So boot ROMs and bootloaders
>> > would have to be updated to start figuring out when/how to do this SW
>> > RESET. And once two vendors start doing it differently (I'm not sure:
>> > have they done this already? I think so) it's no longer something a
>> > boot ROM will get right.
>> >
>> > The only way to get this stuff right is to have a hardware reset, or
>> > else to avoid all of the stateful modes in software.
>> >  
>> >> So, how about adding a flag that says "my board has the NOR HW
>> >> RESET pin wired" (there would be a DT props to set that flag). Then you
>> >> add a WARN_ON() when this flag is not set and a NOR chip impacted by
>> >> this bug is detected.  
>> >
>> > I'd kinda prefer the reverse. There really isn't a need to document
>> > anything for a working system (software usually can't control this
>> > RESET pin). The burden should be on the b0rked system to document
>> > where it needs unsound hacks to survive.
>> >  
>> >> This way you make sure people are informed that
>> >> they're doing something wrong, and for those who can't change their HW
>> >> (because it's already widely deployed), you have a fix that improve
>> >> things.  
>> >
>> > Or even better: put this hack behind a DT flag, so that one has to
>> > admit that their board design is broken before it will even do
>> > anything. Proposal: "linux,badly-designed-flash-reset".
>> >
>> > But, I'd prefer just (partially?) reverting this, and let the authors
>> > submit something that works. We're not obligated to keep bad hacks in
>> > the kernel.
>> >
>> > Brian  
>> 
>> One possibility that occurred to me when I was exploring this issue is
>> to revert to 3-byte mode whenever 4-byte was not actively in use.
>> So any access beyond 16Meg is:
>>  switch-to-4-byte ; perform IO ; switch to 3-byte
>> or similar.  On my hardware it would be more efficient to
>> use the 4-byte opcode to perform the IO, then reset the cached
>> 4th address byte that the NOR chip transparently remembered.
>> 
>> This adds a little overhead, but should be fairly robust.
>> It doesn't help if something goes terribly wrong while IO is happening,
>> but I don't think any other software solution does either.
>> 
>> How would you see that approach?
>
> I think the problem stands: people that have proper HW mitigation for
> this problem (NOR chip is reset when the Processor is reset) don't want
> to pay the overhead. So, even if we go for this approach, we probably
> want to only do that for broken HW.

I agree that a "my-hardware-is-suboptimal" flag is appropriate.
I was addressing the suggestion that the current approach doesn't handle
all corner cases and was suggesting a different approach that might
handle more corner-cases.  I should have been more explicit about that.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply

* Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems
From: Junio C Hamano @ 2018-07-24  1:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, Johannes Schindelin via GitGitGadget, git
In-Reply-To: <20180723012552.GA26886@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> If I understand the situation correctly, Junio is saying that he will
> continue to produce the amlog mapping, and that it contains sufficient
> information to produce the reverse mapping (which, as an aside, I did
> not even know existed -- I mostly want to go the other way, from digging
> in history to a mailing list conversation).

Yes, the reverse mapping in amlog was an experiment that did not
work well in the end.

When I use "git am" to make a commit out of a message, a
post-applypatch hook picks up the "Message-Id:" from the original
message and adds a git note to the resulting commit.  This is in
line with how the notes are meant to be used.  We have a commit
object, and a piece of information that we want to associate with
the commit object, which is not recorded as a part of the commit
object.  So we say "git notes add -m 'that piece of info' $commit"
(the message-id happens to be that piece of info in this example).

And with notes.rewriteRef, "git commit --amend" etc. would copy the
piece of info about the original commit to the rewritten commit.

	Side Note: there are a few workflow elements I do want to
	keep using but they currently *lose* the mapping info.  An
	obvious one is

	  $ git checkout -b to/pic master &&
	  ... review in MUA and then ...
	  $ git am -s mbox &&
	  ... review in tree, attempt to build, tweak, etc.
          $ git format-patch --stdout master..to/pic >P &&
          $ edit P &&
          $ git reset --hard master &&
          $ git am P

	which is far more versatile and efficient when doing certain
	transformations on the series than running "rebase -i" and
	reopening and editing the target files of the patches one by
	one in each step.  But because format-patch does not
	generate Message-Id header of the original one out of the
	commit, the post-applypatch hook run by "am" at the end of
	the steps would not have a chance to record that for the
	newly created commit.

	For this one, I think I can use "format-patch --notes=amlog"
	to produce the patch file and then teach post-applypatch
	script to pay attention to the Notes annotation without
	changing anything else to record the message id of the
	original.  Other workflow elements that lose the notes need
	to be identified and either a fix implemented or a
	workaround found for each of them.  For example, I suspect
	there is no workaround for "cherry-pick" and it would take a
	real fix.

A reverse mapping entry used to get created by post-applypatch to
map the blob that represents the notes text added to the $commit to
another text blob that contains the 40-hex of the commit object.
This is the experiment that did not work well.  As none of the later
integrator's work e.g. "commit --amend", "rebase", "cherry-pick",
etc. is about rewriting that blob, notes.rewriteRef mechanism would
not kick in, and that is understandasble.

And these (incomplete) reverse mapping entries get in the way to
maintain and correct the forward mapping.  When a commit that got
unreachable gets expired, I want "git notes prune" to remove notes
on them, and I do not want to even think about what should happen to
the entries in the notes tree that abuse the mechanism to map blobs
that are otherwise *not* even reachable from the main history.

A much more important task is to make sure that the forward mapping
that annotates invidual commits reachable from 'pu' and/or 'master' 
is maintained correctly by various tools.  From a correctly maintained
forward mapping, it should be straight forward to get a reverse mapping
if needed.

> Though personally, I do not know if there is much point in pushing it
> out, given that receivers can reverse the mapping themselves.

Before this thread, I was planning to construct and publish the
reverse mapping at the end of the day, but do so on a separate notes
ref (see above---the hacky abuse gets in the way of maintaining and
debugging the forward mapping, but a separate notes-ref that only
contains hacks is less worrysome).  But I have changed my mind and
decided not to generate or publish one.  It is sort of similar to
the way the pack .idx is constructed only by the receiver [*1*].

> Or is there some argument that there is information in the reverse map
> that _cannot_ be generated from the forward map?

I know there is no information loss (after all I was the only one
who ran that experimental hack), but there is one objection that is
still possible, even though I admit that is a weak argument.

If a plumbing "diff-{files,tree,index}" family had a sibling
"diff-notes" to compare two notes-shaped trees while pretending that
the object-name fan-out did not exist (i.e. instead, the trees being
compared is without a subtree and full of 40-hex filenames), then it
would be less cumbersome to incrementally update the reverse mapping
by reading forward mapping with something like:

	git diff-notes --raw amlog@{1} amlog

to learn the commits whose notes have changed.  But without such a
plumbing, it is cumbersome to do so correctly.  "git diff-tree -r"
could serve as a rough substitute, until the note tree grows and get
rebalanced by reorganizing the fan-out, and on the day it happens
the reverse mapper needs to read and discard ghost changes that are
only due to tree reorganizing [*2*].


[Footnotes]

*1* Even if the sender could give one when it creates a .pack, the
    receiver would not trust that it is matches the corresponding
    .pack before using it, and the cost to validate is similar to
    the cost to generate.

*2* That makes it less efficient on that day (which hopefully would
    happen once in a blue moon) but would not affect correctness.

^ permalink raw reply

* Re: m68k allmodconfig build errors
From: Randy Dunlap @ 2018-07-24  1:49 UTC (permalink / raw)
  To: Finn Thain; +Cc: LKML, Geert Uytterhoeven, linux-m68k
In-Reply-To: <alpine.LNX.2.21.1807201454590.8@nippy.intranet>

On 07/19/2018 10:17 PM, Finn Thain wrote:
> On Thu, 19 Jul 2018, Randy Dunlap wrote:
> 
>> Hi Geert,
>>
>> I am seeing a few errors when cross-building m68k on x86_64, using the 
>> toolchain at https://mirrors.edge.kernel.org/pub/tools/crosstool/ 
>> (thanks, Arnd). (so this is gcc 8.1.0)
>>
>> block/partitions/ldm.o: In function `ldm_partition':
>> ldm.c:(.text+0x1900): undefined reference to `strcmp'
>> ldm.c:(.text+0x1964): undefined reference to `strcmp'
>> drivers/rtc/rtc-proc.o: In function `is_rtc_hctosys':
>> rtc-proc.c:(.text+0x290): undefined reference to `strcmp'
>> drivers/watchdog/watchdog_pretimeout.o: In function `watchdog_register_governor':
>> (.text+0x142): undefined reference to `strcmp'
>>
>>
>> Adding #include <linux/string.h> does not help.
>>
>> Is this a toolchain problem or drivers or something else?
>>
> 
> This gcc build was apparently configured like so:
> 
> /home/arnd/git/gcc/configure --target=m68k-linux --enable-targets=all 
> --prefix=/home/arnd/cross/x86_64/gcc-8.1.0-nolibc/m68k-linux 
> --enable-languages=c --without-headers --disable-bootstrap --disable-nls 
> --disable-threads --disable-shared --disable-libmudflap --disable-libssp 
> --disable-libgomp --disable-decimal-float --disable-libquadmath 
> --disable-libatomic --disable-libcc1 --disable-libmpx 
> --enable-checking=release
> 
> In my own cross toolchain builds strcmp comes from glibc but this 
> toolchain has no libc at all.
> 
>> help?
>>
> 
> Linux will use the strcmp in lib/string.c unless __HAVE_ARCH_STRCMP is 
> defined in the arch headers. Grep suggests that m68k, mips, x86, xtensa, 
> arc, sh, arm64, s390 all define that macro. But maybe you could just patch 
> out that definition for build testing.

Sure, that works.  Thanks.

-- 
~Randy

^ permalink raw reply

* Re: [PATCH v36 2/5] virtio_balloon: replace oom notifier with shrinker
From: Wei Wang @ 2018-07-24  1:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-dev, linux-kernel, virtualization, kvm, linux-mm, mhocko,
	akpm, torvalds, pbonzini, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel, peterx
In-Reply-To: <20180723170826-mutt-send-email-mst@kernel.org>

On 07/23/2018 10:13 PM, Michael S. Tsirkin wrote:
>>>    	vb->vb_dev_info.inode->i_mapping->a_ops = &balloon_aops;
>>>    #endif
>>> +	err = virtio_balloon_register_shrinker(vb);
>>> +	if (err)
>>> +		goto out_del_vqs;
>>> So we can get scans before device is ready. Leak will fail
>>> then. Why not register later after device is ready?
>> Probably no.
>>
>> - it would be better not to set device ready when register_shrinker failed.
> That's very rare so I won't be too worried.

Just a little confused with the point here. "very rare" means it still 
could happen (even it's a corner case), and if that happens, we got 
something wrong functionally. So it will be a bug if we change like 
that, right?

Still couldn't understand the reason of changing shrinker_register after 
device_ready (the original oom notifier was registered before setting 
device ready too)?
(I think the driver won't get shrinker_scan called if device isn't ready 
because of the reasons below)

>> - When the device isn't ready, ballooning won't happen, that is,
>> vb->num_pages will be 0, which results in shrinker_count=0 and shrinker_scan
>> won't be called.

Best,
Wei

^ permalink raw reply

* Re: [PATCH v36 2/5] virtio_balloon: replace oom notifier with shrinker
From: Wei Wang @ 2018-07-24  1:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: yang.zhang.wz, virtio-dev, riel, quan.xu0, kvm, nilal,
	liliang.opensource, linux-kernel, mhocko, linux-mm, pbonzini,
	akpm, virtualization, torvalds
In-Reply-To: <20180723170826-mutt-send-email-mst@kernel.org>

On 07/23/2018 10:13 PM, Michael S. Tsirkin wrote:
>>>    	vb->vb_dev_info.inode->i_mapping->a_ops = &balloon_aops;
>>>    #endif
>>> +	err = virtio_balloon_register_shrinker(vb);
>>> +	if (err)
>>> +		goto out_del_vqs;
>>> So we can get scans before device is ready. Leak will fail
>>> then. Why not register later after device is ready?
>> Probably no.
>>
>> - it would be better not to set device ready when register_shrinker failed.
> That's very rare so I won't be too worried.

Just a little confused with the point here. "very rare" means it still 
could happen (even it's a corner case), and if that happens, we got 
something wrong functionally. So it will be a bug if we change like 
that, right?

Still couldn't understand the reason of changing shrinker_register after 
device_ready (the original oom notifier was registered before setting 
device ready too)?
(I think the driver won't get shrinker_scan called if device isn't ready 
because of the reasons below)

>> - When the device isn't ready, ballooning won't happen, that is,
>> vb->num_pages will be 0, which results in shrinker_count=0 and shrinker_scan
>> won't be called.

Best,
Wei

^ permalink raw reply

* Re: [PATCH 5/7] x86/vdso: Add vDSO functions for direct store instructions
From: Andy Lutomirski @ 2018-07-24  1:48 UTC (permalink / raw)
  To: Fenghua Yu, Thomas Gleixner, Ingo Molnar, H Peter Anvin
  Cc: Ashok Raj, Alan Cox, Ravi V Shankar, linux-kernel, x86
In-Reply-To: <1532350557-98388-6-git-send-email-fenghua.yu@intel.com>

On 07/23/2018 05:55 AM, Fenghua Yu wrote:
> User wants to query if direct store instructions are supported and use
> the instructions. The vDSO functions provides fast interface for user
> to query the support and use the instructions.
> 
> movdiri_supported and its alias __vdso_movdiri_supported check if
> movdiri instructions are supported.
> 
> movdir64b_supported and its alias __vdso_movdir64b_supported checks
> if movdir64b instruction is supported.
> 
> movdiri32 and its alias __vdso_movdiri32 provide user APIs for calling
> 32-bit movdiri instruction.
> 
> movdiri64 and its alias __vdso_movdiri64 provide user APIs for calling
> 64-bit movdiri instruction.
> 
> movdir64b and its alias __vdso_movdir64b provide user APIs to move
> 64-byte data through movdir64b instruction.
> 
> The instructions can be implemented in intrinsic functions in future
> GCC. But the vDSO interfaces are available to user without the
> intrinsic functions support in GCC and the APIs movdiri_supported and
> movdir64b_supported cannot be implemented as GCC functions.

I'm not convinced that any of this belongs in the vDSO at all.  You 
could just add AT_HWCAP (or AT_HWCAP2) flags for the new instructions. 
Or user code could use CPUID just like for any other new instruction. 
But, if there really is some compelling reason to add this to the vDSO, 
then see below:


+
> +notrace bool __vdso_movdiri_supported(void)
> +{
> +	return _vdso_funcs_data->movdiri_supported;

return static_cpu_has(X86_FEATURE_MOVDIRI);

And all the VVAR stuff can be removed.

^ permalink raw reply

* Re: [RFC v2 1/4] mac80211: Add TXQ scheduling API
From: Rajkumar Manoharan @ 2018-07-24  0:42 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: linux-wireless, make-wifi-fast, Felix Fietkau,
	linux-wireless-owner
In-Reply-To: <877elo48ea.fsf@toke.dk>

On 2018-07-21 13:54, Toke Høiland-Jørgensen wrote:
> Rajkumar Manoharan <rmanohar@codeaurora.org> writes:
> 
>> On 2018-07-19 07:18, Toke Høiland-Jørgensen wrote:
>>> Rajkumar Manoharan <rmanohar@codeaurora.org> writes:
>>> 
>>>> On 2018-07-13 06:39, Toke Høiland-Jørgensen wrote:
>>>>> Rajkumar Manoharan <rmanohar@codeaurora.org> writes:
>> 
>>>>> It is not generally predictable how many times this will loop 
>>>>> before
>>>>> exiting...
>>>>> 
>>>> Agree.. It would be better If the driver does not worry about txq
>>>> sequence numbering. Perhaps one more API (ieee80211_first_txq) could
>>>> solve this. Will leave it to you.
>>> 
>>> That is basically what the second parameter to next_txq() does in the
>>> current patchset. It just needs to be renamed...
>>> 
>> Agree. As next_txq() increments seqno while starting loop for each AC,
>> It seems bit confusing. i.e A single ath_txq_schedule_all call will
>> bump schedule_seqno by 4. right?
> 
> Hmm, good point. Guess there should be one seqno per ac...
> 
I would prefer to keep separate list for each AC ;) Prepared one on top 
of
your earlier merged changes. Now it needs revisit.

>> Let assume below case where CPU1 is dequeuing skb from isr context and
>> CPU2 is enqueuing skbs into same txq.
>> 
>> CPU1                                          CPU2
>> ----                                          ----
>> ath_txq_schedule
>>    -> ieee80211_next_txq(first)
>>          -> inc_seq
>>          -> delete from list
>>          -> txq->seqno = local->seqno
>>                                               ieee80211_tx/fast_xmit
>>                                                  -> 
>> ieee80211_queue_skb
>>                                                      ->
>> ieee80211_schedule_txq(reset_seqno)
>>                                                           -> list_add
>>                                                           -> 
>> txqi->seqno
>> = local->seqno - 1
>> 
>> In above sequence, it is quite possible that the same txq will be
>> served repeatedly and other txqs will be starved. am I right? IMHO
>> seqno mechanism will not guarantee that txqs will be processed only
>> once in an iteration.
> 
> Yeah, ieee80211_queue_skb() shouldn't set reset_seqno; was 
> experimenting
> with that, and guess I ended up picking the wrong value. Thanks for
> pointing that out :)
> 
A simple change in argument may break algo. What would be seqno of first
packet of txq if queue_skb() isn't reset seqno? I am fine in passing 
start
of loop as arg to next_ntx(). But prefer to keep loop detecting within
mac80211 itself by tracking txq same as ath10k. So that no change is 
needed
in schedule_txq() arg list. thoughts?

>>>>> Well, it could conceivably be done in a way that doesn't require
>>>>> taking
>>>>> the activeq_lock. Adding another STOP flag to the TXQ, for 
>>>>> instance.
>>>>> From an API point of view I think that is more consistent with what
>>>>> we
>>>>> have already...
>>>>> 
>>>> 
>>>> Make sense. ieee80211_txq_may_pull would be better place to decide
>>>> whether given txq is allowed for transmission. It also makes drivers
>>>> do not have to worry about deficit. Still I may need
>>>> ieee80211_reorder_txq API after processing txq. isn't it?
>>> 
>>> The way I was assuming this would work (and what ath9k does), is that 
>>> a
>>> driver only operates on one TXQ at a time; so it can get that txq,
>>> process it, and re-schedule it. In which case no other API is needed;
>>> the rotating can be done in next_txq(), and schedule_txq() can insert
>>> the txq back into the rotation as needed.
>>> 
>>> However, it sounds like this is not how ath10k does things? See 
>>> below.
>>> 
>> correct. The current rotation works only in push-only mode. i.e when
>> firmware is not deciding txqs and the driver picks priority txq from
>> active_txqs list. Unfortunately rotation won't work when the driver
>> selects txq other than first in the list. In any case separate API
>> needed for rotation when the driver is processing few packets from txq
>> instead of all pending frames.
> 
> Rotation is not dependent on the TXQ draining completely. Dequeueing a
> few packets, then rescheduling the TXQ is fine.
> 
The problem is that when the driver accesses txq directly, it wont call
next_txq(). So the txq will not dequeued from list and schedule_txq() 
wont
be effective. right?

ieee80211_txq_may_pull() - check whether txq is allowed for tx_dequeue()
                           that helps to keep deficit check in mac80211 
itself

ieee80211_reorder_txq() - after dequeuing skb (in direct txq access),
                           txq will be deleted from list and if there are 
pending
                           frames, it will be requeued.

>>> And if so, how does that interact with ath10k_mac_tx_push_pending()?
>>> That function is basically doing the same thing that I explained 
>>> above
>>> for ath9k; in the previous version of this patch series I modified 
>>> that
>>> to use next_txq(). But is that a different TX path, or am I
>>> misunderstanding you?
>>> 
>>> If you could point me to which parts of the ath10k code I should be
>>> looking at, that would be helpful as well :)
>>> 
>> Depends on firmware htt_tx_mode (push/push_pull),
>> ath10k_mac_tx_push_pending() downloads all/partial frames to firmware.
>> Please take a look at ath10k_mac_tx_can_push() in push_pending(). Let
>> me know If you need any other details.
> 
> Right, so looking at this, it looks like the driver will loop through
> all the available TXQs, trying to dequeue from each of them if
> ath10k_mac_tx_can_push() returns true, right? This should actually work
> fine with the next_txq() / schedule_txq() API. Without airtime fairness
> that will just translate into basically what the driver is doing now,
> and I don't think more API functions are needed, as long as that is the
> only point in the driver that pushes packets to the device...
> 
In push-only mode, this will work with next_txq() / schedule_txq() APIs.
In ath10k, packets are downloaded to firmware in three places.

wake_tx_queue()
tx_push_pending()
htt_rx_tx_fetch_ind() - do txq_lookup() and push_txq()

the above mentioned new APIs are needed to take care of fetch_ind().

> With airtime fairness, what is going to happen is that the loop is only
> going to get a single TXQ (the first one with a positive deficit), then
> exit. Once that station has transmitted something and its deficit runs
> negative, it'll get rotated and the next one will get a chance. So I
> expect that airtime fairness will probably work, but MU-MIMO will
> break...
> 
Agree.. In your earlier series, you did similar changes in ath10k 
especially
in wake_tx_queue and push_pending().

> Don't think we can do MU-MIMO with a DRR airtime scheduler; we'll have
> to replace it with something different. But I think the same next_txq()
> / schedule_txq() API will work for that as well...
> 
As mentioned above, have to take of fetch_ind(). All 10.4 based chips 
(QCA99x0,
QCA9984, QCA9888, QCA4019, etc.) operates in push-pull mode, when the 
number of
connected station increased. As target does not have enough memory for 
buffering
frames for each station, it relied on host memory. ath10k driver can not 
identify
that whether the received fetch_ind is for MU-MIMO or regular 
transmission.

If we don't handle this case, then ath10k driver can not claim mac80211 
ATF support.
Agree that MU-MIMO won't work with DDR scheduler. and it will impact 
MU-MIMO performace
in ath10k when mac80211 ATF is claimed by ath10k.

-Rajkumar

^ permalink raw reply

* [U-Boot] [PATCH 11/17] fs: add mkdir interface
From: AKASHI Takahiro @ 2018-07-24  1:45 UTC (permalink / raw)
  To: u-boot
In-Reply-To: <20180723235658.GC11755@bill-the-cat>

On Mon, Jul 23, 2018 at 07:56:58PM -0400, Tom Rini wrote:
> On Mon, Jul 23, 2018 at 05:48:13PM -0600, Simon Glass wrote:
> > Hi,
> > 
> > On 20 July 2018 at 11:35, Heinrich Schuchardt <xypron.glpk@gmx.de> wrote:
> > > On 07/20/2018 04:57 AM, AKASHI Takahiro wrote:
> > >> "mkdir" interface is added to file operations.
> > >> This is a preparatory change as mkdir support for FAT file system
> > >> will be added in next patch.
> > >>
> > >> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > >> ---
> > >>  fs/fs.c      | 45 +++++++++++++++++++++++++++++++++++++++++++++
> > >>  include/fs.h | 10 ++++++++++
> > >>  2 files changed, 55 insertions(+)
> > >>
> > 
> > We need to get a proper fs test in place before we add any more stuff.
> 
> Agreed that we need more tests as part of this series.

Is this a new standard rule for new features?
Just kidding, but testing was definitely one of my concerns partly
because fs-test.sh is anyhow in an old fashion and partly because
I can't find any criteria for test coverage:
- white-box test or black-box test
- coverage for corner (or edge) cases
- some sort of stress test, etc.

> > Does someone waent to try converting fs-test.sh to pytest in test/py/tests ?
> 
> I thought there was at least a first pass at that?

Well, I don't see any file system related scripts under test/py/tests.

Please advise me.

Thanks,
-Takahiro AKASHI

> -- 
> Tom

^ permalink raw reply


This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.