All of lore.kernel.org
 help / color / mirror / Atom feed
From: Miquel Raynal <mraynal@kernel.org>
To: liaoweixiong <liaoweixiong@allwinnertech.com>
Cc: Kees Cook <keescook@chromium.org>,
	Anton Vorontsov <anton@enomsg.org>,
	Colin Cross <ccross@android.com>, Tony Luck <tony.luck@intel.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Richard Weinberger <richard@nod.at>,
	Vignesh Raghavendra <vigneshr@ti.com>,
	Mauro Carvalho Chehab <mchehab+samsung@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Rob Herring <robh@kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mtd@lists.infradead.org
Subject: Re: [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk
Date: Wed, 22 Jan 2020 18:41:14 +0100	[thread overview]
Message-ID: <20200122184114.125b42c8@xps13> (raw)
In-Reply-To: <2c6000b1-ae25-564b-911a-2879e9c244b2@allwinnertech.com>

Hello,


> >>>> +/*
> >>>> + * All zones will be read as pstore/blk will read zone one by one when do
> >>>> + * recover.
> >>>> + */
> >>>> +static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
> >>>> +{
> >>>> +	struct mtdpstore_context *cxt = &oops_cxt;
> >>>> +	size_t retlen;
> >>>> +	int ret;
> >>>> +
> >>>> +	if (mtdpstore_block_isbad(cxt, off))
> >>>> +		return -ENEXT;
> >>>> +
> >>>> +	pr_debug("try to read off 0x%llx size %zu\n", off, size);
> >>>> +	ret = mtd_read(cxt->mtd, off, size, &retlen, (u_char *)buf);
> >>>> +	if ((ret < 0 && !mtd_is_bitflip(ret)) || size != retlen)  {  
> >>>
> >>> IIRC size != retlen does not mean it failed, but that you should
> >>> continue reading after retlen bytes, no?  
> >>>    >>  
> >> Yes, you are right. I will fix it. Thanks.
> >>  
> >>> Also, mtd_is_bitflip() does not mean that you are reading a false
> >>> buffer, but that the data has been corrected as it contained bitflips.
> >>> mtd_is_eccerr() however, would be meaningful.  
> >>>    >>  
> >> Sure I know mtd_is_bitflip() does not mean failure, but I do not think
> >> mtd_is_eccerr() should be here since the codes are ret < 0 and NOT
> >> mtd_is_bitflip().  
> > 
> > Yes, just drop this check, only keep ret < 0.
> >   
> 
> If I don't get it wrong, it should not	 be dropped here. Like your words,
> "mtd_is_bitflip() does not mean that you are reading a false buffer,
> but that the data has been corrected as it contained bitflips.", the
> data I get are valid even if mtd_is_bitflip() return true. It's correct
> data and it's no need to go to handle error. To me, the codes
> should be:
> 	if (ret < 0 && !mit_is_bitflip())
> 		[error handling]

Please check the implementation of mtd_is_bitflip(). You'll probably
figure out what I am saying.

https://elixir.bootlin.com/linux/latest/source/include/linux/mtd/mtd.h#L585


|...]

> >>>> +		return;
> >>>> +	}
> >>>> +	if (unlikely(info->dmesg_size % mtd->writesize)) {
> >>>> +		pr_err("record size %lu KB must align to write size %d KB\n",
> >>>> +				info->dmesg_size / 1024,
> >>>> +				mtd->writesize / 1024);  
> >>>
> >>> This condition is weird, why would you check this?  
> >>>    >>  
> >> pstore/blk will write 'record_size' dmesg log at one time.
> >> Since each write data must be aligned to 'writesize' for flash, I am not
> >> sure
> >> all flash drivers are compatible with misaligned data, that's why i
> >> check this.  
> > 
> > I think you should enforce this alignment instead of checking it.
> >   
> 
> Do you mean that mtdpstore should enforce this alignment while running?
> The way I can think of is to handle a buffer aligned to writesize and
> write to flash with this aligned buffer.
> 
> That causes some error. The MTD device will be divided into mutil
> chunks accroding to dmesg_size. Each chunk stores a individual
> OOPS/Panic log. If dmesg_size is misaligned to writesize, the last
> write results in next write failure because the page of flash can only
> be programed once before next erase and the page shared by two chunks
> has been used by the last write. Besides, we can not read to buffer,
> ersae and write back as there is no read/erase for panic case.

I mean: what is the usual size of dmesg? I don't get why you need it to
be ie. a multiple of 2k. It probably is actually, I don't know if there
is a standard. But if dmesg_size is eg 3k, just skip the end of the
partially written page and start writing at the next page?

> 
> >>  
> >>>> +		return;
> >>>> +	}
> >>>> +	if (unlikely(mtd->size > MTDPSTORE_MAX_MTD_SIZE)) {
> >>>> +		pr_err("mtd%d is too large (limit is %d MiB)\n",
> >>>> +				mtd->index,
> >>>> +				MTDPSTORE_MAX_MTD_SIZE / 1024 / 1024);  
> >>>
> >>> Same question? I could understand that it is easier to manage blocks
> >>> knowing their maximum number though.  
> >>>    >>  
> >> It refers to mtdoops.  
> > 
> > What do you mean?
> >   
> 
> To me, it's unnecessary to check at all, however it is really there
> on codes of mtdoops. I refer to module mtdoops when I design mtdpstore.
> It may be helpfull for some cases out of my think, that's why I keep it.

Why not.

[...]

> >>
> >> In case of repeated erase when users remove several log files, mtdpstore
> >> do remove jobs when exit.
> >>
> >> Besides, mtdpstore do not check the return code to ensure write back valid
> >> log as much as possible.  
> > 
> > You are not in a critical path, I don't understand why you don't check
> > it? If it returns an error, it means the data is not written. IMHO it
> > is best to alert the user than to silently fail.
> >   
> 
> This function will be called only when mtd device is removing. It's
> useless to alert the user but try to flush the other valid data to

It is useful to alert the user! It means something went wrong while
everything seems fine.

> flash as mush as possible by which reduce losses. If it's just
> because of busy, what happens next time?

Just because of busy? I don't get it.

I'm okay with the idea of trying to write the other chunks though:

	while (remaining_chunk) {
		ret = mtd_write()
		if (ret) {
			alert-user;
			continue;
		}
	}

> 
> >>  
> >>>> +. >>>> +		off += zonesize;
> >>>> +		size -= min_t(unsigned int, zonesize, size);
> >>>> +	}
> >>>> +
> >>>> +free:
> >>>> +	kfree(buf);
> >>>> +	return ret;
> >>>> +}
> >>>> +  
> > 
> > 
> > [...]
> >   
> >>>
> >>> Thanks,
> >>> Miquèl  
> >>>    >>  
> >> I will collect more suggestions and submit the new version at one time.
> >>  
> > 
> > Sure, no hurry.
> >   
> 
> I am on holiday, please forgive me for my slow response.

Take your time, as I said, no hurry.

> 
> > 
> > Thanks,
> > Miquèl
> >   




Thanks,
Miquèl

WARNING: multiple messages have this Message-ID (diff)
From: Miquel Raynal <mraynal@kernel.org>
To: liaoweixiong <liaoweixiong@allwinnertech.com>
Cc: Rob Herring <robh@kernel.org>, Tony Luck <tony.luck@intel.com>,
	Kees Cook <keescook@chromium.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Richard Weinberger <richard@nod.at>,
	Anton Vorontsov <anton@enomsg.org>,
	linux-doc@vger.kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-kernel@vger.kernel.org, linux-mtd@lists.infradead.org,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Colin Cross <ccross@android.com>,
	Mauro Carvalho Chehab <mchehab+samsung@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Vignesh Raghavendra <vigneshr@ti.com>
Subject: Re: [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk
Date: Wed, 22 Jan 2020 18:41:14 +0100	[thread overview]
Message-ID: <20200122184114.125b42c8@xps13> (raw)
In-Reply-To: <2c6000b1-ae25-564b-911a-2879e9c244b2@allwinnertech.com>

Hello,


> >>>> +/*
> >>>> + * All zones will be read as pstore/blk will read zone one by one when do
> >>>> + * recover.
> >>>> + */
> >>>> +static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
> >>>> +{
> >>>> +	struct mtdpstore_context *cxt = &oops_cxt;
> >>>> +	size_t retlen;
> >>>> +	int ret;
> >>>> +
> >>>> +	if (mtdpstore_block_isbad(cxt, off))
> >>>> +		return -ENEXT;
> >>>> +
> >>>> +	pr_debug("try to read off 0x%llx size %zu\n", off, size);
> >>>> +	ret = mtd_read(cxt->mtd, off, size, &retlen, (u_char *)buf);
> >>>> +	if ((ret < 0 && !mtd_is_bitflip(ret)) || size != retlen)  {  
> >>>
> >>> IIRC size != retlen does not mean it failed, but that you should
> >>> continue reading after retlen bytes, no?  
> >>>    >>  
> >> Yes, you are right. I will fix it. Thanks.
> >>  
> >>> Also, mtd_is_bitflip() does not mean that you are reading a false
> >>> buffer, but that the data has been corrected as it contained bitflips.
> >>> mtd_is_eccerr() however, would be meaningful.  
> >>>    >>  
> >> Sure I know mtd_is_bitflip() does not mean failure, but I do not think
> >> mtd_is_eccerr() should be here since the codes are ret < 0 and NOT
> >> mtd_is_bitflip().  
> > 
> > Yes, just drop this check, only keep ret < 0.
> >   
> 
> If I don't get it wrong, it should not	 be dropped here. Like your words,
> "mtd_is_bitflip() does not mean that you are reading a false buffer,
> but that the data has been corrected as it contained bitflips.", the
> data I get are valid even if mtd_is_bitflip() return true. It's correct
> data and it's no need to go to handle error. To me, the codes
> should be:
> 	if (ret < 0 && !mit_is_bitflip())
> 		[error handling]

Please check the implementation of mtd_is_bitflip(). You'll probably
figure out what I am saying.

https://elixir.bootlin.com/linux/latest/source/include/linux/mtd/mtd.h#L585


|...]

> >>>> +		return;
> >>>> +	}
> >>>> +	if (unlikely(info->dmesg_size % mtd->writesize)) {
> >>>> +		pr_err("record size %lu KB must align to write size %d KB\n",
> >>>> +				info->dmesg_size / 1024,
> >>>> +				mtd->writesize / 1024);  
> >>>
> >>> This condition is weird, why would you check this?  
> >>>    >>  
> >> pstore/blk will write 'record_size' dmesg log at one time.
> >> Since each write data must be aligned to 'writesize' for flash, I am not
> >> sure
> >> all flash drivers are compatible with misaligned data, that's why i
> >> check this.  
> > 
> > I think you should enforce this alignment instead of checking it.
> >   
> 
> Do you mean that mtdpstore should enforce this alignment while running?
> The way I can think of is to handle a buffer aligned to writesize and
> write to flash with this aligned buffer.
> 
> That causes some error. The MTD device will be divided into mutil
> chunks accroding to dmesg_size. Each chunk stores a individual
> OOPS/Panic log. If dmesg_size is misaligned to writesize, the last
> write results in next write failure because the page of flash can only
> be programed once before next erase and the page shared by two chunks
> has been used by the last write. Besides, we can not read to buffer,
> ersae and write back as there is no read/erase for panic case.

I mean: what is the usual size of dmesg? I don't get why you need it to
be ie. a multiple of 2k. It probably is actually, I don't know if there
is a standard. But if dmesg_size is eg 3k, just skip the end of the
partially written page and start writing at the next page?

> 
> >>  
> >>>> +		return;
> >>>> +	}
> >>>> +	if (unlikely(mtd->size > MTDPSTORE_MAX_MTD_SIZE)) {
> >>>> +		pr_err("mtd%d is too large (limit is %d MiB)\n",
> >>>> +				mtd->index,
> >>>> +				MTDPSTORE_MAX_MTD_SIZE / 1024 / 1024);  
> >>>
> >>> Same question? I could understand that it is easier to manage blocks
> >>> knowing their maximum number though.  
> >>>    >>  
> >> It refers to mtdoops.  
> > 
> > What do you mean?
> >   
> 
> To me, it's unnecessary to check at all, however it is really there
> on codes of mtdoops. I refer to module mtdoops when I design mtdpstore.
> It may be helpfull for some cases out of my think, that's why I keep it.

Why not.

[...]

> >>
> >> In case of repeated erase when users remove several log files, mtdpstore
> >> do remove jobs when exit.
> >>
> >> Besides, mtdpstore do not check the return code to ensure write back valid
> >> log as much as possible.  
> > 
> > You are not in a critical path, I don't understand why you don't check
> > it? If it returns an error, it means the data is not written. IMHO it
> > is best to alert the user than to silently fail.
> >   
> 
> This function will be called only when mtd device is removing. It's
> useless to alert the user but try to flush the other valid data to

It is useful to alert the user! It means something went wrong while
everything seems fine.

> flash as mush as possible by which reduce losses. If it's just
> because of busy, what happens next time?

Just because of busy? I don't get it.

I'm okay with the idea of trying to write the other chunks though:

	while (remaining_chunk) {
		ret = mtd_write()
		if (ret) {
			alert-user;
			continue;
		}
	}

> 
> >>  
> >>>> +. >>>> +		off += zonesize;
> >>>> +		size -= min_t(unsigned int, zonesize, size);
> >>>> +	}
> >>>> +
> >>>> +free:
> >>>> +	kfree(buf);
> >>>> +	return ret;
> >>>> +}
> >>>> +  
> > 
> > 
> > [...]
> >   
> >>>
> >>> Thanks,
> >>> Miquèl  
> >>>    >>  
> >> I will collect more suggestions and submit the new version at one time.
> >>  
> > 
> > Sure, no hurry.
> >   
> 
> I am on holiday, please forgive me for my slow response.

Take your time, as I said, no hurry.

> 
> > 
> > Thanks,
> > Miquèl
> >   




Thanks,
Miquèl

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

  reply	other threads:[~2020-01-22 17:41 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-20  1:03 [PATCH v1 00/11] pstore: support crash log to block and mtd device WeiXiong Liao
2020-01-20  1:03 ` WeiXiong Liao
2020-01-20  1:03 ` [PATCH v1 01/11] pstore/blk: new support logger for block devices WeiXiong Liao
2020-01-20  1:03   ` WeiXiong Liao
2020-01-20  1:03 ` [PATCH v1 02/11] blkoops: add blkoops, a warpper for pstore/blk WeiXiong Liao
2020-01-20  1:03   ` WeiXiong Liao
2020-01-20  1:03 ` [PATCH v1 03/11] pstore/blk: support pmsg recorder WeiXiong Liao
2020-01-20  1:03   ` WeiXiong Liao
2020-01-20  1:03 ` [PATCH v1 04/11] pstore/blk: blkoops: support console recorder WeiXiong Liao
2020-01-20  1:03   ` WeiXiong Liao
2020-01-20  1:03 ` [PATCH v1 05/11] pstore/blk: blkoops: support ftrace recorder WeiXiong Liao
2020-01-20  1:03   ` WeiXiong Liao
2020-01-20  1:03 ` [PATCH v1 06/11] Documentation: pstore/blk: blkoops: create document for pstore_blk WeiXiong Liao
2020-01-20  1:03   ` WeiXiong Liao
2020-01-21  4:13   ` Randy Dunlap
2020-01-21  4:13     ` Randy Dunlap
2020-01-21  5:23     ` liaoweixiong
2020-01-21  5:23       ` liaoweixiong
2020-01-21  6:36       ` Randy Dunlap
2020-01-21  6:36         ` Randy Dunlap
2020-01-21  8:19         ` liaoweixiong
2020-01-21  8:19           ` liaoweixiong
2020-01-21 15:34           ` Randy Dunlap
2020-01-21 15:34             ` Randy Dunlap
2020-01-22 15:01             ` liaoweixiong
2020-01-22 15:01               ` liaoweixiong
2020-01-22 16:08               ` Randy Dunlap
2020-01-22 16:08                 ` Randy Dunlap
2020-01-20  1:03 ` [PATCH v1 07/11] pstore/blk: skip broken zone for mtd device WeiXiong Liao
2020-01-20  1:03   ` WeiXiong Liao
2020-01-20  1:03 ` [PATCH v1 08/11] blkoops: respect for device to pick recorders WeiXiong Liao
2020-01-20  1:03   ` WeiXiong Liao
2020-01-20  1:03 ` [PATCH v1 09/11] pstore/blk: blkoops: support special removing jobs for dmesg WeiXiong Liao
2020-01-20  1:03   ` WeiXiong Liao
2020-01-20  1:03 ` [PATCH v1 10/11] blkoops: add interface for dirver to get information of blkoops WeiXiong Liao
2020-01-20  1:03   ` WeiXiong Liao
2020-01-20  1:03 ` [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk WeiXiong Liao
2020-01-20  1:03   ` WeiXiong Liao
2020-01-20 10:03   ` Miquel Raynal
2020-01-20 10:03     ` Miquel Raynal
2020-01-21  3:36     ` liaoweixiong
2020-01-21  3:36       ` liaoweixiong
2020-01-21  8:48       ` Miquel Raynal
2020-01-21  8:48         ` Miquel Raynal
2020-01-22 17:22         ` liaoweixiong
2020-01-22 17:22           ` liaoweixiong
2020-01-22 17:41           ` Miquel Raynal [this message]
2020-01-22 17:41             ` Miquel Raynal
2020-02-06 13:10             ` liaoweixiong
2020-02-06 13:10               ` liaoweixiong
2020-02-06 15:45               ` Miquel Raynal
2020-02-06 15:45                 ` Miquel Raynal
2020-02-07  4:13                 ` liaoweixiong
2020-02-07  4:13                   ` liaoweixiong
2020-02-07  8:41                   ` Miquel Raynal
2020-02-07  8:41                     ` Miquel Raynal
2020-02-07 10:30                     ` liaoweixiong
2020-02-07 10:30                       ` liaoweixiong
2020-01-23  4:24   ` Vignesh Raghavendra
2020-01-23  4:24     ` Vignesh Raghavendra
2020-01-23  7:03     ` liaoweixiong
2020-01-23  7:03       ` liaoweixiong
2020-02-06  9:13 ` [PATCH v1 00/11] pstore: support crash log to block and mtd device Kees Cook
2020-02-06  9:13   ` Kees Cook

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200122184114.125b42c8@xps13 \
    --to=mraynal@kernel.org \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=anton@enomsg.org \
    --cc=ccross@android.com \
    --cc=corbet@lwn.net \
    --cc=davem@davemloft.net \
    --cc=gregkh@linuxfoundation.org \
    --cc=keescook@chromium.org \
    --cc=liaoweixiong@allwinnertech.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mtd@lists.infradead.org \
    --cc=mchehab+samsung@kernel.org \
    --cc=richard@nod.at \
    --cc=robh@kernel.org \
    --cc=tony.luck@intel.com \
    --cc=vigneshr@ti.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.