From: Shaohua Li <shli@kernel.org>
To: Coly Li <colyli@suse.de>
Cc: NeilBrown <neilb@suse.com>,
linux-raid@vger.kernel.org, Shaohua Li <shli@fb.com>,
Johannes Thumshirn <jthumshirn@suse.de>,
Guoqing Jiang <gqjiang@suse.com>
Subject: Re: [PATCH V3 1/2] RAID1: a new I/O barrier implementation to remove resync window
Date: Fri, 24 Feb 2017 11:02:15 -0800 [thread overview]
Message-ID: <20170224190215.uzv2cl65ry7siip5@kernel.org> (raw)
In-Reply-To: <77888b1a-9b5c-6171-f830-ff2d62193067@suse.de>
On Sat, Feb 25, 2017 at 02:57:26AM +0800, Coly Li wrote:
> On 2017/2/25 1:17 AM, Shaohua Li wrote:
> > On Sat, Feb 25, 2017 at 01:06:22AM +0800, Coly Li wrote:
> >> On 2017/2/24 7:14 AM, NeilBrown wrote:
> >>> On Thu, Feb 23 2017, Coly Li wrote:
> >>>
> >>>>
> >>>> I tried to set up a 4-layer stacked md raid1 and reduced the I/O
> >>>> barrier bucket size to 8MB; after running for 10 hours, no
> >>>> deadlock was observed.
> >>>
> >>> Try setting BARRIER_BUCKETS_NR to '1' and BARRIER_UNIT_SECTOR_BITS
> >>> to 3 and make sure the write requests are larger than 1 page (and
> >>> have resync happen at the same time as writes).
> >>
> >> Hi Neil,
> >>
> >> Yes, the above method triggers the deadlock easily. After coming to
> >> understand how bios are handled in stacked raid1, and the relationship
> >> between current->bio_list, plug->pending and conf->pending_bio_list, I
> >> think I understand what you were worried about and the intent of your fix.
> >>
> >> I totally agree, and I understand now that a hash conflict will happen
> >> sooner or later. Yes, we need this fix.
> >>
> >> Thanks to you and Shaohua for explaining the details to me and helping
> >> me catch up with your thinking :-)
> >
> > I'm confused. So the deadlock is real? How is it triggered?
>
> Let me explain,
>
> There is no deadlock now, because:
> 1) A deadlock is possible only if a hash conflict exists.
> 2) In the current Linux kernel, a hash conflict won't happen in real life:
> 2.1) A regular bio's maximum size is 2MB, so it can be split into at most
> 2 bios in raid1_make_request() of my new I/O barrier patch.
> 2.2) A DISCARD bio's maximum size is 4GB, so it can be split into at most
> 65 bios in raid1_make_request() of my new I/O barrier patch.
> 2.3) I verified that calling sector_to_idx() on any 512 consecutive
> integers in [0, 1<<63] produces no hash conflict.
> 2.4) Currently almost no device provides an LBA range exceeding (1<<63)
> bytes. So in the current Linux kernel with my new I/O barrier patch, no
> deadlock will happen; the patch is deadlock-free under all the conditions
> we discussed before.
>
> The reason why I suggest still taking your patch is:
> 1) If one day bi_iter.bi_size is extended from 32 bits (unsigned int) to
> 64 bits (unsigned long), a DISCARD bio can be split into more than 1024
> smaller bios.
> 2) If a DISCARD bio is split into more than 1024 smaller bios, then
> sector_to_idx() is called with more than 1024 consecutive integers, and
> a hash conflict is 100% certain.
> 3) If a hash conflict exists, the deadlock described by Neil becomes
> possible.
>
>
> What I mean is: currently there is no deadlock, because bi_iter.bi_size
> is 32 bits; if in the future bi_iter.bi_size is extended to 64 bits, we
> will have a deadlock. Your fix costs almost nothing in performance,
> improves code readability, and avoids a potential future deadlock (when
> bi_iter.bi_size is extended to 64 bits), so why not take it?
Let's assume there is a hash conflict. We have raid10 anyway, which doesn't
have the fancy barrier. When can the deadlock be triggered? My understanding
is that it can't, because we are handling bios in raid1/10d. Anything I missed?
Thanks,
Shaohua
Thread overview: 43+ messages
2017-02-15 16:35 [PATCH V3 1/2] RAID1: a new I/O barrier implementation to remove resync window colyli
2017-02-15 16:35 ` [PATCH V3 2/2] RAID1: avoid unnecessary spin locks in I/O barrier code colyli
2017-02-15 17:15 ` Coly Li
2017-02-16 2:25 ` Shaohua Li
2017-02-17 18:42 ` Coly Li
2017-02-16 7:04 ` NeilBrown
2017-02-17 7:56 ` Coly Li
2017-02-17 18:35 ` Coly Li
2017-02-16 2:22 ` [PATCH V3 1/2] RAID1: a new I/O barrier implementation to remove resync window Shaohua Li
2017-02-16 17:05 ` Coly Li
2017-02-17 12:40 ` Coly Li
2017-02-16 7:04 ` NeilBrown
2017-02-17 6:56 ` Coly Li
2017-02-19 23:50 ` NeilBrown
2017-02-20 2:51 ` NeilBrown
2017-02-20 7:04 ` Shaohua Li
2017-02-20 8:07 ` Coly Li
2017-02-20 8:30 ` Coly Li
2017-02-20 18:14 ` Wols Lists
2017-02-21 11:30 ` Coly Li
2017-02-21 19:20 ` Wols Lists
2017-02-21 20:16 ` Coly Li
2017-02-21 0:29 ` NeilBrown
2017-02-21 9:45 ` Coly Li
2017-02-21 17:45 ` Shaohua Li
2017-02-21 20:09 ` Coly Li
2017-02-23 5:54 ` Coly Li
2017-02-23 17:34 ` Shaohua Li
2017-02-23 19:31 ` Coly Li
2017-02-23 19:58 ` Shaohua Li
2017-02-24 17:02 ` Coly Li
2017-02-24 10:19 ` 王金浦
2017-02-28 19:42 ` Shaohua Li
2017-03-01 17:01 ` 王金浦
2017-02-23 23:14 ` NeilBrown
2017-02-24 17:06 ` Coly Li
2017-02-24 17:17 ` Shaohua Li
2017-02-24 18:57 ` Coly Li
2017-02-24 19:02 ` Shaohua Li [this message]
2017-02-24 19:19 ` Coly Li
2017-02-17 19:41 ` Shaohua Li
2017-02-18 2:40 ` Coly Li
2017-02-19 23:42 ` NeilBrown