From: Kevin Wolf <kwolf@redhat.com>
To: Pavel Dovgalyuk <dovgaluk@ispras.ru>
Cc: 'Vladimir Sementsov-Ogievskiy' <vsementsov@virtuozzo.com>,
qemu-devel@nongnu.org, stefanha@redhat.com, mreitz@redhat.com
Subject: Re: Race condition in overlayed qcow2?
Date: Tue, 25 Feb 2020 12:47:11 +0100
Message-ID: <20200225114711.GA7632@linux.fritz.box>
In-Reply-To: <004b01d5ebc3$626dc9d0$27495d70$@ru>
On 25.02.2020 11:07, Pavel Dovgalyuk wrote:
> CC'ing Stefan due to the same question back in 2010:
>
> https://lists.gnu.org/archive/html/qemu-devel/2010-09/msg01996.html
>
> I also encountered this with a Windows guest.
> E.g., there were requests like:
>
> Read 2000 bytes:
> addr=A, size=1000
> addr=A, size=1000
>
> I.e., only 1000 bytes are really read, but the purpose of such a request is unclear.
I think the conclusion back then was that the result is undefined (i.e.
you can get any mix between both parts). I'm not sure if we ever found
out why Windows is even issuing such requests. Maybe Stefan knows.
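
For illustration, a minimal standalone sketch (not QEMU code; "disk.img" and
the offsets are made up): two concurrent reads into one shared buffer
complete in arbitrary order, so each byte of the result may come from either
region.

#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static char buf[4096];          /* shared destination, like the guest buffer */
static int fd;

static void *read_region(void *arg)
{
    off_t offset = *(off_t *)arg;

    /* Both threads target the same buffer, like the repeated SG entry */
    ssize_t ret = pread(fd, buf, sizeof(buf), offset);
    assert(ret == (ssize_t)sizeof(buf));
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    off_t off1 = 0, off2 = 4096;

    fd = open("disk.img", O_RDONLY);    /* hypothetical test image */
    assert(fd >= 0);

    pthread_create(&t1, NULL, read_region, &off1);
    pthread_create(&t2, NULL, read_region, &off2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    /* buf now holds some mix of the two regions; which read "wins" any
     * given byte is undefined. */
    printf("first byte: %02x\n", (unsigned char)buf[0]);
    close(fd);
    return 0;
}
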
Kevin
> > -----Original Message-----
> > From: Pavel Dovgalyuk [mailto:dovgaluk@ispras.ru]
> > Sent: Tuesday, February 25, 2020 12:27 PM
> > To: 'kwolf@redhat.com'
> > Cc: 'qemu-devel@nongnu.org'; 'mreitz@redhat.com'; 'Vladimir Sementsov-Ogievskiy'
> > Subject: RE: Race condition in overlayed qcow2?
> >
> > Kevin, what do you think about it?
> >
> > What is the guest supposed to receive when it requests multiple reads into
> > the same buffer in a single DMA transaction?
> >
> > Should it be the first SG part? The last one?
> > Or just a random set of bytes? (Then why would it read this data at all?)
> >
> > Pavel Dovgalyuk
> >
> > > -----Original Message-----
> > > From: Vladimir Sementsov-Ogievskiy [mailto:vsementsov@virtuozzo.com]
> > > Sent: Tuesday, February 25, 2020 12:19 PM
> > > To: dovgaluk
> > > Cc: qemu-devel@nongnu.org; mreitz@redhat.com; kwolf@redhat.com
> > > Subject: Re: Race condition in overlayed qcow2?
> > >
> > > 25.02.2020 10:56, dovgaluk wrote:
> > > > Vladimir Sementsov-Ogievskiy wrote on 2020-02-25 10:27:
> > > >> 25.02.2020 8:58, dovgaluk wrote:
> > > >>> Vladimir Sementsov-Ogievskiy wrote on 2020-02-21 16:23:
> > > >>>> 21.02.2020 15:35, dovgaluk wrote:
> > > >>>>> Vladimir Sementsov-Ogievskiy wrote on 2020-02-21 13:09:
> > > >>>>>> 21.02.2020 12:49, dovgaluk wrote:
> > > >>>>>>> Vladimir Sementsov-Ogievskiy wrote on 2020-02-20 12:36:
> > > >>>>>>
> > > >>>>>> So, preadv in file-posix.c returns different results for the same
> > > >>>>>> offset, for a file which is always opened in RO mode? Sounds impossible
> > > >>>>>> :)
> > > >>>>>
> > > >>>>> True.
> > > >>>>> Maybe my logging is wrong?
> > > >>>>>
> > > >>>>> static ssize_t
> > > >>>>> qemu_preadv(int fd, const struct iovec *iov, int nr_iov, off_t offset)
> > > >>>>> {
> > > >>>>>     ssize_t res = preadv(fd, iov, nr_iov, offset);
> > > >>>>>     qemu_log("preadv %x %"PRIx64"\n", fd, (uint64_t)offset);
> > > >>>>>     int i;
> > > >>>>>     uint32_t sum = 0;
> > > >>>>>     int cnt = 0;
> > > >>>>>     /* Checksum everything preadv just stored into the vector */
> > > >>>>>     for (i = 0; i < nr_iov; ++i) {
> > > >>>>>         int j;
> > > >>>>>         for (j = 0; j < (int)iov[i].iov_len; ++j) {
> > > >>>>>             sum += ((uint8_t *)iov[i].iov_base)[j];
> > > >>>>>             ++cnt;
> > > >>>>>         }
> > > >>>>>     }
> > > >>>>>     qemu_log("size: %x sum: %x\n", cnt, sum);
> > > >>>>>     assert(cnt == res);
> > > >>>>>     return res;
> > > >>>>> }
> > > >>>>>
> > > >>>>
> > > >>>> Hmm, I don't see any issues here..
> > > >>>>
> > > >>>> Are you absolutely sure that all these reads are from the backing file,
> > > >>>> which is read-only and never changed (maybe by another process)?
> > > >>>
> > > >>> Yes, I made a copy and compared the files with binwalk.
> > > >>>
> > > >>>> 2. The guest modifies buffers during the operation (you can catch it if
> > > >>>> you allocate a private buffer for preadv, then calculate the checksum,
> > > >>>> then memcpy to the guest buffer)
> > > >>>
> > > >>> I added the following to the qemu_preadv:
> > > >>>
> > > >>> /* Read the same range again into a private buffer and compare sums */
> > > >>> unsigned char *buf = g_malloc(cnt);
> > > >>> struct iovec v = { buf, cnt };
> > > >>> res = preadv(fd, &v, 1, offset);
> > > >>> assert(cnt == res);
> > > >>> uint32_t sum2 = 0;
> > > >>> for (i = 0; i < cnt; ++i) {
> > > >>>     sum2 += buf[i];
> > > >>> }
> > > >>> g_free(buf);
> > > >>> qemu_log("--- sum2 = %x\n", sum2);
> > > >>> assert(sum2 == sum);
> > > >>>
> > > >>> These two reads give different results.
> > > >>> But who can modify the buffer while the qcow2 workers are filling it with data from the disk?
> > > >>>
> > > >>
> > > >> As far as I know, it's the guest's buffer, and the guest may modify it
> > > >> during the operation. So, it may be winxp :)
> > > >
> > > > True, but normally the guest won't do it.
> > > >
> > > > But I noticed that the DMA operation which causes the problems has the
> > > > following set of buffers:
> > > > dma read sg size 20000 offset: c000fe00
> > > > --- sg: base: 2eb1000 len: 1000
> > > > --- sg: base: 3000000 len: 1000
> > > > --- sg: base: 2eb2000 len: 3000
> > > > --- sg: base: 3000000 len: 1000
> > > > --- sg: base: 2eb5000 len: b000
> > > > --- sg: base: 3040000 len: 1000
> > > > --- sg: base: 2f41000 len: 3000
> > > > --- sg: base: 3000000 len: 1000
> > > > --- sg: base: 2f44000 len: 4000
> > > > --- sg: base: 3000000 len: 1000
> > > > --- sg: base: 2f48000 len: 2000
> > > > --- sg: base: 3000000 len: 1000
> > > > --- sg: base: 3000000 len: 1000
> > > > --- sg: base: 3000000 len: 1000
> > > >
> > > >
> > > > It means that one DMA transaction performs multiple reads into the same
> > > > address (base 3000000 appears seven times in the list above).
> > > > No race is possible when there is only one qcow2 worker.
> > > > When there are many of them, they can fill this buffer simultaneously.
> > > >
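> > > > A quick way to flag such transactions (an untested sketch; it assumes
> > > > the QEMUSGList/ScatterGatherEntry layout from "sysemu/dma.h", and the
> > > > helper name is made up):
> > > >
> > > > #include "qemu/osdep.h"
> > > > #include "sysemu/dma.h"
> > > >
> > > > /* Made-up debug helper: true if two entries of the scatter-gather
> > > >  * list map the same guest address, like base 3000000 above.
> > > >  * O(n^2), which is fine for debugging. */
> > > > static bool sg_has_duplicate_base(const QEMUSGList *qsg)
> > > > {
> > > >     int i, j;
> > > >
> > > >     for (i = 0; i < qsg->nsg; i++) {
> > > >         for (j = i + 1; j < qsg->nsg; j++) {
> > > >             if (qsg->sg[i].base == qsg->sg[j].base) {
> > > >                 return true;
> > > >             }
> > > >         }
> > > >     }
> > > >     return false;
> > > > }
> > > >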
> > >
> > > Hmm, actually, if the guest starts parallel reads into the same buffer from
> > > different offsets, races are possible anyway, as different requests run in
> > > parallel even with one worker, because MAX_WORKERS is a per-request value,
> > > not a total... But several workers may increase the probability of races or
> > > introduce new ones.
> > >
> > > So, actually, several workers of one request can write to the same buffer
> > > only if the guest provides a broken iovec which references the same buffer
> > > several times (if that is possible at all).
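> > >
> > > Something like this untested sketch could detect that case on the raw
> > > iovec (iov_has_overlap is a made-up name, not an existing QEMU helper):
> > >
> > > #include <stdbool.h>
> > > #include <sys/uio.h>
> > >
> > > /* True if any two elements of the vector reference overlapping
> > >  * memory, i.e. the "broken iovec" case described above. */
> > > static bool iov_has_overlap(const struct iovec *iov, int niov)
> > > {
> > >     int i, j;
> > >
> > >     for (i = 0; i < niov; i++) {
> > >         const char *a = iov[i].iov_base;
> > >         for (j = i + 1; j < niov; j++) {
> > >             const char *b = iov[j].iov_base;
> > >             if (a < b + iov[j].iov_len && b < a + iov[i].iov_len) {
> > >                 return true;
> > >             }
> > >         }
> > >     }
> > >     return false;
> > > }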
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Vladimir
>