[Qemu-devel] Data corruption in Qemu 2.7.1

Discussion:

Peter Lieven

2017-01-13 10:44:32 UTC

Hi,

i currently facing a problem in our testing environment where I see file system corruption with 2.7.1 on iSCSI and Local Storage (LVM).
Trying to bisect, but has anyone observed this before?

Thanks,
Peter

Fam Zheng

2017-01-17 06:40:28 UTC

Permalink

Post by Peter Lieven
Hi,
i currently facing a problem in our testing environment where I see file
system corruption with 2.7.1 on iSCSI and Local Storage (LVM).
Trying to bisect, but has anyone observed this before?

The information here is too scarce to tell but a file corruption is more often a
result of two writers modifying the disk concurrently. Have you ruled that out?
Is the corruption reproducible?

Fam

Peter Lieven

2017-01-17 10:14:47 UTC

Permalink

Post by Fam Zheng

The information here is too scarce to tell but a file corruption is more often a
result of two writers modifying the disk concurrently. Have you ruled that out?
Is the corruption reproducible?

My issue was primary with iSCSI and I cut bisect it. I already send a patch to the list:

commit 0bd57e907311be6e4f97394cfd9afebe271457e2
Author: Peter Lieven <***@kamp.de>
Date: Mon Jan 16 16:10:26 2017 +0100

block/iscsi: avoid data corruption with cache=writeback

nb_cls_shrunk in iscsi_allocmap_update can become -1 if the
request starts and ends within the same cluster. This results
in passing -1 to bitmap_set and bitmap_clear and they don't
handle negative values properly. In the end this leads to data
corruption.

Fixes: e1123a3b40a1a9a625a29c8ed4debb7e206ea690
Cc: qemu-***@nongnu.org
Signed-off-by: Peter Lieven <***@kamp.de>

However, one user also reported corruption with LVM. I will check if he uses virtio-scsi.

Thanks,
Peter

Alexandre DERUMIER

2017-01-17 07:33:46 UTC

Permalink

Hi,

proxmox users have reported recently corruption with qemu 2.7 and scsi-block (with passing physical /dev/sdX to virtio-scsi).

working fine with qemu 2.6.

qemu 2.7 + scsi-hd works fine

https://forum.proxmox.com/threads/proxmox-4-4-virtio_scsi-regression.31471/page-2

----- Mail original -----
De: "Peter Lieven" <***@kamp.de>
À: "qemu-devel" <qemu-***@nongnu.org>
Cc: "qemu-stable" <qemu-***@nongnu.org>
Envoyé: Vendredi 13 Janvier 2017 11:44:32
Objet: [Qemu-devel] Data corruption in Qemu 2.7.1

Hi,

i currently facing a problem in our testing environment where I see file system corruption with 2.7.1 on iSCSI and Local Storage (LVM).
Trying to bisect, but has anyone observed this before?

Thanks,
Peter

Fabian Grünbichler

2017-01-17 08:03:26 UTC

Permalink

Post by Alexandre DERUMIER
Hi,
proxmox users have reported recently corruption with qemu 2.7 and scsi-block (with passing physical /dev/sdX to virtio-scsi).
working fine with qemu 2.6.
qemu 2.7 + scsi-hd works fine
https://forum.proxmox.com/threads/proxmox-4-4-virtio_scsi-regression.31471/page-2

I am fairly sure this is a separate issue.

Commit 8fdc7839e40f43a426bc7e858cf1dbfe315a3804 (first included in
2.7.0)[1] changed the behaviour of scsi-block passthrough. Previously
this worked with SATA disks, now it doesn't anymore. A bisect run
confirmed this, scsi-block with a SATA disk passed through via
virtio-scsi-single corrupts on writes since that commit, scsi-hd and
scsi-disk work fine (scsi-generic corrupts as well).

PVE's detection logic for passthrough just differentiated between disks
and tape drives, and unfortunately the SG_IO ioctl says SATA disks are
disks as well.. we probably need to default to scsi-hd or scsi-disk
instead of scsi-block, and only when we explicitly detect a "real" SCSI
disk we are allowed to use scsi-block?

@Paolo: was the old behaviour just an accident and the new bevaviour
intentional? documentation is quite sparse, or maybe I am looking in the
wrong places..

1: scsi-block: always use SG_IO

Using pread/pwrite or io_submit has the advantage of eliminating the
bounce buffer, but drops the SCSI status. This keeps the guest from
seeing unit attention codes, as well as statuses such as RESERVATION
CONFLICT. Because we know scsi-block operates on an SBC device we can
still use the DMA helpers with SG_IO; just remember to patch the CDBs
if the transfer is split into multiple segments.

This means that scsi-block will always use the thread-pool unfortunately,
instead of respecting aio=native.

Signed-off-by: Paolo Bonzini <***@redhat.com>

Paolo Bonzini

2017-01-17 10:41:44 UTC

Permalink

Post by Fabian GrÃ¼nbichler
Commit 8fdc7839e40f43a426bc7e858cf1dbfe315a3804 (first included in
2.7.0)[1] changed the behaviour of scsi-block passthrough. Previously
this worked with SATA disks, now it doesn't anymore. A bisect run
confirmed this, scsi-block with a SATA disk passed through via
virtio-scsi-single corrupts on writes since that commit, scsi-hd and
scsi-disk work fine (scsi-generic corrupts as well).
PVE's detection logic for passthrough just differentiated between disks
and tape drives, and unfortunately the SG_IO ioctl says SATA disks are
disks as well.. we probably need to default to scsi-hd or scsi-disk
instead of scsi-block, and only when we explicitly detect a "real" SCSI
disk we are allowed to use scsi-block?
@Paolo: was the old behaviour just an accident and the new bevaviour
intentional? documentation is quite sparse, or maybe I am looking in the
wrong places..

No, it would be a bug (QEMU or kernel).

Do you have an easy reproducer with dd, as suggested at
https://forum.proxmox.com/threads/proxmox-4-4-virtio_scsi-regression.31471/page-2?

Paolo

Fabian Grünbichler

2017-01-17 11:22:24 UTC

Permalink

Post by Paolo Bonzini

No, it would be a bug (QEMU or kernel).
Do you have an easy reproducer with dd, as suggested at
https://forum.proxmox.com/threads/proxmox-4-4-virtio_scsi-regression.31471/page-2?
Paolo

setup

1) dd 1G of (u)random data to a file on a file system on a SATA disk on
the host (I tested with ext2/3/4, ZFS and btrfs, btrfs produces
corruption the fastest here, but raw writes to the device should already
trigger the kernel error messages)

2) calculate the md5sum of this file, and store it on the same FS

test (with current qemu master, using PVE's 4.4.35 kernel as host kernel

1) start Ubuntu 16.04 VM (see full commandline at the end, I tested with
three disks, one for each of the file systems above) - the same behaviour
can also be observed when booting from an Alpine Linux iso

2) mount FS

3) dd random file from above to new file on same FS

4) sync

5) check md5sum of resulting file

6) repeat 3-5 until md5sum does not match, kernel spews error
messages, or you are convinced that everything is OK

sample kernel message (for ext3):
Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 Sense Key : Illegal Request [current]
Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 Add. Sense: Invalid field in cdb
Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 CDB: Write(10) 2a 00 0f 3a 90 00 00 07 d8 00
Jan 17 11:39:32 ubuntu kernel: blk_update_request: critical target error, dev sda, sector 255496192
Jan 17 11:39:32 ubuntu kernel: EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -121 writing to inode 125 (offset 0 size 0 starting block 31937280)
Jan 17 11:39:32 ubuntu kernel: buffer_io_error: 246 callbacks suppressed
Jan 17 11:39:32 ubuntu kernel: Buffer I/O error on device sda1, logical block 31936768
Jan 17 11:39:32 ubuntu kernel: Buffer I/O error on device sda1, logical block 31936769
Jan 17 11:39:32 ubuntu kernel: Buffer I/O error on device sda1, logical block 31936770
Jan 17 11:39:32 ubuntu kernel: Buffer I/O error on device sda1, logical block 31936771
Jan 17 11:39:32 ubuntu kernel: Buffer I/O error on device sda1, logical block 31936772
Jan 17 11:39:32 ubuntu kernel: Buffer I/O error on device sda1, logical block 31936773
Jan 17 11:39:32 ubuntu kernel: Buffer I/O error on device sda1, logical block 31936774
Jan 17 11:39:32 ubuntu kernel: Buffer I/O error on device sda1, logical block 31936775
Jan 17 11:39:32 ubuntu kernel: Buffer I/O error on device sda1, logical block 31936776
Jan 17 11:39:32 ubuntu kernel: Buffer I/O error on device sda1, logical block 31936777
Jan 17 11:39:39 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
Jan 17 11:39:41 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
Jan 17 11:39:55 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
Jan 17 11:39:56 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
Jan 17 11:40:07 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
Jan 17 11:40:08 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
Jan 17 11:40:15 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
Jan 17 11:40:22 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8

qemu commandline, originally generated by PVE, feel free to adapt or
minimize.. sdb, sdc and sde are the three SATA disks from above,
vm-101-disk-1 is the Ubuntu rootfs on LVM-thin.

/root/qemu/build/x86_64-softmmu/qemu-system-x86_64 \
-enable-kvm \
-chardev 'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait' \
-mon 'chardev=qmp,mode=control' \
-pidfile /var/run/qemu-server/101.pid \
-smbios 'type=1,uuid=3e550136-9d7f-4a12-9d98-e25a5d2d5806' \
-name diskpassthroughtest \
-smp '8,sockets=1,cores=8,maxcpus=8' \
-nodefaults \
-boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
-vga cirrus \
-vnc unix:/var/run/qemu-server/101.vnc,x509,password \
-cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce \
-m 4096 \
-k de \
-device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' \
-device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' \
-device 'pci-bridge,id=pci.3,chassis_nr=3,bus=pci.0,addr=0x5' \
-device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' \
-device 'usb-tablet,id=tablet,bus=uhci.0,port=1' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:c6676e1b1f72' \
-drive 'file=/mnt/pve/iso/template/iso/ubuntu-16.04.1-server-amd64.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' \
-device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
-device 'virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1' \
-drive 'file=/dev/sdb,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' \
-device 'scsi-block,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0' \
-device 'virtio-scsi-pci,id=virtioscsi1,bus=pci.3,addr=0x2' \
-drive 'file=/dev/sdc,if=none,id=drive-scsi1,format=raw,cache=none,aio=native,detect-zeroes=on' \
-device 'scsi-block,bus=virtioscsi1.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi1,id=scsi1' \
-device 'virtio-scsi-pci,id=virtioscsi3,bus=pci.3,addr=0x4' \
-drive 'file=/dev/sde,if=none,id=drive-scsi3,format=raw,cache=none,aio=native,detect-zeroes=on' \
-device 'scsi-block,bus=virtioscsi3.0,channel=0,scsi-id=0,lun=3,drive=drive-scsi3,id=scsi3' \
-drive 'file=/dev/pve/vm-101-disk-1,if=none,id=drive-virtio0,format=raw,cache=none,aio=native,detect-zeroes=on' \
-device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=103' \
-netdev 'type=tap,id=net0,ifname=tap101i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=46:83:4B:92:EC:88,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'

Paolo Bonzini

2017-01-17 15:03:19 UTC

Permalink

Post by Fabian GrÃ¼nbichler
6) repeat 3-5 until md5sum does not match, kernel spews error
messages, or you are convinced that everything is OK
Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 Sense Key : Illegal Request [current]
Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 Add. Sense: Invalid field in cdb
Jan 17 11:39:32 ubuntu kernel: sd 2:0:0:0: [sda] tag#32 CDB: Write(10) 2a 00 0f 3a 90 00 00 07 d8 00
Jan 17 11:39:32 ubuntu kernel: blk_update_request: critical target error, dev sda, sector 255496192

Can you reproduce it if QEMU runs under "strace -e ioctl -ff" in the
host? Or also using this systemtap script.

The important bit would be the lines with a nonzero status, but the
others can be useful to see what the surroundings look like.

# example output for "sudo stap -v strace.stp -c 'sg_opcodes /dev/sda'"
# | sg_opcodes[3444] 00000000 12 00 00 00 24 00 00 00 be 91
# | sg_opcodes[3444] 08100002 a3 0c 00 00 00 00 00 00 20 00

global cdbs%
global reqs%
global names%

function check_pid() {
return target() == 0 || pid() == target();
}

probe kernel.function("blk_fill_sghdr_rq") {
if (!check_pid()) next;

names[$rq]=sprintf("%s[%d]", execname(), tid())
cdbs[$rq]=sprintf("%02x %02x %02x %02x %02x %02x %02x %02x %02x %02x",
$hdr->cmdp[0],$hdr->cmdp[1],$hdr->cmdp[2],$hdr->cmdp[3],$hdr->cmdp[4],
$hdr->cmdp[6],$hdr->cmdp[5],$hdr->cmdp[7],$hdr->cmdp[8],$hdr->cmdp[9])
}

probe kernel.function("scsi_setup_cmnd") {
if (!($req in cdbs)) next;
reqs[$req->special] = $req;
}

probe kernel.function("scsi_finish_command") {
if (!($cmd in reqs)) next;
rq = reqs[$cmd];
printf("%s %08x %s\n", names[rq], $cmd->result, cdbs[rq]);
delete reqs[$cmd]
delete cdbs[rq]
}

Paolo Bonzini

2017-01-17 16:24:55 UTC

Permalink

Post by Paolo Bonzini

Can you reproduce it if QEMU runs under "strace -e ioctl -ff" in the
host? Or also using this systemtap script.
The important bit would be the lines with a nonzero status, but the
others can be useful to see what the surroundings look like.
# example output for "sudo stap -v strace.stp -c 'sg_opcodes /dev/sda'"
# | sg_opcodes[3444] 00000000 12 00 00 00 24 00 00 00 be 91
# | sg_opcodes[3444] 08100002 a3 0c 00 00 00 00 00 00 20 00
global cdbs%
global reqs%
global names%
function check_pid() {
return target() == 0 || pid() == target();
}
probe kernel.function("blk_fill_sghdr_rq") {
if (!check_pid()) next;
names[$rq]=sprintf("%s[%d]", execname(), tid())
cdbs[$rq]=sprintf("%02x %02x %02x %02x %02x %02x %02x %02x %02x %02x",
$hdr->cmdp[0],$hdr->cmdp[1],$hdr->cmdp[2],$hdr->cmdp[3],$hdr->cmdp[4],
$hdr->cmdp[6],$hdr->cmdp[5],$hdr->cmdp[7],$hdr->cmdp[8],$hdr->cmdp[9])
}
probe kernel.function("scsi_setup_cmnd") {
if (!($req in cdbs)) next;
reqs[$req->special] = $req;
}
probe kernel.function("scsi_finish_command") {
if (!($cmd in reqs)) next;
rq = reqs[$cmd];
printf("%s %08x %s\n", names[rq], $cmd->result, cdbs[rq]);
delete reqs[$cmd]
delete cdbs[rq]

Please add a "delete names[rq]" here too!

Paolo

Post by Paolo Bonzini
}

Fabian Grünbichler

2017-01-18 11:50:50 UTC

Permalink

Post by Paolo Bonzini

OT: systemtap is not working with your script under Debian Jessie (or
maybe in general under Debian Jessie? not sure).

after some further testing it seems like this change in Qemu exposes
some subtle issue with our specific kernel (it works fine with the
upstream Ubuntu 4.4 one which ours is based on). I am currently
debugging further to narrow down potential causes - if I need further
input from your side or if I suspect Qemu to be at fault I'll resurrect
this thread (and include the strace output).

thanks for your quick reaction anyhow!

Fabian Grünbichler

2017-01-18 16:19:41 UTC

Permalink

Post by Fabian GrÃ¼nbichler

Post by Paolo Bonzini

OT: systemtap is not working with your script under Debian Jessie (or
maybe in general under Debian Jessie? not sure).
after some further testing it seems like this change in Qemu exposes
some subtle issue with our specific kernel (it works fine with the
upstream Ubuntu 4.4 one which ours is based on). I am currently
debugging further to narrow down potential causes - if I need further
input from your side or if I suspect Qemu to be at fault I'll resurrect
this thread (and include the strace output).
thanks for your quick reaction anyhow!

okay, so this looks like either a bug in Qemu or the upstream kernel.

disabling THP on the hypervisor host with

# echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled

allows reproducing the bug very reliably, shutting the VM down, then
enabling THP (with 'always') and trying again makes it go away.

Qemu was compiled with:
../configure --with-confsuffix=/kvm --target-list=x86_64-softmmu
--disable-xen --enable-gnutls --enable-sdl --enable-uuid
--enable-linux-aio --enable-libiscsi --disable-smartcard
--audio-drv-list=alsa --enable-spice --enable-usb-redir --enable-libusb
--disable-gtk --enable-xfsctl --enable-numa --disable-strip
--enable-jemalloc --disable-libnfs --disable-fdt

attached is an strace with qemu master and mainline 4.9 running on
Debian Jessie - I will try to test it with Fedora or CentOS tomorrow.

journal in the VM says the following:

Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 Sense Key : Illegal Request [current]
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 Add. Sense: Invalid field in cdb
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 CDB: Write(10) 2a 00 0d d6 51 48 00 08 00 00
Jan 18 17:07:51 ubuntu kernel: blk_update_request: critical target error, dev sda, sector 232149320
Jan 18 17:07:51 ubuntu kernel: EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -121 writing to inode 125 (offset 0 size 0 starting block 29018921)
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018409
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018410
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018411
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018412
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018413
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018414
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018415
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018416
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018417
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018418
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 Sense Key : Illegal Request [current]
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 Add. Sense: Invalid field in cdb
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 CDB: Write(10) 2a 00 0d d6 59 48 00 08 00 00
Jan 18 17:07:51 ubuntu kernel: blk_update_request: critical target error, dev sda, sector 232151368
Jan 18 17:07:51 ubuntu kernel: EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -121 writing to inode 125 (offset 0 size 0 starting block 29019177)
Jan 18 17:07:52 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
Jan 18 17:07:58 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8

strace (with some random grep-ing):
[pid 1794] ioctl(19, SG_IO, {'S', SG_DXFER_TO_DEV, cmd[10]=[2a, 00, 0d, d6, 51, 48, 00, 08, 00, 00], mx_sb_len=252, iovec_count=17, dxfer_len=1048576, timeout=4294967295, flags=0x1, data[1048576]=["\0`\235=c\177\0\0\0\0\1\0\0\0\0\0\0`\236=c\177\0\0\0\0\1\0\0\0\0\0"...]}) = -1 EINVAL (Invalid argument)
[pid 1794] ioctl(19, SG_IO, {'S', SG_DXFER_TO_DEV, cmd[10]=[2a, 00, 0d, d6, 59, 48, 00, 08, 00, 00], mx_sb_len=252, iovec_count=16, dxfer_len=1048576, timeout=4294967295, flags=0x1, data[1048576]=["\0`-=c\177\0\0\0\0\1\0\0\0\0\0\0`.=c\177\0\0\0\0\1\0\0\0\0\0"...]}) = -1 EINVAL (Invalid argument)

Paolo Bonzini

2017-01-18 16:30:17 UTC

Permalink

Post by Fabian GrÃ¼nbichler
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 Sense Key : Illegal Request [current]
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 Add. Sense: Invalid field in cdb
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#109 CDB: Write(10) 2a 00 0d d6 51 48 00 08 00 00
Jan 18 17:07:51 ubuntu kernel: blk_update_request: critical target error, dev sda, sector 232149320
Jan 18 17:07:51 ubuntu kernel: EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -121 writing to inode 125 (offset 0 size 0 starting block 29018921)
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018409
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018410
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018411
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018412
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018413
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018414
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018415
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018416
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018417
Jan 18 17:07:51 ubuntu kernel: Buffer I/O error on device sda1, logical block 29018418
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 Sense Key : Illegal Request [current]
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 Add. Sense: Invalid field in cdb
Jan 18 17:07:51 ubuntu kernel: sd 2:0:0:0: [sda] tag#106 CDB: Write(10) 2a 00 0d d6 59 48 00 08 00 00
Jan 18 17:07:51 ubuntu kernel: blk_update_request: critical target error, dev sda, sector 232151368
Jan 18 17:07:51 ubuntu kernel: EXT4-fs warning (device sda1): ext4_end_bio:329: I/O error -121 writing to inode 125 (offset 0 size 0 starting block 29019177)
Jan 18 17:07:52 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
Jan 18 17:07:58 ubuntu kernel: JBD2: Detected IO errors while flushing file data on sda1-8
[pid 1794] ioctl(19, SG_IO, {'S', SG_DXFER_TO_DEV, cmd[10]=[2a, 00, 0d, d6, 51, 48, 00, 08, 00, 00], mx_sb_len=252, iovec_count=17, dxfer_len=1048576, timeout=4294967295, flags=0x1, data[1048576]=["\0`\235=c\177\0\0\0\0\1\0\0\0\0\0\0`\236=c\177\0\0\0\0\1\0\0\0\0\0"...]}) = -1 EINVAL (Invalid argument)
[pid 1794] ioctl(19, SG_IO, {'S', SG_DXFER_TO_DEV, cmd[10]=[2a, 00, 0d, d6, 59, 48, 00, 08, 00, 00], mx_sb_len=252, iovec_count=16, dxfer_len=1048576, timeout=4294967295, flags=0x1, data[1048576]=["\0`-=c\177\0\0\0\0\1\0\0\0\0\0\0`.=c\177\0\0\0\0\1\0\0\0\0\0"...]}) = -1 EINVAL (Invalid argument)

This is useful, thanks. I suspect blk_rq_map_user_iov is failing,
meaning that the scatter/gather list has too many segments for the HBA
in the host. (The limit can be found in /sys/block/sda/queue/max_segments).

Post by Fabian GrÃ¼nbichler
disabling THP on the hypervisor host with
# echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
allows reproducing the bug very reliably, shutting the VM down, then
enabling THP (with 'always') and trying again makes it go away.

because no THP means more memory fragmentation and thus more segments.

I'm not sure how to fix it, unfortunately. :(

Paolo

Fabian Grünbichler

2017-01-18 17:17:28 UTC

Permalink

Post by Paolo Bonzini

I can try to get some more info tomorrow..

Post by Paolo Bonzini

because no THP means more memory fragmentation and thus more segments.
I'm not sure how to fix it, unfortunately. :(

Well at least this means we have a (potentially too conservative) check for deciding when to use scsi-disk instead of scsi-block (maybe this could be detected in qemu as well?).

Seems especially troublesome since the (hypervisor) admin can change it at runtime, and it seems like there are widespread recommendations to disable THP for e.g., DB use cases..

Post by Paolo Bonzini
Paolo

Fabian Grünbichler

2017-01-19 11:59:58 UTC

Permalink

Post by Paolo Bonzini

limit is 168 for all the disks I tested with.

Post by Paolo Bonzini

because no THP means more memory fragmentation and thus more segments.

it is also very easily reproducible with both THP enable and defrag set
to madvise or always, if tested in fragmented- or low-memory conditions.

my test host has 64G of memory, my test VM 4G, huge pages are 2k big

if I simulate some memory load by repeatedly reserving 50G memory using
stress-ng:

# stress-ng --vm 50 --vm-bytes=1G --vm-hang 30

and then start the test VM and the dd-ing, I can see the big chunk of
AnonHugePages allocated to the VM system grow:

# grep -E 'AnonHugePages:[[:space:]]+[0-9]{5,} kB' /proc/$(pidof qemu-system-x86_64)/smaps

up to about 3G (of 4G), and hit the issue.

without the additional load and fragmentation using stress-ng, the
AnonHugePages allocated to the qemu process grow to the expected 4G, and
the issue does not occur.

Post by Paolo Bonzini
I'm not sure how to fix it, unfortunately. :(

so this means either use non-transparent huge pages when using
scsi-block (haven't verified but should work?), or use aggressive THP
settings and/or always leave enough memory reserves? :-/ this is very
unfortunate IMHO (and probably also not a very realistic usage
scenario?)

Post by Paolo Bonzini
Paolo

Paolo Bonzini

2017-01-24 09:35:19 UTC

Permalink

Post by Fabian GrÃ¼nbichler

Post by Paolo Bonzini
I'm not sure how to fix it, unfortunately. :(

Yes, I agree. Unfortunately there is no API in Linux that lets you
verify how many segments an iov is split into.

Paolo

Continue reading on narkive:

Search results for '[Qemu-devel] Data corruption in Qemu 2.7.1' (Questions and Answers)

replies

Any chance of portable Linux messing up my computer?

started 2009-09-17 07:46:00 UTC

software