[PATCH] kvm: clear guest TSC on reset

Discussion:

Fernando Luis Vázquez Cao

2013-12-05 06:08:42 UTC

I realized that the TSC reset should be done in QEMU
so I will be replying with a QEMU patch instead of a
KVM one.

- Fernando

I think there is a problem with the current patch, so please
ignore for the moment. I will be replying with an update ASAP.
Sorry for the noise.
- Fernando

that
the newly booted kernel will panic or hang.
(*) Intel Xeon E5 processors show the same broken behavior due to
the errata "TSC is Not Affected by Warm Reset" (Intel® Xeon®
Processor E5 Family Specification Update - August 2013): "The
TSC (Time Stamp Counter MSR 10H) should be cleared on
reset. Due to this erratum the TSC is not affected by warm
reset."
---
diff -urNp linux-3.13-rc2-orig/arch/x86/kvm/x86.c linux-3.13-rc2/arch/x86/kvm/x86.c
--- linux-3.13-rc2-orig/arch/x86/kvm/x86.c 2013-11-30 05:57:14.000000000 +0900
+++ linux-3.13-rc2/arch/x86/kvm/x86.c 2013-12-03 14:51:53.747600839 +0900
@@ -6716,18 +6716,24 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu
 	return r;
 }
-int kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
+static void kvm_tsc_reset(struct kvm_vcpu *vcpu)
 {
-	int r;
 	struct msr_data msr;
-	r = vcpu_load(vcpu);
-	if (r)
-		return r;
 	msr.data = 0x0;
 	msr.index = MSR_IA32_TSC;
 	msr.host_initiated = true;
 	kvm_write_tsc(vcpu, &msr);
+}
+
+int kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
+{
+	int r;
+
+	r = vcpu_load(vcpu);
+	if (r)
+		return r;
+	kvm_tsc_reset(vcpu);
 	vcpu_put(vcpu);
 	return r;
@@ -6770,6 +6776,10 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcp
 	kvm_pmu_reset(vcpu);
+	kvm_tsc_reset(vcpu);
+	if (guest_cpuid_has_tsc_adjust(vcpu))
+		vcpu->arch.ia32_tsc_adjust_msr = 0x0;
+
 	memset(vcpu->arch.regs, 0, sizeof(vcpu->arch.regs));
 	vcpu->arch.regs_avail = ~0;
 	vcpu->arch.regs_dirty = ~0;

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Fernando Luis Vázquez Cao

2013-12-05 06:15:04 UTC

VCPU TSC is not cleared by a warm reset (*), which leaves many Linux
guests vulnerable to the overflow in cyc2ns_offset fixed by upstream
commit 9993bc635d01a6ee7f6b833b4ee65ce7c06350b1 ("sched/x86: Fix overflow
in cyc2ns_offset").

To put it in a nutshell, if a Linux guest without the patch above applied
has been up more than 208 days and attempts a warm reset chances are that
the newly booted kernel will panic or hang.

(*) Intel Xeon E5 processors show the same broken behavior due to
the errata "TSC is Not Affected by Warm Reset" (Intel® Xeon®
Processor E5 Family Specification Update - August 2013): "The
TSC (Time Stamp Counter MSR 10H) should be cleared on
reset. Due to this erratum the TSC is not affected by warm
reset."
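
For reference, the 208-day figure is consistent with a 64-bit product overflowing in a fixed-point cycles-to-nanoseconds conversion. The sketch below is only an illustration under stated assumptions: it assumes a 10-bit scale shift (believed to match the kernel's cyc2ns scale factor of that era) and a hypothetical 2.6 GHz TSC. The names cyc2ns_scale, cyc2ns_naive and overflow_days are made up for this example and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define SCALE_SHIFT 10 /* assumed fixed-point shift, i.e. a factor of 2^10 */

/* Nanoseconds per cycle in fixed point: (10^6 << shift) / cpu_khz. */
static uint64_t cyc2ns_scale(uint64_t cpu_khz)
{
    assert(cpu_khz != 0); /* guard the division */
    return (1000000ULL << SCALE_SHIFT) / cpu_khz;
}

/*
 * Naive conversion: the 64-bit product cycles * scale silently wraps
 * once cycles is large enough, which is the overflow described above.
 */
static uint64_t cyc2ns_naive(uint64_t cycles, uint64_t cpu_khz)
{
    return (cycles * cyc2ns_scale(cpu_khz)) >> SCALE_SHIFT;
}

/* Uptime, in days, after which cycles * scale no longer fits in 64 bits. */
static double overflow_days(uint64_t cpu_khz)
{
    uint64_t max_cycles = UINT64_MAX / cyc2ns_scale(cpu_khz);
    double seconds = (double)max_cycles / ((double)cpu_khz * 1000.0);

    return seconds / 86400.0;
}
```

Independently of the clock rate, the product exceeds 64 bits once the elapsed time reaches about 2^54 ns, which is where the figure of roughly 208 days comes from. A TSC that survives a warm reset hands the new kernel a counter that is already past that point.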

Cc: ***@vger.kernel.org
Cc: Will Auld <***@intel.com>
Cc: Marcelo Tosatti <***@redhat.com>
Signed-off-by: Fernando Luis Vazquez Cao <***@oss.ntt.co.jp>
---

--- qemu-orig/target-i386/kvm.c 2013-11-28 07:02:45.000000000 +0900
+++ qemu/target-i386/kvm.c 2013-12-05 14:47:03.085738175 +0900
@@ -1125,6 +1125,8 @@ static int kvm_put_msrs(X86CPU *cpu, int
         kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
     }
     if (has_msr_tsc_adjust) {
+        if (level == KVM_PUT_RESET_STATE)
+            env->tsc_adjust = 0;
         kvm_msr_entry_set(&msrs[n++], MSR_TSC_ADJUST, env->tsc_adjust);
     }
     if (has_msr_misc_enable) {
@@ -1139,22 +1141,22 @@ static int kvm_put_msrs(X86CPU *cpu, int
         kvm_msr_entry_set(&msrs[n++], MSR_LSTAR, env->lstar);
     }
 #endif
-    if (level == KVM_PUT_FULL_STATE) {
+    /*
+     * The following MSRs have side effects on the guest or are too heavy
+     * for normal writeback. Limit them to reset or full state updates.
+     */
+    if (level >= KVM_PUT_RESET_STATE) {
+        if (level == KVM_PUT_RESET_STATE)
+            env->tsc = 0;
         /*
          * KVM is yet unable to synchronize TSC values of multiple VCPUs on
          * writeback. Until this is fixed, we only write the offset to SMP
          * guests after migration, desynchronizing the VCPUs, but avoiding
          * huge jump-backs that would occur without any writeback at all.
          */
-        if (smp_cpus == 1 || env->tsc != 0) {
+        if (smp_cpus == 1 || env->tsc != 0 || level == KVM_PUT_RESET_STATE) {
             kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
         }
-    }
-    /*
-     * The following MSRs have side effects on the guest or are too heavy
-     * for normal writeback. Limit them to reset or full state updates.
-     */
-    if (level >= KVM_PUT_RESET_STATE) {
         kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME,
                           env->system_time_msr);
         kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);

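
To make the effect of the QEMU patch above easier to see, here is a small standalone sketch of the decision "is MSR_IA32_TSC written back at this level?" before and after the change. It is a model, not QEMU code: the helper names tsc_written_before and tsc_written_after are invented for this illustration, and the numeric values of the KVM_PUT_* levels are assumed to be ordered runtime < reset < full, as the `level >= KVM_PUT_RESET_STATE` test in the patch implies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Writeback levels; the ordering is what matters, the values are assumed. */
enum {
    KVM_PUT_RUNTIME_STATE = 1,
    KVM_PUT_RESET_STATE   = 2,
    KVM_PUT_FULL_STATE    = 3,
};

/* Before the patch: the TSC is written only on full-state updates. */
static bool tsc_written_before(int level, int smp_cpus, uint64_t tsc)
{
    return level == KVM_PUT_FULL_STATE && (smp_cpus == 1 || tsc != 0);
}

/*
 * After the patch: a reset forces the saved TSC to zero and always
 * writes it; full-state updates keep the old SMP restriction.
 */
static bool tsc_written_after(int level, int smp_cpus, uint64_t *tsc)
{
    if (level < KVM_PUT_RESET_STATE) {
        return false;
    }
    if (level == KVM_PUT_RESET_STATE) {
        *tsc = 0;
    }
    return smp_cpus == 1 || *tsc != 0 || level == KVM_PUT_RESET_STATE;
}
```

In this model a reset on a 4-VCPU guest was previously a no-op for the TSC, while with the patch it zeroes the saved value and writes it for every VCPU. Runtime writeback is unchanged in both versions.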

Paolo Bonzini

2013-12-05 09:28:18 UTC

I agree that the bug is in QEMU. One small nit in your patch is that
you should reset env->tsc_adjust and env->tsc in x86_cpu_reset. This
would already be pretty good.

However, a bigger problem is that env->tsc is a useless duplicate of
"cpu_get_ticks() + env->tsc_adjust". It would be nice to drop env->tsc
completely except for migration backwards compatibility. Thus you can:

- fill in env->tsc as mentioned above from target-i386/machine.c's
cpu_pre_save function. This guarantees backwards compatibility.

- add a function cpu_set_ticks(int64_t ticks) to cpus.c. The function
does nothing if use_icount is true, otherwise it needs to have (roughly)
the opposite logic compared to cpu_get_ticks. You then call this
function from x86_cpu_reset instead of setting env->tsc. You can
similarly call this function from kvm_get_msrs.

- add a function kvm_set_ticks(int64_t ticks) to kvm-all.c and
kvm-stub.c. For kvm-all.c it calls kvm_arch_set_ticks(CPUState *cpu,
int64_t ticks) in target-*/kvm.c. The kvm_arch_set_ticks() function has a
dummy implementation for all architectures except x86. For x86 it calls
KVM_SET_MSRS passing "ticks + env->tsc_offset".

- call kvm_set_ticks() from cpu_set_ticks() and cpu_enable_ticks()

Can you do this?

Thanks,

Paolo
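
The cpu_set_ticks() idea above can be sketched as follows. This is a simplified standalone model rather than the code in cpus.c: the timers structure, the fake_host_ticks counter and host_ticks() are stand-ins invented for the example, the icount path of cpu_get_ticks() is left out, and the proposed call into kvm_set_ticks() is only marked with a comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the timer state kept in cpus.c. */
static struct {
    int64_t cpu_ticks_offset; /* added to the host counter while running */
    bool cpu_ticks_enabled;   /* false while the VM is stopped */
} timers;

static bool use_icount;          /* stand-in for the global of the same name */
static int64_t fake_host_ticks;  /* hypothetical host cycle counter */

static int64_t host_ticks(void)
{
    return fake_host_ticks;
}

/* Read side: host counter plus offset while running, else the frozen value. */
static int64_t cpu_get_ticks(void)
{
    if (!timers.cpu_ticks_enabled) {
        return timers.cpu_ticks_offset;
    }
    return host_ticks() + timers.cpu_ticks_offset;
}

/* Write side: pick the offset so that cpu_get_ticks() returns 'ticks' now. */
static void cpu_set_ticks(int64_t ticks)
{
    if (use_icount) {
        return; /* does nothing under icount, as proposed above */
    }
    if (!timers.cpu_ticks_enabled) {
        timers.cpu_ticks_offset = ticks;
    } else {
        timers.cpu_ticks_offset = ticks - host_ticks();
    }
    /* The proposal would also call kvm_set_ticks(ticks) here. */
}
```

With something of this shape, x86_cpu_reset could call cpu_set_ticks(0) instead of assigning env->tsc, and the guest-visible counter would restart from zero whether or not the VM is running at that moment.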


Fernando Luis Vazquez Cao

2013-12-05 13:15:06 UTC