Discussion:
[RFC] live snapshot, live merge, live block migration
Dor Laor
2011-05-09 13:40:00 UTC
Permalink
No patch here (sorry), but a collection of thoughts about these features
and their potential building blocks. Please review (also on
http://wiki.qemu.org/Features/LiveBlockMigration)

Future qemu is expected to support these features (some already
implemented):

* Live block copy

Ability to copy 1+ virtual disks from the source backing file/block
device to a new target that is accessible by the host. The copy is
supposed to be executed transparently while the VM runs.

Status: code exists (by Marcelo) today in qemu but needs refactoring
due to a race condition at the end of the copy operation. We agreed
that a re-implementation of the copy operation should take place
that makes sure the image is completely mirrored until management
decides what copy to keep.
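
To make the intent concrete, here is a toy sketch of the "mirror until
management decides" behaviour (plain Python, all names invented, not the
existing qemu code): guest writes are duplicated to both images while a
background worker copies the rest, so either copy is a valid candidate
at the end.

# Toy model of "mirror until management decides which copy to keep".
# Plain Python, invented names -- not the actual qemu implementation.

BLOCK_SIZE = 64 * 1024          # copy granularity, arbitrary for this sketch

class LiveBlockCopy:
    def __init__(self, source, target, nblocks):
        self.source = source    # dict: block index -> data
        self.target = target
        self.nblocks = nblocks
        self.copied = set()     # blocks already present in the target

    def guest_write(self, block, data):
        # Guest writes are duplicated so both images stay valid candidates.
        self.source[block] = data
        self.target[block] = data
        self.copied.add(block)

    def copy_step(self):
        # Background worker: mirror one block that is not yet in the target.
        for block in range(self.nblocks):
            if block not in self.copied:
                self.target[block] = self.source.get(block, b"\0" * BLOCK_SIZE)
                self.copied.add(block)
                return True
        return False            # nothing left: fully mirrored

copy = LiveBlockCopy(source={0: b"a", 1: b"b"}, target={}, nblocks=4)
copy.guest_write(2, b"new data")        # mirrored immediately
while copy.copy_step():                 # background copy until complete
    pass
assert copy.copied == {0, 1, 2, 3}      # management can now pick either image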

* Live snapshots and live snapshot merge

Live snapshot is already incorporated (by Jes) in qemu (still need
qemu-agent work to freeze the guest FS).

Live snapshot merge is required in order to reduce the overhead
caused by the additional snapshots (sometimes over a raw device).
It is currently not implemented for a live, running guest.

Possibility: enhance live copy to be used for live snapshot merge.
It is almost the same mechanism.

* Copy on read (image streaming)
Ability to start guest execution while the parent image resides
remotely and each block access is replicated to a local copy (image
format snapshot).

It would be nice to have a general mechanism that can be used for
all image formats. What about the protocol to access these blocks
over the net? We can reuse existing ones (nbd/iscsi).

Such functionality can be hooked together with live block migration
instead of the 'post copy' method.
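
A rough sketch of the copy-on-read idea (plain Python, invented names,
not tied to any real image format or protocol): reads are served locally
when possible, otherwise fetched from the remote parent and replicated,
so the local image fills in while the guest runs.

# Toy copy-on-read image: a local copy backed by a remote parent.
# Plain Python, invented names -- not qcow2/QED or any real protocol.

class CopyOnReadImage:
    def __init__(self, remote_parent):
        self.remote = remote_parent   # dict: block index -> data (over nbd/iscsi/...)
        self.local = {}               # local replica, fills in as the guest runs

    def read(self, block):
        if block in self.local:                  # already streamed or written
            return self.local[block]
        data = self.remote.get(block, b"\0")     # fetch from the parent
        self.local[block] = data                 # replicate: next read stays local
        return data

    def write(self, block, data):
        self.local[block] = data                 # writes never touch the parent

img = CopyOnReadImage(remote_parent={0: b"base0", 1: b"base1"})
assert img.read(0) == b"base0"    # pulled remotely, now cached locally
img.write(1, b"guest")            # local write shadows the parent block
assert img.read(1) == b"guest"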

* Live block migration (pre/post)

Beyond live block copy we'll sometimes need to move both the storage
and the guest. There are two main approaches here:
- pre copy
First live copy the image and only then live migrate the VM.
It is simple, but if the purpose of the whole live block migration
was to balance the cpu load, it won't be practical to use, since
copying an image of 100GB will take too long.
- post copy
First live migrate the VM, then live copy its blocks.
It's a better approach for HA/load balancing but it might make
management complex (need to keep the source VM alive; what happens
on failures?)
Using copy on read might simplify it -
post copy = live snapshot + copy on read (sketched below).
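
Spelled out as orchestration steps (a sketch only - every helper below
is a placeholder, not an existing qemu command or API):

# Orchestration sketch: what "pre copy" and "post copy" are composed of.
# Every helper is a placeholder, not a real qemu command or API.

def live_block_copy(src, dst):  print("mirror", src, "->", dst)
def live_migrate(vm):           print("live migrate", vm)
def live_snapshot(image):       print("snapshot", image); return image + ".snap"
def copy_on_read(dst, backing): print("stream", backing, "on demand into", dst)

def pre_copy_block_migration(vm, src, dst):
    live_block_copy(src, dst)         # copy the whole disk first (can be long)
    live_migrate(vm)                  # only then move the running VM

def post_copy_block_migration(vm, src, dst):
    snap = live_snapshot(src)         # consistent view stays on the source
    live_migrate(vm)                  # move the VM right away
    copy_on_read(dst, backing=snap)   # pull its blocks on demand afterwards

pre_copy_block_migration("vm1", "src.img", "dst.img")
post_copy_block_migration("vm1", "src.img", "dst.img")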

In addition there are two cases for the storage access:
1. The source block device is shared and can be easily accessed by
the destination qemu-kvm process.
That's the easy case; no special protocol is needed for copying the
block devices.
2. There is no shared storage at all.
This means we should implement a block access protocol over the
live migration fd :(

We need to choose whether to implement a new one, or re-use NBD or
iSCSI (target & initiator).

* Using external dirty block bitmap

FVD has an option to use external dirty block bitmap file in
addition to the regular mapping/data files.

We can consider using it for live block migration and live merge too.
It can also allow additional usages of 3rd party tools to calculate
diffs between the snapshots.
There is a big downside though, since it will make management
complicated and there is the risk of the image and its bitmap file
getting out of sync. It's a much better choice to have the qemu-img
tool be the single interface to the dirty block bitmap data.
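
For illustration, a minimal dirty-bitmap model (invented, not FVD's
on-disk format): writes set bits, and a single query interface hands the
dirty block list to backup/merge tools.

# Minimal dirty-block bitmap model (illustrative only, not FVD's format).
# Writes mark bits; a single query interface exposes the dirty block list.

class DirtyBitmap:
    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.bits = bytearray((nblocks + 7) // 8)

    def mark(self, block):              # called on every guest write
        self.bits[block // 8] |= 1 << (block % 8)

    def is_dirty(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def dirty_blocks(self):
        # What a single tool (qemu-img style) would expose to backup software.
        return [b for b in range(self.nblocks) if self.is_dirty(b)]

bitmap = DirtyBitmap(nblocks=16)
bitmap.mark(3)
bitmap.mark(4)
assert bitmap.dirty_blocks() == [3, 4]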

Summary:
* We need Marcelo's new (to come) block copy implementation
* should work in parallel to migration and hotplug
* General copy on read is desirable
* Live snapshot merge to be implemented using block copy
* Need to utilize a remote block access protocol (iscsi/nbd/other)
Which one is the best?
* Keep qemu-img the single interface for dirty block mappings.
* Live block migration pre copy == live copy + block access protocol
+ live migration
* Live block migration post copy == live migration + block access
protocol/copy on read.

Comments?

Regards,
Dor
Anthony Liguori
2011-05-09 15:23:03 UTC
Permalink
Post by Dor Laor
No patch here (sorry) but collection of thoughts about these features
and their potential building blocks. Please review (also on
http://wiki.qemu.org/Features/LiveBlockMigration)
Future qemu is expected to support these features (some already
* Live block copy
Ability to copy 1+ virtual disk from the source backing file/block
device to a new target that is accessible by the host. The copy
supposed to be executed while the VM runs in a transparent way.
Status: code exists (by Marcelo) today in qemu but needs refactoring
due to a race condition at the end of the copy operation. We agreed
that a re-implementation of the copy operation should take place
that makes sure the image is completely mirrored until management
decides what copy to keep.
Live block copy is growing on me. It can actually be used (with an
intermediate network storage) to do live block migration.
Post by Dor Laor
* Live snapshots and live snapshot merge
Live snapshot is already incorporated (by Jes) in qemu (still need
qemu-agent work to freeze the guest FS).
Live snapshot is unfortunately not really "live". It runs a lot of
operations synchronously which will cause the guest to incur downtime.

We really need to refactor it to truly be live.
Post by Dor Laor
* Copy on read (image streaming)
Ability to start guest execution while the parent image reside
remotely and each block access is replicated to a local copy (image
format snapshot)
It should be nice to have a general mechanism that will be used for
all image formats. What about the protocol to access these blocks
over the net? We can reuse existing ones (nbd/iscsi).
I think the image format is really the best place to have this logic.
Of course, if we have live snapshot merge, we could use a temporary
QED/QCOW2 file and then merge afterwards.
Post by Dor Laor
* Using external dirty block bitmap
FVD has an option to use external dirty block bitmap file in
addition to the regular mapping/data files.
We can consider using it for live block migration and live merge too.
It can also allow additional usages of 3rd party tools to calculate
diffs between the snapshots.
There is a big down side thought since it will make management
complicated and there is the risky of the image and its bitmap file
get out of sync. It's much better choice to have qemu-img tool to be
the single interface to the dirty block bitmap data.
Does the dirty block bitmap need to exist outside of QEMU?

IOW, if it goes away after a guest shuts down, is that problematic?

I think it potentially greatly simplifies the problem which makes it
appealing from my perspective.

Regards,

Anthony Liguori
Dor Laor
2011-05-09 20:58:55 UTC
Permalink
Post by Dor Laor
No patch here (sorry) but collection of thoughts about these features
and their potential building blocks. Please review (also on
http://wiki.qemu.org/Features/LiveBlockMigration)
Future qemu is expected to support these features (some already
* Live block copy
Ability to copy 1+ virtual disk from the source backing file/block
device to a new target that is accessible by the host. The copy
supposed to be executed while the VM runs in a transparent way.
Status: code exists (by Marcelo) today in qemu but needs refactoring
due to a race condition at the end of the copy operation. We agreed
that a re-implementation of the copy operation should take place
that makes sure the image is completely mirrored until management
decides what copy to keep.
Live block copy is growing on me. It can actually be used (with an
intermediate network storage) to do live block migration.
I'm not sure that we can rely on such storage. While it looks like
anyone can get such temporary storage, it makes failure cases complex; it
will need additional locking, security permissions, etc.

That said, the main gap is the block copy protocol, and using qemu as an
iSCSI target/initiator might be a good solution.
Post by Dor Laor
* Live snapshots and live snapshot merge
Live snapshot is already incorporated (by Jes) in qemu (still need
qemu-agent work to freeze the guest FS).
Live snapshot is unfortunately not really "live". It runs a lot of
operations synchronously which will cause the guest to incur downtime.
We really need to refactor it to truly be live.
Well, live migration is not really live either.
It can be thought of as an implementation detail and improved later on.
Post by Dor Laor
* Copy on read (image streaming)
Ability to start guest execution while the parent image reside
remotely and each block access is replicated to a local copy (image
format snapshot)
It should be nice to have a general mechanism that will be used for
all image formats. What about the protocol to access these blocks
over the net? We can reuse existing ones (nbd/iscsi).
I think the image format is really the best place to have this logic. Of
course, if we have live snapshot merge, we could use a temporary
QED/QCOW2 file and then merge afterwards.
Post by Dor Laor
* Using external dirty block bitmap
FVD has an option to use external dirty block bitmap file in
addition to the regular mapping/data files.
We can consider using it for live block migration and live merge too.
It can also allow additional usages of 3rd party tools to calculate
diffs between the snapshots.
There is a big down side thought since it will make management
complicated and there is the risky of the image and its bitmap file
get out of sync. It's much better choice to have qemu-img tool to be
the single interface to the dirty block bitmap data.
Does the dirty block bitmap need to exist outside of QEMU?
IOW, if it goes away after a guest shuts down, is that problematic?
I admit I didn't give it enough thought; I think that sharing the code
w/ qemu-img should be enough for us. If we have a live block operation
and suddenly the guest shuts down in the middle, we need to finish the
block copy.
I think it potentially greatly simplifies the problem which makes it
appealing from my perspective.
Regards,
Anthony Liguori
Marcelo Tosatti
2011-05-12 14:18:17 UTC
Permalink
Post by Anthony Liguori
Post by Dor Laor
No patch here (sorry) but collection of thoughts about these features
and their potential building blocks. Please review (also on
http://wiki.qemu.org/Features/LiveBlockMigration)
Future qemu is expected to support these features (some already
* Live block copy
Ability to copy 1+ virtual disk from the source backing file/block
device to a new target that is accessible by the host. The copy
supposed to be executed while the VM runs in a transparent way.
Status: code exists (by Marcelo) today in qemu but needs refactoring
due to a race condition at the end of the copy operation. We agreed
that a re-implementation of the copy operation should take place
that makes sure the image is completely mirrored until management
decides what copy to keep.
Live block copy is growing on me. It can actually be used (with an
intermediate network storage) to do live block migration.
Post by Dor Laor
* Live snapshots and live snapshot merge
Live snapshot is already incorporated (by Jes) in qemu (still need
qemu-agent work to freeze the guest FS).
Live snapshot is unfortunately not really "live". It runs a lot of
operations synchronously which will cause the guest to incur
downtime.
We really need to refactor it to truly be live.
Post by Dor Laor
* Copy on read (image streaming)
Ability to start guest execution while the parent image reside
remotely and each block access is replicated to a local copy (image
format snapshot)
It should be nice to have a general mechanism that will be used for
all image formats. What about the protocol to access these blocks
over the net? We can reuse existing ones (nbd/iscsi).
I think the image format is really the best place to have this
logic. Of course, if we have live snapshot merge, we could use a
temporary QED/QCOW2 file and then merge afterwards.
Post by Dor Laor
* Using external dirty block bitmap
FVD has an option to use external dirty block bitmap file in
addition to the regular mapping/data files.
We can consider using it for live block migration and live merge too.
It can also allow additional usages of 3rd party tools to calculate
diffs between the snapshots.
There is a big down side thought since it will make management
complicated and there is the risky of the image and its bitmap file
get out of sync. It's much better choice to have qemu-img tool to be
the single interface to the dirty block bitmap data.
Does the dirty block bitmap need to exist outside of QEMU?
IOW, if it goes away after a guest shuts down, is that problematic?
I think it potentially greatly simplifies the problem which makes it
appealing from my perspective.
One limitation of block copy is the need to rewrite data that differs
from the base image on every "merge". But this is a limitation of qcow2
external snapshots represented as files, not block copy itself (with
external qcow2 snapshots, even a "live block merge" would require
potentially copying large amounts of data).

Only with snapshots internal to an image can data copying be avoided
(and depending on the scenario, this can be a nasty limitation).
Post by Anthony Liguori
Regards,
Anthony Liguori
Jes Sorensen
2011-05-12 15:37:39 UTC
Permalink
Post by Anthony Liguori
Post by Dor Laor
* Live snapshots and live snapshot merge
Live snapshot is already incorporated (by Jes) in qemu (still need
qemu-agent work to freeze the guest FS).
Live snapshot is unfortunately not really "live". It runs a lot of
operations synchronously which will cause the guest to incur downtime.
We really need to refactor it to truly be live.
We keep having this discussion, but as pointed out in my last reply on
this, you can pre-create your image if you so desire. The actual
snapshot then becomes little more than one command. Yes, we can make it
even nicer, but what we have now is far less bad than you make it out to
be.

Cheers,
Jes
Marcelo Tosatti
2011-05-10 14:13:10 UTC
Permalink
Post by Dor Laor
No patch here (sorry) but collection of thoughts about these
features and their potential building blocks. Please review (also on
http://wiki.qemu.org/Features/LiveBlockMigration)
Future qemu is expected to support these features (some already
* Live block copy
Ability to copy 1+ virtual disk from the source backing file/block
device to a new target that is accessible by the host. The copy
supposed to be executed while the VM runs in a transparent way.
Status: code exists (by Marcelo) today in qemu but needs refactoring
due to a race condition at the end of the copy operation. We agreed
that a re-implementation of the copy operation should take place
that makes sure the image is completely mirrored until management
decides what copy to keep.
* Live snapshots and live snapshot merge
Live snapshot is already incorporated (by Jes) in qemu (still need
qemu-agent work to freeze the guest FS).
Live snapshot merge is required in order of reducing the overhead
caused by the additional snapshots (sometimes over raw device).
Currently not implemented for a live running guest
Possibility: enhance live copy to be used for live snapshot merge.
It is almost the same mechanism.
The idea is to use live block copy to perform snapshot "live merges".
The advantage is the simplicity, since there is no need to synchronize
between live merge writes and guest writes.

With live copy the guest is either using the old image or the new copy,
so crash handling is relatively simple.
Post by Dor Laor
* Copy on read (image streaming)
Ability to start guest execution while the parent image reside
remotely and each block access is replicated to a local copy (image
format snapshot)
It should be nice to have a general mechanism that will be used for
all image formats. What about the protocol to access these blocks
over the net? We can reuse existing ones (nbd/iscsi).
Such functionality can be hooked together with live block migration
instead of the 'post copy' method.
* Live block migration (pre/post)
Beyond live block copy we'll sometimes need to move both the storage
- pre copy
First live copy the image and only then live migration the VM.
It is simple but if the purpose of the whole live block migration
was to balance the cpu load, it won't be practical to use since
copying an image of 100GB will take too long.
- post copy
First live migrate the VM, then live copy it's blocks.
It's better approach for HA/load balancing but it might make
management complex (need to keep the source VM alive, what happens
on failures?)
Using copy on read might simplify it -
post copy = live snapshot + copy on read.
1. The source block device is shared and can be easily accessed by
the destination qemu-kvm process.
That's the easy case, no special protocol needed for the block
devices copying.
2. There is no shared storage at all.
This means we should implement a block access protocol over the
live migration fd :(
We need to chose whether to implement a new one, or re-use NBD or
iScsi (target&initiator)
* Using external dirty block bitmap
FVD has an option to use external dirty block bitmap file in
addition to the regular mapping/data files.
We can consider using it for live block migration and live merge too.
It can also allow additional usages of 3rd party tools to calculate
diffs between the snapshots.
There is a big down side thought since it will make management
complicated and there is the risky of the image and its bitmap file
get out of sync. It's much better choice to have qemu-img tool to be
the single interface to the dirty block bitmap data.
* We need Marcelo's new (to come) block copy implementation
* should work in parallel to migration and hotplug
* General copy on read is desirable
* Live snapshot merge to be implemented using block copy
* Need to utilize a remote block access protocol (iscsi/nbd/other)
Which one is the best?
* Keep qemu-img the single interface for dirty block mappings.
* Live block migration pre copy == live copy + block access protocol
+ live migration
* Live block migration post copy == live migration + block access
protocol/copy on read.
Comments?
Regards,
Dor
Jes Sorensen
2011-05-12 15:33:06 UTC
Permalink
Post by Dor Laor
* We need Marcelo's new (to come) block copy implementation
* should work in parallel to migration and hotplug
* General copy on read is desirable
* Live snapshot merge to be implemented using block copy
* Need to utilize a remote block access protocol (iscsi/nbd/other)
Which one is the best?
* Keep qemu-img the single interface for dirty block mappings.
* Live block migration pre copy == live copy + block access protocol
+ live migration
* Live block migration post copy == live migration + block access
protocol/copy on read.
Comments?
I think we should add Jagane Sundar's Livebackup to the watch list here.
It looks very interesting as an alternative way to reach some of the
same goals.

Cheers,
Jes
Jagane Sundar
2011-05-13 03:16:08 UTC
Permalink
Post by Jes Sorensen
Post by Dor Laor
* We need Marcelo's new (to come) block copy implementation
* should work in parallel to migration and hotplug
* General copy on read is desirable
* Live snapshot merge to be implemented using block copy
* Need to utilize a remote block access protocol (iscsi/nbd/other)
Which one is the best?
* Keep qemu-img the single interface for dirty block mappings.
* Live block migration pre copy == live copy + block access protocol
+ live migration
* Live block migration post copy == live migration + block access
protocol/copy on read.
Comments?
I think we should add Jagane Sundar's Livebackup to the watch list here.
It looks very interesting as an alternative way to reach some of the
same goals.
Cheers,
Jes
Thanks for the intro, Jes. I am very interested in garnering support for
Livebackup.

You are correct in that Livebackup solves some, but not all, problems in
the same space.

Some comments about my code: It took me about two months of development
before I connected with you on the list.
Initially, I started off by doing a dynamic block transfer such that
fewer and fewer blocks are dirty till there are no more dirty blocks and
we declare the backup complete. The problem with this approach was that
there was no real way to plug in a guest file system quiesce function. I
then moved on to the snapshot technique. With this snapshot technique I
am also able to test the livebackup function very thoroughly - I use a
technique where I create an LVM snapshot simultaneously, and do a cmp of
the LVM snapshot and the livebackup backup image.

With this mode of testing, I am very confident of the integrity of my
solution.

I chose to invent a new protocol that is very simple, and custom to
livebackup, because I needed livebackup-specific functions such as
'create snapshot', 'delete snapshot', etc. Also, I am currently
implementing SSL-based encryption, with both the client authenticating to
the server and the server authenticating to the client using self-signed
certificates. iSCSI or NBD would be more standards compliant, I suppose.

My high-level goal is to make this a natural solution for Infrastructure
as a Service (IaaS) cloud environments. I am looking carefully at
integrating the management of the Livebackup function into OpenStack.

I would like to help in any way I can to make KVM be the *best* VM
technology for IaaS clouds.

Thanks,
Jagane
Dor Laor
2011-05-15 21:14:25 UTC
Permalink
Post by Jagane Sundar
Post by Jes Sorensen
Post by Dor Laor
* We need Marcelo's new (to come) block copy implementation
* should work in parallel to migration and hotplug
* General copy on read is desirable
* Live snapshot merge to be implemented using block copy
* Need to utilize a remote block access protocol (iscsi/nbd/other)
Which one is the best?
* Keep qemu-img the single interface for dirty block mappings.
* Live block migration pre copy == live copy + block access protocol
+ live migration
* Live block migration post copy == live migration + block access
protocol/copy on read.
Comments?
I think we should add Jagane Sundar's Livebackup to the watch list here.
It looks very interesting as an alternative way to reach some of the
same goals.
Cheers,
Jes
Thanks for the intro, Jes. I am very interested in garnering support for
Livebackup.
You are correct in that Livebackup solves some, but not all, problems in
the same space.
Some comments about my code: It took me about two months of development
before I connected with you on the list.
Initially, I started off by doing a dynamic block transfer such that
fewer and fewer blocks are dirty till there are no more dirty blocks and
we declare the backup complete. The problem with this approach was that
there was no real way to plug in a guest file system quiesce function. I
then moved on to the snapshot technique. With this snapshot technique I
am also able to test the livebackup function very thoroughly - I use a
technique where I create a LVM snapshot simultaneously, and do a cmp of
the LVM snapshot and the livebackup backup image.
With this mode of testing, I am very confident of the integrity of my
solution.
I chose to invent a new protocol that is very simple, and custom to
livebackup, because I needed livebackup specific functions such as
'create snapshot', 'delete snapshot', etc. Also, I am currently
implementing SSL based encryption with both client authenticating to
server and server authenticating to client using self signed certificate.
iSCSI or NBD would be more standards compliant, I suppose.
+1 that iSCSI/NBD have better potential.
Post by Jagane Sundar
My high level goal is to make this a natural solution for Infrastructure
As A Cloud environments. I am looking carefully at integrating the
management of the Livebackup function into Openstack.
One important advantage of live snapshot over live backup is support of
multiple (consecutive) live snapshots, while there can be only a single
live backup at a time.

This is why I tend to think that although live backup carries some
benefit (no merge required), live snapshot + live merge is the more
robust mechanism.
Post by Jagane Sundar
I would like to help in any way I can to make KVM be the *best* VM
technology for IaaS clouds.
:)
Post by Jagane Sundar
Thanks,
Jagane
Jagane Sundar
2011-05-15 21:38:26 UTC
Permalink
Hello Dor,
Post by Dor Laor
One important advantage of live snapshot over live backup is support of
multiple (consecutive) live snapshots while there can be only a single
live backup at one time.
This is why I tend to think that although live backup carry some benefit
(no merge required), the live snapshot + live merge are more robust
mechanism.
The two things that concern me regarding the
live snapshot/live merge approach are:
1. Performance considerations of having
multiple active snapshots?
2. Robustness of this solution in the face of
errors in the disk, etc. If any one of the snapshot
files were to get corrupted, the whole VM is
adversely impacted.

The primary goal of Livebackup architecture was to have zero
performance impact on the running VM.

Livebackup impacts performance of the VM only when the
backup client connects to qemu to transfer the modified
blocks over, which should be, say, 15 minutes a day for a
VM on a daily backup schedule.

One useful thing to do is to evaluate the important use cases
for this technology, and then decide which approach makes
most sense. As an example, let me state this use case:
- An IaaS cloud, where VMs are always on, running off of a local
disk, and need to be backed up once a day or so.

Can you list some of the other use cases that live snapshot and
live merge were designed to solve? Perhaps we can put up a
single wiki page that describes all of these proposals.

Thanks,
Jagane

Dor Laor
2011-05-16 07:53:40 UTC
Permalink
Post by Jagane Sundar
Hello Dor,
Post by Dor Laor
One important advantage of live snapshot over live backup is support of
multiple (consecutive) live snapshots while there can be only a single
live backup at one time.
This is why I tend to think that although live backup carry some benefit
(no merge required), the live snapshot + live merge are more robust
mechanism.
The two things that concern me regarding the
1. Performance considerations of having
multiple active snapshots?
My description above was inaccurate; I only hinted that multiple
snapshots are possible, but they are done consecutively -
a live snapshot takes practically no time, just the time to get
the guest virtagent to freeze the guest FS and to create the snapshot
(for qcow2 it is immediate).

So if you would like to have multiple snapshots, let's say 5 minutes
after you issued the first snapshot, there is no problem.

The new writes will go to the snapshot while the former base is marked
as read only.
Eventually you will want to (live) merge the snapshots together. This
can be done at any point in time.
Post by Jagane Sundar
2. Robustness of this solution in the face of
errors in the disk, etc. If any one of the snapshot
files were to get corrupted, the whole VM is
adversely impacted.
Since the base images and any snapshot which is not a leaf are marked
as read only, there is no such risk.
Post by Jagane Sundar
The primary goal of Livebackup architecture was to have zero
performance impact on the running VM.
Livebackup impacts performance of the VM only when the
backup client connects to qemu to transfer the modified
blocks over, which should be, say 15 minutes a day, for a
daily backup schedule VM.
In case there were lots of changes, for example an additional 50GB of
changes, it will take more time and there will be a performance hit.
Post by Jagane Sundar
One useful thing to do is to evaluate the important use cases
for this technology, and then decide which approach makes
- A IaaS cloud, where VMs are always on, running off of a local
disk, and need to be backed up once a day or so.
Can you list some of the other use cases that live snapshot and
live merge were designed to solve. Perhaps we can put up a
single wiki page that describes all of these proposals.
Both solutions can serve the same scenario.
With live snapshot the backup is done as follows:

1. Take a live snapshot (s1) of image s0.
2. Newer writes go to the snapshot s1 while s0 is read only.
3. Backup software processes the s0 image.
   There are multiple ways of doing that -
   1. Use qemu-img and get the dirty blocks from the former backup.
      - Currently qemu-img does not support it.
      - Nevertheless, such a mechanism will work for lvm, btrfs, NetApp.
   2. Mount the s0 image to another guest that runs traditional backup
      software at the file system level and let it do the backup.
4. Live merge s1->s0.
   We'll use live copy for that, so each write is duplicated (like your
   live backup solution).
5. Delete s1.

As you can see (and as the sketch below spells out), both approaches are
very similar, while live snapshot is more general and not tied to backup
specifically.
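
In code-like pseudo-steps (placeholder helpers only, none of these are
real qemu commands):

# The backup flow above as a sketch (placeholder helpers, not real qemu commands).

def live_snapshot(image):   print("snapshot", image); return image + "-s1"
def backup_image(image):    print("backup", image)
def live_merge(src, dst):   print("live merge", src, "->", dst)
def delete(image):          print("delete", image)

def backup_with_live_snapshot(s0):
    s1 = live_snapshot(s0)      # new writes go to s1, s0 becomes read-only
    backup_image(s0)            # backup software processes the frozen s0
    live_merge(src=s1, dst=s0)  # live copy duplicates writes while merging back
    delete(s1)                  # drop the now-redundant snapshot

backup_with_live_snapshot("disk.qcow2")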
Post by Jagane Sundar
Thanks,
Jagane
Jagane Sundar
2011-05-16 08:23:35 UTC
Permalink
Hello Dor,

Let me see if I understand live snapshot correctly:
If I want to configure a VM for daily backup, then I would do
the following:
- Create a snapshot s1. s0 is marked read-only.
- Do a full backup of s0 on day 0.
- On day 1, I would create a new snapshot s2, then
copy over the snapshot s1, which is the incremental
backup image from s0 to s1.
- After copying s1 over, I do not need that snapshot, so
I would live merge s1 with s0, to create a new merged
read-only image s1'.
- On day 2, I would create a new snapshot s3, then
copy over s2, which is the incremental backup from
s1' to s2
- And so on...

With this sequence of operations, I would need to keep a
snapshot active at all times, in order to enable the
incremental backup capability, right?
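
In pseudo-steps, the rotation I describe above would look roughly like
this (all helper names are invented):

# The daily rotation described above, as a sketch (helper names invented).

def create_snapshot(base):      print("snapshot of", base); return base + "+1"
def full_backup(image):         print("full backup of", image)
def incremental_backup(image):  print("incremental backup of", image)
def live_merge(snap, into):     print("merge", snap, "into", into); return into

def day0(s0):
    s1 = create_snapshot(s0)        # s0 becomes read-only, writes go to s1
    full_backup(s0)
    return s0, s1                   # (read-only base, active leaf)

def every_following_day(base, leaf):
    new_leaf = create_snapshot(leaf)    # leaf now holds exactly one day of changes
    incremental_backup(leaf)            # copy only the blocks present in leaf
    base = live_merge(leaf, into=base)  # fold leaf back into the base
    return base, new_leaf               # a snapshot stays active at all times

base, leaf = day0("s0")
for _ in range(2):
    base, leaf = every_following_day(base, leaf)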

If the base image is s0 and there is a single snapshot s1, then a
read operation from the VM will first look in s1. If the block is
not present in s1, then it will read the block from s0, right?
So most reads from the VM will effectively translate into two
reads, right?

Isn't this a continuous performance penalty for the VM,
amounting to almost doubling the read I/O from the VM?
Post by Dor Laor
Post by Jagane Sundar
2. Robustness of this solution in the face of
errors in the disk, etc. If any one of the snapshot
files were to get corrupted, the whole VM is
adversely impacted.
Since the base images and any snapshot which is not a leaf is marked as
read only there is no such risk.
What happens when a VM host reboots while a live merge of s0
and s1 is being done?
Post by Dor Laor
Post by Jagane Sundar
The primary goal of Livebackup architecture was to have zero
performance impact on the running VM.
Livebackup impacts performance of the VM only when the
backup client connects to qemu to transfer the modified
blocks over, which should be, say 15 minutes a day, for a
daily backup schedule VM.
In case there were lots of changing for example additional 50GB changes
it will take more time and there will be a performance hit.
Of course, the performance hit is proportional to the amount of data
being copied over. However, the performance penalty is paid during
the backup operation, and not during normal VM operation.
Post by Dor Laor
Post by Jagane Sundar
One useful thing to do is to evaluate the important use cases
for this technology, and then decide which approach makes
- A IaaS cloud, where VMs are always on, running off of a local
disk, and need to be backed up once a day or so.
Can you list some of the other use cases that live snapshot and
live merge were designed to solve. Perhaps we can put up a
single wiki page that describes all of these proposals.
1. Take a live snapshot (s1) of image s0.
2. Newer writes goes to the snapshot s1 while s0 is read only.
3. Backup software processes s0 image.
There are multiple ways for doing that -
1. Use qemu-img and get the dirty blocks from former backup.
- Currently qemu-img does not support it.
- Nevertheless, such mechanism will work for lvm, btrfs, NetApp
2. Mount the s0 image to another guest that runs traditional backup
software at the file system level and let it do the backup.
4. Live merge s1->s0
We'll use live copy for that so each write is duplicated (like your
live backup solution).
5. Delete s1
As you can see, both approaches are very similar, while live snapshot is
more general and not tied to backup specifically.
As I explained at the head of this email, I believe that live snapshot
results in the VM read I/O paying a high penalty during normal operation
of the VM, whereas Livebackup results in this penalty being paid only
during the backup dirty block transfer operation.

Finally, I would like to bring up considerations of disk space. To expand on
my use case further, consider a Cloud Compute service with 100 VMs
running on a host. If live snapshot is used to create snapshot COW files,
then potentially each VM could grow the COW snapshot file to the size
of the base file, which means the VM host needs to reserve space for
the snapshot that equals the size of the VMs - i.e. a 8GB VM would
require an additional 8GB of space to be reserved for the snapshot,
so that the service provider could safely guarantee that the snapshot
will not run out of space.
Contrast this with livebackup, wherein the COW files are kept only when
the dirty block transfers are being done. This means that for a host with
100 VMs, if the backup server is connecting to each of the 100 qemu's
one by one and doing a livebackup, the service provider would need
to provision spare disk for at most the COW size of one VM.

Thanks,
Jagane
Dor Laor
2011-05-17 22:53:32 UTC
Permalink
Post by Jagane Sundar
Hello Dor,
If I want to configure a VM for daily backup, then I would do
- Create a snapshot s1. s0 is marked read-only.
- Do a full backup of s0 on day 0.
- On day 1, I would create a new snapshot s2, then
copy over the snapshot s1, which is the incremental
backup image from s0 to s1.
- After copying s1 over, I do not need that snapshot, so
I would live merge s1 with s0, to create a new merged
read-only image s1'.
- On day 2, I would create a new snapshot s3, then
copy over s2, which is the incremental backup from
s1' to s2
- And so on...
With this sequence of operations, I would need to keep a
snapshot active at all times, in order to enable the
incremental backup capability, right?
No and yes ;-)

For a regular non-incremental backup you can have no snapshot active
most of the time:

- Create a snapshot s1. s0 is marked read-only.
- Do a full backup of s0 on day 0.
- Once the backup is finished, live merge s1 into s0 and make s0
writeable again.

So this way there is no performance penalty.
Here we need an option to track dirty block bits (either in an internal
format or an external file). This will be both efficient and get the job
done.

But in order to be efficient in storage we'll need to ask the snapshot
creation to only refer to these dirty blocks.
Well, thinking out loud, it turns out to be your solution :)

Ok, I do see the value in incremental backups.

I'm aware that there were requirements that the backup itself be done
from the guest filesystem level, where the incremental backup would be
done at the FS layer.

Still, I do see the value in your solution.

Another option for us would be to keep the latest snapshots around and
let the guest IO go through them all the time. There is some
performance cost, but as the newer image formats develop, this cost is
relatively low.
Post by Jagane Sundar
If the base image is s0 and there is a single snapshot s1, then a
read operation from the VM will first look in s1. if the block is
not present in s1, then it will read the block from s0, right?
So most reads from the VM will effectively translate into two
reads, right?
Isn't this a continuous performance penalty for the VM,
amounting to almost doubling the read I/O from the VM?
Post by Dor Laor
Post by Jagane Sundar
2. Robustness of this solution in the face of
errors in the disk, etc. If any one of the snapshot
files were to get corrupted, the whole VM is
adversely impacted.
Since the base images and any snapshot which is not a leaf is marked as
read only there is no such risk.
What happens when a VM host reboots while a live merge of s0
and s1 is being done?
Live merge uses live copy, which duplicates each write IO.
On a host crash, the merge will continue from the same point where it
stopped.

I think I answered your other good comments above.
Thanks,
Dor
Post by Jagane Sundar
Post by Dor Laor
Post by Jagane Sundar
The primary goal of Livebackup architecture was to have zero
performance impact on the running VM.
Livebackup impacts performance of the VM only when the
backup client connects to qemu to transfer the modified
blocks over, which should be, say 15 minutes a day, for a
daily backup schedule VM.
In case there were lots of changing for example additional 50GB changes
it will take more time and there will be a performance hit.
Of course, the performance hit is proportional to the amount of data
being copied over. However, the performance penalty is paid during
the backup operation, and not during normal VM operation.
Post by Dor Laor
Post by Jagane Sundar
One useful thing to do is to evaluate the important use cases
for this technology, and then decide which approach makes
- A IaaS cloud, where VMs are always on, running off of a local
disk, and need to be backed up once a day or so.
Can you list some of the other use cases that live snapshot and
live merge were designed to solve. Perhaps we can put up a
single wiki page that describes all of these proposals.
1. Take a live snapshot (s1) of image s0.
2. Newer writes goes to the snapshot s1 while s0 is read only.
3. Backup software processes s0 image.
There are multiple ways for doing that -
1. Use qemu-img and get the dirty blocks from former backup.
- Currently qemu-img does not support it.
- Nevertheless, such mechanism will work for lvm, btrfs, NetApp
2. Mount the s0 image to another guest that runs traditional backup
software at the file system level and let it do the backup.
4. Live merge s1->s0
We'll use live copy for that so each write is duplicated (like your
live backup solution).
5. Delete s1
As you can see, both approaches are very similar, while live snapshot is
more general and not tied to backup specifically.
As I explained at the head of this email, I believe that live snapshot
results in the VM read I/O paying a high penalty during normal operation
of the VM, whereas Livebackup results in this penalty being paid only
during the backup dirty block transfer operation.
Finally, I would like to bring up considerations of disk space. To expand on
my use case further, consider a Cloud Compute service with 100 VMs
running on a host. If live snapshot is used to create snapshot COW files,
then potentially each VM could grow the COW snapshot file to the size
of the base file, which means the VM host needs to reserve space for
the snapshot that equals the size of the VMs - i.e. a 8GB VM would
require an additional 8GB of space to be reserved for the snapshot,
so that the service provider could safely guarantee that the snapshot
will not run out of space.
Contrast this with livebackup, wherein the COW files are kept only when
the dirty block transfers are being done. This means that for a host with
100 VMs, if the backup server is connecting to each of the 100 qemu's
one by one and doing a livebackup, the service provider would need
to provision spare disk for at most the COW size of one VM.
Thanks,
Jagane
Jagane Sundar
2011-05-18 15:49:50 UTC
Permalink
Hello Dor,

I'm glad I could convince you of the value of Livebackup. I
think Livesnapshot/Livemerge, Livebackup and Block
Migration all have very interesting use cases. For example:

- Livesnapshot/Livemerge is very useful in development/QA
environments where one might want to create a snapshot
before trying out some new software and then committing.
- Livebackup is useful in cloud environments where the
Cloud Service Provider may want to offer regularly scheduled
backed up VMs with no effort on the part of the customer
- Block Migration with COR is useful in Cloud Service provider
environments where an arbitrary VM may need to be
migrated over to another VM server, even though the VM
is on direct attached storage.

The above is by no means an exhaustive list of use cases. I
am sure qemu/qemu-kvm users can come up with more.

Although there are some common concepts in these three
technologies, I think we should support all three in base
qemu. This would make qemu/qemu-kvm more feature-rich
than VMware, Xen and Hyper-V.

Thanks,
Jagane
Post by Dor Laor
Post by Jagane Sundar
Hello Dor,
If I want to configure a VM for daily backup, then I would do
- Create a snapshot s1. s0 is marked read-only.
- Do a full backup of s0 on day 0.
- On day 1, I would create a new snapshot s2, then
copy over the snapshot s1, which is the incremental
backup image from s0 to s1.
- After copying s1 over, I do not need that snapshot, so
I would live merge s1 with s0, to create a new merged
read-only image s1'.
- On day 2, I would create a new snapshot s3, then
copy over s2, which is the incremental backup from
s1' to s2
- And so on...
With this sequence of operations, I would need to keep a
snapshot active at all times, in order to enable the
incremental backup capability, right?
No and yes ;-)
For regular non incremental backup you can have no snapshot active most
- Create a snapshot s1. s0 is marked read-only.
- Do a full backup of s0 on day 0.
- Once backup is finished, live merge s1 into s0 and make s0 writeable
again.
So this way there are no performance penalty here.
Here we need an option to track dirty block bits (either as internal
format or external file). This will be both efficient and get the job done.
But in order to be efficient in storage we'll need to ask the snapshot
creation to only refer to these dirt blocks.
Well, thinking out load, it turned out to your solution :)
Ok, I do see the value there is with incremental backups.
I'm aware that there were requirements that the backup software itself
will be done from the guest filesystem level, there incremental backup
would be done on the FS layer.
Still I do see the value in your solution.
Another option for us would be to keep the latest snapshots around and
and let the guest IO go through them all the time. There is some
performance cost but as the newer image format develop, this cost is
relatively very low.
Post by Jagane Sundar
If the base image is s0 and there is a single snapshot s1, then a
read operation from the VM will first look in s1. if the block is
not present in s1, then it will read the block from s0, right?
So most reads from the VM will effectively translate into two
reads, right?
Isn't this a continuous performance penalty for the VM,
amounting to almost doubling the read I/O from the VM?
Post by Dor Laor
Post by Jagane Sundar
2. Robustness of this solution in the face of
errors in the disk, etc. If any one of the snapshot
files were to get corrupted, the whole VM is
adversely impacted.
Since the base images and any snapshot which is not a leaf is marked as
read only there is no such risk.
What happens when a VM host reboots while a live merge of s0
and s1 is being done?
Live merge is using live copy that does duplicates each write IO.
On a host crash, the merge will continue from the same point where it
stopped.
I think I answered the your other good comments above.
Thanks,
Dor
Post by Jagane Sundar
Post by Dor Laor
Post by Jagane Sundar
The primary goal of Livebackup architecture was to have zero
performance impact on the running VM.
Livebackup impacts performance of the VM only when the
backup client connects to qemu to transfer the modified
blocks over, which should be, say 15 minutes a day, for a
daily backup schedule VM.
In case there were lots of changing for example additional 50GB changes
it will take more time and there will be a performance hit.
Of course, the performance hit is proportional to the amount of data
being copied over. However, the performance penalty is paid during
the backup operation, and not during normal VM operation.
Post by Dor Laor
Post by Jagane Sundar
One useful thing to do is to evaluate the important use cases
for this technology, and then decide which approach makes
- A IaaS cloud, where VMs are always on, running off of a local
disk, and need to be backed up once a day or so.
Can you list some of the other use cases that live snapshot and
live merge were designed to solve. Perhaps we can put up a
single wiki page that describes all of these proposals.
1. Take a live snapshot (s1) of image s0.
2. Newer writes goes to the snapshot s1 while s0 is read only.
3. Backup software processes s0 image.
There are multiple ways for doing that -
1. Use qemu-img and get the dirty blocks from former backup.
- Currently qemu-img does not support it.
- Nevertheless, such mechanism will work for lvm, btrfs, NetApp
2. Mount the s0 image to another guest that runs traditional backup
software at the file system level and let it do the backup.
4. Live merge s1->s0
We'll use live copy for that so each write is duplicated (like your
live backup solution).
5. Delete s1
As you can see, both approaches are very similar, while live snapshot is
more general and not tied to backup specifically.
As I explained at the head of this email, I believe that live snapshot
results in the VM read I/O paying a high penalty during normal operation
of the VM, whereas Livebackup results in this penalty being paid only
during the backup dirty block transfer operation.
Finally, I would like to bring up considerations of disk space. To expand on
my use case further, consider a Cloud Compute service with 100 VMs
running on a host. If live snapshot is used to create snapshot COW files,
then potentially each VM could grow the COW snapshot file to the size
of the base file, which means the VM host needs to reserve space for
the snapshot that equals the size of the VMs - i.e. a 8GB VM would
require an additional 8GB of space to be reserved for the snapshot,
so that the service provider could safely guarantee that the snapshot
will not run out of space.
Contrast this with livebackup, wherein the COW files are kept only when
the dirty block transfers are being done. This means that for a host with
100 VMs, if the backup server is connecting to each of the 100 qemu's
one by one and doing a livebackup, the service provider would need
to provision spare disk for at most the COW size of one VM.
Thanks,
Jagane
Stefan Hajnoczi
2011-05-20 12:19:54 UTC
Permalink
I'm interested in what the API for snapshots would look like.
Specifically how does user software do the following:
1. Create a snapshot
2. Delete a snapshot
3. List snapshots
4. Access data from a snapshot
5. Restore a VM from a snapshot
6. Get the dirty blocks list (for incremental backup)

We've discussed image format-level approaches but I think the scope of
the API should cover several levels at which snapshots are
implemented:
1. Image format - image file snapshot (Jes, Jagane)
2. Host file system - ext4 and btrfs snapshots
3. Storage system - LVM or SAN volume snapshots

It will be hard to take advantage of more efficient host file system
or storage system snapshots if they are not designed in now.
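
To make the discussion concrete, here is a rough sketch of the kind of
backend-neutral interface I have in mind (all names are invented; this
is not an existing libvirt or qemu API):

# Rough sketch of a backend-neutral snapshot interface (invented names,
# not an actual libvirt or qemu API).

from abc import ABC, abstractmethod

class SnapshotProvider(ABC):
    """One API, several possible backends: image format (qcow2/QED),
    host file system (btrfs/ext4), or storage system (LVM/SAN)."""

    @abstractmethod
    def create(self, volume, name): ...

    @abstractmethod
    def delete(self, volume, name): ...

    @abstractmethod
    def list_snapshots(self, volume): ...

    @abstractmethod
    def open_readonly(self, volume, name):
        """Access a snapshot's data (backup appliance, libguestfs, ...)."""

    @abstractmethod
    def revert(self, volume, name):
        """Restore a VM's disk from a snapshot."""

    @abstractmethod
    def dirty_blocks(self, volume, since):
        """Blocks changed since the given snapshot, for incremental backup."""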

Is anyone familiar enough with the libvirt storage APIs to draft an
extension that adds snapshot support? I will take a stab at it if no
one else wants to try it.

Stefan
Jes Sorensen
2011-05-20 12:39:28 UTC
Permalink
Post by Stefan Hajnoczi
I'm interested in what the API for snapshots would look like.
I presume you're talking external snapshots here? The API is really what
should be defined by libvirt, so you get a unified API that can work
both on QEMU level snapshots as well as enterprise storage, host file
system snapshots etc.
Post by Stefan Hajnoczi
1. Create a snapshot
There's a QMP patch out already that is still not applied, but it is
pretty simple, similar to the hmp command.

Alternatively you can do it the evil way by pre-creating the snapshot
image file and feeding that to the snapshot command. In this case QEMU
won't create the snapshot file.
Post by Stefan Hajnoczi
2. Delete a snapshot
This is still to be defined.
Post by Stefan Hajnoczi
3. List snapshots
Again this is tricky as it depends on the type of snapshot. For QEMU
level ones they are files, so 'ls' is your friend :)
Post by Stefan Hajnoczi
4. Access data from a snapshot
You boot the snapshot file.
Post by Stefan Hajnoczi
5. Restore a VM from a snapshot
We're talking snapshots not checkpointing here, so you cannot restore a
VM from a snapshot.
Post by Stefan Hajnoczi
6. Get the dirty blocks list (for incremental backup)
Good question
Post by Stefan Hajnoczi
We've discussed image format-level approaches but I think the scope of
the API should cover several levels at which snapshots are
1. Image format - image file snapshot (Jes, Jagane)
2. Host file system - ext4 and btrfs snapshots
3. Storage system - LVM or SAN volume snapshots
It will be hard to take advantage of more efficient host file system
or storage system snapshots if they are not designed in now.
Is anyone familiar enough with the libvirt storage APIs to draft an
extension that adds snapshot support? I will take a stab at it if no
one else want to try it.
I believe the libvirt guys are already looking at this. Adding to the CC
list.

Cheers,
Jes
Stefan Hajnoczi
2011-05-20 12:49:22 UTC
Permalink
Post by Jes Sorensen
Post by Stefan Hajnoczi
I'm interested in what the API for snapshots would look like.
I presume you're talking external snapshots here? The API is really what
should be defined by libvirt, so you get a unified API that can work
both on QEMU level snapshots as well as enterprise storage, host file
system snapshots etc.
Thanks for the pointers on external snapshots using image files. I'm
really thinking about the libvirt API.

Basically I'm not sure we'll implement the right things if we don't
think through the API that the user sees first.

Stefan
Jes Sorensen
2011-05-20 12:56:48 UTC
Permalink
Post by Stefan Hajnoczi
Post by Jes Sorensen
Post by Stefan Hajnoczi
I'm interested in what the API for snapshots would look like.
I presume you're talking external snapshots here? The API is really what
should be defined by libvirt, so you get a unified API that can work
both on QEMU level snapshots as well as enterprise storage, host file
system snapshots etc.
Thanks for the pointers on external snapshots using image files. I'm
really thinking about the libvirt API.
Basically I'm not sure we'll implement the right things if we don't
think through the API that the user sees first.
Right, I agree. There are a lot of variables there, and they are not
necessarily easy to map into a single namespace. I am not sure it should
be done either...

Cheers,
Jes
Dor Laor
2011-05-22 09:52:09 UTC
Permalink
Post by Stefan Hajnoczi
I'm interested in what the API for snapshots would look like.
1. Create a snapshot
2. Delete a snapshot
3. List snapshots
4. Access data from a snapshot
There are plenty of options there:
- Run an (unrelated) VM and hotplug the snapshot as an additional disk
- Use v2v (libguestfs)
- Boot the VM w/ RO
- Plenty more
Post by Stefan Hajnoczi
5. Restore a VM from a snapshot
6. Get the dirty blocks list (for incremental backup)
It might be needed for additional purposes like efficient delta sync
across sites or any other storage operation (dedup, etc.).
Post by Stefan Hajnoczi
We've discussed image format-level approaches but I think the scope of
the API should cover several levels at which snapshots are
1. Image format - image file snapshot (Jes, Jagane)
2. Host file system - ext4 and btrfs snapshots
3. Storage system - LVM or SAN volume snapshots
It will be hard to take advantage of more efficient host file system
or storage system snapshots if they are not designed in now.
I agree, but it can also be a chicken-and-egg problem.
Actually 1/2/3/5 are already working today regardless of live snapshots.
Post by Stefan Hajnoczi
Is anyone familiar enough with the libvirt storage APIs to draft an
extension that adds snapshot support? I will take a stab at it if no
one else want to try it.
I added libvirt-list and Ayal Baron from vdsm.
What you're asking about goes even beyond snapshots; it's the whole
management of VM images. Doing the above operations is simple, but for an
enterprise virtualization solution you'll need to lock the NFS/SAN
images, handle failures of VM/SAN/Mgmt, keep the snapshot info in the
mgmt DB, etc.

Today it is managed by a combination of rhev-m/vdsm and libvirt.
I agree it would have been nice to have such a common single-entry-point
interface.
Post by Stefan Hajnoczi
Stefan
Stefan Hajnoczi
2011-05-23 13:02:31 UTC
Permalink
Post by Stefan Hajnoczi
I'm interested in what the API for snapshots would look like.
1. Create a snapshot
2. Delete a snapshot
3. List snapshots
4. Access data from a snapshot
 - Run a (unrelated) VM and hotplug the snapshot as additional disk
This is the backup appliance VM model and makes it possible to move
the backup application to where the data is (or not, if you have a SAN
and decide to spin up the appliance VM on another host). This should
be perfectly doable if snapshots are "volumes" at the libvirt level.

A special-case of the backup appliance VM is using libguestfs to
access the snapshot from the host. This includes both block-level and
file system-level access along with OS detection APIs that libguestfs
provides.

If snapshots are "volumes" at the libvirt level, then it is also
possible to use virStorageVolDownload() to stream the entire snapshot
through libvirt:
http://libvirt.org/html/libvirt-libvirt.html#virStorageVolDownload

Summarizing, here are three access methods that integrate with libvirt
and cover many use cases:

1. Backup appliance VM. Add a readonly snapshot volume to a backup
appliance VM. If shared storage (e.g. SAN) is available then the
appliance can be run on any host. Otherwise the appliance must run on
the same host that the snapshot resides on.

2. Libguestfs client on host. Launch libguestfs with the readonly
snapshot volume. The backup application runs directly on the host, it
has both block and file system access to the snapshot.

3. Download the snapshot to a remote host for backup processing. Use
the virStorageVolDownload() API to download the snapshot onto a
libvirt client machine. Dirty block tracking is still useful here
since the virStorageVolDownload() API supports <offset, length>
arguments.
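
For example, assuming the libvirt Python bindings expose
virStorageVolDownload() as vol.download(stream, offset, length, flags),
a backup tool could pull just the dirty extents roughly like this
(untested sketch):

# Sketch (untested): stream only the dirty extents of a snapshot volume
# for incremental backup. Assumes the Python bindings expose
# virStorageVolDownload() as vol.download(stream, offset, length, flags).

import libvirt

def backup_dirty_extents(pool_name, vol_name, dirty_extents, out_path):
    # dirty_extents: list of (offset, length) pairs from the dirty block list.
    conn = libvirt.open("qemu:///system")
    pool = conn.storagePoolLookupByName(pool_name)
    vol = pool.storageVolLookupByName(vol_name)
    with open(out_path, "wb") as out:
        for offset, length in dirty_extents:
            stream = conn.newStream(0)
            vol.download(stream, offset, length, 0)
            out.seek(offset)                      # write at the same offset
            remaining = length
            while remaining > 0:
                chunk = stream.recv(min(remaining, 256 * 1024))
                if not chunk:                     # stream ended early
                    break
                out.write(chunk)
                remaining -= len(chunk)
            stream.finish()
    conn.close()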
Post by Stefan Hajnoczi
5. Restore a VM from a snapshot
Simplest option: virStorageVolUpload().
Post by Stefan Hajnoczi
6. Get the dirty blocks list (for incremental backup)
It might be needed for additional proposes like efficient delta sync across
sites or any other storage operation (dedup, etc)
Post by Stefan Hajnoczi
We've discussed image format-level approaches but I think the scope of
the API should cover several levels at which snapshots are
1. Image format - image file snapshot (Jes, Jagane)
2. Host file system - ext4 and btrfs snapshots
3. Storage system - LVM or SAN volume snapshots
It will be hard to take advantage of more efficient host file system
or storage system snapshots if they are not designed in now.
I agree but it can also be a chicken and the egg problem.
Actually 1/2/3/5 are already working today regardless of live snapshots.
Post by Stefan Hajnoczi
Is anyone familiar enough with the libvirt storage APIs to draft an
extension that adds snapshot support?  I will take a stab at it if no
one else want to try it.
I added libvirt-list and Ayal Baron from vdsm.
What you're asking is even beyond snapshots, it's the whole management of VM
images. Doing the above operations is simple but for enterprise
virtualization solution you'll need to lock the NFS/SAN images, handle
failures of VM/SAN/Mgmt, keep the snapshots info in mgmt DB, etc.
Today it is managed by a combination of rhev-m/vdsm and libvirt.
I agree it would have been nice to get such common single entry point
interface.
Okay, the user API seems to be one layer above libvirt.

Stefan
Stefan Hajnoczi
2011-05-27 16:46:49 UTC
Permalink
Post by Stefan Hajnoczi
Post by Stefan Hajnoczi
[snip - the snapshot API list and the three libvirt access methods
quoted from the message above]
Jagane,
What do you think about these access methods? What does your custom
protocol integrate with today - do you have a custom non-libvirt KVM
management stack?

Stefan
Jagane Sundar
2011-05-27 17:16:31 UTC
Permalink
Post by Stefan Hajnoczi
[snip - quoted recap of the snapshot API list and the three libvirt
access methods]
Jagane,
What do you think about these access methods? What does your custom
protocol integrate with today - do you have a custom non-libvirt KVM
management stack?
Stefan
Hello Stefan,

The current livebackup_client simply creates a backup of the VM on the
backup server. It can save the backup image as a complete image for a
quick start of the VM on the backup server, or as a full image plus n
incremental backup redo files. The 'full + n incremental redo' form is
useful if you want to store the backup on tape.

I don't have a full backup management stack yet. If livebackup_client
were available as part of kvm, then it would turn into the
command line utility that the backup management stack would use.
My own interest is in using livebackup_client to integrate all
management functions into OpenStack; everything built into OpenStack
will be designed with self-service in mind.
However, other enterprise backup management stacks, such as
Symantec's, can be enhanced to use livebackup_client to
extract the backup from the VM host.

How does it apply to the above access mechanisms? Hmm, let me see.

1. Backup appliance VM: A backup appliance VM can be started
up and the livebackup images can be connected to it. The
limitation is that the backup appliance VM must be started up
on the backup server, where the livebackup image resides on
a local disk.

2. Libguestfs client on host. This too is possible. The
restriction is that libguestfs must be on the backup
server, and not on the VM Host.

3. Download the snapshot to a remote host for backup processing.
This is the native method for livebackup.


Thanks,
Jagane

Jagane Sundar
2011-05-23 05:42:58 UTC
Permalink
Hello Stefan,

I have been thinking about this since you sent out this message.
A quick look at the libvirt API indicates that their notion of a
snapshot often refers to a "disk+memory snapshot". It would
be good to provide feedback to the libvirt developers to make
sure that proper support for a 'disk only snapshot' capability is
included.

You might have already seen this, but here's an email chain from
the libvirt mailing list that's relevant:

http://www.redhat.com/archives/libvir-list/2010-March/msg01389.html

I am very interested in enhancing libvirt to support
the Livebackup semantics, for the following reason:
If libvirt can be enhanced to support all the constructs
required for full Livebackup functionality, then I would like to
remove the built-in livebackup network protocol, and rewrite
the client such that it is a native program on the VM host linked
with libvirt, and can perform a full or incremental backup using
libvirt. If a remote backup needs to be performed, then I would
require the remote client to ssh into the VM host, run the
local backup, and pipe the result back to the remote backup host.
That way I would not need to deal with authentication of the
livebackup client and server, or with encryption of the network
connection.
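
A back-of-the-envelope sketch of that remote mode (the
'livebackup-local' command name, its options and the host name are
hypothetical stand-ins):

import subprocess

# Hypothetical local backup command on the VM host; the command name,
# options and host are made up for illustration.
remote_cmd = 'livebackup-local --incremental --domain vm1 --output -'

with open('vm1-incr.img', 'wb') as out:
    # ssh provides authentication and encryption, so the backup
    # protocol itself does not have to.
    subprocess.run(['ssh', 'root@vmhost', remote_cmd],
                   stdout=out, check=True)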
Post by Stefan Hajnoczi
I'm interested in what the API for snapshots would look like.
1. Create a snapshot
For livebackup, one parameter that is required is the 'full' or
'incremental' backup parameter. If the param is 'incremental'
then only the blocks that were modified since the last snapshot
command was issued are part of the snapshot. If the param
is 'full', the snapshot includes all the blocks of all the disks
in the VM.
Post by Stefan Hajnoczi
2. Delete a snapshot
Simple for livebackup, since no more than one snapshot is
allowed; naming is a non-issue, as is deleting.
Post by Stefan Hajnoczi
3. List snapshots
Again, simple for livebackup, on account of the one
active snapshot restriction.
Post by Stefan Hajnoczi
4. Access data from a snapshot
In traditional terms, access could mean many
things. Some examples:
1. Access means listing a set of files on the
local file system of the VM host. A small VM
may be started up that mounts these
snapshot files as a set of secondary drives.
2. Publish the snapshot drives as iSCSI LUNs.
3. If the origin drives are on a Netapp filer,
perhaps a filer snapshot is created, and
a URL describing that snapshot is printed
out.

Access, in Livebackup terms, is merely copying
dirty blocks over from qemu. Livebackup does
not provide a random access mode - i.e. one
where a VM could be started using the snapshot.

Currently, Livebackup uses 4 KB clusters of 512-byte
blocks. Dirty clusters are transferred by the
client supplying a 'cluster number' param and qemu
returning the next 'n' contiguous dirty
clusters. At the end, qemu returns a 'no-more-dirty'
error.
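
Purely as an illustration of that request/response loop (every name
below is a stand-in, not the actual Livebackup wire protocol):

CLUSTER_SIZE = 4096   # 4 KB clusters made of 512-byte blocks

def copy_dirty_clusters(conn, out_file):
    """Pull runs of contiguous dirty clusters until qemu has no more.

    conn.fetch_dirty_run() is a hypothetical stand-in: given a starting
    cluster number it returns (cluster_number, data) for the next
    contiguous dirty run, or None once qemu reports 'no-more-dirty'.
    """
    next_cluster = 0
    while True:
        run = conn.fetch_dirty_run(next_cluster)
        if run is None:
            break
        cluster_number, data = run
        out_file.seek(cluster_number * CLUSTER_SIZE)
        out_file.write(data)
        next_cluster = cluster_number + len(data) // CLUSTER_SIZE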
Post by Stefan Hajnoczi
5. Restore a VM from a snapshot
Additional info for re-creating the VM needs to be
saved when a snapshot is saved. The origin VM's
libvirt XML descriptor should probably be saved
along with the snapshot.
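
For example, something like this with the libvirt Python bindings
could stash the domain definition next to the snapshot (the domain
name and output path are made up):

import libvirt

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('vm1')

# Save the full, inactive domain definition so the VM can be
# re-created from the backup later.
xml = dom.XMLDesc(libvirt.VIR_DOMAIN_XML_SECURE |
                  libvirt.VIR_DOMAIN_XML_INACTIVE)
with open('/backup/vm1-snap1.xml', 'w') as f:
    f.write(xml)
conn.close()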
Post by Stefan Hajnoczi
6. Get the dirty blocks list (for incremental backup)
Either a complete dump of the dirty blocks, or a way
to iterate through the dirty blocks and fetch them,
needs to be provided. My preference is the
iterate-through-the-dirty-blocks approach, since
that enables the client to pace the backup
process and provide guarantees such as 'no more
than 10% of the network b/w will be utilized for
backup'.
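
A rough sketch of that pacing idea (fetch_extents() is a hypothetical
iterator over (offset, data) pairs, not an existing API):

import time

def paced_copy(fetch_extents, out_file, max_bytes_per_sec):
    """Copy dirty extents while keeping the average rate below
    max_bytes_per_sec (e.g. 10% of the measured link bandwidth)."""
    start = time.monotonic()
    copied = 0
    for offset, data in fetch_extents():
        out_file.seek(offset)
        out_file.write(data)
        copied += len(data)
        # Sleep whenever we are ahead of the allowed transfer rate.
        earliest_allowed = start + copied / max_bytes_per_sec
        delay = earliest_allowed - time.monotonic()
        if delay > 0:
            time.sleep(delay)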
Post by Stefan Hajnoczi
We've discussed image format-level approaches but I think the scope of
the API should cover several levels at which snapshots are implemented:
1. Image format - image file snapshot (Jes, Jagane)
Livebackup uses qcow2 to save the Copy-On-Write blocks
that are dirtied by the VM when the snapshot is active.
Post by Stefan Hajnoczi
2. Host file system - ext4 and btrfs snapshots
I have tested with ext4 and raw LVM volumes for the origin
virtual disk files. The qcow2 COW files have only resided on
ext4.
Post by Stefan Hajnoczi
3. Storage system - LVM or SAN volume snapshots
It will be hard to take advantage of more efficient host file system
or storage system snapshots if they are not designed in now.
I agree. A snapshot and restore from backup should not result in
the virtual disk file getting inflated (going from sparse to fully
allocated, for example).
Post by Stefan Hajnoczi
Is anyone familiar enough with the libvirt storage APIs to draft an
extension that adds snapshot support? I will take a stab at it if no
one else wants to try it.
I have only looked at it briefly, after getting your email message.
If you can take a deeper look at it, I would be willing to work with
you to iron out details.

Thanks,
Jagane