Discussion:
[PATCH v6 for 2.1 02/10] block: add helper function to determine if a BDS is in a chain
(too old to reply)
Jeff Cody
2014-06-17 21:53:50 UTC
Permalink
This is a small helper function, to determine if 'base' is in the
chain of BlockDriverState 'top'. It returns true if it is in the chain,
and false otherwise.

If either argument is NULL, it will also return false.

Reviewed-by: Benoit Canet <***@irqsave.net>
Reviewed-by: Eric Blake <***@redhat.com>
Signed-off-by: Jeff Cody <***@redhat.com>
---
block.c | 11 +++++++++++
include/block/block.h | 1 +
2 files changed, 12 insertions(+)

diff --git a/block.c b/block.c
index da32bb0..280a167 100644
--- a/block.c
+++ b/block.c
@@ -3819,6 +3819,17 @@ BlockDriverState *bdrv_lookup_bs(const char *device,
return NULL;
}

+/* If 'base' is in the same chain as 'top', return true. Otherwise,
+ * return false. If either argument is NULL, return false. */
+bool bdrv_chain_contains(BlockDriverState *top, BlockDriverState *base)
+{
+ while (top && top != base) {
+ top = top->backing_hd;
+ }
+
+ return top != NULL;
+}
+
BlockDriverState *bdrv_next(BlockDriverState *bs)
{
if (!bs) {
diff --git a/include/block/block.h b/include/block/block.h
index f15b99b..c60bd52 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -403,6 +403,7 @@ BlockDeviceInfoList *bdrv_named_nodes_list(void);
BlockDriverState *bdrv_lookup_bs(const char *device,
const char *node_name,
Error **errp);
+bool bdrv_chain_contains(BlockDriverState *top, BlockDriverState *base);
BlockDriverState *bdrv_next(BlockDriverState *bs);
void bdrv_iterate(void (*it)(void *opaque, BlockDriverState *bs),
void *opaque);
--
1.9.3
Jeff Cody
2014-06-17 21:53:49 UTC
Permalink
Currently, node_name is only filled in when done so explicitly by the
user. If no node_name is specified, then the node name field is not
populated.

If node_names are automatically generated when not specified, that means
that all block job operations can be done by reference to the unique
node_name field. This eliminates ambiguity in resolving filenames
(relative filenames, or file descriptors, symlinks, mounts, etc..) that
qemu currently needs to deal with.

If a node name is specified, then it will not be automatically
generated for that BDS entry.

If it is automatically generated, it will be prefaced with "__qemu##",
followed by 8 characters of a unique number, followed by 8 random
ASCII characters in the range of 'A-Z'. Some sample generated node-name
strings:
__qemu##00000000IAIYNXXR
__qemu##00000002METXTRBQ
__qemu##00000001FMBORDWG

The prefix is to aid in identifying it as a qemu-generated name, the
numeric portion is to guarantee uniqueness in a given qemu session, and
the random characters are to further avoid any accidental collisions
with user-specified node-names.

Reviewed-by: Eric Blake <***@redhat.com>
Signed-off-by: Jeff Cody <***@redhat.com>
---
block.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 43abe96..da32bb0 100644
--- a/block.c
+++ b/block.c
@@ -843,12 +843,26 @@ static int bdrv_open_flags(BlockDriverState *bs, int flags)
return open_flags;
}

+#define GEN_NODE_NAME_PREFIX "__qemu##"
+#define GEN_NODE_NAME_MAX_LEN (sizeof(GEN_NODE_NAME_PREFIX) + 8 + 8)
static void bdrv_assign_node_name(BlockDriverState *bs,
const char *node_name,
Error **errp)
{
+ char gen_node_name[GEN_NODE_NAME_MAX_LEN];
+ static uint32_t counter; /* simple counter to guarantee uniqueness */
+
+ /* if node_name is NULL, auto-generate a node name */
if (!node_name) {
- return;
+ int len;
+ snprintf(gen_node_name, GEN_NODE_NAME_MAX_LEN,
+ "%s%08x", GEN_NODE_NAME_PREFIX, counter++);
+ len = strlen(gen_node_name);
+ while (len < GEN_NODE_NAME_MAX_LEN - 1) {
+ gen_node_name[len++] = g_random_int_range('A', 'Z');
+ }
+ gen_node_name[GEN_NODE_NAME_MAX_LEN - 1] = '\0';
+ node_name = gen_node_name;
}

/* empty string node name is invalid */
--
1.9.3
Benoît Canet
2014-06-18 12:53:15 UTC
Permalink
Post by Jeff Cody
Currently, node_name is only filled in when done so explicitly by the
user. If no node_name is specified, then the node name field is not
populated.
If node_names are automatically generated when not specified, that means
that all block job operations can be done by reference to the unique
node_name field. This eliminates ambiguity in resolving filenames
(relative filenames, or file descriptors, symlinks, mounts, etc..) that
qemu currently needs to deal with.
If a node name is specified, then it will not be automatically
generated for that BDS entry.
If it is automatically generated, it will be prefaced with "__qemu##",
followed by 8 characters of a unique number, followed by 8 random
ASCII characters in the range of 'A-Z'. Some sample generated node-name
__qemu##00000000IAIYNXXR
__qemu##00000002METXTRBQ
__qemu##00000001FMBORDWG
Jeff can't we simply enforce the namespace separation with a check on the QDict
option content ?
This way we could be sure that the user can't input a node-name starting with
__qemu.
Post by Jeff Cody
The prefix is to aid in identifying it as a qemu-generated name, the
numeric portion is to guarantee uniqueness in a given qemu session, and
the random characters are to further avoid any accidental collisions
with user-specified node-names.
---
block.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/block.c b/block.c
index 43abe96..da32bb0 100644
--- a/block.c
+++ b/block.c
@@ -843,12 +843,26 @@ static int bdrv_open_flags(BlockDriverState *bs, int flags)
return open_flags;
}
+#define GEN_NODE_NAME_PREFIX "__qemu##"
+#define GEN_NODE_NAME_MAX_LEN (sizeof(GEN_NODE_NAME_PREFIX) + 8 + 8)
static void bdrv_assign_node_name(BlockDriverState *bs,
const char *node_name,
Error **errp)
{
+ char gen_node_name[GEN_NODE_NAME_MAX_LEN];
+ static uint32_t counter; /* simple counter to guarantee uniqueness */
+
+ /* if node_name is NULL, auto-generate a node name */
if (!node_name) {
- return;
+ int len;
+ snprintf(gen_node_name, GEN_NODE_NAME_MAX_LEN,
+ "%s%08x", GEN_NODE_NAME_PREFIX, counter++);
+ len = strlen(gen_node_name);
+ while (len < GEN_NODE_NAME_MAX_LEN - 1) {
+ gen_node_name[len++] = g_random_int_range('A', 'Z');
+ }
+ gen_node_name[GEN_NODE_NAME_MAX_LEN - 1] = '\0';
+ node_name = gen_node_name;
}
/* empty string node name is invalid */
--
1.9.3
Jeff Cody
2014-06-18 13:13:28 UTC
Permalink
Post by Benoît Canet
Post by Jeff Cody
Currently, node_name is only filled in when done so explicitly by the
user. If no node_name is specified, then the node name field is not
populated.
If node_names are automatically generated when not specified, that means
that all block job operations can be done by reference to the unique
node_name field. This eliminates ambiguity in resolving filenames
(relative filenames, or file descriptors, symlinks, mounts, etc..) that
qemu currently needs to deal with.
If a node name is specified, then it will not be automatically
generated for that BDS entry.
If it is automatically generated, it will be prefaced with "__qemu##",
followed by 8 characters of a unique number, followed by 8 random
ASCII characters in the range of 'A-Z'. Some sample generated node-name
__qemu##00000000IAIYNXXR
__qemu##00000002METXTRBQ
__qemu##00000001FMBORDWG
Jeff can't we simply enforce the namespace separation with a check on the QDict
option content ?
This way we could be sure that the user can't input a node-name starting with
__qemu.
That still would not stop a user from trying to 'predict' or assuming
what a node name would be ("oh, it is the first drive, it is probably
__qemu##0000", etc...). Having the combination of the incrementing
counter and the random string generation guarantees 2 things: it will
always be unique in a qemu session, and it is not predictable by the
user. The "__qemu##" just helps to visually identify it as a qemu
generated.

Although if you are strictly concerned about namespace confusion, we
could enforce the namespace as you suggest, so a user could not create
a node-name that would look like a qemu-generated node-name. Even in
that case, I would still want to keep the sequential number + random
string.
Post by Benoît Canet
Post by Jeff Cody
The prefix is to aid in identifying it as a qemu-generated name, the
numeric portion is to guarantee uniqueness in a given qemu session, and
the random characters are to further avoid any accidental collisions
with user-specified node-names.
---
block.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/block.c b/block.c
index 43abe96..da32bb0 100644
--- a/block.c
+++ b/block.c
@@ -843,12 +843,26 @@ static int bdrv_open_flags(BlockDriverState *bs, int flags)
return open_flags;
}
+#define GEN_NODE_NAME_PREFIX "__qemu##"
+#define GEN_NODE_NAME_MAX_LEN (sizeof(GEN_NODE_NAME_PREFIX) + 8 + 8)
static void bdrv_assign_node_name(BlockDriverState *bs,
const char *node_name,
Error **errp)
{
+ char gen_node_name[GEN_NODE_NAME_MAX_LEN];
+ static uint32_t counter; /* simple counter to guarantee uniqueness */
+
+ /* if node_name is NULL, auto-generate a node name */
if (!node_name) {
- return;
+ int len;
+ snprintf(gen_node_name, GEN_NODE_NAME_MAX_LEN,
+ "%s%08x", GEN_NODE_NAME_PREFIX, counter++);
+ len = strlen(gen_node_name);
+ while (len < GEN_NODE_NAME_MAX_LEN - 1) {
+ gen_node_name[len++] = g_random_int_range('A', 'Z');
+ }
+ gen_node_name[GEN_NODE_NAME_MAX_LEN - 1] = '\0';
+ node_name = gen_node_name;
}
/* empty string node name is invalid */
--
1.9.3
Benoît Canet
2014-06-18 13:31:13 UTC
Permalink
Post by Jeff Cody
Post by Benoît Canet
Post by Jeff Cody
Currently, node_name is only filled in when done so explicitly by the
user. If no node_name is specified, then the node name field is not
populated.
If node_names are automatically generated when not specified, that means
that all block job operations can be done by reference to the unique
node_name field. This eliminates ambiguity in resolving filenames
(relative filenames, or file descriptors, symlinks, mounts, etc..) that
qemu currently needs to deal with.
If a node name is specified, then it will not be automatically
generated for that BDS entry.
If it is automatically generated, it will be prefaced with "__qemu##",
followed by 8 characters of a unique number, followed by 8 random
ASCII characters in the range of 'A-Z'. Some sample generated node-name
__qemu##00000000IAIYNXXR
__qemu##00000002METXTRBQ
__qemu##00000001FMBORDWG
Jeff can't we simply enforce the namespace separation with a check on the QDict
option content ?
This way we could be sure that the user can't input a node-name starting with
__qemu.
That still would not stop a user from trying to 'predict' or assuming
what a node name would be ("oh, it is the first drive, it is probably
__qemu##0000", etc...). Having the combination of the incrementing
counter and the random string generation guarantees 2 things: it will
always be unique in a qemu session, and it is not predictable by the
user. The "__qemu##" just helps to visually identify it as a qemu
generated.
Although if you are strictly concerned about namespace confusion, we
could enforce the namespace as you suggest, so a user could not create
a node-name that would look like a qemu-generated node-name. Even in
that case, I would still want to keep the sequential number + random
string.
This way is fine for me.
Post by Jeff Cody
Post by Benoît Canet
Post by Jeff Cody
The prefix is to aid in identifying it as a qemu-generated name, the
numeric portion is to guarantee uniqueness in a given qemu session, and
the random characters are to further avoid any accidental collisions
with user-specified node-names.
---
block.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/block.c b/block.c
index 43abe96..da32bb0 100644
--- a/block.c
+++ b/block.c
@@ -843,12 +843,26 @@ static int bdrv_open_flags(BlockDriverState *bs, int flags)
return open_flags;
}
+#define GEN_NODE_NAME_PREFIX "__qemu##"
+#define GEN_NODE_NAME_MAX_LEN (sizeof(GEN_NODE_NAME_PREFIX) + 8 + 8)
static void bdrv_assign_node_name(BlockDriverState *bs,
const char *node_name,
Error **errp)
{
+ char gen_node_name[GEN_NODE_NAME_MAX_LEN];
+ static uint32_t counter; /* simple counter to guarantee uniqueness */
+
+ /* if node_name is NULL, auto-generate a node name */
if (!node_name) {
- return;
+ int len;
+ snprintf(gen_node_name, GEN_NODE_NAME_MAX_LEN,
+ "%s%08x", GEN_NODE_NAME_PREFIX, counter++);
+ len = strlen(gen_node_name);
+ while (len < GEN_NODE_NAME_MAX_LEN - 1) {
+ gen_node_name[len++] = g_random_int_range('A', 'Z');
+ }
+ gen_node_name[GEN_NODE_NAME_MAX_LEN - 1] = '\0';
+ node_name = gen_node_name;
}
/* empty string node name is invalid */
--
1.9.3
Stefan Hajnoczi
2014-06-19 08:55:02 UTC
Permalink
Post by Jeff Cody
Currently, node_name is only filled in when done so explicitly by the
user. If no node_name is specified, then the node name field is not
populated.
If node_names are automatically generated when not specified, that means
that all block job operations can be done by reference to the unique
node_name field. This eliminates ambiguity in resolving filenames
(relative filenames, or file descriptors, symlinks, mounts, etc..) that
qemu currently needs to deal with.
If a node name is specified, then it will not be automatically
generated for that BDS entry.
If it is automatically generated, it will be prefaced with "__qemu##",
followed by 8 characters of a unique number, followed by 8 random
ASCII characters in the range of 'A-Z'. Some sample generated node-name
__qemu##00000000IAIYNXXR
__qemu##00000002METXTRBQ
__qemu##00000001FMBORDWG
The prefix is to aid in identifying it as a qemu-generated name, the
numeric portion is to guarantee uniqueness in a given qemu session, and
the random characters are to further avoid any accidental collisions
with user-specified node-names.
---
block.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
Who is this feature for?

Human users: they'll need to read through query-named-block-nodes
output to find the nodes they care about. This is pretty cumbersome and
not human-friendly.

Management tools: parsing query-named-block-nodes isn't trivial since
the output can vary between QEMU versions (e.g. when we move I/O
throttling to a block driver node there will be new internal nodes).
Tools doing this should really use blockdev-add instead and assign their
own node names.

It seems like neither type of user will get much mileage out of this
feature. Is it really necessary or did I miss a use case?

Stefan
Jeff Cody
2014-06-19 12:30:19 UTC
Permalink
Post by Stefan Hajnoczi
Post by Jeff Cody
Currently, node_name is only filled in when done so explicitly by the
user. If no node_name is specified, then the node name field is not
populated.
If node_names are automatically generated when not specified, that means
that all block job operations can be done by reference to the unique
node_name field. This eliminates ambiguity in resolving filenames
(relative filenames, or file descriptors, symlinks, mounts, etc..) that
qemu currently needs to deal with.
If a node name is specified, then it will not be automatically
generated for that BDS entry.
If it is automatically generated, it will be prefaced with "__qemu##",
followed by 8 characters of a unique number, followed by 8 random
ASCII characters in the range of 'A-Z'. Some sample generated node-name
__qemu##00000000IAIYNXXR
__qemu##00000002METXTRBQ
__qemu##00000001FMBORDWG
The prefix is to aid in identifying it as a qemu-generated name, the
numeric portion is to guarantee uniqueness in a given qemu session, and
the random characters are to further avoid any accidental collisions
with user-specified node-names.
---
block.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
Who is this feature for?
Human users: they'll need to read through query-named-block-nodes
output to find the nodes they care about. This is pretty cumbersome and
not human-friendly.
Currently, that is how a human user would find the node-names. That
doesn't mean there might not be a new interface later on, that is more
human friendly.

And while a human parsing query-named-block-nodes isn't fun, I think
it is easier than a human assigning node-names to a graph, so it is
more human-friendly than the current system.
Post by Stefan Hajnoczi
Management tools: parsing query-named-block-nodes isn't trivial since
the output can vary between QEMU versions (e.g. when we move I/O
throttling to a block driver node there will be new internal nodes).
Tools doing this should really use blockdev-add instead and assign their
own node names.
Libvirt (and OpenStack) is already testing with these patches, and my
impression from Eric is that parsing the output of
query-named-block-nodes was less work than assigning node-names in
libvirt.

Can you expand a bit on moving i/o throttle to a block, and creating
new internal nodes?

The generated node-names have the same life cycle as a specified
node-name; anything that would invalidate a generated node-name would
invalidate a specified node-name as well.

And if a QMP command is issued that would cause new nodes to be
assigned, I believe libvirt is aware that they need to perform a
query-named-block-nodes again.
Post by Stefan Hajnoczi
It seems like neither type of user will get much mileage out of this
feature. Is it really necessary or did I miss a use case?
Strictly speaking, it isn't required. But it makes sense for QEMU to
assign node-names to any unassigned node-names, because it does make
life easier for both humans and management software, and QEMU is the
only one that can always ensure that every BDS has a node-name.

It is also nice for QEMU; we can now in future versions assume that
every BDS will always have a node-name, regardless if it has been
assigned by the user or not.

And the usage of the node-names is strictly optional by the human or
management software user; neither is required to use the generated
node-names, and are feel to specify their own node-name. A user
specified node-name will prevent an auto-generated one from being
assigned for that specific BDS.
Eric Blake
2014-06-19 17:03:55 UTC
Permalink
Post by Jeff Cody
Post by Stefan Hajnoczi
Post by Jeff Cody
If node_names are automatically generated when not specified, that means
that all block job operations can be done by reference to the unique
node_name field. This eliminates ambiguity in resolving filenames
(relative filenames, or file descriptors, symlinks, mounts, etc..) that
qemu currently needs to deal with.
Who is this feature for?
Human users: they'll need to read through query-named-block-nodes
output to find the nodes they care about. This is pretty cumbersome and
not human-friendly.
Currently, that is how a human user would find the node-names. That
doesn't mean there might not be a new interface later on, that is more
human friendly.
And while a human parsing query-named-block-nodes isn't fun, I think
it is easier than a human assigning node-names to a graph, so it is
more human-friendly than the current system.
I tend to agree here - it's easier to have a node name always present
(even if I have to look it up, and even if the lookup is a bit painful)
than it is to do experiments with block commands, and then realize only
after the fact that I forgot to name a node. As a developer working on
a new algorithm in a management app for a particular sequence of chain
operations, I'd then know to fix up my code to add in explicit
node-naming earlier in the sequence anywhere that I had to use a
generated name later in the sequence, thanks to this visibility of a
node name through the human interfaces that I'm experimenting with.
Post by Jeff Cody
Post by Stefan Hajnoczi
Management tools: parsing query-named-block-nodes isn't trivial since
the output can vary between QEMU versions (e.g. when we move I/O
throttling to a block driver node there will be new internal nodes).
Tools doing this should really use blockdev-add instead and assign their
own node names.
Libvirt (and OpenStack) is already testing with these patches, and my
impression from Eric is that parsing the output of
query-named-block-nodes was less work than assigning node-names in
libvirt.
Actually (and some of this came from IRC) - libvirt isn't QUITE using
node-name yet. As long as a backing chain is ALL local files, or ALL
gluster, then libvirt's current approach of using file names (whether
relative or absolute) has so far been sufficient to get everything done
we need for the current features being added to libvirt. But yes,
definitely down the road, we plan on using node names. Why? Because it
is impossible to have a mixed-mode chain (some local, some gluster) and
still have a sane way to refer to all elements in that chain without
node names. It's just that it will be a long-term project that may take
several more months and refactorings in libvirt before we get there, so
we're not in a huge rush to have it supported directly in qemu 2.1.
Post by Jeff Cody
Can you expand a bit on moving i/o throttle to a block, and creating
new internal nodes?
The generated node-names have the same life cycle as a specified
node-name; anything that would invalidate a generated node-name would
invalidate a specified node-name as well.
And if a QMP command is issued that would cause new nodes to be
assigned, I believe libvirt is aware that they need to perform a
query-named-block-nodes again.
Yes - my goal with libvirt is to avoid generated names in all libvirt
interactions - but to have them as a nice fallback to show where I have
not yet met that goal.
Post by Jeff Cody
Post by Stefan Hajnoczi
It seems like neither type of user will get much mileage out of this
feature. Is it really necessary or did I miss a use case?
Strictly speaking, it isn't required. But it makes sense for QEMU to
assign node-names to any unassigned node-names, because it does make
life easier for both humans and management software, and QEMU is the
only one that can always ensure that every BDS has a node-name.
It is also nice for QEMU; we can now in future versions assume that
every BDS will always have a node-name, regardless if it has been
assigned by the user or not.
This is another nice aspect of the feature. Coding around optional
names is trickier than coding around something guaranteed to exist.
Post by Jeff Cody
And the usage of the node-names is strictly optional by the human or
management software user; neither is required to use the generated
node-names, and are feel to specify their own node-name. A user
specified node-name will prevent an auto-generated one from being
assigned for that specific BDS.
I'm still in favor of this patch, even if I hope to never take advantage
of it directly from libvirt.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Stefan Hajnoczi
2014-06-20 04:24:13 UTC
Permalink
Post by Jeff Cody
Post by Stefan Hajnoczi
Post by Jeff Cody
Currently, node_name is only filled in when done so explicitly by the
user. If no node_name is specified, then the node name field is not
populated.
If node_names are automatically generated when not specified, that means
that all block job operations can be done by reference to the unique
node_name field. This eliminates ambiguity in resolving filenames
(relative filenames, or file descriptors, symlinks, mounts, etc..) that
qemu currently needs to deal with.
If a node name is specified, then it will not be automatically
generated for that BDS entry.
If it is automatically generated, it will be prefaced with "__qemu##",
followed by 8 characters of a unique number, followed by 8 random
ASCII characters in the range of 'A-Z'. Some sample generated node-name
__qemu##00000000IAIYNXXR
__qemu##00000002METXTRBQ
__qemu##00000001FMBORDWG
The prefix is to aid in identifying it as a qemu-generated name, the
numeric portion is to guarantee uniqueness in a given qemu session, and
the random characters are to further avoid any accidental collisions
with user-specified node-names.
---
block.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
Who is this feature for?
Human users: they'll need to read through query-named-block-nodes
output to find the nodes they care about. This is pretty cumbersome and
not human-friendly.
Currently, that is how a human user would find the node-names. That
doesn't mean there might not be a new interface later on, that is more
human friendly.
And while a human parsing query-named-block-nodes isn't fun, I think
it is easier than a human assigning node-names to a graph, so it is
more human-friendly than the current system.
Post by Stefan Hajnoczi
Management tools: parsing query-named-block-nodes isn't trivial since
the output can vary between QEMU versions (e.g. when we move I/O
throttling to a block driver node there will be new internal nodes).
Tools doing this should really use blockdev-add instead and assign their
own node names.
Libvirt (and OpenStack) is already testing with these patches, and my
impression from Eric is that parsing the output of
query-named-block-nodes was less work than assigning node-names in
libvirt.
Can you expand a bit on moving i/o throttle to a block, and creating
new internal nodes?
Currently I/O throttling is integrated into block.c. This was done
because we had no clean way to add filter nodes (like I/O throttling,
compression, encryption, etc) on top of the format and protocol nodes.

Now we have almost reached the point where I/O throttling can be split
off into a BlockDriver. When the feature is enabled an I/O throttling
node will be added into the graph. When the feature is disabled, there
will be no I/O throttling node in the graph.

Stefan
Stefan Hajnoczi
2014-06-23 12:41:10 UTC
Permalink
Post by Jeff Cody
Post by Stefan Hajnoczi
It seems like neither type of user will get much mileage out of this
feature. Is it really necessary or did I miss a use case?
Strictly speaking, it isn't required. But it makes sense for QEMU to
assign node-names to any unassigned node-names, because it does make
life easier for both humans and management software, and QEMU is the
only one that can always ensure that every BDS has a node-name.
It is also nice for QEMU; we can now in future versions assume that
every BDS will always have a node-name, regardless if it has been
assigned by the user or not.
And the usage of the node-names is strictly optional by the human or
management software user; neither is required to use the generated
node-names, and are feel to specify their own node-name. A user
specified node-name will prevent an auto-generated one from being
assigned for that specific BDS.
Thanks for the explanation. I understand how auto-generated node-names
will be used a bit better now. I think Eric and your arguments make
sense.

Stefan
Jeff Cody
2014-06-17 21:53:52 UTC
Permalink
Now that active layer block-commit is supported, the 'top' argument
no longer needs to be mandatory.

Change it to optional, with the default being the active layer in the
device chain.

Reviewed-by: Eric Blake <***@redhat.com>
Reviewed-by: Benoit Canet <***@irqsave.net>
Signed-off-by: Jeff Cody <***@redhat.com>
---
blockdev.c | 16 ++++++++++++++--
qapi/block-core.json | 7 ++++---
qmp-commands.hx | 5 +++--
tests/qemu-iotests/040 | 28 ++++++++++++++++++----------
4 files changed, 39 insertions(+), 17 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 9b0f8ac..a9a3b2f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1910,7 +1910,8 @@ void qmp_block_stream(const char *device, bool has_base,
}

void qmp_block_commit(const char *device,
- bool has_base, const char *base, const char *top,
+ bool has_base, const char *base,
+ bool has_top, const char *top,
bool has_speed, int64_t speed,
Error **errp)
{
@@ -1929,6 +1930,11 @@ void qmp_block_commit(const char *device,
/* drain all i/o before commits */
bdrv_drain_all();

+ /* Important Note:
+ * libvirt relies on the DeviceNotFound error class in order to probe for
+ * live commit feature versions; for this to work, we must make sure to
+ * perform the device lookup before any generic errors that may occur in a
+ * scenario in which all optional arguments are omitted. */
bs = bdrv_find(device);
if (!bs) {
error_set(errp, QERR_DEVICE_NOT_FOUND, device);
@@ -1942,7 +1948,7 @@ void qmp_block_commit(const char *device,
/* default top_bs is the active layer */
top_bs = bs;

- if (top) {
+ if (has_top && top) {
if (strcmp(bs->filename, top) != 0) {
top_bs = bdrv_find_backing_image(bs, top);
}
@@ -1964,6 +1970,12 @@ void qmp_block_commit(const char *device,
return;
}

+ /* Do not allow attempts to commit an image into itself */
+ if (top_bs == base_bs) {
+ error_setg(errp, "cannot commit an image into itself");
+ return;
+ }
+
if (top_bs == bs) {
commit_active_start(bs, base_bs, speed, on_error, block_job_cb,
bs, &local_err);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 7215e48..6ddce4f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -690,8 +690,9 @@
# @base: #optional The file name of the backing image to write data into.
# If not specified, this is the deepest backing image
#
-# @top: The file name of the backing image within the image chain,
-# which contains the topmost data to be committed down.
+# @top: #optional The file name of the backing image within the image chain,
+# which contains the topmost data to be committed down. If
+# not specified, this is the active layer.
#
# If top == base, that is an error.
# If top == active, the job will not be completed by itself,
@@ -719,7 +720,7 @@
#
##
{ 'command': 'block-commit',
- 'data': { 'device': 'str', '*base': 'str', 'top': 'str',
+ 'data': { 'device': 'str', '*base': 'str', '*top': 'str',
'*speed': 'int' } }

##
diff --git a/qmp-commands.hx b/qmp-commands.hx
index d6bb0f4..4a8e71a 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -985,7 +985,7 @@ EQMP

{
.name = "block-commit",
- .args_type = "device:B,base:s?,top:s,speed:o?",
+ .args_type = "device:B,base:s?,top:s?,speed:o?",
.mhandler.cmd_new = qmp_marshal_input_block_commit,
},

@@ -1003,7 +1003,8 @@ Arguments:
If not specified, this is the deepest backing image
(json-string, optional)
- "top": The file name of the backing image within the image chain,
- which contains the topmost data to be committed down.
+ which contains the topmost data to be committed down. If
+ not specified, this is the active layer. (json-string, optional)

If top == base, that is an error.
If top == active, the job will not be completed by itself,
diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index 734b6a6..803b0c7 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -35,11 +35,7 @@ test_img = os.path.join(iotests.test_dir, 'test.img')
class ImageCommitTestCase(iotests.QMPTestCase):
'''Abstract base class for image commit test cases'''

- def run_commit_test(self, top, base):
- self.assert_no_active_block_jobs()
- result = self.vm.qmp('block-commit', device='drive0', top=top, base=base)
- self.assert_qmp(result, 'return', {})
-
+ def wait_for_complete(self):
completed = False
while not completed:
for event in self.vm.get_qmp_events(wait=True):
@@ -58,6 +54,18 @@ class ImageCommitTestCase(iotests.QMPTestCase):
self.assert_no_active_block_jobs()
self.vm.shutdown()

+ def run_commit_test(self, top, base):
+ self.assert_no_active_block_jobs()
+ result = self.vm.qmp('block-commit', device='drive0', top=top, base=base)
+ self.assert_qmp(result, 'return', {})
+ self.wait_for_complete()
+
+ def run_default_commit_test(self):
+ self.assert_no_active_block_jobs()
+ result = self.vm.qmp('block-commit', device='drive0')
+ self.assert_qmp(result, 'return', {})
+ self.wait_for_complete()
+
class TestSingleDrive(ImageCommitTestCase):
image_len = 1 * 1024 * 1024
test_len = 1 * 1024 * 256
@@ -109,17 +117,17 @@ class TestSingleDrive(ImageCommitTestCase):
self.assertEqual(-1, qemu_io('-c', 'read -P 0xab 0 524288', backing_img).find("verification failed"))
self.assertEqual(-1, qemu_io('-c', 'read -P 0xef 524288 524288', backing_img).find("verification failed"))

+ def test_top_is_default_active(self):
+ self.run_default_commit_test()
+ self.assertEqual(-1, qemu_io('-c', 'read -P 0xab 0 524288', backing_img).find("verification failed"))
+ self.assertEqual(-1, qemu_io('-c', 'read -P 0xef 524288 524288', backing_img).find("verification failed"))
+
def test_top_and_base_reversed(self):
self.assert_no_active_block_jobs()
result = self.vm.qmp('block-commit', device='drive0', top='%s' % backing_img, base='%s' % mid_img)
self.assert_qmp(result, 'error/class', 'GenericError')
self.assert_qmp(result, 'error/desc', 'Base \'%s\' not found' % mid_img)

- def test_top_omitted(self):
- self.assert_no_active_block_jobs()
- result = self.vm.qmp('block-commit', device='drive0')
- self.assert_qmp(result, 'error/class', 'GenericError')
- self.assert_qmp(result, 'error/desc', "Parameter 'top' is missing")

class TestRelativePaths(ImageCommitTestCase):
image_len = 1 * 1024 * 1024
--
1.9.3
Eric Blake
2014-06-17 22:25:27 UTC
Permalink
Post by Jeff Cody
Now that active layer block-commit is supported, the 'top' argument
no longer needs to be mandatory.
Change it to optional, with the default being the active layer in the
device chain.
---
As mentioned elsewhere, this is the patch that changed from v5. But so
there's no confusion, I've gone back over this one, and my R-b still stands.
Post by Jeff Cody
+ * libvirt relies on the DeviceNotFound error class in order to probe for
+ * live commit feature versions; for this to work, we must make sure to
+ * perform the device lookup before any generic errors that may occur in a
+ * scenario in which all optional arguments are omitted. */
bs = bdrv_find(device);
if (!bs) {
error_set(errp, QERR_DEVICE_NOT_FOUND, device);
Thanks for adding the comment :)
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Eric Blake
2014-06-19 16:56:16 UTC
Permalink
Post by Eric Blake
Post by Jeff Cody
Now that active layer block-commit is supported, the 'top' argument
no longer needs to be mandatory.
Change it to optional, with the default being the active layer in the
device chain.
---
As mentioned elsewhere, this is the patch that changed from v5. But so
there's no confusion, I've gone back over this one, and my R-b still stands.
Post by Jeff Cody
+ * libvirt relies on the DeviceNotFound error class in order to probe for
+ * live commit feature versions; for this to work, we must make sure to
+ * perform the device lookup before any generic errors that may occur in a
+ * scenario in which all optional arguments are omitted. */
bs = bdrv_find(device);
if (!bs) {
error_set(errp, QERR_DEVICE_NOT_FOUND, device);
Thanks for adding the comment :)
Should we split this into a different series, since it is
non-controversial (unrelated to the rest of the series' impact on
child-node op-blockers), and useful for libvirt to turn on active
commit? The rest of the series lets libvirt turn on relative backing
chain work, but we might as well get the easy part into the tree now.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Stefan Hajnoczi
2014-06-19 06:40:30 UTC
Permalink
Post by Jeff Cody
Now that active layer block-commit is supported, the 'top' argument
no longer needs to be mandatory.
Change it to optional, with the default being the active layer in the
device chain.
---
blockdev.c | 16 ++++++++++++++--
qapi/block-core.json | 7 ++++---
qmp-commands.hx | 5 +++--
tests/qemu-iotests/040 | 28 ++++++++++++++++++----------
4 files changed, 39 insertions(+), 17 deletions(-)
Reviewed-by: Stefan Hajnoczi <***@redhat.com>
Jeff Cody
2014-06-17 21:53:53 UTC
Permalink
This modifies the block operation block-commit so that it will
accept node-name arguments for either 'top' or 'base' BDS.

The filename and node-name are mutually exclusive to each other;
i.e.:
"top" and "top-node-name" are mutually exclusive (enforced)
"base" and "base-node-name" are mutually exclusive (enforced)

The preferred and recommended method now of using block-commit
is to specify the BDS to operate on via their node-name arguments.

This also adds an explicit check that base_bs is in the chain of
top_bs, because if they are referenced by their unique node-name
arguments, the discovery process does not intrinsically verify
they are in the same chain.

Reviewed-by: Eric Blake <***@redhat.com>
Signed-off-by: Jeff Cody <***@redhat.com>
---
blockdev.c | 54 ++++++++++++++++++++++++++++++++++++++++++++--------
qapi/block-core.json | 37 +++++++++++++++++++++++++----------
qmp-commands.hx | 31 ++++++++++++++++++++++--------
3 files changed, 96 insertions(+), 26 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index a9a3b2f..44f0cc4 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1911,12 +1911,15 @@ void qmp_block_stream(const char *device, bool has_base,

void qmp_block_commit(const char *device,
bool has_base, const char *base,
+ bool has_base_node_name, const char *base_node_name,
bool has_top, const char *top,
+ bool has_top_node_name, const char *top_node_name,
bool has_speed, int64_t speed,
Error **errp)
{
- BlockDriverState *bs;
- BlockDriverState *base_bs, *top_bs;
+ BlockDriverState *bs = NULL;
+ BlockDriverState *base_bs = NULL;
+ BlockDriverState *top_bs = NULL;
Error *local_err = NULL;
/* This will be part of the QMP command, if/when the
* BlockdevOnError change for blkmirror makes it in
@@ -1930,6 +1933,16 @@ void qmp_block_commit(const char *device,
/* drain all i/o before commits */
bdrv_drain_all();

+ /* argument combination validation */
+ if (has_base && has_base_node_name) {
+ error_setg(errp, "'base' and 'base-node-name' are mutually exclusive");
+ return;
+ }
+ if (has_top && has_top_node_name) {
+ error_setg(errp, "'top' and 'top-node-name' are mutually exclusive");
+ return;
+ }
+
/* Important Note:
* libvirt relies on the DeviceNotFound error class in order to probe for
* live commit feature versions; for this to work, we must make sure to
@@ -1941,14 +1954,16 @@ void qmp_block_commit(const char *device,
return;
}

- if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_COMMIT, errp)) {
- return;
- }
-
/* default top_bs is the active layer */
top_bs = bs;

- if (has_top && top) {
+ if (has_top_node_name) {
+ top_bs = bdrv_lookup_bs(NULL, top_node_name, &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err);
+ return;
+ }
+ } else if (has_top && top) {
if (strcmp(bs->filename, top) != 0) {
top_bs = bdrv_find_backing_image(bs, top);
}
@@ -1959,7 +1974,13 @@ void qmp_block_commit(const char *device,
return;
}

- if (has_base && base) {
+ if (has_base_node_name) {
+ base_bs = bdrv_lookup_bs(NULL, base_node_name, &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err);
+ return;
+ }
+ } else if (has_base && base) {
base_bs = bdrv_find_backing_image(top_bs, base);
} else {
base_bs = bdrv_find_base(top_bs);
@@ -1976,6 +1997,23 @@ void qmp_block_commit(const char *device,
return;
}

+ /* Verify that 'base' is in the same chain as 'top' */
+ if (!bdrv_chain_contains(top_bs, base_bs)) {
+ error_setg(errp, "'base' and 'top' are not in the same chain");
+ return;
+ }
+
+ /* This should technically be caught in commit_start, but
+ * check here for validation completeness */
+ if (!bdrv_chain_contains(bs, top_bs)) {
+ error_setg(errp, "'%s' and 'top' are not in the same chain", device);
+ return;
+ }
+
+ if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_COMMIT, errp)) {
+ return;
+ }
+
if (top_bs == bs) {
commit_active_start(bs, base_bs, speed, on_error, block_job_cb,
bs, &local_err);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 6ddce4f..dddaa13 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -685,14 +685,29 @@
# Live commit of data from overlay image nodes into backing nodes - i.e.,
# writes data between 'top' and 'base' into 'base'.
#
-# @device: the name of the device
+# @device: The name of the device.
#
-# @base: #optional The file name of the backing image to write data into.
-# If not specified, this is the deepest backing image
+# For 'base', either @base or @base-node-name may be set but not both. If
+# neither is specified, this is the deepest backing image.
#
-# @top: #optional The file name of the backing image within the image chain,
-# which contains the topmost data to be committed down. If
-# not specified, this is the active layer.
+# @base: #optional The file name of the backing image to write data
+# into.
+#
+# @base-node-name #optional The name of the block driver state node of the
+# backing image to write data into.
+# (Since 2.1)
+#
+# For 'top', either @top or @top-node-name must be set but not both. If
+# neither is specified, this is the active layer.
+#
+# @top: #optional The file name of the backing image within the image
+# chain, which contains the topmost data to be
+# committed down.
+#
+# @top-node-name: #optional The block driver state node name of the backing
+# image within the image chain, which contains the
+# topmost data to be committed down.
+# (Since 2.1)
#
# If top == base, that is an error.
# If top == active, the job will not be completed by itself,
@@ -711,17 +726,19 @@
#
# Returns: Nothing on success
# If commit or stream is already active on this device, DeviceInUse
-# If @device does not exist, DeviceNotFound
+# If @device does not exist or cannot be determined, DeviceNotFound
# If image commit is not supported by this device, NotSupported
-# If @base or @top is invalid, a generic error is returned
+# If @base, @top, @base-node-name, @top-node-name invalid, GenericError
# If @speed is invalid, InvalidParameter
+# If both @base and @base-node-name are specified, GenericError
+# If both @top and @top-node-name are specified, GenericError
#
# Since: 1.3
#
##
{ 'command': 'block-commit',
- 'data': { 'device': 'str', '*base': 'str', '*top': 'str',
- '*speed': 'int' } }
+ 'data': { 'device': 'str', '*base': 'str', '*base-node-name': 'str',
+ '*top': 'str', '*top-node-name': 'str', '*speed': 'int' } }

##
# @drive-backup
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 4a8e71a..52af928 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -985,7 +985,7 @@ EQMP

{
.name = "block-commit",
- .args_type = "device:B,base:s?,top:s?,speed:o?",
+ .args_type = "device:B,base:s?,base-node-name:s?,top:s?,top-node-name:s?,speed:o?",
.mhandler.cmd_new = qmp_marshal_input_block_commit,
},

@@ -998,13 +998,28 @@ data between 'top' and 'base' into 'base'.

Arguments:

-- "device": The device's ID, must be unique (json-string)
-- "base": The file name of the backing image to write data into.
- If not specified, this is the deepest backing image
- (json-string, optional)
-- "top": The file name of the backing image within the image chain,
- which contains the topmost data to be committed down. If
- not specified, this is the active layer. (json-string, optional)
+- "device": The device's ID, must be unique (json-string)
+
+For 'base', either base or base-node-name may be set but not both. If
+neither is specified, this is the deepest backing image
+
+- "base": The file name of the backing image to write data into.
+ (json-string, optional)
+- "base-node-name": The name of the block driver state node of the
+ backing image to write data into.
+ (json-string, optional) (Since 2.1)
+
+For 'top', either top or top-node-name must be set but not both. If
+neither is specified, this is the active layer
+
+- "top": The file name of the backing image within the image chain,
+ which contains the topmost data to be committed down.
+ (json-string, optional)
+
+- "top-node-name": The block driver state node name of the backing
+ image within the image chain, which contains the
+ topmost data to be committed down.
+ (json-string, optional) (Since 2.1)

If top == base, that is an error.
If top == active, the job will not be completed by itself,
--
1.9.3
Benoît Canet
2014-06-18 12:58:25 UTC
Permalink
Post by Jeff Cody
This modifies the block operation block-commit so that it will
accept node-name arguments for either 'top' or 'base' BDS.
The filename and node-name are mutually exclusive to each other;
"top" and "top-node-name" are mutually exclusive (enforced)
"base" and "base-node-name" are mutually exclusive (enforced)
The preferred and recommended method now of using block-commit
is to specify the BDS to operate on via their node-name arguments.
This also adds an explicit check that base_bs is in the chain of
top_bs, because if they are referenced by their unique node-name
arguments, the discovery process does not intrinsically verify
they are in the same chain.
---
blockdev.c | 54 ++++++++++++++++++++++++++++++++++++++++++++--------
qapi/block-core.json | 37 +++++++++++++++++++++++++----------
qmp-commands.hx | 31 ++++++++++++++++++++++--------
3 files changed, 96 insertions(+), 26 deletions(-)
diff --git a/blockdev.c b/blockdev.c
index a9a3b2f..44f0cc4 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1911,12 +1911,15 @@ void qmp_block_stream(const char *device, bool has_base,
void qmp_block_commit(const char *device,
bool has_base, const char *base,
+ bool has_base_node_name, const char *base_node_name,
bool has_top, const char *top,
+ bool has_top_node_name, const char *top_node_name,
bool has_speed, int64_t speed,
Error **errp)
{
- BlockDriverState *bs;
- BlockDriverState *base_bs, *top_bs;
+ BlockDriverState *bs = NULL;
+ BlockDriverState *base_bs = NULL;
+ BlockDriverState *top_bs = NULL;
Error *local_err = NULL;
/* This will be part of the QMP command, if/when the
* BlockdevOnError change for blkmirror makes it in
@@ -1930,6 +1933,16 @@ void qmp_block_commit(const char *device,
/* drain all i/o before commits */
bdrv_drain_all();
+ /* argument combination validation */
+ if (has_base && has_base_node_name) {
+ error_setg(errp, "'base' and 'base-node-name' are mutually exclusive");
+ return;
+ }
+ if (has_top && has_top_node_name) {
+ error_setg(errp, "'top' and 'top-node-name' are mutually exclusive");
+ return;
+ }
+
* libvirt relies on the DeviceNotFound error class in order to probe for
* live commit feature versions; for this to work, we must make sure to
@@ -1941,14 +1954,16 @@ void qmp_block_commit(const char *device,
return;
}
- if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_COMMIT, errp)) {
- return;
- }
-
/* default top_bs is the active layer */
top_bs = bs;
- if (has_top && top) {
+ if (has_top_node_name) {
+ top_bs = bdrv_lookup_bs(NULL, top_node_name, &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err);
+ return;
+ }
+ } else if (has_top && top) {
if (strcmp(bs->filename, top) != 0) {
top_bs = bdrv_find_backing_image(bs, top);
}
@@ -1959,7 +1974,13 @@ void qmp_block_commit(const char *device,
return;
}
- if (has_base && base) {
+ if (has_base_node_name) {
+ base_bs = bdrv_lookup_bs(NULL, base_node_name, &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err);
+ return;
+ }
+ } else if (has_base && base) {
base_bs = bdrv_find_backing_image(top_bs, base);
} else {
base_bs = bdrv_find_base(top_bs);
@@ -1976,6 +1997,23 @@ void qmp_block_commit(const char *device,
return;
}
+ /* Verify that 'base' is in the same chain as 'top' */
+ if (!bdrv_chain_contains(top_bs, base_bs)) {
+ error_setg(errp, "'base' and 'top' are not in the same chain");
+ return;
+ }
+
+ /* This should technically be caught in commit_start, but
+ * check here for validation completeness */
+ if (!bdrv_chain_contains(bs, top_bs)) {
+ error_setg(errp, "'%s' and 'top' are not in the same chain", device);
+ return;
+ }
+
+ if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_COMMIT, errp)) {
+ return;
+ }
+
if (top_bs == bs) {
commit_active_start(bs, base_bs, speed, on_error, block_job_cb,
bs, &local_err);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 6ddce4f..dddaa13 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -685,14 +685,29 @@
# Live commit of data from overlay image nodes into backing nodes - i.e.,
# writes data between 'top' and 'base' into 'base'.
#
#
-# If not specified, this is the deepest backing image
+# neither is specified, this is the deepest backing image.
#
-# which contains the topmost data to be committed down. If
-# not specified, this is the active layer.
+# into.
+#
+# backing image to write data into.
+# (Since 2.1)
+#
+# neither is specified, this is the active layer.
+#
+# chain, which contains the topmost data to be
+# committed down.
+#
+# image within the image chain, which contains the
+# topmost data to be committed down.
+# (Since 2.1)
#
# If top == base, that is an error.
# If top == active, the job will not be completed by itself,
@@ -711,17 +726,19 @@
#
# Returns: Nothing on success
# If commit or stream is already active on this device, DeviceInUse
# If image commit is not supported by this device, NotSupported
#
# Since: 1.3
#
##
{ 'command': 'block-commit',
- 'data': { 'device': 'str', '*base': 'str', '*top': 'str',
- '*speed': 'int' } }
+ 'data': { 'device': 'str', '*base': 'str', '*base-node-name': 'str',
+ '*top': 'str', '*top-node-name': 'str', '*speed': 'int' } }
##
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 4a8e71a..52af928 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -985,7 +985,7 @@ EQMP
{
.name = "block-commit",
- .args_type = "device:B,base:s?,top:s?,speed:o?",
+ .args_type = "device:B,base:s?,base-node-name:s?,top:s?,top-node-name:s?,speed:o?",
.mhandler.cmd_new = qmp_marshal_input_block_commit,
},
@@ -998,13 +998,28 @@ data between 'top' and 'base' into 'base'.
-- "device": The device's ID, must be unique (json-string)
-- "base": The file name of the backing image to write data into.
- If not specified, this is the deepest backing image
- (json-string, optional)
-- "top": The file name of the backing image within the image chain,
- which contains the topmost data to be committed down. If
- not specified, this is the active layer. (json-string, optional)
+- "device": The device's ID, must be unique (json-string)
+
+For 'base', either base or base-node-name may be set but not both. If
+neither is specified, this is the deepest backing image
+
+- "base": The file name of the backing image to write data into.
+ (json-string, optional)
+- "base-node-name": The name of the block driver state node of the
+ backing image to write data into.
+ (json-string, optional) (Since 2.1)
+
+For 'top', either top or top-node-name must be set but not both. If
+neither is specified, this is the active layer
+
+- "top": The file name of the backing image within the image chain,
+ which contains the topmost data to be committed down.
+ (json-string, optional)
+
+- "top-node-name": The block driver state node name of the backing
+ image within the image chain, which contains the
+ topmost data to be committed down.
+ (json-string, optional) (Since 2.1)
If top == base, that is an error.
If top == active, the job will not be completed by itself,
--
1.9.3
Reviewed-by: Benoit Canet <***@irqsave.net>
Jeff Cody
2014-06-17 21:53:56 UTC
Permalink
On some image chains, QEMU may not always be able to resolve the
filenames properly, when updating the backing file of an image
after a block job.

For instance, certain relative pathnames may fail, or drives may
have been specified originally by file descriptor (e.g. /dev/fd/???),
or a relative protocol pathname may have been used.

In these instances, QEMU may lack the information to be able to make
the correct choice, but the user or management layer most likely does
have that knowledge.

With this extension to the block-stream api, the user is able to change
the backing file of the active layer as part of the block-stream
operation.

This allows the change to be 'safe', in the sense that if the attempt
to write the active image metadata fails, then the block-stream
operation returns failure, without disrupting the guest.

If a backing file string is not specified in the command, the backing
file string to use is determined in the same manner as it was
previously.

Reviewed-by: Eric Blake <***@redhat.com>
Signed-off-by: Jeff Cody <***@redhat.com>
---
block/stream.c | 11 +++++------
blockdev.c | 12 ++++++++++++
hmp.c | 3 ++-
qapi/block-core.json | 18 +++++++++++++++++-
qmp-commands.hx | 2 +-
5 files changed, 37 insertions(+), 9 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 91d18a2..1c7f0a0 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -32,7 +32,7 @@ typedef struct StreamBlockJob {
RateLimit limit;
BlockDriverState *base;
BlockdevOnError on_error;
- char backing_file_id[1024];
+ char *backing_file_str;
} StreamBlockJob;

static int coroutine_fn stream_populate(BlockDriverState *bs,
@@ -186,7 +186,7 @@ wait:
if (!block_job_is_cancelled(&s->common) && sector_num == end && ret == 0) {
const char *base_id = NULL, *base_fmt = NULL;
if (base) {
- base_id = s->backing_file_id;
+ base_id = s->backing_file_str;
if (base->drv) {
base_fmt = base->drv->format_name;
}
@@ -196,6 +196,7 @@ wait:
}

qemu_vfree(buf);
+ g_free(s->backing_file_str);
block_job_completed(&s->common, ret);
}

@@ -217,7 +218,7 @@ static const BlockJobDriver stream_job_driver = {
};

void stream_start(BlockDriverState *bs, BlockDriverState *base,
- const char *base_id, int64_t speed,
+ const char *backing_file_str, int64_t speed,
BlockdevOnError on_error,
BlockDriverCompletionFunc *cb,
void *opaque, Error **errp)
@@ -237,9 +238,7 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
}

s->base = base;
- if (base_id) {
- pstrcpy(s->backing_file_id, sizeof(s->backing_file_id), base_id);
- }
+ s->backing_file_str = g_strdup(backing_file_str);

s->on_error = on_error;
s->common.co = qemu_coroutine_create(stream_run);
diff --git a/blockdev.c b/blockdev.c
index 2926738..42c7b3f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1871,6 +1871,7 @@ static void block_job_cb(void *opaque, int ret)
void qmp_block_stream(const char *device,
bool has_base, const char *base,
bool has_base_node_name, const char *base_node_name,
+ bool has_backing_file, const char *backing_file,
bool has_speed, int64_t speed,
bool has_on_error, BlockdevOnError on_error,
Error **errp)
@@ -1922,6 +1923,9 @@ void qmp_block_stream(const char *device,
return;
}

+ /* backing_file string overrides base bs filename */
+ base_name = has_backing_file ? backing_file : base_name;
+
/* Verify that 'base' is in the same chain as 'bs', if 'base' was
* specified */
if (base_bs && !bdrv_chain_contains(bs, base_bs)) {
@@ -1929,6 +1933,14 @@ void qmp_block_stream(const char *device,
return;
}

+ /* if we are streaming the entire chain, the result will have no backing
+ * file, and specifying one is therefore an error */
+ if (base_bs == NULL && has_backing_file) {
+ error_setg(errp, "backing file specified, but streaming the "
+ "entire chain");
+ return;
+ }
+
stream_start(bs, base_bs, base_name, has_speed ? speed : 0,
on_error, block_job_cb, bs, &local_err);
if (local_err) {
diff --git a/hmp.c b/hmp.c
index bb934df..644e854 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1173,7 +1173,8 @@ void hmp_block_stream(Monitor *mon, const QDict *qdict)
int64_t speed = qdict_get_try_int(qdict, "speed", 0);

qmp_block_stream(device, base != NULL, base,
- false, NULL, qdict_haskey(qdict, "speed"), speed,
+ false, NULL, false, NULL,
+ qdict_haskey(qdict, "speed"), speed,
true, BLOCKDEV_ON_ERROR_REPORT, &error);

hmp_handle_error(mon, &error);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index d17e349..c76d45d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -916,6 +916,21 @@
# @base-node-name: #optional the block driver state node name of the
# common backing file. (Since 2.1)
#
+# @backing-file: #optional The backing file string to write into the active
+# layer. This filename is not validated.
+#
+# If a pathname string is such that it cannot be
+# resolved by QEMU, that means that subsequent QMP or
+# HMP commands must use node-names for the image in
+# question, as filename lookup methods will fail.
+#
+# If not specified, QEMU will automatically determine
+# the backing file string to use, or error out if there
+# is no obvious choice. Care should be taken when
+# specifying the string, to specify a valid filename or
+# protocol.
+# (Since 2.1)
+#
# @speed: #optional the maximum speed, in bytes per second
#
# @on-error: #optional the action to take on an error (default report).
@@ -929,7 +944,8 @@
##
{ 'command': 'block-stream',
'data': { 'device': 'str', '*base': 'str', '*base-node-name': 'str',
- '*speed': 'int', '*on-error': 'BlockdevOnError' } }
+ '*backing-file': 'str', '*speed': 'int',
+ '*on-error': 'BlockdevOnError' } }

##
# @block-job-set-speed:
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 41e3853..720e5a3 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -979,7 +979,7 @@ EQMP

{
.name = "block-stream",
- .args_type = "device:B,base:s?,base-node-name:s?,speed:o?,on-error:s?",
+ .args_type = "device:B,base:s?,base-node-name:s?,speed:o?,backing-file:s?,on-error:s?",
.mhandler.cmd_new = qmp_marshal_input_block_stream,
},
--
1.9.3
Stefan Hajnoczi
2014-06-19 08:04:24 UTC
Permalink
Post by Jeff Cody
On some image chains, QEMU may not always be able to resolve the
filenames properly, when updating the backing file of an image
after a block job.
For instance, certain relative pathnames may fail, or drives may
have been specified originally by file descriptor (e.g. /dev/fd/???),
or a relative protocol pathname may have been used.
In these instances, QEMU may lack the information to be able to make
the correct choice, but the user or management layer most likely does
have that knowledge.
With this extension to the block-stream api, the user is able to change
the backing file of the active layer as part of the block-stream
operation.
This allows the change to be 'safe', in the sense that if the attempt
to write the active image metadata fails, then the block-stream
operation returns failure, without disrupting the guest.
If a backing file string is not specified in the command, the backing
file string to use is determined in the same manner as it was
previously.
---
block/stream.c | 11 +++++------
blockdev.c | 12 ++++++++++++
hmp.c | 3 ++-
qapi/block-core.json | 18 +++++++++++++++++-
qmp-commands.hx | 2 +-
5 files changed, 37 insertions(+), 9 deletions(-)
Reviewed-by: Stefan Hajnoczi <***@redhat.com>
Jeff Cody
2014-06-17 21:53:57 UTC
Permalink
The QMP command 'block-stream' was missing QMP documentation. Add
that documentation.

Reviewed-by: Eric Blake <***@redhat.com>
Reviewed-by: Benoit Canet <***@irqsave.net>
Signed-off-by: Jeff Cody <***@redhat.com>
---
qmp-commands.hx | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 58 insertions(+)

diff --git a/qmp-commands.hx b/qmp-commands.hx
index 720e5a3..314cbba 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -983,6 +983,64 @@ EQMP
.mhandler.cmd_new = qmp_marshal_input_block_stream,
},

+SQMP
+block-stream
+------------
+
+Copy data from a backing file into a block device.
+
+The block streaming operation is performed in the background until the entire
+backing file has been copied. This command returns immediately once streaming
+has started. The status of ongoing block streaming operations can be checked
+with query-block-jobs. The operation can be stopped before it has completed
+using the block-job-cancel command.
+
+If a base file is specified then sectors are not copied from that base file and
+its backing chain. When streaming completes the image file will have the base
+file as its backing file. This can be used to stream a subset of the backing
+file chain instead of flattening the entire image.
+
+On successful completion the image file is updated to drop the backing file
+and the BLOCK_JOB_COMPLETED event is emitted.
+
+- "device": The device name.
+ (json-string)
+
+For base, either 'base' or 'base-node-name' may be set but not both. If
+neither is specified, the entire chain will be streamed into the active image,
+and the chain will consist of a single image (the current active layer) with
+no backing file.
+
+- "base": The common backing file name.
+ (json-string, optional)
+
+- "base-node-name": The block driver state node name of the common backing file.
+ (json-string, optional) (Since 2.1)
+
+- "backing-file": The backing file string to write into the active layer.
+ This filename is not validated.
+
+ If a pathname string is such that it cannot be resolved by
+ QEMU, that means that subsequent QMP or HMP commands must
+ use node-names for the image in question, as filename
+ lookup methods will fail.
+
+ If not specified, QEMU will automatically determine the
+ backing file string to use, or error out if there is no
+ obvious choice. Care should be taken when specifying the
+ string, to specify a valid filename or protocol.
+ (json-string, optional)
+ (Since 2.1)
+
+- "speed": The maximum speed, in bytes per second.
+ (json-int, optional)
+
+- "on-error": The action to take on an error (default report).
+ 'stop' and 'enospc' can only be used if the block device
+ supports io-status (see BlockInfo).
+ (json-enum, optional) (Since 1.3)
+EQMP
+
{
.name = "block-commit",
.args_type = "device:B,base:s?,base-node-name:s?,top:s?,top-node-name:s?,backing-file:s?,speed:o?",
--
1.9.3
Stefan Hajnoczi
2014-06-19 08:06:31 UTC
Permalink
Post by Jeff Cody
The QMP command 'block-stream' was missing QMP documentation. Add
that documentation.
---
qmp-commands.hx | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 58 insertions(+)
Reviewed-by: Stefan Hajnoczi <***@redhat.com>
Jeff Cody
2014-06-17 21:53:54 UTC
Permalink
On some image chains, QEMU may not always be able to resolve the
filenames properly, when updating the backing file of an image
after a block commit.

For instance, certain relative pathnames may fail, or drives may
have been specified originally by file descriptor (e.g. /dev/fd/???),
or a relative protocol pathname may have been used.

In these instances, QEMU may lack the information to be able to make
the correct choice, but the user or management layer most likely does
have that knowledge.

With this extension to the block-commit api, the user is able to change
the backing file of the overlay image as part of the block-commit
operation.

This allows the change to be 'safe', in the sense that if the attempt
to write the overlay image metadata fails, then the block-commit
operation returns failure, without disrupting the guest.

If the commit top is the active layer, then specifying the backing
file string will be treated as an error (there is no overlay image
to modify in that case).

If a backing file string is not specified in the command, the backing
file string to use is determined in the same manner as it was
previously.

Reviewed-by: Eric Blake <***@redhat.com>
Signed-off-by: Jeff Cody <***@redhat.com>
---
block.c | 8 ++++++--
block/commit.c | 9 ++++++---
blockdev.c | 8 +++++++-
include/block/block.h | 3 ++-
include/block/block_int.h | 3 ++-
qapi/block-core.json | 23 +++++++++++++++++++++--
qmp-commands.hx | 19 ++++++++++++++++++-
7 files changed, 62 insertions(+), 11 deletions(-)

diff --git a/block.c b/block.c
index f9ffca4..a9767de 100644
--- a/block.c
+++ b/block.c
@@ -2613,12 +2613,15 @@ typedef struct BlkIntermediateStates {
*
* base <- active
*
+ * If backing_file_str is non-NULL, it will be used when modifying top's
+ * overlay image metadata.
+ *
* Error conditions:
* if active == top, that is considered an error
*
*/
int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
- BlockDriverState *base)
+ BlockDriverState *base, const char *backing_file_str)
{
BlockDriverState *intermediate;
BlockDriverState *base_bs = NULL;
@@ -2670,7 +2673,8 @@ int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
}

/* success - we can delete the intermediate states, and link top->base */
- ret = bdrv_change_backing_file(new_top_bs, base_bs->filename,
+ backing_file_str = backing_file_str ? backing_file_str : base_bs->filename;
+ ret = bdrv_change_backing_file(new_top_bs, backing_file_str,
base_bs->drv ? base_bs->drv->format_name : "");
if (ret) {
goto exit;
diff --git a/block/commit.c b/block/commit.c
index 5c09f44..91517d3 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -37,6 +37,7 @@ typedef struct CommitBlockJob {
BlockdevOnError on_error;
int base_flags;
int orig_overlay_flags;
+ char *backing_file_str;
} CommitBlockJob;

static int coroutine_fn commit_populate(BlockDriverState *bs,
@@ -141,7 +142,7 @@ wait:

if (!block_job_is_cancelled(&s->common) && sector_num == end) {
/* success */
- ret = bdrv_drop_intermediate(active, top, base);
+ ret = bdrv_drop_intermediate(active, top, base, s->backing_file_str);
}

exit_free_buf:
@@ -158,7 +159,7 @@ exit_restore_reopen:
if (overlay_bs && s->orig_overlay_flags != bdrv_get_flags(overlay_bs)) {
bdrv_reopen(overlay_bs, s->orig_overlay_flags, NULL);
}
-
+ g_free(s->backing_file_str);
block_job_completed(&s->common, ret);
}

@@ -182,7 +183,7 @@ static const BlockJobDriver commit_job_driver = {
void commit_start(BlockDriverState *bs, BlockDriverState *base,
BlockDriverState *top, int64_t speed,
BlockdevOnError on_error, BlockDriverCompletionFunc *cb,
- void *opaque, Error **errp)
+ void *opaque, const char *backing_file_str, Error **errp)
{
CommitBlockJob *s;
BlockReopenQueue *reopen_queue = NULL;
@@ -244,6 +245,8 @@ void commit_start(BlockDriverState *bs, BlockDriverState *base,
s->base_flags = orig_base_flags;
s->orig_overlay_flags = orig_overlay_flags;

+ s->backing_file_str = g_strdup(backing_file_str);
+
s->on_error = on_error;
s->common.co = qemu_coroutine_create(commit_run);

diff --git a/blockdev.c b/blockdev.c
index 44f0cc4..7a1c966 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1914,6 +1914,7 @@ void qmp_block_commit(const char *device,
bool has_base_node_name, const char *base_node_name,
bool has_top, const char *top,
bool has_top_node_name, const char *top_node_name,
+ bool has_backing_file, const char *backing_file,
bool has_speed, int64_t speed,
Error **errp)
{
@@ -2015,11 +2016,16 @@ void qmp_block_commit(const char *device,
}

if (top_bs == bs) {
+ if (has_backing_file) {
+ error_setg(errp, "'backing-file' specified,"
+ " but 'top' is the active layer");
+ return;
+ }
commit_active_start(bs, base_bs, speed, on_error, block_job_cb,
bs, &local_err);
} else {
commit_start(bs, base_bs, top_bs, speed, on_error, block_job_cb, bs,
- &local_err);
+ has_backing_file ? backing_file : NULL, &local_err);
}
if (local_err != NULL) {
error_propagate(errp, local_err);
diff --git a/include/block/block.h b/include/block/block.h
index c60bd52..2571945 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -288,7 +288,8 @@ int bdrv_change_backing_file(BlockDriverState *bs,
const char *backing_file, const char *backing_fmt);
void bdrv_register(BlockDriver *bdrv);
int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
- BlockDriverState *base);
+ BlockDriverState *base,
+ const char *backing_file_str);
BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
BlockDriverState *bs);
BlockDriverState *bdrv_find_base(BlockDriverState *bs);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 7aa2213..d7d53fc 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -462,13 +462,14 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
* @on_error: The action to take upon error.
* @cb: Completion function for the job.
* @opaque: Opaque pointer value passed to @cb.
+ * @backing_file_str: String to use as the backing file in @top's overlay
* @errp: Error object.
*
*/
void commit_start(BlockDriverState *bs, BlockDriverState *base,
BlockDriverState *top, int64_t speed,
BlockdevOnError on_error, BlockDriverCompletionFunc *cb,
- void *opaque, Error **errp);
+ void *opaque, const char *backing_file_str, Error **errp);
/**
* commit_active_start:
* @bs: Active block device to be committed.
diff --git a/qapi/block-core.json b/qapi/block-core.json
index dddaa13..ae1dde9 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -709,6 +709,23 @@
# topmost data to be committed down.
# (Since 2.1)
#
+# @backing-file: #optional The backing file string to write into the overlay
+# image of 'top'. If 'top' is the active layer,
+# specifying a backing file string is an error. This
+# filename is not validated.
+#
+# If a pathname string is such that it cannot be
+# resolved by QEMU, that means that subsequent QMP or
+# HMP commands must use node-names for the image in
+# question, as filename lookup methods will fail.
+#
+# If not specified, QEMU will automatically determine
+# the backing file string to use, or error out if
+# there is no obvious choice. Care should be taken
+# when specifying the string, to specify a valid
+# filename or protocol.
+# (Since 2.1)
+#
# If top == base, that is an error.
# If top == active, the job will not be completed by itself,
# user needs to complete the job with the block-job-complete
@@ -721,7 +738,6 @@
# size of the smaller top, you can safely truncate it
# yourself once the commit operation successfully completes.
#
-#
# @speed: #optional the maximum speed, in bytes per second
#
# Returns: Nothing on success
@@ -732,13 +748,16 @@
# If @speed is invalid, InvalidParameter
# If both @base and @base-node-name are specified, GenericError
# If both @top and @top-node-name are specified, GenericError
+# If @top or @top-node-name is the active layer, and @backing-file is
+# specified, GenericError
#
# Since: 1.3
#
##
{ 'command': 'block-commit',
'data': { 'device': 'str', '*base': 'str', '*base-node-name': 'str',
- '*top': 'str', '*top-node-name': 'str', '*speed': 'int' } }
+ '*top': 'str', '*top-node-name': 'str', '*backing-file': 'str',
+ '*speed': 'int' } }

##
# @drive-backup
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 52af928..7eb90a9 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -985,7 +985,7 @@ EQMP

{
.name = "block-commit",
- .args_type = "device:B,base:s?,base-node-name:s?,top:s?,top-node-name:s?,speed:o?",
+ .args_type = "device:B,base:s?,base-node-name:s?,top:s?,top-node-name:s?,backing-file:s?,speed:o?",
.mhandler.cmd_new = qmp_marshal_input_block_commit,
},

@@ -1021,6 +1021,23 @@ neither is specified, this is the active layer
topmost data to be committed down.
(json-string, optional) (Since 2.1)

+- backing-file: The backing file string to write into the overlay
+ image of 'top'. If 'top' is the active layer,
+ specifying a backing file string is an error. This
+ filename is not validated.
+
+ If a pathname string is such that it cannot be
+ resolved by QEMU, that means that subsequent QMP or
+ HMP commands must use node-names for the image in
+ question, as filename lookup methods will fail.
+
+ If not specified, QEMU will automatically determine
+ the backing file string to use, or error out if
+ there is no obvious choice. Care should be taken
+ when specifying the string, to specify a valid
+ filename or protocol.
+ (json-string, optional) (Since 2.1)
+
If top == base, that is an error.
If top == active, the job will not be completed by itself,
user needs to complete the job with the block-job-complete
--
1.9.3
Stefan Hajnoczi
2014-06-19 07:49:54 UTC
Permalink
Post by Jeff Cody
On some image chains, QEMU may not always be able to resolve the
filenames properly, when updating the backing file of an image
after a block commit.
For instance, certain relative pathnames may fail, or drives may
have been specified originally by file descriptor (e.g. /dev/fd/???),
or a relative protocol pathname may have been used.
In these instances, QEMU may lack the information to be able to make
the correct choice, but the user or management layer most likely does
have that knowledge.
With this extension to the block-commit api, the user is able to change
the backing file of the overlay image as part of the block-commit
operation.
This allows the change to be 'safe', in the sense that if the attempt
to write the overlay image metadata fails, then the block-commit
operation returns failure, without disrupting the guest.
If the commit top is the active layer, then specifying the backing
file string will be treated as an error (there is no overlay image
to modify in that case).
If a backing file string is not specified in the command, the backing
file string to use is determined in the same manner as it was
previously.
---
block.c | 8 ++++++--
block/commit.c | 9 ++++++---
blockdev.c | 8 +++++++-
include/block/block.h | 3 ++-
include/block/block_int.h | 3 ++-
qapi/block-core.json | 23 +++++++++++++++++++++--
qmp-commands.hx | 19 ++++++++++++++++++-
7 files changed, 62 insertions(+), 11 deletions(-)
Reviewed-by: Stefan Hajnoczi <***@redhat.com>
Jeff Cody
2014-06-17 21:53:55 UTC
Permalink
This adds the ability for block-stream to use node-name arguments
for base, to specify the backing image to stream from.

Both 'base' and 'base-node-name' are optional, but mutually exclusive.
Either can be specified, but not both together.

Reviewed-by: Eric Blake <***@redhat.com>
Signed-off-by: Jeff Cody <***@redhat.com>
---
blockdev.c | 48 +++++++++++++++++++++++++++++++++++++++---------
hmp.c | 2 +-
qapi/block-core.json | 16 ++++++++++++----
qmp-commands.hx | 2 +-
4 files changed, 53 insertions(+), 15 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 7a1c966..2926738 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1868,38 +1868,68 @@ static void block_job_cb(void *opaque, int ret)
bdrv_put_ref_bh_schedule(bs);
}

-void qmp_block_stream(const char *device, bool has_base,
- const char *base, bool has_speed, int64_t speed,
+void qmp_block_stream(const char *device,
+ bool has_base, const char *base,
+ bool has_base_node_name, const char *base_node_name,
+ bool has_speed, int64_t speed,
bool has_on_error, BlockdevOnError on_error,
Error **errp)
{
- BlockDriverState *bs;
+ BlockDriverState *bs = NULL;
BlockDriverState *base_bs = NULL;
+ BlockDriverState *tmp_bs;
Error *local_err = NULL;
+ const char *base_name = NULL;

if (!has_on_error) {
on_error = BLOCKDEV_ON_ERROR_REPORT;
}

+ if (has_base && has_base_node_name) {
+ error_setg(errp, "'base' and 'base-node-name' are mutually exclusive");
+ return;
+ }
+
bs = bdrv_find(device);
if (!bs) {
error_set(errp, QERR_DEVICE_NOT_FOUND, device);
return;
}

+ if (has_base_node_name) {
+ base_bs = bdrv_lookup_bs(NULL, base_node_name, &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err);
+ return;
+ }
+ tmp_bs = bdrv_find_overlay(bs, base_bs);
+ if (tmp_bs) {
+ base_name = tmp_bs->backing_file;
+ }
+ }
+
if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_STREAM, errp)) {
return;
}

- if (base) {
+ if (has_base) {
base_bs = bdrv_find_backing_image(bs, base);
- if (base_bs == NULL) {
- error_set(errp, QERR_BASE_NOT_FOUND, base);
- return;
- }
+ base_name = base;
+ }
+
+ if (base_bs == NULL && (has_base || has_base_node_name)) {
+ error_set(errp, QERR_BASE_NOT_FOUND, base);
+ return;
+ }
+
+ /* Verify that 'base' is in the same chain as 'bs', if 'base' was
+ * specified */
+ if (base_bs && !bdrv_chain_contains(bs, base_bs)) {
+ error_setg(errp, "'device' and 'base' are not in the same chain");
+ return;
}

- stream_start(bs, base_bs, base, has_speed ? speed : 0,
+ stream_start(bs, base_bs, base_name, has_speed ? speed : 0,
on_error, block_job_cb, bs, &local_err);
if (local_err) {
error_propagate(errp, local_err);
diff --git a/hmp.c b/hmp.c
index ccc35d4..bb934df 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1173,7 +1173,7 @@ void hmp_block_stream(Monitor *mon, const QDict *qdict)
int64_t speed = qdict_get_try_int(qdict, "speed", 0);

qmp_block_stream(device, base != NULL, base,
- qdict_haskey(qdict, "speed"), speed,
+ false, NULL, qdict_haskey(qdict, "speed"), speed,
true, BLOCKDEV_ON_ERROR_REPORT, &error);

hmp_handle_error(mon, &error);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index ae1dde9..d17e349 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -904,9 +904,17 @@
# On successful completion the image file is updated to drop the backing file
# and the BLOCK_JOB_COMPLETED event is emitted.
#
-# @device: the device name
+# @device: The device name.
+#
+# For 'base', either @base or @base-node-name may be set but not both. If
+# neither is specified, the entire chain will be streamed into the active image,
+# and the chain will consist of a single image (the current active layer) with
+# no backing file.
#
-# @base: #optional the common backing file name
+# @base: #optional the common backing file name
+#
+# @base-node-name: #optional the block driver state node name of the
+# common backing file. (Since 2.1)
#
# @speed: #optional the maximum speed, in bytes per second
#
@@ -920,8 +928,8 @@
# Since: 1.1
##
{ 'command': 'block-stream',
- 'data': { 'device': 'str', '*base': 'str', '*speed': 'int',
- '*on-error': 'BlockdevOnError' } }
+ 'data': { 'device': 'str', '*base': 'str', '*base-node-name': 'str',
+ '*speed': 'int', '*on-error': 'BlockdevOnError' } }

##
# @block-job-set-speed:
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 7eb90a9..41e3853 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -979,7 +979,7 @@ EQMP

{
.name = "block-stream",
- .args_type = "device:B,base:s?,speed:o?,on-error:s?",
+ .args_type = "device:B,base:s?,base-node-name:s?,speed:o?,on-error:s?",
.mhandler.cmd_new = qmp_marshal_input_block_stream,
},
--
1.9.3
Benoît Canet
2014-06-18 13:06:56 UTC
Permalink
Post by Jeff Cody
This adds the ability for block-stream to use node-name arguments
for base, to specify the backing image to stream from.
Both 'base' and 'base-node-name' are optional, but mutually exclusive.
Either can be specified, but not both together.
---
blockdev.c | 48 +++++++++++++++++++++++++++++++++++++++---------
hmp.c | 2 +-
qapi/block-core.json | 16 ++++++++++++----
qmp-commands.hx | 2 +-
4 files changed, 53 insertions(+), 15 deletions(-)
diff --git a/blockdev.c b/blockdev.c
index 7a1c966..2926738 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1868,38 +1868,68 @@ static void block_job_cb(void *opaque, int ret)
bdrv_put_ref_bh_schedule(bs);
}
-void qmp_block_stream(const char *device, bool has_base,
- const char *base, bool has_speed, int64_t speed,
+void qmp_block_stream(const char *device,
+ bool has_base, const char *base,
+ bool has_base_node_name, const char *base_node_name,
+ bool has_speed, int64_t speed,
bool has_on_error, BlockdevOnError on_error,
Error **errp)
{
- BlockDriverState *bs;
+ BlockDriverState *bs = NULL;
BlockDriverState *base_bs = NULL;
+ BlockDriverState *tmp_bs;
Error *local_err = NULL;
+ const char *base_name = NULL;
if (!has_on_error) {
on_error = BLOCKDEV_ON_ERROR_REPORT;
}
+ if (has_base && has_base_node_name) {
+ error_setg(errp, "'base' and 'base-node-name' are mutually exclusive");
+ return;
+ }
+
bs = bdrv_find(device);
if (!bs) {
error_set(errp, QERR_DEVICE_NOT_FOUND, device);
return;
}
+ if (has_base_node_name) {
+ base_bs = bdrv_lookup_bs(NULL, base_node_name, &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err);
+ return;
+ }
+ tmp_bs = bdrv_find_overlay(bs, base_bs);
+ if (tmp_bs) {
+ base_name = tmp_bs->backing_file;
+ }
+ }
+
if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_STREAM, errp)) {
return;
}
- if (base) {
+ if (has_base) {
base_bs = bdrv_find_backing_image(bs, base);
- if (base_bs == NULL) {
- error_set(errp, QERR_BASE_NOT_FOUND, base);
- return;
- }
+ base_name = base;
+ }
+
+ if (base_bs == NULL && (has_base || has_base_node_name)) {
+ error_set(errp, QERR_BASE_NOT_FOUND, base);
+ return;
+ }
+
+ /* Verify that 'base' is in the same chain as 'bs', if 'base' was
+ * specified */
+ if (base_bs && !bdrv_chain_contains(bs, base_bs)) {
+ error_setg(errp, "'device' and 'base' are not in the same chain");
+ return;
}
- stream_start(bs, base_bs, base, has_speed ? speed : 0,
+ stream_start(bs, base_bs, base_name, has_speed ? speed : 0,
on_error, block_job_cb, bs, &local_err);
if (local_err) {
error_propagate(errp, local_err);
diff --git a/hmp.c b/hmp.c
index ccc35d4..bb934df 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1173,7 +1173,7 @@ void hmp_block_stream(Monitor *mon, const QDict *qdict)
int64_t speed = qdict_get_try_int(qdict, "speed", 0);
qmp_block_stream(device, base != NULL, base,
- qdict_haskey(qdict, "speed"), speed,
+ false, NULL, qdict_haskey(qdict, "speed"), speed,
true, BLOCKDEV_ON_ERROR_REPORT, &error);
hmp_handle_error(mon, &error);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index ae1dde9..d17e349 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -904,9 +904,17 @@
# On successful completion the image file is updated to drop the backing file
# and the BLOCK_JOB_COMPLETED event is emitted.
#
+#
+# neither is specified, the entire chain will be streamed into the active image,
+# and the chain will consist of a single image (the current active layer) with
+# no backing file.
#
+#
+# common backing file. (Since 2.1)
#
#
@@ -920,8 +928,8 @@
# Since: 1.1
##
{ 'command': 'block-stream',
- 'data': { 'device': 'str', '*base': 'str', '*speed': 'int',
- '*on-error': 'BlockdevOnError' } }
+ 'data': { 'device': 'str', '*base': 'str', '*base-node-name': 'str',
+ '*speed': 'int', '*on-error': 'BlockdevOnError' } }
##
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 7eb90a9..41e3853 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -979,7 +979,7 @@ EQMP
{
.name = "block-stream",
- .args_type = "device:B,base:s?,speed:o?,on-error:s?",
+ .args_type = "device:B,base:s?,base-node-name:s?,speed:o?,on-error:s?",
.mhandler.cmd_new = qmp_marshal_input_block_stream,
},
--
1.9.3
Reviewed-by: Benoit Canet <***@irqsave.net>
Stefan Hajnoczi
2014-06-19 08:01:27 UTC
Permalink
Post by Jeff Cody
This adds the ability for block-stream to use node-name arguments
for base, to specify the backing image to stream from.
Both 'base' and 'base-node-name' are optional, but mutually exclusive.
Either can be specified, but not both together.
---
blockdev.c | 48 +++++++++++++++++++++++++++++++++++++++---------
hmp.c | 2 +-
qapi/block-core.json | 16 ++++++++++++----
qmp-commands.hx | 2 +-
4 files changed, 53 insertions(+), 15 deletions(-)
Reviewed-by: Stefan Hajnoczi <***@redhat.com>
Jeff Cody
2014-06-17 21:53:58 UTC
Permalink
This allows a user to make a live change to the backing file recorded in
an open image.

The image file to modify can be specified 2 ways:

1) image filename
2) image node-name

Note: this does not cause the backing file itself to be reopened; it
merely changes the backing filename in the image file structure, and
in internal BDS structures.

It is the responsibility of the user to pass a filename string that
can be resolved when the image chain is reopened, and the filename
string is not validated.

A good analogy for this command is that it is a live version of
'qemu-img rebase -u', with respect to changing the backing file string.

Reviewed-by: Eric Blake <***@redhat.com>
Signed-off-by: Jeff Cody <***@redhat.com>
---
blockdev.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++++++
qapi/block-core.json | 60 ++++++++++++++++++++++++++++++
qmp-commands.hx | 74 +++++++++++++++++++++++++++++++++++++
3 files changed, 236 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 42c7b3f..273e52a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2414,6 +2414,108 @@ void qmp_block_job_complete(const char *device, Error **errp)
block_job_complete(job, errp);
}

+void qmp_change_backing_file(const char *device,
+ bool has_image, const char *image,
+ bool has_image_node_name,
+ const char *image_node_name,
+ const char *backing_file,
+ Error **errp)
+{
+ BlockDriverState *bs = NULL;
+ BlockDriverState *image_bs = NULL;
+ Error *local_err = NULL;
+ bool ro;
+ int open_flags;
+ int ret;
+
+ /* validate argument combinations */
+ if (has_image && has_image_node_name) {
+ error_setg(errp, "'image' and 'image-node-name' "
+ "are mutually exclusive");
+ return;
+ }
+
+ /* find the top layer BDS of the chain */
+ bs = bdrv_find(device);
+ if (!bs) {
+ error_set(errp, QERR_DEVICE_NOT_FOUND, device);
+ return;
+ }
+
+ if (has_image_node_name) {
+ image_bs = bdrv_lookup_bs(NULL, image_node_name, &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err);
+ return;
+ }
+ }
+
+ if (has_image) {
+ if (!strcmp(bs->filename, image)) {
+ image_bs = bs;
+ } else {
+ image_bs = bdrv_find_backing_image(bs, image);
+ }
+ }
+
+ if (!has_image && !has_image_node_name) {
+ image_bs = bs;
+ }
+
+ if (!image_bs) {
+ error_setg(errp, "image file not found");
+ return;
+ }
+
+ if (bdrv_find_base(image_bs) == image_bs) {
+ error_setg(errp, "not allowing backing file change on an image "
+ "without a backing file");
+ return;
+ }
+
+ /* even though we are not necessarily operating on bs, we need it to
+ * determine if block ops are currently prohibited on the chain */
+ if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_CHANGE, errp)) {
+ return;
+ }
+
+ /* final sanity check */
+ if (!bdrv_chain_contains(bs, image_bs)) {
+ error_setg(errp, "'%s' and image file are not in the same chain",
+ device);
+ return;
+ }
+
+ /* if not r/w, reopen to make r/w */
+ open_flags = image_bs->open_flags;
+ ro = bdrv_is_read_only(image_bs);
+
+ if (ro) {
+ bdrv_reopen(image_bs, open_flags | BDRV_O_RDWR, &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err);
+ return;
+ }
+ }
+
+ ret = bdrv_change_backing_file(image_bs, backing_file,
+ image_bs->drv ? image_bs->drv->format_name : "");
+
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "Could not change backing file to '%s'",
+ backing_file);
+ /* don't exit here, so we can try to restore open flags if
+ * appropriate */
+ }
+
+ if (ro) {
+ bdrv_reopen(image_bs, open_flags, &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err); /* will preserve prior errp */
+ }
+ }
+}
+
void qmp_blockdev_add(BlockdevOptions *options, Error **errp)
{
QmpOutputVisitor *ov = qmp_output_visitor_new();
diff --git a/qapi/block-core.json b/qapi/block-core.json
index c76d45d..0939143 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -680,6 +680,66 @@
'data': 'BlockdevSnapshot' }

##
+# @change-backing-file
+#
+# Change the backing file in the image file metadata. This does not cause QEMU
+# to reopen the image file to reparse the backing filename (it may, however,
+# perform a reopen to change permissions from r/o -> r/w -> r/o, if needed).
+# The new backing file string is written into the image file metadata, and the
+# QEMU internal strings are updated.
+#
+# The image file to perform the operation on can be specified by two different
+# methods:
+#
+# Method 1: Supply the device name (e.g. 'virtio0'), and optionally the image
+# filename. This would use arguments @device and @image.
+#
+# Method 2: Supply the device name, and the node-name of the image to modify,
+# via @image-node-name.
+#
+# Arguments @image and @image-node-name are mutually exclusive.
+#
+# Method 1 interface
+#---------------------
+# @image: #optional The file name of the image to modify. If omitted,
+# and @image-node-name is not supplied, then the
+# default is the active layer of the chain described
+# by @device.
+#
+# Method 2 interface
+#---------------------
+# @image-node-name #optional The name of the block driver state node of the
+# image to modify. The @device argument is used to
+# verify @image-node-name is in the chain described
+# by @device.
+#
+# Common arguments
+#---------------------
+# @device: The name of the device.
+#
+# @backing-file: The string to write as the backing file. This string is
+# not validated, so care should be taken when specifying
+# the string or the image chain may not be able to be
+# reopened again.
+#
+# If a pathname string is such that it cannot be
+# resolved by QEMU, that means that subsequent QMP or
+# HMP commands must use node-names for the image in
+# question, as filename lookup methods will fail.
+#
+#
+# Returns: Nothing on success
+# If @device does not exist or cannot be determined, DeviceNotFound
+# If @image is specified, but not @device, GenericError
+# If both @image and @image-node-name are specified, GenericError
+#
+# Since: 2.1
+##
+{ 'command': 'change-backing-file',
+ 'data': { 'device': 'str', '*image': 'str', '*image-node-name': 'str',
+ 'backing-file': 'str' } }
+
+##
# @block-commit
#
# Live commit of data from overlay image nodes into backing nodes - i.e.,
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 314cbba..c1cbfd2 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1438,6 +1438,80 @@ Example:
EQMP

{
+ .name = "change-backing-file",
+ .args_type = "device:s,image:s?,image-node-name:s?,backing-file:s",
+ .mhandler.cmd_new = qmp_marshal_input_change_backing_file,
+ },
+
+SQMP
+change-backing-file
+-------------------
+Since: 2.1
+
+Change the backing file in the image file metadata. This does not cause QEMU
+to reopen the image file to reparse the backing filename (it may, however,
+perform a reopen to change permissions from r/o -> r/w -> r/o, if needed).
+The new backing file string is written into the image file metadata, and the
+QEMU internal strings are updated.
+
+The image file to perform the operation on can be specified by two different
+methods:
+
+ Method 1: Supply the device name (e.g. 'virtio0'), and optionally the image
+ filename. This would use arguments "device" and "image".
+
+ Method 2: Supply the device name, and the node-name of the image to modify,
+ via "image-node-name".
+
+Arguments:
+
+Arguments "image" or "image-node-name" are mutually exclusive.
+
+
+Method 1 interface
+--------------------
+- "image": The file name of the image to modify. If omitted,
+ and "image-node-name" is not supplied, then the
+ default is the active layer of the chain described
+ by device.
+ (json-string, optional)
+
+
+Method 2 interface
+--------------------
+- "image-node-name": The name of the block driver state node of the
+ image to modify. The "device" is argument is used to
+ verify "image-node-name" is in the chain described by
+ "device".
+ (json-string, optional)
+
+
+Common arguments
+--------------------
+- "device": The name of the device.
+ (json-string)
+
+- "backing-file": The string to write as the backing file. This string is
+ not validated, so care should be taken when specifying
+ the string or the image chain may not be able to be
+ reopened again.
+ (json-string)
+
+ If a pathname string is such that it cannot be
+ resolved by QEMU, that means that subsequent QMP or
+ HMP commands must use node-names for the image in
+ question, as filename lookup methods will fail.
+
+
+Returns: Nothing on success
+ If "device" does not exist or cannot be determined, DeviceNotFound
+ If "image" is specified, but not "device, GenericError
+ If both "image" and "image-node-name" are specified, GenericError
+
+
+EQMP
+
+ {
.name = "balloon",
.args_type = "value:M",
.mhandler.cmd_new = qmp_marshal_input_balloon,
--
1.9.3
Benoît Canet
2014-06-18 13:15:07 UTC
Permalink
Post by Jeff Cody
This allows a user to make a live change to the backing file recorded in
an open image.
1) image filename
2) image node-name
Note: this does not cause the backing file itself to be reopened; it
merely changes the backing filename in the image file structure, and
in internal BDS structures.
It is the responsibility of the user to pass a filename string that
can be resolved when the image chain is reopened, and the filename
string is not validated.
A good analogy for this command is that it is a live version of
'qemu-img rebase -u', with respect to changing the backing file string.
---
blockdev.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++++++
qapi/block-core.json | 60 ++++++++++++++++++++++++++++++
qmp-commands.hx | 74 +++++++++++++++++++++++++++++++++++++
3 files changed, 236 insertions(+)
diff --git a/blockdev.c b/blockdev.c
index 42c7b3f..273e52a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2414,6 +2414,108 @@ void qmp_block_job_complete(const char *device, Error **errp)
block_job_complete(job, errp);
}
+void qmp_change_backing_file(const char *device,
+ bool has_image, const char *image,
+ bool has_image_node_name,
+ const char *image_node_name,
+ const char *backing_file,
+ Error **errp)
+{
+ BlockDriverState *bs = NULL;
+ BlockDriverState *image_bs = NULL;
+ Error *local_err = NULL;
+ bool ro;
+ int open_flags;
+ int ret;
+
+ /* validate argument combinations */
+ if (has_image && has_image_node_name) {
+ error_setg(errp, "'image' and 'image-node-name' "
+ "are mutually exclusive");
+ return;
+ }
+
+ /* find the top layer BDS of the chain */
+ bs = bdrv_find(device);
+ if (!bs) {
+ error_set(errp, QERR_DEVICE_NOT_FOUND, device);
+ return;
+ }
+
+ if (has_image_node_name) {
+ image_bs = bdrv_lookup_bs(NULL, image_node_name, &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err);
+ return;
+ }
+ }
+
+ if (has_image) {
+ if (!strcmp(bs->filename, image)) {
+ image_bs = bs;
+ } else {
+ image_bs = bdrv_find_backing_image(bs, image);
+ }
+ }
+
+ if (!has_image && !has_image_node_name) {
+ image_bs = bs;
+ }
+
+ if (!image_bs) {
+ error_setg(errp, "image file not found");
+ return;
+ }
+
+ if (bdrv_find_base(image_bs) == image_bs) {
+ error_setg(errp, "not allowing backing file change on an image "
+ "without a backing file");
+ return;
+ }
+
+ /* even though we are not necessarily operating on bs, we need it to
+ * determine if block ops are currently prohibited on the chain */
+ if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_CHANGE, errp)) {
+ return;
+ }
+
+ /* final sanity check */
+ if (!bdrv_chain_contains(bs, image_bs)) {
+ error_setg(errp, "'%s' and image file are not in the same chain",
+ device);
+ return;
+ }
+
+ /* if not r/w, reopen to make r/w */
+ open_flags = image_bs->open_flags;
+ ro = bdrv_is_read_only(image_bs);
+
+ if (ro) {
+ bdrv_reopen(image_bs, open_flags | BDRV_O_RDWR, &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err);
+ return;
+ }
+ }
+
+ ret = bdrv_change_backing_file(image_bs, backing_file,
+ image_bs->drv ? image_bs->drv->format_name : "");
+
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "Could not change backing file to '%s'",
+ backing_file);
+ /* don't exit here, so we can try to restore open flags if
+ * appropriate */
+ }
+
+ if (ro) {
+ bdrv_reopen(image_bs, open_flags, &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err); /* will preserve prior errp */
+ }
+ }
+}
+
void qmp_blockdev_add(BlockdevOptions *options, Error **errp)
{
QmpOutputVisitor *ov = qmp_output_visitor_new();
diff --git a/qapi/block-core.json b/qapi/block-core.json
index c76d45d..0939143 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -680,6 +680,66 @@
'data': 'BlockdevSnapshot' }
##
+#
+# Change the backing file in the image file metadata. This does not cause QEMU
+# to reopen the image file to reparse the backing filename (it may, however,
+# perform a reopen to change permissions from r/o -> r/w -> r/o, if needed).
+# The new backing file string is written into the image file metadata, and the
+# QEMU internal strings are updated.
+#
+# The image file to perform the operation on can be specified by two different
+#
+# Method 1: Supply the device name (e.g. 'virtio0'), and optionally the image
+#
+# Method 2: Supply the device name, and the node-name of the image to modify,
+#
+#
+# Method 1 interface
+#---------------------
+# default is the active layer of the chain described
+#
+# Method 2 interface
+#---------------------
+#
+# Common arguments
+#---------------------
+#
+# not validated, so care should be taken when specifying
+# the string or the image chain may not be able to be
+# reopened again.
+#
+# If a pathname string is such that it cannot be
+# resolved by QEMU, that means that subsequent QMP or
+# HMP commands must use node-names for the image in
+# question, as filename lookup methods will fail.
+#
+#
+# Returns: Nothing on success
+#
+# Since: 2.1
+##
+{ 'command': 'change-backing-file',
+ 'data': { 'device': 'str', '*image': 'str', '*image-node-name': 'str',
+ 'backing-file': 'str' } }
+
+##
#
# Live commit of data from overlay image nodes into backing nodes - i.e.,
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 314cbba..c1cbfd2 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
EQMP
{
+ .name = "change-backing-file",
+ .args_type = "device:s,image:s?,image-node-name:s?,backing-file:s",
+ .mhandler.cmd_new = qmp_marshal_input_change_backing_file,
+ },
+
+SQMP
+change-backing-file
+-------------------
+Since: 2.1
+
+Change the backing file in the image file metadata. This does not cause QEMU
+to reopen the image file to reparse the backing filename (it may, however,
+perform a reopen to change permissions from r/o -> r/w -> r/o, if needed).
+The new backing file string is written into the image file metadata, and the
+QEMU internal strings are updated.
+
+The image file to perform the operation on can be specified by two different
+
+ Method 1: Supply the device name (e.g. 'virtio0'), and optionally the image
+ filename. This would use arguments "device" and "image".
+
+ Method 2: Supply the device name, and the node-name of the image to modify,
+ via "image-node-name".
+
+
+Arguments "image" or "image-node-name" are mutually exclusive.
+
+
+Method 1 interface
+--------------------
+- "image": The file name of the image to modify. If omitted,
+ and "image-node-name" is not supplied, then the
+ default is the active layer of the chain described
+ by device.
+ (json-string, optional)
+
+
+Method 2 interface
+--------------------
+- "image-node-name": The name of the block driver state node of the
+ image to modify. The "device" is argument is used to
+ verify "image-node-name" is in the chain described by
+ "device".
+ (json-string, optional)
+
+
+Common arguments
+--------------------
+- "device": The name of the device.
+ (json-string)
+
+- "backing-file": The string to write as the backing file. This string is
+ not validated, so care should be taken when specifying
+ the string or the image chain may not be able to be
+ reopened again.
+ (json-string)
+
+ If a pathname string is such that it cannot be
+ resolved by QEMU, that means that subsequent QMP or
+ HMP commands must use node-names for the image in
+ question, as filename lookup methods will fail.
+
+
+Returns: Nothing on success
+ If "device" does not exist or cannot be determined, DeviceNotFound
+ If "image" is specified, but not "device, GenericError
+ If both "image" and "image-node-name" are specified, GenericError
+
+
+EQMP
+
+ {
.name = "balloon",
.args_type = "value:M",
.mhandler.cmd_new = qmp_marshal_input_balloon,
--
1.9.3
Could you provide an example script of how these backing file manipulation
are suposed to be done ?

Best regards

Benoît
Stefan Hajnoczi
2014-06-19 08:37:52 UTC
Permalink
Post by Benoît Canet
Could you provide an example script of how these backing file manipulation
are suposed to be done ?
I agree.

Jeff: Please send a qemu-iotests test case.

Stefan
Jeff Cody
2014-06-19 19:08:20 UTC
Permalink
Post by Stefan Hajnoczi
Post by Benoît Canet
Could you provide an example script of how these backing file manipulation
are suposed to be done ?
I agree.
Jeff: Please send a qemu-iotests test case.
OK
Stefan Hajnoczi
2014-06-19 08:37:56 UTC
Permalink
Post by Jeff Cody
This allows a user to make a live change to the backing file recorded in
an open image.
1) image filename
2) image node-name
Note: this does not cause the backing file itself to be reopened; it
merely changes the backing filename in the image file structure, and
in internal BDS structures.
It is the responsibility of the user to pass a filename string that
can be resolved when the image chain is reopened, and the filename
string is not validated.
A good analogy for this command is that it is a live version of
'qemu-img rebase -u', with respect to changing the backing file string.
---
blockdev.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++++++
qapi/block-core.json | 60 ++++++++++++++++++++++++++++++
qmp-commands.hx | 74 +++++++++++++++++++++++++++++++++++++
3 files changed, 236 insertions(+)
Reviewed-by: Stefan Hajnoczi <***@redhat.com>
Jeff Cody
2014-06-17 21:53:51 UTC
Permalink
This simplifies the function bdrv_find_overlay(). With this change,
bdrv_find_base() is just a subset of usage of bdrv_find_overlay(),
so this also takes advantage of that.

Reviewed-by: Eric Blake <***@redhat.com>
Reviewed-by: Benoit Canet <***@irqsave.net>
Signed-off-by: Jeff Cody <***@redhat.com>
---
block.c | 45 ++++++++++-----------------------------------
1 file changed, 10 insertions(+), 35 deletions(-)

diff --git a/block.c b/block.c
index 280a167..f9ffca4 100644
--- a/block.c
+++ b/block.c
@@ -2566,32 +2566,23 @@ int bdrv_change_backing_file(BlockDriverState *bs,
*
* Returns NULL if bs is not found in active's image chain,
* or if active == bs.
+ *
+ * Returns the bottommost base image if bs == NULL.
*/
BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
BlockDriverState *bs)
{
- BlockDriverState *overlay = NULL;
- BlockDriverState *intermediate;
-
- assert(active != NULL);
- assert(bs != NULL);
-
- /* if bs is the same as active, then by definition it has no overlay
- */
- if (active == bs) {
- return NULL;
+ while (active && bs != active->backing_hd) {
+ active = active->backing_hd;
}

- intermediate = active;
- while (intermediate->backing_hd) {
- if (intermediate->backing_hd == bs) {
- overlay = intermediate;
- break;
- }
- intermediate = intermediate->backing_hd;
- }
+ return active;
+}

- return overlay;
+/* Given a BDS, searches for the base layer. */
+BlockDriverState *bdrv_find_base(BlockDriverState *bs)
+{
+ return bdrv_find_overlay(bs, NULL);
}

typedef struct BlkIntermediateStates {
@@ -4373,22 +4364,6 @@ int bdrv_get_backing_file_depth(BlockDriverState *bs)
return 1 + bdrv_get_backing_file_depth(bs->backing_hd);
}

-BlockDriverState *bdrv_find_base(BlockDriverState *bs)
-{
- BlockDriverState *curr_bs = NULL;
-
- if (!bs) {
- return NULL;
- }
-
- curr_bs = bs;
-
- while (curr_bs->backing_hd) {
- curr_bs = curr_bs->backing_hd;
- }
- return curr_bs;
-}
-
/**************************************************************/
/* async I/Os */
--
1.9.3
Stefan Hajnoczi
2014-06-19 06:31:44 UTC
Permalink
Post by Jeff Cody
This simplifies the function bdrv_find_overlay(). With this change,
bdrv_find_base() is just a subset of usage of bdrv_find_overlay(),
so this also takes advantage of that.
---
block.c | 45 ++++++++++-----------------------------------
1 file changed, 10 insertions(+), 35 deletions(-)
Reviewed-by: Stefan Hajnoczi <***@redhat.com>
Stefan Hajnoczi
2014-06-19 06:27:15 UTC
Permalink
Post by Jeff Cody
This is a small helper function, to determine if 'base' is in the
chain of BlockDriverState 'top'. It returns true if it is in the chain,
and false otherwise.
If either argument is NULL, it will also return false.
---
block.c | 11 +++++++++++
include/block/block.h | 1 +
2 files changed, 12 insertions(+)
Reviewed-by: Stefan Hajnoczi <***@redhat.com>
Stefan Hajnoczi
2014-06-19 09:17:16 UTC
Permalink
* Check for attempt to commit an image to itself (Eric)
* Add a comment to the bdrv_find for block-commit, indicating
that libvirt uses the error case for probing (Eric)
* Added Benoit's R-b's
* Rebased on master
* Fixed commit log typos / stale paragraphs (Eric)
* Fixed comment typo (Eric)
* Added Eric's remaining R-b's
* Rebased on master
* Dropped overlay pointers, Eric's concerns are correct
* Require "device" for all arguments, in light of the above,
so we can find the active layer in all cases.
* Simplify more operations!
* Dropped Eric's Reviewed-by: on patches 3,5,6
Added Eric's Reviewed-by: on patches 8,9
* Add Eric's reviewed-by
* Addressed Eric's review comments
* Dropped HMP changes
* Added helper function for setting the overlay, and
set the overlay in bdrv_append()
* Use bs->backing_file instead of bs->backing_hd->filename in block_stream
Using node-names instead of filenames for block job operations
over QMP is a superior method of identifying the block driver
images to operate on, as it removes all pathname ambiguity.
This series modifies block-commit and block-stream to use node-names,
and creates a new QAPI command to allow stand-alone backing file
changes on an image file.
So that node-names can be used as desired for all block job
operations, this series also auto-generates node-names for every
BDS. User-specified node-names will override any autogenerated
block: Auto-generate node_names for each BDS entry
block: add helper function to determine if a BDS is in a chain
block: simplify bdrv_find_base() and bdrv_find_overlay()
block: make 'top' argument to block-commit optional
block: Accept node-name arguments for block-commit
block: extend block-commit to accept a string for the backing file
block: add ability for block-stream to use node-name
block: add backing-file option to block-stream
block: Add QMP documentation for block-stream
block: add QAPI command to allow live backing file change
block.c | 80 ++++++++--------
block/commit.c | 9 +-
block/stream.c | 11 +--
blockdev.c | 238 ++++++++++++++++++++++++++++++++++++++++++----
hmp.c | 1 +
include/block/block.h | 4 +-
include/block/block_int.h | 3 +-
qapi/block-core.json | 145 +++++++++++++++++++++++++---
qmp-commands.hx | 181 +++++++++++++++++++++++++++++++++--
tests/qemu-iotests/040 | 28 ++++--
10 files changed, 602 insertions(+), 98 deletions(-)
This series side-steps lack of child op blockers by checking only the
root node/drive.

Existing node-name commands like resize and snapshot-sync check for op
blockers on the actual node. They do not take the same approach as this
patch series.

We have a mess and I don't want to commit this series before we've
figured out what to do about child op blockers.

Let's discuss this topic in a sub-thread and figure out what to do for
QEMU 2.1. This is an important issue to solve before the release
because we can't change QMP command semantics easily later.

My questions are:
a. How do we fix resize, snapshot-sync, etc? It seems like we need to
propagate child op blockers.

b. Is it a good idea to perform op blocker checks on the root node?
It's inconsistent with resize, snapshot-sync, etc. Permissions in
BDS graphs with multiple root nodes (e.g. guest device and NBD
run-time server) will be different depending on which root you
specify.

c. We're painting ourselves into a corner by using the root node for op
blocker checks. We'll have to apply the same op blockers to all
nodes in a graph. There's no opportunity to apply different op
blockers to a subset of the child nodes. I *think* this can be
changed later without affecting the QMP API, so it's not a critical
issue.

The answer seems to be that op blockers should be propagated to all
child nodes and commands should test the node, not the drive/root node.
That gives us the flexibility for per-node op blockers in the future and
maintains compatibility with existing node-name users.

Stefan
Jeff Cody
2014-06-19 16:26:00 UTC
Permalink
Post by Stefan Hajnoczi
* Check for attempt to commit an image to itself (Eric)
* Add a comment to the bdrv_find for block-commit, indicating
that libvirt uses the error case for probing (Eric)
* Added Benoit's R-b's
* Rebased on master
* Fixed commit log typos / stale paragraphs (Eric)
* Fixed comment typo (Eric)
* Added Eric's remaining R-b's
* Rebased on master
* Dropped overlay pointers, Eric's concerns are correct
* Require "device" for all arguments, in light of the above,
so we can find the active layer in all cases.
* Simplify more operations!
* Dropped Eric's Reviewed-by: on patches 3,5,6
Added Eric's Reviewed-by: on patches 8,9
* Add Eric's reviewed-by
* Addressed Eric's review comments
* Dropped HMP changes
* Added helper function for setting the overlay, and
set the overlay in bdrv_append()
* Use bs->backing_file instead of bs->backing_hd->filename in block_stream
Using node-names instead of filenames for block job operations
over QMP is a superior method of identifying the block driver
images to operate on, as it removes all pathname ambiguity.
This series modifies block-commit and block-stream to use node-names,
and creates a new QAPI command to allow stand-alone backing file
changes on an image file.
So that node-names can be used as desired for all block job
operations, this series also auto-generates node-names for every
BDS. User-specified node-names will override any autogenerated
block: Auto-generate node_names for each BDS entry
block: add helper function to determine if a BDS is in a chain
block: simplify bdrv_find_base() and bdrv_find_overlay()
block: make 'top' argument to block-commit optional
block: Accept node-name arguments for block-commit
block: extend block-commit to accept a string for the backing file
block: add ability for block-stream to use node-name
block: add backing-file option to block-stream
block: Add QMP documentation for block-stream
block: add QAPI command to allow live backing file change
block.c | 80 ++++++++--------
block/commit.c | 9 +-
block/stream.c | 11 +--
blockdev.c | 238 ++++++++++++++++++++++++++++++++++++++++++----
hmp.c | 1 +
include/block/block.h | 4 +-
include/block/block_int.h | 3 +-
qapi/block-core.json | 145 +++++++++++++++++++++++++---
qmp-commands.hx | 181 +++++++++++++++++++++++++++++++++--
tests/qemu-iotests/040 | 28 ++++--
10 files changed, 602 insertions(+), 98 deletions(-)
This series side-steps lack of child op blockers by checking only the
root node/drive.
Yes. The lack of child op blockers is a definite issue.
Post by Stefan Hajnoczi
Existing node-name commands like resize and snapshot-sync check for op
blockers on the actual node. They do not take the same approach as this
patch series.
We have a mess and I don't want to commit this series before we've
figured out what to do about child op blockers.
Why? The problem doesn't go away if you don't commit this series;
-commit and -stream will still just check the topmost overlay. The
problem exists independent of this series. All this series does for
commit and stream is offer another way to specify an image. When a
series does address the blockers, it will likely need to go through
and touch all of the block job handlers.

Having said that, to be fair, the new QAPI command change-backing-file
does propagate this top-layer in-use flag semantic, but I would prefer
that patch to be dropped rather than not committing this series.

But if this series went in as-is for 2.1, nothing would change
regarding how -commit or -stream determines when to abort.
Post by Stefan Hajnoczi
Let's discuss this topic in a sub-thread and figure out what to do for
QEMU 2.1. This is an important issue to solve before the release
because we can't change QMP command semantics easily later.
a. How do we fix resize, snapshot-sync, etc? It seems like we need to
propagate child op blockers.
b. Is it a good idea to perform op blocker checks on the root node?
It's inconsistent with resize, snapshot-sync, etc. Permissions in
BDS graphs with multiple root nodes (e.g. guest device and NBD
run-time server) will be different depending on which root you
specify.
I don't think (b) is the ultimate solution. It is used as a stop-gap
because op blockers in the current implementation is essentially
analogous to the in-use flag. But is it good enough for 2.1? If
*everything* checks the topmost node in 2.1, then I think we are OK in
all cases except where images files share a common BDS.

The ability for internal BDSs to share a common base BDS makes some
block jobs unsafe currently, I believe. A crude and ugly fix is to
only allow a single block-job to occur at any given time, but that
doesn't seem feasible, so let's ignore that.

Perhaps, for 2.1, provide an overlay pointer list inside each BDS
(some of my earlier patches in this series had a single overlay, but
that is not enough). We could then apply op blockers to the topmost
nodes for any affected BDS image in a chain, by navigating upwards.
Not sure how complex this would be in practice, though.

We could also apply child blockers to all nodes in all directions in a
graph, if we don't want to rely on the topmost image as a blocker
proxy for the whole drive.
Post by Stefan Hajnoczi
c. We're painting ourselves into a corner by using the root node for op
blocker checks. We'll have to apply the same op blockers to all
nodes in a graph. There's no opportunity to apply different op
blockers to a subset of the child nodes. I *think* this can be
changed later without affecting the QMP API, so it's not a critical
issue.
We've already painted ourselves in that corner, alas.

I agree that from a QAPI perspective, the change is not critical:
once op blockers are correctly applied to all child nodes, any API
change (e.g. commit or stream) would likely be optional only (such as,
making 'device' optional instead of mandatory), and thus discoverable.
Post by Stefan Hajnoczi
The answer seems to be that op blockers should be propagated to all
child nodes and commands should test the node, not the drive/root node.
That gives us the flexibility for per-node op blockers in the future and
maintains compatibility with existing node-name users.
Long term thoughts:

So if I think of operations that are done on block devices from a
block job, and chuck them into categories, I think we have:

1) Read of guest-visible data
2) Write of guest-visible data
3) Read of host-visible data (e.g. image file metadata)
4) Write of host-visible data (e.g. image file metadata, such as
the backing-file)
5) Block chain manipulations (e.g. movement of a BDS, change to r/w
instead of r/o, etc..)
6) I/O attribute changes (e.g. throttling, etc..)

Does this make sense, and did I miss any (likely so)? It seems like
we should issue blockers not based on specific commands (e.g.
BLOCK_OP_TYPE_COMMIT), but rather based on what specific category of
interaction on a BDS we want to prohibit / allow.

I don't think specific command blockers provide enough granularity,
and doesn't necessarily scale well as new commands are added. It
forces a new block job author to go through the specific
implementation of the other block job commands, and interpret what
operations to prohibit based on what other jobs do. Whereas if each
command issues blockers based on the operation category, it takes care
of itself, and I just issue blockers based on my block job behavior.

Each command would then issue appropriate operational blockers to each
BDS affected by an operation. For instance, at first blush, I think
block-commit would want (at the very least) to block (and check) the
following, in this example chain:


[base] <-- [int1] <-- [int2] <-- [int3] <-- [top] <-- [overlay]

becomes:

[base] <-- [overlay]


Blocked operations per image:

* 'base' image

Operation | Blocked
-----------------------------
GUEST READ | Y
GUEST WRITE | Y
HOST READ |
HOST WRITE |
CHAIN | Y
I/O ATTRIBUTE |


* Intermediate images between 'base' up to and including 'top':

Operation | Blocked
-----------------------------
GUEST READ |
GUEST WRITE | Y
HOST READ |
HOST WRITE | Y
CHAIN | Y
I/O ATTRIBUTE |


* The overlay of 'top', if it exists:

Operation | Blocked
-----------------------------
GUEST READ |
GUEST WRITE |
HOST READ |
HOST WRITE | Y
CHAIN | Y
I/O ATTRIBUTE |
Eric Blake
2014-06-19 16:49:52 UTC
Permalink
Post by Jeff Cody
Having said that, to be fair, the new QAPI command change-backing-file
does propagate this top-layer in-use flag semantic, but I would prefer
that patch to be dropped rather than not committing this series.
Libvirt would prefer to have change-backing-file in the same release
that adds the optional backing-file parameters, as a witness that
backing-file is under libvirt's control (even if libvirt does not CALL
change-backing-file, the command needs to exist). One possibility for
2.1: create the command (so it shows up in 'query-commands' and
satisfies libvirt probing) but make it fail unless the node being
changed is the active node (that is, NO way to rewrite the metadata of a
backing file). Then, in 2.2, when we figure out op-blockers for child
nodes, we can enable the full power of change-backing-file to work on
any node, not just the active device node.
Post by Jeff Cody
Post by Stefan Hajnoczi
a. How do we fix resize, snapshot-sync, etc? It seems like we need to
propagate child op blockers.
b. Is it a good idea to perform op blocker checks on the root node?
It's inconsistent with resize, snapshot-sync, etc. Permissions in
BDS graphs with multiple root nodes (e.g. guest device and NBD
run-time server) will be different depending on which root you
specify.
I don't think (b) is the ultimate solution. It is used as a stop-gap
because op blockers in the current implementation is essentially
analogous to the in-use flag. But is it good enough for 2.1? If
*everything* checks the topmost node in 2.1, then I think we are OK in
all cases except where images files share a common BDS.
The ability for internal BDSs to share a common base BDS makes some
block jobs unsafe currently, I believe. A crude and ugly fix is to
only allow a single block-job to occur at any given time, but that
doesn't seem feasible, so let's ignore that.
Can we even create a common base BDS in qemu 2.1? I know that the
blockdev-add command was designed with this possibility in mind, but to
my knowledge, we have not actually wired it up yet. If 2.1 still
creates a distinct BDS for every element of every chain, even if some of
those distinct BDS are visiting the same file, then blocking jobs based
on device is sufficient. Then, in 2.2 when we finally figure out how to
do op-blockers on child nodes, we can also figure out sharing nodes.

Of course, until we have shared nodes, management apps must realize that
requesting an operation on a node that happens to have a duplicate BDS
visiting the same file from another device will NOT be flagged by qemu
as dangerous. That's just a quality-of-implementation issue, though -
just because qemu can't flag ALL stupid actions of management doesn't
mean qemu is buggy. Or put another way, if I have:

disk1: base <- img1
disk2: base <- img2

if qemu allows shared BDS, and I am able to open 'base' as a shared BDS
for both disks, then an attempt to block-commit img1 into base will be
blocked because base is still in use by img2. But as a management app,
I shouldn't be attempting that in the first place. Yes, it would be
nice if qemu blocks me from being stupid; but even if qemu allows shared
BDS, if I don't open a shared BDS on 'base' in the first place, it is no
different than if qemu doesn't allow shared BDS. Either way, it's not
qemu's fault if I, as management, try to commit into a file that is in
use by more than one chain.


if I am not able to do a shared BDS, I'm still
and as the management, I did not request that qemu open 'base' as a
shared BDS between both
Post by Jeff Cody
Perhaps, for 2.1, provide an overlay pointer list inside each BDS
(some of my earlier patches in this series had a single overlay, but
that is not enough). We could then apply op blockers to the topmost
nodes for any affected BDS image in a chain, by navigating upwards.
Not sure how complex this would be in practice, though.
We could also apply child blockers to all nodes in all directions in a
graph, if we don't want to rely on the topmost image as a blocker
proxy for the whole drive.
Post by Stefan Hajnoczi
c. We're painting ourselves into a corner by using the root node for op
blocker checks. We'll have to apply the same op blockers to all
nodes in a graph. There's no opportunity to apply different op
blockers to a subset of the child nodes. I *think* this can be
changed later without affecting the QMP API, so it's not a critical
issue.
We've already painted ourselves in that corner, alas.
once op blockers are correctly applied to all child nodes, any API
change (e.g. commit or stream) would likely be optional only (such as,
making 'device' optional instead of mandatory), and thus discoverable.
Well, discoverable if we ever get QAPI introspection added, or if we can
rely on the same hacks as we are already planning to use with
optional-'top' of getting a distinct GenericError vs. DeviceNotFound
class when abusing the QMP call with a bogus device name during probing
time.
Post by Jeff Cody
Post by Stefan Hajnoczi
The answer seems to be that op blockers should be propagated to all
child nodes and commands should test the node, not the drive/root node.
That gives us the flexibility for per-node op blockers in the future and
maintains compatibility with existing node-name users.
So if I think of operations that are done on block devices from a
1) Read of guest-visible data
2) Write of guest-visible data
3) Read of host-visible data (e.g. image file metadata)
4) Write of host-visible data (e.g. image file metadata, such as
the backing-file)
5) Block chain manipulations (e.g. movement of a BDS, change to r/w
instead of r/o, etc..)
6) I/O attribute changes (e.g. throttling, etc..)
Does this make sense, and did I miss any (likely so)? It seems like
we should issue blockers not based on specific commands (e.g.
BLOCK_OP_TYPE_COMMIT), but rather based on what specific category of
interaction on a BDS we want to prohibit / allow.
Yes, I like that train of thought - it's not the operation that we are
blocking, but the access category(ies) required by that operation.
Post by Jeff Cody
I don't think specific command blockers provide enough granularity,
and doesn't necessarily scale well as new commands are added. It
forces a new block job author to go through the specific
implementation of the other block job commands, and interpret what
operations to prohibit based on what other jobs do. Whereas if each
command issues blockers based on the operation category, it takes care
of itself, and I just issue blockers based on my block job behavior.
Each command would then issue appropriate operational blockers to each
BDS affected by an operation. For instance, at first blush, I think
block-commit would want (at the very least) to block (and check) the
[base] <-- [int1] <-- [int2] <-- [int3] <-- [top] <-- [overlay]
[base] <-- [overlay]
* 'base' image
Operation | Blocked
-----------------------------
GUEST READ | Y
GUEST WRITE | Y
HOST READ |
HOST WRITE |
CHAIN | Y
I/O ATTRIBUTE |
Makes sense:

blocking guest read because we are actively modifying the contents; if
any other operation branches off of base, they will see data that is not
consistent with what the guest sees through overlay.

blocking guest write because it is a backing chain, and overlay still
depends on base image for sectors that have not yet been commited..

blocking chain manipulations because we are going to do a chain
manipulation once our command completes.

Maybe you also want to block HOST_WRITE (our chain manipulation is done
by writing host metadata; so if someone else writes that under our feet,
we may have races on what gets written).
Post by Jeff Cody
Operation | Blocked
-----------------------------
GUEST READ |
GUEST WRITE | Y
HOST READ |
HOST WRITE | Y
CHAIN | Y
I/O ATTRIBUTE |
This needs to also block GUEST_READ - as soon as the commit starts
modifying base, ALL intermediate images are invalid (they no longer
represent a state of memory as was seen by the guest). Once you start a
commit, you MUST NOT allow a fleecing operation on any of the
intermediate operations.
Post by Jeff Cody
Operation | Blocked
-----------------------------
GUEST READ |
GUEST WRITE |
HOST READ |
HOST WRITE | Y
CHAIN | Y
I/O ATTRIBUTE |
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Eric Blake
2014-06-19 16:54:14 UTC
Permalink
Post by Eric Blake
if qemu allows shared BDS, and I am able to open 'base' as a shared BDS
for both disks, then an attempt to block-commit img1 into base will be
blocked because base is still in use by img2. But as a management app,
I shouldn't be attempting that in the first place. Yes, it would be
nice if qemu blocks me from being stupid; but even if qemu allows shared
BDS, if I don't open a shared BDS on 'base' in the first place, it is no
different than if qemu doesn't allow shared BDS. Either way, it's not
qemu's fault if I, as management, try to commit into a file that is in
use by more than one chain.
That's what I meant to say,
Post by Eric Blake
if I am not able to do a shared BDS, I'm still
and as the management, I did not request that qemu open 'base' as a
shared BDS between both
and this was leftover ramblings that I was experimenting with, but
forgot to delete once I had the paragraph above the way I liked it.
Sorry for any confusion.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Jeff Cody
2014-06-19 18:22:37 UTC
Permalink
<snip> Breaking the reply into two parts, for easier discussion
Post by Eric Blake
Post by Jeff Cody
So if I think of operations that are done on block devices from a
1) Read of guest-visible data
2) Write of guest-visible data
3) Read of host-visible data (e.g. image file metadata)
4) Write of host-visible data (e.g. image file metadata, such as
the backing-file)
5) Block chain manipulations (e.g. movement of a BDS, change to r/w
instead of r/o, etc..)
6) I/O attribute changes (e.g. throttling, etc..)
Does this make sense, and did I miss any (likely so)? It seems like
we should issue blockers not based on specific commands (e.g.
BLOCK_OP_TYPE_COMMIT), but rather based on what specific category of
interaction on a BDS we want to prohibit / allow.
Yes, I like that train of thought - it's not the operation that we are
blocking, but the access category(ies) required by that operation.
Post by Jeff Cody
I don't think specific command blockers provide enough granularity,
and doesn't necessarily scale well as new commands are added. It
forces a new block job author to go through the specific
implementation of the other block job commands, and interpret what
operations to prohibit based on what other jobs do. Whereas if each
command issues blockers based on the operation category, it takes care
of itself, and I just issue blockers based on my block job behavior.
Each command would then issue appropriate operational blockers to each
BDS affected by an operation. For instance, at first blush, I think
block-commit would want (at the very least) to block (and check) the
[base] <-- [int1] <-- [int2] <-- [int3] <-- [top] <-- [overlay]
[base] <-- [overlay]
* 'base' image
Operation | Blocked
-----------------------------
GUEST READ | Y
GUEST WRITE | Y
HOST READ |
HOST WRITE |
CHAIN | Y
I/O ATTRIBUTE |
blocking guest read because we are actively modifying the contents; if
any other operation branches off of base, they will see data that is not
consistent with what the guest sees through overlay.
blocking guest write because it is a backing chain, and overlay still
depends on base image for sectors that have not yet been commited..
blocking chain manipulations because we are going to do a chain
manipulation once our command completes.
Maybe you also want to block HOST_WRITE (our chain manipulation is done
by writing host metadata; so if someone else writes that under our feet,
we may have races on what gets written).
Hmm, you are right, we should block HOST_WRITE in addition to CHAIN.
The one host-level change we make to 'base' during a commit is
changing it from r/o to r/w, and we wouldn't want anyone else to undo
that. Other than that, the other thing we might do is a
bdrv_truncate(), but I would consider that a GUEST_WRITE.

(I think I listed r/w flag changes under chain instead of host-write;
this is a mistake, that would be a host-write example)
Post by Eric Blake
Post by Jeff Cody
Operation | Blocked
-----------------------------
GUEST READ |
GUEST WRITE | Y
HOST READ |
HOST WRITE | Y
CHAIN | Y
I/O ATTRIBUTE |
This needs to also block GUEST_READ - as soon as the commit starts
modifying base, ALL intermediate images are invalid (they no longer
represent a state of memory as was seen by the guest). Once you start a
commit, you MUST NOT allow a fleecing operation on any of the
intermediate operations.
Yes, you are right, I should have GUEST_READ in there as well.

We could even get more granular; only apply the GUEST_READ blocker
once a layer above that BDS pushes data down into 'base' (e.g. 'int1'
data going into 'base' doesn't invalidate 'int2', but 'int3' or higher
data would). Up until that point, that BDS is still OK. That level
of granularity is probably not needed, and overly complex, however.
But it would be an interesting piece of framework to have in place.
Post by Eric Blake
Post by Jeff Cody
Operation | Blocked
-----------------------------
GUEST READ |
GUEST WRITE |
HOST READ |
HOST WRITE | Y
CHAIN | Y
I/O ATTRIBUTE |
Kevin Wolf
2014-06-24 12:55:30 UTC
Permalink
Post by Eric Blake
Post by Jeff Cody
Post by Stefan Hajnoczi
b. Is it a good idea to perform op blocker checks on the root node?
It's inconsistent with resize, snapshot-sync, etc. Permissions in
BDS graphs with multiple root nodes (e.g. guest device and NBD
run-time server) will be different depending on which root you
specify.
I don't think (b) is the ultimate solution. It is used as a stop-gap
because op blockers in the current implementation is essentially
analogous to the in-use flag. But is it good enough for 2.1? If
*everything* checks the topmost node in 2.1, then I think we are OK in
all cases except where images files share a common BDS.
What's the problem again with just propagating all blockers down the
chain using bs->backing_hd and then checking just the affected nodes
rather than trying to figure out a root? It would probably be more
restrictive than necessary, but it looked like a safe first step.
Post by Eric Blake
Post by Jeff Cody
The ability for internal BDSs to share a common base BDS makes some
block jobs unsafe currently, I believe. A crude and ugly fix is to
only allow a single block-job to occur at any given time, but that
doesn't seem feasible, so let's ignore that.
Can we even create a common base BDS in qemu 2.1? I know that the
blockdev-add command was designed with this possibility in mind, but to
my knowledge, we have not actually wired it up yet.
You can create such chains, I just tried it:

x86_64-softmmu/qemu-system-x86_64 \
-drive if=none,id=backing,file=/tmp/backing.qcow2 \
-drive file=/tmp/empty.qcow2,backing.file=backing \
-drive file=/tmp/empty2.qcow2,backing.file=backing

Gives me the same disk twice, and I can independently change them as you
would expect.
Post by Eric Blake
Post by Jeff Cody
Post by Stefan Hajnoczi
The answer seems to be that op blockers should be propagated to all
child nodes and commands should test the node, not the drive/root node.
That gives us the flexibility for per-node op blockers in the future and
maintains compatibility with existing node-name users.
So if I think of operations that are done on block devices from a
1) Read of guest-visible data
2) Write of guest-visible data
3) Read of host-visible data (e.g. image file metadata)
4) Write of host-visible data (e.g. image file metadata, such as
the backing-file)
5) Block chain manipulations (e.g. movement of a BDS, change to r/w
instead of r/o, etc..)
6) I/O attribute changes (e.g. throttling, etc..)
Does this make sense, and did I miss any (likely so)? It seems like
we should issue blockers not based on specific commands (e.g.
BLOCK_OP_TYPE_COMMIT), but rather based on what specific category of
interaction on a BDS we want to prohibit / allow.
Yes, I like that train of thought - it's not the operation that we are
blocking, but the access category(ies) required by that operation.
Yes, I argued something similar in our block meeting in Stuttgart, and I
still think it's a good idea in principle. The problem with it is that
the right categories aren't quite obvious.

For example, I think guest/host writes really means writes that change
the visible contents of this image. A shared backing file would require
that its contents never changes, but that's not really related to the
guest. A commit operation does a "guest" write on the base node because
its contents changes when you look only at the specific node.

Kevin
Stefan Hajnoczi
2014-06-23 13:08:09 UTC
Permalink
Post by Jeff Cody
Post by Stefan Hajnoczi
Let's discuss this topic in a sub-thread and figure out what to do for
QEMU 2.1. This is an important issue to solve before the release
because we can't change QMP command semantics easily later.
a. How do we fix resize, snapshot-sync, etc? It seems like we need to
propagate child op blockers.
b. Is it a good idea to perform op blocker checks on the root node?
It's inconsistent with resize, snapshot-sync, etc. Permissions in
BDS graphs with multiple root nodes (e.g. guest device and NBD
run-time server) will be different depending on which root you
specify.
I don't think (b) is the ultimate solution. It is used as a stop-gap
because op blockers in the current implementation is essentially
analogous to the in-use flag. But is it good enough for 2.1? If
*everything* checks the topmost node in 2.1, then I think we are OK in
all cases except where images files share a common BDS.
Checking op blockers on the root node as a stop-gap is a good idea.
Let's apply it across all commands (e.g. snapshot-sync, resize).

Fam pointed out that this approach is vulnerable to blockdev-add, where
blockers could be set/checked on an incomplete BDS graph (since you can
add new nodes on top). Do we need to move the blockers up the graph if
a new root node is inserted?

Besides this issue, your approach seems like the quickest safe solution
for 2.1.
Post by Jeff Cody
The ability for internal BDSs to share a common base BDS makes some
block jobs unsafe currently, I believe. A crude and ugly fix is to
only allow a single block-job to occur at any given time, but that
doesn't seem feasible, so let's ignore that.
Right now I don't think we share BDS chains.

Stefan
Benoît Canet
2014-06-23 14:17:05 UTC
Permalink
Post by Stefan Hajnoczi
Post by Jeff Cody
Post by Stefan Hajnoczi
Let's discuss this topic in a sub-thread and figure out what to do for
QEMU 2.1. This is an important issue to solve before the release
because we can't change QMP command semantics easily later.
a. How do we fix resize, snapshot-sync, etc? It seems like we need to
propagate child op blockers.
b. Is it a good idea to perform op blocker checks on the root node?
It's inconsistent with resize, snapshot-sync, etc. Permissions in
BDS graphs with multiple root nodes (e.g. guest device and NBD
run-time server) will be different depending on which root you
specify.
I don't think (b) is the ultimate solution. It is used as a stop-gap
because op blockers in the current implementation is essentially
analogous to the in-use flag. But is it good enough for 2.1? If
*everything* checks the topmost node in 2.1, then I think we are OK in
all cases except where images files share a common BDS.
Checking op blockers on the root node as a stop-gap is a good idea.
Let's apply it across all commands (e.g. snapshot-sync, resize).
Fam pointed out that this approach is vulnerable to blockdev-add, where
blockers could be set/checked on an incomplete BDS graph (since you can
add new nodes on top). Do we need to move the blockers up the graph if
a new root node is inserted?
Besides this issue, your approach seems like the quickest safe solution
for 2.1.
I agree that always blocking the top BDS would be tactical.
Even it would need to move the blocker on the root on node insertion it would
solve the issues I have in the quorum maintainances patches of recursive BDS loops
and ownerships.

Best regards

Benoît
Post by Stefan Hajnoczi
Post by Jeff Cody
The ability for internal BDSs to share a common base BDS makes some
block jobs unsafe currently, I believe. A crude and ugly fix is to
only allow a single block-job to occur at any given time, but that
doesn't seem feasible, so let's ignore that.
Right now I don't think we share BDS chains.
Stefan
Fam Zheng
2014-06-24 02:48:52 UTC
Permalink
Post by Stefan Hajnoczi
Post by Jeff Cody
Post by Stefan Hajnoczi
Let's discuss this topic in a sub-thread and figure out what to do for
QEMU 2.1. This is an important issue to solve before the release
because we can't change QMP command semantics easily later.
a. How do we fix resize, snapshot-sync, etc? It seems like we need to
propagate child op blockers.
b. Is it a good idea to perform op blocker checks on the root node?
It's inconsistent with resize, snapshot-sync, etc. Permissions in
BDS graphs with multiple root nodes (e.g. guest device and NBD
run-time server) will be different depending on which root you
specify.
I don't think (b) is the ultimate solution. It is used as a stop-gap
because op blockers in the current implementation is essentially
analogous to the in-use flag. But is it good enough for 2.1? If
*everything* checks the topmost node in 2.1, then I think we are OK in
all cases except where images files share a common BDS.
Checking op blockers on the root node as a stop-gap is a good idea.
Let's apply it across all commands (e.g. snapshot-sync, resize).
Fam pointed out that this approach is vulnerable to blockdev-add, where
blockers could be set/checked on an incomplete BDS graph (since you can
add new nodes on top). Do we need to move the blockers up the graph if
a new root node is inserted?
My concern is if we allow adding new root on top, it's not easy to know the
real root then.

To give an example:

If we have

[base id=""] <- [active id="drive0" blockers=...]

When user does

(QMP) block-commit device="drive0" ...

We should check drive0, which is OK.

Then, assume user adds a new root on top, we would take care of moving the
blockers:

[base id=""] <- [active id="drive0"] <- [active id="drive1" blockers=]

At this point, what if user does something on drive0 again?

(QMP) block-commit device="drive0" ...

The right thing to do is to check blockers on "drive1", since it's the real
root now. But how do we know? Do we need to add a back reference pointer
->overlap_hd in BDS, or do we maintain a look up table, or do we search all BDS
graphs to figure out?

None is easier than if we put the blockers in the bottom BDS, in the first
place:

[base id="" blockers=...] <- [active id="drive0"]
^^^^^^^^^^^^

Even if user adds a new root, we don't need to worry about moving blockers,
because the bottom is not changed.

[base id="" blockers=...] <- [active id="drive0"] <- [active id="drive1"]

Checking the blockers are easy, either for drive0 or drive1: just follow the
backing chain until getting to the end.

Fam
Jeff Cody
2014-06-24 13:32:59 UTC
Permalink
Post by Fam Zheng
Post by Stefan Hajnoczi
Post by Jeff Cody
Post by Stefan Hajnoczi
Let's discuss this topic in a sub-thread and figure out what to do for
QEMU 2.1. This is an important issue to solve before the release
because we can't change QMP command semantics easily later.
a. How do we fix resize, snapshot-sync, etc? It seems like we need to
propagate child op blockers.
b. Is it a good idea to perform op blocker checks on the root node?
It's inconsistent with resize, snapshot-sync, etc. Permissions in
BDS graphs with multiple root nodes (e.g. guest device and NBD
run-time server) will be different depending on which root you
specify.
I don't think (b) is the ultimate solution. It is used as a stop-gap
because op blockers in the current implementation is essentially
analogous to the in-use flag. But is it good enough for 2.1? If
*everything* checks the topmost node in 2.1, then I think we are OK in
all cases except where images files share a common BDS.
Checking op blockers on the root node as a stop-gap is a good idea.
Let's apply it across all commands (e.g. snapshot-sync, resize).
Fam pointed out that this approach is vulnerable to blockdev-add, where
blockers could be set/checked on an incomplete BDS graph (since you can
add new nodes on top). Do we need to move the blockers up the graph if
a new root node is inserted?
My concern is if we allow adding new root on top, it's not easy to know the
real root then.
If we have
[base id=""] <- [active id="drive0" blockers=...]
When user does
(QMP) block-commit device="drive0" ...
We should check drive0, which is OK.
Then, assume user adds a new root on top, we would take care of moving the
[base id=""] <- [active id="drive0"] <- [active id="drive1" blockers=]
At this point, what if user does something on drive0 again?
(QMP) block-commit device="drive0" ...
The right thing to do is to check blockers on "drive1", since it's the real
root now. But how do we know? Do we need to add a back reference pointer
->overlap_hd in BDS, or do we maintain a look up table, or do we search all BDS
graphs to figure out?
None is easier than if we put the blockers in the bottom BDS, in the first
[base id="" blockers=...] <- [active id="drive0"]
^^^^^^^^^^^^
I think you are right. If we place the blocker at the bottom-most
BDS, then that would be a more restrictive blocker. This may end up
being more restrictive than needed, but more importantly it should
make everything safe.

Also, it is an easy change for 2.1 - just call bdrv_find_base(bs), and
set/check/clear blockers on the returned BDS.
Post by Fam Zheng
Even if user adds a new root, we don't need to worry about moving blockers,
because the bottom is not changed.
[base id="" blockers=...] <- [active id="drive0"] <- [active id="drive1"]
Checking the blockers are easy, either for drive0 or drive1: just follow the
backing chain until getting to the end.
Fam
Kevin Wolf
2014-06-24 14:08:57 UTC
Permalink
Post by Jeff Cody
Post by Fam Zheng
Post by Stefan Hajnoczi
Post by Jeff Cody
Post by Stefan Hajnoczi
Let's discuss this topic in a sub-thread and figure out what to do for
QEMU 2.1. This is an important issue to solve before the release
because we can't change QMP command semantics easily later.
a. How do we fix resize, snapshot-sync, etc? It seems like we need to
propagate child op blockers.
b. Is it a good idea to perform op blocker checks on the root node?
It's inconsistent with resize, snapshot-sync, etc. Permissions in
BDS graphs with multiple root nodes (e.g. guest device and NBD
run-time server) will be different depending on which root you
specify.
I don't think (b) is the ultimate solution. It is used as a stop-gap
because op blockers in the current implementation is essentially
analogous to the in-use flag. But is it good enough for 2.1? If
*everything* checks the topmost node in 2.1, then I think we are OK in
all cases except where images files share a common BDS.
Checking op blockers on the root node as a stop-gap is a good idea.
Let's apply it across all commands (e.g. snapshot-sync, resize).
Fam pointed out that this approach is vulnerable to blockdev-add, where
blockers could be set/checked on an incomplete BDS graph (since you can
add new nodes on top). Do we need to move the blockers up the graph if
a new root node is inserted?
My concern is if we allow adding new root on top, it's not easy to know the
real root then.
If we have
[base id=""] <- [active id="drive0" blockers=...]
When user does
(QMP) block-commit device="drive0" ...
We should check drive0, which is OK.
Then, assume user adds a new root on top, we would take care of moving the
[base id=""] <- [active id="drive0"] <- [active id="drive1" blockers=]
At this point, what if user does something on drive0 again?
(QMP) block-commit device="drive0" ...
The right thing to do is to check blockers on "drive1", since it's the real
root now. But how do we know? Do we need to add a back reference pointer
->overlap_hd in BDS, or do we maintain a look up table, or do we search all BDS
graphs to figure out?
None is easier than if we put the blockers in the bottom BDS, in the first
[base id="" blockers=...] <- [active id="drive0"]
^^^^^^^^^^^^
I think you are right. If we place the blocker at the bottom-most
BDS, then that would be a more restrictive blocker. This may end up
being more restrictive than needed, but more importantly it should
make everything safe.
Also, it is an easy change for 2.1 - just call bdrv_find_base(bs), and
set/check/clear blockers on the returned BDS.
What does bdrv_find_base() return for e.g. quorum?

Kevin
Post by Jeff Cody
Post by Fam Zheng
Even if user adds a new root, we don't need to worry about moving blockers,
because the bottom is not changed.
[base id="" blockers=...] <- [active id="drive0"] <- [active id="drive1"]
Checking the blockers are easy, either for drive0 or drive1: just follow the
backing chain until getting to the end.
Fam
Benoît Canet
2014-06-24 15:30:53 UTC
Permalink
Post by Kevin Wolf
Post by Jeff Cody
Post by Fam Zheng
Post by Stefan Hajnoczi
Post by Jeff Cody
Post by Stefan Hajnoczi
Let's discuss this topic in a sub-thread and figure out what to do for
QEMU 2.1. This is an important issue to solve before the release
because we can't change QMP command semantics easily later.
a. How do we fix resize, snapshot-sync, etc? It seems like we need to
propagate child op blockers.
b. Is it a good idea to perform op blocker checks on the root node?
It's inconsistent with resize, snapshot-sync, etc. Permissions in
BDS graphs with multiple root nodes (e.g. guest device and NBD
run-time server) will be different depending on which root you
specify.
I don't think (b) is the ultimate solution. It is used as a stop-gap
because op blockers in the current implementation is essentially
analogous to the in-use flag. But is it good enough for 2.1? If
*everything* checks the topmost node in 2.1, then I think we are OK in
all cases except where images files share a common BDS.
Checking op blockers on the root node as a stop-gap is a good idea.
Let's apply it across all commands (e.g. snapshot-sync, resize).
Fam pointed out that this approach is vulnerable to blockdev-add, where
blockers could be set/checked on an incomplete BDS graph (since you can
add new nodes on top). Do we need to move the blockers up the graph if
a new root node is inserted?
My concern is if we allow adding new root on top, it's not easy to know the
real root then.
If we have
[base id=""] <- [active id="drive0" blockers=...]
When user does
(QMP) block-commit device="drive0" ...
We should check drive0, which is OK.
Then, assume user adds a new root on top, we would take care of moving the
[base id=""] <- [active id="drive0"] <- [active id="drive1" blockers=]
At this point, what if user does something on drive0 again?
(QMP) block-commit device="drive0" ...
The right thing to do is to check blockers on "drive1", since it's the real
root now. But how do we know? Do we need to add a back reference pointer
->overlap_hd in BDS, or do we maintain a look up table, or do we search all BDS
graphs to figure out?
None is easier than if we put the blockers in the bottom BDS, in the first
[base id="" blockers=...] <- [active id="drive0"]
^^^^^^^^^^^^
I think you are right. If we place the blocker at the bottom-most
BDS, then that would be a more restrictive blocker. This may end up
being more restrictive than needed, but more importantly it should
make everything safe.
Also, it is an easy change for 2.1 - just call bdrv_find_base(bs), and
set/check/clear blockers on the returned BDS.
What does bdrv_find_base() return for e.g. quorum?
This will not work when unblocking a BDS loop like the one formed by drive-mirror
when replacing an arbitrary node.
Post by Kevin Wolf
Kevin
Post by Jeff Cody
Post by Fam Zheng
Even if user adds a new root, we don't need to worry about moving blockers,
because the bottom is not changed.
[base id="" blockers=...] <- [active id="drive0"] <- [active id="drive1"]
Checking the blockers are easy, either for drive0 or drive1: just follow the
backing chain until getting to the end.
Fam
Benoît Canet
2014-06-19 17:49:20 UTC
Permalink
Post by Stefan Hajnoczi
* Check for attempt to commit an image to itself (Eric)
* Add a comment to the bdrv_find for block-commit, indicating
that libvirt uses the error case for probing (Eric)
* Added Benoit's R-b's
* Rebased on master
* Fixed commit log typos / stale paragraphs (Eric)
* Fixed comment typo (Eric)
* Added Eric's remaining R-b's
* Rebased on master
* Dropped overlay pointers, Eric's concerns are correct
* Require "device" for all arguments, in light of the above,
so we can find the active layer in all cases.
* Simplify more operations!
* Dropped Eric's Reviewed-by: on patches 3,5,6
Added Eric's Reviewed-by: on patches 8,9
* Add Eric's reviewed-by
* Addressed Eric's review comments
* Dropped HMP changes
* Added helper function for setting the overlay, and
set the overlay in bdrv_append()
* Use bs->backing_file instead of bs->backing_hd->filename in block_stream
Using node-names instead of filenames for block job operations
over QMP is a superior method of identifying the block driver
images to operate on, as it removes all pathname ambiguity.
This series modifies block-commit and block-stream to use node-names,
and creates a new QAPI command to allow stand-alone backing file
changes on an image file.
So that node-names can be used as desired for all block job
operations, this series also auto-generates node-names for every
BDS. User-specified node-names will override any autogenerated
block: Auto-generate node_names for each BDS entry
block: add helper function to determine if a BDS is in a chain
block: simplify bdrv_find_base() and bdrv_find_overlay()
block: make 'top' argument to block-commit optional
block: Accept node-name arguments for block-commit
block: extend block-commit to accept a string for the backing file
block: add ability for block-stream to use node-name
block: add backing-file option to block-stream
block: Add QMP documentation for block-stream
block: add QAPI command to allow live backing file change
block.c | 80 ++++++++--------
block/commit.c | 9 +-
block/stream.c | 11 +--
blockdev.c | 238 ++++++++++++++++++++++++++++++++++++++++++----
hmp.c | 1 +
include/block/block.h | 4 +-
include/block/block_int.h | 3 +-
qapi/block-core.json | 145 +++++++++++++++++++++++++---
qmp-commands.hx | 181 +++++++++++++++++++++++++++++++++--
tests/qemu-iotests/040 | 28 ++++--
10 files changed, 602 insertions(+), 98 deletions(-)
This series side-steps lack of child op blockers by checking only the
root node/drive.
Existing node-name commands like resize and snapshot-sync check for op
blockers on the actual node. They do not take the same approach as this
patch series.
We have a mess and I don't want to commit this series before we've
figured out what to do about child op blockers.
Let's discuss this topic in a sub-thread and figure out what to do for
QEMU 2.1. This is an important issue to solve before the release
because we can't change QMP command semantics easily later.
a. How do we fix resize, snapshot-sync, etc? It seems like we need to
propagate child op blockers.
b. Is it a good idea to perform op blocker checks on the root node?
It's inconsistent with resize, snapshot-sync, etc. Permissions in
BDS graphs with multiple root nodes (e.g. guest device and NBD
run-time server) will be different depending on which root you
specify.
c. We're painting ourselves into a corner by using the root node for op
blocker checks. We'll have to apply the same op blockers to all
nodes in a graph. There's no opportunity to apply different op
blockers to a subset of the child nodes. I *think* this can be
changed later without affecting the QMP API, so it's not a critical
issue.
The answer seems to be that op blockers should be propagated to all
child nodes and commands should test the node, not the drive/root node.
That gives us the flexibility for per-node op blockers in the future and
maintains compatibility with existing node-name users.
I tried to write a recursive function to block and unblock all child nodes today.

It turn that even if the recursion functions are easy to write there are some
hurdle with the approach.

In drive mirror when a subtree is swapped nobody will be here to unlock it
and bdrv_unref will fail.

In block-commmit the drive mirror run part form a BDS loop that is totally uneasy
to unblock properly in bdrv_set_backing_file.

So writing these functions is not trivial and we are late in the release cycle.

Are you sure that there is no lighter path for the quorum and node name patchset ?

Best regards

Benoît
Post by Stefan Hajnoczi
Stefan
Jeff Cody
2014-06-24 17:08:03 UTC
Permalink
Post by Stefan Hajnoczi
* Check for attempt to commit an image to itself (Eric)
* Add a comment to the bdrv_find for block-commit, indicating
that libvirt uses the error case for probing (Eric)
* Added Benoit's R-b's
* Rebased on master
* Fixed commit log typos / stale paragraphs (Eric)
* Fixed comment typo (Eric)
* Added Eric's remaining R-b's
* Rebased on master
* Dropped overlay pointers, Eric's concerns are correct
* Require "device" for all arguments, in light of the above,
so we can find the active layer in all cases.
* Simplify more operations!
* Dropped Eric's Reviewed-by: on patches 3,5,6
Added Eric's Reviewed-by: on patches 8,9
* Add Eric's reviewed-by
* Addressed Eric's review comments
* Dropped HMP changes
* Added helper function for setting the overlay, and
set the overlay in bdrv_append()
* Use bs->backing_file instead of bs->backing_hd->filename in block_stream
Using node-names instead of filenames for block job operations
over QMP is a superior method of identifying the block driver
images to operate on, as it removes all pathname ambiguity.
This series modifies block-commit and block-stream to use node-names,
and creates a new QAPI command to allow stand-alone backing file
changes on an image file.
So that node-names can be used as desired for all block job
operations, this series also auto-generates node-names for every
BDS. User-specified node-names will override any autogenerated
block: Auto-generate node_names for each BDS entry
block: add helper function to determine if a BDS is in a chain
block: simplify bdrv_find_base() and bdrv_find_overlay()
block: make 'top' argument to block-commit optional
block: Accept node-name arguments for block-commit
block: extend block-commit to accept a string for the backing file
block: add ability for block-stream to use node-name
block: add backing-file option to block-stream
block: Add QMP documentation for block-stream
block: add QAPI command to allow live backing file change
block.c | 80 ++++++++--------
block/commit.c | 9 +-
block/stream.c | 11 +--
blockdev.c | 238 ++++++++++++++++++++++++++++++++++++++++++----
hmp.c | 1 +
include/block/block.h | 4 +-
include/block/block_int.h | 3 +-
qapi/block-core.json | 145 +++++++++++++++++++++++++---
qmp-commands.hx | 181 +++++++++++++++++++++++++++++++++--
tests/qemu-iotests/040 | 28 ++++--
10 files changed, 602 insertions(+), 98 deletions(-)
This series side-steps lack of child op blockers by checking only the
root node/drive.
Existing node-name commands like resize and snapshot-sync check for op
blockers on the actual node. They do not take the same approach as this
patch series.
We have a mess and I don't want to commit this series before we've
figured out what to do about child op blockers.
Let's discuss this topic in a sub-thread and figure out what to do for
QEMU 2.1. This is an important issue to solve before the release
because we can't change QMP command semantics easily later.
a. How do we fix resize, snapshot-sync, etc? It seems like we need to
propagate child op blockers.
b. Is it a good idea to perform op blocker checks on the root node?
It's inconsistent with resize, snapshot-sync, etc. Permissions in
BDS graphs with multiple root nodes (e.g. guest device and NBD
run-time server) will be different depending on which root you
specify.
Looking at the code, backing_blocker on all backing files currently
breaks block-commit.
if (base_len < s->common.len) {
ret = bdrv_truncate(base, s->common.len);
if (ret) {
goto exit_restore_reopen;
}
}

We grow the base, if the top layer is larger, otherwise we get -EIO.
Now, with the backing blockers, from bdrv_truncate() in block.c:

if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_RESIZE, NULL)) {
return -EBUSY;
}

So this will be a regression for 2.1, unless we fix it.

Per IRC conversation, I think we can safely move the
bdrv_op_is_blocked() check to qmp_block_resize().

I'll put a patch out to do this, and add a test to qemu-iotests.
Post by Stefan Hajnoczi
c. We're painting ourselves into a corner by using the root node for op
blocker checks. We'll have to apply the same op blockers to all
nodes in a graph. There's no opportunity to apply different op
blockers to a subset of the child nodes. I *think* this can be
changed later without affecting the QMP API, so it's not a critical
issue.
The answer seems to be that op blockers should be propagated to all
child nodes and commands should test the node, not the drive/root node.
That gives us the flexibility for per-node op blockers in the future and
maintains compatibility with existing node-name users.
Stefan
Benoît Canet
2014-06-23 10:24:32 UTC
Permalink
Post by Jeff Cody
This is a small helper function, to determine if 'base' is in the
chain of BlockDriverState 'top'. It returns true if it is in the chain,
and false otherwise.
If either argument is NULL, it will also return false.
---
block.c | 11 +++++++++++
include/block/block.h | 1 +
2 files changed, 12 insertions(+)
diff --git a/block.c b/block.c
index da32bb0..280a167 100644
--- a/block.c
+++ b/block.c
@@ -3819,6 +3819,17 @@ BlockDriverState *bdrv_lookup_bs(const char *device,
return NULL;
}
+/* If 'base' is in the same chain as 'top', return true. Otherwise,
+ * return false. If either argument is NULL, return false. */
+bool bdrv_chain_contains(BlockDriverState *top, BlockDriverState *base)
+{
+ while (top && top != base) {
+ top = top->backing_hd;
+ }
+
+ return top != NULL;
+}
+
BlockDriverState *bdrv_next(BlockDriverState *bs)
{
if (!bs) {
diff --git a/include/block/block.h b/include/block/block.h
index f15b99b..c60bd52 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -403,6 +403,7 @@ BlockDeviceInfoList *bdrv_named_nodes_list(void);
BlockDriverState *bdrv_lookup_bs(const char *device,
const char *node_name,
Error **errp);
+bool bdrv_chain_contains(BlockDriverState *top, BlockDriverState *base);
BlockDriverState *bdrv_next(BlockDriverState *bs);
void bdrv_iterate(void (*it)(void *opaque, BlockDriverState *bs),
void *opaque);
--
1.9.3
I know I rev by this patch but it now appears we will probably need to extend
this function to work with multiple children BDS.

Jeff do you prefer to take care of this in this series or should I rework it later
in my quorum maintainance series ?

Best regards

Benoît
Continue reading on narkive:
Loading...