Git Source Code Mirror - This is a publish-only repository and all pull requests are ignored. Please follow Documentation/SubmittingPatches procedure for any of your improvements. https://git-scm.com/
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
git/commit-graph.c

1278 lines
34 KiB

#include "cache.h"
#include "config.h"
#include "dir.h"
#include "git-compat-util.h"
#include "lockfile.h"
#include "pack.h"
#include "packfile.h"
#include "commit.h"
#include "object.h"
#include "refs.h"
#include "revision.h"
#include "sha1-lookup.h"
#include "commit-graph.h"
#include "object-store.h"
#include "alloc.h"
#include "hashmap.h"
#include "replace-object.h"
commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all | git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx | git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4 years ago
#include "progress.h"
#define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
#define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
#define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
#define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
commit-graph: rename "large edges" to "extra edges" The optional 'Large Edge List' chunk of the commit graph file stores parent information for commits with more than two parents, and the names of most of the macros, variables, struct fields, and functions related to this chunk contain the term "large edges", e.g. write_graph_chunk_large_edges(). However, it's not a really great term, as the edges to the second and subsequent parents stored in this chunk are not any larger than the edges to the first and second parents stored in the "main" 'Commit Data' chunk. It's the number of edges, IOW number of parents, that is larger compared to non-merge and "regular" two-parent merge commits. And indeed, two functions in 'commit-graph.c' have a local variable called 'num_extra_edges' that refer to the same thing, and this "extra edges" term is much better at describing these edges. So let's rename all these references to "large edges" in macro, variable, function, etc. names to "extra edges". There is a GRAPH_OCTOPUS_EDGES_NEEDED macro as well; for the sake of consistency rename it to GRAPH_EXTRA_EDGES_NEEDED. We can do so safely without causing any incompatibility issues, because the term "large edges" doesn't come up in the file format itself in any form (the chunk's magic is {'E', 'D', 'G', 'E'}, there is no 'L' in there), but only in the specification text. The string "large edges", however, does come up in the output of 'git commit-graph read' and in tests looking at its input, but that command is explicitly documented as debugging aid, so we can change its output and the affected tests safely. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4 years ago
#define GRAPH_CHUNKID_EXTRAEDGES 0x45444745 /* "EDGE" */
#define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16)
#define GRAPH_VERSION_1 0x1
#define GRAPH_VERSION GRAPH_VERSION_1
commit-graph: rename "large edges" to "extra edges" The optional 'Large Edge List' chunk of the commit graph file stores parent information for commits with more than two parents, and the names of most of the macros, variables, struct fields, and functions related to this chunk contain the term "large edges", e.g. write_graph_chunk_large_edges(). However, it's not a really great term, as the edges to the second and subsequent parents stored in this chunk are not any larger than the edges to the first and second parents stored in the "main" 'Commit Data' chunk. It's the number of edges, IOW number of parents, that is larger compared to non-merge and "regular" two-parent merge commits. And indeed, two functions in 'commit-graph.c' have a local variable called 'num_extra_edges' that refer to the same thing, and this "extra edges" term is much better at describing these edges. So let's rename all these references to "large edges" in macro, variable, function, etc. names to "extra edges". There is a GRAPH_OCTOPUS_EDGES_NEEDED macro as well; for the sake of consistency rename it to GRAPH_EXTRA_EDGES_NEEDED. We can do so safely without causing any incompatibility issues, because the term "large edges" doesn't come up in the file format itself in any form (the chunk's magic is {'E', 'D', 'G', 'E'}, there is no 'L' in there), but only in the specification text. The string "large edges", however, does come up in the output of 'git commit-graph read' and in tests looking at its input, but that command is explicitly documented as debugging aid, so we can change its output and the affected tests safely. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4 years ago
#define GRAPH_EXTRA_EDGES_NEEDED 0x80000000
#define GRAPH_EDGE_LAST_MASK 0x7fffffff
#define GRAPH_PARENT_NONE 0x70000000
#define GRAPH_LAST_EDGE 0x80000000
#define GRAPH_HEADER_SIZE 8
#define GRAPH_FANOUT_SIZE (4 * 256)
#define GRAPH_CHUNKLOOKUP_WIDTH 12
#define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
char *get_commit_graph_filename(const char *obj_dir)
{
return xstrfmt("%s/info/commit-graph", obj_dir);
}
static uint8_t oid_version(void)
{
return 1;
}
static struct commit_graph *alloc_commit_graph(void)
{
struct commit_graph *g = xcalloc(1, sizeof(*g));
g->graph_fd = -1;
return g;
}
extern int read_replace_refs;
static int commit_graph_compatible(struct repository *r)
{
if (!r->gitdir)
return 0;
if (read_replace_refs) {
prepare_replace_object(r);
if (hashmap_get_size(&r->objects->replace_map->map))
return 0;
}
prepare_commit_graft(r);
if (r->parsed_objects && r->parsed_objects->grafts_nr)
return 0;
if (is_repository_shallow(r))
return 0;
return 1;
}
struct commit_graph *load_commit_graph_one(const char *graph_file)
{
void *graph_map;
size_t graph_size;
struct stat st;
struct commit_graph *ret;
int fd = git_open(graph_file);
if (fd < 0)
return NULL;
if (fstat(fd, &st)) {
close(fd);
return NULL;
}
graph_size = xsize_t(st.st_size);
if (graph_size < GRAPH_MIN_SIZE) {
close(fd);
die(_("graph file %s is too small"), graph_file);
}
graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
ret = parse_commit_graph(graph_map, fd, graph_size);
if (!ret) {
munmap(graph_map, graph_size);
close(fd);
exit(1);
}
return ret;
}
commit-graph: fix segfault on e.g. "git status" When core.commitGraph=true is set, various common commands now consult the commit graph. Because the commit-graph code is very trusting of its input data, it's possibly to construct a graph that'll cause an immediate segfault on e.g. "status" (and e.g. "log", "blame", ...). In some other cases where git immediately exits with a cryptic error about the graph being broken. The root cause of this is that while the "commit-graph verify" sub-command exhaustively verifies the graph, other users of the graph simply trust the graph, and will e.g. deference data found at certain offsets as pointers, causing segfaults. This change does the bare minimum to ensure that we don't segfault in the common fill_commit_in_graph() codepath called by e.g. setup_revisions(), to do this instrument the "commit-graph verify" tests to always check if "status" would subsequently segfault. This fixes the following tests which would previously segfault: not ok 50 - detect low chunk count not ok 51 - detect missing OID fanout chunk not ok 52 - detect missing OID lookup chunk not ok 53 - detect missing commit data chunk Those happened because with the commit-graph enabled setup_revisions() would eventually call fill_commit_in_graph(), where e.g. g->chunk_commit_data is used early as an offset (and will be 0x0). With this change we get far enough to detect that the graph is broken, and show an error instead. E.g.: $ git status; echo $? error: commit-graph is missing the Commit Data chunk 1 That also sucks, we should *warn* and not hard-fail "status" just because the commit-graph is corrupt, but fixing is left to a follow-up change. A side-effect of changing the reporting from graph_report() to error() is that we now have an "error: " prefix for these even for "commit-graph verify". Pseudo-diff before/after: $ git commit-graph verify -commit-graph is missing the Commit Data chunk +error: commit-graph is missing the Commit Data chunk Changing that is OK. Various errors it emits now early on are prefixed with "error: ", moving these over and changing the output doesn't break anything. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4 years ago
static int verify_commit_graph_lite(struct commit_graph *g)
{
/*
* Basic validation shared between parse_commit_graph()
* which'll be called every time the graph is used, and the
* much more expensive verify_commit_graph() used by
* "commit-graph verify".
*
* There should only be very basic checks here to ensure that
* we don't e.g. segfault in fill_commit_in_graph(), but
* because this is a very hot codepath nothing that e.g. loops
* over g->num_commits, or runs a checksum on the commit-graph
* itself.
*/
if (!g->chunk_oid_fanout) {
error("commit-graph is missing the OID Fanout chunk");
return 1;
}
if (!g->chunk_oid_lookup) {
error("commit-graph is missing the OID Lookup chunk");
return 1;
}
if (!g->chunk_commit_data) {
error("commit-graph is missing the Commit Data chunk");
return 1;
}
return 0;
}
struct commit_graph *parse_commit_graph(void *graph_map, int fd,
size_t graph_size)
{
const unsigned char *data, *chunk_lookup;
uint32_t i;
struct commit_graph *graph;
uint64_t last_chunk_offset;
uint32_t last_chunk_id;
uint32_t graph_signature;
unsigned char graph_version, hash_version;
if (!graph_map)
return NULL;
if (graph_size < GRAPH_MIN_SIZE)
return NULL;
data = (const unsigned char *)graph_map;
graph_signature = get_be32(data);
if (graph_signature != GRAPH_SIGNATURE) {
error(_("graph signature %X does not match signature %X"),
graph_signature, GRAPH_SIGNATURE);
return NULL;
}
graph_version = *(unsigned char*)(data + 4);
if (graph_version != GRAPH_VERSION) {
error(_("graph version %X does not match version %X"),
graph_version, GRAPH_VERSION);
return NULL;
}
hash_version = *(unsigned char*)(data + 5);
if (hash_version != oid_version()) {
error(_("hash version %X does not match version %X"),
hash_version, oid_version());
return NULL;
}
graph = alloc_commit_graph();
graph->hash_len = the_hash_algo->rawsz;
graph->num_chunks = *(unsigned char*)(data + 6);
graph->graph_fd = fd;
graph->data = graph_map;
graph->data_len = graph_size;
last_chunk_id = 0;
last_chunk_offset = 8;
chunk_lookup = data + 8;
for (i = 0; i < graph->num_chunks; i++) {
uint32_t chunk_id;
uint64_t chunk_offset;
int chunk_repeated = 0;
if (data + graph_size - chunk_lookup <
GRAPH_CHUNKLOOKUP_WIDTH) {
error(_("chunk lookup table entry missing; graph file may be incomplete"));
free(graph);
return NULL;
}
chunk_id = get_be32(chunk_lookup + 0);
chunk_offset = get_be64(chunk_lookup + 4);
chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
if (chunk_offset > graph_size - the_hash_algo->rawsz) {
error(_("improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32),
(uint32_t)chunk_offset);
free(graph);
return NULL;
}
switch (chunk_id) {
case GRAPH_CHUNKID_OIDFANOUT:
if (graph->chunk_oid_fanout)
chunk_repeated = 1;
else
graph->chunk_oid_fanout = (uint32_t*)(data + chunk_offset);
break;
case GRAPH_CHUNKID_OIDLOOKUP:
if (graph->chunk_oid_lookup)
chunk_repeated = 1;
else
graph->chunk_oid_lookup = data + chunk_offset;
break;
case GRAPH_CHUNKID_DATA:
if (graph->chunk_commit_data)
chunk_repeated = 1;
else
graph->chunk_commit_data = data + chunk_offset;
break;
commit-graph: rename "large edges" to "extra edges" The optional 'Large Edge List' chunk of the commit graph file stores parent information for commits with more than two parents, and the names of most of the macros, variables, struct fields, and functions related to this chunk contain the term "large edges", e.g. write_graph_chunk_large_edges(). However, it's not a really great term, as the edges to the second and subsequent parents stored in this chunk are not any larger than the edges to the first and second parents stored in the "main" 'Commit Data' chunk. It's the number of edges, IOW number of parents, that is larger compared to non-merge and "regular" two-parent merge commits. And indeed, two functions in 'commit-graph.c' have a local variable called 'num_extra_edges' that refer to the same thing, and this "extra edges" term is much better at describing these edges. So let's rename all these references to "large edges" in macro, variable, function, etc. names to "extra edges". There is a GRAPH_OCTOPUS_EDGES_NEEDED macro as well; for the sake of consistency rename it to GRAPH_EXTRA_EDGES_NEEDED. We can do so safely without causing any incompatibility issues, because the term "large edges" doesn't come up in the file format itself in any form (the chunk's magic is {'E', 'D', 'G', 'E'}, there is no 'L' in there), but only in the specification text. The string "large edges", however, does come up in the output of 'git commit-graph read' and in tests looking at its input, but that command is explicitly documented as debugging aid, so we can change its output and the affected tests safely. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4 years ago
case GRAPH_CHUNKID_EXTRAEDGES:
if (graph->chunk_extra_edges)
chunk_repeated = 1;
else
commit-graph: rename "large edges" to "extra edges" The optional 'Large Edge List' chunk of the commit graph file stores parent information for commits with more than two parents, and the names of most of the macros, variables, struct fields, and functions related to this chunk contain the term "large edges", e.g. write_graph_chunk_large_edges(). However, it's not a really great term, as the edges to the second and subsequent parents stored in this chunk are not any larger than the edges to the first and second parents stored in the "main" 'Commit Data' chunk. It's the number of edges, IOW number of parents, that is larger compared to non-merge and "regular" two-parent merge commits. And indeed, two functions in 'commit-graph.c' have a local variable called 'num_extra_edges' that refer to the same thing, and this "extra edges" term is much better at describing these edges. So let's rename all these references to "large edges" in macro, variable, function, etc. names to "extra edges". There is a GRAPH_OCTOPUS_EDGES_NEEDED macro as well; for the sake of consistency rename it to GRAPH_EXTRA_EDGES_NEEDED. We can do so safely without causing any incompatibility issues, because the term "large edges" doesn't come up in the file format itself in any form (the chunk's magic is {'E', 'D', 'G', 'E'}, there is no 'L' in there), but only in the specification text. The string "large edges", however, does come up in the output of 'git commit-graph read' and in tests looking at its input, but that command is explicitly documented as debugging aid, so we can change its output and the affected tests safely. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4 years ago
graph->chunk_extra_edges = data + chunk_offset;
break;
}
if (chunk_repeated) {
error(_("chunk id %08x appears multiple times"), chunk_id);
free(graph);
return NULL;
}
if (last_chunk_id == GRAPH_CHUNKID_OIDLOOKUP)
{
graph->num_commits = (chunk_offset - last_chunk_offset)
/ graph->hash_len;
}
last_chunk_id = chunk_id;
last_chunk_offset = chunk_offset;
}
commit-graph: fix segfault on e.g. "git status" When core.commitGraph=true is set, various common commands now consult the commit graph. Because the commit-graph code is very trusting of its input data, it's possibly to construct a graph that'll cause an immediate segfault on e.g. "status" (and e.g. "log", "blame", ...). In some other cases where git immediately exits with a cryptic error about the graph being broken. The root cause of this is that while the "commit-graph verify" sub-command exhaustively verifies the graph, other users of the graph simply trust the graph, and will e.g. deference data found at certain offsets as pointers, causing segfaults. This change does the bare minimum to ensure that we don't segfault in the common fill_commit_in_graph() codepath called by e.g. setup_revisions(), to do this instrument the "commit-graph verify" tests to always check if "status" would subsequently segfault. This fixes the following tests which would previously segfault: not ok 50 - detect low chunk count not ok 51 - detect missing OID fanout chunk not ok 52 - detect missing OID lookup chunk not ok 53 - detect missing commit data chunk Those happened because with the commit-graph enabled setup_revisions() would eventually call fill_commit_in_graph(), where e.g. g->chunk_commit_data is used early as an offset (and will be 0x0). With this change we get far enough to detect that the graph is broken, and show an error instead. E.g.: $ git status; echo $? error: commit-graph is missing the Commit Data chunk 1 That also sucks, we should *warn* and not hard-fail "status" just because the commit-graph is corrupt, but fixing is left to a follow-up change. A side-effect of changing the reporting from graph_report() to error() is that we now have an "error: " prefix for these even for "commit-graph verify". Pseudo-diff before/after: $ git commit-graph verify -commit-graph is missing the Commit Data chunk +error: commit-graph is missing the Commit Data chunk Changing that is OK. Various errors it emits now early on are prefixed with "error: ", moving these over and changing the output doesn't break anything. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4 years ago
if (verify_commit_graph_lite(graph))
return NULL;
return graph;
}
static void prepare_commit_graph_one(struct repository *r, const char *obj_dir)
{
char *graph_name;
if (r->objects->commit_graph)
return;
graph_name = get_commit_graph_filename(obj_dir);
r->objects->commit_graph =
load_commit_graph_one(graph_name);
FREE_AND_NULL(graph_name);
}
/*
* Return 1 if commit_graph is non-NULL, and 0 otherwise.
*
* On the first invocation, this function attemps to load the commit
* graph if the_repository is configured to have one.
*/
static int prepare_commit_graph(struct repository *r)
{
struct object_directory *odb;
int config_value;
if (r->objects->commit_graph_attempted)
return !!r->objects->commit_graph;
r->objects->commit_graph_attempted = 1;
if (!git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
(repo_config_get_bool(r, "core.commitgraph", &config_value) ||
!config_value))
/*
* This repository is not configured to use commit graphs, so
* do not load one. (But report commit_graph_attempted anyway
* so that commit graph loading is not attempted again for this
* repository.)
*/
return 0;
if (!commit_graph_compatible(r))
return 0;
prepare_alt_odb(r);
sha1-file: use an object_directory for the main object dir Our handling of alternate object directories is needlessly different from the main object directory. As a result, many places in the code basically look like this: do_something(r->objects->objdir); for (odb = r->objects->alt_odb_list; odb; odb = odb->next) do_something(odb->path); That gets annoying when do_something() is non-trivial, and we've resorted to gross hacks like creating fake alternates (see find_short_object_filename()). Instead, let's give each raw_object_store a unified list of object_directory structs. The first will be the main store, and everything after is an alternate. Very few callers even care about the distinction, and can just loop over the whole list (and those who care can just treat the first element differently). A few observations: - we don't need r->objects->objectdir anymore, and can just mechanically convert that to r->objects->odb->path - object_directory's path field needs to become a real pointer rather than a FLEX_ARRAY, in order to fill it with expand_base_dir() - we'll call prepare_alt_odb() earlier in many functions (i.e., outside of the loop). This may result in us calling it even when our function would be satisfied looking only at the main odb. But this doesn't matter in practice. It's not a very expensive operation in the first place, and in the majority of cases it will be a noop. We call it already (and cache its results) in prepare_packed_git(), and we'll generally check packs before loose objects. So essentially every program is going to call it immediately once per program. Arguably we should just prepare_alt_odb() immediately upon setting up the repository's object directory, which would save us sprinkling calls throughout the code base (and forgetting to do so has been a source of subtle bugs in the past). But I've stopped short of that here, since there are already a lot of other moving parts in this patch. - Most call sites just get shorter. The check_and_freshen() functions are an exception, because they have entry points to handle local and nonlocal directories separately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4 years ago
for (odb = r->objects->odb;
!r->objects->commit_graph && odb;
odb = odb->next)
prepare_commit_graph_one(r, odb->path);
return !!r->objects->commit_graph;
}
commit-reach: use can_all_from_reach The is_descendant_of method previously used in_merge_bases() to check if the commit can reach any of the commits in the provided list. This had two performance problems: 1. The performance is quadratic in worst-case. 2. A single in_merge_bases() call requires walking beyond the target commit in order to find the full set of boundary commits that may be merge-bases. The can_all_from_reach method avoids this quadratic behavior and can limit the search beyond the target commits using generation numbers. It requires a small prototype adjustment to stop using commit-date as a cutoff, as that optimization is no longer appropriate here. Since in_merge_bases() uses paint_down_to_common(), is_descendant_of() naturally found cutoffs to avoid walking the entire commit graph. Since we want to always return the correct result, we cannot use the min_commit_date cutoff in can_all_from_reach. We then rely on generation numbers to provide the cutoff. Since not all repos will have a commit-graph file, nor will we always have generation numbers computed for a commit-graph file, create a new method, generation_numbers_enabled(), that checks for a commit-graph file and sees if the first commit in the file has a non-zero generation number. In the case that we do not have generation numbers, use the old logic for is_descendant_of(). Performance was meausured on a copy of the Linux repository using the 'test-tool reach is_descendant_of' command using this input: A:v4.9 X:v4.10 X:v4.11 X:v4.12 X:v4.13 X:v4.14 X:v4.15 X:v4.16 X:v4.17 X.v3.0 Note that this input is tailored to demonstrate the quadratic nature of the previous method, as it will compute merge-bases for v4.9 versus all of the later versions before checking against v4.1. Before: 0.26 s After: 0.21 s Since we previously used the is_descendant_of method in the ref_newer method, we also measured performance there using 'test-tool reach ref_newer' with this input: A:v4.9 B:v3.19 Before: 0.10 s After: 0.08 s By adding a new commit with parent v3.19, we test the non-reachable case of ref_newer: Before: 0.09 s After: 0.08 s Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4 years ago
int generation_numbers_enabled(struct repository *r)
{
uint32_t first_generation;
struct commit_graph *g;
if (!prepare_commit_graph(r))
return 0;
g = r->objects->commit_graph;
if (!g->num_commits)
return 0;
first_generation = get_be32(g->chunk_commit_data +
g->hash_len + 8) >> 2;
return !!first_generation;
}
void close_commit_graph(struct repository *r)
{
free_commit_graph(r->objects->commit_graph);
r->objects->commit_graph = NULL;
}
static int bsearch_graph(struct commit_graph *g, struct object_id *oid, uint32_t *pos)
{
return bsearch_hash(oid->hash, g->chunk_oid_fanout,
g->chunk_oid_lookup, g->hash_len, pos);
}
static struct commit_list **insert_parent_or_die(struct repository *r,
struct commit_graph *g,
uint64_t pos,
struct commit_list **pptr)
{
struct commit *c;
struct object_id oid;
if (pos >= g->num_commits)
die("invalid parent position %"PRIu64, pos);
hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
c = lookup_commit(r, &oid);
if (!c)
die(_("could not find commit %s"), oid_to_hex(&oid));
c->graph_pos = pos;
return &commit_list_insert(c, pptr)->next;
}
static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, uint32_t pos)
{
const unsigned char *commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * pos;
item->graph_pos = pos;
item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
}
static int fill_commit_in_graph(struct repository *r,
struct commit *item,
struct commit_graph *g, uint32_t pos)
{
uint32_t edge_value;
uint32_t *parent_data_ptr;
uint64_t date_low, date_high;
struct commit_list **pptr;
const unsigned char *commit_data = g->chunk_commit_data + (g->hash_len + 16) * pos;
item->object.parsed = 1;
item->graph_pos = pos;
item->maybe_tree = NULL;
date_high = get_be32(commit_data + g->hash_len + 8) & 0x3;
date_low = get_be32(commit_data + g->hash_len + 12);
item->date = (timestamp_t)((date_high << 32) | date_low);
item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
pptr = &item->parents;
edge_value = get_be32(commit_data + g->hash_len);
if (edge_value == GRAPH_PARENT_NONE)
return 1;
pptr = insert_parent_or_die(r, g, edge_value, pptr);
edge_value = get_be32(commit_data + g->hash_len + 4);
if (edge_value == GRAPH_PARENT_NONE)
return 1;
commit-graph: rename "large edges" to "extra edges" The optional 'Large Edge List' chunk of the commit graph file stores parent information for commits with more than two parents, and the names of most of the macros, variables, struct fields, and functions related to this chunk contain the term "large edges", e.g. write_graph_chunk_large_edges(). However, it's not a really great term, as the edges to the second and subsequent parents stored in this chunk are not any larger than the edges to the first and second parents stored in the "main" 'Commit Data' chunk. It's the number of edges, IOW number of parents, that is larger compared to non-merge and "regular" two-parent merge commits. And indeed, two functions in 'commit-graph.c' have a local variable called 'num_extra_edges' that refer to the same thing, and this "extra edges" term is much better at describing these edges. So let's rename all these references to "large edges" in macro, variable, function, etc. names to "extra edges". There is a GRAPH_OCTOPUS_EDGES_NEEDED macro as well; for the sake of consistency rename it to GRAPH_EXTRA_EDGES_NEEDED. We can do so safely without causing any incompatibility issues, because the term "large edges" doesn't come up in the file format itself in any form (the chunk's magic is {'E', 'D', 'G', 'E'}, there is no 'L' in there), but only in the specification text. The string "large edges", however, does come up in the output of 'git commit-graph read' and in tests looking at its input, but that command is explicitly documented as debugging aid, so we can change its output and the affected tests safely. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4 years ago
if (!(edge_value & GRAPH_EXTRA_EDGES_NEEDED)) {
pptr = insert_parent_or_die(r, g, edge_value, pptr);
return 1;
}
commit-graph: rename "large edges" to "extra edges" The optional 'Large Edge List' chunk of the commit graph file stores parent information for commits with more than two parents, and the names of most of the macros, variables, struct fields, and functions related to this chunk contain the term "large edges", e.g. write_graph_chunk_large_edges(). However, it's not a really great term, as the edges to the second and subsequent parents stored in this chunk are not any larger than the edges to the first and second parents stored in the "main" 'Commit Data' chunk. It's the number of edges, IOW number of parents, that is larger compared to non-merge and "regular" two-parent merge commits. And indeed, two functions in 'commit-graph.c' have a local variable called 'num_extra_edges' that refer to the same thing, and this "extra edges" term is much better at describing these edges. So let's rename all these references to "large edges" in macro, variable, function, etc. names to "extra edges". There is a GRAPH_OCTOPUS_EDGES_NEEDED macro as well; for the sake of consistency rename it to GRAPH_EXTRA_EDGES_NEEDED. We can do so safely without causing any incompatibility issues, because the term "large edges" doesn't come up in the file format itself in any form (the chunk's magic is {'E', 'D', 'G', 'E'}, there is no 'L' in there), but only in the specification text. The string "large edges", however, does come up in the output of 'git commit-graph read' and in tests looking at its input, but that command is explicitly documented as debugging aid, so we can change its output and the affected tests safely. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4 years ago
parent_data_ptr = (uint32_t*)(g->chunk_extra_edges +
4 * (uint64_t)(edge_value & GRAPH_EDGE_LAST_MASK));
do {
edge_value = get_be32(parent_data_ptr);
pptr = insert_parent_or_die(r, g,
edge_value & GRAPH_EDGE_LAST_MASK,
pptr);
parent_data_ptr++;
} while (!(edge_value & GRAPH_LAST_EDGE));
return 1;
}
static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t *pos)
{
if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {
*pos = item->graph_pos;
return 1;
} else {
return bsearch_graph(g, &(item->object.oid), pos);
}
}
static int parse_commit_in_graph_one(struct repository *r,
struct commit_graph *g,
struct commit *item)
{
uint32_t pos;
if (item->object.parsed)
return 1;
if (find_commit_in_graph(item, g, &pos))
return fill_commit_in_graph(r, item, g, pos);
return 0;
}
int parse_commit_in_graph(struct repository *r, struct commit *item)
{
if (!prepare_commit_graph(r))
return 0;
return parse_commit_in_graph_one(r, r->objects->commit_graph, item);
}
void load_commit_graph_info(struct repository *r, struct commit *item)
{
uint32_t pos;
if (!prepare_commit_graph(r))
return;
if (find_commit_in_graph(item, r->objects->commit_graph, &pos))
fill_commit_graph_info(item, r->objects->commit_graph, pos);
}
static struct tree *load_tree_for_commit(struct repository *r,
struct commit_graph *g,
struct commit *c)
{
struct object_id oid;
const unsigned char *commit_data = g->chunk_commit_data +
GRAPH_DATA_WIDTH * (c->graph_pos);
hashcpy(oid.hash, commit_data);
c->maybe_tree = lookup_tree(r, &oid);
return c->maybe_tree;
}
static struct tree *get_commit_tree_in_graph_one(struct repository *r,
struct commit_graph *g,
const struct commit *c)
{
if (c->maybe_tree)
return c->maybe_tree;
if (c->graph_pos == COMMIT_NOT_FROM_GRAPH)
BUG("get_commit_tree_in_graph_one called from non-commit-graph commit");
return load_tree_for_commit(r, g, (struct commit *)c);
}
struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit *c)
{
return get_commit_tree_in_graph_one(r, r->objects->commit_graph, c);
}
static void write_graph_chunk_fanout(struct hashfile *f,
struct commit **commits,
int nr_commits,
struct progress *progress,
uint64_t *progress_cnt)
{
int i, count = 0;
struct commit **list = commits;
/*
* Write the first-level table (the list is sorted,
* but we use a 256-entry lookup to be able to avoid
* having to do eight extra binary search iterations).
*/
for (i = 0; i < 256; i++) {
while (count < nr_commits) {
if ((*list)->object.oid.hash[0] != i)
break;
display_progress(progress, ++*progress_cnt);
count++;
list++;
}
hashwrite_be32(f, count);
}
}
static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
struct commit **commits, int nr_commits,
struct progress *progress,
uint64_t *progress_cnt)
{
struct commit **list = commits;
int count;
for (count = 0; count < nr_commits; count++, list++) {
display_progress(progress, ++*progress_cnt);
hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
}
}
static const unsigned char *commit_to_sha1(size_t index, void *table)
{
struct commit **commits = table;
return commits[index]->object.oid.hash;
}
static void write_graph_chunk_data(struct hashfile *f, int hash_len,
struct commit **commits, int nr_commits,
struct progress *progress,
uint64_t *progress_cnt)
{
struct commit **list = commits;
struct commit **last = commits + nr_commits;
uint32_t num_extra_edges = 0;
while (list < last) {
struct commit_list *parent;
int edge_value;