Git Source Code Mirror - This is a publish-only repository and all pull requests are ignored. Please follow Documentation/SubmittingPatches procedure for any of your improvements. https://git-scm.com/
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
git/commit-graph.c

2482 lines
64 KiB

#include "git-compat-util.h"
#include "config.h"
#include "lockfile.h"
#include "pack.h"
#include "packfile.h"
#include "commit.h"
#include "object.h"
#include "refs.h"
#include "revision.h"
#include "sha1-lookup.h"
#include "commit-graph.h"
#include "object-store.h"
#include "alloc.h"
#include "hashmap.h"
#include "replace-object.h"
commit-graph write: add progress output Before this change the "commit-graph write" command didn't report any progress. On my machine this command takes more than 10 seconds to write the graph for linux.git, and around 1m30s on the 2015-04-03-1M-git.git[1] test repository (a test case for a large monorepository). Furthermore, since the gc.writeCommitGraph setting was added in d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27), there was no indication at all from a "git gc" run that anything was different. This why one of the progress bars being added here uses start_progress() instead of start_delayed_progress(), so that it's guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git repository: $ git -c gc.writeCommitGraph=true gc Enumerating objects: 2821, done. [...] Computing commit graph generation numbers: 100% (867/867), done. On larger repositories, such as linux.git the delayed progress bar(s) will kick in, and we'll show what's going on instead of, as was previously happening, printing nothing while we write the graph: $ git -c gc.writeCommitGraph=true gc [...] Annotating commits in commit graph: 1565573, done. Computing commit graph generation numbers: 100% (782484/782484), done. Note that here we don't show "Finding commits for commit graph", this is because under "git gc" we seed the search with the commit references in the repository, and that set is too small to show any progress, but would e.g. on a smaller repo such as git.git with --stdin-commits: $ git rev-list --all | git -c gc.writeCommitGraph=true write --stdin-commits Finding commits for commit graph: 100% (162576/162576), done. Computing commit graph generation numbers: 100% (162576/162576), done. With --stdin-packs we don't show any estimation of how much is left to do. This is because we might be processing more than one pack. We could be less lazy here and show progress, either by detecting that we're only processing one pack, or by first looping over the packs to discover how many commits they have. I don't see the point in doing that work. So instead we get (on 2015-04-03-1M-git.git): $ echo pack-<HASH>.idx | git -c gc.writeCommitGraph=true --exec-path=$PWD commit-graph write --stdin-packs Finding commits for commit graph: 13064614, done. Annotating commits in commit graph: 3001341, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. No GC mode uses --stdin-packs. It's what they use at Microsoft to manually compute the generation numbers for their collection of large packs which are never coalesced. The reason we need a "report_progress" variable passed down from "git gc" is so that we don't report this output when we're running in the process "git gc --auto" detaches from the terminal. Since we write the commit graph from the "git gc" process itself (as opposed to what we do with say the "git repack" phase), we'd end up writing the output to .git/gc.log and reporting it to the user next time as part of the "The last gc run reported the following[...]" error, see 329e6e8794 ("gc: save log from daemonized gc --auto and print it next time", 2015-09-19). So we must keep track of whether or not we're running in that demonized mode, and if so print no progress. See [2] and subsequent replies for a discussion of an approach not taken in compute_generation_numbers(). I.e. we're saying "Computing commit graph generation numbers", even though on an established history we're mostly skipping over all the work we did in the past. This is similar to the white lie we tell in the "Writing objects" phase (not all are objects being written). Always showing progress is considered more important than accuracy. I.e. on a repository like 2015-04-03-1M-git.git we'd hang for 6 seconds with no output on the second "git gc" if no changes were made to any objects in the interim if we'd take the approach in [2]. 1. https://github.com/avar/2015-04-03-1M-git 2. <c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com> (https://public-inbox.org/git/c6960252-c095-fb2b-e0bc-b1e6bb261614@gmail.com/) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4 years ago
#include "progress.h"
#include "bloom.h"
#include "commit-slab.h"
#include "shallow.h"
#include "json-writer.h"
#include "trace2.h"
void git_test_write_commit_graph_or_die(void)
{
int flags = 0;
if (!git_env_bool(GIT_TEST_COMMIT_GRAPH, 0))
return;
if (git_env_bool(GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS, 0))
flags = COMMIT_GRAPH_WRITE_BLOOM_FILTERS;
if (write_commit_graph_reachable(the_repository->objects->odb,
flags, NULL))
die("failed to write commit-graph under GIT_TEST_COMMIT_GRAPH");
}
#define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
#define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
#define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
#define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
commit-graph: rename "large edges" to "extra edges" The optional 'Large Edge List' chunk of the commit graph file stores parent information for commits with more than two parents, and the names of most of the macros, variables, struct fields, and functions related to this chunk contain the term "large edges", e.g. write_graph_chunk_large_edges(). However, it's not a really great term, as the edges to the second and subsequent parents stored in this chunk are not any larger than the edges to the first and second parents stored in the "main" 'Commit Data' chunk. It's the number of edges, IOW number of parents, that is larger compared to non-merge and "regular" two-parent merge commits. And indeed, two functions in 'commit-graph.c' have a local variable called 'num_extra_edges' that refer to the same thing, and this "extra edges" term is much better at describing these edges. So let's rename all these references to "large edges" in macro, variable, function, etc. names to "extra edges". There is a GRAPH_OCTOPUS_EDGES_NEEDED macro as well; for the sake of consistency rename it to GRAPH_EXTRA_EDGES_NEEDED. We can do so safely without causing any incompatibility issues, because the term "large edges" doesn't come up in the file format itself in any form (the chunk's magic is {'E', 'D', 'G', 'E'}, there is no 'L' in there), but only in the specification text. The string "large edges", however, does come up in the output of 'git commit-graph read' and in tests looking at its input, but that command is explicitly documented as debugging aid, so we can change its output and the affected tests safely. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4 years ago
#define GRAPH_CHUNKID_EXTRAEDGES 0x45444745 /* "EDGE" */
#define GRAPH_CHUNKID_BLOOMINDEXES 0x42494458 /* "BIDX" */
#define GRAPH_CHUNKID_BLOOMDATA 0x42444154 /* "BDAT" */
#define GRAPH_CHUNKID_BASE 0x42415345 /* "BASE" */
#define MAX_NUM_CHUNKS 7
#define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16)
#define GRAPH_VERSION_1 0x1
#define GRAPH_VERSION GRAPH_VERSION_1
commit-graph: rename "large edges" to "extra edges" The optional 'Large Edge List' chunk of the commit graph file stores parent information for commits with more than two parents, and the names of most of the macros, variables, struct fields, and functions related to this chunk contain the term "large edges", e.g. write_graph_chunk_large_edges(). However, it's not a really great term, as the edges to the second and subsequent parents stored in this chunk are not any larger than the edges to the first and second parents stored in the "main" 'Commit Data' chunk. It's the number of edges, IOW number of parents, that is larger compared to non-merge and "regular" two-parent merge commits. And indeed, two functions in 'commit-graph.c' have a local variable called 'num_extra_edges' that refer to the same thing, and this "extra edges" term is much better at describing these edges. So let's rename all these references to "large edges" in macro, variable, function, etc. names to "extra edges". There is a GRAPH_OCTOPUS_EDGES_NEEDED macro as well; for the sake of consistency rename it to GRAPH_EXTRA_EDGES_NEEDED. We can do so safely without causing any incompatibility issues, because the term "large edges" doesn't come up in the file format itself in any form (the chunk's magic is {'E', 'D', 'G', 'E'}, there is no 'L' in there), but only in the specification text. The string "large edges", however, does come up in the output of 'git commit-graph read' and in tests looking at its input, but that command is explicitly documented as debugging aid, so we can change its output and the affected tests safely. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4 years ago
#define GRAPH_EXTRA_EDGES_NEEDED 0x80000000
#define GRAPH_EDGE_LAST_MASK 0x7fffffff
#define GRAPH_PARENT_NONE 0x70000000
#define GRAPH_LAST_EDGE 0x80000000
#define GRAPH_HEADER_SIZE 8
#define GRAPH_FANOUT_SIZE (4 * 256)
#define GRAPH_CHUNKLOOKUP_WIDTH 12
#define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
commit-graph: fix writing first commit-graph during fetch The previous commit includes a failing test for an issue around fetch.writeCommitGraph and fetching in a repo with a submodule. Here, we fix that bug and set the test to "test_expect_success". The problem arises with this set of commands when the remote repo at <url> has a submodule. Note that --recurse-submodules is not needed to demonstrate the bug. $ git clone <url> test $ cd test $ git -c fetch.writeCommitGraph=true fetch origin Computing commit graph generation numbers: 100% (12/12), done. BUG: commit-graph.c:886: missing parent <hash1> for commit <hash2> Aborted (core dumped) As an initial fix, I converted the code in builtin/fetch.c that calls write_commit_graph_reachable() to instead launch a "git commit-graph write --reachable --split" process. That code worked, but is not how we want the feature to work long-term. That test did demonstrate that the issue must be something to do with internal state of the 'git fetch' process. The write_commit_graph() method in commit-graph.c ensures the commits we plan to write are "closed under reachability" using close_reachable(). This method walks from the input commits, and uses the UNINTERESTING flag to mark which commits have already been visited. This allows the walk to take O(N) time, where N is the number of commits, instead of O(P) time, where P is the number of paths. (The number of paths can be exponential in the number of commits.) However, the UNINTERESTING flag is used in lots of places in the codebase. This flag usually means some barrier to stop a commit walk, such as in revision-walking to compare histories. It is not often cleared after the walk completes because the starting points of those walks do not have the UNINTERESTING flag, and clear_commit_marks() would stop immediately. This is happening during a 'git fetch' call with a remote. The fetch negotiation is comparing the remote refs with the local refs and marking some commits as UNINTERESTING. I tested running clear_commit_marks_many() to clear the UNINTERESTING flag inside close_reachable(), but the tips did not have the flag, so that did nothing. It turns out that the calculate_changed_submodule_paths() method is at fault. Thanks, Peff, for pointing out this detail! More specifically, for each submodule, the collect_changed_submodules() runs a revision walk to essentially do file-history on the list of submodules. That revision walk marks commits UNININTERESTING if they are simplified away by not changing the submodule. Instead, I finally arrived on the conclusion that I should use a flag that is not used in any other part of the code. In commit-reach.c, a number of flags were defined for commit walk algorithms. The REACHABLE flag seemed like it made the most sense, and it seems it was not actually used in the file. The REACHABLE flag was used in early versions of commit-reach.c, but was removed by 4fbcca4 (commit-reach: make can_all_from_reach... linear, 2018-07-20). Add the REACHABLE flag to commit-graph.c and use it instead of UNINTERESTING in close_reachable(). This fixes the bug in manual testing. Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de> Helped-by: Jeff King <peff@peff.net> Helped-by: Szeder Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
3 years ago
/* Remember to update object flag allocation in object.h */
#define REACHABLE (1u<<15)
commit-graph: add a slab to store topological levels In a later commit we will introduce corrected commit date as the generation number v2. Corrected commit dates will be stored in the new seperate Generation Data chunk. However, to ensure backwards compatibility with "Old" Git we need to continue to write generation number v1 (topological levels) to the commit data chunk. Thus, we need to compute and store both versions of generation numbers to write the commit-graph file. Therefore, let's introduce a commit-slab `topo_level_slab` to store topological levels; corrected commit date will be stored in the member `generation` of struct commit_graph_data. The macros `GENERATION_NUMBER_INFINITY` and `GENERATION_NUMBER_ZERO` mark commits not in the commit-graph file and commits written by a version of Git that did not compute generation numbers respectively. Generation numbers are computed identically for both kinds of commits. A "slab-miss" should return `GENERATION_NUMBER_INFINITY` as the commit is not in the commit-graph file. However, since the slab is zero-initialized, it returns 0 (or rather `GENERATION_NUMBER_ZERO`). Thus, we no longer need to check if the topological level of a commit is `GENERATION_NUMBER_INFINITY`. We will add a pointer to the slab in `struct write_commit_graph_context` and `struct commit_graph` to populate the slab in `fill_commit_graph_info` if the commit has a pre-computed topological level as in case of split commit-graphs. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2 years ago
define_commit_slab(topo_level_slab, uint32_t);
/* Keep track of the order in which commits are added to our list. */
define_commit_slab(commit_pos, int);
static struct commit_pos commit_pos = COMMIT_SLAB_INIT(1, commit_pos);
static void set_commit_pos(struct repository *r, const struct object_id *oid)
{
static int32_t max_pos;
struct commit *commit = lookup_commit(r, oid);
if (!commit)
return; /* should never happen, but be lenient */
*commit_pos_at(&commit_pos, commit) = max_pos++;
}
static int commit_pos_cmp(const void *va, const void *vb)
{
const struct commit *a = *(const struct commit **)va;
const struct commit *b = *(const struct commit **)vb;
return commit_pos_at(&commit_pos, a) -
commit_pos_at(&commit_pos, b);
}
define_commit_slab(commit_graph_data_slab, struct commit_graph_data);
static struct commit_graph_data_slab commit_graph_data_slab =
COMMIT_SLAB_INIT(1, commit_graph_data_slab);
uint32_t commit_graph_position(const struct commit *c)
{
struct commit_graph_data *data =
commit_graph_data_slab_peek(&commit_graph_data_slab, c);
return data ? data->graph_pos : COMMIT_NOT_FROM_GRAPH;
}
uint32_t commit_graph_generation(const struct commit *c)
{
struct commit_graph_data *data =
commit_graph_data_slab_peek(&commit_graph_data_slab, c);
if (!data)
return GENERATION_NUMBER_INFINITY;
else if (data->graph_pos == COMMIT_NOT_FROM_GRAPH)
return GENERATION_NUMBER_INFINITY;
return data->generation;
}
static struct commit_graph_data *commit_graph_data_at(const struct commit *c)
{
unsigned int i, nth_slab;
struct commit_graph_data *data =
commit_graph_data_slab_peek(&commit_graph_data_slab, c);
if (data)
return data;
nth_slab = c->index / commit_graph_data_slab.slab_size;
data = commit_graph_data_slab_at(&commit_graph_data_slab, c);
/*
* commit-slab initializes elements with zero, overwrite this with
* COMMIT_NOT_FROM_GRAPH for graph_pos.
*
* We avoid initializing generation with checking if graph position
* is not COMMIT_NOT_FROM_GRAPH.
*/
for (i = 0; i < commit_graph_data_slab.slab_size; i++) {
commit_graph_data_slab.slab[nth_slab][i].graph_pos =
COMMIT_NOT_FROM_GRAPH;
}
return data;
}
commit-graph: fix regression when computing Bloom filters Before computing Bloom filters, the commit-graph machinery uses commit_gen_cmp to sort commits by generation order for improved diff performance. 3d11275505 (commit-graph: examine commits by generation number, 2020-03-30) claims that this sort can reduce the time spent to compute Bloom filters by nearly half. But since c49c82aa4c (commit: move members graph_pos, generation to a slab, 2020-06-17), this optimization is broken, since asking for a 'commit_graph_generation()' directly returns GENERATION_NUMBER_INFINITY while writing. Not all hope is lost, though: 'commit_gen_cmp()' falls back to comparing commits by their date when they have equal generation number, and so since c49c82aa4c is purely a date comparison function. This heuristic is good enough that we don't seem to loose appreciable performance while computing Bloom filters. Applying this patch (compared with v2.30.0) speeds up computing Bloom filters by factors ranging from 0.40% to 5.19% on various repositories [1]. So, avoid the useless 'commit_graph_generation()' while writing by instead accessing the slab directly. This returns the newly-computed generation numbers, and allows us to avoid the heuristic by directly comparing generation numbers. [1]: https://lore.kernel.org/git/20210105094535.GN8396@szeder.dev/ Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2 years ago
/*
* Should be used only while writing commit-graph as it compares
* generation value of commits by directly accessing commit-slab.
*/
static int commit_gen_cmp(const void *va, const void *vb)
{
const struct commit *a = *(const struct commit **)va;
const struct commit *b = *(const struct commit **)vb;
commit-graph: fix regression when computing Bloom filters Before computing Bloom filters, the commit-graph machinery uses commit_gen_cmp to sort commits by generation order for improved diff performance. 3d11275505 (commit-graph: examine commits by generation number, 2020-03-30) claims that this sort can reduce the time spent to compute Bloom filters by nearly half. But since c49c82aa4c (commit: move members graph_pos, generation to a slab, 2020-06-17), this optimization is broken, since asking for a 'commit_graph_generation()' directly returns GENERATION_NUMBER_INFINITY while writing. Not all hope is lost, though: 'commit_gen_cmp()' falls back to comparing commits by their date when they have equal generation number, and so since c49c82aa4c is purely a date comparison function. This heuristic is good enough that we don't seem to loose appreciable performance while computing Bloom filters. Applying this patch (compared with v2.30.0) speeds up computing Bloom filters by factors ranging from 0.40% to 5.19% on various repositories [1]. So, avoid the useless 'commit_graph_generation()' while writing by instead accessing the slab directly. This returns the newly-computed generation numbers, and allows us to avoid the heuristic by directly comparing generation numbers. [1]: https://lore.kernel.org/git/20210105094535.GN8396@szeder.dev/ Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2 years ago
uint32_t generation_a = commit_graph_data_at(a)->generation;
uint32_t generation_b = commit_graph_data_at(b)->generation;
/* lower generation commits first */
if (generation_a < generation_b)
return -1;
else if (generation_a > generation_b)
return 1;
/* use date as a heuristic when generations are equal */
if (a->date < b->date)
return -1;
else if (a->date > b->date)
return 1;
return 0;
}
char *get_commit_graph_filename(struct object_directory *obj_dir)
{
return xstrfmt("%s/info/commit-graph", obj_dir->path);
}
commit-graph.c: remove path normalization, comparison As of the previous patch, all calls to 'commit-graph.c' functions which perform path normalization (for e.g., 'get_commit_graph_filename()') are of the form 'ctx->odb->path', which is always in normalized form. Now that there are no callers passing non-normalized paths to these functions, ensure that future callers are bound by the same restrictions by making these functions take a 'struct object_directory *' instead of a 'const char *'. To match, replace all calls with arguments of the form 'ctx->odb->path' with 'ctx->odb' To recover the path, functions that perform path manipulation simply use 'odb->path'. Further, avoid string comparisons with arguments of the form 'odb->path', and instead prefer raw pointer comparisons, which accomplish the same effect, but are far less brittle. This has a pleasant side-effect of making these functions much more robust to paths that cannot be normalized by 'normalize_path_copy()', i.e., because they are outside of the current working directory. For example, prior to this patch, Valgrind reports that the following uninitialized memory read [1]: $ ( cd t && GIT_DIR=../.git valgrind git rev-parse HEAD^ ) because 'normalize_path_copy()' can't normalize '../.git' (since it's relative to but above of the current working directory) [2]. By using a 'struct object_directory *' directly, 'get_commit_graph_filename()' does not need to normalize, because all paths are relative to the current working directory since they are always read from the '->path' of an object directory. [1]: https://lore.kernel.org/git/20191027042116.GA5801@sigill.intra.peff.net. [2]: The bug here is that 'get_commit_graph_filename()' returns the result of 'normalize_path_copy()' without checking the return value. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
3 years ago
static char *get_split_graph_filename(struct object_directory *odb,
const char *oid_hex)
{
commit-graph.c: remove path normalization, comparison As of the previous patch, all calls to 'commit-graph.c' functions which perform path normalization (for e.g., 'get_commit_graph_filename()') are of the form 'ctx->odb->path', which is always in normalized form. Now that there are no callers passing non-normalized paths to these functions, ensure that future callers are bound by the same restrictions by making these functions take a 'struct object_directory *' instead of a 'const char *'. To match, replace all calls with arguments of the form 'ctx->odb->path' with 'ctx->odb' To recover the path, functions that perform path manipulation simply use 'odb->path'. Further, avoid string comparisons with arguments of the form 'odb->path', and instead prefer raw pointer comparisons, which accomplish the same effect, but are far less brittle. This has a pleasant side-effect of making these functions much more robust to paths that cannot be normalized by 'normalize_path_copy()', i.e., because they are outside of the current working directory. For example, prior to this patch, Valgrind reports that the following uninitialized memory read [1]: $ ( cd t && GIT_DIR=../.git valgrind git rev-parse HEAD^ ) because 'normalize_path_copy()' can't normalize '../.git' (since it's relative to but above of the current working directory) [2]. By using a 'struct object_directory *' directly, 'get_commit_graph_filename()' does not need to normalize, because all paths are relative to the current working directory since they are always read from the '->path' of an object directory. [1]: https://lore.kernel.org/git/20191027042116.GA5801@sigill.intra.peff.net. [2]: The bug here is that 'get_commit_graph_filename()' returns the result of 'normalize_path_copy()' without checking the return value. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
3 years ago
return xstrfmt("%s/info/commit-graphs/graph-%s.graph", odb->path,
oid_hex);
}
char *get_commit_graph_chain_filename(struct object_directory *odb)
{
commit-graph.c: remove path normalization, comparison As of the previous patch, all calls to 'commit-graph.c' functions which perform path normalization (for e.g., 'get_commit_graph_filename()') are of the form 'ctx->odb->path', which is always in normalized form. Now that there are no callers passing non-normalized paths to these functions, ensure that future callers are bound by the same restrictions by making these functions take a 'struct object_directory *' instead of a 'const char *'. To match, replace all calls with arguments of the form 'ctx->odb->path' with 'ctx->odb' To recover the path, functions that perform path manipulation simply use 'odb->path'. Further, avoid string comparisons with arguments of the form 'odb->path', and instead prefer raw pointer comparisons, which accomplish the same effect, but are far less brittle. This has a pleasant side-effect of making these functions much more robust to paths that cannot be normalized by 'normalize_path_copy()', i.e., because they are outside of the current working directory. For example, prior to this patch, Valgrind reports that the following uninitialized memory read [1]: $ ( cd t && GIT_DIR=../.git valgrind git rev-parse HEAD^ ) because 'normalize_path_copy()' can't normalize '../.git' (since it's relative to but above of the current working directory) [2]. By using a 'struct object_directory *' directly, 'get_commit_graph_filename()' does not need to normalize, because all paths are relative to the current working directory since they are always read from the '->path' of an object directory. [1]: https://lore.kernel.org/git/20191027042116.GA5801@sigill.intra.peff.net. [2]: The bug here is that 'get_commit_graph_filename()' returns the result of 'normalize_path_copy()' without checking the return value. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
3 years ago
return xstrfmt("%s/info/commit-graphs/commit-graph-chain", odb->path);
}
static uint8_t oid_version(void)
{
commit-graph: use the "hash version" byte The commit-graph format reserved a byte among the header of the file to store a "hash version". During the SHA-256 work, this was not modified because file formats are not necessarily intended to work across hash versions. If a repository has SHA-256 as its hash algorithm, it automatically up-shifts the lengths of object names in all necessary formats. However, since we have this byte available for adjusting the version, we can make the file formats more obviously incompatible instead of relying on other context from the repository. Update the oid_version() method in commit-graph.c to add a new value, 2, for sha-256. This automatically writes the new value in a SHA-256 repository _and_ verifies the value is correct. This is a breaking change relative to the current 'master' branch since 092b677 (Merge branch 'bc/sha-256-cvs-svn-updates', 2020-08-13) but it is not breaking relative to any released version of Git. The test impact is relatively minor: the output of 'test-tool read-graph' lists the header information, so those instances of '1' need to be replaced with a variable determined by GIT_TEST_DEFAULT_HASH. A more careful test is added that specifically creates a repository of each type then swaps the commit-graph files. The important value here is that the "git log" command succeeds while writing a message to stderr. Helped-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2 years ago
switch (hash_algo_by_ptr(the_hash_algo)) {
case GIT_HASH_SHA1:
return 1;
case GIT_HASH_SHA256:
return 2;
default:
die(_("invalid hash version"));
}
}
static struct commit_graph *alloc_commit_graph(void)
{
struct commit_graph *g = xcalloc(1, sizeof(*g));
return g;
}
extern int read_replace_refs;
static int commit_graph_compatible(struct repository *r)
{
if (!r->gitdir)
return 0;
if (read_replace_refs) {
prepare_replace_object(r);
if (hashmap_get_size(&r->objects->replace_map->map))
return 0;
}
prepare_commit_graft(r);
commit.c: don't persist substituted parents when unshallowing Since 37b9dcabfc (shallow.c: use '{commit,rollback}_shallow_file', 2020-04-22), Git knows how to reset stat-validity checks for the $GIT_DIR/shallow file, allowing it to change between a shallow and non-shallow state in the same process (e.g., in the case of 'git fetch --unshallow'). However, when $GIT_DIR/shallow changes, Git does not alter or remove any grafts (nor substituted parents) in memory. This comes up in a "git fetch --unshallow" with fetch.writeCommitGraph set to true. Ordinarily in a shallow repository (and before 37b9dcabfc, even in this case), commit_graph_compatible() would return false, indicating that the repository should not be used to write a commit-graphs (since commit-graph files cannot represent a shallow history). But since 37b9dcabfc, in an --unshallow operation that check succeeds. Thus even though the repository isn't shallow any longer (that is, we have all of the objects), the in-core representation of those objects still has munged parents at the shallow boundaries. When the commit-graph write proceeds, we use the incorrect parentage, producing wrong results. There are two ways for a user to work around this: either (1) set 'fetch.writeCommitGraph' to 'false', or (2) drop the commit-graph after unshallowing. One way to fix this would be to reset the parsed object pool entirely (flushing the cache and thus preventing subsequent reads from modifying their parents) after unshallowing. That would produce a problem when callers have a now-stale reference to the old pool, and so this patch implements a different approach. Instead, attach a new bit to the pool, 'substituted_parent', which indicates if the repository *ever* stored a commit which had its parents modified (i.e., the shallow boundary prior to unshallowing). This bit needs to be sticky because all reads subsequent to modifying a commit's parents are unreliable when unshallowing. Modify the check in 'commit_graph_compatible' to take this bit into account, and correctly avoid generating commit-graphs in this case, thus solving the bug. Helped-by: Derrick Stolee <dstolee@microsoft.com> Helped-by: Jonathan Nieder <jrnieder@gmail.com> Reported-by: Jay Conrod <jayconrod@google.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2 years ago
if (r->parsed_objects &&
(r->parsed_objects->grafts_nr || r->parsed_objects->substituted_parent))
return 0;
if (is_repository_shallow(r))
return 0;
return 1;
}
commit-graph: don't early exit(1) on e.g. "git status" Make the commit-graph loading code work as a library that returns an error code instead of calling exit(1) when the commit-graph is corrupt. This means that e.g. "status" will now report commit-graph corruption as an "error: [...]" at the top of its output, but then proceed to work normally. This required splitting up the load_commit_graph_one() function so that the code that deals with open()-ing and stat()-ing the graph can now be called independently as open_commit_graph(). This is needed because "commit-graph verify" where the graph doesn't exist isn't an error. See the third paragraph in 283e68c72f ("commit-graph: add 'verify' subcommand", 2018-06-27). There's a bug in that logic where we conflate the intended ENOENT with other errno values (e.g. EACCES), but this change doesn't address that. That'll be addressed in a follow-up change. I'm then splitting most of the logic out of load_commit_graph_one() into load_commit_graph_one_fd_st(), which allows for providing an existing file descriptor and stat information to the loading code. This isn't strictly needed, but it would be redundant and confusing to open() and stat() the file twice for some of the codepaths, this allows for calling open_commit_graph() followed by load_commit_graph_one_fd_st(). The "graph_file" still needs to be passed to that function for the the "graph file %s is too small" error message. This leaves load_commit_graph_one() unused by everything except the internal prepare_commit_graph_one() function, so let's mark it as "static". If someone needs it in the future we can remove the "static" attribute. I could also rewrite its sole remaining user ("prepare_commit_graph_one()") to use load_commit_graph_one_fd_st() instead, but let's leave it at this. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4 years ago
int open_commit_graph(const char *graph_file, int *fd, struct stat *st)
{
*fd = git_open(graph_file);
if (*fd < 0)
return 0;
if (fstat(*fd, st)) {
close(*fd);
return 0;
}
return 1;
}
struct commit_graph *load_commit_graph_one_fd_st(struct repository *r,
int fd, struct stat *st,
struct object_directory *odb)
{
void *graph_map;
size_t graph_size;
struct commit_graph *ret;
commit-graph: don't early exit(1) on e.g. "git status" Make the commit-graph loading code work as a library that returns an error code instead of calling exit(1) when the commit-graph is corrupt. This means that e.g. "status" will now report commit-graph corruption as an "error: [...]" at the top of its output, but then proceed to work normally. This required splitting up the load_commit_graph_one() function so that the code that deals with open()-ing and stat()-ing the graph can now be called independently as open_commit_graph(). This is needed because "commit-graph verify" where the graph doesn't exist isn't an error. See the third paragraph in 283e68c72f ("commit-graph: add 'verify' subcommand", 2018-06-27). There's a bug in that logic where we conflate the intended ENOENT with other errno values (e.g. EACCES), but this change doesn't address that. That'll be addressed in a follow-up change. I'm then splitting most of the logic out of load_commit_graph_one() into load_commit_graph_one_fd_st(), which allows for providing an existing file descriptor and stat information to the loading code. This isn't strictly needed, but it would be redundant and confusing to open() and stat() the file twice for some of the codepaths, this allows for calling open_commit_graph() followed by load_commit_graph_one_fd_st(). The "graph_file" still needs to be passed to that function for the the "graph file %s is too small" error message. This leaves load_commit_graph_one() unused by everything except the internal prepare_commit_graph_one() function, so let's mark it as "static". If someone needs it in the future we can remove the "static" attribute. I could also rewrite its sole remaining user ("prepare_commit_graph_one()") to use load_commit_graph_one_fd_st() instead, but let's leave it at this. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4 years ago
graph_size = xsize_t(st->st_size);
if (graph_size < GRAPH_MIN_SIZE) {
close(fd);
error(_("commit-graph file is too small"));
commit-graph: don't early exit(1) on e.g. "git status" Make the commit-graph loading code work as a library that returns an error code instead of calling exit(1) when the commit-graph is corrupt. This means that e.g. "status" will now report commit-graph corruption as an "error: [...]" at the top of its output, but then proceed to work normally. This required splitting up the load_commit_graph_one() function so that the code that deals with open()-ing and stat()-ing the graph can now be called independently as open_commit_graph(). This is needed because "commit-graph verify" where the graph doesn't exist isn't an error. See the third paragraph in 283e68c72f ("commit-graph: add 'verify' subcommand", 2018-06-27). There's a bug in that logic where we conflate the intended ENOENT with other errno values (e.g. EACCES), but this change doesn't address that. That'll be addressed in a follow-up change. I'm then splitting most of the logic out of load_commit_graph_one() into load_commit_graph_one_fd_st(), which allows for providing an existing file descriptor and stat information to the loading code. This isn't strictly needed, but it would be redundant and confusing to open() and stat() the file twice for some of the codepaths, this allows for calling open_commit_graph() followed by load_commit_graph_one_fd_st(). The "graph_file" still needs to be passed to that function for the the "graph file %s is too small" error message. This leaves load_commit_graph_one() unused by everything except the internal prepare_commit_graph_one() function, so let's mark it as "static". If someone needs it in the future we can remove the "static" attribute. I could also rewrite its sole remaining user ("prepare_commit_graph_one()") to use load_commit_graph_one_fd_st() instead, but let's leave it at this. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4 years ago
return NULL;
}
graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
close(fd);
ret = parse_commit_graph(r, graph_map, graph_size);
if (ret)
ret->odb = odb;
else
munmap(graph_map, graph_size);
return ret;
}
commit-graph: fix segfault on e.g. "git status" When core.commitGraph=true is set, various common commands now consult the commit graph. Because the commit-graph code is very trusting of its input data, it's possibly to construct a graph that'll cause an immediate segfault on e.g. "status" (and e.g. "log", "blame", ...). In some other cases where git immediately exits with a cryptic error about the graph being broken. The root cause of this is that while the "commit-graph verify" sub-command exhaustively verifies the graph, other users of the graph simply trust the graph, and will e.g. deference data found at certain offsets as pointers, causing segfaults. This change does the bare minimum to ensure that we don't segfault in the common fill_commit_in_graph() codepath called by e.g. setup_revisions(), to do this instrument the "commit-graph verify" tests to always check if "status" would subsequently segfault. This fixes the following tests which would previously segfault: not ok 50 - detect low chunk count not ok 51 - detect missing OID fanout chunk not ok 52 - detect missing OID lookup chunk not ok 53 - detect missing commit data chunk Those happened because with the commit-graph enabled setup_revisions() would eventually call fill_commit_in_graph(), where e.g. g->chunk_commit_data is used early as an offset (and will be 0x0). With this change we get far enough to detect that the graph is broken, and show an error instead. E.g.: $ git status; echo $? error: commit-graph is missing the Commit Data chunk 1 That also sucks, we should *warn* and not hard-fail "status" just because the commit-graph is corrupt, but fixing is left to a follow-up change. A side-effect of changing the reporting from graph_report() to error() is that we now have an "error: " prefix for these even for "commit-graph verify". Pseudo-diff before/after: $ git commit-graph verify -commit-graph is missing the Commit Data chunk +error: commit-graph is missing the Commit Data chunk Changing that is OK. Various errors it emits now early on are prefixed with "error: ", moving these over and changing the output doesn't break anything. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4 years ago
static int verify_commit_graph_lite(struct commit_graph *g)
{
/*
* Basic validation shared between parse_commit_graph()
* which'll be called every time the graph is used, and the
* much more expensive verify_commit_graph() used by
* "commit-graph verify".
*
* There should only be very basic checks here to ensure that
* we don't e.g. segfault in fill_commit_in_graph(), but
* because this is a very hot codepath nothing that e.g. loops
* over g->num_commits, or runs a checksum on the commit-graph
* itself.
*/
if (!g->chunk_oid_fanout) {
error("commit-graph is missing the OID Fanout chunk");
return 1;
}
if (!g->chunk_oid_lookup) {
error("commit-graph is missing the OID Lookup chunk");
return 1;
}
if (!g->chunk_commit_data) {
error("commit-graph is missing the Commit Data chunk");
return 1;
}
return 0;
}
struct commit_graph *parse_commit_graph(struct repository *r,
void *graph_map, size_t graph_size)
{
const unsigned char *data, *chunk_lookup;
uint32_t i;
struct commit_graph *graph;
uint64_t next_chunk_offset;
uint32_t graph_signature;
unsigned char graph_version, hash_version;
if (!graph_map)
return NULL;
if (graph_size < GRAPH_MIN_SIZE)
return NULL;
data = (const unsigned char *)graph_map;
graph_signature = get_be32(data);
if (graph_signature != GRAPH_SIGNATURE) {
error(_("commit-graph signature %X does not match signature %X"),
graph_signature, GRAPH_SIGNATURE);
return NULL;
}
graph_version = *(unsigned char*)(data + 4);
if (graph_version != GRAPH_VERSION) {
error(_("commit-graph version %X does not match version %X"),
graph_version, GRAPH_VERSION);
return NULL;
}
hash_version = *(unsigned char*)(data + 5);
if (hash_version != oid_version()) {
error(_("commit-graph hash version %X does not match version %X"),
hash_version, oid_version());
return NULL;
}
prepare_repo_settings(r);
graph = alloc_commit_graph();
graph->hash_len = the_hash_algo->rawsz;
graph->num_chunks = *(unsigned char*)(data + 6);
graph->data = graph_map;
graph->data_len = graph_size;
if (graph_size < GRAPH_HEADER_SIZE +
(graph->num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH +
GRAPH_FANOUT_SIZE + the_hash_algo->rawsz) {
error(_("commit-graph file is too small to hold %u chunks"),
graph->num_chunks);
free(graph);
return NULL;
}
chunk_lookup = data + 8;
next_chunk_offset = get_be64(chunk_lookup + 4);
for (i = 0; i < graph->num_chunks; i++) {
uint32_t chunk_id;
uint64_t chunk_offset = next_chunk_offset;
int chunk_repeated = 0;
chunk_id = get_be32(chunk_lookup + 0);
chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
next_chunk_offset = get_be64(chunk_lookup + 4);
if (chunk_offset > graph_size - the_hash_algo->rawsz) {
error(_("commit-graph improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32),
(uint32_t)chunk_offset);
goto free_and_return;
}
switch (chunk_id) {
case GRAPH_CHUNKID_OIDFANOUT:
if (graph->chunk_oid_fanout)
chunk_repeated = 1;
else
graph->chunk_oid_fanout = (uint32_t*)(data + chunk_offset);
break;
case GRAPH_CHUNKID_OIDLOOKUP:
if (graph->chunk_oid_lookup)
chunk_repeated = 1;
else {
graph->chunk_oid_lookup = data + chunk_offset;
graph->num_commits = (next_chunk_offset - chunk_offset)
/ graph->hash_len;
}
break;
case GRAPH_CHUNKID_DATA:
if (graph->chunk_commit_data)
chunk_repeated = 1;
else
graph->chunk_commit_data = data + chunk_offset;
break;
commit-graph: rename "large edges" to "extra edges" The optional 'Large Edge List' chunk of the commit graph file stores parent information for commits with more than two parents, and the names of most of the macros, variables, struct fields, and functions related to this chunk contain the term "large edges", e.g. write_graph_chunk_large_edges(). However, it's not a really great term, as the edges to the second and subsequent parents stored in this chunk are not any larger than the edges to the first and second parents stored in the "main" 'Commit Data' chunk. It's the number of edges, IOW number of parents, that is larger compared to non-merge and "regular" two-parent merge commits. And indeed, two functions in 'commit-graph.c' have a local variable called 'num_extra_edges' that refer to the same thing, and this "extra edges" term is much better at describing these edges. So let's rename all these references to "large edges" in macro, variable, function, etc. names to "extra edges". There is a GRAPH_OCTOPUS_EDGES_NEEDED macro as well; for the sake of consistency rename it to GRAPH_EXTRA_EDGES_NEEDED. We can do so safely without causing any incompatibility issues, because the term "large edges" doesn't come up in the file format itself in any form (the chunk's magic is {'E', 'D', 'G', 'E'}, there is no 'L' in there), but only in the specification text. The string "large edges", however, does come up in the output of 'git commit-graph read' and in tests looking at its input, but that command is explicitly documented as debugging aid, so we can change its output and the affected tests safely. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4 years ago
case GRAPH_CHUNKID_EXTRAEDGES:
if (graph->chunk_extra_edges)
chunk_repeated = 1;
else
commit-graph: rename "large edges" to "extra edges" The optional 'Large Edge List' chunk of the commit graph file stores parent information for commits with more than two parents, and the names of most of the macros, variables, struct fields, and functions related to this chunk contain the term "large edges", e.g. write_graph_chunk_large_edges(). However, it's not a really great term, as the edges to the second and subsequent parents stored in this chunk are not any larger than the edges to the first and second parents stored in the "main" 'Commit Data' chunk. It's the number of edges, IOW number of parents, that is larger compared to non-merge and "regular" two-parent merge commits. And indeed, two functions in 'commit-graph.c' have a local variable called 'num_extra_edges' that refer to the same thing, and this "extra edges" term is much better at describing these edges. So let's rename all these references to "large edges" in macro, variable, function, etc. names to "extra edges". There is a GRAPH_OCTOPUS_EDGES_NEEDED macro as well; for the sake of consistency rename it to GRAPH_EXTRA_EDGES_NEEDED. We can do so safely without causing any incompatibility issues, because the term "large edges" doesn't come up in the file format itself in any form (the chunk's magic is {'E', 'D', 'G', 'E'}, there is no 'L' in there), but only in the specification text. The string "large edges", however, does come up in the output of 'git commit-graph read' and in tests looking at its input, but that command is explicitly documented as debugging aid, so we can change its output and the affected tests safely. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4 years ago
graph->chunk_extra_edges = data + chunk_offset;
break;
case GRAPH_CHUNKID_BASE:
if (graph->chunk_base_graphs)
chunk_repeated = 1;
else
graph->chunk_base_graphs = data + chunk_offset;
break;
case GRAPH_CHUNKID_BLOOMINDEXES:
if (graph->chunk_bloom_indexes)
chunk_repeated = 1;
else if (r->settings.commit_graph_read_changed_paths)
graph->chunk_bloom_indexes = data + chunk_offset;
break;
case GRAPH_CHUNKID_BLOOMDATA:
if (graph->chunk_bloom_data)
chunk_repeated = 1;
else if (r->settings.commit_graph_read_changed_paths) {
uint32_t hash_version;
graph->chunk_bloom_data = data + chunk_offset;
hash_version = get_be32(data + chunk_offset);
if (hash_version != 1)
break;
graph->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings));
graph->bloom_filter_settings->hash_version = hash_version;
graph->bloom_filter_settings->num_hashes = get_be32(data + chunk_offset + 4);
graph->bloom_filter_settings->bits_per_entry = get_be32(data + chunk_offset + 8);
graph->bloom_filter_settings->max_changed_paths = DEFAULT_BLOOM_MAX_CHANGES;
}
break;
}
if (chunk_repeated) {
error(_("commit-graph chunk id %08x appears multiple times"), chunk_id);
goto free_and_return;
}
}
if (graph->chunk_bloom_indexes && graph->chunk_bloom_data) {
init_bloom_filters();
} else {
/* We need both the bloom chunks to exist together. Else ignore the data */
graph->chunk_bloom_indexes = NULL;
graph->chunk_bloom_data = NULL;
FREE_AND_NULL(graph->bloom_filter_settings);
}
hashcpy(graph->oid.hash, graph->data + graph->data_len - graph->hash_len);
if (verify_commit_graph_lite(graph))
goto free_and_return;
commit-graph: fix segfault on e.g. "git status" When core.commitGraph=true is set, various common commands now consult the commit graph. Because the commit-graph code is very trusting of its input data, it's possibly to construct a graph that'll cause an immediate segfault on e.g. "status" (and e.g. "log", "blame", ...). In some other cases where git immediately exits with a cryptic error about the graph being broken. The root cause of this is that while the "commit-graph verify" sub-command exhaustively verifies the graph, other users of the graph simply trust the graph, and will e.g. deference data found at certain offsets as pointers, causing segfaults. This change does the bare minimum to ensure that we don't segfault in the common fill_commit_in_graph() codepath called by e.g. setup_revisions(), to do this instrument the "commit-graph verify" tests to always check if "status" would subsequently segfault. This fixes the following tests which would previously segfault: not ok 50 - detect low chunk count not ok 51 - detect missing OID fanout chunk not ok 52 - detect missing OID lookup chunk not ok 53 - detect missing commit data chunk Those happened because with the commit-graph enabled setup_revisions() would eventually call fill_commit_in_graph(), where e.g. g->chunk_commit_data is used early as an offset (and will be 0x0). With this change we get far enough to detect that the graph is broken, and show an error instead. E.g.: $ git status; echo $? error: commit-graph is missing the Commit Data chunk 1 That also sucks, we should *warn* and not hard-fail "status" just because the commit-graph is corrupt, but fixing is left to a follow-up change. A side-effect of changing the reporting from graph_report() to error() is that we now have an "error: " prefix for these even for "commit-graph verify". Pseudo-diff before/after: $ git commit-graph verify -commit-graph is missing the Commit Data chunk +error: commit-graph is missing the Commit Data chunk Changing that is OK. Various errors it emits now early on are prefixed with "error: ", moving these over and changing the output doesn't break anything. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4 years ago
return graph;
free_and_return:
free(graph->bloom_filter_settings);
free(graph);
return NULL;
}
static struct commit_graph *load_commit_graph_one(struct repository *r,
const char *graph_file,
struct object_directory *odb)
commit-graph: don't early exit(1) on e.g. "git status" Make the commit-graph loading code work as a library that returns an error code instead of calling exit(1) when the commit-graph is corrupt. This means that e.g. "status" will now report commit-graph corruption as an "error: [...]" at the top of its output, but then proceed to work normally. This required splitting up the load_commit_graph_one() function so that the code that deals with open()-ing and stat()-ing the graph can now be called independently as open_commit_graph(). This is needed because "commit-graph verify" where the graph doesn't exist isn't an error. See the third paragraph in 283e68c72f ("commit-graph: add 'verify' subcommand", 2018-06-27). There's a bug in that logic where we conflate the intended ENOENT with other errno values (e.g. EACCES), but this change doesn't address that. That'll be addressed in a follow-up change. I'm then splitting most of the logic out of load_commit_graph_one() into load_commit_graph_one_fd_st(), which allows for providing an existing file descriptor and stat information to the loading code. This isn't strictly needed, but it would be redundant and confusing to open() and stat() the file twice for some of the codepaths, this allows for calling open_commit_graph() followed by load_commit_graph_one_fd_st(). The "graph_file" still needs to be passed to that function for the the "graph file %s is too small" error message. This leaves load_commit_graph_one() unused by everything except the internal prepare_commit_graph_one() function, so let's mark it as "static". If someone needs it in the future we can remove the "static" attribute. I could also rewrite its sole remaining user ("prepare_commit_graph_one()") to use load_commit_graph_one_fd_st() instead, but let's leave it at this. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4 years ago
{
struct stat st;
int fd;
struct commit_graph *g;
commit-graph: don't early exit(1) on e.g. "git status" Make the commit-graph loading code work as a library that returns an error code instead of calling exit(1) when the commit-graph is corrupt. This means that e.g. "status" will now report commit-graph corruption as an "error: [...]" at the top of its output, but then proceed to work normally. This required splitting up the load_commit_graph_one() function so that the code that deals with open()-ing and stat()-ing the graph can now be called independently as open_commit_graph(). This is needed because "commit-graph verify" where the graph doesn't exist isn't an error. See the third paragraph in 283e68c72f ("commit-graph: add 'verify' subcommand", 2018-06-27). There's a bug in that logic where we conflate the intended ENOENT with other errno values (e.g. EACCES), but this change doesn't address that. That'll be addressed in a follow-up change. I'm then splitting most of the logic out of load_commit_graph_one() into load_commit_graph_one_fd_st(), which allows for providing an existing file descriptor and stat information to the loading code. This isn't strictly needed, but it would be redundant and confusing to open() and stat() the file twice for some of the codepaths, this allows for calling open_commit_graph() followed by load_commit_graph_one_fd_st(). The "graph_file" still needs to be passed to that function for the the "graph file %s is too small" error message. This leaves load_commit_graph_one() unused by everything except the internal prepare_commit_graph_one() function, so let's mark it as "static". If someone needs it in the future we can remove the "static" attribute. I could also rewrite its sole remaining user ("prepare_commit_graph_one()") to use load_commit_graph_one_fd_st() instead, but let's leave it at this. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4 years ago
int open_ok = open_commit_graph(graph_file, &fd, &st);
if (!open_ok)
return NULL;
g = load_commit_graph_one_fd_st(r, fd, &st, odb);
if (g)
g->filename = xstrdup(graph_file);
return g;
commit-graph: don't early exit(1) on e.g. "git status" Make the commit-graph loading code work as a library that returns an error code instead of calling exit(1) when the commit-graph is corrupt. This means that e.g. "status" will now report commit-graph corruption as an "error: [...]" at the top of its output, but then proceed to work normally. This required splitting up the load_commit_graph_one() function so that the code that deals with open()-ing and stat()-ing the graph can now be called independently as open_commit_graph(). This is needed because "commit-graph verify" where the graph doesn't exist isn't an error. See the third paragraph in 283e68c72f ("commit-graph: add 'verify' subcommand", 2018-06-27). There's a bug in that logic where we conflate the intended ENOENT with other errno values (e.g. EACCES), but this change doesn't address that. That'll be addressed in a follow-up change. I'm then splitting most of the logic out of load_commit_graph_one() into load_commit_graph_one_fd_st(), which allows for providing an existing file descriptor and stat information to the loading code. This isn't strictly needed, but it would be redundant and confusing to open() and stat() the file twice for some of the codepaths, this allows for calling open_commit_graph() followed by load_commit_graph_one_fd_st(). The "graph_file" still needs to be passed to that function for the the "graph file %s is too small" error message. This leaves load_commit_graph_one() unused by everything except the internal prepare_commit_graph_one() function, so let's mark it as "static". If someone needs it in the future we can remove the "static" attribute. I could also rewrite its sole remaining user ("prepare_commit_graph_one()") to use load_commit_graph_one_fd_st() instead, but let's leave it at this. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
4 years ago
}
static struct commit_graph *load_commit_graph_v1(struct repository *r,
struct object_directory *odb)
{
commit-graph.c: remove path normalization, comparison As of the previous patch, all calls to 'commit-graph.c' functions which perform path normalization (for e.g., 'get_commit_graph_filename()') are of the form 'ctx->odb->path', which is always in normalized form. Now that there are no callers passing non-normalized paths to these functions, ensure that future callers are bound by the same restrictions by making these functions take a 'struct object_directory *' instead of a 'const char *'. To match, replace all calls with arguments of the form 'ctx->odb->path' with 'ctx->odb' To recover the path, functions that perform path manipulation simply use 'odb->path'. Further, avoid string comparisons with arguments of the form 'odb->path', and instead prefer raw pointer comparisons, which accomplish the same effect, but are far less brittle. This has a pleasant side-effect of making these functions much more robust to paths that cannot be normalized by 'normalize_path_copy()', i.e., because they are outside of the current working directory. For example, prior to this patch, Valgrind reports that the following uninitialized memory read [1]: $ ( cd t && GIT_DIR=../.git valgrind git rev-parse HEAD^ ) because 'normalize_path_copy()' can't normalize '../.git' (since it's relative to but above of the current working directory) [2]. By using a 'struct object_directory *' directly, 'get_commit_graph_filename()' does not need to normalize, because all paths are relative to the current working directory since they are always read from the '->path' of an object directory. [1]: https://lore.kernel.org/git/20191027042116.GA5801@sigill.intra.peff.net. [2]: The bug here is that 'get_commit_graph_filename()' returns the result of 'normalize_path_copy()' without checking the return value. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
3 years ago
char *graph_name = get_commit_graph_filename(odb);
struct commit_graph *g = load_commit_graph_one(r, graph_name, odb);
free(graph_name);
return g;
}
static int add_graph_to_chain(struct commit_graph *g,
struct commit_graph *chain,
struct object_id *oids,
int n)
{
struct commit_graph *cur_g = chain;
if (n && !g->chunk_base_graphs) {
warning(_("commit-graph has no base graphs chunk"));
return 0;
}
while (n) {
n--;
if (!cur_g ||
!oideq(&oids[n], &cur_g->oid) ||
!hasheq(oids[n].hash, g->chunk_base_graphs + g->hash_len * n)) {
warning(_("commit-graph chain does not match"));
return 0;
}
cur_g = cur_g->base_graph;
}
g->base_graph = chain;
if (chain)
g->num_commits_in_base = chain->num_commits + chain->num_commits_in_base;
return 1;
}