
Merge branch 'ds/sparse-index-protections'

Builds on top of the sparse-index infrastructure to mark operations
that are not ready to work with the sparse index, causing them to
fall back on the fully-populated index that they have always worked with.

* ds/sparse-index-protections: (47 commits)
  name-hash: use expand_to_path()
  sparse-index: expand_to_path()
  name-hash: don't add directories to name_hash
  revision: ensure full index
  resolve-undo: ensure full index
  read-cache: ensure full index
  pathspec: ensure full index
  merge-recursive: ensure full index
  entry: ensure full index
  dir: ensure full index
  update-index: ensure full index
  stash: ensure full index
  rm: ensure full index
  merge-index: ensure full index
  ls-files: ensure full index
  grep: ensure full index
  fsck: ensure full index
  difftool: ensure full index
  commit: ensure full index
  checkout: ensure full index
  ...
pull/761/merge
Junio C Hamano 1 year ago
parent
commit
8e97852919
 Documentation/config/index.txt           |   5
 Documentation/git-sparse-checkout.txt    |  14
 Documentation/technical/index-format.txt |  19
 Documentation/technical/sparse-index.txt | 208
 Makefile                                 |   1
 attr.c                                   |  14
 attr.h                                   |   4
 builtin/add.c                            |   2
 builtin/checkout-index.c                 |   2
 builtin/checkout.c                       |   5
 builtin/commit.c                         |   4
 builtin/difftool.c                       |   3
 builtin/fsck.c                           |   2
 builtin/grep.c                           |   2
 builtin/ls-files.c                       |  14
 builtin/merge-index.c                    |   5
 builtin/rm.c                             |   2
 builtin/sparse-checkout.c                |  44
 builtin/stash.c                          |   2
 builtin/update-index.c                   |   2
 cache-tree.c                             |  40
 cache.h                                  |  25
 convert.c                                |  20
 convert.h                                |  22
 dir.c                                    |  14
 dir.h                                    |   8
 entry.c                                  |   2
 merge-ort.c                              |   2
 merge-recursive.c                        |   4
 name-hash.c                              |  11
 pathspec.c                               |   8
 pathspec.h                               |   6
 read-cache.c                             |  79
 repo-settings.c                          |  15
 repository.c                             |  11
 repository.h                             |   3
 resolve-undo.c                           |   4
 revision.c                               |   2
 sparse-index.c                           | 358
 sparse-index.h                           |  23
 submodule.c                              |   6
 submodule.h                              |   6
 t/README                                 |   3
 t/helper/test-read-cache.c               |  66
 t/perf/p2000-sparse-operations.sh        | 101
 t/t1091-sparse-checkout-builtin.sh       |  13
 t/t1092-sparse-checkout-compatibility.sh | 143
 unpack-trees.c                           |  17

5
Documentation/config/index.txt

@ -14,6 +14,11 @@ index.recordOffsetTable::
Defaults to 'true' if index.threads has been explicitly enabled,
'false' otherwise.
index.sparse::
When enabled, write the index using sparse-directory entries. This
has no effect unless `core.sparseCheckout` and
`core.sparseCheckoutCone` are both enabled. Defaults to 'false'.
index.threads::
Specifies the number of threads to spawn when loading the index.
This is meant to reduce index load time on multiprocessor machines.

14
Documentation/git-sparse-checkout.txt

@ -45,6 +45,20 @@ To avoid interfering with other worktrees, it first enables the
When `--cone` is provided, the `core.sparseCheckoutCone` setting is
also set, allowing for better performance with a limited set of
patterns (see 'CONE PATTERN SET' below).
+
Use the `--[no-]sparse-index` option to toggle the use of the sparse
index format. This reduces the size of the index to be more closely
aligned with your sparse-checkout definition. This can have significant
performance advantages for commands such as `git status` or `git add`.
This feature is still experimental. Some commands might be slower with
a sparse index until they are properly integrated with the feature.
+
**WARNING:** Using a sparse index requires modifying the index in a way
that is not completely understood by external tools. If you have trouble
with this compatibility, then run `git sparse-checkout init --no-sparse-index`
to rewrite your index to not be sparse. Older versions of Git will not
understand the sparse directory entries index extension and may fail to
interact with your repository until it is disabled.
'set'::
Write a set of patterns to the sparse-checkout file, as given as

19
Documentation/technical/index-format.txt

@ -44,6 +44,13 @@ Git index format
localization, no special casing of directory separator '/'). Entries
with the same name are sorted by their stage field.
An index entry typically represents a file. However, if sparse-checkout
is enabled in cone mode (`core.sparseCheckoutCone` is enabled) and the
`extensions.sparseIndex` extension is enabled, then the index may
contain entries for directories outside of the sparse-checkout definition.
These entries have mode `040000`, include the `SKIP_WORKTREE` bit, and
the path ends in a directory separator.
32-bit ctime seconds, the last time a file's metadata changed
this is stat(2) data
@ -385,3 +392,15 @@ The remaining data of each directory block is grouped by type:
in this block of entries.
- 32-bit count of cache entries in this block
== Sparse Directory Entries
When using sparse-checkout in cone mode, some entire directories within
the index can be summarized by pointing to a tree object instead of the
entire expanded list of paths within that tree. An index containing such
entries is a "sparse index". Index format versions 4 and lower were not
implemented with such entries in mind. Thus, for these versions, an
index containing sparse directory entries will include this extension
with signature { 's', 'd', 'i', 'r' }. Like the split-index extension,
tools should avoid interacting with a sparse index unless they understand
this extension.
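
The three properties of a sparse directory entry listed earlier in this file (mode `040000`, the `SKIP_WORKTREE` bit, and a path ending in a directory separator) can be illustrated with a small predicate. This is only a sketch: `struct toy_entry`, `TOY_SKIP_WORKTREE`, and `toy_is_sparse_dir()` are hypothetical names, and the flag value is made up rather than taken from Git's headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for Git's internal definitions. */
#define TOY_MODE_DIR      040000u   /* mode recorded for a sparse-directory entry */
#define TOY_SKIP_WORKTREE (1u << 0) /* illustrative flag bit, not Git's real value */

struct toy_entry {
	unsigned int mode;
	unsigned int flags;
	const char *path;
};

/* Returns 1 when the entry has all three properties of a sparse directory. */
static int toy_is_sparse_dir(const struct toy_entry *e)
{
	size_t len;

	assert(e && e->path);
	len = strlen(e->path);
	return e->mode == TOY_MODE_DIR &&
	       (e->flags & TOY_SKIP_WORKTREE) != 0 &&
	       len > 0 && e->path[len - 1] == '/';
}
```

A tool that does not understand the extension would have no such check and would likely misread these entries as files, which is presumably why the document asks such tools to stay away from a sparse index.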

208
Documentation/technical/sparse-index.txt

@ -0,0 +1,208 @@
Git Sparse-Index Design Document
================================
The sparse-checkout feature allows users to focus a working directory on
a subset of the files at HEAD. The cone mode patterns, enabled by
`core.sparseCheckoutCone`, allow for very fast pattern matching to
discover which files at HEAD belong in the sparse-checkout cone.
Three important scale dimensions for a Git working directory are:

* `HEAD`: How many files are present at `HEAD`?
* Populated: How many files are within the sparse-checkout cone?
* Modified: How many files has the user modified in the working directory?

We will use big-O notation -- O(X) -- to denote how expensive certain
operations are in terms of these dimensions.
These dimensions are ordered by their magnitude: users (typically) modify
fewer files than are populated, and we can only populate files at `HEAD`.
Problems occur if there is an extreme imbalance in these dimensions. For
example, if `HEAD` contains millions of paths but the populated set has
only tens of thousands, then commands like `git status` and `git add` can
be dominated by operations that require O(`HEAD`) operations instead of
O(Populated). Primarily, the cost is in parsing and rewriting the index,
which is filled primarily with files at `HEAD` that are marked with the
`SKIP_WORKTREE` bit.
The sparse-index intends to take these commands that read and modify the
index from O(`HEAD`) to O(Populated). To do this, we need to modify the
index format in a significant way: add "sparse directory" entries.
With cone mode patterns, it is possible to detect when an entire
directory will have its contents outside of the sparse-checkout definition.
Instead of listing all of the files it contains as individual entries, a
sparse-index contains an entry with the directory name, referencing the
object ID of the tree at `HEAD` and marked with the `SKIP_WORKTREE` bit.
If we need to discover the details for paths within that directory, we
can parse trees to find that list.
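
To make the size difference concrete, here is a rough sketch that counts how many entries a toy sparse index would hold. It assumes a heavily simplified cone (root-level files plus one top-level directory) and a sorted path list; `toy_top_dir_len()` and `toy_sparse_entry_count()` are hypothetical helpers, not Git functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the leading "dir/" component of path, or 0 for a root-level file. */
static size_t toy_top_dir_len(const char *path)
{
	const char *slash = strchr(path, '/');
	return slash ? (size_t)(slash - path) + 1 : 0;
}

/*
 * Count the entries a toy sparse index would hold for the sorted list
 * `paths`, when only root-level files and the top-level directory
 * `cone_dir` (for example "src/") are inside the cone.
 */
static size_t toy_sparse_entry_count(const char **paths, size_t nr,
				     const char *cone_dir)
{
	size_t i, count = 0, last_len = 0;
	size_t cone_len = strlen(cone_dir);
	const char *last_dir = NULL;

	assert(paths || nr == 0);
	for (i = 0; i < nr; i++) {
		size_t dlen = toy_top_dir_len(paths[i]);

		if (dlen == 0 ||
		    (dlen == cone_len && !strncmp(paths[i], cone_dir, dlen))) {
			count++;	/* inside the cone: one entry per file */
			continue;
		}
		/* Outside the cone: one entry per distinct directory. */
		if (!last_dir || last_len != dlen ||
		    strncmp(last_dir, paths[i], dlen)) {
			count++;
			last_dir = paths[i];
			last_len = dlen;
		}
	}
	return count;
}
```

With eight paths at `HEAD`, of which two live under `src/` and one at the root, the toy sparse index holds five entries: the three populated files plus one entry each for the two collapsed directories.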
At time of writing, sparse-directory entries violate expectations about the
index format and its in-memory data structure. There are many consumers in
the codebase that expect to iterate through all of the index entries and
see only files. In fact, these loops expect to see a reference to every
staged file. One way to handle this is to parse trees to replace a
sparse-directory entry with all of the files within that tree as the index
is loaded. However, parsing trees is slower than parsing the index format,
so that is a slower operation than if we left the index alone. The plan is
to make all of these integrations "sparse aware" so this expansion through
tree parsing is unnecessary and they use fewer resources than when using a
full index.
The implementation plan below follows four phases to slowly integrate with
the sparse-index. The intention is to incrementally update Git commands to
interact safely with the sparse-index without significant slowdowns. This
may not always be possible, but the hope is that the primary commands that
users need in their daily work are dramatically improved.
Phase I: Format and initial speedups
------------------------------------
During this phase, Git learns to enable the sparse-index and safely parse
one. Protections are put in place so that every consumer of the in-memory
data structure can operate with its current assumption of every file at
`HEAD`.
At first, every index parse will call a helper method,
`ensure_full_index()`, which scans the index for sparse-directory entries
(pointing to trees) and replaces them with the full list of paths (with
blob contents) by parsing tree objects. This will be slower in all cases.
The only noticeable change in behavior will be that the serialized index
file contains sparse-directory entries.
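
The expansion step described above can be sketched on a toy data model. This is an illustration under stated assumptions, not the code in `sparse-index.c`: `struct toy_tree` stands in for a tree object, the index is a fixed-size array of path strings, and only one level of directories is expanded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ENTRIES 64

struct toy_tree {		/* hypothetical stand-in for a tree object */
	const char *dir;	/* directory name, e.g. "docs/" */
	const char **files;	/* full paths of the files in that tree */
	size_t nr;
};

struct toy_index {
	const char *paths[TOY_MAX_ENTRIES];
	size_t nr;
	int sparse;		/* nonzero while sparse-directory entries exist */
};

static int toy_is_dir(const char *path)
{
	size_t len = strlen(path);
	return len > 0 && path[len - 1] == '/';
}

static const struct toy_tree *toy_find_tree(const struct toy_tree *trees,
					    size_t nr_trees, const char *dir)
{
	size_t i;

	for (i = 0; i < nr_trees; i++)
		if (!strcmp(trees[i].dir, dir))
			return &trees[i];
	return NULL;
}

/* Replace every sparse-directory entry with the files listed in its tree. */
static void toy_ensure_full_index(struct toy_index *idx,
				  const struct toy_tree *trees, size_t nr_trees)
{
	const char *full[TOY_MAX_ENTRIES];
	size_t i, j, out = 0;

	if (!idx->sparse)
		return;		/* already full: nothing to do */

	for (i = 0; i < idx->nr; i++) {
		const struct toy_tree *t;

		if (!toy_is_dir(idx->paths[i])) {
			assert(out < TOY_MAX_ENTRIES);
			full[out++] = idx->paths[i];
			continue;
		}
		t = toy_find_tree(trees, nr_trees, idx->paths[i]);
		assert(t != NULL);
		for (j = 0; j < t->nr; j++) {
			assert(out < TOY_MAX_ENTRIES);
			full[out++] = t->files[j];
		}
	}
	memcpy(idx->paths, full, out * sizeof(full[0]));
	idx->nr = out;
	idx->sparse = 0;
}
```

Even in this toy form the cost is visible: every collapsed directory requires a lookup and a copy, which matches the document's remark that expansion is slower than leaving the index alone.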
To start, we use a new required index extension, `sdir`, to allow
inserting sparse-directory entries into indexes with file format
versions 2, 3, and 4. This prevents Git versions that do not understand
the sparse-index from operating on one, while allowing tools that do not
understand the sparse-index to operate on repositories as long as they do
not interact with the index. A new format, index v5, will be introduced
that includes sparse-directory entries by default. It might also
introduce other features that have been considered for improving the
index, as well.
Next, consumers of the index will be guarded against operating on a
sparse-index by inserting calls to `ensure_full_index()` or
`expand_index_to_path()`. If a specific path is requested, then those will
be protected from within the `index_file_exists()` and `index_name_pos()`
API calls: they will call `ensure_full_index()` if necessary. The
intention here is to preserve existing behavior when interacting with a
sparse-checkout. We don't want a change to happen by accident, without
tests. Many of these locations may not need any change before removing the
guards, but we should not do so without tests to ensure the expected
behavior happens.
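
A guard inside a lookup API might look roughly like the sketch below. All names are hypothetical (`struct toy_index`, its `expand` hook, `toy_expand_demo()`), and the return convention for a missing name, the negated insert position minus one, mirrors the one documented for `index_name_pos()` in `cache.h`. Because the lookup may rewrite the index, it cannot take a `const` pointer, which is presumably why many signatures in this merge drop `const`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_index {
	const char **paths;	/* sorted entry names */
	size_t nr;
	int sparse;		/* nonzero while sparse-directory entries remain */
	/* Hypothetical hook standing in for ensure_full_index(). */
	void (*expand)(struct toy_index *);
};

/* Example hook: swap in a pre-computed full list (illustration only). */
static const char *toy_full_paths[] = {
	"README", "docs/a.txt", "docs/b.txt", "src/main.c"
};

static void toy_expand_demo(struct toy_index *idx)
{
	idx->paths = toy_full_paths;
	idx->nr = sizeof(toy_full_paths) / sizeof(toy_full_paths[0]);
	idx->sparse = 0;
}

/*
 * Binary search for `name`. Returns its position, or -(insert position) - 1
 * when the name is absent. Expands a sparse index first, so callers always
 * search the full list.
 */
static int toy_index_name_pos(struct toy_index *idx, const char *name)
{
	size_t lo = 0, hi;

	assert(idx && name);
	if (idx->sparse && idx->expand)
		idx->expand(idx);	/* the guard */

	hi = idx->nr;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = strcmp(name, idx->paths[mid]);

		if (!cmp)
			return (int)mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -(int)lo - 1;
}
```

The caller never needs to know whether the index was sparse: the first lookup pays for the expansion, and later lookups see an ordinary full index.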
It may be desirable to _change_ the behavior of some commands in the
presence of a sparse index or more generally in any sparse-checkout
scenario. In such cases, these should be carefully communicated and
tested. No such behavior changes are intended during this phase.
A scan of the codebase shows that not every loop over the cache entries
needs an `ensure_full_index()` check. The basic reasons include:
1. The loop is scanning for entries with non-zero stage. These entries
are not collapsed into a sparse-directory entry.
2. The loop is scanning for submodules. These entries are not collapsed
into a sparse-directory entry.
3. The loop is part of the index API, especially around reading or
writing the format.
4. The loop is checking for correct order of cache entries and that is
correct if and only if the sparse-directory entries are in the correct
location.
5. The loop ignores entries with the `SKIP_WORKTREE` bit set, or is
otherwise already aware of sparse directory entries.
6. The sparse-index is disabled at this point when using the split-index
feature, so no effort is made to protect the split-index API.
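
Reason 5 in the list above can be illustrated with a loop that skips entries carrying the `SKIP_WORKTREE` bit. Since a sparse-directory entry carries that bit too, such a loop produces the same result on a sparse and on a full index. The struct, the flag value, and `toy_count_populated()` are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SKIP_WORKTREE (1u << 0)	/* illustrative flag bit, not Git's value */

struct toy_entry {
	const char *path;
	unsigned int flags;
};

/* Count the entries that are populated in the working directory. */
static size_t toy_count_populated(const struct toy_entry *entries, size_t nr)
{
	size_t i, count = 0;

	assert(entries || nr == 0);
	for (i = 0; i < nr; i++) {
		/* Skips collapsed directories and skipped files alike. */
		if (entries[i].flags & TOY_SKIP_WORKTREE)
			continue;
		count++;
	}
	return count;
}
```

A loop of this shape never looks inside the collapsed directories, so expanding the index first would only add cost.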
Even after inserting these guards, we will keep expanding sparse-indexes
for most Git commands using the `command_requires_full_index` repository
setting. This setting will be on by default and disabled one builtin at a
time until we have sufficient confidence that all of the index operations
are properly guarded.
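
One way to picture the `command_requires_full_index` setting is as a switch consulted right after the index is read. The sketch below is a guess at the shape of that logic, using hypothetical types and a counter in place of real expansion work; it is not taken from `read-cache.c` or `repo-settings.c`.

```c
#include <assert.h>

struct toy_settings {
	int command_requires_full_index;	/* on by default in this phase */
};

struct toy_index {
	int sparse;		/* nonzero while sparse-directory entries exist */
	int expansions;		/* how many times the index was expanded */
};

static void toy_expand(struct toy_index *idx)
{
	if (!idx->sparse)
		return;
	idx->sparse = 0;
	idx->expansions++;
}

/* Pretend to load a sparse index from disk, then apply the setting. */
static void toy_read_index(struct toy_index *idx, const struct toy_settings *s)
{
	assert(idx && s);
	idx->sparse = 1;
	if (s->command_requires_full_index)
		toy_expand(idx);
}
```

A builtin that has been audited would clear the setting before reading the index and then rely on the finer-grained API guards instead.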
To complete this phase, the commands `git status` and `git add` will be
integrated with the sparse-index so that they operate with O(Populated)
performance. They will be carefully tested for operations within and
outside the sparse-checkout definition.
Phase II: Careful integrations
------------------------------
This phase focuses on ensuring that all index extensions and APIs work
well with a sparse-index. This requires significant increases to our test
coverage, especially for operations that interact with the working
directory outside of the sparse-checkout definition. Some of these
behaviors may not be the desirable ones, such as some tests already
marked for failure in `t1092-sparse-checkout-compatibility.sh`.
The index extensions that may require special integrations are:
* FS Monitor
* Untracked cache
While integrating with these features, we should look for patterns that
might lead to better APIs for interacting with the index. Coalescing
common usage patterns into an API call can reduce the number of places
where sparse-directories need to be handled carefully.
Phase III: Important command speedups
-------------------------------------
At this point, the patterns for testing and implementing sparse-directory
logic should be relatively stable. This phase focuses on updating some of
the most common builtins that use the index to operate as O(Populated).
Here is a potential list of commands that could be valuable to integrate
at this point:
* `git commit`
* `git checkout`
* `git merge`
* `git rebase`
Hopefully, commands such as `git merge` and `git rebase` can benefit
instead from merge algorithms that do not use the index as a data
structure, such as the merge-ORT strategy. As these topics mature, we
may enable the ORT strategy by default for repositories using the
sparse-index feature.
Along with `git status` and `git add`, these commands cover the majority
of users' interactions with the working directory. In addition, we can
integrate with these commands:
* `git grep`
* `git rm`
These commands have been proposed as candidates whose behavior could
change when run in a repo with a sparse-checkout definition. It would be
good to include this behavior automatically when using a sparse-index.
Care is needed to make the behavior switch clear to the user.
This phase is the first where parallel work might be possible without too
many conflicts between topics.
Phase IV: The long tail
-----------------------
This last phase is less a "phase" and more "the new normal" after all of
the previous work.
To start, the `command_requires_full_index` option could be removed in
favor of expanding only when hitting an API guard.
There are many Git commands that could use special attention to operate as
O(Populated), while some might be so rare that it is acceptable to leave
them with additional overhead when a sparse-index is present.
Here are some commands that might be useful to update:
* `git sparse-checkout set`
* `git am`
* `git clean`
* `git stash`

1
Makefile

@ -995,6 +995,7 @@ LIB_OBJS += setup.o
LIB_OBJS += shallow.o
LIB_OBJS += sideband.o
LIB_OBJS += sigchain.o
LIB_OBJS += sparse-index.o
LIB_OBJS += split-index.o
LIB_OBJS += stable-qsort.o
LIB_OBJS += strbuf.o

14
attr.c

@ -733,7 +733,7 @@ static struct attr_stack *read_attr_from_file(const char *path, unsigned flags)
return res;
}
-static struct attr_stack *read_attr_from_index(const struct index_state *istate,
+static struct attr_stack *read_attr_from_index(struct index_state *istate,
const char *path,
unsigned flags)
{
@ -763,7 +763,7 @@ static struct attr_stack *read_attr_from_index(const struct index_state *istate,
return res;
}
-static struct attr_stack *read_attr(const struct index_state *istate,
+static struct attr_stack *read_attr(struct index_state *istate,
const char *path, unsigned flags)
{
struct attr_stack *res = NULL;
@ -855,7 +855,7 @@ static void push_stack(struct attr_stack **attr_stack_p,
}
}
-static void bootstrap_attr_stack(const struct index_state *istate,
+static void bootstrap_attr_stack(struct index_state *istate,
struct attr_stack **stack)
{
struct attr_stack *e;
@ -894,7 +894,7 @@ static void bootstrap_attr_stack(const struct index_state *istate,
push_stack(stack, e, NULL, 0);
}
-static void prepare_attr_stack(const struct index_state *istate,
+static void prepare_attr_stack(struct index_state *istate,
const char *path, int dirlen,
struct attr_stack **stack)
{
@ -1094,7 +1094,7 @@ static void determine_macros(struct all_attrs_item *all_attrs,
* If check->check_nr is non-zero, only attributes in check[] are collected.
* Otherwise all attributes are collected.
*/
-static void collect_some_attrs(const struct index_state *istate,
+static void collect_some_attrs(struct index_state *istate,
const char *path,
struct attr_check *check)
{
@ -1123,7 +1123,7 @@ static void collect_some_attrs(const struct index_state *istate,
fill(path, pathlen, basename_offset, check->stack, check->all_attrs, rem);
}
-void git_check_attr(const struct index_state *istate,
+void git_check_attr(struct index_state *istate,
const char *path,
struct attr_check *check)
{
@ -1140,7 +1140,7 @@ void git_check_attr(const struct index_state *istate,
}
}
-void git_all_attrs(const struct index_state *istate,
+void git_all_attrs(struct index_state *istate,
const char *path, struct attr_check *check)
{
int i;

4
attr.h

@ -190,14 +190,14 @@ void attr_check_free(struct attr_check *check);
*/
const char *git_attr_name(const struct git_attr *);
-void git_check_attr(const struct index_state *istate,
+void git_check_attr(struct index_state *istate,
const char *path, struct attr_check *check);
/*
* Retrieve all attributes that apply to the specified path.
* check holds the attributes and their values.
*/
-void git_all_attrs(const struct index_state *istate,
+void git_all_attrs(struct index_state *istate,
const char *path, struct attr_check *check);
enum git_attr_direction {

2
builtin/add.c

@ -141,6 +141,8 @@ static int renormalize_tracked_files(const struct pathspec *pathspec, int flags)
{
int i, retval = 0;
/* TODO: audit for interaction with sparse-index. */
ensure_full_index(&the_index);
for (i = 0; i < active_nr; i++) {
struct cache_entry *ce = active_cache[i];

2
builtin/checkout-index.c

@ -120,6 +120,8 @@ static void checkout_all(const char *prefix, int prefix_length)
int i, errs = 0;
struct cache_entry *last_ce = NULL;
/* TODO: audit for interaction with sparse-index. */
ensure_full_index(&the_index);
for (i = 0; i < active_nr ; i++) {
struct cache_entry *ce = active_cache[i];
if (ce_stage(ce) != checkout_stage

5
builtin/checkout.c

@ -369,6 +369,9 @@ static int checkout_worktree(const struct checkout_opts *opts,
NULL);
enable_delayed_checkout(&state);
/* TODO: audit for interaction with sparse-index. */
ensure_full_index(&the_index);
for (pos = 0; pos < active_nr; pos++) {
struct cache_entry *ce = active_cache[pos];
if (ce->ce_flags & CE_MATCHED) {
@ -513,6 +516,8 @@ static int checkout_paths(const struct checkout_opts *opts,
* Make sure all pathspecs participated in locating the paths
* to be checked out.
*/
/* TODO: audit for interaction with sparse-index. */
ensure_full_index(&the_index);
for (pos = 0; pos < active_nr; pos++)
if (opts->overlay_mode)
mark_ce_for_checkout_overlay(active_cache[pos],

4
builtin/commit.c

@ -261,6 +261,8 @@ static int list_paths(struct string_list *list, const char *with_tree,
free(max_prefix);
}
/* TODO: audit for interaction with sparse-index. */
ensure_full_index(&the_index);
for (i = 0; i < active_nr; i++) {
const struct cache_entry *ce = active_cache[i];
struct string_list_item *item;
@ -976,6 +978,8 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
if (get_oid(parent, &oid)) {
int i, ita_nr = 0;
/* TODO: audit for interaction with sparse-index. */
ensure_full_index(&the_index);
for (i = 0; i < active_nr; i++)
if (ce_intent_to_add(active_cache[i]))
ita_nr++;

3
builtin/difftool.c

@ -585,6 +585,9 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
setenv("GIT_DIFFTOOL_DIRDIFF", "true", 1);
rc = run_command_v_opt(helper_argv, flags);
/* TODO: audit for interaction with sparse-index. */
ensure_full_index(&wtindex);
/*
* If the diff includes working copy files and those
* files were modified during the diff, then the changes

2
builtin/fsck.c

@ -881,6 +881,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
verify_index_checksum = 1;
verify_ce_order = 1;
read_cache();
/* TODO: audit for interaction with sparse-index. */
ensure_full_index(&the_index);
for (i = 0; i < active_nr; i++) {
unsigned int mode;
struct blob *blob;

2
builtin/grep.c

@ -504,6 +504,8 @@ static int grep_cache(struct grep_opt *opt,
if (repo_read_index(repo) < 0)
die(_("index file corrupt"));
/* TODO: audit for interaction with sparse-index. */
ensure_full_index(repo->index);
for (nr = 0; nr < repo->index->cache_nr; nr++) {
const struct cache_entry *ce = repo->index->cache[nr];

14
builtin/ls-files.c

@ -57,7 +57,7 @@ static const char *tag_modified = "";
static const char *tag_skip_worktree = "";
static const char *tag_resolve_undo = "";
-static void write_eolinfo(const struct index_state *istate,
+static void write_eolinfo(struct index_state *istate,
const struct cache_entry *ce, const char *path)
{
if (show_eol) {
@ -122,7 +122,7 @@ static void print_debug(const struct cache_entry *ce)
}
}
-static void show_dir_entry(const struct index_state *istate,
+static void show_dir_entry(struct index_state *istate,
const char *tag, struct dir_entry *ent)
{
int len = max_prefix_len;
@ -139,7 +139,7 @@ static void show_dir_entry(const struct index_state *istate,
write_name(ent->name);
}
-static void show_other_files(const struct index_state *istate,
+static void show_other_files(struct index_state *istate,
const struct dir_struct *dir)
{
int i;
@ -152,7 +152,7 @@ static void show_other_files(const struct index_state *istate,
}
}
-static void show_killed_files(const struct index_state *istate,
+static void show_killed_files(struct index_state *istate,
const struct dir_struct *dir)
{
int i;
@ -254,7 +254,7 @@ static void show_ce(struct repository *repo, struct dir_struct *dir,
}
}
-static void show_ru_info(const struct index_state *istate)
+static void show_ru_info(struct index_state *istate)
{
struct string_list_item *item;
@ -317,6 +317,8 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
if (!(show_cached || show_stage || show_deleted || show_modified))
return;
/* TODO: audit for interaction with sparse-index. */
ensure_full_index(repo->index);
for (i = 0; i < repo->index->cache_nr; i++) {
const struct cache_entry *ce = repo->index->cache[i];
struct stat st;
@ -494,6 +496,8 @@ void overlay_tree_on_index(struct index_state *istate,
die("bad tree-ish %s", tree_name);
/* Hoist the unmerged entries up to stage #3 to make room */
/* TODO: audit for interaction with sparse-index. */
ensure_full_index(istate);
for (i = 0; i < istate->cache_nr; i++) {
struct cache_entry *ce = istate->cache[i];
if (!ce_stage(ce))

5
builtin/merge-index.c

@ -58,6 +58,8 @@ static void merge_one_path(const char *path)
static void merge_all(void)
{
int i;
/* TODO: audit for interaction with sparse-index. */
ensure_full_index(&the_index);
for (i = 0; i < active_nr; i++) {
const struct cache_entry *ce = active_cache[i];
if (!ce_stage(ce))
@ -80,6 +82,9 @@ int cmd_merge_index(int argc, const char **argv, const char *prefix)
read_cache();
/* TODO: audit for interaction with sparse-index. */
ensure_full_index(&the_index);
i = 1;
if (!strcmp(argv[i], "-o")) {
one_shot = 1;

2
builtin/rm.c

@ -293,6 +293,8 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
seen = xcalloc(pathspec.nr, 1);
/* TODO: audit for interaction with sparse-index. */
ensure_full_index(&the_index);
for (i = 0; i < active_nr; i++) {
const struct cache_entry *ce = active_cache[i];
if (!ce_path_match(&the_index, ce, &pathspec, seen))

44
builtin/sparse-checkout.c

@ -14,6 +14,7 @@
#include "unpack-trees.h"
#include "wt-status.h"
#include "quote.h"
#include "sparse-index.h"
static const char *empty_base = "";
@ -110,6 +111,8 @@ static int update_working_directory(struct pattern_list *pl)
if (is_index_unborn(r->index))
return UPDATE_SPARSITY_SUCCESS;
r->index->sparse_checkout_patterns = pl;
memset(&o, 0, sizeof(o));
o.verbose_update = isatty(2);
o.update = 1;
@ -138,6 +141,7 @@ static int update_working_directory(struct pattern_list *pl)
else
rollback_lock_file(&lock_file);
r->index->sparse_checkout_patterns = NULL;
return result;
}
@ -276,16 +280,20 @@ static int set_config(enum sparse_checkout_mode mode)
"core.sparseCheckoutCone",
mode == MODE_CONE_PATTERNS ? "true" : NULL);
if (mode == MODE_NO_PATTERNS)
set_sparse_index_config(the_repository, 0);
return 0;
}
static char const * const builtin_sparse_checkout_init_usage[] = {
-N_("git sparse-checkout init [--cone]"),
+N_("git sparse-checkout init [--cone] [--[no-]sparse-index]"),
NULL
};
static struct sparse_checkout_init_opts {
int cone_mode;
int sparse_index;
} init_opts;
static int sparse_checkout_init(int argc, const char **argv)
@ -300,11 +308,15 @@ static int sparse_checkout_init(int argc, const char **argv)
static struct option builtin_sparse_checkout_init_options[] = {
OPT_BOOL(0, "cone", &init_opts.cone_mode,
N_("initialize the sparse-checkout in cone mode")),
OPT_BOOL(0, "sparse-index", &init_opts.sparse_index,
N_("toggle the use of a sparse index")),
OPT_END(),
};
repo_read_index(the_repository);
init_opts.sparse_index = -1;
argc = parse_options(argc, argv, NULL,
builtin_sparse_checkout_init_options,
builtin_sparse_checkout_init_usage, 0);
@ -323,10 +335,20 @@ static int sparse_checkout_init(int argc, const char **argv)
sparse_filename = get_sparse_checkout_filename();
res = add_patterns_from_file_to_list(sparse_filename, "", 0, &pl, NULL, 0);
if (init_opts.sparse_index >= 0) {
if (set_sparse_index_config(the_repository, init_opts.sparse_index) < 0)
die(_("failed to modify sparse-index config"));
/* force an index rewrite */
repo_read_index(the_repository);
the_repository->index->updated_workdir = 1;
}
+core_apply_sparse_checkout = 1;
/* If we already have a sparse-checkout file, use it. */
if (res >= 0) {
free(sparse_filename);
-core_apply_sparse_checkout = 1;
return update_working_directory(NULL);
}
@ -348,6 +370,7 @@ static int sparse_checkout_init(int argc, const char **argv)
add_pattern(strbuf_detach(&pattern, NULL), empty_base, 0, &pl, 0);
strbuf_addstr(&pattern, "!/*/");
add_pattern(strbuf_detach(&pattern, NULL), empty_base, 0, &pl, 0);
pl.use_cone_patterns = init_opts.cone_mode;
return write_patterns_and_update(&pl);
}
@ -517,19 +540,18 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m)
{
int result;
int changed_config = 0;
-struct pattern_list pl;
-memset(&pl, 0, sizeof(pl));
+struct pattern_list *pl = xcalloc(1, sizeof(*pl));
switch (m) {
case ADD:
if (core_sparse_checkout_cone)
-add_patterns_cone_mode(argc, argv, &pl);
+add_patterns_cone_mode(argc, argv, pl);
else
-add_patterns_literal(argc, argv, &pl);
+add_patterns_literal(argc, argv, pl);
break;
case REPLACE:
-add_patterns_from_input(&pl, argc, argv);
+add_patterns_from_input(pl, argc, argv);
break;
}
@ -539,12 +561,13 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m)
changed_config = 1;
}
-result = write_patterns_and_update(&pl);
+result = write_patterns_and_update(pl);
if (result && changed_config)
set_config(MODE_NO_PATTERNS);
-clear_pattern_list(&pl);
+clear_pattern_list(pl);
+free(pl);
return result;
}
@ -614,6 +637,9 @@ static int sparse_checkout_disable(int argc, const char **argv)
strbuf_addstr(&match_all, "/*");
add_pattern(strbuf_detach(&match_all, NULL), empty_base, 0, &pl, 0);
prepare_repo_settings(the_repository);
the_repository->settings.sparse_index = 0;
if (update_working_directory(&pl))
die(_("error while refreshing working directory"));

2
builtin/stash.c

@ -1412,6 +1412,8 @@ static int do_push_stash(const struct pathspec *ps, const char *stash_msg, int q
int i;
char *ps_matched = xcalloc(ps->nr, 1);
/* TODO: audit for interaction with sparse-index. */
ensure_full_index(&the_index);
for (i = 0; i < active_nr; i++)
ce_path_match(&the_index, active_cache[i], ps,
ps_matched);

2
builtin/update-index.c

@ -745,6 +745,8 @@ static int do_reupdate(int ac, const char **av,
*/
has_head = 0;
redo:
/* TODO: audit for interaction with sparse-index. */
ensure_full_index(&the_index);
for (pos = 0; pos < active_nr; pos++) {
const struct cache_entry *ce = active_cache[pos];
struct cache_entry *old = NULL;

40
cache-tree.c

@ -6,6 +6,7 @@
#include "object-store.h"
#include "replace-object.h"
#include "promisor-remote.h"
#include "sparse-index.h"
#ifndef DEBUG_CACHE_TREE
#define DEBUG_CACHE_TREE 0
@ -255,6 +256,24 @@ static int update_one(struct cache_tree *it,
*skip_count = 0;
/*
* If the first entry of this region is a sparse directory
* entry corresponding exactly to 'base', then this cache_tree
* struct is a "leaf" in the data structure, pointing to the
* tree OID specified in the entry.
*/
if (entries > 0) {
const struct cache_entry *ce = cache[0];
if (S_ISSPARSEDIR(ce->ce_mode) &&
ce->ce_namelen == baselen &&
!strncmp(ce->name, base, baselen)) {
it->entry_count = 1;
oidcpy(&it->oid, &ce->oid);
return 1;
}
}
if (0 <= it->entry_count && has_object_file(&it->oid))
return it->entry_count;
@ -442,6 +461,8 @@ int cache_tree_update(struct index_state *istate, int flags)
if (i)
return i;
ensure_full_index(istate);
if (!istate->cache_tree)
istate->cache_tree = cache_tree();
@ -787,6 +808,19 @@ int cache_tree_matches_traversal(struct cache_tree *root,
return 0;
}
static void verify_one_sparse(struct repository *r,
struct index_state *istate,
struct cache_tree *it,
struct strbuf *path,
int pos)
{
struct cache_entry *ce = istate->cache[pos];
if (!S_ISSPARSEDIR(ce->ce_mode))
BUG("directory '%s' is present in index, but not sparse",
path->buf);
}
static void verify_one(struct repository *r,
struct index_state *istate,
struct cache_tree *it,
@ -809,6 +843,12 @@ static void verify_one(struct repository *r,
if (path->len) {
pos = index_name_pos(istate, path->buf, path->len);
if (pos >= 0) {
verify_one_sparse(r, istate, it, path, pos);
return;
}
pos = -pos - 1;
} else {
pos = 0;

25
cache.h

@ -204,6 +204,8 @@ struct cache_entry {
#error "CE_EXTENDED_FLAGS out of range"
#endif
#define S_ISSPARSEDIR(m) ((m) == S_IFDIR)
/* Forward structure decls */
struct pathspec;
struct child_process;
@ -249,6 +251,8 @@ static inline unsigned int create_ce_mode(unsigned int mode)
{
if (S_ISLNK(mode))
return S_IFLNK;
if (S_ISSPARSEDIR(mode))
return S_IFDIR;
if (S_ISDIR(mode) || S_ISGITLINK(mode))
return S_IFGITLINK;
return S_IFREG | ce_permissions(mode);
@ -305,6 +309,7 @@ static inline unsigned int canon_mode(unsigned int mode)
struct split_index;
struct untracked_cache;
struct progress;
struct pattern_list;
struct index_state {
struct cache_entry **cache;
@ -319,7 +324,14 @@ struct index_state {
drop_cache_tree : 1,
updated_workdir : 1,
updated_skipworktree : 1,
-fsmonitor_has_run_once : 1;
+fsmonitor_has_run_once : 1,
+/*
+* sparse_index == 1 when sparse-directory
+* entries exist. Requires sparse-checkout
+* in cone mode.
+*/
+sparse_index : 1;
struct hashmap name_hash;
struct hashmap dir_hash;
struct object_id oid;
@ -329,6 +341,7 @@ struct index_state {
struct mem_pool *ce_mem_pool;
struct progress *progress;
struct repository *repo;
struct pattern_list *sparse_checkout_patterns;
};
/* Name hashing */
@ -337,6 +350,7 @@ void add_name_hash(struct index_state *istate, struct cache_entry *ce);
void remove_name_hash(struct index_state *istate, struct cache_entry *ce);
void free_name_hash(struct index_state *istate);
void ensure_full_index(struct index_state *istate);
/* Cache entry creation and cleanup */
@ -722,6 +736,8 @@ int read_index_from(struct index_state *, const char *path,
const char *gitdir);
int is_index_unborn(struct index_state *);
void ensure_full_index(struct index_state *istate);
/* For use with `write_locked_index()`. */
#define COMMIT_LOCK (1 << 0)
#define SKIP_IF_UNCHANGED (1 << 1)
@ -785,7 +801,7 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na
* index_name_pos(&index, "f", 1) -> -3
* index_name_pos(&index, "g", 1) -> -5
*/
int index_name_pos(const struct index_state *, const char *name, int namelen);
int index_name_pos(struct index_state *, const char *name, int namelen);
/*
* Some functions return the negative complement of an insert position when a
@ -835,8 +851,8 @@ int add_file_to_index(struct index_state *, const char *path, int flags);
int chmod_index_entry(struct index_state *, struct cache_entry *ce, char flip);
int ce_same_name(const struct cache_entry *a, const struct cache_entry *b);
void set_object_name_for_intent_to_add_entry(struct cache_entry *ce);
int index_name_is_other(const struct index_state *, const char *, int);
void *read_blob_data_from_index(const struct index_state *, const char *, unsigned long *);
int index_name_is_other(struct index_state *, const char *, int);
void *read_blob_data_from_index(struct index_state *, const char *, unsigned long *);
/* do stat comparison even if CE_VALID is true */
#define CE_MATCH_IGNORE_VALID 01
@ -1044,6 +1060,7 @@ struct repository_format {
int worktree_config;
int is_bare;
int hash_algo;
int sparse_index;
char *work_tree;
struct string_list unknown_extensions;
struct string_list v1_only_extensions;

20
convert.c

@ -127,7 +127,7 @@ static const char *gather_convert_stats_ascii(const char *data, unsigned long si
}
}
const char *get_cached_convert_stats_ascii(const struct index_state *istate,
const char *get_cached_convert_stats_ascii(struct index_state *istate,
const char *path)
{
const char *ret;
@ -211,7 +211,7 @@ static void check_global_conv_flags_eol(const char *path,
}
}
static int has_crlf_in_index(const struct index_state *istate, const char *path)
static int has_crlf_in_index(struct index_state *istate, const char *path)
{
unsigned long sz;
void *data;
@ -485,7 +485,7 @@ static int encode_to_worktree(const char *path, const char *src, size_t src_len,
return 1;
}
static int crlf_to_git(const struct index_state *istate,
static int crlf_to_git(struct index_state *istate,
const char *path, const char *src, size_t len,
struct strbuf *buf,
enum convert_crlf_action crlf_action, int conv_flags)
@ -1293,7 +1293,7 @@ static int git_path_check_ident(struct attr_check_item *check)
static struct attr_check *check;
void convert_attrs(const struct index_state *istate,
void convert_attrs(struct index_state *istate,
struct conv_attrs *ca, const char *path)
{
struct attr_check_item *ccheck = NULL;
@ -1355,7 +1355,7 @@ void reset_parsed_attributes(void)
user_convert_tail = NULL;
}
int would_convert_to_git_filter_fd(const struct index_state *istate, const char *path)
int would_convert_to_git_filter_fd(struct index_state *istate, const char *path)
{
struct conv_attrs ca;
@ -1374,7 +1374,7 @@ int would_convert_to_git_filter_fd(const struct index_state *istate, const char
return apply_filter(path, NULL, 0, -1, NULL, ca.drv, CAP_CLEAN, NULL, NULL);
}
const char *get_convert_attr_ascii(const struct index_state *istate, const char *path)
const char *get_convert_attr_ascii(struct index_state *istate, const char *path)
{
struct conv_attrs ca;
@ -1400,7 +1400,7 @@ const char *get_convert_attr_ascii(const struct index_state *istate, const char
return "";
}
int convert_to_git(const struct index_state *istate,
int convert_to_git(struct index_state *istate,
const char *path, const char *src, size_t len,
struct strbuf *dst, int conv_flags)
{
@ -1434,7 +1434,7 @@ int convert_to_git(const struct index_state *istate,
return ret | ident_to_git(src, len, dst, ca.ident);
}
void convert_to_git_filter_fd(const struct index_state *istate,
void convert_to_git_filter_fd(struct index_state *istate,
const char *path, int fd, struct strbuf *dst,
int conv_flags)
{
@ -1511,7 +1511,7 @@ int convert_to_working_tree_ca(const struct conv_attrs *ca,
meta, NULL);
}
int renormalize_buffer(const struct index_state *istate, const char *path,
int renormalize_buffer(struct index_state *istate, const char *path,
const char *src, size_t len, struct strbuf *dst)
{
struct conv_attrs ca;
@ -1972,7 +1972,7 @@ struct stream_filter *get_stream_filter_ca(const struct conv_attrs *ca,
return filter;
}
struct stream_filter *get_stream_filter(const struct index_state *istate,
struct stream_filter *get_stream_filter(struct index_state *istate,
const char *path,
const struct object_id *oid)
{

22
convert.h

@ -84,19 +84,19 @@ struct conv_attrs {
const char *working_tree_encoding; /* Supported encoding or default encoding if NULL */
};
void convert_attrs(const struct index_state *istate,
void convert_attrs(struct index_state *istate,
struct conv_attrs *ca, const char *path);
extern enum eol core_eol;
extern char *check_roundtrip_encoding;
const char *get_cached_convert_stats_ascii(const struct index_state *istate,
const char *get_cached_convert_stats_ascii(struct index_state *istate,
const char *path);
const char *get_wt_convert_stats_ascii(const char *path);
const char *get_convert_attr_ascii(const struct index_state *istate,
const char *get_convert_attr_ascii(struct index_state *istate,
const char *path);
/* returns 1 if *dst was used */
int convert_to_git(const struct index_state *istate,
int convert_to_git(struct index_state *istate,
const char *path, const char *src, size_t len,
struct strbuf *dst, int conv_flags);
int convert_to_working_tree_ca(const struct conv_attrs *ca,
@ -108,7 +108,7 @@ int async_convert_to_working_tree_ca(const struct conv_attrs *ca,
size_t len, struct strbuf *dst,
const struct checkout_metadata *meta,
void *dco);
static inline int convert_to_working_tree(const struct index_state *istate,
static inline int convert_to_working_tree(struct index_state *istate,
const char *path, const char *src,
size_t len, struct strbuf *dst,
const struct checkout_metadata *meta)
@ -117,7 +117,7 @@ static inline int convert_to_working_tree(const struct index_state *istate,
convert_attrs(istate, &ca, path);
return convert_to_working_tree_ca(&ca, path, src, len, dst, meta);
}
static inline int async_convert_to_working_tree(const struct index_state *istate,
static inline int async_convert_to_working_tree(struct index_state *istate,
const char *path, const char *src,
size_t len, struct strbuf *dst,
const struct checkout_metadata *meta,
@ -129,20 +129,20 @@ static inline int async_convert_to_working_tree(const struct index_state *istate
}
int async_query_available_blobs(const char *cmd,
struct string_list *available_paths);
int renormalize_buffer(const struct index_state *istate,
int renormalize_buffer(struct index_state *istate,
const char *path, const char *src, size_t len,
struct strbuf *dst);
static inline int would_convert_to_git(const struct index_state *istate,
static inline int would_convert_to_git(struct index_state *istate,
const char *path)
{
return convert_to_git(istate, path, NULL, 0, NULL, 0);
}
/* Precondition: would_convert_to_git_filter_fd(path) == true */
void convert_to_git_filter_fd(const struct index_state *istate,
void convert_to_git_filter_fd(struct index_state *istate,
const char *path, int fd,
struct strbuf *dst,
int conv_flags);
int would_convert_to_git_filter_fd(const struct index_state *istate,
int would_convert_to_git_filter_fd(struct index_state *istate,
const char *path);
/*
@ -176,7 +176,7 @@ void reset_parsed_attributes(void);
struct stream_filter; /* opaque */
struct stream_filter *get_stream_filter(const struct index_state *istate,
struct stream_filter *get_stream_filter(struct index_state *istate,
const char *path,
const struct object_id *);
struct stream_filter *get_stream_filter_ca(const struct conv_attrs *ca,

14
dir.c

@ -306,7 +306,7 @@ static int do_read_blob(const struct object_id *oid, struct oid_stat *oid_stat,
* [1] Only if DO_MATCH_DIRECTORY is passed; otherwise, this is NOT a match.
* [2] Only if DO_MATCH_LEADING_PATHSPEC is passed; otherwise, not a match.
*/
static int match_pathspec_item(const struct index_state *istate,
static int match_pathspec_item(struct index_state *istate,
const struct pathspec_item *item, int prefix,
const char *name, int namelen, unsigned flags)
{
@ -429,7 +429,7 @@ static int match_pathspec_item(const struct index_state *istate,
* pathspec did not match any names, which could indicate that the
* user mistyped the nth pathspec.
*/
static int do_match_pathspec(const struct index_state *istate,
static int do_match_pathspec(struct index_state *istate,
const struct pathspec *ps,
const char *name, int namelen,
int prefix, char *seen,
@ -500,7 +500,7 @@ static int do_match_pathspec(const struct index_state *istate,
return retval;
}
static int match_pathspec_with_flags(const struct index_state *istate,
static int match_pathspec_with_flags(struct index_state *istate,
const struct pathspec *ps,
const char *name, int namelen,
int prefix, char *seen, unsigned flags)
@ -516,7 +516,7 @@ static int match_pathspec_with_flags(const struct index_state *istate,
return negative ? 0 : positive;
}
int match_pathspec(const struct index_state *istate,
int match_pathspec(struct index_state *istate,
const struct pathspec *ps,
const char *name, int namelen,
int prefix, char *seen, int is_dir)
@ -529,7 +529,7 @@ int match_pathspec(const struct index_state *istate,
/**
* Check if a submodule is a superset of the pathspec
*/
int submodule_path_match(const struct index_state *istate,
int submodule_path_match(struct index_state *istate,
const struct pathspec *ps,
const char *submodule_name,
char *seen)
@ -892,7 +892,7 @@ void add_pattern(const char *string, const char *base,
add_pattern_to_hashsets(pl, pattern);
}
static int read_skip_worktree_file_from_index(const struct index_state *istate,
static int read_skip_worktree_file_from_index(struct index_state *istate,
const char *path,
size_t *size_out, char **data_out,
struct oid_stat *oid_stat)
@ -3542,6 +3542,8 @@ static void connect_wt_gitdir_in_nested(const char *sub_worktree,
if (repo_read_index(&subrepo) < 0)
die(_("index file corrupt in repo %s"), subrepo.gitdir);
/* TODO: audit for interaction with sparse-index. */
ensure_full_index(subrepo.index);
for (i = 0; i < subrepo.index->cache_nr; i++) {
const struct cache_entry *ce = subrepo.index->cache[i];

8
dir.h

@ -354,7 +354,7 @@ int count_slashes(const char *s);
int simple_length(const char *match);
int no_wildcard(const char *string);
char *common_prefix(const struct pathspec *pathspec);
int match_pathspec(const struct index_state *istate,
int match_pathspec(struct index_state *istate,
const struct pathspec *pathspec,
const char *name, int namelen,
int prefix, char *seen, int is_dir);
@ -493,12 +493,12 @@ int git_fnmatch(const struct pathspec_item *item,
const char *pattern, const char *string,
int prefix);
int submodule_path_match(const struct index_state *istate,
int submodule_path_match(struct index_state *istate,
const struct pathspec *ps,
const char *submodule_name,
char *seen);
static inline int ce_path_match(const struct index_state *istate,
static inline int ce_path_match(struct index_state *istate,
const struct cache_entry *ce,
const struct pathspec *pathspec,
char *seen)
@ -507,7 +507,7 @@ static inline int ce_path_match(const struct index_state *istate,
S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
}
static inline int dir_path_match(const struct index_state *istate,
static inline int dir_path_match(struct index_state *istate,
const struct dir_entry *ent,
const struct pathspec *pathspec,
int prefix, char *seen)

2
entry.c

@ -423,6 +423,8 @@ static void mark_colliding_entries(const struct checkout *state,
ce->ce_flags |= CE_MATCHED;
/* TODO: audit for interaction with sparse-index. */
ensure_full_index(state->istate);
for (i = 0; i < state->istate->cache_nr; i++) {
struct cache_entry *dup = state->istate->cache[i];

2
merge-ort.c

@ -2564,7 +2564,7 @@ static int blob_unchanged(struct merge_options *opt,
struct strbuf basebuf = STRBUF_INIT;