Commit Graph

998 Commits

Author SHA1 Message Date
Jeff King 45a187cc34 lookup_unknown_object(): take a repository argument
All of the other lookup_foo() functions take a repository argument, but
lookup_unknown_object() was never converted, and it uses the_repository
internally. Let's fix that.

We could leave a wrapper that uses the_repository, but there aren't that
many calls, so we'll just convert them all. I looked briefly at each
site to see if we had a repository struct (besides the_repository) we
could pass, but none of them do (so this conversion to pass
the_repository is a pure noop in each case, though it does take us one
step closer to eventually getting rid of the_repository).

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
2021-04-13 13:18:46 -07:00
René Scharfe ca56dadb4b use CALLOC_ARRAY
Add and apply a semantic patch for converting code that open-codes
CALLOC_ARRAY to use it instead.  It shortens the code and infers the
element size automatically.

Signed-off-by: René Scharfe <>
Signed-off-by: Junio C Hamano <>
2021-03-13 16:00:09 -08:00
Junio C Hamano 6254fa1359 Merge branch 'tb/ls-refs-optim'
The ls-refs protocol operation has been optimized to narrow the
sub-hierarchy of refs/ it walks to produce response.

* tb/ls-refs-optim:
  ls-refs.c: traverse prefixes of disjoint "ref-prefix" sets
  ls-refs.c: initialize 'prefixes' before using it
  refs: expose 'for_each_fullref_in_prefixes'
2021-02-05 16:40:45 -08:00
Junio C Hamano 973e20b83f Merge branch 'jk/peel-iterated-oid'
The peel_ref() API has been replaced with peel_iterated_oid().

* jk/peel-iterated-oid:
  refs: switch peel_ref() to peel_iterated_oid()
2021-02-03 15:04:49 -08:00
Taylor Blau 16b1985be5 refs: expose 'for_each_fullref_in_prefixes'
This function was used in the ref-filter.c code to find the longest
common prefix of among a set of refspecs, and then to iterate all of the
references that descend from that prefix.

A future patch will want to use that same code from ls-refs.c, so
prepare by exposing and moving it to refs.c. Since there is nothing
specific to the ref-filter code here (other than that it was previously
the only caller of this function), this really belongs in the more
generic refs.h header.

The code moved in this patch is identical before and after, with the one
exception of renaming some arguments to be consistent with other
functions exposed in refs.h.

Signed-off-by: Taylor Blau <>
Signed-off-by: Junio C Hamano <>
2021-01-22 18:57:27 -08:00
Jeff King 36a317929b refs: switch peel_ref() to peel_iterated_oid()
The peel_ref() interface is confusing and error-prone:

  - it's typically used by ref iteration callbacks that have both a
    refname and oid. But since they pass only the refname, we may load
    the ref value from the filesystem again. This is inefficient, but
    also means we are open to a race if somebody simultaneously updates
    the ref. E.g., this:

      int some_ref_cb(const char *refname, const struct object_id *oid, ...)
              if (!peel_ref(refname, &peeled))
                      printf("%s peels to %s",
                             oid_to_hex(oid), oid_to_hex(&peeled);

    could print nonsense. It is correct to say "refname peels to..."
    (you may see the "before" value or the "after" value, either of
    which is consistent), but mentioning both oids may be mixing
    before/after values.

    Worse, whether this is possible depends on whether the optimization
    to read from the current iterator value kicks in. So it is actually
    not possible with:


    but it _is_ possible with:


    which does not use the iterator mechanism (though in practice, HEAD
    should never peel to anything, so this may not be triggerable).

  - it must take a fully-qualified refname for the read_ref_full() code
    path to work. Yet we routinely pass it partial refnames from
    callbacks to for_each_tag_ref(), etc. This happens to work when
    iterating because there we do not call read_ref_full() at all, and
    only use the passed refname to check if it is the same as the
    iterator. But the requirements for the function parameters are quite

Instead of taking a refname, let's instead take an oid. That fixes both
problems. It's a little funny for a "ref" function not to involve refs
at all. The key thing is that it's optimizing under the hood based on
having access to the ref iterator. So let's change the name to make it
clear why you'd want this function versus just peel_object().

There are two other directions I considered but rejected:

  - we could pass the peel information into the each_ref_fn callback.
    However, we don't know if the caller actually wants it or not. For
    packed-refs, providing it is essentially free. But for loose refs,
    we actually have to peel the object, which would be wasteful in most
    cases. We could likewise pass in a flag to the callback indicating
    whether the peeled information is known, but that complicates those
    callbacks, as they then have to decide whether to manually peel
    themselves. Plus it requires changing the interface of every
    callback, whether they care about peeling or not, and there are many
    of them.

  - we could make a function to return the peeled value of the current
    iterated ref (computing it if necessary), and BUG() otherwise. I.e.:

      int peel_current_iterated_ref(struct object_id *out);

    Each of the current callers is an each_ref_fn callback, so they'd
    mostly be happy. But:

      - we use those callbacks with functions like head_ref(), which do
        not use the iteration code. So we'd need to handle the fallback
        case there, anyway.

      - it's possible that a caller would want to call into generic code
        that sometimes is used during iteration and sometimes not. This
        encapsulates the logic to do the fast thing when possible, and
        fallback when necessary.

The implementation is mostly obvious, but I want to call out a few
things in the patch:

  - the test-tool coverage for peel_ref() is now meaningless, as it all
    collapses to a single peel_object() call (arguably they were pretty
    uninteresting before; the tricky part of that function is the
    fast-path we see during iteration, but these calls didn't trigger
    that). I've just dropped it entirely, though note that some other
    tests relied on the tags we created; I've moved that creation to the
    tests where it matters.

  - we no longer need to take a ref_store parameter, since we'd never
    look up a ref now. We do still rely on a global "current iterator"
    variable which _could_ be kept per-ref-store. But in practice this
    is only useful if there are multiple recursive iterations, at which
    point the more appropriate solution is probably a stack of
    iterators. No caller used the actual ref-store parameter anyway
    (they all call the wrapper that passes the_repository).

  - the original only kicked in the optimization when the "refname"
    pointer matched (i.e., not string comparison). We do likewise with
    the "oid" parameter here, but fall back to doing an actual oideq()
    call. This in theory lets us kick in the optimization more often,
    though in practice no current caller cares. It should never be
    wrong, though (peeling is a property of an object, so two refs
    pointing to the same object would peel identically).

  - the original took care not to touch the peeled out-parameter unless
    we found something to put in it. But no caller cares about this, and
    anyway, it is enforced by peel_object() itself (and even in the
    optimized iterator case, that's where we eventually end up). We can
    shorten the code and avoid an extra copy by just passing the
    out-parameter through the stack.

Signed-off-by: Jeff King <>
Reviewed-by: Taylor Blau <>
Signed-off-by: Junio C Hamano <>
2021-01-21 15:51:31 -08:00
Denton Liu 6436a20284 refs: allow @{n} to work with n-sized reflog
This sequence works

	$ git checkout -b newbranch
	$ git commit --allow-empty -m one
	$ git show -s newbranch@{1}

and shows the state that was immediately after the newbranch was

But then if you do

	$ git reflog expire --expire=now refs/heads/newbranch
	$ git commit --allow=empty -m two
	$ git show -s newbranch@{1}

you'd be scolded with

	fatal: log for 'newbranch' only has 1 entries

While it is true that it has only 1 entry, we have enough
information in that single entry that records the transition between
the state in which the tip of the branch was pointing at commit
'one' to the new commit 'two' built on it, so we should be able to
answer "what object newbranch was pointing at?". But we refuse to
do so.

Make @{0} the special case where we use the new side to look up that
entry. Otherwise, look up @{n} using the old side of the (n-1)th entry
of the reflog.

Suggested-by: Junio C Hamano <>
Signed-off-by: Denton Liu <>
Signed-off-by: Junio C Hamano <>
2021-01-11 14:13:50 -08:00
Denton Liu 95c2a71820 refs: factor out set_read_ref_cutoffs()
This block of code is duplicated twice. In a future commit, it will be
duplicated for a third time. Factor out the common functionality into

In the case of read_ref_at_ent(), we are incrementing `cb->reccnt` at the
beginning of the function. Move these to right before the return so that
the `cb->reccnt - 1` is changed to `cb->reccnt` and it can be cleanly
factored out into set_read_ref_cutoffs(). The duplication of the
increment statements will be removed in a future patch.

Signed-off-by: Denton Liu <>
Signed-off-by: Junio C Hamano <>
2021-01-10 12:24:00 -08:00
Johannes Schindelin 675704c74d init: provide useful advice about init.defaultBranch
To give ample warning for users wishing to override Git's the fall-back
for an unconfigured `init.defaultBranch` (in case we decide to change it
in a future Git version), let's introduce some advice that is shown upon
`git init` when that value is not set.

Note: two test cases in Git's test suite want to verify that the
`stderr` output of `git init` is empty. It is now necessary to suppress
the advice, we now do that via the `init.defaultBranch` setting. While
not strictly necessary, we also set this to `false` in

Signed-off-by: Johannes Schindelin <>
Signed-off-by: Junio C Hamano <>
2020-12-13 15:53:51 -08:00
Johannes Schindelin cc0f13c57d get_default_branch_name(): prepare for showing some advice
We are about to introduce a message giving users running `git init` some
advice about `init.defaultBranch`. This will necessarily be done in

Not all code paths want to show that advice, though. In particular, the
`git clone` codepath _specifically_ asks for `init_db()` to be quiet,
via the `INIT_DB_QUIET` flag.

In preparation for showing users above-mentioned advice, let's change
the function signature of `get_default_branch_name()` to accept the
parameter `quiet`.

Signed-off-by: Johannes Schindelin <>
Signed-off-by: Junio C Hamano <>
2020-12-13 15:53:50 -08:00
Johannes Schindelin 704fed9ea2 tests: start moving to a different default main branch name
To allow for an incremental conversion to a new default main branch
name, let's introduce `GIT_TEST_DEFAULT_MAIN_BRANCH_NAME`. This
environment variable can be set at the top of each converted test
script, overriding the default main branch name to use when initializing
new repositories (or cloning empty repositories).

Note: the `GIT_TEST_DEFAULT_MAIN_BRANCH_NAME` is _not_ intended to be
used manually; many tests require a specific main branch name and cannot
simply work with another one. This `GIT_TEST_*` variable is meant purely
for the transitional period while the entire test suite is converted to
use `main` as the initial branch name by default.

We also introduce the `PREPARE_FOR_MAIN_BRANCH` prereq that determines
whether the default main branch name is `main`, and adjust a couple of
test functions to use it. This prereq will be used to temporarily
disable a couple test cases to allow for adjusting the test script
incrementally. Once an entire test is adjusted, we will adjust the test
so that it is run with `GIT_TEST_DEFAULT_MAIN_BRANCH_NAME=main`.

Signed-off-by: Johannes Schindelin <>
Signed-off-by: Junio C Hamano <>
2020-10-23 08:57:40 -07:00
Junio C Hamano c9a04f036f Merge branch 'hn/refs-trace-backend'
Developer support.

* hn/refs-trace-backend:
  refs: add GIT_TRACE_REFS debugging mechanism
2020-09-22 12:36:28 -07:00
Junio C Hamano 0df670bc0b Merge branch 'jt/interpret-branch-name-fallback'
"git status" has trouble showing where it came from by interpreting
reflog entries that recordcertain events, e.g. "checkout @{u}", and
gives a hard/fatal error.  Even though it inherently is impossible
to give a correct answer because the reflog entries lose some
information (e.g. "@{u}" does not record what branch the user was
on hence which branch 'the upstream' needs to be computed, and even
if the record were available, the relationship between branches may
have changed), at least hide the error to allow "status" show its

* jt/interpret-branch-name-fallback:
  wt-status: tolerate dangling marks
  refs: move dwim_ref() to header file
  sha1-name: replace unsigned int with option struct
2020-09-09 13:53:09 -07:00
Han-Wen Nienhuys 4441f42707 refs: add GIT_TRACE_REFS debugging mechanism
When set in the environment, GIT_TRACE_REFS makes git print operations and
results as they flow through the ref storage backend. This helps debug
discrepancies between different ref backends.


    $ GIT_TRACE_REFS="1" ./git branch
    15:42:09.769631 refs/debug.c:26         ref_store for .git
    15:42:09.769681 refs/debug.c:249        read_raw_ref: HEAD: 0000000000000000000000000000000000000000 (=> refs/heads/ref-debug) type 1: 0
    15:42:09.769695 refs/debug.c:249        read_raw_ref: refs/heads/ref-debug: 3a238e539b (=> refs/heads/ref-debug) type 0: 0
    15:42:09.770282 refs/debug.c:233        ref_iterator_begin: refs/heads/ (0x1)
    15:42:09.770290 refs/debug.c:189        iterator_advance: refs/heads/b4 (0)
    15:42:09.770295 refs/debug.c:189        iterator_advance: refs/heads/branch3 (0)

Signed-off-by: Han-Wen Nienhuys <>
Signed-off-by: Junio C Hamano <>
2020-09-09 12:58:37 -07:00
Jonathan Tan f24c30e0b6 wt-status: tolerate dangling marks
When a user checks out the upstream branch of HEAD, the upstream branch
not being a local branch, and then runs "git status", like this:

  git clone $URL client
  cd client
  git checkout @{u}
  git status

no status is printed, but instead an error message:

  fatal: HEAD does not point to a branch

(This error message when running "git branch" persists even after
checking out other things - it only stops after checking out a branch.)

This is because "git status" reads the reflog when determining the "HEAD
detached" message, and thus attempts to DWIM "@{u}", but that doesn't
work because HEAD no longer points to a branch.

Therefore, when calculating the status of a worktree, tolerate dangling
marks. This is done by adding an additional parameter to
dwim_ref() and repo_dwim_ref().

Signed-off-by: Jonathan Tan <>
Signed-off-by: Junio C Hamano <>
2020-09-02 14:39:25 -07:00
Jonathan Tan ec06b05568 refs: move dwim_ref() to header file
This makes it clear that dwim_ref() is just repo_dwim_ref() without the
first parameter.

Signed-off-by: Jonathan Tan <>
Signed-off-by: Junio C Hamano <>
2020-09-02 14:39:17 -07:00
Jonathan Tan a4f66a7876 sha1-name: replace unsigned int with option struct
In preparation for a future patch adding a boolean parameter to
repo_interpret_branch_name(), which might be easily confused with an
existing unsigned int parameter, refactor repo_interpret_branch_name()
to take an option struct instead of the unsigned int parameter.

The static function interpret_branch_mark() is also updated to take the
option struct in preparation for that future patch, since it will also
make use of the to-be-introduced boolean parameter.

Signed-off-by: Jonathan Tan <>
Signed-off-by: Junio C Hamano <>
2020-09-02 14:39:17 -07:00
Junio C Hamano 6ddd76fd6c Merge branch 'ps/ref-transaction-hook'
Code simplification by removing ineffective optimization.

* ps/ref-transaction-hook:
  refs: remove lookup cache for reference-transaction hook
2020-08-31 15:49:53 -07:00
Junio C Hamano e699684cf6 Merge branch 'hn/refs-pseudorefs'
Accesses to two pseudorefs have been updated to properly use ref

* hn/refs-pseudorefs:
  sequencer: treat REVERT_HEAD as a pseudo ref
  builtin/commit: suggest update-ref for pseudoref removal
  sequencer: treat CHERRY_PICK_HEAD as a pseudo ref
  refs: make refs_ref_exists public
2020-08-31 15:49:48 -07:00
Junio C Hamano 98df75b286 Merge branch 'hn/refs-fetch-head-is-special'
The FETCH_HEAD is now always read from the filesystem regardless of
the ref backend in use, as its format is much richer than the
normal refs, and written directly by "git fetch" as a plain file..

* hn/refs-fetch-head-is-special:
  refs: read FETCH_HEAD and MERGE_HEAD generically
  refs: move gitdir into base ref_store
  refs: fix comment about submodule ref_stores
  refs: split off reading loose ref data in separate function
2020-08-27 14:04:49 -07:00
Patrick Steinhardt 0a0fbbe3ff refs: remove lookup cache for reference-transaction hook
When adding the reference-transaction hook, there were concerns about
the performance impact it may have on setups which do not make use of
the new hook at all. After all, it gets executed every time a reftx is
prepared, committed or aborted, which linearly scales with the number of
reference-transactions created per session. And as there are code paths
like `git push` which create a new transaction for each reference to be
updated, this may translate to calling `find_hook()` quite a lot.

To address this concern, a cache was added with the intention to not
repeatedly do negative hook lookups. Turns out this cache caused a
regression, which was fixed via e5256c82e5 (refs: fix interleaving hook
calls with reference-transaction hook, 2020-08-07). In the process of
discussing the fix, we realized that the cache doesn't really help even
in the negative-lookup case. While performance tests added to benchmark
this did show a slight improvement in the 1% range, this really doesn't
warrent having a cache. Furthermore, it's quite flaky, too. E.g. running
it twice in succession produces the following results:

Test                         master            pks-reftx-hook-remove-cache
1400.2: update-ref           2.79(2.16+0.74)   2.73(2.12+0.71) -2.2%
1400.3: update-ref --stdin   0.22(0.08+0.14)   0.21(0.08+0.12) -4.5%

Test                         master            pks-reftx-hook-remove-cache
1400.2: update-ref           2.70(2.09+0.72)   2.74(2.13+0.71) +1.5%
1400.3: update-ref --stdin   0.21(0.10+0.10)   0.21(0.08+0.13) +0.0%

One case notably absent from those benchmarks is a single executable
searching for the hook hundreds of times, which is exactly the case for
which the negative cache was added. p1400.2 will spawn a new update-ref
for each transaction and p1400.3 only has a single reference-transaction
for all reference updates. So this commit adds a third benchmark, which
performs an non-atomic push of a thousand references. This will create a
new reference transaction per reference. But even for this case, the
negative cache doesn't consistently improve performance:

Test                         master            pks-reftx-hook-remove-cache
1400.4: nonatomic push       6.63(6.50+0.13)   6.81(6.67+0.14) +2.7%
1400.4: nonatomic push       6.35(6.21+0.14)   6.39(6.23+0.16) +0.6%
1400.4: nonatomic push       6.43(6.31+0.13)   6.42(6.28+0.15) -0.2%

So let's just remove the cache altogether to simplify the code.

Signed-off-by: Patrick Steinhardt <>
Signed-off-by: Junio C Hamano <>
2020-08-25 15:34:42 -07:00
Han-Wen Nienhuys 3f9f1acccf refs: make refs_ref_exists public
This will be necessary to replace file existence checks for pseudorefs.

Signed-off-by: Han-Wen Nienhuys <>
Signed-off-by: Junio C Hamano <>
2020-08-21 11:20:10 -07:00
Han-Wen Nienhuys e811530278 refs: read FETCH_HEAD and MERGE_HEAD generically
The FETCH_HEAD and MERGE_HEAD refs must be stored in a file, regardless of the
type of ref backend. This is because they can hold more than just a single ref.

To accomodate them for alternate ref backends, read them from a file generically
in refs_read_raw_ref()

Signed-off-by: Han-Wen Nienhuys <>
Signed-off-by: Junio C Hamano <>
2020-08-19 14:08:04 -07:00
Junio C Hamano 95c687bf85 Merge branch 'hn/reftable-prep-part-2'
Further preliminary change to refs API.

* hn/reftable-prep-part-2:
  Modify pseudo refs through ref backend storage
  t1400: use git rev-parse for testing PSEUDOREF existence
2020-08-17 17:02:42 -07:00
Junio C Hamano 5676db2612 Merge branch 'ps/ref-transaction-hook'
The logic to find the ref transaction hook script attempted to
cache the path to the found hook without realizing that it needed
to keep a copied value, as the API it used returned a transitory
buffer space.  This has been corrected.

* ps/ref-transaction-hook:
  t1416: avoid hard-coded sha1 ids
  refs: fix interleaving hook calls with reference-transaction hook
2020-08-17 17:02:41 -07:00
Junio C Hamano 46b225f153 Merge branch 'jk/strvec'
The argv_array API is useful for not just managing argv but any
"vector" (NULL-terminated array) of strings, and has seen adoption
to a certain degree.  It has been renamed to "strvec" to reduce the
barrier to adoption.

* jk/strvec:
  strvec: rename struct fields
  strvec: drop argv_array compatibility layer
  strvec: update documention to avoid argv_array
  strvec: fix indentation in renamed calls
  strvec: convert remaining callers away from argv_array name
  strvec: convert more callers away from argv_array name
  strvec: convert builtin/ callers away from argv_array name
  quote: rename sq_dequote_to_argv_array to mention strvec
  strvec: rename files from argv-array to strvec
  argv-array: rename to strvec
  argv-array: use size_t for count and alloc
2020-08-10 10:23:57 -07:00
Patrick Steinhardt e5256c82e5 refs: fix interleaving hook calls with reference-transaction hook
In order to not repeatedly search for the reference-transaction hook in
case it's getting called multiple times, we use a caching mechanism to
only call `find_hook()` once. What was missed though is that the return
value of `find_hook()` actually comes from a static strbuf, which means
it will get overwritten when calling `find_hook()` again. As a result,
we may call the wrong hook with parameters of the reference-transaction

This scenario was spotted in the wild when executing a git-push(1) with
multiple references, where there are interleaving calls to both the
update and the reference-transaction hook. While initial calls to the
reference-transaction hook work as expected, it will stop working after
the next invocation of the update hook. The result is that we now start
calling the update hook with parameters and stdin of the
reference-transaction hook.

This commit fixes the issue by storing a copy of `find_hook()`'s return
value in the cache.

Signed-off-by: Junio C Hamano <>
2020-08-07 12:27:41 -07:00
Junio C Hamano dc3c6fb565 Merge branch 'hn/reftable' into master
Brown-paper-bag fix.

* hn/reftable:
  refs: move the logic to add \t to reflog to the files backend
2020-08-01 13:49:13 -07:00
Han-Wen Nienhuys 25429fed5c refs: move the logic to add \t to reflog to the files backend
523fa69c (reflog: cleanse messages in the refs.c layer, 2020-07-10)
centralized reflog normalizaton.  However, the normalizaton added a
leading "\t" to the message. This is an artifact of the reflog
storage format in the files backend, so it should be added there.

Routines that parse back the reflog (such as grab_nth_branch_switch)
expect the "\t" to not be in the message, so without this fix, git
with reftable cannot process the "@{-1}" syntax.

Signed-off-by: Han-Wen Nienhuys <>
Signed-off-by: Junio C Hamano <>
2020-07-31 10:21:51 -07:00
Junio C Hamano 3161cc6e6b Merge branch 'hn/reftable' into master
Preliminary clean-up of the refs API in preparation for adding a
new refs backend "reftable".

* hn/reftable:
  reflog: cleanse messages in the refs.c layer
  bisect: treat BISECT_HEAD as a pseudo ref
  t3432: use git-reflog to inspect the reflog for HEAD write tag using git-update-ref
2020-07-30 13:20:32 -07:00
Jeff King c972bf4cf5 strvec: convert remaining callers away from argv_array name
We eventually want to drop the argv_array name and just use strvec
consistently. There's no particular reason we have to do it all at once,
or care about interactions between converted and unconverted bits.
Because of our preprocessor compat layer, the names are interchangeable
to the compiler (so even a definition and declaration using different
names is OK).

This patch converts all of the remaining files, as the resulting diff is
reasonably sized.

The conversion was done purely mechanically with:

  git ls-files '*.c' '*.h' |
  xargs perl -i -pe '

We'll deal with any indentation/style fallouts separately.

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
2020-07-28 15:02:18 -07:00
Jeff King dbbcd44fb4 strvec: rename files from argv-array to strvec
This requires updating #include lines across the code-base, but that's
all fairly mechanical, and was done with:

  git ls-files '*.c' '*.h' |
  xargs perl -i -pe 's/argv-array.h/strvec.h/'

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
2020-07-28 15:02:17 -07:00
Han-Wen Nienhuys 55dd8b9108 Make HEAD a PSEUDOREF rather than PER_WORKTREE.
This is consistent with the definition of REF_TYPE_PSEUDOREF
(uppercase in the root ref namespace).

Signed-off-by: Han-Wen Nienhuys <>
Signed-off-by: Junio C Hamano <>
2020-07-27 10:06:49 -07:00
Han-Wen Nienhuys 09743417a2 Modify pseudo refs through ref backend storage
The previous behavior was introduced in commit 74ec19d4be
("pseudorefs: create and use pseudoref update and delete functions",
Jul 31, 2015), with the justification "alternate ref backends still
need to store pseudorefs in GIT_DIR".

Refs such as REBASE_HEAD are read through the ref backend. This can
only work consistently if they are written through the ref backend as
well. Tooling that works directly on files under .git should be
updated to use git commands to read refs instead.

The following behaviors change:

* Updates to pseudorefs (eg. ORIG_HEAD) with
  core.logAllRefUpdates=always will create reflogs for the pseudoref.

* non-HEAD pseudoref symrefs are also dereferenced on deletion. Update
  t1405 accordingly.

Signed-off-by: Han-Wen Nienhuys <>
Signed-off-by: Junio C Hamano <>
2020-07-27 10:06:49 -07:00
Junio C Hamano 523fa69c36 reflog: cleanse messages in the refs.c layer
Regarding reflog messages:

 - We expect that a reflog message consists of a single line.  The
   file format used by the files backend may add a LF after the
   message as a delimiter, and output by commands like "git log -g"
   may complete such an incomplete line by adding a LF at the end,
   but philosophically, the terminating LF is not a part of the

 - We however allow callers of refs API to supply a random sequence
   of NUL terminated bytes.  We cleanse caller-supplied message by
   squashing a run of whitespaces into a SP, and by trimming trailing
   whitespace, before storing the message.  This is how we tolerate,
   instead of erring out, a message with LF in it (be it at the end,
   in the middle, or both).

Currently, the cleansing of the reflog message is done by the files
backend, before the log is written out.  This is sufficient with the
current code, as that is the only backend that writes reflogs.  But
new backends can be added that write reflogs, and we'd want the
resulting log message we would read out of "log -g" the same no
matter what backend is used, and moving the code to do so to the
generic layer is a way to do so.

An added benefit is that the "cleansing" function could be updated
later, independent from individual backends, to e.g. allow
multi-line log messages if we wanted to, and when that happens, it
would help a lot to ensure we covered all bases if the cleansing
function (which would be updated) is called from the generic layer.

Side note: I am not interested in supporting multi-line reflog
messages right at the moment (nobody is asking for it), but I
envision that instead of the "squash a run of whitespaces into a SP
and rtrim" cleansing, we can %urlencode problematic bytes in the
message *AND* append a SP at the end, when a new version of Git that
supports multi-line and/or verbatim reflog messages writes a reflog
record.  The reading side can detect the presense of SP at the end
(which should have been rtrimmed out if it were written by existing
versions of Git) as a signal that decoding %urlencode recovers the
original reflog message.

Signed-off-by: Han-Wen Nienhuys <>
Signed-off-by: Junio C Hamano <>
2020-07-10 13:53:37 -07:00
Junio C Hamano 11cbda2add Merge branch 'js/default-branch-name'
The name of the primary branch in existing repositories, and the
default name used for the first branch in newly created
repositories, is made configurable, so that we can eventually wean
ourselves off of the hardcoded 'master'.

* js/default-branch-name:
  contrib: subtree: adjust test to change in fmt-merge-msg
  testsvn: respect `init.defaultBranch`
  remote: use the configured default branch name when appropriate
  clone: use configured default branch name when appropriate
  init: allow setting the default for the initial branch name via the config
  init: allow specifying the initial branch name for the new repository
  docs: add missing diamond brackets
  submodule: fall back to remote's HEAD for missing remote.<name>.branch
  send-pack/transport-helper: avoid mentioning a particular branch
  fmt-merge-msg: stop treating `master` specially
2020-07-06 22:09:17 -07:00
Junio C Hamano d80bea479d Merge branch 'ak/commit-graph-to-slab'
A few fields in "struct commit" that do not have to always be
present have been moved to commit slabs.

* ak/commit-graph-to-slab:
  commit-graph: minimize commit_graph_data_slab access
  commit: move members graph_pos, generation to a slab
  commit-graph: introduce commit_graph_data_slab
  object: drop parsed_object_pool->commit_count
2020-07-06 22:09:14 -07:00
Don Goodman-Wilson 8747ebb7cd init: allow setting the default for the initial branch name via the config
We just introduced the command-line option
`--initial-branch=<branch-name>` to allow initializing a new repository
with a different initial branch than the hard-coded one.

To allow users to override the initial branch name more permanently
(i.e. without having to specify the name manually for each and every
`git init` invocation), let's introduce the `init.defaultBranch` config

Helped-by: Johannes Schindelin <>
Helped-by: Derrick Stolee <>
Signed-off-by: Don Goodman-Wilson <>
Signed-off-by: Junio C Hamano <>
2020-06-24 09:14:21 -07:00
Patrick Steinhardt 6754159767 refs: implement reference transaction hook
The low-level reference transactions used to update references are
currently completely opaque to the user. While certainly desirable in
most usecases, there are some which might want to hook into the
transaction to observe all queued reference updates as well as observing
the abortion or commit of a prepared transaction.

One such usecase would be to have a set of replicas of a given Git
repository, where we perform Git operations on all of the repositories
at once and expect the outcome to be the same in all of them. While
there exist hooks already for a certain subset of Git commands that
could be used to implement a voting mechanism for this, many others
currently don't have any mechanism for this.

The above scenario is the motivation for the new "reference-transaction"
hook that reaches directly into Git's reference transaction mechanism.
The hook receives as parameter the current state the transaction was
moved to ("prepared", "committed" or "aborted") and gets via its
standard input all queued reference updates. While the exit code gets
ignored in the "committed" and "aborted" states, a non-zero exit code in
the "prepared" state will cause the transaction to be aborted

Given the usecase described above, a voting mechanism can now be
implemented via this hook: as soon as it gets called, it will take all
of stdin and use it to cast a vote to a central service. When all
replicas of the repository agree, the hook will exit with zero,
otherwise it will abort the transaction by returning non-zero. The most
important upside is that this will catch _all_ commands writing
references at once, allowing to implement strong consistency for
reference updates via a single mechanism.

In order to test the impact on the case where we don't have any
"reference-transaction" hook installed in the repository, this commit
introduce two new performance tests for git-update-refs(1). Run against
an empty repository, it produces the following results:

  Test                         origin/master     HEAD
  1400.2: update-ref           2.70(2.10+0.71)   2.71(2.10+0.73) +0.4%
  1400.3: update-ref --stdin   0.21(0.09+0.11)   0.21(0.07+0.14) +0.0%

The performance test p1400.2 creates, updates and deletes a branch a
thousand times, thus averaging runtime of git-update-refs over 3000
invocations. p1400.3 instead calls `git-update-refs --stdin` three times
and queues a thousand creations, updates and deletes respectively.

As expected, p1400.3 consistently shows no noticeable impact, as for
each batch of updates there's a single call to access(3P) for the
negative hook lookup. On the other hand, for p1400.2, one can see an
impact caused by this patchset. But doing five runs of the performance
tests where each one was run with GIT_PERF_REPEAT_COUNT=10, the overhead
ranged from -1.5% to +1.1%. These inconsistent performance numbers can
be explained by the overhead of spawning 3000 processes. This shows that
the overhead of assembling the hook path and executing access(3P) once
to check if it's there is mostly outweighed by the operating system's

Signed-off-by: Patrick Steinhardt <>
Signed-off-by: Junio C Hamano <>
2020-06-19 10:46:13 -07:00
Abhishek Kumar 6da43d937c object: drop parsed_object_pool->commit_count
14ba97f8 (alloc: allow arbitrary repositories for alloc functions,
2018-05-15) introduced parsed_object_pool->commit_count to keep count of
commits per repository and was used to assign commit->index.

However, commit-slab code requires commit->index values to be unique
and a global count would be correct, rather than a per-repo count.

Let's introduce a static counter variable, `parsed_commits_count` to
keep track of parsed commits so far.

As commit_count has no use anymore, let's also drop it from the struct.

Signed-off-by: Abhishek Kumar <>
Signed-off-by: Junio C Hamano <>
2020-06-17 14:37:14 -07:00
Junio C Hamano d3fc8dc53a Merge branch 'ds/log-exclude-decoration-config'
The "--decorate-refs" and "--decorate-refs-exclude" options "git
log" takes have learned a companion configuration variable
log.excludeDecoration that sits at the lowest priority in the

* ds/log-exclude-decoration-config:
  log: add log.excludeDecoration config option
  log-tree: make ref_filter_match() a helper method
2020-04-28 15:50:08 -07:00
Junio C Hamano 95ca48973d Merge branch 'jc/missing-ref-store-fix'
We've left the command line parsing of "git log :/a/b/" broken for
about a full year without anybody noticing, which has been

* jc/missing-ref-store-fix:
  repository: mark the "refs" pointer as private
  sha1-name: do not assume that the ref store is initialized
2020-04-22 13:42:55 -07:00
Derrick Stolee c9f7a793e8 log-tree: make ref_filter_match() a helper method
The ref_filter_match() method is defined in refs.h and implemented
in refs.c, but is only used by add_ref_decoration() in log-tree.c.
Move it into that file as a static helper method. The
match_ref_pattern() comes along for the ride.

While moving the code, also make a slight adjustment to have
ref_filter_match() take a struct decoration_filter pointer instead
of multiple string lists. This is non-functional, but will make a
later change be much cleaner.

The diff is easier to parse when using the --color-moved option.

Reported-by: Junio C Hamano <>
Signed-off-by: Derrick Stolee <>
Signed-off-by: Junio C Hamano <>
2020-04-16 11:04:55 -07:00
Jeff King 0220461071 repository: mark the "refs" pointer as private
The "refs" pointer in a struct repository starts life as NULL, but then
is lazily initialized when it is accessed via get_main_ref_store().
However, it's easy for calling code to forget this and access it
directly, leading to code which works _some_ of the time, but fails if
it is called before anybody else accesses the refs.

This was the cause of the bug fixed by 5ff4b920eb (sha1-name: do not
assume that the ref store is initialized, 2020-04-09). In order to
prevent similar bugs, let's more clearly mark the "refs" field as

In addition to helping future code, the name change will help us audit
any existing direct uses. Besides get_main_ref_store() itself, it turns
out there is only one. But we know it's OK as it is on the line directly
after the fix from 5ff4b920eb, which will have initialized the pointer.
However it's still a good idea for it to model the proper use of the
accessing function, so we'll convert it.

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
2020-04-09 22:40:48 -07:00
Eric Wong e2b5038d87 hashmap_entry: remove first member requirement from docs
Comments stating that "struct hashmap_entry" must be the first
member in a struct are no longer valid.

Suggested-by: Phillip Wood <>
Signed-off-by: Eric Wong <>
Signed-off-by: Junio C Hamano <>
2019-10-07 10:20:12 +09:00
Eric Wong 939af16eac hashmap_cmp_fn takes hashmap_entry params
Another step in eliminating the requirement of hashmap_entry
being the first member of a struct.

Signed-off-by: Eric Wong <>
Reviewed-by: Derrick Stolee <>
Signed-off-by: Junio C Hamano <>
2019-10-07 10:20:11 +09:00
Eric Wong f23a465132 hashmap_get{,_from_hash} return "struct hashmap_entry *"
Update callers to use hashmap_get_entry, hashmap_get_entry_from_hash
or container_of as appropriate.

This is another step towards eliminating the requirement of
hashmap_entry being the first field in a struct.

Signed-off-by: Eric Wong <>
Reviewed-by: Derrick Stolee <>
Signed-off-by: Junio C Hamano <>
2019-10-07 10:20:11 +09:00
Eric Wong 26b455f21e hashmap_put takes "struct hashmap_entry *"
This is less error-prone than "void *" as the compiler now
detects invalid types being passed.

Signed-off-by: Eric Wong <>
Reviewed-by: Derrick Stolee <>
Signed-off-by: Junio C Hamano <>
2019-10-07 10:20:10 +09:00
Eric Wong d22245a2e3 hashmap_entry_init takes "struct hashmap_entry *"
C compilers do type checking to make life easier for us.  So
rely on that and update all hashmap_entry_init callers to take
"struct hashmap_entry *" to avoid future bugs while improving
safety and readability.

Signed-off-by: Eric Wong <>
Reviewed-by: Derrick Stolee <>
Signed-off-by: Junio C Hamano <>
2019-10-07 10:20:09 +09:00
Jeff King 0ebbcf70e6 object: convert lookup_unknown_object() to use object_id
There are no callers left of lookup_unknown_object() that aren't just
passing us the "hash" member of a "struct object_id". Let's take the
whole struct, which gets us closer to removing all raw sha1 variables.
It also matches the existing conversions of lookup_blob(), etc.

The conversions of callers were done by hand, but they're all mechanical

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
2019-06-20 10:06:19 -07:00