Commit Graph

21 Commits

Author SHA1 Message Date
Elijah Newren b9a7ac2c68 hash-ll, hashmap: move oidhash() to hash-ll
oidhash() was used by both hashmap and khash, which makes sense.
However, the location of this function in hashmap.[ch] meant that
khash.h had to depend upon hashmap.h, making people unfamiliar with
khash think that it was built upon hashmap.  (Or at least, I personally
was confused for a while about this in the past.)

Move this function to hash-ll, so that khash.h can stop depending upon

This has another benefit as well: it allows us to remove hashmap.h's
dependency on hash-ll.h.  While some callers of hashmap.h were making
use of oidhash, most were not, so this change provides another way to
reduce the number of includes.

Diff best viewed with `--color-moved`.

Signed-off-by: Elijah Newren <>
Signed-off-by: Junio C Hamano <>
2023-06-21 13:39:54 -07:00
Elijah Newren 8043418b77 khash: name the structs that khash declares
khash.h lets you instantiate custom hash types that map between two
types. These are defined as a struct, as you might expect, and khash
typedef's that to kh_foo_t. But it declares the struct anonymously,
which doesn't give a name to the struct type itself; there is no
"struct kh_foo". This has two small downsides:

  - when using khash, we declare "kh_foo_t *the_foo".  This is
    unlike our usual naming style, which is "struct kh_foo *the_foo".

  - you can't forward-declare a typedef of an unnamed struct type in
    C. So we might do something like this in a header file:

        struct kh_foo;
        struct bar {
                struct kh_foo *the_foo;

    to avoid having to include the header that defines the real
    kh_foo. But that doesn't work with the typedef'd name. Without the
    "struct" keyword, the compiler doesn't know we mean that kh_foo is
    a type.

So let's always give khash structs the name that matches our
conventions ("struct kh_foo" to match "kh_foo_t"). We'll keep doing
the typedef to retain compatibility with existing callers.

Co-authored-by: Jeff King <>
Signed-off-by: Jeff King <>
Signed-off-by: Elijah Newren <>
Signed-off-by: Junio C Hamano <>
2023-06-21 13:39:54 -07:00
Elijah Newren d1cbe1e6d8 hash-ll.h: split out of hash.h to remove dependency on repository.h
hash.h depends upon and includes repository.h, due to the definition and
use of the_hash_algo (defined as the_repository->hash_algo).  However,
most headers trying to include hash.h are only interested in the layout
of the structs like object_id.  Move the parts of hash.h that do not
depend upon repository.h into a new file hash-ll.h (the "low level"
parts of hash.h), and adjust other files to use this new header where
the convenience inline functions aren't needed.

This allows hash.h and object.h to be fairly small, minimal headers.  It
also exposes a lot of hidden dependencies on both path.h (which was
brought in by repository.h) and repository.h (which was previously
implicitly brought in by object.h), so also adjust other files to be
more explicit about what they depend upon.

Signed-off-by: Elijah Newren <>
Signed-off-by: Junio C Hamano <>
2023-04-24 12:47:32 -07:00
Elijah Newren ba3d1c73da treewide: remove unnecessary cache.h includes
We had several header files include cache.h unnecessarily.  Remove
those.  These have all been verified via both ensuring that
    gcc -E $HEADER | grep '"cache.h"'
found no hits and that
    cat >temp.c <<EOF &&
    #include "git-compat-util.h"
    #include "$HEADER"
    int main() {}
    gcc -c temp.c
successfully compiles without warnings.

Signed-off-by: Elijah Newren <>
Signed-off-by: Junio C Hamano <>
2023-02-23 17:25:28 -08:00
René Scharfe 5632e838f8 khash: clarify that allocations never fail
We use our standard allocation functions and macros (xcalloc,
ALLOC_ARRAY, REALLOC_ARRAY) in our version of khash.h.  They terminate
the program on error instead, so code that's using them doesn't have to
handle allocation failures.  Make this behavior explicit by turning
kh_resize_ into a void function and removing the related unreachable
error handling code.

Helped-by: Jeff King <>
Signed-off-by: René Scharfe <>
Acked-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
2021-07-06 13:07:50 -07:00
Jeff King d40abc8e95 hashmap: convert sha1hash() to oidhash()
There are no callers left of sha1hash() that do not simply pass the
"hash" member of a "struct object_id". Let's get rid of the outdated
sha1-specific function and provide one that operates on the whole struct
(even though the technique, taking the first few bytes of the hash, will
remain the same).

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
2019-06-20 10:44:22 -07:00
Jeff King e465e7c41d khash: rename oid helper functions
For use in object_id hash tables, we have oid_hash() and oid_equal().
But these are confusingly similar to the existing oideq() and the
oidhash() we plan to add to replace sha1hash().

The big difference from those functions is that rather than accepting a
const pointer to the "struct object_id", we take the arguments by value
(which is a khash internal convention). So let's make that obvious by
calling them oidhash_by_value() and oideq_by_value().

Those names are fairly horrendous to type, but we rarely need to do so;
they are passed to the khash implementation macro and then only used
internally. Callers get to use the nice kh_put_oid_map(), etc.

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
2019-06-20 10:37:55 -07:00
Jeff King 685d34a96e khash: drop sha1-specific map types
All of the callers of khash_sha1 and khash_sha1_pos have been removed,
in favor of using maps that use "struct object_id" as their keys. Let's
drop these now-obsolete types.

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
2019-06-20 10:37:41 -07:00
Jeff King 8fbb558af4 khash: rename kh_oid_t to kh_oid_set
khash lets us define a hash as either a map or a set (i.e., with no
"value" type). For the oid maps we define, "oid" is the set and
"oid_map" is the map. As the bug in the previous commit shows, it's easy
to pick the wrong one.

So let's make the names more distinct: "oid_set" and "oid_map".

An alternative naming scheme would be to actually name the type after
the key/value types. So e.g., "oid" _would_ be the set, since it has no
value type. And "oid_map" would become "oid_void" or similar (and
"oid_pos" becomes "oid_int"). That's better in some ways: it's more
regular, and a given map type can be more reasily reused in multiple
contexts (e.g., something storing an "int" that isn't a "pos"). But it's
also slightly less descriptive.

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
2019-06-20 10:27:48 -07:00
Jeff King 4ed43d16d7 khash: drop broken oid_map typedef
Commit 5a8643eff1 (khash: move oid hash table definition, 2019-02-19)
added a khash "oid_map" type to match the existing "oid" type, which is
a simple set (i.e., just keys, no values). But in setting up the
khash_oid_map typedef, it accidentally referred to "kh_oid_t", which is
the set type.

Nobody noticed the breakage because there are not yet any callers; the
type was added just as a match to the existing sha1 types (whose map
type confusingly _is_ called khash_sha1, and it has no matching set

We could easily fix this with s/oid/oid_map/ in the typedef. But let's
take this a step further, and just drop the typedef entirely.  These
typedefs were added by 5a8643eff1 to match the khash_sha1 typedefs. But
the actual khash-derived type names are descriptive enough; this is just
adding an extra layer of indirection. The khash names do not quite
follow our usual style (e.g., they end in "_t"), but since we end up
using other khash names (e.g., khiter_t, kh_get_oid()) anyway, just
typedef-ing the struct name is not really helping much.

And there are already many cases where we use the raw khash type names
anyway (e.g., the "set" variant defined just above us does not have such
a typedef!).

So let's drop this typedef, and the matching oid_pos one (which actually
_does_ have a user, but we can easily convert it).

We'll leave the khash_sha1 typedef around. The ultimate fate of its
callers should be conversion to kh_oid_map_t, so there's no point in
going through the noise of changing the names now.

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
2019-06-20 10:21:27 -07:00
Junio C Hamano 4aeeef3773 Merge branch 'dl/no-extern-in-func-decl'
Mechanically and systematically drop "extern" from function

* dl/no-extern-in-func-decl:
  *.[ch]: manually align parameter lists
  *.[ch]: remove extern from function declarations using sed
  *.[ch]: remove extern from function declarations using spatch
2019-05-13 23:50:32 +09:00
Denton Liu ad6dad0996 *.[ch]: manually align parameter lists
In previous patches, extern was mechanically removed from function
declarations without care to formatting, causing parameter lists to be
misaligned. Manually format changed sections such that the parameter
lists should be realigned.

Viewing this patch with 'git diff -w' should produce no output.

Signed-off-by: Denton Liu <>
Signed-off-by: Junio C Hamano <>
2019-05-05 15:20:10 +09:00
Denton Liu b199d7147a *.[ch]: remove extern from function declarations using sed
There has been a push to remove extern from function declarations.
Finish the job by removing all instances of "extern" for function
declarations in headers using sed.

This was done by running the following on my system with sed 4.2.2:

    $ git ls-files \*.{c,h} |
        grep -v ^compat/ |
        xargs sed -i'' -e 's/^\(\s*\)extern \([^(]*([^*]\)/\1\2/'

Files under `compat/` are intentionally excluded as some are directly
copied from external sources and we should avoid churning them as much
as possible.

Then, leftover instances of extern were found by running

    $ git grep -w -C3 extern \*.{c,h}

and manually checking the output. No other instances were found.

Note that the regex used specifically excludes function variables which
_should_ be left as extern.

Not the most elegant way to do it but it gets the job done.

Signed-off-by: Denton Liu <>
Signed-off-by: Junio C Hamano <>
2019-05-05 15:20:08 +09:00
brian m. carlson 5a8643eff1 khash: move oid hash table definition
Move the oid khash table definition to khash.h and define a typedef for
it, similar to the one we have for unsigned char pointers. Define
variants that are maps as well.

Signed-off-by: brian m. carlson <>
Signed-off-by: Junio C Hamano <>
2019-04-01 11:57:37 +09:00
Carlo Marcelo Arenas Belón d99ea5f9c5 khash: silence -Wunused-function for delta-islands
showing the following when compiled with latest clang (OpenBSD, Fedors
and macOS):

delta-islands.c:23:1: warning: unused function 'kh_destroy_str'
delta-islands.c:23:1: warning: unused function 'kh_clear_str'
delta-islands.c:23:1: warning: unused function 'kh_del_str' [-Wunused-function]

Reported-by: René Scharfe <>
Suggested-by: Nguyễn Thái Ngọc Duy <>
Signed-off-by: Carlo Marcelo Arenas Belón <>
Signed-off-by: Junio C Hamano <>
2018-10-24 14:52:50 +09:00
René Scharfe 9249ca26ac khash: factor out kh_release_*
Add a function for releasing the khash-internal allocations, but not the
khash structure itself.  It can be used with on-stack khash structs.

Signed-off-by: Rene Scharfe <>
Signed-off-by: Junio C Hamano <>
2018-10-04 11:12:13 -07:00
Elijah Newren ef3ca95475 Add missing includes and forward declarations
I looped over the toplevel header files, creating a temporary two-line C
program for each consisting of
  #include "git-compat-util.h"
  #include $HEADER
This patch is the result of manually fixing errors in compiling those
tiny programs.

Signed-off-by: Elijah Newren <>
Signed-off-by: Junio C Hamano <>
2018-08-15 11:52:09 -07:00
Jeff King b32fa95fd8 convert trivial cases to ALLOC_ARRAY
Each of these cases can be converted to use ALLOC_ARRAY or
REALLOC_ARRAY, which has two advantages:

  1. It automatically checks the array-size multiplication
     for overflow.

  2. It always uses sizeof(*array) for the element-size,
     so that it can never go out of sync with the declared
     type of the array.

Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
2016-02-22 14:51:09 -08:00
René Scharfe 2756ca4347 use REALLOC_ARRAY for changing the allocation size of arrays
Signed-off-by: Rene Scharfe <>
Signed-off-by: Junio C Hamano <>
2014-09-18 09:13:42 -07:00
Karsten Blees 039dc71a7c hashmap: factor out getting a hash code from a SHA1
Copying the first bytes of a SHA1 is duplicated in six places,
however, the implications (the actual value would depend on the
endianness of the platform) is documented only once.

Add a properly documented API for this.

Signed-off-by: Karsten Blees <>
Signed-off-by: Junio C Hamano <>
2014-07-07 13:56:24 -07:00
Vicent Marti fff42755ef pack-bitmap: add support for bitmap indexes
A bitmap index is a `.bitmap` file that can be found inside
`$GIT_DIR/objects/pack/`, next to its corresponding packfile, and
contains precalculated reachability information for selected commits.
The full specification of the format for these bitmap indexes can be found
in `Documentation/technical/bitmap-format.txt`.

For a given commit SHA1, if it happens to be available in the bitmap
index, its bitmap will represent every single object that is reachable
from the commit itself. The nth bit in the bitmap is the nth object in
the packfile; if it's set to 1, the object is reachable.

By using the bitmaps available in the index, this commit implements
several new functions:

	- `prepare_bitmap_git`
	- `prepare_bitmap_walk`
	- `traverse_bitmap_commit_list`
	- `reuse_partial_packfile_from_bitmap`

The `prepare_bitmap_walk` function tries to build a bitmap of all the
objects that can be reached from the commit roots of a given `rev_info`
struct by using the following algorithm:

- If all the interesting commits for a revision walk are available in
the index, the resulting reachability bitmap is the bitwise OR of all
the individual bitmaps.

- When the full set of WANTs is not available in the index, we perform a
partial revision walk using the commits that don't have bitmaps as
roots, and limiting the revision walk as soon as we reach a commit that
has a corresponding bitmap. The earlier OR'ed bitmap with all the
indexed commits can now be completed as this walk progresses, so the end
result is the full reachability list.

- For revision walks with a HAVEs set (a set of commits that are deemed
uninteresting), first we perform the same method as for the WANTs, but
using our HAVEs as roots, in order to obtain a full reachability bitmap
of all the uninteresting commits. This bitmap then can be used to:

	a) limit the subsequent walk when building the WANTs bitmap
	b) finding the final set of interesting commits by performing an
	   AND-NOT of the WANTs and the HAVEs.

If `prepare_bitmap_walk` runs successfully, the resulting bitmap is
stored and the equivalent of a `traverse_commit_list` call can be
performed by using `traverse_bitmap_commit_list`; the bitmap version
of this call yields the objects straight from the packfile index
(without having to look them up or parse them) and hence is several
orders of magnitude faster.

As an extra optimization, when `prepare_bitmap_walk` succeeds, the
`reuse_partial_packfile_from_bitmap` call can be attempted: it will find
the amount of objects at the beginning of the on-disk packfile that can
be reused as-is, and return an offset into the packfile. The source
packfile can then be loaded and the bytes up to `offset` can be written
directly to the result without having to consider the entires inside the
packfile individually.

If the `prepare_bitmap_walk` call fails (e.g. because no bitmap files
are available), the `rev_info` struct is left untouched, and can be used
to perform a manual rev-walk using `traverse_commit_list`.

Hence, this new set of functions are a generic API that allows to
perform the equivalent of

	git rev-list --objects [roots...] [^uninteresting...]

for any set of commits, even if they don't have specific bitmaps
generated for them.

In further patches, we'll use this bitmap traversal optimization to
speed up the `pack-objects` and `rev-list` commands.

Signed-off-by: Vicent Marti <>
Signed-off-by: Jeff King <>
Signed-off-by: Junio C Hamano <>
2013-12-30 12:19:22 -08:00