2023-04-11 09:41:56 +02:00
|
|
|
#include "git-compat-util.h"
|
2005-05-19 01:14:22 +02:00
|
|
|
#include "tag.h"
|
2005-04-18 20:39:48 +02:00
|
|
|
#include "commit.h"
|
2018-04-10 14:56:05 +02:00
|
|
|
#include "commit-graph.h"
|
2023-03-21 07:25:57 +01:00
|
|
|
#include "environment.h"
|
2023-03-21 07:25:54 +01:00
|
|
|
#include "gettext.h"
|
2023-02-24 01:09:27 +01:00
|
|
|
#include "hex.h"
|
2018-05-16 01:42:16 +02:00
|
|
|
#include "repository.h"
|
2023-04-11 09:41:49 +02:00
|
|
|
#include "object-name.h"
|
2023-05-16 08:34:06 +02:00
|
|
|
#include "object-store-ll.h"
|
2006-10-30 20:09:06 +01:00
|
|
|
#include "pkt-line.h"
|
2006-12-25 20:48:35 +01:00
|
|
|
#include "utf8.h"
|
2007-04-09 11:34:05 +02:00
|
|
|
#include "diff.h"
|
|
|
|
#include "revision.h"
|
2009-10-09 12:21:57 +02:00
|
|
|
#include "notes.h"
|
2018-05-15 23:48:42 +02:00
|
|
|
#include "alloc.h"
|
commit: teach --gpg-sign option
This uses the gpg-interface.[ch] to allow signing the commit, i.e.
$ git commit --gpg-sign -m foo
You need a passphrase to unlock the secret key for
user: "Junio C Hamano <gitster@pobox.com>"
4096-bit RSA key, ID 96AFE6CB, created 2011-10-03 (main key ID 713660A7)
[master 8457d13] foo
1 files changed, 1 insertions(+), 0 deletions(-)
The lines of GPG detached signature are placed in a new multi-line header
field, instead of tucking the signature block at the end of the commit log
message text (similar to how signed tag is done), for multiple reasons:
- The signature won't clutter output from "git log" and friends if it is
in the extra header. If we place it at the end of the log message, we
would need to teach "git log" and friends to strip the signature block
with an option.
- Teaching new versions of "git log" and "gitk" to optionally verify and
show signatures is cleaner if we structurally know where the signature
block is (instead of scanning in the commit log message).
- The signature needs to be stripped upon various commit rewriting
operations, e.g. rebase, filter-branch, etc. They all already ignore
unknown headers, but if we place signature in the log message, all of
these tools (and third-party tools) also need to learn what a signature
block looks like.
- When we added the optional encoding header, all the tools (both in tree
and third-party) that act on the raw commit object should have been
fixed to ignore headers they do not understand, so a new header is no
more likely to break them than extra text in the commit.
A commit made with the above sample sequence would look like this:
$ git cat-file commit HEAD
tree 3cd71d90e3db4136e5260ab54599791c4f883b9d
parent b87755351a47b09cb27d6913e6e0e17e6254a4d4
author Junio C Hamano <gitster@pobox.com> 1317862251 -0700
committer Junio C Hamano <gitster@pobox.com> 1317862251 -0700
gpgsig -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJOjPtrAAoJELC16IaWr+bL4TMP/RSe2Y/jYnCkds9unO5JEnfG
...
=dt98
-----END PGP SIGNATURE-----
foo
but "git log" (unless you ask for it with --pretty=raw) output is not
cluttered with the signature information.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-06 02:23:20 +02:00
|
|
|
#include "gpg-interface.h"
|
2012-04-01 00:10:39 +02:00
|
|
|
#include "mergesort.h"
|
commit-slab: introduce a macro to define a slab for new type
Introduce a header file to define a macro that can define the struct
type, initializer, accessor and cleanup functions to manage a commit
slab. Update the "indegree" topological sort facility using it.
To associate 32 flag bits with each commit, you can write:
define_commit_slab(flag32, uint32);
to declare "struct flag32" type, define an instance of it with
struct flag32 flags;
and initialize it by calling
init_flag32(&flags);
After that, a call to flag32_at() function
uint32 *fp = flag32_at(&flags, commit);
will return a pointer pointing at a uint32 for that commit. Once
you are done with these flags, clean them up with
clear_flag32(&flags);
Callers that cannot hard-code at compile time how wide the data
associated with each commit will be can use the "_with_stride"
variant to initialize the slab.
Suppose you want to give one bit per existing ref, and paint commits
down to find which refs are descendants of each commit. Saying
typedef uint32 bits320[5];
define_commit_slab(flagbits, bits320);
at compile time will still limit your code with hard-coded limit,
because you may find that you have more than 320 refs at runtime.
The code can declare a commit slab "struct flagbits" like this
instead:
define_commit_slab(flagbits, unsigned char);
struct flagbits flags;
and initialize it by:
nrefs = ... count number of refs ...
init_flagbits_with_stride(&flags, (nrefs + 7) / 8);
so that
unsigned char *fp = flagbits_at(&flags, commit);
will return a pointer pointing at an array of 40 "unsigned char"s
associated with the commit, once you figure out nrefs is 320 at
runtime.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-13 20:56:41 +02:00
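The stride arithmetic described in the message above can be tried outside of Git. What follows is only a minimal sketch and does not use commit-slab.h: `bitmap_stride`, `set_ref_bit`, and `test_ref_bit` are hypothetical helper names introduced here for illustration, and the byte array stands in for whatever `flagbits_at()` would hand back for one commit.

```c
#include <stddef.h>

/* Hypothetical helper: bytes needed to hold one bit per ref. */
static size_t bitmap_stride(size_t nrefs)
{
	return (nrefs + 7) / 8;
}

/* Hypothetical helper: mark ref number `ref` in a per-commit byte array. */
static void set_ref_bit(unsigned char *bits, size_t ref)
{
	bits[ref / 8] |= (unsigned char)(1u << (ref % 8));
}

/* Hypothetical helper: return 1 if ref number `ref` is marked, else 0. */
static int test_ref_bit(const unsigned char *bits, size_t ref)
{
	return (bits[ref / 8] >> (ref % 8)) & 1;
}
```

With 320 refs the stride comes out as 40 bytes, which matches the array size quoted in the commit message; rounding up with `+ 7` means 321 refs would need 41.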
|
|
|
#include "commit-slab.h"
|
2013-06-07 06:58:12 +02:00
|
|
|
#include "prio-queue.h"
|
2020-12-31 12:56:23 +01:00
|
|
|
#include "hash-lookup.h"
|
interpret-trailers: honor the cut line
If a commit message is edited with the "verbose" option, the buffer
will have a cut line and diff after the log message, like so:
my subject
# ------------------------ >8 ------------------------
# Do not touch the line above.
# Everything below will be removed.
diff --git a/foo.txt b/foo.txt
index 5716ca5..7601807 100644
--- a/foo.txt
+++ b/foo.txt
@@ -1 +1 @@
-bar
+baz
"git interpret-trailers" is unaware of the cut line, and assumes the
trailer block would be at the end of the whole thing. This can easily
be seen with:
$ GIT_EDITOR='git interpret-trailers --in-place --trailer Acked-by:me' \
git commit --amend -v
Teach "git interpret-trailers" to notice the cut-line and ignore the
remainder of the input when looking for a place to add new trailer
block. This makes it consistent with how "git commit -v -s" inserts a
new Signed-off-by: line.
This can be done with the same logic that the existing helper function,
wt_status_truncate_message_at_cut_line(), uses, but that helper wants the
caller to pass a strbuf to it. Because the function ignore_non_trailer() used
by the command takes a <pointer, length> pair, not a strbuf, steal the
logic from wt_status_truncate_message_at_cut_line() to create a new
wt_status_locate_end() helper function that takes a <pointer, length>
pair, and make ignore_non_trailer() call it to help "interpret-trailers".
Since there is only one caller of wt_status_truncate_message_at_cut_line()
in cmd_commit(), rewrite it to call wt_status_locate_end() helper instead
and remove the old helper that no longer has any caller.
Signed-off-by: Brian Malehorn <bmalehorn@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-16 08:06:49 +02:00
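The idea of locating the cut line in a <pointer, length> pair can be sketched in isolation. This is not the code of wt_status_locate_end(); it is a simplified illustration that assumes the comment character is "#" and hard-codes the cut line shown in the message above. The names `cut_line_sketch` and `locate_end_sketch` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: "#" is the comment character and this is the full cut line. */
static const char cut_line_sketch[] =
	"# ------------------------ >8 ------------------------\n";

/*
 * Hypothetical helper: return the length of the message that precedes
 * the cut line, or `len` unchanged when there is no cut line.
 */
static size_t locate_end_sketch(const char *s, size_t len)
{
	const size_t cut_len = sizeof(cut_line_sketch) - 1;
	const char *nl;
	size_t i = 0;

	while (i + cut_len <= len) {
		/* The cut line only counts when it starts a line. */
		if (!memcmp(s + i, cut_line_sketch, cut_len))
			return i;
		/* Otherwise skip ahead to the start of the next line. */
		nl = memchr(s + i, '\n', len - i);
		if (!nl)
			break;
		i = (size_t)(nl - s) + 1;
	}
	return len;
}
```

A caller looking for the trailer block would then treat only the first `locate_end_sketch(buf, len)` bytes as the log message and ignore the diff that follows.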
|
|
|
#include "wt-status.h"
|
Deprecate support for .git/info/grafts
The grafts feature was a convenient way to "stitch together" ancient
history to the fresh start of linux.git.
Its implementation is, however, not up to Git's standards, as there are
too many ways where it can lead to surprising and unwelcome behavior.
For example, when pushing from a repository with active grafts, it is
possible to miss commits that have been "grafted out", resulting in a
broken state on the other side.
Also, the grafts feature is limited to "rewriting" commits' list of
parents, it cannot replace anything else.
The much younger feature implemented as `git replace` set out to remedy
those limitations and dangerous bugs.
Seeing as `git replace` is pretty mature by now (since 4228e8bc98
(replace: add --graft option, 2014-07-19) it can perform the graft
file's duties), it is time to deprecate support for the graft file, and
to retire it eventually.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-29 00:44:44 +02:00
|
|
|
#include "advice.h"
|
2018-09-05 00:00:08 +02:00
|
|
|
#include "refs.h"
|
2018-11-02 03:04:55 +01:00
|
|
|
#include "commit-reach.h"
|
2019-10-15 12:25:31 +02:00
|
|
|
#include "run-command.h"
|
2023-03-21 07:26:05 +01:00
|
|
|
#include "setup.h"
|
2020-04-30 21:48:50 +02:00
|
|
|
#include "shallow.h"
|
2023-04-22 22:17:26 +02:00
|
|
|
#include "tree.h"
|
2021-12-22 04:59:40 +01:00
|
|
|
#include "hook.h"
|
2005-04-18 20:39:48 +02:00
|
|
|
|
2012-09-15 22:58:15 +02:00
|
|
|
static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **);
|
|
|
|
|
[PATCH] Avoid wasting memory in git-rev-list
As pointed out on the list, git-rev-list can use a lot of memory.
One low-hanging fruit is to free the commit buffer for commits that we
parse. By default, parse_commit() will save away the buffer, since a lot
of cases do want it, and re-reading it continually would be unnecessary.
However, in many cases the buffer isn't actually necessary and saving it
just wastes memory.
We could just free the buffer ourselves, but especially in git-rev-list,
we actually end up using the helper functions that automatically add
parent commits to the commit lists, so we don't actually control the
commit parsing directly.
Instead, just make this behaviour of "parse_commit()" a global flag.
Maybe this is a bit tasteless, but it's very simple, and it makes a
noticeable difference in memory usage.
Before the change:
[torvalds@g5 linux]$ /usr/bin/time git-rev-list v2.6.12..HEAD > /dev/null
0.26user 0.02system 0:00.28elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+3714minor)pagefaults 0swaps
after the change:
[torvalds@g5 linux]$ /usr/bin/time git-rev-list v2.6.12..HEAD > /dev/null
0.26user 0.00system 0:00.27elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2433minor)pagefaults 0swaps
Note how the minor faults have decreased from 3714 pages to 2433 pages.
That's all due to the fewer anonymous pages allocated to hold the commit
buffers and their metadata.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-15 23:43:17 +02:00
|
|
|
int save_commit_buffer = 1;
|
2021-08-23 12:44:02 +02:00
|
|
|
int no_graft_file_deprecated_advice;
|
2005-09-15 23:43:17 +02:00
|
|
|
|
2005-04-18 20:39:48 +02:00
|
|
|
const char *commit_type = "commit";
|
|
|
|
|
2018-06-29 03:22:21 +02:00
|
|
|
struct commit *lookup_commit_reference_gently(struct repository *r,
|
2018-06-29 03:21:57 +02:00
|
|
|
const struct object_id *oid, int quiet)
|
2005-05-19 01:14:22 +02:00
|
|
|
{
|
2018-06-29 03:22:21 +02:00
|
|
|
struct object *obj = deref_tag(r,
|
|
|
|
parse_object(r, oid),
|
2018-06-29 03:21:51 +02:00
|
|
|
NULL, 0);
|
2005-05-19 01:14:22 +02:00
|
|
|
|
|
|
|
if (!obj)
|
|
|
|
return NULL;
|
2020-06-17 11:14:08 +02:00
|
|
|
return object_as_type(obj, OBJ_COMMIT, quiet);
|
2005-08-21 11:51:10 +02:00
|
|
|
}
|
|
|
|
|
2018-06-29 03:22:22 +02:00
|
|
|
struct commit *lookup_commit_reference(struct repository *r, const struct object_id *oid)
|
2005-08-21 11:51:10 +02:00
|
|
|
{
|
2018-06-29 03:22:22 +02:00
|
|
|
return lookup_commit_reference_gently(r, oid, 0);
|
2005-05-19 01:14:22 +02:00
|
|
|
}
|
|
|
|
|
Convert lookup_commit* to struct object_id
Convert lookup_commit, lookup_commit_or_die,
lookup_commit_reference, and lookup_commit_reference_gently to take
struct object_id arguments.
Introduce a temporary in parse_object buffer in order to convert this
function. This is required since in order to convert parse_object and
parse_object_buffer, lookup_commit_reference_gently and
lookup_commit_or_die would need to be converted. Not introducing a
temporary would therefore require that lookup_commit_or_die take a
struct object_id *, but lookup_commit would take unsigned char *,
leaving a confusing and hard-to-use interface.
parse_object_buffer will lose this temporary in a later patch.
This commit was created with manual changes to commit.c, commit.h, and
object.c, plus the following semantic patch:
@@
expression E1, E2;
@@
- lookup_commit_reference_gently(E1.hash, E2)
+ lookup_commit_reference_gently(&E1, E2)
@@
expression E1, E2;
@@
- lookup_commit_reference_gently(E1->hash, E2)
+ lookup_commit_reference_gently(E1, E2)
@@
expression E1;
@@
- lookup_commit_reference(E1.hash)
+ lookup_commit_reference(&E1)
@@
expression E1;
@@
- lookup_commit_reference(E1->hash)
+ lookup_commit_reference(E1)
@@
expression E1;
@@
- lookup_commit(E1.hash)
+ lookup_commit(&E1)
@@
expression E1;
@@
- lookup_commit(E1->hash)
+ lookup_commit(E1)
@@
expression E1, E2;
@@
- lookup_commit_or_die(E1.hash, E2)
+ lookup_commit_or_die(&E1, E2)
@@
expression E1, E2;
@@
- lookup_commit_or_die(E1->hash, E2)
+ lookup_commit_or_die(E1, E2)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-07 00:10:10 +02:00
|
|
|
struct commit *lookup_commit_or_die(const struct object_id *oid, const char *ref_name)
|
2011-09-17 13:57:45 +02:00
|
|
|
{
|
2018-06-29 03:21:58 +02:00
|
|
|
struct commit *c = lookup_commit_reference(the_repository, oid);
|
2011-09-17 13:57:45 +02:00
|
|
|
if (!c)
|
|
|
|
die(_("could not parse %s"), ref_name);
|
2018-08-28 23:22:48 +02:00
|
|
|
if (!oideq(oid, &c->object.oid)) {
|
2011-09-17 13:57:45 +02:00
|
|
|
warning(_("%s %s is not a commit!"),
|
2017-05-07 00:10:10 +02:00
|
|
|
ref_name, oid_to_hex(oid));
|
2011-09-17 13:57:45 +02:00
|
|
|
}
|
|
|
|
return c;
|
|
|
|
}
|
|
|
|
|
2022-10-17 15:17:40 +02:00
|
|
|
struct commit *lookup_commit_object(struct repository *r,
|
|
|
|
const struct object_id *oid)
|
|
|
|
{
|
|
|
|
struct object *obj = parse_object(r, oid);
|
|
|
|
return obj ? object_as_type(obj, OBJ_COMMIT, 0) : NULL;
|
|
|
|
|
|
|
|
}
|
|
|
|
|
2018-06-29 03:22:10 +02:00
|
|
|
struct commit *lookup_commit(struct repository *r, const struct object_id *oid)
|
2005-04-18 20:39:48 +02:00
|
|
|
{
|
2019-06-20 09:41:14 +02:00
|
|
|
struct object *obj = lookup_object(r, oid);
|
2014-07-13 08:41:55 +02:00
|
|
|
if (!obj)
|
2019-06-20 09:41:21 +02:00
|
|
|
return create_object(r, oid, alloc_commit_node(r));
|
2020-06-17 11:14:08 +02:00
|
|
|
return object_as_type(obj, OBJ_COMMIT, 0);
|
2005-04-18 20:39:48 +02:00
|
|
|
}
|
|
|
|
|
2010-11-02 20:59:07 +01:00
|
|
|
struct commit *lookup_commit_reference_by_name(const char *name)
|
|
|
|
{
|
2015-03-14 00:39:34 +01:00
|
|
|
struct object_id oid;
|
2010-11-02 20:59:07 +01:00
|
|
|
struct commit *commit;
|
|
|
|
|
2023-03-28 15:58:46 +02:00
|
|
|
if (repo_get_oid_committish(the_repository, name, &oid))
|
2010-11-02 20:59:07 +01:00
|
|
|
return NULL;
|
2018-06-29 03:21:58 +02:00
|
|
|
commit = lookup_commit_reference(the_repository, &oid);
|
2023-03-28 15:58:48 +02:00
|
|
|
if (repo_parse_commit(the_repository, commit))
|
2010-11-02 20:59:07 +01:00
|
|
|
return NULL;
|
|
|
|
return commit;
|
|
|
|
}
|
|
|
|
|
2017-04-26 21:29:31 +02:00
|
|
|
static timestamp_t parse_commit_date(const char *buf, const char *tail)
|
2005-04-18 20:39:48 +02:00
|
|
|
{
|
2008-01-19 18:35:23 +01:00
|
|
|
const char *dateptr;
|
parse_commit(): parse timestamp from end of line
To find the committer timestamp, we parse left-to-right looking for the
closing ">" of the email, and then expect the timestamp right after
that. We've seen some broken cases in the wild where this fails, even
though we _could_ find the timestamp with a little extra work. E.g.:
Name <Name<email>> 123456789 -0500
This means that features that rely on the committer timestamp, like
--since or --until, will treat the commit as happening at time 0 (i.e.,
1970).
This is doubly confusing because the pretty-print parser learned to
handle these in 03818a4a94 (split_ident: parse timestamp from end of
line, 2013-10-14). So printing them via "git show", etc, makes
everything look normal, but --until, etc are still broken (despite the
fact that that commit explicitly mentioned --until!).
So let's use the same trick as 03818a4a94: find the end of the line, and
parse back to the final ">". In theory we could use split_ident_line()
here, but it's actually a bit more strict. In particular, it requires a
valid time-zone token, too. That should be present, of course, but we
wouldn't want to break --until for cases that are working currently.
We might want to teach split_ident_line() to become more lenient there,
but it would require checking its many callers (since right now they can
assume that if date_start is non-NULL, so is tz_start).
So for now we'll just reimplement the same trick in the commit parser.
The test is in t4212, which already covers similar cases, courtesy of
03818a4a94. We'll just adjust the broken commit to munge both the author
and committer timestamps. Note that we could match (author|committer)
here, but alternation can't be used portably in sed. Since we wouldn't
expect to see ">" except as part of an ident line, we can just match
that character on any line.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-27 10:14:09 +02:00
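The "find the end of the line, then walk back to the final '>'" trick from the message above can be shown on its own. This is a standalone sketch, not the body of parse_commit_date(); `find_date_start` is a hypothetical name, and it assumes the caller has already positioned `line` at the start of the committer line.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: given one ident line in [line, tail), return a
 * pointer just past the last '>' before the newline, or NULL when the
 * line has no newline or no '>' at all.
 */
static const char *find_date_start(const char *line, const char *tail)
{
	const char *eol = memchr(line, '\n', (size_t)(tail - line));
	const char *p;

	if (!eol)
		return NULL;
	/* Walk backwards from end-of-line to the final '>'. */
	p = eol;
	while (p > line && p[-1] != '>')
		p--;
	return p == line ? NULL : p;
}
```

For the malformed ident quoted in the message, a left-to-right scan would stop at the first ">" and find another ">" instead of digits, whereas the backward walk lands just before the timestamp.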
|
|
|
const char *eol;
|
2005-04-18 20:39:48 +02:00
|
|
|
|
2008-01-19 18:35:23 +01:00
|
|
|
if (buf + 6 >= tail)
|
|
|
|
return 0;
|
2005-04-18 20:39:48 +02:00
|
|
|
if (memcmp(buf, "author", 6))
|
|
|
|
return 0;
|
2008-01-19 18:35:23 +01:00
|
|
|
while (buf < tail && *buf++ != '\n')
|
2005-04-18 20:39:48 +02:00
|
|
|
/* nada */;
|
2008-01-19 18:35:23 +01:00
|
|
|
if (buf + 9 >= tail)
|
|
|
|
return 0;
|
2005-04-18 20:39:48 +02:00
|
|
|
if (memcmp(buf, "committer", 9))
|
|
|
|
return 0;
|
2023-04-27 10:14:09 +02:00
|
|
|
|
|
|
|
/*
 * Jump to end-of-line so that we can walk backwards to find the
 * end-of-email ">". This is more forgiving of malformed cases
 * because unexpected characters tend to be in the name and email
 * fields.
 */
|
|
|
|
eol = memchr(buf, '\n', tail - buf);
|
|
|
|
if (!eol)
|
2008-01-19 18:35:23 +01:00
|
|
|
return 0;
|
2023-04-27 10:14:09 +02:00
|
|
|
dateptr = eol;
|
|
|
|
while (dateptr > buf && dateptr[-1] != '>')
|
|
|
|
dateptr--;
|
parse_commit(): handle broken whitespace-only timestamp
The comment in parse_commit_date() claims that parse_timestamp() will
not walk past the end of the buffer we've been given, since it will hit
the newline at "eol" and stop. This is usually true, when dateptr
contains actual numbers to parse. But with a line like:
committer name <email> \n
with just whitespace, and no numbers, parse_timestamp() will consume
that newline as part of the leading whitespace, and we may walk past our
"tail" pointer (which itself is set from the "size" parameter passed in
to parse_commit_buffer()).
In practice this can't cause us to walk off the end of an array, because
we always add an extra NUL byte to the end of objects we load from disk
(as a defense against exactly this kind of bug). However, you can see
the behavior in action when "committer" is the final header (which it
usually is, unless there's an encoding) and the subject line can be
parsed as an integer. We walk right past the newline on the committer
line, as well as the "\n\n" separator, and mistake the subject for the
timestamp.
We can solve this by trimming the whitespace ourselves, making sure that
it has some non-whitespace to parse. Note that we need to be a bit
careful about the definition of "whitespace" here, as our isspace()
doesn't match exotic characters like vertical tab or formfeed. We can
work around that by checking for an actual number (see the in-code
comment). This is slightly more restrictive than the current code, but
in practice the results are either the same (we reject "foo" as "0", but
so would parse_timestamp()) or extremely unlikely even for broken
commits (parse_timestamp() would allow "\v123" as "123", but we'll now
make it "0").
I did also allow "-" here, which may be controversial, as we don't
currently support negative timestamps. My reasoning was two-fold. One,
the design of parse_timestamp() is such that we should be able to easily
switch it to handling signed values, and this otherwise creates a
hard-to-find gotcha that anybody doing that work would get tripped up
on. And two, the status quo is that we currently parse them, though the
result of course ends up as a very large unsigned value (which is likely
to just get clamped to "0" for display anyway, since our date routines
can't handle it).
The new test checks the commit parser (via "--until") for both vanilla
spaces and the vertical-tab case. I also added a test to check these
against the pretty-print formatter, which uses split_ident_line(). It's
not subject to the same bug, because it already insists that there be
one or more digits in the timestamp.
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-27 10:17:15 +02:00
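The whitespace handling described in the message above can be illustrated with a small standalone function. This is a sketch under stated assumptions, not Git's code: `is_plain_space` is a hypothetical stand-in that approximates Git's own isspace() by matching only space, tab, newline, and carriage return; `parse_date_sketch` is likewise hypothetical, and strtoul() is used here in place of parse_timestamp().

```c
#include <ctype.h>
#include <stdlib.h>

/* Assumption: like Git's isspace(), this does not match "\v" or "\f". */
static int is_plain_space(char c)
{
	return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/*
 * Hypothetical helper: parse a timestamp in [p, eol), where *eol is the
 * newline that ends the committer line. Returns 0 when no number is found.
 */
static unsigned long parse_date_sketch(const char *p, const char *eol)
{
	/* Trim leading whitespace ourselves, never moving past eol. */
	while (p < eol && is_plain_space(*p))
		p++;
	/* Insist on a digit (or dash) so the parser cannot skip the newline. */
	if (!isdigit((unsigned char)*p) && *p != '-')
		return 0;
	return strtoul(p, NULL, 10);
}
```

With a whitespace-only timestamp followed by a numeric subject line, the digit check returns 0 at the newline instead of letting the number parser skip ahead and read the subject.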
|
|
|
if (dateptr == buf)
|
2008-01-19 18:35:23 +01:00
|
|
|
return 0;
|
2023-04-27 10:14:09 +02:00
|
|
|
|
2023-04-27 10:17:15 +02:00
	/*
	 * Trim leading whitespace, but make sure we have at least one
	 * non-whitespace character, as parse_timestamp() will otherwise walk
	 * right past the newline we found in "eol" when skipping whitespace
	 * itself.
	 *
	 * In theory it would be sufficient to allow any character not matched
	 * by isspace(), but there's a catch: our isspace() does not
	 * necessarily match the behavior of parse_timestamp(), as the latter
	 * is implemented by system routines which match more exotic control
	 * codes, or even locale-dependent sequences.
	 *
	 * Since we expect the timestamp to be a number, we can check for that.
	 * Anything else (e.g., a non-numeric token like "foo") would just
	 * cause parse_timestamp() to return 0 anyway.
	 */
	while (dateptr < eol && isspace(*dateptr))
		dateptr++;
	if (!isdigit(*dateptr) && *dateptr != '-')
		return 0;

	/*
	 * We know there is at least one digit (or dash), so we'll begin
	 * parsing there and stop at worst case at eol.
2023-04-27 10:17:24 +02:00
	 *
	 * Note that we may feed parse_timestamp() extra characters here if the
	 * commit is malformed, and it will parse as far as it can. For
	 * example, "123foo456" would return "123". That might be questionable
	 * (versus returning "0"), but it would help in a hypothetical case
	 * like "123456+0100", where the whitespace from the timezone is
	 * missing. Since such syntactic errors may be baked into history and
	 * hard to correct now, let's err on trying to make our best guess
	 * here, rather than insist on perfect syntax.
	 */
2017-04-21 12:45:44 +02:00
|
|
|
return parse_timestamp(dateptr, NULL, 10);
|
2005-04-18 20:39:48 +02:00
|
|
|
}
|
|
|
|
|
2021-01-28 07:20:23 +01:00
|
|
|
static const struct object_id *commit_graft_oid_access(size_t index, const void *table)
|
2014-02-26 19:49:22 +01:00
|
|
|
{
|
2021-01-28 07:20:23 +01:00
|
|
|
const struct commit_graft * const *commit_graft_table = table;
|
2021-01-28 07:19:42 +01:00
|
|
|
return &commit_graft_table[index]->oid;
|
2014-02-26 19:49:22 +01:00
|
|
|
}
|
|
|
|
|
2021-01-28 07:12:35 +01:00
|
|
|
int commit_graft_pos(struct repository *r, const struct object_id *oid)
|
2005-07-30 09:58:28 +02:00
|
|
|
{
|
2021-01-28 07:19:42 +01:00
|
|
|
return oid_pos(oid, r->parsed_objects->grafts,
|
|
|
|
r->parsed_objects->grafts_nr,
|
|
|
|
commit_graft_oid_access);
|
2005-07-30 09:58:28 +02:00
|
|
|
}
|
|
|
|
|
commit,shallow: unparse commits if grafts changed
When a commit is parsed, it pretends to have a different (possibly
empty) list of parents if there is graft information for that commit.
But there is a bug that could occur when a commit is parsed, the graft
information is updated (for example, when a shallow file is rewritten),
and the same commit is subsequently used: the parents of the commit do
not conform to the updated graft information, but the information at the
time of parsing.
This is usually not an issue, as a commit is usually introduced into the
repository at the same time as its graft information. That means that
when we try to parse that commit, we already have its graft information.
But it is an issue when fetching a shallow point directly into a
repository with submodules. The function
assign_shallow_commits_to_refs() parses all sought objects (including
the shallow point, which we are directly fetching). In update_shallow()
in fetch-pack.c, assign_shallow_commits_to_refs() is called before
commit_shallow_file(), which means that the shallow point would have
been parsed before graft information is updated. Once a commit is
parsed, it is no longer sensitive to any graft information updates. This
parsed commit is subsequently used when we do a revision walk to search
for submodules to fetch, meaning that the commit is considered to have
parents even though it is a shallow point (and therefore should be
treated as having no parents).
Therefore, whenever graft information is updated, mark the commits that
were previously grafts and the commits that are newly grafts as
unparsed.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-06 19:54:37 +02:00
static void unparse_commit(struct repository *r, const struct object_id *oid)
{
	struct commit *c = lookup_commit(r, oid);

	if (!c->object.parsed)
		return;
	free_commit_list(c->parents);
	c->parents = NULL;
	c->object.parsed = 0;
}
2018-05-18 00:51:48 +02:00
|
|
|
int register_commit_graft(struct repository *r, struct commit_graft *graft,
|
|
|
|
int ignore_dups)
|
2006-04-07 08:58:51 +02:00
|
|
|
{
|
2021-01-28 07:12:35 +01:00
|
|
|
int pos = commit_graft_pos(r, &graft->oid);
|
2007-06-07 09:04:01 +02:00
|
|
|
|
2006-04-07 08:58:51 +02:00
|
|
|
if (0 <= pos) {
|
|
|
|
if (ignore_dups)
|
|
|
|
free(graft);
|
|
|
|
else {
|
2018-05-18 00:51:48 +02:00
|
|
|
free(r->parsed_objects->grafts[pos]);
|
|
|
|
r->parsed_objects->grafts[pos] = graft;
|
2006-04-07 08:58:51 +02:00
|
|
|
}
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
pos = -pos - 1;
|
2018-05-18 00:51:48 +02:00
	ALLOC_GROW(r->parsed_objects->grafts,
		   r->parsed_objects->grafts_nr + 1,
		   r->parsed_objects->grafts_alloc);
	r->parsed_objects->grafts_nr++;
	if (pos < r->parsed_objects->grafts_nr)
		memmove(r->parsed_objects->grafts + pos + 1,
			r->parsed_objects->grafts + pos,
			(r->parsed_objects->grafts_nr - pos - 1) *
			sizeof(*r->parsed_objects->grafts));
	r->parsed_objects->grafts[pos] = graft;
|
|
|
unparse_commit(r, &graft->oid);
|
2006-04-07 08:58:51 +02:00
|
|
|
return 0;
|
|
|
|
}
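
register_commit_graft() relies on a lookup that returns the index when the entry exists and "-(insertion point) - 1" when it does not, so the caller can recover the slot with "pos = -pos - 1" and shift the tail with memmove(). A minimal sketch of that convention on a sorted array of ints follows; struct int_table, table_pos() and table_insert() are hypothetical names for illustration, and plain realloc() stands in for ALLOC_GROW().

```c
#include <stdlib.h>
#include <string.h>

struct int_table {
	int *items;
	size_t nr, alloc;
};

/* Returns the index if found, otherwise -(insertion point) - 1. */
static int table_pos(const struct int_table *t, int key)
{
	size_t lo = 0, hi = t->nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (t->items[mid] == key)
			return (int)mid;
		if (t->items[mid] < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -(int)lo - 1;
}

/* Returns 1 if key was already present, 0 if inserted, -1 on alloc failure. */
static int table_insert(struct int_table *t, int key)
{
	int pos = table_pos(t, key);

	if (0 <= pos)
		return 1;
	pos = -pos - 1;	/* recover the insertion point */
	if (t->nr + 1 > t->alloc) {
		size_t alloc = t->alloc ? 2 * t->alloc : 8;
		int *items = realloc(t->items, alloc * sizeof(*items));
		if (!items)
			return -1;
		t->items = items;
		t->alloc = alloc;
	}
	/* Shift everything at or after pos one slot to the right. */
	memmove(t->items + pos + 1, t->items + pos,
		(t->nr - pos) * sizeof(*t->items));
	t->items[pos] = key;
	t->nr++;
	return 0;
}
```

Encoding the insertion point in the negative return value saves a second search when the key is missing, which is why the caller above can insert directly after a failed lookup.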
|
|
|
|
|
2017-08-18 20:33:12 +02:00
|
|
|
struct commit_graft *read_graft_line(struct strbuf *line)
|
2006-04-07 08:58:51 +02:00
|
|
|
{
|
|
|
|
/* The format is just "Commit Parent1 Parent2 ...\n" */
|
2017-08-18 20:33:14 +02:00
|
|
|
int i, phase;
|
|
|
|
const char *tail = NULL;
|
2006-04-07 08:58:51 +02:00
|
|
|
struct commit_graft *graft = NULL;
|
2017-08-18 20:33:14 +02:00
|
|
|
struct object_id dummy_oid, *oid;
|
2006-04-07 08:58:51 +02:00
|
|
|
|
2017-08-18 20:33:12 +02:00
|
|
|
strbuf_rtrim(line);
|
|
|
|
if (!line->len || line->buf[0] == '#')
|
2006-04-16 23:24:56 +02:00
|
|
|
return NULL;
|
2017-08-18 20:33:14 +02:00
	/*
	 * phase 0 verifies line, counts hashes in line and allocates graft
	 * phase 1 fills graft
	 */
	for (phase = 0; phase < 2; phase++) {
		oid = graft ? &graft->oid : &dummy_oid;
		if (parse_oid_hex(line->buf, oid, &tail))
2006-04-07 08:58:51 +02:00
|
|
|
goto bad_graft_data;
|
2017-08-18 20:33:14 +02:00
		for (i = 0; *tail != '\0'; i++) {
			oid = graft ? &graft->parent[i] : &dummy_oid;
			if (!isspace(*tail++) || parse_oid_hex(tail, oid, &tail))
				goto bad_graft_data;
		}
		if (!graft) {
			graft = xmalloc(st_add(sizeof(*graft),
					       st_mult(sizeof(struct object_id), i)));
			graft->nr_parent = i;
		}
2006-04-07 08:58:51 +02:00
|
|
|
}
|
|
|
|
return graft;
|
2010-12-01 20:15:59 +01:00
|
|
|
|
|
|
|
bad_graft_data:
|
2017-08-18 20:33:12 +02:00
|
|
|
error("bad graft data: %s", line->buf);
|
2017-08-18 20:33:14 +02:00
|
|
|
assert(!graft);
|
2010-12-01 20:15:59 +01:00
|
|
|
return NULL;
|
2006-04-07 08:58:51 +02:00
|
|
|
}
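
The two-phase loop in read_graft_line() parses the same line twice: the first pass writes into a dummy slot to validate and count, and the second pass fills a structure sized from that count. The sketch below applies the same pattern to a line of decimal numbers instead of object IDs; struct record, parse_num() and read_record_line() are hypothetical names, and plain malloc() stands in for xmalloc().

```c
#include <ctype.h>
#include <stdlib.h>

struct record {
	long id;
	int nr_parent;
	long parent[];	/* flexible array member, sized in phase 0 */
};

/* Parse one decimal number at s; returns 0 on success and sets *end. */
static int parse_num(const char *s, long *out, const char **end)
{
	char *e;

	if (!isdigit((unsigned char)*s))
		return -1;
	*out = strtol(s, &e, 10);
	*end = e;
	return 0;
}

static struct record *read_record_line(const char *line)
{
	struct record *rec = NULL;
	const char *tail = NULL;
	long dummy, *dst;
	int i, phase;

	/* phase 0 validates and counts; phase 1 fills the allocated record */
	for (phase = 0; phase < 2; phase++) {
		dst = rec ? &rec->id : &dummy;
		if (parse_num(line, dst, &tail))
			goto bad;
		for (i = 0; *tail != '\0'; i++) {
			dst = rec ? &rec->parent[i] : &dummy;
			if (!isspace((unsigned char)*tail++) ||
			    parse_num(tail, dst, &tail))
				goto bad;
		}
		if (!rec) {
			rec = malloc(sizeof(*rec) + i * sizeof(rec->parent[0]));
			if (!rec)
				return NULL;
			rec->nr_parent = i;
		}
	}
	return rec;

bad:
	/* Errors can only occur in phase 0, before anything is allocated. */
	return NULL;
}
```

Because both passes run the same parsing code over the same input, any malformed line is rejected in phase 0, which is why the error path never has to free a partially filled record.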
|
|
|
|
|
2018-05-18 00:51:49 +02:00
|
|
|
static int read_graft_file(struct repository *r, const char *graft_file)
|
2005-07-30 09:58:28 +02:00
|
|
|
{
|
2017-05-03 12:16:50 +02:00
|
|
|
FILE *fp = fopen_or_warn(graft_file, "r");
|
2013-12-27 21:49:57 +01:00
|
|
|
struct strbuf buf = STRBUF_INIT;
|
2006-04-07 08:58:51 +02:00
|
|
|
if (!fp)
|
|
|
|
return -1;
|
2021-08-23 12:44:02 +02:00
|
|
|
if (!no_graft_file_deprecated_advice &&
|
|
|
|
advice_enabled(ADVICE_GRAFT_FILE_DEPRECATED))
|
Deprecate support for .git/info/grafts
The grafts feature was a convenient way to "stitch together" ancient
history to the fresh start of linux.git.
Its implementation is, however, not up to Git's standards, as there are
too many ways where it can lead to surprising and unwelcome behavior.
For example, when pushing from a repository with active grafts, it is
possible to miss commits that have been "grafted out", resulting in a
broken state on the other side.
Also, the grafts feature is limited to "rewriting" commits' list of
parents, it cannot replace anything else.
The much younger feature implemented as `git replace` set out to remedy
those limitations and dangerous bugs.
Seeing as `git replace` is pretty mature by now (since 4228e8bc98
(replace: add --graft option, 2014-07-19) it can perform the graft
file's duties), it is time to deprecate support for the graft file, and
to retire it eventually.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-29 00:44:44 +02:00
		advise(_("Support for <GIT_DIR>/info/grafts is deprecated\n"
			 "and will be removed in a future Git version.\n"
			 "\n"
			 "Please use \"git replace --convert-graft-file\"\n"
			 "to convert the grafts into replace refs.\n"
			 "\n"
			 "Turn this message off by running\n"
			 "\"git config advice.graftFileDeprecated false\""));
2013-12-27 21:49:57 +01:00
|
|
|
while (!strbuf_getwholeline(&buf, fp, '\n')) {
|
2005-07-30 09:58:28 +02:00
|
|
|
/* The format is just "Commit Parent1 Parent2 ...\n" */
|
2017-08-18 20:33:12 +02:00
|
|
|
struct commit_graft *graft = read_graft_line(&buf);
|
2006-04-16 23:24:56 +02:00
|
|
|
if (!graft)
|
|
|
|
continue;
|
2018-05-18 00:51:49 +02:00
|
|
|
if (register_commit_graft(r, graft, 1))
|
2013-12-27 21:49:57 +01:00
|
|
|
error("duplicate graft data: %s", buf.buf);
|
2005-07-30 09:58:28 +02:00
|
|
|
}
|
|
|
|
fclose(fp);
|
2013-12-27 21:49:57 +01:00
|
|
|
strbuf_release(&buf);
|
2006-04-07 08:58:51 +02:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-08-20 20:24:30 +02:00
|
|
|
void prepare_commit_graft(struct repository *r)
|
2006-04-07 08:58:51 +02:00
|
|
|
{
|
|
|
|
char *graft_file;
|
|
|
|
|
2018-05-18 00:51:53 +02:00
|
|
|
if (r->parsed_objects->commit_graft_prepared)
|
2006-04-07 08:58:51 +02:00
|
|
|
return;
|
prepare_commit_graft: treat non-repository as a noop
The parse_commit_buffer() function consults lookup_commit_graft()
to see if we need to rewrite parents. The latter will look
at $GIT_DIR/info/grafts. If you're outside of a repository,
then this will trigger a BUG() as of b1ef400eec (setup_git_env:
avoid blind fall-back to ".git", 2016-10-20).
It's probably uncommon to actually parse a commit outside of
a repository, but you can see it in action with:
    cd /not/a/git/repo
    git index-pack --strict /some/file.pack
This works fine without --strict, but the fsck checks will
try to parse any commits, triggering the BUG(). We can fix
that by teaching the graft code to behave as if there are no
grafts when we aren't in a repository.
Arguably index-pack (and fsck) are wrong to consider grafts
at all. So another solution is to disable grafts entirely
for those commands. But given that the graft feature is
deprecated anyway, it's not worth even thinking through the
ramifications that might have.
There is one other corner case I considered here. What
should:
    cd /not/a/git/repo
    export GIT_GRAFT_FILE=/file/with/grafts
    git index-pack --strict /some/file.pack
do? We don't have a repository, but the user has pointed us
directly at a graft file, which we could respect. I believe
this case did work that way prior to b1ef400eec. However,
fixing it now would be pretty invasive. Back then we would
just call into setup_git_env() even without a repository.
But these days it actually takes a git_dir argument. So
there would be a fair bit of refactoring of the setup code
involved.
Given the obscurity of this case, plus the fact that grafts
are deprecated and probably shouldn't work under index-pack
anyway, it's not worth pursuing further. This patch at least
un-breaks the common case where you're _not_ using grafts,
but we BUG() anyway trying to even find that out.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-01 00:42:53 +02:00
|
|
|
if (!startup_info->have_repository)
|
|
|
|
return;
|
|
|
|
|
2018-05-18 00:51:53 +02:00
|
|
|
graft_file = get_graft_file(r);
|
|
|
|
read_graft_file(r, graft_file);
|
2006-10-30 20:09:06 +01:00
|
|
|
/* make sure shallows are read */
|
2018-05-18 00:51:53 +02:00
|
|
|
is_repository_shallow(r);
|
|
|
|
r->parsed_objects->commit_graft_prepared = 1;
|
2005-07-30 09:58:28 +02:00
|
|
|
}
|
|
|
|
|
2018-05-18 00:51:54 +02:00
|
|
|
struct commit_graft *lookup_commit_graft(struct repository *r, const struct object_id *oid)
|
2005-07-30 09:58:28 +02:00
|
|
|
{
|
|
|
|
int pos;
|
2018-05-18 00:51:54 +02:00
|
|
|
prepare_commit_graft(r);
|
2021-01-28 07:12:35 +01:00
|
|
|
pos = commit_graft_pos(r, oid);
|
2005-07-30 09:58:28 +02:00
|
|
|
if (pos < 0)
|
|
|
|
return NULL;
|
2018-05-18 00:51:54 +02:00
|
|
|
return r->parsed_objects->grafts[pos];
|
2005-07-30 09:58:28 +02:00
|
|
|
}
|
|
|
|
|
2011-08-18 14:29:35 +02:00
|
|
|
int for_each_commit_graft(each_commit_graft_fn fn, void *cb_data)
|
2006-10-30 20:09:06 +01:00
|
|
|
{
|
2011-08-18 14:29:35 +02:00
|
|
|
int i, ret;
|
2018-05-16 01:42:16 +02:00
|
|
|
for (i = ret = 0; i < the_repository->parsed_objects->grafts_nr && !ret; i++)
|
|
|
|
ret = fn(the_repository->parsed_objects->grafts[i], cb_data);
|
2011-08-18 14:29:35 +02:00
|
|
|
return ret;
|
2006-10-30 20:09:06 +01:00
|
|
|
}
|
|
|
|
|
2022-03-17 19:24:47 +01:00
|
|
|
void reset_commit_grafts(struct repository *r)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < r->parsed_objects->grafts_nr; i++) {
|
|
|
|
unparse_commit(r, &r->parsed_objects->grafts[i]->oid);
|
2022-03-17 19:24:47 +01:00
|
|
|
free(r->parsed_objects->grafts[i]);
|
|
|
|
}
|
2022-03-17 19:24:47 +01:00
|
|
|
r->parsed_objects->grafts_nr = 0;
|
|
|
|
r->parsed_objects->commit_graft_prepared = 0;
|
|
|
|
}
|
|
|
|
|
2014-06-10 23:44:13 +02:00
struct commit_buffer {
	void *buffer;
	unsigned long size;
};
define_commit_slab(buffer_slab, struct commit_buffer);
2014-06-10 23:43:02 +02:00
|
|
|
|
2018-06-29 03:22:15 +02:00
|
|
|
struct buffer_slab *allocate_commit_buffer_slab(void)
|
2014-06-10 23:40:14 +02:00
|
|
|
{
|
2018-06-29 03:22:15 +02:00
	struct buffer_slab *bs = xmalloc(sizeof(*bs));
	init_buffer_slab(bs);
	return bs;
}

void free_commit_buffer_slab(struct buffer_slab *bs)
{
	clear_buffer_slab(bs);
	free(bs);
}
2014-06-10 23:43:02 +02:00
|
|
|
|
2018-06-29 03:22:16 +02:00
|
|
|
void set_commit_buffer(struct repository *r, struct commit *commit, void *buffer, unsigned long size)
|
2014-06-10 23:40:14 +02:00
|
|
|
{
|
2018-06-29 03:22:15 +02:00
|
|
|
struct commit_buffer *v = buffer_slab_at(
|
2018-06-29 03:22:16 +02:00
|
|
|
r->parsed_objects->buffer_slab, commit);
|
2014-06-10 23:44:13 +02:00
|
|
|
v->buffer = buffer;
|
|
|
|
v->size = size;
|
2014-06-10 23:40:14 +02:00
|
|
|
}
|
|
|
|
|
2018-06-29 03:22:17 +02:00
|
|
|
const void *get_cached_commit_buffer(struct repository *r, const struct commit *commit, unsigned long *sizep)
|
2014-06-10 23:40:39 +02:00
|
|
|
{
|
2018-06-29 03:22:15 +02:00
|
|
|
struct commit_buffer *v = buffer_slab_peek(
|
2018-06-29 03:22:17 +02:00
|
|
|
r->parsed_objects->buffer_slab, commit);
|
2015-05-15 00:25:52 +02:00
	if (!v) {
		if (sizep)
			*sizep = 0;
		return NULL;
	}
2014-06-10 23:44:13 +02:00
|
|
|
if (sizep)
|
|
|
|
*sizep = v->size;
|
|
|
|
return v->buffer;
|
2014-06-10 23:40:39 +02:00
|
|
|
}
|
|
|
|
|
2018-11-14 01:12:57 +01:00
|
|
|
const void *repo_get_commit_buffer(struct repository *r,
|
|
|
|
const struct commit *commit,
|
|
|
|
unsigned long *sizep)
|
2014-06-10 23:40:39 +02:00
|
|
|
{
|
2018-11-14 01:12:57 +01:00
|
|
|
const void *ret = get_cached_commit_buffer(r, commit, sizep);
|
2014-06-10 23:40:39 +02:00
|
|
|
if (!ret) {
|
|
|
|
enum object_type type;
|
|
|
|
unsigned long size;
|
2018-11-14 01:12:57 +01:00
|
|
|
ret = repo_read_object_file(r, &commit->object.oid, &type, &size);
|
2014-06-10 23:40:39 +02:00
|
|
|
if (!ret)
|
|
|
|
die("cannot read commit object %s",
|
2015-11-10 03:22:28 +01:00
|
|
|
oid_to_hex(&commit->object.oid));
|
2014-06-10 23:40:39 +02:00
|
|
|
if (type != OBJ_COMMIT)
|
|
|
|
die("expected commit for %s, got %s",
|
2018-02-14 19:59:24 +01:00
|
|
|
oid_to_hex(&commit->object.oid), type_name(type));
|
2014-06-10 23:44:13 +02:00
|
|
|
if (sizep)
|
|
|
|
*sizep = size;
|
2014-06-10 23:40:39 +02:00
|
|
|
}
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2018-11-14 01:12:58 +01:00
|
|
|
void repo_unuse_commit_buffer(struct repository *r,
|
|
|
|
const struct commit *commit,
|
|
|
|
const void *buffer)
|
2014-06-10 23:40:39 +02:00
|
|
|
{
|
2018-06-29 03:22:15 +02:00
|
|
|
struct commit_buffer *v = buffer_slab_peek(
|
2018-11-14 01:12:58 +01:00
|
|
|
r->parsed_objects->buffer_slab, commit);
|
2015-05-15 00:25:52 +02:00
|
|
|
if (!(v && v->buffer == buffer))
|
2014-06-10 23:40:39 +02:00
|
|
|
free((void *)buffer);
|
|
|
|
}
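
The get/unuse pair above follows a simple ownership rule: a buffer that came from the cache stays owned by the cache, while a buffer that had to be loaded fresh is owned by the caller and freed on "unuse". A minimal sketch of that rule, using hypothetical names (struct buf_cache, get_buffer(), unuse_buffer()) and a string copy in place of reading an object from disk:

```c
#include <stdlib.h>
#include <string.h>

struct buf_cache {
	char *buffer;	/* NULL when nothing is cached */
};

/*
 * Return the cached buffer if there is one; otherwise hand back a
 * freshly allocated copy of "fallback" that the caller now owns.
 */
static const char *get_buffer(struct buf_cache *c, const char *fallback)
{
	char *copy;

	if (c && c->buffer)
		return c->buffer;
	copy = malloc(strlen(fallback) + 1);
	if (copy)
		strcpy(copy, fallback);
	return copy;
}

/*
 * Free the buffer unless it is the one held by the cache.
 * Returns 1 if it was freed, 0 if it was left to the cache
 * (the return value exists only to make the sketch testable).
 */
static int unuse_buffer(struct buf_cache *c, const char *buffer)
{
	if (!(c && c->buffer == buffer)) {
		free((void *)buffer);
		return 1;
	}
	return 0;
}
```

Comparing pointers on release means callers do not need to remember where the buffer came from; they always pair a "get" with an "unuse".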
|
|
|
|
|
2018-12-15 01:09:40 +01:00
|
|
|
void free_commit_buffer(struct parsed_object_pool *pool, struct commit *commit)
|
provide a helper to free commit buffer
This converts two lines into one at each caller. But more
importantly, it abstracts the concept of freeing the buffer,
which will make it easier to change later.
Note that we also need to provide a "detach" mechanism for a
tricky case in index-pack. We are passed a buffer for the
object generated by processing the incoming pack. If we are
not using --strict, we just calculate the sha1 on that
buffer and return, leaving the caller to free it. But if we
are using --strict, we actually attach that buffer to an
object, pass the object to the fsck functions, and then
detach the buffer from the object again (so that the caller
can free it as usual). In this case, we don't want to free
the buffer ourselves, but just make sure it is no longer
associated with the commit.
Note that we are making the assumption here that the
attach/detach process does not impact the buffer at all
(e.g., it is never reallocated or modified). That holds true
now, and we have no plans to change that. However, as we
abstract the commit_buffer code, this dependency becomes
less obvious. So when we detach, let's also make sure that
we get back the same buffer that we gave to the
commit_buffer code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 00:05:37 +02:00
|
|
|
{
|
2018-06-29 03:22:15 +02:00
|
|
|
struct commit_buffer *v = buffer_slab_peek(
|
2018-12-15 01:09:40 +01:00
|
|
|
pool->buffer_slab, commit);
|
2015-05-15 00:25:52 +02:00
|
|
|
if (v) {
|
2017-06-16 01:15:46 +02:00
|
|
|
FREE_AND_NULL(v->buffer);
|
2015-05-15 00:25:52 +02:00
|
|
|
v->size = 0;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-04-16 11:33:18 +02:00
static inline void set_commit_tree(struct commit *c, struct tree *t)
{
	c->maybe_tree = t;
}
2019-04-16 11:33:19 +02:00
|
|
|
struct tree *repo_get_commit_tree(struct repository *r,
|
|
|
|
const struct commit *commit)
|
2018-04-06 21:09:34 +02:00
|
|
|
{
|
2018-04-06 21:09:46 +02:00
|
|
|
if (commit->maybe_tree || !commit->object.parsed)
|
|
|
|
return commit->maybe_tree;
|
|
|
|
|
2020-06-17 11:14:10 +02:00
|
|
|
if (commit_graph_position(commit) != COMMIT_NOT_FROM_GRAPH)
|
2019-05-08 17:37:25 +02:00
|
|
|
return get_commit_tree_in_graph(r, commit);
|
2018-04-06 21:09:46 +02:00
|
|
|
|
2019-04-10 04:13:20 +02:00
|
|
|
return NULL;
|
2018-04-06 21:09:34 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
struct object_id *get_commit_tree_oid(const struct commit *commit)
|
|
|
|
{
|
2023-03-28 15:58:48 +02:00
|
|
|
struct tree *tree = repo_get_commit_tree(the_repository, commit);
|
2019-09-06 00:04:57 +02:00
|
|
|
return tree ? &tree->object.oid : NULL;
|
2018-04-06 21:09:34 +02:00
|
|
|
}
|
|
|
|
|
2018-12-15 01:09:40 +01:00
|
|
|
void release_commit_memory(struct parsed_object_pool *pool, struct commit *c)
|
2018-05-15 23:48:42 +02:00
|
|
|
{
|
2019-04-16 11:33:18 +02:00
|
|
|
set_commit_tree(c, NULL);
|
2018-12-15 01:09:40 +01:00
|
|
|
free_commit_buffer(pool, c);
|
2019-08-26 04:01:37 +02:00
|
|
|
c->index = 0;
|
2018-05-15 23:48:42 +02:00
|
|
|
free_commit_list(c->parents);
|
|
|
|
|
|
|
|
c->object.parsed = 0;
|
|
|
|
}
|
|
|
|
|
2014-06-10 23:44:13 +02:00
|
|
|
const void *detach_commit_buffer(struct commit *commit, unsigned long *sizep)
|
|
|
|
{
|
2018-06-29 03:22:15 +02:00
|
|
|
struct commit_buffer *v = buffer_slab_peek(
|
|
|
|
the_repository->parsed_objects->buffer_slab, commit);
|
2014-06-10 23:44:13 +02:00
|
|
|
void *ret;
|
|
|
|
|
2015-05-15 00:25:52 +02:00
	if (!v) {
		if (sizep)
			*sizep = 0;
		return NULL;
	}