git/merge-ort.c
/*
 * "Ostensibly Recursive's Twin" merge strategy, or "ort" for short.  Meant
 * as a drop-in replacement for the "recursive" merge strategy, allowing one
 * to replace
 *
 *   git merge [-s recursive]
 *
 * with
 *
 *   git merge -s ort
 *
 * Note: git's parser allows the space between '-s' and its argument to be
 * missing.  (Should I have backronymed "ham", "alsa", "kip", "nap", "alvo",
 * "cale", "peedy", or "ins" instead of "ort"?)
 */
#include "cache.h"
#include "merge-ort.h"
#include "alloc.h"
#include "attr.h"
#include "blob.h"
#include "cache-tree.h"
#include "commit.h"
#include "commit-reach.h"
#include "diff.h"
#include "diffcore.h"
#include "dir.h"
#include "entry.h"
#include "ll-merge.h"
#include "object-store.h"
#include "promisor-remote.h"
#include "revision.h"
#include "strmap.h"
#include "submodule-config.h"
#include "submodule.h"
#include "tree.h"
#include "unpack-trees.h"
#include "xdiff-interface.h"

/*
 * We have many arrays of size 3.  Whenever we have such an array, the
 * indices refer to one of the sides of the three-way merge.  This is so
 * pervasive that the constants 0, 1, and 2 are used in many places in the
 * code (especially in arithmetic operations to find the other side's index
 * or to compute a relevant mask), but sometimes these enum names are used
 * to aid code clarity.
 *
 * See also 'filemask' and 'dirmask' in struct conflict_info; the "ith side"
 * referred to there is one of these three sides.
 */
enum merge_side {
	MERGE_BASE = 0,
	MERGE_SIDE1 = 1,
	MERGE_SIDE2 = 2
};
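
/*
 * Illustration only, not part of merge-ort.c: a minimal sketch of the
 * "arithmetic on side indices" idea described in the comment above.  The
 * helper names other_side() and side_mask() are hypothetical; the enum is
 * repeated so the sketch compiles on its own.  It assumes that bit i of a
 * mask stands for "side i", which is one plausible reading of how filemask
 * and dirmask are used.
 */

```c
#include <assert.h>

/* Repeated here so the sketch is self-contained. */
enum merge_side {
	MERGE_BASE = 0,
	MERGE_SIDE1 = 1,
	MERGE_SIDE2 = 2
};

/*
 * Hypothetical helper: given one non-base side, return the other one.
 * Works because MERGE_SIDE1 + MERGE_SIDE2 == 3.
 */
static unsigned other_side(unsigned side)
{
	assert(side == MERGE_SIDE1 || side == MERGE_SIDE2);
	return 3 - side;
}

/*
 * Hypothetical helper: the single-bit mask for a side, assuming bit i
 * corresponds to side i.
 */
static unsigned side_mask(unsigned side)
{
	assert(side <= MERGE_SIDE2);
	return 1u << side;
}
```

/*
 * Under these assumptions, a mask of 7 would mean "present on all three
 * sides" and a mask of 6 would mean "present on both sides but not in the
 * merge base".
 */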
static unsigned RESULT_INITIALIZED = 0x1abe11ed; /* unlikely accidental value */

struct traversal_callback_data {
	unsigned long mask;
	unsigned long dirmask;
	struct name_entry names[3];
};

struct deferred_traversal_data {
	/*
	 * possible_trivial_merges: directories to be explored only when needed
	 *
	 * possible_trivial_merges is a map of directory names to
	 * dir_rename_mask.  When we detect that a directory is unchanged on
	 * one side, we can sometimes resolve the directory without recursing
	 * into it.  Renames are the only things that can prevent such an
	 * optimization.  However, for rename sources:
	 *   - If no parent directory needed directory rename detection, then
	 *     no path under such a directory can be a relevant_source.
	 * and for rename destinations:
	 *   - If no cached rename has a target path under the directory AND
	 *   - If there are no unpaired relevant_sources elsewhere in the
	 *     repository
	 * then we don't need any path under this directory for a rename
	 * destination.  The only way to know the last item above is to defer
	 * handling such directories until the end of collect_merge_info(),
	 * in handle_deferred_entries().
	 *
	 * For each we store dir_rename_mask, since that's the only bit of
	 * information we need, other than the path, to resume the recursive
	 * traversal.
	 */
	struct strintmap possible_trivial_merges;

	/*
	 * trivial_merges_okay: if trivial directory merges are okay
	 *
	 * See possible_trivial_merges above.  The "no unpaired
	 * relevant_sources elsewhere in the repository" is a single boolean
	 * per merge side, which we store here.  Note that while 0 means no,
	 * 1 only means "maybe" rather than "yes"; we optimistically set it
	 * to 1 initially and only clear when we determine it is unsafe to
	 * do trivial directory merges.
	 */
	unsigned trivial_merges_okay;

	/*
	 * target_dirs: ancestor directories of rename targets
	 *
	 * target_dirs contains all directory names that are an ancestor of
	 * any rename destination.
	 */
	struct strset target_dirs;
};
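
/*
 * Illustration only, not part of merge-ort.c: a minimal sketch of the
 * "optimistic flag" pattern described for trivial_merges_okay above.  It
 * assumes one flag per merge side, held in an array indexed by side; the
 * struct and the example_*() helpers are hypothetical names invented for
 * this sketch.  The flag starts at 1 ("maybe") and is only ever cleared,
 * never set again.
 */

```c
#include <assert.h>

/* Hypothetical stand-in for the one field of interest. */
struct example_deferred {
	unsigned trivial_merges_okay; /* 0 = no, 1 = "maybe" (not "yes") */
};

/* Optimistically assume trivial directory merges are fine on both sides. */
static void example_init(struct example_deferred deferred[3])
{
	deferred[0].trivial_merges_okay = 0; /* index 0 (merge base) unused */
	deferred[1].trivial_merges_okay = 1;
	deferred[2].trivial_merges_okay = 1;
}

/* Called once we learn that deferring directories is unsafe for a side. */
static void example_mark_unsafe(struct example_deferred deferred[3],
				unsigned side)
{
	assert(side == 1 || side == 2);
	deferred[side].trivial_merges_okay = 0;
}

/* A nonzero result means only "not yet ruled out". */
static int example_maybe_okay(const struct example_deferred deferred[3],
			      unsigned side)
{
	assert(side == 1 || side == 2);
	return deferred[side].trivial_merges_okay != 0;
}
```

/*
 * Because the flag is only ever cleared, the final answer is known only
 * once the rest of the traversal has finished, which matches the need to
 * defer these directories until the end of collect_merge_info().
 */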

struct rename_info {
	/*
	 * All variables that are arrays of size 3 correspond to data tracked
	 * for the sides in enum merge_side.  Index 0 is almost always unused
	 * because we often only need to track information for MERGE_SIDE1 and
	 * MERGE_SIDE2 (MERGE_BASE can't have rename information since renames
	 * are determined relative to what changed since the MERGE_BASE).
	 */

	/*
	 * pairs: pairing of filenames from diffcore_rename()
	 */
	struct diff_queue_struct pairs[3];

	/*
	 * dirs_removed: directories removed on a given side of history.
	 *
	 * The keys of dirs_removed[side] are the directories that were removed
	 * on the given side of history.  The value of the strintmap for each
	 * directory is a value from enum dir_rename_relevance.
	 */
	struct strintmap dirs_removed[3];

	/*
	 * dir_rename_count: tracking where parts of a directory were renamed to
	 *
	 * When files in a directory are renamed, they may not all go to the
	 * same location.  Each strmap here tracks:
	 *      old_dir => {new_dir => int}
	 * That is, dir_rename_count[side] is a strmap to a strintmap.
	 */
	struct strmap dir_rename_count[3];

	/*
	 * dir_renames: computed directory renames
	 *
	 * This is a map of old_dir => new_dir and is derived in part from
	 * dir_rename_count.
	 */
	struct strmap dir_renames[3];

	/*
	 * relevant_sources: deleted paths wanted in rename detection, and why
	 *
	 * relevant_sources is a set of deleted paths on each side of
	 * history for which we need rename detection.  If a path is deleted
	 * on one side of history, we need to detect if it is part of a
	 * rename if either
	 *    * the file is modified/deleted on the other side of history
	 *    * we need to detect renames for an ancestor directory
	 * If neither of those is true, we can skip rename detection for
	 * that path.  The reason is stored as a value from enum
	 * file_rename_relevance, as the reason can inform the algorithm in
	 * diffcore_rename_extended().
merge-ort: precompute subset of sources for which we need rename detection rename detection works by trying to pair all file deletions (or "sources") with all file additions (or "destinations"), checking similarity, and then marking the sufficiently similar ones as renames. This can be expensive if there are many sources and destinations on a given side of history as it results in an N x M comparison matrix. However, there are many cases where we can compute in advance that detecting renames for some of the sources provides no useful information and thus that we can exclude those sources from the matrix. To see why, first note that the merge machinery uses detected renames in two ways: * directory rename detection: when one side of history renames a directory, and the other side of history adds new files to that directory, we want to be able to warn the user about the need to chose whether those new files stay in the old directory or move to the new one. * three-way content merging: in order to do three-way content merging of files, we need three different file versions. If one side of history renamed a file, then some of the content for the file is found under a different path than in the merge base or on the other side of history. Add a simple testcase showing the two kinds of reasons renames are relevant; it's a testcase that will only pass if we detect both kinds of needed renames. Other than the testcase added above, this commit concentrates just on the three-way content merging; it will punt and mark all sources as needed for directory rename detection, and leave it to future commits to narrow that down more. The point of three-way content merging is to reconcile changes made on *both* sides of history. What if the file wasn't modified on both sides? There are two possibilities: * If it wasn't modified on the renamed side: -> then we get to do exact rename detection, which is cheap. 
* If it wasn't modified on the unrenamed side: -> then detection of a rename for that source file is irrelevant That latter claim might be surprising at first, so let's walk through a case to show why rename detection for that source file is irrelevant. Let's use two filenames, old.c & new.c, with the following abbreviated object ids (and where the value '000000' is used to denote that the file is missing in that commit): old.c new.c MERGE_BASE: 01d01d 000000 MERGE_SIDE1: 01d01d 000000 MERGE_SIDE2: 000000 5e1ec7 If the rename *isn't* detected: then old.c looks like it was unmodified on one side and deleted on the other and should thus be removed. new.c looks like a new file we should keep as-is. If the rename *is* detected: then a three-way content merge is done. Since the version of the file in MERGE_BASE and MERGE_SIDE1 are identical, the three-way merge will produce exactly the version of the file whose abbreviated object id is 5e1ec7. It will record that file at the path new.c, while removing old.c from the directory. Note that these two results are identical -- a single file named 'new.c' with object id 5e1ec7. In other words, it doesn't matter if the rename is detected in the case where the file is unmodified on the unrenamed side. Use this information to compute whether we need rename detection for each source created in add_pair(). It's probably worth noting that there used to be a few other edge or corner cases besides three-way content merges and directory rename detection where lack of rename detection could have affected the result, but those cases actually highlighted where conflict resolution methods were not consistent with each other. Fixing those inconsistencies were thus critically important to enabling this optimization. 
That work involved the following: * bringing consistency to add/add, rename/add, and rename/rename conflict types, as done back in the topic merged at commit ac193e0e0a ("Merge branch 'en/merge-path-collision'", 2019-01-04), and further extended in commits 2a7c16c980 ("t6422, t6426: be more flexible for add/add conflicts involving renames", 2020-08-10) and e8eb99d4a6 ("t642[23]: be more flexible for add/add conflicts involving pair renames", 2020-08-10) * making rename/delete more consistent with modify/delete as done in commits 1f3c9ba707 ("t6425: be more flexible with rename/delete conflict messages", 2020-08-10) and 727c75b23f ("t6404, t6423: expect improved rename/delete handling in ort backend", 2020-10-26) Since the set of relevant_sources we compute has not yet been narrowed down for directory rename detection, we do not pass it to diffcore_rename_extended() yet. That will be done after subsequent commits narrow down the list of relevant_sources needed for directory rename detection reasons. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
*/
struct strintmap relevant_sources[3];
merge-ort: add data structures for allowable trivial directory resolves

As noted a few commits ago, we can resolve individual files early if all three sides of the merge have a file at the path and two of the three sides match. We would really like to do the same thing with directories, because being able to do a trivial directory resolve means we don't have to recurse into the directory, potentially saving us a huge amount of time in both collect_merge_info() and process_entries().

Unfortunately, resolving directories early would mean missing any renames whose source or destination is underneath that directory. If we somehow knew there weren't any renames under the directory in question, then we could resolve it early. Sadly, it is impossible to determine whether there are renames under the directory in question without recursing into it, and this has traditionally kept us from ever implementing such an optimization.

In commit f89b4f2bee ("merge-ort: skip rename detection entirely if possible", 2021-03-11), we added an additional reason that rename detection could be skipped entirely -- namely, if no *relevant* sources were present. Without completing collect_merge_info_callback(), we do not yet know if there are no relevant sources. However, we do know that if the current directory on one side matches the merge base, then every source file within that directory will not be RELEVANT_CONTENT, and a few simple checks can often let us rule out RELEVANT_LOCATION as well.

This suggests we can just defer recursing into such directories until the end of collect_merge_info. Since the deferred directories are known to not add any relevant sources due to the above properties, then if there are no relevant sources after we've traversed all paths other than the deferred ones, then we know there are not any relevant sources. Under those conditions, rename detection is unnecessary, and that means we can resolve the deferred directories without recursing into them.

Note that the logic for skipping rename detection was also modified further in commit 76e253793c ("merge-ort, diffcore-rename: employ cached renames when possible", 2021-01-30); in particular rename detection can be skipped if we already have cached renames for each relevant source. We can take advantage of this information as well with our deferral of recursing into directories where one side matches the merge base.

Add some data structures that we will use to do these deferrals, with some lengthy comments explaining their purpose.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
struct deferred_traversal_data deferred[3];
merge-ort: precompute whether directory rename detection is needed

The point of directory rename detection is that if one side of history renames a directory, and the other side adds new files under the old directory, then the merge can move those new files into the new directory. This leads to the following important observation:

  * If the other side does not add any new files under the old directory, we do not need to detect any renames for that directory.

Similarly, directory rename detection had an important requirement:

  * If a directory still exists on one side of history, it has not been renamed on that side of history. (See section 4 of t6423 or Documentation/technical/directory-rename-detection.txt for more details).

Using these two bits of information, we note that directory rename detection is only needed in cases where (1) directories exist in the merge base and on one side of history (i.e. dirmask == 3 or dirmask == 5), and (2) where there is some new file added to that directory on the side where it still exists (thus where the file has filemask == 2 or filemask == 4, respectively). This has to be done in two steps, because we have the dirmask when we are first considering the directory, and won't get the filemasks for the files within it until we recurse into that directory. So, we save

    dir_rename_mask = dirmask - 1

when we hit a directory that is missing on one side, and then later look for cases of

    filemask == dir_rename_mask

One final note is that as soon as we hit a directory that needs directory rename detection, we will need to detect renames in all subdirectories of that directory as well due to the "majority rules" decision when files are renamed into different directory hierarchies. We arbitrarily use the special value of 0x07 to record when we've hit such a directory.

The combination of all the above means that we introduce a variable named dir_rename_mask (couldn't think of a better name) which has one of the following values as we traverse into a directory:

  * 0x00: directory rename detection not needed
  * 0x02 or 0x04: directory rename detection only needed if files added
  * 0x07: directory rename detection definitely needed

We then pass this value through to add_pairs() so that it can mark location_relevant as true only when dir_rename_mask is 0x07.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
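The two-step bookkeeping described above can be sketched as a pair of small C functions. This is a rough illustration, not the traversal code itself: the function names `mask_on_entering_dir()` and `mask_after_seeing_file()` are hypothetical, and the assumption that a subdirectory otherwise inherits its parent's value is mine. The mask bits follow the commit message (1 = merge base, 2 = side 1, 4 = side 2).

```c
/*
 * Step 1 (hypothetical helper): called when the traversal enters a
 * directory. A directory present in the merge base and exactly one side
 * (dirmask 3 or 5) records the bit of the side where it still exists,
 * i.e. dirmask - 1. A value of 0x07 is sticky for all subdirectories.
 */
static unsigned mask_on_entering_dir(unsigned inherited, unsigned dirmask)
{
	if (inherited != 0x07 && (dirmask == 3 || dirmask == 5))
		return dirmask - 1;
	return inherited; /* assumption: otherwise keep the parent's value */
}

/*
 * Step 2 (hypothetical helper): called for each file inside the
 * directory. A file that exists only on the side where the directory
 * still exists (filemask == dir_rename_mask) is a newly added file, so
 * directory rename detection becomes definitely needed (0x07).
 */
static unsigned mask_after_seeing_file(unsigned current, unsigned filemask)
{
	if ((current == 0x02 || current == 0x04) && filemask == current)
		return 0x07;
	return current;
}
```

For example, entering a directory with dirmask == 3 yields 0x02, and a later file with filemask == 2 in that directory upgrades the value to 0x07; a file with filemask == 3 leaves it at 0x02.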
/*
* dir_rename_mask:
* 0: optimization removing unmodified potential rename source okay
* 2 or 4: optimization okay, but must check for files added to dir
* 7: optimization forbidden; need rename source in case of dir rename
*/
unsigned dir_rename_mask:3;
/*
* callback_data_*: supporting data structures for alternate traversal
*
* We sometimes need to be able to traverse through all the files
* in a given tree before all immediate subdirectories within that
* tree. Since traverse_trees() doesn't do that naturally, we have
* a traverse_trees_wrapper() that stores any immediate
* subdirectories while traversing files, then traverses the
* immediate subdirectories later. These callback_data* variables
* store the information for the subdirectories so that we can do
* that traversal order.
*/
struct traversal_callback_data *callback_data;
int callback_data_nr, callback_data_alloc;
char *callback_data_traverse_path;
merge-ort: add code to check for whether cached renames can be reused

We need to know when renames detected in a previous merge operation can be reused in a later merge operation. Consider the following setup (from the git-rebase manpage):

          A---B---C topic
         /
    D---E---F---G master

After rebasing, this will appear as:

                  A'--B'--C' topic
                 /
    D---E---F---G master

Further, let's say that 'oldfile' was renamed to 'newfile' between E and G. The rebase or cherry-pick of A onto G will involve a three-way merge between E (as the merge base) and G and A. After detecting the rename between E:oldfile and G:newfile, there will be a three-way content merge of the following:

    E:oldfile
    G:newfile
    A:oldfile

and produce a new result:

    A':newfile

Now, when we want to pick B onto A', we will need to do a three-way merge between A (as the merge-base) and A' and B. This will involve a three-way content merge of

    A:oldfile
    A':newfile
    B:oldfile

but only if we can detect that A:oldfile is similar enough to A':newfile to be used together in a three-way content merge, i.e. only if we can detect that A:oldfile and A':newfile are a rename. But we already know that A:oldfile and A':newfile are similar enough to be used in a three-way content merge, because that is precisely where A':newfile came from in the previous merge.

Note that A & A' both appear in both merges. That gives us the condition under which we can reuse renames.

There are a couple of important points about this optimization:

  - If the rebase or cherry-pick halts for user conflicts, these caches are NOT saved anywhere. Thus, resuming a halted rebase or cherry-pick will result in no reused renames for the next commit. This is intentional, as user resolution can change files significantly and in ways that violate the similarity assumptions here.

  - Technically, in a *very* narrow case this might give slightly different results for rename detection. Using the example above, if:

      * E:oldfile had 20 lines
      * G:newfile added 10 new lines at the beginning of the file
      * A:oldfile deleted all but the first three lines of the file

    then

      => A':newfile would have 13 lines, 3 of which match those in A:oldfile.

    Consider the two cases:

      * Without this optimization:
        - the next step of the rebase operation (moving B to B') would not detect the rename between A:oldfile and A':newfile
        - we'd thus get a modify/delete conflict with the rebase operation halting for the user to resolve, and have both A':newfile and B:oldfile sitting in the working tree.

      * With this optimization:
        - the rename between A:oldfile and A':newfile would be detected via the cache of renames
        - a three-way merge between A:oldfile, A':newfile, and B:oldfile would commence and be written to A':newfile

    Now, is the difference in behavior a bug...or a bugfix? I can't tell. Given that A:oldfile and A':newfile are not very similar, when we three-way merge with B:oldfile it seems likely we'll hit a conflict for the user to resolve. And it shouldn't be too hard for users to see why we did that three-way merge; oldfile and newfile *were* renames somewhere in the sequence. So, most of these corner cases will still behave similarly -- namely, a conflict given to the user to resolve.

    Also, consider the interesting case when commit B is a clean revert of commit A. Without this optimization, a rebase could not both apply a weird patch like A and then immediately revert it; users would be forced to resolve merge conflicts. With this optimization, it would successfully apply the clean revert. So, there is certainly at least one case that behaves better.

    Even if it's considered a "difference in behavior", I think both behaviors are reasonable, and the time savings provided by this optimization justify using the slightly altered rename heuristics.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
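One way to read the reuse condition ("A & A' both appear in both merges") is sketched below. Everything here is an assumption made for illustration: the struct `merge_record`, the function `reusable_side()`, the enum names, and the use of plain strings in place of tree object ids. It checks whether the new merge base is one side of the previous merge and the other side of the new merge is the previous result.

```c
#include <string.h>

/* Hypothetical return values: which side's cached renames may be reused. */
enum { NO_REUSE = 0, REUSE_SIDE1 = 1, REUSE_SIDE2 = 2 };

/* Hypothetical record of the previous merge, with ids as plain strings. */
struct merge_record {
	const char *base;
	const char *side1;
	const char *side2;
	const char *result;
};

/*
 * In the example, the previous merge was (base E, side1 G, side2 A)
 * producing A'. The next merge is (base A, side1 A', side2 B): its base
 * is the old side2 and its side1 is the old result, so the renames
 * found on side1 (E -> G) also describe A -> A'.
 */
static int reusable_side(const struct merge_record *prev, const char *base,
			 const char *side1, const char *side2)
{
	if (!strcmp(base, prev->side2) && !strcmp(side1, prev->result))
		return REUSE_SIDE1;
	if (!strcmp(base, prev->side1) && !strcmp(side2, prev->result))
		return REUSE_SIDE2;
	return NO_REUSE;
}
```

The mirrored second check covers the case where the roles of the two sides are swapped; any other combination reports that nothing can be reused.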
/*
* merge_trees: trees passed to the merge algorithm for the merge
*
* merge_trees records the trees passed to the merge algorithm. But,
* this data also is stored in merge_result->priv. If a sequence of
* merges are being done (such as when cherry-picking or rebasing),
* the next merge can look at this and re-use information from
* previous merges under certain circumstances.
*
* See also all the cached_* variables.
*/
struct tree *merge_trees[3];
/*
* cached_pairs_valid_side: which side's cached info can be reused
*
* See the description for merge_trees. For repeated merges, at most
* one side's cached information can be used. Valid values:
* MERGE_SIDE2: cached data from side2 can be reused
* MERGE_SIDE1: cached data from side1 can be reused
* 0: no cached data can be reused
* -1: See redo_after_renames; both sides can be reused.
*/
int cached_pairs_valid_side;
/*
* cached_pairs: Caching of renames and deletions.
*
* These are mappings recording renames and deletions of individual
* files (not directories). They are thus a map from an old
* filename to either NULL (for deletions) or a new filename (for
* renames).
*/
struct strmap cached_pairs[3];
/*
* cached_target_names: just the destinations from cached_pairs
*
* We sometimes want a fast lookup to determine if a given filename
* is one of the destinations in cached_pairs. cached_target_names
* is thus duplicative information, but it provides a fast lookup.
*/
struct strset cached_target_names[3];
/*
* cached_irrelevant: Caching of rename_sources that aren't relevant.
*
* If we try to detect a rename for a source path and succeed, it's
* part of a rename. If we try to detect a rename for a source path
* and fail, then it's a delete. If we do not try to detect a rename
* for a path, then we don't know if it's a rename or a delete. If
* merge-ort doesn't think the path is relevant, then we just won't
* cache anything for that path. But there's a slight problem in
* that merge-ort can think a path is RELEVANT_LOCATION, but due to
* commit 9bd342137e ("diffcore-rename: determine which
* relevant_sources are no longer relevant", 2021-03-13),
* diffcore-rename can downgrade the path to RELEVANT_NO_MORE. To
* avoid excessive calls to diffcore_rename_extended() we still need
* to cache such paths, though we cannot record them as either
* renames or deletes. So we cache them here as a "turned out to be
* irrelevant *for this commit*" as they are often also irrelevant
* for subsequent commits, though we will have to do some extra
* checking to see whether such paths become relevant for rename
* detection when cherry-picking/rebasing subsequent commits.
*/
struct strset cached_irrelevant[3];
merge-ort: restart merge with cached renames to reduce process entry cost

The merge algorithm mostly consists of the following three functions:

    collect_merge_info()
    detect_and_process_renames()
    process_entries()

Prior to the trivial directory resolution optimization of the last half dozen commits, process_entries() was consistently the slowest, followed by collect_merge_info(), then detect_and_process_renames(). When the trivial directory resolution applies, it often dramatically decreases the amount of time spent in the two slower functions.

Looking at the performance results in the previous commit, the trivial directory resolution optimization helps amazingly well when there are no relevant renames. It also helps really well when reapplying a long series of linear commits (such as in a rebase or cherry-pick), since the relevant renames may well be cached from the first reapplied commit. But when there are any relevant renames that are not cached (represented by the just-one-mega testcase), then the optimization does not help at all.

Often, I noticed that when the optimization does not apply, it is because there are a handful of relevant sources -- maybe even only one. It felt frustrating to need to recurse into potentially hundreds or even thousands of directories just for a single rename, but it was needed for correctness.

However, staring at this list of functions and noticing that process_entries() is the most expensive and knowing I could avoid it if I had cached renames suggested a simple idea: change

    collect_merge_info()
    detect_and_process_renames()
    process_entries()

into

    collect_merge_info()
    detect_and_process_renames()
    <cache all the renames, and restart>
    collect_merge_info()
    detect_and_process_renames()
    process_entries()

This may seem odd and look like more work. However, note that although we run collect_merge_info() twice, the second time we get to employ trivial directory resolves, which makes it much faster, so the increased time in collect_merge_info() is small. While we run detect_and_process_renames() again, all renames are cached so it's nearly a no-op (we don't call into diffcore_rename_extended() but we do have a little bit of data structure checking and fixing up). And the big payoff comes from the fact that process_entries() will be much faster due to having far fewer entries to process.

This restarting only makes sense if we can save recursing into enough directories to make it worth our while. Introduce a simple heuristic to guide this. Note that this heuristic uses a "wanted_factor" that I have virtually no actual real world data for, just some back-of-the-envelope quasi-scientific calculations that I included in some comments and then plucked a simple round number out of thin air. It could be that tweaking this number to make it either higher or lower improves the optimization. (There's slightly more here; when I first introduced this optimization, I used a factor of 10, because I was completely confident it was big enough to not cause slowdowns in special cases. I was certain it was higher than needed. Several months later, I added the rough calculations which make me think the optimal number is close to 2; but instead of pushing to the limit, I just bumped it to 3 to reduce the risk that there are special cases where this optimization can result in slowing down the code a little. If the ratio of path counts is below 3, we probably will only see minor performance improvements at best anyway.)

Also, note that while the diffstat looks kind of long (nearly 100 lines), more than half of it is in two comments explaining how things work.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows:

                     Before                  After
    no-renames:      205.1 ms ±  3.8 ms      204.2 ms ±  3.0 ms
    mega-renames:    1.564 s  ±  0.010 s     1.076 s  ±  0.015 s
    just-one-mega:   479.5 ms ±  3.9 ms      364.1 ms ±  7.0 ms

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
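The restart heuristic described above boils down to a ratio test against the wanted factor of 3. The following is a minimal sketch under assumptions: the function name `worth_restarting()` and both parameter names are hypothetical, and the exact comparison used in merge-ort may differ; only the factor 3 and the idea of comparing path counts come from the commit message.

```c
#include <stdbool.h>

/* Factor taken from the commit message; the rest is illustrative. */
#define WANTED_FACTOR 3

/*
 * paths_with_recursion:    number of paths collected when every deferred
 *                          directory had to be recursed into
 * paths_without_recursion: number of paths that would remain if those
 *                          directories could be resolved trivially
 *
 * Restarting is only considered worthwhile when skipping the recursion
 * would shrink the path count by at least WANTED_FACTOR.
 */
static bool worth_restarting(unsigned long paths_with_recursion,
			     unsigned long paths_without_recursion)
{
	if (paths_without_recursion == 0)
		return false; /* avoid dividing by zero; nothing to compare */
	return paths_with_recursion / paths_without_recursion >= WANTED_FACTOR;
}
```

For instance, 3000 paths versus 100 passes the test, while 250 versus 100 does not, matching the note that ratios below 3 are expected to bring only minor gains.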
/*
* redo_after_renames: optimization flag for "restarting" the merge
*
* Sometimes it pays to detect renames, cache them, and then
* restart the merge operation from the beginning. The reason for
* this is that when we know where all the renames are, we know
* whether a certain directory has any paths under it affected --
* and if a directory is not affected then it permits us to do
* trivial tree merging in more cases. Doing trivial tree merging
* prevents the need to run process_entry() on every path
* underneath trees that can be trivially merged, and
* process_entry() is more expensive than collect_merge_info() --
* plus, the second collect_merge_info() will be much faster since
* it doesn't have to recurse into the relevant trees.
*
* Values for this flag:
* 0 = don't bother, not worth it (or conditions not yet checked)
* 1 = conditions for optimization met, optimization worthwhile
* 2 = we already did it (don't restart merge yet again)
*/
unsigned redo_after_renames;
/*
* needed_limit: value needed for inexact rename detection to run
*
* If the current rename limit wasn't high enough for inexact
* rename detection to run, this records the limit needed. Otherwise,
* this value remains 0.
*/
int needed_limit;
};
struct merge_options_internal {
/*
* paths: primary data structure in all of merge-ort.
*
* The keys of paths:
* * are full relative paths from the toplevel of the repository
* (e.g. "drivers/firmware/raspberrypi.c").
* * store all relevant paths in the repo, both directories and
* files (e.g. drivers, drivers/firmware would also be included)
* * these keys serve to intern all the path strings, which allows
* us to do pointer comparison on directory names instead of
* strcmp; we just have to be careful to use the interned strings.
*
* The values of paths:
* * either a pointer to a merged_info, or a conflict_info struct
* * merged_info contains all relevant information for a
* non-conflicted entry.
* * conflict_info contains a merged_info, plus any additional
* information about a conflict such as the higher order stages
* involved and the names of the paths those came from (handy
* once renames get involved).
* * a path may start "conflicted" (i.e. point to a conflict_info)
* and then a later step (e.g. three-way content merge) determines
* it can be cleanly merged, at which point it'll be marked clean
* and the algorithm will ignore any data outside the contained
* merged_info for that entry
* * If an entry remains conflicted, the merged_info portion of a
* conflict_info will later be filled with whatever version of
* the file should be placed in the working directory (e.g. an
* as-merged-as-possible variation that contains conflict markers).
*/
struct strmap paths;
/*
* conflicted: a subset of keys->values from "paths"
*
* conflicted is basically an optimization between process_entries()
* and record_conflicted_index_entries(); the latter could loop over
* ALL the entries in paths AGAIN and look for the ones that are
* still conflicted, but since process_entries() has to loop over
* all of them, it saves the ones it couldn't resolve in this strmap
* so that record_conflicted_index_entries() can iterate just the
* relevant entries.
*/
struct strmap conflicted;
/*
* pool: memory pool for fast allocation/deallocation
*
* We allocate room for lots of filenames and auxiliary data
* structures in merge_options_internal, and it tends to all be
* freed together too. Using a memory pool for these provides a
* nice speedup.
*/
struct mem_pool pool;
/*
* conflicts: logical conflicts and messages stored by _primary_ path
*
* This is a map of pathnames (a subset of the keys in "paths" above)
* to struct string_list, with each item's `util` containing a
* `struct logical_conflict_info`. Note, though, that for each path,
* it only stores the logical conflicts for which that path is the
* primary path; the path might be part of additional conflicts.
*/
struct strmap conflicts;
/*
* renames: various data relating to rename detection
*/
struct rename_info renames;
/*
* attr_index: hacky minimal index used for renormalization
*
* renormalization code _requires_ an index, though it only needs to
* find a .gitattributes file within the index. So, when
* renormalization is important, we create a special index with just
* that one file.
*/
struct index_state attr_index;
/*
* current_dir_name, toplevel_dir: temporary vars
*
* These are used in collect_merge_info_callback(), and will set the
* various merged_info.directory_name for the various paths we get;
* see documentation for that variable and the requirements placed on
* that field.
*/
const char *current_dir_name;
const char *toplevel_dir;
/* call_depth: recursion level counter for merging merge bases */
int call_depth;
/* field that holds submodule conflict information */
struct string_list conflicted_submodules;
};
struct conflicted_submodule_item {
char *abbrev;
int flag;
};
static void conflicted_submodule_item_free(void *util, const char *str)
{
struct conflicted_submodule_item *item = util;
free(item->abbrev);
free(item);
}
struct version_info {
struct object_id oid;
unsigned short mode;
};
struct merged_info {
/* if is_null, ignore result. otherwise result has oid & mode */
struct version_info result;
unsigned is_null:1;
/*
* clean: whether the path in question is cleanly merged.
*
* see conflict_info.merged for more details.
*/
unsigned clean:1;
/*
* basename_offset: offset of basename of path.
*
* perf optimization to avoid recomputing offset of final '/'
* character in pathname (0 if no '/' in pathname).
*/
size_t basename_offset;
/*
* directory_name: containing directory name.
*
* Note that we assume directory_name is constructed such that
* strcmp(dir1_name, dir2_name) == 0 iff dir1_name == dir2_name,
* i.e. string equality is equivalent to pointer equality. For this
* to hold, we have to be careful setting directory_name.
*/
const char *directory_name;
};
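/*
* Illustration only, not part of merge-ort: a minimal sketch of why
* interning makes the directory_name requirement above hold. It assumes
* a small hypothetical intern table (struct intern_table, intern(),
* same_dir() are made-up names); merge-ort itself relies on the keys of
* the "paths" strmap as the interned strings.
*/

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size table; chosen only to keep the sketch short. */
#define MAX_INTERNED 64

struct intern_table {
	char *strings[MAX_INTERNED];
	size_t nr;
};

/*
 * Return the canonical copy of `s`, adding one if it is not present yet.
 * Returns NULL if the table is full or allocation fails.
 */
static const char *intern(struct intern_table *t, const char *s)
{
	size_t i, len;
	char *copy;

	assert(t && s);
	for (i = 0; i < t->nr; i++)
		if (!strcmp(t->strings[i], s))
			return t->strings[i];
	if (t->nr == MAX_INTERNED)
		return NULL;
	len = strlen(s) + 1;
	copy = malloc(len);
	if (!copy)
		return NULL;
	memcpy(copy, s, len);
	t->strings[t->nr++] = copy;
	return copy;
}

/* With interned names, "same directory" is a plain pointer comparison. */
static int same_dir(const char *dir1_name, const char *dir2_name)
{
	return dir1_name == dir2_name;
}

/* Free every interned string and reset the table. */
static void intern_table_release(struct intern_table *t)
{
	size_t i;

	for (i = 0; i < t->nr; i++)
		free(t->strings[i]);
	t->nr = 0;
}
```

/*
* The pointer comparison is only valid when both arguments came from
* intern(); passing a non-interned copy of the same text compares
* unequal, which is the "be careful" part of the comment above.
*/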
struct conflict_info {
/*
* merged: the version of the path that will be written to working tree
*
* WARNING: It is critical to check merged.clean and ensure it is 0
* before reading any conflict_info fields outside of merged.
* Allocated merged_info structs will always have clean set to 1.
* Allocated conflict_info structs will have merged.clean set to 0
* initially. The merged.clean field is how we know if it is safe
* to access other parts of conflict_info besides merged; if a
* conflict_info's merged.clean is changed to 1, the rest of the
* algorithm is not allowed to look at anything outside of the
* merged member anymore.
*/
struct merged_info merged;
/* oids & modes from each of the three trees for this path */
struct version_info stages[3];
/* pathnames for each stage; may differ due to rename detection */
const char *pathnames[3];
/* Whether this path is/was involved in a directory/file conflict */
unsigned df_conflict:1;
/*
* Whether this path is/was involved in a non-content conflict other
* than a directory/file conflict (e.g. rename/rename, rename/delete,
* file location based on possible directory rename).
*/
unsigned path_conflict:1;
/*
* For filemask and dirmask, the ith bit corresponds to whether the
* ith entry is a file (filemask) or a directory (dirmask). Thus,
* filemask & dirmask is always zero, and filemask | dirmask is at
* most 7 but can be less when a path does not appear as either a
* file or a directory on at least one side of history.
*
* Note that these masks are related to enum merge_side, as the ith
* entry corresponds to side i.
*
* These values come from a traverse_trees() call; more info may be
* found looking at tree-walk.h's struct traverse_info,
* particularly the documentation above the "fn" member (note that
* filemask = mask & ~dirmask from that documentation).
*/
unsigned filemask:3;
unsigned dirmask:3;
/*
* Optimization to track which stages match, to avoid the need to
* recompute it in multiple steps. Either 0 or at least 2 bits are
* set; if at least 2 bits are set, their corresponding stages match.
*/
unsigned match_mask:3;
};
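/*
* Illustration only, not part of merge-ort: a small sketch of how the
* filemask and dirmask bits described above relate to each other. The
* enum numbering (base = 0, side 1 = 1, side 2 = 2) and all names here
* (struct masks, split_masks(), ...) are assumptions made for the
* example; the real values come from traverse_trees().
*/

```c
#include <assert.h>

/* Assumed numbering of the three sides; bit i of each mask is side i. */
enum side { SIDE_BASE = 0, SIDE_1 = 1, SIDE_2 = 2 };

struct masks {
	unsigned filemask:3;
	unsigned dirmask:3;
};

/*
 * `mask` has bit i set if the path exists on side i, and `dirmask` has
 * bit i set if it is a directory there, so filemask = mask & ~dirmask.
 */
static struct masks split_masks(unsigned mask, unsigned dirmask)
{
	struct masks m;

	assert(mask <= 7 && (dirmask & ~mask) == 0);
	m.dirmask = dirmask;
	m.filemask = mask & ~dirmask;
	return m;
}

static int is_file_on(struct masks m, enum side s)
{
	return (m.filemask >> s) & 1;
}

static int is_dir_on(struct masks m, enum side s)
{
	return (m.dirmask >> s) & 1;
}

/* True when the path is a file on some side and a directory on another. */
static int has_file_and_dir(struct masks m)
{
	return m.filemask && m.dirmask;
}
```

/*
* For example, split_masks(7, 4) describes a path that is a file in the
* base and on side 1 but a directory on side 2.
*/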
enum conflict_and_info_types {
/* "Simple" conflicts and informational messages */
INFO_AUTO_MERGING = 0,
CONFLICT_CONTENTS, /* text file that failed to merge */
CONFLICT_BINARY,
CONFLICT_FILE_DIRECTORY,
CONFLICT_DISTINCT_MODES,
CONFLICT_MODIFY_DELETE,
/* Regular rename */
CONFLICT_RENAME_RENAME, /* same file renamed differently */
CONFLICT_RENAME_COLLIDES, /* rename/add or two files renamed to 1 */
CONFLICT_RENAME_DELETE,
/* Basic directory rename */
CONFLICT_DIR_RENAME_SUGGESTED,
INFO_DIR_RENAME_APPLIED,
/* Special directory rename cases */
INFO_DIR_RENAME_SKIPPED_DUE_TO_RERENAME,
CONFLICT_DIR_RENAME_FILE_IN_WAY,
CONFLICT_DIR_RENAME_COLLISION,
CONFLICT_DIR_RENAME_SPLIT,
/* Basic submodule */
INFO_SUBMODULE_FAST_FORWARDING,
CONFLICT_SUBMODULE_FAILED_TO_MERGE,
/* Special submodule cases broken out from FAILED_TO_MERGE */
CONFLICT_SUBMODULE_FAILED_TO_MERGE_BUT_POSSIBLE_RESOLUTION,
CONFLICT_SUBMODULE_NOT_INITIALIZED,
CONFLICT_SUBMODULE_HISTORY_NOT_AVAILABLE,
CONFLICT_SUBMODULE_MAY_HAVE_REWINDS,
CONFLICT_SUBMODULE_NULL_MERGE_BASE,
/* Keep this entry _last_ in the list */
NB_CONFLICT_TYPES,
};
/*
* Short description of conflict type, relied upon by external tools.
*
* We can add more entries, but DO NOT change any of these strings. Also,
* the order MUST match enum conflict_and_info_types.
*/
static const char *type_short_descriptions[] = {
/*** "Simple" conflicts and informational messages ***/
[INFO_AUTO_MERGING] = "Auto-merging",
[CONFLICT_CONTENTS] = "CONFLICT (contents)",
[CONFLICT_BINARY] = "CONFLICT (binary)",
[CONFLICT_FILE_DIRECTORY] = "CONFLICT (file/directory)",
[CONFLICT_DISTINCT_MODES] = "CONFLICT (distinct modes)",
[CONFLICT_MODIFY_DELETE] = "CONFLICT (modify/delete)",
/*** Regular rename ***/
[CONFLICT_RENAME_RENAME] = "CONFLICT (rename/rename)",
[CONFLICT_RENAME_COLLIDES] = "CONFLICT (rename involved in collision)",
[CONFLICT_RENAME_DELETE] = "CONFLICT (rename/delete)",
/*** Basic directory rename ***/
[CONFLICT_DIR_RENAME_SUGGESTED] =
"CONFLICT (directory rename suggested)",
[INFO_DIR_RENAME_APPLIED] = "Path updated due to directory rename",
/*** Special directory rename cases ***/
[INFO_DIR_RENAME_SKIPPED_DUE_TO_RERENAME] =
"Directory rename skipped since directory was renamed on both sides",
[CONFLICT_DIR_RENAME_FILE_IN_WAY] =
"CONFLICT (file in way of directory rename)",
[CONFLICT_DIR_RENAME_COLLISION] = "CONFLICT(directory rename collision)",
[CONFLICT_DIR_RENAME_SPLIT] = "CONFLICT(directory rename unclear split)",
/*** Basic submodule ***/
[INFO_SUBMODULE_FAST_FORWARDING] = "Fast forwarding submodule",
[CONFLICT_SUBMODULE_FAILED_TO_MERGE] = "CONFLICT (submodule)",
/*** Special submodule cases broken out from FAILED_TO_MERGE ***/
[CONFLICT_SUBMODULE_FAILED_TO_MERGE_BUT_POSSIBLE_RESOLUTION] =
"CONFLICT (submodule with possible resolution)",
[CONFLICT_SUBMODULE_NOT_INITIALIZED] =
"CONFLICT (submodule not initialized)",
[CONFLICT_SUBMODULE_HISTORY_NOT_AVAILABLE] =
"CONFLICT (submodule history not available)",
[CONFLICT_SUBMODULE_MAY_HAVE_REWINDS] =
"CONFLICT (submodule may have rewinds)",
[CONFLICT_SUBMODULE_NULL_MERGE_BASE] =
"CONFLICT (submodule lacks merge base)"
};
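/*
* Illustration only, not part of merge-ort: one possible way to check
* that a description table like the one above stays in sync with its
* enum. The enum and table here are shortened, hypothetical stand-ins
* (demo_types, demo_descriptions), and the check is an assumption about
* how one could guard the table, not something this file does.
*/

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, shortened stand-in for conflict_and_info_types. */
enum demo_types {
	DEMO_AUTO_MERGING = 0,
	DEMO_CONTENTS,
	DEMO_BINARY,
	/* Keep this entry last */
	NB_DEMO_TYPES
};

/* Designated initializers tie each string to its enum value. */
static const char *demo_descriptions[] = {
	[DEMO_AUTO_MERGING] = "Auto-merging",
	[DEMO_CONTENTS] = "CONFLICT (contents)",
	[DEMO_BINARY] = "CONFLICT (binary)",
};

#define DEMO_ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Returns 1 if the table has exactly one non-NULL string per enum value. */
static int demo_table_is_complete(void)
{
	size_t i;

	if (DEMO_ARRAY_SIZE(demo_descriptions) != (size_t)NB_DEMO_TYPES)
		return 0;
	for (i = 0; i < (size_t)NB_DEMO_TYPES; i++)
		if (!demo_descriptions[i])
			return 0;
	return 1;
}

/* Look up the short description for a type. */
static const char *demo_describe(enum demo_types type)
{
	assert((unsigned)type < (unsigned)NB_DEMO_TYPES);
	return demo_descriptions[type];
}
```

/*
* A skipped enum value in the middle of the table would show up as a
* NULL slot, and a missing trailing entry as a size mismatch.
*/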
struct logical_conflict_info {
enum conflict_and_info_types type;
struct strvec paths;
};
/*** Function Grouping: various utility functions ***/
/*
* For the next three macros, see warning for conflict_info.merged.
*
* In each of the below, mi is a struct merged_info*, and ci was defined
* as a struct conflict_info* (but we need to verify ci isn't actually
* pointed at a struct merged_info*).
*
* INITIALIZE_CI: Assign mi to ci, but only if it's safe; set ci to NULL otherwise.
* VERIFY_CI: Ensure that something we assigned to a conflict_info* is one.
* ASSIGN_AND_VERIFY_CI: Similar to VERIFY_CI but do assignment first.
*/
#define INITIALIZE_CI(ci, mi) do { \
(ci) = (!(mi) || (mi)->clean) ? NULL : (struct conflict_info *)(mi); \
} while (0)
#define VERIFY_CI(ci) assert((ci) && !(ci)->merged.clean);
#define ASSIGN_AND_VERIFY_CI(ci, mi) do { \
(ci) = (struct conflict_info *)(mi); \
assert((ci) && !(mi)->clean); \
} while (0)
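/*
* Illustration only, not part of merge-ort: a sketch of the guarded
* downcast that the three macros above implement, written as a function
* over trimmed-down, hypothetical structs (demo_merged_info,
* demo_conflict_info). It assumes, as in conflict_info, that the
* merged_info is the first member, so a pointer to it can be converted
* to a pointer to the containing struct.
*/

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down versions of merged_info and conflict_info. */
struct demo_merged_info {
	unsigned clean:1;
};

struct demo_conflict_info {
	struct demo_merged_info merged; /* must be the first member */
	unsigned df_conflict:1;
};

/*
 * Same idea as INITIALIZE_CI: return a conflict_info view of mi only
 * when mi is non-NULL and not clean; otherwise return NULL.
 */
static struct demo_conflict_info *demo_as_conflict(struct demo_merged_info *mi)
{
	assert(offsetof(struct demo_conflict_info, merged) == 0);
	if (!mi || mi->clean)
		return NULL;
	return (struct demo_conflict_info *)mi;
}

/* Read a conflict-only field, falling back to 0 for clean entries. */
static int demo_has_df_conflict(struct demo_merged_info *mi)
{
	struct demo_conflict_info *ci = demo_as_conflict(mi);

	return ci ? ci->df_conflict : 0;
}
```

/*
* Once merged.clean is set to 1, demo_as_conflict() returns NULL, so
* callers cannot reach the fields outside of merged by accident.
*/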
static void free_strmap_strings(struct strmap *map)
{
struct hashmap_iter iter;
struct strmap_entry *entry;
strmap_for_each_entry(map, &iter, entry) {
free((char*)entry->key);
}
}
static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
int reinitialize)
{