Git Source Code Mirror - This is a publish-only repository and all pull requests are ignored. Please follow Documentation/SubmittingPatches procedure for any of your improvements. https://git-scm.com/
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
git/git-compat-util.h

1453 lines
39 KiB

#ifndef GIT_COMPAT_UTIL_H
#define GIT_COMPAT_UTIL_H
git-compat-util: add a test balloon for C99 support The C99 standard was released in January 1999, now 22 years ago. It provides a variety of useful features, including variadic arguments for macros, declarations after statements, designated initializers, and a wide variety of other useful features, many of which we already use. We'd like to take advantage of these features, but we want to be cautious. As far as we know, all major compilers now support C99 or a later C standard, such as C11 or C17. POSIX has required C99 support as a requirement for the 2001 revision, so we can safely assume any POSIX system which we are interested in supporting has C99. Even MSVC, long a holdout against modern C, now supports both C11 and C17 with an appropriate update. Moreover, even if people are using an older version of MSVC on these systems, they will generally need some implementation of the standard Unix utilities for the testsuite, and GNU coreutils, the most common option, has required C99 since 2009. Therefore, we can safely assume that a suitable version of GCC or clang is available to users even if their version of MSVC is not sufficiently capable. Let's add a test balloon to git-compat-util.h to see if anyone is using an older compiler. We'll add a comment telling people how to enable this functionality on GCC and Clang, even though modern versions of both will automatically do the right thing, and ask people still experiencing a problem to report that to us on the list. Note that C89 compilers don't provide the __STDC_VERSION__ macro, so we use a well-known hack of using "- 0". On compilers with this macro, it doesn't change the value, and on C89 compilers, the macro will be replaced with nothing, and our value will be 0. For sparse, we explicitly request the gnu99 style because we've traditionally taken advantage of some GCC- and clang-specific extensions when available and we'd like to retain the ability to do that. sparse also defaults to C89 without it, so things will fail for us if we don't. Update the cmake configuration to require C11 for MSVC. We do this because this will make MSVC to use C11, since it does not explicitly support C99. We do this with a compiler options because setting the C_STANDARD option does not work in our CI on MSVC and at the moment, we don't want to require C11 for Unix compilers. In the Makefile, don't set any compiler flags for the compiler itself, since on some systems, such as FreeBSD, we actually need C11, and asking for C99 causes things to fail to compile. The error message should make it obvious what's going wrong and allow a user to set the appropriate option when building in the event they're using a Unix compiler that doesn't support it by default. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
1 year ago
#if __STDC_VERSION__ - 0 < 199901L
/*
* Git is in a testing period for mandatory C99 support in the compiler. If
* your compiler is reasonably recent, you can try to enable C99 support (or,
* for MSVC, C11 support). If you encounter a problem and can't enable C99
* support with your compiler (such as with "-std=gnu99") and don't have access
* to one with this support, such as GCC or Clang, you can remove this #if
* directive, but please report the details of your system to
* git@vger.kernel.org.
*/
#error "Required C99 support is in a test phase. Please see git-compat-util.h for more details."
#endif
#ifdef USE_MSVC_CRTDBG
/*
* For these to work they must appear very early in each
* file -- before most of the standard header files.
*/
#include <stdlib.h>
#include <crtdbg.h>
#endif
#define _FILE_OFFSET_BITS 64
/* Derived from Linux "Features Test Macro" header
* Convenience macros to test the versions of gcc (or
* a compatible compiler).
* Use them like this:
* #if GIT_GNUC_PREREQ (2,8)
* ... code requiring gcc 2.8 or later ...
* #endif
*/
#if defined(__GNUC__) && defined(__GNUC_MINOR__)
# define GIT_GNUC_PREREQ(maj, min) \
((__GNUC__ << 16) + __GNUC_MINOR__ >= ((maj) << 16) + (min))
#else
#define GIT_GNUC_PREREQ(maj, min) 0
#endif
#ifndef FLEX_ARRAY
/*
* See if our compiler is known to support flexible array members.
*/
/*
* Check vendor specific quirks first, before checking the
* __STDC_VERSION__, as vendor compilers can lie and we need to be
* able to work them around. Note that by not defining FLEX_ARRAY
* here, we can fall back to use the "safer but a bit wasteful" one
* later.
*/
#if defined(__SUNPRO_C) && (__SUNPRO_C <= 0x580)
#elif defined(__GNUC__)
# if (__GNUC__ >= 3)
# define FLEX_ARRAY /* empty */
# else
# define FLEX_ARRAY 0 /* older GNU extension */
# endif
#elif defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
# define FLEX_ARRAY /* empty */
#endif
/*
* Otherwise, default to safer but a bit wasteful traditional style
*/
#ifndef FLEX_ARRAY
# define FLEX_ARRAY 1
#endif
#endif
/*
* BUILD_ASSERT_OR_ZERO - assert a build-time dependency, as an expression.
* @cond: the compile-time condition which must be true.
*
* Your compile will fail if the condition isn't true, or can't be evaluated
* by the compiler. This can be used in an expression: its value is "0".
*
* Example:
* #define foo_to_char(foo) \
* ((char *)(foo) \
* + BUILD_ASSERT_OR_ZERO(offsetof(struct foo, string) == 0))
*/
#define BUILD_ASSERT_OR_ZERO(cond) \
(sizeof(char [1 - 2*!(cond)]) - 1)
#if GIT_GNUC_PREREQ(3, 1)
/* &arr[0] degrades to a pointer: a different type from an array */
# define BARF_UNLESS_AN_ARRAY(arr) \
BUILD_ASSERT_OR_ZERO(!__builtin_types_compatible_p(__typeof__(arr), \
__typeof__(&(arr)[0])))
#else
# define BARF_UNLESS_AN_ARRAY(arr) 0
#endif
/*
* ARRAY_SIZE - get the number of elements in a visible array
* @x: the array whose size you want.
*
* This does not work on pointers, or arrays declared as [], or
* function parameters. With correct compiler support, such usage
* will cause a build error (see the build_assert_or_zero macro).
*/
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]) + BARF_UNLESS_AN_ARRAY(x))
#define bitsizeof(x) (CHAR_BIT * sizeof(x))
#define maximum_signed_value_of_type(a) \
(INTMAX_MAX >> (bitsizeof(intmax_t) - bitsizeof(a)))
#define maximum_unsigned_value_of_type(a) \
(UINTMAX_MAX >> (bitsizeof(uintmax_t) - bitsizeof(a)))
/*
* Signed integer overflow is undefined in C, so here's a helper macro
* to detect if the sum of two integers will overflow.
*
* Requires: a >= 0, typeof(a) equals typeof(b)
*/
#define signed_add_overflows(a, b) \
((b) > maximum_signed_value_of_type(a) - (a))
#define unsigned_add_overflows(a, b) \
((b) > maximum_unsigned_value_of_type(a) - (a))
/*
* Returns true if the multiplication of "a" and "b" will
* overflow. The types of "a" and "b" must match and must be unsigned.
* Note that this macro evaluates "a" twice!
*/
#define unsigned_mult_overflows(a, b) \
((a) && (b) > maximum_unsigned_value_of_type(a) / (a))
/*
* Returns true if the left shift of "a" by "shift" bits will
* overflow. The type of "a" must be unsigned.
*/
#define unsigned_left_shift_overflows(a, shift) \
((shift) < bitsizeof(a) && \
(a) > maximum_unsigned_value_of_type(a) >> (shift))
#ifdef __GNUC__
#define TYPEOF(x) (__typeof__(x))
#else
#define TYPEOF(x)
#endif
#define MSB(x, bits) ((x) & TYPEOF(x)(~0ULL << (bitsizeof(x) - (bits))))
#define HAS_MULTI_BITS(i) ((i) & ((i) - 1)) /* checks if an integer has more than 1 bit set */
#define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
/* Approximation of the length of the decimal representation of this type. */
#define decimal_length(x) ((int)(sizeof(x) * 2.56 + 0.5) + 1)
#ifdef __MINGW64__
#define _POSIX_C_SOURCE 1
#elif defined(__sun__)
/*
* On Solaris, when _XOPEN_EXTENDED is set, its header file
* forces the programs to be XPG4v2, defeating any _XOPEN_SOURCE
* setting to say we are XPG5 or XPG6. Also on Solaris,
* XPG6 programs must be compiled with a c99 compiler, while
* non XPG6 programs must be compiled with a pre-c99 compiler.
*/
# if __STDC_VERSION__ - 0 >= 199901L
# define _XOPEN_SOURCE 600
# else
# define _XOPEN_SOURCE 500
# endif
#elif !defined(__APPLE__) && !defined(__FreeBSD__) && !defined(__USLC__) && \
!defined(_M_UNIX) && !defined(__sgi) && !defined(__DragonFly__) && \
!defined(__TANDEM) && !defined(__QNX__) && !defined(__MirBSD__) && \
!defined(__CYGWIN__)
#define _XOPEN_SOURCE 600 /* glibc2 and AIX 5.3L need 500, OpenBSD needs 600 for S_ISLNK() */
#define _XOPEN_SOURCE_EXTENDED 1 /* AIX 5.3L needs this */
#endif
#define _ALL_SOURCE 1
#define _GNU_SOURCE 1
#define _BSD_SOURCE 1
#define _DEFAULT_SOURCE 1
#define _NETBSD_SOURCE 1
#define _SGI_SOURCE 1
#if defined(WIN32) && !defined(__CYGWIN__) /* Both MinGW and MSVC */
# if !defined(_WIN32_WINNT)
# define _WIN32_WINNT 0x0600
# endif
#define WIN32_LEAN_AND_MEAN /* stops windows.h including winsock.h */
#include <winsock2.h>
#ifndef NO_UNIX_SOCKETS
#include <afunix.h>
#endif
#include <windows.h>
#define GIT_WINDOWS_NATIVE
wrapper: add a helper to generate numbers from a CSPRNG There are many situations in which having access to a cryptographically secure pseudorandom number generator (CSPRNG) is helpful. In the future, we'll encounter one of these when dealing with temporary files. To make this possible, let's add a function which reads from a system CSPRNG and returns some bytes. We know that all systems will have such an interface. A CSPRNG is required for a secure TLS or SSH implementation and a Git implementation which provided neither would be of little practical use. In addition, POSIX is set to standardize getentropy(2) in the next version, so in the (potentially distant) future we can rely on that. For systems which lack one of the other interfaces, we provide the ability to use OpenSSL's CSPRNG. OpenSSL is highly portable and functions on practically every known OS, and we know it will have access to some source of cryptographically secure randomness. We also provide support for the arc4random in libbsd for folks who would prefer to use that. Because this is a security sensitive interface, we take some precautions. We either succeed by filling the buffer completely as we requested, or we fail. We don't return partial data because the caller will almost never find that to be a useful behavior. Specify a makefile knob which users can use to specify one or more suitable CSPRNGs, and turn the multiple string options into a set of defines, since we cannot match on strings in the preprocessor. We allow multiple options to make the job of handling this in autoconf easier. The order of options is important here. On systems with arc4random, which is most of the BSDs, we use that, since, except on MirBSD and macOS, it uses ChaCha20, which is extremely fast, and sits entirely in userspace, avoiding a system call. We then prefer getrandom over getentropy, because the former has been available longer on Linux, and then OpenSSL. Finally, if none of those are available, we use /dev/urandom, because most Unix-like operating systems provide that API. We prefer options that don't involve device files when possible because those work in some restricted environments where device files may not be available. Set the configuration variables appropriately for Linux and the BSDs, including macOS, as well as Windows and NonStop. We specifically only consider versions which receive publicly available security support here. For the same reason, we don't specify getrandom(2) on Linux, because CentOS 7 doesn't support it in glibc (although its kernel does) and we don't want to resort to making syscalls. Finally, add a test helper to allow this to be tested by hand and in tests. We don't add any tests, since invoking the CSPRNG is not likely to produce interesting, reproducible results. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
10 months ago
#ifdef HAVE_RTLGENRANDOM
/* This is required to get access to RtlGenRandom. */
#define SystemFunction036 NTAPI SystemFunction036
#include <NTSecAPI.h>
#undef SystemFunction036
#endif
#endif
#include <unistd.h>
#include <stdio.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string.h>
#ifdef HAVE_STRINGS_H
#include <strings.h> /* for strcasecmp() */
#endif
#include <errno.h>
#include <limits.h>
#ifdef NEEDS_SYS_PARAM_H
#include <sys/param.h>
#endif
#include <sys/types.h>
#include <dirent.h>
#include <sys/time.h>
#include <time.h>
#include <signal.h>
#include <assert.h>
#include <regex.h>
#include <utime.h>
#include <syslog.h>
#if !defined(NO_POLL_H)
#include <poll.h>
#elif !defined(NO_SYS_POLL_H)
#include <sys/poll.h>
#else
/* Pull the compat stuff */
#include <poll.h>
#endif
#ifdef HAVE_BSD_SYSCTL
#include <sys/sysctl.h>
#endif
#if defined(__CYGWIN__)
#include "compat/win32/path-utils.h"
#endif
#if defined(__MINGW32__)
/* pull in Windows compatibility stuff */
#include "compat/win32/path-utils.h"
#include "compat/mingw.h"
#elif defined(_MSC_VER)
#include "compat/win32/path-utils.h"
#include "compat/msvc.h"
#else
#include <sys/utsname.h>
#include <sys/wait.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <termios.h>
#ifndef NO_SYS_SELECT_H
#include <sys/select.h>
#endif
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <pwd.h>
#include <sys/un.h>
#ifndef NO_INTTYPES_H
#include <inttypes.h>
#else
#include <stdint.h>
#endif
wrapper: add a helper to generate numbers from a CSPRNG There are many situations in which having access to a cryptographically secure pseudorandom number generator (CSPRNG) is helpful. In the future, we'll encounter one of these when dealing with temporary files. To make this possible, let's add a function which reads from a system CSPRNG and returns some bytes. We know that all systems will have such an interface. A CSPRNG is required for a secure TLS or SSH implementation and a Git implementation which provided neither would be of little practical use. In addition, POSIX is set to standardize getentropy(2) in the next version, so in the (potentially distant) future we can rely on that. For systems which lack one of the other interfaces, we provide the ability to use OpenSSL's CSPRNG. OpenSSL is highly portable and functions on practically every known OS, and we know it will have access to some source of cryptographically secure randomness. We also provide support for the arc4random in libbsd for folks who would prefer to use that. Because this is a security sensitive interface, we take some precautions. We either succeed by filling the buffer completely as we requested, or we fail. We don't return partial data because the caller will almost never find that to be a useful behavior. Specify a makefile knob which users can use to specify one or more suitable CSPRNGs, and turn the multiple string options into a set of defines, since we cannot match on strings in the preprocessor. We allow multiple options to make the job of handling this in autoconf easier. The order of options is important here. On systems with arc4random, which is most of the BSDs, we use that, since, except on MirBSD and macOS, it uses ChaCha20, which is extremely fast, and sits entirely in userspace, avoiding a system call. We then prefer getrandom over getentropy, because the former has been available longer on Linux, and then OpenSSL. Finally, if none of those are available, we use /dev/urandom, because most Unix-like operating systems provide that API. We prefer options that don't involve device files when possible because those work in some restricted environments where device files may not be available. Set the configuration variables appropriately for Linux and the BSDs, including macOS, as well as Windows and NonStop. We specifically only consider versions which receive publicly available security support here. For the same reason, we don't specify getrandom(2) on Linux, because CentOS 7 doesn't support it in glibc (although its kernel does) and we don't want to resort to making syscalls. Finally, add a test helper to allow this to be tested by hand and in tests. We don't add any tests, since invoking the CSPRNG is not likely to produce interesting, reproducible results. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
10 months ago
#ifdef HAVE_ARC4RANDOM_LIBBSD
#include <bsd/stdlib.h>
#endif
#ifdef HAVE_GETRANDOM
#include <sys/random.h>
#endif
#ifdef NO_INTPTR_T
/*
* On I16LP32, ILP32 and LP64 "long" is the safe bet, however
* on LLP86, IL33LLP64 and P64 it needs to be "long long",
* while on IP16 and IP16L32 it is "int" (resp. "short")
* Size needs to match (or exceed) 'sizeof(void *)'.
* We can't take "long long" here as not everybody has it.
*/
typedef long intptr_t;
typedef unsigned long uintptr_t;
#endif
#undef _ALL_SOURCE /* AIX 5.3L defines a struct list with _ALL_SOURCE. */
#include <grp.h>
#define _ALL_SOURCE 1
#endif
git on Mac OS and precomposed unicode Mac OS X mangles file names containing unicode on file systems HFS+, VFAT or SAMBA. When a file using unicode code points outside ASCII is created on a HFS+ drive, the file name is converted into decomposed unicode and written to disk. No conversion is done if the file name is already decomposed unicode. Calling open("\xc3\x84", ...) with a precomposed "Ä" yields the same result as open("\x41\xcc\x88",...) with a decomposed "Ä". As a consequence, readdir() returns the file names in decomposed unicode, even if the user expects precomposed unicode. Unlike on HFS+, Mac OS X stores files on a VFAT drive (e.g. an USB drive) in precomposed unicode, but readdir() still returns file names in decomposed unicode. When a git repository is stored on a network share using SAMBA, file names are send over the wire and written to disk on the remote system in precomposed unicode, but Mac OS X readdir() returns decomposed unicode to be compatible with its behaviour on HFS+ and VFAT. The unicode decomposition causes many problems: - The names "git add" and other commands get from the end user may often be precomposed form (the decomposed form is not easily input from the keyboard), but when the commands read from the filesystem to see what it is going to update the index with already is on the filesystem, readdir() will give decomposed form, which is different. - Similarly "git log", "git mv" and all other commands that need to compare pathnames found on the command line (often but not always precomposed form; a command line input resulting from globbing may be in decomposed) with pathnames found in the tree objects (should be precomposed form to be compatible with other systems and for consistency in general). - The same for names stored in the index, which should be precomposed, that may need to be compared with the names read from readdir(). NFS mounted from Linux is fully transparent and does not suffer from the above. As Mac OS X treats precomposed and decomposed file names as equal, we can - wrap readdir() on Mac OS X to return the precomposed form, and - normalize decomposed form given from the command line also to the precomposed form, to ensure that all pathnames used in Git are always in the precomposed form. This behaviour can be requested by setting "core.precomposedunicode" configuration variable to true. The code in compat/precomposed_utf8.c implements basically 4 new functions: precomposed_utf8_opendir(), precomposed_utf8_readdir(), precomposed_utf8_closedir() and precompose_argv(). The first three are to wrap opendir(3), readdir(3), and closedir(3) functions. The argv[] conversion allows to use the TAB filename completion done by the shell on command line. It tolerates other tools which use readdir() to feed decomposed file names into git. When creating a new git repository with "git init" or "git clone", "core.precomposedunicode" will be set "false". The user needs to activate this feature manually. She typically sets core.precomposedunicode to "true" on HFS and VFAT, or file systems mounted via SAMBA. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
/* used on Mac OS X */
#ifdef PRECOMPOSE_UNICODE
#include "compat/precompose_utf8.h"
#else
MacOS: precompose_argv_prefix() The following sequence leads to a "BUG" assertion running under MacOS: DIR=git-test-restore-p Adiarnfd=$(printf 'A\314\210') DIRNAME=xx${Adiarnfd}yy mkdir $DIR && cd $DIR && git init && mkdir $DIRNAME && cd $DIRNAME && echo "Initial" >file && git add file && echo "One more line" >>file && echo y | git restore -p . Initialized empty Git repository in /tmp/git-test-restore-p/.git/ BUG: pathspec.c:495: error initializing pathspec_item Cannot close git diff-index --cached --numstat [snip] The command `git restore` is run from a directory inside a Git repo. Git needs to split the $CWD into 2 parts: The path to the repo and "the rest", if any. "The rest" becomes a "prefix" later used inside the pathspec code. As an example, "/path/to/repo/dir-inside-repå" would determine "/path/to/repo" as the root of the repo, the place where the configuration file .git/config is found. The rest becomes the prefix ("dir-inside-repå"), from where the pathspec machinery expands the ".", more about this later. If there is a decomposed form, (making the decomposing visible like this), "dir-inside-rep°a" doesn't match "dir-inside-repå". Git commands need to: (a) read the configuration variable "core.precomposeunicode" (b) precocompose argv[] (c) precompose the prefix, if there was any The first commit, 76759c7dff53 "git on Mac OS and precomposed unicode" addressed (a) and (b). The call to precompose_argv() was added into parse-options.c, because that seemed to be a good place when the patch was written. Commands that don't use parse-options need to do (a) and (b) themselfs. The commands `diff-files`, `diff-index`, `diff-tree` and `diff` learned (a) and (b) in commit 90a78b83e0b8 "diff: run arguments through precompose_argv" Branch names (or refs in general) using decomposed code points resulting in decomposed file names had been fixed in commit 8e712ef6fc97 "Honor core.precomposeUnicode in more places" The bug report from above shows 2 things: - more commands need to handle precomposed unicode - (c) should be implemented for all commands using pathspecs Solution: precompose_argv() now handles the prefix (if needed), and is renamed into precompose_argv_prefix(). Inside this function the config variable core.precomposeunicode is read into the global variable precomposed_unicode, as before. This reading is skipped if precomposed_unicode had been read before. The original patch for preocomposed unicode, 76759c7dff53, placed precompose_argv() into parse-options.c Now add it into git.c::run_builtin() as well. Existing precompose calls in diff-files.c and others may become redundant, and if we audit the callflows that reach these places to make sure that they can never be reached without going through the new call added to run_builtin(), we might be able to remove these existing ones. But in this commit, we do not bother to do so and leave these precompose callsites as they are. Because precompose() is idempotent and can be called on an already precomposed string safely, this is safer than removing existing calls without fully vetting the callflows. There is certainly room for cleanups - this change intends to be a bug fix. Cleanups needs more tests in e.g. t/t3910-mac-os-precompose.sh, and should be done in future commits. [1] git-bugreport-2021-01-06-1209.txt (git can't deal with special characters) [2] https://lore.kernel.org/git/A102844A-9501-4A86-854D-E3B387D378AA@icloud.com/ Reported-by: Daniel Troger <random_n0body@icloud.com> Helped-By: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2 years ago
static inline const char *precompose_argv_prefix(int argc, const char **argv, const char *prefix)
{
MacOS: precompose_argv_prefix() The following sequence leads to a "BUG" assertion running under MacOS: DIR=git-test-restore-p Adiarnfd=$(printf 'A\314\210') DIRNAME=xx${Adiarnfd}yy mkdir $DIR && cd $DIR && git init && mkdir $DIRNAME && cd $DIRNAME && echo "Initial" >file && git add file && echo "One more line" >>file && echo y | git restore -p . Initialized empty Git repository in /tmp/git-test-restore-p/.git/ BUG: pathspec.c:495: error initializing pathspec_item Cannot close git diff-index --cached --numstat [snip] The command `git restore` is run from a directory inside a Git repo. Git needs to split the $CWD into 2 parts: The path to the repo and "the rest", if any. "The rest" becomes a "prefix" later used inside the pathspec code. As an example, "/path/to/repo/dir-inside-repå" would determine "/path/to/repo" as the root of the repo, the place where the configuration file .git/config is found. The rest becomes the prefix ("dir-inside-repå"), from where the pathspec machinery expands the ".", more about this later. If there is a decomposed form, (making the decomposing visible like this), "dir-inside-rep°a" doesn't match "dir-inside-repå". Git commands need to: (a) read the configuration variable "core.precomposeunicode" (b) precocompose argv[] (c) precompose the prefix, if there was any The first commit, 76759c7dff53 "git on Mac OS and precomposed unicode" addressed (a) and (b). The call to precompose_argv() was added into parse-options.c, because that seemed to be a good place when the patch was written. Commands that don't use parse-options need to do (a) and (b) themselfs. The commands `diff-files`, `diff-index`, `diff-tree` and `diff` learned (a) and (b) in commit 90a78b83e0b8 "diff: run arguments through precompose_argv" Branch names (or refs in general) using decomposed code points resulting in decomposed file names had been fixed in commit 8e712ef6fc97 "Honor core.precomposeUnicode in more places" The bug report from above shows 2 things: - more commands need to handle precomposed unicode - (c) should be implemented for all commands using pathspecs Solution: precompose_argv() now handles the prefix (if needed), and is renamed into precompose_argv_prefix(). Inside this function the config variable core.precomposeunicode is read into the global variable precomposed_unicode, as before. This reading is skipped if precomposed_unicode had been read before. The original patch for preocomposed unicode, 76759c7dff53, placed precompose_argv() into parse-options.c Now add it into git.c::run_builtin() as well. Existing precompose calls in diff-files.c and others may become redundant, and if we audit the callflows that reach these places to make sure that they can never be reached without going through the new call added to run_builtin(), we might be able to remove these existing ones. But in this commit, we do not bother to do so and leave these precompose callsites as they are. Because precompose() is idempotent and can be called on an already precomposed string safely, this is safer than removing existing calls without fully vetting the callflows. There is certainly room for cleanups - this change intends to be a bug fix. Cleanups needs more tests in e.g. t/t3910-mac-os-precompose.sh, and should be done in future commits. [1] git-bugreport-2021-01-06-1209.txt (git can't deal with special characters) [2] https://lore.kernel.org/git/A102844A-9501-4A86-854D-E3B387D378AA@icloud.com/ Reported-by: Daniel Troger <random_n0body@icloud.com> Helped-By: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2 years ago
return prefix;
}
static inline const char *precompose_string_if_needed(const char *in)
{
return in;
}
#define probe_utf8_pathname_composition()
git on Mac OS and precomposed unicode Mac OS X mangles file names containing unicode on file systems HFS+, VFAT or SAMBA. When a file using unicode code points outside ASCII is created on a HFS+ drive, the file name is converted into decomposed unicode and written to disk. No conversion is done if the file name is already decomposed unicode. Calling open("\xc3\x84", ...) with a precomposed "Ä" yields the same result as open("\x41\xcc\x88",...) with a decomposed "Ä". As a consequence, readdir() returns the file names in decomposed unicode, even if the user expects precomposed unicode. Unlike on HFS+, Mac OS X stores files on a VFAT drive (e.g. an USB drive) in precomposed unicode, but readdir() still returns file names in decomposed unicode. When a git repository is stored on a network share using SAMBA, file names are send over the wire and written to disk on the remote system in precomposed unicode, but Mac OS X readdir() returns decomposed unicode to be compatible with its behaviour on HFS+ and VFAT. The unicode decomposition causes many problems: - The names "git add" and other commands get from the end user may often be precomposed form (the decomposed form is not easily input from the keyboard), but when the commands read from the filesystem to see what it is going to update the index with already is on the filesystem, readdir() will give decomposed form, which is different. - Similarly "git log", "git mv" and all other commands that need to compare pathnames found on the command line (often but not always precomposed form; a command line input resulting from globbing may be in decomposed) with pathnames found in the tree objects (should be precomposed form to be compatible with other systems and for consistency in general). - The same for names stored in the index, which should be precomposed, that may need to be compared with the names read from readdir(). NFS mounted from Linux is fully transparent and does not suffer from the above. As Mac OS X treats precomposed and decomposed file names as equal, we can - wrap readdir() on Mac OS X to return the precomposed form, and - normalize decomposed form given from the command line also to the precomposed form, to ensure that all pathnames used in Git are always in the precomposed form. This behaviour can be requested by setting "core.precomposedunicode" configuration variable to true. The code in compat/precomposed_utf8.c implements basically 4 new functions: precomposed_utf8_opendir(), precomposed_utf8_readdir(), precomposed_utf8_closedir() and precompose_argv(). The first three are to wrap opendir(3), readdir(3), and closedir(3) functions. The argv[] conversion allows to use the TAB filename completion done by the shell on command line. It tolerates other tools which use readdir() to feed decomposed file names into git. When creating a new git repository with "git init" or "git clone", "core.precomposedunicode" will be set "false". The user needs to activate this feature manually. She typically sets core.precomposedunicode to "true" on HFS and VFAT, or file systems mounted via SAMBA. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
11 years ago
#endif
#ifdef MKDIR_WO_TRAILING_SLASH
#define mkdir(a,b) compat_mkdir_wo_trailing_slash((a),(b))
int compat_mkdir_wo_trailing_slash(const char*, mode_t);
#endif
#ifdef NO_STRUCT_ITIMERVAL
struct itimerval {
struct timeval it_interval;
struct timeval it_value;
};
#endif
#ifdef NO_SETITIMER
static inline int setitimer(int which, const struct itimerval *value, struct itimerval *newvalue) {
return 0; /* pretend success */
}
#endif
#ifndef NO_LIBGEN_H
#include <libgen.h>
#else
#define basename gitbasename
char *gitbasename(char *);
#define dirname gitdirname
char *gitdirname(char *);
#endif
#ifndef NO_ICONV
#include <iconv.h>
#endif
#ifndef NO_OPENSSL
#ifdef __APPLE__
git-compat-util: suppress unavoidable Apple-specific deprecation warnings With the release of Mac OS X 10.7 in July 2011, Apple deprecated all openssl.h functionality due to OpenSSL ABI (application binary interface) instability, resulting in an explosion of compilation warnings about deprecated SSL, SHA1, and X509 functions (among others). 61067954ce (cache.h: eliminate SHA-1 deprecation warnings on Mac OS X; 2013-05-19) and be4c828b76 (imap-send: eliminate HMAC deprecation warnings on Mac OS X; 2013-05-19) attempted to ameliorate the situation by taking advantage of drop-in replacement functionality provided by Apple's (ABI-stable) CommonCrypto facility, however CommonCrypto supplies only a subset of deprecated OpenSSL functionality, thus a host of warnings remain. Despite this shortcoming, it was hoped that Apple would ultimately provide CommonCrypto replacements for all deprecated OpenSSL functionality, and that the effort started by 61067954ce and be4c828b76 would be continued and eventually eliminate all deprecation warnings. However, now 3.5 years later, and with Mac OS X at 10.10, the hoped-for CommonCrypto replacements have not yet materialized, nor is there any indication that they will be forthcoming. These Apple-specific warnings are pure noise: they don't tell us anything useful and we have no control over them, nor is Apple likely to provide replacements any time soon. Such noise may obscure other legitimate warnings, therefore silence them. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
8 years ago
#define __AVAILABILITY_MACROS_USES_AVAILABILITY 0
#include <AvailabilityMacros.h>
#undef DEPRECATED_ATTRIBUTE
#define DEPRECATED_ATTRIBUTE
#undef __AVAILABILITY_MACROS_USES_AVAILABILITY
#endif
#include <openssl/ssl.h>
#include <openssl/err.h>
#endif
#ifdef HAVE_SYSINFO
# include <sys/sysinfo.h>
#endif
/* On most systems <netdb.h> would have given us this, but
* not on some systems (e.g. z/OS).
*/
#ifndef NI_MAXHOST
#define NI_MAXHOST 1025
#endif
#ifndef NI_MAXSERV
#define NI_MAXSERV 32
#endif
/* On most systems <limits.h> would have given us this, but
* not on some systems (e.g. GNU/Hurd).
*/
#ifndef PATH_MAX
#define PATH_MAX 4096
#endif
typedef uintmax_t timestamp_t;
#define PRItime PRIuMAX
#define parse_timestamp strtoumax
#define TIME_MAX UINTMAX_MAX
#define TIME_MIN 0
#ifndef PATH_SEP
#define PATH_SEP ':'
#endif
#ifdef HAVE_PATHS_H
#include <paths.h>
#endif
#ifndef _PATH_DEFPATH
#define _PATH_DEFPATH "/usr/local/bin:/usr/bin:/bin"
#endif
#ifndef platform_core_config
static inline int noop_core_config(const char *var, const char *value, void *cb)
{
return 0;
}
#define platform_core_config noop_core_config
#endif
checkout: fix bug that makes checkout follow symlinks in leading path Before checking out a file, we have to confirm that all of its leading components are real existing directories. And to reduce the number of lstat() calls in this process, we cache the last leading path known to contain only directories. However, when a path collision occurs (e.g. when checking out case-sensitive files in case-insensitive file systems), a cached path might have its file type changed on disk, leaving the cache on an invalid state. Normally, this doesn't bring any bad consequences as we usually check out files in index order, and therefore, by the time the cached path becomes outdated, we no longer need it anyway (because all files in that directory would have already been written). But, there are some users of the checkout machinery that do not always follow the index order. In particular: checkout-index writes the paths in the same order that they appear on the CLI (or stdin); and the delayed checkout feature -- used when a long-running filter process replies with "status=delayed" -- postpones the checkout of some entries, thus modifying the checkout order. When we have to check out an out-of-order entry and the lstat() cache is invalid (due to a previous path collision), checkout_entry() may end up using the invalid data and thrusting that the leading components are real directories when, in reality, they are not. In the best case scenario, where the directory was replaced by a regular file, the user will get an error: "fatal: unable to create file 'foo/bar': Not a directory". But if the directory was replaced by a symlink, checkout could actually end up following the symlink and writing the file at a wrong place, even outside the repository. Since delayed checkout is affected by this bug, it could be used by an attacker to write arbitrary files during the clone of a maliciously crafted repository. Some candidate solutions considered were to disable the lstat() cache during unordered checkouts or sort the entries before passing them to the checkout machinery. But both ideas include some performance penalty and they don't future-proof the code against new unordered use cases. Instead, we now manually reset the lstat cache whenever we successfully remove a directory. Note: We are not even checking whether the directory was the same as the lstat cache points to because we might face a scenario where the paths refer to the same location but differ due to case folding, precomposed UTF-8 issues, or the presence of `..` components in the path. Two regression tests, with case-collisions and utf8-collisions, are also added for both checkout-index and delayed checkout. Note: to make the previously mentioned clone attack unfeasible, it would be sufficient to reset the lstat cache only after the remove_subtree() call inside checkout_entry(). This is the place where we would remove a directory whose path collides with the path of another entry that we are currently trying to check out (possibly a symlink). However, in the interest of a thorough fix that does not leave Git open to similar-but-not-identical attack vectors, we decided to intercept all `rmdir()` calls in one fell swoop. This addresses CVE-2021-21300. Co-authored-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
2 years ago
int lstat_cache_aware_rmdir(const char *path);
#if !defined(__MINGW32__) && !defined(_MSC_VER)
#define rmdir lstat_cache_aware_rmdir
#endif
#ifndef has_dos_drive_prefix
static inline int git_has_dos_drive_prefix(const char *path)
{
return 0;
}
#define has_dos_drive_prefix git_has_dos_drive_prefix
#endif
#ifndef skip_dos_drive_prefix
static inline int git_skip_dos_drive_prefix(char **path)
{
return 0;
}
#define skip_dos_drive_prefix git_skip_dos_drive_prefix
#endif
#ifndef is_dir_sep
static inline int git_is_dir_sep(int c)
{
return c == '/';
}
#define is_dir_sep git_is_dir_sep
#endif
#ifndef offset_1st_component
static inline int git_offset_1st_component(const char *path)
{
return is_dir_sep(path[0]);
}
#define offset_1st_component git_offset_1st_component
#endif
mingw: refuse to access paths with trailing spaces or periods When creating a directory on Windows whose path ends in a space or a period (or chains thereof), the Win32 API "helpfully" trims those. For example, `mkdir("abc ");` will return success, but actually create a directory called `abc` instead. This stems back to the DOS days, when all file names had exactly 8 characters plus exactly 3 characters for the file extension, and the only way to have shorter names was by padding with spaces. Sadly, this "helpful" behavior is a bit inconsistent: after a successful `mkdir("abc ");`, a `mkdir("abc /def")` will actually _fail_ (because the directory `abc ` does not actually exist). Even if it would work, we now have a serious problem because a Git repository could contain directories `abc` and `abc `, and on Windows, they would be "merged" unintentionally. As these paths are illegal on Windows, anyway, let's disallow any accesses to such paths on that Operating System. For practical reasons, this behavior is still guarded by the config setting `core.protectNTFS`: it is possible (and at least two regression tests make use of it) to create commits without involving the worktree. In such a scenario, it is of course possible -- even on Windows -- to create such file names. Among other consequences, this patch disallows submodules' paths to end in spaces on Windows (which would formerly have confused Git enough to try to write into incorrect paths, anyway). While this patch does not fix a vulnerability on its own, it prevents an attack vector that was exploited in demonstrations of a number of recently-fixed security bugs. The regression test added to `t/t7417-submodule-path-url.sh` reflects that attack vector. Note that we have to adjust the test case "prevent git~1 squatting on Windows" in `t/t7415-submodule-names.sh` because of a very subtle issue. It tries to clone two submodules whose names differ only in a trailing period character, and as a consequence their git directories differ in the same way. Previously, when Git tried to clone the second submodule, it thought that the git directory already existed (because on Windows, when you create a directory with the name `b.` it actually creates `b`), but with this patch, the first submodule's clone will fail because of the illegal name of the git directory. Therefore, when cloning the second submodule, Git will take a different code path: a fresh clone (without an existing git directory). Both code paths fail to clone the second submodule, both because the the corresponding worktree directory exists and is not empty, but the error messages are worded differently. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
3 years ago
#ifndef is_valid_path
#define is_valid_path(path) 1
#endif
#ifndef find_last_dir_sep
static inline char *git_find_last_dir_sep(const char *path)
{
return strrchr(path, '/');
}
#define find_last_dir_sep git_find_last_dir_sep
#endif
#ifndef has_dir_sep
static inline int git_has_dir_sep(const char *path)
{
return !!strchr(path, '/');
}
#define has_dir_sep(path) git_has_dir_sep(path)
#endif
#ifndef query_user_email
#define query_user_email() NULL
#endif
#ifdef __TANDEM
#include <floss.h(floss_execl,floss_execlp,floss_execv,floss_execvp)>
#include <floss.h(floss_getpwuid)>
#ifndef NSIG
/*
* NonStop NSE and NSX do not provide NSIG. SIGGUARDIAN(99) is the highest
* known, by detective work using kill -l as a list is all signals
* instead of signal.h where it should be.
*/
# define NSIG 100
#endif
#endif
#if defined(__HP_cc) && (__HP_cc >= 61000)
#define NORETURN __attribute__((noreturn))
#define NORETURN_PTR
#elif defined(__GNUC__) && !defined(NO_NORETURN)
#define NORETURN __attribute__((__noreturn__))
#define NORETURN_PTR __attribute__((__noreturn__))
#elif defined(_MSC_VER)
#define NORETURN __declspec(noreturn)
#define NORETURN_PTR
#else
#define NORETURN
#define NORETURN_PTR
#ifndef __GNUC__
#ifndef __attribute__
#define __attribute__(x)
#endif
#endif
#endif
/* The sentinel attribute is valid from gcc version 4.0 */
#if defined(__GNUC__) && (__GNUC__ >= 4)
#define LAST_ARG_MUST_BE_NULL __attribute__((sentinel))
#else
#define LAST_ARG_MUST_BE_NULL
#endif
#define MAYBE_UNUSED __attribute__((__unused__))
#include "compat/bswap.h"
#include "wildmatch.h"
struct strbuf;
/* General helper functions */
NORETURN void usage(const char *err);
NORETURN void usagef(const char *err, ...) __attribute__((format (printf, 1, 2)));
NORETURN void die(const char *err, ...) __attribute__((format