Lazy man's auto-CRLF

It currently does NOT know about file attributes, so it does its
conversion purely based on content. Maybe that is more in the "git
philosophy" anyway, since content is king, but I think we should try to do
the file attributes to turn it off on demand.

Anyway, BY DEFAULT it is off regardless, because it requires a

	[core]
		AutoCRLF = true

in your config file to be enabled. We could make that the default for
Windows, of course, the same way we do some other things (filemode etc).

But you can actually enable it on UNIX, and it will cause:

 - "git update-index" will write blobs without CRLF
 - "git diff" will diff working tree files without CRLF
 - "git checkout" will write files to the working tree _with_ CRLF

and things work fine.

Funnily, it actually shows an odd file in git itself:

	git clone -n git test-crlf
	cd test-crlf
	git config core.autocrlf true
	git checkout
	git diff

shows a diff for "Documentation/docbook-xsl.css". Why? Because we have
actually checked in that file *with* CRLF! So when "core.autocrlf" is
true, we'll always generate a *different* hash for it in the index,
because the index hash will be for the content _without_ CRLF.

Is this complete? I dunno. It seems to work for me. It doesn't use the
filename at all right now, and that's probably a deficiency (we could
certainly make the "is_binary()" heuristics also take standard filename
heuristics into account).

I don't pass in the filename at all for the "index_fd()" case
(git-update-index), so that would need to be passed around, but this
actually works fine.

NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours
truly. I will not guarantee that they work at all reasonable. Caveat
emptor. But it _is_ simple, and it _is_ safe, since it's all off by
default.

The patch is pretty simple - the biggest part is the new "convert.c" file,
but even that is really just basic stuff that anybody can write in
"Teaching C 101" as a final project for their first class in programming.
Not to say that it's bug-free, of course - but at least we're not talking
about rocket surgery here.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-13 20:07:23 +01:00
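The two directions of conversion described in the commit message above can be sketched roughly as follows. This is a simplified illustration, not the code from convert.c: the names crlf_to_lf() and lf_to_crlf() are hypothetical, the buffers are plain malloc'ed memory, and the "is this file binary?" check that the real code performs before converting is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not git's API): drop the CR from every CRLF pair,
 * which is what the commit message describes for blobs written by
 * "git update-index". Returns a malloc'ed, NUL-terminated buffer and
 * stores its length in *out_len; returns NULL if allocation fails.
 */
char *crlf_to_lf(const char *src, size_t len, size_t *out_len)
{
	char *dst;
	size_t i, n = 0;

	assert(src != NULL && out_len != NULL);
	dst = malloc(len + 1);
	if (!dst)
		return NULL;
	for (i = 0; i < len; i++) {
		/* Skip a CR only when an LF follows; a lone CR is kept. */
		if (src[i] == '\r' && i + 1 < len && src[i + 1] == '\n')
			continue;
		dst[n++] = src[i];
	}
	dst[n] = '\0';
	*out_len = n;
	return dst;
}

/*
 * Hypothetical helper (not git's API): put a CR in front of every LF
 * that does not already have one, which is what the commit message
 * describes for files written by "git checkout".
 */
char *lf_to_crlf(const char *src, size_t len, size_t *out_len)
{
	char *dst;
	size_t i, n = 0;

	assert(src != NULL && out_len != NULL);
	dst = malloc(2 * len + 1);	/* worst case: every byte is a bare LF */
	if (!dst)
		return NULL;
	for (i = 0; i < len; i++) {
		if (src[i] == '\n' && (i == 0 || src[i - 1] != '\r'))
			dst[n++] = '\r';
		dst[n++] = src[i];
	}
	dst[n] = '\0';
	*out_len = n;
	return dst;
}
```

Under these assumptions, a working tree file "a\r\nb\n" would be stored as "a\nb\n" and checked out again as "a\r\nb\r\n". That asymmetry is the same effect the commit message observes for a file that was committed with CRLF in the first place.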
#include "cache.h"
#include "config.h"
#include "object-store.h"
#include "attr.h"
#include "run-command.h"
#include "quote.h"
#include "sigchain.h"
#include "pkt-line.h"
#include "sub-process.h"
#include "utf8.h"
#include "ll-merge.h"

/*
 * convert.c - convert a file when checking it out and checking it in.
 *
 * This should use the pathname to decide on whether it wants to do some
 * more interesting conversions (automatic gzip/unzip, general format
 * conversions etc etc), but by default it just does automatic CRLF<->LF
 * translation when the "text" attribute or "auto_crlf" option is set.
 */

/* Stat bits: When BIN is set, the txt bits are unset */
#define CONVERT_STAT_BITS_TXT_LF    0x1
#define CONVERT_STAT_BITS_TXT_CRLF  0x2
#define CONVERT_STAT_BITS_BIN       0x4

struct text_stat {
	/* NUL, CR, LF and CRLF counts */
	unsigned nul, lonecr, lonelf, crlf;

	/* These are just approximations! */
	unsigned printable, nonprintable;
};

static void gather_stats(const char *buf, unsigned long size, struct text_stat *stats)
{
	unsigned long i;

	memset(stats, 0, sizeof(*stats));

	for (i = 0; i < size; i++) {
		unsigned char c = buf[i];
		if (c == '\r') {
			if (i+1 < size && buf[i+1] == '\n') {
				stats->crlf++;
				i++;
			} else
				stats->lonecr++;
			continue;
		}
		if (c == '\n') {
			stats->lonelf++;
			continue;
		}
		if (c == 127)
			/* DEL */
			stats->nonprintable++;
		else if (c < 32) {
			switch (c) {
				/* BS, HT, ESC and FF */
			case '\b': case '\t': case '\033': case '\014':
				stats->printable++;
				break;
			case 0:
				stats->nul++;
				/* fall through */
			default:
				stats->nonprintable++;
			}
		}
		else
			stats->printable++;
	}

	/* If file ends with EOF then don't count this EOF as non-printable. */
	if (size >= 1 && buf[size-1] == '\032')
		stats->nonprintable--;
}
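
The line-ending part of the counting done in gather_stats() can be tried out in isolation with a sketch like the one below. The struct eol_counts type and the count_eols() function are hypothetical names introduced for this illustration; they only mirror the lonecr, lonelf and crlf fields and leave out the printable, nonprintable and NUL bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down counterpart of struct text_stat. */
struct eol_counts {
	unsigned lonecr, lonelf, crlf;
};

/*
 * Count CRLF pairs, bare CRs and bare LFs in buf[0..size).
 * A CRLF pair is consumed as one unit, so its LF is never
 * counted a second time as a bare LF.
 */
void count_eols(const char *buf, size_t size, struct eol_counts *out)
{
	size_t i;

	assert(out != NULL);
	memset(out, 0, sizeof(*out));
	for (i = 0; i < size; i++) {
		if (buf[i] == '\r') {
			if (i + 1 < size && buf[i + 1] == '\n') {
				out->crlf++;
				i++;	/* skip the LF that belongs to this pair */
			} else {
				out->lonecr++;
			}
		} else if (buf[i] == '\n') {
			out->lonelf++;
		}
	}
}
```

For the 7-byte input "a\r\nb\nc\r" this sketch counts one CRLF pair, one bare LF and one bare CR. The trailing CR is counted as bare because no byte follows it.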

/*
 * The same heuristics as diff.c::mmfile_is_binary()
 * We treat files with bare CR as binary
 */
static int convert_is_binary(const struct text_stat *stats)
{
	if (stats->lonecr)
		return 1;
	if (stats->nul)
		return 1;
	if ((stats->printable >> 7) < stats->nonprintable)
		return 1;
	return 0;
}
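The three tests above (lone CR, NUL byte, ratio of printable to non-printable bytes) can be tried out in isolation. The following is a minimal sketch, not the code from convert.c: `struct sketch_stat`, `sketch_gather()` and `sketch_is_binary()` are hypothetical names, and the byte classification in `sketch_gather()` is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified counterpart of the text_stat counters. */
struct sketch_stat {
	unsigned nul, lonecr, lonelf, crlf;
	unsigned printable, nonprintable;
};

/* Count line endings and (non-)printable bytes in buf[0..size). */
static void sketch_gather(const char *buf, size_t size, struct sketch_stat *st)
{
	size_t i;

	assert(st && (buf || !size));
	memset(st, 0, sizeof(*st));
	for (i = 0; i < size; i++) {
		unsigned char c = (unsigned char)buf[i];

		if (c == '\r') {
			if (i + 1 < size && buf[i + 1] == '\n') {
				st->crlf++;
				i++;	/* consume the LF of the pair */
			} else {
				st->lonecr++;
			}
		} else if (c == '\n') {
			st->lonelf++;
		} else if (c == 0) {
			st->nul++;
			st->nonprintable++;
		} else if (c == 127 ||
			   (c < 32 && c != '\t' && c != '\b' &&
			    c != '\033' && c != '\f')) {
			/* assumption: DEL and most control bytes are non-printable */
			st->nonprintable++;
		} else {
			st->printable++;
		}
	}
}

/* The same three tests as above, applied to the sketch counters. */
static int sketch_is_binary(const struct sketch_stat *st)
{
	if (st->lonecr)
		return 1;
	if (st->nul)
		return 1;
	if ((st->printable >> 7) < st->nonprintable)
		return 1;
	return 0;
}
```

The shift by 7 means the sketch tolerates roughly one non-printable byte per 128 printable ones; a short buffer with a single stray control byte is therefore already classified as binary.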
2016-01-16 07:50:02 +01:00
static unsigned int gather_convert_stats(const char *data, unsigned long size)
{
	struct text_stat stats;
2016-02-10 17:24:43 +01:00
	int ret = 0;
2016-01-16 07:50:02 +01:00
	if (!data || !size)
		return 0;
	gather_stats(data, size, &stats);
2019-01-24 14:12:41 +01:00
	if (convert_is_binary(&stats))
2016-02-10 17:24:43 +01:00
		ret |= CONVERT_STAT_BITS_BIN;
	if (stats.crlf)
		ret |= CONVERT_STAT_BITS_TXT_CRLF;
	if (stats.lonelf)
		ret |= CONVERT_STAT_BITS_TXT_LF;

	return ret;
2016-01-16 07:50:02 +01:00
}

static const char *gather_convert_stats_ascii(const char *data, unsigned long size)
{
	unsigned int convert_stats = gather_convert_stats(data, size);

	if (convert_stats & CONVERT_STAT_BITS_BIN)
		return "-text";
	switch (convert_stats) {
	case CONVERT_STAT_BITS_TXT_LF:
		return "lf";
	case CONVERT_STAT_BITS_TXT_CRLF:
		return "crlf";
	case CONVERT_STAT_BITS_TXT_LF | CONVERT_STAT_BITS_TXT_CRLF:
		return "mixed";
	default:
		return "none";
	}
}
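To see how the bit flags turn into the strings "lf", "crlf", "mixed", "none" and "-text", here is a small standalone sketch. The `SKETCH_BITS_*` values and both function names are assumptions chosen for this example, and the binary test is deliberately reduced to "contains a NUL or a lone CR" rather than the full heuristic.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed bit values, chosen only for this sketch. */
#define SKETCH_BITS_TXT_LF   0x1u
#define SKETCH_BITS_TXT_CRLF 0x2u
#define SKETCH_BITS_BIN      0x4u

/* Scan the buffer once and collect the flags. */
static unsigned sketch_convert_stats(const char *data, size_t size)
{
	unsigned ret = 0;
	size_t i;

	if (!data || !size)
		return 0;
	for (i = 0; i < size; i++) {
		if (data[i] == '\0') {
			ret |= SKETCH_BITS_BIN;
		} else if (data[i] == '\r') {
			if (i + 1 < size && data[i + 1] == '\n') {
				ret |= SKETCH_BITS_TXT_CRLF;
				i++;	/* skip the LF of the pair */
			} else {
				ret |= SKETCH_BITS_BIN;	/* lone CR */
			}
		} else if (data[i] == '\n') {
			ret |= SKETCH_BITS_TXT_LF;
		}
	}
	return ret;
}

/* Translate the flags into a short description, binary first. */
static const char *sketch_convert_stats_ascii(const char *data, size_t size)
{
	unsigned bits = sketch_convert_stats(data, size);

	if (bits & SKETCH_BITS_BIN)
		return "-text";
	/* only the two text bits can be left at this point */
	assert(!(bits & ~(SKETCH_BITS_TXT_LF | SKETCH_BITS_TXT_CRLF)));
	switch (bits) {
	case SKETCH_BITS_TXT_LF:
		return "lf";
	case SKETCH_BITS_TXT_CRLF:
		return "crlf";
	case SKETCH_BITS_TXT_LF | SKETCH_BITS_TXT_CRLF:
		return "mixed";
	default:
		return "none";
	}
}
```

Checking the binary bit before the switch matters: a binary buffer may also carry text bits, and those must not leak into the "lf"/"crlf"/"mixed" answer.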
2021-04-01 03:49:39 +02:00
const char *get_cached_convert_stats_ascii(struct index_state *istate,
2017-06-13 00:13:52 +02:00
					   const char *path)
2016-01-16 07:50:02 +01:00
{
	const char *ret;
	unsigned long sz;
2017-06-13 00:13:52 +02:00
	void *data = read_blob_data_from_index(istate, path, &sz);
2016-01-16 07:50:02 +01:00
	ret = gather_convert_stats_ascii(data, sz);
	free(data);
	return ret;
}

const char *get_wt_convert_stats_ascii(const char *path)
{
	const char *ret = "";
	struct strbuf sb = STRBUF_INIT;
	if (strbuf_read_file(&sb, path, 0) >= 0)
		ret = gather_convert_stats_ascii(sb.buf, sb.len);
	strbuf_release(&sb);
	return ret;
}

2016-02-05 17:13:25 +01:00
static int text_eol_is_crlf(void)
{
	if (auto_crlf == AUTO_CRLF_TRUE)
		return 1;
	else if (auto_crlf == AUTO_CRLF_INPUT)
		return 0;
	if (core_eol == EOL_CRLF)
		return 1;
	if (core_eol == EOL_UNSET && EOL_NATIVE == EOL_CRLF)
		return 1;
	return 0;
}
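The order of the checks above gives core.autocrlf priority over core.eol, with the platform default as the last resort. A minimal sketch of the same precedence, taking the settings as parameters instead of globals so it can be exercised directly; the enum and function names are hypothetical and exist only in this example.

```c
#include <assert.h>

/* Hypothetical stand-ins for the configuration values consulted above. */
enum sk_autocrlf { SK_AUTOCRLF_FALSE, SK_AUTOCRLF_TRUE, SK_AUTOCRLF_INPUT };
enum sk_eol { SK_EOL_UNSET, SK_EOL_LF, SK_EOL_CRLF };

/*
 * Return 1 when text files should be written with CRLF.
 * "native" plays the role of the platform default line ending.
 */
static int sk_text_eol_is_crlf(enum sk_autocrlf autocrlf,
			       enum sk_eol core_eol,
			       enum sk_eol native)
{
	assert(native != SK_EOL_UNSET);	/* a platform always has a default */
	if (autocrlf == SK_AUTOCRLF_TRUE)
		return 1;
	if (autocrlf == SK_AUTOCRLF_INPUT)
		return 0;
	if (core_eol == SK_EOL_CRLF)
		return 1;
	if (core_eol == SK_EOL_UNSET && native == SK_EOL_CRLF)
		return 1;
	return 0;
}
```

For example, `sk_text_eol_is_crlf(SK_AUTOCRLF_INPUT, SK_EOL_CRLF, SK_EOL_CRLF)` yields 0: autocrlf=input wins even though both core.eol and the platform ask for CRLF.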
2020-12-16 15:50:30 +01:00
static enum eol output_eol(enum convert_crlf_action crlf_action)
2010-07-02 21:20:47 +02:00
{
2011-05-09 22:12:57 +02:00
	switch (crlf_action) {
2010-06-04 21:29:08 +02:00
	case CRLF_BINARY:
		return EOL_UNSET;
convert.c: refactor crlf_action
Refactor the determination and usage of crlf_action.
Today, when no "crlf" attribute is set on a file, crlf_action is set to
CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as
before.
After searching for line ending attributes, save the value in
struct conv_attrs.crlf_action attr_action,
so that get_convert_attr_ascii() is able to report the attributes.
Replace the old CRLF_GUESS usage:
CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF
CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY
CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT
Save the action in conv_attrs.crlf_action (as before) and change
all callers.
Make it clearer what is what by defining:
- CRLF_UNDEFINED : No attributes set. Temporarily used, until core.autocrlf
and core.eol are evaluated and one of CRLF_BINARY,
CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected
- CRLF_BINARY : No processing of line endings.
- CRLF_TEXT : attribute "text" is set, line endings are processed.
- CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text.
- CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text.
- CRLF_AUTO : attribute "auto" is set.
- CRLF_AUTO_INPUT: core.autocrlf=input (no attributes)
- CRLF_AUTO_CRLF : core.autocrlf=true (no attributes)
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-10 17:24:41 +01:00
	case CRLF_TEXT_CRLF:
2010-06-04 21:29:08 +02:00
		return EOL_CRLF;
2016-02-10 17:24:41 +01:00
	case CRLF_TEXT_INPUT:
2010-06-04 21:29:08 +02:00
		return EOL_LF;
2016-02-10 17:24:41 +01:00
	case CRLF_UNDEFINED:
	case CRLF_AUTO_CRLF:
2016-06-28 10:01:13 +02:00
		return EOL_CRLF;
2016-02-10 17:24:41 +01:00
	case CRLF_AUTO_INPUT:
2016-06-28 10:01:13 +02:00
		return EOL_LF;
2010-06-04 21:29:08 +02:00
	case CRLF_TEXT:
	case CRLF_AUTO:
2016-02-10 17:24:41 +01:00
		/* fall through */
2016-02-05 17:13:25 +01:00
		return text_eol_is_crlf() ? EOL_CRLF : EOL_LF;
2010-06-04 21:29:08 +02:00
	}
2018-07-21 09:49:29 +02:00
	warning(_("illegal crlf_action %d"), (int)crlf_action);
2011-05-09 21:52:12 +02:00
	return core_eol;
2010-06-04 21:29:08 +02:00
}
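The "refactor crlf_action" commit message describes two steps: an undefined action is first replaced according to core.autocrlf, and the resulting action is then mapped to a working-tree line ending. A rough sketch of that two-step shape follows. All `EX_*` names and both functions are hypothetical, and unlike the function above this sketch treats an unresolved action as a caller error instead of mapping it to CRLF.

```c
#include <assert.h>

/* Hypothetical mirrors of the enums used above; names exist only here. */
enum ex_action {
	EX_UNDEFINED,	/* no attribute set yet */
	EX_BINARY,
	EX_TEXT,
	EX_TEXT_INPUT,
	EX_TEXT_CRLF,
	EX_AUTO,
	EX_AUTO_INPUT,
	EX_AUTO_CRLF
};
enum ex_autocrlf { EX_AUTOCRLF_FALSE, EX_AUTOCRLF_TRUE, EX_AUTOCRLF_INPUT };
enum ex_eol { EX_EOL_UNSET, EX_EOL_LF, EX_EOL_CRLF };

/* Step 1: replace EX_UNDEFINED according to the autocrlf setting. */
static enum ex_action ex_resolve_action(enum ex_action action,
					enum ex_autocrlf autocrlf)
{
	if (action != EX_UNDEFINED)
		return action;	/* an attribute already decided */
	switch (autocrlf) {
	case EX_AUTOCRLF_TRUE:
		return EX_AUTO_CRLF;
	case EX_AUTOCRLF_INPUT:
		return EX_AUTO_INPUT;
	default:
		return EX_BINARY;
	}
}

/*
 * Step 2: map a resolved action to the working-tree line ending.
 * text_default is what plain "text"/"auto" files should get.
 */
static enum ex_eol ex_output_eol(enum ex_action action,
				 enum ex_eol text_default)
{
	assert(action != EX_UNDEFINED);	/* sketch-only restriction */
	switch (action) {
	case EX_TEXT_CRLF:
	case EX_AUTO_CRLF:
		return EX_EOL_CRLF;
	case EX_TEXT_INPUT:
	case EX_AUTO_INPUT:
		return EX_EOL_LF;
	case EX_TEXT:
	case EX_AUTO:
		return text_default;
	default:
		return EX_EOL_UNSET;	/* binary: no conversion */
	}
}
```

Keeping the two steps separate means the attribute lookup and the configuration lookup can each be tested without the other.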
2020-09-30 14:27:53 +02:00
static void check_global_conv_flags_eol(const char *path,
2016-08-13 23:29:27 +02:00
		     struct text_stat *old_stats, struct text_stat *new_stats,
2018-01-13 23:49:31 +01:00
		     int conv_flags)
safecrlf: Add mechanism to warn about irreversible crlf conversions
CRLF conversion bears a slight chance of corrupting data.
autocrlf=true will convert CRLF to LF during commit and LF to
CRLF during checkout. A file that contains a mixture of LF and
CRLF before the commit cannot be recreated by git. For text
files this is the right thing to do: it corrects line endings
such that we have only LF line endings in the repository.
But for binary files that are accidentally classified as text the
conversion can corrupt data.
If you recognize such corruption early you can easily fix it by
setting the conversion type explicitly in .gitattributes. Right
after committing you still have the original file in your work
tree and this file is not yet corrupted. You can explicitly tell
git that this file is binary and git will handle the file
appropriately.
Unfortunately, the desired effect of cleaning up text files with
mixed line endings and the undesired effect of corrupting binary
files cannot be distinguished. In both cases CRLFs are removed
in an irreversible way. For text files this is the right thing
to do because CRLFs are line endings, while for binary files
converting CRLFs corrupts data.
This patch adds a mechanism that can either warn the user about
an irreversible conversion or can even refuse to convert. The
mechanism is controlled by the variable core.safecrlf, with the
following values:
- false: disable safecrlf mechanism
- warn: warn about irreversible conversions
- true: refuse irreversible conversions
The default is to warn. Users are only affected by this default
if core.autocrlf is set. But the current default of git is to
leave core.autocrlf unset, so users will not see warnings unless
they deliberately chose to activate the autocrlf mechanism.
The safecrlf mechanism's details depend on the git command. The
general principles when safecrlf is active (not false) are:
- we warn/error out if files in the work tree can be modified in an
irreversible way without giving the user a chance to back up the
original file.
- for read-only operations that do not modify files in the work tree
we do not print annoying warnings.
There are exceptions. Even though...
- "git add" itself does not touch the files in the work tree, the
next checkout would, so the safety triggers;
- "git apply" to update a text file with a patch does touch the files
in the work tree, but the operation is about text files and CRLF
conversion is about fixing the line ending inconsistencies, so the
safety does not trigger;
- "git diff" itself does not touch the files in the work tree, it is
often run to inspect the changes you intend to next "git add". To
catch potential problems early, safety triggers.
The concept of a safety check was originally proposed in a similar
way by Linus Torvalds. Thanks to Dimitry Potapov for insisting
on getting the naked LF/autocrlf=true case right.
Signed-off-by: Steffen Prohaska <prohaska@zib.de>
2008-02-06 12:25:58 +01:00
{
2016-08-13 23:29:27 +02:00
	if (old_stats->crlf && !new_stats->crlf ) {
2008-02-06 12:25:58 +01:00
		/*
2016-08-13 23:29:27 +02:00
		 * CRLFs would not be restored by checkout
2008-02-06 12:25:58 +01:00
		 */
2018-01-13 23:49:31 +01:00
		if (conv_flags & CONV_EOL_RNDTRP_DIE)
2018-07-21 09:49:19 +02:00
			die(_("CRLF would be replaced by LF in %s"), path);
2018-01-13 23:49:31 +01:00
		else if (conv_flags & CONV_EOL_RNDTRP_WARN)
2016-10-17 15:15:27 +02:00
			warning(_("CRLF will be replaced by LF in %s.\n"
				  "The file will have its original line"
2018-07-21 09:49:29 +02:00
				  " endings in your working directory"), path);
2016-08-13 23:29:27 +02:00
	} else if (old_stats->lonelf && !new_stats->lonelf ) {
2008-02-06 12:25:58 +01:00
		/*
2016-08-13 23:29:27 +02:00
		 * CRLFs would be added by checkout
2008-02-06 12:25:58 +01:00
		 */
2018-01-13 23:49:31 +01:00
		if (conv_flags & CONV_EOL_RNDTRP_DIE)
			die(_("LF would be replaced by CRLF in %s"), path);
		else if (conv_flags & CONV_EOL_RNDTRP_WARN)
2016-10-17 15:15:27 +02:00
			warning(_("LF will be replaced by CRLF in %s.\n"
				  "The file will have its original line"
2018-07-21 09:49:29 +02:00
				  " endings in your working directory"), path);
2008-02-06 12:25:58 +01:00
	}
}
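The round-trip test above compares line-ending counts before and after a simulated "commit, then checkout". The sketch below shows one way such a prediction could be made from counts alone. It is an illustration under assumptions, not the code path in convert.c: the names are hypothetical, the commit step is assumed to turn every CRLF into LF, and the checkout step is assumed to turn every LF into CRLF when `checkout_crlf` is non-zero.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_roundtrip {
	SK_RT_OK,		/* file comes back unchanged */
	SK_RT_CRLF_TO_LF,	/* CRLF would be replaced by LF */
	SK_RT_LF_TO_CRLF	/* LF would be replaced by CRLF */
};

struct sketch_eol_stat {
	unsigned crlf, lonelf;
};

/* Count CRLF pairs and lone LFs in buf[0..size). */
static void sketch_count_eol(const char *buf, size_t size,
			     struct sketch_eol_stat *st)
{
	size_t i;

	assert(st && (buf || !size));
	st->crlf = st->lonelf = 0;
	for (i = 0; i < size; i++) {
		if (buf[i] == '\r' && i + 1 < size && buf[i + 1] == '\n') {
			st->crlf++;
			i++;	/* skip the LF of the pair */
		} else if (buf[i] == '\n') {
			st->lonelf++;
		}
	}
}

/* Predict whether commit followed by checkout changes the line endings. */
static enum sketch_roundtrip sketch_check_roundtrip(const char *buf,
						    size_t size,
						    int checkout_crlf)
{
	struct sketch_eol_stat old_st, new_st;

	sketch_count_eol(buf, size, &old_st);
	new_st = old_st;
	/* simulated commit: every CRLF becomes LF */
	new_st.lonelf += new_st.crlf;
	new_st.crlf = 0;
	/* simulated checkout: every LF becomes CRLF if requested */
	if (checkout_crlf) {
		new_st.crlf = new_st.lonelf;
		new_st.lonelf = 0;
	}
	if (old_st.crlf && !new_st.crlf)
		return SK_RT_CRLF_TO_LF;
	if (old_st.lonelf && !new_st.lonelf)
		return SK_RT_LF_TO_CRLF;
	return SK_RT_OK;
}
```

A file with mixed endings is reported as `SK_RT_LF_TO_CRLF` when `checkout_crlf` is set, which matches the case the safecrlf commit message calls irreversible: the lone LFs cannot be recreated after normalization.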
2021-04-01 03:49:39 +02:00
static int has_crlf_in_index(struct index_state *istate, const char *path)
autocrlf: Make it work also for un-normalized repositories
Previously, autocrlf would only work well for normalized
repositories. Any text files that contained CRLF in the repository
would cause problems, and would be modified when handled with
core.autocrlf set.
Change autocrlf to not do any conversions to files that in the
repository already contain a CR. git with autocrlf set will never
create such a file, or change a LF only file to contain CRs, so the
(new) assumption is that if a file contains a CR, it is intentional,
and autocrlf should not change that.
The following sequence should now always be a NOP even with autocrlf
set (assuming a clean working directory):
git checkout <something>
touch *
git add -A . (will add nothing)
git commit (nothing to commit)
Previously this would break for any text file containing a CR.
Some of you may have been following Eyvind's excellent thread about
trying to make end-of-line translation in git a bit smoother.
I decided to attack the problem from a different angle: Is it possible
to make autocrlf behave non-destructively for all the previous problem cases?
Stealing the problem from Eyvind's initial mail (paraphrased and
summarized a bit):
1. Setting autocrlf globally is a pain since autocrlf does not work well
with CRLF in the repo
2. Setting it in individual repos is hard since you do it "too late"
(the clone will get it wrong)
3. If someone checks in a file with CRLF later, you get into problems again
4. If a repository once has contained CRLF, you can't tell autocrlf
at which commit everything is sane again
5. autocrlf does needless work if you know that all your users want
the same EOL style.
I believe that this patch makes autocrlf a safe (and good) default
setting for Windows, and this solves problems 1-4 (it solves 2 by being
set by default, which is early enough for clone).
I implemented it by looking for CR characters in the index, and
aborting any conversion attempt if one is found.
Signed-off-by: Finn Arne Gangstad <finag@pvv.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-12 00:37:57 +02:00
{
	unsigned long sz;
	void *data;
2017-11-26 13:20:52 +01:00
	const char *crp;
	int has_crlf = 0;
2017-06-13 00:13:53 +02:00
	data = read_blob_data_from_index(istate, path, &sz);
2013-04-13 15:28:32 +02:00
	if (!data)
2010-05-12 00:37:57 +02:00
		return 0;
2017-11-26 13:20:52 +01:00

	crp = memchr(data, '\r', sz);
	if (crp) {
		unsigned int ret_stats;
		ret_stats = gather_convert_stats(data, sz);
		if (!(ret_stats & CONVERT_STAT_BITS_BIN) &&
		    (ret_stats & CONVERT_STAT_BITS_TXT_CRLF))
			has_crlf = 1;
	}
2010-05-12 00:37:57 +02:00
	free(data);
2017-11-26 13:20:52 +01:00
	return has_crlf;
2010-05-12 00:37:57 +02:00
}
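
The check above can be tried outside of git with a small stand-in. The sketch below is only an illustration: `buffer_has_text_crlf` is a hypothetical name, it takes a buffer that is already in memory instead of reading a blob from the index, and it replaces `gather_convert_stats()` with a crude "any NUL byte means binary" test, which is an assumption and not git's heuristic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for has_crlf_in_index(): returns 1 if the
 * buffer looks like text and contains at least one CRLF pair.
 */
static int buffer_has_text_crlf(const char *data, size_t sz)
{
	size_t i;
	int crlf = 0;

	assert(data || !sz);
	/* Fast path, as in the original: no CR at all means no CRLF. */
	if (!sz || !memchr(data, '\r', sz))
		return 0;
	for (i = 0; i < sz; i++) {
		if (data[i] == '\0')
			return 0;	/* crude binary test (assumption) */
		if (data[i] == '\r' && i + 1 < sz && data[i + 1] == '\n')
			crlf = 1;
	}
	return crlf;
}
```

A lone CR does not count here, so `"a\rb"` is reported as having no CRLF, which mirrors the `CONVERT_STAT_BITS_TXT_CRLF` test in the function above.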
2019-01-24 14:12:41 +01:00
static int will_convert_lf_to_crlf(struct text_stat *stats,
2020-12-16 15:50:30 +01:00
				   enum convert_crlf_action crlf_action)
2016-08-13 23:29:27 +02:00
{
	if (output_eol(crlf_action) != EOL_CRLF)
		return 0;
	/* No "naked" LF? Nothing to convert, regardless. */
	if (!stats->lonelf)
		return 0;

	if (crlf_action == CRLF_AUTO || crlf_action == CRLF_AUTO_INPUT || crlf_action == CRLF_AUTO_CRLF) {
		/* If we have any CR or CRLF line endings, we do not touch it */
		/* This is the new safer autocrlf-handling */
		if (stats->lonecr || stats->crlf)
			return 0;

2019-01-24 14:12:41 +01:00
		if (convert_is_binary(stats))
2016-08-13 23:29:27 +02:00
			return 0;
	}
	return 1;

}
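
To see the "auto" branch of this decision in isolation, here is a minimal sketch. `struct eol_counts`, `count_eols` and `would_add_cr` are hypothetical names; the struct is a cut-down stand-in for git's `struct text_stat`, and the binary test (any NUL byte) is an assumption rather than what `convert_is_binary()` does.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for git's struct text_stat (assumption). */
struct eol_counts {
	unsigned lonecr, lonelf, crlf, nul;
};

static void count_eols(const char *buf, size_t len, struct eol_counts *c)
{
	size_t i;

	assert(buf || !len);
	c->lonecr = c->lonelf = c->crlf = c->nul = 0;
	for (i = 0; i < len; i++) {
		if (buf[i] == '\r') {
			if (i + 1 < len && buf[i + 1] == '\n') {
				c->crlf++;
				i++;	/* skip the LF of this CRLF pair */
			} else {
				c->lonecr++;
			}
		} else if (buf[i] == '\n') {
			c->lonelf++;
		} else if (buf[i] == '\0') {
			c->nul++;
		}
	}
}

/* "auto" mode only: returns 1 if naked LFs would be expanded to CRLF. */
static int would_add_cr(const char *buf, size_t len)
{
	struct eol_counts c;

	count_eols(buf, len, &c);
	if (!c.lonelf)
		return 0;	/* nothing to convert */
	if (c.lonecr || c.crlf)
		return 0;	/* content that already has CRs is left alone */
	if (c.nul)
		return 0;	/* crude binary check (assumption) */
	return 1;
}
```

The order of the tests follows the function above: the cheap "no naked LF" exit comes first, then the safety rule for content that already carries CRs, then the binary check.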
2018-04-15 20:16:08 +02:00
static int validate_encoding(const char *path, const char *enc,
		      const char *data, size_t len, int die_on_error)
{
2019-11-08 21:27:34 +01:00
	const char *stripped;

2018-04-15 20:16:08 +02:00
	/* We only check for UTF here as UTF?? can be an alias for UTF-?? */
2019-11-08 21:27:34 +01:00
	if (skip_iprefix(enc, "UTF", &stripped)) {
		skip_prefix(stripped, "-", &stripped);

2018-04-15 20:16:08 +02:00
		/*
		 * Check for detectable errors in UTF encodings
		 */
		if (has_prohibited_utf_bom(enc, data, len)) {
			const char *error_msg = _(
				"BOM is prohibited in '%s' if encoded as %s");
			/*
			 * This advice is shown for UTF-??BE and UTF-??LE encodings.
			 * We cut off the last two characters of the encoding name
			 * to generate the encoding name suitable for BOMs.
			 */
			const char *advise_msg = _(
				"The file '%s' contains a byte order "
2019-11-08 21:27:34 +01:00
				"mark (BOM). Please use UTF-%.*s as "
2018-04-15 20:16:08 +02:00
				"working-tree-encoding.");
2019-11-08 21:27:34 +01:00
			int stripped_len = strlen(stripped) - strlen("BE");
			advise(advise_msg, path, stripped_len, stripped);
2018-04-15 20:16:08 +02:00
			if (die_on_error)
				die(error_msg, path, enc);
			else {
				return error(error_msg, path, enc);
			}

		} else if (is_missing_required_utf_bom(enc, data, len)) {
			const char *error_msg = _(
				"BOM is required in '%s' if encoded as %s");
			const char *advise_msg = _(
				"The file '%s' is missing a byte order "
				"mark (BOM). Please use UTF-%sBE or UTF-%sLE "
				"(depending on the byte order) as "
				"working-tree-encoding.");
			advise(advise_msg, path, stripped, stripped);
			if (die_on_error)
				die(error_msg, path, enc);
			else {
				return error(error_msg, path, enc);
			}
		}

	}
	return 0;
}
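
The two BOM rules that the error messages above describe (a BOM is prohibited when the byte order is spelled out, and required when it is not) can be sketched for UTF-16 only. Everything here is illustrative: `has_utf16_bom`, `check_utf16_bom` and `enum bom_verdict` are hypothetical names, the comparison is case-sensitive, and only the three spellings listed in the code are handled, whereas the real helpers also cover UTF-32 and spellings without the dash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if data starts with a UTF-16 BOM in either byte order. */
static int has_utf16_bom(const char *data, size_t len)
{
	assert(data || !len);
	return len >= 2 &&
	       (!memcmp(data, "\xFE\xFF", 2) || !memcmp(data, "\xFF\xFE", 2));
}

enum bom_verdict { BOM_OK, BOM_PROHIBITED, BOM_REQUIRED };

/* Handles only the three spellings below; anything else is accepted. */
static enum bom_verdict check_utf16_bom(const char *enc,
					const char *data, size_t len)
{
	int bom = has_utf16_bom(data, len);

	if (!strcmp(enc, "UTF-16BE") || !strcmp(enc, "UTF-16LE"))
		return bom ? BOM_PROHIBITED : BOM_OK;
	if (!strcmp(enc, "UTF-16"))
		return bom ? BOM_OK : BOM_REQUIRED;
	return BOM_OK;
}
```

A caller would map `BOM_PROHIBITED` and `BOM_REQUIRED` to the two error paths shown in `validate_encoding()`.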
2018-04-15 20:16:09 +02:00
static void trace_encoding(const char *context, const char *path,
			   const char *encoding, const char *buf, size_t len)
{
	static struct trace_key coe = TRACE_KEY_INIT(WORKING_TREE_ENCODING);
	struct strbuf trace = STRBUF_INIT;
	int i;

	strbuf_addf(&trace, "%s (%s, considered %s):\n", context, path, encoding);
	for (i = 0; i < len && buf; ++i) {
		strbuf_addf(
2018-07-09 21:25:34 +02:00
			&trace, "| \033[2m%2i:\033[0m %2x \033[2m%c\033[0m%c",
2018-04-15 20:16:09 +02:00
			i,
			(unsigned char) buf[i],
			(buf[i] > 32 && buf[i] < 127 ? buf[i] : ' '),
			((i+1) % 8 && (i+1) < len ? ' ' : '\n')
		);
	}
	strbuf_addchars(&trace, '\n', 1);

	trace_strbuf(&coe, &trace);
	strbuf_release(&trace);
}
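
The row layout produced by the format string above is easier to see without the ANSI dimming escapes. The following sketch writes the same cells (index, hex value, printable character, eight per row) into a caller-supplied buffer; `dump_bytes` is a hypothetical helper and uses `snprintf` instead of git's strbuf API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Formats buf as rows of eight "| idx: hex char" cells. Returns the
 * number of characters written (excluding the NUL), or -1 if out is
 * too small.
 */
static int dump_bytes(const char *buf, size_t len, char *out, size_t outsz)
{
	size_t i, used = 0;

	assert(out && outsz > 0);
	out[0] = '\0';
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)buf[i];
		int n = snprintf(out + used, outsz - used, "| %2d: %2x %c%c",
				 (int)i, (unsigned)c,
				 (c > 32 && c < 127) ? c : ' ',
				 ((i + 1) % 8 && i + 1 < len) ? ' ' : '\n');

		if (n < 0 || (size_t)n >= outsz - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```

For the two bytes `"A\n"` this yields one row with two cells; the newline byte is shown as a blank because it is not printable.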
2018-04-15 20:16:10 +02:00
static int check_roundtrip(const char *enc_name)
{
	/*
	 * check_roundtrip_encoding contains a string of comma and/or
	 * space separated encodings (eg. "UTF-16, ASCII, CP1125").
	 * Search for the given encoding in that string.
	 */
	const char *found = strcasestr(check_roundtrip_encoding, enc_name);
	const char *next;
	int len;
	if (!found)
		return 0;
	next = found + strlen(enc_name);
	len = strlen(check_roundtrip_encoding);
	return (found && (
			/*
			 * check that the found encoding is at the
			 * beginning of check_roundtrip_encoding or
			 * that it is prefixed with a space or comma
			 */
			found == check_roundtrip_encoding || (
				(isspace(found[-1]) || found[-1] == ',')
			)
		) && (
			/*
			 * check that the found encoding is at the
			 * end of check_roundtrip_encoding or
			 * that it is suffixed with a space or comma
			 */
			next == check_roundtrip_encoding + len || (
				next < check_roundtrip_encoding + len &&
				(isspace(next[0]) || next[0] == ',')
			)
		));
}
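
The lookup above inspects only the first substring match that `strcasestr()` returns. As far as the code shown goes, a list such as "UTF-16LE UTF-16" would therefore not match "UTF-16", because the first hit is followed by "LE". A token-by-token walk is one alternative; the sketch below is a different technique from the one in the source, and `is_sep`, `same_token` and `list_contains` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int is_sep(char c)
{
	return c == ',' || isspace((unsigned char)c);
}

/* Case-insensitive comparison of exactly n bytes. */
static int same_token(const char *a, const char *b, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
			return 0;
	return 1;
}

/* Returns 1 if name is one whole entry of the comma/space separated list. */
static int list_contains(const char *list, const char *name)
{
	size_t want;

	assert(list && name);
	want = strlen(name);
	while (*list) {
		const char *end;

		while (is_sep(*list))
			list++;
		end = list;
		while (*end && !is_sep(*end))
			end++;
		if (want && (size_t)(end - list) == want &&
		    same_token(list, name, want))
			return 1;
		list = end;
	}
	return 0;
}
```

Comparing whole tokens makes the prefix and suffix boundary checks unnecessary, at the cost of a slightly longer loop.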
2018-04-15 20:16:07 +02:00
static const char *default_encoding = "UTF-8";

static int encode_to_git(const char *path, const char *src, size_t src_len,
			 struct strbuf *buf, const char *enc, int conv_flags)
{
	char *dst;
2018-07-24 12:50:33 +02:00
	size_t dst_len;
2018-04-15 20:16:07 +02:00
	int die_on_error = conv_flags & CONV_WRITE_OBJECT;

	/*
	 * No encoding is specified or there is nothing to encode.
	 * Tell the caller that the content was not modified.
	 */
	if (!enc || (src && !src_len))
		return 0;

	/*
	 * Looks like we got called from "would_convert_to_git()".
	 * This means Git wants to know if it would encode (= modify!)
	 * the content. Let's answer with "yes", since an encoding was
	 * specified.
	 */
	if (!buf && !src)
		return 1;

2018-04-15 20:16:08 +02:00
	if (validate_encoding(path, enc, src, src_len, die_on_error))
		return 0;

2018-04-15 20:16:09 +02:00
	trace_encoding("source", path, enc, src, src_len);
2018-04-15 20:16:07 +02:00
	dst = reencode_string_len(src, src_len, default_encoding, enc,
				  &dst_len);
	if (!dst) {
		/*
		 * We could add the blob "as-is" to Git. However, on checkout
2019-11-05 18:07:23 +01:00
		 * we would try to re-encode to the original encoding. This
2018-04-15 20:16:07 +02:00
		 * would fail and we would leave the user with a messed-up
		 * working tree. Let's try to avoid this by screaming loud.
		 */
		const char* msg = _("failed to encode '%s' from %s to %s");
		if (die_on_error)
			die(msg, path, enc, default_encoding);
		else {
			error(msg, path, enc, default_encoding);
			return 0;
		}
	}
2018-04-15 20:16:09 +02:00
	trace_encoding("destination", path, default_encoding, dst, dst_len);
2018-04-15 20:16:07 +02:00
2018-04-15 20:16:10 +02:00
	/*
	 * UTF supports lossless conversion round tripping [1] and conversions
	 * between UTF and other encodings are mostly round trip safe as
	 * Unicode aims to be a superset of all other character encodings.
	 * However, certain encodings (e.g. SHIFT-JIS) are known to have round
	 * trip issues [2]. Check the round trip conversion for all encodings
	 * listed in core.checkRoundtripEncoding.
	 *
	 * The round trip check is only performed if content is written to Git.
	 * This ensures that no information is lost during conversion to/from
	 * the internal UTF-8 representation.
	 *
	 * Please note, the code below is not tested because I was not able to
	 * generate a faulty round trip without an iconv error. Iconv errors
	 * are already caught above.
	 *
	 * [1] http://unicode.org/faq/utf_bom.html#gen2
	 * [2] https://support.microsoft.com/en-us/help/170559/prb-conversion-problem-between-shift-jis-and-unicode
	 */
	if (die_on_error && check_roundtrip(enc)) {
		char *re_src;
2018-07-24 12:50:33 +02:00
		size_t re_src_len;
2018-04-15 20:16:10 +02:00

		re_src = reencode_string_len(dst, dst_len,
					     enc, default_encoding,
					     &re_src_len);

		trace_printf("Checking roundtrip encoding for %s...\n", enc);
		trace_encoding("reencoded source", path, enc,
			       re_src, re_src_len);

		if (!re_src || src_len != re_src_len ||
		    memcmp(src, re_src, src_len)) {
			const char* msg = _("encoding '%s' from %s to %s and "
					    "back is not the same");
			die(msg, path, enc, default_encoding);
		}

		free(re_src);
	}

2018-04-15 20:16:07 +02:00
	strbuf_attach(buf, dst, dst_len, dst_len + 1);
	return 1;
}
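
The round trip check in `encode_to_git()` boils down to "convert to UTF-8, convert back, compare bytes". A minimal sketch of that idea using POSIX `iconv` directly is given below. It is not git's `reencode_string_len()`: `reencode` and `roundtrips` are hypothetical helpers, the output buffer size is a generous guess, and stateful encodings that need a final flush call are not handled.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Convert src from encoding `from` to encoding `to`. Returns a malloc'd
 * buffer and stores its length in *out_len, or NULL on any iconv error.
 */
static char *reencode(const char *src, size_t src_len,
		      const char *to, const char *from, size_t *out_len)
{
	iconv_t cd = iconv_open(to, from);
	size_t cap = src_len * 4 + 16;	/* generous bound for this sketch */
	char *out, *outp, *inp = (char *)src;
	size_t in_left = src_len, out_left = cap;

	assert(src || !src_len);
	if (cd == (iconv_t)-1)
		return NULL;
	out = outp = malloc(cap);
	if (!out || iconv(cd, &inp, &in_left, &outp, &out_left) == (size_t)-1) {
		free(out);
		iconv_close(cd);
		return NULL;
	}
	iconv_close(cd);
	*out_len = cap - out_left;
	return out;
}

/* Returns 1 if src survives enc -> UTF-8 -> enc unchanged. */
static int roundtrips(const char *src, size_t len, const char *enc)
{
	size_t u8_len, back_len;
	char *u8 = reencode(src, len, "UTF-8", enc, &u8_len);
	char *back = u8 ? reencode(u8, u8_len, enc, "UTF-8", &back_len) : NULL;
	int ok = back && back_len == len && !memcmp(src, back, len);

	free(u8);
	free(back);
	return ok;
}
```

Calling `roundtrips("h\0i\0", 4, "UTF-16LE")` exercises the happy path. An explicit byte order is used in that call because some iconv implementations add a BOM when the target is plain "UTF-16", which would change the length of the result.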

static int encode_to_worktree(const char *path, const char *src, size_t src_len,
			      struct strbuf *buf, const char *enc)
{
	char *dst;
2018-07-24 12:50:33 +02:00
	size_t dst_len;
2018-04-15 20:16:07 +02:00

	/*
	 * No encoding is specified or there is nothing to encode.
	 * Tell the caller that the content was not modified.
	 */
	if (!enc || (src && !src_len))
		return 0;

	dst = reencode_string_len(src, src_len, enc, default_encoding,
				  &dst_len);
	if (!dst) {
2018-07-21 09:49:29 +02:00
		error(_("failed to encode '%s' from %s to %s"),
2018-07-21 09:49:19 +02:00
		      path, default_encoding, enc);
2018-04-15 20:16:07 +02:00
		return 0;
	}

	strbuf_attach(buf, dst, dst_len, dst_len + 1);
	return 1;
}
2021-04-01 03:49:39 +02:00
static int crlf_to_git(struct index_state *istate,
2017-06-13 00:13:53 +02:00
		       const char *path, const char *src, size_t len,
2011-05-09 22:12:57 +02:00
		       struct strbuf *buf,
2020-12-16 15:50:30 +01:00
		       enum convert_crlf_action crlf_action, int conv_flags)
Lazy man's auto-CRLF
It currently does NOT know about file attributes, so it does its
conversion purely based on content. Maybe that is more in the "git
philosophy" anyway, since content is king, but I think we should try to do
the file attributes to turn it off on demand.
Anyway, BY DEFAULT it is off regardless, because it requires a
[core]
AutoCRLF = true
in your config file to be enabled. We could make that the default for
Windows, of course, the same way we do some other things (filemode etc).
But you can actually enable it on UNIX, and it will cause:
- "git update-index" will write blobs without CRLF
- "git diff" will diff working tree files without CRLF
- "git checkout" will write files to the working tree _with_ CRLF
and things work fine.
Funnily, it actually shows an odd file in git itself:
git clone -n git test-crlf
cd test-crlf
git config core.autocrlf true
git checkout
git diff
shows a diff for "Documentation/docbook-xsl.css". Why? Because we have
actually checked in that file *with* CRLF! So when "core.autocrlf" is
true, we'll always generate a *different* hash for it in the index,
because the index hash will be for the content _without_ CRLF.
Is this complete? I dunno. It seems to work for me. It doesn't use the
filename at all right now, and that's probably a deficiency (we could
certainly make the "is_binary()" heuristics also take standard filename
heuristics into account).
I don't pass in the filename at all for the "index_fd()" case
(git-update-index), so that would need to be passed around, but this
actually works fine.
NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours
truly. I will not guarantee that they work at all reasonable. Caveat
emptor. But it _is_ simple, and it _is_ safe, since it's all off by
default.
The patch is pretty simple - the biggest part is the new "convert.c" file,
but even that is really just basic stuff that anybody can write in
"Teaching C 101" as a final project for their first class in programming.
Not to say that it's bug-free, of course - but at least we're not talking
about rocket surgery here.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-13 20:07:23 +01:00
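
The two directions described in this commit message (blobs are written without CRLF, the working tree is written with CRLF) can be sketched on plain buffers. This is an illustration only: `crlf_to_lf` and `lf_to_crlf` are hypothetical names, the caller supplies the destination buffers, and the `is_binary()` heuristics mentioned in the message are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Drop the CR of every CRLF pair; returns the new length. dst needs len bytes. */
static size_t crlf_to_lf(const char *src, size_t len, char *dst)
{
	size_t i, n = 0;

	assert(src && dst);
	for (i = 0; i < len; i++) {
		if (src[i] == '\r' && i + 1 < len && src[i + 1] == '\n')
			continue;
		dst[n++] = src[i];
	}
	return n;
}

/* Put a CR before every LF not already preceded by one; dst needs 2 * len bytes. */
static size_t lf_to_crlf(const char *src, size_t len, char *dst)
{
	size_t i, n = 0;

	assert(src && dst);
	for (i = 0; i < len; i++) {
		if (src[i] == '\n' && (i == 0 || src[i - 1] != '\r'))
			dst[n++] = '\r';
		dst[n++] = src[i];
	}
	return n;
}
```

Applying `crlf_to_lf` and then `lf_to_crlf` to a file that uses CRLF throughout gives back the original bytes. A file that mixes the two styles does not survive that round trip, which is the problem the later "safer autocrlf" handling addresses.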
{
	struct text_stat stats;
Rewrite convert_to_{git,working_tree} to use strbuf's.
* Now, those functions take an "out" strbuf argument, where they store their
  result if any. In that case, it also returns 1, else it returns 0.
* Those functions support "in place" editing, in the sense that it's OK to
  call them this way:
    convert_to_git(path, sb->buf, sb->len, sb);
  When doable, conversions are done in place for real, else the strbuf
  content is just replaced with the new one, transparently for the caller.

If you want to create a new filter working this way, being the accumulation
of filter1, filter2, ... filtern, then your meta_filter would be:

    int meta_filter(..., const char *src, size_t len, struct strbuf *sb)
    {
        int ret = 0;
        ret |= filter1(...., src, len, sb);
        if (ret) {
            src = sb->buf;
            len = sb->len;
        }
        ret |= filter2(...., src, len, sb);
        if (ret) {
            src = sb->buf;
            len = sb->len;
        }
        ....
        return ret | filtern(..., src, len, sb);
    }

That's why subfilters the convert_to_* functions called were also rewritten
to work this way.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-16 15:51:04 +02:00
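
The chaining convention from this commit message can be tried with two toy filters. In the sketch below `struct outbuf` is a tiny fixed-capacity stand-in for git's `struct strbuf`, and `strip_cr`, `upcase` and this `meta_filter` are hypothetical. Each filter returns 1 only if it wrote a result, and it may be handed its own output buffer as input.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Tiny fixed-capacity stand-in for git's struct strbuf (assumption). */
struct outbuf {
	char buf[256];
	size_t len;
};

/* Filter 1: strip CR from CRLF pairs. Returns 1 if it changed anything. */
static int strip_cr(const char *src, size_t len, struct outbuf *out)
{
	char tmp[256];
	size_t i, n = 0;

	assert(len <= sizeof(tmp));
	for (i = 0; i < len; i++)
		if (!(src[i] == '\r' && i + 1 < len && src[i + 1] == '\n'))
			tmp[n++] = src[i];
	if (n == len)
		return 0;
	memcpy(out->buf, tmp, n);	/* src may alias out->buf */
	out->len = n;
	return 1;
}

/* Filter 2: upper-case ASCII letters. Returns 1 if it changed anything. */
static int upcase(const char *src, size_t len, struct outbuf *out)
{
	char tmp[256];
	size_t i;
	int changed = 0;

	assert(len <= sizeof(tmp));
	for (i = 0; i < len; i++) {
		tmp[i] = (char)toupper((unsigned char)src[i]);
		changed |= tmp[i] != src[i];
	}
	if (!changed)
		return 0;
	memcpy(out->buf, tmp, len);
	out->len = len;
	return 1;
}

/* Chain the filters: feed the output forward only when a filter produced one. */
static int meta_filter(const char *src, size_t len, struct outbuf *out)
{
	int ret = 0;

	ret |= strip_cr(src, len, out);
	if (ret) {
		src = out->buf;
		len = out->len;
	}
	return ret | upcase(src, len, out);
}
```

When no filter changes anything, `meta_filter` returns 0 and leaves `out` untouched, so the caller keeps using its original buffer, which is the contract the commit message describes.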
	char *dst;
2016-08-13 23:29:27 +02:00
	int convert_crlf_into_lf;
2011-05-09 22:12:57 +02:00
	if (crlf_action == CRLF_BINARY ||
2012-02-24 23:05:03 +01:00
	    (src && !len))
2007-09-16 15:51:04 +02:00
		return 0;
2012-02-24 23:05:03 +01:00
	/*
	 * If we are doing a dry-run and have no source buffer, there is
	 * nothing to analyze; we must assume we would convert.
	 */
	if (!buf && !src)
		return 1;

2007-09-16 15:51:04 +02:00
	gather_stats(src, len, &stats);
2016-08-13 23:29:27 +02:00
	/* Optimization: No CRLF? Nothing to convert, regardless. */
	convert_crlf_into_lf = !!stats.crlf;
convert.c: refactor crlf_action
Refactor the determination and usage of crlf_action.
Today, when no "crlf" attribute is set on a file, crlf_action is set to
CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as
before.
After searching for line ending attributes, save the value in
struct conv_attrs.crlf_action attr_action,
so that get_convert_attr_ascii() is able to report the attributes.
Replace the old CRLF_GUESS usage:
CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF
CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY
CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT
Save the action in conv_attrs.crlf_action (as before) and change
all callers.
Make it clearer what is what by defining:
- CRLF_UNDEFINED : No attributes set. Temporarily used, until core.autocrlf
and core.eol are evaluated and one of CRLF_BINARY,
CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected
- CRLF_BINARY : No processing of line endings.
- CRLF_TEXT : attribute "text" is set, line endings are processed.
- CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text.
- CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text.
- CRLF_AUTO : attribute "auto" is set.
- CRLF_AUTO_INPUT: core.autocrlf=input (no attributes)
- CRLF_AUTO_CRLF : core.autocrlf=true (no attributes)
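The replacement of the old CRLF_GUESS cases can be sketched as below. This is
an illustration under assumptions, not the code in convert.c: the enum
crlf_action values are the ones listed above, while enum autocrlf_mode, its
constants and resolve_crlf_action() are hypothetical names. The sketch covers
only the three-way core.autocrlf mapping and ignores core.eol.

```c
#include <assert.h>

/* Values as named in the list above; the numeric order is arbitrary. */
enum crlf_action {
	CRLF_UNDEFINED,
	CRLF_BINARY,
	CRLF_TEXT,
	CRLF_TEXT_INPUT,
	CRLF_TEXT_CRLF,
	CRLF_AUTO,
	CRLF_AUTO_INPUT,
	CRLF_AUTO_CRLF
};

/* Hypothetical representation of the core.autocrlf setting. */
enum autocrlf_mode {
	AUTOCRLF_FALSE,
	AUTOCRLF_TRUE,
	AUTOCRLF_INPUT
};

/*
 * Hypothetical helper: keep whatever the attributes decided, and fall
 * back to core.autocrlf only when no attribute was set at all.
 */
enum crlf_action resolve_crlf_action(enum crlf_action attr_action,
				     enum autocrlf_mode autocrlf)
{
	if (attr_action != CRLF_UNDEFINED)
		return attr_action;

	switch (autocrlf) {
	case AUTOCRLF_TRUE:
		return CRLF_AUTO_CRLF;
	case AUTOCRLF_INPUT:
		return CRLF_AUTO_INPUT;
	case AUTOCRLF_FALSE:
		return CRLF_BINARY;
	}
	assert(!"unknown autocrlf mode");
	return CRLF_BINARY;
}
```

Keeping the attribute-derived value separate from the resolved one is what
lets a reporting function show the attributes as the user wrote them.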
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-10 17:24:41 +01:00
|
|
|
if (crlf_action == CRLF_AUTO || crlf_action == CRLF_AUTO_INPUT || crlf_action == CRLF_AUTO_CRLF) {
|
2019-01-24 14:12:41 +01:00
|
|
|
if (convert_is_binary(&stats))
|
2007-09-16 15:51:04 +02:00
|
|
|
return 0;
|
2016-06-28 10:01:13 +02:00
|
|
|
/*
|
2016-11-30 18:02:32 +01:00
|
|
|
* If the file in the index has any CR in it, do not
|
|
|
|
* convert. This is the new safer autocrlf handling,
|
|
|
|
* unless we want to renormalize in a merge or
|
|
|
|
* cherry-pick.
|
2016-06-28 10:01:13 +02:00
|
|
|
*/
|
2018-01-13 23:49:31 +01:00
|
|
|
if ((!(conv_flags & CONV_EOL_RENORMALIZE)) &&
|
2017-11-26 13:20:52 +01:00
|
|
|
has_crlf_in_index(istate, path))
|
2016-08-13 23:29:27 +02:00
|
|
|
convert_crlf_into_lf = 0;
|
2007-04-15 22:35:45 +02:00
|
|
|
}
|
2018-01-13 23:49:31 +01:00
|
|
|
if (((conv_flags & CONV_EOL_RNDTRP_WARN) ||
|
|
|
|
((conv_flags & CONV_EOL_RNDTRP_DIE) && len))) {
|
|