Channel: What are the metrics that `file` uses to determine the type of a text-like file? - Unix & Linux Stack Exchange

What are the metrics that `file` uses to determine the type of a text-like file?


I have a bunch of LaTeX source files. They all share the same structure, use Unix-style line endings, are encoded as UTF-8, are roughly the same size (1-2 KB), and use spaces for indentation. They are included in a larger document, with each file handling one section, and every section has the same layout. So the files are structured identically and use mostly the same LaTeX commands, differing only in their text content; every file starts and ends with a LaTeX command and contains many more in between. The strange thing is this:

$ file *.tex
file1.tex: LaTeX document, Unicode text, UTF-8 text
file2.tex: CSV text

This is just a tiny excerpt. Whether a file is detected as CSV or as LaTeX appears totally random across files, with CSV detected slightly less often (maybe 40% CSV, 60% LaTeX), but for any given file the detected type is reproducible.
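
To put a number on that split, the type descriptions can be tallied. The following is only a rough sketch: `tally_types` is a made-up helper name, and it assumes that no file name contains a colon, since it strips everything up to the first colon on each line of `file` output.

```shell
# Hypothetical helper: count how often each type description appears
# in `file` output read from stdin (lines of the form "name: description").
tally_types() {
    # Drop the "name:" prefix and any padding after it, then count
    # identical descriptions, most frequent first.
    sed 's/^[^:]*:[[:space:]]*//' | sort | uniq -c | sort -rn
}

# Usage (run in the directory holding the sources):
#   file -- *.tex | tally_types
```

Each output line is a count followed by the description that `file` reported, so the CSV and LaTeX groups can be compared directly.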

I tried varying some formatting and content in CSV-detected files, but they stay detected as CSV.

What is going on here?

