- ls
One of my favorite usages is
ls -l -t
It lists files ordered by modified time, with detailed file information.
- cut
Cuts selected fields (columns) out of a table-like file (a .csv file, for example).
For example, suppose you have a csv file (dump.csv) with 15 columns, and the first few lines
look like this:
"Z.RS.STZ.VX",2780,0,0,0,0,0,0,0,0,0,0,0,0,0
"Z.RS.STZ.VX",2784,0,0,0,0,0,0,0,0,0,0,0,0,0
"Z.S.SUT.RSY",405,0,0,0,0,0,0,0,0,0,0,0,0,0
To cut columns 3 to 15 out of the file, simply type:
cat dump.csv | cut -d ',' -f3-15
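Here is a small self-contained sketch of the same idea, using a made-up two-line file (demo.csv is a hypothetical name, not the dump.csv above) so the result is easy to check. It also passes the file name to cut directly, which works just as well as piping through cat. Keep in mind that cut splits on every delimiter, so this only suits files whose quoted fields contain no commas.

```shell
# Build a tiny CSV in a temporary directory (demo.csv is a made-up file name).
tmpdir=$(mktemp -d)
printf '%s\n' '"a",1,2,3' '"b",4,5,6' > "$tmpdir/demo.csv"

# -d sets the delimiter, -f selects the fields: keep columns 2 through 4.
cut_result=$(cut -d ',' -f2-4 "$tmpdir/demo.csv")
echo "$cut_result"
```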
- paste
One of the most useful commands for file processing.
In my experience (data mining experiments), one frequent situation is
to paste the true labels next to the estimated posteriors, then use the new
file to calculate AUC.
paste -d '\t' labels posteriors > combination
A program for calculating AUC will be given later.
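To see what paste actually produces, here is a minimal sketch with three invented labels and three invented posteriors (the values are made up purely for illustration). Line N of the first file is joined to line N of the second, separated by a tab, which is the default delimiter.

```shell
# Two tiny input files with made-up values, one entry per line.
tmpdir=$(mktemp -d)
printf '%s\n' 1 0 1 > "$tmpdir/labels"
printf '%s\n' 0.9 0.2 0.7 > "$tmpdir/posteriors"

# Join the files line by line; tab is the default delimiter,
# so -d '\t' could be given explicitly but is not required.
paste "$tmpdir/labels" "$tmpdir/posteriors" > "$tmpdir/combination"
cat "$tmpdir/combination"
```

Both files should have the same number of lines; if one is shorter, paste fills the missing fields with empty strings.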
- cat
A common Linux command. I like to append test data to training data using this command.
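Appending one file to the end of another is normally done with cat plus a shell redirection. A sketch, assuming two hypothetical files train.txt and test.txt with made-up contents:

```shell
# Made-up training and test files in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'train-1' 'train-2' > "$tmpdir/train.txt"
printf '%s\n' 'test-1' > "$tmpdir/test.txt"

# Write both files, in order, into a new file.
cat "$tmpdir/train.txt" "$tmpdir/test.txt" > "$tmpdir/all.txt"

# Or append in place: >> adds to the end of train.txt instead of overwriting it.
cat "$tmpdir/test.txt" >> "$tmpdir/train.txt"
cat "$tmpdir/train.txt"
```

Be careful to use >> and not > for the in-place version, since > would replace the training data with the test data.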
- which
Shows the full path of a command. For example, when writing Python scripts to be
executed directly, you need to put
#!/usr/bin/python
at the beginning of the script file. If you don't know where python is, you can
type
which python
and the path above will show up.
- sort
Sorts the lines of a file according to one or more key columns. Some of
the most useful options:
-k Specifies which column to treat as the key for sorting
-n Treats the key as a number rather than a string
-g Needed when the keys are in scientific notation, so that those
strings are compared as the correct numbers
-r Sorts in reverse order, so the greater values come first
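The options above can be combined. The following is a small sketch on invented data (scores.csv and sci.txt are hypothetical file names); it also uses -t, an option not listed above, which sets the field separator so that -k can address comma-separated columns.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 'a,10' 'b,9' 'c,100' > "$tmpdir/scores.csv"

# -t ',' splits fields on commas, -k2,2 uses only column 2 as the key,
# -n compares the key numerically, -r puts the largest value first.
sorted=$(sort -t ',' -k2,2 -n -r "$tmpdir/scores.csv")
echo "$sorted"

# Without -n the key is compared as text, so "9" ends up after "100".
sorted_text=$(sort -t ',' -k2,2 "$tmpdir/scores.csv")
echo "$sorted_text"

# -g understands scientific notation: 2e-1 < 5 < 1e3.
printf '%s\n' '1e3' '2e-1' '5' > "$tmpdir/sci.txt"
sorted_sci=$(sort -g "$tmpdir/sci.txt")
echo "$sorted_sci"
```

Writing the key as -k2,2 rather than -k2 matters when there are more columns: -k2 alone means "from column 2 to the end of the line".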
- head/tail
A huge file takes a while to open, even in Vim. Sometimes you don't need to see
the whole file; you just want to check the first or last few lines. That's when
these commands come into play. Try these:
head -n 10 huge_file
tail -n 10 huge_file
- tr
Replaces some characters in a string with other characters.
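A few typical uses, as a minimal sketch on made-up input strings. Note that tr only reads standard input, so data is piped or redirected into it rather than passed as a file name.

```shell
# Replace every comma with a tab.
tabbed=$(printf 'a,b,c\n' | tr ',' '\t')

# Map one range of characters onto another: lowercase to uppercase.
upper=$(printf 'hello\n' | tr 'a-z' 'A-Z')

# -d deletes the listed characters instead of replacing them (here, double quotes).
unquoted=$(printf '"Z.RS.STZ.VX",2780\n' | tr -d '"')

echo "$tabbed"
echo "$upper"
echo "$unquoted"
```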