Perl, 64bit file i/o, memory mapped files
Prof. Brian D. Davison
Computer Science & Engineering, Lehigh University
Announcements
- Reminder: Project 8 is due tomorrow night.
- Reminder: Final exam is scheduled for Wednesday May 4, noon-3pm in Packard 416
- Contact me today if you already have two finals scheduled for that day to arrange to take the exam earlier.
- Today
- Return SQ10: range 4-10, mean 6.6
- Revised P7 grading: range 74-100, mean 94
- 64bit file i/o, memory-mapped files, Perl
Large Files
- Unfortunately, on typical systems that permit large files, open(2) fails with an EOVERFLOW error when a program using 32-bit offsets tries to open a file larger than 2 GB.
- This is not because the file system is incapable of managing the storage of large files (typically it can).
- Associated with every open file descriptor is a file pointer (an offset) -- the location within the file where the next read() or write() will occur.
- For many years, that offset was a signed 32-bit integer, limiting file offsets to (2^31)-1.
- Consider the implementation of tail(1)...
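To make the limit concrete, here is a minimal sketch in C. The macro MAX_OFFSET_32 and the helper fits_in_32bit_offset() are hypothetical names chosen for illustration; they are not part of any system API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest position a signed 32-bit file offset can hold: (2^31)-1 bytes. */
#define MAX_OFFSET_32 INT32_MAX

/* Hypothetical helper: can this byte position be stored in a
   signed 32-bit offset? */
bool fits_in_32bit_offset(int64_t position)
{
    return position >= 0 && position <= MAX_OFFSET_32;
}
```

A program such as tail(1) that seeks near the end of a 3 GB file needs positions for which this check fails, which is why it cannot work with 32-bit offsets.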
Large Files
- The solution is obvious -- we need a larger representation of offsets!
- Offsets become 64-bit integers, and off64_t is the replacement for off_t.
- But... changing a standard type can be dangerous to programs that are unaware of the change.
- As a result, we have three classes of programs:
- Large-file unaware (may break when encountering large files)
- Large-file safe (may quit safely with an error when given large files)
- Large-file aware (know and can use large files)
Support for Large Files
- First, the best way to be certain of safety is to check the return code of all kernel function calls.
- Old programs, if they check the return code properly (i.e., are large-file safe), will see an error value when they attempt to work with a large file.
- OSes that support large files have added transitional APIs with 64-bit functions and types that can access both small and large files.
- New functions are named xxx64() for every relevant function xxx()
- New data types are xxx64_t for every relevant data type xxx_t
- E.g., open64() and off64_t
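As a rough sketch of what "large-file safe" looks like in practice, the C fragment below checks the return value of open() and reports EOVERFLOW separately. The helper name safe_open_readonly() and the wording of the messages are assumptions made for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open a file read-only and never ignore a failure.
   Returns a file descriptor, or -1 after printing a diagnostic. */
int safe_open_readonly(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        if (errno == EOVERFLOW)
            /* The file's size does not fit in this program's off_t. */
            fprintf(stderr, "%s: file too large for this program\n", path);
        else
            fprintf(stderr, "%s: %s\n", path, strerror(errno));
        return -1;
    }
    return fd;
}
```

A caller that gets -1 can stop cleanly instead of continuing with a bad descriptor, which is the difference between a large-file safe program and a large-file unaware one.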
Building Programs with Large File Support
- Default compilation continues to use old, small-file support.
- (Sufficient to build large-file safe programs.)
- Transitional compilation environment -- both 32-bit and 64-bit functions available
- Enables both xxx() and xxx64()
- Set _LARGEFILE64_SOURCE to 1 before including any system headers
- Large file compilation environment
- All xxx() source interfaces are remapped to xxx64() calls, and relevant data types are automatically converted.
- Set _FILE_OFFSET_BITS to 64 before including any system headers
- See Sun's Large File White Paper for more details.
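The large file compilation environment can be sketched as follows in C. The helper names offset_bits() and seek_past_2gb() and the scratch-file approach are assumptions for illustration; only _FILE_OFFSET_BITS, off_t, open(), and lseek() come from the system.

```c
#define _FILE_OFFSET_BITS 64   /* must appear before any system header */

#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: width of off_t, in bits, in this build. */
int offset_bits(void)
{
    return (int)(sizeof(off_t) * 8);
}

/* Hypothetical helper: create a scratch file at `path`, seek 3 GB into it,
   and return the resulting offset, or -1 on failure.  Seeking past the end
   of a file does not allocate any disk space. */
off_t seek_past_2gb(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return (off_t)-1;
    off_t target = (off_t)3 * 1024 * 1024 * 1024;   /* 3 GB */
    off_t result = lseek(fd, target, SEEK_SET);
    unlink(path);    /* remove the scratch file again */
    close(fd);
    return result;
}
```

The source uses only the ordinary names off_t, open(), and lseek(); the macro is what selects their 64-bit versions on systems where the default is 32 bits.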
Memory Mapped Files
- Many operating systems permit a file to be mapped into (virtual) memory.
- A range of memory addresses is then associated with the file.
- Reads and writes to those memory locations are treated as reads and writes to the file.
- Useful when you need to create a very large data structure but don't have enough real memory for it.
- You can create a file of appropriate size, mmap() it, and access your data structure using pointer arithmetic instead of file operations.
- Usage:
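int fd = open(filename, O_RDWR);   /* returns -1 on failure */
char *array = mmap(NULL, MEMSIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (array == MAP_FAILED) { perror("mmap"); exit(1); }
/* use array[0] .. array[MEMSIZE-1] like ordinary memory */
...
munmap(array, MEMSIZE);
close(fd);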
- Often simpler than many random lseek() calls on a file.
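A fuller sketch of the same idea in C, under some assumptions: the helper names fill_squares() and read_element() are hypothetical, the data structure is just an array of ints, and the file is created by the helper itself. The first function fills the file through a pointer; the second reads one element back with conventional lseek()/read() for comparison.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a file holding `count` ints, map it, and
   fill it through an ordinary pointer.  Returns 0 on success, -1 on error. */
int fill_squares(const char *path, size_t count)
{
    size_t bytes = count * sizeof(int);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    /* The file must be at least as long as the region being mapped. */
    if (ftruncate(fd, (off_t)bytes) == -1) {
        close(fd);
        return -1;
    }
    int *table = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (table == MAP_FAILED) {
        close(fd);
        return -1;
    }
    for (size_t i = 0; i < count; i++)
        table[i] = (int)(i * i);        /* array access, no lseek()/write() */
    msync(table, bytes, MS_SYNC);       /* push the changes to the file */
    munmap(table, bytes);
    close(fd);
    return 0;
}

/* Hypothetical helper: read element `index` back the conventional way. */
int read_element(const char *path, size_t index, int *out)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    off_t where = (off_t)(index * sizeof(int));
    int ok = lseek(fd, where, SEEK_SET) == where &&
             read(fd, out, sizeof *out) == (ssize_t)sizeof *out;
    close(fd);
    return ok ? 0 : -1;
}
```

Note how every access in read_element() needs an lseek() followed by a read(), while the mapped version simply indexes the array.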
Memory Mapped Files
- Memory mapped files can also be used for inter-process communication.
- The same memory mapped file can be mapped by multiple processes.
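- The mapping flags MAP_SHARED and MAP_PRIVATE determine what happens on writes: shared writes are visible to other processes and carried through to the file, while private writes go to a copy-on-write copy that only the writer sees.
- A shared mapping effectively operates much like shared memory, except that changes to the memory are backed by the file on disk.
- For performance reasons, you can also mlock() the region of memory so that it always resides in RAM.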
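The difference between MAP_SHARED and MAP_PRIVATE can be sketched in C as below. This is a simplified illustration under stated assumptions: the helper name pass_value_through_mapping() is hypothetical, the second process is created with fork() so it inherits the mapping rather than calling mmap() itself, and `value` is assumed to be non-negative so that -1 can signal an error.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a child process stores `value` in a file-backed
   mapping created with `flags` (MAP_SHARED or MAP_PRIVATE).  Returns what
   the parent sees in the mapping after the child exits, or -1 on error. */
int pass_value_through_mapping(const char *path, int flags, int value)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, sizeof(int)) == -1) {   /* new bytes read as zero */
        close(fd);
        return -1;
    }
    int *cell = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE, flags, fd, 0);
    close(fd);                 /* the mapping stays valid after close() */
    if (cell == MAP_FAILED)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        munmap(cell, sizeof(int));
        return -1;
    }
    if (pid == 0) {            /* child: write and exit immediately */
        *cell = value;
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1) {
        munmap(cell, sizeof(int));
        return -1;
    }
    int seen = *cell;          /* parent: read after the child is done */
    munmap(cell, sizeof(int));
    return seen;
}
```

With MAP_SHARED the parent sees the child's value; with MAP_PRIVATE the child's write goes to its own copy-on-write page and the parent still sees the zero that ftruncate() left in the file.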
Perl
- Perl
- is an interpreted programming language
- is known for its power and flexibility
- combines the familiar syntax of C, C++, sed, awk, grep, sh, and csh into one powerful tool
- was invented by Larry Wall
- is commonly expanded as "Practical Extraction and Report Language"
- is optimized for string manipulation, I/O, and system tasks
- has built-in functions for almost everything that's in section 2 of the UNIX manual pages
Perl Basics
- Perl scripts can be run in the same way as shell scripts
- by perl scriptname
- by making the script executable and including an appropriate #! line
- Perl ignores extra whitespace -- indent or not, it's up to you
- Comments start with a # and go to the end of the line
- All Perl statements end in a ;
Perl Functions
- Perl functions (built in or user defined) are identified by their unique names
- Parameters are comma separated, but parentheses are often optional
print("length: " ,length("hello world"));
# prints the same thing as
print "length: ",1, length "a";
- Define as:
sub functionname {
    statements
}
- Call as:
functionname(list of parameters);
Associative Arrays (Hashes)
Example Perl Script
- Perl is huge; there is always more than one way to do something.
- It has loops, references (Perl's counterpart to pointers), object-oriented support, full regular expressions, many supporting libraries, etc.
- We can't cover much here, so let's look at an example.
- Recall Project 5
- A Perl implementation can be much easier and faster.
- For more information, see the perl.com documentation and the variety of Perl tutorials that are available online.