Class TextFile
- Author: Kohsuke Kawaguchi
Field Summary
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
void | delete() |
boolean | exists() |
 | fastTail(int numChars) | Uses the platform default encoding.
 | fastTail | Efficiently reads the last N characters (or fewer, if the whole file is shorter than that).
 | head(int numChars) | Reads the first N characters or until we hit EOF.
 | lines() | Read all lines from the file as a Stream.
 | read() | Reads the entire contents and returns it.
 | readTrim() |
 | toString() |
void | write | Overwrites the file with the given string.
-
Field Details
-
file
-
-
Constructor Details
-
TextFile
-
-
Method Details
-
exists
public boolean exists()
-
delete
- Throws:
IOException
-
read
Reads the entire contents and returns it.
- Throws:
IOException
-
lines
Read all lines from the file as a Stream. Bytes from the file are decoded into characters using the UTF-8 charset. If timely disposal of file system resources is required, the try-with-resources construct should be used to ensure that BaseStream.close() is invoked after the stream operations are completed.
- Returns:
the lines from the file as a Stream
- Throws:
IOException - if an I/O error occurs opening the file
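The try-with-resources advice for lines() can be illustrated with a minimal sketch. It uses java.nio.file.Files.lines directly rather than TextFile itself; the class LinesSketch and the method countNonBlankLines are hypothetical names chosen for illustration, and whether TextFile delegates to Files.lines is an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, not part of TextFile.
final class LinesSketch {
    /** Counts the non-blank lines of a UTF-8 file, releasing the file handle when done. */
    static long countNonBlankLines(Path path) throws IOException {
        // try-with-resources ensures BaseStream.close() runs even if the pipeline throws.
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.filter(line -> !line.trim().isEmpty()).count();
        }
    }
}
```

Without the try block, the underlying file handle would stay open until the stream is garbage collected.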
-
write
Overwrites the file with the given string.
- Throws:
IOException
-
head
Reads the first N characters or until we hit EOF.
- Throws:
IOException
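As a rough illustration of "first N characters or until EOF", the sketch below reads through a Reader in a loop. HeadSketch and its explicit Charset parameter are assumptions made for this example; they are not the signature of TextFile.head.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of TextFile.
final class HeadSketch {
    /** Returns at most numChars characters from the start of the file. */
    static String head(Path path, int numChars, Charset cs) throws IOException {
        char[] buf = new char[numChars];
        int filled = 0;
        try (Reader in = Files.newBufferedReader(path, cs)) {
            // A single read() may return fewer chars than requested, so loop until full or EOF.
            while (filled < numChars) {
                int n = in.read(buf, filled, numChars - filled);
                if (n < 0) {
                    break; // EOF reached before numChars were available
                }
                filled += n;
            }
        }
        return new String(buf, 0, filled);
    }
}
```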
-
fastTail
Efficiently reads the last N characters (or fewer, if the whole file is shorter than that).
This method first tries to read just the tail section of the file to get the necessary chars. To handle multi-byte variable-length encodings (such as UTF-8), we read a larger-than-necessary chunk.
Some multi-byte encodings, such as Shift-JIS, don't allow the first byte and the second byte of a single char to be unambiguously identified, so it is possible that we end up decoding incorrectly if we start reading in the middle of a multi-byte character. All the CJK multi-byte encodings that I know of are self-correcting; as they are ASCII-compatible, any ASCII characters or control characters will bring the decoding back in sync, so in the worst case we just have some garbage at the beginning that needs to be discarded. To accommodate this, we read an additional 1024 bytes.
Other encodings, such as UTF-8, are better in that the character boundary is unambiguous, so there can be at most one garbage char. To deal with UTF-16 and UTF-32, we read at a 4-byte boundary (all the constants and multipliers are multiples of 4).
Note that it is possible to construct a contrived input that fools this algorithm, and in this method we are willing to live with a small possibility of that to avoid reading the whole text. In practice, such an input is very unlikely.
So all in all, this algorithm should work decently, and it works quite efficiently on a large text.
- Throws:
IOException
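The approach described above might look roughly like the following sketch. It is written from the description only, not copied from the TextFile source: the class TailSketch, the method name tail, and the exact window formula (4 bytes per requested char plus a 1024-byte margin) are assumptions. It also assumes numChars is small enough that the window fits in an int.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.file.Path;

// Hypothetical helper, not part of TextFile.
final class TailSketch {
    /** Returns roughly the last numChars characters without reading the whole file. */
    static String tail(Path path, int numChars, Charset cs) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r")) {
            long len = raf.length();
            // Assume up to 4 bytes per char, plus a 1024-byte margin so the decoder
            // can resynchronize if we start in the middle of a multi-byte character.
            // Both terms are multiples of 4, so the distance from EOF is too.
            long window = 4L * numChars + 1024;
            long pos = Math.max(0L, len - window);
            raf.seek(pos);

            byte[] bytes = new byte[(int) (len - pos)];
            raf.readFully(bytes);

            // Charset.decode replaces malformed input instead of throwing, so any
            // garbage produced at the start of the window is simply cut off below.
            String decoded = cs.decode(ByteBuffer.wrap(bytes)).toString();
            return decoded.substring(Math.max(0, decoded.length() - numChars));
        }
    }
}
```

The cost is bounded by the window size rather than the file size, which is the point of the technique; the trade-off is the small chance of a wrong result on contrived input, as the description notes.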
-
fastTail
Uses the platform default encoding.
- Throws:
IOException
-
readTrim
- Throws:
IOException
-
toString
-