java - MalformedInputException with Files.readAllLines()
I am iterating over files, 5328 to be precise. These are ordinary XML files of 60-200 lines at most. They are first filtered through a simple method, isXmlSourceFile, which parses the path.
    Files.walk(Paths.get("/home/me/development/projects/myproject"), FileVisitOption.FOLLOW_LINKS)
        .filter(V3TestsGenerator::isXmlTestSourceFile)
        .filter(V3TestsGenerator::fileContainsXmlTag)
My question is about the second filter, the method fileContainsXmlTag. For each file, I want to detect whether a pattern occurs at least once among its lines:
    private static boolean fileContainsXmlTag(Path path) {
        try {
            return Files.readAllLines(path).stream().anyMatch(line -> pattern.matcher(line).find());
        } catch (IOException e) {
            e.printStackTrace();
        }
        return false;
    }
For some files I get this exception:
    java.nio.charset.MalformedInputException: Input length = 1
        at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.BufferedReader.fill(BufferedReader.java:161)
        at java.io.BufferedReader.readLine(BufferedReader.java:324)
        at java.io.BufferedReader.readLine(BufferedReader.java:389)
        at java.nio.file.Files.readAllLines(Files.java:3205)
        at java.nio.file.Files.readAllLines(Files.java:3242)
But when I use FileUtiles.readLines() instead of Files.readAllLines(), everything works fine.
I am asking mostly out of curiosity: if anyone has a clue about what is going on, I would be glad to hear it.
Thanks
The method Files.readAllLines() assumes that the file you are reading is encoded in UTF-8.
If you get this exception, the file you are reading is most likely encoded in a character encoding other than UTF-8.
Find out which character encoding is used, and use the other readAllLines method, which allows you to specify the character encoding.
For example, if the files are encoded in ISO-8859-1:
    return Files.readAllLines(path, StandardCharsets.ISO_8859_1).stream()... // etc.
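The mismatch described above can be reproduced with a small, self-contained sketch. It is only an illustration: the class name CharsetMismatchDemo, the helper names, and the sample content are made up for this example and do not come from the question. The sample file contains the byte 0xE9 ("é" in ISO-8859-1) followed by "<", which is not a valid UTF-8 sequence.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class CharsetMismatchDemo {

    /** Returns true if every line of the file decodes cleanly with the given charset. */
    public static boolean decodesWith(Path path, Charset charset) {
        try {
            Files.readAllLines(path, charset);
            return true;
        } catch (MalformedInputException e) {
            // The bytes are not a valid sequence in this charset.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a one-line XML sample, encoded in ISO-8859-1, to a temporary file. */
    public static Path writeLatin1Sample() {
        try {
            Path path = Files.createTempFile("latin1-sample", ".xml");
            // \u00e9 is "é"; in ISO-8859-1 it becomes the single byte 0xE9.
            Files.write(path, "<name>caf\u00e9</name>\n".getBytes(StandardCharsets.ISO_8859_1));
            return path;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path sample = writeLatin1Sample();
        System.out.println("UTF-8 ok: " + decodesWith(sample, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 ok: " + decodesWith(sample, StandardCharsets.ISO_8859_1));
    }
}
```

Decoding the sample as UTF-8 fails with MalformedInputException, while decoding it as ISO-8859-1 succeeds, which matches the behaviour reported in the question.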
The method FileUtiles.readLines() (where does it come from?) assumes something else: it probably assumes that the files are in the default character encoding of your system, which is something other than UTF-8.
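If the files in the tree do not all share one encoding, one possible approach is to try UTF-8 first and fall back to another charset only when decoding fails. The sketch below is a hedged variant of the fileContainsXmlTag method from the question, not the original code: the class name XmlTagFilter, the regular expression, and the choice of ISO-8859-1 as the fallback are all assumptions made for illustration. ISO-8859-1 maps every byte to a character, so the fallback never throws a decoding error, but it may produce wrong characters if the file actually uses a different encoding.

```java
import java.io.IOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;

class XmlTagFilter {

    // Hypothetical pattern; the question does not show the real one.
    private static final Pattern PATTERN = Pattern.compile("<testcase\\b");

    // Assumed fallback charset; replace it with the encoding your files really use.
    private static final Charset FALLBACK = StandardCharsets.ISO_8859_1;

    /** Reads all lines as UTF-8, retrying with the fallback charset on a decoding error. */
    public static List<String> readLinesWithFallback(Path path) throws IOException {
        try {
            return Files.readAllLines(path, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            // MalformedInputException is a subclass of CharacterCodingException.
            return Files.readAllLines(path, FALLBACK);
        }
    }

    /** Returns true if at least one line of the file matches PATTERN. */
    public static boolean fileContainsXmlTag(Path path) {
        try {
            return readLinesWithFallback(path).stream()
                    .anyMatch(line -> PATTERN.matcher(line).find());
        } catch (IOException e) {
            e.printStackTrace();
        }
        return false;
    }
}
```

Because the pattern in this sketch only contains ASCII characters, a wrongly decoded accented character elsewhere on the line does not affect the match; that would not hold for patterns that contain non-ASCII text.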