java - MalformedInputException with Files.readAllLines()


I am iterating over files, 5328 to be precise. They are average XML files, 60-200 lines at most. They are first filtered through a simple method, isXmlTestSourceFile, that parses the path.

    Files.walk(Paths.get("/home/me/development/projects/myproject"), FileVisitOption.FOLLOW_LINKS)
        .filter(V3TestsGenerator::isXmlTestSourceFile)
        .filter(V3TestsGenerator::fileContainsXmlTag)

The big question is about the second filter, the method fileContainsXmlTag. For each file, I want to detect whether a pattern occurs at least once among its lines:

    private static boolean fileContainsXmlTag(Path path) {
        try {
            return Files.readAllLines(path).stream().anyMatch(line -> pattern.matcher(line).find());
        } catch (IOException e) {
            e.printStackTrace();
        }
        return false;
    }

For some files I get this exception:

    java.nio.charset.MalformedInputException: Input length = 1
        at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.BufferedReader.fill(BufferedReader.java:161)
        at java.io.BufferedReader.readLine(BufferedReader.java:324)
        at java.io.BufferedReader.readLine(BufferedReader.java:389)
        at java.nio.file.Files.readAllLines(Files.java:3205)
        at java.nio.file.Files.readAllLines(Files.java:3242)

But when I use FileUtiles.readLines() instead of Files.readAllLines(), everything works well.

This is a question asked out of curiosity, but if anyone has a clue about what is going on, it would be a pleasure to hear it.

Thanks.

The method Files.readAllLines(), when called with only a path, assumes that the file you are reading is encoded in UTF-8.

If you get this exception, the file you are reading is most likely encoded using a character encoding other than UTF-8.

Find out which character encoding is used, and use the other readAllLines method, which allows you to specify the character encoding.

For example, if the files are encoded in ISO-8859-1:

    return Files.readAllLines(path, StandardCharsets.ISO_8859_1).stream()... // etc.
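To see the difference between the two overloads in isolation, here is a minimal sketch. The class name CharsetDemo, its helper methods, and the sample content are made up for illustration, and it assumes a file that really is ISO-8859-1. In that encoding the character é is the single byte 0xE9, which is not a valid UTF-8 sequence when followed by an ASCII byte, so the decoder reports malformed input of length 1.

```java
import java.io.IOException;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class CharsetDemo {

    // Hypothetical helper: writes one line with a non-ASCII character, encoded as ISO-8859-1.
    static Path writeLatin1Sample() throws IOException {
        Path file = Files.createTempFile("sample", ".xml");
        Files.write(file, "<name>caf\u00e9</name>".getBytes(StandardCharsets.ISO_8859_1));
        return file;
    }

    // Hypothetical helper: true if the UTF-8 overload rejects the file as malformed.
    static boolean failsAsUtf8(Path file) throws IOException {
        try {
            Files.readAllLines(file); // decodes as UTF-8
            return false;
        } catch (MalformedInputException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = writeLatin1Sample();
        System.out.println("Fails as UTF-8: " + failsAsUtf8(file));

        // The same file reads fine once the matching charset is given explicitly.
        List<String> lines = Files.readAllLines(file, StandardCharsets.ISO_8859_1);
        System.out.println("Lines read as ISO-8859-1: " + lines.size());

        Files.delete(file);
    }
}
```

ISO-8859-1 maps every byte value to a character, so reading with it never throws this exception. It will silently produce wrong characters if the file is actually in some other encoding, so it is worth confirming the real encoding first.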

The method FileUtiles.readLines() (where does it come from?) assumes something else (it probably assumes the files are in the default character encoding of your system, which is something other than UTF-8).
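To find out which of the files are affected, one option is to attempt a strict UTF-8 decode of each file's bytes and to print the platform default charset for comparison. The following is only a sketch: the class Utf8Check and the helper isValidUtf8 are hypothetical names, and the check only tells you that a file is not valid UTF-8, not which encoding it actually uses.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8Check {

    // Hypothetical helper: true if the file's bytes decode cleanly as UTF-8.
    static boolean isValidUtf8(Path path) throws IOException {
        byte[] bytes = Files.readAllBytes(path);
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // Covers MalformedInputException, which is what readAllLines reported.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // A reader that relies on the platform default would use this charset.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Pass the paths to check as command-line arguments.
        for (String arg : args) {
            Path path = Paths.get(arg);
            if (!isValidUtf8(path)) {
                System.out.println("Not valid UTF-8: " + path);
            }
        }
    }
}
```

The same helper could be used as an extra filter in the stream pipeline to list the problem files before deciding which charset to pass to readAllLines.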

