java - Bullets in a document show up as question marks in GATE NLP


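I am new to GATE NLP. I have a document that contains bullets. When I load it into GATE, the bullets are detected as an unknown type of symbol and an unrecognized character is printed. I tried setting the encoding to UTF-8. I also tried loading the document programmatically, but then the bullets are detected as ?.

Can anyone explain this to me?

Example: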

 promoted senior member technical in 2.5 years of experience.

Here, the symbol above is what the GATE Developer UI shows, and a ? symbol is shown when I load the document programmatically.

In my experience, doc and docx files do not produce these characters. The bullets are either missing (when the text is formatted as a bullet list) or printed (when the text contains raw bullet characters).

See the related question: Parsing either font style or block of paragraph in GATE.

PDF files, however, do produce these "bullet characters" in a GATE document. This may be related to PDF or Apache PDFBox issues, see e.g. this one.

These characters do have a Unicode value, and in XML they appear in encoded form. In any case, my advice is to trace such characters (they may have different Unicode values depending on the original bullet character) and replace them with a printable character.
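The trace-and-replace advice could be sketched roughly as follows. This is only an illustration under an assumption: that the stray bullets fall in the Unicode Private Use Area (U+F0B7, as produced by some symbol fonts, is used here as a hypothetical sample). The class and method names are made up for the example, and the choice of U+2022 as the replacement is mine. In GATE, the input string would presumably come from the document content, e.g. doc.getContent().toString().

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of GATE.
final class BulletTracer {

    // Standard Unicode bullet (U+2022), chosen here as the printable replacement.
    static final int BULLET = 0x2022;

    /** Collects the distinct private-use code points in the text, formatted as U+XXXX. */
    static Set<String> tracePrivateUse(String text) {
        Set<String> found = new LinkedHashSet<>();
        text.codePoints()
            .filter(cp -> Character.getType(cp) == Character.PRIVATE_USE)
            .forEach(cp -> found.add(String.format("U+%04X", cp)));
        return found;
    }

    /** Returns a copy of the text with every private-use code point replaced by U+2022. */
    static String replacePrivateUse(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
            .map(cp -> Character.getType(cp) == Character.PRIVATE_USE ? BULLET : cp)
            .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed sample: a line starting with a private-use bullet.
        String line = "\uF0B7 promoted senior member technical";
        System.out.println(tracePrivateUse(line));   // lists the code points that were found
        System.out.println(replacePrivateUse(line)); // same line with a standard bullet
    }
}
```

Running the trace step first shows which code points actually occur in a given document, which is useful if the bullets turn out not to be private-use characters and the filter needs adjusting.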

Concerning the ? characters: they are caused by a Java environment that doesn't support these characters. See e.g.: Why do Unicode characters appear as question marks in the console?
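A small sketch of where the ? can come from, using only the standard library: when text is encoded with a charset that cannot represent a character, Java substitutes ?. The class name and the roundTrip helper are hypothetical, and whether the UTF-8 stream displays correctly still depends on the terminal's own settings.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of GATE.
final class ConsoleEncodingDemo {

    /** Encodes the text with the given charset and decodes it again, as an output stream would. */
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String bullet = "\u2022 item";

        // US-ASCII cannot represent the bullet, so it is replaced by '?'.
        System.out.println(roundTrip(bullet, StandardCharsets.US_ASCII));

        // Wrapping System.out in a UTF-8 PrintStream keeps the character intact,
        // provided the console itself is configured to display UTF-8.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(bullet);
    }
}
```

If the default console encoding is the culprit, starting the JVM with -Dfile.encoding=UTF-8 is another commonly used option.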

