How to remove potentially troublesome invisible characters before generating JSON?

A woman just joined our site and created her profile. We didn’t have any trouble accepting her input and saving it to MongoDB. However, when we load her profile from MongoDB and use Cheshire to convert it to a JSON string:

    (json/generate-string rme)

It throws an exception: “unable to parse the data”. It seems to happen only with her profile. I’m still adding more logging to try to narrow it down.

We looked at her profile with pprint, but everything looked normal.
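pprint escapes only a handful of characters in strings (quotes, backslashes, newlines, tabs and the like), so zero-width and bidi characters are printed raw and stay invisible. A minimal sketch of listing every code point in a suspect string instead; the helper name `code-points` is hypothetical:

```clojure
(defn code-points
  "Returns a vector of \"U+XXXX\" labels, one per Unicode code point in s.
  Surrogate pairs (e.g. emoji) come out as a single code point."
  [^String s]
  (->> (.codePoints s)   ; IntStream of code points
       .iterator         ; PrimitiveIterator.OfInt, a java.util.Iterator
       iterator-seq
       (mapv #(format "U+%04X" %))))

;; Usage (:bio is a hypothetical field name):
;; (code-points (:bio rme))
;; (code-points "a\u200Bb") ;=> ["U+0061" "U+200B" "U+0062"]
```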

Could an invisible character be causing the problem? If so, would we use a regex to remove it?
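If an invisible character does turn out to be the culprit, a regex is one way to remove it. A minimal sketch, assuming you want to drop format characters (Unicode category Cf: zero-width spaces and joiners, bidi marks, BOM, tag characters), unpaired surrogate halves, and control characters other than tab, newline and carriage return. The names `strip-invisible` and `clean-strings` are hypothetical:

```clojure
(require '[clojure.string :as str]
         '[clojure.walk :as walk])

;; Assumed cleanup rule, tune it for your data:
;;   \p{Cf}  format chars (U+200B, U+200D, U+202A..U+202E, U+FEFF, tag chars, ...)
;;   \p{Cs}  unpaired surrogates (a valid pair is matched as one code point)
;;   \x00-\x08 \x0B \x0C \x0E-\x1F \x7F  controls except \t, \n, \r
(def invisible-re #"[\p{Cf}\p{Cs}\x00-\x08\x0B\x0C\x0E-\x1F\x7F]")

(defn strip-invisible
  "Removes every character matched by invisible-re from s."
  [s]
  (str/replace s invisible-re ""))

(defn clean-strings
  "Applies strip-invisible to every string anywhere in nested data."
  [data]
  (walk/postwalk #(if (string? %) (strip-invisible %) %) data))
```

One caveat on the design: removing all of Cf also removes the zero-width joiner (U+200D) that glues multi-part emoji together, so those emoji fall apart into their components. Narrowing the character class, or first confirming which character is actually present, avoids cleaning more than necessary.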

> unable to parse the data

Parse the data? What does that error have to do with string generation then?

Are there any details in the error, like the offending character or its position?
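If the error itself doesn’t say, the data can answer the same question. A sketch that walks the whole profile map and reports each suspicious code point with its key path and string index. What counts as “suspicious” is an assumption (format characters such as U+200B, control characters other than tab/newline/CR, and unpaired surrogates), and both function names are hypothetical:

```clojure
(defn suspicious-code-point?
  "Heuristic: Cf format chars, Cc controls other than \\t \\n \\r,
  and Cs unpaired surrogate halves."
  [cp]
  (let [cp (int cp)
        t  (Character/getType cp)]
    (or (== t Character/FORMAT)
        (== t Character/SURROGATE)
        (and (== t Character/CONTROL)
             (not (or (== cp 9) (== cp 10) (== cp 13)))))))

(defn find-suspicious
  "Walks nested maps and sequential collections; returns one map per
  suspicious code point: {:path key-path, :index char-index, :code-point label}."
  ([data] (find-suspicious data []))
  ([data path]
   (cond
     (string? data)
     (let [^String s data]
       (loop [i 0, acc []]
         (if (>= i (.length s))
           acc
           (let [cp (.codePointAt s i)]   ; lone surrogate => its own value
             (recur (+ i (Character/charCount cp))
                    (if (suspicious-code-point? cp)
                      (conj acc {:path path :index i :code-point (format "U+%04X" cp)})
                      acc))))))

     (map? data)
     (vec (mapcat (fn [[k v]] (find-suspicious v (conj path k))) data))

     (sequential? data)
     (vec (mapcat (fn [i v] (find-suspicious v (conj path i))) (range) data))

     :else [])))   ; numbers, dates, ObjectIds, ... are skipped

;; Usage, with rme being the profile from the original post:
;; (find-suspicious rme)
```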

> I’m still adding more logging to try to narrow it down.

FWIW, I would definitely use something like Portal to inspect the data.


Interesting! I’ve definitely been bitten by something similar in the past.

Perhaps try loading it with Charred?

Might produce a different (more helpful) error message.


I just tried with every single character listed here.
It broke my terminal, but Cheshire handled it just fine:

user=> (let [s ", , ,­,͏,؜,ᅟ,ᅠ,឴,឵,᠋,᠌,᠍, , , , , , , , , , , , ,​,‌,‍,‎,‏,<U+202A>,<U+202B>,<U+202C>,<U+202D>,<U+202E>, , ,⁠,⁡,⁢,⁣,⁤,⁥,<U+2066>,<U+2067>,<U+2068>,<U+2069>,,,,,,,⠀, ,ㅤ,︀,︁,︂,︃,︄,︅,︆,︇,︈,︉,︊,︋,︌,︍,︎,️,,ᅠ,,𓏼,𝅙,𝅳,𝅴,𝅵,𝅶,𝅷,𝅸,𝅹,𝅺,󠀁,󠀠,󠀡,󠀢,󠀣,󠀤,󠀥,󠀦,󠀧,󠀨,󠀩,󠀪,󠀫,󠀬,󠀭,󠀮,󠀯,󠀰,󠀱,󠀲,󠀳,󠀴,󠀵,󠀶,󠀷,󠀸,󠀹,󠀺,󠀻,󠀼,󠀽,󠀾,󠀿,󠁀,󠁁,󠁂,󠁃,󠁄,󠁅,󠁆,󠁇,󠁈,󠁉,󠁊,󠁋,󠁌,󠁍,󠁎,󠁏,󠁐,󠁑,󠁒,󠁓,󠁔,󠁕,󠁖,󠁗,󠁘,󠁙,󠁚,󠁛,󠁜,󠁝,󠁞,󠁟,󠁠,󠁡,󠁢,󠁣,󠁤,󠁥,󠁦,󠁧,󠁨,󠁩,󠁪,󠁫,󠁬,󠁭,󠁮,󠁯,󠁰,󠁱,󠁲,󠁳,󠁴,󠁵,󠁶,󠁷,󠁸,󠁹,󠁺,󠁻,󠁼,󠁽,󠁾,󠁿,󠄀,󠄁,󠄂,󠄃,󠄄,󠄅,󠄆,󠄇,󠄈,󠄉,󠄊,󠄋,󠄌,󠄍,󠄎,󠄏,󠄐,󠄑,󠄒,󠄓,󠄔,󠄕,󠄖,󠄗,󠄘,󠄙,󠄚,󠄛,󠄜,󠄝,󠄞,󠄟,󠄠,󠄡,󠄢,󠄣,󠄤,󠄥,󠄦,󠄧,󠄨,󠄩,󠄪,󠄫,󠄬,󠄭,󠄮,󠄯,󠄰,󠄱,󠄲,󠄳,󠄴,󠄵,󠄶,󠄷,󠄸,󠄹,󠄺,󠄻,󠄼,󠄽,󠄾,󠄿,󠅀,󠅁,󠅂,󠅃,󠅄,󠅅,󠅆,󠅇,󠅈,󠅉,󠅊,󠅋,󠅌,󠅍,󠅎,󠅏,󠅐,󠅑,󠅒,󠅓,󠅔,󠅕,󠅖,󠅗,󠅘,󠅙,󠅚,󠅛,󠅜,󠅝,󠅞,󠅟,󠅠,󠅡,󠅢,󠅣,󠅤,󠅥,󠅦,󠅧,󠅨,󠅩,󠅪,󠅫,󠅬,󠅭,󠅮,󠅯,󠅰,󠅱,󠅲,󠅳,󠅴,󠅵,󠅶,󠅷,󠅸,󠅹,󠅺,󠅻,󠅼,󠅽,󠅾,󠅿,󠆀,󠆁,󠆂,󠆃,󠆄,󠆅,󠆆,󠆇,󠆈,󠆉,󠆊,󠆋,󠆌,󠆍,󠆎,󠆏,󠆐,󠆑,󠆒,󠆓,󠆔,󠆕,󠆖,󠆗,󠆘,󠆙,󠆚,󠆛,󠆜,󠆝,󠆞,󠆟,󠆠,󠆡,󠆢,󠆣,󠆤,󠆥,󠆦,󠆧,󠆨,󠆩,󠆪,󠆫,󠆬,󠆭,󠆮,󠆯,󠆰,󠆱,󠆲,󠆳,󠆴,󠆵,󠆶,󠆷,󠆸,󠆹,󠆺,󠆻,󠆼,󠆽,󠆾,󠆿,󠇀,󠇁,󠇂,󠇃,󠇄,󠇅,󠇆,󠇇,󠇈,󠇉,󠇊,󠇋,󠇌,󠇍,󠇎,󠇏,󠇐,󠇑,󠇒,󠇓,󠇔,󠇕,󠇖,󠇗,󠇘,󠇙,󠇚,󠇛,󠇜,󠇝,󠇞,󠇟,󠇠,󠇡,󠇢,󠇣,󠇤,󠇥,󠇦,󠇧,󠇨,󠇩,󠇪,󠇫,󠇬,󠇭,󠇮,󠇯"] (println (count s)) (c/generate-string s))
1189
"\", , ,­,͏,؜,ᅟ,ᅠ,឴,឵,᠋,᠌,᠍, , , , , , , , , , , , ,​,‌,‍,‎,‏,<U+202A>,<U+202B>,<U+202C>,<U+202D>,<U+202E>, , ,⁠,⁡,⁢,⁣,⁤,⁥,<U+2066>,<U+2067>,<U+2068>,<U+2069>,,,,,,,⠀, ,ㅤ,︀,︁,︂,︃,︄,︅,︆,︇,︈,︉,︊,︋,︌,︍,︎,️,,ᅠ,,𓏼,𝅙,𝅳,𝅴,𝅵,𝅶,𝅷,𝅸,𝅹,𝅺,󠀁,󠀠,󠀡,󠀢,󠀣,󠀤,󠀥,󠀦,󠀧,󠀨,󠀩,󠀪,󠀫,󠀬,󠀭,󠀮,󠀯,󠀰,󠀱,󠀲,󠀳,󠀴,󠀵,󠀶,󠀷,󠀸,󠀹,󠀺,󠀻,󠀼,󠀽,󠀾,󠀿,󠁀,󠁁,󠁂,󠁃,󠁄,󠁅,󠁆,󠁇,󠁈,󠁉,󠁊,󠁋,󠁌,󠁍,󠁎,󠁏,󠁐,󠁑,󠁒,󠁓,󠁔,󠁕,󠁖,󠁗,󠁘,󠁙,󠁚,󠁛,󠁜,󠁝,󠁞,󠁟,󠁠,󠁡,󠁢,󠁣,󠁤,󠁥,󠁦,󠁧,󠁨,󠁩,󠁪,󠁫,󠁬,󠁭,󠁮,󠁯,󠁰,󠁱,󠁲,󠁳,󠁴,󠁵,󠁶,󠁷,󠁸,󠁹,󠁺,󠁻,󠁼,󠁽,󠁾,󠁿,󠄀,󠄁,󠄂,󠄃,󠄄,󠄅,󠄆,󠄇,󠄈,󠄉,󠄊,󠄋,󠄌,󠄍,󠄎,󠄏,󠄐,󠄑,󠄒,󠄓,󠄔,󠄕,󠄖,󠄗,󠄘,󠄙,󠄚,󠄛,󠄜,󠄝,󠄞,󠄟,󠄠,󠄡,󠄢,󠄣,󠄤,󠄥,󠄦,󠄧,󠄨,󠄩,󠄪,󠄫,󠄬,󠄭,󠄮,󠄯,󠄰,󠄱,󠄲,󠄳,󠄴,󠄵,󠄶,󠄷,󠄸,󠄹,󠄺,󠄻,󠄼,󠄽,󠄾,󠄿,󠅀,󠅁,󠅂,󠅃,󠅄,󠅅,󠅆,󠅇,󠅈,󠅉,󠅊,󠅋,󠅌,󠅍,󠅎,󠅏,󠅐,󠅑,󠅒,󠅓,󠅔,󠅕,󠅖,󠅗,󠅘,󠅙,󠅚,󠅛,󠅜,󠅝,󠅞,󠅟,󠅠,󠅡,󠅢,󠅣,󠅤,󠅥,󠅦,󠅧,󠅨,󠅩,󠅪,󠅫,󠅬,󠅭,󠅮,󠅯,󠅰,󠅱,󠅲,󠅳,󠅴,󠅵,󠅶,󠅷,󠅸,󠅹,󠅺,󠅻,󠅼,󠅽,󠅾,󠅿,󠆀,󠆁,󠆂,󠆃,󠆄,󠆅,󠆆,󠆇,󠆈,󠆉,󠆊,󠆋,󠆌,󠆍,󠆎,󠆏,󠆐,󠆑,󠆒,󠆓,󠆔,󠆕,󠆖,󠆗,󠆘,󠆙,󠆚,󠆛,󠆜,󠆝,󠆞,󠆟,󠆠,󠆡,󠆢,󠆣,󠆤,󠆥,󠆦,󠆧,󠆨,󠆩,󠆪,󠆫,󠆬,󠆭,󠆮,󠆯,󠆰,󠆱,󠆲,󠆳,󠆴,󠆵,󠆶,󠆷,󠆸,󠆹,󠆺,󠆻,󠆼,󠆽,󠆾,󠆿,󠇀,󠇁,󠇂,󠇃,󠇄,󠇅,󠇆,󠇇,󠇈,󠇉,󠇊,󠇋,󠇌,󠇍,󠇎,󠇏,󠇐,󠇑,󠇒,󠇓,󠇔,󠇕,󠇖,󠇗,󠇘,󠇙,󠇚,󠇛,󠇜,󠇝,󠇞,󠇟,󠇠,󠇡,󠇢,󠇣,󠇤,󠇥,󠇦,󠇧,󠇨,󠇩,󠇪,󠇫,󠇬,󠇭,󠇮,󠇯\""

Given that the error says “parse”, maybe something downstream of Cheshire tried to parse the data and failed?
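One way to test that theory is to wrap each step of the pipeline so a failure names the step and carries the class and message of every exception in the cause chain. A sketch only: `try-step`, the step names and `send-profile` are hypothetical, and it relies on `ex-message`/`ex-cause` (Clojure 1.10+):

```clojure
(defn try-step
  "Calls (f x). If it throws, rethrows an ex-info naming the step, with
  the class and message of each exception in the cause chain as data."
  [step-name f x]
  (try
    (f x)
    (catch Exception e
      (throw (ex-info (str "Step failed: " step-name)
                      {:step   step-name
                       :causes (mapv (fn [^Throwable t]
                                       {:class   (.getName (class t))
                                        :message (ex-message t)})
                                     (take-while some? (iterate ex-cause e)))}
                      e)))))

;; Usage (rme and json/generate-string as in the original post;
;; send-profile stands in for whatever consumes the JSON):
;; (->> rme
;;      (try-step :generate-json json/generate-string)
;;      (try-step :send send-profile))
```

If `(:step (ex-data e))` comes back as something other than the JSON generation step, the “parse” error is coming from further down the line.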


Thank you for this. I should have tried that.