I am scraping a webpage that is encoded in UTF-8, using a Node.js Replit. When I print the content to the console, I get replacement characters instead of umlauts. What am I doing wrong?
Can you show an image?
Maybe the console and shell use a different encoding, one that doesn't support umlauts.
I wrote it to a file to see if it's the console. The downloaded file still shows replacement characters.
UTF-8 probably doesn’t support umlauts then
Oh it does. Look here: äöü.
Anyways, the issue was that the response.text() function decodes the body as UTF-8, while the page was actually served as ISO-8859-1. Fetching the raw bytes and decoding them with a text decoder set to ISO-8859-1 solved the issue.
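For anyone landing here later, a minimal sketch of that fix, assuming a fetch-style response (the global `fetch` in Node.js 18+) and a Node.js build with full ICU so that `TextDecoder` accepts the `iso-8859-1` label. The names `decodeBody` and `fetchLatin1` are made up for illustration and are not from my actual code.

```javascript
// Decode raw bytes with an explicit charset instead of relying on
// response.text(), which decodes the body as UTF-8.
// decodeBody is a hypothetical helper name.
function decodeBody(bytes, charset = "iso-8859-1") {
  return new TextDecoder(charset).decode(bytes);
}

// "äöü" as single bytes, the way an ISO-8859-1 page would send them.
const latin1Bytes = new Uint8Array([0xe4, 0xf6, 0xfc]);

// Decoding those bytes as UTF-8 yields replacement characters (U+FFFD).
const wrong = new TextDecoder("utf-8").decode(latin1Bytes);

// Decoding them as ISO-8859-1 yields the umlauts.
const right = decodeBody(latin1Bytes);

console.log(JSON.stringify(wrong), right);

// Hypothetical usage with fetch: read the body as bytes, then decode.
async function fetchLatin1(url) {
  const response = await fetch(url);
  const buffer = await response.arrayBuffer();
  return decodeBody(new Uint8Array(buffer));
}
```

If you don't know the charset up front, it may be worth checking the `Content-Type` response header or the page's `<meta charset>` tag before picking a decoder.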