Why am I getting a weird error when I try to write bs4 output to a file?


I am getting a weird error when trying to write output to a file, and I am stuck. Thanks for your help.
import urllib2
import bs4

# open the web page

response = urllib2.urlopen('http://npr.org')

# get the response text

raw_html = response.read()

# use BeautifulSoup to convert raw text into beautiful soup object

soup = bs4.BeautifulSoup(raw_html)

# convert soup to plain text

plain_text = soup.get_text()

f = open('html.txt', 'w')
f.write(plain_text)
The error is:
Traceback (most recent call last):
File "C:\Users\Owner\Documents\Markrov chain\Fetch_data.py", line 21, in
f.write(plain_text )
UnicodeEncodeError: 'ascii' codec can't encode characters in position 19698-19700: ordinal not in range(128)


First off, when you post code, make sure it's intact so that others can copy it and run it to get the same result.

I don't understand how unicode works, I don't know what it means to decode/encode unicode, but I believe that what you need to do is this:
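
A minimal sketch of that fix, assuming plain_text is a unicode object like the one soup.get_text() returns. The sample string and the temporary output path are made up for illustration, so the snippet runs without fetching a page:

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Stand-in for soup.get_text(): a unicode string with non-ASCII characters
# (assumption: the real page text contains characters like these).
plain_text = u'caf\u00e9 \u2013 na\u00efve'

# Hypothetical output location; the question used 'html.txt' in the current directory.
out_path = os.path.join(tempfile.mkdtemp(), 'html.txt')

# Option 1: encode to UTF-8 bytes yourself and write them in binary mode.
with open(out_path, 'wb') as f:
    f.write(plain_text.encode('utf-8'))

# Option 2: let io.open do the encoding (works in Python 2 and Python 3).
with io.open(out_path, 'w', encoding='utf-8') as f:
    f.write(plain_text)

# Read the file back, decoding from UTF-8, to check nothing was lost.
with io.open(out_path, 'r', encoding='utf-8') as f:
    round_trip = f.read()
```

Either option should avoid the UnicodeEncodeError, because the text is converted to UTF-8 explicitly instead of falling back to the ascii codec.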


Also, I tried doing the same thing in Python 3, where unicode is the default string class (it was renamed to str), and there it "worked on its own".

I believe (keyword: believe, this might not be correct) that file.write expects to receive bytes (8 bits each) that can be written to the file immediately. When you iterate through a unicode object, that's not what you get; you get characters, which can vary in size (in number of bits). So what you need is the bytes for your unicode string (they won't correspond 1:1 to characters, since this is unicode), which is what calling unicode.encode('utf-8') gives you. Those bytes can then be written to the file and all is well.
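
To make the characters-versus-bytes point concrete, here is a small sketch with a made-up sample string (not the actual page text). It compares the number of characters with the number of UTF-8 bytes and shows that the ascii codec rejects the same text:

```python
# -*- coding: utf-8 -*-

# Illustrative sample only: 12 characters, three of them non-ASCII.
text = u'na\u00efve \u2013 caf\u00e9'

# Encoding turns the characters into a sequence of bytes.
encoded = text.encode('utf-8')

char_count = len(text)     # number of characters
byte_count = len(encoded)  # number of bytes; larger, since non-ASCII characters need 2-3 bytes

# The ascii codec can only represent code points below 128, so it fails here.
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

Here char_count is 12 and byte_count is 16, and the ascii attempt raises the same kind of UnicodeEncodeError as in the question.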

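Python 2's str class is a plain byte string, one byte per element, nothing fancy whatsoever. When you hand a unicode object to something that expects a str, Python 2 tries to encode it with the ascii codec, and that fails as soon as the text contains a non-ASCII character.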


This works. Thank you so much.