Error converting '\xa0' to GBK when writing a UTF-8 string from a Python crawler


#1

As the title says, I need to crawl a large number of Chinese, UTF-8 encoded pages, and I get an encoding error when I save them.

The solution I found on Google is:

Replace '\xa0' (the non-breaking space, &nbsp;) with an ordinary space ' ':

print(item['detail'][i].replace(u'\xa0 ', u' '))
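For reference, here is a minimal sketch of what that replacement does. The sample string is made up for illustration. Note that `str.replace` only matches the exact pattern, so if the trailing space inside `u'\xa0 '` above is really in the script (and not just a copy artifact), a non-breaking space that is not followed by a normal space would be left untouched.

```python
import unicodedata

NBSP = "\xa0"  # U+00A0, what HTML writes as &nbsp;

# Confirm which character we are dealing with.
print(unicodedata.name(NBSP))

# Made-up sample text containing a lone non-breaking space.
sample = "第一章\xa0标题"

# Pattern is exactly the non-breaking space: it gets replaced.
cleaned = sample.replace(NBSP, " ")

# Pattern is "non-breaking space + normal space": nothing matches here,
# so the string comes back unchanged and still contains \xa0.
unchanged = sample.replace("\xa0 ", " ")

print(NBSP in cleaned, NBSP in unchanged)
```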

But it still fails. The error output below will probably look familiar to the experts here:

D:\Users\15806.DESKTOP-A9HK574\Anaconda\python.exe C:/Users/15806.DESKTOP-A9HK574/Desktop/工作站-代码/python项目/网页爬虫初步/e-book.py
Traceback (most recent call last):
  File "C:/Users/15806.DESKTOP-A9HK574/Desktop/工作站-代码/python项目/网页爬虫初步/e-book.py", line 57, in <module>
    system_write(getBook(i)[0],getBook(i)[1])
  File "C:/Users/15806.DESKTOP-A9HK574/Desktop/工作站-代码/python项目/网页爬虫初步/e-book.py", line 51, in system_write
    f.writelines(data)
UnicodeEncodeError: 'gbk' codec can't encode character '\xa0' in position 0: illegal multibyte sequence

Process finished with exit code 1
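The error can be reproduced without the crawler. The sketch below assumes nothing beyond the standard library, and the sample text is made up. It shows that the GBK codec has no mapping for '\xa0', while UTF-8 encodes it without trouble. In Python 3, `open()` without an `encoding` argument normally uses the locale's preferred encoding, which on a Chinese-locale Windows machine is typically GBK (cp936); that would explain why a plain `open(road,'a+')` ends up in the 'gbk' codec.

```python
import locale

# Encoding that open() falls back to when no encoding= is given
# (machine dependent, so no expected value is shown here).
print(locale.getpreferredencoding(False))

text = "\xa0正文"  # made-up sample starting with a non-breaking space

error = None
try:
    text.encode("gbk")
except UnicodeEncodeError as exc:
    error = exc
    print(exc)

# UTF-8 can represent every Unicode character, including \xa0.
utf8_bytes = text.encode("utf-8")

# If the output really must be GBK, errors="replace" substitutes "?"
# for unencodable characters instead of raising (this loses data).
lossy_bytes = text.encode("gbk", errors="replace")
```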

Note: you do not need to read the Chinese in the file paths.

It is still the same familiar error.

I have already replaced the offending \xa0, yet the error is still reported. The function that causes the problem is as follows:

def system_write(title,page):
    road=str(title.replace(" ", "_")+'.txt')
    #尝试替换utf-8中gbk没有的\xa0字符为空格
    #Try to replace the \xa0 character, which exists in UTF-8 but not in GBK, with a space
    data=page.replace(u'\xa0 ', u' ')

    #print('part 1 can work')
    with open(road,'a+') as f:
        #f.writelines('\n')
        f.writelines(data)
        f.seek(0)
        cNames=f.readlines()
    print(road+' 已下载完成')
    #Prints "<file name> download complete"
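One possible direction, offered as a sketch rather than a confirmed fix: pass an explicit `encoding='utf-8'` to `open()` so the write no longer depends on the locale default. The name `system_write_utf8` and the `directory` parameter are hypothetical and only there for illustration. The sketch also uses `write` instead of `writelines`, because `writelines` on a single string writes it one character at a time.

```python
import os


def system_write_utf8(title, page, directory="."):
    """Hypothetical variant of system_write that always writes UTF-8."""
    # Same file naming rule as the original function.
    road = os.path.join(directory, title.replace(" ", "_") + ".txt")

    # Optional once the file is UTF-8, but it normalises the
    # non-breaking space to an ordinary space.
    data = page.replace("\xa0", " ")

    # Explicit encoding, so the locale default (for example GBK) is not used.
    with open(road, "a", encoding="utf-8") as f:
        f.write(data)
    return road
```

A file written this way has to be read back with `encoding='utf-8'` as well, and an editor that assumes GBK may show it garbled until it is told the file is UTF-8.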

In this folder I can see that a file with the expected name has been created. However, it opens as an empty document, which suggests that the error already occurs when the Chinese text is written to the file.

Could someone who knows how to solve this please help me out?
[Although I think you may not run into this problem, because you are using GBK anyway]

Finally:

My English is not very good, so this post was written with Google Translate. Where it is not fluent, please guess at the meaning.
I have added an English translation to each Chinese comment.