Data pulled from the web is seen as a string (even though it is in list format)

Hey everyone,

I was tinkering with the “Become a Pokemon Master” project, and thought it would be a good occasion to learn some web-scraping, import the official stats of Pokémon, and create objects based on them.

So I did a few lessons on the Beautiful Soup module, chose this website, and went to work. The reason I chose that page is that if you look at its source, you will see on line 157 something that looks a lot like a Python list:

<span id=sourcehash class=sourcehash>[[1,[128,118,111,20,10,1115,12,4,1,null]],[2,[155,151,143,10,7,1699,12,4,1,null]],[3,[190,198,189,5,5,2720,12,4,1,3]],[4,[118,116,93,20,10,980,10,null,1,null]],[5,[151,158,126,10,7,1653,10,null,1,null]],[6,[186,223,173,5,5,2889,10,3,1,6]],[7,[127,94,121,20,10,946,11,null,1,null]],[8,[153,126,155,10,7,1488,11,null,1,null]],[9,[188,171,207,5,5,2466,11,null,1,9]],[10,[128,55,55,50,20,437,7,null,1,null]],[11,[137,45,80,25,9,450,7,null,1,null]],[12, ...

And indeed, if I simply create a list and copy/paste the content of that tag, I can easily extract all the data I want and turn it into a dictionary:

null = None
mega_list_example = [[1,[128,118,111,20,10,1115,12,4,1,null]],[2,[155,151,143,10,7,1699,12,4,1,null]],[3,[190,198,189,5,5,2720,12,4,1,3]],[4,[118,116,93,20,10,980,10,null,1,null]],[5,[151,158,126,10,7,1653,10,null,1,null]],[6,[186,223,173,5,5,2889,10,3,1,6]]]

key_list = ['Stamina', 'Attack', 'Defense', 'Capture %', 'Flee %', 'Max CP', 'Type 1', 'Type 2', 'Generation', 'k10']
type_names = ['Normal', 'Fighting', 'Flying', 'Poison', 'Ground', 'Rock', 'Bug', 'Ghost', 'Steel', 'Fire', 'Water', 'Grass', 'Electric', 'Psychic', 'Ice', 'Dragon', 'Dark', 'Fairy']
dictionary = {}

for item in mega_list_example:
    dic2 = {k: v for k, v in zip(key_list, item[1])}
    # Overwriting the type numbers in the Pokemon dictionaries with their corresponding name in the list type_names
    dic2['Type 1'] = type_names[int(dic2['Type 1']-1)]
    try:
        dic2['Type 2'] = type_names[int(dic2['Type 2']-1)]
    except TypeError:
        # Type 2 is None for single-type Pokemon, and None - 1 raises TypeError; leave it as None
        pass
    # Adding Pokemon stats to dictionary and ending the loop cycle
    dictionary[item[0]] = dic2

for keys, values in dictionary.items():
    print(keys, ": ", values)

The problem occurs when I try to work with the real imported data. I have managed to extract that massive list and save it under ‘mega_list1’, but when I try to work with it, it simply won’t let me. I printed that object’s type, and Python recognises it as a string:

import requests
from bs4 import BeautifulSoup

webpage_response = requests.get('')
soup = BeautifulSoup(webpage_response.content, 'html.parser')
span_stats = str(soup.find('span', id='sourcehash'))
null = None
mega_list1 = span_stats[span_stats.find('>[')+1:span_stats.find(']<')+1]

Is there a way I can make Python recognise this new imported object as a list?
I tried using the compile function but that didn’t work.

You ARE reading TEXT, not lists.
If it’s JSON, parse it as such. See the json module.
If it’s a valid Python literal, you can also use ast.literal_eval, but that’s a bit suspect (why would a web page specifically emit Python?), and you’d probably still want to parse it as JSON instead.
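To make the difference concrete, here is a minimal sketch using only the standard library. The string raw_text is a shortened, hand-copied sample in the same shape as the span content from the question, not the real download, and the variable names are just for illustration.

```python
import ast
import json

# Shortened sample in the same shape as the span content (an assumption, not the full page data).
raw_text = '[[1,[128,118,111,20,10,1115,12,4,1,null]],[4,[118,116,93,20,10,980,10,null,1,null]]]'

# json.loads turns the text into real Python lists; JSON null becomes None.
parsed = json.loads(raw_text)
print(type(raw_text).__name__, "->", type(parsed).__name__)
print(parsed[1][1][7])

# ast.literal_eval only accepts Python literals, so the bare name null is rejected.
try:
    ast.literal_eval(raw_text)
    literal_eval_ok = True
except (ValueError, SyntaxError):
    literal_eval_ok = False
print("literal_eval worked:", literal_eval_ok)
```

Because the data uses null, it reads as JSON rather than as a Python literal, which is why the json module is the more natural fit here and the `null = None` trick is not needed.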

Also, you wouldn’t want the whole tag, only the content of the tag, so ask Beautiful Soup for that rather than converting the tag to str.
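Putting both suggestions together might look like the sketch below. The html string is a hypothetical stand-in for the downloaded page so that the example runs without a network call; with the real page you would pass webpage_response.content to BeautifulSoup instead.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (only two entries, copied from the question).
html = (
    '<html><body><span id="sourcehash" class="sourcehash">'
    '[[1,[128,118,111,20,10,1115,12,4,1,null]],[2,[155,151,143,10,7,1699,12,4,1,null]]]'
    '</span></body></html>'
)

soup = BeautifulSoup(html, 'html.parser')
span = soup.find('span', id='sourcehash')

# get_text() returns only the text inside the tag, without the <span ...> markup,
# so no string slicing on '>[' and ']<' is needed.
mega_list1 = json.loads(span.get_text())

for number, stats in mega_list1:
    print(number, stats)
```

After this, mega_list1 is an ordinary list of [number, stats] pairs and can go through the same dictionary-building loop as the copy/pasted example.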
