Find today's RSS entries with feedparser

I’m trying to create the world’s most basic RSS reader - I just want to know the number (len) of entries that were created TODAY. I don’t care if anything older was modified today, just entries created today.

Should I even be using feedparser to do this? It seems I could just findAll tags with BeautifulSoup and match them against today’s date, but everyone insists I should be using feedparser. Note: I’m still new at this, so it’s hours of struggling either way.

Here’s where I’m at - this shows that I have 50 entries:

import feedparser
dgtw = feedparser.parse('https://investorshub.advfn.com/boards/rss.aspx?board_id=22658')
print(len(dgtw['entries']))

This just shows the published date of the first entry:

print(dgtw.entries[0].published)

I just want to find all the published dates that match today’s date and get the count (len) back.

I don’t see anything in the docs about this specifically: https://pythonhosted.org/feedparser/date-parsing.html

Hi @biksco, welcome to the forum!

feedparser returns two values that hold the published date for the item: entries[i].published and entries[i].published_parsed.

published retains the date string as it appears in the feed, e.g. Monday, June 10 2019. published_parsed, on the other hand, is the same date parsed into a standard Python time tuple (the same nine fields as time.struct_time), normalized to UTC.
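
To make the time tuple concrete, here is a small standard-library sketch. time.gmtime(0) is only a stand-in for an entry's published_parsed value (an assumption for illustration; a real value would come from the feed).

```python
import time

# Stand-in for entry.published_parsed: the Unix epoch as a UTC time tuple.
stamp = time.gmtime(0)

# The fields can be read by index, by slice, or by attribute name.
year, month, day = stamp[:3]
print(year, month, day)            # 1970 1 1
print(stamp.tm_year == stamp[0])   # True
```

The first three fields are always year, month and day, which is all a same-day comparison needs.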

So, what we can do is this:

import feedparser
from time import gmtime

now = gmtime()
print(now[0],now[1],now[2]) # the current YEAR MONTH DAY

dgtw = feedparser.parse('https://investorshub.advfn.com/boards/rss.aspx?board_id=22658')

pub_today = 0
for entry in dgtw['entries']:
  #print(entry.published_parsed[:3]) # print entry (YEAR, MONTH, DAY) for debugging
  # the first three fields of a time tuple are year, month, day
  if entry.published_parsed[:3] == now[:3]:
    pub_today += 1

print("There were %d entries published today!" % pub_today)

What we’re doing is getting the value of gmtime(), which returns a time tuple with the current date and time in UTC, then iterating over the entries and comparing the year, month and day fields of that tuple to the same fields of each entry’s published_parsed.

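Edit 2: I’ve used gmtime() in this example, which returns the current time in UTC, matching the UTC values in published_parsed. If you want today to mean your local day, e.g. your local timezone is Eastern Standard, note that swapping in localtime() on its own compares a local date against UTC dates, so entries published near midnight can be miscounted. Each entry’s time would need converting to local time as well.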
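
A minimal sketch of comparing by local day, assuming published_parsed is a UTC time tuple. The helper names local_date and is_today_local are made up for this example and are not part of feedparser.

```python
import calendar
import time

def local_date(utc_tuple):
    """Convert a UTC time tuple to (year, month, day) in the local timezone."""
    seconds = calendar.timegm(utc_tuple)   # UTC tuple -> seconds since the epoch
    return time.localtime(seconds)[:3]     # epoch seconds -> local date fields

def is_today_local(utc_tuple, now=None):
    """True if the UTC time tuple falls on the current local calendar day.

    `now` is epoch seconds; None means the current time.
    """
    today = time.localtime(now)[:3]
    return local_date(utc_tuple) == today
```

In the loop above, the comparison would then become something like `if is_today_local(entry.published_parsed):`.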

If they match, the entry was published today and so we increment the counter. If not, we don’t.

Does that make sense?

Edit:

Here are the relevant docs for the stuff I’ve used.

gmtime(): https://docs.python.org/3/library/time.html#time.gmtime
struct_time(): https://docs.python.org/3/library/time.html#time.struct_time (this is info on the time tuple!)

entries[i].published: https://pythonhosted.org/feedparser/reference-entry-published.html#reference-entry-published
entries[i].published_parsed: https://pythonhosted.org/feedparser/reference-entry-published_parsed.html


Thank you pitycoder for taking PITY on me. :) Yup, it works! Thanks for putting the links to what I’m supposed to know.

The last thing I’m trying to do is put the actual number into a variable (this seems to work):

total = ("%d" % pub_today)
print(total)

I used localtime. The feed itself is set to GMT, but everything should convert fine in the background, right?

I got a couple of suggestions on how to do this around the Internet, and yours is the most succinct!

Oh yea, I’m hoping I can use your code for other feeds. It should work as is? Anything I would have to look out for?

You already have the number of published articles in a variable - that’s what pub_today is.

Those two lines of code are effectively converting pub_today from an int to a string, assigning the result to a new variable called total, and printing it. If all you want to do is print the count of articles published today, you can just do print(pub_today). :)
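
To illustrate the difference, here is a tiny sketch. The value 3 is just an example; in the thread it comes from the loop, and label is a made-up name.

```python
pub_today = 3  # example value only

total = pub_today          # still an int, so it can be used in arithmetic
label = "%d" % pub_today   # a str, only useful for display

print(type(total).__name__, type(label).__name__)   # int str
print(total + 1)                                    # 4
```

Keeping the int around is usually more useful; it can always be formatted into a string at the point where it is printed.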

I’m fairly sure the output from feedparser.parse is pretty standard, and I presume that either RSS is itself relatively standard or the people behind feedparser have accounted for any variability, so that snippet should work for other feeds too. If anything isn’t as expected, you now have the basic premise of how to go about it, so I’d imagine you can tweak the code as necessary. :)
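
One thing that might be worth guarding against with other feeds: as far as I know, an entry can lack a publication date, or carry one feedparser can't parse, in which case published_parsed may be missing. A defensive sketch, where count_published_today is a hypothetical helper name and entries are assumed to behave like dicts (feedparser entries do support .get):

```python
import time

def count_published_today(entries, today=None):
    """Count entries whose published date equals today's UTC date.

    `today` is a (year, month, day) tuple; None means the current UTC date.
    Entries without a parsed publication date are skipped.
    """
    if today is None:
        today = time.gmtime()[:3]
    count = 0
    for entry in entries:
        parsed = entry.get("published_parsed")  # None if the key is missing
        if parsed is not None and tuple(parsed[:3]) == tuple(today):
            count += 1
    return count
```

With the feed from this thread that would be `count_published_today(dgtw.entries)`.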
