Hacker News
You should be saving RSS files in archive.org
19 points by renegat0x0 on Jan 18, 2023 | 5 comments
Yesterday I was going through all the links in the BBC, CNN, and Guardian RSS feeds stored during the pandemic, for research purposes. I downloaded the archived RSS files from archive.org, for example for http://feeds.bbci.co.uk/news/rss.xml. Unfortunately, I noticed that some days are missing from archive.org.
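
For anyone who wants to check their own feeds for gaps like this, here is a minimal sketch in Python. It assumes the Wayback Machine CDX endpoint at https://web.archive.org/cdx/search/cdx with the `url`, `from`, `to`, `output`, `fl` and `collapse` query parameters; the function names are my own and not part of any library, and this is not necessarily how the check above was done.

```python
import json
from datetime import date, timedelta
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public Wayback Machine CDX endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(feed_url, start, end):
    """Build a CDX query asking for at most one snapshot timestamp per day."""
    params = {
        "url": feed_url,
        "from": start.strftime("%Y%m%d"),
        "to": end.strftime("%Y%m%d"),
        "output": "json",
        "fl": "timestamp",
        "collapse": "timestamp:8",  # collapse on the YYYYMMDD prefix
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def fetch_timestamps(feed_url, start, end):
    """Download snapshot timestamps (network call, not exercised here)."""
    with urlopen(cdx_query_url(feed_url, start, end), timeout=60) as resp:
        rows = json.load(resp)
    return [row[0] for row in rows[1:]]  # the first row holds the field names


def missing_days(timestamps, start, end):
    """Return the dates in [start, end] that have no snapshot.

    Timestamps are strings such as "20200315120000" (YYYYMMDDhhmmss);
    only the first eight characters are used.
    """
    seen = {ts[:8] for ts in timestamps}
    gaps = []
    day = start
    while day <= end:
        if day.strftime("%Y%m%d") not in seen:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps
```

Calling `missing_days(fetch_timestamps(url, start, end), start, end)` would then list the days with no stored copy of the feed.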

I think it is a good idea, if you have a favorite site that has an RSS source, to make a call to store it regularly in archive.org for 'the next generations'. https://help.archive.org/help/save-pages-in-the-wayback-machine/.

I have updated my RSS client to make such a call every day, even for my favorite YouTube RSS sources.
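
A rough sketch of what such a daily call could look like in Python, not my client's actual code. It assumes the Save Page Now form where the page address is appended to https://web.archive.org/save/; the helper names, the User-Agent string and the one-day interval are made up for illustration.

```python
import time
from urllib.request import Request, urlopen

# Assumption: Save Page Now accepts the target address appended to this prefix.
SAVE_PREFIX = "https://web.archive.org/save/"
ONE_DAY = 24 * 60 * 60  # seconds


def save_url(feed_url):
    """Return the address that asks the Wayback Machine to capture feed_url."""
    return SAVE_PREFIX + feed_url


def is_due(last_saved, now, interval=ONE_DAY):
    """True if the feed was never saved or the interval has elapsed."""
    return last_saved is None or now - last_saved >= interval


def archive_feeds(feed_urls, last_saved, now=None):
    """Request a capture for each feed that is due.

    last_saved maps feed URL -> epoch seconds of the last successful request
    and is updated in place, so no feed is submitted more than once per day.
    """
    now = time.time() if now is None else now
    for url in feed_urls:
        if not is_due(last_saved.get(url), now):
            continue
        # Hypothetical User-Agent value, chosen only for this sketch.
        req = Request(save_url(url), headers={"User-Agent": "rss-archiver-sketch"})
        with urlopen(req, timeout=60) as resp:
            if resp.status == 200:
                last_saved[url] = now
```

Keeping the `last_saved` map between runs (for example in the client's database) is what limits each feed to one request per day.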

Even if 'link rot' gets in the way, it might be insightful to see at least the link, title, or description of an RSS entry. Historians might find it useful some day.



I have contacted the Nextcloud 'News' plugin developer (https://github.com/nextcloud/news/discussions/2066) to request an automated call to archive.org.

I have received a response:

I don't think archive.org would be very happy if every nextcloud/news install would request they store the CNN frontpage every hour for example. I think this is better suited in feed creation software since that would know exactly when a feed is updated.


It sounds like something that could be automated. I know that they proactively archive certain interesting content regularly. Maybe feeds could be another source of that. They could track feeds for submitted pages and consider subscribing to and archiving feeds that hit a certain popularity threshold.
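
To make the threshold idea concrete, a purely hypothetical sketch in Python. Nothing here reflects how archive.org actually works; the class name and the threshold value are invented for illustration.

```python
from collections import Counter


class FeedPopularity:
    """Count page submissions per feed and subscribe once a threshold is hit."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = Counter()
        self.subscribed = set()

    def record_submission(self, feed_url):
        """Count one submitted page belonging to feed_url.

        Returns True only on the submission that makes the feed newly
        subscribed, so the caller can start archiving it from then on.
        """
        self.counts[feed_url] += 1
        if self.counts[feed_url] >= self.threshold and feed_url not in self.subscribed:
            self.subscribed.add(feed_url)
            return True
        return False
```

With `FeedPopularity(threshold=2)`, the second submission for the same feed would trigger the subscription and later ones would not trigger it again.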


Sounds reasonable, but what if the page that you are interested in is not in the crosshairs of archive.org? What if it is niche? Archive.org provides an API to store the pages you are interested in; it is like that by design. I think they ultimately decide what to store by checking how many requests are made.


I'm not saying that you shouldn't archive manually as well. Just that it would be a good idea to also have detection and auto-archiving of popular feeds.


Feels like a waste of their resources



