What if:
- You lost all your WordPress posts
- Luckily, you had subscribed to your own blog in Google Reader
- There is no database backup or any tool that could help you recover the lost files
Then you are in luck, because you can recover:
+ Post content that was fed to Google Reader
However, you cannot recover:
- Attachments
- Comments
- Other miscellaneous data linked to the posts
Here is how:
+ Log in to Google Reader, unstar all posts, then star every item that belongs to your blog
+ Next, we will export those starred items to an Atom XML file. But first, you need to grab your Google Reader ID. To do that, log in at http://www.google.com/reader/view/ and use View Source in Firefox or any other browser. Then search for _USER_ID in the source. In my case, the search result is:
_USER_ID = "10716493500706428020",
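If you saved the page source to a file, the ID can also be pulled out with a regular expression instead of searching by hand. This is only a minimal sketch: the helper name extract_user_id is hypothetical, and it assumes the ID appears in the same `_USER_ID = "digits"` form as the line above.

```ruby
# Hypothetical helper (not part of the original script): returns the numeric
# Google Reader ID found in the page source, or nil when no match is found.
# Assumes the source contains a line shaped like: _USER_ID = "1234",
def extract_user_id(page_source)
  match = page_source.match(/_USER_ID\s*=\s*"(\d+)"/)
  match && match[1]
end

sample = '_USER_ID = "10716493500706428020",'
puts extract_user_id(sample)
```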
Next, we need to generate the URL of the Atom XML feed, which is:
http://www.google.com/reader/atom/user/USER_ID/state/com.google/starred?n=NUM_POST
Replace USER_ID with your own ID and NUM_POST with the number of posts you want to retrieve. In my case, it is:
http://www.google.com/reader/atom/user/10716493500706428020/state/com.google/starred?n=800
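The substitution described above can be sketched as a small function. The name starred_feed_url is hypothetical; the URL pattern is the one given in this post.

```ruby
# Hypothetical helper: fills USER_ID and NUM_POST into the starred-items
# Atom URL pattern shown above.
def starred_feed_url(user_id, num_posts)
  "http://www.google.com/reader/atom/user/#{user_id}/state/com.google/starred?n=#{num_posts}"
end

puts starred_feed_url("10716493500706428020", 800)
```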
Next, I use an XML parser to convert the feed into an RSS feed, then import it with WordPress’s built-in Import RSS function. I use Ruby for this tutorial; please feel free to adapt the code to your favorite language:
#!/usr/bin/ruby
require 'rubygems'
require 'simple-rss'
require 'open-uri'
require 'builder'
require 'progressbar'

source = "http://www.google.com/reader/atom/user/10716493500706428020/state/com.google/starred?n=800" # url the google reader feed
content = ""
pbar = nil
feed = SimpleRSS.parse(open(source,
  :content_length_proc => lambda {|t|
    if t && 0 < t
      pbar = ProgressBar.new("Fetching Atom Feed", t)
      pbar.file_transfer_mode
    end
  },
  :progress_proc => lambda {|s|
    pbar.set s if pbar
  }))

xml = Builder::XmlMarkup.new( :target => File.open("rss.xml", "w"), :indent => 2 )
xml.instruct!
xml.rss("version" => "0.9.2") do
  xml.channel do
    xml.title feed.channel.title
    xml.link feed.channel.link
    xml.description "I am going to turn to RSS :)"
    xml.lastBuildDate feed.channel.updated
    xml.docs "http://backend.userland.com/rss092"
    xml.language "en"
    feed.items.each do | item |
      xml.item do
        xml.pubDate item.published
        xml.category item.category
        xml.title item.title
        c = item.content
        c.gsub!("\n",'')
        xml.description c
        xml.link item.link
      end
    end
  end
end
After running the script you will have rss.xml, but it is not a perfect RSS feed. Here is what you need to clean up in the file:
+ Remove the tag on the first line. I don’t know why instruct! creates that line when used with a File target; I will update the post if I find a fix
+ Search and replace
< >
with
> <
+ If you have images that link to an attachment page, search and replace each link with the image path itself. I use TextMate for this tedious task, using Search with Regular Expression.
The last step is to log in to your WP backend, go to Tools >> Import, and select the rss.xml file. All done 🙂
This is a great tutorial; however, I’ve run into a problem.
I get stuck because the script is redirected to a Google login page when I run it. How did you bypass this?
It’s likely because your USER_ID is not correct; paste the URL here if you can.
This is the url: http://www.google.com/reader/atom/user/11088277186832730355/state/com.google/starred?n=800
It takes me to the feed I’m expecting so I think it’s correct.
I could not access your feed, but you are supposed to get an RSS XML output if using Firefox. (use View Source)
The XML output works fine for me, so we ended up altering your script to just use the XML file. However, when I imported the final XML file into WordPress, it only imported parts of each post. Since I only have about 20 posts, I’m just going to go back through and copy and paste the rest of the text in.
Update: The reason it was only importing parts of posts is that there was a tag of
in the xml that was causing problems. I searched and replaced all of those with a simple space, imported it again, and it worked. Yay!
Glad you got it working. If you have an XML cleanser script, would you mind sharing it with me and everyone?
Thanks for this tutorial. Btw, I am working on a ‘sed’ script to sanitize the XML
You are overcomplicating this needlessly.
Example:
Your site is slashdot.org.
Then your feed URL is http://rss.slashdot.org/Slashdot/slashdot
The URL to view it in Google Reader is
http://www.google.com/reader/view/feed/http://rss.slashdot.org/Slashdot/slashdot
The URL of the Atom feed for the Google Reader cache of the feed is then simply
http://www.google.com/reader/atom/feed/http://rss.slashdot.org/Slashdot/slashdot?n=1000
The ‘?n=1000’ at the end indicates how many items you want the feed to provide.
You need to be logged in to view the feed, but you could subscribe to the feed and add it to a public tag, then view the feed of the public tag to get around that.
Up to 2000 items can be fetched this way; if you have more items, you’ll have to use the continuation parameter ( http://www.google.com/support/forum/p/reader/thread?tid=29c0f8a4afeced25&hl=en )
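The URL scheme in this comment can be sketched as a small function. The name cached_feed_url is hypothetical, and the sketch assumes the feed URL is appended as-is, exactly as in the slashdot example; whether Google Reader expects it to be URL-encoded is not covered here.

```ruby
# Hypothetical helper: builds the Google Reader cache URL for a feed,
# following the pattern from the slashdot example above.
# Assumption: the feed URL is appended without any extra encoding.
def cached_feed_url(feed_url, num_items = 1000)
  "http://www.google.com/reader/atom/feed/#{feed_url}?n=#{num_items}"
end

puts cached_feed_url("http://rss.slashdot.org/Slashdot/slashdot")
```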
Thanks for tip.
u just made my day!!!!!!!!!
Thanks for the awesome tutorial, but I’m having one problem. I managed to get the proper Google Reader link and saved the whole feed as XML, but I have no idea what to do with the code you posted in the code box. Could you explain it a bit?
The code uses the Simple-RSS library to fetch the XML output from the Google Reader URL, then uses the Builder class to compile an RSS file that WordPress can import. It is as simple as that.
How exactly do I use Simple-RSS then, is it a server-based application? In that case, I doubt my host offers Ruby X.X
I use Ruby 1.8.7-p174 locally on my MacBook Air, and it has nothing to do with whether something is server-based. You need to install Ruby 1.8.7 on Windows, OS X, or Linux, and then install simple-rss using: gem install simple-rss. It is as simple as that.
Thank you for the reply.
I tried running the script as you described above and it just showed a command prompt and nothing happened afterwards. I’m using Windows XP and I could try on Xubuntu too.
No answer so far, so I’ll just paste in what it says in the command prompt when I try to install simple-rss:
"X:\cardmagic-simple-rss-ef0d5db\cardmagic-simple-rss-ef0d5db\install.rb"
:29:in `require': no such file to load --
ftools (LoadError)
from :29:in `require'
from X:/cardmagic-simple-rss-ef0d5db/cardmagic-simple-rss-ef0d5db/install.rb:3:in `'
As indicated above, the script generates an rss.xml file in the current folder, so please run it in a writable folder. I haven’t installed simple-rss from source, so I can’t help you much there. What I did was simply: gem install simple-rss. Please use Xubuntu or any UNIX environment for the job; I haven’t tested the code on Windows yet.
So basically you mean I should run “gem install simple-rss” in a terminal; well, you might want to specify that next time.
So, I’ve managed to install simple-rss, and run the script, but now it says:
bash: /home/---/Desktop/RSS.rb: /usr/bin/ruby^M: bad interpreter: No such file or directory
*bump*
Help?
I ran into a similar bug where all my Ruby scripts failed with bad interpreter ^M. The ONLY fix I found was to create a new user and re-install Ruby ^_^
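The ^M in that error message is a carriage return, which usually means the script was saved with Windows (CRLF) line endings, so the shell looks for an interpreter literally named "/usr/bin/ruby" plus a carriage return. Converting the file to Unix (LF) line endings is worth trying before re-installing anything. A minimal sketch, where the helper name strip_carriage_returns is hypothetical:

```ruby
# Hypothetical helper: converts Windows (CRLF) line endings to Unix (LF),
# so the shebang line no longer ends in a stray carriage return.
def strip_carriage_returns(text)
  text.gsub("\r\n", "\n")
end

# Usage sketch: rewrite the script in place. Replace "RSS.rb" with the
# path of the file that triggers the error, then uncomment these lines.
# path = "RSS.rb"
# File.binwrite(path, strip_carriage_returns(File.binread(path)))
```

Run the conversion from a separate Ruby file (or irb), since the affected script itself cannot start until its first line is fixed.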
Thanks for your tips, however I have trouble running your script. Error msg below,
/var/lib/gems/1.8/gems/simple-rss-1.2.3/lib/simple-rss.rb:75:in `parse’: Poorly formatted feed (SimpleRSSError)
Any idea on how I can fix it? Thanks!
It seems your source is not a valid RSS feed. Are you sure you set the URL to the RSS source?
I guess you’re talking about the source variable in your script, here it is,
source = “http://www.google.com/reader/atom/user/11420598154031612632/state/com.google/starred?n=40”
I can see something like below on my browser,
+
Expanding “+” on the second line shows my blog context.
O.K., it seems simple-rss treats “” illegal. Any workaround besides modifying simple-rss.rb?
chenwj, I think you could add this line:
# encoding: utf-8
to the very top of the script, and please make sure you use Ruby 1.9.2 or newer.