How to recover WordPress posts from Google Reader

What if:

  • You lost all your WordPress posts
  • You happen to have subscribed to your own blog in Google Reader
  • There is no database backup and no tool that can help you recover the lost posts

Then you are in luck, because you can recover:

  • The post content that was fed to Google Reader

However, you cannot recover:

  • Attachment posts
  • Comments
  • Other miscellaneous data linked to the posts

Here is how:

+ Log in to Google Reader, unstar all items, and then star every item that belongs to your blog.
+ Next, we will export all those starred items to an Atom XML file. But first, you need to find your Google Reader user ID. To do that, log in at http://www.google.com/reader/view/, view the page source in Firefox or any other browser, and search the source for _USER_ID. In my case, the search result is:

_USER_ID = "10716493500706428020",

Next, build the URL of the Atom XML feed:

http://www.google.com/reader/atom/user/USER_ID/state/com.google/starred?n=NUM_POST

Replace USER_ID with your user ID and NUM_POST with the number of posts you want to retrieve. In my case, it is:

http://www.google.com/reader/atom/user/10716493500706428020/state/com.google/starred?n=800
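If you would rather not copy the ID and build the URL by hand, the two steps above can be scripted. This is a minimal sketch, assuming you saved the page source of http://www.google.com/reader/view/ to a local file (the file name reader.html and both method names are hypothetical) and that the ID appears in the `_USER_ID = "..."` form shown above.

```ruby
# Sketch: pull _USER_ID out of a saved Google Reader page and build the
# starred-items Atom URL. File name and method names are hypothetical.

# Returns the numeric user ID found in the page source, or nil if absent.
def reader_user_id(page_source)
  match = page_source.match(/_USER_ID\s*=\s*"(\d+)"/)
  match && match[1]
end

# Builds the starred-items Atom URL; num_posts becomes the n= parameter.
def starred_feed_url(user_id, num_posts)
  "http://www.google.com/reader/atom/user/#{user_id}/state/com.google/starred?n=#{num_posts}"
end

# Only runs when the saved page is present in the current folder.
if File.exist?("reader.html")
  id = reader_user_id(File.read("reader.html"))
  puts(id ? starred_feed_url(id, 800) : "_USER_ID not found")
end
```

With the ID from my account and n=800, starred_feed_url returns exactly the URL shown above.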

Next, I use an XML parser to convert the Atom feed into an RSS feed, which can then be imported with WordPress’s built-in RSS importer. I use Ruby for this tutorial; feel free to adapt the code to your favorite language:

#!/usr/bin/env ruby
require 'rubygems'
require 'simple-rss'
require 'open-uri'
require 'builder'
require 'progressbar'

# URL of the Google Reader Atom feed holding your starred items
source = "http://www.google.com/reader/atom/user/10716493500706428020/state/com.google/starred?n=800"

pbar = nil

# Download and parse the Atom feed, showing a progress bar while fetching.
# URI.open needs Ruby 2.5 or newer; on Ruby 1.8/1.9 call plain open(...) instead.
feed = SimpleRSS.parse(URI.open(source,
      :content_length_proc => lambda {|t|
        if t && 0 < t
          pbar = ProgressBar.new("Fetching Atom Feed", t)
          pbar.file_transfer_mode
        end
      },
      :progress_proc => lambda {|s|
        pbar.set s if pbar
      }))
pbar.finish if pbar

# Write rss.xml. The block form of File.open closes (and flushes) the file
# when the block ends instead of leaving that to the interpreter at exit.
File.open("rss.xml", "w") do |file|
  xml = Builder::XmlMarkup.new(:target => file, :indent => 2)
  xml.instruct!
  xml.rss("version" => "0.92") do
    xml.channel do
      xml.title feed.channel.title
      xml.link  feed.channel.link
      xml.description "I am going to turn to RSS :)"
      xml.lastBuildDate feed.channel.updated
      xml.docs "http://backend.userland.com/rss092"
      xml.language "en"

      feed.items.each do |item|
        xml.item do
          xml.pubDate item.published
          xml.category item.category
          xml.title item.title
          # Post body with newlines stripped; to_s guards against items without content
          xml.description item.content.to_s.gsub("\n", '')
          xml.link item.link
        end
      end
    end
  end
end

Running the script produces rss.xml, but it is not a perfect RSS feed yet. Here is how to clean up the file:

+ Remove the tag on the first line. I don’t know why instruct! creates that line when writing to a file; I will update the post if I find a fix.
+ Search and replace:

&lt; with <
&gt; with >

+ If you have images that link to attachment pages, search and replace those links with the image path itself. I use TextMate’s regular-expression search for this tedious task.
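The three clean-up steps above can also be done in one pass with a short Ruby script instead of a text editor. This is a minimal sketch under some assumptions: the unwanted tag occupies the whole first line of rss.xml, attributes use double quotes, and each attachment link wraps a single img tag. The method names and the output file name rss-clean.xml are hypothetical, and the image rewrite drops any other attributes on the a tag.

```ruby
# Sketch: apply the manual clean-up steps to rss.xml in one pass.
# Assumptions (hypothetical, adjust to your file): the unwanted tag is the
# whole first line, and attributes use double quotes.

# Matches <a href="ANY" ...><img src="IMG" ...></a>;
# group 1 is the <img> tag, group 2 its src URL.
ATTACHMENT_LINK = %r{<a\s[^>]*?href="[^"]*"[^>]*>\s*(<img\s[^>]*?src="([^"]+)"[^>]*>)\s*</a>}i

# Step 1: drop the first line of the file.
def drop_first_line(text)
  text.lines.drop(1).join
end

# Step 2: turn the escaped angle brackets back into real ones.
def unescape_brackets(text)
  text.gsub("&lt;", "<").gsub("&gt;", ">")
end

# Step 3: point each image link at the image itself instead of the
# attachment page. Other attributes on the <a> tag are dropped.
def link_images_directly(html)
  html.gsub(ATTACHMENT_LINK) { %(<a href="#{$2}">#{$1}</a>) }
end

# Runs the steps in order; step 3 needs the real tags produced by step 2.
def clean_rss(text)
  link_images_directly(unescape_brackets(drop_first_line(text)))
end

# Writes the cleaned copy next to the original rather than overwriting it.
if File.exist?("rss.xml")
  cleaned = clean_rss(File.read("rss.xml"))
  File.open("rss-clean.xml", "w") { |f| f.write(cleaned) }
end
```

If you use this, import rss-clean.xml instead of rss.xml. Check a few posts afterwards, since a regular expression will miss links that do not match the assumed format.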

Finally, log in to your WordPress dashboard, go to Tools >> Import, choose RSS, and select the rss.xml file. All done 🙂


About Jones Lee

Nothing much about me..

26 responses to “How to recover WordPress posts from Google Reader”

  1. Lydia

    This is a great tutorial; however, I’ve run into a problem.

    I get stuck because it’s being redirected to a login page for google when I run the script. How did you bypass this?

  2. Jones Lee

    I could not access your feed, but you are supposed to get an RSS XML output if using Firefox. (use View Source)

  3. Lydia

    The xml output works fine for me. So we ended up altering your script to just use the xml file. However when I import the final xml file into wordpress it only imported parts of each post. Since I only have about 20 posts, I’m just going to go back through and copy and paste the rest of text in.

  4. Lydia

    Update: The reason it was only importing parts of posts is that there was a tag of

     

    in the xml that was causing problems. I searched and replaced all of those with a simple space, imported it again, and it worked. Yay!

  5. Suresh

    Thanks for this tutorial. Btw, I am working on a ‘sed’ script to sanitize the XML

  6. You’re overcomplicating this needlessly.
    Example:
    Your site is slashdot.org.

    Then your feed URL is http://rss.slashdot.org/Slashdot/slashdot

    The URL to view it in Google Reader is
    http://www.google.com/reader/view/feed/http://rss.slashdot.org/Slashdot/slashdot

    The URL of the Atom feed for the Google Reader cache of the feed is then simply
    http://www.google.com/reader/atom/feed/http://rss.slashdot.org/Slashdot/slashdot?n=1000
    The ‘?n=1000’ at the end indicates how many items you want the feed to provide.
    You need to be logged in to view the feed, but you can get around that by subscribing to the feed, adding it to a public tag, and then viewing the feed of the public tag.

    You can get up to 2000 items this way; if you have more, you’ll have to use the continuation parameter (http://www.google.com/support/forum/p/reader/thread?tid=29c0f8a4afeced25&hl=en).

  7. anish

    u just made my day!!!!!!!!!

  8. Andoru

    Thanks for the awesome tutorial, but I’m having one problem. I managed to get the proper Google Reader link, and I have saved the important feed as XML, but I have no idea what to do with the code you posted in the code box. Could you explain it a bit?

    • Jones Lee

      The code uses the Simple-RSS library to fetch the XML output from the Google Reader URI, then uses the Builder class to compile a WordPress RSS file. It’s as simple as that.

      • Andoru

        How exactly do I use Simple-RSS, then? Is it a server-based application? In that case, I doubt my host offers Ruby X.X

      • Jones Lee

        I use Ruby 1.8.7-p174 locally on my MacBook Air; it has nothing to do with running on a server. Install Ruby 1.8.7 on Windows, OS X or Linux, then install simple-rss with: gem install simple-rss. It’s as simple as that.

  9. Andoru

    Thank you for the reply.
    I tried running the script as you described above and it just showed a command prompt and nothing happened afterwards. I’m using Windows XP and I could try on Xubuntu too.

    • Andoru

      No answer so far, so I’ll just paste in what it says in the command prompt when I try to install simple-rss:

      "X:\cardmagic-simple-rss-ef0d5db\cardmagic-simple-rss-ef0d5db\install.rb"
      :29:in `require': no such file to load --
      ftools (LoadError)
      from :29:in `require'
      from X:/cardmagic-simple-rss-ef0d5db/cardmagic-simple-rss-ef0d5db/install.rb:3:in `'

      • Jones Lee

        As indicated above, the script generates an rss.xml file in the current folder, so please run it in a writable folder. I haven’t installed simple-rss from source, so I can’t help you much there; what I did was simply: gem install simple-rss. Please use Xubuntu or another UNIX environment for the job, as I haven’t tested the code on Windows yet.

  10. Andoru

    So then basically you mean I should run “gem install simple-rss” into a terminal, well you might want to specify that next time.
    So, I’ve managed to install simple-rss, and run the script, but now it says:
    bash: /home/---/Desktop/RSS.rb: /usr/bin/ruby^M: bad interpreter: No such file or directory

  11. chenwj

    Thanks for your tips, however I have trouble running your script. Error msg below,

    /var/lib/gems/1.8/gems/simple-rss-1.2.3/lib/simple-rss.rb:75:in `parse': Poorly formatted feed (SimpleRSSError)

    Any idea on how I can fix it? Thanks!

    • Trung LE

      Seems your source is not a valid RSS, are you sure you set the URL to RSS source?

      • chenwj

        I guess you’re talking about the source variable in your script, here it is,

        source = "http://www.google.com/reader/atom/user/11420598154031612632/state/com.google/starred?n=40"

        I can see something like below on my browser,

        +

        Expanding “+” on the second line shows my blog context.

      • chenwj

        O.K., it seems simple-rss treats “” illegal. Any workaround besides modifying simple-rss.rb?

  12. chenwj, I think you could add this line:

    # encoding: utf-8

    to the very top of the script. Also make sure you use Ruby 1.9.2 or newer.
