Robust HTML Parsing (in Ruby)?

Have you ever wanted to parse information from a rather complex or totally broken (in terms of HTML standards compliance) website? Maybe you tried to fight the problem with regular expressions or a DOM or SAX XML parser. If you did, you probably ran into problems: maybe your regex produced too many similar matches because the same patterns repeat throughout the page, or your XML parser choked on invalidly formatted or non-XHTML-compliant content.
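
To make the regex problem concrete, here is a minimal sketch using only Ruby's built-in `String#scan`. The markup, the class names and the URLs are made up for illustration and are not taken from the website discussed here; the point is only how quickly a pattern either matches too much or stops matching at all.

```ruby
# Hypothetical markup: blocks that look alike but differ slightly.
html = <<~HTML
  <div class="item"><b>First</b> <a href="/one">link</a></div>
  <div class="item"><a href="/two">Second</a></div>
  <div class="nav"><a href="/home">Home</a></div>
HTML

# Naive pattern: matches every link, including the navigation
# link we are not interested in.
all_links = html.scan(/<a href="([^"]*)">/).flatten

# Narrower pattern: only links inside "item" blocks. It works here,
# but it hard-codes the quoting style and the attribute layout.
item_pattern = /<div class="item">.*?<a href="([^"]*)">/
item_links = html.scan(item_pattern).flatten

# A slightly different item (single quotes instead of double quotes)
# silently yields no match at all.
irregular = "<div class='item'><a href='/three'>Third</a></div>"
missed = irregular.scan(item_pattern).flatten

puts all_links.inspect
puts item_links.inspect
puts missed.inspect
```

Every new irregularity on the page needs another special case in the pattern, which is where this approach stops being fun.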

I wanted to parse a website that had no RSS feed for its changes and create an RSS feed for it. I first experimented with several of the ideas mentioned above, but since the website is kind of “irregular” (every item is slightly different) and the W3C validator shows over 11k errors (in 1.1 transitional), I had quite a few problems.

That was until I found Ruby's Hpricot, an HTML parser that lets you do robust parsing of fucked-up, non-standards-compliant content with ease.


Backferment (special yeast) bread

Recently we’ve been into baking bread. Like, traditional, easy bread that’s tasty and available fairly quickly after you’ve decided you want bread for dinner or supper. Anyhow, those simple baguette-like yeast breads, even though they require quite some experimenting to handle the yeast properly and achieve an even fluffier result …

Ladder Stitch

You’re going to use this stitch many times if you make plushies; it is usually used for closing them after you’ve stuffed them. It’s made of awesome. I plan to write a more detailed description of this stitch in the future, but these pictures have to suffice for now. You can google it in the meantime. Like this (click …