Kanojo Blog Blog-Blog

2 Jan/10

Mahjong ruleset, TeXed

As we've been playing more and more Mahjong (not the solitaire version, the real thing) recently, and just stumbled upon #mahjong on Rizon, where we got linked a really, really nice ruleset, I thought... well, wouldn't it be nice to have this as a printed booklet, so you can check the rules or yaku right at the table if you're unsure?

Okay, so after almost two days of fighting with XeLaTeX to get nice Unicode support and fighting defoma to get a nice font, it's finally done!

You can fetch the PDF here. The booklet version (just print it, fold the whole stack in the middle along the short edge and staple it together) is available as PostScript here.

Also note that there might be a few mistakes due to the hardcore TeX action in the typesetting; feel free to report those to me to get 'em fixed. For errors in the original document, please contact the original author or me.

I hope you have fun playing with these rule sheets and that they came out nicely :P. For a nice, short yaku overview, just surf over here.

1 Jan/10

Robust HTML Parsing (in Ruby)?

Have you ever wanted to parse information from some rather complex or totally broken (in terms of HTML standards compliance) website? Maybe you tried fighting that problem with regular expressions or a DOM or SAX XML parser. If you did, you probably ran into some problems: maybe your regex produced too many similar matches because the website repeats similar patterns, or your XML parser went crazy over badly formatted or non-XHTML-compliant content?
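To make both failure modes concrete, here is a small Ruby sketch. The HTML snippet is made up for illustration, and REXML (the XML parser shipped with Ruby) stands in for "a strict XML parser" in general: a greedy regex swallows several repeating items in one match, while REXML refuses markup that isn't well-formed.

```ruby
require 'rexml/document'

# Hypothetical snippet: two similar items plus an unclosed <br> tag,
# the kind of thing real-world pages are full of.
HTML = '<div class="item">A<br></div><div class="item">B</div>'

# 1) Greedy regex: .* runs to the LAST </div>, so both items land in one capture.
greedy = HTML[%r{<div class="item">(.*)</div>}, 1]
p greedy # => "A<br></div><div class=\"item\">B"

# 2) Strict XML parsing: REXML raises on the unclosed <br> instead of guessing.
xml_error =
  begin
    REXML::Document.new("<root>#{HTML}</root>")
    nil
  rescue REXML::ParseException => e
    e
  end
puts xml_error.class
```

Switching to a lazy `.*?` fixes this particular case, but it breaks again as soon as an item contains a nested `<div>`.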

I wanted to parse a website that had no RSS feed for changes and create one for it. I first experimented with several of the approaches mentioned above, but as the website is kind of "irregular" (every item is slightly different) and the W3C validator reports over 11k errors (in 1.1 transitional), I ran into quite a few problems.

That is, until I found Ruby's Hpricot, an HTML parser that lets you robustly parse fucked-up formatted, non-standards-compliant content with ease.
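Whatever does the parsing, the last step of the original goal is turning the extracted items into a feed. Here is a minimal sketch using the `rss` library that comes with Ruby (`RSS::Maker`). It assumes the scraper has already produced an array of hashes with hypothetical `:title` and `:link` keys; the feed title and URLs are placeholders, not the actual site.

```ruby
require 'rss'

# Hypothetical output of the scraping step: one hash per changed item.
items = [
  { title: 'Item A', link: 'http://example.com/a' },
  { title: 'Item B', link: 'http://example.com/b' }
]

# Build an RSS 2.0 document; title, link and description are the
# required channel elements in RSS 2.0.
feed = RSS::Maker.make('2.0') do |maker|
  maker.channel.title       = 'Changes on example.com' # placeholder
  maker.channel.link        = 'http://example.com/'    # placeholder
  maker.channel.description = 'Scraped change feed'    # placeholder

  items.each do |entry|
    maker.items.new_item do |item|
      item.title = entry[:title]
      item.link  = entry[:link]
    end
  end
end

xml = feed.to_s
puts xml
```

Writing `xml` to a file the web server exposes (e.g. from a cron job) is enough to get a feed readers can subscribe to.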