Kanojo.de Blog

13 Jan 2010

AmiAmi.com RSS

Maybe some of you like figures (as in figmas, nendos, etc.) and know www.AmiAmi.com, which is a great shop. It has just one problem: it offers no RSS feed that lets you keep up with when items become available, open for preorder or get restocked. So you may miss the item you want due to the completely crazy Japanese habit of buying out everything that was just restocked in mere hours.

As we were dissatisfied with that too, we wrote a parser for AmiAmi that creates an RSS feed. You are welcome to subscribe to it; it is updated twice a day, at 16:00 and 04:00 Europe/Berlin time. You can find it at:

http://ghostdub.de/~eefi/rss.xml

Maybe it'll help you harvest the Otaku loot you want :P .
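For the curious, here is a rough idea of what the feed-building half of such a parser can look like in plain Ruby. This is only a minimal sketch and not our actual parser: the helper names (xml_escape, rss_date, build_rss), the item fields (:title, :link, :seen_at) and the example.com URLs are all made up for illustration, and the scraping step that would produce the items is left out entirely.

```ruby
# Minimal RSS 2.0 generator sketch. All names and sample data here are
# hypothetical; the real shop scraping is not shown.

# Escape the XML special characters so titles and links stay well-formed.
def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
                            '"' => "&quot;", "'" => "&apos;")
end

# RSS 2.0 expects RFC 822 style dates; we always emit UTC.
def rss_date(time)
  time.getutc.strftime("%a, %d %b %Y %H:%M:%S +0000")
end

# Build the feed as a string from channel metadata and a list of item hashes.
def build_rss(title:, link:, description:, items:)
  lines = []
  lines << '<?xml version="1.0" encoding="UTF-8"?>'
  lines << '<rss version="2.0">'
  lines << "<channel>"
  lines << "<title>#{xml_escape(title)}</title>"
  lines << "<link>#{xml_escape(link)}</link>"
  lines << "<description>#{xml_escape(description)}</description>"
  items.each do |item|
    lines << "<item>"
    lines << "<title>#{xml_escape(item[:title])}</title>"
    lines << "<link>#{xml_escape(item[:link])}</link>"
    # The item URL doubles as a stable identifier for feed readers.
    lines << "<guid>#{xml_escape(item[:link])}</guid>"
    lines << "<pubDate>#{rss_date(item[:seen_at])}</pubDate>"
    lines << "</item>"
  end
  lines << "</channel>"
  lines << "</rss>"
  lines.join("\n") + "\n"
end

# Made-up sample data standing in for whatever the scraper found.
items = [
  { title: "Example figure A (restocked)",
    link: "http://example.com/item?id=1&lang=en",
    seen_at: Time.utc(2010, 1, 13, 15, 0, 0) },
  { title: "Example figure B <preorder>",
    link: "http://example.com/item?id=2&lang=en",
    seen_at: Time.utc(2010, 1, 13, 15, 0, 0) }
]

feed = build_rss(title: "Shop updates (example)",
                 link: "http://example.com/",
                 description: "New, preorder and restocked items",
                 items: items)
puts feed
```

Run something like this from cron at the two update times and write the output to a file your web server can serve, and you have a feed. Escaping matters more than it looks: shop URLs are full of & characters, and a single unescaped one makes the whole feed invalid XML.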

1 Jan 2010

Robust HTML Parsing (in Ruby)?

Have you ever wanted to parse information from some rather complex or totally broken (in terms of HTML standards compliance) website? Maybe you tried fighting that problem with regular expressions, or with a DOM or SAX XML parser. If you did, you probably ran into some problems: maybe your regex produced too many similar matches because the website repeats similar patterns, or your XML parser went crazy on invalidly formatted or non-XHTML-compliant content.

I wanted to parse a website that had no RSS feed for changes and create an RSS feed for it. I first tried several of the ideas mentioned above, but as the website is kind of "irregular" (every item is a slight bit different) and the W3 validator shows over 11k errors (in 1.1 transitional), I had quite some problems.

Until I found Ruby's Hpricot, an HTML parser that lets you do robust parsing of fucked-up, non-standards-compliant content with ease.

7 Dec 2009

The pain of being independent

I've spent a whole lot of time over the last few days tinkering with various parts of my root server. As time passes you get used to the comfort of various (web-based) tools such as GMail, Google Calendar, Google Reader, etc. You may notice that you just read the word "Google" quite often, so what pops into your mind? Right, privacy. Google mines quite a lot of your data. Ever checked the ads Google shows you on various sites (given you don't use a capable ad blocker)? Sometimes it gets quite creepy. That data is quite valuable for profiling your behavior, and that profile is (not tied to you as a person, but in general) sold to marketing monkeys.

So that's why you might want to rebuild all those tools you're used to in a trusted environment - your own server. You don't want your mail stored in some possibly hostile environment on an untrusted machine that could leak your valuable data. Turns out it's not that easy sometimes. So far I've only worked on getting a capable webmailer and feed reader to run smoothly. What surprised me here - and why I'm writing about this - is that it was exceptionally hard - or rather time-intensive - for something that sounds as easy as this.


I first targeted the reader and looked around Freshmeat and SourceForge, where you expect to find decent free software for that task. I found quite a few not-so-simple-looking projects, including Tiny Tiny RSS. It turned out Tiny Tiny RSS is almost there, but needs PostgreSQL. So I set up PostgreSQL, installed TTRSS and configured it, imported Google Reader's OPML, and zap, it worked. The problem is that all the feeds live in one big table, which made the whole thing painfully slow. So I was up for 7-8 hours of hardcore Postgres performance tuning, trying to hack memcached into TTRSS, etc. That got a speedup of almost 100%, yet it was nowhere close to being usable. It turns out the developer didn't intend the project for keeping a searchable archive of articles. So on to something different: Gregarius, which the TTRSS dev recommended. This worked pretty much out of the box, except for 3-4 hours of tinkering and writing small plugins to get the whole thing to work properly. But it does work, and it has almost all the features one expects.

The harder part came next: webmail. I first tried to hack on the roughly-set-up RoundCube that was still on my server. After some short testing, with many functions that just did not work for unknown reasons, I knew I needed something different - and started toying around with Horde and its webmailer IMP(4) in the Horde Webmail Edition pack. Horde feels somehow "unix style"-ish, like ... building a highly reusable backend, letting other projects include it or work on top of it, etc. - but let me say one thing: this beast is so darn hard to set up. I got it working - more or less - after hours and hours of doc reading, tinkering with MySQL tables and databases, and reading Horde source because of its ... slightly lacking ... documentation on some points. Still, it was unstable and lacked features too. So I finally decided Horde/IMP was a bit too much to go with. After searching around, it was a let's-get-back-to-RoundCube.


Except that this didn't make things better, well... a bit at least. It takes tinkering, fixing old plugins to work with the current version, finding out why the hell buttons are greyed out that shouldn't be, and getting a "well, reset the database (again)" from the developers. All of that fun. As of now I have at least managed to get everything working except for filter rules and sa-learning ham. Phew!

Okay, so why am I writing all this, you may ask yourself. Well, I've gone through some PITA for being independent. It really takes some work to get everything to run smoothly. If you're used to professional, custom-coded systems backed by real money (such as the Google stuff), it still takes some hackish tinkering to get the Free Software tools the community gives us to the same level. I'm not ranting against that community, by the way; all that code out there is really beautiful in fact. I also want to encourage everyone out there not to give away their data, but to build something of their own and keep their data. As computer users did, for good reason, for a long, long time.

4 Nov 2009

iPhone Push-EMail with your own Mailserver (Exim, Postfix, QMail, etc…)

As Jabber does not seem appropriate as an SMS replacement (the clients just don't seem to survive connection changes from wifi->3g, 3g->E, etc.), I've been looking for something different. E-mail is nice, but the shortest polling interval of 15 minutes is not suited for realtime communication. So there's Push Email (meaning the iPhone keeps an HTTP connection to some server open all the time, and the server writes something to that stream as soon as a new mail arrives) for various services including Microsoft Exchange and GMail. What about hosting your own server, since you don't want to give away all your data? No problem - either use one of the tons of commercial servers supporting Push, or use Z-Push with your favorite IMAP and SMTP server.

30 Oct 2009

GNU/Linux iPhone Sync – Wireless! Funambol error -1, yay!

I recently got myself an iPhone for tinkering, development (I've got a few nice ideas) and general nerdism. I've run into a few problems syncing my PIM data (stuff like contacts, tasks, etc.) - especially since I use GNU/Linux, which is no platform to run iTunes on. Pictures and music are no problem, as gtkpod and the like support the iPhone. It's just the important stuff that does not work out nicely.