[three]Bean
In which I avoid the inverse unicode sandwich
Jun 22, 2012 | categories: python, toscawidgets, testing, turbogears

Problem #1 - I need to test tons of HTML output for correctness (because I maintain toscawidgets2). That output varies slightly because tw2 supports five different templating languages (mako, genshi, jinja2, kajiki, and chameleon). Using double-equals (==) just won't do it.
Solution #1 - We used strainer. It works!
Problem #2 - Imagine porting strainer to Python 3. Yes, that's right: inside strainer, the encoding is sniffed by hand and then used to encode regular expressions, which are in turn applied to parse XML. Think "inverse unicode sandwich with a side of Cthulhu."
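In case the term is new to you: the "unicode sandwich" means decoding bytes to text at the edges of a program, working only with text in the middle, and encoding again on the way out. The inverse does the opposite and keeps bytes in the middle. The sketch below is a generic Python 3 illustration of why that hurts. It is not strainer's code, and the names `PARAGRAPH`, `inverse_sandwich`, and `sandwich` are hypothetical. Python 3 refuses to apply a text (str) pattern to bytes, while decoding once at the boundary works fine.

```python
import re

# Hypothetical pattern used only for this illustration.
PARAGRAPH = re.compile(r"<p>(.*?)</p>")


def inverse_sandwich(raw: bytes):
    # Applies a str pattern directly to bytes.
    # Python 3 raises TypeError here instead of guessing an encoding.
    return PARAGRAPH.search(raw)


def sandwich(raw: bytes, encoding: str = "utf-8") -> str:
    # Decode once at the boundary, then work only with text.
    text = raw.decode(encoding)
    match = PARAGRAPH.search(text)
    return match.group(1) if match else ""


if __name__ == "__main__":
    raw = "<p>caf\u00e9</p>".encode("utf-8")
    try:
        inverse_sandwich(raw)
    except TypeError as exc:
        print("TypeError:", exc)
    print(sandwich(raw) == "caf\u00e9")  # True
```

Under Python 2 the first call would have "worked" and silently mixed bytes and text. That is why code written in the inverse style tends to break all at once during a port.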
Solution #2 - I wrote sieve, a baby module born of one corner of FormEncode and another corner of strainer. It works on Python 2.6, 2.7, and 3.2. If you like, you may use it:
    >>> from sieve.operators import eq_xml, in_xml
    >>> a = "<foo><bar>Value</bar></foo>"
    >>> b = """
    ... <foo>
    ...     <bar>
    ...         Value
    ...     </bar>
    ... </foo>
    ... """
    >>> eq_xml(a, b)
    True
    >>> c = "<html><body><foo><bar>Value</bar></foo></body></html>"
    >>> in_xml(a, c)  # 'needle' in a 'haystack'
    True
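sieve's own implementation isn't shown here. For the curious, here is a hypothetical sketch of the same idea using only the standard library's `xml.etree.ElementTree`. It parses both strings, ignores whitespace-only text, and compares the trees recursively. The names `xml_equal` and `xml_contains` are made up for illustration and are not sieve's API.

```python
import xml.etree.ElementTree as ET


def _norm(text):
    # Treat None and whitespace-only text as empty; trim everything else.
    return (text or "").strip()


def _same(a, b):
    # Elements match if tag, attributes, and trimmed text match, and their
    # children match pairwise (in order), including each child's tail text.
    # The top-level tail is ignored so a needle can match a nested element.
    if a.tag != b.tag or a.attrib != b.attrib or _norm(a.text) != _norm(b.text):
        return False
    if len(a) != len(b):
        return False
    return all(
        _norm(x.tail) == _norm(y.tail) and _same(x, y)
        for x, y in zip(a, b)
    )


def xml_equal(left_src, right_src):
    """Hypothetical stand-in for eq_xml: whitespace-insensitive equality."""
    return _same(ET.fromstring(left_src), ET.fromstring(right_src))


def xml_contains(needle_src, haystack_src):
    """Hypothetical stand-in for in_xml: does any subtree match the needle?"""
    needle = ET.fromstring(needle_src)
    haystack = ET.fromstring(haystack_src)
    # iter() walks the haystack root and all of its descendants.
    return any(_same(needle, node) for node in haystack.iter())


if __name__ == "__main__":
    a = "<foo><bar>Value</bar></foo>"
    b = "\n<foo>\n    <bar>\n        Value\n    </bar>\n</foo>\n"
    c = "<html><body><foo><bar>Value</bar></foo></body></html>"
    print(xml_equal(a, b))     # True
    print(xml_contains(a, c))  # True
```

One limitation of this sketch: ElementTree only accepts well-formed XML. HTML that is valid but not well-formed (an unclosed `<br>`, for example) would raise a parse error instead of comparing.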
p.s. -- I looked into xmldiff. Awesome!