Problem #1 - I need to test tons of HTML output for correctness (because I maintain toscawidgets2). That output varies slightly because tw2 supports five different templating languages (mako, genshi, jinja2, kajiki, and chameleon). Using double-equals (==) just won't do it.
Solution #1 - We used strainer. It works!
Problem #2 - Imagine porting strainer to Python 3. Yes, that's right: the encoding is sniffed by hand and then used to encode regular expressions, which are in turn applied to parse XML. Think "inverse unicode sandwich with a side of Cthulhu."
>>> from sieve.operators import eq_xml, in_xml
>>> a = "<foo><bar>Value</bar></foo>"
>>> b = """
... <foo>
...     <bar>
...         Value
...     </bar>
... </foo>
... """
>>> eq_xml(a, b)
True
>>> c = "<html><body><foo><bar>Value</bar></foo></body></html>"
>>> in_xml(a, c)  # 'needle' in a 'haystack'
True
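To make the "double-equals won't do it" point concrete, here is a minimal standard-library sketch of a whitespace- and attribute-order-insensitive comparison. It is not how sieve or strainer work internally; `normalize` and `loosely_equal` are hypothetical names made up for this illustration, and the sketch only handles well-formed XML.

```python
import xml.etree.ElementTree as ET


def normalize(elem):
    """Describe an element as a nested tuple, ignoring surrounding whitespace.

    Hypothetical helper for illustration only; not part of sieve.
    """
    text = (elem.text or "").strip()
    tail = (elem.tail or "").strip()
    # Sort attributes so that their order in the markup does not matter.
    attrs = tuple(sorted(elem.attrib.items()))
    children = tuple(normalize(child) for child in elem)
    return (elem.tag, attrs, text, tail, children)


def loosely_equal(a, b):
    """Compare two XML strings by structure rather than by exact characters."""
    return normalize(ET.fromstring(a)) == normalize(ET.fromstring(b))


a = "<foo><bar>Value</bar></foo>"
b = """
<foo>
    <bar>
        Value
    </bar>
</foo>
"""

print(a == b)               # plain string comparison
print(loosely_equal(a, b))  # structural comparison
```

The plain `==` fails on the two strings above even though they describe the same tree, which is exactly the kind of noise that different templating languages introduce.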
p.s. -- I looked into xmldiff. Awesome!