HTML::HTML5::Parser is an HTML parser, similar to the non-CPAN module
Whatpm::HTML with some changes including:
* Provides an XML::LibXML-like DOM interface. If you usually use
XML::LibXML's DOM parser, this should be a drop-in solution for tag
soup HTML.
* Constructs an XML::LibXML::Document as the result of parsing.
* Via bundling and modifications, removed external dependencies on
non-CPAN packages.


Install Howto

  1. Update the package index:
    # sudo apt-get update
  2. Install libhtml-html5-parser-perl deb package:
    # sudo apt-get install libhtml-html5-parser-perl
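
  After installation, a quick check along the following lines can confirm
  that the module loads and returns an XML::LibXML::Document for tag soup
  input. This is a minimal sketch, not part of the package: the helper name
  check_perl_module and the sample HTML string are assumptions made for
  illustration, and it relies only on the new, parse_string and toString
  calls.

```shell
#!/bin/sh
# Report whether a Perl module can be loaded by the system perl.
# check_perl_module is a hypothetical helper name, not part of the package.
check_perl_module() {
    module="$1"
    if perl -M"$module" -e 1 >/dev/null 2>&1; then
        echo "$module: available"
        return 0
    fi
    echo "$module: not available"
    return 1
}

# Only attempt the parse if the module shipped in the deb package is loadable.
if check_perl_module HTML::HTML5::Parser; then
    # Parse a tag-soup fragment (unclosed <p> elements) and print the
    # serialized XML::LibXML::Document that the parser builds from it.
    perl -MHTML::HTML5::Parser -e '
        my $doc = HTML::HTML5::Parser->new->parse_string("<title>Demo</title><p>Hello<p>World");
        print $doc->toString;
    '
fi
```

  If the first line of output reports the module as not available, the
  package is probably not installed for the perl found on the PATH.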




2013-07-15 - Jonas Smedegaard <>
libhtml-html5-parser-perl (0.301-1) unstable; urgency=low
[ Salvatore Bonaccorso ]
* Use canonical hostname in Vcs-Git URI.
[ Jonas Smedegaard ]
* Add README.source emphasizing file as *not* a
show-stopper for contributions, referring to wiki page for details.
* Bump standards-version to 3.9.4.
* Drop patch 1001: Build problem turned out to be in CDBS (and is fixed
by now).
* Add git URL as alternate source.
* Stop tracking md5sum of upstream tarball.
* List upstream issue tracker as preferred contact.
* Update Homepage to use, to match upstream hint.
* Bump packaging license to GPL-3+, and extend copyright coverage for
myself to include recent years.
* Update copyright coverage for additional convenience code copies.
2012-12-11 - Jonas Smedegaard <>
libhtml-html5-parser-perl (0.208-1) unstable; urgency=low
[ upstream ]
* New upstream release.
+ Bugfix: If two <html> tags were in the same file, attributes on
the second <html> element could cause crashes.
+ Bugfix: Minor fix re LWP-independence.
[ Jonas Smedegaard ]
* Recommend libwww-perl.
* Unfuzz patch.
2012-08-07 - Jonas Smedegaard <>
libhtml-html5-parser-perl (0.206-2) unstable; urgency=low
* Bump debhelper compatibility level to 8.
* Update package relations:
+ Relax to (build-)depend unversioned on cdbs: Needed version
satisfied in stable, and oldstable no longer supported.
* Update copyright file:
+ Fix use pseudo-license-in-comment and -comment-in-license fields:
File format 1.0 mandates License field to either be single-line or
include all licensing info.
2012-06-30 - Jonas Smedegaard <>
libhtml-html5-parser-perl (0.206-1) experimental; urgency=low
* New upstream release.
* (Build-)depend on recent perl or libhttp-tiny-perl (not libwww-perl),
and liburi-perl.
* Update copyright file to cover yet another convenience copy of
external Perl module.
2012-06-14 - Jonas Smedegaard <>
libhtml-html5-parser-perl (0.200-1) experimental; urgency=low
* New upstream release.
* Update inclusion of example files.
* (Build-)depend on libio-html-perl and libtry-tiny-perl (not
liberror-perl or libhtml-encoding-perl).
* Update copyright file:
+ Improve Files sections for convenience copies of external Perl
modules.
+ Quote license string in GPL comment.
* Add patch 1001 to avoid auto-installing dependencies during build.
2012-03-25 - Jonas Smedegaard <>
libhtml-html5-parser-perl (0.110-1) unstable; urgency=low
* New upstream release.
* Use URL in Vcs-Browser field.
* Update package relations:
+ Newline-delimit dependencies.
+ Suggest (not yet packaged) libxml-libxml-devel-setlinenumber-perl.
2012-03-19 - Jonas Smedegaard <>
libhtml-html5-parser-perl (0.109-1) unstable; urgency=low
* New upstream release.
[ Jonas Smedegaard ]
* Fix use by-module URL for get-orig-source rule.
* Update copyright file:
+ Improve coverage of convenience code copies below inc/.
+ Fix double-indent in Copyright fields as per Policy §5.6.13.
+ Quote license strings in comments.
* List Florian as copyright holder in debian/rules, and extend
copyright years for myself.
* Update package relations:
+ Relax build-dependency on cdbs (needlessly tight).
+ Relax to build-depend unversioned on debhelper and devscripts (needed
versions satisfied even in oldstable).
[ Florian Schlichting ]
* Tighten dependency on XML::LibXML to at least version 1.93.
* Bump Standards-Version to 3.9.3.
* Update copyright file:
+ Bump copyright format to 1.0.
+ Adjust copyright year for Scalar::Util version 1.23.
+ Add files paragraph for html5lib test data.

