My Bachelor's Thesis (Czech)
The application is written in Java (SDK 1.4), and the web-based interface was tested on Tomcat 5.0.14. The JARV interface with the Sun MSV validator keeps validation independent of the schema language, although at present it is used only to validate against Relax NG definitions. Saxon extracts the embedded Schematron rules and validates HTML documents against them. The TagSoup library converts HTML 4.01 documents into XML so that those documents (still very common on the WWW) can be validated as well. Apache Ant is used for building, installation, deployment, testing and validation.
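To illustrate what extracting embedded Schematron means, here is a minimal sketch. It is not the application's code: it uses the JDK's built-in DOM parser as a stand-in for the Saxon-based extraction, and the class name `EmbeddedSchematron` and the sample schema are hypothetical. It assumes the rules are embedded in the Schematron 1.5 namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class EmbeddedSchematron {
    // Namespace of Schematron 1.5 (assumed to be the one used for embedding).
    static final String SCH_NS = "http://www.ascc.net/xml/schematron";

    // Hypothetical Relax NG grammar carrying one embedded Schematron pattern.
    static final String SAMPLE =
          "<grammar xmlns='http://relaxng.org/ns/structure/1.0'"
        + "         xmlns:sch='http://www.ascc.net/xml/schematron'>"
        + "  <sch:pattern name='img-alt'>"
        + "    <sch:rule context='img'>"
        + "      <sch:assert test='@alt'>img should have an alt attribute.</sch:assert>"
        + "    </sch:rule>"
        + "  </sch:pattern>"
        + "  <start>"
        + "    <element name='img'><optional><attribute name='alt'/></optional></element>"
        + "  </start>"
        + "</grammar>";

    // Copies every embedded sch:pattern into a new standalone sch:schema document.
    static Document extract(String relaxNg) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document source = builder.parse(new InputSource(new StringReader(relaxNg)));

            Document result = builder.newDocument();
            Element schema = result.createElementNS(SCH_NS, "sch:schema");
            result.appendChild(schema);

            NodeList patterns = source.getElementsByTagNameNS(SCH_NS, "pattern");
            for (int i = 0; i < patterns.getLength(); i++) {
                // importNode makes a deep copy, so the source grammar is left unchanged.
                schema.appendChild(result.importNode(patterns.item(i), true));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not extract Schematron rules", e);
        }
    }

    public static void main(String[] args) {
        Document schema = extract(SAMPLE);
        int count = schema.getElementsByTagNameNS(SCH_NS, "pattern").getLength();
        System.out.println("Extracted patterns: " + count);
    }
}
```

The resulting document contains only the Schematron rules, so it can be handed to a Schematron processor separately from the Relax NG grammar it was embedded in.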
The application has a modular architecture with an easy-to-use API. Most potential extensions can be made just by changing configuration files, with no code changes needed. The HTML schemas stem from James Clark's work "Modularization of XHTML in Relax NG". After a deep analysis of the relevant W3C standards, this schema was bug-fixed and extended with as many additional restrictions as possible. These are formalized in Relax NG or Schematron, whichever is better suited to the particular restriction. The schema was adjusted to allow validation of the strict, transitional and frameset HTML subsets. Finally, some of the WCAG 1.0 restrictions were implemented using Schematron. A test-case library covers all the newly added definitions. This keeps the definitions consistent and minimizes the occurrence of bugs during development.
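The idea of extending the validator through configuration alone could look roughly like the sketch below. It is only an assumption about how such a mapping might work: the class `StandardRegistry`, the property keys and the schema paths are all hypothetical and are not taken from the application.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

final class StandardRegistry {
    // Maps a standard's name to the schema file that defines it.
    private final Properties schemas = new Properties();

    // The configuration uses the java.util.Properties format: name=path, one per line.
    StandardRegistry(String config) {
        try {
            schemas.load(new StringReader(config));
        } catch (IOException e) {
            throw new IllegalStateException("Unreadable configuration", e);
        }
    }

    // Returns the schema path registered for the given standard.
    String schemaFor(String standard) {
        String path = schemas.getProperty(standard);
        if (path == null) {
            throw new IllegalArgumentException("Unknown standard: " + standard);
        }
        return path;
    }

    public static void main(String[] args) {
        // Hypothetical entries; adding a standard means adding a line, not code.
        String config = "html401-strict=schemas/html401-strict.rng\n"
                      + "xhtml10-frameset=schemas/xhtml10-frameset.rng\n";
        StandardRegistry registry = new StandardRegistry(config);
        System.out.println(registry.schemaFor("html401-strict"));
    }
}
```

With a lookup like this, supporting a further HTML subset would only require a new schema file and one extra configuration entry.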
The application is currently stable and running (you may try it here). There are many opportunities for extension. It is just a short step to full support for the Modularization of XHTML and XHTML Basic standards. It would also be nice to support ISO-HTML validation, where the use of Relax NG and Schematron may significantly simplify the publication process. The project is currently in the approval process at SourceForge. The aim of making this project open source is to give the community the work I have put into it over several months, so that people can use the tool freely, extend it in various ways according to their needs and come up with valuable new ideas.
Web-based validation interface,
Command-line validation interface,
HTML definition development and testing environment,
The standard to validate against may be chosen dynamically at validation time,
Validation output includes line number, severity and error description,
Complete Relax NG schema with Schematron patterns for HTML 4.01/XHTML 1.0 and WCAG 1.0.
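
The output format named in the feature list (line number, severity and error description) can be sketched with a standard SAX error handler. This is an illustration only, not the application's reporting code: the class `ReportCollector` and the message layout are hypothetical, and the JDK's built-in XML parser stands in for the actual validator, so it reports well-formedness problems rather than schema violations.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class ReportCollector extends DefaultHandler {
    // One formatted entry per reported problem.
    final List<String> lines = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        add("warning", e);
    }

    @Override
    public void error(SAXParseException e) {
        add("error", e);
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        add("fatal", e);
        throw e; // parsing cannot continue after a fatal error
    }

    // Formats line number, severity and description into a single entry.
    private void add(String severity, SAXParseException e) {
        lines.add("line " + e.getLineNumber() + " [" + severity + "] " + e.getMessage());
    }

    // Parses the document and returns every problem that was reported.
    static List<String> check(String xml) {
        ReportCollector collector = new ReportCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), collector);
        } catch (SAXException e) {
            // Already recorded by fatalError; nothing more to do here.
        } catch (Exception e) {
            throw new IllegalStateException("Parser could not be run", e);
        }
        return collector.lines;
    }

    public static void main(String[] args) {
        // The body element is never closed, so the parser reports a fatal error.
        for (String line : check("<html>\n<body>\n</html>")) {
            System.out.println(line);
        }
    }
}
```

Collecting entries instead of stopping at the first exception lets a caller present all warnings and errors together, for example on a results page or in command-line output.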