Markup Validator Web Service API
SOAP 1.2 validation interface documentation

Interface applications with the Markup Validator through its experimental API. This is version 0.2, dated May 2007. For a history of the format, see Change Log.

Note: Please be considerate in using this shared, free resource. Consider installing your own instance of the validator for smooth and fast operation. Excessive use of the W3C Validation Service will be blocked.

Validation Request Format

Below is a table of the parameters you can use to send a query to the W3C Markup Validator. All parameter values except data in uploaded_file are expected to be encoded in the UTF-8 character encoding.

If you want to use W3C's public validation server, use the parameters below in conjunction with the following base URI:
http://validator.w3.org/check
(replace with the address of your own server if you want to call a private instance of the validator)

Note: If you wish to call the validator programmatically for a batch of documents, please make sure that your script sleeps for at least 1 second between requests. The Markup Validation service is a free, public service for all; your respect is appreciated. Thanks.
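The pause between requests can be sketched as follows. This is a minimal illustration, not part of the validator: validate_batch and its check callback are hypothetical names, and check stands for whatever function actually sends one request.

```python
import time
from typing import Callable, Iterable, List, Tuple


def validate_batch(
    urls: Iterable[str],
    check: Callable[[str], str],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, str]]:
    """Call check(url) for each URL, pausing between consecutive requests.

    `check` is a hypothetical callback that performs one validation request.
    `sleep` is injectable so the pacing can be exercised without waiting.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            sleep(delay_seconds)
        results.append((url, check(url)))
    return results
```

Keeping delay_seconds at 1.0 or above matches the minimum requested in the note above.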

Parameter | Description | Default value
uri | The URL of the document to validate. | None, but either this parameter, or uploaded_file, or fragment must be given.
uploaded_file | The document to validate, POSTed as multipart/form-data. | None, but either this parameter, or uri, or fragment must be given.
fragment | The source of the document to validate. Full documents only. | None, but either this parameter, or uri, or uploaded_file must be given.
output | Selects the output format of the validator. If unset, the usual Web format will be sent. If set to soap12, the SOAP 1.2 interface will be triggered. See below for the SOAP 1.2 response format description. | unset
charset | Character encoding override: specifies the character encoding to use when parsing the document. When used with the auxiliary parameter fbc set to 1, the given encoding will only be used as a fallback value, in case the charset is absent or unrecognized. Note that this parameter is ignored if validating a fragment with the direct input interface. | None; by default the validator detects the charset of the document automatically.
doctype | Document Type override: specifies the Document Type (DOCTYPE) to use when parsing the document. When used with the auxiliary parameter fbd set to 1, the given document type will only be used as a fallback value, in case the document's DOCTYPE declaration is missing or unrecognized. | None; by default the validator detects the document type of the document automatically.
verbose | In the web interface, when set to 1, makes error messages, explanations and other diagnostics more verbose. Has no effect on SOAP output. | 0 (unset)
debug | When set to 1, outputs extra debugging information on the validated resource (such as HTTP headers) and the validation process (such as parser used, parse mode, etc.). In the SOAP output, this information is given in <m:debug> elements. | 0 (unset)
ss | Show source. In the web interface, triggers the display of the source after the validation results. Has no effect on SOAP output. | 0 (unset)
outline | In the web interface, when set to 1, triggers the display of the document outline after the validation results. Has no effect on SOAP output. | 0 (unset)
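As an illustration of how these parameters combine with the base URI, here is a small sketch that builds a GET request URL. The helper name build_check_url is an assumption made for this example; only the parameter names and the base URI come from this document.

```python
from urllib.parse import urlencode

# Public base URI from this document; replace with your own instance if you run one.
BASE_URI = "http://validator.w3.org/check"


def build_check_url(document_url, base_uri=BASE_URI, **extra):
    """Return a GET URL asking the validator for a SOAP 1.2 response.

    `extra` carries any other parameters from the table, e.g. debug=1.
    """
    params = {"uri": document_url, "output": "soap12"}
    params.update(extra)
    # urlencode percent-encodes the values, so the document URL is safe to embed.
    return base_uri + "?" + urlencode(params)
```

For example, build_check_url("http://www.w3.org", debug=1) yields a URL whose query string carries uri, output and debug. Uploading a document through uploaded_file requires a multipart/form-data POST instead and is not covered by this sketch.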

SOAP format description

When called with parameter output=soap12, the validator will switch to its SOAP 1.2 interface (experimental for now). Below is a sample response, as well as a description of the most important elements of the response.

Sample SOAP 1.2 validation response

A SOAP response for the validation of an invalid document will look like this:

 
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<env:Body>
<m:markupvalidationresponse
env:encodingStyle="http://www.w3.org/2003/05/soap-encoding" 
xmlns:m="http://www.w3.org/2005/10/markup-validator">
    <m:uri>http://qa-dev.w3.org/wmvs/HEAD/dev/tests/xhtml1-bogus-element.html</m:uri>
    <m:checkedby>http://validator.w3.org/</m:checkedby>
    <m:doctype>-//W3C//DTD XHTML 1.0 Transitional//EN</m:doctype>
    <m:charset>utf-8</m:charset>
    <m:validity>false</m:validity>
    <m:errors>
        <m:errorcount>1</m:errorcount>
        <m:errorlist>
          
            <m:error>
                <m:line>13</m:line>
                <m:col>6</m:col>                                           
                <m:source>  
                <![CDATA[
                  &#60;foo<strong title="Position where error was detected.">&#62;</strong>This phrase is enclosed in a bogus FOO element.&#60;/foo&#62;
                  ]]>
                </m:source>                                           
                <m:explanation>
                  <![CDATA[
                    <p> ... </p>
                  ]]>
                </m:explanation>                                           
                <m:messageid>76</m:messageid>                                           
                <m:message>element "foo" undefined</m:message>
            </m:error>
           
        </m:errorlist>
    </m:errors>
    <m:warnings>
        <m:warningcount>0</m:warningcount>
        <m:warninglist>
        
        
        </m:warninglist>
    </m:warnings>
</m:markupvalidationresponse>
</env:Body>
</env:Envelope>

SOAP 1.2 response format reference

Element | Description
markupvalidationresponse | The main element of the validation response. Encloses all other information about the validation results.
uri | The address of the validated document. Will (likely?) be upload://Form Submission if an uploaded document or fragment was validated. In EARL terms, this is the TestSubject.
checkedby | Location of the service which provided the validation result. In EARL terms, this is the Assertor.
doctype | Detected (or forced) Document Type for the validated document.
charset | Detected (or forced) Character Encoding for the validated document.
validity | Whether the validated document passed formal validation (boolean: true or false).
errors | Encapsulates all data about errors encountered through the validation process.
errorcount | A child of errors; counts the number of errors listed.
errorlist | A child of errors; contains the list of errors (surprise!).
error | A child of errorlist; contains the information on a single validation error.

Note: warnings, warningcount, warninglist and warning are similar to, respectively, errors, errorcount, errorlist and error.
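The top-level elements can be read with any namespace-aware XML parser. The sketch below uses Python's standard xml.etree.ElementTree on a trimmed copy of the sample response (error details and the XML declaration are omitted for brevity). The function name summarize and the keys of the returned dictionary are choices made for this example, not part of the API.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the sample response.
NS = {
    "env": "http://www.w3.org/2003/05/soap-envelope",
    "m": "http://www.w3.org/2005/10/markup-validator",
}

# Trimmed copy of the sample response: the error and warning lists are left empty.
SAMPLE = """<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<env:Body>
<m:markupvalidationresponse xmlns:m="http://www.w3.org/2005/10/markup-validator">
    <m:uri>http://qa-dev.w3.org/wmvs/HEAD/dev/tests/xhtml1-bogus-element.html</m:uri>
    <m:checkedby>http://validator.w3.org/</m:checkedby>
    <m:doctype>-//W3C//DTD XHTML 1.0 Transitional//EN</m:doctype>
    <m:charset>utf-8</m:charset>
    <m:validity>false</m:validity>
    <m:errors><m:errorcount>1</m:errorcount><m:errorlist/></m:errors>
    <m:warnings><m:warningcount>0</m:warningcount><m:warninglist/></m:warnings>
</m:markupvalidationresponse>
</env:Body>
</env:Envelope>"""


def summarize(soap_text):
    """Extract the top-level validation fields from a SOAP 1.2 response."""
    root = ET.fromstring(soap_text)
    response = root.find("env:Body/m:markupvalidationresponse", NS)
    if response is None:
        raise ValueError("no markupvalidationresponse element found")

    def text(path):
        # Missing elements yield an empty string instead of raising.
        return response.findtext(path, default="", namespaces=NS).strip()

    return {
        "uri": text("m:uri"),
        "checkedby": text("m:checkedby"),
        "doctype": text("m:doctype"),
        "charset": text("m:charset"),
        "valid": text("m:validity") == "true",
        "errors": int(text("m:errors/m:errorcount") or 0),
        "warnings": int(text("m:warnings/m:warningcount") or 0),
    }
```

Calling summarize(SAMPLE) returns a dictionary in which valid is False and errors is 1, matching the sample.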

SOAP 1.2 atomic message (error or warning) format reference

As seen in the example above, the children of the error element (and likewise of the warning element) include line, col and message, defined below along with the other child elements:

Element | Description
line | Within the source code of the validated document, refers to the line where the error was detected.
col | Within the source code of the validated document, refers to the column of the line where the error was detected.
message | The actual error message.
messageid | The number/identifier of the error, as addressed internally by the validator.
explanation | Explanation for the error. Given as an HTML fragment within a CDATA block.
source | Snippet of the source where the error was found. Given as an HTML fragment within a CDATA block.
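Individual messages can be collected in much the same way. The following sketch, again using xml.etree.ElementTree, walks every error (or warning) element and returns its children as a dictionary. The name list_messages and the shape of the result are assumptions for illustration; the embedded response is a shortened version of the sample above.

```python
import xml.etree.ElementTree as ET

# Namespace of the validator elements, in ElementTree's {uri}tag notation.
M = "{http://www.w3.org/2005/10/markup-validator}"

# Shortened version of the sample response, with a single error.
SAMPLE_ERRORS = """<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<env:Body>
<m:markupvalidationresponse xmlns:m="http://www.w3.org/2005/10/markup-validator">
    <m:errors>
        <m:errorcount>1</m:errorcount>
        <m:errorlist>
            <m:error>
                <m:line>13</m:line>
                <m:col>6</m:col>
                <m:explanation><![CDATA[<p> ... </p>]]></m:explanation>
                <m:messageid>76</m:messageid>
                <m:message>element "foo" undefined</m:message>
            </m:error>
        </m:errorlist>
    </m:errors>
</m:markupvalidationresponse>
</env:Body>
</env:Envelope>"""


def list_messages(soap_text, kind="error"):
    """Return one dict per <m:error> (or <m:warning> when kind="warning")."""
    root = ET.fromstring(soap_text)
    messages = []
    # iter() matches the exact tag, so errors/errorlist/errorcount are skipped.
    for node in root.iter(M + kind):
        def field(name):
            return node.findtext(M + name, default="").strip()

        messages.append({
            "line": int(field("line")) if field("line") else None,
            "col": int(field("col")) if field("col") else None,
            "messageid": field("messageid"),
            "message": field("message"),
            # CDATA content arrives as plain text: an HTML fragment.
            "explanation": field("explanation"),
        })
    return messages
```

Because explanation and source are HTML fragments, a client that displays them should treat them as markup rather than plain text.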

Change Log

Up to version 0.2, all changes are backward-compatible.

v 0.2 (June 2007)
  • debug parameter now has an effect on both HTML and SOAP outputs
  • messageid is now implemented
  • added source and explanation elements.
v 0.1

Initial revision

Libraries

We encourage building libraries that interact with the validator's API. If you maintain such a library, contact us and we will list it here.

W3C has not reviewed, verified, or endorsed these implementations.

Known libraries for the W3C Markup Validator API

Using HTTP headers to know validation results

Every validation result is served via the HTTP protocol, with custom headers giving a simple, quick way to get validation results without having to parse the results body. This is a simple (but poorer) alternative to using the full API described above.

The HTTP headers for a validation results page will generally look like:

    HEAD 'http://validator.localhost/check?uri=http%3A%2F%2Fwww.w3.org'
    
    200 OK
    [...]
    Content-Language: en
    Content-Type: text/html; charset=utf-8
    X-W3C-Validator-Errors: 0
    X-W3C-Validator-Warnings: 0
    X-W3C-Validator-Recursion: 1
    X-W3C-Validator-Status: Valid

The headers and their values are as follows:

Header | Value | Notes
X-W3C-Validator-Status | Valid or Invalid if validation was performed. Abort if a fatal error (decoding, 404 not found, etc.) was encountered and validation could not be performed. | The Abort value was added in version 0.8.0.
X-W3C-Validator-Errors | Number of errors found during validation. 0 if no errors were found. | 0 does not necessarily mean "valid" (it may mean that validation could not be performed).
X-W3C-Validator-Warnings | Number of warnings found during validation. 0 if no warnings were found. | The warnings include validation warnings, as well as pre-parsing warnings (such as character encoding mismatch, doctype override, etc.).
X-W3C-Validator-Recursion | Integer. Generally 1. More if recursively validating validation results. | The validator uses this in conjunction with its Max Recursion setup to avoid abusive recursion (Denial of Service attack).
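A client relying on these headers might interpret them along the following lines. This is a sketch under stated assumptions: read_validation_headers is a hypothetical helper, and it expects a mapping of header names to values that the caller has already obtained from a HEAD or GET response.

```python
from typing import Mapping


def read_validation_headers(headers: Mapping[str, str]) -> dict:
    """Interpret the X-W3C-Validator-* headers of a validation response.

    A plain dict is case-sensitive, so pass header names spelled as shown in
    this document, or a case-insensitive mapping such as the `headers`
    attribute of a urllib response.
    """
    status = headers.get("X-W3C-Validator-Status")
    return {
        "status": status,
        "errors": int(headers.get("X-W3C-Validator-Errors", 0)),
        "warnings": int(headers.get("X-W3C-Validator-Warnings", 0)),
        # Decide on the status, not the error count: zero errors may also
        # mean that validation could not be performed (status Abort).
        "valid": status == "Valid",
    }
```

With Python's standard library, such a mapping can be obtained from urllib.request.urlopen(urllib.request.Request(url, method="HEAD")).headers, where url is a validator request URL.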