<?xml version="1.0" encoding="UTF-8"?><api:function-page xml:base="/apidoc/8.0/xdmp.pdfConvert.xml" generated="2015-10-07T16:36:00.016766-07:00" mode="javascript" xmlns:api="http://marklogic.com/rundmc/api"><api:function-name>xdmp.pdfConvert</api:function-name><api:suggest>xdmp.pdfconvert</api:suggest><api:suggest>xdmp</api:suggest><api:suggest>pdfconvert</api:suggest><api:function-link mode="xquery" fullname="xdmp:pdf-convert">/apidoc/8.0/xdmp:pdf-convert.xml</api:function-link><api:function mode="javascript" name="pdfConvert" type="builtin" lib="xdmp" category="Document Conversion" hidden="false" bucket="MarkLogic Built-In Functions" prefix="xdmp" namespace="http://marklogic.com/xdmp" fullname="xdmp.pdfConvert"><api:summary>
  Converts a PDF file to XHTML. Returns several nodes,
  including a parts node, the converted document xml node, and any
  other document parts (for example, css files and images).  The first
  node is the parts node, which contains a manifest of all of the parts
  generated as a result of the conversion.
</api:summary><api:params><api:param name="doc" type="node()" optional="false"><api:param-description>
  PDF document to convert to XHTML, as a binary node().
  </api:param-description><api:param-name>doc</api:param-name><api:param-type>Node</api:param-type></api:param><api:param name="filename" type="xs:string" optional="false"><api:param-description>
    The root for the name of the converted files and directories. If the
    specified filename includes an extension, then the extension is appended
    to the root with an underscore. The directory for other parts of the
    conversion (images, for example) has the string "_parts" appended to the
    root. For example, if you specify a filename of "myFile.pdf", the
    generated names will be "myFile_pdf.xhtml" for the xml node and
    "myFile_pdf_parts" for the directory containing any other parts
    generated by the conversion (images, css files, and so on).
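    <p xmlns="http://www.w3.org/1999/xhtml">The naming rule above can be sketched in plain JavaScript. The helper name <code>convertedNames</code> is hypothetical and is not part of the <code>xdmp</code> API. The sketch also assumes that only the last dot in the filename marks the extension, and that a filename without an extension is used as the root unchanged; neither assumption is stated by this page.</p>
<pre class="javascript" xmlns="http://www.w3.org/1999/xhtml">
```javascript
// Sketch of the documented naming rule; not the converter's own code.
// convertedNames is a hypothetical helper name.
function convertedNames(filename) {
  var dot = filename.lastIndexOf('.');
  // Assumption: only the last dot separates the root from the extension,
  // and a name without an extension is used as the root unchanged.
  var root = dot > 0
    ? filename.slice(0, dot) + '_' + filename.slice(dot + 1)
    : filename;
  return {
    document: root + '.xhtml',       // name of the converted XHTML node
    partsDirectory: root + '_parts'  // directory for images, CSS, and so on
  };
}

var names = convertedNames('myFile.pdf');
// names.document is 'myFile_pdf.xhtml'
// names.partsDirectory is 'myFile_pdf_parts'
```
</pre>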
  </api:param-description><api:param-name>filename</api:param-name><api:param-type>String</api:param-type></api:param><api:param name="options" type="(element()|map:map)?" optional="true"><api:param-description>

    The options 
                <span class="javascript" xmlns="http://www.w3.org/1999/xhtml">object</span>
    for this conversion. 
    The default value is <code xmlns="http://www.w3.org/1999/xhtml">
    <span class="javascript">null</span></code>.

    In addition to the options shown below, you can
    
    <span class="javascript" xmlns="http://www.w3.org/1999/xhtml">add <code>xdmp.tidy</code> options directly.</span>

    <p xmlns="http://www.w3.org/1999/xhtml">Options include:</p>
    <blockquote xmlns="http://www.w3.org/1999/xhtml"><dl>
    <dt><p>
    <span class="javascript"><code>tidy</code></span></p></dt>
    <dd>Default value: <code>true</code> <br/><br/>
    Specify <code>true</code> to run tidy on the document and
    <code>false</code> not to run tidy.

    If you run tidy, you can also specify any
    
    <a class="javascript" href="/xdmp.tidy">xdmp.tidy</a> options.
    
    </dd>
    <dt><p>
    <span class="javascript"><code>config</code></span></p></dt>
    <dd>The configuration file for the conversion. You can specify an
    absolute path or a relative path. The relative path is relative
    to the <code>&lt;install_dir&gt;/Converters/cvtpdf</code> directory.
    The default configuration file is named <code>PDFtoHTML.cfg</code>;
    it produces a single reflowed XHTML document with CSS styling. Setting
    this parameter may override the remaining options.</dd>
    <dt><p>
    <span class="javascript"><code>pageByPage</code></span></p></dt>
    <dd>Default value: <code>false</code><br/><br/>
    Specify <code>true</code> to select a different default configuration
    file that produces one XHTML document per page with absolute positioning.
    The default paged configuration file is named <code>PDFtoXHTML_pages.cfg</code>.
    If a specific configuration file is selected with the <code>config</code>
    option, the 
    <span class="javascript"><code>pageByPage</code></span> option has no effect.
    </dd>
    <dt><p>
    <span class="javascript"><code>pageStartId</code></span></p></dt>
    <dd>Default value: <code>0</code><br/><br/>
    The index of the first page to convert. Page indices start at zero.
    </dd>
    <dt><p>
    <span class="javascript"><code>pageEndId</code></span></p></dt>
    <dd>Default value: <code>-1</code><br/><br/>
    The index of the last page to convert. Page indices start at zero.
    The default is -1, meaning to convert through the last page of the
    document.
    </dd>
    <dt><p>
    <span class="javascript"><code>synthBookmarks</code></span></p></dt>
    <dd>Default value: <code>true</code><br/><br/>
    Enable/disable the converter's internal font-based inference of a table
    of contents (TOC).
    </dd>
    <dt><p>
    <span class="javascript"><code>imageOutput</code></span></p></dt>
    <dd>Default value: <code>true</code><br/><br/>
    Enable/disable extraction and conversion of images.
    </dd>
    <dt><p>
    <span class="javascript"><code>textOutput</code></span></p></dt>
    <dd>Default value: <code>true</code><br/><br/>
    Enable/disable extraction of text.
    </dd>
    <dt><p>
    <span class="javascript"><code>zones</code></span></p></dt>
    <dd>Default value: <code>false</code><br/><br/>
    Enable/disable zone controls. Using <code>true</code> produces better
    results when the PDF is annotated; using <code>false</code> produces
    better results in non-annotated tables.
    </dd>
    <dt><p>
    <span class="javascript"><code>ignoreText</code></span></p></dt>
    <dd>Default value: <code>true</code><br/><br/>
    Enable/disable extraction of text from images. Documents consisting of
    scanned pages can only have text extracted if this parameter is set to
    <code>true</code>; however, diagrams with embedded text labels may
    convert less cleanly. For page-by-page conversion, reflowing text and
    graphical elements within a diagram is less of a problem, so
    <code>false</code> will probably be the better choice.
    </dd>
    <dt><p>
    <span class="javascript"><code>removeOverprint</code></span></p></dt>
    <dd>Default value: <code>false</code><br/><br/>
    Enable/disable removal of text overlays. Setting this parameter to
    <code>true</code> can sometimes clean up messy results stemming from
    reflowing of text that was not visible in the original PDF because it
    was covered by something else.
    </dd>
    <dt><p>
    <span class="javascript"><code>illustrations</code></span></p></dt>
    <dd>Default value: <code>true</code><br/><br/>
    Enable/disable extraction of illustrations. Setting this parameter to
    <code>false</code> can sometimes clean up messy results stemming from
    minor and unnecessary graphical ornaments.
    </dd>
    <dt><p>
    <span class="javascript"><code>imageQuality</code></span></p></dt>
    <dd>Default value: <code>75</code><br/><br/>
    Determines the quality of extracted and converted images: smaller values
    mean smaller image sizes (in bytes) but lossier rendering. The maximum is
    100.
    </dd>
    <dt><p>
    <span class="javascript"><code>pageStart</code></span></p></dt>
    <dd>Default value: <b>none</b><br/><br/>
    Boilerplate text inserted at the start of every page. Any XML markup
    must be escaped. For example: <code>&lt;p&gt;PAGE START&lt;/p&gt;</code>
    </dd>
    <dt><p>
    <span class="javascript"><code>pageEnd</code></span></p></dt>
    <dd>Default value: <b>none</b><br/><br/>
    Boilerplate text inserted at the end of every page. XML markup must be
    escaped.
    </dd>
    <dt><p>
    <span class="javascript"><code>documentStart</code></span></p></dt>
    <dd>Default value: <b>none</b><br/><br/>
    Boilerplate text inserted at the start of every document. XML markup
    must be escaped.
    </dd>
    <dt><p>
    <span class="javascript"><code>documentEnd</code></span></p></dt>
    <dd>Default value: <b>none</b><br/><br/>
    Boilerplate text inserted at the end of every document. XML markup must
    be escaped.
    </dd>
    <dt><p>
    <span class="javascript"><code>password</code></span></p></dt>
    <dd>Default value: <b>none</b><br/><br/>
    The password required to open a password-protected PDF.
    </dd>
    <dt><p>Sample Options Object:</p></dt>
    <dd>The following sample options object runs tidy on the generated
    HTML, sets the tidy "clean" option, and specifies a particular
    configuration file to use for the conversion:
    
<pre class="javascript">
{
  'tidy': true,
  'clean': 'yes',
  'config': 'c:\\myConfigFile.cfg'
}
</pre>
</dd>
    </dl>
    </blockquote>
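    <p xmlns="http://www.w3.org/1999/xhtml">As a minimal sketch of how the page-range options fit together, the helper below turns one-based page numbers into the zero-based <code>pageStartId</code> and <code>pageEndId</code> values. The name <code>pageRangeOptions</code> is hypothetical and is not part of the <code>xdmp</code> API, and the sketch assumes that <code>pageEndId</code> is inclusive, as "the index of the last page to convert" suggests.</p>
<pre class="javascript" xmlns="http://www.w3.org/1999/xhtml">
```javascript
// Hypothetical helper, not part of the xdmp API: builds an options object
// for converting a range of pages given one-based page numbers.
function pageRangeOptions(firstPage, lastPage) {
  var options = {
    pageStartId: firstPage - 1  // page indices start at zero
  };
  // When no last page is given, leave pageEndId unset so the documented
  // default of -1 (through the last page) applies.
  if (lastPage !== undefined) {
    options.pageEndId = lastPage - 1;  // assumed to be inclusive
  }
  return options;
}

var rangeOptions = pageRangeOptions(2, 4);
// rangeOptions is { pageStartId: 1, pageEndId: 3 }
// In MarkLogic, the object would be passed as the third argument:
//   xdmp.pdfConvert(doc, "myFile.pdf", rangeOptions)
```
</pre>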

  </api:param-description><api:param-name>options</api:param-name><api:param-type>Object?</api:param-type></api:param></api:params><api:return>ValueIterator</api:return><api:usage>
  The convert functions return several nodes.  The first node is a manifest
  containing the various parts of the conversion. Typically there will be
  an xml part, a css part, and some image parts.  Each part is returned as
  a separate node in the order shown in the manifest.
  <p xmlns="http://www.w3.org/1999/xhtml">Therefore, given the following manifest: </p>
  <pre xmlns="http://www.w3.org/1999/xhtml">
&lt;parts&gt;
  &lt;part&gt;myFile_pdf.xhtml&lt;/part&gt;
  &lt;part&gt;myFile_pdf_parts/conv.css&lt;/part&gt;
  &lt;part&gt;myFile_pdf_parts/toc.xml&lt;/part&gt;
&lt;/parts&gt;
</pre>
  <p xmlns="http://www.w3.org/1999/xhtml">the first node returned is the manifest, the second is the
  "myFile_pdf.xhtml" node, the third is the "myFile_pdf_parts/conv.css" node,
  and the fourth is the "myFile_pdf_parts/toc.xml" node.</p>
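  <p xmlns="http://www.w3.org/1999/xhtml">Because the parts come back in manifest order, each part name can be paired with the node in the same position. The following is a sketch only: <code>pairParts</code> is a hypothetical helper, the part names are assumed to have been read from the manifest already, and the iterator is assumed to return objects with <code>value</code> and <code>done</code> properties from <code>next()</code>. An array iterator stands in for the conversion results so that the sketch runs outside MarkLogic.</p>
<pre class="javascript" xmlns="http://www.w3.org/1999/xhtml">
```javascript
// Hypothetical helper: pairs each part name from the manifest with the node
// returned in the same position. nodes is any iterator whose next() returns
// { value, done }, already advanced past the manifest node.
function pairParts(partNames, nodes) {
  var parts = {};
  partNames.forEach(function (name) {
    var item = nodes.next();
    if (!item.done) {
      parts[name] = item.value;
    }
  });
  return parts;
}

// Stand-in data; in MarkLogic the nodes would come from xdmp.pdfConvert.
var partNames = ['myFile_pdf.xhtml', 'myFile_pdf_parts/conv.css'];
var partNodes = ['xhtml node', 'css node'][Symbol.iterator]();
var parts = pairParts(partNames, partNodes);
// parts['myFile_pdf_parts/conv.css'] is 'css node'
```
</pre>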

</api:usage><api:example class="javascript"><pre xml:space="preserve" xmlns="http://www.w3.org/1999/xhtml">
var results = xdmp.pdfConvert(
                xdmp.documentGet("/space/Hello.pdf"),
                "Hello.pdf");
var manifest = results.next().value;
var pdfAsXHTML = results.next().value;
pdfAsXHTML;

=&gt; The PDF document converted to XHTML.  The results variable
   is a ValueIterator, where the first item is the manifest, and the
   remaining items are the converted nodes.
</pre></api:example></api:function></api:function-page>