Conference XML-V 1.2

Uploaded by: srirangam-vamshikrishna

Posted on 10-Feb-2018


  • 7/22/2019 Confernece XML-V 1.2

    1/22

    ` PDF 2 XML Conversion Work Instructions for Conference

    Confidential Aptara Proprietary Version 1.0.0.2 19 March13 Page 1 of 22

    Input: PDF and Meta XML file

    Output: XML, HTML, images

    Process

    1. Convert the extracted TBXML to XML using the conversion script. The script performs:

       a. Conversion of TBXML to XML based on the conf-jats1.dtd DTD (version 1.5)

       b. Injection of TeX tagging from the server

       c. Renaming of extracted images as per specifications

    2. Manual review and fixing: the points for manual review and fixing are listed below.

    3. Validation: validate the XML files against the DTD rules, and validate the TeX files against the VTeX rules.

    4. QA checks: run the QA script and fix the errors it logs, if any.

    Source Input Folder Structure

    +Year
      +Publication Number
        +Issue Number
          +Article Number
            PDF/XML/Images


    Manual Review and Fixing

    1. Sequence of the XML declaration and DOCTYPE declaration lines

    Example tagging

    RULES

    The word utf in the encoding declaration must be in lower case

    The DTD version may vary according to the current DTD provided by IEEE
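    The two prolog rules can be checked mechanically. A minimal sketch, assuming the root element is conf-article (item 2) and that the DOCTYPE line starts with "<!DOCTYPE conf-article"; the public identifier and DTD version are deliberately not checked because they follow the current IEEE DTD. The helper name check_prolog is hypothetical.

```python
import re

# Hypothetical checker: only the XML declaration and the DOCTYPE root name are verified;
# the DTD identifiers are skipped because the DTD version may vary.
XML_DECL = re.compile(r'^<\?xml version="1\.0" encoding="(?P<enc>[^"]+)"\?>$')


def check_prolog(xml_text: str) -> list:
    """Return a list of problems found in the XML and DOCTYPE declaration lines."""
    lines = xml_text.splitlines()
    if len(lines) < 2:
        return ["expected an XML declaration followed by a DOCTYPE declaration"]
    problems = []
    match = XML_DECL.match(lines[0].strip())
    if match is None:
        problems.append("line 1 is not an XML declaration")
    elif match.group("enc") != match.group("enc").lower():
        problems.append("encoding name (utf) must be lower case")
    if not lines[1].strip().startswith("<!DOCTYPE conf-article"):
        problems.append("line 2 must be the DOCTYPE declaration for conf-article")
    return problems


good = '<?xml version="1.0" encoding="utf-8"?>\n<!DOCTYPE conf-article SYSTEM "conf-jats1.dtd">'
print(check_prolog(good))  # []
```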

    2. Element conf-article

    Example tagging

    RULES

    Validate the content-type attribute value against the XML metadata element

    The article-type attribute may vary according to the content of the article

    Capture only the attributes defined in the example. No attribute may be added or left out in the final file

    Validate the open-access attribute value against the Meta XML file:

    a. If the element value is F, open-access="no"

    b. If the element value is T, open-access="yes"
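    The F/T rule above is a plain lookup. A minimal sketch, assuming the flag has already been read from the Meta XML as a string (the Meta XML element name is not given here); open_access_value is a hypothetical helper.

```python
# Meta XML flag value -> open-access attribute value in the output XML.
OPEN_ACCESS = {"F": "no", "T": "yes"}


def open_access_value(meta_flag: str) -> str:
    """Map the Meta XML open-access flag (F or T) to the output attribute value."""
    flag = meta_flag.strip()
    if flag not in OPEN_ACCESS:
        # Anything other than F/T is not covered by the rule, so stop for manual review.
        raise ValueError(f"unexpected open-access flag in Meta XML: {meta_flag!r}")
    return OPEN_ACCESS[flag]


print(f'open-access="{open_access_value("T")}"')  # open-access="yes"
```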

    3. Comment Tags

    Example tagging

    RULES

    Delivery Date: should be the upload date

    XML Script: should be the version of the current TBXML 2 XML conversion script

    Batch: should be generated as per the guidelines

    4. Conference Acronym

    Example tagging

    CIPS

    RULES

    Must always be in capital letters

    Should be captured from the Meta XML, from the tag below

    CIPS


    5. Conference Full Title

    Example tagging

    2012 7th International Conference on Integrated Power Electronics Systems (CIPS)

    RULES

    Should be captured from the Meta XML, from the tag below

    (CIPS)]]>

    Preserve the letter case as it appears in the Meta XML

    6. Conference Normalized Title

    Example tagging

    Integrated Power Electronics Systems (CIPS), 2012 7th International Conference

    RULES

    Should be captured from the Meta XML, from the tag below

    on]]>

    Do not capture the word "on" when it appears at the end of the title

    Preserve the letter case as it appears in the Meta XML
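    A sketch of the "drop the trailing on" rule, assuming the title text has already been taken out of the CDATA section of the Meta XML; the helper name is hypothetical. Only a final standalone "on" is removed, so titles ending in words such as "Automation" are left alone, and letter case is not touched.

```python
import re


def normalized_title(meta_title: str) -> str:
    """Remove a trailing standalone word "on"; letter case stays exactly as in the Meta XML."""
    return re.sub(r"\s+on$", "", meta_title.strip())


meta = "Integrated Power Electronics Systems (CIPS), 2012 7th International Conference on"
print(normalized_title(meta))
# Integrated Power Electronics Systems (CIPS), 2012 7th International Conference
```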

    7. Volume

    Example tagging

    1

    RULES

    Capture from the Meta XML file

    If a dummy tag appears in the Meta XML, capture the default value 1

    8. ISBN

    Example tagging

    978-3-8007-3414-6

    RULES

    Capture from the Meta XML file

    Capture the mediatype attribute from the Meta XML as the content-type attribute in the output XML file

    Do not convert the isbntype attribute, if it appears in the Meta XML

    Capture every ISBN that appears in the Meta XML in the output XML file

    If an ISBN is duplicated with the same element value and the same attribute value, capture it only once

    If two ISBNs have the same element value but different attribute values, or vice versa, capture them as two elements in the output XML file

    Do not capture dummy markup, if it appears in the Meta XML file
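    The ISBN de-duplication rules can be sketched as follows, assuming each Meta XML ISBN has already been read as a (value, mediatype) pair with isbntype dropped. Treating an empty element as the "dummy markup" is an assumption, and the mediatype values "Paper" and "CD" are illustrative only; collect_isbns is a hypothetical helper.

```python
from typing import Iterable, List, Tuple

Isbn = Tuple[str, str]  # (element value, mediatype attribute from the Meta XML)


def collect_isbns(meta_isbns: Iterable[Isbn]) -> List[Isbn]:
    """Return the (ISBN, content-type) pairs to write to the output XML, in Meta XML order."""
    seen = set()
    result = []
    for value, mediatype in meta_isbns:
        value = value.strip()
        if not value:  # assumption: an empty element is treated as dummy markup
            continue
        key = (value, mediatype)
        if key in seen:  # same value and same attribute: capture only once
            continue
        seen.add(key)
        result.append(key)  # a different value or attribute gives a separate element
    return result


pairs = [("978-3-8007-3414-6", "Paper"), ("978-3-8007-3414-6", "Paper"),
         ("978-3-8007-3414-6", "CD"), ("", "Paper")]
print(collect_isbns(pairs))
# [('978-3-8007-3414-6', 'Paper'), ('978-3-8007-3414-6', 'CD')]
```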


    9. ISSN

    Example tagging

    1530-1591

    RULES

    Capture from the Meta XML file

    Capture the mediatype attribute from the Meta XML as the content-type attribute in the output XML file

    Do not convert the issntype attribute, if it appears in the Meta XML

    Capture every ISSN that appears in the Meta XML in the output XML file

    If a duplicate ISSN appears:

    Case 1: If duplicate ISSN values appear with the same attribute value, capture the ISSN only once

    Case 2: If duplicate ISSN values appear with different attribute values, use only the ISSN with mediatype "Paper" and omit the others

    Do not capture dummy markup, if it appears in the Meta XML file
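    The two duplicate-ISSN cases can be sketched like this, again assuming (value, mediatype) pairs read from the Meta XML. The rules do not say what to do when duplicates have different attributes but none is "Paper"; keeping the first one is an assumption, as is treating an empty element as dummy markup. The ISSN 1234-5678 and the helper name are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Issn = Tuple[str, str]  # (element value, mediatype attribute from the Meta XML)


def collect_issns(meta_issns: Iterable[Issn]) -> List[Issn]:
    """Apply the duplicate-ISSN cases and return (ISSN, content-type) pairs."""
    by_value: Dict[str, List[str]] = {}  # ISSN -> mediatypes, in Meta XML order
    for value, mediatype in meta_issns:
        value = value.strip()
        if not value:  # assumption: an empty element is treated as dummy markup
            continue
        types = by_value.setdefault(value, [])
        if mediatype not in types:  # Case 1: same value, same attribute -> once
            types.append(mediatype)
    result = []
    for value, types in by_value.items():
        if len(types) > 1 and "Paper" in types:
            result.append((value, "Paper"))  # Case 2: keep only the "Paper" ISSN
        else:
            # Single entry, or duplicates without "Paper" (not covered by the
            # rules; keeping the first one is an assumption).
            result.append((value, types[0]))
    return result


issns = [("1530-1591", "Paper"), ("1530-1591", "Paper"),
         ("1530-1591", "Online"), ("1234-5678", "Online")]
print(collect_issns(issns))
# [('1530-1591', 'Paper'), ('1234-5678', 'Online')]
```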

    10. Conference Name

    Example tagging

    2012 Design, Automation & Test in Europe Conference & Exhibition (DATE 2012)

    RULES

    Should be captured from the Meta XML element conftitle

    Preserve the letter case as in the Meta XML file

    11. Conference Start Date

    Example tagging

    12 March 2012

    RULES

    Should be captured from Meta XML from element

    Capture the value as it appears in the Meta XML file. Do not convert the month value to a number

    Generate the iso-8601-date attribute in the format YYYY-MM-DD

    There is no need to add a leading 0 to a single-digit day or month
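    The leading-zero note can be read two ways. The sketch below assumes it refers to the day and month element content (copied as in the Meta XML, no 0 added), while the iso-8601-date attribute keeps the stated YYYY-MM-DD form. It also assumes full English month names and JATS-style day/month/year element names; conference_date is a hypothetical helper.

```python
MONTHS = ("january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december")


def conference_date(year: str, month: str, day: str):
    """Return (element values, iso-8601-date) for one Meta XML date.

    The element values are copied as they appear in the Meta XML (the month stays
    a name and no leading 0 is added to the day); only the attribute is numeric.
    """
    month_number = MONTHS.index(month.strip().lower()) + 1  # ValueError if not a month name
    elements = {"day": day.strip(), "month": month.strip(), "year": year.strip()}
    iso = f"{int(year):04d}-{month_number:02d}-{int(day):02d}"
    return elements, iso


print(conference_date("2012", "March", "12"))
# ({'day': '12', 'month': 'March', 'year': '2012'}, '2012-03-12')
```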


    12. Conference End Date

    Example tagging

    16 March 2012

    RULES

    Should be captured from Meta XML from element

    Capture the value as it appears in the Meta XML file. Do not convert the month value to a number

    Generate the iso-8601-date attribute in the format YYYY-MM-DD

    13. Conference Location

    Example tagging

    Nuremberg, Germany

    RULES

    Should be captured from Meta XML from element

    Mark up the content as city, state, or country

    14. Xplore Article ID

    Example tagging

    6176423

    RULES

    Should be captured from Meta XML from element

    15. Xplore Issue

    Example tagging

    6176405

    RULES

    Should be captured from Meta XML from element

    16. Xplore Publication ID

    Example tagging

    6171057

    RULES

    Should be captured from Meta XML from element


    17. Article Title

    Example tagging

    Automated Generation of Directed Tests for Transition Coverage in Cache Coherence Protocols

    RULES

    Should be captured from the PDF file

    If a subtitle appears, it should be captured as

    Preserve the letter case as in the PDF file

    18. Authors and Affiliation

    Example tagging

    Qin Xiaoke

    [email protected]

    Mishra Prabhat

    [email protected]

    Computer and Information Science and Engineering

    University of Florida

    USA

    RULES

    Author name , , , and sequence should be as per the PDF file

    Tag multiple authors in individual elements

    If corresponding-author information is available, add the attribute corresp="yes" to the element .

    There is no need to mark corresp="no" on the remaining authors

    If the article has only one author, set corresp="yes"

    If primary-author information is available, add the attribute primary="yes" to the element . There is no need to mark primary="no" on the remaining authors

    Capture personal information of authors, such as email addresses, in element

    Do not capture the word Email

    If the biography of a particular author appears in the article, provide element

    If affiliation links appear as superscript in the PDF, capture them as superscript; otherwise capture them as PCDATA


    19. Affiliation Link

    Example tagging

    1

    RULES

    Each author must have element if an affiliation appears

    The element value of should be as per the PDF file

    If content such as 1 dagger appears as superscript, capture it in element

    If no label appears in the PDF file, capture it as a dummy element like

    20. Affiliations

    1

    Dept. of High-Frequency Electronics, University of Paderborn

    Warburgerstr. 100, D-33098

    Paderborn

    Germany

    RULES

    Capture each unique affiliation in an individual element

    Capture the affiliation label, if it appears in the PDF file

    Mark up the content as institution, addr-line, city, state, and country

    If two items of the same markup type appear on two lines (for example, two institutes in one affiliation), mark them up in two separate institution elements. Do not merge them into one element separated by a comma

    21. Publication Date

    Example tagging

    March 2012

    RULES

    Capture from the Meta XML, from the element

    If the day has the value 0, do not capture it in the output XML file

    Capture the month value as in the Meta file. Do not convert it to a number

    Generate the iso-8601-date attribute in the format YYYY-MM-DD

    22. First and Last Page

    Example tagging

    12

    RULES

    Should be captured from the Meta XML file, from the element artpagenums

    The startpage attribute should be captured as

    The endpage attribute should be captured as
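    Reading the two attributes is straightforward with the standard library. A sketch, assuming the Meta XML contains an artpagenums element with startpage and endpage attributes; the wrapper element in the sample and the JATS-style fpage/lpage output names are assumptions, since the output element names were not preserved here.

```python
import xml.etree.ElementTree as ET


def page_range(meta_xml: str):
    """Return (startpage, endpage) from the first artpagenums element of the Meta XML."""
    root = ET.fromstring(meta_xml)
    node = root if root.tag == "artpagenums" else root.find(".//artpagenums")
    if node is None:
        raise ValueError("no artpagenums element in the Meta XML")
    return node.get("startpage"), node.get("endpage")


sample = '<article><artpagenums startpage="1" endpage="2"/></article>'
start, end = page_range(sample)
print(f"<fpage>{start}</fpage><lpage>{end}</lpage>")  # <fpage>1</fpage><lpage>2</lpage>
```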


    23. Copyright Statement

    Example tagging

    ISBN 978-3-8007-3414-6 VDE VERLAG GMBH Berlin Offenbach, Germany

    RULES

    Should be captured from the PDF file

    Capture the complete statement (including ISBN and location)

    If it appears as a footnote, capture it both as the copyright statement and as a footnote

    24. Copyright Year

    Example tagging

    2012

    RULES

    Should be captured from meta xml file from element

    25. Copyright Holder

    Example tagging

    VDE VERLAG GmbH

    RULES

    Should be captured from meta xml file from element

    Generate the copyright-owner attribute as per the specifications below:

        Crown      Crown copyright
        IBM        The IBM corporation
        IEEE       IEEE
        NA         Not applicable
        Other      Copyright holder is not one of the other named values.
        Unknown    Copyright holder is not known.
        USGov      United States Government

    26. Abstract

    Example tagging

    Abstract

    RULES

    A. If the abstract appears in the PDF file: capture it from the PDF file


    B. If the abstract does NOT appear in the PDF file:

    Case 1: If the first paragraph of the PDF is the same as the abstract in the Meta XML

    Capture the content from the Meta XML as the abstract section, and also as the first paragraph of the body section

    Case 2: If the first paragraph of the PDF is not the same as the abstract in the Meta XML

    Capture the content from the Meta XML as the abstract section

    Case 3: If the body section does not start with a paragraph, but an abstract appears in the Meta XML

    Capture the content from the Meta XML as the abstract section

    Case 4: If the abstract appears in neither the PDF nor the Meta XML

    Capture the first paragraph (wherever it appears in the PDF) as the abstract

    Always capture the xml:lang attribute

    Capture the title, if it appears in the PDF file
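    The abstract rules form a small decision table. A minimal sketch, assuming the caller has already extracted the PDF abstract, the Meta XML abstract, the paragraph that opens the body (None when the body does not start with a paragraph) and the first paragraph anywhere in the PDF. Treating "same" as equal after collapsing whitespace is an assumption; choose_abstract is a hypothetical helper.

```python
from typing import Optional, Tuple


def _norm(text: str) -> str:
    """Collapse whitespace so that line-break differences do not block a match."""
    return " ".join(text.split())


def choose_abstract(pdf_abstract: Optional[str], meta_abstract: Optional[str],
                    opening_paragraph: Optional[str],
                    first_paragraph: Optional[str]) -> Tuple[str, str]:
    """Return (case, abstract text) following rule A and rule B, Cases 1-4."""
    if pdf_abstract:
        return "A", pdf_abstract
    if meta_abstract:
        if opening_paragraph is None:
            return "B3", meta_abstract
        if _norm(opening_paragraph) == _norm(meta_abstract):
            return "B1", meta_abstract  # also keep it as the first body paragraph
        return "B2", meta_abstract
    if first_paragraph:
        return "B4", first_paragraph
    raise ValueError("no abstract source found in the PDF or the Meta XML")


print(choose_abstract(None, "Same text", "Same\ntext", "Same\ntext"))  # ('B1', 'Same text')
```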

    27. Keywords

    Example tagging

    thermomechanical treatment

    Ag

    thermo mechanical reliability

    low-temperature low-pressure die bonding

    Microassembly

    Bonding

    RULES

    a. Capture keywords from both the PDF and the Meta XML file

    b. Capture keywords from the source PDF file with the attribute author

    c. Keywords that match exactly between the PDF and the Meta XML file should be captured only once in the output XML file

    d. Delete duplicate keywords from the bottom, keeping the first occurrence
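    Rules c and d amount to an order-preserving de-duplication. A sketch, assuming the PDF (author) keywords are listed before the Meta XML keywords and that "exact match" is case-sensitive; the "meta" source label is a hypothetical stand-in for however the Meta XML keywords are grouped.

```python
from typing import Iterable, List, Tuple


def merge_keywords(pdf_keywords: Iterable[str],
                   meta_keywords: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (keyword, source) pairs; PDF keywords come first and carry the author source."""
    seen = set()
    merged = []
    for source, keywords in (("author", pdf_keywords), ("meta", meta_keywords)):
        for keyword in keywords:
            keyword = keyword.strip()
            # Exact (case-sensitive) match only; a later duplicate is dropped,
            # i.e. duplicates are deleted from the bottom.
            if keyword and keyword not in seen:
                seen.add(keyword)
                merged.append((keyword, source))
    return merged


print(merge_keywords(["Ag", "Bonding"], ["Bonding", "bonding", "Microassembly"]))
# [('Ag', 'author'), ('Bonding', 'author'), ('bonding', 'meta'), ('Microassembly', 'meta')]
```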

    28. Counts

    Count of display equations (sum of numbered and unnumbered display equations)


    Count of figures (sum of numbered and unnumbered figures)

    Count of total pages

    Count of total references

    Count of tables (sum of numbered and unnumbered display tables)

    Count of total words in XML file
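    Most of these counts can be taken from the finished XML. A sketch, assuming JATS-style element names (disp-formula, fig, table-wrap, ref) that should be checked against the conf-jats DTD in use; the word count is a naive whitespace split over all text, including any TeX inside equations. The page count is not derived here and would come from the first and last page values.

```python
import xml.etree.ElementTree as ET

# Assumed JATS-style element names; check them against the conf-jats DTD in use.
COUNTED = {"display equations": "disp-formula", "figures": "fig",
           "tables": "table-wrap", "references": "ref"}


def article_counts(xml_text: str) -> dict:
    """Count selected elements and the whitespace-separated words in an output XML file."""
    root = ET.fromstring(xml_text)
    counts = {name: len(root.findall(f".//{tag}")) for name, tag in COUNTED.items()}
    counts["words"] = len(" ".join(root.itertext()).split())
    return counts


sample = ("<article><body><p>One two three.</p><disp-formula>x</disp-formula>"
          "<fig><caption><p>Cap</p></caption></fig></body>"
          "<back><ref-list><ref>A</ref><ref>B</ref></ref-list></back></article>")
print(article_counts(sample))
```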

    29. Funding Statement

    Example tagging

    National Science and Technology support for Eleventh Five-Year Plan key topics

    2006BAJ18B01

    National Basic Research Program of China

    2006CB705507

    RULES

    Mark multiple award IDs in separate elements

    If the acknowledgement section contains words like funded or supported, that line must also be marked as a funding statement (along with the award ID and funding source)

    30. Section

    Example tagging

    1 Introduction

    RULES

    Label should be captured in element

    Capture the title as per the PDF file

    Keep the section hierarchy as per the PDF file

    If all section titles at a level (for example, level 3) appear in italic, there is no need to mark the italics

    If a title appears in small caps, mark the content as small caps

    Introduction of Programming


    31. Algorithm

    Example tagging

    RULES

    Should be captured as Image

    The naming convention should be alg

    The attribute rule with the value both should be captured in element

    32. Figure

    Example tagging

    Fig. 1

    Die bonding shear strength of dummy chips bonded onto a Cu substrate metalized with Ag as a function of bonding temperature.

    RULES

    All numbered graphics should be renamed as per the specifications

    Place the figure at the end of the paragraph in which it appears

    Preserve the letter case of the caption as in the PDF file

    Label should be captured in element as per the PDF file

    Name unnumbered figures as graphic, for example 6170664-graphic-1-source.tif. Only the ID and xlink:href attributes should appear
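    The unnumbered-figure naming rule can be sketched with ElementTree. The file-name pattern and .tif extension come from the example above and the id pattern graphic1 from the ID table at the end of this document; the element name graphic and the helper unnumbered_graphic are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK)


def unnumbered_graphic(article_id: str, index: int) -> ET.Element:
    """Build a graphic element for the index-th unnumbered figure of an article."""
    if index < 1:
        raise ValueError("graphic numbering starts at 1")
    file_name = f"{article_id}-graphic-{index}-source.tif"
    # Only the id and xlink:href attributes are set, as the rule requires.
    return ET.Element("graphic", {"id": f"graphic{index}", f"{{{XLINK}}}href": file_name})


print(ET.tostring(unnumbered_graphic("6170664", 1), encoding="unicode"))
```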

    33. Tables


    Example tagging

    TABLE III. LOCATION

    Where was the questionnaire filled?

           Location       Frequency   Valid %   Cumulative %
    Valid  Sandton               39      18.7           18.7
           Durban                39      18.7           37.3
           Polokwane             10       4.8           42.1
           Kimberly              14       6.7           48.8
           Zululand              46      22.0           70.8
           White River           38      18.2           89.0
           Auckland Park         23      11.0          100.0
           Total                209     100.0

    Note: The table content is to be tagged as text once the specifications are finalized

    RULES

    1. Capture as text in the XML and as an image in the HTML

    2. Insert the attribute values @cellpadding="5", @frame="box", and @rules="all" on element

    3. Link references, figures, and tables, if they appear

    4. Use for any footer text that appears below the

    5. No alignment is required

    6. No cell shading or coloring should be captured

    Need to mark for emphasis in and as appearing in PDF file

    Place the table at the end of the paragraph in which it appears

    Preserve the letter case of the caption as in the PDF file

    Label should be captured in element as per the PDF file

    If content appears in tabular format but has no label, title, caption, or table headings (column heads), capture it in element

    Example Tagging

    U.S. Patent Documents

    7010440   Mar. 2006

    Lillis et al.
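    Rules 2, 5 and 6 for tables can be applied in one pass over the output XML. A sketch, assuming the tabular content sits in table/tr/td/th elements; which cell attributes count as "alignment" or "shading" (align, valign, bgcolor, style) is an assumption to adjust to the DTD. normalize_tables is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

TABLE_ATTRS = {"cellpadding": "5", "frame": "box", "rules": "all"}
# Assumption: cell attributes that carry alignment or shading.
DROPPED_CELL_ATTRS = ("align", "valign", "bgcolor", "style")


def normalize_tables(root: ET.Element) -> int:
    """Set the required table attributes and strip cell alignment/shading; return tables updated."""
    updated = 0
    for table in root.iter("table"):
        for name, value in TABLE_ATTRS.items():
            table.set(name, value)
        for cell in table.iter():
            if cell.tag in ("td", "th"):
                for name in DROPPED_CELL_ATTRS:
                    cell.attrib.pop(name, None)
        updated += 1
    return updated


doc = ET.fromstring('<table-wrap><table><tr><td align="center" bgcolor="#ccc">39</td></tr></table></table-wrap>')
print(normalize_tables(doc), ET.tostring(doc, encoding="unicode"))
```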


    34. Inline Equations

    Example tagging

    $v1$

    RULES

    All inline math content (including Greek entities) in running text should be marked as math according to the VTeX rules

    Chemical equations should be captured as text

    Enclose inline math in single $ delimiters

    35. Display Equations

    Example tagging

    $$M=r_{H}\cdot X(t)+r_{L}\cdot[D-X(t)]\eqno{\hbox{(1)}}$$

    RULES

    All display math content in running text should be marked as math according to the VTeX rules

    Enclose display math in double $$ delimiters

    A numbered equation should have an ID like deqn1

    If multiple numbered equations are tagged as a single equation, provide the ID as a range, e.g., deqn3-6 (here, equations 3 through 6 are tagged as a single equation)

    Do not provide an ID for an unnumbered display equation

    Example tagging

    $$ {\BBP}_{m}(y)=2^{-\bar {d}_{B}(H({\tilde {Y}}\vert \bar {X})+D(\bar {X} {\tilde {Y}}\vert\vert \bar {X}Y))} $$

    and

    $$ {\BBP}_{\star}(y)=2^{-\bar {d}_{B}(H({\tilde {Y}})+D({\tilde {Y}}\Vert Y_{\star}))}. $$


    Hence,

    $$ \eqalignno{ {\BBP}_{\star}(y)& ={\BBP}_{m}(y)2^{-\bar {d}_{B}(D(\bar {X} {\tilde {Y}}\Vert \bar {X}Y_{\star})-D(\bar {X} {\tilde {Y}}\Vert \bar {X}Y))}\cr & ={\BBP}_{m}(y)2^{-\bar {d}_{B}(D(XY\Vert XY_{\star})+o(1))}} $$
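    The display-equation ID rules (deqn1, a range such as deqn3-6 for combined equations, no ID when unnumbered) can be sketched as follows. The requirement that combined numbers be consecutive is an assumption; display_equation_id is a hypothetical helper.

```python
from typing import Iterable, Optional


def display_equation_id(numbers: Iterable[int]) -> Optional[str]:
    """Return the id for a display equation given the equation numbers it carries."""
    nums = sorted(set(numbers))
    if not nums:
        return None  # unnumbered display equation: no id
    if nums != list(range(nums[0], nums[-1] + 1)):
        # Assumption: a combined equation covers a consecutive run of numbers.
        raise ValueError(f"equation numbers are not consecutive: {nums}")
    if len(nums) == 1:
        return f"deqn{nums[0]}"
    return f"deqn{nums[0]}-{nums[-1]}"


print(display_equation_id([1]), display_equation_id([3, 4, 5, 6]), display_equation_id([]))
# deqn1 deqn3-6 None
```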

    36. Sidebars

    Boxed Text/Sidebar

    There are many transitions that start and end in the same state.

    RULES

    If the text is linked from the running body matter, provide an ID for the element

    37. Footnotes

    Example tagging

    There are many transitions that start and end in the same state.

    RULES

    Should be captured at its callout

    If a label appears, capture it in

    If the copyright statement appears as a footnote (with a footnote symbol), capture it both as a footnote and as the copyright statement

    38. Emphasis Markup

    Bold

    Italic

    Monospace Text (Typewriter Text)

    Small Caps

    Underline

    39. List

    Ordered List

    Example tagging

    1.

    We propose a dynamic migration policy, that decides at run-time.

    2.

    The DSR architecture uses the traditional least recently used (LRU).


    We propose a dynamic migration policy, that decides at run-time.

    The DSR architecture uses the traditional least recently used (LRU).

    1.

    Poodles

    2.

    Persian Cats

    3.

    Weaver Finches

    RULES

    All numeric lists should be marked as

    Capture all labels in element (e.g., 1., 1), (1), a, A., I)

    For ordered lists with prefix labels such as Step 1, insert the attribute prefix-word

    If a list continues, generate the attribute continued-from

    The list-type attribute value can be one of the following:

    order          Ordered list. The prefix character is a number or a letter, depending on style.
    bullet         Unordered or bulleted list. The prefix character is a bullet or dash.
    alpha-lower    Ordered list. The prefix character is a lowercase alphabetical character.
    alpha-upper    Ordered list. The prefix character is an uppercase alphabetical character.
    roman-lower    Ordered list. The prefix character is a lowercase roman numeral.
    roman-upper    Ordered list. The prefix character is an uppercase roman numeral.
    simple         Simple or plain list (no prefix character before each item).
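    Choosing list-type from the printed labels can be partly automated. A heuristic sketch, assuming the first label of the list decides the type; the set of bullet characters and the rule that a list starting at i/I is roman (not alphabetic) are assumptions, and unusual labels are left for manual review. list_type is a hypothetical helper.

```python
import re
from typing import Sequence

BULLETS = {"•", "-", "–", "—", "*"}  # assumption: characters treated as bullets


def list_type(labels: Sequence[str]) -> str:
    """Guess the list-type attribute value from the labels of a list's items."""
    first = labels[0].strip() if labels else ""
    if not first:
        return "simple"
    if first in BULLETS:
        return "bullet"
    # Strip an opening bracket and a closing ".", ")", "]" or ":", e.g. "(1)" -> "1".
    core = re.sub(r"^[(\[]|[.)\]:]$", "", first)
    words = core.split()
    if not words:
        return "simple"
    core = words[-1]  # "Step 1" -> "1" (the prefix word goes in prefix-word)
    if core.isdigit():
        return "order"
    if core in ("i", "I"):  # assumption: a list starting at i/I is roman
        return "roman-lower" if core == "i" else "roman-upper"
    if len(core) == 1 and core.isalpha():
        return "alpha-lower" if core.islower() else "alpha-upper"
    raise ValueError(f"cannot classify list label {first!r}")


print([list_type(x) for x in (["1.", "2."], ["(a)"], ["I)"], ["Step 1"], ["•"], [""])])
```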

    40. Definition List

    Example tagging

    Diagnostic coverage

    The fractional decrease in the probability

    Dangerous failure

    A failure having the potential

    Dangerous failure detected

    A failure detected by on-line diagnostic tests

    41. Statement

    Example tagging

    Proof of Theorem 1.

    First, we obtain the two predicted parities of block 1

    RULES

    Capture sections such as Theorem, Lemma, Remark, Proof, and Proposition as , and link them in the body text

    If a colon appears at the end of the label, capture it as well


    42. Appendix

    Example tagging

    Appendix A

    Proof of Theorem 1.

    First, we obtain the two predicted parities of block 1

    Appendix B

    As seen in Fig. 2, the S-box and the inverse S-box share

    RULES

    Must have an ID attribute on the element

    43. Acknowledgments

    Example tagging

    VI Acknowledgments

    First, we obtain the two predicted parities of block 1

    RULES

    If it appears immediately before the References, it should be captured as part of

    44. Biography

    Example tagging

    Biographical Sketch

    Ailamaki is a Professor of Computer Sciences at the Ecole Polytechnique Federale de Lausanne (EPFL) in Switzerland.


    45. References

    RULES

    Capture both the label and the title, if they appear

    Capture punctuation as per the PDF file

    Always capture punctuation such as commas outside the element

    Each reference should be captured in element

    There is no need to generate the iso-8601-date attribute in references

    No need to mark as , or to the element , if appearing in References.

    Generate the attribute specific-use="IEEE" if the word IEEE appears in element, for example:

    IEEE Trans. Microw. Theory Tech.

    Note: Do not generate the attribute otherwise

    Generate the attribute country on the element

    The publication-format attribute value can be one of the following

    Capture exactly one value: print, online, or other

    If a reference clearly has only a URL, with no print-specific volume/issue/page information, set the @publication-format attribute to "online"

    If a reference has a URL but also has volume/issue/page data, set the @publication-format attribute to "print"

    If a reference has no URL and no clear reference to a CD or other electronic media, set the @publication-format attribute to "print"

    If a reference has specific information about a CD, DVD, or other media, set the @publication-format attribute to "other"

    Note: The vast majority of IEEE references will be either "print" or "online"

    The publication-type attribute value can be one of the following

    a. periodical

    b. report

    c. thesis

    d. standard

    e. manual

    f. confproc

    g. confpaper

    h. patent

    i. unpubd

    j. software

    k. other [capture the attribute, if it appears]

    l. online

    m. government

    n. book
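    The publication-format rules above can be sketched as a small function. The rules do not say how to detect a URL, volume/issue/page data or physical media, so the regular expressions here are assumed heuristics (a reference matching none of them falls back to "print"), and publication_format is a hypothetical helper.

```python
import re

# Assumed heuristics; the rules themselves do not say how to detect these cues.
URL = re.compile(r"https?://|www\.", re.IGNORECASE)
PRINT_DETAILS = re.compile(r"\b(vol|no|pp)\.", re.IGNORECASE)  # volume/issue/pages
MEDIA = re.compile(r"\b(CD-ROM|CD|DVD)\b")  # case-sensitive to avoid matching "cd" in URLs


def publication_format(reference: str) -> str:
    """Return "print", "online" or "other" for one reference string."""
    if MEDIA.search(reference):
        return "other"
    if URL.search(reference) and not PRINT_DETAILS.search(reference):
        return "online"  # URL only, no volume/issue/page data
    return "print"  # URL plus print data, or no URL at all


web_ref = ("[5] A.A. Frederickson. Comparison of programmable electronic safety-related "
           "system architectures. [Web Page]. Available at: "
           "http://www.safetyusersgroup.com/default.asp Accessed 2005 Apr 25.")
print(publication_format(web_ref))  # online
```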


    Examples for each type

    Periodical

    [13] D.J. Smith, Reliability, Maintainability, and Risk. London, UK: Butterworth Heinemann; 2000.

    Conference Proceedings

    [6] R. Johnson, N. Hardavellas, : To Share or Not To Share? 33rd VLDB Conference, Vienna, Austria, 2007.

    Report

    [5] A.A. Frederickson. Comparison of programmable electronic safety-related system architectures. [Web Page]. Available at: http://www.safetyusersgroup.com/default.asp Accessed 2005 Apr 25.

    Government Document

    [21] P. S. Wellman. Tactile Imaging. PhD thesis, Harvard University, 1999.

    Patent


    Standard


    footnote                 fn          rid="fn1"                          Fn
    grant                    grant       rid="grant1"                       Grant
    graphic                  graphic     rid="graphic1"                     Graphic
    lemma                    lemma       rid="lemma1"                       Lemma
    list                     list        rid="list1"                        List
    other                    other       rid="other1"                       Other
    plate                    plate       rid="plate1"                       Plate
    proof                    proof       rid="proof1"                       Proof
    reference                ref         rid="ref1"                         Bibr
    Remark                   remark      rid="remark1"                      Remark
    scenario                 scenario    rid="scenario1"                    Scenario
    Scheme                   scheme      rid="scheme1"                      Scheme
    Section                  sec         rid="sec1" or "sec1a" or "sec1a1"  Sec
    Sidebar                  sidebar     rid="sidebar1"                     boxed-text
    statement                statement   rid="statement1"                   Statement
    supplementary material   supp-mat    rid="supp-mat1"                    supplementary-material
    Table                    table       rid="table1"                       Table
    table footnote           table-fn    rid="table-fn1"                    table-fn
    theorem                  theorem     rid="theorem1"                     Theorem
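    The ID patterns in the table follow a prefix-plus-number scheme, with section IDs nesting as sec1, sec1a, sec1a1. A sketch, assuming that odd levels use numbers and even levels use lowercase letters; the table only shows three levels, so anything deeper is an extrapolation. Both helper names are hypothetical.

```python
from string import ascii_lowercase
from typing import Sequence


def section_id(path: Sequence[int]) -> str:
    """Build a section id such as sec1, sec1a or sec1a1 from 1-based section numbers.

    Assumption: odd levels use numbers and even levels use lowercase letters,
    extrapolated from the three examples in the table.
    """
    parts = []
    for depth, number in enumerate(path):
        if number < 1 or (depth % 2 == 1 and number > 26):
            raise ValueError(f"unsupported section number {number} at level {depth + 1}")
        parts.append(ascii_lowercase[number - 1] if depth % 2 else str(number))
    return "sec" + "".join(parts)


def object_id(prefix: str, number: int) -> str:
    """Generic prefix-plus-number pattern from the table, e.g. table-fn1."""
    return f"{prefix}{number}"


print(section_id([1]), section_id([1, 1]), section_id([1, 1, 1]), object_id("fn", 3))
# sec1 sec1a sec1a1 fn3
```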

    Linking Pattern for References

    Case I: A simple range. If the print representation is "[10],[11],[12],[13],[14],[15]", the XML coding is:

    [11], [12], [13], [14], [15]

    Case II: A range with one skip. If the print representation is "[10]-[13],[15]", the XML coding is:

    [10]-[13], [15]

    Case III: A range with one skip and a connector word. If the print representation is "[10]-[13] and [15].", the XML coding is:

    [10]-[13] and [15].

    Case IV: A simple range with only a connector word. If the print representation is "[11] through [15]", the XML coding is:

    [11] through [15]
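    Before linking, each printed citation group has to be expanded into the reference numbers it covers. A sketch, assuming a hyphen, en dash or the word "through" marks a range and "and" or a comma separates items; mapping each number to the rid pattern ref1 from the ID table is shown, but the link element markup itself is not reproduced here. cited_refs is a hypothetical helper.

```python
import re
from typing import List

# A bracketed number, or a connector that turns the next number into a range end.
TOKEN = re.compile(r"\[(\d+)\]|(-|–|\bthrough\b)")


def cited_refs(citation: str) -> List[int]:
    """Expand a printed citation group such as "[10]-[13], [15]" into reference numbers."""
    refs: List[int] = []
    in_range = False
    for number, connector in TOKEN.findall(citation):
        if connector:
            in_range = True
            continue
        n = int(number)
        if in_range and refs:
            refs.extend(range(refs[-1] + 1, n + 1))  # fill the range up to n
        else:
            refs.append(n)
        in_range = False
    return refs


for printed in ("[10]-[13],[15]", "[10]-[13] and [15].", "[11] through [15]"):
    print(printed, "->", [f"ref{n}" for n in cited_refs(printed)])
```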

    Linking Pattern for Display Equations


    Case I: Single display equation. If the print representation is "(1)", the XML coding is:
