confernece xml-v 1.2
TRANSCRIPT
-
7/22/2019 Confernece XML-V 1.2
1/22
PDF 2 XML Conversion Work Instructions for Conference
Confidential Aptara Proprietary Version 1.0.0.2 19 March13 Page 1 of 22
Input: PDF and Meta XML file
Output: XML, HTML, Images
Process
1. Convert extracted TBXML to XML using the conversion script. The script performs:
Conversion of TBXML to XML based on the conf-jats1.dtd DTD (version 1.5)
Injection of TeX tagging from the server
Renaming of extracted images as per specifications
2. Manual Review and Fixing
Points for manual review and fixing are listed below
3. Validation
Validation of XML files according to DTD rules
Validation of TeX files according to Vtex rules
4. QA Checks
Run the QA script and fix any errors reported in the logs
Source Input Folder Structure
+Year
+Publication Number
+Issue Number
+Article Number
PDF/XML/Images
Manual Review and Fixing
1. Sequence of XML and DOCTYPE declaration lines
Example tagging
RULES
The word utf should be in lower case
The DTD version may vary as per the current DTD provided by IEEE.
2. Element conf-article
Example tagging
RULES
Validate the content-type attribute value from the XML metadata element
The attribute article-type may vary according to the content of the article
Capture only the attributes defined in the example. No attribute should be extra or missing in the final file
Validate the open-access attribute value from the Meta XML file
a. If the element value is F, open-access="no"
b. If the element value is T, open-access="yes"
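The F/T rule above can be sketched as a small lookup. This is only an illustration: the function name open_access_value is hypothetical, and rejecting any flag other than F or T is an assumption, since the instructions do not say what to do with other values.

```python
def open_access_value(meta_flag: str) -> str:
    """Map the Meta XML flag (F or T) to the open-access attribute value."""
    mapping = {"F": "no", "T": "yes"}
    flag = meta_flag.strip().upper()
    if flag not in mapping:
        # Assumption: anything other than F or T needs manual review.
        raise ValueError(f"unexpected open-access flag: {meta_flag!r}")
    return mapping[flag]


print(open_access_value("F"))
print(open_access_value("T"))
```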
3. Comment Tags
Example tagging
RULES
Delivery Date: Should be the uploading date
XML Script: Should be the current TBXML 2 XML conversion script version
Batch: Should be generated as per guidelines
4. Conference Acronym
Example tagging
CIPS
RULES
Should always be in capital letters
Should be captured from the Meta XML from the tag below
CIPS
5. Conference Full Title
Example tagging
2012 7th International Conference on Integrated Power Electronics Systems (CIPS)
RULES
Should be captured from the Meta XML from the tag below
(CIPS)]]>
Capture the content case as it appears in the Meta XML
6. Conference Normalized Title
Example tagging
Integrated Power Electronics Systems (CIPS), 2012 7th International Conference
RULES
Should be captured from the Meta XML from the tag below
on]]>
Do not capture the word "on" appearing at the end of the title
Capture the content case as it appears in the Meta XML
7. Volume
Example tagging
1
RULES
Capture from the Meta XML file
If a dummy tag appears in the Meta XML, capture the default value 1
8. ISBN
Example tagging
978-3-8007-3414-6
RULES
Capture from the Meta XML file
The attribute mediatype appearing in the Meta XML should be captured as the attribute content-type in the output XML file
Do not convert the attribute isbntype, if it appears in the Meta XML
Multiple ISBNs appearing in the Meta XML should all be captured in the output XML file
If a duplicate ISBN appears with the same element and attribute value, it should be captured only once
If 2 ISBNs appear with the same element value but different attribute values, or vice versa, capture them as 2 elements in the output XML file
Do not capture dummy markup, if it appears in the Meta XML file
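A minimal sketch of the ISBN duplicate handling, assuming each ISBN has already been read from the Meta XML as a (value, mediatype) pair. The function name dedupe_isbns, the mediatype value "Online" and the treatment of an empty value as dummy markup are assumptions made for illustration only.

```python
def dedupe_isbns(entries):
    """Drop exact (value, mediatype) duplicates and keep the input order.

    entries: list of (isbn_value, mediatype) tuples read from the Meta XML.
    Returns a list of dicts with the content-type attribute and the value.
    """
    seen = set()
    result = []
    for value, mediatype in entries:
        if not value.strip():
            # Assumption: dummy markup shows up as an empty value.
            continue
        key = (value, mediatype)
        if key in seen:
            # Same element value and same attribute value: capture once.
            continue
        seen.add(key)
        # mediatype in the Meta XML becomes content-type in the output.
        result.append({"content-type": mediatype, "value": value})
    return result


isbns = dedupe_isbns([
    ("978-3-8007-3414-6", "Paper"),
    ("978-3-8007-3414-6", "Paper"),
    ("978-3-8007-3414-6", "Online"),
    ("", "Paper"),
])
print(isbns)
```

Pairs that differ in either the value or the attribute are kept as separate elements, which matches the "capture as 2 elements" rule.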
9. ISSN
Example tagging
1530-1591
RULES
Capture from the Meta XML file
The attribute mediatype appearing in the Meta XML should be captured as the attribute content-type in the output XML file
Do not convert the attribute issntype, if it appears in the Meta XML
Multiple ISSNs appearing in the Meta XML should all be captured in the output XML file
If a duplicate ISSN appears:
Case 1: If duplicate ISSN values appear with the same attribute
Capture it only once
Case 2: If duplicate ISSN values appear with different attributes
Use the mediatype "Paper" ISSN only and omit the others
Do not capture dummy markup, if it appears in the Meta XML file
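The two duplicate cases for ISSN differ from the ISBN rule, so a separate sketch may help. It assumes (value, mediatype) pairs as input; the name select_issns and the mediatype "Online" are hypothetical. The instructions do not say what to do when a duplicated value has several mediatypes and none of them is "Paper"; this sketch keeps all of them in that situation.

```python
def select_issns(entries):
    """Apply the Case 1 / Case 2 duplicate rules to (issn, mediatype) pairs."""
    by_value = {}
    for value, mediatype in entries:
        if not value.strip():
            # Assumption: dummy markup shows up as an empty value.
            continue
        mediatypes = by_value.setdefault(value, [])
        if mediatype not in mediatypes:
            # Case 1: same value and same attribute is captured once.
            mediatypes.append(mediatype)

    result = []
    for value, mediatypes in by_value.items():
        if len(mediatypes) > 1 and "Paper" in mediatypes:
            # Case 2: same value, different attributes: keep only "Paper".
            mediatypes = ["Paper"]
        result.extend((value, mediatype) for mediatype in mediatypes)
    return result


print(select_issns([("1530-1591", "Online"), ("1530-1591", "Paper")]))
```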
10. Conference Name
Example tagging
2012 Design, Automation & Test in Europe Conference & Exhibition (DATE 2012)
RULES
Should be captured from the Meta XML from the element conftitle
Capture the content case as per the Meta XML file
11. Conference Start Date
Example tagging
12 March 2012
RULES
Should be captured from the Meta XML from element
Capture the value as per the Meta XML file. Do not convert the month value to numeric
Generate the attribute iso-8601-date in the format YYYY-MM-DD
No need to generate a prefix 0 for single-digit day or month information
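As an illustration of the iso-8601-date rule, the sketch below builds the attribute value from the day, month name and year as they appear in the Meta XML. The function name iso_8601_date and the zero_pad switch are hypothetical. Whether the "no prefix 0" remark applies to the attribute or only to the captured day and month content is not clear from the instructions, so both renderings are shown; full English month names are assumed.

```python
MONTH_NUMBERS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}


def iso_8601_date(day, month, year, zero_pad=True):
    """Build a YYYY-MM-DD style value from day, month name and year."""
    month_number = MONTH_NUMBERS[month.strip().capitalize()]
    if zero_pad:
        return f"{int(year):04d}-{month_number:02d}-{int(day):02d}"
    # Assumption: the unpadded form is what "no prefix 0" would produce.
    return f"{int(year)}-{month_number}-{int(day)}"


print(iso_8601_date("12", "March", "2012"))
print(iso_8601_date("5", "March", "2012", zero_pad=False))
```

The element content itself keeps the month as text (March), as the rules require; only the attribute uses the numeric form.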
12. Conference End Date
Example tagging
16 March 2012
RULES
Should be captured from the Meta XML from element
Capture the value as per the Meta XML file. Do not convert the month value to numeric
Generate the attribute iso-8601-date in the format YYYY-MM-DD
13. Conference Location
Example tagging
Nuremberg, Germany
RULES
Should be captured from the Meta XML from element
Mark the content as city, state or country
14. Xplore Article ID
Example tagging
6176423
RULES
Should be captured from the Meta XML from element
15. Xplore Issue
Example tagging
6176405
RULES
Should be captured from the Meta XML from element
16. Xplore Publication ID
Example tagging
6171057
RULES
Should be captured from the Meta XML from element
17. Article Title
Example tagging
Automated Generation of Directed Tests for Transition Coverage in Cache Coherence Protocols
RULES
Should be captured from the PDF file
If a subtitle appears, it should be captured as
Content case should be captured as per the PDF file
18. Authors and Affiliation
Example tagging
Qin Xiaoke
Mishra Prabhat
Computer and Information Science and Engineering
University of Florida
USA
RULES
Author name , , , and sequence should be as per the PDF file
Multiple authors should be tagged in individual elements
If corresponding information is available, provide the attribute corresp=yes to the element .
Note that there is no need to mark corresp=no on the remaining authors
If the article has only 1 author, set the value as corresp=yes
If primary information is available, provide the attribute primary=yes to the element . Note that there is no need to mark primary=no on the remaining authors
Personal information of authors, such as email address, should be captured in element
No need to capture the word Email
If a biography of a particular author appears in the article, provide element
If affiliation links appear as superscript in the PDF, capture them as superscript; otherwise capture them as PCDATA
19. Affiliation Link
Example tagging
1
RULES
Each author must have element if an affiliation appears
Element value of should be as per the PDF file
If content like 1 or a dagger appears as superscript, capture it in element
If no label appears in the PDF file, capture a dummy element like
20. Affiliations
1
Dept. of High-Frequency Electronics, University of Paderborn
Warburgerstr. 100, D-33098
Paderborn
Germany
RULES
Each unique affiliation should be captured in an individual element
The affiliation label should be captured, if it appears in the PDF file
Mark the content as institution, addr-line, city, state, country.
If different markup of the same type appears on 2 lines (for example, 2 institutes appear in one affiliation), mark the content in 2 institution elements. Do not merge the content by adding a comma
21. Publication Date
Example tagging
March 2012
RULES
Capture from the Meta XML from the element
If Day appears with value 0, do not capture it in the output XML file
Capture the month value as per the Meta file. Do not convert it to numeric
Generate the attribute iso-8601-date in the format YYYY-MM-DD
22. First and Last Page
Example tagging
12
RULES
Should be captured from the Meta XML file from the element artpagenums
Attribute startpage should be captured as
Attribute endpage should be captured as
23. Copyright Statement
Example tagging
ISBN 978-3-8007-3414-6 VDE VERLAG GMBH Berlin Offenbach, Germany
RULES
Should be captured from the PDF file
Capture the complete statement (including ISBN, location, etc.)
If it appears as a footnote, capture it both as copyright statement and as footnote
24. Copyright Year
Example tagging
2012
RULES
Should be captured from the Meta XML file from element
25. Copyright Holder
Example tagging
VDE VERLAG GmbH
RULES
Should be captured from the Meta XML file from element
Generate the attribute copyright-owner as per the specifications below
Crown     Crown copyright
IBM       The IBM corporation
IEEE      IEEE
NA        Not applicable
Other     Copyright holder is not one of the other named values.
Unknown   Copyright holder is not known.
USGov     United States Government
26. Abstract
Example tagging
Abstract
RULES
A. If the Abstract appears in the PDF file
Capture from the PDF file
B. If the Abstract does NOT appear in the PDF file
Case 1: If the first paragraph of the PDF is the same as the Abstract appearing in the Meta XML
Capture the content as the abstract section from the Meta XML and also as the first paragraph in the body section
Case 2: If the first paragraph of the PDF is not the same as the Abstract appearing in the Meta XML
Capture the content as the abstract section from the Meta XML
Case 3: If the Body section does not start with a paragraph, but an Abstract appears in the Meta XML
Capture the content as the abstract section from the Meta XML
Case 4: If the abstract appears in neither the PDF nor the Meta XML
Capture the first paragraph (wherever it appears in the PDF) as the Abstract
Always capture the attribute xml:lang
Capture the title if it appears in the PDF file
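The abstract cases above amount to a short decision procedure, sketched here under stated assumptions. The function name choose_abstract, its four parameters and the case labels it returns are hypothetical, and comparing paragraphs after collapsing whitespace is an assumption about what "same as" means.

```python
def _normalize(text):
    """Collapse runs of whitespace so line breaks do not affect comparison."""
    return " ".join(text.split())


def choose_abstract(pdf_abstract, meta_abstract, opening_paragraph,
                    first_paragraph_anywhere):
    """Return (case, abstract_text) following rules A and B above.

    opening_paragraph is None when the body does not start with a paragraph.
    """
    if pdf_abstract:
        return "A", pdf_abstract
    if meta_abstract:
        if opening_paragraph is None:
            return "B3", meta_abstract
        if _normalize(opening_paragraph) == _normalize(meta_abstract):
            # The paragraph also stays as the first paragraph of the body.
            return "B1", meta_abstract
        return "B2", meta_abstract
    return "B4", first_paragraph_anywhere


print(choose_abstract(None, "We study X.", "We study X.", "We study X."))
```

In every B case except Case 4 the abstract text comes from the Meta XML; the case label only records why.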
27. Keywords
Example tagging
thermomechanical treatment
Ag
thermo mechanical reliability
low-temperature low-pressure die bonding
Microassembly
Bonding
RULES
a. Capture keywords from both the PDF and the Meta XML file
b. Keywords from the source PDF file should be captured with the attribute author
c. Keywords with an exact match between the PDF and the Meta XML file should be captured only once in the output XML file
d. Duplicate keywords should be deleted from the bottom
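Rules a to d can be sketched as an order-preserving merge. The function name merge_keywords is hypothetical; listing PDF keywords before Meta XML keywords and leaving the attribute of Meta XML keywords unset (None) are assumptions, since the instructions state neither.

```python
def merge_keywords(pdf_keywords, meta_keywords):
    """Merge keywords from both sources and keep the first occurrence of each.

    Returns (keyword, attribute) pairs; PDF keywords carry "author".
    """
    merged = []
    seen = set()
    sources = (("author", pdf_keywords), (None, meta_keywords))
    for attribute, keywords in sources:
        for keyword in keywords:
            if keyword in seen:
                # Exact match already captured: drop the later (lower) copy.
                continue
            seen.add(keyword)
            merged.append((keyword, attribute))
    return merged


print(merge_keywords(["Microassembly", "Bonding"], ["Bonding", "Ag"]))
```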
28. Counts
Count of display equations (sum of numbered and unnumbered display equations)
Count of figures (sum of numbered and unnumbered figures)
Count of total pages
Count of total references
Count of tables (sum of numbered and unnumbered tables)
Count of total words in XML file
29. Funding Statement
Example tagging
National Science and Technology support for Eleventh Five-Year Plan key topics
2006BAJ18B01
National Basic Research Program of China
2006CB705507
RULES
Multiple award IDs must be marked in separate elements
If the acknowledgement section contains words like funded or supported, the line should also be marked as a funding statement (along with the award ID and funding source)
30. Section
Example tagging
1 Introduction
RULES
The label should be captured in element
The title should be captured as per the PDF file
The section hierarchy should be as per the PDF file
If all section titles of one level (for example, level 3) appear in italic, there is no need to mark the italic
If the title appears in small caps, mark the content as small caps
Introduction of Programming
31. Algorithm
Example tagging
RULES
Should be captured as an image
Naming convention should be alg
Attribute rule with value both should be captured in element
32. Figure
Example tagging
Fig. 1
Die bonding shear strength of dummy chips bonded onto a Cu substrate metalized with Ag as a function of bonding temperature.
RULES
All numbered graphics should be renamed as per specifications
Place the figure at the end of the paragraph where it appears
Content case for the caption should be as per the PDF file
The label should be captured in element as per the PDF file
An unnumbered figure should be named as graphic. Example: 6170664-graphic-1-source.tif
Only ID and xlink:href should appear as attributes
33. Tables
Example tagging
TABLE III. LOCATION
Where was the questionnaire filled?
        Location        Frequency   Valid %   Cumulative %
Valid   Sandton         39          18.7      18.7
        Durban          39          18.7      37.3
        Polokwane       10          4.8       42.1
        Kimberly        14          6.7       48.8
        Zululand        46          22.0      70.8
        White River     38          18.2      89.0
        Auckland Park   23          11.0      100.0
        Total           209         100.0
Note: We need to tag the table content as text, once specifications get finalized
RULES
1. Capture as text in XML and as an image in HTML
2. Insert the attribute values @cellpadding="5", @frame="box" and @rules="all" on element
3. Linking of references, figures and tables needs to be done, if they appear
4. Use for any footer text that appears below the
5. No alignment is required
6. No cell shading or coloring should be captured
Need to mark for emphasis in and as appearing in the PDF file
Place the table at the end of the paragraph where it appears
Content case for the caption should be as per the PDF file
The label should be captured in element as per the PDF file
If content appears in tabular format and does not contain a label, title, caption, or table headings (column heads), capture it in element
Example tagging
U.S. Patent Documents
7010440   Mar. 2006   Lillis et al.
34. Inline equations
Example tagging
$v1$
RULES
All inline math content (including Greek entities) appearing in running text should be marked as Math based on Vtex rules
Chemical equations should be captured as text
Should be enclosed in single $
35. Display equations
Example tagging
$$M=r_{H}\cdot X(t)+r_{L}\cdot[D-X(t)]\eqno{\hbox{(1)}}$$
RULES
All display math content appearing in running text should be marked as Math based on Vtex rules
Should be enclosed in double $$
A numbered equation should have an id like deqn1
If multiple numbered equations are tagged as a single equation, the ID needs to be provided as a range, for example deqn6-8
(Here, equations 6, 7 and 8 are tagged as a single equation)
No need to provide an ID for an unnumbered display equation
Example tagging
$$ {\BBP}_{m}(y)=2^{-\bar {d}_{B}(H({\tilde {Y}}\vert \bar {X})+D(\bar {X} {\tilde {Y}}\vert\vert \bar {X}Y))} $$
and
$$ {\BBP}_{\star}(y)=2^{-\bar {d}_{B}(H({\tilde {Y}})+D({\tilde {Y}}\Vert Y_{\star}))}. $$
Hence,
$$ \eqalignno{ {\BBP}_{\star}(y)& ={\BBP}_{m}(y)2^{-\bar {d}_{B}(D(\bar {X} {\tilde {Y}}\Vert \bar {X}Y_{\star})-D(\bar {X} {\tilde {Y}}\Vert \bar {X}Y))}\cr & ={\BBP}_{m}(y)2^{-{d}_{B}(D(XY\Vert XY_{\star})+o(1))}} $$
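The ID rules for display equations (a single id such as deqn1, a range id when several numbered equations are tagged together, no id when unnumbered) can be sketched as follows. The function name display_equation_id is hypothetical, and the input is assumed to be the sorted equation numbers contained in one tagged display equation.

```python
def display_equation_id(equation_numbers):
    """Return the id for one tagged display equation, or None if unnumbered.

    equation_numbers: sorted list of the equation numbers the block contains.
    """
    if not equation_numbers:
        # Unnumbered display equations get no id.
        return None
    first, last = equation_numbers[0], equation_numbers[-1]
    if first == last:
        return f"deqn{first}"
    # Several numbered equations tagged as one: id is a first-last range.
    return f"deqn{first}-{last}"


print(display_equation_id([1]))
print(display_equation_id([3, 4, 5, 6]))
print(display_equation_id([]))
```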
36. Sidebars
Boxed Text/Sidebar
There are many transitions that start and end in the same state.
RULES
If the running body text links to the sidebar, provide an ID for the element
37. Footnotes
Example tagging
There are many transitions that start and end in the same state.
RULES
Should be captured at its callout
If a label appears, capture it in
If the copyright statement appears as a footnote with a footnote symbol, capture it both as footnote and as copyright statement
38. Emphasis Markup
Bold
Italic
Monospace Text (Typewriter Text)
Small Caps
Underline
39. List
Ordered List
Example tagging
1. We propose a dynamic migration policy, that decides at run-time.
2. The DSR architecture uses the traditional least recently used (LRU).
We propose a dynamic migration policy, that decides at run-time.
The DSR architecture uses the traditional least recently used (LRU).
1. Poodles
2. Persian Cats
3. Weaver Finches
RULES
All numeric lists should be marked as
All labels must be captured in element (for example 1., 1), (1), a, A., I)
For an ordered list with prefix labels like Step 1, insert the attribute prefix-word
If a list continues from an earlier list, generate the attribute continued-from
The attribute value for list-type can be any of the following
order         Ordered list. Prefix character is a number or a letter, depending on style.
bullet        Unordered or bulleted list. Prefix character is a bullet or a dash.
alpha-lower   Ordered list. Prefix character is a lowercase alphabetical character.
alpha-upper   Ordered list. Prefix character is an uppercase alphabetical character.
roman-lower   Ordered list. Prefix character is a lowercase roman numeral.
roman-upper   Ordered list. Prefix character is an uppercase roman numeral.
simple        Simple or plain list (no prefix character before each item)
40. Definition List
Example tagging
Diagnostic coverage
The fractional decrease in the probability
Dangerous failure
A failure having the potential
Dangerous failure detected
A failure detected by on-line diagnostic tests
41. Statement
Example tagging
Proof of Theorem 1.
First, we obtain the two predicted parities of block 1
RULES
Capture sections like Theorem, Lemma, Remark, Proof, Proposition, etc. as , and link them in the body text
If a colon appears at the end of the label, capture it as well
42. Appendix
Example tagging
Appendix A
Proof of Theorem 1.
First, we obtain the two predicted parities of block 1
Appendix B
As seen in Fig. 2, the S-box and the inverse S-box share
RULES
The element must have an ID attribute
43. Acknowledgments
Example tagging
VI Acknowledgments
First, we obtain the two predicted parities of block 1
RULES
If it appears exactly before the References, it should be captured as part of
44. Biography
Example tagging
Biographical Sketch
Ailamaki is a Professor of Computer Sciences at the Ecole Polytechnique Federale de Lausanne (EPFL) in Switzerland.
45. References
RULES
Capture both label and title, if they appear
Punctuation should be captured as per the PDF file
Always capture punctuation like a comma outside the element
Each reference will be captured in element
No need to generate the attribute iso-8601-date in references
No need to mark as , or to the element , if appearing in References.
Generate the attribute specific-use="IEEE", if the word IEEE appears in element
IEEE Trans. Microw. Theory Tech.
Note: No need to generate the attribute otherwise
Generate the attribute country on the element
The attribute value of publication-format must be one of print, online or other:
If a reference clearly only has a URL, with no volume/issue/page number print-specific information, then set the @publication-format attribute to "online."
If a reference has a URL, but also has volume/issue/page number data, then set the @publication-format attribute to "print."
If a reference does not have a URL and is not clearly a reference to a CD or other electronic media, then set the @publication-format attribute to "print."
If a reference has specific information about a CD, DVD or other media, then set the @publication-format attribute to "other."
Note: The vast majority of IEEE references will be either "print" or "online".
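The publication-format rules above can be read as a three-way decision. The sketch below is one such reading; the function name publication_format and its three boolean inputs are hypothetical, and checking for CD/DVD details first is an assumption, since the instructions do not rank the rules against each other.

```python
def publication_format(has_url, has_print_details, has_media_details):
    """Pick the publication-format value for one reference.

    has_url: the reference contains a URL.
    has_print_details: volume, issue or page number data is present.
    has_media_details: specific CD, DVD or other media information is present.
    """
    if has_media_details:
        # Assumption: explicit media information takes precedence.
        return "other"
    if has_url and not has_print_details:
        return "online"
    # URL plus print details, or no URL at all, both map to print.
    return "print"


print(publication_format(has_url=True, has_print_details=False,
                         has_media_details=False))
```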
The attribute value of publication-type can be one of the following
a. periodical
b. report
c. thesis
d. standard
e. manual
f. confproc
g. confpaper
h. patent
i. unpubd
j. software
k. other [Need to capture the attribute, if appears]
l. online
m. government
n. book
Examples for each type
Periodical
[13] D.J. Smith, Reliability, Maintainability, and Risk. London, UK: Butterworth Heinemann; 2000.
Conference Proceedings
[6] R. Johnson, N. Hardavellas, : To Share or Not To Share? 33rd VLDB Conference, Vienna, Austria, 2007.
Report
[5] A.A. Frederickson. Comparison of programmable electronic safety-related system architectures. [Web Page]. Available at: http://www.safetyusersgroup.com/default.asp Accessed 2005 Apr 25.
Government Document
[21] P. S. Wellman. Tactile Imaging. PhD thesis, Harvard University, 1999.
Patent
Standard
footnote                 fn          rid="fn1"                           Fn
grant                    grant       rid="grant1"                        Grant
graphic                  graphic     rid="graphic1"                      Graphic
lemma                    lemma       rid="lemma1"                        Lemma
list                     list        rid="list1"                         List
other                    other       rid="other1"                        Other
plate                    plate       rid="plate1"                        Plate
proof                    proof       rid="proof1"                        Proof
reference                ref         rid="ref1"                          Bibr
Remark                   remark      rid="remark1"                       Remark
scenario                 scenario    rid="scenario1"                     Scenario
Scheme                   scheme      rid="scheme1"                       Scheme
Section                  sec         rid="sec1" or "sec1a" or "sec1a1"   Sec
Sidebar                  sidebar     rid="sidebar1"                      boxed-text
statement                statement   rid="statement1"                    Statement
supplementary material   supp-mat    rid="supp-mat1"                     supplementary-material
Table                    table       rid="table1"                        Table
table footnote           table-fn    rid="table-fn1"                     table-fn
theorem                  theorem     rid="theorem1"                      Theorem
Linking Pattern for References
Case I: A simple range. If the print representation is "[10],[11],[12],[13],[14],[15]", the XML coding is:
[10], [11], [12], [13], [14], [15]
Case II: A range with one skip. If the print representation is "[10]-[13],[15]", the XML coding is:
[10]-[13], [15]
Case III: A range with one skip and a connector word. If the print representation is "[10]-[13] and [15].", the XML coding is:
[10]-[13] and [15].
Case IV: A simple range with only a connector word. If the print representation is "[11] through [15]", the XML coding is:
[11] through [15]
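The XML tags of the four cases are not preserved in this transcript, so the sketch below only works out which references a printed citation covers, using the ref prefix from the rid="ref1" pattern in the table above. The function name cited_reference_ids is hypothetical, and how the resulting ids are distributed over the link elements is left open.

```python
import re

# One bracketed number, optionally followed by "-" or "through" and a
# second bracketed number that closes the range.
CITATION = re.compile(r"\[(\d+)\](?:\s*(?:-|through)\s*\[(\d+)\])?")


def cited_reference_ids(citation):
    """List the rid values (ref10, ref11, ...) covered by a citation string."""
    ids = []
    for match in CITATION.finditer(citation):
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else start
        ids.extend(f"ref{number}" for number in range(start, end + 1))
    return ids


print(cited_reference_ids("[10]-[13] and [15]."))
print(cited_reference_ids("[11] through [15]"))
```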
Linking Pattern for Display Equations
Case I: Single display equation. If the print representation is (1), then the XML coding is: