Project description
Extract Excel data to generate an XML file using the Python openpyxl library.
Install openpyxl on Windows
Download the latest version of openpyxl.
In a command prompt window, run: pip install openpyxl
Python code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
| import openpyxl
from openpyxl import load_workbook
from lxml import etree
# load excel with its path
xlsPath = 'bib.xlsx'
workBook = openpyxl.load_workbook(xlsPath) # open excel workbook.
workSheet = workBook['Sheet1'] # specify sheet.
xmlRootNode = etree.Element('bib') # create xml root node.
colNum = 1
for row in workSheet.iter_rows(min_row=2, max_row=7, min_col=1, max_col=6):
record = etree.SubElement(xmlRootNode, 'record') # create new record node.
colNum = 1 # reset column index.
for cell in row:
colHeaderName = workSheet.cell(row=1, column=colNum).value # get first row header.
tagNameNoSpaces = colHeaderName.replace(' ','') # xml disallows spaces in tag names.
tagValueNonBlank = str(cell.value if cell.value else '') # handle empty cells.
etree.SubElement(record, tagNameNoSpaces).text = tagValueNonBlank # add new xml element to current record node.
colNum += 1 # increment column index.
# generate xml file:
fileContent = etree.tostring(xmlRootNode, encoding='unicode', pretty_print=True)
with open('bib.xml', 'w', encoding = 'utf-8') as myFile: # open file for writing.
myFile.write(fileContent) # specify string to write.
# print xml file:
declaration = '<?xml version=\'1.0\' encoding=\'utf-8\'?>'
print(declaration)
print(etree.tostring(xmlRootNode, encoding='unicode', pretty_print=True))
|
Excel file
Generated XML file
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
| <bib>
<record>
<Title>Dark sacred night</Title>
<ISBN>9780316484800</ISBN>
<Author>Michael Connelly</Author>
<PlaceOfPublication>New York</PlaceOfPublication>
<Publisher>Grand Central Publishing</Publisher>
<PublicationDate>2019</PublicationDate>
</record>
<record>
<Title>10 things I can see from here</Title>
<ISBN>9780399556258</ISBN>
<Author>Carrie Mac</Author>
<PlaceOfPublication>New York</PlaceOfPublication>
<Publisher>Alfred A. Knopf</Publisher>
<PublicationDate>2017</PublicationDate>
</record>
<record>
<Title>28 Barbary Lane</Title>
<ISBN>9780062499011</ISBN>
<Author>Armistead Maupin</Author>
<PlaceOfPublication>New York</PlaceOfPublication>
<Publisher>Harper Perennial</Publisher>
<PublicationDate>2016</PublicationDate>
</record>
<record>
<Title>Aesop's fables</Title>
<ISBN>9781896580814</ISBN>
<Author>Michael Rosen</Author>
<PlaceOfPublication>Vancouver</PlaceOfPublication>
<Publisher>Tradewind Books</Publisher>
<PublicationDate>2013</PublicationDate>
</record>
<record>
<Title>Even dogs in the wild</Title>
<ISBN>9781410485311</ISBN>
<Author>Ian Rankin</Author>
<PlaceOfPublication>Farmington Hills</PlaceOfPublication>
<Publisher>Thorndike Press</Publisher>
<PublicationDate>2016</PublicationDate>
</record>
<record>
<Title>House by the river</Title>
<ISBN>9781542045896</ISBN>
<Author>Lena Manta</Author>
<PlaceOfPublication>Seattle</PlaceOfPublication>
<Publisher>AmazonCrossing</Publisher>
<PublicationDate>2017</PublicationDate>
</record>
</bib>
|