Python Excel Project
   2 min read

Project description

Extract Excel data to generate an XML file using the Python openpyxl library.

Install openpyxl on Windows

Download the latest version of openpyxl.
In a command prompt window, run: pip install openpyxl

Python code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
import openpyxl
from openpyxl import load_workbook
from lxml import etree

# load excel with its path
xlsPath = 'bib.xlsx'
workBook = openpyxl.load_workbook(xlsPath)  # open excel workbook.
workSheet = workBook['Sheet1'] # specify sheet.

xmlRootNode = etree.Element('bib')  # create xml root node.
colNum = 1

for row in workSheet.iter_rows(min_row=2, max_row=7, min_col=1, max_col=6):
    record = etree.SubElement(xmlRootNode, 'record')  # create new record node.
    colNum = 1  # reset column index.
    for cell in row:
        colHeaderName = workSheet.cell(row=1, column=colNum).value  # get first row header.
        tagNameNoSpaces = colHeaderName.replace(' ','')  # xml disallows spaces in tag names.
        tagValueNonBlank = str(cell.value if cell.value else '')  # handle empty cells.
        etree.SubElement(record, tagNameNoSpaces).text = tagValueNonBlank  # add new xml element to current record node.
        colNum += 1  # increment column index.

# generate xml file:
fileContent = etree.tostring(xmlRootNode, encoding='unicode', pretty_print=True)
with open('bib.xml', 'w', encoding = 'utf-8') as myFile:  # open file for writing.
    myFile.write(fileContent)  # specify string to write.

# print xml file:
declaration = '<?xml version=\'1.0\' encoding=\'utf-8\'?>'
print(declaration)
print(etree.tostring(xmlRootNode, encoding='unicode', pretty_print=True))

Excel file

Screenshot of Excel data sheet.

Generated XML file

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
<bib>
  <record>
    <Title>Dark sacred night</Title>
    <ISBN>9780316484800</ISBN>
    <Author>Michael Connelly</Author>
    <PlaceOfPublication>New York</PlaceOfPublication>
    <Publisher>Grand Central Publishing</Publisher>
    <PublicationDate>2019</PublicationDate>
  </record>
  <record>
    <Title>10 things I can see from here</Title>
    <ISBN>9780399556258</ISBN>
    <Author>Carrie Mac</Author>
    <PlaceOfPublication>New York</PlaceOfPublication>
    <Publisher>Alfred A. Knopf</Publisher>
    <PublicationDate>2017</PublicationDate>
  </record>
  <record>
    <Title>28 Barbary Lane</Title>
    <ISBN>9780062499011</ISBN>
    <Author>Armistead Maupin</Author>
    <PlaceOfPublication>New York</PlaceOfPublication>
    <Publisher>Harper Perennial</Publisher>
    <PublicationDate>2016</PublicationDate>
  </record>
  <record>
    <Title>Aesop's fables</Title>
    <ISBN>9781896580814</ISBN>
    <Author>Michael Rosen</Author>
    <PlaceOfPublication>Vancouver</PlaceOfPublication>
    <Publisher>Tradewind Books</Publisher>
    <PublicationDate>2013</PublicationDate>
  </record>
  <record>
    <Title>Even dogs in the wild</Title>
    <ISBN>9781410485311</ISBN>
    <Author>Ian Rankin</Author>
    <PlaceOfPublication>Farmington Hills</PlaceOfPublication>
    <Publisher>Thorndike Press</Publisher>
    <PublicationDate>2016</PublicationDate>
  </record>
  <record>
    <Title>House by the river</Title>
    <ISBN>9781542045896</ISBN>
    <Author>Lena Manta</Author>
    <PlaceOfPublication>Seattle</PlaceOfPublication>
    <Publisher>AmazonCrossing</Publisher>
    <PublicationDate>2017</PublicationDate>
  </record>
</bib>