I would like to do statistical analysis on an Excel file. Rather than learn a new programming language VBA (Visual Basic for Applications), I would rather work with Excel files in Python. This post shows how to read and write Excel files with Python.
The Python library openpyxl
is designed for reading and writing Excel xlsx/xlsm/xltx/xltm
files. Refer to openpyxl documentation for its usage.
Use the following command to install openpyxl
:
$ sudo pip install openpyxl
BTW, xlrd
and xlwt
are for reading and writing spreadsheet files compatible with older Microsoft Excel files (i.e., .xls
) respectively.
load_workbook(...)
is used to open an Excel file and return a workbook.
from openpyxl import load_workbook
load_workbook(filename, read_only=False, keep_vba=False, data_only=False, guess_types=False)
# for instance
wb = load_workbook(file_workbook, read_only=True)
In gerneral, a workbook contains several worksheet. wb.get_sheet_names()
returns a list of sheet names. With a given name, wb.get_sheet_by_name(sheet_name)
returns a worksheet. For instance,
ws = wb.get_sheet_by_name(sheet_name)
ws.iter_rows(...)
is used to iterate over cells in the worksheet ws
. For instance,
iter_rows(range_string=None, row_offset=0, column_offset=0)
# read the worksheet to a list of lists
lists = list()
for row in ws.iter_rows(row_offset=1): # row_offset=1, skip the header
# select fields
selected_fields = [
row[IDX_CARD_ID],
row[IDX_NAME],
row[IDX_TIME],
row[IDX_IN_OUT]
]
# deal with blank fields; some fields might contain Chinese characters
lists.append([item.value.encode('utf8') if item.value else None for item in selected_fields])