$ pip install openpyxl
$ pip install pillow
Write Sample code:
from openpyxl import Workbook
wb = Workbook()
# grab the active worksheet
ws = wb.active
# Data can be assigned directly to cells
ws['A1'] = 42
# Rows can also be appended
ws.append([1, 2, 3])
# Python types will automatically be converted
import datetime
ws['A2'] = datetime.datetime.now()
# Save the file
wb.save("sample.xlsx")
Read Sample code:
from xlrd import open_workbook
book = open_workbook('simple.xls',on_demand=True)
for name in book.sheet_names():
if name.endswith('2'):
sheet = book.sheet_by_name(name)
# Attempt to find a matching row (search the first column for 'john')
rowIndex = -1
for cell in sheet.col(0): #
if 'john' in cell.value:
break
# If we found the row, print it
if row != -1:
cells = sheet.row(row)
for cell in cells:
print cell.value
book.unload_sheet(name)
The easiest way to understand
xs
is to show an example. I will take a data example
from the pivot table article.First we get the data uploaded into a simple pivot table. Do my standard imports, read in the data and create my pivot table:
import pandas as pd
import numpy as np
df = pd.read_excel("sales-funnel.xlsx")
table = pd.pivot_table(df,index=["Manager","Rep","Product"],
values=["Price","Quantity"],
aggfunc=[np.sum,np.mean],fill_value=0)
table
sum | mean | |||||
---|---|---|---|---|---|---|
Price | Quantity | Price | Quantity | |||
Manager | Rep | Product | ||||
Debra Henley | Craig Booker | CPU | 65000 | 2 | 32500 | 1.0 |
Maintenance | 5000 | 2 | 5000 | 2.0 | ||
Software | 10000 | 1 | 10000 | 1.0 | ||
Daniel Hilton | CPU | 105000 | 4 | 52500 | 2.0 | |
Software | 10000 | 1 | 10000 | 1.0 | ||
John Smith | CPU | 35000 | 1 | 35000 | 1.0 | |
Maintenance | 5000 | 2 | 5000 | 2.0 | ||
Fred Anderson | Cedric Moss | CPU | 95000 | 3 | 47500 | 1.5 |
Maintenance | 5000 | 1 | 5000 | 1.0 | ||
Software | 10000 | 1 | 10000 | 1.0 | ||
Wendy Yule | CPU | 165000 | 7 | 82500 | 3.5 | |
Maintenance | 7000 | 3 | 7000 | 3.0 | ||
Monitor | 5000 | 2 | 5000 | 2.0 |
pivot_table
syntax.Now, let’s take a look at what
xs
can do:table.xs('Debra Henley', level=0)
sum | mean | ||||
---|---|---|---|---|---|
Price | Quantity | Price | Quantity | ||
Rep | Product | ||||
Craig Booker | CPU | 65000 | 2 | 32500 | 1 |
Maintenance | 5000 | 2 | 5000 | 2 | |
Software | 10000 | 1 | 10000 | 1 | |
Daniel Hilton | CPU | 105000 | 4 | 52500 | 2 |
Software | 10000 | 1 | 10000 | 1 | |
John Smith | CPU | 35000 | 1 | 35000 | 1 |
Maintenance | 5000 | 2 | 5000 | 2 |
xs
allows me to drill down to one cross-section of the pivot table.
We can drill down multiple levels as well. If we want to just see one rep’s results:table.xs(('Debra Henley','Craig Booker'), level=0)
sum | mean | |||
---|---|---|---|---|
Price | Quantity | Price | Quantity | |
Product | ||||
CPU | 65000 | 2 | 32500 | 1 |
Maintenance | 5000 | 2 | 5000 | 2 |
Software | 10000 | 1 | 10000 | 1 |
We need the
get_level_values
to make this work as seamlessly as possible.
For example, if we want to see all the Manager values:table.index.get_level_values(0)
Index([u'Debra Henley', u'Debra Henley', u'Debra Henley', u'Debra Henley', u'Debra Henley', u'Debra Henley', u'Debra Henley', u'Fred Anderson', u'Fred Anderson', u'Fred Anderson', u'Fred Anderson', u'Fred Anderson', u'Fred Anderson'], dtype='object')If we want to see all the rep values:
table.index.get_level_values(1)
Index([u'Craig Booker', u'Craig Booker', u'Craig Booker', u'Daniel Hilton', u'Daniel Hilton', u'John Smith', u'John Smith', u'Cedric Moss', u'Cedric Moss', u'Cedric Moss', u'Wendy Yule', u'Wendy Yule', u'Wendy Yule'], dtype='object')To make it a little simpler for iterating, use
unique
:table.index.get_level_values(0).unique()
array([u'Debra Henley', u'Fred Anderson'], dtype=object)Now it should be clear what we’re about to do. I’ll print it out first so you can see.
for manager in table.index.get_level_values(0).unique():
print(table.xs(manager, level=0))
sum mean Price Quantity Price Quantity Rep Product Craig Booker CPU 65000 2 32500 1 Maintenance 5000 2 5000 2 Software 10000 1 10000 1 Daniel Hilton CPU 105000 4 52500 2 Software 10000 1 10000 1 John Smith CPU 35000 1 35000 1 Maintenance 5000 2 5000 2 sum mean Price Quantity Price Quantity Rep Product Cedric Moss CPU 95000 3 47500 1.5 Maintenance 5000 1 5000 1.0 Software 10000 1 10000 1.0 Wendy Yule CPU 165000 7 82500 3.5 Maintenance 7000 3 7000 3.0 Monitor 5000 2 5000 2.0As we pull it all together, it is super simple to create a single Excel sheet with one tab per manager:
writer = pd.ExcelWriter('output.xlsx')
for manager in table.index.get_level_values(0).unique():
temp_df = table.xs(manager, level=0)
temp_df.to_excel(writer,manager)
writer.save()
Stop and Think
As you sit back and think about this code, just take a second to revel in how much we are doing with 7 lines of code (plus 2 imports):import pandas as pd
import numpy as np
df = pd.read_excel("sales-funnel.xlsx")
table = pd.pivot_table(df,index=["Manager","Rep","Product"], values=["Price","Quantity"],aggfunc=[np.sum,np.mean],fill_value=0)
writer = pd.ExcelWriter('output.xlsx')
for manager in table.index.get_level_values(0).unique():
temp_df = table.xs(manager, level=0)
temp_df.to_excel(writer,manager)
writer.save()
I think my excitement about this functionality is warranted.
Taking It One Step Further
In some cases, you might want to generate separate files per manager or do some other manipulation. It should be pretty simple to understand how to do so given the examples above.To close out this discussion, I decided I would wrap things up with a fully functional program that utilizes additional python functions to make this script a truly useful program that utilizes good python programming practices so that you can scale it up for your own needs:
"""
Sample report generation script from pbpython.com
This program takes an input Excel file, reads it and turns it into a
pivot table.
The output is saved in multiple tabs in a new Excel file.
"""
import argparse
import pandas as pd
import numpy as np
def create_pivot(infile, index_list=["Manager", "Rep", "Product"],
value_list=["Price", "Quantity"]):
"""
Read in the Excel file, create a pivot table and return it as a DataFrame
"""
df = pd.read_excel(infile)
table = pd.pivot_table(df, index=index_list,
values=value_list,
aggfunc=[np.sum, np.mean], fill_value=0)
return table
def save_report(report, outfile):
"""
Take a report and save it to a single Excel file
"""
writer = pd.ExcelWriter(outfile)
for manager in report.index.get_level_values(0).unique():
temp_df = report.xs(manager, level=0)
temp_df.to_excel(writer, manager)
writer.save()
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Script to generate sales report')
parser.add_argument('infile', type=argparse.FileType('r'),
help="report source file in Excel")
parser.add_argument('outfile', type=argparse.FileType('w'),
help="output file in Excel")
args = parser.parse_args()
# We need to pass the full file name instead of the file object
sales_report = create_pivot(args.infile.name)
save_report(sales_report, args.outfile.name)
The simplest example of reading a CSV file:
import csv
with open('some.csv', newline='') as f:
reader = csv.reader(f)
for row in reader:
print(row)
import csv
with open('passwd', newline='') as f:
reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
for row in reader:
print(row)
import csv
with open('some.csv', 'w', newline='') as f:
writer = csv.writer(f)
writer.writerows(someiterable)
open()
is used to open a CSV file for reading, the file
will by default be decoded into unicode using the system default
encoding (see locale.getpreferredencoding()
). To decode a file
using a different encoding, use the encoding
argument of open:import csv
with open('some.csv', newline='', encoding='utf-8') as f:
reader = csv.reader(f)
for row in reader:
print(row)
Registering a new dialect:
import csv
csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
with open('passwd', newline='') as f:
reader = csv.reader(f, 'unixpwd')
import csv, sys
filename = 'some.csv'
with open(filename, newline='') as f:
reader = csv.reader(f)
try:
for row in reader:
print(row)
except csv.Error as e:
sys.exit('file {}, line {}: {}'.format(filename, reader.line_num, e))
import csv
for row in csv.reader(['one,two,three']):
print(row)
Footnotes
[1] | (1, 2) If newline='' is not specified, newlines embedded inside quoted fields
will not be interpreted correctly, and on platforms that use \r\n linendings
on write an extra \r will be added. It should always be safe to specify
newline='' , since the csv module does its own
(universal) newline handling. |
沒有留言:
張貼留言