Hey,
Welcome back to the Data Analysis in Python Pandas series. In the last post, we learned the common attributes of a DataFrame or Series that are useful for early data analysis.
In this article, we will look at the reader and writer functions of Pandas, which are used to read and write data in different file formats.
CSV Files (Comma Separated Values)
Pandas provides the .read_csv() function to read data stored in a CSV file into a DataFrame, and the .to_csv() method of DataFrame to write a DataFrame to a CSV file.
.read_csv()
# Importing pandas library.
import pandas as pd
# Reading csv file into DataFrame.
path = "input/iris_data.csv" # filepath
data = pd.read_csv(path)
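If you don't have input/iris_data.csv at hand, here is a minimal sketch of the same call reading from an in-memory buffer instead. The three sample rows and the use of io.StringIO as a stand-in for the file are assumptions made only for illustration.

```python
import io

import pandas as pd

# A few rows of CSV text standing in for input/iris_data.csv (illustrative only).
csv_text = """sepal length (cm),sepal width (cm),petal length (cm),petal width (cm)
5.1,3.5,1.4,0.2
4.9,3.0,1.4,0.2
4.7,3.2,1.3,0.2
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)   # (rows, columns)
print(sample.dtypes)  # the inferred type of each column
```

The first line of the text is used as the column header, which is the default behaviour of .read_csv().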
After reading the data into a DataFrame, we can view it.
Pandas provides the .head(n) and .tail(n) methods of DataFrame to view the first n and last n rows respectively.
.head(n)
By default, it shows the first 5 rows if n is not specified.
>>> data.head() # first 5 rows
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
>>> data.head(3) # first 3 rows
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
.tail(n)
By default, it shows the last 5 rows if n is not specified.
>>> data.tail() # last 5 rows
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
145 6.7 3.0 5.2 2.3
146 6.3 2.5 5.0 1.9
147 6.5 3.0 5.2 2.0
148 6.2 3.4 5.4 2.3
149 5.9 3.0 5.1 1.8
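To see how n behaves without the Iris file, here is a small sketch on a made-up 10-row DataFrame. The column name value and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical 10-row DataFrame; the default index runs from 0 to 9.
df = pd.DataFrame({"value": range(10)})

first_three = df.head(3)   # rows with index 0, 1, 2
last_two = df.tail(2)      # rows with index 8, 9
default_head = df.head()   # n defaults to 5

print(first_three)
print(last_two)
print(len(default_head))
```

Both methods return a new DataFrame, so the original data is left unchanged.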
.to_csv()
After analysis, you may need to store the clean dataset back in a CSV file. For that, Pandas provides the .to_csv() method of DataFrame.
data.to_csv('input/new_iris_data.csv', index=False)
Here, the first argument (named path_or_buf in Pandas) is the file path, including the name of the file you want to save to. index is a keyword argument that takes a boolean value and controls whether the row index is written. With index=False, the row index labels are not saved in the CSV file.
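The effect of index is easiest to see by comparing both settings. The sketch below uses a tiny made-up DataFrame (columns a and b are hypothetical) and relies on the fact that .to_csv() returns the CSV text as a string when no path is given.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})

# With no path given, to_csv returns the CSV text instead of writing a file.
with_index = df.to_csv()                 # index=True is the default
without_index = df.to_csv(index=False)

print(with_index)
print(without_index)

# Reading the text back shows the difference in columns.
back_with = pd.read_csv(io.StringIO(with_index))
back_without = pd.read_csv(io.StringIO(without_index))

print(list(back_with.columns))     # includes an extra column holding the old index
print(list(back_without.columns))  # only the original columns
```

If the index carries no meaning of its own, index=False avoids that extra column when the file is read again later.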
You can read more about .read_csv() and .to_csv() in the Pandas documentation.
Pandas supports many different file formats. Each of them can be read into a DataFrame with a .read_* function and written with a .to_* method (* is the file format).
File Format | Reader Function | Writer Function |
CSV | pd.read_csv() | df.to_csv() |
Excel | pd.read_excel() | df.to_excel() |
JSON | pd.read_json() | df.to_json() |
HTML | pd.read_html() | df.to_html() |
XML | pd.read_xml() | df.to_xml() |
SQL | pd.read_sql() | df.to_sql() |
These are some of the most common file formats. Here pd stands for Pandas and df for a DataFrame. Pandas supports many other formats for reading and writing; to learn more, you can visit the Pandas documentation on reader and writer functions.
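The same reader and writer pairing works for the other formats. As a rough sketch, here is a JSON round trip on a made-up DataFrame; the column names and the choice of orient="records" are assumptions for illustration, not the only way to lay out the JSON.

```python
import io

import pandas as pd

df = pd.DataFrame({"name": ["setosa", "virginica"], "petal_length": [1.4, 5.1]})

# Writer: with no path given, to_json returns the JSON text as a string.
json_text = df.to_json(orient="records")

# Reader: read_json accepts a path or a file-like object.
restored = pd.read_json(io.StringIO(json_text), orient="records")

print(json_text)
print(restored)
```

Passing a file path instead of the buffer writes to and reads from disk, just as with the CSV functions.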
That is all for today's article. Thank you for reading. See you in my next article.