How to read and write files in Pandas?

Hey,

Welcome back to the Data Analysis in Python Pandas series. In the last post, we looked at common DataFrame and Series attributes that are useful for early data analysis.

In this article, we will look at the Pandas reader and writer functions, which load data from and save data to different file formats.

CSV Files (Comma Separated Values)

Pandas provides the pd.read_csv() function to read data stored in a CSV file into a DataFrame, and the DataFrame method .to_csv() to write a DataFrame to a CSV file.

.read_csv()

# Importing pandas library.
import pandas as pd

# Reading csv file into DataFrame.
path = "input/iris_data.csv"  # filepath
data = pd.read_csv(path)
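If you do not have the iris file at hand, the same call can be tried on in-memory text. The snippet below is a minimal sketch, assuming a three-row sample made up for illustration in the same column layout; csv_text and sample are hypothetical names, not part of the original dataset.

```python
import io

import pandas as pd

# Hypothetical sample: three rows laid out like the iris CSV file.
csv_text = """sepal length (cm),sepal width (cm),petal length (cm),petal width (cm)
5.1,3.5,1.4,0.2
4.9,3.0,1.4,0.2
4.7,3.2,1.3,0.2
"""

# read_csv accepts a file path or any file-like object, such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)          # (number of rows, number of columns)
print(list(sample.columns))  # the header row becomes the column labels
```

Wrapping the text in io.StringIO lets you experiment with pd.read_csv() without creating a file on disk.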

After reading the data into a DataFrame, we can inspect it. The DataFrame methods .head(n) and .tail(n) return the first n and last n rows, respectively.

.head(n)

If n is not specified, it shows the first 5 rows.

>>> data.head() # first 5 rows
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)      
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2
>>> data.head(3) # first 3 rows
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2

.tail(n)

If n is not specified, it shows the last 5 rows.

>>> data.tail() # last 5 rows
     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
145                6.7               3.0                5.2               2.3
146                6.3               2.5                5.0               1.9
147                6.5               3.0                5.2               2.0
148                6.2               3.4                5.4               2.3
149                5.9               3.0                5.1               1.8

.to_csv()

After analysis, you may need to store the cleaned dataset back in a CSV file. For that, Pandas provides the .to_csv() method of DataFrame.

# The file path is the first argument (its keyword name is path_or_buf, not path).
data.to_csv('input/new_iris_data.csv', index=False)

Here, the first argument is the file path, including the name of the file you want to save to. The index keyword argument takes a boolean that controls whether the row index is written. With index=False, the row index labels are not saved in the CSV file.
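To see what the index argument changes, here is a minimal sketch that writes the same DataFrame twice and reads both files back. The small DataFrame, the file names, and the temporary directory are assumptions made for illustration only.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-row DataFrame used only for this demonstration.
df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})

with tempfile.TemporaryDirectory() as tmp:
    with_index = os.path.join(tmp, "with_index.csv")
    without_index = os.path.join(tmp, "without_index.csv")

    df.to_csv(with_index)                  # default: the row index is written
    df.to_csv(without_index, index=False)  # the row index is skipped

    # Read both files back while the temporary directory still exists.
    back_with = pd.read_csv(with_index)
    back_without = pd.read_csv(without_index)

print(list(back_with.columns))     # includes an extra unnamed index column
print(list(back_without.columns))  # only the original columns
```

When the index is written, reading the file back yields an extra column for it, which is why index=False is a common choice when the index is just the default row numbers.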

You can read more about .read_csv() and .to_csv() in the Pandas documentation.

Pandas supports many other file formats. Each one is read with a pd.read_* function and written with a DataFrame .to_* method, where * is the file format.

File Format    Reader Function    Writer Function
CSV            pd.read_csv()      df.to_csv()
Excel          pd.read_excel()    df.to_excel()
JSON           pd.read_json()     df.to_json()
HTML           pd.read_html()     df.to_html()
XML            pd.read_xml()      df.to_xml()
SQL            pd.read_sql()      df.to_sql()

These are some of the most common formats. Here pd stands for the pandas module and df for a DataFrame. Pandas supports many other formats; to learn more, see the reader and writer functions listed in the Pandas I/O documentation.
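The other formats follow the same read/write pattern as CSV. As a rough sketch, here is a JSON round trip done entirely in memory; the small DataFrame and the choice of orient="records" are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical DataFrame used only for this demonstration.
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# Without a path, to_json returns the JSON text instead of writing a file.
json_text = df.to_json(orient="records")
print(json_text)

# read_json accepts a path or a file-like object such as StringIO.
restored = pd.read_json(io.StringIO(json_text), orient="records")
print(restored)
```

Passing a file path to .to_json() and pd.read_json() instead works the same way as it does for the CSV functions.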

That is all for today's article. Thank you for reading. See you in my next article.