# PANDAS

**pandas** is a Python package that provides fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data easy and intuitive.

## Limitations of NumPy

NumPy arrays are fast for numerical work, but NumPy lacks features needed to analyse relational data (data whose values are related to one another). A few of the features missing from NumPy are:

- No built-in way to attach labels to data.
- No pre-built methods to fill missing values.
- No built-in way to group data.
- No built-in way to pivot data.
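For contrast, here is a minimal sketch of how pandas covers each of the four gaps listed above. The column names (`city`, `year`, `sales`) and the values are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; names and numbers are illustrative only.
df = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Delhi"],
    "year": [2020, 2021, 2020, 2021],
    "sales": [10.0, np.nan, 30.0, 40.0],
})

# Labels: a column is addressed by name rather than by position.
sales = df["sales"]

# Missing values: fill the NaN with a chosen value.
filled = df["sales"].fillna(0.0)

# Grouping: total sales per city (NaN is skipped by default).
per_city = df.groupby("city")["sales"].sum()

# Pivoting: one row per city, one column per year.
wide = df.pivot(index="city", columns="year", values="sales")
```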

## Why pandas?

pandas is built on top of NumPy to make processing relational data easier.

**Ingesting** (the process of obtaining and importing data), **storing**, **pre-processing**, **summarising**, and **visualizing** data can all be done effectively in pandas.

Size mutability: columns can be **inserted into and deleted from** a DataFrame.
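A small sketch of what size mutability looks like in practice. The DataFrame and its column names are invented for the example.

```python
import pandas as pd

# Hypothetical two-column frame used only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Insert a new column at the end by plain assignment.
df["c"] = df["a"] + df["b"]

# Insert a column at a specific position (here position 0).
df.insert(0, "id", [100, 101, 102])

# Delete a column; drop returns a new DataFrame by default.
df = df.drop(columns=["b"])

print(list(df.columns))  # ['id', 'a', 'c']
```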

pandas makes **reshaping**, **merging**, and **joining** data sets easy.
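The following sketch shows one way each of these three operations might look. The frames `prices` and `volumes` and their contents are assumptions made for the example, not data from this article.

```python
import pandas as pd

# Hypothetical frames sharing a "symbol" column.
prices = pd.DataFrame({"symbol": ["A", "B", "C"], "price": [10, 20, 30]})
volumes = pd.DataFrame({"symbol": ["A", "B", "D"], "volume": [100, 200, 400]})

# Merging: SQL-style join on a shared column; how="inner" keeps only matches.
merged = prices.merge(volumes, on="symbol", how="inner")

# Joining: align two frames on their index; the default is a left join,
# so "C" is kept and its volume becomes NaN.
joined = prices.set_index("symbol").join(volumes.set_index("symbol"))

# Reshaping: melt turns wide columns into long (field, value) rows.
long_form = merged.melt(id_vars="symbol", var_name="field", value_name="value")
```

`merge` works on columns while `join` works on the index, so which one fits depends on where the key lives in your data.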

`import pandas as pd`

## Creating a Series

A Series is a one-dimensional, labeled, homogeneously typed array.

`s = pd.Series([1, 1, 2, 3, 5, 8, 13], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])  # Fibonacci series 🤗`

`print(s)`

Output:

    a     1
    b     1
    c     2
    d     3
    e     5
    f     8
    g    13
    dtype: int64

If no index is given, pandas automatically creates one made of increasing whole numbers starting at 0.
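As a quick check of the default behaviour, the same values can be passed without an `index` argument. The name `s_default` is introduced here only for the example.

```python
import pandas as pd

# Same values as before, but with no explicit index.
s_default = pd.Series([1, 1, 2, 3, 5, 8, 13])

# pandas generates the labels 0, 1, 2, ... automatically.
print(list(s_default.index))  # [0, 1, 2, 3, 4, 5, 6]
```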

## Indexing and selecting data

Prefer `loc` and `iloc` for indexing and selection.

`s.iloc[0:4]  # iloc selects by implicit integer position`

Output:

    a    1
    b    1
    c    2
    d    3
    dtype: int64

`s.loc["a":"d"]  # loc selects by index label, from "a" to "d" (both ends included)`

Output:

    a    1
    b    1
    c    2
    d    3
    dtype: int64
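The sketch below spells out the difference between the two accessors, including a case where the index itself is made of integers and the two disagree. The second Series, `s2`, is a made-up example.

```python
import pandas as pd

s = pd.Series([1, 1, 2, 3, 5, 8, 13], index=list("abcdefg"))

# iloc slices by position; the stop position (4) is excluded.
by_position = s.iloc[0:4]

# loc slices by label; the stop label ("d") is included.
by_label = s.loc["a":"d"]

# Hypothetical Series whose labels are integers that do not start at 0.
s2 = pd.Series(["x", "y", "z"], index=[1, 2, 3])

label_one = s2.loc[1]      # the element labeled 1
position_one = s2.iloc[1]  # the element at position 1 (the second one)
```

Being explicit with `loc` or `iloc` avoids having to guess whether a number means a label or a position.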

## Reading a file

`nifty = pd.read_csv("nifty.csv", index_col=0)`

This reads the CSV file `nifty.csv` into a DataFrame, using the first column as the index.

`nifty.head()  # displays the first 5 rows of the DataFrame`

`nifty.tail()  # displays the last 5 rows of the DataFrame`
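To try this without the actual file, the sketch below reads from an in-memory string instead. The columns (`Date`, `Open`, `Close`) and values are assumptions, since the real contents of `nifty.csv` are not shown here.

```python
import io

import pandas as pd

# Hypothetical stand-in for nifty.csv; the real file's columns are unknown.
csv_text = """Date,Open,Close
2021-01-01,100,101
2021-01-04,101,103
2021-01-05,103,102
2021-01-06,102,105
2021-01-07,105,104
2021-01-08,104,107
2021-01-11,107,108
"""

# index_col=0 makes the first column ("Date") the row index.
nifty = pd.read_csv(io.StringIO(csv_text), index_col=0)

first_rows = nifty.head()   # first 5 rows by default
last_two = nifty.tail(2)    # pass a number to change how many rows
```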

## Dealing with null (NaN) values

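`import numpy as np  # needed for np.nan below`

`df.dropna()  # drops every row that contains a null value`

`df.fillna(0)  # fills NaN values with zeros`

`df.fillna(df.mean(numeric_only=True))  # fills NaN values with each column's mean`

`df.replace(np.nan, 0)  # replaces NaN values with zeros`

`df.replace(np.nan, df["column"].mean())  # replaces every NaN with the mean of one column ("column" is a placeholder name)`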
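Here is a small sketch of these calls on a toy DataFrame. The columns `x` and `y` are invented for the example. Each call returns a new DataFrame and leaves `df` itself unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one NaN in each column.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [np.nan, 4.0, 6.0]})

# Keep only rows with no NaN at all (just the last row here).
dropped = df.dropna()

# Replace every NaN with 0.
zeros = df.fillna(0)

# Replace each NaN with the mean of its own column (x: 2.0, y: 5.0).
col_means = df.fillna(df.mean())

# Fill a single column only, by passing a {column: value} mapping.
only_x = df.fillna({"x": df["x"].mean()})
```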

## Operations on a DataFrame

`df.mean()  # calculates the mean of each column`

`df.min()  # displays the smallest value of each column`

`df.std()  # displays the standard deviation of each column`

`df.median()  # displays the median of each column`

All the arithmetic operations are easily done on a DataFrame.
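A minimal sketch of these summaries and of element-wise arithmetic, using a made-up DataFrame with columns `a` and `b`.

```python
import pandas as pd

# Hypothetical numeric frame used only for illustration.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

means = df.mean()      # mean of each column
mins = df.min()        # smallest value of each column
medians = df.median()  # median of each column
stds = df.std()        # sample standard deviation (ddof=1) of each column

# Arithmetic is applied element by element.
doubled = df * 2
total = df["a"] + df["b"]
```

Each summary returns a Series indexed by column name, so a single result can be read with, for example, `means["a"]`.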

Thank you for reading!!😀😀

**References**