Python for Data Science: A Starter Guide

Python is one of the most widely used languages for data science, largely because of its mature libraries for data analysis, manipulation, and visualization. This guide introduces three essential ones: NumPy, Pandas, and Matplotlib.

Core Libraries

  • NumPy: Provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on them.
  • Pandas: Built on top of NumPy, it offers data structures like the DataFrame, which is well suited to labeled, tabular data such as spreadsheets and CSV files.
  • Matplotlib: A comprehensive library for creating static, animated, and interactive visualizations.
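
To make the NumPy description above concrete, here is a minimal sketch of array arithmetic and aggregation. The sales figures and variable names are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical monthly sales figures, used only for illustration
sales = np.array([150, 200, 180])

# Vectorized arithmetic: apply a 10% increase to every element at once
projected = sales * 1.1

# Aggregate functions operate on the whole array
total = sales.sum()
mean = sales.mean()

# A 2-D array (matrix): rows are regions, columns are months
regional = np.array([[150, 200, 180],
                     [120, 160, 140]])
monthly_totals = regional.sum(axis=0)  # sum down each column

print(projected, total, mean, monthly_totals)
```

Operating on whole arrays like this, rather than looping over elements in Python, is the usual NumPy style and is what Pandas relies on internally.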

Example: Analyzing Sales Data

import pandas as pd
import matplotlib.pyplot as plt

# Build a DataFrame from an in-memory dictionary (a stand-in for data loaded from a file)
data = {'Month': ['Jan', 'Feb', 'Mar'], 'Sales': [150, 200, 180]}
df = pd.DataFrame(data)

# Calculate average sales
average_sales = df['Sales'].mean()
print(f'Average Sales: {average_sales:.2f}')

# Plot the data
plt.figure(figsize=(8, 5))
plt.bar(df['Month'], df['Sales'])
plt.title('Monthly Sales')
plt.xlabel('Month')
plt.ylabel('Sales')
plt.show()
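
The example above builds its DataFrame from a dictionary. In practice the data would more often come from a file, so here is a rough sketch of the same analysis using pd.read_csv. The CSV text is an assumed stand-in for a real file and simply mirrors the figures above; with an actual file you would pass its path instead of the StringIO object.

```python
import io
import pandas as pd

# Assumed stand-in for a CSV file on disk; contents mirror the example above
csv_text = "Month,Sales\nJan,150\nFeb,200\nMar,180\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# Same calculation as before, plus the month with the highest sales
average_sales = df['Sales'].mean()
best_month = df.loc[df['Sales'].idxmax(), 'Month']

print(f'Average Sales: {average_sales:.2f}')
print(f'Best month: {best_month}')
```

The resulting DataFrame has the same columns as the dictionary-based one, so the plotting code works on it unchanged.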
