Dplython: a data analysis library
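Dplython is Dplyr for Python. Dplyr is an R library for fast data analysis. Its philosophy is to constrain data manipulation to a small set of functions that correspond to the most common tasks. This brings the way you think about a problem closer to the code you write, helping you analyze data at the "speed of thought".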
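To make the idea of a few composable verbs concrete, here is a minimal pure-Python sketch of a `>>` pipeline over a list of dictionaries. It is not Dplython's implementation and does not use its API: the `Verb` class, the toy `select`, `keep_if` and `head` helpers, and the sample rows are all hypothetical and exist only for illustration.

```python
class Verb:
    """Wraps a function so it can sit on the right-hand side of `>>`."""

    def __init__(self, func):
        self.func = func

    def __rrshift__(self, rows):
        # Python calls this for `rows >> verb`, because a plain list
        # does not define __rshift__ itself.
        return self.func(rows)


def select(*names):
    """Toy verb: keep only the named keys of every row."""
    return Verb(lambda rows: [{n: r[n] for n in names} for r in rows])


def keep_if(predicate):
    """Toy verb: keep the rows for which predicate(row) is true."""
    return Verb(lambda rows: [r for r in rows if predicate(r)])


def head(n=5):
    """Toy verb: keep the first n rows."""
    return Verb(lambda rows: rows[:n])


# Made-up illustrative rows, not real data.
rows = [
    {"carat": 0.23, "cut": "Ideal", "price": 326},
    {"carat": 0.21, "cut": "Premium", "price": 326},
    {"carat": 0.29, "cut": "Premium", "price": 334},
]

result = (rows
          >> keep_if(lambda r: r["cut"] == "Premium")
          >> select("carat", "price")
          >> head(1))
print(result)
```

Each verb does one thing and returns a new list, so a pipeline reads left to right in the same order as the steps of the analysis.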
Installation:
pip install git+https://github.com/dodger487/dplython.git
Usage:
import pandas
from dplython import *

# Select specific columns with select, and take the first rows with head
diamonds >> select(X.carat, X.cut, X.price) >> head(5)

# Filter out rows using dfilter
diamonds >> dfilter(X.carat > 4) >> select(X.carat, X.cut, X.depth, X.price)

# Sample with sample_n or sample_frac, sort with arrange
(diamonds >>
  sample_n(10) >>
  arrange(X.carat) >>
  select(X.carat, X.cut, X.depth, X.price))

# You can:
#   add columns with mutate (referencing other columns!)
#   group rows into dplyr-style groups with group_by
#   collapse rows into single rows using summarize
(diamonds >>
  mutate(carat_bin=X.carat.round()) >>
  group_by(X.cut, X.carat_bin) >>
  summarize(avg_price=X.price.mean()))

# If you have column names that don't work as attributes, you can use an
# alternate "get item" notation with X.
diamonds["column w/ spaces"] = range(len(diamonds))
diamonds >> select(X["column w/ spaces"]) >> head()

# It's possible to pass the entire dataframe using X._
diamonds >> sample_n(6) >> select(X.carat, X.price) >> X._.T

# To pass the DataFrame or columns into functions, apply @DelayFunction
@DelayFunction
def PairwiseGreater(series1, series2):
  index = series1.index
  newSeries = pandas.Series([max(s1, s2) for s1, s2 in zip(series1, series2)])
  newSeries.index = index
  return newSeries

diamonds >> PairwiseGreater(X.x, X.y)

# Passing entire dataframe and plotting with ggplot
from ggplot import *
ggplot = DelayFunction(ggplot)  # Simple installation
# Re-create diamonds as a DplyFrame: the ggplot star import masks it
diamonds = DplyFrame(pandas.read_csv('./diamonds.csv'))
(diamonds >> ggplot(aes(x="carat", y="price", color="cut"), data=X._) +
  geom_point() + facet_wrap("color"))
(diamonds >>
dfilter((X.clarity == "I1") | (X.clarity == "IF")) >>
ggplot(aes(x="carat", y="price", color="color"), X._) +
geom_point() +
facet_wrap("clarity"))

# Matplotlib works as well!
import pylab as pl
pl.scatter = DelayFunction(pl.scatter)
diamonds >> sample_frac(0.1) >> pl.scatter(X.carat, X.price)
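
The X placeholder and DelayFunction examples above both rely on deferring evaluation until the data arrives through `>>`. The following is a rough pure-Python sketch of that idea, under the assumption that a recorded expression can be modeled as a function of the data. It is not Dplython's actual code: `Later`, `delay` and `pairwise_greater` are hypothetical names, and a plain dict of lists stands in for a DataFrame.

```python
class Later:
    """A recorded expression that is evaluated once the data is known."""

    def __init__(self, resolve):
        self.resolve = resolve  # callable: data -> value

    def __getattr__(self, name):
        # X.price becomes "look up 'price' in the data later".
        return Later(lambda data: self.resolve(data)[name])

    def __rrshift__(self, data):
        # `data >> expression` evaluates the expression against data.
        return self.resolve(data)


# Toy placeholder that stands for "the data itself".
X = Later(lambda data: data)


def delay(func):
    """Toy decorator: let func accept Later arguments and defer the call."""
    def wrapper(*args):
        def run(data):
            # Resolve placeholders against the data, pass other values through.
            resolved = [a.resolve(data) if isinstance(a, Later) else a
                        for a in args]
            return func(*resolved)
        return Later(run)
    return wrapper


@delay
def pairwise_greater(xs, ys):
    """Element-wise maximum of two equally long sequences."""
    return [max(x, y) for x, y in zip(xs, ys)]


# Made-up illustrative columns, not real data.
data = {"x": [1, 5, 3], "y": [4, 2, 6]}
result = data >> pairwise_greater(X.x, X.y)
print(result)
```

Calling the decorated function only records what to compute. The work happens when a dataset is piped in, which is why an ordinary function can be wrapped once and then reused in any pipeline.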
