How to summarise data by percentages in pandas


This code:
#Missing analysis for actions - which action is missing the most action_types?
grouped_missing_analysis = pd.crosstab(clean_sessions.action_type, clean_sessions.action, margins=True).unstack()
grouped_unknown = grouped_missing_analysis.loc(axis=0)[slice(None), ['Missing', 'Unknown', 'Other']]
print(grouped_unknown)
Leads to the printing of this:
action action_type
10 Missing 0
Unknown 0
11 Missing 0
Unknown 0
12 Missing 0
Unknown 0
15 Missing 0
Unknown 0
about_us Missing 0
Unknown 416
accept_decline Missing 0
Unknown 0
account Missing 0
Unknown 9040
acculynk_bin_check_failed Missing 0
Unknown 1
acculynk_bin_check_success Missing 0
Unknown 51
acculynk_load_pin_pad Missing 0
Unknown 50
How would I now aggregate the Missing, Unknown and Other counts for each action into a single total per action, and express that total as a percentage of all action_types that are Missing, Unknown or Other? For example, there would be one row per action, and the about_us row would hold (416 + 0) / (total Missing + Unknown + Other across all actions).
See this question for context.
The problem is that the above contains a row right at the bottom of it called All which is the sum of everything, so:
All Missing 1126204
Unknown 1031170
Desired output would be:
action percent_total_missing_action_type
10 0
11 0
12 0
15 0
about_us 416/total_missing_action_type (in the All row - 2157374, or the sum of everything in the action_type column)
accept_decline 0
account 9040/total_missing_action_type (in the All row - 2157374, or the sum of everything in the action_type column)
acculynk_bin_check_failed 1/total_missing_action_type (in the All row - 2157374, or the sum of everything in the action_type column)
etc..
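One possible way to get this per-action share is sketched below. The counts are a small made-up sample shaped like the output above (the two actions and the All totals are illustrative, not the real data), and the names counts, total_missing and per_action are hypothetical. The idea is to read the grand total from the All row, drop that row, sum the remaining counts per action, and divide:

```python
import pandas as pd

# Hypothetical counts shaped like the crosstab output above (values are illustrative).
index = pd.MultiIndex.from_product(
    [["about_us", "account", "All"], ["Missing", "Unknown"]],
    names=["action", "action_type"],
)
counts = pd.Series([0, 416, 0, 9040, 0, 9456], index=index)

# Grand total of the selected action_types, read from the All margin row.
total_missing = counts.xs("All", level="action").sum()

# Drop the All row, then add up Missing + Unknown for each action.
per_action = counts.drop("All", level="action").groupby(level="action").sum()

# Share of the grand total contributed by each action.
percent_total_missing_action_type = per_action / total_missing
print(percent_total_missing_action_type)
```

Dropping the All row before grouping keeps the margin from appearing as if it were an action of its own.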
Here is some test data:
action action_type
a Missing 2
Unknown 5
b Missing 3
Unknown 4
c Missing 5
Unknown 6
d Missing 1
Unknown 9
All Missing 11
Unknown 24
Which should go into this:
action action_type_percentage
a Missing 2/11
Unknown 5/24
b Missing 3/11
Unknown 4/24
c Missing 5/11
Unknown 6/24
d Missing 1/11
Unknown 9/24
All Missing 11/11
Unknown 24/24
First, select the values stored under the All key of the MultiIndex with xs, then divide the original Series by them. Finally, call reset_index to turn the index levels into columns:
print(df)
action action_type
a Missing 2
Unknown 5
b Missing 3
Unknown 4
c Missing 5
Unknown 6
d Missing 1
Unknown 9
All Missing 11
Unknown 24
dtype: int64
print(df.xs('All'))
Missing 11
Unknown 24
dtype: int64
print(df / df.xs('All'))
action action_type
a Missing 0.181818
Unknown 0.208333
b Missing 0.272727
Unknown 0.166667
c Missing 0.454545
Unknown 0.250000
d Missing 0.090909
Unknown 0.375000
All Missing 1.000000
Unknown 1.000000
dtype: float64
print((df / df.xs('All')).reset_index().rename(columns={0: 'action_type_percentage'}))
action action_type action_type_percentage
0 a Missing 0.181818
1 a Unknown 0.208333
2 b Missing 0.272727
3 b Unknown 0.166667
4 c Missing 0.454545
5 c Unknown 0.250000
6 d Missing 0.090909
7 d Unknown 0.375000
8 All Missing 1.000000
9 All Unknown 1.000000
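For reference, the steps above can be put together as a self-contained Python 3 sketch. It rebuilds the test data from the question and, as a small variation on the plain division shown above, uses Series.div with level= so the index level to match on is stated explicitly. The variable names s, totals, shares and result are assumptions made for this example:

```python
import pandas as pd

# Rebuild the test data from the question as a Series with a two-level index.
index = pd.MultiIndex.from_product(
    [["a", "b", "c", "d", "All"], ["Missing", "Unknown"]],
    names=["action", "action_type"],
)
s = pd.Series([2, 5, 3, 4, 5, 6, 1, 9, 11, 24], index=index)

# Totals per action_type, taken from the All row.
totals = s.xs("All", level="action")

# Divide each count by the total for its action_type.
# level= names the index level on which the two Series are matched.
shares = s.div(totals, level="action_type")

# Flatten the index into columns and name the value column.
result = shares.reset_index(name="action_type_percentage")
print(result)
```

Passing name= to reset_index avoids the separate rename step on the column labelled 0.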
