How to summarise data by percentages in pandas


This code:
# Missing analysis for actions: which action is missing the most action_types?
grouped_missing_analysis = pd.crosstab(clean_sessions.action_type, clean_sessions.action, margins=True).unstack()
grouped_unknown = grouped_missing_analysis.loc(axis=0)[slice(None), ['Missing', 'Unknown', 'Other']]
print(grouped_unknown)
prints this:
action                      action_type
10                          Missing           0
                            Unknown           0
11                          Missing           0
                            Unknown           0
12                          Missing           0
                            Unknown           0
15                          Missing           0
                            Unknown           0
about_us                    Missing           0
                            Unknown         416
accept_decline              Missing           0
                            Unknown           0
account                     Missing           0
                            Unknown        9040
acculynk_bin_check_failed   Missing           0
                            Unknown           1
acculynk_bin_check_success  Missing           0
                            Unknown          51
acculynk_load_pin_pad       Missing           0
                            Unknown          50
How would I now aggregate Missing, Unknown and Other into a single count per action, and express that count as a percentage of all action_types that are Missing, Unknown or Other? For example, there would be one row per action, and the about_us row would hold (0 + 416) / (total Missing + Unknown + Other across all actions).
See this question for context.
The problem is that the output above ends with a row called All, which is the sum of everything:
All                         Missing     1126204
                            Unknown     1031170
Desired output would be the following, where total_missing_action_type is the total from the All row (2157374), i.e. the sum of everything in the action_type column:
action                      percent_total_missing_action_type
10                          0
11                          0
12                          0
15                          0
about_us                    416 / total_missing_action_type
accept_decline              0
account                     9040 / total_missing_action_type
acculynk_bin_check_failed   1 / total_missing_action_type
etc.
Here is some test data:
action  action_type
a       Missing         2
        Unknown         5
b       Missing         3
        Unknown         4
c       Missing         5
        Unknown         6
d       Missing         1
        Unknown         9
All     Missing        11
        Unknown        24
This should be turned into:
action  action_type  action_type_percentage
a       Missing       2/11
        Unknown       5/24
b       Missing       3/11
        Unknown       4/24
c       Missing       5/11
        Unknown       6/24
d       Missing       1/11
        Unknown       9/24
All     Missing      11/11
        Unknown      24/24
First, use xs to select the values stored under the key All in the MultiIndex, then divide the original Series by them. Finally, call reset_index to turn the result into a DataFrame:
print(df)
action  action_type
a       Missing         2
        Unknown         5
b       Missing         3
        Unknown         4
c       Missing         5
        Unknown         6
d       Missing         1
        Unknown         9
All     Missing        11
        Unknown        24
dtype: int64
print(df.xs('All'))
Missing    11
Unknown    24
dtype: int64
print(df / df.xs('All'))
action  action_type
a       Missing        0.181818
        Unknown        0.208333
b       Missing        0.272727
        Unknown        0.166667
c       Missing        0.454545
        Unknown        0.250000
d       Missing        0.090909
        Unknown        0.375000
All     Missing        1.000000
        Unknown        1.000000
dtype: float64
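The division above depends on pandas matching the action_type level of the MultiIndex against the single-level index of the All totals. A minimal sketch that spells out that matching with Series.div and its level argument, using the test data from the question (the names counts, totals and shares are illustrative and not from the original post):

```python
import pandas as pd

# Rebuild the test data from the question as a Series with a two-level index.
index = pd.MultiIndex.from_product(
    [["a", "b", "c", "d", "All"], ["Missing", "Unknown"]],
    names=["action", "action_type"],
)
counts = pd.Series([2, 5, 3, 4, 5, 6, 1, 9, 11, 24], index=index)

# Totals per action_type, taken from the 'All' margin row.
totals = counts.xs("All", level="action")

# Divide each count by the total for its action_type, matching on that level.
shares = counts.div(totals, level="action_type")
print(shares)
```

Naming the level explicitly documents the intent and does not depend on how the plain / operator aligns the two indexes in a given pandas version.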
print((df / df.xs('All')).reset_index().rename(columns={0: 'action_type_percentage'}))
  action action_type  action_type_percentage
0      a     Missing                0.181818
1      a     Unknown                0.208333
2      b     Missing                0.272727
3      b     Unknown                0.166667
4      c     Missing                0.454545
5      c     Unknown                0.250000
6      d     Missing                0.090909
7      d     Unknown                0.375000
8    All     Missing                1.000000
9    All     Unknown                1.000000
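The question also asks for one row per action, with Missing, Unknown and Other added together and divided by the grand total. One possible sketch of that rollup on the same test data, assuming the All row really is the sum of the other rows (the variable names are illustrative; only the column name percent_total_missing_action_type comes from the question):

```python
import pandas as pd

# Same test data as in the question: counts indexed by (action, action_type).
index = pd.MultiIndex.from_product(
    [["a", "b", "c", "d", "All"], ["Missing", "Unknown"]],
    names=["action", "action_type"],
)
counts = pd.Series([2, 5, 3, 4, 5, 6, 1, 9, 11, 24], index=index)

# Add up all action_types (Missing, Unknown, and Other if present) per action.
per_action = counts.groupby(level="action", sort=False).sum()

# The 'All' row now holds the grand total; use it as the denominator,
# then drop it so that only real actions remain.
grand_total = per_action.loc["All"]
percent = (per_action.drop("All") / grand_total).rename(
    "percent_total_missing_action_type"
)

result = percent.reset_index()
print(result)
```

With the test data the grand total is 35, so action a comes out as 7/35 and the four shares add up to 1.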