

How to summarise data by percentages in pandas


This code:
import pandas as pd

# Missing analysis for actions: which action is missing the most action_types?
grouped_missing_analysis = pd.crosstab(clean_sessions.action_type, clean_sessions.action, margins=True).unstack()
grouped_unknown = grouped_missing_analysis.loc(axis=0)[slice(None), ['Missing', 'Unknown', 'Other']]
print(grouped_unknown)
prints this:
action                      action_type
10                          Missing          0
                            Unknown          0
11                          Missing          0
                            Unknown          0
12                          Missing          0
                            Unknown          0
15                          Missing          0
                            Unknown          0
about_us                    Missing          0
                            Unknown        416
accept_decline              Missing          0
                            Unknown          0
account                     Missing          0
                            Unknown       9040
acculynk_bin_check_failed   Missing          0
                            Unknown          1
acculynk_bin_check_success  Missing          0
                            Unknown         51
acculynk_load_pin_pad       Missing          0
                            Unknown         50
How would I now aggregate the Missing, Unknown and Other counts into a single total per action, expressed as a percentage of all action_types that are Missing, Unknown or Other? For example, there would be one row per action, and the about_us row would hold (0 + 416) / (total Missing + Unknown + Other over all actions).
See this question for context.
The complication is that the output above ends with a row called All, which is the sum over every action:
All                         Missing    1126204
                            Unknown    1031170
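The All rows come from margins=True. A minimal sketch with a hypothetical stand-in for clean_sessions (invented rows, same column names as the question) shows that after unstack() the entries under the All action are each action_type's total across every action, so they have to be either excluded or used as the denominator:

```python
import pandas as pd

# Hypothetical stand-in for clean_sessions: invented rows, same column names.
sessions = pd.DataFrame({
    "action": ["about_us", "about_us", "account", "account", "account", "search"],
    "action_type": ["Unknown", "view", "Unknown", "Unknown", "Missing", "click"],
})

# Rows are action_type values, columns are action values;
# margins=True appends an "All" row and an "All" column of totals.
table = pd.crosstab(sessions.action_type, sessions.action, margins=True)

# unstack() on a DataFrame with a flat index returns a Series
# indexed by (action, action_type).
stacked = table.unstack()

# The "All" action holds each action_type's total across every action.
print(stacked.loc["All"])
```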
The desired output would be:
action                      percent_total_missing_action_type
10                          0
11                          0
12                          0
15                          0
about_us                    416 / total_missing_action_type
accept_decline              0
account                     9040 / total_missing_action_type
acculynk_bin_check_failed   1 / total_missing_action_type
etc.
where total_missing_action_type is the sum of the All row (1126204 + 1031170 = 2157374), i.e. the sum of everything in the action_type column.
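For this first framing (one row per action: its Missing + Unknown + Other count divided by the grand total), one possible approach, sketched here, is to sum each action over the action_type level, drop the All margin row, and divide by the sum of the All row. Only a few rows from the printed output above are typed in by hand, so this is not the full data set, and the level names action and action_type are assumed to match the crosstab output:

```python
import pandas as pd

# A few rows copied from the printed output above, plus the All margin.
# Not the full data; only Missing and Unknown appear in that printout,
# Other rows (if present) would be summed the same way.
counts = pd.Series(
    [0, 416, 0, 9040, 0, 1, 1126204, 1031170],
    index=pd.MultiIndex.from_product(
        [["about_us", "account", "acculynk_bin_check_failed", "All"],
         ["Missing", "Unknown"]],
        names=["action", "action_type"],
    ),
)

# Grand total of every Missing/Unknown count: 1126204 + 1031170.
total = counts.xs("All").sum()

# Sum each action across its action_type rows, then drop the All margin row.
per_action = counts.groupby(level="action").sum().drop("All")

percent = (per_action / total).rename("percent_total_missing_action_type")
print(percent)
```

Grouping first and dropping All afterwards works on a flat index, which avoids having to think about unused MultiIndex levels.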
Here is some test data:
action  action_type
a       Missing         2
        Unknown         5
b       Missing         3
        Unknown         4
c       Missing         5
        Unknown         6
d       Missing         1
        Unknown         9
All     Missing        11
        Unknown        24
This should become:
action  action_type  action_type_percentage
a       Missing       2/11
        Unknown       5/24
b       Missing       3/11
        Unknown       4/24
c       Missing       5/11
        Unknown       6/24
d       Missing       1/11
        Unknown       9/24
All     Missing      11/11
        Unknown      24/24

First use xs to select the values of the MultiIndex whose action key is All, then divide the original Series by them. Finally, reset_index turns the result into a DataFrame:
print(df)
action  action_type
a       Missing         2
        Unknown         5
b       Missing         3
        Unknown         4
c       Missing         5
        Unknown         6
d       Missing         1
        Unknown         9
All     Missing        11
        Unknown        24
dtype: int64
print(df.xs('All'))
Missing    11
Unknown    24
dtype: int64
print(df / df.xs('All'))
action  action_type
a       Missing        0.181818
        Unknown        0.208333
b       Missing        0.272727
        Unknown        0.166667
c       Missing        0.454545
        Unknown        0.250000
d       Missing        0.090909
        Unknown        0.375000
All     Missing        1.000000
        Unknown        1.000000
dtype: float64
print((df / df.xs('All')).reset_index().rename(columns={0: 'action_type_percentage'}))
   action  action_type  action_type_percentage
0  a       Missing      0.181818
1  a       Unknown      0.208333
2  b       Missing      0.272727
3  b       Unknown      0.166667
4  c       Missing      0.454545
5  c       Unknown      0.250000
6  d       Missing      0.090909
7  d       Unknown      0.375000
8  All     Missing      1.000000
9  All     Unknown      1.000000
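The same steps as a self-contained script for current pandas and Python 3. This is a sketch: the test Series is rebuilt by hand, the level names action and action_type are assumed to match crosstab(...).unstack(), Series.div with level= is used to make the alignment on action_type explicit rather than relying on the plain / operator, and reset_index(name=...) replaces renaming column 0:

```python
import pandas as pd

# Rebuild the test Series from the question; level names are assumed.
index = pd.MultiIndex.from_product(
    [["a", "b", "c", "d", "All"], ["Missing", "Unknown"]],
    names=["action", "action_type"],
)
df = pd.Series([2, 5, 3, 4, 5, 6, 1, 9, 11, 24], index=index)

# The All margin, now indexed by action_type only: Missing -> 11, Unknown -> 24.
totals = df.xs("All")

# Divide every row by the total for its action_type,
# matching on the action_type level of the MultiIndex.
percent = df.div(totals, level="action_type")

# Flatten into columns action, action_type, action_type_percentage.
result = percent.reset_index(name="action_type_percentage")
print(result)
```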

