pandas


How to summarise data by percentages in pandas


This code:
#Missing analysis for actions - which action is missing the most action_types?
grouped_missing_analysis = pd.crosstab(clean_sessions.action_type, clean_sessions.action, margins=True).unstack()
grouped_unknown = grouped_missing_analysis.loc(axis=0)[slice(None), ['Missing', 'Unknown', 'Other']]
print(grouped_unknown)
prints the following:
action action_type
10 Missing 0
Unknown 0
11 Missing 0
Unknown 0
12 Missing 0
Unknown 0
15 Missing 0
Unknown 0
about_us Missing 0
Unknown 416
accept_decline Missing 0
Unknown 0
account Missing 0
Unknown 9040
acculynk_bin_check_failed Missing 0
Unknown 1
acculynk_bin_check_success Missing 0
Unknown 51
acculynk_load_pin_pad Missing 0
Unknown 50
How would I now aggregate Missing, Unknown and Other into a single total for each action, and express it as a fraction of all action_types that are Missing, Unknown or Other? For example, there would be one row per action, and the about_us row would be (416 + 0) / (total Missing + Unknown + Other across all actions).
See this question for context.
The full output also contains a row at the very bottom called All, which holds the sum of everything:
All Missing 1126204
Unknown 1031170
Desired output would be:
action percent_total_missing_action_type
10 0
11 0
12 0
15 0
about_us 416/2157374
accept_decline 0
account 9040/2157374
acculynk_bin_check_failed 1/2157374
etc.
where 2157374 is total_missing_action_type, the total in the All row (1126204 + 1031170), i.e. the sum of everything in the action_type column.
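For the aggregation asked for above (one row per action, divided by the grand total in the All row), a minimal sketch follows. It assumes grouped_unknown is the Series printed earlier, indexed by (action, action_type); only a handful of its rows are copied in, plus its All row. Collapsing the action_type level with groupby(level="action").sum() is one way to do it, not something the question prescribes. The result is a fraction; multiply by 100 if you want a percentage.

```python
import pandas as pd

# A few rows copied from the question's printed output, plus its All row.
idx = pd.MultiIndex.from_tuples(
    [("10", "Missing"), ("10", "Unknown"),
     ("about_us", "Missing"), ("about_us", "Unknown"),
     ("account", "Missing"), ("account", "Unknown"),
     ("acculynk_bin_check_failed", "Missing"),
     ("acculynk_bin_check_failed", "Unknown"),
     ("All", "Missing"), ("All", "Unknown")],
    names=["action", "action_type"],
)
grouped_unknown = pd.Series([0, 0, 0, 416, 0, 9040, 0, 1, 1126204, 1031170], index=idx)

# Collapse the action_type level: one total per action.
per_action = grouped_unknown.groupby(level="action").sum()

# The All row holds the grand total over every action.
total = per_action["All"]

# Divide each action's total by the grand total and drop the All row itself.
result = (per_action.drop("All") / total).reset_index(
    name="percent_total_missing_action_type"
)
print(result)
```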
Here is some test data:
action action_type
a Missing 2
Unknown 5
b Missing 3
Unknown 4
c Missing 5
Unknown 6
d Missing 1
Unknown 9
All Missing 11
Unknown 24
This should become:
action action_type_percentage
a Missing 2/11
Unknown 5/24
b Missing 3/11
Unknown 4/24
c Missing 5/11
Unknown 6/24
d Missing 1/11
Unknown 9/24
All Missing 11/11
Unknown 24/24
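To reproduce the test data above, here is a minimal sketch that builds it as a Series with a two-level MultiIndex named (action, action_type). The variable is called df only to match the answer below; it is a Series, not a DataFrame.

```python
import pandas as pd

# The test data as a Series indexed by (action, action_type).
index = pd.MultiIndex.from_product(
    [["a", "b", "c", "d", "All"], ["Missing", "Unknown"]],
    names=["action", "action_type"],
)
df = pd.Series([2, 5, 3, 4, 5, 6, 1, 9, 11, 24], index=index)
print(df)
```

The All values are the per-action_type sums of the other rows (2 + 3 + 5 + 1 = 11 and 5 + 4 + 6 + 9 = 24).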
First select the values with key All from the MultiIndex using xs, then divide the original Series by them (pandas aligns the two on the shared action_type level). Finally, call reset_index:
print(df)
action action_type
a Missing 2
Unknown 5
b Missing 3
Unknown 4
c Missing 5
Unknown 6
d Missing 1
Unknown 9
All Missing 11
Unknown 24
dtype: int64
print(df.xs('All'))
Missing 11
Unknown 24
dtype: int64
print(df / df.xs('All'))
action action_type
a Missing 0.181818
Unknown 0.208333
b Missing 0.272727
Unknown 0.166667
c Missing 0.454545
Unknown 0.250000
d Missing 0.090909
Unknown 0.375000
All Missing 1.000000
Unknown 1.000000
dtype: float64
print((df / df.xs('All')).reset_index().rename(columns={0: 'action_type_percentage'}))
action action_type action_type_percentage
0 a Missing 0.181818
1 a Unknown 0.208333
2 b Missing 0.272727
3 b Unknown 0.166667
4 c Missing 0.454545
5 c Unknown 0.250000
6 d Missing 0.090909
7 d Unknown 0.375000
8 All Missing 1.000000
9 All Unknown 1.000000
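A variant of the same answer, sketched under the assumption that the index levels are named action and action_type as in the test data. It passes level= to Series.div so the broadcast across action_type is explicit rather than inferred from matching index names. It also uses reset_index(name=...) to name the value column directly instead of renaming column 0.

```python
import pandas as pd

# Test data from the question, indexed by (action, action_type).
index = pd.MultiIndex.from_product(
    [["a", "b", "c", "d", "All"], ["Missing", "Unknown"]],
    names=["action", "action_type"],
)
df = pd.Series([2, 5, 3, 4, 5, 6, 1, 9, 11, 24], index=index)

# Per-action_type totals taken from the All rows (index: action_type).
totals = df.xs("All", level="action")

# Divide every row by the total for its action_type, broadcasting
# across the action_type level, then flatten to columns.
result = df.div(totals, level="action_type").reset_index(name="action_type_percentage")
print(result)
```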