cloud-based inter-rater reliability analysis, Cohen's kappa, Gwet's AC1/AC2, Krippendorff's alpha, Brennan-Prediger, Fleiss generalized kappa, intraclass correlation coefficients
Analyzing a Contingency Table
Cumulative Probability Threshold
Confidence Level (%):
Sampling Fraction (%):
Key in your custom weights here!
Analyzing 2-Rater Flat List Frequency Data
To subset your data, highlight the target area on the data grid and click the appropriate dark red button
Cumulative Probability Threshold
Confidence Level (%):
Sampling Fraction (%):
Key in your custom weights here!
Analyzing 2-Rater Raw Scores
To subset your data, highlight the target area on the data grid and click the appropriate dark red button
Rater 1's Data Range:
Rater 2's Data Range:
Cumulative Probability Threshold
Confidence Level (%):
Sampling Fraction (%):
Confidence Level (%):
Type of Intraclass Correlation Coefficient
Key in your custom weights here!
Analyzing Raw Scores for 3 Raters or More
To subset your data, highlight the target area on the data grid and click the appropriate dark red button
Cumulative Probability Threshold
Confidence Level (%):
Sampling Fraction (%):
Confidence Level (%):
Type of Intraclass Correlation Coefficient
Key in your custom weights here!
Analyzing the Distribution of Raters by Subject & Category
To subset your data, highlight the target area on the data grid and click the appropriate dark red button
Cumulative Probability Threshold
Confidence Level (%):
Sampling Fraction (%):
Key in your custom weights here!
Paired and Unpaired t-Tests: Testing the Difference Between 2 Coefficients for Statistical Significance
About this App
AgreeStat360 is an App that implements various methods for evaluating the extent of agreement among 2 or more raters. These methods are discussed in detail in the 2 volumes that comprise the 5th edition of the book "Handbook of Inter-Rater Reliability" by Kilem L. Gwet. Both volumes are available as printable PDF files and can be obtained here, among other books.
For 2 raters, organize your data either as a contingency table (for categorical ratings only) or as a two-column table of raw categorical or quantitative ratings. For 3 raters or more, your data can be in the form of columns of raw scores (categorical or quantitative ratings) or, alternatively, a distribution of raters by subject and response category (for categorical ratings only).
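To illustrate the 2-rater contingency-table layout, here is a minimal sketch (not AgreeStat360's own code) of how Cohen's unweighted kappa is computed from such a table: observed agreement on the diagonal, minus chance agreement from the raters' marginal proportions.

```python
# Illustrative sketch only: Cohen's unweighted kappa from a 2-rater
# contingency table. table[i][j] = number of subjects that rater 1
# classified into category i and rater 2 into category j.

def cohen_kappa(table):
    n = sum(sum(row) for row in table)          # total number of subjects
    k = len(table)                              # number of categories
    # Observed agreement: proportion of subjects on the diagonal.
    po = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: sum of products of the two raters' marginals.
    row_m = [sum(table[i]) / n for i in range(k)]
    col_m = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    pe = sum(row_m[j] * col_m[j] for j in range(k))
    return (po - pe) / (1 - pe)

# Hypothetical 3-category example table for two raters.
t = [[20, 5, 0],
     [3, 15, 2],
     [0, 4, 11]]
print(round(cohen_kappa(t), 4))  # → 0.6426
```

Weighted kappa, Gwet's AC1/AC2, and the other coefficients the App offers differ only in the weighting scheme and the chance-agreement term, not in this overall structure.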
Check the "Load a test dataset" checkbox to populate the data grid with test data, and click on the Execute button to see the results.
Your data can be captured in 2 ways: key in the ratings directly in the data grid, or import them from a CSV text file or an MS Excel file. Whichever method you choose, you can highlight the portion of the grid you want to analyze and click on the red action button. The selected data range will be described below the associated red action button.
Author
Kilem L. Gwet, PhD
gwet@agreestat.com