Quote: John Charnock: "You can use statistics to draw inferences without looking at individual decisions. If, when refereeing games between Teams A & B, one particular referee P gives more penalties to A than B, and does this every game between them, season after season, whereas referee Q always gives more penalties to team B, then it is reasonable to conclude that at least one of P & Q is biased or that they are interpreting the rules differently."
No, it isn't. Your data set is nothing like large enough.
I've just tossed a coin 10 times. The results were H - T - H - H - H - T - H - H - T - H. [This is actually true!] From this I conclude that heads is more than twice as likely to come up as tails, and that my coin is intrinsically biased to land tails down.
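The arithmetic behind why that conclusion is unjustified can be checked with a short sketch. It assumes a fair coin (p = 0.5) and uses an exact binomial calculation; the function names are hypothetical, chosen for illustration. Seven or more heads in 10 tosses of a fair coin has probability 176/1024 = 11/64 (about 17%), and a split at least that lopsided in either direction has probability 11/32 (about 34%).

```python
from fractions import Fraction
from math import comb

def upper_tail_prob(n: int, k: int, p: Fraction = Fraction(1, 2)) -> Fraction:
    """Exact P(X >= k) for X ~ Binomial(n, p): chance of k or more heads in n tosses."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def two_sided_p_value(n: int, k: int) -> Fraction:
    """Exact two-sided binomial test against a fair coin.

    A fair coin's distribution is symmetric, so the two-sided p-value is twice
    the tail on the side of the observed count, capped at 1.
    """
    tail = upper_tail_prob(n, max(k, n - k))
    return min(Fraction(1), 2 * tail)

tosses = "HTHHHTHHTH"  # the ten tosses reported above
heads = tosses.count("H")
n = len(tosses)

print(heads, n)                        # 7 10
print(upper_tail_prob(n, heads))       # 11/64
print(two_sided_p_value(n, heads))     # 11/32
```

A fair coin produces a result this uneven roughly one time in three, so ten tosses cannot distinguish "biased" from "ordinary luck".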
When dealing with micro data sets, such as one referee's record officiating the same pair of teams over several years, statistical extrapolation doesn't work: the sample is too small to separate bias from chance. Examining how those data sets were produced, that is, the individual decisions themselves, is the only useful way forward, especially when judging a subjective performance (decisions) against an objective benchmark (the rules of the game).
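To show how much sample size matters, here is a minimal sketch that treats each outcome as a fair coin toss (a simplifying assumption for illustration, not a model of refereeing). The hypothetical function `prob_split_at_least` computes, exactly, how often a fair coin gives a split at least as lopsided as 70/30 in either direction. At 10 tosses that is 11/32, about a third of the time; by 100 tosses it has become very rare.

```python
from fractions import Fraction
from math import ceil, comb

def prob_split_at_least(n: int, share: Fraction) -> Fraction:
    """Chance that n tosses of a fair coin give either side at least `share`
    of the results (share = 7/10 means a 70/30 split or more lopsided)."""
    if share <= Fraction(1, 2):
        raise ValueError("share must be greater than 1/2")
    k = ceil(share * n)  # minimum count on the heavy side
    heads_heavy = sum(comb(n, i) for i in range(k, n + 1))
    # Tails-heavy outcomes are equally likely by symmetry, and the two sets
    # are disjoint because k > n / 2.
    return Fraction(2 * heads_heavy, 2**n)

for n in (10, 30, 100, 300):
    p = prob_split_at_least(n, Fraction(7, 10))
    print(f"{n:>4} tosses: {float(p):.6f}")
```

The same 70/30 split that means almost nothing over 10 events becomes strong evidence over a few hundred, which is the gap between a handful of matches and a data set large enough to support the inference.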