Gargantua's Hit-Handicap system ->New A&A Concept

Gargantua

So I’ve been having some fun.

With a mathematician/programmer friend of mine, I developed a Hit-Handicap parser program to tangibly measure luck/performance in Axis and Allies.

Here’s how it works: it reads the entire text log of a TripleA game. Battle by battle, it adds up all the attack and defense power and divides by 6, showing how many hits a player should have expected on average. It then compares that to the actual hits scored. It does this for both Axis and Allies.

For example, if I attacked you with 4 tanks, I should get 2 hits, and if you defended with 3 inf, you should get 1 hit.
Say I score 0 hits, my HH would be -2, and say you scored 3 hits, your HH would be +2.

In a perfect world, both players should hover around 0, scoring roughly as many hits as would be statistically normal. The deviation between the two players’ numbers then becomes the Hit-Handicap. A lot of dice get rolled, so the higher the differential, the more extreme the game has become.

The HH differential in the example above would be 4.
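A minimal sketch of the per-battle arithmetic described above. The function names are made up for illustration and are not taken from the actual parser; the unit values (tanks attacking at 3, infantry defending at 2) are the ones implied by the example.

```python
from fractions import Fraction

def expected_hits(powers):
    """Expected hits for one round of dice: total power divided by 6."""
    return Fraction(sum(powers), 6)

def hit_handicap(powers, actual_hits):
    """Actual hits minus expected hits; positive means luckier than average."""
    return actual_hits - expected_hits(powers)

# 4 tanks attacking at 3 each, scoring 0 hits.
attacker_hh = hit_handicap([3, 3, 3, 3], actual_hits=0)

# 3 infantry defending at 2 each, scoring 3 hits.
defender_hh = hit_handicap([2, 2, 2], actual_hits=3)

# The gap between the two players is the Hit-Handicap differential.
differential = defender_hh - attacker_hh
print(attacker_hh, defender_hh, differential)
```

Fractions are used so that a value like 1/3 expected hits is kept exact instead of being rounded on every battle.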

Now extrapolate that exact same equation over an entire game, and voila: you have a fairly precise hit handicap showing how lucky or unlucky you have been, and how many extra hits you or your enemy have scored over the opponent!

I’ve put this forward to the TripleA devs, and they are hopefully going to add it to the game in the next update! So people can finally stop bickering about luck and actually know, in numbers, how good or bad it is. Not this selective memory crap.

You can see the request here.
https://forums.triplea-game.org/topic/474/total-game-report-option-for-statistical-purpose/10

The system isn’t perfect: luck in some battles matters more than in others, and attacking 1 inf with 100 of your own units will record the luck, good or bad, of all 100 units. But it’s a good, measurable talking point for discussing the kind of luck handicaps out there, as opposed to just saying “I got diced” when maybe I didn’t, or maybe my opponent really has been just as unlucky as I have!

I know others have different preferences for how chance gets calculated. At the end of the day, both players’ dice records are subject to the same system, so the differential is all that matters. We’re not calculating dice, we’re calculating hits expected vs hits received.

My hope is that in the future people can use a system like this to mitigate luck when it gets really bad. Say your HH is 40 for the Allies; maybe the Axis get a reroll card to use at a later date, or a free unit or something. Who knows. Or maybe you can just take that pathetic “i got diced” excuse away from lots of people! :)

StuckTojo

@Gargantua:

Or maybe you can just take that pathetic “i got diced” excuse away from lots of people! :)

That’s only a pathetic excuse when someone uses it against me. When I use it it’s an objective, well-reasoned analysis of probability based on impartial observation and precision calculations.

Sir, I will thank you to acknowledge the difference. :-D

Gargantua

Well now we can PROVE if what you say is true! lol.

Speaking of PROOFs

I did all my calculations in rough, simplified A&A shorthand. Not “combining” dice. Just using Hits and Power to establish a score that has meaning.

I didn’t get into things like 2 dice actually having 36 possible results, and crap like that. That type of precision simply doesn’t matter, so long as both parties are being reviewed with the same tool of measurement!

CWO Marc

This discussion has actually made me realize something that I’d never caught before about one of the Alec Guinness lines in the original Star Wars film. You’ll recall that in Episode IV, when Han Solo dismisses Luke’s success at lightsaber training with the remark “I’d call it luck,” Obi-wan Kenobi retorts, “In my experience, there’s no such thing as luck.” What I’ve just realized is that there’s a scene in Episode I which puts a whole different spin on that retort. It’s the scene in which Watto tries to scam Qui-Gon Jinn with loaded dice, but fails because Qui-Gon counter-scams him by using the Force to make the dice roll turn out in his own favour. In other words: Jedi don’t believe in luck because, when the circumstances call for it (such as a crooked dice roll), they use the Force to cheat.

Gargantua

In other words… Obi-Wan was helping Luke with force powers to boost his confidence. Because Luke was so whiny… :)

variance

FINALLY I will get the respect I deserve for being one of the unluckiest people in the known universe. Others should know what I am dealing with here.

Gargantua

Precisely!

If only I could get it to work easily in F2F games lol. Sooner or later…

Young Grasshopper

@Gargantua:

Precisely!

If only I could get it to work easily in F2F games lol. Sooner or later…

Great work buddy, now crawl on your hands and knees to Imperious Leader and beg for that customizer badge (LOL)

variance

IL should beg him for the privilege of delivering the rightful badge of customizer.
(I am joking in case anyone with no sense of humour gets upset)

You know, this could be done in a face to face game too. You would just keep a running total of 4 things: the number of dice thrown by both sides, and the number of hits made by both sides. At the end of the game you take the number of dice thrown by each side and divide by 6, then subtract the number of actual hits made by each side throughout the game (the result may be positive or negative). Subtract these 2 numbers and you have Garg’s index. It would really not be too difficult. Maybe at each battle the attacker and defender roll the dice, and one of the other players is appointed to adjust the totals after each round of combat. Sounds like something you might do in a tournament or other high stakes situation.

Gargantua

That’s “close”, variance, but it won’t quite work. You’d need to know the attacking / defending values of the units to know the “expected value”.

variance

Oh right, so you would add up the attack value and hits made, and the defense value and hits made, in each round of combat. Divide the total attack and defense values by 6, and take the difference. Easy to do if using the battle board, especially in larger battles.
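For a face-to-face game, the running totals described here could be kept in something as small as the sketch below. The class and method names are hypothetical, and the per-round numbers are invented purely to show the bookkeeping.

```python
from fractions import Fraction

class HitTally:
    """Running totals for one side: combat power rolled and hits scored."""

    def __init__(self):
        self.power = 0
        self.hits = 0

    def record_round(self, power, hits):
        """Add one round of combat: the summed unit values and the hits made."""
        self.power += power
        self.hits += hits

    def handicap(self):
        """Hits scored minus hits expected (total power divided by 6)."""
        return self.hits - Fraction(self.power, 6)

axis, allies = HitTally(), HitTally()

# Illustrative rounds only; these numbers are not from a real game.
axis.record_round(power=12, hits=0)
allies.record_round(power=6, hits=3)
axis.record_round(power=9, hits=2)
allies.record_round(power=4, hits=1)

gap = allies.handicap() - axis.handicap()
print(axis.handicap(), allies.handicap(), gap)
```

Only two numbers per side have to be updated after each round, which is what makes this workable at the table.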

Witt

I like it Garg.

Tizkit

Gargantua,

This is definitely a great idea and would be a great tool.

I’m wondering about a slightly different metric though:
Sum[(Battle Score) - (Expected Battle Score)]

You mentioned the two main drawbacks of your calc:
a) Includes all overkill rolls
b) Values all rolls equally

These two factors could produce a false handicap. Consider a match where in the early game one player hits 3/3 on AA fire. (+2.5 Handicap) An unexpected TUV swing of 50+ is very possible from the value of the planes and the loss of their potential rolls. With top tier players that could be enough to slant the game.

If later in the game the player with the lucky AA shots mashes his stack of 50 units into a blocker a couple of times the impact of the extra 100 rolls will almost certainly overshadow the +2.5 Handicap from the AA fire. (and the impact of those rolls could easily throw the handicap into the negatives instead)

So now you can lose to an opponent who was lucky but still has a negative handicap inaccurately confirming their brilliance. :-P

Using the Battle Score differential would compensate somewhat for the above problems because it incorporates a value system to the impact of each roll and ignores overkills.

The actual battle score is easy since it comes with the turn summary. The Expected battle score is harder. I believe you need a simulator to get that reliably because of the path dependency of the battle. (i.e. the second round depends on how the first round went) Unless you know of a reliable way to get expected TUV swing without a simulation?

Perhaps the developers could get the system to run the battle calculator for X trials before each battle and spit out the expected TUV swing with the turn summary so you would get both an actual Battle Score and Expected Battle Score.

The principle would be the same as your calc, you find the Battle Score differential for both players for all combats and add them together to get an overall result. Plus the metric’s value is already in IPCs which we all understand. Measuring hits/kills is more subjective… kills of what?
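A rough sketch of how an expected Battle Score could be estimated by repeated trials, as suggested above. This is not TripleA's battle calculator. It assumes units are (power, cost) pairs, casualties are taken cheapest first, nobody retreats, and there are no special abilities. The fighter and destroyer values are assumptions as well.

```python
import random

def simulate_swing(attackers, defenders, rng):
    """Fight one battle to the end; return defender TUV lost minus attacker TUV lost.

    Assumes at least one unit in the battle can hit, otherwise it never ends.
    """
    att = sorted(attackers, key=lambda unit: unit[1])  # cheapest casualties first
    dfn = sorted(defenders, key=lambda unit: unit[1])
    att_lost = def_lost = 0
    while att and dfn:
        # Both sides roll before any casualties are removed.
        att_hits = sum(rng.randint(1, 6) <= power for power, _ in att)
        def_hits = sum(rng.randint(1, 6) <= power for power, _ in dfn)
        for _ in range(min(def_hits, len(att))):
            att_lost += att.pop(0)[1]
        for _ in range(min(att_hits, len(dfn))):
            def_lost += dfn.pop(0)[1]
    return def_lost - att_lost

def expected_swing(attackers, defenders, trials=2000, seed=0):
    """Average swing over many simulated battles (a Monte Carlo estimate)."""
    rng = random.Random(seed)
    total = sum(simulate_swing(attackers, defenders, rng) for _ in range(trials))
    return total / trials

FIGHTER = (3, 10)    # assumed: attack 3, cost 10
DESTROYER = (2, 8)   # assumed: defense 2, cost 8

expected = expected_swing([FIGHTER] * 5, [DESTROYER])
actual = -2  # illustrative battle score: one fighter lost, one destroyer killed
differential = actual - expected
print(round(expected, 2), round(differential, 2))
```

Summing `differential` over every battle in a game would give the overall figure. The fixed seed only makes the estimate repeatable; a different seed gives a slightly different expected value.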

Have you tested your method at all for false handicap results?

variance

Great points, Tizkit.

To fix the overkill problem, what if you limit the Actual Hits in each round of combat to the number needed to reduce the enemy’s force to 0 units?

For example, suppose you attack 1 destroyer with 10 fighters and let’s say you roll 3 hits. 1 hit would be sufficient to kill the destroyer so you only count that as 1 actual hit instead of 3. Drop the remainder.
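The capping rule in this example is a one-liner. The sketch below (hypothetical function name, fighters assumed to attack at 3) also shows what happens if only the actual hits are capped and the expected hits are left alone.

```python
def capped_hits(rolled_hits, enemy_units_left):
    """Count only the hits that could actually remove an enemy unit."""
    return min(rolled_hits, enemy_units_left)

# 10 fighters roll 3 hits against 1 destroyer: only 1 hit is counted.
counted = capped_hits(rolled_hits=3, enemy_units_left=1)

# Expected hits are NOT capped here: 10 fighters at attack 3 expect 30 / 6 hits.
uncapped_expected = 10 * 3 / 6
handicap = counted - uncapped_expected
print(counted, handicap)
```

With the cap applied to one side of the subtraction only, an overkill attack comes out as unlucky even when it succeeds, so the expected figure would presumably need the same cap for the comparison to stay fair.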

Gargantua

After consideration I’ve decided that Overkill actually isn’t a problem, and I’ll explain why in a second. But before I get to that I will respond to the TUV metric.

The TUV metric is what I’ll push for next. We are discussing it in the league general discussion; apparently it’s been playtested with great results. Basically, take expected TUV vs actual TUV received. But this has its own problems. Some battles can go wild. Running a battle calc can give you -20 to +30 on some larger battles, and vary between those numbers each time you click the button. I’ve seen lots of 50/50 battles like this. So it’s not without error, but I do like it overall.

My vision for this would be: you start the turn, prep all your combats, and select DONE on the combat move. At that point it calculates all your expected combat results; then you roll dice. TripleA can then compare the before and after TUVs for a final cumulative output.

But here’s the thing: there are lots of kinds of luck in Axis and Allies, and having a low die roll is a luck that is independent of what type of unit it hits. The two should not be confused.

Ultimately, this is what I am trying to prove: at any given point, when you roll dice (1 die at a time), how you are performing compared to expected value. So you can wholly establish whether it’s entirely been just a dicing or not, regardless of what units got hit in what battles, or where, or on what turn (early hits are better than later ones). The dice don’t care if it’s a battleship or a bomber.

A whiner’s argument is that every time they roll a die, their results are off the expected average. What we want to prove is whether that’s true or not, so we are isolating and compartmentalizing chance specifically down to each die, each time it’s rolled, one at a time, compared to its expected value.

This is why OVERKILL isn’t a problem. We calculate each die. If you’re getting diced, it will show up equally in overkill as in normal battles.

Also, in my game against Farmboy right now, for example, I HAVE to send overkill in order to secure victory; it’s not a choice. The dice have been that bad. Sometimes it takes 3 turns of 3 bombers to get ONE hit. And if I sent 100 bombers and got 10 hits, in one battle or over several battles, those poor results should be reflected. But if I’ve lost some small battles but done average in the large ones, then I want to know that in the big picture I haven’t really been diced.

BACK to TUV calculations: other than the previously mentioned solution, I have another metric in mind that may help a lot, or in a different way. I’m pushing for a cumulative casualty reporting system. So we can know how much plastic we have killed each game, of what type. :) This will help show TUV scores as the game progresses.

Once we start down this road of statistical reporting, we will start getting more reports. Like what’s Russia’s kill ratio against Germany, or USA vs Japan, and that kind of thing. We’re going to get a TON of information mined out of the game, and things will continue to evolve from there.

Please understand for now, that the goal is just to prove whether the dice have been cruel or not; independent of when or where. Adding layers after or dissecting that information with different tools for different perspectives is stage #2.

Sorry for wall of text!

CWO Marc

One thing to keep in mind is that there’s a difference between the individual result of a single individual dice roll and the cumulated results of all the dice rolls that a player rolls during an entire game, in the same way that there’s a difference between weather (which is what you get on an individual day) and climate (which is the overall pattern of temperature, rainfall, etc. for a given region over the course of several years).

The cumulated dice rolls in an A&A game are like climate. In principle, they should more or less follow the normal statistical distribution that applies to the number of dice being rolled…and the more often you roll the dice, the more the results should match that distribution. Casino house games are built around this fundamental statistical principle, and this explains why the casino’s blackjack card dealers (for example) are encouraged to keep the game moving as fast as possible in order to play as many rounds as possible during their shift: because the more games are played, the more the results will fit the statistical distribution around which the payouts (which are designed to earn a profit for the house) are calculated.

An individual dice roll, by contrast, is like weather. By its very nature, it’s more prone to variability than a whole bunch of dice rolls taken together…and that’s where an important distinction comes in. If you roll two dice, and you get either two 1s or two 6s, it’s perfectly valid to say that the result doesn’t fit the predicted distribution, given that the highest probability involving two dice is a result that adds up to 7…just as it’s perfectly valid to say that the -20C daily high temperatures that prevailed in southern Ontario and Quebec in the week between Christmas and New Year’s Day did not fit the normal seasonal average of -7C or so. The issue of whether a player is getting “bad dice” in general, however, can only be judged by the cumulative results that he gets over the course of an entire game, not by an individual dice roll (just as the weather of a single day or a single week can’t be used to draw conclusions about whether the climate is changing).
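One way to put numbers on the weather-versus-climate distinction is to treat each die as an independent hit-or-miss trial and look at the standard deviation of the hit count. The sketch below does that. The function names are hypothetical, and the dice counts (6 and 600 dice at power 3) are picked only for illustration.

```python
import math

def hit_spread(powers):
    """Standard deviation of the hit count, treating each die as an
    independent trial that hits with probability power / 6."""
    return math.sqrt(sum((p / 6) * (1 - p / 6) for p in powers))

def z_score(powers, actual_hits):
    """How many standard deviations the actual hits sit from the expected hits.

    Undefined (division by zero) if no die has any uncertainty in it.
    """
    expected = sum(powers) / 6
    return (actual_hits - expected) / hit_spread(powers)

# A handful of dice versus a game's worth, all rolling at power 3.
small = hit_spread([3] * 6)     # expected hits: 3
large = hit_spread([3] * 600)   # expected hits: 300

# Spread relative to the expected hits shrinks as the dice count grows.
print(small / 3, large / 300)

# 4 dice at power 3 scoring 0 hits, measured in standard deviations.
print(z_score([3, 3, 3, 3], actual_hits=0))
```

The absolute spread grows with more dice, but the spread relative to the expected total shrinks, which is the sense in which a long game behaves like climate.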

What I’m wondering about Garg’s proposed system, which is certainly an interesting concept, is whether it’s a system that has no effect in the early rounds of the game (at which point the system is simply collecting data, and at which point it can’t draw any conclusions because it’s only got a small statistical sample to work with), and then – once it’s dealing with enough rolls to see whether a player is indeed falling outside the normal distribution overall – which gradually has more and more of a compensating effect on those players who are indeed getting bad dice. (I can’t really tell from the posts in the thread, but it may just be because I’ve only had time to read them quickly.) This also raises a potential point to think about: if the system aims to compensate for players who get excessively bad dice by making their results better fit the normal distribution…shouldn’t it do the same thing to players who get excessively good dice? Nobody ever complains when they themselves get great dice, but I can understand why their opponents might complain about it.

Gargantua

Let’s break this down

LOW SAMPLING NEED NOT APPLY

What I’m wondering about Garg’s proposed system, which is certainly an interesting concept, is whether it’s a system that has no effect in the early rounds of the game (at which point the system is simply collecting data, and at which point it can’t draw any conclusions because it’s only got a small statistical sample to work with), and then – once it’s dealing with enough rolls to see whether a player is indeed falling outside the normal distribution overall – which gradually has more and more of a compensating effect on those players who are indeed getting bad dice. (I can’t really tell from the posts in the thread, but it may just be because I’ve only had time to read them quickly.) This also raises a potential point to think about: if the system aims to compensate for players who get excessively bad dice by making their results better fit the normal distribution…shouldn’t it do the same thing to players who get excessively good dice? Nobody ever complains when they themselves get great dice, but I can understand why their opponents might complain about it.

It does work on low samples. Let’s look at a firm example. I had an EPIC G1 against Mallery29 in late 2017. In just G1 alone, Germany was 9 hits over expected value, and the Allies were 6 hits under expected value. A Hit-Handicap of 15 hits. Massacre. Sure, in theory smaller sampling won’t generally show greater results, but you don’t usually know you’re getting diced in a game until a few turns in, and you have to pick a “compartmentalization point”. Are you just looking at one battle? One turn? Or the whole game? I can’t say for sure how many dice are rolled over G1 on average, or over turn 1, but G1 is probably at least 60 dice rolled? The sampling is instantly enough to start a baseline, which trends from there.

As for compensation… that’s basically house rules people can figure out to their own standard. Good or bad; and if they so want!

Something else to consider: the way the devs have started coding this at TripleA, it’s recording dice stats nation by nation. Maybe Germany’s hot and Japan folds. What then? :)

Like Weather vs Climate

The cumulated dice rolls in an A&A game are like climate. In principle, they should more or less follow the normal statistical distribution that applies to the number of dice being rolled…and the more often you roll the dice, the more the results should match that distribution. Casino house games are built around this fundamental statistical principle, and this explains why the casino’s blackjack card dealers (for example) are encouraged to keep the game moving as fast as possible in order to play as many rounds as possible during their shift: because the more games are played, the more the results will fit the statistical distribution around which the payouts (which are designed to earn a profit for the house) are calculated.

This is exactly why all die rolls need to be weighed equally, against their expected results per die. Arizona is dry; it shouldn’t rain there often. But if it rains every day there for a year, WTF? Something is off, and you can quantify it by recording each day and comparing it, to see what kind of dice climate you dealt with.

Gargantua

To illustrate the point on “overkill” attacks: they should be recorded.

An example from just this turn against Farmboy.
https://www.axisandallies.org/forums/index.php?topic=41042.195

Combat - British
Battle in 98 Sea Zone
British attack with 5 fighters
Italians defend with 1 destroyer
British roll dice for 5 fighters in 98 Sea Zone, round 2 : 1/5 hits, 2.50 expected hits
Italians roll dice for 1 destroyer in 98 Sea Zone, round 2 : 1/1 hits, 0.33 expected hits
1 destroyer owned by the Italians and 1 fighter owned by the British lost in 98 Sea Zone
British win with 4 fighters remaining. Battle score for attacker is -2
Casualties for British: 1 fighter
Casualties for Italians: 1 destroyer
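Since the log lines above already carry both the actual and the expected hits, a per-player handicap can be pulled out with a small parser. This is only a sketch, not the parser described at the top of the thread: the regular expression is built from the two "roll dice" lines in this excerpt, and real TripleA logs may contain variations it does not handle.

```python
import re
from collections import defaultdict

# Pattern assumed from the excerpt: "<player> roll dice for ... : H/D hits, E expected hits"
ROLL_LINE = re.compile(
    r"^(?P<player>.+?) roll dice for .+ : "
    r"(?P<hits>\d+)/(?P<dice>\d+) hits, "
    r"(?P<expected>\d+(?:\.\d+)?) expected hits$"
)

def handicap_by_player(log_lines):
    """Sum (actual hits - expected hits) per player over all matching lines."""
    totals = defaultdict(float)
    for line in log_lines:
        match = ROLL_LINE.match(line.strip())
        if match:  # lines that are not dice rolls are skipped
            totals[match["player"]] += int(match["hits"]) - float(match["expected"])
    return dict(totals)

sample = [
    "British roll dice for 5 fighters in 98 Sea Zone, round 2 : 1/5 hits, 2.50 expected hits",
    "Italians roll dice for 1 destroyer in 98 Sea Zone, round 2 : 1/1 hits, 0.33 expected hits",
    "Casualties for British: 1 fighter",
]
result = handicap_by_player(sample)
print(result)
```

Because the log rounds expected hits to two decimals (0.33 rather than 1/3), totals built this way will drift slightly over a long game; recomputing the expectation from unit values would avoid that.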

I attacked 4 destroyers that round, and in every battle the Allies had 3 or more units attacking. “Overkill” as you would say. I scored 1 hit per battle; his destroyers hit in 3 of the 4 defenses. Just destroyers. All his ground units also hit at least once on defense as well.

On average I need to roll about 4 or 5 dice to get a hit on his destroyers, and his destroyers are rolling as if they are about a 5 defense unit. There’s nothing “overkill” about it.

How have the dice been treating you? Now you know :)

Gargantua

What we are effectively demonstrating should be called “Underkill” lol.

Omega1759

Regarding the “overkill”, the calculator can simply compare the outcome (# of hits or TUV damage) to the expected result (again, # of hits or TUV damage).

Regarding the TUV logic, we would need to figure out how to factor in retreats. Once a retreat is called, do you reduce the expected TUV to the number of rounds that were rolled?

Expected TUV appears to be the way to go. We could also add the value of territories / NOs gained to that equation.

Once all this is set up, maybe we can train an AI to read game scripts and play the game. :-D

Then Skynet is born and we end up playing tabletop! :-D
