(continued)
So again. Strategic bombing. Now think about 1942 Online - the game. The meta. The actions of the players. How does it manifest?
The Russia player opens with a triple attack, a double attack (say West Russia / Ukraine), or a single attack (say West Russia). The meta being what it is, I’d say a triple attack is too risky, and MOST players will do a double attack. (Agree, disagree, it doesn’t matter - this is for illustrative purposes; the important concept is that the R1 meta stabilizes around relatively few lines of play. Which it does. Especially without a preplaced bid. Sorry, just my little jab there.)
That’s an important point. If players literally had an infinite number of valid choices, that would change their actions - which would change the particulars of the dice calls - which would mean lines diverge. But players do NOT have infinite valid choices. Just a few. Their behavior “clusters” - which is different from random number clustering, but worth noting.
So let’s say that in a particular game, the R1 West Russia attack rolls 13 dice for the attacker and 6 dice for the defender (or whatever), then 11 dice for the attacker and 3 dice for the defender (or whatever). Let’s say the R1 attack resolves after 45 dice are rolled.
Then let’s say that in ANOTHER game, an identical R1 attack resolves after 41 dice are rolled, and in another game after 47 dice - whatever.
Then let’s say that in yet ANOTHER game, a slightly different R1 attack on West Russia resolves after 43 dice, and so on.
What I’m getting at is that even across all these different games, the number of dice rolled converges - and even though the PRNG supplies “randomness”, the game mechanics themselves lead to convergence. If the PRNG DOES generate something like X hits out of Y attempts for infantry SOMEWHAT “fairly”, then with a great amount of data the results converge - just as a large sample of binomial trials converges on the binomial distribution.
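To make the convergence point concrete, here is a small simulation sketch. It uses Python’s built-in `random` module (Mersenne Twister) purely as a stand-in, NOT the game’s xorshift128+, and assumes a unit that hits on a roll of 1 (p = 1/6, e.g. attacking infantry). As the number of dice grows, the observed hit fraction settles toward 1/6.

```python
import random


def hit_fraction(n_dice: int, hit_on: int = 1, seed: int = 42) -> float:
    """Roll n_dice six-sided dice; return the fraction showing hit_on or less.

    hit_on=1 models a unit that hits on a 1 (p = 1/6).
    """
    rng = random.Random(seed)  # Mersenne Twister stand-in, NOT the game's PRNG
    hits = sum(1 for _ in range(n_dice) if rng.randint(1, 6) <= hit_on)
    return hits / n_dice


if __name__ == "__main__":
    expected = 1 / 6
    for n in (10, 100, 10_000, 100_000):
        observed = hit_fraction(n)
        print(f"{n:>9,} dice: hit fraction {observed:.4f} (expected {expected:.4f})")
```

Small samples wander; large samples hug the expected value, which is exactly why a big enough in-game dataset is informative.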
And yes, probability is what it is, so there’s no “certainty” - but as I wrote many, many months ago, though you can’t definitively declare “our tests show there is no issue” (remember that?), you CAN say things like “our tests with this dataset indicate there’s a 99.999999999% chance of there not being an issue”. And please understand, I literally mean that last number can be COMPUTED AND MATHEMATICALLY DEMONSTRATED - unlike other “references” I’ve seen from developers stating there’s a “99.999% whatever”, which I’m quite sure were NOT actually calculated.
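One concrete number of this kind is a tail probability (a p-value): the chance, assuming fair dice, of seeing a result at least as extreme as the one observed. A minimal sketch using exact rational arithmetic (`fractions.Fraction`) so no floating-point rounding creeps in; the hit probability and counts in the usage lines are hypothetical examples.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(i: int, n: int, p: Fraction) -> Fraction:
    """Exact P(X = i) for X ~ Binomial(n, p)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def binomial_upper_tail(k: int, n: int, p: Fraction) -> Fraction:
    """Exact P(X >= k): chance of k or more successes in n trials."""
    return sum((binomial_pmf(i, n, p) for i in range(k, n + 1)), Fraction(0))


def binomial_lower_tail(k: int, n: int, p: Fraction) -> Fraction:
    """Exact P(X <= k): chance of k or fewer successes in n trials."""
    return sum((binomial_pmf(i, n, p) for i in range(0, k + 1)), Fraction(0))


if __name__ == "__main__":
    p_hit = Fraction(1, 6)  # example: a die that hits on a 1
    # Hypothetical observation: 60 such dice produced at most 1 hit.
    tail = binomial_lower_tail(1, 60, p_hit)
    print(f"P(X <= 1) = {float(tail):.3e}")
```

Because every step is an exact fraction, the final figure is genuinely computed rather than estimated, which is the distinction being drawn above.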
Right. So we know human player actions cluster, we know the PRNG clusters, we know the PRNG generates a limited range of number sequences, and we know mathematically this MUST result in bias. The question, though, is again not whether there is bias, but whether there is bias in EXPRESSION.
. . .and the answer is chained to the IMPLEMENTATION of the PRNG.
Suppose you say the PRNG grabs a new time seed every time. You may recall the developers releasing a statement that they had tested some billion or so PRNG outputs. And . . . was that drawn from in-game data? (Maybe not.) Contextual data? (Almost CERTAINLY not.) What was the period of the data? That is, say the first hundred numbers from each of ten million separately generated PRNG sequences were taken for analysis. If the game doesn’t actually call through that entire first hundred numbers, that’s quite useless for testing the validity of the PRNG in actual application. And if the game doesn’t jump to a random point in that hundred-number sequence, you’re going to get clustering. And if the game DID jump to a “random” point, then of course that “random” point is itself determined by a PRNG, so you’re back to the clustering-of-clustering problem again.
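One way to test the numbers the game actually consumes, rather than an arbitrary block of outputs, is to look at each draw position separately across many seeds. The sketch below uses `random.Random` as a stand-in generator and treats consecutive integers as seeds (an assumption, loosely mimicking clock-based seeding). For a fair d6 each position’s chi-square statistic has 5 degrees of freedom, so values much above about 15.09 (the 1% critical value) would deserve a closer look.

```python
import random


def per_position_chi_square(n_seeds: int, positions: int, first_seed: int = 0) -> list[float]:
    """Chi-square statistic of d6 faces at each draw position across many seeds.

    For each seed, draw the first `positions` rolls. For each position j,
    tally the faces seen at j over all seeds and compare with the uniform
    expectation n_seeds / 6 (5 degrees of freedom per position).
    """
    counts = [[0] * 6 for _ in range(positions)]
    for seed in range(first_seed, first_seed + n_seeds):
        rng = random.Random(seed)  # stand-in generator; consecutive seeds mimic a clock
        for j in range(positions):
            counts[j][rng.randint(1, 6) - 1] += 1
    expected = n_seeds / 6
    return [sum((c - expected) ** 2 / expected for c in row) for row in counts]


if __name__ == "__main__":
    for j, stat in enumerate(per_position_chi_square(n_seeds=6_000, positions=10)):
        flag = "  <-- look closer" if stat > 15.09 else ""
        print(f"draw #{j + 1:>2}: chi-square = {stat:6.2f}{flag}")
```

Swapping in the real generator and the real seeding scheme is what would make such a test meaningful for the game itself.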
So back to the expression. You remember I said we can EXPECT to see PRNG output clustering (and if we can’t, again, that in itself is a problem). And I said human behavior means the inputs to the PRNG are also clustered. So if a player IS strategic bombing, they’re probably pursuing much the same strategy time and again. Make sense? So depending on exactly how the PRNG’s output manifests, it is entirely possible that the player will be punished for, say, strategic bombing, time and again.
You could point to second-order analysis, you could say all the numbers come out in the wash - but they DON’T, not really. It IS possible for a player’s strategic bombers to get shot down again and again and again (given the implementation described, you should be able to see that IS a distinct possibility because of clustering of clusters), and then of course they’re going to think there’s an issue. Because there IS ACTUALLY AN ISSUE, STATISTICALLY - even if second-order analysis doesn’t detect it at all. And I’ll say AGAIN - it depends on the particular player and how they play, because their clustered behavior changes the calls to the PRNG, which is itself clustered. So a statistically legitimate issue reported by one player cannot be replicated by players who don’t emulate exactly the same behavior. (See, again? You must have data export.)
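Here is a sketch of how a pooled test can miss a per-player problem. It assumes each strategic bomber is shot down on an AA roll of 1 (p = 1/6), and the records in the usage example are invented purely for illustration. One player’s losses can be wildly improbable while everyone lumped together looks perfectly ordinary.

```python
from fractions import Fraction
from math import comb

P_SHOT_DOWN = Fraction(1, 6)  # assumption: a bomber is lost on an AA roll of 1


def upper_tail(k: int, n: int, p: Fraction = P_SHOT_DOWN) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly, returned as a float."""
    exact = sum(
        (comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1)),
        Fraction(0),
    )
    return float(exact)


def per_player_report(records: dict[str, tuple[int, int]]) -> dict[str, float]:
    """records maps player -> (bombers_sent, bombers_shot_down).

    Returns each player's chance of at least that many losses under fair
    dice, plus a "POOLED" entry that lumps every player together.
    """
    report = {name: upper_tail(lost, sent) for name, (sent, lost) in records.items()}
    total_sent = sum(sent for sent, _ in records.values())
    total_lost = sum(lost for _, lost in records.values())
    report["POOLED"] = upper_tail(total_lost, total_sent)
    return report


if __name__ == "__main__":
    # Invented numbers, purely to illustrate the shape of the problem.
    hypothetical = {"player_a": (60, 22), "player_b": (600, 95), "player_c": (600, 92)}
    for name, p_value in per_player_report(hypothetical).items():
        print(f"{name:>9}: P(at least this many losses) = {p_value:.2e}")
```

The pooled figure is unremarkable, while `player_a` alone would be flagged; that per-player view is only possible with exported, player-level data.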
So, returning to the public address I mentioned. Rather than saying “we looked at customer complaints and switched out our PRNG (tee hee!)”, I’d set up a REAL battery of tests using the ACTUAL data from complaining players - and if that wasn’t in the budget, I’d just not say anything, at least not without written records that I had protested and written orders TO say what was said. Because again - blanket denials DO work in SOME situations, but when there are legitimate complaints that remain unanswered for months, something else needs to be done.
==
Look. I get it if Beamdog doesn’t want to put out tools. That’s a lotta work. Expensive. But if you’re seriously going to go with “xorshift128+ did nothing wrong”, you’re going to have to draw data from IN-GAME. There are just way too many holes in the methodology if you don’t. (And I’ll note - if you do draw data, best have separate tests for platinum, gold, silver, bronze, and wood. I won’t elaborate here unless asked.)
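The reasons for splitting by tier aren’t spelled out here, but the mechanics of a per-tier test are simple. A sketch, assuming in-game logs could be exported as hypothetical `(tier, dice_rolled, hits)` records for dice that all share one hit probability `p`: it reports a z-score per tier, so a tier sitting several standard deviations from expectation stands out instead of being averaged away.

```python
from collections import defaultdict
from math import sqrt


def z_scores_by_tier(rolls, p: float = 1 / 6) -> dict[str, float]:
    """rolls: iterable of (tier, dice_rolled, hits) records (hypothetical export format).

    All records are assumed to be dice sharing the same hit probability p.
    Returns tier -> z-score of the tier's total hits against Binomial(n, p).
    """
    dice: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for tier, n, h in rolls:
        dice[tier] += n
        hits[tier] += h
    scores = {}
    for tier, n in dice.items():
        mean = n * p
        sd = sqrt(n * p * (1 - p))
        scores[tier] = (hits[tier] - mean) / sd
    return scores


if __name__ == "__main__":
    # Invented records, only to show the output format.
    example = [("gold", 600, 101), ("silver", 600, 130), ("wood", 300, 48)]
    for tier, z in z_scores_by_tier(example).items():
        print(f"{tier:>8}: z = {z:+.2f}")
```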
If you want to put out a statement addressing those who DO want to look into it, you’ll want to say things like how many time seeds the PRNG uses and how the time seed is generated (whether it’s on the second, whether each battle uses a different time seed, or when exactly the time seed argument is passed to the PRNG). The obvious objection is that publishing that information could give hackers more to work with - but for heaven’s sake, anyone reverse engineering the game can work it all out anyway. Plus, it’s already been released that xorshift128+ is the core. If people are going to hack, a wee bit of general information that they probably already have (and more besides) shouldn’t make a difference.
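To show why the seeding details matter, here is a sketch of xorshift128+ following Sebastiano Vigna’s published reference version (shift constants 23, 18, 5), with the seed expanded through SplitMix64, a common choice for this generator family. How the game actually seeds it is not public; seeding from whole Unix seconds is a hypothetical used only for illustration. Under that assumption every battle seeded in the same second would get identical dice, and there would be only 86,400 possible sequences per day.

```python
MASK64 = (1 << 64) - 1


def splitmix64(state: int) -> tuple[int, int]:
    """One SplitMix64 step: returns (next_state, output)."""
    state = (state + 0x9E3779B97F4A7C15) & MASK64
    z = state
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return state, z ^ (z >> 31)


class XorShift128Plus:
    """xorshift128+ after Vigna's reference C code (shift constants 23, 18, 5)."""

    def __init__(self, seed: int) -> None:
        # Expand one seed into the two 64-bit state words via SplitMix64.
        state, self.s0 = splitmix64(seed & MASK64)
        _, self.s1 = splitmix64(state)

    def next_u64(self) -> int:
        s1, s0 = self.s0, self.s1
        result = (s0 + s1) & MASK64
        self.s0 = s0
        s1 ^= (s1 << 23) & MASK64  # mask emulates 64-bit overflow in C
        self.s1 = s1 ^ s0 ^ (s1 >> 18) ^ (s0 >> 5)
        return result

    def d6(self) -> int:
        # Modulo reduction is very slightly biased (2**64 is not a multiple
        # of 6); that is negligible for this illustration.
        return self.next_u64() % 6 + 1


def battle_rolls(seed: int, n: int) -> list[int]:
    """The first n d6 results a freshly seeded generator would produce."""
    rng = XorShift128Plus(seed)
    return [rng.d6() for _ in range(n)]


if __name__ == "__main__":
    t = 1_700_000_000  # hypothetical Unix time, truncated to whole seconds
    print(battle_rolls(t, 10) == battle_rolls(t, 10))  # True: same second, same dice
    print(battle_rolls(t, 10), battle_rolls(t + 1, 10))
```

Publishing which of these seeding choices the game makes (per second, per battle, or otherwise) is the kind of detail that would let outsiders check for this sort of clustering.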
And for heaven’s sake, in most Western cultures, replies like “have you even read the link?” and “why did you comment that way?” are NOT read as sincere at all. They’re read as combative, passive-aggressive rhetorical questions along the lines of “you haven’t read the link” and “you shouldn’t have commented that way (if you’d even bothered to read, which you didn’t).”
I mean, I should know. You may recall my saying “some reading needed to be done” or similar on the Steam forums and being warned for being “offensive”. So if I’m “offensive” as a person who isn’t representing an entity, and I have documentation to back up that the reading wasn’t done - then what of someone who IS representing an entity combatively asking whether someone even read something, when on looking into it, the reading probably WAS done?
You can see how that’s much worse, yes?
Look, I know, I’m disconcertingly up-front. But I want to be clear I’m not trying to get you in trouble. If I had that in mind, I’d just grab a load of screenshots and contextual documentation, add a summary and commentary, and send it all over the place. The way I figure it, you’ve got a tough job, and even if you don’t have all the mathematical or contextual background, you HAVE at least been around for months - and if you were replaced, who’s to say the replacement would be any better? So if I can help by answering questions that I think were left unanswered, okay, I don’t have to make a big thing of it, you know? I do what I can, you do what you can, let’s all get along, etc.
But I can’t do anything about people not getting the legitimate mathematics and argument behind complaints - nor can I do anything about combative responses like “have you even read the link?” Nor can I really pass off blanket denials and evasions over the course of months as a good answer.
And think: am I just pretending to be fair-minded? When I say the developers don’t do certain things, you’ll notice I don’t say things like “ach, the developers are hopeless idjits”. I always say things like “developers have to set priorities given their budget”. True. Am I endlessly pestering for transparency and dates? Months ago, when it was said a developer response on dice was upcoming, I COULD have cited your and Cody’s later responses and made a big point of how the developers were up to their usual denial games (you remember a response (not yours) that said something like “PRNGs can’t be distinguished from random numbers by humans”). Goodness knows I COULD have ripped into that, I WANTED to - and you know that’s EXACTLY the sort of thing I CAN and HAVE jumped all over - but I didn’t. I just shut my mouth and waited for the developer response. I reasoned I didn’t need to feed any controversy: if the developer response was good (and I bet it wouldn’t be, but I didn’t SAY that at the time), then great. If the developer response was NOT good, then I could take it up at that point. Is that the behavior of someone with an axe to grind, or of someone who DOES want to give the developers a fair opportunity to respond (and even MORE than fair, I’d say, really)?
I could have played it very hard, but I didn’t. Not at all. So when I say let’s try to work it out, is that just me playing games, or is that how it is?
But then, if I am trying to work things out - if I say that things could be improved in the public relations department, if I say there is a legitimate case which isn’t understood - perhaps instead of dismissing what I’m saying out of hand, you might try to believe I’m NOT actually trying to rake you over the coals - maybe I REALLY DO believe what I’m writing. (Which I do.)
And even if some people are telling you there isn’t an issue, I’ll respond: do those people have a heavy mathematics background? Twenty-plus years of experience with Axis and Allies? What about training in psychology? Marketing? Programming? Understand that my responses typically combine all these disciplines. That’s perhaps why I see issues where others don’t - not because there ARE no issues, but because I just have more experience across different disciplines.
For example? You remember I said months ago that the PRNG issue was a public relations issue. That didn’t mean it could be answered with simply “public relations” - as with any practical question, it meant the practical aspect had to be addressed. If a local politician has a “public relations” issue with failing to obtain a federal grant allocation, they can’t just keep responding with a blanket “it was looked into, it just didn’t work out”. The REAL issues need to be looked at - WHY did federal grants go to the surrounding counties but NOT the local one? Why couldn’t this be resolved? Is it because the county in question didn’t have the same level of infrastructure as other counties? Or did other counties call in political favors? What was the real reason? And even if the REAL reason can’t be mentioned for political reasons, at LEAST a reason that SOUNDS real needs to come out. That’s public relations.
So what I said was - look. How do people parse information? I say they’re not looking at individual dice results; they’re looking at the outcomes of groups of dice. Is that reasonable? Yes. It is. Then I say testing needs to be done on that basis (which, as far as I know, it totally wasn’t). Is THAT reasonable? Yes.
Then I talk about HOW you PRESENT that information. I know binomial distributions create a curve, and I know actual data differs from calculated projections - even if we’re not talking about real live situations with unknown variables, you ARE going to get deviation between the two. So then I said - what? Instead of citing arcane things like Dickey-Fuller (which I don’t think really answers the question anyway) or “internal testing”, I said you present the information VISUALLY - which means people WITHOUT mathematical backgrounds can look at it. And yes, you could possibly create a false narrative with the data. I never said you couldn’t.
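Even without a plotting library, observed versus expected can be shown side by side. A minimal text-only sketch; the observed counts in the usage lines are simulated with `random.Random` (a stand-in, not game data), and the expected counts come from the Binomial(10, 1/6) formula for 1,000 rounds of 10 dice.

```python
import random
from math import comb


def text_histogram(observed: list[int], expected: list[float], width: int = 40) -> str:
    """Paired text bars per hit count: '#' = observed rounds, '.' = expected rounds."""
    scale = width / max(max(observed), max(expected))  # assumes some nonzero count
    lines = []
    for k, (o, e) in enumerate(zip(observed, expected)):
        lines.append(f"{k:>2} hits | obs {'#' * round(o * scale):<{width}} {o:>6}")
        lines.append(f"        | exp {'.' * round(e * scale):<{width}} {e:>6.1f}")
    return "\n".join(lines)


if __name__ == "__main__":
    dice, rounds, p = 10, 1_000, 1 / 6
    rng = random.Random(3)  # stand-in generator; simulated rounds, not game data
    observed = [0] * (dice + 1)
    for _ in range(rounds):
        observed[sum(rng.randint(1, 6) == 1 for _ in range(dice))] += 1
    expected = [rounds * comb(dice, k) * p**k * (1 - p) ** (dice - k) for k in range(dice + 1)]
    print(text_histogram(observed, expected))
```

The same pairing of bars works in any charting tool; the point is that a non-mathematician can see at a glance whether the two shapes match.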
And of course - mathematically, and you would need mathematics to know this - you can’t determine the validity or invalidity of a dataset to a given degree of precision if the dataset you’re testing isn’t large enough. So you incorporate that as well.
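How large is large enough? A standard normal-approximation estimate for pinning a hit rate p to within plus or minus E at a given two-sided confidence is n = z^2 * p * (1 - p) / E^2, where z is the matching standard normal quantile. A sketch using `statistics.NormalDist` from the Python standard library; the 1/6 hit rate and the margins are just examples.

```python
from math import ceil
from statistics import NormalDist


def dice_needed(p: float, margin: float, confidence: float = 0.95) -> int:
    """Dice needed so the observed hit rate lands within +/- margin of p
    at the given two-sided confidence (normal approximation to the binomial)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil(z * z * p * (1 - p) / margin**2)


if __name__ == "__main__":
    for margin in (0.01, 0.001):
        print(f"+/-{margin}: about {dice_needed(1 / 6, margin):,} dice at 95% confidence")
```

Shrinking the margin tenfold multiplies the required dice a hundredfold, and demanding more nines of confidence pushes z (and n) up further, which is why small internal samples can’t support strong claims either way.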
But returning to the psychology aspect - instead of arcane non-answers that don’t work out mathematically for those who do understand the mathematics, and look like evasions to those who don’t, you have VISUAL representation. If the datasets indicate there is no problem (which might NOT be the case - but again, if you have the datasets and analysis protocols, that’s a lot of the work needed for an actual fix right there), that VISUAL representation convinces a lot of people right there. And for those who DO dig deeper, the onus of producing alternate models is on them - because you CAN say (provided you DID have data export, analysis tools, and visual representation) that the developers did everything that could reasonably be expected.
As to programming experience - well, I described how PRNGs work in concept; isn’t that true? Mathematics - clustering; isn’t THAT true? And twenty-plus years of Axis and Allies experience - well, when I say there’s no substitute for live defender decisions: I already outlined months ago how you CAN create a complete substitute, but it’s so cumbersome to use that it’s just as bad as the current situation. And if you do NOT have live defender decisions, that changes the defender’s ability to respond - which affects everything from removing 2 fighters, then 1 carrier, then 2 fighters, then 1 carrier, etc. when defending fighters don’t have a safe landing zone, to preferentially taking Russian, UK, or US troops as casualties (changing the units available for territory trading with air backup), to wanting subs NOT to submerge in a main fleet battle while ALSO wanting them to submerge in other battles in the same round. Very casual players aren’t going to think those distinctions important - but they ARE a distinct part of Axis and Allies play for veterans. And as I said, if you DID want a “casual”-oriented Axis and Allies, fine - but saying the online version is based on the 1942 v2 board game when there are actually a lot of differences! Well.
And to wrap up - I know it’s a lot of text here. But let’s not shoot the messenger, okay? I tried short versions, all the time. I asserted allied carrier use was important. Wasn’t accepted, so I wrote a whole treatment of the issue (and more besides). I said PRNG was a public relations issue. Wasn’t accepted, so I wrote out the details of how I think it should be treated. I said the developers needed to hire a competent statistician to look at things. Wasn’t done. So what can reasonably be done? I state things in brief; they don’t get traction. I say hire professionals; it doesn’t happen. I write out a great amount of detail - NOT EXHAUSTIVE by any means, just the bare minimum to shape and inform the case - and I get called out for text walls. I mean, really now. I don’t have any skin in the game; is it expected that I personally underwrite the bills for the research or for hiring experts, or do the work myself?
I understand I can’t expect people to understand decades of context from a few paragraphs. I understand that even if the game generated up to four million in revenue, that money is spoken for (to generate profits, if nothing else - and since it’s always weighed against other investments, maximum profit for minimum resource allocation is a reasonable goal). And I understand people don’t want to read a load of text. But reasonably, what else should I do, or can I do?
To quote Tolkien’s “The Hobbit”:
“What else do you suppose a burglar is to do?” asked Bilbo angrily. “I was not engaged to kill dragons, that is warrior’s work, but to steal treasure. I made the best beginning I could. Did you expect me to trot back with the whole hoard of Thror on my back? If there is any grumbling to be done, I think I might have a say. You ought to have brought five hundred burglars not one. I am sure it reflects great credit on your grandfather, but you cannot pretend that you ever made the vast extent of his wealth clear to me. I should want hundreds of years to bring it all up, if I was fifty times as big, and Smaug as tame as a rabbit.”
It worked out for Bilbo and the dwarves; maybe things will all work out here too.